Building an End-to-End Retrieval-Augmented Generation (RAG) Chatbot with Databricks
11 min read · Jun 21, 2024
This article shows how to implement Retrieval-Augmented Generation (RAG) in Databricks using recent tools and techniques.
We’ll explore how to use Databricks’ serving endpoints for LLMs, compare chunking strategies, weigh open-source vector stores against the Databricks Vector Index, and maintain chat history.
The high-level steps are:
- Data Preparation: Collect and preprocess text data, removing redundancy and performing exploratory data analysis (EDA).
- Chunking and Embeddings: Split the text into manageable chunks, create embeddings, and filter out less informative chunks (a minimal sketch follows this list).
- Implementing RAG: Develop a question-answering system using an LLM, write prompts, and evaluate results with the RAGAs library.
- Model Deployment and Management: Deploy the model and manage it with MLflow.
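As a preview of the chunking-and-embeddings step, here is a minimal sketch using LangChain’s RecursiveCharacterTextSplitter with a FAISS index. The chunk sizes and the sentence-transformers/all-MiniLM-L6-v2 embedding model are illustrative assumptions, not the article’s final choices, and HuggingFaceEmbeddings requires the sentence-transformers package.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Preprocessed text from the data-preparation step (placeholder).
docs = ["Databricks Vector Search lets you index and query embeddings at scale."]

# Split long documents into overlapping chunks sized for the embedding model.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.create_documents(docs)

# Embed each chunk and index it in FAISS for similarity search.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

# Retrieve the chunks most relevant to a user question.
results = vector_store.similarity_search("What is RAG?", k=3)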
Install libraries
%pip install --upgrade tiktoken==0.5.2 faiss-cpu==1.7.4 langchain mlflow==2.9.2 databricks-genai-inference spacy databricks-sdk
%sh
python -m spacy download en_core_web_sm
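After a %pip install in a Databricks notebook, it’s common practice to restart the Python process so the current session picks up the newly installed versions:
# Restart the Python interpreter so the pinned library versions take effect.
dbutils.library.restartPython()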
from langchain.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain import HuggingFacePipeline
from langchain.chains…