
Building an End-to-End Retrieval-Augmented Generation (RAG) Chatbot with Databricks

Naveen Kumar
11 min read · Jun 21, 2024


The goal of this article is to implement Retrieval-Augmented Generation (RAG) in Databricks using recent tools and techniques.
We’ll explore how to use Databricks’ serving endpoints for LLMs, compare different chunking strategies, weigh open-source vector stores against the Databricks Vector Index, and see how to maintain chat history.

The high-level steps involved are:

  • Data Preparation: Collect and preprocess text data, removing redundancy and performing exploratory data analysis (EDA).
  • Chunking and Embeddings: Split the text into manageable chunks, create embeddings, and filter out less informative chunks (a minimal sketch follows this list).
  • Implementing RAG: Develop a question-answering system using an LLM, write prompts, and evaluate results with the RAGAs library.
  • Model Deployment and Management: Deploy the model and manage the model using MLflow.
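
To make the chunking-and-embeddings step concrete, here is a minimal sketch using LangChain’s tiktoken-based text splitter and a local FAISS index. The embedding model and the raw_text variable are illustrative placeholders rather than the article’s exact choices:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Split the preprocessed corpus into overlapping chunks, sized by token count
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512,
    chunk_overlap=64,
)
chunks = splitter.split_text(raw_text)  # raw_text: your preprocessed document text

# Embed each chunk and build an in-memory FAISS index (placeholder model choice)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(chunks, embeddings)

# Sanity check: retrieve the chunks most similar to a sample question
print(vector_store.similarity_search("What does the document cover?", k=3))

Very short, boilerplate-heavy, or near-duplicate chunks can be dropped before indexing; that is what the filtering step above refers to.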

Install libraries

%pip install --upgrade tiktoken==0.5.2 faiss-cpu==1.7.4 langchain mlflow==2.9.2 databricks-genai-inference spacy databricks-sdk

%sh
python -m spacy download en_core_web_sm
from langchain.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain import HuggingFacePipeline
from langchain.chains…
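
Building on these imports, a plausible sketch of wiring a local Hugging Face model into a LangChain retrieval QA chain looks like the following. It reuses the vector_store built in the earlier sketch; the model choice and generation settings are assumptions, and a generative (causal LM) pipeline is used here instead of the extractive AutoModelForQuestionAnswering:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Wrap a local text-generation model as a LangChain-compatible LLM
model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=generator)

# "Stuff" the top-k retrieved chunks into the prompt and answer the question
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
)
print(qa_chain.run("Summarize the key points of the document."))

For the chat-history requirement mentioned in the introduction, LangChain’s ConversationalRetrievalChain follows the same pattern but also accepts the running chat history alongside each question.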

Written by Naveen Kumar

Full Stack Data Scientist at Bosch
