
Building an End-to-End Retrieval-Augmented Generation (RAG) Chatbot with Databricks

Naveen Kumar
11 min read · Jun 21, 2024

The goal of this article is to implement Retrieval-Augmented Generation (RAG) in Databricks using recent tools and techniques.
We’ll explore how to use Databricks serving endpoints for LLMs, compare different chunking strategies, weigh open-source vector stores against the Databricks Vector Index, and maintain chat history.

The high-level steps involved are:

  • Data Preparation: Collect and preprocess text data, removing redundancy and performing exploratory data analysis (EDA).
  • Chunking and Embeddings: Split the text into manageable chunks, create embeddings, and filter out less informative chunks (a rough sketch of this step follows the list).
  • Implementing RAG: Develop a question-answering system using an LLM, write prompts, and evaluate results with the RAGAs library.
  • Model Deployment and Management: Deploy the model and manage the model using MLflow.
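To make the chunking-and-embedding step concrete, here is a minimal sketch using LangChain and FAISS (installed in the next section). The chunk sizes, the embedding model, and the `docs` variable are illustrative assumptions rather than the article’s exact choices, and HuggingFaceEmbeddings additionally requires the sentence-transformers package.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Split the preprocessed documents into overlapping chunks
# (`docs` is assumed to hold LangChain Document objects; sizes are illustrative)
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed each chunk and build a FAISS index for similarity search
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

Less informative chunks can then be filtered out before indexing, for example by dropping chunks below a minimum token count.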

Install libraries

%pip install --upgrade tiktoken==0.5.2 faiss-cpu==1.7.4 langchain mlflow==2.9.2 databricks-genai-inference spacy databricks-sdk
%sh
# Download spaCy's small English model
python -m spacy download en_core_web_sm
# FAISS vector store, Hugging Face tokenizer/model/pipeline, and LangChain glue
from langchain.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA  # assumed completion; the original import is cut off here
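With these imports in place, the retrieval and generation pieces can be wired together roughly as follows. This is a sketch, not the article’s exact code: the model name, generation settings, and query are placeholders, and `vector_store` refers to the FAISS index built from the embedded chunks as sketched above.

# Wrap a local Hugging Face model as a LangChain-compatible LLM
# (model choice and max_new_tokens are illustrative)
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_pipe = pipeline("text2text-generation", model=model_name, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=hf_pipe)

# Retrieval-augmented QA: fetch the top-k similar chunks from FAISS
# and stuff them into the prompt sent to the LLM
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa_chain.run("What are the main steps in building a RAG chatbot?"))

On Databricks, the same chain could instead call a model hosted behind a Databricks serving endpoint, which is the direction the article takes with databricks-genai-inference.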


Written by Naveen Kumar

Full Stack Data Scientist at Bosch
