Step-by-Step Guide: Building AI Chatbots Using Langchain and RAG Techniques


In this guide, we’ll walk you through building an AI assistant that truly understands you and can answer questions about you. We’ll be using Retrieval-Augmented Generation (RAG), a powerful technique that helps your AI assistant provide reliable answers based on your data. It’s an excellent project, with plenty of scope for learning and for adding more features later. By the end, you’ll have a practical personal AI assistant and a solid understanding of how RAG works.

Understanding RAG with an Example

Remember those English exams from school? You’d get a paragraph and have to answer questions based on it. Without that paragraph, you’d be lost because these weren’t general knowledge questions — you needed that specific text to find the answers. This is similar to how RAG works! Just like we needed that paragraph in our exam, AI models need context about us to answer questions about our lives or work. They’ve been trained on vast amounts of data, but they know nothing about us specifically, so they’ll either say “I don’t know” or make stuff up.

The Building Blocks of RAG

**Vector Embeddings: Converting Words to Numbers**
Computers only understand numbers, not words. That’s where embeddings come in:

  • Text gets broken down into tokens (words or sub-words)
  • Each token gets converted into an array of numbers
  • These number arrays (vectors) capture the meaning of your text
  • Different models use different array sizes for these embeddings (see the sketch below)
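
To make this concrete, here’s a minimal sketch using the same embeddings class we’ll set up later in this guide (it assumes the @langchain/google-genai package is installed and a GOOGLE_API_KEY environment variable is set):

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({ model: "text-embedding-004" });
// Convert a sentence into its vector representation
const vector = await embeddings.embedQuery("I love building web apps");
console.log(vector.length); // text-embedding-004 produces 768 numbers
console.log(vector.slice(0, 5)); // a peek at the first few dimensions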

**Chunking: Breaking Down Your Data**
Think about it — if you have 10 pages about yourself, passing all of that to an AI every time would be:

  • Expensive (because AI models charge by how many words they process)
  • Inefficient (like giving someone your whole life story when they just asked where you went to college)
  • Slow (more context makes the generation process slower)

So we need to break large documents into smaller, meaningful sections. In our case, I’ve created a markdown file about myself, structured with headers, so we can split it by headers (##), as shown in the sketch below. Each chunk should be self-contained enough to make sense on its own. The chunking strategy affects both retrieval quality and cost.
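
A quick sketch of what header-aware splitting looks like with LangChain’s MarkdownTextSplitter (the sample text and chunk size here are illustrative):

import { MarkdownTextSplitter } from "langchain/text_splitter";

const splitter = new MarkdownTextSplitter({ chunkSize: 500, chunkOverlap: 0 });
const chunks = await splitter.splitText(
  "## Introduction\nHi, I'm Marish...\n\n## Education\nBachelor of Science..."
);
// Each chunk roughly follows the header boundaries
chunks.forEach((chunk, i) => console.log(`chunk ${i}:`, chunk));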

**Vector Databases**
Vector databases are built specifically for handling embeddings — those arrays of numbers that represent meaning. Here’s what makes them special:

— Data is stored as high-dimensional vectors (arrays of, say, 1,536 numbers; the exact size varies by model). Each vector represents the “meaning” of a chunk of text

— Instead of looking for matching words, they use something called “cosine similarity” or “Euclidean distance” to measure how “close” vectors are to each other. The closer the two vectors are, the more similar their meanings

— They use Approximate Nearest Neighbor (ANN) algorithms to speed up search over large datasets by trading off a little accuracy (see the sketch below)
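
Under the hood, cosine similarity is just a little arithmetic over two vectors; the hard part a vector database solves is doing this efficiently across millions of vectors. A minimal sketch:

// Cosine similarity: 1 = same direction (similar meaning),
// 0 = unrelated, -1 = opposite
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}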

**Example**
Let’s say you have these chunks in your database:

Chunk 1: "I worked at Google as a senior developer"
Chunk 2: "My cat likes to sleep all day"
Chunk 3: "I've built multiple large-scale web applications"

When someone asks: “What’s your programming background?”

  1. Their question gets turned into a vector
  2. The vector DB quickly finds which vectors are closest
  3. It would return Chunks 1 and 3 because their meaning is close to the question
  4. Chunk 2 would be ignored because its vector is very different (the sketch below runs this exact example)
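
Here’s that example as a runnable sketch, using the same MemoryVectorStore and embeddings we’ll configure later (it assumes GOOGLE_API_KEY is set):

import { Document } from "@langchain/core/documents";
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const store = new MemoryVectorStore(
  new GoogleGenerativeAIEmbeddings({ model: "text-embedding-004" })
);
await store.addDocuments([
  new Document({ pageContent: "I worked at Google as a senior developer" }),
  new Document({ pageContent: "My cat likes to sleep all day" }),
  new Document({ pageContent: "I've built multiple large-scale web applications" }),
]);
// Expect chunks 1 and 3 back; the cat chunk is semantically far away
const results = await store.similaritySearch("What's your programming background?", 2);
results.forEach((doc) => console.log(doc.pageContent));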

**Vector Databases vs. Regular Databases**
Let’s talk about why we need special databases for RAG and how they’re different from regular databases like MySQL or MongoDB. Think about how you search in a normal database:

  • In MySQL, you might search “WHERE name = ‘John’” — it’s exact matching
  • Or maybe “WHERE description LIKE ‘%programming%’” — it looks for specific words
  • These work great for finding exact matches or simple text patterns

In the case of RAG, we’re not looking for exact matches. We’re looking for text that means similar things. For example:

  • Query: “What’s your experience with coding?”
  • Relevant text: “I’ve been developing software for 5 years”

These sentences don’t share many words, but they’re talking about the same thing. Regular databases can’t handle this kind of “meaning-based” search.

Our Technology Stack

  • Frontend: Next.js
  • Backend: Node.js with LangChain
  • Vector Store: In-memory implementation
  • AI Model: Gemini

Before we begin, you can get a Gemini API key from https://aistudio.google.com/.

For this guide, we’ll be using the Gemini-1.5 Pro model. It has a free tier with rate limits — it’s more than enough to get started and build a fully functional AI assistant. So nothing is stopping you from diving in and creating something awesome!

Setting Up the Project

Initialize a Node.js TypeScript project. First, let’s create a file with some details about me. I’ll create it as a markdown file for now; later we can try a different file format like PDF. The content will be structured with headers accordingly.

Example

## 👋 Introduction
Hi there! I'm Marish, a full-stack web developer with a passion for creating elegant, user-friendly solutions to complex problems. I specialize in modern web technologies and have 5 years of experience building scalable applications.

## 💼 Professional Summary
- Full-stack web developer with expertise in modern JavaScript frameworks
- Proven track record of delivering high-performance web applications
- Strong focus on clean code and best practices
- Experience with agile development methodologies
- Committed to continuous learning and staying current with industry trends

## 🛠️ Technical Skills

### Frontend Development
- HTML5, CSS3, JavaScript (ES6+)
- React.js, Next.js
- TypeScript
- Responsive Design
- UI/UX Best Practices
- Testing (Jest, React Testing Library)

### Backend Development
- Node.js, Express.js
- Python, Django
- RESTful APIs
- GraphQL
- Database Design

### Database Technologies
- MongoDB
- PostgreSQL
- MySQL
- Redis

### DevOps & Tools
- Git, GitHub
- Docker
- AWS/Azure
- CI/CD (Jenkins, GitHub Actions)
- Linux/Unix

## 🚀 Featured Projects

### Project 1: E-Commerce Platform
**Technologies**: React, Node.js, MongoDB, AWS
- Developed a full-stack e-commerce solution with real-time inventory management
- Implemented secure payment processing using Stripe
- Achieved 99.9% uptime and 2-second average page load time

### Project 2: Content Management System
**Technologies**: Vue.js, Django, PostgreSQL
- Built a customizable CMS serving 10,000+ daily users
- Implemented role-based access control and content workflow
- Reduced content publishing time by 60%

### Project 3: Real-time Analytics Dashboard
**Technologies**: React, Socket.io, Express, Redis
- Created a real-time dashboard for monitoring system metrics
- Implemented WebSocket connections for live data updates
- Optimized performance for handling 1M+ daily data points

## 📈 Professional Experience

### Senior Web Developer | Tech Solutions Inc.
*January 2022 - Present*
- Lead developer for client-facing web applications
- Mentored junior developers and conducted code reviews
- Implemented automated testing, improving code coverage by 40%
- Reduced deployment time by 50% through CI/CD optimization

### Full Stack Developer | Digital Innovations Co.
*March 2020 - December 2021*
- Developed and maintained multiple client websites
- Implemented responsive designs and mobile-first approaches
- Collaborated with UX team to improve user engagement by 35%

## 🎓 Education

### Bachelor of Science in Computer Science
- Graduated with Honors
- Focus on Web Technologies and Software Engineering
- Relevant Coursework: Data Structures, Algorithms, Web Development

## 📜 Certifications
- AWS Certified Developer Associate
- MongoDB Certified Developer
- Google Cloud Platform Fundamentals

## 💡 Additional Skills
- Strong problem-solving abilities
- Excellent communication skills
- Team leadership experience
- Agile project management
- Performance optimization

## 🤝 Let's Connect
- GitHub: [CodeWithMarish Github](https://github.com/codewithmarish)
- Email: your.email@example.com
- Portfolio Website: portfolio.codewithmarish.com

## 📱 Contact Information
Feel free to reach out for collaborations or opportunities:
- Phone: (XXX) XXX-XXXX
- Best contact hours: 9 AM - 6 PM EST
- Available for remote work

## 🎯 Career Objectives
Seeking opportunities to:
- Lead challenging web development projects
- Contribute to open-source communities
- Mentor upcoming developers
- Work with cutting-edge technologies

Let’s install the dependencies we need. For running the app we use nodemon, installed as a dev dependency (second command below):

npm i langchain cors dotenv express @langchain/core @langchain/google-genai
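
npm i -D nodemon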

We’ll set up the Express app, add the necessary imports from LangChain, and initialize the Gemini model:

import { ChatPromptTemplate } from "@langchain/core/prompts";
import {
  ChatGoogleGenerativeAI,
  GoogleGenerativeAIEmbeddings,
} from "@langchain/google-genai";
import cors from "cors";
import "dotenv/config";
import express from "express";
import { MarkdownTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import fs from "node:fs";

const app = express();

app.use(express.json());
app.use(cors());

const model = new ChatGoogleGenerativeAI({
  // The API key is read from the GOOGLE_API_KEY environment variable by default
  model: "gemini-pro"
});

Next, we will initialize Google’s embedding model (text-embedding-004), which we’ll use to convert text into numbers (vectors). Think of it as turning words into a list of numbers that captures their meaning.

...
const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "text-embedding-004",
  apiKey: process.env.GOOGLE_API_KEY,
});

Initialize the vector store: simple storage in computer memory to hold our embeddings. The createEmbedding function reads the markdown file, uses LangChain’s MarkdownTextSplitter to split the content into smaller chunks of up to 500 characters (using markdown headers as separators), and stores them in the in-memory vector store.

...
const store = new MemoryVectorStore(embeddings);

async function createEmbedding() {
  const text = fs.readFileSync("about.md", "utf8");
  const splitter = new MarkdownTextSplitter({ chunkSize: 500 });
  // createDocuments splits the markdown and wraps each chunk in a Document
  const docs = await splitter.createDocuments([text]);
  await store.addDocuments(docs);
}

Now, the RAG process (in the chat endpoint):

  • Takes the user’s question

  • Uses similaritySearchWithScore to find the 2 most relevant chunks of information from our storage

  • Combines these chunks into one context

  • Sends the user’s question AND this relevant context to the AI

  • The AI uses this context to give an accurate answer about the developer

...

app.post("/chat", async (req, res) => {
  const { message } = req.body;
  const contexts = await store.similaritySearchWithScore(message, 2);
  console.log(contexts);

  let finalContext = "";
  contexts.forEach((context) => {
    finalContext += context[0].pageContent;
  });

  const response = await chain.invoke({
    input: message,
    context: finalContext,
  });
  res.send({ data: response.content });
});

app.listen(3001, async () => {
  console.log("Server started on port 3001");
  await createEmbedding();
});

The flow is like:

  1. The user asks: “Where did you work before?”
  2. Code searches through stored embeddings to find chunks mentioning work history
  3. Finds relevant chunks and adds them as context
  4. AI reads both the question and this context to give an accurate answer

We also get the score of each relevant chunk; if needed, we can set a threshold, for example only using a chunk as context if its score is greater than 0.5, as in the sketch below.
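
A minimal sketch of that filtering (the 0.5 threshold is illustrative; tune it for your data):

const SCORE_THRESHOLD = 0.5;
const contexts = await store.similaritySearchWithScore(message, 2);
// similaritySearchWithScore returns [Document, score] pairs;
// keep only the chunks that are similar enough
const finalContext = contexts
  .filter(([, score]) => score > SCORE_THRESHOLD)
  .map(([doc]) => doc.pageContent)
  .join("\n\n");

Putting everything together, here’s the complete server code: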

import { ChatPromptTemplate } from "@langchain/core/prompts";
import {
  ChatGoogleGenerativeAI,
  GoogleGenerativeAIEmbeddings,
} from "@langchain/google-genai";
import cors from "cors";
import "dotenv/config";
import express from "express";
import { MarkdownTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import fs from "node:fs";

const app = express();

app.use(express.json());
app.use(cors());

const model = new ChatGoogleGenerativeAI({
  model: "gemini-pro",
  maxOutputTokens: 2048,
});

const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "text-embedding-004",
  apiKey: process.env.GOOGLE_API_KEY,
});

const store = new MemoryVectorStore(embeddings);

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "You are a chatbot, your task is to provide information about the developer based on the user query. Only answer questions based on the context provided. If question is not related the developer information, say 'I don't know'.",
  ],

  ["assistant", "Context:\n {context}"],
  ["human", "{input}"],
]);

const chain = prompt.pipe(model);

async function createEmbedding() {
  const text = fs.readFileSync("about.md", "utf8");
  const splitter = new MarkdownTextSplitter({ chunkSize: 500 });
  // createDocuments splits the text and wraps each chunk in a Document
  const docs = await splitter.createDocuments([text]);
  await store.addDocuments(docs);
}

app.post("/chat", async (req, res) => {
  const { message } = req.body;
  const contexts = await store.similaritySearchWithScore(message, 2);
  console.log(contexts);

  let finalContext = "";
  contexts.forEach((context) => {
    finalContext += context[0].pageContent;
  });

  const response = await chain.invoke({
    input: message,
    context: finalContext,
  });
  res.send({ data: response.content });
});

app.listen(3001, async () => {
  console.log("Server started on port 3001");
  await createEmbedding();
});

In the .env file, add your Gemini key:

GOOGLE_API_KEY=YOUR_API_KEY
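
Before moving to the frontend, you can sanity-check the endpoint with a quick script (Node 18+ ships a global fetch; the question is just an example):

const res = await fetch("http://localhost:3001/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "Where did you work before?" }),
});
console.log((await res.json()).data);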

Now for the frontend, initialize a Next.js project:

npx create-next-app dev-chatbot

For the UI, I’ll use shadcn/ui:

npx shadcn@latest init

For markdown display, we’ll use react-markdown with the remark-gfm plugin for formatting.

npm i react-markdown remark-gfm

We’ll start by setting up a server action to make calls to our API:

"use server";

export const generateResponse = async (message: string) => {
  const res = await fetch("http://localhost:3001/chat", {
    body: JSON.stringify({
      message,
    }),
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
  });
  const data = await res.json();
  console.log(data);
  return data.data;
};

In our app/page.tsx, let’s have a state for storing all the messages and a message state for the input, a ref for the messages container (used for scrolling to the bottom when a new message gets added), and useTransition for performing non-blocking updates.

"use client";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { generateResponse } from "@/lib/actions";
import { cn } from "@/lib/utils";
import { Forward, Sparkles } from "lucide-react";
import { useEffect, useRef, useState, useTransition } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";

export default function Home() {
  const [messages, setMessages] = useState<{ role: string; message: string }[]>([]);
  const [message, setMessage] = useState("");
  const messagesRef = useRef<HTMLDivElement>(null);
  const [isPending, startTransition] = useTransition();
  ...
}

Next, we have the message submit function, which adds the new user message along with a temporary “Thinking...” message to display until the AI generates a response. We then call our server action to hit the API; while it runs, isPending from useTransition is set to true. Once it completes, we remove the “Thinking...” message, append the AI-generated response, and clear the input.

...
const handleSubmit = async (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (message.trim() === "") return;
   
    let newMessages = [...messages];
    newMessages.push({ role: "human", message });
    newMessages.push({ role: "assistant", message: "Thinking..." });
    setMessages(newMessages);
    
    startTransition(async () => {
      const content: string = await generateResponse(message);
      setMessages((prev) => [
        ...prev.slice(0, -1),
        { role: "assistant", message: content},
      ]);
      setMessage("");
    });
};
...

We add a useEffect with messages as a dependency; whenever messages changes, we use the messages container ref to scroll to the bottom.

useEffect(() => {
    if (messagesRef.current) {
      messagesRef.current.scrollTop = messagesRef.current.scrollHeight;
    }
}, [messages]);

Putting it all together in the UI and adding some styles using Tailwind CSS:

...
return (
    <div className="h-full relative overflow-auto" ref={messagesRef}>
      <header className="flex sticky top-0 backdrop-blur-sm py-3 flex-col gap-6 items-center justify-center">
        <h1 className="text-xl font-semibold">ChatBot</h1>
      </header>
      <main
        style={{
          height: "calc(100% - 52px)",
        }}
        className="max-w-screen-sm mx-auto flex flex-col p-4 gap-2"
      >
        <div className="flex-1 flex flex-col gap-4">
          {messages.map((chat, index) => (
            <div
              key={index}
              className={cn("flex gap-1", {
                "self-end ml-16": chat.role === "human",
              })}
            >
              {chat.role == "assistant" && (
                <div
                  className={cn(
                    "rounded-full shrink-0 flex items-center justify-center w-8 h-8 border"
                  )}
                >
                  <Sparkles size={16} />
                </div>
              )}
              <div className="p-2 rounded-xl bg-slate-50 border">
                <ReactMarkdown
                  className={
                    "prose-code:text-sm prose-h1:text-xl prose-h2:text-lg prose-h3:text-base prose-p:text-sm prose-a:text-blue-500 prose-strong:font-semibold prose-li:list-disc prose-li:text-sm prose-ul:list-disc prose-ul:ml-4 prose-ol:list-decimal prose-ol:ml-4"
                  }
                  remarkPlugins={[remarkGfm]}
                >
                  {chat.message}
                </ReactMarkdown>
              </div>
            </div>
          ))}
        </div>
        <form
          onSubmit={handleSubmit}
          className="flex w-full items-center gap-4 bottom-0 py-4  sticky backdrop-blur-sm"
        >
          <Input
            type="text"
            disabled={isPending}
            value={message}
            onChange={(e) => setMessage(e.target.value)}
            className="w-full resize-none sticky bottom-4 bg-gray-100 rounded-lg p-4 text-black"
            placeholder="Type your message here..."
          />

          <Button
            disabled={isPending}
            type="submit"
            size={"icon"}
            variant={"secondary"}
            className=" bg-gray-100 rounded-lg p-4 text-black"
          >
            <Forward size={16} />
          </Button>
        </form>
      </main>
    </div>
  );

The app looks like this:

It works well for simple questions, but for complex or rephrased queries it may respond with “I don’t know” due to the current retrieval process and data chunking method. However, this is just the beginning — we’ll dive deeper into RAG concepts and explore ways to enhance the retrieval process for more accurate and insightful responses. I have used basic prompts; you can try out different prompts and adjust the response style.

Thanks for reading, and Happy Coding!

Follow us: X | YouTube