How to Use RAG with Your Documents
RAG (Retrieval Augmented Generation) lets AI answer questions using YOUR documents. The process: 1) Split documents into chunks, 2) Convert to embeddings and store in a vector database, 3) When querying, retrieve relevant chunks and include them in the AI prompt.
Quick Answer
RAG solves the “AI doesn’t know my stuff” problem. Instead of hoping the AI was trained on your data, you give it the relevant context at query time.
Simplest approach: Use LangChain + Chroma + OpenAI (the stack in the quickstart below). You can have a working RAG system in under an hour.
How RAG Works
Your Question → Find Similar Documents → Add to Prompt → AI Answers
- Embed your documents: Convert text to vectors (numbers)
- Store in vector DB: Pinecone, Qdrant, Chroma, etc.
- Query: Convert question to vector, find similar doc chunks
- Generate: Send question + relevant chunks to LLM
- Answer: AI responds using your documents as context
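The "find similar documents" step in that flow is just vector math: documents and queries become vectors, and similar meanings produce nearby vectors. A minimal sketch with toy 3-dimensional vectors and hypothetical chunk names (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "embeddings" for three document chunks (made-up values, for illustration)
docs = {
    "refund policy": [0.9, 0.1, 0.2],
    "shipping times": [0.1, 0.8, 0.3],
    "api reference": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "how do refunds work?"

# Retrieve: rank chunks by similarity to the query vector
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['refund policy', 'api reference', 'shipping times']
```

A vector database does exactly this ranking, just with approximate-nearest-neighbor indexes so it stays fast over millions of chunks.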
Quickstart: RAG in 30 Minutes
Prerequisites
```shell
pip install langchain langchain-openai langchain-community chromadb
```
Step 1: Load Documents
```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load all .txt files from a folder (recursively)
loader = DirectoryLoader("./docs/", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()
print(f"Loaded {len(documents)} documents")
```
Step 2: Split into Chunks
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk
    chunk_overlap=200,  # overlap for context continuity
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
```
Step 3: Create Embeddings & Store
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize embeddings (requires the OPENAI_API_KEY env var)
embeddings = OpenAIEmbeddings()

# Store in Chroma (a local vector DB, persisted to disk)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
print("Embeddings stored!")
```
Step 4: Query with RAG
```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o")

# Create the RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # simple: stuff all retrieved chunks into the prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

# Ask questions!
response = qa_chain.invoke({"query": "What is the main topic of these documents?"})
print(response["result"])
```
Production Setup
For real applications, upgrade these components:
Vector Database Options
| Database | When to Use |
|---|---|
| Chroma | Prototyping (local, simple) |
| Pinecone | Production (managed, scalable) |
| Qdrant | Self-hosted, complex filtering |
| pgvector | Already using Postgres |
Embedding Models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| OpenAI text-embedding-3-large | Best | Fast | $0.13/1M tokens |
| OpenAI text-embedding-3-small | Good | Fastest | $0.02/1M tokens |
| Cohere embed-v3 | Excellent | Fast | $0.10/1M tokens |
| Local (Ollama) | Good | Varies | Free |
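The per-token prices above translate into indexing cost with simple arithmetic. A rough estimator, assuming ~4 characters per token for English text (an approximation; actual token counts vary by tokenizer and language):

```python
# Prices from the table above, in dollars per 1M tokens
PRICE_PER_M_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
}

def embedding_cost(total_chars, model, chars_per_token=4):
    """Estimate one-off indexing cost: characters -> tokens -> dollars."""
    tokens = total_chars / chars_per_token
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

# Example: 10,000 documents averaging 5,000 characters each
cost = embedding_cost(10_000 * 5_000, "text-embedding-3-small")
print(f"${cost:.2f}")  # $0.25 — embedding an entire corpus is usually cheap
```

The takeaway: embedding cost is rarely the bottleneck; LLM generation at query time usually dominates spend.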
LLM Options
| Model | Best For |
|---|---|
| GPT-4o | Quality answers |
| Claude 3.5 | Long context, nuanced |
| Llama 3.3 (local) | Privacy, free |
Best Practices
Chunking Strategy
- Chunk size: 500-1000 characters usually works
- Overlap: 10-20% overlap prevents context loss
- Semantic chunking: Split by meaning, not just length
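To make the size/overlap trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python (frameworks like LangChain add smarter splitting on paragraph and sentence boundaries on top of this idea):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap their neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk advances by this many characters
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk reached the end of the text
    return chunks

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text, chunk_size=1000, overlap=200)
print(len(chunks))  # 3 chunks; each repeats the last 200 chars of the previous one
```

The 200-character overlap means a sentence cut in half at a chunk boundary still appears whole in one of the two neighboring chunks.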
Retrieval Optimization
- k=3-5: Start with retrieving 3-5 chunks
- Hybrid search: Combine vector + keyword search
- Reranking: Use a reranker model to improve relevance
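Hybrid search needs a rule for merging the vector ranking with the keyword ranking. One widely used merge rule is reciprocal rank fusion (RRF); a minimal sketch with hypothetical document IDs:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc IDs.

    Each list contributes 1 / (k + rank) per document; documents that
    rank well in multiple lists accumulate the highest scores.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # ranking from the vector store
keyword_hits = ["doc1", "doc9", "doc3"]  # ranking from BM25 / keyword search
print(rrf([vector_hits, keyword_hits]))  # doc1 first: it ranks well in both lists
```

The constant `k=60` is a conventional default that dampens the influence of top-ranked outliers from any single list.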
Prompt Engineering
template = """Use the following context to answer the question.
If you don't know the answer from the context, say "I don't have that information."
Context: {context}
Question: {question}
Answer:"""
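Under the hood, the "stuff" strategy simply formats the retrieved chunks into a template like this one. A plain-Python sketch of that assembly step, with hypothetical chunk text for illustration:

```python
TEMPLATE = """Use the following context to answer the question.
If you don't know the answer from the context, say "I don't have that information."

Context: {context}

Question: {question}

Answer:"""

def build_prompt(question, chunks):
    """'Stuff' strategy: concatenate all retrieved chunks into one context block."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)

# Hypothetical retrieved chunks, for illustration only
prompt = build_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase.",
     "Shipping typically takes 3-5 business days."],
)
print(prompt)
```

The explicit "say you don't know" instruction is what keeps the model from hallucinating when retrieval comes back empty or off-topic.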
No-Code Alternatives
Don’t want to code? Use these:
| Tool | Setup Time | Best For |
|---|---|---|
| NotebookLM | 1 min | Google Docs, PDFs |
| ChatGPT + Files | 1 min | Quick analysis |
| AnythingLLM | 10 min | Self-hosted RAG |
| Dify | 15 min | Visual RAG builder |
Common Pitfalls
- Chunks too big: AI gets overwhelmed, misses details
- Chunks too small: Loses context, fragmented answers
- Wrong embedding model: Garbage in, garbage out
- Not enough retrieval: Missing relevant information
- Too much retrieval: Noise drowns out signal
Last verified: 2026-03-03