
How to Use RAG with Your Documents

RAG (Retrieval Augmented Generation) lets AI answer questions using YOUR documents. The process: 1) Split documents into chunks, 2) Convert to embeddings and store in a vector database, 3) When querying, retrieve relevant chunks and include them in the AI prompt.

Quick Answer

RAG solves the “AI doesn’t know my stuff” problem. Instead of hoping the AI was trained on your data, you give it the relevant context at query time.

Simplest approach: use LangChain + Chroma + OpenAI (the stack in the quickstart below). You can have a working RAG system in under an hour.

How RAG Works

Your Question → Find Similar Documents → Add to Prompt → AI Answers
  1. Embed your documents: Convert text to vectors (numbers)
  2. Store in vector DB: Pinecone, Qdrant, Chroma, etc.
  3. Query: Convert question to vector, find similar doc chunks
  4. Generate: Send question + relevant chunks to LLM
  5. Answer: AI responds using your documents as context
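The five steps above can be sketched end to end in plain Python with a toy bag-of-words "embedding" and cosine similarity (purely illustrative; the quickstart below swaps in real OpenAI embeddings and Chroma):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector. Real systems use a model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2: embed documents and store them in a (toy) vector store
docs = [
    "RAG retrieves relevant chunks before generation.",
    "Chroma is a local vector database.",
    "Bananas are yellow.",
]
store = [(doc, embed(doc)) for doc in docs]

# Step 3: embed the question and find the most similar chunks
question = "What does rag retrieve before generation?"
q_vec = embed(question)
top = sorted(store, key=lambda pair: cosine(q_vec, pair[1]), reverse=True)[:2]

# Steps 4-5: in a real system, these chunks go to the LLM as prompt context
context = "\n".join(doc for doc, _ in top)
print(context)
```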

Quickstart: RAG in 30 Minutes

Prerequisites

pip install langchain langchain-community langchain-openai chromadb

Step 1: Load Documents

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load all .txt files from a folder
loader = DirectoryLoader('./docs/', glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

print(f"Loaded {len(documents)} documents")

Step 2: Split into Chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Characters per chunk
    chunk_overlap=200,    # Overlap for context continuity
)
chunks = splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks")

Step 3: Create Embeddings & Store

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize embeddings (requires OPENAI_API_KEY env var)
embeddings = OpenAIEmbeddings()

# Store in Chroma (local vector DB)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Embeddings stored!")

Step 4: Query with RAG

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o")

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Simple: stuff all docs in prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

# Ask questions! RetrievalQA returns a dict with "query" and "result" keys
response = qa_chain.invoke("What is the main topic of these documents?")
print(response["result"])

Production Setup

For real applications, upgrade these components:

Vector Database Options

Database | When to Use
Chroma | Prototyping (local, simple)
Pinecone | Production (managed, scalable)
Qdrant | Self-hosted, complex filtering
pgvector | Already using Postgres

Embedding Models

Model | Quality | Speed | Cost
OpenAI text-embedding-3-large | Best | Fast | $0.13/1M tokens
OpenAI text-embedding-3-small | Good | Fastest | $0.02/1M tokens
Cohere embed-v3 | Excellent | Fast | $0.10/1M tokens
Local (Ollama) | Good | Varies | Free

LLM Options

Model | Best For
GPT-4o | Quality answers
Claude 3.5 | Long context, nuanced
Llama 3.3 (local) | Privacy, free

Best Practices

Chunking Strategy

  • Chunk size: 500-1000 characters usually works
  • Overlap: 10-20% overlap prevents context loss
  • Semantic chunking: Split by meaning, not just length
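The size-and-overlap idea can be sketched with a simple fixed-size character splitter (a hypothetical helper; LangChain's RecursiveCharacterTextSplitter additionally prefers paragraph and sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks where each chunk repeats the last
    `overlap` characters of the previous one, so no sentence is cut
    off without context on at least one side."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print(len(chunks))  # 4 chunks, of lengths 1000, 1000, 900, 100
# The last 200 chars of one chunk reappear as the first 200 of the next:
print(chunks[0][-200:] == chunks[1][:200])
```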

Retrieval Optimization

  • k=3-5: Start with retrieving 3-5 chunks
  • Hybrid search: Combine vector + keyword search
  • Reranking: Use a reranker model to improve relevance
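Hybrid search from the list above can be sketched as a weighted blend of a sparse (keyword) score and a dense (vector) score. Both scoring functions here are toy stand-ins, and `alpha` is an assumed tuning parameter; production systems typically combine BM25 with dense retrieval, then rerank:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms present in the document (toy BM25 stand-in)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(query: str, doc: str, vector_score: float, alpha: float = 0.5) -> float:
    """Blend dense and sparse scores; alpha weights the vector side."""
    return alpha * vector_score + (1 - alpha) * keyword_score(query, doc)

# vector_score would come from the vector DB; hard-coded here for illustration
score = hybrid_score("rag pipeline", "a rag pipeline retrieves chunks", vector_score=0.9)
print(round(score, 2))  # 0.95
```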

Prompt Engineering

template = """Use the following context to answer the question.
If you don't know the answer from the context, say "I don't have that information."

Context: {context}

Question: {question}

Answer:"""
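To see what the model actually receives, the template can be filled with plain `str.format`; the retrieved chunks become `{context}` (the sample context and question here are made up for illustration):

```python
template = """Use the following context to answer the question.
If you don't know the answer from the context, say "I don't have that information."

Context: {context}

Question: {question}

Answer:"""

prompt = template.format(
    context="RAG retrieves relevant chunks before generation.",
    question="What does RAG retrieve?",
)
print(prompt)
```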

No-Code Alternatives

Don’t want to code? Use these:

Tool | Setup Time | Best For
NotebookLM | 1 min | Google Docs, PDFs
ChatGPT + Files | 1 min | Quick analysis
AnythingLLM | 10 min | Self-hosted RAG
Dify | 15 min | Visual RAG builder

Common Pitfalls

  1. Chunks too big: AI gets overwhelmed, misses details
  2. Chunks too small: Loses context, fragmented answers
  3. Wrong embedding model: Garbage in, garbage out
  4. Not enough retrieval: Missing relevant information
  5. Too much retrieval: Noise drowns out signal

Last verified: 2026-03-03