How to Use Vector Databases for AI Applications
Vector databases store numerical representations (embeddings) of your data, enabling semantic search and AI memory. Use them to build RAG systems that let LLMs answer questions about your documents, products, or knowledge base.
Quick Answer
Traditional databases find exact matches. Vector databases find similar content—essential for AI applications where you need “find documents about X” rather than “find documents containing the word X.”
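"Similar" here is a measurable quantity: embeddings that point in roughly the same direction have a high cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction, ~0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three phrases
dog = [0.9, 0.1, 0.05]
puppy = [0.85, 0.15, 0.1]
invoice = [0.05, 0.2, 0.9]

print(cosine_similarity(dog, puppy))    # high: related meanings
print(cosine_similarity(dog, invoice))  # low: unrelated meanings
```

Every query in the steps below boils down to this comparison, run against every stored vector (or an index that approximates it).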
Step-by-Step Guide
Step 1: Choose a Vector Database
| Database | Best For | Pricing |
|---|---|---|
| Pinecone | Production apps, managed | Free tier, then $70/mo+ |
| Weaviate | Open-source, full-featured | Free (self-hosted) |
| Chroma | Local development, Python | Free |
| Qdrant | High performance, Rust | Free (self-hosted) |
| Milvus | Enterprise scale | Free (self-hosted) |
| pgvector | PostgreSQL users | Free (extension) |
Step 2: Generate Embeddings
Convert your text into vectors using an embedding model:
```python
from openai import OpenAI

client = OpenAI()

def get_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Example
embedding = get_embedding("How to build an AI agent")
# Returns: [0.023, -0.041, 0.067, ...] (1536 dimensions)
```
Step 3: Store Vectors
Using Chroma (simplest for local development):
```python
import chromadb

# Create client and collection
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Add documents; Chroma embeds them automatically with its default model
collection.add(
    documents=["AI agents automate tasks", "RAG improves LLM accuracy"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)
```
Using Pinecone (production):
```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("my-index")

# Upsert vectors
index.upsert(
    vectors=[
        {"id": "doc1", "values": embedding, "metadata": {"text": "..."}}
    ]
)
```
Step 4: Query for Similar Content
```python
# Search for similar documents
results = collection.query(
    query_texts=["How do AI agents work?"],
    n_results=3
)

# Returns most semantically similar documents
print(results['documents'])
```
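Under the hood, `query` is a nearest-neighbor search over the stored vectors (production databases use approximate indexes such as HNSW rather than a linear scan). A brute-force sketch of the same idea, using made-up 2-dimensional vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def query(store, query_vec, n_results=3):
    # Rank every stored vector by similarity to the query; keep the top n
    scored = sorted(store.items(), key=lambda kv: cosine(kv[1], query_vec), reverse=True)
    return [doc_id for doc_id, _ in scored[:n_results]]

store = {
    "doc1": [0.9, 0.1],   # "AI agents automate tasks"
    "doc2": [0.7, 0.3],   # "RAG improves LLM accuracy"
    "doc3": [0.1, 0.9],   # "Quarterly sales report"
}
print(query(store, [0.8, 0.2], n_results=2))  # → ['doc1', 'doc2']
```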
Step 5: Build RAG Pipeline
Combine vector search with LLM generation:
```python
from openai import OpenAI

# Dedicated name so the LLM client can't be shadowed by the database client
openai_client = OpenAI()

def rag_answer(question):
    # 1. Find relevant context
    results = collection.query(
        query_texts=[question],
        n_results=3
    )
    context = "\n".join(results['documents'][0])

    # 2. Generate answer with context
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
```
Common Use Cases
| Use Case | How It Works |
|---|---|
| Document Q&A | Embed docs → Query → LLM answers |
| Semantic search | Find similar content by meaning |
| Recommendation | Find similar products/content |
| Chatbot memory | Store/retrieve conversation history |
| Code search | Find similar code snippets |
Embedding Model Options
| Model | Dimensions | Cost | Quality |
|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | Good |
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Better |
| Cohere embed-v3 | 1024 | $0.10/1M tokens | Excellent |
| Voyage-3 | 1024 | $0.06/1M tokens | Excellent |
| all-MiniLM-L6-v2 (local) | 384 | Free | Good |
| BGE-large (local) | 1024 | Free | Very good |
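When comparing the hosted models, it helps to estimate the one-time cost of embedding your corpus. A back-of-the-envelope helper using the per-million-token prices from the table (the 50M-token corpus size is just an illustrative figure):

```python
def embedding_cost(total_tokens, price_per_million):
    """Dollar cost of embedding `total_tokens` at a given $/1M-token rate."""
    return total_tokens / 1_000_000 * price_per_million

# Example: a 50M-token corpus
print(embedding_cost(50_000_000, 0.02))  # text-embedding-3-small → $1.00
print(embedding_cost(50_000_000, 0.13))  # text-embedding-3-large → $6.50
```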
Best Practices
Chunking Strategy
- Chunk size: 256-512 tokens is often optimal
- Overlap: 10-20% overlap between chunks
- Preserve context: Keep sentences intact
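The guidelines above can be sketched as a sliding-window chunker. This version counts words rather than tokens for simplicity (a real pipeline would count tokens with a tokenizer such as tiktoken), using 512-word windows with a 77-word (~15%) overlap:

```python
def chunk_text(text, chunk_size=512, overlap=77):
    """Split text into word-based chunks with a fixed overlap between neighbors."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks

doc = "word " * 1200
chunks = chunk_text(doc)
```

Sentence-aware splitters (splitting on paragraph or sentence boundaries near the window edge) implement the "preserve context" rule better, at the cost of variable chunk sizes.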
Metadata
Always store useful metadata:
```json
{
  "source": "document_name.pdf",
  "page": 5,
  "section": "Introduction",
  "date": "2026-03-06"
}
```
Hybrid Search
Combine vector search with metadata filtering (many databases also support mixing in keyword/BM25 scores):

```python
results = collection.query(
    query_texts=["AI agents"],
    where={"source": "technical_docs"},  # metadata filter
    n_results=5
)
```
Scaling Considerations
| Scale | Recommended |
|---|---|
| < 10K vectors | Chroma (local), pgvector |
| 10K-1M vectors | Pinecone, Qdrant Cloud |
| 1M-100M vectors | Weaviate, Milvus |
| > 100M vectors | Custom infrastructure |
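A rough memory estimate helps you pick a row in this table: each float32 dimension costs 4 bytes, times an index-overhead multiplier (the 1.5x here is an assumed ballpark for HNSW-style indexes, not a vendor figure):

```python
def index_memory_gb(n_vectors, dims, bytes_per_dim=4, overhead=1.5):
    """Approximate RAM for a vector index: raw float32 storage times overhead."""
    return n_vectors * dims * bytes_per_dim * overhead / 1e9

# 1M vectors at 1536 dimensions (text-embedding-3-small)
print(round(index_memory_gb(1_000_000, 1536), 2))  # ≈ 9.22 GB
```

This is also why the lower-dimensional models in the table above (1024 or 384 dimensions) matter at scale: memory, and therefore cost, grows linearly with dimensions.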
Related Questions
- What is a vector database?
- Best vector databases 2026
- Best RAG frameworks 2026
- How to build chatbots with RAG
Last verified: 2026-03-06