What is a Vector Database?
A vector database is a specialized database that stores and searches numerical representations of data (vectors, also called embeddings). Unlike traditional databases, which match exact values, vector databases find similar content, which makes them essential for AI applications such as semantic search, RAG, and recommendations.
Quick Answer
When you convert text, images, or any data into vectors (lists of numbers), a vector database can quickly find the most similar items. This powers ChatGPT plugins that search your documents, product recommendations, and any AI that needs to “remember” or “find related” information.
How It Works
Traditional Database
Query: "machine learning"
→ Finds: Documents containing "machine learning" (exact match)
→ Misses: Documents about "AI algorithms" (same topic, different words)
Vector Database
Query: "machine learning" → Convert to [0.23, -0.41, 0.67, ...]
→ Finds: All documents with similar meaning
→ Includes: "AI algorithms", "neural networks", "deep learning"
The Technical Basics
What are Vectors/Embeddings?
Numerical representations that capture meaning:
"cat" → [0.23, 0.87, -0.12, 0.45, ...] # 1536 dimensions
"dog" → [0.21, 0.85, -0.15, 0.42, ...] # Similar = close vectors
"car" → [-0.32, 0.11, 0.78, -0.23, ...] # Different = distant vectors
How Similarity Search Works
Vector databases use algorithms (like HNSW or IVF) to efficiently find nearest neighbors:
Query vector: [0.22, 0.86, -0.13, 0.44, ...]
Result: "cat" (distance: 0.02), "dog" (distance: 0.03), "pet" (distance: 0.05)
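For intuition, here is what an exact (brute-force) nearest-neighbor search might look like: compare the query to every stored vector and sort by distance. This is a sketch, not how a production index works; HNSW and IVF exist precisely to avoid scanning everything. The function name `nearest_neighbors` and the four-value toy vectors are assumptions for illustration, so the ordering may differ from what full-length embeddings would give.

```python
import math

def nearest_neighbors(query, items, k=3):
    """Exact k-nearest-neighbor search: score every item, then sort by distance."""
    scored = [(name, math.dist(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest Euclidean distance first
    return scored[:k]

# Toy 4-dimensional vectors (truncated from the examples in this article)
items = {
    "cat": [0.23, 0.87, -0.12, 0.45],
    "dog": [0.21, 0.85, -0.15, 0.42],
    "car": [-0.32, 0.11, 0.78, -0.23],
}
query = [0.22, 0.86, -0.13, 0.44]

top_two = nearest_neighbors(query, items, k=2)
for name, distance in top_two:
    print(f"{name}: {distance:.3f}")
```

The cost of this approach grows linearly with the number of stored vectors, which is why vector databases build an index instead.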
Popular Vector Databases
| Database | Type | Best For | Pricing |
|---|---|---|---|
| Pinecone | Managed | Production, simplicity | Free tier, $70/mo+ |
| Weaviate | Open-source | Full-featured, hybrid | Free (self-hosted) |
| Chroma | Open-source | Local dev, Python | Free |
| Qdrant | Open-source | Performance | Free (self-hosted) |
| Milvus | Open-source | Enterprise scale | Free (self-hosted) |
| pgvector | Extension | PostgreSQL users | Free |
Use Cases
1. RAG (Retrieval-Augmented Generation)
Give LLMs access to your documents:
User: "What's our refund policy?"
→ Vector search finds relevant policy docs
→ LLM generates answer from those docs
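The retrieval half of this flow can be sketched in a few lines. Everything here is hypothetical: the document texts, their three-value embeddings, and the `retrieve` and `build_prompt` helpers are made up for illustration. A real system would get vectors from an embedding model and send the final prompt to an LLM, which this sketch leaves out.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical document store: (text, embedding) pairs with made-up 3-d vectors
documents = [
    ("Refunds are issued within 30 days of purchase.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Shipping takes 3 to 5 business days.", [0.2, 0.1, 0.9]),
]

def retrieve(query_vector, k=1):
    """Return the k document texts most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_docs):
    """Assemble the text an LLM would receive: retrieved context, then the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What's our refund policy?"
question_vector = [0.85, 0.15, 0.05]  # assumed embedding of the question
prompt = build_prompt(question, retrieve(question_vector))
print(prompt)
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy store for a real vector database later.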
2. Semantic Search
Find content by meaning, not keywords:
Search: "affordable apartments near transit"
→ Finds: "budget-friendly condos close to subway"
3. Recommendation Systems
Find similar items:
User liked: Product A
→ Find products with similar embeddings
→ Recommend: Products B, C, D
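A minimal sketch of this idea: rank every other product by similarity to the one the user liked, and skip the liked product itself. The catalog, its three-value embeddings, and the `recommend` helper are all invented for this example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical catalog with made-up 3-d embeddings
catalog = {
    "Product A": [0.9, 0.2, 0.1],
    "Product B": [0.8, 0.3, 0.1],
    "Product C": [0.7, 0.1, 0.3],
    "Product D": [0.1, 0.9, 0.2],
}

def recommend(liked, k=2):
    """Return the k products most similar to `liked`, excluding `liked` itself."""
    liked_vec = catalog[liked]
    others = [name for name in catalog if name != liked]
    others.sort(
        key=lambda name: cosine_similarity(liked_vec, catalog[name]),
        reverse=True,
    )
    return others[:k]

recommendations = recommend("Product A")
print(recommendations)
```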
4. Duplicate Detection
Find near-duplicate content:
New article submitted
→ Compare embedding to existing articles
→ Flag potential duplicates
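One common way to do the "flag" step is a similarity threshold: anything scoring above a cutoff is treated as a potential duplicate. The sketch below assumes a cutoff of 0.95, which is an arbitrary value chosen for illustration and would need tuning on real data. The article IDs, vectors, and the `find_duplicates` helper are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

DUPLICATE_THRESHOLD = 0.95  # assumed cutoff; tune on your own data

# Hypothetical existing articles with made-up 3-d embeddings
existing_articles = {
    "article-1": [0.6, 0.7, 0.1],
    "article-2": [0.1, 0.2, 0.9],
}

def find_duplicates(new_vector, threshold=DUPLICATE_THRESHOLD):
    """Return IDs of existing articles whose similarity meets the threshold."""
    return [
        article_id
        for article_id, vec in existing_articles.items()
        if cosine_similarity(new_vector, vec) >= threshold
    ]

new_article = [0.62, 0.68, 0.12]  # assumed embedding of the submitted article
flagged = find_duplicates(new_article)
print(flagged)
```

A lower threshold catches more paraphrases but also raises more false alarms, so flagged items are usually sent for review rather than rejected outright.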
5. Image Search
Search images by description:
Query: "sunset over mountains"
→ Find images with matching visual embeddings
Simple Example
import chromadb

# Create an in-memory database and a collection
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents (Chroma generates embeddings with its default embedding model)
collection.add(
    documents=[
        "Machine learning is a subset of AI",
        "Neural networks process information",
        "Cats are popular pets",
    ],
    ids=["doc1", "doc2", "doc3"],
)

# Semantic search: ask for the two closest documents
results = collection.query(
    query_texts=["artificial intelligence"],
    n_results=2,
)
print(results["ids"])
# Expected: doc1 and doc2 (both about AI), not doc3 (about cats)
Vector Database vs Traditional Database
| Feature | Vector DB | Traditional DB |
|---|---|---|
| Search type | Similarity | Exact match |
| Data format | Vectors (floats) | Structured rows |
| Query | Find nearest | WHERE clause |
| Use case | AI/ML | Transactions |
| Optimized for | High-dimensional similarity search | Relational queries and joins |
Key Concepts
- Embedding: The vector representation of data
- Dimension: Number of values in a vector (e.g., 1536)
- Distance metric: How similarity is measured (e.g., cosine similarity, Euclidean distance, dot product)
- Index: Data structure for fast nearest-neighbor search
- ANN (Approximate Nearest Neighbor): Search that trades a small amount of accuracy for much faster queries
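To make the index and ANN ideas concrete, here is a toy sketch in the spirit of IVF: vectors are grouped into buckets around centroids, and a query searches only the bucket whose centroid is nearest. The centroids, points, and function names are all made up for illustration, and real implementations learn centroids from the data and usually probe several buckets.

```python
import math

# Hypothetical fixed centroids and 2-d points (real systems learn centroids)
centroids = [[0.0, 0.0], [10.0, 10.0]]
points = {"a": [0.5, 0.2], "b": [1.0, 1.5], "c": [9.0, 9.5], "d": [10.5, 10.0]}

def nearest_centroid(vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))

# Build the index: assign every point to the bucket of its nearest centroid
buckets = {i: [] for i in range(len(centroids))}
for name, vec in points.items():
    buckets[nearest_centroid(vec)].append(name)

def exact_search(query):
    """Scan every point and return the true nearest neighbor."""
    return min(points, key=lambda name: math.dist(query, points[name]))

def ann_search(query):
    """Scan only the bucket nearest to the query (faster, but may miss)."""
    candidates = buckets[nearest_centroid(query)]
    return min(candidates, key=lambda name: math.dist(query, points[name]))

easy_query = [0.4, 0.4]      # deep inside one bucket
boundary_query = [5.2, 5.2]  # close to the border between buckets

print(exact_search(easy_query), ann_search(easy_query))
print(exact_search(boundary_query), ann_search(boundary_query))
```

For the query deep inside a bucket, both searches agree. For the query near the boundary, the approximate search scans only half the points and returns a slightly farther neighbor than the exact search, which is the accuracy-for-speed trade in miniature.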
Getting Started
- Choose a database: Chroma for local dev, Pinecone for production
- Pick an embedding model: OpenAI, Cohere, or open-source
- Embed your data: Convert documents to vectors
- Store and index: Add vectors to the database
- Query: Search by similarity
Last verified: 2026-03-06