What is a Vector Database?

A vector database is a specialized database that stores and searches numerical representations (vectors, also called embeddings) of data. Unlike traditional databases, which match exact values, vector databases find similar content. That makes them essential for AI applications such as semantic search, retrieval-augmented generation (RAG), and recommendations.

Quick Answer

When you convert text, images, or any data into vectors (lists of numbers), a vector database can quickly find the most similar items. This powers ChatGPT plugins that search your documents, product recommendations, and any AI that needs to “remember” or “find related” information.

How It Works

Traditional Database

Query: "machine learning"
→ Finds: Documents containing "machine learning" (exact match)
→ Misses: Documents about "AI algorithms" (same topic, different words)

Vector Database

Query: "machine learning" → Convert to [0.23, -0.41, 0.67, ...]
→ Finds: All documents with similar meaning
→ Includes: "AI algorithms", "neural networks", "deep learning"

The Technical Basics

What are Vectors/Embeddings?

Numerical representations that capture meaning:

"cat" → [0.23, 0.87, -0.12, 0.45, ...]  # 1536 dimensions
"dog" → [0.21, 0.85, -0.15, 0.42, ...]  # Similar = close vectors
"car" → [-0.32, 0.11, 0.78, -0.23, ...]  # Different = distant vectors

How Similarity Search Works

Vector databases use algorithms (like HNSW or IVF) to efficiently find nearest neighbors:

Query vector: [0.22, 0.86, -0.13, 0.44, ...]
Result: "cat" (distance: 0.02), "dog" (distance: 0.03), "pet" (distance: 0.05)
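As a baseline for what HNSW or IVF speed up, the sketch below does an exact, brute-force nearest-neighbor search: it measures the Euclidean distance from the query to every stored vector and sorts the results. It is not HNSW or IVF, it uses only the four components shown in this article, and `nearest` is a hypothetical helper name.

```python
import math

# Toy "database": only the four components shown in this article (an assumption).
vectors = {
    "cat": [0.23, 0.87, -0.12, 0.45],
    "dog": [0.21, 0.85, -0.15, 0.42],
    "car": [-0.32, 0.11, 0.78, -0.23],
}

def nearest(query, store, k=2):
    """Exact k-nearest-neighbor search: score every vector, keep the k closest."""
    scored = [(math.dist(query, vec), name) for name, vec in store.items()]
    scored.sort()  # smallest distance first
    return [(name, dist) for dist, name in scored[:k]]

query = [0.22, 0.86, -0.13, 0.44]
top = nearest(query, vectors, k=3)
for name, dist in top:
    print(f"{name}: {dist:.2f}")
```

A brute-force scan costs one distance computation per stored vector, which is why large collections rely on approximate indexes instead.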

Popular Vector Databases

Database  | Type        | Best For               | Pricing
Pinecone  | Managed     | Production, simplicity | Free tier, $70/mo+
Weaviate  | Open-source | Full-featured, hybrid  | Free (self-hosted)
Chroma    | Open-source | Local dev, Python      | Free
Qdrant    | Open-source | Performance            | Free (self-hosted)
Milvus    | Open-source | Enterprise scale       | Free (self-hosted)
pgvector  | Extension   | PostgreSQL users       | Free

Use Cases

1. RAG (Retrieval-Augmented Generation)

Give LLMs access to your documents:

User: "What's our refund policy?"
→ Vector search finds relevant policy docs
→ LLM generates answer from those docs
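
The retrieval half of this flow can be sketched without any external service. The document texts and three-dimensional vectors below are made-up stand-ins for real content and embeddings, `retrieve` and `build_prompt` are hypothetical helper names, and the final LLM call is left out because it depends on the provider.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up documents with hand-written toy vectors (stand-ins for real embeddings).
documents = [
    ("Refunds are available within 30 days of purchase.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Shipping takes 3 to 5 business days.", [0.2, 0.1, 0.9]),
]

def retrieve(query_vector, docs, k=1):
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vector, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_docs):
    """Combine the retrieved documents and the user question into one LLM prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What's our refund policy?"
question_vector = [0.8, 0.2, 0.1]  # would come from the same model that embedded the documents
prompt = build_prompt(question, retrieve(question_vector, documents))
print(prompt)
```

The key design point is that the question and the documents must be embedded by the same model, otherwise their vectors are not comparable.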

2. Semantic Search

Find content by meaning, not keywords:

Search: "affordable apartments near transit"
→ Finds: "budget-friendly condos close to subway"

3. Recommendation Systems

Find similar items:

User liked: Product A
→ Find products with similar embeddings
→ Recommend: Products B, C, D
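
One simple way to implement this, sketched below, is to look up the liked product's embedding and rank every other product by cosine similarity. The two-dimensional vectors are made up for illustration, and `recommend` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up product embeddings (real ones would come from an embedding model).
products = {
    "Product A": [0.9, 0.1],
    "Product B": [0.8, 0.2],
    "Product C": [0.7, 0.3],
    "Product D": [0.85, 0.05],
    "Product E": [0.1, 0.9],
}

def recommend(liked, catalog, k=3):
    """Rank all products except the liked one by similarity to it; return the top k names."""
    liked_vec = catalog[liked]
    others = [
        (name, cosine_similarity(liked_vec, vec))
        for name, vec in catalog.items()
        if name != liked
    ]
    others.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in others[:k]]

recommendations = recommend("Product A", products)
print(recommendations)
```

Excluding the liked item itself matters: it is always its own nearest neighbor and would otherwise take the top slot.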

4. Duplicate Detection

Find near-duplicate content:

New article submitted
→ Compare embedding to existing articles
→ Flag potential duplicates
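
A minimal sketch of this check, assuming cosine similarity and a cutoff of 0.95. The threshold, the toy vectors, and the `find_duplicates` name are assumptions for illustration; a real threshold has to be tuned on your own data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy embeddings of articles that are already stored (made-up values).
existing_articles = {
    "article-1": [0.70, 0.10, 0.70],
    "article-2": [0.10, 0.95, 0.20],
}

def find_duplicates(new_vector, existing, threshold=0.95):
    """Return ids of existing articles whose similarity to the new one meets the threshold."""
    return [
        article_id
        for article_id, vec in existing.items()
        if cosine_similarity(new_vector, vec) >= threshold
    ]

new_article = [0.68, 0.12, 0.72]  # points in nearly the same direction as article-1
flagged = find_duplicates(new_article, existing_articles)
print(flagged)
```

A lower threshold catches more paraphrases but also flags more unrelated articles, so flagged items are usually sent for review rather than rejected automatically.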

5. Image Search

Search images by description:

Query: "sunset over mountains"
→ Find images with matching visual embeddings

Simple Example

import chromadb

# Create database
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents (Chroma auto-generates embeddings)
collection.add(
    documents=[
        "Machine learning is a subset of AI",
        "Neural networks process information",
        "Cats are popular pets"
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Semantic search
results = collection.query(
    query_texts=["artificial intelligence"],
    n_results=2
)

# Returns: doc1 and doc2 (both about AI)
# NOT doc3 (about cats)

Vector Database vs Traditional Database

Feature     | Vector DB        | Traditional DB
Search type | Similarity       | Exact match
Data format | Vectors (floats) | Structured rows
Query       | Find nearest     | WHERE clause
Use case    | AI/ML            | Transactions
Scalability | High-dimensional | Relational

Key Concepts

  • Embedding: The vector representation of a piece of data (text, image, audio)
  • Dimension: The number of values in a vector (e.g., 1536)
  • Distance metric: How similarity is measured (e.g., cosine similarity, Euclidean distance)
  • Index: A data structure that speeds up nearest-neighbor search
  • ANN (Approximate Nearest Neighbor): Search that gives up a little accuracy in exchange for much faster queries
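
The two distance metrics named above are closely related. For vectors scaled to unit length, the squared Euclidean distance equals 2 - 2 * (cosine similarity), so both metrics rank neighbors in the same order. The sketch below checks that identity numerically; the vectors and helper names are illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Arbitrary example vectors, normalized to length 1.
a = normalize([0.23, 0.87, -0.12, 0.45])
b = normalize([-0.32, 0.11, 0.78, -0.23])

squared_distance = math.dist(a, b) ** 2
from_cosine = 2 - 2 * cosine_similarity(a, b)

# The two values agree up to floating-point rounding.
print(squared_distance, from_cosine)
```

In practice this means the choice between the two metrics matters mainly when vectors are not normalized.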

Getting Started

  1. Choose a database: Chroma for local dev, Pinecone for production
  2. Pick an embedding model: OpenAI, Cohere, or open-source
  3. Embed your data: Convert documents to vectors
  4. Store and index: Add vectors to the database
  5. Query: Search by similarity
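
Steps 3 to 5 can be sketched end to end in plain Python. Everything here is a stand-in: `toy_embed` counts words from a tiny fixed vocabulary instead of calling a real embedding model (so it matches keywords, not meaning), and `TinyVectorStore` is a hypothetical in-memory class, not the API of Chroma, Pinecone, or any other product.

```python
import math

VOCAB = ["machine", "learning", "neural", "networks", "cats", "pets"]

def toy_embed(text):
    """Stand-in for an embedding model: count how often each vocabulary word appears."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

class TinyVectorStore:
    """Hypothetical in-memory store: keeps (id, vector, text) and scans all entries."""

    def __init__(self):
        self.items = []

    def add(self, doc_id, text):
        # Steps 3 and 4: embed the document, then store the vector
        self.items.append((doc_id, toy_embed(text), text))

    def query(self, text, k=1):
        # Step 5: embed the query and rank stored vectors by similarity
        q = toy_embed(text)
        ranked = sorted(self.items, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
        return [doc_id for doc_id, _, _ in ranked[:k]]

store = TinyVectorStore()
store.add("doc1", "Machine learning is a subset of AI")
store.add("doc2", "Neural networks process information")
store.add("doc3", "Cats are popular pets")

print(store.query("pets and cats"))
```

Swapping `toy_embed` for a real embedding model and `TinyVectorStore` for an actual database keeps the same add-then-query shape.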

Last verified: 2026-03-06