Building AI-powered semantic search with pgvector.
Users don't remember filenames. They remember what's inside the file. We built a search system that understands file contents and lets users chat with their documents.
Filename search doesn't scale.
Traditional file storage relies on users remembering filenames and folder structures. Once you have hundreds of files, that breaks down. Users couldn't find what they needed — even when they knew the content existed somewhere in their library.
Lost context
Users upload a contract, a report, or meeting notes, then can't find the file three weeks later because they forgot what they named it.
No content awareness
Searching for 'Q3 revenue' returns nothing if the file is called 'board-deck-final-v2.pdf'. The search doesn't know what's inside.
Growing libraries
As storage grows, the gap between 'I know it's here' and 'I can find it' widens. Manual organization can't keep up.
Understand every file, search by meaning.
We built a pipeline that extracts content from uploaded files, generates semantic embeddings using OpenAI, stores them in PostgreSQL with pgvector, and matches search queries by meaning — not just keywords.
Content Extraction
When a file is uploaded, a background worker extracts its text content. PDFs go through Unstructured.io for parsing, preserving structure and meaning.
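A minimal sketch of the worker's dispatch step, assuming a hypothetical `pick_extractor` helper and an illustrative extension-to-parser mapping (in the real pipeline, the "unstructured" path calls Unstructured.io's parsing API):

```python
from pathlib import Path

# Hypothetical mapping from file extension to extraction strategy.
# PDFs and office formats go through Unstructured.io; plain-text
# formats are read directly.
EXTRACTORS = {
    ".pdf": "unstructured",
    ".docx": "unstructured",
    ".txt": "plain",
    ".md": "plain",
}

def pick_extractor(filename: str) -> str:
    """Choose an extraction strategy for an uploaded file."""
    ext = Path(filename).suffix.lower()
    # Unknown formats fall back to the generic structured parser.
    return EXTRACTORS.get(ext, "unstructured")
```

The fallback matters in practice: users upload formats you didn't plan for, and routing them through the generic parser beats failing the upload.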
Chunking & Embedding
Extracted text is split into semantic chunks, then each chunk is embedded using OpenAI's text-embedding-3-small model — producing 1536-dimensional vectors.
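As a sketch of the splitting step, here is a naive character-window chunker with overlap. The sizes are illustrative assumptions; a production chunker would split on paragraph and sentence boundaries first:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split extracted text into overlapping chunks for embedding.

    Overlap keeps a sentence that straddles a chunk boundary visible
    to both neighbouring chunks, so neither loses its context.
    """
    if len(text) <= max_chars:
        return [text]
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
    return chunks

# Each chunk is then embedded, e.g. via the OpenAI Python client:
#   client.embeddings.create(model="text-embedding-3-small", input=chunks)
# which returns one 1536-dimensional vector per chunk.
```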
Vector Storage
Embeddings are stored in PostgreSQL using pgvector with HNSW indexing. This enables sub-second similarity search across thousands of documents.
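A sketch of the storage layer, with a hypothetical `document_chunks` table (the names are illustrative); the HNSW index syntax and the `<=>` cosine-distance operator are pgvector's own:

```python
# Schema executed once at setup. vector(1536) matches the output
# dimension of text-embedding-3-small; vector_cosine_ops tells the
# HNSW index to use the same cosine distance as the query below.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS document_chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL,
    content     text NOT NULL,
    embedding   vector(1536) NOT NULL
);

CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
    ON document_chunks
    USING hnsw (embedding vector_cosine_ops);
"""

# Nearest-neighbour lookup: <=> is pgvector's cosine-distance
# operator, so lower distance means a closer semantic match.
SIMILARITY_SQL = """
SELECT document_id, content,
       embedding <=> %(query_embedding)s::vector AS distance
FROM document_chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""
```

Ordering by the same operator the index was built with is what lets Postgres use the HNSW index instead of scanning every row.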
Hybrid Search + Chat
Search queries hit both traditional text search and vector similarity search. The RAG chat system streams answers using GPT-4o-mini with source attribution.
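The merge step can be sketched with reciprocal rank fusion, one common way to combine two ranked result lists (our exact weighting is an assumption here; `k=60` is the conventional RRF constant):

```python
def fuse_results(keyword_ids: list[int], vector_ids: list[int], k: int = 60) -> list[int]:
    """Merge keyword and vector rankings with reciprocal rank fusion.

    A document ranked highly by either search rises to the top; one
    found by both gets the sum of its two rank scores.
    """
    scores: dict[int, float] = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion sidesteps the problem that text-search scores and cosine distances live on incompatible scales and can't simply be added.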
Why we made the choices we made.
Every technical decision in this pipeline was made to balance cost, performance, and simplicity. Here's what we chose and why.
pgvector over Pinecone
We kept embeddings in PostgreSQL instead of a separate vector database. One database to manage, one connection to maintain, and pgvector's HNSW indexing handles our scale without the operational overhead of a managed vector service.
Dedicated worker for processing
PDF parsing and embedding generation run on a dedicated 8GB worker, not the web app. This prevents memory-intensive file processing from crashing the application for all users.
Streaming chat over batch responses
Chat responses stream via Server-Sent Events from the backend API. Users see answers forming in real-time — no waiting 10-15 seconds for GPT-4o-mini to finish generating.
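A sketch of what the streaming endpoint yields, assuming a token iterator from the model and a hypothetical `[DONE]` sentinel; the frame shape itself (`data:` line followed by a blank line) is the SSE wire format:

```python
from typing import Iterable, Iterator

def sse_events(token_stream: Iterable[str]) -> Iterator[str]:
    """Wrap model tokens as Server-Sent Events frames.

    Each token the model emits is flushed to the client as its own
    `data:` frame, so the answer renders incrementally instead of
    arriving all at once after generation finishes.
    """
    for token in token_stream:
        yield f"data: {token}\n\n"
    # Sentinel frame the client watches for to close the connection.
    yield "data: [DONE]\n\n"
```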
Hybrid search by default
We combine traditional text search with vector similarity, not just one or the other. This catches both exact keyword matches and semantic meaning in a single query.
The outcome.
Users can now find any file by describing what's inside it — and ask questions about their documents in natural language.
Vector embeddings per chunk, enabling high-fidelity semantic matching.
Hybrid search response time across thousands of stored documents.
Streaming chat responses with source attribution for every answer.
Need AI features in your product?
We build AI-powered features that solve real problems — not demos. Let's talk about what AI can do for your product.