Case Study — disposal.space

Building AI-powered semantic search with pgvector.

Users don't remember filenames. They remember what's inside the file. We built a search system that understands file contents and lets users chat with their documents.


[ The problem ]

Filename search doesn't scale.

Traditional file storage relies on users remembering filenames and folder structures. Once you have hundreds of files, that breaks down. Users couldn't find what they needed — even when they knew the content existed somewhere in their library.

Lost context

Users upload a contract, a report, or meeting notes — then can't find it three weeks later because they forgot what they named it.

No content awareness

Searching for 'Q3 revenue' returns nothing if the file is called 'board-deck-final-v2.pdf'. The search doesn't know what's inside.

Growing libraries

As storage grows, the gap between 'I know it's here' and 'I can find it' widens. Manual organization can't keep up.


[ The approach ]

Understand every file, search by meaning.

We built a pipeline that extracts content from uploaded files, generates semantic embeddings using OpenAI, stores them in PostgreSQL with pgvector, and matches search queries by meaning — not just keywords.

01

Content Extraction

When a file is uploaded, a background worker extracts its text content. PDFs go through Unstructured.io for parsing, preserving structure and meaning.
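The worker's first step can be sketched as a simple dispatch on file type. This is a minimal sketch, not the production code: `parse_pdf_with_unstructured` is a hypothetical placeholder standing in for the Unstructured.io parsing call.

```python
from pathlib import Path


def parse_pdf_with_unstructured(path: Path) -> str:
    # Hypothetical placeholder for the Unstructured.io call; the real
    # pipeline partitions the PDF into structured elements here.
    raise NotImplementedError


def extract_text(path: Path) -> str:
    """Route an uploaded file to the right extractor by extension."""
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        return parse_pdf_with_unstructured(path)
    raise ValueError(f"unsupported file type: {suffix}")
```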

02

Chunking & Embedding

Extracted text is split into semantic chunks, then each chunk is embedded using OpenAI's text-embedding-3-small model — producing 1536-dimensional vectors.
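A sketch of this step, with two assumptions: the overlapping word-window chunker below is a stand-in for the real semantic chunker (actual boundaries would follow headings and paragraphs), and `embed_chunks` shows the `text-embedding-3-small` request shape via the official `openai` client, which needs an `OPENAI_API_KEY` to run.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split extracted text into overlapping word windows.

    A simple stand-in for semantic chunking; the overlap keeps context
    that straddles a chunk boundary searchable from both sides.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]


def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed each chunk; every vector has 1536 dimensions."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    client = OpenAI()
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in resp.data]
```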

03

Vector Storage

Embeddings are stored in PostgreSQL using pgvector with HNSW indexing. This enables sub-second similarity search across thousands of documents.
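The metric behind that search: with `vector_cosine_ops`, pgvector's `<=>` operator computes cosine distance (1 minus cosine similarity), and the HNSW index finds the nearest chunks by that metric without scanning every row. Below is a pure-Python sketch of the math, plus an illustrative query shape (the table and column names are assumptions, not the production schema).

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's `<=>` operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


# Illustrative query shape against the embeddings table:
NEAREST_CHUNKS_SQL = """
SELECT chunk_id, content
FROM chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5
"""
```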

04

Hybrid Search + Chat

Search queries hit both traditional text search and vector similarity search. The RAG chat system streams answers using GPT-4o-mini with source attribution.
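One common way to merge the two result lists is reciprocal rank fusion. The case study doesn't specify the fusion method, so treat this as an assumed sketch: documents found by both keyword and vector search accumulate score from each list and rise to the top.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (e.g. text search + vector search).

    Each document scores sum(1 / (k + rank)) over every list it appears
    in, so agreement between the two searches outweighs a single hit.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With `text = ["a", "b", "c"]` and `vector = ["b", "d", "a"]`, document `b` wins because both searches rank it highly.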


[ Key Decisions ]

The reasoning behind each choice.

Every technical decision in this pipeline balances cost, performance, and simplicity. Here's what we chose and why.

pgvector over Pinecone

We kept embeddings in PostgreSQL instead of a separate vector database. One database to manage, one connection to maintain, and pgvector's HNSW indexing handles our scale without the operational overhead of a managed vector service.

Dedicated worker for processing

PDF parsing and embedding generation run on a dedicated 8GB worker, not the web app. This prevents memory-intensive file processing from crashing the application for all users.

Streaming chat over batch responses

Chat responses stream via Server-Sent Events from the backend API. Users see answers forming in real time instead of waiting 10-15 seconds for GPT-4o-mini to finish generating.
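The SSE wire format itself is simple: each event is a `data:` line terminated by a blank line, and the browser's EventSource fires one message event per frame. A minimal sketch of wrapping model tokens in frames; the `[DONE]` sentinel is an assumed convention, not confirmed by the case study.

```python
from typing import Iterable, Iterator


def to_sse(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap streamed model tokens in Server-Sent Events frames."""
    for token in tokens:
        yield f"data: {token}\n\n"
    # Assumed end-of-stream sentinel so the client knows when to stop.
    yield "data: [DONE]\n\n"
```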

Hybrid search by default

We combine traditional text search with vector similarity, not just one or the other. This catches both exact keyword matches and semantic meaning in a single query.


The outcome.

Users can now find any file by describing what's inside it — and ask questions about their documents in natural language.

1536-dim

Vector embeddings per chunk, enabling high-fidelity semantic matching.

< 500ms

Hybrid search response time across thousands of stored documents.

Real-time

Streaming chat responses with source attribution for every answer.


Need AI features in your product?

We build AI-powered features that solve real problems — not demos. Let's talk about what AI can do for your product.