What is an Indexer?

An Indexer is a pipeline component that writes processed documents, chunks, embeddings, metadata, or sparse retrieval features into a searchable storage system for later retrieval.

How It Works

An Indexer turns transformed documents into durable retrieval assets. It may write dense vectors to a vector database, text fields to a search engine, graph relationships to a graph store, or hybrid records to several systems at once. The Indexer is also an operational component: it must handle batching, upserts, deletes, retries, backpressure, versioning, and reindexing. In regulated or multi-tenant systems, it must also preserve permissions and deletion semantics so that retrieval does not expose stale or unauthorized content.
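
The batching, upsert, and retry behavior described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the record shape, the `index_all` and `flaky_upsert` names, and the in-memory `store` dictionary are all assumptions standing in for a real backend client.

```python
import time
from typing import Callable, Iterable, Iterator

Record = dict  # hypothetical record shape: {"id": str, "text": str}


def batched(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of at most `size` records."""
    batch: list[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_all(
    records: Iterable[Record],
    write_batch: Callable[[list[Record]], None],
    batch_size: int = 100,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> int:
    """Write records in batches, retrying a failed batch with exponential backoff."""
    written = 0
    for batch in batched(records, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                write_batch(batch)
                written += len(batch)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up and surface the failure to the caller
                time.sleep(base_delay * 2 ** (attempt - 1))
    return written


# In-memory stand-in for a backend; an upsert inserts or overwrites by ID.
store: dict[str, Record] = {}
state = {"failures_left": 1}  # simulate one transient failure


def flaky_upsert(batch: list[Record]) -> None:
    if state["failures_left"] > 0:
        state["failures_left"] -= 1
        raise ConnectionError("simulated transient failure")
    for record in batch:
        store[record["id"]] = record


docs = [{"id": f"doc-{i}", "text": f"text {i}"} for i in range(5)]
count = index_all(docs, flaky_upsert, batch_size=2)
```

Because writes are keyed by ID, retrying a batch that partially succeeded is safe here: the second attempt overwrites rather than duplicates. A real backend may not offer the same guarantee.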

Key Characteristics

  • Persistence role: writes retrieval-ready artifacts into vector, search, graph, database, or hybrid storage
  • Identity management: maintains document IDs, chunk IDs, source IDs, index versions, and deduplication keys
  • Update semantics: supports upsert, delete, rebuild, incremental refresh, and rollback workflows
  • Operational resilience: handles batching, retries, partial failures, rate limits, and backpressure
  • Governance impact: carries permissions, retention rules, and deletion requirements into the retrieval layer
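
One way to approach the identity management point is to derive IDs deterministically, so that re-running the pipeline overwrites records instead of duplicating them. The sketch below assumes SHA-256 hashing and a simple whitespace and case normalization; the `chunk_id` and `dedup_key` helpers are hypothetical names, not part of any particular library.

```python
import hashlib


def chunk_id(source_id: str, chunk_index: int, index_version: str) -> str:
    """Stable ID: the same source, position, and index version always give the same ID."""
    key = f"{index_version}:{source_id}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def dedup_key(text: str) -> str:
    """Content hash for skipping chunks whose normalized text is already indexed."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


first_run = chunk_id("doc-1", 0, "v1")
second_run = chunk_id("doc-1", 0, "v1")   # identical to first_run, so an upsert overwrites
other_version = chunk_id("doc-1", 0, "v2")  # a new index version gets its own ID space
```

Including the index version in the ID keeps records from different builds separate, which is one possible design choice; some systems keep the ID stable and store the version as metadata instead.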

Common Use Cases

  1. Writing embeddings and metadata into a vector database for RAG
  2. Maintaining a hybrid index that supports both BM25 and vector similarity
  3. Rebuilding an index after a chunking or embedding model change
  4. Deleting customer documents from all retrieval stores for compliance
  5. Tracking index versions during retrieval quality experiments

Example
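
The following is a minimal sketch of an Indexer that writes each chunk to two stores at once: a dense store for embeddings and a keyword store for sparse lookup. Everything here is an assumption made for illustration, including the `Chunk` and `InMemoryIndexer` names, the whitespace tokenizer, and the use of plain dictionaries in place of a real vector database or search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    source_id: str
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


class InMemoryIndexer:
    """Toy indexer that writes each chunk to a dense store and a keyword store."""

    def __init__(self) -> None:
        self.chunks: dict[str, Chunk] = {}         # chunk_id -> full record
        self.vectors: dict[str, list[float]] = {}  # chunk_id -> embedding
        self.postings: dict[str, set[str]] = {}    # term -> chunk_ids

    def upsert(self, chunk: Chunk) -> None:
        # Remove any earlier version first so stale terms do not linger.
        if chunk.chunk_id in self.chunks:
            self.delete_chunk(chunk.chunk_id)
        self.chunks[chunk.chunk_id] = chunk
        self.vectors[chunk.chunk_id] = chunk.embedding
        for term in set(chunk.text.lower().split()):
            self.postings.setdefault(term, set()).add(chunk.chunk_id)

    def delete_chunk(self, chunk_id: str) -> None:
        chunk = self.chunks.pop(chunk_id, None)
        if chunk is None:
            return
        self.vectors.pop(chunk_id, None)
        for term in set(chunk.text.lower().split()):
            ids = self.postings.get(term)
            if ids is not None:
                ids.discard(chunk_id)
                if not ids:
                    del self.postings[term]


indexer = InMemoryIndexer()
indexer.upsert(Chunk("c1", "doc-1", "refund policy overview", [0.1, 0.9], {"tenant": "acme"}))
indexer.upsert(Chunk("c2", "doc-1", "refund deadlines", [0.2, 0.8], {"tenant": "acme"}))
# Re-indexing c1 with new text replaces the old record in both stores.
indexer.upsert(Chunk("c1", "doc-1", "returns policy overview", [0.3, 0.7], {"tenant": "acme"}))
```

After the second write to `c1`, the term "refund" points only at `c2`, and the stored embedding for `c1` is the new one. Keeping both stores in step on every upsert is the core of what the Indexer is responsible for.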

Frequently Asked Questions

Is an Indexer the same as a vector database?

No. A vector database is a storage and search backend. An Indexer is the pipeline component that prepares records and writes them into one or more backends, including vector databases, search engines, graph stores, or custom databases.

Why does index versioning matter?

Index versioning makes quality experiments and rollbacks possible. If chunking strategy, embedding model, metadata schema, or filters change, teams need to know which index produced a retrieval result and whether it can be rebuilt.
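
A simple way to picture this is a registry that records the configuration behind each index version and keeps a pointer to the version currently serving queries. The sketch below is illustrative only; `VersionedIndexRegistry`, its methods, and the configuration values are hypothetical, and real backends typically provide their own alias or collection-switching mechanisms.

```python
from typing import Optional


class VersionedIndexRegistry:
    """Tracks index versions and which one currently serves queries."""

    def __init__(self) -> None:
        self.versions: dict[str, dict] = {}  # version name -> build configuration
        self.live: Optional[str] = None      # version currently serving queries
        self.history: list[str] = []         # previously live versions, oldest first

    def register(self, name: str, config: dict) -> None:
        if name in self.versions:
            raise ValueError(f"version {name!r} already registered")
        self.versions[name] = dict(config)

    def promote(self, name: str) -> None:
        if name not in self.versions:
            raise KeyError(name)
        if self.live is not None:
            self.history.append(self.live)
        self.live = name

    def rollback(self) -> str:
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.live = self.history.pop()
        return self.live


registry = VersionedIndexRegistry()
# Hypothetical configurations; the model names are placeholders.
registry.register("v1", {"chunk_size": 200, "embedding_model": "model-a"})
registry.register("v2", {"chunk_size": 500, "embedding_model": "model-b"})
registry.promote("v1")
registry.promote("v2")
restored = registry.rollback()  # v2 underperformed, so serve v1 again
```

Storing the build configuration next to each version is what makes a retrieval result traceable: given the version name, a team can see which chunking and embedding settings produced it and rebuild if needed.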

What should happen when a source document is deleted?

The Indexer should propagate deletion to every retrieval store that contains derived chunks, embeddings, metadata, or sparse features. Leaving stale records behind can create compliance, privacy, and answer-quality problems.
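
Deletion propagation can be sketched as a loop over every store that holds derived records, with failures collected rather than hidden. The `DictStore` class and `propagate_delete` function below are hypothetical stand-ins; a real system would call each backend's own delete-by-filter operation.

```python
from typing import Iterable


class DictStore:
    """Hypothetical in-memory store mapping record IDs to their source document ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: dict[str, str] = {}

    def delete_by_source(self, source_id: str) -> int:
        doomed = [rid for rid, sid in self.records.items() if sid == source_id]
        for rid in doomed:
            del self.records[rid]
        return len(doomed)


def propagate_delete(source_id: str, stores: Iterable[DictStore]) -> dict[str, int]:
    """Delete a source's records from every store and report counts per store.

    Every store is attempted even if an earlier one fails; failures are raised together.
    """
    removed: dict[str, int] = {}
    failed: list[str] = []
    for store in stores:
        try:
            removed[store.name] = store.delete_by_source(source_id)
        except Exception:
            failed.append(store.name)
    if failed:
        raise RuntimeError(f"deletion incomplete for {source_id!r} in: {failed}")
    return removed


vector_store = DictStore("vectors")
keyword_store = DictStore("keywords")
vector_store.records.update({"c1": "doc-1", "c2": "doc-1", "c3": "doc-2"})
keyword_store.records.update({"c1": "doc-1", "c3": "doc-2"})

report = propagate_delete("doc-1", [vector_store, keyword_store])
```

Returning a per-store count gives an audit trail for compliance requests, and raising on partial failure makes it harder for a stale copy to survive unnoticed in one backend.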

How does an Indexer affect online latency?

Indexing is usually offline or asynchronous, but its choices affect online latency indirectly. Chunk size, metadata schema, index type, and hybrid-search design influence how much work the retriever must do at query time.
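
As a rough illustration of the chunk-size effect, the sketch below counts how many chunks a sliding-window splitter would produce for a small corpus under two settings. The sliding-window scheme, the `chunk_count` helper, and the token counts are assumptions chosen for the example; actual index size and query cost depend on the backend and index type.

```python
import math


def chunk_count(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover a document of `doc_tokens` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if doc_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # tokens advanced per chunk
    return 1 + math.ceil((doc_tokens - chunk_size) / stride)


corpus = [1000, 2500, 120]  # hypothetical document lengths in tokens

small_chunks = sum(chunk_count(n, chunk_size=200, overlap=50) for n in corpus)
large_chunks = sum(chunk_count(n, chunk_size=500, overlap=50) for n in corpus)
```

With these numbers the smaller chunk size yields 25 records and the larger one yields 10, so the same corpus produces an index two and a half times larger. More records generally mean more candidates for the retriever to score, although larger chunks carry their own cost in context length downstream.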
