Amgix (pronounced a-MAG-ix) - short for Amalgam Index
amalgam: a mixture or blend of different elements
Amgix is an open-source system that handles ingestion, embedding, and hybrid retrieval behind one REST API. You do not need to stitch together queues, a vector database, and ranking or fusion logic in your application.
Try it — Amgix One bundles everything you need in a single container:
docker run -d -p 8234:8234 -v <path/on/host>:/data amgixio/amgix-one:1
Full walkthrough: Getting started.
Beyond that single-container start, Amgix scales into independently deployable API, ingestion, query, communication, and storage tiers. It natively understands messy enterprise data (part numbers, SKUs, mixed alphanumeric strings) through a custom WMTR tokenizer. Even while coordinating the full pipeline — from ingestion to embedding to fused ranking — it delivers typeahead-level latency on multi-million-document corpora (see benchmarks).
To get hybrid search working today, teams usually have to build a fragile machine: an ML embedding service, a message broker, a vector database, and custom fusion code in the application layer.
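For a sense of what that application-layer fusion code tends to look like, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This illustrates the general technique only; it is not a description of how Amgix fuses results internally. The function name, the document IDs, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    and the scores are summed. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["sku-42", "sku-7", "sku-13"]
vector_hits = ["sku-7", "sku-99", "sku-42"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the behavior hybrid search needs. In a hand-built stack this logic, plus the plumbing that feeds it, lives in and is maintained by the application.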
Amgix replaces all of that with a single system boundary:
POST a document.
Amgix handles the queueing, deduplication, distributed locking, retries, and ML embeddings internally.
Standard search engines are built for clean paragraphs. They aggressively strip punctuation and short numbers, which ruins searches for SKUs, part numbers, mixed alphanumeric strings, and other identifier-heavy data.
Amgix ships with WMTR (Weighted Multilevel Token Representation) — a custom tokenizer built specifically for “ugly” data. It represents text through multiple lexical views at once: a surface-form view that stays closer to the original tokens, a language-aware normalized view built with Unicode word boundaries, stopword filtering, and stemming, and a character-level view that captures short local patterns inside the text. Those signals are then weighted together into a single sparse representation.
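The idea of several lexical views weighted into one sparse vector can be sketched in a few lines of Python. This is a toy illustration of the concept, not the WMTR implementation: the stopword list, the plural-stripping "stemmer", the trigram size, the view weights, and every function name below are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real one is language-specific).
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def surface_tokens(text):
    # Split on whitespace only, so identifiers such as "AB-1234" stay intact.
    return text.split()


def normalized_tokens(text):
    # Lowercased word runs, minus stopwords, with a crude plural-stripping
    # stand-in for real stemming. The \w+ pattern only approximates proper
    # Unicode word-boundary segmentation.
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        if word in STOPWORDS:
            continue
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.append(word)
    return tokens


def char_ngrams(text, n=3):
    # Overlapping character n-grams capture short local patterns like "-12".
    compact = re.sub(r"\s+", " ", text.lower())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def sparse_vector(text, weights=(1.0, 0.7, 0.3)):
    # Combine the three views into one {feature: weight} mapping.
    # The per-view weights are arbitrary placeholders.
    views = (
        ("surf", surface_tokens(text)),
        ("norm", normalized_tokens(text)),
        ("char", char_ngrams(text)),
    )
    vector = {}
    for (prefix, tokens), weight in zip(views, weights):
        for token, count in Counter(tokens).items():
            vector[f"{prefix}:{token}"] = weight * count
    return vector


vec = sparse_vector("Bolts for AB-1234")
print(sorted(vec))
```

In this sketch the part number survives whole in the surface view ("surf:AB-1234"), the normalized view contributes "norm:bolt" while dropping the stopword "for", and the character view keeps fragments such as "char:-12" that a punctuation-stripping analyzer would lose.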
You don’t even need a dedicated vector database to start. Amgix can run its entirely asynchronous ingestion queue and vector storage natively on the PostgreSQL or MariaDB instances you already operate. If you need maximum scale, simply point Amgix at a Qdrant database. The API remains exactly the same.
What we think is cool about Amgix · Why we built it