Amgix - Open-Source Hybrid Search System
Amgix (pronounced a-MAG-ix) - short for Amalgam Index
amalgam: a mixture or blend of different elements
Amgix is an open-source system that handles ingestion, embedding, and hybrid retrieval behind one REST API. You do not need to stitch together queues, a vector database, and ranking or fusion logic in your application.
The same product aims to be straightforward for developers to integrate, easy for operators to run, and effective at giving end users relevant results on real-world data.
-
For developers
Today, hybrid search often means gluing together an embedding service, a broker, a vector database, and fusion logic in your app. Amgix replaces that with one REST API and a single system behind it.
- Zero glue code: POST a document; Amgix handles queueing, deduplication, distributed locking, retries, and embeddings internally.
- Server-side fusion: Dense vectors, sparse models (e.g. SPLADE), and keyword tokens on one collection; search and fuse on the server.
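The zero-glue-code flow can be sketched as a single HTTP call from application code. The endpoint path, port, and payload field names below are illustrative assumptions, not the documented Amgix API:

```python
import json
import urllib.request

# Hypothetical ingestion call: one POST, no queues or workers in app code.
# The endpoint path and payload fields are assumptions for illustration.
doc = {
    "id": "sku-AB-1234",
    "text": "Bearing AB-1234/X, sealed, 30mm bore",
    "metadata": {"category": "bearings"},
}

req = urllib.request.Request(
    "http://localhost:8234/collections/products/documents",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment against a running Amgix One instance
```

Everything after the POST (queueing, deduplication, retries, embedding) happens server-side, which is the point of the bullet above.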
-
For operators
- Single container to distributed: Amgix One (see below) runs everything in one container; when you need more, scale out into API, ingestion, query, communication, and storage tiers.
- Autonomous MLOps: Encoder workers self-orchestrate and route work by capacity and demand - you do not pin models to machines by hand.
- Your databases: No dedicated vector DB required on day one - async ingestion and vector storage can run on PostgreSQL or MariaDB you already operate. For maximum scale, point Amgix at Qdrant; the API stays the same.
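The backend choice in the last bullet is a deployment-time setting rather than an application-code change. A hypothetical configuration sketch (every key name here is assumed, not the real Amgix schema):

```python
# Hypothetical deployment configs: swap vector storage without touching app code.
# Key names and connection strings are assumptions for illustration only.
config_day_one = {
    "vector_store": {"driver": "postgresql", "dsn": "postgresql://amgix@db/amgix"},
}
config_at_scale = {
    "vector_store": {"driver": "qdrant", "url": "http://qdrant:6333"},
}
# The search API an application calls stays identical under either config.
```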
-
For your end users
Most search engines use tokenizers designed for natural language. By default they strip punctuation and short tokens - exactly the characters that make part numbers, SKUs, and identifiers unique.
Amgix ships WMTR (Weighted Multilevel Token Representation): multiple lexical views at once (surface form, normalized language-aware tokens, character-level patterns), combined into one sparse representation so messy enterprise text still retrieves well.
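The multilevel idea can be illustrated with a toy sketch - not Amgix's actual WMTR implementation - that builds several lexical views of the same string, weights them, and merges them into one sparse bag of tokens:

```python
from collections import Counter

def char_ngrams(s, n=3):
    """Character-level view: preserves patterns inside IDs like 'AB-1234'."""
    s = f"#{s}#"  # boundary markers
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def multilevel_tokens(text):
    """Toy multilevel sparse representation (illustrative, not Amgix's WMTR).

    Three weighted views merged into one Counter:
    surface form (weight 3), normalized word tokens (2), char trigrams (1).
    """
    views = [
        (3.0, text.split()),                            # surface: 'AB-1234/X' stays intact
        (2.0, text.lower().replace("/", " ").split()),  # normalized words
        (1.0, char_ngrams(text.lower())),               # character-level patterns
    ]
    sparse = Counter()
    for weight, tokens in views:
        for tok in tokens:
            sparse[tok] += weight
    return sparse

rep = multilevel_tokens("Bearing AB-1234/X")
# The surface view keeps 'AB-1234/X' as one token even though the
# normalized view splits it - so an exact SKU query still matches.
```

A plain natural-language tokenizer would keep only the normalized view, which is exactly how part numbers get lost.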
The full pipeline - from ingestion through embedding to fused ranking - delivers typeahead-level latency on large corpora, so your users get both keyword precision and semantic relevance in every query; see benchmarks.
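Reciprocal rank fusion (RRF) is one common way to fuse ranked lists from dense, sparse, and keyword retrievers. The source does not state which fusion method Amgix uses, so this is a generic sketch of the technique, not Amgix's algorithm:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked result-ID lists, e.g. one each from
    dense, sparse, and keyword retrieval. Generic sketch only.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]     # semantic neighbors
sparse = ["d1", "d2", "d4"]    # e.g. SPLADE expansion hits
keyword = ["d1", "d3", "d2"]   # exact-token matches
fused = rrf_fuse([dense, sparse, keyword])
# "d1" ranks first: it is near the top of all three lists.
```

Doing this fusion server-side, over one collection, is what removes the ranking logic from application code.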
-
Try it: Amgix One bundles everything you need in a single container:
docker run -d -p 8234:8234 -v <path/on/host>:/data amgixio/amgix-one:1

Full walkthrough: Getting started.