How to Add Search to Your Application with Amgix (and Make it Hybrid in One Extra Line)
Amgix lets you integrate search (keyword, semantic, or hybrid) into your application in a few steps.
Your Application
Let's say you have an application that manages Products (we could call them Widgets, Items, Documents, etc.), so you have a class in your application that looks something like this:
class ProductStorage:
    async def add_product(self, prod: Product):
        ...
        await db.save(prod)

    async def update_product(self, prod: Product):
        ...
        await db.save(prod)

    async def delete_product(self, prod: Product):
        ...
        await db.delete(prod)
class ProductStorage {
  async addProduct(prod: Product): Promise<void> {
    // ...
    await db.save(prod);
  }

  async updateProduct(prod: Product): Promise<void> {
    // ...
    await db.save(prod);
  }

  async deleteProduct(prod: Product): Promise<void> {
    // ...
    await db.delete(prod);
  }
}
It's time to add search:
    async def search_products(self, query: str) -> List[Product]:
        return ???
  async searchProducts(query: string): Promise<Product[]> {
    return ???;
  }
This tutorial shows how you may do this with Amgix.
Your Data
First of all, let's look at what we are going to search. Say your Product has the following shape:
class Product:
    id: int
    title: str
    description: str
    last_modified: datetime
interface Product {
  id: number;
  title: string;
  description: string;
  lastModified: Date;
}
Decision Time
What fields do we want to search and how: keywords, semantic, both?
Amgix makes it easy to configure any of these scenarios in just a couple of lines of code. The options you have out of the box are: keyword (wmtr is an alias), full_text, whitespace, trigrams, dense_model, or sparse_model. There are also dense_custom and sparse_custom options, if you want to bring your own vectors, but we'll try to keep this tutorial simple.
On the keyword side of things, the rules of thumb are these:
- if your content is natural language: use full_text
- if your content is identifier-heavy data (part numbers, SKUs, numbers, etc.): use keyword or wmtr (the same thing)
- if your content is a mix of the above or unknown: go with wmtr
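The rules of thumb above can be written down as a tiny lookup. This is only an illustrative sketch: the function name and the content_kind labels ("natural_language", "identifiers") are hypothetical and not part of Amgix; only the returned vector type names come from the list above.

```python
def pick_keyword_vector_type(content_kind: str) -> str:
    """Map a rough description of the content to a keyword vector type.

    The content_kind labels are made up for this sketch; the returned
    strings are the vector type names listed in this tutorial.
    """
    rules = {
        "natural_language": "full_text",  # prose, descriptions, articles
        "identifiers": "wmtr",            # part numbers, SKUs ("keyword" is the same thing)
    }
    # Mixed or unknown content falls back to wmtr, per the rule of thumb.
    return rules.get(content_kind, "wmtr")


description_type = pick_keyword_vector_type("natural_language")
sku_type = pick_keyword_vector_type("identifiers")
fallback_type = pick_keyword_vector_type("no idea")
```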
If you choose semantic search or hybrid, you just need to pick a dense or sparse (think SPLADE) model on Hugging Face to use.
Let's say, for the sake of the example, you decided that you want to search both the title and the description of the products and you want a hybrid search. Great! We'll configure it in a minute.
Start Amgix
Now let's bring in our search system. For this example, we'll use Amgix-One as it's a single container deployment and can be launched in one line:
docker run -d -p 8234:8234 -v <path/on/host>:/data amgixio/amgix-one:1
Tip
amgix-one has GPU (if you want GPU-accelerated embedding) and "no-embed" (if you are not planning on working with models) variants: amgix-one:1-gpu or amgix-one:1-noembed. The default image embeds using CPU.
Tip
Amgix-One stores data in the same exact format as full Amgix, so you can scale up later without having to re-index your data.
Your Amgix instance is running on port 8234 and you can take a look at it by visiting http://localhost:8234/dashboard
Configuring Your Collection
We already decided to search the title and the description with hybrid search. We also want a small, lightweight model for the dense similarity search: sentence-transformers/all-MiniLM-L6-v2.
Somewhere at startup of your application you would do this:
import amgix_client
# ApiException is assumed to live in amgix_client.rest, as in OpenAPI-generated Python clients
from amgix_client.rest import ApiException

...

async def productAppStartup():
    ...
    # Defining the host is optional and defaults to http://localhost:8234
    amgix_config = amgix_client.Configuration(host="http://localhost:8234")

    # Enter a context with an instance of the API client
    async with amgix_client.ApiClient(amgix_config) as api_client:
        # Create an instance of the API class
        amgix_api = amgix_client.AmgixApi(api_client)
        amgix_collection_name = "products"

        try:
            api_response = await amgix_api.collection_exists(amgix_collection_name)
            if not api_response.exists:
                collection_config = amgix_client.CollectionConfig()
                collection_config.vectors = [
                    amgix_client.VectorConfig(type="wmtr", name="keyword", index_fields=["name", "description"]),
                    amgix_client.VectorConfig(type="dense_model", name="semantic", index_fields=["name", "description"],
                                              model="sentence-transformers/all-MiniLM-L6-v2"),
                ]
                await amgix_api.create_collection(amgix_collection_name, collection_config)
        except ApiException as e:
            print("Exception when calling AmgixApi: %s\n" % e)
import { AmgixApi, Configuration } from 'amgix-client';

// ...

async function productAppStartup(): Promise<void> {
  // ...
  // Defining the basePath is optional and defaults to http://localhost:8234
  const amgixConfig = new Configuration({ basePath: 'http://localhost:8234' });
  const amgixApi = new AmgixApi(amgixConfig);
  const amgixCollectionName = 'products';

  try {
    const apiResponse = await amgixApi.collectionExists({ collectionName: amgixCollectionName });
    if (!apiResponse.exists) {
      await amgixApi.createCollection({
        collectionName: amgixCollectionName,
        collectionConfig: {
          vectors: [
            { type: 'wmtr', name: 'keyword', index_fields: ['name', 'description'] },
            { type: 'dense_model', name: 'semantic', index_fields: ['name', 'description'],
              model: 'sentence-transformers/all-MiniLM-L6-v2' },
          ],
        },
      });
    }
  } catch (e) {
    console.error('Exception when calling AmgixApi:', e);
  }
}
That's it. When your app starts, it checks whether the "products" collection exists and, if it doesn't, creates it with everything you need to run hybrid searches in your application.
Note
If you are not using hybrid or semantic search, just drop the dense_model VectorConfig entry from the above code. All it takes is one line of code to go from hybrid to keyword-only and back.
Tip
The above configuration creates 4 vectors (2 for wmtr and 2 for semantic). If your title (name in Amgix speak) is short, identifier-heavy, or semantically meaningless text, you may not want to index it with the dense vectors; they will just add noise to your results, consume storage and slow down your search. In that case, just remove "name" from the index_fields of the "dense_model" configuration and search with 3 vectors.
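To make the vector arithmetic concrete, here is a small sketch that counts vectors from a configuration written as plain dictionaries. It assumes one vector per indexed field per vector config, which is what the "2 for wmtr and 2 for semantic" count above implies; count_vectors is a hypothetical helper, not an Amgix function.

```python
def count_vectors(vector_configs):
    """Count vectors, assuming one vector per (config, indexed field) pair."""
    return sum(len(cfg["index_fields"]) for cfg in vector_configs)


# The hybrid configuration from this tutorial, as plain dictionaries.
hybrid = [
    {"type": "wmtr", "name": "keyword", "index_fields": ["name", "description"]},
    {"type": "dense_model", "name": "semantic", "index_fields": ["name", "description"]},
]

# Same configuration, but the dense model no longer indexes "name".
trimmed = [
    hybrid[0],
    {**hybrid[1], "index_fields": ["description"]},
]

hybrid_count = count_vectors(hybrid)
trimmed_count = count_vectors(trimmed)
```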
Keep Index Updated
This section was updated on April 25, 2026
The original text described the delete_document endpoint as synchronous. In Amgix v1.3.0, delete_document became asynchronous; the synchronous behavior is available via the new delete_document_sync endpoint.
We have configured Amgix the way we want. It's time to integrate with our ProductStorage class:
from datetime import datetime, timezone
from typing import List

import amgix_client


class ProductStorage:
    amgix_collection_name = "products"
    amgix_config = amgix_client.Configuration(host="http://localhost:8234")

    async def saveToAmgix(self, prod: Product):
        async with amgix_client.ApiClient(self.amgix_config) as api_client:
            amgix_api = amgix_client.AmgixApi(api_client)
            document = amgix_client.Document(
                id=str(prod.id),
                name=prod.title,
                description=prod.description,
                timestamp=prod.last_modified
            )
            await amgix_api.upsert_document(self.amgix_collection_name, document)

    async def deleteFromAmgix(self, prod: Product):
        async with amgix_client.ApiClient(self.amgix_config) as api_client:
            amgix_api = amgix_client.AmgixApi(api_client)
            await amgix_api.delete_document(self.amgix_collection_name, str(prod.id), datetime.now(timezone.utc))

    async def searchWithAmgix(self, query: str) -> List[Product]:
        async with amgix_client.ApiClient(self.amgix_config) as api_client:
            amgix_api = amgix_client.AmgixApi(api_client)
            search_query = amgix_client.SearchQuery(query=query, limit=10)
            results = await amgix_api.search(self.amgix_collection_name, search_query)
            return [
                Product(
                    id=int(r.id),
                    title=r.name,
                    description=r.description,
                    last_modified=r.timestamp)
                for r in results
            ]

    async def add_product(self, prod: Product):
        ...
        await db.save(prod)
        await self.saveToAmgix(prod)

    async def update_product(self, prod: Product):
        ...
        await db.save(prod)
        await self.saveToAmgix(prod)

    async def delete_product(self, prod: Product):
        ...
        await db.delete(prod)
        await self.deleteFromAmgix(prod)

    async def search_products(self, query: str) -> List[Product]:
        return await self.searchWithAmgix(query)
import { AmgixApi, Configuration, Document, SearchQuery } from 'amgix-client';

class ProductStorage {
  private readonly amgixCollectionName = 'products';
  private readonly amgixApi = new AmgixApi(
    new Configuration({ basePath: 'http://localhost:8234' })
  );

  private async saveToAmgix(prod: Product): Promise<void> {
    const document: Document = {
      id: String(prod.id),
      name: prod.title,
      description: prod.description,
      timestamp: prod.lastModified,
    };
    await this.amgixApi.upsertDocument({ collectionName: this.amgixCollectionName, document });
  }

  private async deleteFromAmgix(prod: Product): Promise<void> {
    await this.amgixApi.deleteDocument({
      collectionName: this.amgixCollectionName,
      documentId: String(prod.id),
      requestTimestamp: new Date(),
    });
  }

  private async searchWithAmgix(query: string): Promise<Product[]> {
    const searchQuery: SearchQuery = { query, limit: 10 };
    const results = await this.amgixApi.search({
      collectionName: this.amgixCollectionName,
      searchQuery,
    });
    return results.map(r => ({
      id: Number(r.id),
      title: r.name ?? '',
      description: r.description ?? '',
      lastModified: r.timestamp,
    }));
  }

  async addProduct(prod: Product): Promise<void> {
    // ...
    await db.save(prod);
    await this.saveToAmgix(prod);
  }

  async updateProduct(prod: Product): Promise<void> {
    // ...
    await db.save(prod);
    await this.saveToAmgix(prod);
  }

  async deleteProduct(prod: Product): Promise<void> {
    // ...
    await db.delete(prod);
    await this.deleteFromAmgix(prod);
  }

  async searchProducts(query: string): Promise<Product[]> {
    return this.searchWithAmgix(query);
  }
}
Note
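In production you'd want to manage the client lifecycle differently: for example, create one API client when the application starts and reuse it, instead of opening a new one for every call as the Python code above does.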
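One possible way to manage the client lifecycle is to enter the client's async context once at startup and exit it at shutdown. The sketch below shows the pattern with Python's standard AsyncExitStack. FakeApiClient and SearchService are hypothetical stand-ins so the example runs on its own; in a real application the object entered here would be the API client, and how you hook start() and stop() into your framework is up to you.

```python
import asyncio
from contextlib import AsyncExitStack


class FakeApiClient:
    """Hypothetical stand-in for an async API client used as a context manager."""

    opened_count = 0

    async def __aenter__(self):
        FakeApiClient.opened_count += 1
        self.closed = False
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True
        return False


class SearchService:
    """Owns a single client for the lifetime of the application."""

    def __init__(self):
        self._stack = AsyncExitStack()
        self.client = None

    async def start(self):
        # Enter the client's context once, at application startup.
        self.client = await self._stack.enter_async_context(FakeApiClient())

    async def stop(self):
        # Exit the context once, at shutdown, which closes the client.
        await self._stack.aclose()


async def lifecycle_demo():
    service = SearchService()
    await service.start()
    client = service.client
    open_during_requests = not client.closed  # every request reuses this client
    await service.stop()
    return FakeApiClient.opened_count, open_during_requests, client.closed


lifecycle_result = asyncio.run(lifecycle_demo())
```

The add, update, delete, and search methods would then use the shared client instead of opening a new one each time.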
Warning
Amgix expects document timestamp to be in UTC. If your last_modified date is in some other timezone format, it has to be converted to UTC first:
timestamp = prod.last_modified.astimezone(timezone.utc)
// If lastModified is a Date object, it's already UTC-compatible.
// If it's a string with timezone offset, convert it first:
timestamp: new Date(prod.lastModified)
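A slightly more defensive version of the Python conversion is sketched below. The to_utc helper is hypothetical. It rejects naive datetimes because astimezone() on a naive value assumes the machine's local timezone, which may not be the timezone the value was recorded in.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC; refuse naive values."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach the correct timezone before converting")
    return dt.astimezone(timezone.utc)


# A last_modified value recorded with a +02:00 offset.
local_time = datetime(2026, 4, 25, 14, 30, tzinfo=timezone(timedelta(hours=2)))
utc_time = to_utc(local_time)

try:
    to_utc(datetime(2026, 4, 25, 14, 30))  # naive value
    naive_rejected = False
except ValueError:
    naive_rejected = True
```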
Tip
SearchQuery can be as simple as you've seen above. It has sensible defaults. By default, it uses Reciprocal Rank Fusion (RRF) to fuse results from all available vectors and uses a default weight of 1.0 on all vectors. This works well out of the box, but SearchQuery also has a lot of properties you can set to control your search results relevancy and filtering. We won't go into these details in order to keep it simple.
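For readers unfamiliar with Reciprocal Rank Fusion, here is a generic sketch of the technique with per-vector weights. It is not Amgix's implementation: rrf_fuse is a hypothetical function, and the constant k=60 is the value commonly used in the RRF literature, assumed here for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    rankings maps a vector name to document ids ordered best-first.
    weights maps a vector name to its weight (default 1.0).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for vector_name, ranked_ids in rankings.items():
        weight = weights.get(vector_name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) to the document's score.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


rankings = {
    "keyword": ["a", "b", "c"],
    "semantic": ["b", "c", "a"],
}
fused_equal = rrf_fuse(rankings)
fused_keyword_heavy = rrf_fuse(rankings, weights={"keyword": 2.0})
```

With equal weights, a document ranked well by both lists beats one ranked first by a single list. Raising the weight of one vector shifts the fused order toward that vector's ranking.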
Your application is now completely integrated with Amgix and is capable of returning hybrid (or not hybrid, if you went that route) search results to your users.
A few things to note about the above code:
- You may have noticed that add and update are the same thing in Amgix - upsert. So both code paths call the same upsert_document() on Amgix API.
- upsert_document is asynchronous. Meaning, the API call doesn't wait for the document to be processed by Amgix. Instead, the document is submitted to an internal Amgix queue and will be processed by backend workers. This is an eventual consistency scenario. The benefit of this is that your application doesn't have to wait for Amgix processing of the document. Your add/update logic remains almost as fast as it was before (just a quick API call). The downside is that the updated document may not appear in your search results immediately. If this is unacceptable in your application, simply replace the call with upsert_document_sync. Sync call is slower, because you have to wait on document embedding and if Amgix server is very busy it may take some extra time. But it does guarantee that your document was indexed by Amgix before you continue.
- delete_document is also asynchronous and returns as soon as the request is queued. It requires the request_timestamp parameter to prevent possible race conditions with queued upsert requests for this document. We recommend setting request_timestamp to the current time in UTC. delete_document_sync endpoint/function is also available if you need a synchronous option.
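The difference between the asynchronous and synchronous calls can be illustrated with a toy queue-backed indexer built on asyncio. ToyIndexer and its methods are hypothetical and only mimic the behavior described above; this is not how Amgix is implemented internally.

```python
import asyncio


class ToyIndexer:
    """Hypothetical queue-backed indexer; a stand-in, not Amgix internals."""

    def __init__(self):
        self.indexed = {}
        self.queue = asyncio.Queue()

    async def upsert(self, doc_id, text):
        # Asynchronous flavor: enqueue the document and return right away.
        await self.queue.put((doc_id, text, None))

    async def upsert_sync(self, doc_id, text):
        # Synchronous flavor: enqueue, then wait until a worker has indexed it.
        done = asyncio.get_running_loop().create_future()
        await self.queue.put((doc_id, text, done))
        await done

    async def worker(self):
        while True:
            doc_id, text, done = await self.queue.get()
            await asyncio.sleep(0.01)  # simulate embedding work
            self.indexed[doc_id] = text
            if done is not None:
                done.set_result(None)
            self.queue.task_done()


async def consistency_demo():
    indexer = ToyIndexer()
    worker = asyncio.create_task(indexer.worker())

    await indexer.upsert("1", "red widget")
    visible_after_async = "1" in indexer.indexed  # still queued at this point

    await indexer.upsert_sync("2", "blue widget")
    visible_after_sync = "2" in indexer.indexed  # indexed before the call returned

    await indexer.queue.join()  # let the worker drain the queue
    worker.cancel()
    return visible_after_async, visible_after_sync, sorted(indexer.indexed)


consistency_result = asyncio.run(consistency_demo())
```

The first document is not searchable when upsert returns, but it is once the queue drains. The second is searchable as soon as upsert_sync returns, at the cost of waiting for the worker.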
Existing Data: The Chicken and The Egg
But wait, you may say, we are not done. What about our existing data that was never indexed in Amgix?
Good point. The difficulty is usually in the timing. We have two options:
- Import existing data into Amgix (using an outside script/code) and then release the updated app with Amgix integration.
- Release the app with Amgix integration and then run a script/code to upload all the data.
Both options seem problematic because there is a time gap between the import of historical data and release of the app, no matter in what order you do it. If you import first, what happens to the data that got changed between the import and the app release? If you release first, the search is useless because most data is not in the Amgix collection yet, and won't the import after the release clobber the newly updated records?
Feels like the chicken or the egg problem.
But it's really not. Thanks to the fact that Amgix does automatic deduplication of documents on upsert, based on their timestamps, it's safe to import the older documents while the newly updated records are already in the collection. Older versions will simply be discarded by Amgix.
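The "newest timestamp wins" idea can be modeled in a few lines. ToyCollection below is a hypothetical in-memory model, not Amgix internals. Discarding a version whose timestamp is equal to the stored one is an assumption made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Doc:
    id: str
    name: str
    timestamp: datetime


class ToyCollection:
    """Hypothetical model of timestamp-based deduplication on upsert."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc: Doc) -> bool:
        current = self.docs.get(doc.id)
        if current is not None and current.timestamp >= doc.timestamp:
            return False  # stored version is as new or newer: discard this one
        self.docs[doc.id] = doc
        return True


collection = ToyCollection()

# The released app indexes a fresh edit first...
fresh = Doc("42", "Widget v2", datetime(2026, 1, 2, tzinfo=timezone.utc))
# ...and the historical import later sends the older version of the same record.
stale = Doc("42", "Widget v1", datetime(2026, 1, 1, tzinfo=timezone.utc))

accepted_fresh = collection.upsert(fresh)
accepted_stale = collection.upsert(stale)
```

Because the late-arriving older version is dropped, the import can run in any order relative to live updates.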
So instead of the chicken or the egg it turns into having your cake and eating it too. Or having both the chicken and the egg. But we digress.
So what is our sequence of actions when it comes to importing old data?
- Release the app, let newly added and updated records be indexed by Amgix. You may want to hide the search functionality until the import is done, if you feel like exposing search before all the data is in the system is a problem.
- Run import script/code on all the data from your database.
- If you previously disabled/hid your search UI/API, make it visible.
That's it. Done.
But wait, what does this external import of existing data look like? You can simply use the same upsert_document API calls as you've seen before. But it's much better to use bulk upsert functionality of Amgix. So your code to import data may look something like this:
while True:
    batch_of_products = await db.read_next_batch()
    if not batch_of_products:
        break

    amgix_batch = []
    for prod in batch_of_products:
        document = amgix_client.Document(
            id=str(prod.id),
            name=prod.title,
            description=prod.description,
            timestamp=prod.last_modified
        )
        amgix_batch.append(document)

    documents = amgix_client.BulkUploadRequest(documents=amgix_batch)
    await amgix_api.upsert_documents_bulk(amgix_collection_name, documents)
while (true) {
  const batchOfProducts: Product[] = await db.readNextBatch();
  if (!batchOfProducts.length) {
    break;
  }

  const amgixBatch: Document[] = batchOfProducts.map(prod => ({
    id: String(prod.id),
    name: prod.title,
    description: prod.description,
    timestamp: prod.lastModified,
  }));

  await amgixApi.upsertDocumentsBulk({
    collectionName: amgixCollectionName,
    bulkUploadRequest: { documents: amgixBatch },
  });
}
upsert_documents_bulk is also asynchronous, so your process won't be waiting for docs to be processed. Amgix will queue them internally and will process them as soon as possible.
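If your database layer has no equivalent of read_next_batch and instead hands you a plain iterable of records, a small generator can produce the batches. The in_batches helper below is hypothetical, and the batch size is an arbitrary value for illustration, not an Amgix recommendation.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter batch.
        yield batch


product_ids = range(1, 8)
batches = list(in_batches(product_ids, batch_size=3))
```

Each yielded list would then be mapped to documents and sent in one bulk upsert call, as in the loop above.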
Enjoy
Now, we are really done. Your application is completely integrated with Amgix search. Your old data is part of the search. Your users can enjoy finding out about your Products using keywords, or semantic meaning, or both.
Maybe it's a good time to kick back and open your Amgix Dashboard and see how the documents and searches are flowing through the system: http://localhost:8234/dashboard
Retrospective
We accomplished a lot with very few steps. But it may be a good moment to reflect on things we didn't have to do:
- We didn't have to build a complex ingestion pipeline to feed documents into the search system.
- We didn't have to build an embedding pipeline to generate vectors.
- We didn't have to mess with loading models.
- We didn't have to write code to handle deduplication.
- We didn't have to worry about retries and queues.
- We didn't have to write code to fuse and rank search results from multiple vector searches.
- We didn't have to write complex configurations and search queries in obscure DSLs.
- We didn't have to wonder about what other tools/infra we needed to hook up and stand up to observe the health of the search system.
All of the above can take teams weeks or months to glue together. Amgix gives all of these to us for free.
Looking Forward
Are you stuck running Amgix-One? No. It's a good starting place, but as your needs and demands grow, you may want to explore more advanced options for deploying Amgix, while keeping your data. We plan to write about it in our future tutorials.