Amgix v1.1.0: Linear Fusion and WMTR Trigram Weights
Stable release of Amgix server v1.1.0 is now available.
New Features
Linear Fusion
Updated on April 6, 2026 to include hybrid-search content.
Before the v1.1.0 release, the only score fusion option available in Amgix was weighted RRF (Reciprocal Rank Fusion). This release introduces a new search query option: fusion_mode. It defaults to rrf, but can be set to linear to enable weighted linear fusion of scores from multiple vectors. Linear fusion min-max normalizes the raw scores from each searched vector, sums the normalized scores per returned document, sorts results by that sum, and returns the top candidates.
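The mechanics can be sketched in a few lines of Python. This is a minimal illustration of min-max normalization followed by a weighted sum, not Amgix's actual implementation:

```python
def linear_fusion(vector_results, weights):
    """Fuse per-vector scores: min-max normalize each vector's scores,
    then sum the weighted, normalized scores per document."""
    fused = {}
    for name, scores in vector_results.items():  # scores: {doc_id: raw_score}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against all-equal scores
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[name] * (s - lo) / span
    # Sort documents by fused score, best first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

results = linear_fusion(
    {"wmtr": {"d1": 12.0, "d2": 7.0, "d3": 2.0},
     "dense": {"d2": 0.91, "d3": 0.88, "d1": 0.10}},
    weights={"wmtr": 1.0, "dense": 1.0},
)
```

Note that because each vector's scores are normalized to [0, 1] before summing, the per-vector weights directly control how much each vector can contribute to the final score.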
Why is this interesting?
For some datasets, relevance of search results improves with weighted linear fusion over rank-based fusion. The NQ (Natural Questions) dataset is a good example.
If you look at our published benchmarks for NQ, you may notice that WMTR (Weighted Multilevel Token Representation) underperforms the BM25 baseline on this dataset:
| | BM25 | WMTR | WMTR (tuned) |
|---|---|---|---|
| nDCG@10 | 0.329 | 0.2579 | 0.2741 |
| Recall@10 | | 0.4222 | 0.4266 |
Even with the weights between the name and content fields tuned, WMTR still trails BM25 by a significant margin: 0.0549 nDCG@10. That's roughly 5.5 points of possible relevance left on the table.
The above tests were performed on Amgix v1.0.0-beta3.3, where the only available score fusion function was RRF. With the new fusion_mode setting in v1.1.0, we re-ran the tests with the linear fusion setting. Here are the results:
| | BM25 | WMTR | WMTR (tuned) |
|---|---|---|---|
| nDCG@10 | 0.329 | 0.2849 | 0.3171 |
| Recall@10 | | 0.4265 | 0.4821 |
This is over a 10% relative improvement in the WMTR score with default weights. With tuned WMTR weights, we are only about 1 nDCG point below the BM25 baseline. Recall@10 has also jumped with tuned weights.
Why the difference?
RRF discards score magnitude and only uses rank position. For datasets where score magnitude carries meaningful signal about relevance, like NQ where content matches are far stronger than title matches, linear fusion captures that signal and RRF throws it away.
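The contrast is easy to see in code. Below is a sketch of standard RRF (with the commonly used k = 60 constant): each vector contributes only a reciprocal of the rank position, so a document that wins a vector by a huge score margin gets exactly the same credit as one that wins by a hair.

```python
def rrf_fusion(vector_results, weights, k=60):
    """Reciprocal Rank Fusion: each vector contributes weight / (k + rank),
    regardless of how large the raw score gap between ranks is."""
    fused = {}
    for name, scores in vector_results.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + weights[name] / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# d1 beats d2 by a 10x score margin on the first vector, but barely
# loses on the second. RRF sees only rank 1 vs rank 2 in each list,
# so the two documents end up in a dead tie.
out = rrf_fusion({"wmtr": {"d1": 100.0, "d2": 10.0},
                  "dense": {"d2": 0.51, "d1": 0.50}},
                 weights={"wmtr": 1.0, "dense": 1.0})
```

Linear fusion over the same inputs would rank d1 first, because the 10x score gap survives normalization; RRF throws it away.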
Added on April 6, 2026
We went back and re-tested hybrid search on NQ. Hybrid search was performed with 3 vectors, just like in our original benchmarks (WMTR on name and content, all-MiniLM-L6-v2 on content). RRF numbers are from the original benchmarks. Here are the results:
| | BM25 | Hybrid (RRF) | Hybrid (RRF, tuned) | Hybrid (Linear) | Hybrid (Linear, tuned) |
|---|---|---|---|---|---|
| nDCG@10 | 0.329 | 0.3850 | 0.4075 | 0.4122 | 0.4392 |
| Recall@10 | | 0.5685 | 0.5944 | 0.5907 | 0.6229 |
The default Hybrid run with Linear fusion outperforms even the tuned RRF Hybrid. Tuning Linear fusion pushes nDCG@10 further, to 0.4392, a 7.8% improvement over tuned RRF.
Test Both Fusion Modes
Linear fusion is not always the answer: RRF performs better on some datasets and Linear fusion on others. Look, for example, at the results of hybrid search on the ArguAna dataset:
| | BM25 | Hybrid (RRF) | Hybrid (RRF, tuned) | Hybrid (Linear) | Hybrid (Linear, tuned) |
|---|---|---|---|---|---|
| nDCG@10 | 0.414 | 0.4966 | 0.5306 | 0.3824 | 0.5413 |
| Recall@10 | | 0.7909 | 0.8172 | 0.5832 | 0.8279 |
Not only does the default RRF hybrid outperform Linear, but default Linear fusion scores below the BM25 baseline, and the difference is huge. Tuning, however, recovers the lead for Linear fusion. This suggests that you should try both modes and tune fusion for your specific dataset, or fall back to RRF as a safer default when you don't have relevance judgments to tune with.
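When you do have relevance judgments, the tuning itself can be a simple grid sweep: try every (fusion mode, weight combination) pair and keep whichever scores best on your metric. A hypothetical sketch, where toy_eval stands in for "run the benchmark with this config and compute nDCG@10":

```python
from itertools import product

def grid_search(modes, weight_grid, evaluate):
    """Try every (fusion_mode, per-vector weights) combination and
    return the one with the highest evaluation score."""
    return max(
        ((mode, w) for mode in modes for w in product(weight_grid, repeat=2)),
        key=lambda cfg: evaluate(*cfg),
    )

# Toy stand-in for a real evaluation run; in practice this would issue
# the benchmark queries with the given config and score the results.
def toy_eval(mode, weights):
    base = {"rrf": 0.38, "linear": 0.36}[mode]
    return base + 0.01 * weights[0] + 0.02 * weights[1]

best = grid_search(["rrf", "linear"], [0.5, 1.0, 2.0], toy_eval)
```

For two vectors and a small weight grid this is cheap; with more vectors you may want a coarser grid or random search instead of the full product.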
WMTR Trigram Weights
In Amgix v1.1.0 we added a new search query option called wmtr_trigram_weight that allows you to control how much influence the trigram component of WMTR has on result scores. To understand what this means, we have to look at what WMTR does. Internally, WMTR represents text in multiple views (it tokenizes text at multiple levels). For simplicity's sake, we'll focus on two of the views: lexical (~words) and trigrams (3-character sequences). The views are namespaced inside the resulting sparse vector and weighted differently. By default, WMTR prioritizes lexical-view scores over trigrams, because matches on whole "words" are much more significant than partial matches in the text. The default weights are heavily skewed towards the lexical view, and the trigram view plays only a complementary role.
With wmtr_trigram_weight you can override the default behavior and give trigrams more or less influence, depending on your dataset and requirements. And since it is a per-query parameter, you can tweak this setting for each individual search query.
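To make the two views concrete, here is a minimal sketch of how a text could be split into a lexical view and a trigram view, with namespaced features weighted differently. This illustrates the general idea only; it is not WMTR's actual tokenizer, and the weights are made up:

```python
def views(text, trigram_weight=1.0, lexical_weight=10.0):
    """Build a namespaced sparse vector from two views of the text:
    whole lowercase tokens (lexical) and 3-char sliding windows (trigrams)."""
    vec = {}
    for token in text.lower().split():
        # Lexical view: the whole token, heavily weighted by default
        vec[f"lex:{token}"] = vec.get(f"lex:{token}", 0.0) + lexical_weight
        # Trigram view: every 3-character window inside the token
        for i in range(len(token) - 2):
            tri = token[i:i + 3]
            vec[f"tri:{tri}"] = vec.get(f"tri:{tri}", 0.0) + trigram_weight
    return vec

v = views("512md lp")
```

Note that a short token like "lp" produces no trigrams at all: it can only ever match through the lexical view.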
Let's look at a concrete example.
We'll use our PC Parts dataset from the benchmarks. It is a collection of 6000+ video card names. The records look like this:
| Video Cards |
|---|
| Asus EN8400GS/DI/512MD2(LP) |
| Asus EAH6450 SILENT/DI/512MD3(LP) |
| PowerColor AE4350 512MD2-H |
| Gigabyte XTREME Rev 2.0 |
| NVIDIA Star Wars Galactic Empire |
| ... |
The dataset is indexed with WMTR. Let's say you are searching for one of the Asus cards above: you don't remember the exact model, but you remember it has 512MD and LP in the name. So you search for 512md lp.
Let's look at top 5 results for this query with different wmtr_trigram_weight settings.
Q: 512md lp, weight = 1.0 (default)

| Video Card | Raw Score |
|---|---|
| Yeston LP | 155.138 |
| MSI LP | 154.943 |
| GALAX LP | 154.902 |
| MSI LP | 154.889 |
| Yeston LP | 154.867 |

Ignoring the fact that there are duplicate records in the dataset, this result is surprising. Why are Yeston LP and other cards in the top 5, when we know that there are many GPUs in there with 512MD in the name? Even though 512MD is a partial match, it should have contributed, on top of the match on LP, to score those cards higher than these.

Q: 512md lp, weight = 6.0

| Video Card | Raw Score |
|---|---|
| Asus EN8400GS/DI/512MD2(LP) | 186.185 |
| Asus EAH6450 SILENT/DI/512MD3(LP) | 181.54 |
| Asus EAH5450 SILENT/DI/512MD3(LP) | 181.356 |
| Asus EN8400GS SILENT/DI/512MD2(LP) | 181.267 |
| Yeston LP | 180.469 |

This is, finally, a more intuitive result, where LP and 512MD are both in the names of the top results.
What is happening here?
The first result is surprising until you consider that LP is a match on a whole word, and therefore its score is quite high. 512MD, on the other hand, is a partial match. It generates multiple trigrams: 512, 12m, 2md. But with the default trigram weight of 1.0 in WMTR, the scores of three trigram matches are not enough to overcome the score of matches on LP alone. The reason we don't see any of the Asus cards in the list is that the WMTR algorithm also considers the length of the token relative to the document length. LP in Yeston LP or MSI LP scores higher than LP in Asus EAH6450 SILENT/DI/512MD3(LP) because those documents are shorter.
The second result, with a wmtr_trigram_weight of 6.0, finally gives trigram partial matches enough magnitude to overcome the score of the word LP in shorter documents.
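A toy dot-product over namespaced features shows the flip. With a low trigram weight, a whole-word LP match dominates; raising the weight lets three 512MD trigram matches overtake it. The feature sets and weights below are made-up toy values, not Amgix's actual scoring (in particular, the real algorithm also applies the document-length normalization described above, which this sketch omits):

```python
def score(query_feats, doc_feats, trigram_weight):
    """Toy match score: each lexical hit counts 10, each trigram hit
    counts trigram_weight. No length normalization."""
    s = 0.0
    for f in query_feats & doc_feats:
        s += trigram_weight if f.startswith("tri:") else 10.0
    return s

query = {"lex:lp", "lex:512md", "tri:512", "tri:12m", "tri:2md"}
short_lp = {"lex:yeston", "lex:lp"}                   # matches only the word LP
asus = {"lex:asus", "tri:512", "tri:12m", "tri:2md"}  # matches only the trigrams

low = (score(query, short_lp, 1.0), score(query, asus, 1.0))
high = (score(query, short_lp, 6.0), score(query, asus, 6.0))
```

At weight 1.0 the single LP word match (10.0) beats the three trigram matches (3.0); at weight 6.0 the trigrams (18.0) win.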
This is just an example, of course; the "right" value for wmtr_trigram_weight depends on your dataset.
Conclusion
Amgix v1.1.0 introduces two new features that allow you to optimize your queries and deliver more relevant search results to your users.