Amgix v1.1.0: Linear Fusion and WMTR Trigram Weights

Stable release of Amgix server v1.1.0 is now available.

New Features

Linear Fusion

Updated on April 6, 2026

This section was updated to include hybrid-search content

Before the v1.1.0 release, the only score fusion option available in Amgix was weighted RRF (Reciprocal Rank Fusion). This release introduces a new search query option: fusion_mode. It defaults to rrf, but can be set to linear to enable weighted linear fusion of scores from multiple vectors. Linear fusion min-max normalizes the raw scores from each searched vector, sums the normalized scores per returned document, sorts results by that sum, and returns the top candidates.
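To make the mechanics concrete, here is a minimal sketch of the fusion described above. This is not Amgix's actual implementation; the function and parameter names are made up, and it only illustrates the idea: min-max normalize each vector's raw scores, then take a weighted sum per document.

```python
def minmax(scores):
    """Min-max normalize a {doc_id: raw_score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def linear_fusion(per_vector_scores, weights, top_k=10):
    """per_vector_scores: {vector_name: {doc_id: raw_score}}."""
    fused = {}
    for name, scores in per_vector_scores.items():
        w = weights.get(name, 1.0)
        for doc, s in minmax(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

results = linear_fusion(
    {"name":    {"a": 2.0, "b": 8.0},
     "content": {"a": 9.0, "b": 3.0, "c": 6.0}},
    weights={"name": 2.0, "content": 1.0},
)
print(results)  # "b" wins thanks to the heavily weighted name vector
```

Because scores are normalized per vector before summing, a vector with large raw magnitudes cannot drown out the others unless its weight says so.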

Why is this interesting?

For some datasets, relevance of search results improves with weighted linear fusion over rank-based fusion. The NQ (Natural Questions) dataset is a good example.

If you look at our published benchmarks for NQ you may notice that WMTR (Weighted Multilevel Token Representation) underperforms compared to BM25 baseline on this dataset:

            BM25     WMTR     WMTR (tuned)
nDCG@10     0.329    0.2579   0.2741
Recall@10            0.4222   0.4266

Even with the weights between the name and content fields tuned, WMTR still trails BM25 by a significant margin: 0.0549 nDCG@10. That's roughly 5.5 points of possible relevance left on the table.

The above tests were performed on Amgix v1.0.0-beta3.3, where the only available score fusion function was RRF. With the new fusion_mode setting in v1.1.0, we re-ran the tests with linear fusion. Here are the results:

            BM25     WMTR     WMTR (tuned)
nDCG@10     0.329    0.2849   0.3171
Recall@10            0.4265   0.4821

This is over a 10% relative improvement in the WMTR score with default weights (0.2579 to 0.2849). With tuned WMTR weights, we are only about one point of nDCG@10 below the BM25 baseline, and Recall@10 has also jumped up with tuned weights.

Why the difference?

RRF discards score magnitude and only uses rank position. For datasets where score magnitude carries meaningful signal about relevance, like NQ where content matches are far stronger than title matches, linear fusion captures that signal and RRF throws it away.
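For contrast, here is a sketch of standard RRF. The constant k=60 is the widely used default from the literature; whatever constant Amgix uses internally is an assumption we don't make here. Note that only rank positions enter the formula, so lists with very different score gaps fuse identically:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over lists of doc ids, best-first.
    Each doc scores sum(1 / (k + rank)); score magnitudes are never seen."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# "a" barely beats "b" on one vector while "b" crushes "a" on the other;
# RRF scores them identically because it only sees the ranks.
scores = dict(rrf([["a", "b"], ["b", "a"]]))
print(scores)
```

Linear fusion, by keeping the (normalized) magnitudes, would let the vector with the decisive gap break this tie.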

Added on April 6, 2026

We went back and re-tested hybrid search on NQ. The hybrid search was performed with 3 vectors, just like in our original benchmarks (WMTR on name and content, plus all-MiniLM-L6-v2 on content). The RRF numbers are from the original benchmarks. Here are the results:

            BM25     Hybrid (RRF)   Hybrid (RRF, tuned)   Hybrid (Linear)   Hybrid (Linear, tuned)
nDCG@10     0.329    0.3850         0.4075                0.4122            0.4392
Recall@10            0.5685         0.5944                0.5907            0.6229

The default Hybrid with Linear fusion outperforms even the tuned RRF Hybrid. Tuning Linear fusion pushes nDCG@10 further, to 0.4392, a 7.8% relative improvement over tuned RRF.

Test Both Fusion Modes

Linear fusion is not always the answer. RRF performs better on some datasets and Linear fusion on others. Look, for example, at the results of the hybrid search on ArguAna dataset:

            BM25     Hybrid (RRF)   Hybrid (RRF, tuned)   Hybrid (Linear)   Hybrid (Linear, tuned)
nDCG@10     0.414    0.4966         0.5306                0.3824            0.5413
Recall@10            0.7909         0.8172                0.5832            0.8279

Not only does the default RRF hybrid outperform Linear, but default linear fusion scores below the BM25 baseline, and the difference is huge. Tuning, however, recovers the lead for linear fusion. This suggests that you should try both modes and tune fusion for your specific dataset, or fall back to RRF as a safer default when you don't have relevance judgments to tune with.
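When you do have relevance judgments, tuning can be a simple grid search. A minimal sketch, where the search_fn callback, the binary-relevance nDCG helper, and the grid are all hypothetical stand-ins rather than Amgix APIs:

```python
import math

def ndcg_at_10(ranked_ids, relevant):
    """Binary-relevance nDCG@10 for one query."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:10]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(10, len(relevant))))
    return dcg / ideal if ideal else 0.0

def tune_weight(search_fn, judged_queries, grid):
    """Pick the fusion weight with the best summed nDCG@10.
    search_fn(query, weight) must return ranked doc ids."""
    return max(grid, key=lambda w: sum(
        ndcg_at_10(search_fn(q, w), rel) for q, rel in judged_queries))
```

The same loop works for comparing fusion_mode values: run it once per mode and keep whichever configuration scores best on your judged queries.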

WMTR Trigram Weights

In Amgix v1.1.0 we added a new search query option called wmtr_trigram_weight that allows you to control how much influence the trigram component of WMTR has on result scores. To understand what this means, we have to look at what WMTR does. Internally, WMTR represents text in multiple views (it tokenizes text at multiple levels). For simplicity's sake, we'll focus on two of the views: lexical (~words) and trigrams (3-character sequences). The views are namespaced inside the resulting sparse vector and weighted differently. By default, WMTR prioritizes lexical view scores over trigrams, because matches on whole "words" are much more significant than partial matches in the text. The default weights are heavily skewed towards the lexical view, and the trigram view plays only a complementary role.
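To make the two views concrete, here is a toy sketch. The real tokenizer, normalization, and namespace prefixes inside Amgix are internal details; the lex:/tri: prefixes below are made up purely to illustrate the namespacing idea:

```python
def lexical_view(text):
    """Whole-word tokens, namespaced with a made-up 'lex:' prefix."""
    return ["lex:" + tok for tok in text.lower().split()]

def trigram_view(text):
    """Character trigrams per token, namespaced with a made-up 'tri:' prefix."""
    grams = []
    for tok in text.lower().split():
        grams += ["tri:" + tok[i:i + 3] for i in range(len(tok) - 2)]
    return grams

print(lexical_view("512MD LP"))  # ['lex:512md', 'lex:lp']
print(trigram_view("512MD LP"))  # ['tri:512', 'tri:12m', 'tri:2md']
```

Because the two views live in disjoint namespaces of one sparse vector, their contributions can be weighted independently, which is exactly the knob wmtr_trigram_weight exposes.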

With wmtr_trigram_weight you can override the default behavior and give trigrams more or less influence, depending on your dataset and requirements. And since it's a per-query parameter, you can tweak this setting for each individual search.
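For illustration, a request might combine both new options like this. The exact JSON shape and field placement in Amgix's query API are assumptions here, not documentation:

```python
import json

# Hypothetical search request body using both v1.1.0 options.
request = {
    "query": "512md lp",
    "fusion_mode": "linear",       # new in v1.1.0; default is "rrf"
    "wmtr_trigram_weight": 6.0,    # new in v1.1.0; default is 1.0
}
print(json.dumps(request, indent=2))
```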

Let's look at a concrete example

We'll use our PC Parts dataset from the benchmarks. It is a collection of 6000+ video card names. The records look like this:

Video Cards
Asus EN8400GS/DI/512MD2(LP)
Asus EAH6450 SILENT/DI/512MD3(LP)
PowerColor AE4350 512MD2-H
Gigabyte XTREME Rev 2.0
NVIDIA Star Wars Galactic Empire
...

The dataset is indexed with WMTR. Let's say you are searching for one of the Asus cards above: you don't remember the exact model, but you remember it has 512MD and LP in the name, so you search for 512md lp.

Let's look at top 5 results for this query with different wmtr_trigram_weight settings.

  • Q: 512md lp, weight = 1.0 (default)


    Video Card     Raw Score
    Yeston LP      155.138
    MSI LP         154.943
    GALAX LP       154.902
    MSI LP         154.889
    Yeston LP      154.867

    Ignoring the fact that there are duplicate records in the dataset, this result is surprising. Why are Yeston LP and the other cards in the top 5 when we know there are many GPUs with 512MD in the name? Even though 512MD is only a partial match, it should have combined with the match on LP to push those cards higher.

  • Q: 512md lp, weight = 6.0


    Video Card                           Raw Score
    Asus EN8400GS/DI/512MD2(LP)          186.185
    Asus EAH6450 SILENT/DI/512MD3(LP)    181.54
    Asus EAH5450 SILENT/DI/512MD3(LP)    181.356
    Asus EN8400GS SILENT/DI/512MD2(LP)   181.267
    Yeston LP                            180.469

    This, finally, is a more intuitive result, where both LP and 512MD appear in the names of the top results.

What is happening here?

The first result is surprising until you consider that LP is a match on a whole word, and therefore scores high. 512MD, on the other hand, is a partial match. It generates multiple trigrams: 512, 12m, 2md. But with WMTR's default trigram weight of 1.0, the scores of three trigram matches are not enough to overcome the score of a match on LP alone. The reason we don't see any of the Asus cards in the list is that the WMTR algorithm also considers the length of the token in relation to the document length: LP in Yeston LP or MSI LP scores higher than LP in Asus EAH6450 SILENT/DI/512MD3(LP) because those documents are shorter.

The second result, with wmtr_trigram_weight of 6.0, finally gives trigram partial matches enough magnitude to overcome the score of the word LP in shorter documents.
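The flip can be illustrated with toy numbers. These scores are made up for intuition only; real WMTR scores depend on internal factors we haven't detailed here:

```python
lexical_lp_short = 155.0   # whole-word LP match in a short name like "Yeston LP"
lexical_lp_long  = 120.0   # same LP match, diluted by a much longer name
trigram_512md    = 10.0    # combined contribution of trigrams 512/12m/2md

for weight in (1.0, 6.0):
    short_score = lexical_lp_short                      # no 512MD in the name
    long_score = lexical_lp_long + weight * trigram_512md
    winner = "512MD card" if long_score > short_score else "short LP card"
    print(f"trigram weight {weight}: {winner} wins")
```

At weight 1.0 the trigram contribution (120 + 10 = 130) cannot catch the short document's 155; at weight 6.0 it can (120 + 60 = 180).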

This is just an example, of course; the "right" value for wmtr_trigram_weight depends on your dataset.

Conclusion

Amgix v1.1.0 introduces two new features that allow you to optimize your queries and deliver more relevant search results to your users.