Regime Change: How CatBoost Dethroned Our Previous Price Detection Model

A New Champion

This past weekend, something happened that we internally call a “Regime Change”: our LightGBM-based price detection model was replaced by a CatBoost model. That sounds technical, and it is. But the effect is directly noticeable: price detection on your product pages just got more accurate.

To mark the occasion, we want to give you a peek behind the curtain at how our AI-powered price detection works, how we train our models, and why CatBoost now comes out on top.

The Problem: One Number Among Many

A typical product page contains dozens of numbers: article IDs, ratings, shipping costs, quantities, crossed-out prices, variant prices. The actual purchase price is just one of them. Our model needs to pick the right one from all these candidates — on any website, regardless of layout, language, or shop system.

How Our Pipeline Works

Price detection runs in several stages:

Stage 1: Collecting Candidates

When you add a URL for tracking, our system analyzes the complete page structure. Every element that could contain a price is identified. We use multiple sources in parallel:

Structured data: JSON-LD and Schema.org markup that many shops provide for search engines.
DOM analysis: Every text element is examined for price-like patterns — numbers with currency symbols, decimal separators, etc.
JavaScript extraction: For shops with configurable products (e.g., different sizes), we extract variant prices directly from embedded JavaScript.
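To make the first two sources more concrete, here is a minimal sketch of candidate collection. It assumes the JSON-LD blocks and the visible DOM texts have already been pulled out of the page. The names Candidate, parse_number, jsonld_prices and collect_candidates are hypothetical, and the separator heuristic is our illustration rather than the production parser. JavaScript extraction is left out because it depends heavily on the shop system.

```python
import json
import re
from dataclasses import dataclass

# Currency symbol before or after a number such as 19.99, 1.299,00 or 1,299.00.
PRICE_RE = re.compile(r"[€$£]\s?\d+(?:[.,]\d+)*|\d+(?:[.,]\d+)*\s?[€$£]")

@dataclass
class Candidate:
    value: float
    source: str  # "jsonld" or "dom" (illustrative labels)

def parse_number(text: str) -> float:
    """Normalise '1.299,00', '1,299.00', '12,99' or '$19.99' to a float (heuristic)."""
    s = re.sub(r"[^\d.,]", "", text)
    last = max(s.rfind("."), s.rfind(","))
    if last == -1:
        return float(s)
    int_digits = re.sub(r"[.,]", "", s[:last])
    frac = s[last + 1:]
    # If both separators appear, or the last group is not exactly 3 digits,
    # treat the last separator as the decimal mark.
    if ("," in s and "." in s) or len(frac) != 3:
        return float(f"{int_digits}.{frac}")
    return float(int_digits + frac)  # e.g. "1,299": thousands separator

def jsonld_prices(doc: str) -> list[float]:
    """Recursively collect every "price" field from one JSON-LD document."""
    found: list[float] = []

    def walk(node):
        if isinstance(node, dict):
            for key, val in node.items():
                if key == "price":
                    found.append(float(val))
                else:
                    walk(val)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(doc))
    return found

def collect_candidates(jsonld_docs: list[str], texts: list[str]) -> list[Candidate]:
    cands = [Candidate(p, "jsonld") for doc in jsonld_docs for p in jsonld_prices(doc)]
    for text in texts:
        for match in PRICE_RE.finditer(text):
            cands.append(Candidate(parse_number(match.group()), "dom"))
    return cands
```

Note that the same price often shows up in more than one source. Keeping those duplicates as separate candidates lets the model learn which source tends to be reliable on which kind of page.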

Stage 2: Feature Extraction

For each price candidate, we compute roughly two dozen features that help the model distinguish the real price from noise:

HTML context: Does the surrounding element contain words like “price”, “offer”, or “current”? Is the text visually emphasized (bold, large font)?
Page position: How deeply nested is the element in the DOM? Where does it sit relative to other candidates?
Statistical context: How does the value compare to other numbers on the page? Is it an outlier or within a typical price range?
Shop-specific signals: How well does the model historically detect prices on this domain? Some shops are harder than others.
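The sketch below computes one or two features from each of the four groups for a single candidate. The RawCandidate record, the emphasis tags and the 20 px font-size threshold are assumptions made for illustration. The real feature set is larger (roughly two dozen features) and is not reproduced here.

```python
import statistics
from dataclasses import dataclass

@dataclass
class RawCandidate:  # hypothetical input record
    value: float
    context: str         # text and class/id attributes around the element
    tag: str             # HTML tag, e.g. "span" or "strong"
    font_size_px: float  # computed font size (assumed available)
    dom_depth: int       # number of ancestors in the DOM tree

PRICE_WORDS = ("price", "offer", "current")
EMPHASIS_TAGS = {"b", "strong", "h1", "h2"}  # illustrative choice

def extract_features(c: RawCandidate, page_values: list[float], domain_accuracy: float) -> dict:
    mean = statistics.fmean(page_values)
    spread = statistics.pstdev(page_values)
    ctx = c.context.lower()
    return {
        # HTML context: price-related keywords and visual emphasis
        "has_price_word": int(any(w in ctx for w in PRICE_WORDS)),
        "is_emphasized": int(c.tag in EMPHASIS_TAGS or c.font_size_px >= 20),  # 20 px is a made-up cutoff
        # Page position: nesting depth in the DOM
        "dom_depth": c.dom_depth,
        # Statistical context: distance from the page's other numbers in standard deviations
        "value_zscore": 0.0 if spread == 0 else (c.value - mean) / spread,
        # Shop-specific signal: historical detection accuracy on this domain
        "domain_accuracy": domain_accuracy,
    }
```

With this z-score, an article number like 48213 on a page whose other numbers are 19.99 and 4.95 stands out as an outlier, which is exactly the kind of signal the statistical features are meant to capture.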

Stage 3: Prediction

All candidates with their features are sent to our ML service. The model scores each candidate with a probability: “How confident am I that this is the actual product price?” The candidate with the highest score wins.
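The selection step itself is simple: score, sort, take the top. In this sketch, toy_model is a hypothetical stand-in for the trained classifier (a logistic function with made-up weights). Any function that maps a feature dict to a probability could be plugged in instead.

```python
import math
from typing import Callable, Sequence

def pick_price(candidates: Sequence[dict], predict_proba: Callable[[dict], float]):
    """Score every candidate, rank by probability, and return (winner, ranking)."""
    ranked = sorted(
        ((predict_proba(c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[0][1], ranked

def toy_model(feat: dict) -> float:
    # Placeholder for the real model: logistic function of a hand-picked
    # linear score. The weights are invented for illustration only.
    z = (2.0 * feat["has_price_word"] + 1.0 * feat["is_emphasized"]
         - 0.5 * abs(feat["value_zscore"]) - 2.0)
    return 1.0 / (1.0 + math.exp(-z))
```

Returning the full ranking rather than just the winner is what makes it possible to look at the top three candidates as well as the best one.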

Training: How the Model Learns

Our model learns from real user data. Every time you confirm or correct a price, that feedback flows back as a training signal. The mapping “this number on this page is the correct price” becomes a labeled example for the next training run.
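Here is a rough sketch of how one confirmed price could become training rows: every candidate on the page gets a label, 1 if it matches the confirmed value and 0 otherwise. The Example record, the label_page name and the half-cent tolerance are hypothetical. Matching by value means that duplicates of the correct price (for example, the JSON-LD and DOM copies) are all labeled positive, which is one possible design choice and not necessarily ours.

```python
import math
from dataclasses import dataclass

@dataclass
class Example:
    page_id: str   # kept so the train/test split can be done per page
    features: dict
    label: int     # 1 = user-confirmed price, 0 = any other number on the page

def label_page(page_id: str, candidates: list[tuple[float, dict]],
               confirmed_price: float, tol: float = 0.005) -> list[Example]:
    """Turn one confirmation or correction into labeled examples for every candidate."""
    return [
        Example(page_id, feats, int(math.isclose(value, confirmed_price, abs_tol=tol)))
        for value, feats in candidates
    ]
```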

The challenge: out of all candidates on a page, typically only one is the correct price — the ratio is roughly 1:50. This imbalance must be accounted for during training, otherwise the model simply learns to classify everything as “not a price.”
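A quick calculation shows why the imbalance is a trap. Reading the 1:50 ratio as one correct candidate for every 50 wrong ones, a degenerate model that answers “not a price” for everything already looks excellent if you only measure per-candidate accuracy:

```python
# Reading 1:50 as one correct price per ~50 wrong candidates on a page.
positives, negatives = 1, 50

# A model that labels every candidate "not a price"...
per_candidate_accuracy = negatives / (positives + negatives)
# ...never finds the actual price, so recall on the price class is zero.
price_recall = 0 / positives

print(f"{per_candidate_accuracy:.1%}")  # 98.0%
print(price_recall)                     # 0.0
```

About 98% “accuracy” while never detecting a single price is why we evaluate per page and weight the rare positive class during training.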

We regularly train multiple model types in parallel and compare their performance on a held-out test set. The test set is strictly split by page — the model is never tested on pages it has seen during training.
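One common way to guarantee that no page leaks from training into testing is to assign whole pages to a split using a stable hash of the page ID. The sketch below does that. The page_id field, the 20% test fraction and the function names are assumptions, and our pipeline may split differently.

```python
import hashlib

def is_test_page(page_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically map a page to [0, 1) and put the lowest slice in the test set."""
    digest = hashlib.sha256(page_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < test_fraction

def split_by_page(examples: list[dict]) -> tuple[list[dict], list[dict]]:
    """All candidates of one page always land in the same split."""
    train, test = [], []
    for ex in examples:
        (test if is_test_page(ex["page_id"]) else train).append(ex)
    return train, test
```

Because the assignment depends only on the page ID, a page stays in the same split from one run to the next, even as new data arrives.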

Why CatBoost Won

In our latest model comparison, CatBoost outperformed the previous LightGBM model (which had been in production since January) on the key metrics:

Top-1 accuracy of 80%: For 4 out of 5 product pages, the model identifies the correct price on the first try.
Top-3 accuracy of 84%: When considering the three best candidates, the correct price is among them for 84% of product pages.
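Both metrics are computed per page, not per candidate. A minimal sketch, assuming each test page is represented as a list of model scores plus the index of the correct candidate:

```python
def top_k_accuracy(pages: list[tuple[list[float], int]], k: int) -> float:
    """Fraction of pages whose correct candidate is among the k highest-scored ones."""
    hits = 0
    for scores, true_idx in pages:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits += true_idx in ranked[:k]
    return hits / len(pages)
```

Top-3 accuracy can never be lower than top-1 accuracy. The gap between the two (here 80% vs. 84%) shows how often the right price was ranked near the top but not first.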

What makes CatBoost better? Two factors stand out:

Better handling of class imbalance. CatBoost has a built-in strategy for automatic class weight balancing that works more robustly in practice than the manual calibration needed for LightGBM.
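In CatBoost this is the auto_class_weights parameter, with the modes "Balanced" and "SqrtBalanced". As we understand the CatBoost documentation, "Balanced" gives class k the weight (size of the largest class) / (size of class k), assuming unit sample weights. "SqrtBalanced" uses the square root of that ratio. The sketch below reproduces only that arithmetic, not the library call, and this post does not say which mode our production model uses.

```python
import math
from collections import Counter

def balanced_class_weights(labels: list[int], sqrt: bool = False) -> dict[int, float]:
    """Weight each class by (largest class count / its own count), optionally square-rooted."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {
        cls: math.sqrt(largest / n) if sqrt else largest / n
        for cls, n in counts.items()
    }
```

For one price among 50 other numbers, the price class gets weight 50 and the rest get weight 1, so a single missed price costs as much as 50 false alarms.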

Smarter processing of categorical features. Features like HTML tag type or candidate source (JSON-LD vs. DOM text vs. JavaScript) are processed natively by CatBoost, without us having to manually encode them as numbers. This reduces information loss.
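The core idea behind CatBoost's categorical handling is called ordered target statistics. Each categorical value is replaced by the average label of *earlier* rows with the same value, smoothed toward a prior. Because a row's own label is never used, target leakage is avoided. The sketch below is a simplified version: it uses a single fixed row order, whereas CatBoost uses random permutations and additional tricks. The prior of 0.5 and the smoothing weight a = 1 are illustrative values.

```python
from collections import defaultdict

def ordered_target_stats(categories: list[str], labels: list[int],
                         prior: float = 0.5, a: float = 1.0) -> list[float]:
    """Encode each category using only the labels of rows that came before it."""
    sums: defaultdict[str, float] = defaultdict(float)
    counts: defaultdict[str, int] = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, labels):
        encoded.append((sums[cat] + a * prior) / (counts[cat] + a))
        sums[cat] += y      # update only after encoding the current row
        counts[cat] += 1
    return encoded
```

Applied to a feature such as candidate source, this turns "jsonld" or "dom" into a number that reflects how often that source has been the correct price so far. No hand-written mapping is needed.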

Automatic Retraining

Our pipeline doesn’t just train models once — it does so continuously. Every day, the current best model is retrained with new data. Once a week, a full comparison of all model configurations runs — that’s how we discovered the “Regime Change” to CatBoost.
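The weekly comparison boils down to a champion/challenger decision. The sketch below promotes the best challenger only if its top-1 accuracy beats the current model by some margin, using top-3 accuracy as a tie-breaker among challengers. EvalResult, pick_champion and the min_gain margin are hypothetical names, not our actual promotion rule.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    name: str
    top1: float  # top-1 accuracy on the page-split test set
    top3: float  # top-3 accuracy, used as a tie-breaker

def pick_champion(current: EvalResult, challengers: list[EvalResult],
                  min_gain: float = 0.0) -> EvalResult:
    """Return the model to serve next: the best challenger if it clearly beats the current one."""
    if not challengers:
        return current
    best = max(challengers, key=lambda r: (r.top1, r.top3))
    if best.top1 > current.top1 + min_gain:
        return best  # a "regime change"
    return current
```

A non-zero min_gain keeps the service from flip-flopping between two models whose scores differ only by noise.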

The detector service that performs real-time price detection loads new models automatically. From discovering a better model to deploying it in production takes only minutes.
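A simple way to get automatic loading is for the service to check the model file's modification time before each prediction and reload it when the file has changed. The sketch uses pickle as a generic stand-in for whatever serialization format the real service uses. ModelHolder is a hypothetical name. Only load files from a trusted source, since unpickling can execute code.

```python
import os
import pickle

class ModelHolder:
    """Serve the current model and reload it when the file on disk changes (mtime-based)."""

    def __init__(self, path: str):
        self.path = path
        self._mtime: int | None = None
        self._model = None

    def get(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path, "rb") as f:
                self._model = pickle.load(f)
            self._mtime = mtime
        return self._model
```

To avoid reading a half-written file, a publisher would typically write the new model to a temporary file in the same directory and then swap it into place with os.replace, which is atomic on POSIX file systems.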

What This Means for You

In short: better price detection, fewer manual corrections needed. You should especially notice improvements on shops with complex page layouts, multiple price variants, or unusual presentations.

When the model is uncertain, you’ll see it in the confidence indicator during tracker creation. In those cases, you can simply confirm the price manually — and help the model learn for its next training round at the same time.
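One plausible way to derive such an indicator looks at two things: how high the top candidate's probability is, and how far it sits above the runner-up. The thresholds and the three levels below are illustrative assumptions, not the values behind the real indicator.

```python
def confidence_level(probs: list[float], high: float = 0.8,
                     low: float = 0.5, min_margin: float = 0.2) -> str:
    """Map candidate probabilities to a coarse confidence label (hypothetical thresholds)."""
    ranked = sorted(probs, reverse=True)
    top = ranked[0]
    margin = top - (ranked[1] if len(ranked) > 1 else 0.0)
    if top >= high and margin >= min_margin:
        return "high"
    if top >= low:
        return "medium"
    return "low"  # a good moment to ask the user to confirm the price
```

The margin check matters: two candidates at 0.85 and 0.70 mean the model is torn between two numbers, even though the top score alone looks confident.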

Try it out and create your next tracker — CatBoost is now handling the price detection.
