v0.12: HTTP-First Scraping and the End of Selenium Dependency

by DealMonitor Team · 5 min read
release, scraping, performance, flights

For months, DealMonitor relied heavily on headless Chrome browsers to scrape prices from online shops. It worked, but it was slow, resource-hungry, and increasingly fragile as more shops deployed bot detection. With v0.12, we are fundamentally rethinking how we fetch prices. The new HTTP-first pipeline delivers results in milliseconds instead of seconds, and headless browsers only spin up when truly needed.

Why HTTP-First?

Running a headless Chrome instance for every price check is like driving a truck to buy a loaf of bread. Most shops serve their product pages as plain HTML with prices embedded directly in the markup or in structured data. A simple HTTP request is enough to get everything we need.

The problem was never the shops themselves. It was bot detection. Services like Cloudflare analyze incoming requests at the TLS layer, checking whether the connection looks like it comes from a real browser or from a script. A standard Python HTTP client fails this check instantly because its TLS fingerprint looks nothing like Chrome's.

That is where curl_cffi comes in.

Chrome TLS Fingerprint Impersonation

Every browser leaves a unique fingerprint during the TLS handshake: the cipher suites it supports, the order it presents them, the extensions it negotiates. Cloudflare and similar services maintain a database of known browser fingerprints. If your connection does not match one, you get blocked.

curl_cffi is a Python binding for curl-impersonate, a build of libcurl that can impersonate specific browser versions at the TLS level. When DealMonitor makes an HTTP request through curl_cffi, the handshake the receiving server sees matches that of a real Chrome browser: the cipher suites, the ALPN protocols, and the TLS extensions are the ones Chrome would send.

This means we can now fetch pages from Cloudflare-protected shops without launching a browser at all. The request completes in under a second instead of the 5-15 seconds a Selenium session would take.
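To make this concrete, here is a minimal sketch of a fetch through curl_cffi. It is not DealMonitor's actual code: the function name `fetch_html` and the injectable `get` parameter are assumptions made for illustration, and depending on the installed curl_cffi release the `"chrome"` target may need to be replaced with a pinned version string.

```python
from typing import Callable, Optional


def fetch_html(url: str, timeout: float = 10.0,
               get: Optional[Callable] = None) -> Optional[str]:
    """Fetch a page with a Chrome-like TLS fingerprint; return HTML or None.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; by default curl_cffi is used.
    """
    if get is None:
        # Imported lazily so the sketch can be loaded without the package.
        from curl_cffi import requests
        get = requests.get
    # `impersonate` selects the browser profile whose TLS handshake is mimicked.
    response = get(url, impersonate="chrome", timeout=timeout)
    if response.status_code != 200:
        return None
    return response.text
```

Calling `fetch_html("https://shop.example/product/1")` would perform the request with the impersonated handshake; whether a given shop accepts it still depends on what else its bot detection inspects.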

The Pipeline in Action

Here is how the new scrape pipeline works for every price check:

  1. HTTP attempt first: We send a request via curl_cffi with a Chrome TLS fingerprint. If the shop returns a valid HTML page with detectable prices, we are done. No browser needed.
  2. Smart header rotation: Each request uses headers matched to the target shop. The User-Agent, Referer, and Accept-Language headers are rotated and localized for the shop's domain and country. A German shop gets German headers; a US shop gets American English.
  3. Selenium fallback: If the HTTP response indicates a challenge page or JavaScript-rendered content that we cannot parse statically, a headless Chrome session takes over in the background.
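The three steps above can be sketched roughly as follows. Everything named here is an assumption for illustration: the `scrape` function, the callables it receives, the challenge markers, and the tiny language table are hypothetical stand-ins, not the production logic.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical mapping from top-level domain to an Accept-Language value.
ACCEPT_LANGUAGE = {"de": "de-DE,de;q=0.9,en;q=0.5", "com": "en-US,en;q=0.9"}

# Illustrative markers of a bot-challenge page; real detection would be broader.
CHALLENGE_MARKERS = ("just a moment", "challenge-platform", "captcha")


def build_headers(url: str) -> Dict[str, str]:
    """Step 2: pick headers that fit the shop's domain and country."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return {
        "Accept-Language": ACCEPT_LANGUAGE.get(tld, "en-US,en;q=0.9"),
        "Referer": f"https://{host}/",
    }


def looks_like_challenge(html: str) -> bool:
    """Heuristic check for a challenge page instead of a product page."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def scrape(
    url: str,
    http_fetch: Callable[[str, Dict[str, str]], Optional[str]],
    browser_fetch: Callable[[str], Optional[str]],
    extract_price: Callable[[Optional[str]], Optional[float]],
) -> Tuple[Optional[float], str]:
    """Return (price, mode), where mode is "http" or "selenium"."""
    # Step 1: cheap HTTP attempt with localized headers.
    html = http_fetch(url, build_headers(url))
    if html is not None and not looks_like_challenge(html):
        price = extract_price(html)
        if price is not None:
            return price, "http"
    # Step 3: challenge page or no static price, so fall back to the browser.
    return extract_price(browser_fetch(url)), "selenium"
```

Passing the fetchers in as callables keeps the decision logic separate from the transport, so the HTTP client and the headless browser can be swapped or stubbed independently.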

The result is dramatic. In our testing, roughly 70% of all shops now resolve via HTTP alone. That means faster results for you and significantly reduced server load for us.

Shop Scrape Mode Learning

The system remembers what works. After five consecutive successful HTTP-only scrapes for a given shop domain, that shop is automatically marked as HTTP-only. Future checks skip the Selenium fallback entirely, saving even more resources. If an HTTP-only shop later returns a challenge page, the system resets and tries Selenium again.

This adaptive behavior means the pipeline gets smarter over time without any manual configuration.
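As a rough illustration of that rule, the per-shop state could be as small as a counter and a flag. The class and method names below are hypothetical; only the threshold of five and the reset on a challenge page come from the description above.

```python
from dataclasses import dataclass

HTTP_ONLY_THRESHOLD = 5  # consecutive HTTP-only successes before promotion


@dataclass
class ShopMode:
    """Hypothetical per-domain scrape state."""
    http_streak: int = 0
    http_only: bool = False

    def record_http_success(self) -> None:
        # Count the success and promote the shop once the streak is long enough.
        self.http_streak += 1
        if self.http_streak >= HTTP_ONLY_THRESHOLD:
            self.http_only = True

    def record_challenge(self) -> None:
        # A challenge page resets the shop so Selenium is tried again.
        self.http_streak = 0
        self.http_only = False

    @property
    def allows_selenium_fallback(self) -> bool:
        return not self.http_only
```

In practice one such record would be stored per shop domain, for example in a dictionary or a database row keyed by hostname.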

Flight Price Tracking

v0.12 also introduces something our users have been requesting since day one: flight price tracking. Airline and travel booking sites are among the most aggressively protected websites on the internet. Scraping them is practically impossible even with headless browsers. Instead, we took the API approach.

DealMonitor now integrates with the Amadeus and Kiwi.com flight APIs. When you add a URL from Expedia, Google Flights, Kayak, Skyscanner, or Momondo, we extract the route and date parameters and query the underlying API directly. This gives us reliable, real-time pricing without touching the website at all.

Note: flight price tracking is planned for a future release. We are evaluating sustainable API options for reliable airline price data.

Flight prices are notoriously volatile, changing multiple times per day. With API-based tracking, we can check prices much more frequently than our standard 12-hour scrape cycle allows for regular shops.
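Extracting a route and date from a travel URL might look something like the sketch below. The URL shape (`/flights/ORIGIN-DEST/YYYY-MM-DD`) is an assumed example, not a documented format of any of the sites named above, and `FlightQuery` and `parse_flight_url` are hypothetical names. Each real site would need its own pattern.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse

# Assumed path shape: /flights/BER-JFK/2025-03-01 (three-letter IATA codes).
_ROUTE_PATTERN = re.compile(r"/flights/([A-Z]{3})-([A-Z]{3})/(\d{4}-\d{2}-\d{2})")


@dataclass(frozen=True)
class FlightQuery:
    origin: str
    destination: str
    departure: date


def parse_flight_url(url: str) -> Optional[FlightQuery]:
    """Return the route and date encoded in the URL path, or None."""
    match = _ROUTE_PATTERN.search(urlparse(url).path)
    if match is None:
        return None
    origin, destination, day = match.groups()
    try:
        departure = date.fromisoformat(day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None
    return FlightQuery(origin, destination, departure)
```

The resulting `FlightQuery` holds exactly the parameters a flight search API typically asks for, which is why no page content has to be fetched at all.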

JSON-LD Deduplication

A subtle but important fix in v0.12 addresses a long-standing issue with structured data. Many shops embed price information both in visible HTML elements and in JSON-LD structured data blocks. Our candidate detection was picking up both, sometimes leading to duplicate candidates with slightly different confidence scores.

The new pipeline deduplicates candidates: when a JSON-LD price matches an HTML candidate, the JSON-LD version takes priority. Structured data is machine-readable by design and far less prone to parsing errors than screen-scraped text. This improves detection accuracy across the board.
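One way to express that priority rule is shown below. This is a simplified sketch under stated assumptions: the `Candidate` fields and the `deduplicate` function are hypothetical, and "matches" is taken to mean an exactly equal price.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    price: Decimal
    source: str        # "json-ld" or "html"
    confidence: float


def deduplicate(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one candidate per price, preferring JSON-LD over HTML."""
    best: Dict[Decimal, Candidate] = {}
    for cand in candidates:
        kept = best.get(cand.price)
        if kept is None:
            best[cand.price] = cand
        elif cand.source == "json-ld" and kept.source != "json-ld":
            # Structured data wins over a screen-scraped duplicate.
            best[cand.price] = cand
        elif cand.source == kept.source and cand.confidence > kept.confidence:
            # Same source: keep the more confident of the two.
            best[cand.price] = cand
    # Dict order preserves the order in which each price was first seen.
    return list(best.values())
```

Using `Decimal` rather than `float` for prices avoids two representations of the same amount comparing as different because of rounding.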

Code Quality Cleanup

Behind the scenes, we ran ruff across all 60 backend Python files to enforce PEP 8 compliance. Imports are sorted, whitespace is consistent, unused code is gone. This may not be user-facing, but a clean codebase is a reliable codebase. It also makes it easier for us to ship features faster with fewer bugs.

What This Means for You

The HTTP-first pipeline is the biggest architectural change since DealMonitor launched. Here is what you will notice:

  • Faster price checks: Most shops now respond in under a second instead of 5-15 seconds.
  • Better reliability: Chrome TLS fingerprint impersonation gets our requests past many bot detection systems that previously blocked our checks.
  • Flight tracking: Add URLs from major travel sites and track flight prices alongside your regular product trackers.
  • Smarter detection: JSON-LD deduplication means fewer false positives and more accurate prices.

We are committed to making price tracking faster, more reliable, and more comprehensive with every release. If you have not tried DealMonitor yet, create your free account and see the difference the HTTP-first pipeline makes. And if you are already tracking prices, your existing trackers are already benefiting from these improvements.

Check out the full technical changelog on our changelog page.
