Keeping out the masses. Understanding the popularity and implications of internet paywalls

Jan 28, 2026 · 2 min read

paper
paywall
clientSideAttacks
businessFlowTampering
webApplication

Goals of the paper:

measure how widely paywalls have been adopted
present an automated system for detecting whether a site uses a paywall

The web is increasingly moving away from “open” models toward “paywalled” models (see paywalls) to keep users engaged with the platform.

People are annoyed by ad-based websites because:

Google and Facebook effectively hold a duopoly on the advertising industry (70% of revenues)
Ad-based funding systems suffer from significant and increasing rates of fraud
Ad tracking raises privacy concerns

Paywall circumvention

The authors find that all observed paywalls are trivial to circumvent. Well-known techniques include:

emptying the cookie jar (75% effective)
enabling Incognito/Private Mode (sufficient to bypass most paywalls)
changing the screen dimensions
hiding the user’s IP address
changing the user agent string
using an ad blocker extension
enabling “Reader Mode”
using the Pocket web service
blocking HTTP requests for popular paywall libraries
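Several of these tricks work by making the browser look like a new visitor. The Python sketch below uses only the standard library and shows two of them. The first is an opener that starts from an empty cookie jar and sends an alternate user-agent string. The second is a host-based blocklist check of the kind a proxy or browser extension could apply to requests for paywall libraries. The hostnames in `PAYWALL_SCRIPT_HOSTS` and the user-agent string are hypothetical placeholders. They are not real paywall vendors or values taken from the paper.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlsplit
from urllib.request import HTTPCookieProcessor, OpenerDirector, build_opener

# Hypothetical hostnames standing in for third-party paywall libraries.
PAYWALL_SCRIPT_HOSTS = frozenset({"paywall-vendor.example", "metering.example"})

# Hypothetical alternate user-agent string (placeholder, not from the paper).
ALT_USER_AGENT = "Mozilla/5.0 (compatible; ExampleReader/1.0)"


def fresh_session(user_agent: str = ALT_USER_AGENT) -> tuple[OpenerDirector, CookieJar]:
    """Return an opener whose cookie jar starts empty and whose User-Agent is replaced.

    A metered paywall that counts articles in a cookie sees this opener as a
    first-time visitor, which is what "emptying the cookie jar" achieves.
    """
    jar = CookieJar()  # no cookies carried over from earlier visits
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]  # replaces urllib's default UA
    return opener, jar


def should_block(url: str, blocklist: frozenset[str] = PAYWALL_SCRIPT_HOSTS) -> bool:
    """True if the URL's host is a blocklisted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


if __name__ == "__main__":
    for url in ("https://cdn.paywall-vendor.example/paywall.js",
                "https://news.example/article/123"):
        print(url, "-> blocked" if should_block(url) else "-> allowed")
```

To read each article as a first-time visitor, call `fresh_session()` once per article and fetch the page with `opener.open(url)`. The domain-suffix match in `should_block` also catches CDN subdomains such as `cdn.paywall-vendor.example`. It does not catch lookalike hosts such as `notpaywall-vendor.example`.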

Paywall detection

The authors' detection system consists of two components:

a crawling component that visits a subset of pages on a site, records information about each page’s execution, and extracts features for the classifier
- text features: whether specific keywords are present (“subscribe”, “sign up”, “remaining”, translated into 87 languages)
- structural features: whether the site has an RSS (RDF Site Summary) or Atom feed, whether the number of text nodes increases after cookies are cleared, etc.
- visual features: the number of text nodes obscured before and after clearing cookies, the total number of text nodes, and the number of nodes with z-index styles (to detect pop-ups)
a classifier that uses the extracted features to predict whether the site uses a paywall
- a random forest from the scikit-learn Python library
- the final accuracy was 77%
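The crawler's feature extraction can be approximated on static HTML. The Python sketch below is a simplified illustration, not the authors' implementation. It compares two HTML snapshots of a page, one taken with the crawler's cookies and one after clearing them, and computes five features:

- a keyword count
- whether the page has a feed
- whether the number of text nodes grows
- the text-node count
- the number of nodes with inline `z-index` styles

`PAYWALL_KEYWORDS` is a hypothetical two-language subset of the 87-language keyword list. Counting obscured text nodes is omitted because it needs a rendering browser. scikit-learn's `RandomForestClassifier` stands in for the classifier.

```python
import re
from html.parser import HTMLParser

# Hypothetical subset of the keyword list (the paper uses 87 languages).
PAYWALL_KEYWORDS = {
    "en": ["subscribe", "sign up", "remaining"],
    "es": ["suscríbete", "regístrate"],
}
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
Z_INDEX_RE = re.compile(r"z-index\s*:", re.IGNORECASE)


class PageStats(HTMLParser):
    """Collects simple counts from one HTML snapshot."""

    def __init__(self) -> None:
        super().__init__()
        self.text_nodes: list[str] = []
        self.has_feed = False
        self.z_index_nodes = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag in ("script", "style"):
            self._skip_depth += 1
        # RSS/Atom feeds are usually advertised via <link type="application/rss+xml">.
        if tag == "link" and attr.get("type", "").lower() in FEED_TYPES:
            self.has_feed = True
        # Inline style attributes only; a browser-based crawler would use computed styles.
        if Z_INDEX_RE.search(attr.get("style", "")):
            self.z_index_nodes += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count non-empty text runs outside scripts and styles as text nodes.
        if not self._skip_depth and data.strip():
            self.text_nodes.append(data.strip())


def page_stats(html: str) -> PageStats:
    parser = PageStats()
    parser.feed(html)
    parser.close()
    return parser


def extract_features(html_with_cookies: str, html_clean: str) -> list[float]:
    """Feature vector from a snapshot taken with cookies and one taken after clearing them."""
    before = page_stats(html_with_cookies)  # paywall expected to be visible here
    after = page_stats(html_clean)
    text = " ".join(before.text_nodes).lower()
    keyword_hits = sum(text.count(kw) for words in PAYWALL_KEYWORDS.values() for kw in words)
    return [
        float(keyword_hits),
        float(before.has_feed or after.has_feed),
        float(len(after.text_nodes) > len(before.text_nodes)),  # more text once cookies are gone
        float(len(before.text_nodes)),
        float(before.z_index_nodes),
    ]


def train_classifier(X: list[list[float]], y: list[int]):
    """Fit a random forest on labelled sites (1 = paywall, 0 = no paywall)."""
    # Imported lazily so the feature code above runs without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)
    return clf
```

To train on a labelled set of sites, build `X` by calling `extract_features` on each site's pair of snapshots and `y` from the 0/1 labels, then call `train_classifier(X, y)`. The 77% accuracy above comes from the paper's own features and data. This sketch does not reproduce it.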

References

(Papadopoulos, Snyder, et al., 2020)
