Technology and methodology

A description of how our scoring methodology works.

Our methodology

NS8 employs many different detection methods to determine whether a user is valid. Some of these methods are algorithmic, while others are learned over time by detecting patterns in the data. We explain some of these methods below, but to protect our intellectual property and prevent reverse-engineering, we do not list every method here.

Scoring

Some methods are fairly conclusive about a user's validity. Other methods produce a likelihood of validity, which we then compile into a unified score with a value between 0 and 1000. The lower the score, the less likely the user is to be valid. For example, someone using a proxy to route traffic through a data center's IP address will receive a low score. On the other hand, a country's reputation alone does not decide the score: some countries are the source of the bulk of invalid traffic but also have real users, and legitimate users from those countries will still receive high scores.
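As an illustration only, the sketch below shows one way that conclusive and likelihood-based methods could be folded into a single 0 to 1000 score. It is not our production logic: the Signal type, the weights, and the weighted-average rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """One detection method's opinion (hypothetical structure)."""
    name: str
    likelihood_valid: float   # 0.0 = certainly invalid, 1.0 = certainly valid
    weight: float = 1.0       # assumed relative importance of this method
    conclusive: bool = False  # True if this method alone settles the question


def unified_score(signals: List[Signal]) -> int:
    """Combine signals into a score between 0 and 1000 (lower = less likely valid)."""
    # A conclusive method decides the score on its own.
    for s in signals:
        if s.conclusive:
            return round(1000 * s.likelihood_valid)

    total_weight = sum(s.weight for s in signals)
    if total_weight == 0:
        raise ValueError("no weighted signals to combine")

    # Assumed combination rule: weighted average of the likelihoods.
    average = sum(s.likelihood_valid * s.weight for s in signals) / total_weight
    return max(0, min(1000, round(1000 * average)))
```

With this rule, a session whose methods mostly agree that it is valid lands near the top of the range, while a single conclusive finding (such as a hidden session) overrides everything else.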

Bots

A large and growing percentage of web traffic is generated by bots, spiders, extensions, headless browsers, toolbars, and other non-human sources (collectively referred to as “bots”). These bots have become increasingly sophisticated in how they disguise themselves, which means that fraud defense systems must continuously evolve their methods for detecting malicious traffic.

Here are some of the indicators that we check when assessing a potential bot:

Block List

We check every IP address against our database of known infected machines. This check detects machines that have been hijacked as spambots, as well as machines that are infected with viruses and generate large amounts of automated traffic or clicks. The database is maintained in real time so that it captures emerging sources of fraud and keeps us up to date on the latest trends.

Data Center Origin

We maintain a list of data center-based IP address ranges since many bot networks will use data centers to create or proxy traffic. For example, a session from within Amazon Web Services' data center address block is unlikely to be valid, because these locations are server rooms that are usually inaccessible to human beings.
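A range check of this kind can be sketched with Python's standard ipaddress module. The ranges below are reserved documentation blocks used as placeholders, not real data center ranges, and the function name is hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for a real
# list of data center address blocks.
DATA_CENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_data_center_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return any(
        addr.version == net.version and addr in net
        for net in DATA_CENTER_RANGES
    )
```

A linear scan is fine for a short list; a production list with many ranges would more likely use a prefix tree or sorted interval lookup.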

Public Web Proxies

Public web proxies are also used to hide a user’s location by presenting an IP address that appears to come from somewhere other than the user's real location, much like the data center proxying described above. We maintain a real-time database of public web proxies so we can detect proxy-based sessions and score users accordingly.

TOR

The Onion Router (TOR) is free-to-use software that enables anonymous online communication. TOR does have legitimate uses, but because it hides the origin of the user, it is inherently suspicious and can be used to generate random sessions.

Spoofed User Agents

Bots often rotate their user agents to appear to be multiple devices in order to generate realistic-looking traffic. We have developed technology to match the user agent to the browser’s capabilities and detect sessions that have altered their user agent.

Invalid Searches

Bots sometimes create fake referrer headers to appear to come from a search engine. In many cases, these headers differ in structure from the referrers that real search engines send, which allows us to identify them as forged.
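A structural check on a referrer might look like the sketch below. The host search.example and its expected path and query parameter are invented for illustration; real search engines each have their own referrer formats, which are not reproduced here.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical table of what a genuine referrer from each engine looks like.
EXPECTED_REFERRERS = {
    "search.example": {"path": "/results", "query_param": "q"},
}


def referrer_looks_forged(referrer: str) -> bool:
    """Return True if the referrer claims a known engine but has the wrong shape."""
    parsed = urlparse(referrer)
    rule = EXPECTED_REFERRERS.get(parsed.hostname or "")
    if rule is None:
        return False  # not claiming to be a known search engine
    if parsed.scheme not in ("http", "https"):
        return True
    if parsed.path != rule["path"]:
        return True
    # parse_qs drops parameters with blank values, so an empty query
    # term is treated as missing.
    return rule["query_param"] not in parse_qs(parsed.query)
```

Referrers from hosts outside the table are left alone, so this check only ever flags traffic that claims to come from a listed search engine.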

Collusion

This method detects cases where a particular set of IP addresses appears together with a particular set of publisher sites.

Other Proprietary Methods

We have developed several other methods for detecting fraudulent sessions, and this continues to be a primary focus of our research efforts.

Hidden Users

Hidden users originate from sessions where no page is ever visible on the screen. Whether the user is a bot or a human who never looks at the display, a hidden session receives a score of 0 because no page content was ever viewed by a real person.

Here are some of the primary reasons that a user may be categorized as a hidden session:

Preloading

Search engines will preload pages in the background while a user types in a search query. The search engine attempts to predict which link or links the user will click and then loads the pages from those links. This is a way to improve the performance of web browsing; however, many of the preloaded pages are never made visible and should not be counted as evidence of real site activity.

Browser Window Hidden

This occurs when a browser window is behind another window, so web content can’t be seen by the human user.

Background Browser Tabs

A browser tab can be launched in the background and load pages. These pages are never visible unless the user opens the tab.

Bots

The session is detected as a bot and not a real person. This is usually the default reason unless the user falls under one of the categories above.

Our technology tracks whether a session is ever viewed and updates its visibility status accordingly. For example, if a page is hidden during a preload, the session is initially recorded as hidden and given a score of zero. If the user then clicks the link to view the preloaded page, we detect the change and update the session with a new score.
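The update described here can be pictured as a small state change. This is a minimal sketch under stated assumptions: the Session fields, the record_visibility function, and the rescore callback are hypothetical names, and the callback stands in for whatever scoring runs once a page has been seen.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    """Hypothetical session record; a session starts hidden with a score of 0."""
    ever_visible: bool = False
    score: int = 0


def record_visibility(session: Session, visible: bool,
                      rescore: Callable[[], int]) -> Session:
    """Apply one visibility observation to the session."""
    # Only the first hidden-to-visible transition matters: once a page has
    # been viewed, later hidden events do not reset the session to zero.
    if visible and not session.ever_visible:
        session.ever_visible = True
        session.score = rescore()
    return session
```

For example, a preloaded page produces a session with a score of 0, and a later visible observation replaces that score with the result of the rescoring step.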