Tracking the Trackers: To Catch a History Thief

Original at the Stanford Center for Internet and Society.

Last week we reported some early results from the Stanford Security Lab’s new web measurement platform on how advertising networks respond to opt outs and Do Not Track. This week we’re back with a new discovery in the online advertising ecosystem: Epic Marketplace,[1] a member of the self-regulatory Network Advertising Initiative (NAI), is history stealing.

Many thanks once again to research assistants Akshay Jagadeesh and Jovanni Hernandez.

Background

A link can be styled differently based on whether you’ve been to the page it points to. You may recall, for example, that in the early days of the web, links you hadn’t visited were blue and links you had visited were purple. History stealing is a practice that exploits link styling to learn a user’s web browsing history. The approach is simple: to test whether the user has visited a link, add it to a page and check how it’s styled.[2]
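The inference can be illustrated with a toy model. The sketch below is not browser code: `ToyBrowser`, its colors, and the example URLs are all hypothetical, and a real attack runs as JavaScript inside the page. It only shows why exposing a link's style is enough to reveal a history entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical default link colors, echoing the early-web convention.
UNVISITED_COLOR = "blue"
VISITED_COLOR = "purple"


@dataclass
class ToyBrowser:
    """Toy stand-in for a browser that styles links by visited state."""

    history: set[str] = field(default_factory=set)

    def computed_link_color(self, url: str) -> str:
        # The page never reads `history` directly; it only sees the style.
        return VISITED_COLOR if url in self.history else UNVISITED_COLOR


def was_visited(browser: ToyBrowser, url: str) -> bool:
    """Infer a history entry purely from how the link would be styled."""
    return browser.computed_link_color(url) == VISITED_COLOR


# Hypothetical example: one probe URL is in the history, one is not.
browser = ToyBrowser(history={"https://example.com/a"})
probes = ["https://example.com/a", "https://example.com/b"]
leaked = [url for url in probes if was_visited(browser, url)]
```

In this model `leaked` ends up holding only the URL that was in the toy history, even though the probing code never touched the history itself.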

Members of the computer security community have long considered history stealing a serious privacy vulnerability. The risk goes beyond leaking individual tidbits about past browsing; history stealing can be used to track or even identify a user. Mozilla finally implemented a fix in Firefox 4, and the other major browser vendors quickly followed. Even so, according to browser usage statistics, roughly half of users remain vulnerable to history stealing.

About a year ago researchers at UCSD conducted the first comprehensive study of history stealing in practice. They found that a few popular adult sites were history stealing to learn whether users had visited their competitors. The UCSD team also discovered history stealing by several advertising networks, including Interclick (another NAI member). Class action litigation is ongoing.

Technical Findings – History Stealing

While testing the JavaScript instrumentation in our new web measurement platform, we stumbled across Epic Marketplace history stealing on Flixster and Charter.net. We reverse engineered the Epic Marketplace history stealing script and found a number of features:

  • The script is fast. Thousands of links are tested per second.
  • Links are added in an invisible iframe; there is no apparent effect on the page layout.
  • The script dynamically loads lists of URLs and associated interest segments using JSONP.
  • Progress is stored in a cookie so the script can resume where it left off.
  • The script sets a cookie indicating when it was last run; it will not history steal more than once every twenty-four hours.
  • If history stealing is still in progress when the window is closed (e.g. the user navigates to another page) the script sends its findings before ending execution.
  • The script slows down if a URL list takes over two seconds to process.
  • To prevent multiple history stealing attempts in parallel, the script uses a mutex cookie.
  • The script does not directly report the URLs that it detects the user has visited; it sends a deduplicated list of the interest segments associated with the visited URLs.
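Two of the behaviors listed above can be sketched in a few lines. The code below is a Python illustration, not the Epic Marketplace script (which is JavaScript): the function names, the example URLs, and the `is_visited` callback are hypothetical stand-ins, and segment 9999 is invented. It models the once-per-twenty-four-hours check and the reporting of deduplicated interest segments rather than URLs.

```python
from __future__ import annotations

from typing import Callable, Iterable

RUN_INTERVAL_SECONDS = 24 * 60 * 60  # at most one run every twenty-four hours


def may_run(last_run: float | None, now: float) -> bool:
    """Mimic the last-run cookie check: allow a run if none is recorded
    or the recorded run is at least 24 hours old."""
    return last_run is None or now - last_run >= RUN_INTERVAL_SECONDS


def segments_to_report(
    url_list: Iterable[tuple[str, int]],
    is_visited: Callable[[str], bool],
) -> list[int]:
    """Return the deduplicated segment IDs for visited URLs, sorted.

    Only segment IDs are returned; the visited URLs themselves are dropped.
    """
    return sorted({segment for url, segment in url_list if is_visited(url)})


# Hypothetical URL list: the URLs are invented for illustration.
url_list = [
    ("https://example.org/credit-repair", 2014),
    ("https://example.org/fix-your-credit", 2014),
    ("https://example.org/fertility", 760),
    ("https://example.org/unrelated", 9999),
]
visited = {
    "https://example.org/credit-repair",
    "https://example.org/fix-your-credit",
    "https://example.org/fertility",
}
report = segments_to_report(url_list, visited.__contains__)
```

Here `report` is `[760, 2014]`: two visited URLs map to the same segment, so it appears once, and no URL leaves the function. That is still enough to reveal a sensitive interest.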

(For the technically inclined reader, here are an example iframe, script, and URL list.)

We also examined a series of URL lists (spreadsheet) that contain 15,511 entries. The URLs and interest segments vary widely. Some URLs point to a landing page; others point to a specific page. Some interest segments are broad; others are fine-grained.

Several interest segments are highly sensitive:

  • Segment 760: pages about getting pregnant and fertility, including at the Mayo Clinic
  • Segment 2640: pages about menopause, including at the NIH and the University of Maryland
  • Segment 2014: pages about repairing bad credit, including at the FTC
  • Segment 2265: pages about debt relief, including at the FTC and the IRS

Technical Findings – Opt Out

We applied the methodology from last week’s study to examine Epic Marketplace’s opt-out practices. (Epic Marketplace was one of the eleven NAI members not included in that study.) We found that Epic Marketplace leaves its tracking cookies in place both after a user opts out with the NAI mechanism and after a user enables Do Not Track. We also found that history stealing continues under either choice mechanism.

Privacy Representations

The 2008 NAI Code of Conduct requires member companies to receive express consent from a user before collecting “Sensitive Consumer Information,” defined as:

  • Social Security Numbers or other Government-issued identifiers
  • Insurance plan numbers
  • Financial account numbers
  • Information that describes the precise real-time geographic location of an individual derived through location-based services such as through GPS-enabled devices
  • Precise information about past, present, or potential future health or medical conditions or treatments, including genetic, genomic, and family medical history

(The Code of Conduct includes the unhelpful footnote, “[t]his provision is to be further developed in a distinct implementation guideline.”)

The Epic Marketplace privacy policy contains the following paragraph under the headings “Information We Collect” and “Non-Personally Identifiable Information”:

“Epic Marketplace also automatically receives and records anonymous information that your browser sends whenever you visit a website which is part of the Epic Marketplace Network. We use log files to collect Internet protocol (IP) addresses, browser type, Internet service provider (ISP), referring/exit pages, platform type, date/time stamp, one or more cookies that may uniquely identify your browser, and responses by a web surfer to an advertisement delivered by us. This information may be stored on our systems for about one year.”

The privacy policy also claims that:

“Web surfers may elect not to provide non-personally identifiable information by following the cookie opt-out procedures set forth below.”

As with our prior work, we leave it to the reader to assess whether Epic Marketplace is complying with its privacy representations.

Thanks to Gordon Franken for reviewing this post.

1. Epic Marketplace was, until recently, named Traffic Marketplace. It hosts its third-party content on trafficmp.com.

2. Other forms of history stealing, beyond the scope of this post, rely on page layout, background images, and user interaction.