The Web Is Flat

Consider this a bug report for the National Security Agency and its overseers. Dragnet online surveillance may be directed at international activity. But it nonetheless ensnares ordinary Americans as they browse domestic websites.

The spy outfit admits to vacuuming up vast quantities of network traffic as it passes through the United States. Some taps are on the nation’s borders; others are on the domestic Internet backbone. International partner agencies, most prominently the UK’s Government Communications Headquarters, contribute to the NSA’s reach. Recent leaks have provided substantial detail: Under the Marina program, the agency appears to retain web browsing activity for a year.[1] The XKeyscore system offers at least one way for analysts at the NSA and cooperating services to efficiently query both historical and real-time data.

Agency apologists are quick to point out that the snooping has limits. The NSA only acquires online communications when a sender or recipient seems international. Doing otherwise might, in their view, violate congressional restrictions or constitutional protections.

Tough luck for foreigners. But if you’re within the United States, the notion goes, you don’t have much cause for concern.

That’s wrong. Americans routinely send personal data outside the country. They just might not know it.

Here’s an example: From approximately mid-August through mid-October, the House of Representatives website was not entirely “Made in the USA.” What you read was shared with a business in London.[2]

When you loaded a House webpage, your browser began by chatting up Akamai, a prolific and speedy web hosting service. Scant surprise there.[3]

As the page progressed, your browser was instructed to load some code provided by a company named Texthelp. The House’s aim was praiseworthy; Texthelp software assists individuals who have difficulty reading.

[Figure: The House website in early October. A green arrow indicates the Texthelp feature.]

Texthelp is, however, incorporated in and operated from the United Kingdom. When your browser schmoozed with the Brits, it passed along a “referrer,” a technical tipoff about the page that you were reading.[4]

GET /Detect.ashx HTTP/1.1
Host: babm.texthelp.com
. . .
Origin: http://house.gov
. . .
Referer: http://house.gov/legislative/date/2013-10-4
. . .

So there’s the general problem. A person within the United States may be reading a webpage that looks, and is, as American as apple pie. But that webpage can pull in content from dozens of unexpected sources: advertising companies, analytics services, and social networks, among others. If just one of those third parties is international, your browsing activity could be swept into the NSA’s dragnet.[5][6]
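
The request shown above can be read mechanically. As a rough sketch, the following JavaScript parses a raw request like the one Texthelp received and reports what the receiving host learns about the page being read. The helper names parseRequestHeaders and describeLeak are hypothetical, and the third-party test is deliberately naive: it compares exact hostnames and ignores subdomains, ports, and public-suffix rules.

```javascript
// Illustrative sketch only: these helpers are hypothetical and are not
// part of any tool used in this post.

// Parse the header lines of a raw HTTP request into a lowercase-keyed object.
function parseRequestHeaders(raw) {
  const headers = {};
  // Skip the request line ("GET /Detect.ashx HTTP/1.1").
  for (const line of raw.split(/\r?\n/).slice(1)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip elided or malformed lines
    const name = line.slice(0, idx).trim().toLowerCase();
    headers[name] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Report which host received the request and which page URL it disclosed.
function describeLeak(raw) {
  const headers = parseRequestHeaders(raw);
  if (!headers.referer) return null; // nothing disclosed via Referer
  const page = new URL(headers.referer);
  const receiver = headers.host;
  return {
    receiver,
    pageHost: page.hostname,
    pageUrl: page.href,
    // Naive assumption: any differing hostname counts as a third party.
    thirdParty: receiver !== page.hostname,
  };
}

const rawRequest = [
  "GET /Detect.ashx HTTP/1.1",
  "Host: babm.texthelp.com",
  "Origin: http://house.gov",
  "Referer: http://house.gov/legislative/date/2013-10-4",
].join("\r\n");

const leak = describeLeak(rawRequest);
console.log(leak);
```

The point of the sketch is that the receiving server needs no cooperation from the House website to learn the page URL; the browser volunteers it in the Referer header.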

I conducted a small experiment in late September to gauge the magnitude of the international referrer issue. Using FourthParty, a Web measurement platform we’ve built in the Stanford Security Lab, I tested 2,500 popular websites.[7] The results were concerning, albeit unsurprising: international referrers are pervasive. I spotted the phenomenon on pages across many categories of popular websites, including political commentary (examples: National Review and Talking Points Memo), popular culture (Buzzfeed and Parade), sports (Major League Baseball and the PGA Tour), travel (Lonely Planet), consumer products (Nike), retail (Overstock), and personal health (Medicare.gov). Yes, even the apple pie recipe on CHOW has a Canadian component. So much for a bright line dividing the domestic and international Web.

This technical result raises serious legal questions. Has Congress authorized wholesale surveillance of apparently domestic online activity? Does the Fourth Amendment tolerate rampant prying into our homeland web browsing?

There is a strong argument that the answer to both questions is no.[8] The NSA’s purported statutory authority for Internet surveillance within the U.S. expressly prohibits snooping on domestic communications where “all intended recipients” are also domestic. And both the courts and Congress have rightly recognized that an intercept of Internet content entirely within the United States, much like a telephone wiretap, requires a warrant and probable cause. Even the questionably effective Foreign Intelligence Surveillance Court has required the NSA to ditch purely domestic communications.

For whatever FISC oversight is worth, perhaps the NSA has provided a (secret) briefing on international referrers. Perhaps a judge has (secretly) approved. There is, however, cause for doubt.

In 2011, the NSA alerted its judicial overseers to a different technical glitch. When email and other messages transit the Internet, they can get bundled together. If the NSA intercepts an international note, it might also snag purely domestic messages in the same bunch. A FISC judge lambasted the NSA for yet another “substantial misrepresentation” about its mass surveillance and held the program unconstitutional. So, this would hardly be the NSA’s first omission gaffe.

Even if the international referrer issue has been rigorously reviewed, there are myriad other ways that Americans might unknowingly send data overseas. Domestic organizations often place servers outside the United States. A tiny part of the Chevrolet website, for example, resides in Frankfurt, Germany.[10] What’s more, even if a user and server are both stateside, the path connecting them might wander into Canada or Mexico. A domestic cloud business might shuffle data to or from a data center overseas. And many of the online services that Americans use are not obviously international. The popular scheduling website Doodle? Swiss. The music streaming service Spotify? Swedish. The dating website Plenty of Fish? Canadian. The popular link shortener is.gd, used by @Bruce_Schneier [11] for blog updates lambasting the NSA? British.

It is difficult to believe that the NSA’s independent supervisors have the technical savvy to consistently identify, assess, and remediate these sorts of problems. The FISC has itself vented frustration about having to accept the agency’s technical claims at face value.

Even if intelligence oversight were adequate, problems like these would nevertheless recur. The Internet is not balkanized along geopolitical boundaries. Communications are not neatly labeled by nationality and locale. Online systems routinely repackage and reroute activity in convoluted ways. Attempts at singling out Americans will necessarily rely on patchy guesswork. And they will necessarily get it wrong, a lot.


Many thanks to the friends and colleagues at Stanford, Princeton, and elsewhere who provided feedback on this work. All views and errors are solely my own.

This research was the basis for a submission to the Review Group on Intelligence and Communications Technologies within the Office of the Director of National Intelligence.

1. Leaked slides on XKeyscore suggest NSA mass metadata collection includes HTTP headers, and according to the Guardian, a guide on Marina indicates that the program “tracks a user’s browser experience.”

2. I used the Internet Archive’s Wayback Machine to estimate the period during which the House website hosted a Texthelp script. At the time of writing, the script remains on House webpages, but is commented out.

3. Oddly, www.house.gov is hosted by Akamai, while house.gov appears to be hosted from a federal datacenter near Philadelphia.

4. Specifically, the House website included the following in its standard template:

<script type="text/javascript">
  var _baLocale =  "uk";
  var _baUseCookies = true;
  var _baHiddenMode =  false;
  var _baHideOnLoad = false;
  var _baMode = "/content/static/img/bgNavTools.gif";
</script> 
<script type="text/javascript" src ="http://house.gov/content/static/js/ba.js"></script>

The Texthelp BrowseAloud script, in turn, triggered an HTTP request to babm.texthelp.com.

var $bajq, browsealoud = {
  BASE_ADDRESS: "babm.texthelp.com",
    . . .
    init: function () {
      . . .
      var d;
      try {
        if (window.XDomainRequest) {
          d = new XDomainRequest()
        } else {
          d = new XMLHttpRequest()
        }
        d.open("GET", (this.isSecure ? "https" : "http") + "://" + this.BASE_ADDRESS + "/Detect.ashx", false);
        d.send(null)
      } catch (s) {
        this.debug("ERROR: init: " + s)
      }
      if (d.responseText !== "") {
        this.localeId = d.responseText
      }
      . . .
    },
    . . .
};
document.write(browsealoud.init());

5. A corollary concern, which I do not address here, is that an Internet user outside the United States may access a website that is also outside the United States, but includes American third-party content. Many of the largest third parties are based in the United States, so this phenomenon is quite likely pervasive. Researchers at the University of Toronto have documented the related concern of network paths that happen to pass through the United States.

6. I cannot, of course, say how often NSA analysts inspect international referrers from domestic websites—leaks have not (yet?) provided such granular detail. As privacy scholars have pointed out time and again, though, concerns arise from the moment of data collection—not just when data is used. One particular source of concern is insider misbehavior, which the NSA has hardly proven immune to.

7. The crawler visited the Quantcast U.S. top 2,500 websites, following five links from each landing page. It spent fifteen seconds on each page so that dynamic content could load. After the crawl finished, I searched HTTP Request-URI and Referer headers for leakage of the URLs of pages visited during the crawl. Next, I used the MaxMind GeoLite Country database to spot receiving servers possibly located outside the United States. I finally confirmed servers were international by running the traceroute utility and manually inspecting its output. All Internet access was through the Stanford University network.
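
The header search and geolocation steps described in this footnote can be sketched roughly as follows. This is not FourthParty’s code. The log format ({ url, referer }), the function findInternationalLeaks, and the countryByHost table are all assumptions made for illustration; the table stands in for a GeoIP country lookup such as the MaxMind database, its entries are illustrative rather than measured, and the sketch omits the traceroute confirmation step.

```javascript
// Illustrative sketch; not the FourthParty implementation.
// Assumed input: request log entries of the form { url, referer }.

// Hypothetical stand-in for a GeoIP country lookup (illustrative values).
const countryByHost = new Map([
  ["babm.texthelp.com", "GB"],
  ["cdn.example.com", "US"],
]);

// Flag requests that disclose a visited page URL to a host that the
// lookup table places outside the home country.
function findInternationalLeaks(visitedPages, requests, homeCountry = "US") {
  const found = [];
  for (const req of requests) {
    const receiver = new URL(req.url).hostname;
    const country = countryByHost.get(receiver);
    // Skip hosts with no known location and hosts in the home country.
    if (country === undefined || country === homeCountry) continue;
    for (const page of visitedPages) {
      if (new URL(page).hostname === receiver) continue; // first party
      const inReferer = (req.referer || "").includes(page);
      // A page URL may also appear in the Request-URI, raw or percent-encoded.
      const inUri =
        req.url.includes(page) || req.url.includes(encodeURIComponent(page));
      if (inReferer || inUri) {
        found.push({
          page,
          receiver,
          country,
          via: inReferer ? "Referer" : "Request-URI",
        });
      }
    }
  }
  return found;
}

const visitedPages = ["http://house.gov/legislative/date/2013-10-4"];
const requests = [
  { url: "http://babm.texthelp.com/Detect.ashx", referer: visitedPages[0] },
  { url: "http://cdn.example.com/lib.js", referer: visitedPages[0] },
  { url: "http://house.gov/content/static/js/ba.js", referer: visitedPages[0] },
];

const leaks = findInternationalLeaks(visitedPages, requests);
console.log(leaks);
```

A country database of this kind only suggests where a server might be, which is why the crawl results were confirmed by hand with traceroute.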

8. Since the focus of this already-lengthy post is technology and policy concerns, I do not provide a detailed treatment of legal considerations. Related issues include the location of acquisition, the scope of Executive Order 12333 and Article II authority, the extent of Fourth Amendment protection in HTTP headers, whether statutory and constitutional protections encompass international referrers as domestic, quasi-domestic, or one-end-foreign communications, whether acquisition of international referrers is permissible as incidental to lawful acquisition, and whether statutory and constitutional protections are triggered at the time of acquisition or only when used in intelligence analysis.

9. To the extent Internet traffic acquisition occurs outside the U.S., the administration appears not even to brief the FISC, nor to provide unprompted briefings to Congress.

10. The chevrolet.com server appears to be located in Frankfurt. If you load chevrolet.com in your browser, you will then be redirected to www.chevrolet.com, which is hosted domestically by Akamai.

11. I have no idea if this account is actually operated by Bruce Schneier, though that’s beside the point. Many users clicking on links critical of the NSA may be, ironically, tipping off the NSA.