Co-authored with Patrick Mutchler.
MetaPhone is a crowdsourced study of phone metadata. If you own an Android smartphone, please consider participating. In an earlier post, we reported how automated analysis of call and text activity can detect private relationships.
Does the National Security Agency have court authority to pore over your phone records? Quite possibly.
According to declassified documents, the NSA operates under a rote legal procedure for querying domestic phone metadata. The agency begins by identifying a “seed” number, with reasonable and articulable suspicion of terrorist activity. [1] Next, the NSA has discretion to follow up to three degrees of calling separation (“hops”). [2] The NSA is authorized to retrieve a complete set of phone records at each hop, and just one call in the past five years appears sufficient to make a hop.
Many observers have been deeply critical of this reach. By some estimates, a single seed number could establish authority to query phone records for thousands of Americans. Other estimates have counted into the millions.
A common approach for calculating these figures has been to simply assume an average number of call relationships per phone line (“degree”), then multiply out the number of hops. If a single phone number has average degree $d$, and the NSA can make $h$ hops, then a single query gives expected access to about $d^h$ complete sets of phone records. [3, 4]
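As a rough sketch of that back-of-the-envelope method, the Python below multiplies out the hops. The function names are illustrative, and the model assumes every number has exactly the same count of distinct contacts with no overlap between them, which real call graphs do not satisfy.

```python
def naive_reach(degree: int, hops: int) -> int:
    """Numbers exactly `hops` away, assuming every number has `degree`
    distinct contacts and no two paths lead to the same number."""
    return degree ** hops


def naive_reach_cumulative(degree: int, hops: int) -> int:
    """All numbers within `hops` of the seed, seed included:
    1 + d + d^2 + ... + d^h."""
    return sum(degree ** k for k in range(hops + 1))


if __name__ == "__main__":
    # Hypothetical average degrees, chosen only for illustration.
    for d in (10, 40, 100):
        print(d, naive_reach(d, 3), naive_reach_cumulative(d, 3))
```

With 40 contacts per number, this model gives 64,000 numbers at exactly three hops, and 2,560,000 once a fourth step is counted.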
We turned to our crowdsourced MetaPhone dataset for an empirical measurement. Given our small, scattershot, and time-limited sample of phone activity, we expected our graph to be largely disconnected. After all, just one pair from our hundreds of participants had held a call.
Surprisingly, our call graph was largely connected. Over 90% of participants were related in a single graph component. And within that component, participants were closely linked: on average, over 10% of participants were just 2 hops away, and over 65% of participants were 4 or fewer hops away!
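Measurements of this kind can be sketched with a breadth-first search over an undirected call graph. The code below is a minimal illustration, not the MetaPhone analysis code; the function names and the toy call list are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(calls):
    """Build an undirected adjacency map from (number_a, number_b) pairs."""
    graph = defaultdict(set)
    for a, b in calls:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_distances(graph, start):
    """Breadth-first search: hops from `start` to every reachable number."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def share_within(graph, participants, max_hops):
    """Average, over participants, of the share of the other
    participants that lie within `max_hops`."""
    shares = []
    for p in participants:
        dist = hop_distances(graph, p)
        near = sum(1 for q in participants
                   if q != p and dist.get(q, max_hops + 1) <= max_hops)
        shares.append(near / (len(participants) - 1))
    return sum(shares) / len(shares)


def largest_component_share(graph, participants):
    """Share of participants that fall in the largest connected component."""
    best, seen = 0, set()
    for p in participants:
        if p in seen:
            continue
        component = set(hop_distances(graph, p))
        seen |= component
        best = max(best, sum(1 for q in participants if q in component))
    return best / len(participants)


if __name__ == "__main__":
    # Toy data: no participant has called another participant directly.
    calls = [("A", "voicemail"), ("B", "voicemail"),
             ("B", "spam"), ("C", "spam"), ("D", "friend")]
    people = ["A", "B", "C", "D"]
    g = build_graph(calls)
    print(largest_component_share(g, people))
    print(share_within(g, people, 2), share_within(g, people, 4))
```

In the toy data, three of four participants end up in one component even though none of them has called another, because they share a voicemail number and a spam caller.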
The reason, we found, is that the call graph does not solely resemble a diffuse social network. It also includes a hub-and-spoke structure, where many individuals are linked through well-known numbers.
The following figures illustrate this phenomenon with the MetaPhone call graph. Blue nodes reflect participants, red nodes indicate non-participant numbers, and gray edges represent call activity.
First, consider the graph of just the participants. Again, only one pair has held a call.
Now let’s add the most common non-participant number, which is for T-Mobile’s voicemail system.
Already 17.5% of participants are linked. That makes intuitive sense—many Americans use T-Mobile for mobile phone service, and many call into voicemail. Now think through the magnitude of the privacy impact: T-Mobile has over 45 million subscribers in the United States. That’s potentially tens of millions of Americans connected by just two phone hops, solely because of how their carrier happens to configure voicemail.
Let’s add additional frequent numbers.
Ever received a call from a Skype user? [5] Authenticated your Google account by phone? You’re just two hops from everyone else in the same boat.
Note that phone spam is now in the mix. That’s especially troublesome for call graph connectivity, since the very purpose of phone spam is to call as many numbers as possible. You’re just two hops from everyone else who’s been harassed about “cardmember services” or “your auto warranty.” (Hey, maybe the NSA should have competed in the FTC’s robocall challenge!)
You get the idea. Finally, here’s the entire call graph, omitting numbers called by just one participant.
So much for the notion that our crowdsourced call graph would be disconnected.
In thinking through the scope of NSA phone metadata authority, then, a simple $d^h$ estimate does not tell the whole story. Calculations also have to account for the presence of high-degree hubs, which roughly map onto a power law.
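One way to see why an average-degree estimate falls short: in a graph with hubs, the number at the far end of a call is disproportionately likely to be a hub. The sketch below compares the plain average degree with the expected degree of a neighbor. It assumes contacts are wired at random for a given list of degrees (a standard random-graph simplification, not a property we measured), and the degree list is hypothetical rather than MetaPhone data.

```python
def mean_degree(degrees):
    """Plain average number of contacts per phone number."""
    return sum(degrees) / len(degrees)


def mean_neighbor_degree(degrees):
    """Expected degree of the number at the end of a randomly chosen
    call edge, under random wiring: sum(d^2) / sum(d).
    High-degree numbers are over-represented because they sit on
    more edges."""
    return sum(d * d for d in degrees) / sum(degrees)


if __name__ == "__main__":
    # Hypothetical population: 99 ordinary numbers with 10 contacts each,
    # plus a single hub (say, a voicemail line) with 1,000 contacts.
    degrees = [10] * 99 + [1000]
    print(mean_degree(degrees))           # average contacts per number
    print(mean_neighbor_degree(degrees))  # what a hop tends to land on
```

In this toy population the average degree is about 20, yet a typical hop lands on a number with roughly 500 contacts, so multiplying out the average badly understates reach.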
What’s more, connections through high-degree hubs may be attenuated by a third hop. The discussion above focused on two-hop connections, where a hub merely links two numbers.
But a hub could also be one step removed from the seed.
Or one step removed from the user.
There is also the possibility of two hubs on a three-hop path from a seed to a user. Suppose, for example, that a suspicious number is phoned by a Skype user; a different Skype user has called FedEx; and you have phoned FedEx. You’re fair game.
The presence of hubs radically alters the connectivity of the phone graph. We resampled our data to estimate the reach of three hops, and consistently found that a majority of participant numbers would be included. [6]
So, what does all this mean for NSA watchers? The sample of MetaPhone participants is not representative of American phone use, and we do not know the properties of NSA seed numbers. We cannot, therefore, place statistically rigorous confidence bounds on our results. But our measurements are highly suggestive that many previous estimates of the NSA’s three-hop authority were conservative. Under current FISA Court orders, the NSA may be able to analyze the phone records of a sizable proportion of the United States population with just one seed number.
And by the way, there are tens of thousands of qualified seed numbers.
Many thanks to our advisors and colleagues for their invaluable input on this project. All views are solely our own.
1. For background, the reasonable and articulable suspicion (RAS) standard is quite easy to satisfy; it’s the same basis for New York City’s controversial stop-and-frisk program. Also, between 2006 and 2009 the NSA failed to even meet the RAS standard for most of the seed numbers that it used.
2. We are not claiming that the NSA exercises this authority. We have no way of knowing, of course. FISA Court opinions indicate NSA technical staff may identify and ignore (“defeat”) high-degree numbers. But the opinions do not suggest the agency must do so. In fact, the FISC appears to view these efforts as an additional intrusion, since they involve analyzing the entire call database.
3. The NSA can also observe calls with a final hop number, such that $d^{h+1}$ numbers might be affected (but not have full call records revealed) in a single query. This extended analysis appears to be the basis of a widely cited figure that, if an average person has 40 contacts, then three hops are sufficient to reach 2.5 million numbers, since $40^4 \approx 2.56$ million.
4. More precisely, if node degree is fixed at $d$, then the number of nodes within $h$ hops is $d^h + d^{h-1} + \dots + 1 = (d^{h+1} - 1) / (d - 1)$.
5. It appears that outbound Skype calls are, by default, placed from a small set of shared numbers.
6. Our call graph does not accurately reflect three-hop paths, since we do not have call records for the many non-participant numbers. We conducted a resampling analysis to account for this shortcoming, focusing on seed-spoke-hub-user paths. Over 10,000 iterations, we randomly selected a seed number from the participants. For high-degree neighbors, we simply followed two hops. For unknown neighbors, with probability 0.5, we randomly selected a new participant number and followed two hops over the high-degree neighbors.
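The resampling in note 6 could be sketched roughly as follows. This is one possible reading of the procedure, not the code behind our figures. Several details are assumptions made for illustration: a "hub" is any number shared by at least two participants, an unknown neighbor is modeled by substituting a random participant's records, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def estimate_three_hop_reach(calls, iterations=10_000, p_substitute=0.5,
                             hub_min_callers=2, rng=None):
    """calls: dict mapping each participant to the set of numbers they
    exchanged calls with. Returns the mean share of other participants
    reached from a randomly chosen seed participant."""
    if len(calls) < 2:
        raise ValueError("need at least two participants")
    rng = rng or random.Random()
    participants = list(calls)

    # Invert the records: which participants touched each number?
    callers = defaultdict(set)
    for person, numbers in calls.items():
        for number in numbers:
            callers[number].add(person)
    # Assumption: a hub is any number shared by enough participants.
    hubs = {n for n, who in callers.items() if len(who) >= hub_min_callers}

    def via_hubs(person):
        """Participants who share at least one hub with `person`."""
        found = set()
        for number in calls[person] & hubs:
            found |= callers[number]
        return found

    total = 0.0
    for _ in range(iterations):
        seed = rng.choice(participants)
        reached = via_hubs(seed)  # seed -> hub -> user
        for _spoke in calls[seed] - hubs:
            # We lack records for the spoke, so with some probability
            # borrow a random participant's hub connections instead.
            if rng.random() < p_substitute:
                stand_in = rng.choice(participants)
                reached |= via_hubs(stand_in)  # seed -> spoke -> hub -> user
        reached.discard(seed)
        total += len(reached) / (len(participants) - 1)
    return total / iterations


if __name__ == "__main__":
    # Toy records, invented for illustration.
    toy = {"A": {"voicemail", "x1"}, "B": {"voicemail"},
           "C": {"spam", "x2"}, "D": {"spam"}}
    print(estimate_three_hop_reach(toy, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run repeatable. The estimate is only as good as the substitution assumption, which is why we treat the result as suggestive rather than rigorous.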