Advancing Empirical Legal Scholarship: Federal Appellate Opinions and Rules

Last December I shared XML versions of the U.S. Code and Supreme Court opinions through early 2012. My intent was and remains to facilitate empirical legal scholarship by providing government-authored materials in a machine-readable format.

This post is accompanied by additional documents: opinions and rules of various federal appellate tribunals. As before, I welcome feedback from the academic research community.
Continue reading

The New Firefox Cookie Policy

The default Firefox cookie policy will, beginning with release 22, more closely reflect user privacy preferences. This mini-FAQ addresses some of the questions that I’ve received from Mozillans, web developers, and users.

How does the new Firefox cookie policy work?

Roughly: Only websites that you actually visit can use cookies to track you across the web.

More precisely: If content has a first-party origin,1 nothing changes. Content from a third-party origin only has cookie permissions if its origin already has at least one cookie set.

How does Firefox’s new policy compare to the other major browsers?

Chrome – Allows all cookies.

Internet Explorer – Cookie permissions vary by P3P compact policy. In practice, almost all third-party tracking cookies are allowed.2

Safari – First-party content has cookie permissions. Third-party content only has cookie permissions if the content already has at least one cookie set.

In short, the new Firefox policy is a slightly relaxed version of the Safari policy.3

Will the new Firefox policy break websites?

Collateral impact should be limited. Safari’s cookie policy has been in place for over a decade, and it is included in both the desktop and iOS versions of the browser. A few websites may require a tiny code change to accommodate Firefox in the same way as Safari.

Just to be sure, the Mozilla privacy team is closely monitoring the policy before final release. The patch will spend about 6 weeks each in the pre-alpha, alpha, and beta builds. If you spot any oddities, please report them to Mozilla support!

How can I test whether my website has cookie permissions?

Easy: try to set a cookie. This approach can introduce cookie permissions into both server-side and client-side code.

Browser sniffing is generally disfavored since it can be unreliable and requires updating. Moreover, sniffing will not accommodate Chrome and Internet Explorer users who have switched from the default cookie policy.

I operate a third-party website that uses cookies. What should I do?

If a Firefox user appears to have intentionally interacted with your content, take the same approach as for Safari users.4 Examples of content within this category include Facebook apps and comment widgets where a user has typed text.

If a user does not seem to have intentionally interacted with your content, or if you’re uncertain, you should ask for permission before setting cookies. Most analytics services, advertising networks, and unclicked social widgets would come within this category.

In sum, working around the policy’s technical limits may be reasonable in certain cases, but undermining the policy’s privacy purpose is never acceptable.

What happens to preexisting cookies?

The new policy does not make any special provision for preexisting cookies. Current Firefox users should clear their cookies to fully benefit from the new policy.5

What comes next for the Firefox cookie policy?

There’s still plenty of work to do. Some possible directions that I’m interested in:

  • Extending the cookie policy to other storage technologies (e.g. HTML5 Web Storage).
  • Providing a uniform mechanism for requesting storage permissions.
  • Relaxing the cookie policy for websites that honor Do Not Track.

Please share your ideas on the mozilla.dev.privacy mailing list!


All views are solely my own. I do not speak for Mozilla.

This was my first contribution to the Firefox codebase. Huge thanks to Sid Stamm, Monica Chew, Brendan Eich, Asa Dotzler, Josh Matthews, Justin Dolske, Daniel Veditz, and many other members of the Mozilla community for their advice, guidance, and tolerance of my inexperience.

1. An origin is determined by public suffix + 1.  ↩

2. Many researchers have criticized Microsoft’s approach for being ineffective, convoluted, and relying on the de facto deprecated P3P standard. For background, see Token Attempt: The Misrepresentation of Website Privacy Policies Through the Misuse of P3P Compact Policy Tokens by Leon et al.  ↩

3. The difference is primarily owing to engineering convenience.  ↩

4. The most transparent practice is for you to redirect the user through your origin. You could also use a non-cookie storage technology, though alternatives may be limited by this policy in future.  ↩

5. Conventional wisdom in the web privacy community is that users clear their cookies every few months.  ↩

Electronic Privacy and Economic Choice

Critics of consumer privacy protections frequently invoke revealed preference as a justification for laissez-faire policy. If users really cared about their privacy, the argument goes, we should expect to see revolts against intrusive practices. A number of scholars have demonstrated pervasive information asymmetries1 and bounded rationality2 in consumer privacy choices; the decisions that users actually make about online privacy can hardly be expected to reflect their actual preferences.

But let’s suppose that consumers and online firms are fully informed and completely rational. The economic story that consumers value their privacy less than the marginal income from privacy intrusions is certainly consistent with market behavior.

We should not, however, conclude that the status quo is optimal. There is another congruent economic story, where privacy intrusions are inefficient but nevertheless result owing to transaction costs and competition barriers. This post relates the alternative economic story with two possible examples, then closes with policy implications.
Continue reading

Advancing Empirical Legal Scholarship

Modern quantitative analysis has upended the social sciences and, in recent years, made exciting inroads with law. How complex are the nation’s statutes?1 Did a shift in Supreme Court voting dodge President Roosevelt’s court-packing plan?2 How do courts apply fair use doctrine in copyright cases?3 What factors determine the outcome of intellectual property litigation?4 Researchers have begun to answer these and many more questions through the use of empirical methodologies.

Academics have vaulted numerous hurdles to advance this far, including deep institutional siloing and specialization. But barriers do still exist, and one of the greatest remaining is, quite simply, data. There is no easy-to-get, easy-to-process compilation of America’s primary legal materials. In the status quo, researchers are compelled to spend far too much of their time foraging for datasets instead of conducting valuable analysis. Consequences include diminished scholarly productivity, scant uniformity among published works, and—most frustratingly—deterrence for prospective researchers.

My hope is to facilitate empirical legal scholarship by providing machine-readable primary legal materials. In this first release of data, I have prepared XML versions of the U.S. Code and opinions of the Supreme Court of the United States, through approximately early 2012. Subsequent releases may include additional primary legal materials. I would greatly appreciate feedback from the academic community, particularly with regards to the XML schema, text formatting, and prioritizing materials for release.

United States Code: ZIP (110 MB)
Supreme Court of the United States Opinions: ZIP (348 MB)


Please note, this is a personal project. It is not related to my coursework or research at Stanford University.

1. Michael J. Bommarito II & Daniel M. Katz, A Mathematical Approach to the Study of the United States Code, 389 Physica A 4195 (2010), available at http://www.sciencedirect.com/science/article/pii/S0378437110004875.
2. Daniel E. Ho & Kevin M. Quinn, Did a Switch in Time Save Nine?, 2 J. Legal Analysis 69 (2010), available at http://jla.oxfordjournals.org/content/2/1/69.full.pdf.
3. Matthew Sag, Predicting Fair Use, 73 Ohio St. L.J. 47 (2012), available at http://moritzlaw.osu.edu/students/groups/oslj/files/2012/05/73.1.Sag_.pdf.
4. Mihai Surdeanu et al., Risk Analysis for Intellectual Property Litigation, Proc. 13th Int’l Conf. on Artificial Intelligence & L. 116 (2011), available at http://dl.acm.org/citation.cfm?id=2018375.

Presidential Identifying Information

Sunday’s New York Times included a story about how the presidential campaigns are making extensive use of third-party web trackers. In response to privacy concerns, “[o]fficials with both campaigns emphasize[d] that [tracking] data collection is ‘anonymous.’”1

The campaigns are wrong: tracking data is very often identified or identifiable. Arvind Narayanan has previously written a comprehensive and accessible explanation of why web tracking is hardly anonymous; my survey paper on web tracking provides more extensive discussion.

One of the ways in which web tracking data can become identified or identifiable is “leakage”—data flowing to trackers from the websites that users interact with. Leakage most commonly occurs when a website includes identifying information in a page URL or title. Embedded third parties receive the identifying information if they receive the URL (e.g. referrer headers) or the title (e.g. document.title). Even a little identifying information leakage thoroughly undermines the privacy properties of web tracking: once a user’s identity leaks to a tracker, all of the tracker’s past, present, and future data about the user becomes identifiable.

Web services frequently fail to account for information leakage in their design and testing; a study I conducted last year found that over half of popular websites were leaking identifying information.2 More than a few website operators have made inaccurate representations about the information they share with third parties; in just the past year the Federal Trade Commission settled deception claims against both Facebook and Myspace for falsely disclaiming identifying information leakage.

The Times coverage piqued my curiosity: Are the campaigns identifying their supporters to third-party trackers? Are they directly undermining the anonymity properties that they are so quick to invoke?

Yes, they are. I tested the two leading candidate websites using the methodology from my prior study of identifying information leakage. Both leak. The following sections describe my observations from the Barack Obama and Mitt Romney campaign websites.
Continue reading

The Trouble with ID Cookies: Why Do Not Track Must Mean Do Not Collect

Original at the Stanford Center for Internet and Society.

Co-authored by Arvind Narayanan.

The debate over the meaning of Do Not Track has raged for well over a year now. The primary forum is the W3C Tracking Protection Working Group, with frequent sparring in the press and capitals worldwide. There are, broadly, two Do Not Track proposals: one chiefly backed by the ad industry, and another advanced by privacy advocates [1]. These proposals reflect vastly different visions for Do Not Track with vastly different practical consequences. The two sides have unsurprisingly been at loggerheads, with scant movement towards resolution of the key issues.
Continue reading

Tracking Not Required: Advertising Measurement

Co-authored by Arvind Narayanan.

Measurement is central to online advertising: it facilitates billing, performance measurement, targeting decisions, spending allocation, and more. In a pair of earlier posts we explained how advertisement frequency capping and behavioral targeting are achievable without compiling a user’s browsing history. This post similarly proposes practical, privacy-improved approaches to advertising measurement.

Continue reading

Do Track: Browser-Based Do Not Track Exceptions

Users hold widely varying preferences on web tracking.1 Some don’t mind the practice. Some object to it entirely. Many trust certain organizations to follow them around the web.

Do Not Track accomodates these divergent preferences in two ways. First, browsers and other user agents include an option for universally signaling a preference against tracking (“DNT: 1”). Firefox, Internet Explorer, and Safari have all integrated this feature, and Chrome will support it by the end of the year. Second, a user can configure exceptions to the universal signal. Some websites may choose to build a proprietary “out-of-band” exception mechanism, using ordinary web technologies, that trumps the “DNT: 1” signal. The Do Not Track Cookbook includes an example of how a Facebook out-of-band exception mechanism might appear.

The W3C Do Not Track standard will provide another option: a simple JavaScript interface that allows a website to request an exception, paired with a signal that some tracking is allowed (“DNT: 0”).

Continue reading

Tracking Not Required: Behavioral Targeting

Original at 33 Bits of Entropy.

Co-authored by Arvind Narayanan and Subodh Iyengar.

In the first installment of the Tracking Not Required series, we discussed a relatively straightforward case: frequency capping. Now let’s get to the 800-pound gorilla, behaviorally targeted advertising, putatively the main driver of online tracking. We will show how to swap a little functionality for a lot of privacy.

Admittedly, implementing behavioral targeting on the client is hard and will require some technical wizardry. It doesn’t come for “free” in that it requires a trade-off in terms of various privacy and deployability desiderata. Fortunately, this has been a fertile topic of research over the past several years, and there are papers describing solutions at a variety of points on the privacy-deployability spectrum. This post will survey these papers, and propose a simplification of the Adnostic approach — along with prototype code — that offers significant privacy and is straightforward to implement.

Continue reading

Tracking Not Required: Frequency Capping

Co-authored by Arvind Narayanan.

Debates over web tracking and Do Not Track tend to be framed as a clash between consumer privacy and business need. That’s not quite right. There is, in fact, a spectrum of possible tradeoffs between business interests and consumer privacy.

Our aim with the Tracking Not Required series is to show how those tradeoffs are not at all linear; it is possible to swap a little functionality for a lot of privacy. We only use technologies that are already deployed in browsers, and the solutions we propose are externally verifiable.1

We focus on issues at the center of Do Not Track negotiations in the World Wide Web Consortium. Advertising companies have pledged to stop forms of ad targeting once a user enables Do Not Track, but many maintain that tracking is essential for a litany of “operational uses.” The Tracking Not Required series demonstrates how business functionality can be implemented without exposing users to the risks of tracking.

This first post addresses frequency capping in online advertising, the most frequently cited “operational use” necessitating tracking.

Continue reading