Advancing Empirical Legal Scholarship

Modern quantitative analysis has upended the social sciences and, in recent years, made exciting inroads with law. How complex are the nation’s statutes?1 Did a shift in Supreme Court voting dodge President Roosevelt’s court-packing plan?2 How do courts apply fair use doctrine in copyright cases?3 What factors determine the outcome of intellectual property litigation?4 Researchers have begun to answer these and many more questions through the use of empirical methodologies.

Academics have vaulted numerous hurdles to advance this far, including deep institutional siloing and specialization. But barriers do still exist, and one of the greatest remaining is, quite simply, data. There is no easy-to-get, easy-to-process compilation of America’s primary legal materials. In the status quo, researchers are compelled to spend far too much of their time foraging for datasets instead of conducting valuable analysis. Consequences include diminished scholarly productivity, scant uniformity among published works, and—most frustratingly—deterrence for prospective researchers.

My hope is to facilitate empirical legal scholarship by providing machine-readable primary legal materials. In this first release of data, I have prepared XML versions of the U.S. Code and opinions of the Supreme Court of the United States, through approximately early 2012. Subsequent releases may include additional primary legal materials. I would greatly appreciate feedback from the academic community, particularly with regards to the XML schema, text formatting, and prioritizing materials for release.

Update January 13, 2014: The data is now hosted on Amazon S3 in a requester pays bucket. If you have not properly configured your request, you will receive an “Access Denied” error.

United States Code: ZIP (110 MB)
Supreme Court of the United States Opinions: ZIP (348 MB)


Please note, this is a personal project. It is not related to my coursework or research at Stanford University.

1. Michael J. Bommarito II & Daniel M. Katz, A Mathematical Approach to the Study of the United States Code, 389 Physica A 4195 (2010), available at http://www.sciencedirect.com/science/article/pii/S0378437110004875.
2. Daniel E. Ho & Kevin M. Quinn, Did a Switch in Time Save Nine?, 2 J. Legal Analysis 69 (2010), available at http://jla.oxfordjournals.org/content/2/1/69.full.pdf.
3. Matthew Sag, Predicting Fair Use, 73 Ohio St. L.J. 47 (2012), available at http://moritzlaw.osu.edu/students/groups/oslj/files/2012/05/73.1.Sag_.pdf.
4. Mihai Surdeanu et al., Risk Analysis for Intellectual Property Litigation, Proc. 13th Int’l Conf. on Artificial Intelligence & L. 116 (2011), available at http://dl.acm.org/citation.cfm?id=2018375.