Over the past year, I have shared various federal primary legal materials formatted in XML. The project’s focus has been enabling empirical legal scholarship with machine-readable government documents.
This final post is accompanied by state materials, including statutes, court opinions, regulations, and administrative rulings. I continue to welcome feedback from fellow researchers.
In earlier posts I have shared XML versions of certain legal materials, including federal statutes, appellate opinions, and appellate rules. My aim has been to assist empirical legal scholars by providing machine-readable government documents.
Additional legal materials accompany this post, including federal trial-level opinions and rules. Suggestions from the research community remain very much welcome.
Last December I shared XML versions of the U.S. Code and Supreme Court opinions through early 2012. My intent was and remains to facilitate empirical legal scholarship by providing government-authored materials in a machine-readable format.
This post is accompanied by additional documents: opinions and rules of various federal appellate tribunals. As before, I welcome feedback from the academic research community.
Modern quantitative analysis has upended the social sciences and, in recent years, made exciting inroads with law. How complex are the nation’s statutes?1 Did a shift in Supreme Court voting dodge President Roosevelt’s court-packing plan?2 How do courts apply fair use doctrine in copyright cases?3 What factors determine the outcome of intellectual property litigation?4 Researchers have begun to answer these and many more questions through the use of empirical methodologies.
Academics have vaulted numerous hurdles to advance this far, including deep institutional siloing and specialization. But barriers do still exist, and one of the greatest remaining is, quite simply, data. There is no easy-to-get, easy-to-process compilation of America’s primary legal materials. In the status quo, researchers are compelled to spend far too much of their time foraging for datasets instead of conducting valuable analysis. Consequences include diminished scholarly productivity, scant uniformity among published works, and—most frustratingly—deterrence for prospective researchers.
My hope is to facilitate empirical legal scholarship by providing machine-readable primary legal materials. In this first release of data, I have prepared XML versions of the U.S. Code and opinions of the Supreme Court of the United States, through approximately early 2012. Subsequent releases may include additional primary legal materials. I would greatly appreciate feedback from the academic community, particularly with regards to the XML schema, text formatting, and prioritizing materials for release.
Update January 13, 2014: The data is now hosted on Amazon S3 in a requester pays bucket. If you have not properly configured your request, you will receive an “Access Denied” error.
United States Code: ZIP (110 MB)
Supreme Court of the United States Opinions: ZIP (348 MB)
Please note, this is a personal project. It is not related to my coursework or research at Stanford University.
1. Michael J. Bommarito II & Daniel M. Katz, A Mathematical Approach to the Study of the United States Code, 389 Physica A 4195 (2010), available at http://www.sciencedirect.com/science/article/pii/S0378437110004875.
2. Daniel E. Ho & Kevin M. Quinn, Did a Switch in Time Save Nine?, 2 J. Legal Analysis 69 (2010), available at http://jla.oxfordjournals.org/content/2/1/69.full.pdf.
3. Matthew Sag, Predicting Fair Use, 73 Ohio St. L.J. 47 (2012), available at http://moritzlaw.osu.edu/students/groups/oslj/files/2012/05/73.1.Sag_.pdf.
4. Mihai Surdeanu et al., Risk Analysis for Intellectual Property Litigation, Proc. 13th Int’l Conf. on Artificial Intelligence & L. 116 (2011), available at http://dl.acm.org/citation.cfm?id=2018375.