Automated Classification of Congressional Legislation

Legislation covers a wide variety of topics. In their paper Automated Classification of Congressional Legislation, Purpura and Hillard present a method that classifies US legislation by topic. Their set of Support Vector Machines achieves a precision of 82.20% on the 20 top-level topics and 71.02% on the 226 subtopics. They claim this is comparable to human performance.

One interesting aspect of Purpura and Hillard’s experiment is its hierarchical nature. To mimic the taxonomical classification performed by human coders, one set of SVMs is trained to assign a bill to one of 20 major topics, like macroeconomics, health, agriculture or education. Based on the top 3 predicted major topics for each bill, another set of SVMs then classifies that bill into one of the candidate subtopics. This two-step process “greatly reduces the computational expense of the sorting” (p.3).
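To make the two-step routing concrete, here is a minimal sketch in Python. It is not the authors' implementation: the keyword-counting scorers are toy stand-ins for trained SVMs, and the topic names, subtopic names, keywords and the classify function are all made up for illustration. The only thing it tries to show is how keeping the top 3 major topics limits the number of subtopic models that have to be evaluated.

```python
from typing import Callable, Dict, List, Tuple

# A scorer maps a bill text to a confidence score. In the paper this role is
# played by trained SVMs; here it is a toy keyword counter (an assumption).
Scorer = Callable[[str], float]


def keyword_scorer(keywords: List[str]) -> Scorer:
    """Toy stand-in for a trained classifier: counts keyword hits."""
    def score(text: str) -> float:
        words = text.lower().split()
        return float(sum(words.count(k) for k in keywords))
    return score


def classify(text: str,
             major_scorers: Dict[str, Scorer],
             sub_scorers: Dict[str, Dict[str, Scorer]],
             top_k: int = 3) -> Tuple[str, str]:
    """Return (major topic, subtopic) using two-stage top-k routing."""
    # Stage 1: rank all major topics and keep the top_k candidates.
    ranked = sorted(major_scorers,
                    key=lambda topic: major_scorers[topic](text),
                    reverse=True)
    candidates = ranked[:top_k]
    # Stage 2: score only the subtopics that belong to a candidate major topic.
    scored = [(major, sub, scorer(text))
              for major in candidates
              for sub, scorer in sub_scorers[major].items()]
    major, sub, _ = max(scored, key=lambda item: item[2])
    return major, sub


# Hypothetical miniature taxonomy, for illustration only.
MAJOR = {
    "health": keyword_scorer(["hospital", "medicare", "nurses"]),
    "agriculture": keyword_scorer(["farm", "crop", "rural"]),
    "education": keyword_scorer(["school", "teachers"]),
    "macroeconomics": keyword_scorer(["inflation", "budget"]),
}
SUB = {
    "health": {"facilities": keyword_scorer(["hospital", "clinics"]),
               "workforce": keyword_scorer(["nurses", "physicians"])},
    "agriculture": {"subsidies": keyword_scorer(["subsidy", "crop"]),
                    "rural development": keyword_scorer(["rural"])},
    "education": {"primary": keyword_scorer(["school"]),
                  "higher": keyword_scorer(["university"])},
    "macroeconomics": {"budget": keyword_scorer(["budget"]),
                       "inflation": keyword_scorer(["inflation"])},
}

bill = "A bill to fund hospital nurses and rural hospital clinics"
print(classify(bill, MAJOR, SUB))
```

With 20 major topics and 226 subtopics, the second stage in a setup like this only has to consult the subtopic models under three major topics rather than all 226, which is presumably where the savings the authors mention come from.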

The reported results look generally promising. In an experiment with 108,000 documents (with 50% used for training and 50% for testing), the method achieves a precision of well over 80%, and even 90%, for frequent topics and subtopics. Only classes that contain very different documents (like the “other” class) and topics that are infrequent prove challenging.

However, Purpura and Hillard’s claim that these results are comparable to human performance is difficult to check. As they have only one human classification per bill to go by, their evaluation of the agreement between human and machine is based merely on general experience with classification tasks of similar complexity. A real computation of inter-annotator agreement would be needed to bear out this claim.
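For illustration, one common way to quantify inter-annotator agreement is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The paper does not say which measure would be appropriate, so the choice of kappa is my assumption, and the labels below are made up. A minimal sketch:

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability that both coders independently pick the
    # same label, estimated from each coder's own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four bills, for illustration only.
coder_1 = ["health", "health", "education", "agriculture"]
coder_2 = ["health", "education", "education", "agriculture"]
print(round(cohen_kappa(coder_1, coder_2), 3))
```

The same function could compare a human coder with the classifier, but only a second human coding of the same bills would give the human baseline that the comparison needs.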

I’m keen to try out a similar classification method for the legislation content that we process at Wolters Kluwer. Even if it may not be able to replicate the detailed classification in our taxonomy with thousands of nodes, there are several ways in which a more coarse-grained classification can still be helpful.
