Automatic Semantics Extraction in Law Documents

Automatic Semantics Extraction in Law Documents is a paper presented by Biagioli et al. at the International Conference on Artificial Intelligence and Law in 2005. It discusses two tasks: the automatic classification of law paragraphs into eleven semantic categories, and the extraction of the arguments present in those categories. It’s a promising first step towards making legislative content more accessible.

The first task is paragraph classification. Biagioli et al. identify eleven types of law paragraphs: repeal, definition, delegation, delegification, prohibition, reservation, insertion, obligation, permission, penalty and substitution. In a leave-one-out experiment on 582 such paragraphs, their Multiclass Support Vector Machine classifies 92.44% of the paragraphs correctly. Paragraphs are treated as bags of (stemmed) words, word frequency is reduced to a binary feature (a word is either present in the paragraph or not), and only the 500 words with the highest information gain in the corpus are kept as features. The confusion matrix clearly shows that the classifier performs worst on the smallest classes, those with fewer than 20 instances.
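To make the feature selection step concrete, here is a minimal sketch of binary bag-of-words features ranked by information gain. It is an illustration under my own assumptions, not the authors' code: the four toy paragraphs, the function names and the value of k are made up, and stemming and the SVM itself are left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction from splitting the paragraphs on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without) / n) * entropy(without))
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k words with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:k]


def to_binary_vector(doc, features):
    """1 if the feature word occurs in the paragraph, 0 otherwise."""
    return [1 if w in doc else 0 for w in features]


# Hypothetical toy corpus (not from the paper): two classes, two paragraphs each.
paragraphs = [
    ("this law is repealed", "repeal"),
    ("article 3 shall be repealed", "repeal"),
    ("the operator must notify the authority", "obligation"),
    ("each controller must keep a register", "obligation"),
]
docs = [set(text.split()) for text, _ in paragraphs]  # sets discard word counts
labels = [label for _, label in paragraphs]

features = select_features(docs, labels, k=2)  # the paper keeps 500
vectors = [to_binary_vector(d, features) for d in docs]
print(features)
print(vectors)
```

The resulting 0/1 vectors are what a multiclass SVM would be trained on. In a leave-one-out setup, each paragraph in turn is held out, the classifier is trained on the rest, and the held-out paragraph is used for testing.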

The second task is the extraction of arguments within these paragraphs. For each class, Biagioli et al. have determined a frame with the relevant arguments. For example, an obligation has an addressee and an action, which the addressee must carry out. The argument extractor takes a POS-tagged and chunked paragraph, together with its correct class, and performs two steps: first it runs a syntactic dependency analysis (on the basis of a finite-state grammar), and then it applies a list of specialized rules for the semantic annotation of the text. This two-step process correctly identifies all of the arguments for 82.09% of the paragraphs, and at least one of the arguments for another 15.35% of the paragraphs.
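As a rough illustration of frame-based extraction, the sketch below fills the obligation frame from a chunked paragraph with a single hand-written rule. Everything in it is hypothetical: the chunk labels, the modal list, the rule and the function names are my own assumptions and do not reproduce the paper's finite-state grammar or its rule set. It also skips the dependency analysis and works directly on the chunks.

```python
# Frames per class. Only the obligation frame is described in the text above.
FRAMES = {"obligation": ("addressee", "action")}

# Assumed trigger words for an obligation (an illustration, not the paper's list).
MODALS = {"must", "shall"}


def extract_obligation(chunks):
    """Toy rule: the noun chunk right before a modal verb chunk is the
    addressee; the rest of the verb chunk plus what follows is the action."""
    for i, (kind, text) in enumerate(chunks):
        words = text.split()
        is_modal_vp = kind == "VP" and words and words[0].lower() in MODALS
        if is_modal_vp and i > 0 and chunks[i - 1][0] == "NP":
            addressee = chunks[i - 1][1]
            tail = [" ".join(words[1:])] + [t for _, t in chunks[i + 1:]]
            action = " ".join(part for part in tail if part)
            return {"addressee": addressee, "action": action}
    return {}


# One extractor per class; the correct class is given as input, as in the paper.
EXTRACTORS = {"obligation": extract_obligation}


def extract_arguments(chunks, paragraph_class):
    """Dispatch to the rule set for the given class; empty dict if none exists."""
    extractor = EXTRACTORS.get(paragraph_class)
    return extractor(chunks) if extractor else {}


# Chunked input as (chunk type, text) pairs, a made-up example.
chunks = [("NP", "The operator"), ("VP", "must notify"), ("NP", "the authority")]
print(extract_arguments(chunks, "obligation"))
```

A real system would need one such rule set per class, which is presumably where most of the manual effort goes.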

I really like this paper. Laws are often pretty inaccessible, and I’m convinced we can harness the power of NLP to help close the gap between citizens and their legislation. Biagioli et al.’s experiments are fairly straightforward, but their results show that even relatively simple techniques can go a long way. Only the second experiment leaves some questions unanswered: I would have liked a more detailed look at the specialized grammar and at the effort required to compile it. In addition, the results of the second experiment must be an overestimate, as the extractor is given the correct class of each paragraph and therefore assumes a perfect outcome of the first experiment. Still, there’s no denying that they open up a promising area for further experimentation.
