Automated Classification of Congressional Legislation

Legislation covers a wide variety of topics. In their paper Automated Classification of Congressional Legislation, Purpura and Hillard present a method that classifies US legislation by subject topic. Their set of Support Vector Machines achieves a precision of 82.20% on the 20 top-level topics and 71.02% on the 226 subtopics. They claim this is comparable to human performance.
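The basic shape of such a system — bag-of-words features feeding a set of one-vs-rest classifiers, one per topic — can be sketched in miniature. The snippet below is purely illustrative: it substitutes a simple perceptron for the paper's SVMs, and the toy sentences and topic labels are invented.

```python
from collections import defaultdict

def featurize(text):
    """Turn a sentence into a bag-of-words feature vector."""
    feats = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    return feats

class Perceptron:
    """Binary linear classifier; a stand-in for one SVM in the ensemble."""
    def __init__(self):
        self.w = defaultdict(float)

    def score(self, feats):
        return sum(self.w.get(f, 0.0) * v for f, v in feats.items())

    def train(self, data, epochs=10):
        for _ in range(epochs):
            for feats, label in data:  # label is +1 or -1
                pred = 1 if self.score(feats) > 0 else -1
                if pred != label:  # on a mistake, nudge weights toward label
                    for f, v in feats.items():
                        self.w[f] += label * v

def train_one_vs_rest(examples):
    """Train one binary classifier per topic (one-vs-rest scheme)."""
    models = {}
    for topic in {t for _, t in examples}:
        data = [(featurize(text), 1 if t == topic else -1)
                for text, t in examples]
        model = Perceptron()
        model.train(data)
        models[topic] = model
    return models

def classify(models, text):
    """Assign the topic whose classifier scores the sentence highest."""
    feats = featurize(text)
    return max(models, key=lambda t: models[t].score(feats))

# Toy training data (invented for illustration).
examples = [
    ("a bill to fund rural hospitals and public health clinics", "health"),
    ("amending the public health service act", "health"),
    ("authorizing appropriations for the department of defense", "defense"),
    ("a bill concerning military procurement and defense readiness", "defense"),
]
models = train_one_vs_rest(examples)
print(classify(models, "improving access to health clinics"))  # → health
```

The real system of course works with far richer features and a proper SVM learner, but the one-vs-rest decomposition shown here is the standard way to turn binary classifiers into a multi-topic classifier.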

Continue reading

Automatic Semantics Extraction in Law Documents

Automatic Semantics Extraction in Law Documents is a paper presented by Biagioli et al. at the International Conference on Artificial Intelligence and Law in 2005. It discusses two tasks: the automatic classification of law paragraphs into eleven semantic categories, and the extraction of the arguments present in those categories. It’s a promising first step towards making legislative content more accessible.

Continue reading

Automatic Classification of Sentences in Dutch Laws

Legislation is a mixed bag of content. Within the same law, one can find definitions of terms, norms that describe a right or a duty, specifications of penalization, descriptions of changes to previous laws, etc. In their paper Automatic Classification of Sentences in Dutch Laws, Emile de Maat and Radboud Winkels present a method to identify these types of sentences in legislation automatically.

Continue reading

Translations from the Crowd

Crowdsourcing is one of the most promising trends of recent years. As I’m interested in this topic mainly from the perspective of language technology, I was wondering if we can use the crowd to obtain high-quality translations. Yesterday I found a paper by Omar F. Zaidan and Chris Callison-Burch that discusses crowdsourcing translations via Amazon’s Mechanical Turk.

Continue reading

Akoma Ntoso: an XML standard for legislative content

In recent years there have been several initiatives to standardize legislative content. One of the most promising is Akoma Ntoso, an XML standard for parliamentary, legislative and judiciary documents. It provides relatively simple XML elements that capture the structural and semantic properties of the content, in order to make it machine-readable and easily interchangeable. Akoma Ntoso was developed in the context of the Africa i-Parliaments Action Plan, an Africa-wide initiative that aims to make African legislative processes more transparent. Its application isn’t restricted to Africa, however: it is currently used by the European Union and under consideration by the United States Congress.
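To give a feel for what such a document looks like, here is a hand-written, heavily simplified fragment in the Akoma Ntoso style, parsed with Python's standard library. Real documents carry a namespace, mandatory metadata, and richer identifiers; the element names below follow the standard, but the fragment itself is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative Akoma Ntoso-style fragment (namespace and
# required metadata omitted for brevity; the content is invented).
sample = """
<akomaNtoso>
  <act>
    <body>
      <article eId="art_1">
        <num>Article 1</num>
        <heading>Definitions</heading>
        <content>
          <p>In this Act, "open data" means data anyone can freely reuse.</p>
        </content>
      </article>
    </body>
  </act>
</akomaNtoso>
"""

root = ET.fromstring(sample)
# Because structure is explicit, extracting it is trivial:
for article in root.iter("article"):
    print(f"{article.findtext('num')}: {article.findtext('heading')}")
    # → Article 1: Definitions
```

This is the point of the standard: once the structure and semantics are explicit in the markup, any tool can navigate a law by its articles, headings, and cross-references instead of scraping formatted text.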

Continue reading

NoSQL Distilled: A Review

Recently in my job at Wolters Kluwer I’ve started to look beyond the world of relational databases. The relational model forces our data into an unnatural structure, which through the years has become less and less practical to work with. The linked nature of our data is severely disregarded, and the inflexible table structure makes our new texts exceedingly hard to manage. Our team of engineers is therefore growing increasingly interested in non-relational databases. That’s why I picked up NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Martin Fowler and Pramod J. Sadalage, a book that promised me a clear and succinct introduction to the world of NoSQL databases. Here’s my summary and review.

Continue reading

Flemish Open Data Day

Yesterday I took part in the Open Data Day organized by the Flemish government in Brussels. Open Data is experiencing a boost these days: governments all over the world are pledging to open up their information, and they expect good things to happen in return. The European Union dreams out loud about increased government efficiency, more transparency, and 140 billion euros in economic activity. Good things are certainly happening: inspiring initiatives are sprouting up, and with more than 200 people in attendance, the Flemish Open Data Day showed that in Flanders too, interest is high. At the same time, it brought to light some possible conflicts and practical problems.

Continue reading

Web Intelligence and Big Data

For the last ten weeks I’ve been taking the course Web Intelligence and Big Data on Coursera. Coursera is a company founded by two Stanford professors that works together with renowned universities to offer their courses online. Students take classes in the form of short videos, answer quizzes, complete homework assignments, and finally take an exam. Whether or not it will disrupt higher education in the sense that its founders intended, this new method of learning is certainly a great way to polish up some long-forgotten knowledge or to get an introduction to a new field.

Continue reading