Building a search application in 15 person-days
When a governing body like the United Nations passes a resolution, a critical component in its success is ensuring that the actual content of the resolution is known to citizens in the affected regions and relevant NGOs. Even for the countries that participated in the decision, it can be challenging to keep all affected government bodies informed and up to date about decisions and how they evolve over time.
The resolutionfinder.org data repository aims to facilitate access to UN resolutions, so that anyone interested in a particular resolution can readily find its text. The project's goal is to make it easier for citizens anywhere in the world to find these texts and to use them to help implement the resolutions.
Within just 15 person-days, a team of three developers put together an easy-to-navigate frontend covering the full breadth of data collected over two years by a volunteer team of graduate students and young professionals from all over Europe. The initial data set spans some 1,000 documents and over 3,000 clauses in four thematic areas: small arms and light weapons; women and education; clean drinking water; and malaria.
The data was collected manually from various online document repositories provided by different UN organizations. The resulting data and the Solr-driven search interface were one of the highlights at the “UN-connecting the World” conference held in Geneva in late May 2010, where the application was launched as a usable concept and technology showcase that will form the basis for future developments.
The rapid development of the application was only possible thanks to the powerful out-of-the-box features of Lucene and Solr, as well as the existence of integration plugins for the popular PHP-based Symfony framework.
The idea behind resolutionfinder.org was to create an instrument that facilitates access to UN agreements with the purpose of improving the process of implementation. Unlike other databases available at the moment, resolutionfinder.org not only compiles documents but also extracts the clauses relevant for implementation and traces how documents and clauses evolve over time. In its first release, the repository contains a substantial amount of information in four thematic areas: Clean Drinking Water; Malaria; Small Arms and Light Weapons; and Women and Education.
The development of resolutionfinder.org has been supported over the past two years by the World Federation of United Nations Associations (WFUNA) and especially by the United Nations Association of Germany (DGVN). The project is currently in negotiations for a partnership with the International Security Network (ISN). Search is currently limited to the four thematic areas, so a search string will not necessarily return content from every UN document ever issued. Within those areas, however, several useful searches are already up and running.
One example is a search for “malaria” restricted to documents tagged with “local strategies”, which might be relevant for an NGO looking to support local strategies.
More advanced query strings are also possible, such as a search for UN Security Council resolutions touching on regional topics outside of Africa.
Key search features of Solr leveraged in development
- Full-text searching with stemming, to handle search input more flexibly
- Faceting, both to give the user a better understanding of what type of data matched the initial full-text search and to enable additional filtering
- Highlighting, to better visualize why documents are included in the final results
- Tight integration with the PHP symfony framework through the sfSolrPlugin and the Doctrine ORM, to assist in generating the Solr configuration files and importing data into Solr automatically
- The Open Source nature of the entire application stack, which meant that no licensing costs were incurred
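As a rough illustration of how the first three features combine in a single request, the sketch below builds the query string for a Solr select request using Solr's standard `q`, `fq`, `facet`, `facet.field`, `hl` and `hl.fl` parameters. It is not taken from the resolutionfinder.org code base: the field names (`tags`, `theme`, `document_type`, `title`, `clause_text`), the helper functions and the `/solr/select` path are assumptions made for the example.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a filter value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_select_query(text, filters=None, facet_fields=(), highlight_fields=()):
    """Build the query string for a Solr select request (hypothetical helper)."""
    params = [("q", text), ("wt", "json")]
    # Each filter becomes its own fq parameter, narrowing the result set.
    for field, value in (filters or {}).items():
        params.append(("fq", f"{field}:{quote_term(value)}"))
    if facet_fields:
        params.append(("facet", "true"))
        params.append(("facet.mincount", "1"))  # hide facet values with no matches
        params.extend(("facet.field", name) for name in facet_fields)
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return urlencode(params)


# Assumed field names; the real schema is not described in this article.
query_string = build_select_query(
    "malaria",
    filters={"tags": "local strategies"},
    facet_fields=["theme", "document_type"],
    highlight_fields=["title", "clause_text"],
)
print("/solr/select?" + query_string)
```

Stemming does not appear in the request because it is normally configured on the field type in the Solr schema and applied at index and query time.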
Since resolutionfinder.org is an entirely volunteer effort that currently has no IT budget and is mostly made up of experts in UN research rather than application development, the available development resources were slim. It became clear that, even with development time and hosting sponsored by Liip AG of Switzerland, there would effectively be only 15 person-days that could be dedicated to developing the frontend, especially since work was still going on in parallel to migrate the Excel sheets containing the last two years of research into the relational database.
The goal was to provide full-text search with facet-based filtering and to display documents and the clauses they contain, including their historical development. Users should also be able to register in order to bookmark and comment on clauses and documents.
The team already had a database schema and a symfony-based administration tool that was to handle the Excel sheet import. Thanks to the Doctrine ORM and sfSolrPlugin, loading the data into Solr was fully automated through a single small configuration file that mapped properties and methods in the data model to fields in Solr. As a result, an imported Excel sheet was automatically available in Solr for searching without any additional code. The same configuration file was also used to generate the main Solr configuration files.
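The idea of a declarative mapping from model properties and methods to Solr fields can be sketched in a few lines. The following is a generic Python illustration of the concept only; it does not reproduce the sfSolrPlugin configuration format, and the `Resolution` class, its attributes and the field names are all hypothetical.

```python
class Resolution:
    """Hypothetical model object standing in for an ORM record."""

    def __init__(self, id, title, full_text, clauses):
        self.id = id
        self.title = title
        self.full_text = full_text
        self.clauses = clauses

    def count_clauses(self):
        return len(self.clauses)


# Solr field name -> name of a property or method on the model (assumed names).
FIELD_MAP = {
    "id": "id",
    "title": "title",
    "body": "full_text",
    "clause_count": "count_clauses",
}


def to_solr_doc(record, field_map):
    """Build a flat dict of Solr fields from a model object."""
    doc = {}
    for solr_field, source in field_map.items():
        value = getattr(record, source)
        # Methods are called, plain properties are copied as they are.
        doc[solr_field] = value() if callable(value) else value
    return doc


record = Resolution(1, "Sample resolution", "Full text ...", ["clause a", "clause b"])
solr_doc = to_solr_doc(record, FIELD_MAP)
print(solr_doc)
```

Because the mapping is plain data, the same table could in principle also drive the generation of the field definitions in the schema, which is the kind of reuse the paragraph above describes.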
Finally, sfSolrPlugin bundled a fully working Solr installation using Jetty as the servlet container, including administrative scripts for Solr. This meant that no time was wasted installing and configuring Solr on each developer’s machine. Within just one day, a test data set had been imported into Solr and the first text search tests implemented, convincing the entire team that the target was indeed achievable.
Within just a few days, a complete facet-based filtering system was integrated that lets users click to reduce the result set along several dimensions without having to trigger a page reload manually. Solr's native highlighting gives the user a visual indication of why a given document is relevant. In addition, the results are color coded to give users a better idea of the relative legal value of each document. To drill down into a result set, users are presented with facet-based filters over eight dimensions. The facet dimensions also give the user an idea of how the data is distributed, as each filter option indicates the number of documents that match it for the current search criteria.
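The per-option document counts come straight from the facet section of the Solr response. In Solr's default JSON output, each facet field is returned as a flat list that alternates between a value and its count. The sketch below turns such a list into (value, count) pairs ready for display; the `theme` field name and the numbers in the sample response are invented for illustration and are not figures from resolutionfinder.org.

```python
def facet_options(response, field):
    """Return (value, count) pairs for one facet field of a Solr JSON response."""
    flat = response.get("facet_counts", {}).get("facet_fields", {}).get(field, [])
    # Solr's default layout is [value1, count1, value2, count2, ...].
    return list(zip(flat[0::2], flat[1::2]))


# Made-up sample response; field name and counts are illustrative only.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "theme": ["malaria", 42, "clean drinking water", 17],
        }
    }
}

for value, count in facet_options(sample_response, "theme"):
    print(f"{value} ({count})")
```

Each pair maps naturally onto one clickable filter option, with the count shown next to the label.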
The entire source code base is available under the BSD Open Source license. There are even plans to make the entire data set available so that others can innovate on their own.
“The entire team was surprised how much was possible in such a short timeframe, even leaving time for additional polish where we expected to have to make do with an application that would just be a raw tech demo, which would have been dependent on the imagination of users rather than showing a concrete version that is already usable for end users,” says Lukas Smith of the resolutionfinder.org team.
Over the next couple of months, the main focus is to improve the quality of the database and, in the long run, to extend it until it includes all thematic areas on the UN agenda. To that end, research is ongoing into IT solutions that would make the database universal in a more efficient way, in particular data mining tools that automate the parsing of PDF- and HTML-based UN documents into the database, which should allow the data set to grow quickly by orders of magnitude. Once the data mining tools are in place, localization of the interface and coverage of documents and clauses in all six official UN languages will also become a focus of development. Further work is also planned to enable different types of searches that focus more on chronological aspects or on particular UN organisations or member states.
- Dedicated virtual host
- Debian Linux
- MySQL RDBMS
- PHP Frontend