Clear the Clutter with Google Refine 2.0, an Open Source Tool to Manage Messy Data

Written By Sam on 13 November 2010

Google has come up with yet another new product. Called Google Refine 2.0, it is aimed at anyone who needs help sorting out messy data. Google Refine 2.0 is an open source power tool for working with messy data: it cleans up inconsistencies, transforms data from one format into another, and can even extend data sets with new data from web services or other databases.

Google Refine grew out of Google's acquisition of Metaweb in July. That acquisition brought along Freebase Gridworks, an open source software project developed for cleaning, sorting and enhancing data sets, and Google Refine is its renamed successor.

Google Refine 2.0 offers:

  • A new extensions architecture.
  • A reconciliation framework, as in Freebase Gridworks, for linking records to other databases.
  • Many new transformation expressions and commands.
  • Key features including importing, editing, filtering and exporting data, undo/redo history, and extensions.
  • A feature called "reconciliation" that links text in the data to database identifiers (keys or IDs).
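To make the idea of reconciliation more concrete, here is a toy Python sketch of linking free-text values to database IDs. It is not Google Refine's implementation or API: the lookup table, the IDs and the normalization rule are hypothetical assumptions chosen purely for illustration.

```python
# Toy illustration of "reconciliation": linking free-text cell values to database IDs.
# NOT Google Refine's algorithm or API; the table and IDs below are hypothetical.
import re
from typing import Optional

# Hypothetical reference table: normalized name -> database ID
KNOWN_ENTITIES = {
    "new york city": "ID-001",
    "san francisco": "ID-002",
}

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def reconcile(value: str) -> Optional[str]:
    """Return the database ID for a cell value, or None if nothing matches."""
    return KNOWN_ENTITIES.get(normalize(value))

cells = ["New York City", "  san francisco ", "Boston"]
print([reconcile(c) for c in cells])  # prints ['ID-001', 'ID-002', None]
```

Differently formatted spellings of the same name end up linked to one ID, while unmatched values are left for a person to review. A real tool would query a database service and rank candidate matches rather than rely on exact lookups.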

Google has high hopes for the product among the data journalism and open government data communities, whose members used Freebase Gridworks 1.0 and were very happy with it. Students, journalists, data researchers and anyone else handling large amounts of data should also find it useful.
