Honest John: The MoT Files: The Story Behind The Data
Good case study on FoI, open data and how cleaning data is always the first, and often the most valuable, step in the data journalism process: "Following the launch of the OpenData website ... we downloaded the MoT data when it became available and set about getting it into a format that could be easily accessed. With more than 355m records, 200m MoTs (all those since the system was computerised in 2006) and 40gb of data, this wasn't an easy task. Like the BBC, we have also had a few problems dealing with the MoT data that's provided by the Government. Firstly, it's huge and difficult to work with. Secondly, as it's sourced from thousands of technicians - and humans make mistakes - it was littered with errors. There were plenty of cars registered in the 1800s and a few steam-powered Renault Clios to boot. We've done our best to ensure it's as clean as possible, but with such a huge data set, there may still be the odd error."
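To make the cleaning step concrete, here is a minimal sketch of the kind of sanity check that would catch the two errors mentioned in the quote (cars first registered in the 1800s, steam-powered Clios). It is not Honest John's actual method: the field names, the sample records, the 1900 cut-off and the list of implausible fuel types are all assumptions made up for illustration, and the real MoT files use their own schema.

```python
from datetime import date

# Hypothetical sample records. The field names are assumptions,
# not the schema of the published MoT data.
records = [
    {"make": "RENAULT", "model": "CLIO", "fuel": "steam", "first_use": date(2004, 3, 1)},
    {"make": "FORD", "model": "FOCUS", "fuel": "petrol", "first_use": date(1899, 1, 1)},
    {"make": "VAUXHALL", "model": "ASTRA", "fuel": "diesel", "first_use": date(2001, 6, 15)},
]

# Assumed rules: anything first used before 1900 is treated as a typo,
# and "steam" is treated as an implausible fuel type for a tested car.
EARLIEST_PLAUSIBLE_YEAR = 1900
IMPLAUSIBLE_FUELS = {"steam"}


def find_problems(record):
    """Return a list of reasons why a record looks wrong (empty if it looks fine)."""
    problems = []
    if record["first_use"].year < EARLIEST_PLAUSIBLE_YEAR:
        problems.append("implausible first-use date")
    if record["fuel"].lower() in IMPLAUSIBLE_FUELS:
        problems.append("implausible fuel type")
    return problems


def split_records(rows):
    """Separate rows into clean ones and suspect ones (kept with their reasons)."""
    clean, suspect = [], []
    for row in rows:
        problems = find_problems(row)
        if problems:
            suspect.append((row, problems))
        else:
            clean.append(row)
    return clean, suspect


clean, suspect = split_records(records)
for row, problems in suspect:
    print(row["make"], row["model"], "->", ", ".join(problems))
```

Keeping the suspect rows alongside the reasons, rather than silently dropping them, makes it possible to review the rules later; with hundreds of millions of records, a rule that is slightly too strict can discard a lot of legitimate data.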