Honest John: The MoT Files: The Story Behind The Data

Good case study on FoI, open data and how cleaning data is always the first, and often most valuable, step in the data journalism process: "Following the launch of the OpenData website ... we downloaded the MoT data when it became available and set about getting it into a format that could be easily accessed. With more than 355m records, 200m MoTs (all those since the system was computerised in 2006) and 40gb of data, this wasn't an easy task. Like the BBC, we have also had a few problems dealing with the MoT data that's provided by the Government. Firstly, it's huge and difficult to work with. Secondly, as it's sourced from thousands of technicians - and humans make mistakes - it was littered with errors. There were plenty of cars registered in the 1800s and a few steam-powered Renault Clios to boot. We've done our best to ensure it's as clean as possible, but with such a huge data set, there may still be the odd error."
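To make the cleaning step concrete, here is a minimal sketch of the kind of sanity check that would catch the errors described above (1800s registrations, steam-powered Clios). It is not Honest John's actual method: the field names, the 1900 cutoff and the list of accepted fuel types are all assumptions made for illustration, and the real MoT extract may be laid out quite differently.

```python
from datetime import date

# Assumptions for illustration only: the real MoT data schema is not
# described in the article, so these field names and thresholds are made up.
EARLIEST_PLAUSIBLE_YEAR = 1900
KNOWN_FUELS = {"petrol", "diesel", "electric", "hybrid", "lpg"}


def find_problems(record, today=None):
    """Return a list of reasons the record looks wrong (empty if it passes)."""
    today = today or date.today()
    problems = []

    # Registration year must be an integer between the cutoff and this year.
    year = record.get("first_registered_year")
    if not isinstance(year, int) or not (EARLIEST_PLAUSIBLE_YEAR <= year <= today.year):
        problems.append("implausible registration year")

    # Fuel type is normalised before comparison, since hand-entered values vary in case.
    fuel = str(record.get("fuel", "")).strip().lower()
    if fuel not in KNOWN_FUELS:
        problems.append("unexpected fuel type")

    return problems


# Made-up sample rows, not taken from the real data set.
records = [
    {"make": "RENAULT", "model": "CLIO", "first_registered_year": 2004, "fuel": "Petrol"},
    {"make": "RENAULT", "model": "CLIO", "first_registered_year": 2003, "fuel": "Steam"},
    {"make": "FORD", "model": "FIESTA", "first_registered_year": 1899, "fuel": "Diesel"},
]

clean = [r for r in records if not find_problems(r)]
flagged = [(r, find_problems(r)) for r in records if find_problems(r)]
```

Flagging suspect rows with a reason, rather than silently dropping them, keeps a record of what was excluded, which matters when there may still be the odd error left in the published figures.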

FT.com: Gove faces probe over private e-mails

"As part of its inquiry, the FT saw or obtained from third parties e-mails discussing government business circulated through private accounts. It then sought disclosure of all or part of seven of them using targeted FOIA requests. The requests explicitly asked for checks on named private accounts. In each case, the department said the information was not held."

Telegraph: Government ‘will take 35 years to recoup tuition fee losses’

"Official estimates suggest the amount of money loaned to students will balloon to a record level by 2047 before the Treasury starts to recoup the losses from graduates. ... Estimates obtained after a Freedom of Information request show that the size of the loans bill will grow for 35 years. This is based on the Government paying out an average fee loan of just over £7,500."

Belfast Telegraph: FCO red-faced over briefing blunder

"The embarrassing blunder comes just two months after the Ministry of Defence was forced into an emergency retraction of secret information about Britain's nuclear-powered submarines it posted online. Both official documents contained sensitive material that appeared to be blacked out but could in fact be read by anyone using a computer to copy and paste it into another file."

ProPublica: A Reader’s Guide to the (Still Coming) Sarah Palin Emails

"Alaska’s decision to provide only paper copies has been puzzling. While nothing in the state’s public records law requires the state to provide records in electronic form, public agencies are “encouraged” to “make information available in usable electronic formats to the greatest extent feasible.” Though government agencies have fumbled on redactions in the past, software certainly exists to safely redact electronic data. ... Various news agencies have joined the scramble to sift through the documents and restore them to an electronic format."

Sunlight Labs: The Palin Emails and Redaction Technology

"Today's release of the Palin emails is prompting frustration among reporters, environmentalists and people who know how to use computers over the fact that the documents are being delivered in the form of a huge, $700+ stack of paper. ... this decision is being attributed to the difficulty of performing redaction properly within an all-digital system. ... Redaction mistakes do happen -- the brilliant Tim Lee recently released some interesting work showing how to quantify just how often -- but doing it properly isn't rocket science."

New York Times: Alaska to Release Sarah Palin’s E-Mails

"The news media have descended here en masse to sift through the trove, with many organizations sending teams of reporters and database specialists to comb the documents and post them online. ... some news organizations are setting up elaborate systems for scanning them and inviting the public to help search them online. MSNBC.com, ProPublica and Mother Jones magazine are working with a research company to create an online database of the documents. ... The New York Times and other news organizations intend to assemble their own searchable online databases of the documents, and some, including The Times, were asking readers Thursday to help reporters sift through the voluminous correspondence in the coming days."