Honest John: The MoT Files: The Story Behind The Data

Good case study on FoI, open data, and how cleaning data is always the first, and often the most valuable, step in the data journalism process: "Following the launch of the OpenData website ... we downloaded the MoT data when it became available and set about getting it into a format that could be easily accessed. With more than 355m records, 200m MoTs (all those since the system was computerised in 2006) and 40gb of data, this wasn't an easy task. Like the BBC, we have also had a few problems dealing with the MoT data that's provided by the Government. Firstly, it's huge and difficult to work with. Secondly, as it's sourced from thousands of technicians - and humans make mistakes - it was littered with errors. There were plenty of cars registered in the 1800s and a few steam-powered Renault Clios to boot. We've done our best to ensure it's as clean as possible, but with such a huge data set, there may still be the odd error."
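
To make the cleaning step concrete, here is a minimal sketch of the kind of plausibility check the quote describes. It is not Honest John's actual method: the field names, the sample records, the 1900 cut-off, and the list of implausible fuel types are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical record layout; the real MoT field names are not given in the source.
records = [
    {"make": "RENAULT", "model": "CLIO", "fuel": "steam", "first_registered": date(2004, 3, 1)},
    {"make": "FORD", "model": "FOCUS", "fuel": "petrol", "first_registered": date(1899, 1, 1)},
    {"make": "VAUXHALL", "model": "ASTRA", "fuel": "diesel", "first_registered": date(2001, 6, 15)},
]

# Assumed plausibility rules, chosen only for illustration.
EARLIEST_PLAUSIBLE_YEAR = 1900
IMPLAUSIBLE_FUELS = {"steam"}


def problems(record):
    """Return a list of reasons a record looks suspect (empty if none)."""
    reasons = []
    if record["first_registered"].year < EARLIEST_PLAUSIBLE_YEAR:
        reasons.append("registration date too early")
    if record["fuel"].lower() in IMPLAUSIBLE_FUELS:
        reasons.append("implausible fuel type")
    return reasons


# Split the data into suspect records (with reasons) and records that pass.
flagged = [(r, problems(r)) for r in records if problems(r)]
clean = [r for r in records if not problems(r)]

for record, reasons in flagged:
    print(record["make"], record["model"], "->", ", ".join(reasons))
```

Flagging suspect records with a reason, rather than silently dropping them, keeps the clean-up auditable, which matters when a story rests on the numbers.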

New York Times: Joe Weisenthal vs. the 24-Hour News Cycle

Profile of Joe Weisenthal of Business Insider: "During the course of an average 16-hour day, Weisenthal writes 15 posts, ranging from charts with a few lines of explanatory text to several hundred words of closely reasoned analysis. He manages nearly a dozen reporters, demanding and redirecting story ideas. He fiddles incessantly with the look and contents of the site. And all the while he holds a running conversation with the roughly 19,000 people who follow his Twitter alter ego, the Stalwart. ... He is like the host of a daylong radio show, except no one speaks out loud. He rarely makes phone calls. His phone almost never rings."

O’Reilly Radar: Why StreetEasy rolled its own maps

"StreetEasy co-founder Sebastian Delmont (@sd) says that when Google told them last autumn that it intended to enforce pricing on its Maps API, the StreetEasy team looked about for more affordable options. Their experience was similar to others who have turned from proprietary systems to open data: It's work to get started, but ultimately you have more freedom to create and innovate."

Research notes: A completely arbitrary list of takeaways from two unconferences

Matt Waite on the trouble with finding budding journalist-developers: "I think the problem with finding these students starts with reward structures. Students are told from even before they walk on campus that being a journalist means Being a Good Writer, Being a Good Editor, Being a Good Photographer. No one is telling them they could be an application developer, or a data journalist, or a media entrepreneur. Or if they have heard it, that voice is getting drowned out by traditionalists. A disturbing amount of time, the traditionalists drowning those students out are other students. Until we can attach a reward to this — until it cracks the consciousness of students that there are jobs in this path — I think we’ll continue to struggle."

Nieman Journalism Lab: The newsonomics of the long goodbye: Kodak’s, Sears’, and newspapers’

Ken Doctor on digitally disrupted companies' "long goodbye": "data shows 44 percent less newsprint usage (and about 75-80 percent of all newsprint usage is attributed to newspapers) over the past four years, according to The Reel Time Report. ... I’m tracking revenues from Kodak, Sears, and all U.S. dailies through 2010 ... U.S. newspapers’ ad revenue decline is worse, percentage wise, than either Kodak’s or Sears’. Yes, although Kodak and Sears are now poster children of legacy businesses gone wrong, newspapers — as counted through their main revenue source — are doing worse."

Sydney Morning Herald: New form of journalism must adhere to old rules

Pollster Mark Textor: "Too often, data journalists suddenly pretend to be experts. But a journalist is not a mathematician or statistician. With data journalism that is exactly what they pretend to be. They imagine they are something way beyond the pay grade of the average journalist with a graduate degree. Also there is a subtle but significant change in roles that is a dangerous precedent. Rather than independently comparing different data sets, they become advocates for their own information."