Question:
What are some of the innovative ways journalists are using Web scraping to access and organize data?
Accessing information for reporting is easier than ever, but very little of it comes in a structured form that lends itself to easy analysis. Reporters are often faced with lists or tables that aren’t so easily manipulated, which is where Web scraping comes in handy.
Answers:
RT @MichaelMacLeod1: Three cheers for @scraperwiki RT @DataMinerUK: Innovative ways journalists are web scraping to access data: http://t.co/3fiNoqn
The most fascinating results from ScraperWiki (and other scraping tools) come from seeing what is possible even with unstructured data such as large PDF files. And this is useful well beyond journalism.
It is also important to visualize the data in good graphs and charts.
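For anyone curious what that looks like in practice, here is a minimal sketch in Python using the pdfplumber library. The file name and table layout are hypothetical, assumed only for illustration:

    # Minimal sketch: pull a table out of a PDF with pdfplumber.
    # "report.pdf" and its column layout are hypothetical examples.
    import pdfplumber

    with pdfplumber.open("report.pdf") as pdf:
        for page in pdf.pages:
            table = page.extract_table()  # rows as lists of cell strings, or None
            if table is None:
                continue
            for row in table:
                print(row)  # hand each row to your analysis or CSV writer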
ScraperWiki is a really good tool for scraping web data. It requires a little coding knowledge, but that is not an insurmountable barrier for a dedicated journalist with an interest. The nice thing about ScraperWiki is that the scraping runs on ScraperWiki’s servers, so you do not need to re-run the scraper yourself and can be notified when ScraperWiki sees changes in the scraping results.
Additionally, it allows people to collaborate on scraping by forking the scraping code.
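To make that concrete, here is a rough sketch of what a classic ScraperWiki scraper looks like in Python. The URL and two-column table are made up for illustration; scraperwiki.scrape() and scraperwiki.sqlite.save() are the classic library’s fetch and store helpers:

    # Rough sketch of a classic ScraperWiki scraper (Python).
    # The URL and the two-column table layout are hypothetical.
    import scraperwiki
    import lxml.html

    html = scraperwiki.scrape("http://example.com/school-funding")
    root = lxml.html.fromstring(html)

    for tr in root.xpath("//table//tr"):
        cells = [td.text_content().strip() for td in tr.xpath(".//td")]
        if len(cells) == 2:
            # Saving with a unique key lets scheduled re-runs update
            # existing rows instead of duplicating them.
            scraperwiki.sqlite.save(unique_keys=["school"],
                                    data={"school": cells[0], "amount": cells[1]})

Because the code and its output live on ScraperWiki’s servers, anyone can fork a scraper like this, tweak the XPath, and run their own copy.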
I agree, three cheers for ScraperWiki, which promises to get even better as time goes on. Here are a couple of reasons why I think it’s a great resource for journalists:
First, you can download data as comma-separated values (.csv), which you can easily open in the spreadsheet of your choice (Google Docs, Microsoft Excel, OpenOffice Calc). If you’re looking for an open database format, you can download your data in SQLite 3 (a quick example of reading such a file follows below).
Second, it’s transparent. If you’ve worked with data a little, it’s not too tough to figure out what the programmer is doing with the code. Also, you can see a reference to the original data source and verify it yourself.
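On the first point, once you have the SQLite download, plain Python is enough to query it. A minimal sketch, assuming ScraperWiki’s conventional default table name of swdata and some hypothetical column names:

    # Minimal sketch: query a SQLite file downloaded from ScraperWiki.
    # "swdata" is ScraperWiki's conventional default table name;
    # the "school" and "amount" columns are hypothetical.
    import sqlite3

    con = sqlite3.connect("scraperwiki.sqlite")
    for school, amount in con.execute("SELECT school, amount FROM swdata LIMIT 10"):
        print(school, amount)
    con.close()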
Thanks Randall. We see the Redirectory team has provided scraper examples freely hosted on ScraperWiki to help people use your site to “get their city involved”.
Max Ogden (http://www.maxogden.com/) has a screencast on ScraperWiki: http://vimeo.com/17462239