Web Scraping 411
Accessing information to support reporting is easier than ever. Governments and corporations publish vast amounts of data online, but very little of it comes in a structured form that lends itself to easy analysis. Reporters are often faced with lists or tables, usually in HTML format, that aren't easily manipulated. That's where web scraping comes in handy.
Web scraping for journalists involves a few key steps:
- Getting information from somewhere
- Storing it somewhere that can be accessed later
- And in a form that makes it easy (or easier) to analyze and interrogate
For instance, a web scraper could be used to gather information from a local police department website and store it in a spreadsheet, where it can be sorted, averaged, totaled, filtered and so on.
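The three steps above (get, store, analyze) can be sketched in a few lines of Python using only the standard library. Everything here is hypothetical: the HTML snippet stands in for a page you would actually download (for example with `urllib.request`), and the column names are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical snippet standing in for a page fetched from a police
# department site; in practice you would download it first.
PAGE = """
<table>
  <tr><th>Date</th><th>Offense</th><th>Block</th></tr>
  <tr><td>2011-10-01</td><td>Burglary</td><td>400 Main St</td></tr>
  <tr><td>2011-10-02</td><td>Theft</td><td>1200 Oak Ave</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Step 1: get the information. Collects each <tr> as a list of cell texts."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

scraper = TableScraper()
scraper.feed(PAGE)

# Steps 2 and 3: store the rows as CSV, a form a spreadsheet can
# sort, filter and total. StringIO stands in for a file on disk.
out = io.StringIO()
csv.writer(out).writerows(scraper.rows)
print(out.getvalue())
```

Real pages are messier than this (nested tables, missing cells, pagination), which is why many scrapers reach for a dedicated parsing library, but the get-parse-store shape stays the same.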
But those are just the initial aspects of web scraping. Scraping tools and customized scripts offer further benefits, including:
- Scheduling a scraper to run at regular intervals
- Re-formatting data to clarify it, filter it, or make it compatible with other sets of data (for example, converting lat-long coordinates to postcodes, or feet to meters)
- Visualizing data (for example as a chart, or on a map)
- Combining data from more than one source (for example, scraping a list of company directors and comparing that against a list of donors)
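The last point, combining two scraped sources, usually comes down to matching records on a shared field such as a name. A minimal sketch, with made-up names and fields standing in for two separately scraped lists:

```python
# Hypothetical records standing in for two separately scraped datasets:
# a list of company directors and a list of political donors.
directors = [
    {"name": "A. Smith", "company": "Acme Ltd"},
    {"name": "B. Jones", "company": "Bolt plc"},
]
donors = [
    {"name": "B. Jones ", "amount_gbp": 5000},
    {"name": "C. Patel", "amount_gbp": 1200},
]

def normalize(name):
    # Names rarely match exactly across sources; trimming whitespace
    # and lower-casing is the minimal first step before comparing.
    return name.strip().lower()

donor_names = {normalize(d["name"]) for d in donors}
overlap = [d for d in directors if normalize(d["name"]) in donor_names]
print(overlap)  # B. Jones appears in both lists
```

In real investigations the matching is the hard part: spelling variants, initials and common names all produce false hits, so an overlap like this is a lead to verify, not a finding.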
Journalists have used web scraping to tell a number of important stories in the public interest. Many investigations have relied on scrapers to pull and organize data from the Web. Now some programmer journalists are working to develop new tools to expand and redefine what web scrapers can do.
[Embedded tweet: Owni Europe (@OwniEU), October 31, 2011, on the state of open data.]