
The Register-Guard’s John Heasly on Web Scraping

Posted by Journalism Accelerator on April 18, 2011

An interview with The Register-Guard’s scraper John Heasly by JA’s Tram Whitehurst

John Heasly is the Web content editor at The Register-Guard in Eugene, OR.

What are some of the ways you’re using web scraping at your newspaper? How has it enhanced your work?

“Here’s one of my favorites. It updates every fifteen minutes and uses this page as its source. Another big scrape is elections… Here’s the full results page.

We also have a smaller customizable widget of selected races that we put on the homepage. It uses both county and state pages for data. The data also gets formatted into InDesign templates and is reverse-published into the paper.”
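The fifteen-minute refresh Heasly describes implies a scraper that re-runs on a schedule. Below is a minimal sketch of one way to do that in Python. It assumes hypothetical `fetch` and `rebuild` callables that stand in for downloading the source page and regenerating the widget. Each run fingerprints the page with SHA-256, and the output is rebuilt only when that fingerprint changes. This is an illustration under those assumptions, not The Register-Guard's code. In production, a cron job is a common alternative to a long-running loop.

```python
import hashlib
import time
from typing import Callable, Optional, Tuple

# Matches the "every fifteen minutes" refresh described in the interview.
POLL_INTERVAL_SECONDS = 15 * 60


def check_for_update(page: bytes, last_digest: Optional[str]) -> Tuple[bool, str]:
    """Fingerprint the fetched page and report whether it differs from the last run."""
    digest = hashlib.sha256(page).hexdigest()
    return digest != last_digest, digest


def run_forever(
    fetch: Callable[[], bytes],         # hypothetical: downloads the source page
    rebuild: Callable[[bytes], None],   # hypothetical: regenerates the widget/output
    interval: float = POLL_INTERVAL_SECONDS,
) -> None:
    """Re-scrape on a fixed interval; rebuild output only when the page changed."""
    last_digest: Optional[str] = None
    while True:
        page = fetch()
        changed, last_digest = check_for_update(page, last_digest)
        if changed:
            rebuild(page)
        time.sleep(interval)
```

Some source pages embed a value that changes on every request, such as a "last updated" timestamp. In that case, hashing the whole body reports a change on every run. Hashing only the extracted data avoids that problem.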

Where do you see the technology headed? Are there new ways in which it can be used?

“I don’t see any wildly revolutionary changes in the field of screen-scraping, as it’s kind of a hack/workaround to begin with. I think as data publishers (news sources, governments) get their acts together, there will be more APIs, so people can get at the data directly. I think the ease of geolocating events is going to continually increase.”

Who else is doing web scraping well?

“Well, any of the EveryBlock sites, of course. The L.A. Times’ crime map recently came to my attention. It seems pretty amazing.”

What is your advice to journalists looking to start using web scraping in their own work?

“Find a problem, attack it! Make sure it’s something you’re passionate about; otherwise, when you hit a bump (and you will hit bumps), you’ll get derailed. Ask questions in newsgroups, Google Groups, help.hackshackers.com. The open-source software and the advice/help are free. All you need is a computer, an Internet connection and an appropriate pig-headedness, and you’re set!”

What are your favorite Web scraping tools and guides?

“I like Python as a language. I like the Python module BeautifulSoup for taming what I’ve scraped and Django as a Web framework for serving the scrapings.”
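Heasly uses BeautifulSoup to tame scraped HTML. This sketch stays dependency-free by swapping in Python's standard-library `html.parser`. It pulls the rows of a results table into dictionaries. The sample markup, the `Candidate` and `Votes` column names, and the `TableRowParser`/`parse_results` helpers are all hypothetical. None of them come from a real election page.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical results markup, standing in for a county or state results page.
SAMPLE = """
<table class="results">
  <tr><th>Candidate</th><th>Votes</th></tr>
  <tr><td>Candidate A</td><td>12,345</td></tr>
  <tr><td>Candidate B</td><td>9,876</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect cell text from every <tr> on a page, one list per row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: Optional[List[str]] = None  # text fragments of the open cell

    def _end_cell(self) -> None:
        if self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._end_row()  # flush anything left open at end of input


def parse_results(html: str) -> List[dict]:
    """Turn a results table (header row first) into dicts with integer vote counts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    records = []
    for row in body:
        record = dict(zip(header, row))
        record["Votes"] = int(record["Votes"].replace(",", ""))
        records.append(record)
    return records
```

A hand-rolled parser like this can cope with a few omitted closing tags, but not much more. BeautifulSoup tolerates badly formed markup far better, which is a good reason to reach for it on real-world pages.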
