Christopher Guess

132 Followers
147 Following
113 Posts

Journalist | Comp Sci | Policy | Disaster Relief

Lead Technologist at Duke Reporters’ Lab working on fact-checking. Amateur chess player.

Technically based in Brooklyn, NY

Here's a request: please don't poke fun at Americans. Roughly half of them are heartbroken or even frightened right now. They're the ones who read you. The folk that made this a reality don't, and if they do they give zero fucks what you call them. So maybe just show some compassion.
Fun story: Tim McVeigh was originally arrested for an unregistered handgun, which is no longer an arrestable offense in Oklahoma.
Every time a researcher learns about SciHub a professor gets their tenure approved.
Anyone around #37c3 this week? Would love to meet some folks!
@pbrass Agreed! Almost its own book in its own right
@dkiesow Giving it a read today!
@dkiesow That’s the question! I’ve gotten pretty good at it over the years but I’m sure there’s a ton of different theories and techniques out there.
@dkiesow Things like link path tracing, rescrape timing, and detecting index pages vs. (what I call) "terminal pages" (think an article on a news site, which will basically never change, vs. a section homepage that is constantly updated). Also content extraction from arbitrary pages, and things such as building a scraper that won't accidentally overwhelm a site.
Has anyone ever written a book on web scraping theory not tied to a specific language or framework? Not "web scraping with scrapy" or "web scraping with python" but something along the lines of "advanced web scraping theory"?
@cguess @eNBeWe Hopefully my talk gets chosen, I guess.