Graham MacDonald

@grahamimac
140 Followers
90 Following
10 Posts
Chief Information Officer, Urban Institute. All opinions are my own.
Bio: https://www.urban.org/author/graham-macdonald
Blog: https://urban-institute.medium.com/
Data Catalog: https://datacatalog.urban.org/
Our year end wrap-up here at Urban: Check out how we do our work, behind the scenes, in our top 5 Data@Urban posts of 2022: https://urban-institute.medium.com/data-urbans-top-posts-of-2022-5f913e2b5196
Since Hadley has announced it on Twitter, I'll do the honours on here, but I'll forgo the pirate-speak out of common decency...

There's a new chapter on #ApacheArrow and Parquet data in R4DS. It's mostly based on my work so please let me know if you spot any problems with the chapter and I promise to annoy Hadley with a pull request fixing it #RStats

https://r4ds.hadley.nz/arrow.html


It's crazy how much better the tooling has gotten for background research and literature reviews, so you can understand what's already been done before tackling a topic.

My workflow currently is https://elicit.org/ to discover and quickly summarize papers and https://www.researchrabbit.ai/ to dive deeper into related papers after that initial scan. What's yours?

My favourite trick for working with huge datasets in R: even if your dataset is larger than memory and the query result is also larger than memory, you can still use dplyr/arrow pipelines. Example:

library(arrow)
library(dplyr)

# Lazily open the on-disk dataset; nothing is loaded into memory yet
nyc_taxi <- open_dataset("nyc-taxi/")

# Streams through the data batch by batch, writing the result
# to disk partitioned by the grouping variables (year/month)
nyc_taxi |>
  filter(payment_type == "Credit card") |>
  group_by(year, month) |>
  write_dataset("nyc-taxi-credit")

Input is 1.7 billion rows (70GB), output is 500 million (15GB). Takes 3-4 mins on my laptop 🙂

#rstats #ApacheArrow
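The nice part is that the output of the pipeline above is itself an Arrow dataset, partitioned by year and month, so you can query it lazily too. A minimal sketch of reading it back (column names here are assumed from the post, and `collect()` only pulls the final small summary into memory):

```r
library(arrow)
library(dplyr)

# Re-open the partitioned output lazily; year/month are recovered
# from the directory structure written by write_dataset()
open_dataset("nyc-taxi-credit") |>
  filter(year == 2019) |>        # prune to matching partitions only
  count(month) |>                # aggregation runs in Arrow, not R
  collect()                      # materialize just the tiny result
```

Because the filter matches the partition layout, Arrow only touches the files for that year rather than scanning all 500 million rows.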

Alright folks, I've made the jump from Twitter and will be fully committed here, as I'm finding it much more useful professionally. If there are folks you're loving on here that I should follow, let me know!

Hi folks! A lot more activity on here, so introducing myself.

I lead the Urban Institute's Technology and Data Science team. I post mostly about our cutting-edge work in partnership with our top researchers: building new data and analytics tools that help communities, organizations, advocates, and policymakers make better, more equitable decisions.

If that's your space, follow me and I'll likely follow you back!

My Mastodon tips so far:

- Use the home timeline to get info from the people you actually follow (!)
- Use the # Explore timeline to get the dopamine hit from the most popular posts.

I'm currently doing both to wean myself off the addictive Twitter scrolling, but as more of the people I like to follow move here, I'm hoping to just stick to the home timeline going forward.

How do federal government statistical agencies use data science in their work? A summary from Statistics Canada: https://hdsr.mitpress.mit.edu/pub/x0l4x099/release/1#data-science-applications