| Bio | https://www.urban.org/author/graham-macdonald |
| Blog | https://urban-institute.medium.com/ |
| Data Catalog | https://datacatalog.urban.org/ |
Since Hadley has announced it on Twitter, I'll do the honours on here, though I'll forgo the pirate-speak out of common decency...
There's a new chapter on #ApacheArrow and Parquet data in R4DS. It's mostly based on my work, so please let me know if you spot any problems with the chapter, and I promise to annoy Hadley with a pull request fixing them. #RStats
It's crazy how much better the tooling has got for doing background research and literature reviews, so you can understand what's already been done before tackling a topic.
My current workflow is https://elicit.org/ to discover and quickly summarize papers, then https://www.researchrabbit.ai/ to dive deeper into related papers after that initial scan. What's yours?
My favourite trick for working with huge datasets in R: even if your dataset is larger than memory and the query result is also larger than memory, you can still use dplyr/arrow pipelines. Example:
library(arrow)
library(dplyr)

# Point at the dataset on disk; nothing is loaded into memory yet
nyc_taxi <- open_dataset("nyc-taxi/")

# The pipeline streams from disk to disk batch by batch, writing
# one Hive-style partition per year/month group
nyc_taxi |>
  filter(payment_type == "Credit card") |>
  group_by(year, month) |>
  write_dataset("nyc-taxi-credit")
Input is 1.7 billion rows (70GB), output is 500 million (15GB). Takes 3-4 mins on my laptop 🙂
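A sketch of how you might query the partitioned output afterwards (assuming the same "nyc-taxi-credit" directory and year/month columns from the example above; `collect()` only materializes the final, already-reduced result):

```r
library(arrow)
library(dplyr)

# Open the partitioned dataset; again, lazy, so no data is read yet
credit <- open_dataset("nyc-taxi-credit")

# Filters and aggregations are pushed down to Arrow, so only the
# matching partitions are scanned before collect() brings the
# small summary table into R
credit |>
  filter(year == 2019) |>
  count(month) |>
  collect()
```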
Hi folks! A lot more activity on here, so introducing myself.
I lead the Urban Institute's Technology and Data Science team. I post mostly about our innovative, cutting-edge work in partnership with our top researchers: building new data and analytics tools that help communities, organizations, advocates, and policymakers make better, more equitable decisions.
If that's your space, follow me and I'll likely follow you back!
My Mastodon tips so far:
- Use the home timeline to get info from the people you actually follow (!)
- Use the # Explore timeline to get the dopamine hit from the most popular posts.
I'm currently doing both to wean myself off the addictive Twitter scrolling, but as more of the people I like to follow move here, I'm hoping to stick to just the home timeline going forward.