Carl Boettiger

@cboettig
5 Followers
196 Following
38 Posts
Associate Professor, UC Berkeley. Theoretical ecology & evolution, data science, open science.
homepage: https://carlboettiger.info

Really love what https://ecoevo.social/about has set up with approved signup, https://masto.host/ hosting, and open funding https://opencollective.com/ecoevosocial (my recurrent payment setup ✅ )!

Way to go @alxsim !

I think it's time to migrate over... 🤞 err... can one of you wonderful people send me an invite link?

ecoevo.social

Dedicated to Ecology and Evolution. We welcome academics, students, industry scientists, folks from other fields with links to E&E, scientific societies, and nature enthusiasts in general.

Mastodon hosted on ecoevo.social

Proud and excited that our work at @ropensci to provide peer review of statistical #RStats software is reaching fruition. Our first packages have earned their badges after our collaborative, constructive + rigorous peer-review process, helped by community-driven standards and a lot of new automated tools.

Thanks to all our authors, reviewers, and editors, the Sloan Foundation for funding, and a *huge* shout-out to @mpadge who has shepherded this process.

https://ropensci.org/blog/2022/11/30/first-peer-reviewed-stats-packages @rstats

Our First Peer-Reviewed Statistical R Packages!

rOpenSci is very excited to announce our first peer-reviewed statistical R packages! One of rOpenSci’s core programs is software peer-review, where we use best practices from software engineering and academic peer-review to improve scientific software. Through this, we aim to make scientific software more robust, usable, and trustworthy, and build a supportive community of practitioners. Historically, we have focused on R packages that manage the research data life cycle. Now, thanks to work over the past two years supported by the Sloan Foundation, we also facilitate peer-review of packages that implement statistical algorithms.

NOAA and Microsoft have forged a formal agreement to harness Azure’s #cloud computing tools and help advance NOAA’s mission to create a Climate-Ready Nation

https://www.noaa.gov/news-release/noaa-microsoft-team-up-to-advance-climate-ready-nation

#HPC #AI #ML via @hpcnotes

NOAA, Microsoft team up to advance Climate-Ready Nation

NOAA and Microsoft have forged a formal agreement to harness Microsoft’s cloud computing tools and help advance NOAA’s mission to create a Climate-Ready Nation.  “We are excited about the potential of partnering NOAA’s environmental intelligence with Microsoft’s cloud computing in hopes of ampli

For anyone interested in this, I highly recommend taking a look at their recent paper, https://doi.org/10.3390/data4030092, which describes the approach and the underlying gdalcubes C++ library, and compares it to GEE.

#rstats #geospatial #cpp

On-Demand Processing of Data Cubes from Satellite Image Collections with the gdalcubes Library

Earth observation data cubes are increasingly used as a data structure to make large collections of satellite images easily accessible to scientists. They hide complexities in the data such that data users can concentrate on the analysis rather than on data management. However, the construction of data cubes is not trivial and involves decisions that must be taken with regard to any particular analyses. This paper proposes on-demand data cubes, which are constructed on the fly when data users process the data. We introduce the open-source C++ library and R package gdalcubes for the construction and processing of on-demand data cubes from satellite image collections, and show how it supports interactive method development workflows where data users can initially try methods on small subsamples before running analyses on high resolution and/or large areas. Two study cases, one on processing Sentinel-2 time series and the other on combining vegetation, land surface temperature, and precipitation data, demonstrate and evaluate this implementation. While results suggest that on-demand data cubes implemented in gdalcubes support interactivity and allow for combining multiple data products, the speed-up effect also strongly depends on how original data products are organized. The potential for cloud deployment is discussed.

MDPI

dplyr 1.1.0 is coming soon!! 🎉🎉

We are so excited to introduce you to the new features we've been working on, including:
- Temporary inline grouping with `.by`
- Non-equi joins
- Faster `arrange()`

And SO much more! #rstats

Check out the blog post from @davis https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/

dplyr 1.1.0 is coming soon

dplyr 1.1.0 is coming soon! This post introduces some of the exciting new features coming in 1.1.0, and includes a call-for-feedback as we finalize the release.
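
The two headline features named in the post above, inline grouping with `.by` and non-equi joins, can be sketched roughly as follows. This is a minimal illustration that assumes dplyr 1.1.0 or later is installed; the `sales` and `promos` tables and all their column names are hypothetical and invented here, not taken from the blog post.

```r
library(dplyr)  # assumes dplyr >= 1.1.0

# Hypothetical toy data, invented for illustration only
sales <- tibble(
  region = c("east", "east", "west", "west"),
  day    = as.Date(c("2022-01-05", "2022-02-10", "2022-01-20", "2022-03-01")),
  amount = c(10, 20, 5, 15)
)

# Temporary inline grouping: grouping applies to this one verb only,
# so the result comes back ungrouped
totals <- sales |>
  summarise(total = sum(amount), .by = region)

# Hypothetical promotion windows
promos <- tibble(
  promo = c("winter", "spring"),
  start = as.Date(c("2022-01-01", "2022-03-01")),
  end   = as.Date(c("2022-01-31", "2022-03-31"))
)

# Non-equi join: keep each sale whose day falls inside a promotion window
matched <- sales |>
  inner_join(promos, by = join_by(between(day, start, end)))
```

With `.by` there is no need for a separate `group_by()` and `ungroup()` pair, and `join_by()` lets the range condition sit directly in the join rather than in a filter applied after a cross join.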

Mind blown 🤯 from impressive examples and expressive syntax in the new gdalcubes 🛰️ 🌍 from Marius Appel, @edzer & team when working with STAC catalog entries: https://r-spatial.org/r/2021/04/23/cloud-based-cubes.html

Their intuitive, high-level API lets us process some massive datasets with almost no RAM footprint, making these examples accessible for classroom use (e.g. on free-tier codespaces) 🚀 .

#rstats

🚀 I can't explain how proud I am to start this series of @ropensci interviews of _The Stars of R-Universe_ with the work done by @tuqmano, @pablote & their team in opening data in Argentina and using #RStats #RStatsES

@rstats

🤩 And with a bilingual entry (Spanish and English on the blog post and video captions).

Come and check 👇

[EN] https://ropensci.org/blog/2022/11/23/r-universe-stars-1-en/

[ES] https://ropensci.org/blog/2022/11/23/r-universe-stars-1-es/

Meeting the stars of the R-universe: R Community, Exchange and Learn

This is the first post of our interview series __"Meeting the stars of the R-universe"__. We begin our journey in _Argentina_ with a team that uses R and develops R packages in the Argentinean State.

This is an excellent post about Visual Studio Code and how it is designed to fracture.

VSCode is not open-source; it's proprietary software.

While I don't have anything against proprietary software, what Microsoft does here is open-source washing, which feels very malignant.

https://ghuntley.com/fracture/

#VScode
#VisualStudio
#OpenSource
#FOSS
#Programming
#Microsoft

Visual Studio Code is designed to fracture

A couple of moments ago, I finished reading the article by Rob O'Leary about the pervasive data collection done by Visual Studio Code. Now that I'm no longer an employee at Gitpod, I'm finally able to author a blog post freely about something that has been troubling me for quite

Geoffrey Huntley

My favourite trick for working with huge data sets in R. If your dataset is larger than memory and the query result is also larger than memory, you can still use dplyr/arrow pipelines. Example:

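library(arrow)
library(dplyr)

# Open the dataset lazily; nothing is read into memory yet
nyc_taxi <- open_dataset("nyc-taxi/")
nyc_taxi |>
  filter(payment_type == "Credit card") |>
  group_by(year, month) |>           # grouping columns become output partitions
  write_dataset("nyc-taxi-credit")   # result is written to disk, not collected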

Input is 1.7 billion rows (70GB), output is 500 million (15GB). Takes 3-4 mins on my laptop 🙂

#rstats #ApacheArrow
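
For anyone without the taxi data to hand, the same pattern can be tried at small scale. The sketch below is only an illustration: it uses the built-in `mtcars` data as a stand-in for a large dataset, and the directory names under `tempdir()` are hypothetical. It assumes the arrow and dplyr packages are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for a large dataset: write mtcars out as
# partitioned Parquet files in a temporary directory
src <- file.path(tempdir(), "cars-src")
out <- file.path(tempdir(), "cars-out")
write_dataset(mtcars, src, partitioning = "cyl")

# Same shape as the pipeline in the post: the filter is evaluated lazily
# and the result is written straight to disk, partitioned by the
# group_by() column, without being collected into an R data frame
open_dataset(src) |>
  filter(mpg > 20) |>
  group_by(gear) |>
  write_dataset(out)

# Read the (small) result back to check it
result <- open_dataset(out) |> collect()
```

The output directory contains one subdirectory per value of `gear`, which is how the grouping in the original example turns into `year` and `month` partitions.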

#Introduction: I am an Associate Professor at #ucberkeley trained in #EvolutionaryBiology, now mainly in #MicrobialEcology. Work in my group focuses on how interactions between #plants, #microbiomes, and #phages shape natural #diversity and could be used to shape #sustainableagriculture. I really loved #twitter and am sad it was brought down by a megalomaniac with far too much money and ego. So here I am.