We would like to guide our efforts towards improving duckplyr, focusing on the features with the most impact. For this, duckplyr now has opt-in telemetry: https://duckdblabs.github.io/duckplyr/reference/fallback.html .

This has to be enabled explicitly, and is fairly easy to opt out of entirely. We make an effort to anonymize the data before collecting it or uploading it.

I’m aware this is a sensitive topic, and I’m curious to hear your thoughts. Would you opt in, opt out, or upload manually? https://masto.machlis.com/@smach/112055384564988705

Fallback to dplyr — fallback

The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr. To achieve this, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented. Whenever duckplyr encounters an incompatibility, it falls back to dplyr.

To assist future development, the fallback situations can be logged to the console or to a local file and uploaded for analysis. By default, duckplyr will not log or upload anything. The functions and environment variables on this page control the process.

fallback_sitrep() prints the current settings for fallback logging and uploading, the number of reports ready for upload, and the location of the logs.

fallback_review() prints the available reports for review to the console.

fallback_upload() uploads the available reports to a central server for analysis. The server is hosted on AWS and the reports are stored in a private S3 bucket. Only authorized personnel have access to the reports.

fallback_purge() deletes some or all available reports.
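As a rough illustration of the review-before-upload workflow described above, here is a minimal sketch in R. It assumes duckplyr is installed and that the four functions can be called without arguments; that calling convention is an assumption here, not something the text above states. The upload and purge calls are commented out so that nothing leaves the machine or gets deleted unless you choose to run them.

```r
# Sketch only: assumes duckplyr is installed and that these functions
# work when called with no arguments (an assumption, check the reference page).
library(duckplyr)

# Print the current logging and upload settings, the number of reports
# waiting for upload, and where the logs are stored.
fallback_sitrep()

# Print the locally stored reports to the console so you can inspect
# what was recorded before deciding whether to share anything.
fallback_review()

# Optional, after review: upload the available reports for analysis.
# fallback_upload()

# Optional: delete the locally stored reports.
# fallback_purge()
```

Keeping review as a separate step from upload matches the split between data collection and data upload discussed in this thread: you can log locally, read the reports, and then either upload, share selected parts manually, or purge.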

@kirill R needs a telemetry package so you don't have to roll your own, and people then know where to go looking for telemetry settings.
@geospacedman Antoine Fabri has https://github.com/moodymudskipper/bigbrothr, but it needs more attention. Could that be an R Consortium project?
GitHub - moodymudskipper/bigbrothr: Provide Automated Feedback to Package Maintainers on the usage of their package.
@geospacedman And I’m still curious: would you opt in or opt out?
@kirill I'd opt out and add the IP address to my blocklist :)

@geospacedman That’s definitely a valid choice, I hope it’s easy enough with duckplyr.

And I also hope it’s easy enough to do it the other way.

@kirill My main duckdb use is on a Secure TRE and telemetry would get blocked anyway. Takes 24 hours to get approval for data egress!
@geospacedman I have split data collection and data upload; both can be configured separately. Do you have concerns regarding the data collection per se, or only regarding the upload?
@kirill Data collection (locally) is just "logging" by another name, right? (Does it use an extant logging package?) No problem with logging. I never want to involve remote servers with my data analysis in any way whatsoever, though (and yes, I understand this is the default for your system).
@geospacedman Just to be super clear: both logging/data collection and uploads are opt-in. We write to a standard location, but it’s ndjson without a logging package. You could opt in for logging, review the logs and share with us what you think is good to share. How does that feel?