Mastodawn

@modrak_m I don't even think that is going to work. If something is registered as an S3 method, you cannot really address it as an ordinary exported function. (And if you could, it would probably have to be with the triple colon, `:::`.)
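
For illustration, a minimal base-R sketch of the point about registered S3 methods. It is not from the thread: `print.lm` in stats is used only as a stand-in, on the assumption (true in current R releases) that it is registered as a method but not exported.

```r
# print.lm is registered as an S3 method in stats but is not exported.
exported <- "print.lm" %in% getNamespaceExports("stats")

# The double-colon form fails for an unexported object ...
dc <- tryCatch(stats::print.lm, error = function(e) NULL)

# ... while the triple-colon form and getS3method() both reach it.
tc <- stats:::print.lm
s3 <- utils::getS3method("print", "lm")

c(exported = exported, double_colon_failed = is.null(dc))
identical(tc, s3)
```

So the method can be reached, but only through `:::` or `getS3method()`, neither of which is a comfortable thing to put in package code.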

Stas Kolenikov Jan 28

RE: https://fosstodon.org/@R_devs_news/115966889498732323

now that is very sensible

Stas Kolenikov Jan 28

@keefeglise That was a SAS or SPSS programmer. These are case insensitive.

Stas Kolenikov Jan 28

@kirill (I just recall earlier versions of duckplyr having a very explicit `overwrite_dplyr_verbs()` or something like that, which is why I said "overwrite". Getting methods dispatched before the call gets to dplyr is a more graceful way to put it, but it takes three passes of Adv-R to understand.)

Stas Kolenikov Jan 28

Thanks @kirill

OK, more pointedly -- I see that duckplyr code boldly uses methods without namespace prefixing -- `count()` here (https://github.com/tidyverse/duckplyr/blob/fa9b12e72f234524042542039499d361c6a32b14/R/count.R#L92) and `select()` there (https://github.com/tidyverse/duckplyr/blob/fa9b12e72f234524042542039499d361c6a32b14/R/select.R#L39)... so it falls to the S3 system to figure it out. The vignette (https://duckplyr.tidyverse.org/articles/duckdb.html), however, shoves it down the throat with `conflicted::conflict_prefer("filter", "dplyr")`, and I don't think conflicted should be used in (any) package code; it belongs only in analytical code.

Stas Kolenikov Jan 27

2. #' @importFrom duckplyr select filter mutate arrange count summarize

to take these functions and expect that duckplyr will figure out how to fall back onto dplyr when needed.

4/4

P.S. @kirill hope you can shed some light

Stas Kolenikov Jan 27

If not... I can think of two relaxations of the package::function() style rule.

1. Ignore it entirely and just write dplyr / duckplyr verbs as is. This is a ticking bomb, as a `filter` or `select` that is not namespaced could just as well resolve to `stats::filter()`, a time series function, or to `MASS::select()`, which does God only knows what (I hope Ripley and Venables and Bates forgive me for such a reference... but that function is not really documented in its entry in MASS).
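
A minimal sketch of that hazard, assuming a fresh R session with only the default packages attached (so dplyr is not loaded). Nothing here comes from the thread itself.

```r
# With only the default packages attached, a bare `filter` is the
# time series filter from stats, not the dplyr verb.
f <- get("filter", mode = "function")
where <- environmentName(environment(f))
where

# stats::filter() accepts a plain numeric vector without complaint
# and returns a time series object (a two-point moving average here).
res <- stats::filter(1:5, rep(1 / 2, 2))
class(res)
```

Which function a bare `filter` resolves to depends on what happens to be attached at call time, which is exactly what explicit namespacing is meant to rule out.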

3/4

Stas Kolenikov Jan 27

However, my understanding of duckplyr's approach to life and the universe is that it overwrites the dplyr verbs. So if I explicitly declare dplyr:: namespacing in my package functions, I am denying duckplyr the opportunity to take over and provide that 5-10x speedup I am hoping to see. Should I expect that duckplyr::dplyr_verb() will work properly in this context?

2/maybe 4

Stas Kolenikov Jan 27

Shouting to the void: How to properly namespace #duckdb / #duckplyr in my #rstats packages?

One of @hadleywickham's core style recommendations for package development is that every external function needs to be explicitly namespaced:

function_in_my_package <- function(df, x, ...) {
  df |> dplyr::mutate(xx = stringr::str_do_something(x))
  # implicit return
}

1/maybe 4

Stas Kolenikov Nov 22

From https://duckplyr.tidyverse.org/articles/duckdb.html

3. use dd$fun() for functions internal to duckdb and SQL (https://cynkra.github.io/dd/reference/index.html) -- compute string distances on the server with dd$damerau_levenshtein() and dd$jaro_winkler_similarity()

4. Distinguish between "lavish" (materialize right away), "stingy" (never materialize) and "thrifty" (materialize with <1M cells) flavors of duckplyr frames (reset with read_parquet_duckdb(..., prudence = c(cells = 10000, rows = 1000)))

Personal website	https://staskolenikov.net/
Google Scholar	https://scholar.google.com/citations?user=TuJeDtcAAAAJ&hl=en

Interoperability with DuckDB and dbplyr