This would be useful in packages to avoid CRAN warnings, as pick() is now the preferred approach.
library(tidyverse)
options(lifecycle_verbosity = "error")
mtcars <- utils::head(mtcars, n = 6)

# distinct -----------
# renaming generally makes a lot of sense, as distinct is like `transmute(...)` + `distinct()`
# consider this in EDA
mtcars |>
  distinct(x = vs)
#>            x
#> Mazda RX4  0
#> Datsun 710 1

# as it is equivalent to: (to avoid CRAN warnings)
mtcars |>
  distinct(x = .data$vs)

# Before dplyr 1.1, `across()` was recommended, which works,
# but relying on .fns = NULL is discouraged and slated for soft-deprecation.
mtcars |>
  distinct(across(all_of(c(x = "vs"))))

# The alternative when you may not know variable names in a package
# with `pick()` should therefore work too, but errors.
# I think it should work
mtcars |>
  distinct(pick(x = "vs"))
#> Error in `distinct()`:
#> ℹ In argument: `pick(x = "vs")`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.

# Similarly:
mtcars |>
  distinct(pick(all_of(c(x = "vs"))))

# I would expect distinct() and transmute() + distinct() to be same as above (and it is!) ---
# (transmute() is still very useful as it is mutate + select in a single call)
mtcars |>
  transmute(x = vs) |>
  distinct()
#>            x
#> Mazda RX4  0
#> Datsun 710 1

mtcars |>
  transmute(x = .data$vs) |>
  distinct()

# I think this should be allowed.
mtcars |>
  transmute(pick(all_of(c(x = "vs")))) |>
  distinct()
#> Error in `transmute()`:
#> ℹ In argument: `pick(all_of(c(x = "vs")))`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.
Now comes arrange()...
arrange()
While I believe distinct() should accept renaming under any circumstances, I believe that accepting renaming in arrange() is inconsistent, as arrange() ignores it silently. Renaming (or an attempt to do so) should error in all cases for arrange().
However, it currently accepts and rejects exactly the same inputs as distinct() (so it is consistent in that sense, but it ignores the new names).
# arrange() is same as distinct() but silently ignores renaming...
# no x variable
mtcars |>
  arrange(x = vs)

# no x variable
mtcars |>
  arrange(x = .data$vs)

# no x variable
mtcars |>
  arrange(across(all_of(c(x = "vs"))))

# this rightfully errors!
mtcars |>
  arrange(pick(x = "vs"))
#> Error in `arrange()`:
#> ℹ In argument: `..1 = pick(x = "vs")`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.
Maybe a soft-deprecation warning would be desirable when attempting to rename in arrange().
Something like "names are ignored; use unnamed inputs, or create this variable with mutate() to keep it".
# A beginner may think that this will create the variables, as nothing is stopping them.
y <- mtcars |> arrange(x = vs, y = desc(disp))

# Later they try to access the variable and get a warning or, even worse, this confusing data frame.
y |> mutate(z = y**2) # gets very bad output!
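To make the idea concrete, here is a minimal sketch of how such a signal could behave. `arrange_strict()` is a hypothetical wrapper written for illustration only; it is not part of dplyr, and the warning text is just one possible wording.

```r
library(dplyr)

# Hypothetical wrapper (not a dplyr function): warns when any input
# to arrange() is named, then sorts with the names dropped.
arrange_strict <- function(.data, ...) {
  dots <- rlang::enquos(...)
  dot_names <- rlang::names2(dots) # "" for unnamed inputs
  named <- dot_names != ""

  if (any(named)) {
    warning(
      "Named inputs are ignored in arrange(): ",
      paste(dot_names[named], collapse = ", "),
      ". Please omit the names.",
      call. = FALSE
    )
  }

  arrange(.data, !!!unname(dots))
}

# Warns about `x`, and the result has no `x` column.
sorted <- suppressWarnings(arrange_strict(mtcars, x = vs, desc(disp)))
"x" %in% names(sorted)
```

An actual implementation inside dplyr would more likely go through the lifecycle package so that the warning can be escalated over several releases.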
Suggestions for signaling that arrange() ignores renaming.
(preferred) soft-deprecation warning
something like: "Named inputs are ignored in arrange(). Please omit the names."
in all_of(named_external_selection), the adjustment required would just be all_of(unname(named_external_selection))
if you want to create a new variable or rename one, use rename() or mutate(), as arrange() silently ignores the names provided.
using the tidyselect deprecation of external selection as an example
it seems pretty similar to the deprecation of select(.data$vs) in favour of select("vs"), plus requiring any_of() / all_of() for external vectors. It ended up increasing the general consistency of the code base and improving clarity.
# Before, when you read this in someone else's code (i.e. with no context)
mtcars |>
  select(vs)

# you had no idea if the data had a `vs` column, or `vs` was an external vector
# the alternative made everything much clearer.
mtcars |>
  select(all_of(vs))
My proposed change would force users to remove this potentially misleading code.
It would have the benefit that no one would question whether the resulting data frame has an x column or not.
#6980 would be addressed automatically as part of this idea, i.e. no longer allowing named selection would act a bit like a check_dots_used() condition.
# before
mtcars |>
  arrange(x = vs)

# this code is clearer about what it does (and gives the same result as above)
# proposed
mtcars |>
  arrange(vs)
Breaking change: respect renaming in arrange(), i.e. arrange() == mutate() + arrange(). The downsides probably outweigh the benefits at this point.
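The unname() adjustment mentioned in the first suggestion could look like the sketch below. The vector `named_external_selection` is an illustrative stand-in for whatever named selection a package receives from its caller, and `pick()` requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Illustrative named external selection (names would be silently ignored by arrange()).
named_external_selection <- c(x = "vs", y = "disp")

# Dropping the names makes it explicit that only sorting is intended.
sorted <- mtcars |>
  arrange(pick(all_of(unname(named_external_selection))))

# Same ordering as spelling the columns out by hand.
identical(sorted, arrange(mtcars, vs, disp))
```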
Quoting the 1.1 release notes:
across(), c_across(), if_any(), and if_all() now require the .cols and .fns arguments. In general, we now recommend that you use pick() instead of an empty across() call or across() with no .fns (e.g. across(c(x, y)). (#6523).
Relying on the previous default of .fns = NULL is not yet formally soft-deprecated, because there was no good alternative until now, but it is discouraged and will be soft-deprecated in the next minor release.
I think that allowing renaming in pick() would allow you to soft-deprecate .fns = NULL of across and warn on distinct() + across().
Edit: pick() should also allow renaming in group_by()
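For reference, a possible workaround with current dplyr is to do the renaming in select() or rename(), both of which accept a named all_of() selection, before calling the verb. This is only a sketch; `lookup` is an illustrative name, and it does not replace the requested pick() behavior.

```r
library(dplyr)

# Illustrative external lookup: new name = existing column.
lookup <- c(x = "vs")

# distinct() with renaming: rename while selecting, then deduplicate.
distinct_x <- mtcars |>
  select(all_of(lookup)) |>
  distinct()

# group_by() with renaming: rename first, then group by the new names.
grouped <- mtcars |>
  rename(all_of(lookup)) |>
  group_by(pick(all_of(names(lookup))))

names(distinct_x)
group_vars(grouped)
```

The cost is an extra step in the pipeline, which is exactly what renaming inside pick() would avoid.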
+1 I agree that the current renaming behavior in pick() via pick(new = old) is underspecified, and I struggle to develop a good mental model of when/how it should work.
To revive the discussion, can I also raise an alternative that's similar in spirit but at the opposite extreme in implementation? What if we disallow renaming inside pick() via new = old entirely (so, tidyselect with allow_rename = FALSE), but port the .names argument from across()? For one, I don't think renaming with pick(new = old) is a common pattern that people have picked up on yet, so it doesn't feel as breaking (I'm not even sure whether this behavior is acknowledged or encouraged from reading the docs - it seems to be supported through technicality). More importantly, I think renaming via .names instead would greatly simplify the problem since then it just becomes a question of how different dplyr verbs consume data frames (which sometimes contain new column names), and the behavior for that is already well defined and familiar (e.g., users can translate their existing experience with across(.names)). So I think .names and the act of "passing entire dataframes to dplyr verbs" is a bit more concrete and accessible than having to reason about which verbs invoke a renaming context, which feels a bit more abstract.
Some consequences of what that would look like:
library(dplyr)

# Assume a `pick2()` implementation with `.names`
pick2 <- function(..., .names) {
  across(..., .fns = identity, .names = .names)
}

df <- data.frame(a = c(1, 1, 2), b = c("a", "a", "b"))
df
#>   a b
#> 1 1 a
#> 2 1 a
#> 3 2 b

df %>%
  mutate(pick2(a:b, .names = "{toupper(.col)}"))
#>   a b A B
#> 1 1 a 1 a
#> 2 1 a 1 a
#> 3 2 b 2 b

df |>
  transmute(pick2(a, .names = "x"))
#>   x
#> 1 1
#> 2 1
#> 3 2

df |>
  distinct(pick2(a, .names = "x"))
#>   x
#> 1 1
#> 2 2

# Solves "renaming in `group_by()`" for free
df |>
  group_by(pick2(a, .names = "x"))
#> # A tibble: 3 × 3
#> # Groups:   x [2]
#>       a b         x
#>   <dbl> <chr> <dbl>
#> 1     1 a         1
#> 2     1 a         1
#> 3     2 b         2
Note that the oddity of the arrange() case still remains a separate problem. In the .names approach, it's just treated like sorting with an external vector. But with .names I feel like there's less of an expectation that you'd create a new column: I think the special .names syntax should sufficiently discourage people from expecting a new column to be created (vs. the new = old syntax).
df |>
  arrange(pick2(a, .names = "x"))
#>   a b
#> 1 1 a
#> 2 1 a
#> 3 2 b
One obvious downside is that pick() would lose the select+rename combo via any_of(<named vector>), which admittedly would be pretty annoying for users who've been enjoying that lean syntax (though I suspect it's not frequently used for pick()).
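For readers unfamiliar with that combo, this is roughly what it looks like in select(), where renaming through a named vector is supported. The `lookup` vector and its `not_a_column` entry are made up for illustration.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2), b = c("a", "a", "b"))

# Illustrative lookup: new name = old name; any_of() skips names that are absent.
lookup <- c(x = "a", z = "not_a_column")

# Selects `a`, renames it to `x`, and silently ignores the missing column.
renamed <- df |>
  select(any_of(lookup))

names(renamed)
```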