I (only! but luckily!) recently got introduced to the magic of purrr::reduce(). Thank you, Tobias! I was told about it right as I was unhappily using many for loops in a package, for lack of a better idea. In this post I’ll explain how purrr::reduce() helped me reduce my for loop usage. I also hope that if I’m doing something wrong, someone will come forward and tell me!
This post was featured on the R Weekly podcast by Eric Nantz and Mike Thomas.
Before: many for, much sadness
I was starting from a thing that could be a list, or even a data.frame. Then, for a bunch of variables, I tweaked the thing. My initial coding pattern was therefore:
for (var in variables_vector) {
  thing <- do_something(thing, var, other_argument = other_argument)
}
I was iteratively changing the thing, along a variables_vector, or sometimes a variables_list.
Silly example
Ugh, finding an example is hard. It feels very contrived, but I promise my real-life adoption of purrr::reduce() was life-changing!
# Some basic movie information
movies <- tibble::tribble(
  ~title, ~color, ~elements,
  "Barbie", "pink", "shoes",
  "Oppenheimer", "red", "history"
)
# More information to add to movies
info_list <- list(
  list(title = "Barbie", info = list(element = "sparkles")),
  list(title = "Barbie", info = list(element = "feminism")),
  list(title = "Oppenheimer", info = list(element = "fire"))
)
# Don't tell me this is weirdly formatted data,
# who never obtains weirdly formatted data?!
info_list
#> [[1]]
#> [[1]]$title
#> [1] "Barbie"
#>
#> [[1]]$info
#> [[1]]$info$element
#> [1] "sparkles"
#>
#>
#>
#> [[2]]
#> [[2]]$title
#> [1] "Barbie"
#>
#> [[2]]$info
#> [[2]]$info$element
#> [1] "feminism"
#>
#>
#>
#> [[3]]
#> [[3]]$title
#> [1] "Oppenheimer"
#>
#> [[3]]$info
#> [[3]]$info$element
#> [1] "fire"
add_element <- function(movies, info) {
  movies[movies[["title"]] == info[["title"]],][["elements"]] <-
    toString(c(
      movies[movies[["title"]] == info[["title"]],][["elements"]],
      info[["info"]][[1]]
    ))
  movies
}
Now how do I add each element of the list to the original table? I could type something like:
for (info in info_list) {
  movies <- add_element(movies, info)
}
movies
#> # A tibble: 2 × 3
#>   title       color elements
#>   <chr>       <chr> <chr>
#> 1 Barbie      pink  shoes, sparkles, feminism
#> 2 Oppenheimer red   history, fire
It’s not too bad, really. But since there’s another way, we can change it.
After
With purrr::reduce(),
for (var in variables_vector) {
  thing <- do_something(thing, var)
}
can become
thing <- purrr::reduce(variables_vector, do_something, .init = thing)
And (notice the other argument),
for (var in variables_vector) {
  thing <- do_something(thing, var, other_argument = other_argument)
}
can become
thing <- purrr::reduce(
  variables_vector,
  \(thing, x) do_something(thing, x, other_argument = other_argument),
  .init = thing
)
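A minimal sketch to check that equivalence on something runnable. The do_something() below is a made-up stand-in that appends a suffixed name to a character vector; its body, the input values and the starting object are all assumptions for illustration, not code from a real package.

```r
# Hypothetical stand-in for do_something(): append a suffixed name to `thing`
do_something <- function(thing, var, other_argument) {
  c(thing, paste0(var, other_argument))
}

variables_vector <- c("a", "b", "c")  # made-up variables
other_argument <- "_x"                # made-up constant argument
start <- character(0)                 # the initial "thing"

# for loop version
looped <- start
for (var in variables_vector) {
  looped <- do_something(looped, var, other_argument = other_argument)
}

# purrr::reduce() version: `.init` plays the role of the initial thing,
# and the anonymous function passes the constant argument along
reduced <- purrr::reduce(
  variables_vector,
  \(thing, x) do_something(thing, x, other_argument = other_argument),
  .init = start
)

# Both approaches build the same object
identical(looped, reduced)
```

The accumulated value always comes first in the function's arguments and the current element second, which is why the anonymous function is written as \(thing, x).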
I haven’t completely internalized the pattern above but the documentation of purrr::reduce() states:
“We now generally recommend against using ... to pass additional (constant) arguments to .f. Instead use a shorthand anonymous function:
# Instead of
x |> map(f, 1, 2, collapse = ",")
# do:
x |> map(\(x) f(x, 1, 2, collapse = ","))
This makes it easier to understand which arguments belong to which function and will tend to yield better error messages.”
It might remind you of how things work for dplyr::across() these days.
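As a small sketch of what that recommendation looks like in practice, here are both styles side by side on a made-up input (the list x and the choice of paste() are assumptions for illustration). Both calls are expected to return the same thing; the second just makes it obvious that collapse is an argument of paste().

```r
# Hypothetical input: a list of character vectors to collapse
x <- list(c("a", "b"), c("c", "d"))

# Older style: the extra argument travels through `...`
via_dots <- purrr::map_chr(x, paste, collapse = ",")

# Recommended style: the anonymous function makes it explicit
# that `collapse` belongs to paste()
via_lambda <- purrr::map_chr(x, \(v) paste(v, collapse = ","))

# Same result either way
identical(via_dots, via_lambda)
```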
Back to our silly example!
# Some basic movie information
movies <- tibble::tribble(
  ~title, ~color, ~elements,
  "Barbie", "pink", "shoes",
  "Oppenheimer", "red", "history"
)
# More information to add to movies
info_list <- list(
  list(title = "Barbie", info = list(element = "sparkles")),
  list(title = "Barbie", info = list(element = "feminism")),
  list(title = "Oppenheimer", info = list(element = "fire"))
)
add_element <- function(movies, info) {
  movies[movies[["title"]] == info[["title"]],][["elements"]] <-
    toString(c(
      movies[movies[["title"]] == info[["title"]],][["elements"]],
      info[["info"]][[1]]
    ))
  movies
}
purrr::reduce(info_list, add_element, .init = movies)
#> # A tibble: 2 × 3
#>   title       color elements
#>   <chr>       <chr> <chr>
#> 1 Barbie      pink  shoes, sparkles, feminism
#> 2 Oppenheimer red   history, fire
If we tweak the add_element() function to add a separator argument to it, we can pass that argument through an anonymous function:
add_element <- function(movies, info, separator) {
  movies[movies[["title"]] == info[["title"]],][["elements"]] <-
    paste(c(
      movies[movies[["title"]] == info[["title"]],][["elements"]],
      info[["info"]][[1]]
    ), collapse = separator)
  movies
}
purrr::reduce(
  info_list,
  \(movies, x) add_element(movies, x, separator = " - "),
  .init = movies
)
#> # A tibble: 2 × 3
#>   title       color elements
#>   <chr>       <chr> <chr>
#> 1 Barbie      pink  shoes - sparkles - feminism
#> 2 Oppenheimer red   history - fire
purrr::reduce(
  info_list,
  \(movies, x) add_element(movies, x, separator = " PLUS "),
  .init = movies
)
#> # A tibble: 2 × 3
#>   title       color elements
#>   <chr>       <chr> <chr>
#> 1 Barbie      pink  shoes PLUS sparkles PLUS feminism
#> 2 Oppenheimer red   history PLUS fire
And voilà!
Conclusion
In this post I presented my approximate understanding of purrr::reduce(), which helped me avoid writing some for loops and write more elegant code instead… or at least helped me understand a pattern that I could use elegantly in the future. I can only hope I purrr::accumulate() more experience, as I very much still feel like a newbie.
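Pun aside, purrr::accumulate() is a real sibling of purrr::reduce() that keeps every intermediate value instead of only the last one. A minimal sketch, assuming a made-up append_element() helper and made-up strings (neither is part of the movie example above):

```r
start <- "shoes"                           # made-up starting value
new_elements <- c("sparkles", "feminism")  # made-up elements to add

# Hypothetical helper: append one element to a comma-separated string
append_element <- function(acc, x) paste(acc, x, sep = ", ")

# reduce() returns only the final value
final <- purrr::reduce(new_elements, append_element, .init = start)

# accumulate() returns the starting value plus every intermediate step
steps <- purrr::accumulate(new_elements, append_element, .init = start)

# The last accumulated step is the value reduce() returns
identical(final, steps[[length(steps)]])
```

Looking at the intermediate steps can help when debugging a reduce() call that does not end where you expect.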
For more information I’d recommend reading the documentation of purrr::reduce() to be aware of other features, the content on the reduce family in Advanced R by Hadley Wickham… and release-watching the purrr repo to keep up to date with the latest recommendations. You can also use GitHub Advanced Search to find examples of usage of the function in, say, CRAN packages.
Edit: For another take on, and use case of, purrr::reduce(), June Choe wrote a nice detailed tutorial, “Collapse repetitive piping with reduce()”.