What I edit when refactoring a test file

2024/05/23

This post was featured on the R Weekly highlights podcast hosted by Eric Nantz and Mike Thomas.

I’m currently refactoring test files in a package. Beside some automatic refactoring, I am also manually updating lines of code. Here are some tips (or pet peeves, based on how I look at it / how tired I am 😁)

Prequel: please read the R packages book

The new edition of the R Packages book by Hadley Wickham and Jenny Bryan features three chapters on testing, all well worth a read. The “High-level principles for testing” in particular are good to keep in mind.

Use informative variable or function names

If your test is

test_that("function_maelle_does_not_know_well() works", {
  q <- function_maelle_does_not_know_well(1:10)
  ab <- function(x) {
    y <- exp(x) + 42 - 7
    round(log(y / 50))
}
  expect_equal(q, ab(6))
})

you might want to rename q and ab to something more informative. Note that this is a fictional example, it does not make any mathematical sense.

test_that("function_maelle_does_not_know_well() works", {
  transformed_vector <- function_maelle_does_not_know_well(1:10)
  crude_transformation_approximation <- function(x) {
    y <- exp(x) + 42 - 7
    round(log(y / 50))
}
  expect_equal(transformed_vector, crude_transformation_approximation(6))
})

Imagine it will be me reading your test file, and I know nothing about your use case. More seriously, that’s one of the numerous aspects code review can help with.

Vary variable names within tests

To prevent copy-pasting mistakes and reading fatigue,

test_that("awesome_function() works", {
  x <- awesome_function(1:10, method = "easy")
  expect_typeof(x, "double")
  expect_gt(x, 5)
  
  x <- awesome_function(1:10, method = "medium")
  expect_typeof(x, "double")
  expect_gt(x, 15)
  
  x <- awesome_function(1:10, method = "hard")
  expect_typeof(x, "double")
  expect_gt(x, 25)
  
})

is in my opinion better as

test_that("awesome_function() works", {
  easy_output <- awesome_function(1:10, method = "easy")
  expect_typeof(easy_output, "double")
  expect_gt(easy_output, 5)
  
  medium_output <- awesome_function(1:10, method = "medium")
  expect_typeof(medium_output, "double")
  expect_gt(medium_output, 15)
  
  hard_output <- awesome_function(1:10, method = "hard")
  expect_typeof(hard_output, "double")
  expect_gt(hard_output, 25)
  
})

I do not hate it when several tests use the same variable names though.

Imagine I made the code chunk from above

test_that("awesome_function() works", {
  easy_output <- awesome_function(1:10, method = "easy")
  medium_output <- awesome_function(1:10, method = "medium")
  hard_output <- awesome_function(1:10, method = "hard")
  
  expect_typeof(easy_output, "double")
  expect_gt(easy_output, 5)
  expect_typeof(medium_output, "double")
  expect_gt(medium_output, 15)
  expect_typeof(hard_output, "double")
  expect_gt(hard_output, 25)
  
})

I much prefer the version

test_that("awesome_function() works", {
  easy_output <- awesome_function(1:10, method = "easy")
  expect_typeof(easy_output, "double")
  expect_gt(easy_output, 5)
  
  medium_output <- awesome_function(1:10, method = "medium")
  expect_typeof(medium_output, "double")
  expect_gt(medium_output, 15)
  
  hard_output <- awesome_function(1:10, method = "hard")
  expect_typeof(hard_output, "double")
  expect_gt(hard_output, 25)
  
})

because reading the expectations doesn’t require me to go too far back or to remember too much.

In some cases, if the object creation is simple, and it is used in one expectation only, why not put it inside the expectation rather than having to assign it to an object with an informative name?

the_log <- log(x)
expect_gt(the_log, 1)

vs.

expect_gt(log(x), 1)

Split the tests as needed, navigate them seamlessly

Splitting the tests (into several test_that()) means each one is more focused, easier to read, easier to investigate when failing (tests tend to fail).

In RStudio IDE, each test_that() calls is shown in the script outline (the table of contents on the right) so you can hop from one to the other at will.

Use a bug summary, not number, as test name

Instead of

test_that("Bug #666 is fixed", {
  ...
})

I prefer

test_that("awesome_function() does not remove the ordering", {
  # <link-to-bug-report>
  ...
})

because:

I can understand what the test is about without clicking the link;
If your repository moved for instance from a monorepo to a regular repository, the hyperlink added by RStudio IDE might no longer be valid.

Conclusion

In this post I summarized my testing preferences du jour. Do you have any tips for refactoring test scripts?