2 Syntax

2.1 Object names

“There are only two hard things in Computer Science: cache invalidation and naming things.”

— Phil Karlton

Variable and function names should use only lowercase letters, numbers, and _. Use underscores (_) to separate words within a name (so-called snake case).

# Good
day_one
day_1

# Bad
DayOne
dayone

Base R uses dots in function names (contrib.url()) and class names (data.frame), but it’s better to reserve dots exclusively for the S3 object system. In S3, methods are given the name function.class; if you also use . in function and class names, you end up with confusing methods like as.data.frame.data.frame().
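As a minimal sketch of how S3 reads the dot, the example below defines a method for the base generic format(). The class name temperature and the constructor new_temperature() are hypothetical and exist only for illustration.

```r
# Hypothetical constructor for an S3 class called "temperature"
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# An S3 method is named generic.class, so this is the format() method
# for objects of class "temperature"
format.temperature <- function(x, ...) {
  paste0(x$celsius, " degrees C")
}

today <- new_temperature(21.5)
formatted <- format(today)  # dispatches to format.temperature()
```

Because the dot already separates the generic from the class here, a class name that itself contained dots would make the method name harder to parse by eye.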

If you find yourself attempting to cram data into variable names (e.g. model_2018, model_2019, model_2020), consider using a list or data frame instead.
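One way this could look is sketched below, with a named list holding one model per year. The yearly_data values are made up purely for illustration, and the choice of lm() is an assumption, not something the guideline prescribes.

```r
# Hypothetical yearly data, invented for this example
yearly_data <- list(
  "2018" = data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1)),
  "2019" = data.frame(x = 1:5, y = c(1.8, 4.1, 5.9, 8.2, 9.9)),
  "2020" = data.frame(x = 1:5, y = c(2.2, 4.0, 6.1, 8.0, 10.2))
)

# One named list of models instead of model_2018, model_2019, model_2020
models <- lapply(yearly_data, function(data) lm(y ~ x, data = data))

# The year is now data, so it can be used to look up or iterate
slope_2019 <- coef(models[["2019"]])[["x"]]
slopes <- vapply(models, function(model) coef(model)[["x"]], numeric(1))
```

Adding another year then means adding one list element rather than inventing another variable name.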

Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).

# Good
day_one

# Bad
first_day_of_the_month
djm1

Where possible, avoid re-using names of common functions and variables. Doing so will confuse readers of your code.

# Bad
T <- FALSE
c <- 10
mean <- function(x) sum(x)

2.2 Spacing

2.2.1 Commas

Always put a space after a comma, never before, just like in regular English.

# Good
x[, 1]

# Bad
x[,1]
x[ ,1]
x[ , 1]

2.2.2 Parentheses

Do not put spaces inside or outside parentheses for regular function calls.

# Good
mean(x, na.rm = TRUE)

# Bad
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )

Place a space before and after () when used with if, for, or while.

# Good
if (debug) {
  show(x)
}

# Bad
if(debug){
  show(x)
}

Place a space after () used for function arguments:

# Good
function(x) {}

# Bad
function (x) {}
function(x){}

2.2.3 Embracing

The embracing operator, {{ }}, should always have inner spaces to help emphasise its special behaviour:

# Good
max_by <- function(data, var, by) {
  data |>
    group_by({{ by }}) |>
    summarise(maximum = max({{ var }}, na.rm = TRUE))
}

# Bad
max_by <- function(data, var, by) {
  data |>
    group_by({{by}}) |>
    summarise(maximum = max({{var}}, na.rm = TRUE))
}

2.2.4 Infix operators

Most infix operators (==, +, -, <-, etc.) should always be surrounded by spaces:

# Good
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

# Bad
height<-feet*12+inches
mean(x, na.rm=TRUE)

There are a few exceptions, which should never be surrounded by spaces:

The operators with high precedence: ::, :::, $, @, [, [[, ^, unary -, unary +, and :.

# Good
sqrt(x^2 + y^2)
df$z
x <- 1:10

# Bad
sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10

Single-sided formulas when the right-hand side is a single identifier.

# Good
~foo
tribble(
  ~col1, ~col2,
  "a",   "b"
)

# Bad
~ foo
tribble(
  ~ col1, ~ col2,
  "a", "b"
)

Note that single-sided formulas with a complex right-hand side do need a space.

# Good
~ .x + .y

# Bad
~.x + .y

The tidy evaluation operators !! (bang-bang) and !!! (bang-bang-bang), because they have precedence equivalent to unary -/+.
```
# Good
call(!!xyz)

# Bad
call(!! xyz)
call( !! xyz)
call(! !xyz)
```

The help operator.

# Good
package?stats
?mean

# Bad
package ? stats
? mean

2.2.5 Extra spaces

Adding extra spaces is ok if it improves alignment of = or <-.

# Good
list(
  total = a + b + c,
  mean  = (a + b + c) / n
)

# Also fine
list(
  total = a + b + c,
  mean = (a + b + c) / n
)

Do not add extra spaces to places where space is not usually allowed.

2.3 Vertical space

Use vertical whitespace sparingly, and primarily to separate your “thoughts” in code, much like paragraph breaks in prose.

Avoid empty lines at the start or end of functions.
Only use a single empty line when needed to separate functions or pipes.
It often makes sense to put an empty line before a comment block, to help visually connect the explanation with the code that it applies to.
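The sketch below applies these points to a single function: no empty line at the start or end of the body, and one empty line before the comment block so that the comment reads together with the code it explains. rescale() is a hypothetical helper used only for illustration.

```r
# Hypothetical helper, used only to illustrate vertical spacing
rescale <- function(x) {
  rng <- range(x, na.rm = TRUE)

  # Guard against constant input, where the range has zero width
  # and the division below would produce NaN
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))
  }

  (x - rng[1]) / (rng[2] - rng[1])
}
```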

2.4 Function calls

2.4.1 Named arguments

A function’s arguments typically fall into two broad categories: one supplies the data to compute on; the other controls the details of computation. When you call a function, you typically omit the names of data arguments, because they are used so commonly. If you override the default value of an argument, use the full name:

# Good
mean(1:10, na.rm = TRUE)

# Bad
mean(x = 1:10, , FALSE)
mean(, TRUE, x = c(1:10, NA))

Avoid partial matching, where you supply a unique prefix of a function argument.

# Good
rep(1:2, times = 3)
cut(1:10, breaks = c(0, 4, 11))

# Bad
rep(1:2, t = 3)
cut(1:10, br = c(0, 4, 11))

2.4.2 Assignment

Avoid assignment in function calls:

# Good
x <- complicated_function()
if (nzchar(x) < 1) {
  # do something
}

# Bad
if (nzchar(x <- complicated_function()) < 1) {
  # do something
}

The only exception is in functions that capture side-effects:

output <- capture.output(x <- f())

2.4.3 Long function calls

Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function or use early returns to reduce the nesting in your code.
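As a rough illustration of how early returns reduce nesting, the two hypothetical functions below implement the same logic. Both names are invented for this sketch.

```r
# Hypothetical function, written with nested conditions
describe_nested <- function(x) {
  if (length(x) > 0) {
    if (is.numeric(x)) {
      paste("mean:", mean(x))
    } else {
      "not numeric"
    }
  } else {
    "empty"
  }
}

# The same logic with early returns: each special case exits first,
# so the main computation sits at the lowest indentation level
describe_flat <- function(x) {
  if (length(x) == 0) {
    return("empty")
  }
  if (!is.numeric(x)) {
    return("not numeric")
  }
  paste("mean:", mean(x))
}
```

The flat version leaves more horizontal room for the main computation, which makes the 80 character limit easier to respect.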

If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing ). This makes the code easier to read and to change later.

# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )

As described under Named arguments, you can omit the argument names for very common arguments (i.e. for arguments that are used in almost every invocation of the function). If this introduces a large disparity between the line lengths, you may want to supply names anyway:

# Good
my_function(
  x,
  long_argument_name,
  extra_argument_a = 10,
  extra_argument_b = c(1, 43, 390, 210209)
)

# Also good
my_function(
  x = x,
  y = long_argument_name,
  extra_argument_a = 10,
  extra_argument_b = c(1, 43, 390, 210209)
)

You may place multiple unnamed arguments on the same line if they are closely related to each other. A common example of this is creating strings with paste() or paste0(). In such cases, it’s often beneficial to match one line of code to one line of output.

# Good
paste0(
  "Requirement: ", requires, "\n",
  "Result: ", result, "\n"
)

# Bad
paste0(
  "Requirement: ", requires,
  "\n", "Result: ",
  result, "\n")

2.5 Braced expressions

Braced expressions, {}, define the most important hierarchy of R code, allowing you to group multiple R expressions together into a single expression. The most common places to use braced expressions are in function definitions, control flow, and in certain function calls (e.g. tryCatch() and test_that()).

To make this hierarchy easy to see:

{ should be the last character on the line. Related code (e.g., an if clause, a function declaration, a trailing comma, …) must be on the same line as the opening brace.
The contents should be indented by two spaces.
} should be the first character on the line.

# Good
if (y < 0 && debug) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}

test_that("call1 returns an ordered factor", {
  expect_s3_class(call1(x, y), c("factor", "ordered"))
})

tryCatch(
  {
    x <- scan()
    cat("Total: ", sum(x), "\n", sep = "")
  },
  interrupt = function(e) {
    message("Aborted by user")
  }
)

# Bad
if (y < 0 && debug) {
message("Y is negative")
}

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }

It is occasionally useful to have empty braced expressions, in which case it should be written {}, with no intervening space.

# Good
function(...) {}

# Bad
function(...) { }
function(...) {

}

2.6 Control flow

2.6.1 Loops

R defines three types of looping constructs: for, while, and repeat loops.

The body of a loop must be a braced expression.

# Good
for (i in seq) {
  x[i] <- x[i] + 1
}

while (waiting_for_something()) {
  cat("Still waiting...")
}

# Bad
for (i in seq) x[i] <- x[i] + 1

while (waiting_for_something()) cat("Still waiting...")

It is occasionally useful to use a while loop with an empty braced expression body to wait. As mentioned in Braced expressions, there should be no space within the {}.
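A minimal sketch of such a waiting loop follows. The still_waiting() function and the 0.1 second deadline are hypothetical stand-ins for whatever condition is actually being polled.

```r
# Hypothetical condition: TRUE until `deadline` has passed
deadline <- Sys.time() + 0.1
still_waiting <- function() {
  Sys.time() < deadline
}

# Busy-wait with an empty body, written {} with no space inside
while (still_waiting()) {}
```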

2.6.2 If statements

A single line if statement must never contain braced expressions. You can use single line if statements for very simple statements that don’t have side-effects and don’t modify the control flow.

# Good
message <- if (x > 10) "big" else "small"

# Bad
message <- if (x > 10) { "big" } else { "small" }

if (x > 0) message <- "big" else message <- "small"

if (x > 0) return(x)

A multiline if statement must contain braced expressions.

# Good
if (x > 10) {
  x * 2
}

if (x > 10) {
  x * 2
} else {
  x * 3
}

# Bad
if (x > 10)
  x * 2

# In particular, this if statement will only parse when wrapped in a braced
# expression or call
{
  if (x > 10)
    x * 2
  else
    x * 3
}

When present, else should be on the same line as }.

Avoid implicit type coercion (e.g. from numeric to logical) in the condition of an if statement:

# Good
if (length(x) > 0) {
  # do something
}

# Bad
if (length(x)) {
  # do something
}

& and | should never be used inside of an if clause because they can return vectors. Always use && and || instead.
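The difference can be seen in a small sketch; the variable names are arbitrary.

```r
x <- c(1, -2, 3)

# & works element by element and returns a vector as long as its inputs
elementwise <- x > 0 & x < 2

# && is meant for length-one operands: it returns a single TRUE or FALSE
# and only evaluates its right-hand side when the left-hand side is TRUE
y <- NULL
safe <- !is.null(y) && y > 0

# With & the right-hand side is still evaluated, and the result here has
# length zero, which is not a valid condition for if
unsafe <- !is.null(y) & y > 0
```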

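ifelse(x, a, b) is not a drop-in replacement for if (x) a else b. ifelse() is vectorised (i.e. the result has the same length as x, and a and b are recycled to match) and it is eager (i.e. a and b are evaluated in full rather than element by element, so when x contains both TRUE and FALSE values, both a and b are evaluated).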
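A short sketch of how the two differ in practice; the values are arbitrary.

```r
# if returns whichever branch is selected, unchanged
from_if <- if (TRUE) c(1, 2) else 3

# ifelse() returns a value with the length of its test, so only the
# first element of c(1, 2) survives here
from_ifelse <- ifelse(TRUE, c(1, 2), 3)

# With a longer test, the yes and no values are recycled to match
recycled <- ifelse(c(TRUE, FALSE, TRUE), "a", "b")
```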

2.6.3 Control flow modifiers

Syntax that affects control flow (like return(), stop(), break, or next) should always go in its own {} block:

# Good
if (y < 0) {
  stop("Y is negative")
}

find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}

for (x in xs) {
  if (is_done(x)) {
    break
  }
}

# Bad
if (y < 0) stop("Y is negative")

find_abs <- function(x) {
  if (x > 0) return(x)
  x * -1
}

for (x in xs) {
  if (is_done(x)) break
}

2.6.4 Switch statements

Avoid position-based switch() statements (i.e. prefer names).
Each element should go on its own line unless all elements can fit on one line.
Elements that fall through to the following element should have a space after =.
Provide a fall-through error unless you have previously validated the input.

# Good
switch(x,
  a = ,
  b = 1,
  c = 2,
  stop("Unknown `x`", call. = FALSE)
)

# Bad
switch(x,
  a =,
  b = 1,
  c = 2
)
switch(x,
  a = long_function_name1(), b = long_function_name2(),
  c = long_function_name2()
)
switch(y, 1, 2, 3)
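
To see the fall-through and the final error in action, here is a small runnable sketch. The function seconds_per() and its unit names are hypothetical.

```r
# Hypothetical function mapping a unit name to a number of seconds
seconds_per <- function(unit) {
  switch(unit,
    s = ,
    sec = 1,
    min = 60,
    stop("Unknown `unit`", call. = FALSE)
  )
}

from_s <- seconds_per("s")      # "s" falls through to the value for "sec"
from_min <- seconds_per("min")

# An unmatched name reaches the unnamed final element and raises an error
unknown <- try(seconds_per("hour"), silent = TRUE)
```

Without the final stop(), an unmatched name would make switch() return NULL invisibly, which tends to surface as a confusing error much later.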

2.7 Semicolons

Semicolons are never recommended. In particular, don’t put ; at the end of a line, and don’t use ; to put multiple commands on one line.

# Good
my_helper()
my_other_helper()

# Bad
my_helper();
my_other_helper();

{ my_helper(); my_other_helper() }

2.8 Assignment

Use <-, not =, for assignment.

# Good
x <- 5

# Bad
x = 5

2.9 Data

2.9.1 Character vectors

Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'

2.9.2 Logical vectors

Prefer TRUE and FALSE over T and F.
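
One reason for this preference is that T and F are ordinary variables, while TRUE and FALSE are reserved words. A minimal sketch of what can go wrong:

```r
# T is an ordinary variable, so it can be silently reassigned
T <- FALSE
flag_with_t <- isTRUE(T)    # now FALSE

# Removing the local copy makes the base definition visible again
rm(T)
flag_restored <- isTRUE(T)  # TRUE again

# TRUE and FALSE cannot be reassigned; the line below would be an
# error if uncommented
# TRUE <- FALSE
```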

2.10 Comments

Each line of a comment should begin with the comment symbol and a single space: #

In data analysis code, use comments to record important findings and analysis decisions. If you need comments to explain what your code is doing, consider rewriting your code to be clearer. If you discover that you have more comments than code, consider switching to R Markdown.