4 Pipes

4.1 Introduction

Use %>% when you find yourself composing three or more functions together into a nested call, or creating intermediate objects that you don’t care about. Put each verb on its own line. This makes it simpler to rearrange them later, and makes it harder to overlook a step. It is ok to keep a one-step pipe in one line.

# Good
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)

foo_foo %>% fall("asleep")

# Bad
foo_foo <- hop(foo_foo, through = forest)
foo_foo <- scoop(foo_foo, up = field_mice)
foo_foo <- bop(foo_foo, on = head)

iris %>% group_by(up) %>% summarize_all(mean) %>%
  ungroup %>% gather(field_mice, foo_foo, -on) %>%
  arrange(head)

bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ),
  on = head
)

(If you’re not familiar with Litte)

Avoid using the pipe when:

  • You need to manipulate more than one object at a time. Reserve pipes for a sequence of steps applied to one primary object.

  • There are meaningful intermediate objects that could be given informative names.

There’s one exception to this rule: sometimes it’s useful to include a short pipe as an argument to a function in a longer pipe. Carefully consider whether the code is more readable with a short inline pipe (which doesn’t require a lookup elsewhere) or if it’s better to move the code outside the pipe and give it an evocative name.

# Good
x %>%
  select(a, b, w) %>%
  left_join(y %>% select(a, b, v), by = c("a", "b"))

x_join <-
  x %>%
  select(a, b, w)
y_join <-
  y %>%
  filter(!u) %>%
  gather(a, v, -b) %>%
  select(a, b, v)
left_join(x_join, y_join, by = c("a", "b"))

# Bad
x %>%
  select(a, b, w) %>%
  left_join(y %>% filter(!u) %>% gather(a, v, -b) %>% select(a, b, v), by = c("a", "b"))

4.2 Spacing and indenting

%>% should always have a space before it and a new line after it. After the first step, each line should be indented by two spaces.

# Good
iris %>%
  group_by(Species) %>%
  summarize_if(is.numeric, mean) %>%
  ungroup() %>%
  gather(measure, value, -Species) %>%
  arrange(value)

# Bad
iris %>% group_by(Species) %>% summarize_all(mean) %>%
ungroup %>% gather(measure, value, -Species) %>%
arrange(value)

4.3 No arguments

magrittr allows you to omit () on functions that don’t have arguments. Avoid this.

# Good
x %>% 
  unique() %>%
  sort()

# Bad
x %>% 
  unique %>%
  sort

4.4 Long lines

If the arguments to a function don’t all fit on one line, put each argument on its own line and indent:

iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    Species = n_distinct(Species)
  )

4.5 Assignment

Use a separate line for the target of the assignment followed by <-.

Personally, I think you should avoid using -> to create an object at the end of the pipe. While starting with the assignment is a little more work when writing the code, it makes reading the code easier. This is because the name acts as a heading, which reminds you of the purpose of the pipe.

# Good
iris_long <-
  iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)

# Bad
iris_long <- iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)

iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value) ->
  iris_long

4.6 Comments

In data analysis code, use comments to record important findings and analysis decisions. If you need comments to explain what your code is doing, consider rewriting your code to be clearer. If you discover that you have more comments than code, considering switching to RMarkdown.