First sighting of a quosure in the wild…
quo… that isn’t a word!
1998 Kramer, Sienfeld Season 1, Episode 1
Scrabble scene

This tutorial is brought to by the rlang functions f_text, enquo, eval_tidy, lang_head, land_tail, expr_interp, and is_lang

Introduction

This is a tutorial about quosures and the some of the basics of using them. Quosures are objects that are the basis for non-standard evaluation of expressions in the tidyverse.

You may have never heard of quosures before but if you have used packages like dplyr and many others you are using quosures under the covers. It’s quosures that make the convienent and compact syntax of a function like dplyr::select possible:

t1 <- tibble::tribble(
    ~col1, ~col2, ~col3,
    1, 2, 3,
    4, 5, 6,
    7, 8, 9
)
# you just specify the columns you want with literal names
dplyr::select(t1, first=col2, last=col3)
## # A tibble: 3 x 2
##   first  last
##   <dbl> <dbl>
## 1     2     3
## 2     5     6
## 3     8     9

select is certainly a nicer syntax than something like:

t1 <- data.frame(
    col1 = 1:3,
    col2 = 2:4,
    col3 = 3:5
)
t2 <- subset(t1, T, select = c(col2, col3))
colnames(t2) <- c("first", "last")
t2
##   first last
## 1     2    3
## 2     3    4
## 3     4    5

This tutorial focuses on the behaviour, i.e. the mechanics of how quosures work, and the structure of quosures, not necessarily particular applications of quosures.

However it does include an example of applying quosures to a function that uses non-standard evaluation to build a part number. That, at first glance, might not seem like much but it lets part numbers use their native syntax… you’ll see that shortly.

By the end of this tutorial you should be able to think out of the box about some ways to make use of quosures to make function that has a clean and simple syntax for input, similar to the way that dplyr::select does.

But before we dive into the details of quosures we need to get a big picture of why we care about them and non-standard evaluation.

Big Picture

Most of the time in R expressions are evaluated and replaced with the value that the evaluation produces. This is what R calls standard evaluation. Here is an example of a function takes exp as it’s sole argument and then just returns it. However it does not return literally what is passed into it… it returns the result of R’s standard evaluation of what is passed into it :

a <- 21
b <- 1
f1 <- function(exp){
    exp 
}
f1(a-b)
## [1] 20

So calling f1 in this example produces, as you might expect, 20. But f1 didn’t do much to compute that result. It depended on R’s standard evaluation of a - b and returned the result that produced.

But what if you wanted f1(a-b) to produce the string “21-1”, that is the values of a and b concatonated with a dash? You might want to do this if you wanted a function that produced part numbers in a significant part numbering system.

In a significant part numbering system the part number describes the part it represents as opposed to being the next part number available when the part was added to the system. https://en.wikipedia.org/wiki/Part_number#Significant_versus_non-significant_part_numbers

For example the “21” in our made up part number means the part is a hex nut and the “1” means the part class is “expendable”. In effect a dash separates the attributes of the part. Of course for a real world part numbering system there would have to be a lot more attributes for a hex nut but we’re using just two here to keep the example simple.

So what we want is a make_part_number function that works like this:

make_part_number <- function(expr) {
    "21-1"
}
part_type <- 21
stock_class <- 1
make_part_number(part_type-stock_class)
## [1] "21-1"

It produces the string concatonation of values of part_type and stock_class we were looking for.

But it might seem like it’s not possible to write a function like this and not even necessary.

It might seem impossible because R’s standard evaluation of arguments, which is what we are all used to, seems to evaluate an argument before the body of the function gets to look at it. However quosures, as we’ll see as we go though this tutorial, give you a way to circumvent R’s standard evaluation of arguments and work with the literal expression in argument directly.

But it might seem unnecessary to even consider creating a function like this. For example this function could fill the bill:

make_part_number_alt <- function(part_type, stock_class) {
    glue::glue("{part_type}-{stock_class}")
}
part_type <- 21
stock_class <- 1
make_part_number_alt(a, b)
## 21-1

And as you can see it has no trouble making a part number.

make_part_number_alt is an example of a developer centered implementation of a product requirement to have a function that can make a part number. It leverages the developers understanding of the R language but not necessarily the users understanding of part numbers.

The first part number function we looked at, make_part_number, is an example of a user centered implementation of a product requirement. It takes into account the users understanding of the syntax of part numbers. For example the documentation for our made up part numbering system has the format of a screw type fastener defined as:

part type-part class

… which, again, is just the attributes of the part separated by a dash.

There is a computer science term for what make_part_number is doing. make_part_number’s argument is an expression from a DSL, or domain specific langauge. All DSL means is that that functions use formats and operations that the user of the functions is familiar with regardless of how difficult the implementation will be for the developer :-)

Now the users of the make_part_number function might be in a purchasing department and just be poking at part data from an R Studio console window. These users work with part numbers all day long, every day, and asking them translate dashes into commas and a little white space is just asking them to make errors.

The user also might be a developer using make_part_number in a package that will be the foundation for a system for tracking the statisics of part usage. Even here having part number arguments that look like the ones in the documentation the developer will be referencing is also a win in terms of minimizing errors.

As a general principle it’s better to use a syntax that already exists rather than create a new one. It’s better to use a more simple syntax than one that is littered with ceremonial programming text.

Oh…. let’s not forget the developer who is going to implement make_part_number. That developer will have a harder task than users will and will have to make use of quosures to create a DSL that reflects the users view of the problem domain. But that is what developers do, spend great effort and time to minimize the effort and time users spend.

And later in this tutorial we’ll be that developer and build the make_part_number function for our made up part numbering system.

Now that we had a quick look at the big picture and motivation for quosures let’s start digging details.

Quoting

The concept of quoting in programming refers to taking an expression used in a program and turning into it a string-like object so that it can be directly analyzed in a program rather than being evaluated.

R provides a number of ways to quote an expression. In the tidyverse the function enquo is used to quote an expression which is used as an argument to a function. In this tutorial we will focus on enquo because we will need it for the make_part_number function.

The enquo function returns an object called a quosure. A quosure is a quoted expression.

The rlang package provides a number of functions that let us easily tease apart the pieces of a quosure and we’ll see them as we go though this tutorial.

As an example of a simple one of these functions let’s make a quosure and use the f_text function to get its text value.

# function that makes a quosure
f1 <- function(expr) {
  rlang::enquo(expr)
}
# make the quosure
q1 <- f1(a-b-c-d)
# now get as text what was quoted
rlang::f_text(q1)
## [1] "a - b - c - d"

The text value of the quosure is the expression input to enquo but prettified a bit by adding whitespace around the dashes.

Note that the best practice for using binary operator like - is to put whitespace on both sides of the operator. We didn’t do that here on purpose… we want to write a part number exactly in the way the “documentation” for our made up part numbering system says we should.

One of the things you can do with a quosure is evaluate it by using the eval_tidy function from the rlang package. We’re using package name prefixes in the examples in this tutorial just so that it is easier to see where functions are coming from.

# assign some values to some varibles
a <- 10
b <- 5
c <- 3
d <- 2
# this is the same function as in the previous example
f1 <- function(expr) {
    rlang::enquo(expr)
}
# make the quosure
q1 <- f1(a-b-c-d)
# evaluate the quosure, that is compute the value of a-b-c-d
rlang::eval_tidy(q1)
## [1] 0

This doesn’t seem to be all that interesting. It looks like just a roundabout way to execute a-b-c-d in the console like this:

a <- 10
b <- 5
c <- 3
d <- 2
a-b-c-d
## [1] 0

But it is interesting. We’ll need to take a closer look at the structure of a quosure to see why.

It turns out that our make_part_number function will not have to evaluate a quoted expession like “a-b-c-d”, but it will need to evaluate individually the a, b, c and d that make up the part number. We’ll have to look how quosures are interpreted in order to see how that can be done, and we’ll look at that next.

Interpretation

Interpretation is the process of taking an expression, like a-b-c-d, and breaking it down in to pieces that can be computed and then putting those computed pieces back together to get the final result.

The preceding definition probably isn’t all clear but as we go though this part of the tutorial it will make sense.

First of all you might ask the question “Why do we have to break down anything?”. Why can’t you just compute a-b-c-d and be done with it.

There are a couple of reasons for breaking things down. Some expressions are much more complicated than a-b-c-d and there is no reason to think that they could be evaluated in a single gulp. But another reason is an artifact of how almost all modern computers work. In a nutshell a computer has instructions built into to it to compute most binary operations. Here is some psuedo assembly code that does subtraction:

sub R2, R1
load R6

The first instruction says to subtract R1 from R2 and put the result in the accumlator. R1, R2, R6 and the accumlator are special memory locations, called registers, that are internal to the computer itself.

The second instruction loads the accumlator into R6. So an R expression like:

R6 <- R2 - R1

… can be easily computed by modern computer hardware.

However there are no assembly instructions like:

sub R3, R2, R1

So an R expression like:

R6 <- R3 - R2 - R1

… would have to somehow be broken down into a series of binary operations for it to be evaluated.

It turns out, as luck would have it, that a quosure will do at least part of that breakdown for you.

A quosure can be broken up into a head and a tail. The head contains an operation or a function and the tail is a list of arguments for that operation or function. Let’s use some of the functions from rlang see how the quosure for a-b-c-d is broken down.

rm(list = ls())
# here is that function that makes a closure again
f1 <- function(expr) {
    rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
# get the operation/function of q1
rlang::lang_head(q1)
## `-`
# get the arguments of q1
rlang::lang_tail(q1)
## [[1]]
## a - b - c
## 
## [[2]]
## d

From the head you can see that it is a - operation being used. A - operation should have exactly 2 arguments.

And you can see that the tail has exactly 2 argruments in it. Not only does it have exactly two arguments it looks like quosures have pretty good idea of how breakdown an expression!

The first argument is a pruned off version of the a-b-c-d and the second is what was pruned off of it.

The second argument d is just a solo symbol. We can evaluate that like this:

rm(list = ls())
a <- 10
b <- 5
c <- 3
d <- 2
f1 <- function(expr) {
    rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
# arg2 is d
arg2 <- rlang::lang_tail(q1)[[2]]
arg2
## d
# which we can evaluate
rlang::eval_tidy(arg2)
## [1] 2

… and we get a value of 2 for d, which is in fact the value that was assigned to d. We did this by using eval_tidy on the second entry in the tail…remember the tail is a list.

We could evaluate the first argument if we want to and it would compute a-b-c, but for our make_part_number function we need the individual values for a, b, and ’c`. Let’s see how to do that.

What we want to do is turn the first argument in the tail of the q1 quosure into a quosure itself so we can use the tail of that quosure to prune off the c. To do that we need to interpret that first argument. That’s what the rlang expr_interp function does. Like this:

rm(list = ls())
a <- 10
b <- 5
c <- 3
d <- 2
f1 <- function(expr) {
    rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
arg1 <- rlang::lang_tail(q1)[[1]]
# this is the first argument, that is the first element in the tail
arg1
## a - b - c
# now make a quosure out of it
q2 <- rlang::expr_interp(arg1)
# now get the second argument of the tail of the q2 quosure
arg2 <- rlang::lang_tail(q2)[[2]]
# this will show that arg2 is c
arg2
## c
# and evaluate it
rlang::eval_tidy(arg2)
## [1] 3

And we can see now that expr_interp was able to make a quosure out of the first argument of q1‘s tail. Then we were able to prune off the ’c’ from that expression and evaluate it. The result of the evaluation was what we expected, 3, the value that was assigned to c.

With a little recursion we can prune off all of the part number attributes individually and evaluate them. Will see that shortly when we make the make_part_number function.

Now we know just enough about quosures to make the make_part_number function. It won’t be a full fledged production version, just to keep the example simple, but it will work.

make_part_number

Below is the make_part_number function.

The one “trick” that make_part_number is using that we haven’t looked at yet is that only language objects have a head and tail. The only thing you can do with quosures that are not language objects is to evaluate them. The rlang function is_lang will tell you if a quosure is a language object or not.

Comments are scattered thoughout… exercise for the student to understand how make_part_number works. :-)

make_part_number <- function(expr) {
    # used for recursion
    eval_attrs <- function(q, part_attributes = vector(mode = "character")) {
        if(!rlang::is_lang(q)) {
            # q is not a language object so just evaluate it
            # and add it to the list of part attributes
            part_attributes <- c(as.character(rlang::eval_tidy(q)), part_attributes)
            # and finish the recursion
            return(part_attributes)
        }
        # if we got to here q is a language object so it has a tail
        tail <- rlang::lang_tail(q)
        # add the second entry in the tail to the part attributes
        part_attributes <- 
            c(as.character(rlang::eval_tidy(rlang::expr_interp(tail[[2]]))),
                part_attributes)
        # recurse to find the next part number attribute
        eval_attrs(rlang::expr_interp(tail[[1]]), part_attributes)
    }
    # make a quosure out of expr
    q <- rlang::enquo(expr)
    # recurse to find all the part number attributes
    atrs <- eval_attrs(q)
    # concatonate all the part number attributes with "-"
    stringr::str_c(atrs, collapse="-")  
}

Now let’s try it out.

a <- 10
b <- 9
c <- 4
d <- 7
make_part_number(a-b-c-d)
## [1] "10-9-4-7"

Well that works as expected… Yippee!!!

Actually it works a bit better than you might think. Here we set an attribute to a string value and another to a computation:

a <- "xxZ23"
b <- 9
c <- b + d
d <- 7
make_part_number(a-b-c-d)
## [1] "xxZ23-9-16-7"

… and that works too.

And we can even do some computations in line in the part number itself:

a <- 10
b <- 9
c <- 4
d <- 7
make_part_number(a-b-c-b*c)
## [1] "10-9-4-36"

It may seem to be a surprise that these last couple of examples worked. But it really isn’t because we are using the same technique that R uses when is does a standard evaluation of an expression. It walks through the expression using the head and tail of each quosure it finds.

At this point we have seen the very basics of how you can use quosures. There is actually a lot more to look at. One of the things this tutorial glossed over was the role of environments in the evaluation of quosures. You might not be familiar with environments but they are really important for quosures, but that’s for later tutorial.

There are other ways to make quosures and use them… we’ve only look at a couple of the functions that R has for working with quosures. Quosures can handle much more complicated expressions than the part number we’ve started with here.

Quosures can also be used to analyze an expressions. In fact make_part_number doesn’t do a very good job of this as is. If you experiment with it a little bit you will see that it’s easy to give it a part number that it will misinterpret. But all that is for a later tutorials too.

Hope you found this tutorial informative and engaging,

Dan