6 Lists

In this chapter, we’re going to learn about lists. Lists can be a bit confusing the first time you begin to use them. Heaven knows it took me ages to get comfortable with them. However, they’re a very powerful way to structure data and, once mastered, will give you all kinds of control over pretty much anything the world can throw at you. If vectors are R’s atoms, lists are molecules.

By the end of this chapter, you will know the following:

  • What is a list and how is one created?
  • What is the difference between a list and a vector?
  • When and how do I use lapply?

6.1 Lists Overview

A list is a bit like a vector in that it is a container for many elements of data. However, unlike a vector, the elements of a list may have different data types. In addition, lists may store data recursively. This means that a list may contain a list, which contains another list and so on. Partially for this reason, access and assignment will use two new operators: [[ ]] and $. Confusing? Sorta. But don’t worry, we’ll walk through why and how you’ll work with lists.

The table below outlines some of the similarities and differences between vectors and lists.

Table 6.1: Key similarities and differences between vectors and lists

Vectors                                 Lists
Ordered                                 Also ordered
All elements have the same data type    Elements may be any data type, even other lists
One dimension                           Doesn’t apply in this context
Access and assignment via [ ]           Access and assignment via [ ], [[ ]] and $
May contain metadata                    May contain metadata
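
The data type row of the table can be seen in a minimal sketch. The values and the names v and l are made up for illustration: c() coerces its arguments to one common type, while list() leaves each element as it is.

```r
# c() coerces everything to a single type (here, character)
v <- c(1, "a", TRUE)
typeof(v)
#> [1] "character"

# list() keeps each element's own type
l <- list(1, "a", TRUE)
str(l)
#> List of 3
#>  $ : num 1
#>  $ : chr "a"
#>  $ : logi TRUE
```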

6.2 List construction

The list function will create a list. Have a look at the code below and try it out yourself.

x <- list(c("This", "is", "a", "list")
          , c(pi, exp(1)))
typeof(x)
#> [1] "list"
summary(x)
#>      Length Class  Mode     
#> [1,] 4      -none- character
#> [2,] 2      -none- numeric
str(x)
#> List of 2
#>  $ : chr [1:4] "This" "is" "a" "list"
#>  $ : num [1:2] 3.14 2.72

Visually, here’s what that looks like:

We said earlier that the concept of dimension doesn’t really apply to lists. We mean it and so does R. If you ask for the dimension of a list, you’ll get NULL. I’m not mathy enough to have anything intelligent to say about the dimensionality of a single construct composed of elements which each have their own dimension.

Note that you can ask for the length. This will return the number of top-most elements in the list.

dim(x)
#> NULL
length(x)
#> [1] 2
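
To see that length counts only the top-level elements, here is a small sketch with a hypothetical nested list. The lengths function (note the plural) is base R and is not otherwise used in this chapter; it reports the length of each element in turn.

```r
# A list whose second element is itself a list of three items
nested <- list(1:10, list("a", "b", "c"))

# Only the two top-level elements are counted
length(nested)
#> [1] 2

# lengths() looks one level down and reports each element's length
lengths(nested)
#> [1] 10  3
```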

Note that, weirdly and a little confusingly, you can create a list by using the vector function.

myList <- vector(mode = "list", length = 5)

As we said in the Vectors chapter, you’ll likely only do this to improve memory management performance. Don’t worry about that at this stage. However, if you’d like to play with access and assignment, vector may be useful.
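
If you do want to play, one possible sketch follows. The name squares is made up for illustration. Each slot of a list created this way starts out as NULL and can be filled by position.

```r
# Preallocate a list with three empty slots
squares <- vector(mode = "list", length = 3)
length(squares)
#> [1] 3
is.null(squares[[1]])
#> [1] TRUE

# Fill each slot by position
for (i in seq_along(squares)) {
  squares[[i]] <- i^2
}
squares[[3]]
#> [1] 9
```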

6.2.1 Recursive storage

Lists can contain other lists as elements. And these lists may contain other lists and so on. It sounds complex, but it’s no stranger than your file system, e.g. a folder contains a folder, which contains another folder.

trey <- list("Trey Anastasio", "guitar", 1964)
page <- list("Page McConnell", "piano", 1963)
jon <- list("Jon Fishman", "drums", 1965)
mike <- list("Mike Gordon", "bass", 1965)
phish_members <- list(trey, page, jon, mike)
phish_albums <- c("Junta", "Rift", "Hoist")
phish <- list(phish_members, phish_albums)

This will be familiar to anyone who uses XML, JSON or YAML.

6.2.2 List metadata

Again, metadata will typically be names. However, these become very important for lists as names are handled with the special $ operator. We’ll talk about $ shortly. We can also assign a name as the list is being constructed.

names(phish) <- c("Members", "Albums")
phish <- list(Members = phish_members, Albums = phish_albums)
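
Both routes should lead to the same object. The sketch below uses a made-up two-element list (claim and claim2 are hypothetical names) to check that naming at construction and naming afterwards are equivalent.

```r
# Names supplied while the list is constructed
claim <- list(Amount = 1000, Paid = TRUE)
names(claim)
#> [1] "Amount" "Paid"

# Names attached after the fact
claim2 <- list(1000, TRUE)
names(claim2) <- c("Amount", "Paid")

identical(claim, claim2)
#> [1] TRUE
```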

6.3 Access and assignment

Because list elements can be arbitrarily complex, access and assignment get new operators. We’ll use the [[ ]] operator when we want to access a single element of a list by name or position. $ will work for named elements only.

phish[["Albums"]]
#> [1] "Junta" "Rift"  "Hoist"
phish[[2]]
#> [1] "Junta" "Rift"  "Hoist"
phish$Albums
#> [1] "Junta" "Rift"  "Hoist"
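
Because lists can be nested, these operators can be chained to walk down through the levels. The sketch below rebuilds a cut-down version of the phish list (two members only, to keep it short) so that it runs on its own.

```r
trey <- list("Trey Anastasio", "guitar", 1964)
page <- list("Page McConnell", "piano", 1963)
phish <- list(Members = list(trey, page)
              , Albums = c("Junta", "Rift", "Hoist"))

# The Members element, then its first member, then that member's second item
phish$Members[[1]][[2]]
#> [1] "guitar"

# The same kind of walk using only [[ ]]
phish[["Members"]][[2]][[1]]
#> [1] "Page McConnell"
```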

6.3.1 [ vs. [[

[ ] may also be used to access elements of a list. When first learning R, I found the distinction between [ ] and [[ ]] particularly vexing. After using lists for a while, I finally decided that this was the best way to distinguish between them:

  • [ is used to set and return an element of the same type as the containing object.
  • [[ is used to set and return an element of the same type as the contained object.

This is why [ ] will return a list when applied to a list. This is also why it may be used to return more than one element of a list. Have a look at the code snippet below:

typeof(phish["Albums"])
#> [1] "list"
typeof(phish[["Albums"]])
#> [1] "character"

Don’t worry if this doesn’t make sense yet. It’s difficult for most R programmers.
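
One more sketch may help it sink in. The list x below is made up for illustration. [ ] hands back a smaller list, possibly with several elements, while [[ ]] hands back the thing stored inside.

```r
x <- list(a = 1:3, b = "text", c = TRUE)

# [ ] returns a list, and can return more than one element
sub <- x[c("a", "b")]
typeof(sub)
#> [1] "list"
length(sub)
#> [1] 2

# [[ ]] returns the element itself, ready to be used
typeof(x[["a"]])
#> [1] "integer"
sum(x[["a"]])
#> [1] 6

# sum(x["a"]) would fail, because x["a"] is still a list
```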

6.3.2 Assignment

As with vectors, assignment may be thought of as access in reverse.

phish[["Albums"]] <- c("Lawn Boy", "A Picture of Nectar")
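
The same idea carries over to position and to $. The sketch below uses a simplified phish list (members as a plain character vector, and an Active element invented for illustration) to show replacing an element, adding a new one, and removing it again by assigning NULL.

```r
phish <- list(Members = c("Trey", "Page", "Jon", "Mike")
              , Albums = c("Junta", "Rift", "Hoist"))

# Replace an element by position
phish[[2]] <- c("Lawn Boy", "A Picture of Nectar")
length(phish$Albums)
#> [1] 2

# Assigning to a name that does not exist yet adds a new element
phish$Active <- TRUE
length(phish)
#> [1] 3

# Assigning NULL removes the element
phish$Active <- NULL
names(phish)
#> [1] "Members" "Albums"
```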

6.4 Summary functions

Because list elements may hold any type of data, we can’t expect functions like sum or mean to work on a list as a whole. Instead, we use functions like lapply to summarize particular list elements. lapply will apply the same function to each element of a list and return the results as a list. In the example below, we’ll generate some statistics for three different vectors stored in a list.

myList <- list(firstVector = c(1:10)
               , secondVector = c(89, 56, 84, 298, 56)
               , thirdVector = c(7,3,5,6,2,4,2))
lapply(myList, mean)
#> $firstVector
#> [1] 5.5
#> 
#> $secondVector
#> [1] 116.6
#> 
#> $thirdVector
#> [1] 4.142857
lapply(myList, median)
#> $firstVector
#> [1] 5.5
#> 
#> $secondVector
#> [1] 84
#> 
#> $thirdVector
#> [1] 4
lapply(myList, sum)
#> $firstVector
#> [1] 55
#> 
#> $secondVector
#> [1] 583
#> 
#> $thirdVector
#> [1] 29

Why lapply? Two reasons:

  1. It’s expressive. A loop is a lot of code which does little to clarify intent. lapply indicates that we want to apply the same function to each element of a list and it does it in only one line of code. Think of a formula that exists as a column in a spreadsheet.
  2. It’s easier to type at an interactive console. In its very early days, S was fully interactive. Typing a for loop at the console is a tedious and unnecessary task.
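
To make the first point concrete, here is a rough sketch of what a loop has to do to match a single call to lapply. The names loopMeans and applyMeans are made up for illustration, and the list is a shortened version of the one above.

```r
myList <- list(firstVector = c(1:10)
               , secondVector = c(89, 56, 84, 298, 56))

# Loop version: preallocate, iterate, assign, then restore the names
loopMeans <- vector(mode = "list", length = length(myList))
for (i in seq_along(myList)) {
  loopMeans[[i]] <- mean(myList[[i]])
}
names(loopMeans) <- names(myList)

# lapply version: one line, and the names come along for free
applyMeans <- lapply(myList, mean)

identical(loopMeans, applyMeans)
#> [1] TRUE
```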

Note that we can also use lapply on other structures, such as a vector. The result is still a list.
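
A minimal sketch, with roots as a made-up name: the input is an ordinary numeric vector, and what comes back is a list with one element per input value.

```r
# Apply sqrt to each element of a numeric vector
roots <- lapply(c(1, 4, 9), sqrt)

typeof(roots)
#> [1] "list"
length(roots)
#> [1] 3
roots[[3]]
#> [1] 3
```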

6.5 Exercises

  • Create a list with two elements. Have the first element be a vector with 100 numbers. Have the second element be a vector with 100 dates. Give your list the names: “Claims” and “AccidentDate”.
  • What is the average value of a claim?

6.6 Answers

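# Start with an empty list and add named elements with $
myList <- list()
# 100 simulated claim amounts (lognormal); the values differ on every run
myList$Claims <- rlnorm(100, log(10000))
# 100 accident dates sampled from 1,000 evenly spaced dates in 2000-2009
myList$AccidentDate <- sample(seq.Date(as.Date('2000-01-01'), as.Date('2009-12-31'), length.out = 1000), 100)
mean(myList$Claims)
#> [1] 16349.04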