rm(list = ls()) # start from scratch31 31. lapply ( SOME_LIST , SOME_FUNCTION )
31.1 What does lapply do?
##############################################################################.
#
# lapply ( SOME_LIST_OR_VECTOR , SOME_FUNCTION ) ####
#
#
# The lapply function is used to "apply" a function to every entry in a list
# or to every entry in a vector.
#
# The function is run as many times as there are entries in the list.
#
# lapply returns a list that contains the return values from each of the
# times the function was called.
#
# The "l" in "lapply" stands for "list", i.e. the return value of the
# "lapply" function is a list. There are other similar functions
# such as sapply, mapply and apply that have slightly different ways
# of working. For now we will just focus on "lapply". Later in the course
# we will learn about "sapply" and perhaps "mapply" and "apply".
#
# (see the examples below)
#
##############################################################################.31.2 Using lapply with a vector
# Recall that the builtin sqrt function takes the square root of a number
sqrt(4)[1] 2
sqrt(100)[1] 10
# If we want to apply the sqrt function to many number we can create a vector#
# with those numbers and then use lapply to apply the sqrt function to
# each of those numbers. lapply will return a list of the return values
# from the calls to the sqrt function.
# Let's start by creating a vector of numbers
nums = c(4,25,100,2)
# The following code will automatically
# pass each value in the nums vector one at a time to the sqrt function.
# Then return a list that contains all of the return values.
#
# Notice that the word sqrt is just specified by itself with no (parentheses).
lapply(nums, sqrt)[[1]]
[1] 2
[[2]]
[1] 5
[[3]]
[1] 10
[[4]]
[1] 1.414214
# We can also do this all in one command
lapply(c(4,25,100,2), sqrt)[[1]]
[1] 2
[[2]]
[1] 5
[[3]]
[1] 10
[[4]]
[1] 1.414214
31.3 We can also use lapply with our own functions.
We can build our own functions that can then be used in conjunction with lapply.
# Create a function that takes a single argument
doubleAndSquare = function(n){
return((n * 2) ^ 2)
}
# Apply our function to all values in a vector
lapply(2:5, doubleAndSquare)[[1]]
[1] 16
[[2]]
[1] 36
[[3]]
[1] 64
[[4]]
[1] 100
31.4 Functions that take more than one argument
We can pass to lapply any function that only requires one argument. For example the following function allows for more that one argument but since the 2nd argument has a default value, the function can be used with lapply and the default value will be used by lapply for the value of the second argument.
# Create a function that takes more than one value>
# The 2nd argument has a default value, so this function can be used with
# lapply
doubleAndPower = function(n, p=2){
return((n * 2) ^ p)
}
doubleAndPower(n=2,p=3) # result is (2*2)^3 = 64[1] 64
doubleAndPower(n=2) # p is set by default to 2 - result is (2*2)^2 = 16[1] 16
# Apply our function to all values in a vector
# Same result as when we used doubleAndSquare since default for the argument p is 2
lapply(2:5, doubleAndPower) [[1]]
[1] 16
[[2]]
[1] 36
[[3]]
[1] 64
[[4]]
[1] 100
31.5 Using lapply with a list
We’ll use the following list to demonstrate.
# create a list
gradebook = list(grades=c(80,85,72,95,100,89,65,84),
students=c("joe","sue","bob","al","frank","mike","anne","clara"),
honors=c(FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE))
gradebook$grades
[1] 80 85 72 95 100 89 65 84
$students
[1] "joe" "sue" "bob" "al" "frank" "mike" "anne" "clara"
$honors
[1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
31.5.1 length, range, summary
The function must require only one argument. The following are some built in functions that we can use:
- length
- range
- summary
#-----------------------------------------------------------------------.
# Some built-in functions
#
# We'll use the following functions to help explain the lapply and sapply functions
#
# length(SOME_VECTOR_OR_LIST) returns the number of items in the vector or the list
# range(SOME_VECTOR) - returns c( min(SOME_VECTOR) , max(SOME_VECTOR))
# summary(SOME_OBJECT) - returns a summary of what is in SOME_OBJECT
#-----------------------------------------------------------------------.
# length returns the number of items in a vector (or in a list)
length( gradebook$grades ) # 8[1] 8
length( gradebook$students ) # 8[1] 8
length( gradebook$honors ) # 8[1] 8
# range returns min and max of the vector
range( gradebook$grades) # 65 100[1] 65 100
range( gradebook$students) # "al" "sue"[1] "al" "sue"
range( gradebook$honors) # 0 1[1] 0 1
# summary returns a short summary of data.
# different results for different types of data.
summary( gradebook$grades ) # see output below Min. 1st Qu. Median Mean 3rd Qu. Max.
65.00 78.00 84.50 83.75 90.50 100.00
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 65.00 78.00 84.50 83.75 90.50 100.00
summary( gradebook$students ) # see output below Length Class Mode
8 character character
# Length Class Mode
# 8 character character
summary( gradebook$honors ) # see output below Mode FALSE TRUE
logical 5 3
# Mode FALSE TRUE
# logical 5 3 31.6 lapply ( SOME_LIST , SOME_FUNCTION )
#-----------------------------------------------------------------------.
# lapply ( SOME_LIST_OR_VECTOR , SOME_FUNCTION )
#
# The lapply function takes two arguments:
# X - a list or a vector
# FUN - the name of a function.
#
# lapply returns a new list that contains as many entries as there
# were in the list or vector, X. Each entry in the new list contains the results
# of "applying" the function X to an entry of the original list or vector, X.
#
# EXAMPLE:
# > lst = list( c(10,20,30), # length of this is 3
# c("apple", "orange")) # length of this is 2
#
# > lapply(X=lst, FUN=length) # list of 3 and 2
# [[1]]
# [1] 3
#
# [[2]]
# [1] 2
#-----------------------------------------------------------------------.
#........................................
# EXAMPLE 1 - lapply(gradebook, length)
#........................................
# Use the lapply function to automatically apply the length function to
# each of the items in the gradebook list.
# This returns a new list that contains all of the results.
lapply(gradebook, length) # see results below$grades
[1] 8
$students
[1] 8
$honors
[1] 8
# lapply returns a list of the answers
mode(lapply(gradebook, length)) # "list"[1] "list"
#........................................
# EXAMPLE 2 - lapply(gradebook, range)
#........................................
# Show the results of the range function on each of the items in the list
range(gradebook$grades) # 65 100[1] 65 100
range(gradebook$students) # "al" "sue"[1] "al" "sue"
range(gradebook$honors) # 0 1[1] 0 1
# apply range function to all items in the gradebook list
lapply(gradebook, range) # see results below$grades
[1] 65 100
$students
[1] "al" "sue"
$honors
[1] 0 1
#........................................
# EXAMPLE 3 - lapply(gradebook, summary)
#........................................
# summary returns a short summary of data.
# different results for different types of data.
summary( gradebook$grades ) # see output below Min. 1st Qu. Median Mean 3rd Qu. Max.
65.00 78.00 84.50 83.75 90.50 100.00
summary( gradebook$students ) # see output below Length Class Mode
8 character character
summary( gradebook$honors ) # see output below Mode FALSE TRUE
logical 5 3
# apply summary function to all items in the gradebook list
lapply(gradebook, summary) # all of the above results in a single list$grades
Min. 1st Qu. Median Mean 3rd Qu. Max.
65.00 78.00 84.50 83.75 90.50 100.00
$students
Length Class Mode
8 character character
$honors
Mode FALSE TRUE
logical 5 3
# apply mode function to all items in the gradebook list
lapply(gradebook, mode) # all of the above results in a single list$grades
[1] "numeric"
$students
[1] "character"
$honors
[1] "logical"
31.7 lapply with nested lists
#---------------------------------------------------------------------------.
# lapply also works with nested lists
# (i.e. lists that contain other lists)
#---------------------------------------------------------------------------.
rm(list = ls())
classes = list( year = 2021,
semester = "fall",
section1 = list ( students = c("abe","bob","charlie"),
test1 = c(70,80,90),
test2 = c(75,85,95)),
section2 = list( students = c("fran", "anne", "sue", "bertha", "maxine"),
test1 = c(100,90,80,70,60),
test2 = c(95,85,75,65,55),
test3 = c(93,83,73,63,53)) )
length(classes) # 4 (classes contains 4 objects)[1] 4
lapply(classes, length) # a list that contains 1,1,3,4, i.e. the length of each object in classes$year
[1] 1
$semester
[1] 1
$section1
[1] 3
$section2
[1] 4
# remember the str function is very helpful for seeing the structure of complex lists
str(classes)List of 4
$ year : num 2021
$ semester: chr "fall"
$ section1:List of 3
..$ students: chr [1:3] "abe" "bob" "charlie"
..$ test1 : num [1:3] 70 80 90
..$ test2 : num [1:3] 75 85 95
$ section2:List of 4
..$ students: chr [1:5] "fran" "anne" "sue" "bertha" ...
..$ test1 : num [1:5] 100 90 80 70 60
..$ test2 : num [1:5] 95 85 75 65 55
..$ test3 : num [1:5] 93 83 73 63 53
31.8 NAMED LISTS WITH lapply VS UNNAMED LISTS
#-----------------------------------------------------------------------.
# NAMED LISTS WITH lapply VS UNNAMED LISTS
#
# The return value of lapply will be a named list only if the
# original list has names.
#-----------------------------------------------------------------------.
hasNames = list( fruit=c("apple","orange","pear","plum"),
evens=seq(2,10,by=2),
odds=seq(1,30,by=2),
someLogicals=c(TRUE,FALSE,TRUE))
hasNames$fruit
[1] "apple" "orange" "pear" "plum"
$evens
[1] 2 4 6 8 10
$odds
[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
$someLogicals
[1] TRUE FALSE TRUE
str(hasNames)List of 4
$ fruit : chr [1:4] "apple" "orange" "pear" "plum"
$ evens : num [1:5] 2 4 6 8 10
$ odds : num [1:15] 1 3 5 7 9 11 13 15 17 19 ...
$ someLogicals: logi [1:3] TRUE FALSE TRUE
length(hasNames)[1] 4
lapply(hasNames, length)$fruit
[1] 4
$evens
[1] 5
$odds
[1] 15
$someLogicals
[1] 3
lapply(hasNames, range)$fruit
[1] "apple" "plum"
$evens
[1] 2 10
$odds
[1] 1 29
$someLogicals
[1] 0 1
lapply(hasNames, summary)$fruit
Length Class Mode
4 character character
$evens
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 4 6 6 8 10
$odds
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 8 15 15 22 29
$someLogicals
Mode FALSE TRUE
logical 1 2
noNames = list( c("apple","orange","pear","plum"),
seq(2,10,by=2),
seq(1,30,by=2),
c(TRUE,FALSE,TRUE))
noNames[[1]]
[1] "apple" "orange" "pear" "plum"
[[2]]
[1] 2 4 6 8 10
[[3]]
[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
[[4]]
[1] TRUE FALSE TRUE
str(noNames)List of 4
$ : chr [1:4] "apple" "orange" "pear" "plum"
$ : num [1:5] 2 4 6 8 10
$ : num [1:15] 1 3 5 7 9 11 13 15 17 19 ...
$ : logi [1:3] TRUE FALSE TRUE
length(noNames)[1] 4
lapply(noNames, length)[[1]]
[1] 4
[[2]]
[1] 5
[[3]]
[1] 15
[[4]]
[1] 3
lapply(noNames, range)[[1]]
[1] "apple" "plum"
[[2]]
[1] 2 10
[[3]]
[1] 1 29
[[4]]
[1] 0 1
lapply(noNames, summary)[[1]]
Length Class Mode
4 character character
[[2]]
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 4 6 6 8 10
[[3]]
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 8 15 15 22 29
[[4]]
Mode FALSE TRUE
logical 1 2
31.9 using custom functions with lapply
############################################################.
# using custom functions with lapply
#
# You can use any function that contains one argument
# with lapply, even your own custom functions
############################################################.
stuff = list(grades = c(70,80,95,88,60),
students = c("larry", "zeke","charlie","yaakov","abe"),
age = c(17,23,20,20,21,19))
# define a function to find the 2nd largest value in a vector
secondLargest = function( vec ){
sort(vec)[length(vec)-1]
}
# Test the function
nums = c(10,1000,900,-55,23)
secondLargest(nums) # 900[1] 900
# Example - use our function the vectors in the list, stuff
secondLargest( stuff$grades) # 88[1] 88
secondLargest( stuff$students) # "yaakov"[1] "yaakov"
secondLargest( stuff$age) # 21[1] 21
# Use our custom function with lapply to automatically apply the function
# to all entries in the list, stuff
lapply( stuff, secondLargest) # list of the answers$grades
[1] 88
$students
[1] "yaakov"
$age
[1] 21
31.10 functions defined in one line dont need {curly braces}
###########################################.
# functions defined in one line
#
# If a function only needs one line of code
# the {curly braces} aren't required.
###########################################.
# Another way to define the second largest function - all in one line
# (you don't the the {curly braces})
secondLargest = function ( vec ) sort(vec)[length(vec)-1]
# Test the function
nums = c(10,1000,900,-55,23)
secondLargest(nums)[1] 900
31.11 Using “one line functions” with lapply
###########################################.
# Using "one line functions" with lapply
###########################################.
# create a list
gradebook = list(grades=c(80,85,72,95,100,89,65,84),
students=c("joe","sue","bob","al","frank","mike","anne","clara"),
honors=c(FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE))
gradebook$grades
[1] 80 85 72 95 100 89 65 84
$students
[1] "joe" "sue" "bob" "al" "frank" "mike" "anne" "clara"
$honors
[1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
# If you want to define a function just to use with lapply you
# can define it directly in the call to lapply
# Use the secondLargest function we defined above
lapply(gradebook, secondLargest) # we defined gradebook above$grades
[1] 95
$students
[1] "mike"
$honors
[1] TRUE
# This also works
lapply(gradebook, function ( vec ) {
sort(vec)[length(vec)-1]
})$grades
[1] 95
$students
[1] "mike"
$honors
[1] TRUE
# So does this
lapply(gradebook, function ( vec ) sort(vec)[length(vec)-1] )$grades
[1] 95
$students
[1] "mike"
$honors
[1] TRUE
# You can capture the output of lapply in a variable if you like
answers = lapply(gradebook, function ( vec ) sort(vec)[length(vec)-1] )
answers$grades
[1] 95
$students
[1] "mike"
$honors
[1] TRUE
answers$students[1] "mike"
answers[[1]] [1] 95
mode(answers) # "list"[1] "list"
31.12 –PRACTICE–
#------------------------------------------------------------------------.
# QUESTION
#
# stuff is a list.
# Write a single line of code that returns the number of
# numeric vectors that are contained in a single list.
#
# HINT: Use the is.numeric function, lapply and unlist
#
# EXAMPLE
#
# > stuff = list(c(100,200,300),
# c("apple", "orange", "pear"),
# seq(10,100,by=5),
# c(TRUE, FALSE))
#
# > YOUR CODE GOES HERE
# [1] 2
#------------------------------------------------------------------------.#------------------------------------------------------------------------.
# QUESTION
#
# stuff is a list.
# Write a single line of code that creates a new variable,
# named numericStuff, that contains just a new list with
# just the numeric vectors from stuff.
#
# EXAMPLE:
# > stuff = list( seq(10,20,length.out=4) , # numeric
# NULL, # NULL
# c("apple", "orange"), # character
# rep(5, 3), # numeric
# 1:4 > pi, # logical
# NULL, # NULL
# paste0(letters[1:5], 1:5), # character
# 2 ^ (1:5) ) # numeric
#
# > numericStuff = YOUR CODE GOES HERE
#
# > numericStuff
# [[1]]
# [1] 10.00000 13.33333 16.66667 20.00000
#
# [[2]]
# [1] 5 5 5
#
# [[3]]
# [1] 2 4 8 16 32
#------------------------------------------------------------------------.#------------------------------------------------------------------------.
# QUESTION
#
# (see previous question)
# Same idea as previous question, but this time create a vector
# called someStuff that contains just the logical and numeric vectors
# from stuff.
#
# HINT: one way to do this is to create a custom function that
# takes a single argument. The custom function should
# return TRUE if the argument contains a numeric or a logical vector
# and FALSE otherwise.
#
# EXAMPLE:
# > stuff = list( seq(10,20,length.out=4) , # numeric
# NULL, # NULL
# c("apple", "orange"), # character
# rep(5, 3), # numeric
# 1:4 > pi, # logical
# NULL, # NULL
# paste0(letters[1:5], 1:5), # character
# 2 ^ (1:5) ) # numeric
#
# > someStuff = YOUR CODE GOES HERE
#
# > someStuff
# [[1]]
# [1] 10.00000 13.33333 16.66667 20.00000
#
# [[2]]
# [1] 5 5 5
#
# [[3]]
# [1] 2 4 8 16 32
#------------------------------------------------------------------------.31.13 directly supply more than one argument to a function
You can pass the other arguments to lapply directly by specifying the names of the extra arguments in the call to lapply. For example:
lapply(c(1, 3, 5), seq, to = 10, by = 2)[[1]]
[1] 1 3 5 7 9
[[2]]
[1] 3 5 7 9
[[3]]
[1] 5 7 9
31.14 passing a value to an argument is is NOT the first argument
If the value that you want lapply to pass it’s value to is not the first argument in the function you can use the following technique.
Examine the following example:
myseq = function(myBy){
seq(from=1, to=10, by=myBy)
}
# Generate different sequences that start with 1 and end with 100.
# The lapply will provide the value for the by argument.
lapply(2:4, myseq)[[1]]
[1] 1 3 5 7 9
[[2]]
[1] 1 4 7 10
[[3]]
[1] 1 5 9
# We can also do this all in one line
lapply(2:4, function(x) seq(from=1, to=10, by=x))[[1]]
[1] 1 3 5 7 9
[[2]]
[1] 1 4 7 10
[[3]]
[1] 1 5 9
We used lapply to call the seq function. The seq function takes several arguments. The issue is that we want lapply to pass a value the the by argument (which is the 3rd argument to seq - see ?seq).
This is problematic because by lapply will only pass to the first argument of the function. To address this issue, we “wrapped” the seq function with our own myseq function. Our myseq function only takes one argument. That argument is then passed to the by argument of the builtin seq function. We can now use lapply to pass values to the myseq function which in turn will ensure that we are passing that same value to the 3rd argument, by, of the seq function.