42 Use vector operations instead of loops when possible
The R language can perform repetitive tasks by using vector operations. For example:
sum(c(1,2,3,4))
performs several different additions. This code effectively calculates:
total = 1
total = total + 2
total = total + 3
total = total + 4
The resulting total is then returned from the sum function. In many other languages you would need to write a loop to do this. Because R has built-in vector operations, you can often avoid writing loops. Programmers who have experience with other languages often default to using loops in R even when the same result can be accomplished without them. However, when coding in R, avoiding loops is often the better option.
Vector operations in R are very fast compared to the equivalent code using loops. For example, the following code generates ten million random digits between 0 and 9.
x = sample(0:9, 10000000, replace=TRUE)
The following code calculates the sum using the sum function. This code returns almost instantaneously.
sum(x)
[1] 45001921
The following code produces exactly the same sum, but this time using a loop instead of the sum function. This code takes a lot longer to run.
total = 0
pos = 1
while(pos <= length(x)){
  total = total + x[pos]
  pos = pos + 1
}
total
[1] 45001921
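The same idea applies to element-wise work, not only to totals. The following is a minimal sketch with a small made-up vector v (not part of the example above) that doubles every element, first with a loop and then with a single vector operation.

```r
v <- c(1, 2, 3, 4)

# Loop version: double each element one at a time.
doubled_loop <- numeric(length(v))
for (i in seq_along(v)) {
  doubled_loop[i] <- v[i] * 2
}

# Vectorized version: the multiplication is applied to every element at once.
doubled_vec <- v * 2

doubled_vec                           # 2 4 6 8
identical(doubled_loop, doubled_vec)  # TRUE
```

Both versions produce the same values. The vectorized line is shorter and leaves the element-by-element work to R.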
42.1 Calculating the speed of the code
The following code uses the system.time function to measure how long each of the above examples took, in seconds (see ?system.time for more info).
t <- system.time({
  sum(x)
})
cat("sum(x) took", t["elapsed"], "seconds\n")
sum(x) took 0.009 seconds
t <- system.time({
  total = 0
  pos = 1
  while(pos <= length(x)){
    total = total + x[pos]
    pos = pos + 1
  }
  total
})
cat("The loop took", t["elapsed"], "seconds\n")
The loop took 1.059 seconds
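When comparing two approaches it is worth checking that they agree before comparing their times. The sketch below does both. It uses a smaller vector, y, and a for loop in place of the while loop; the names y, vec_total and loop_total and the vector size are chosen here for illustration only. Timings vary from machine to machine, so no expected output is shown.

```r
# A smaller vector than in the text so the loop finishes quickly.
set.seed(1)
y <- sample(0:9, 100000, replace = TRUE)

# Time the vectorized version and keep its result.
vec_time <- system.time(vec_total <- sum(y))

# Time the loop version and keep its result.
loop_time <- system.time({
  loop_total <- 0
  for (value in y) {
    loop_total <- loop_total + value
  }
})

# sum() returns an integer here while the loop accumulates a double,
# so compare the values with == rather than identical().
vec_total == loop_total

cat("sum(y) took", vec_time["elapsed"], "seconds\n")
cat("The loop took", loop_time["elapsed"], "seconds\n")
```

Assigning inside system.time works because the expression is evaluated in the calling environment, so vec_total and loop_total are still available afterwards.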
42.2 Why loops take longer
We will not explain this in detail here. However, if you want to research the topic further, the following summary should give you a starting point.
There are two types of languages: “interpreted” languages and “compiled” languages. Compiled languages are generally faster than interpreted languages (for reasons we will not explain now).
R is an interpreted language. However, “under the covers”, R itself is largely written in C, which is a compiled language.
Many of the vector operations in R are implemented “under the covers” as C code that uses C’s loops. Since C is a compiled language, these vector operations run very quickly.
By contrast, loops written in R are interpreted and run much slower than the equivalent vector operations.
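One way to see this for yourself is to ask R whether a function is a "primitive", that is, one implemented directly in R's compiled C code. This is a small sketch using the base function is.primitive.

```r
is.primitive(sum)   # TRUE: sum is implemented in R's compiled C code
is.primitive(`+`)   # TRUE: even the arithmetic operators are primitives

# Printing a primitive shows a .Primitive(...) call instead of R source code.
sum
```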
42.2.1 What about lapply?
You can also accomplish looping behavior with the “apply” family of functions in R (e.g. lapply, sapply, etc.). These still call an interpreted R function once for each element, so they typically run at a speed similar to R’s loops.
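As a rough illustration, the sketch below squares the numbers 1 through 5 with sapply and then with a single vector operation. The names v, squares_sapply and squares_vec are made up for this example.

```r
v <- 1:5

# sapply calls the anonymous R function once for each element of v.
squares_sapply <- sapply(v, function(e) e^2)

# The vectorized version performs the whole calculation in one operation.
squares_vec <- v^2

squares_vec                              # 1 4 9 16 25
identical(squares_sapply, squares_vec)   # TRUE
```

When a built-in vector operation exists for the task, it is usually the better choice. The apply functions are more useful when each element needs work that has no vectorized equivalent.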
42.3 — Practice —
42.3.1 QUESTION: splitVector
#------------------------------------------------------------
# Write a function, splitVector, that takes a single argument, vec, that
# is expected to be a vector. The function should return a list
# that contains 3 different vectors.
#
# - The 1st vector in the list should contain the values from vec
# that are negative
#
# - The 2nd vector in the list should contain the values from vec
# that are between 0 and 100
#
# - The 3rd vector in the list should contain the values from vec
# that are larger than 100
#
# PART A - answer the question without using loops
#
# PART B - answer the question using a for loop
# DO NOT USE VECTOR OPERATIONS
#
# PART C - answer the question using a while loop
# DO NOT USE VECTOR OPERATIONS
#------------------------------------------------------------
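Below is one possible sketch for PART A only; try the question yourself before reading it. It assumes that "between 0 and 100" includes both 0 and 100, which the question does not state explicitly.

```r
# PART A sketch: logical indexing selects the matching values without a loop.
# Assumption: 0 and 100 themselves belong to the 2nd vector.
splitVector <- function(vec) {
  list(
    vec[vec < 0],                 # negative values
    vec[vec >= 0 & vec <= 100],   # values between 0 and 100
    vec[vec > 100]                # values larger than 100
  )
}

result <- splitVector(c(-5, 0, 50, 100, 150, -1))
result[[1]]   # -5 -1
result[[2]]   # 0 50 100
result[[3]]   # 150
```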