6 Intro to R vectors

6.1 What’s a vector? The is.vector() function.

A vector is a collection of several values. R streamlines the processing of vectors by working with all the values in a vector simultaneously. This is best explained with an example.

6.2 Example - using the c() function to create a vector

The c() function collects several values into a single “vector” of values. When you perform arithmetic with a vector, each item in the vector is operated upon. In the following example, c(100,200,300) creates a single “vector” that includes the specified values. The number 5 is then added to each value in the vector. This produces a new vector that contains 105, 205 and 305.

c(100,200,300) + 5    # return a vector that contains c(105, 205, 305)

[1] 105 205 305

The above demonstrates a simple example of “vector arithmetic”. Any arithmetic that is done with vectors operates on all values in the vector. We’ll explore this in more depth soon.
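As a quick sketch, the same idea applies to the other arithmetic operators, not just addition. The variable name values below is only for illustration.

```r
# A hypothetical vector used only for this illustration
values <- c(100, 200, 300)

values - 5     # 95 195 295   (5 is subtracted from each value)
values * 2     # 200 400 600  (each value is doubled)
values / 10    # 10 20 30     (each value is divided by 10)
```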

6.3 Many R functions can be used to create vectors.

c() is NOT the only function that can create a vector. R has many, many functions that create vectors.

6.3.1 Intro to seq() function

For example, the seq function can create a vector of values that follow some sequence or pattern. The following uses the seq function to create a sequence of the numbers 11,12,13,14,15.

seq(11, 15)

[1] 11 12 13 14 15

We can do vector arithmetic with the results. In the following example, 100 gets added to each value in the vector 3,4,5,6 to result in 103,104,105,106.

100 + seq(3, 6)

[1] 103 104 105 106

6.3.2 Intro to rep() function

The rep() function returns a vector that contains repetitions of the specified values. For example:

rep("hello", 5)  # repeat the word "hello" five times

[1] "hello" "hello" "hello" "hello" "hello"

rep(100,3)       # repeat the number 100 three times

[1] 100 100 100

6.4 Nesting function calls

You can pass the return value of one function call into another function call.

rep(c(10,11,12), 4)   # repeat 10,11,12 four times

 [1] 10 11 12 10 11 12 10 11 12 10 11 12
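One way to see what the nested call is doing is to write it in two steps with a variable and compare. This is only a sketch. The names pattern, stepByStep and nested are made up for the illustration.

```r
# Step by step: first build the inner vector, then repeat it
pattern <- c(10, 11, 12)
stepByStep <- rep(pattern, 4)

# The same thing written as one nested call
nested <- rep(c(10, 11, 12), 4)

identical(stepByStep, nested)   # TRUE - both approaches give the same vector
```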

6.5 More vector arithmetic

Each of the above examples returns a single vector. We can do vector arithmetic with the results.

# repeat the number 100 three times then add 5 to each number
rep(100,3) + 5

[1] 105 105 105

# add 1000 to each of the values in the vector produced by rep(seq(3,5), 4)
1000 + rep(seq(3,5), 4)

 [1] 1003 1004 1005 1003 1004 1005 1003 1004 1005 1003 1004 1005

6.6 Order of operations applies to vectors too

# vector arithmetic also respects the order of operations
# In the following example the multiplication is done before the addition
# to yield the value c(205, 405, 605)

5 + c(100, 200, 300) * 2   # do the multiplication first

[1] 205 405 605

# This works as follows
#
# original:               5 + c(100, 200, 300) * 2
#
# do the *:               5 + c(200, 400, 600)
#
# then do the +:          c(205, 405, 605)
#
# result is displayed as: 205 405 605

6.7 (parentheses) change the order of operations

# we can change the order of operations with parentheses
# This yields a different result. 

(5 + c(100,200,300)) * 2   # pay close attention to the parentheses!!!

[1] 210 410 610

# This works as follows
#
# original:     (5 + c(100, 200, 300)) * 2
#
# do the +:     c(105, 205, 305) * 2
#
# then do the *: c(210, 410, 610)
#
# result is displayed as: 210 410 610

6.8 You may assign a vector to a variable

grades <- c(72,95,79,85)

grades   # show the values

[1] 72 95 79 85

#---------------------------------------------------------------
# QUESTION:
#
# Modify the grades variable by adding 2 points to each grade
#---------------------------------------------------------------

#---------------------------------------------------------------
# ANSWER
#---------------------------------------------------------------
grades = grades + 2    # you must assign the answer back to grades
grades

[1] 74 97 81 87

#---------------------------------------------------------------

6.9 is.vector( SOME_VALUE )

The is.vector function returns TRUE if its argument is a vector and FALSE otherwise.

As we said above, a vector is a “collection” of values. The simplest vector is just a single value. (It is technically a collection of just one value.)

#-----------------------------------------------------------------------.
# The is.vector function returns TRUE if its argument is a vector ####
# and FALSE otherwise.
#-----------------------------------------------------------------------.

#-----------------------------------------------------------------------.
# The simplest vector is just a single value ... ####
# (it is technically a collection of just one value).
#-----------------------------------------------------------------------.

# View the help page by typing: 
#
# ?is.vector

is.vector ( 3 )

[1] TRUE

is.vector( 99923141.32412431 )

[1] TRUE

# A variable that contains a vector is a vector   ####

priceOfApple = 1.99
is.vector(priceOfApple)   # TRUE

[1] TRUE
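The examples above all use numbers. As a small sketch, the same check can be tried on other kinds of values, such as text and TRUE/FALSE values. The specific values below are just illustrations.

```r
is.vector("hello")            # TRUE - a single piece of text is a vector
is.vector(TRUE)               # TRUE - a single TRUE/FALSE value is a vector
is.vector(c("a", "b", "c"))   # TRUE - several pieces of text combined with c()
```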

6.10 The c() function

# The c() function is used to combine multiple values into a single vector. ####
#
# You can think of the "c" as standing for the word "combine".
# "c" actually stands for the word "concatenate" which 
# is a technical fancy shmancy word for "combine things together".

# The following is a vector with multiple values. 
# The c function combines (i.e. "concatenates") the multiple values into a 
# single "vector"
c(100,200,300, 50, -2, 25)

[1] 100 200 300  50  -2  25

is.vector(c(100,200,300, 50, -2, 25))    # this works

[1] TRUE

is.vector(100,200,300, 50, -2, 25) # ERROR: use c() to tie together different values

Error in is.vector(100, 200, 300, 50, -2, 25): unused arguments (300, 50, -2, 25)

someNumbers = c(100,200,300, 50, -2, 25) # combine (or concatenate) values into one vector
someNumbers

[1] 100 200 300  50  -2  25

is.vector(someNumbers)   # TRUE

[1] TRUE

6.11 Other R functions return vectors

6.12 range() function

#-----------------------------------------------------.
# Other functions can also create vectors.
#-----------------------------------------------------.

#.............................................................................
# The range function returns a vector
#
# The range function returns the minimum and maximum values that are in a vector ####
#.............................................................................


range(someNumbers)

[1]  -2 300

is.vector(range(someNumbers))

[1] TRUE

# You can also capture the result in  a variable
lowestAndHighest = range(someNumbers)
lowestAndHighest   # -2  300

[1]  -2 300

is.vector(lowestAndHighest)   # TRUE

[1] TRUE
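For comparison, here is a sketch that builds the same two-value vector by hand with the min and max functions. The vector someNumbers is defined again so the example can be run on its own.

```r
someNumbers <- c(100, 200, 300, 50, -2, 25)

# Combine the smallest and largest values into one vector
c(min(someNumbers), max(someNumbers))   # -2 300

# range gives the same vector in a single call
identical(range(someNumbers), c(min(someNumbers), max(someNumbers)))   # TRUE
```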

6.13 seq() function

#.............................................................................
# The seq function returns a vector. In its simplest use, 
# seq returns the sequence starting with the 1st argument, ending with the 2nd argument ####
#
# NOTE - we will come back to the seq function to learn about 
#        much more complex ways of using it.
#.............................................................................

# Example 1
seq(5,10)                  # 5 6 7 8 9 10

[1]  5  6  7  8  9 10

is.vector( seq(5, 10) )    # TRUE

[1] TRUE

# Example 2
seq(10,5)                  # 10 9 8 7 6 5

[1] 10  9  8  7  6  5

is.vector( seq(10,5) )     # TRUE

[1] TRUE

# Example 3
seq(0.5, 2.5)              # 0.5   1.5   2.5

[1] 0.5 1.5 2.5

is.vector( seq(0.5, 2.5) ) # TRUE

[1] TRUE

# We can also capture the results in variables
example1 = seq(5,10)
example1               # 5  6  7  8  9  10

[1]  5  6  7  8  9 10

is.vector(example1)    # TRUE

[1] TRUE

example2 = seq(10,5)   
example2               # 10  9  8  7  6  5

[1] 10  9  8  7  6  5

is.vector(example2)    # TRUE

[1] TRUE

example3 = seq(.5, 2.5)  
example3               # 0.5  1.5  2.5

[1] 0.5 1.5 2.5

is.vector(example3)    # TRUE

[1] TRUE

seq(0.5, 3)                # 0.5  1.5  2.5  (stops at 2.5 since the next value, 3.5, is past 3)

[1] 0.5 1.5 2.5
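All of the examples above count by 1. As a brief preview (a sketch, not a full treatment), seq also accepts a by argument that sets the step size and a length.out argument that sets how many values to produce.

```r
seq(2, 10, by = 2)          # 2 4 6 8 10   (count up by 2)
seq(10, 0, by = -5)         # 10 5 0       (count down by 5)
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00   (5 evenly spaced values)
```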

6.14 rep() function

#.............................................................................
# The rep function returns a vector   ####
#
# In its simplest use, the rep function returns a vector of its first
# argument repeated the number of times specified by its 2nd argument.
#
# NOTE - we will come back to the rep function to learn about 
#        more complex ways of using it.
#.............................................................................

rep(100,3)        # 100  100  100

[1] 100 100 100

rep(  seq(1,3)  ,   2)  # 1  2  3  1  2  3

[1] 1 2 3 1 2 3

# QUESTION 
# Create a vector that has the numbers 1 3 1 3 1 3 etc. for a total
# of 20 numbers. Store the resulting vector into a variable named nums.

nums = rep( c(1,3) , 10)   # ANSWER
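One way to check an answer like this is to display the vector and count its values. The sketch below uses the length function, which is covered later in this chapter.

```r
nums <- rep(c(1, 3), 10)

nums           # 1 3 1 3 1 3 ... (the pair 1 3 repeated 10 times)
length(nums)   # 20 - two values repeated 10 times
```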

6.15 Use c() to combine vectors

#-------------------------------------------------------------------------------------.
# DO NOT WRITE INDIVIDUAL VALUES WITHOUT COMBINING THEM TOGETHER WITH A FUNCTION CALL!
#-------------------------------------------------------------------------------------.

100,200,300    # ERROR - individual values separated by commas are meaningless to R  ####

Error in parse(text = input): <text>:5:4: unexpected ','
4: 
5: 100,
      ^

# REMEMBER - if no other function call is being used, you can use the
#            c function to combine individual values

c(100,200,300)    # 100  200 300 (no error)

[1] 100 200 300

#-------------------------------------------------------------.
#
# More about the c function  ####
#
#-------------------------------------------------------------.

#..............................................................................
# If you "nest" calls to "c", ie. if you combine one vector inside of another
# vector by using the c function, the result is a single vector
#..............................................................................

c(100, 200, c(30, 20, 10), 600)   # same as c(100, 200, 30, 20, 10, 600)

[1] 100 200  30  20  10 600

c(100, 200, 30, 20, 10, 600) # same thing

[1] 100 200  30  20  10 600

#..............................................................................
# You can use the c function to combine multiple vectors into a single vector.
#..............................................................................

x <- c(10,20,30)
y <- c(40, 50)
z <- c(x, y)      # combine the values from x and y into z
z

[1] 10 20 30 40 50

z <- x, y         # ERROR - use the c function to combine vectors into a single vector

Error in parse(text = input): <text>:1:7: unexpected ','
1: z <- x,
          ^

# QUESTION ####
# Find the sum of all the values that are in x and y, without using z

# ANSWER
sum(c(x,y))   # This works

[1] 150

sum(x,y)      # This works too - sum allows multiple vectors to be summed

[1] 150

# QUESTION  ####
#
# Find the average (i.e. mean) of all the values that are in x and y,
# without using z

# ANSWER
mean(c(x,y))      #This works

[1] 30

mean(x,y)         # ERROR

Error in mean.default(x, y): 'trim' must be numeric of length one

# QUESTION
# Why did we get an error in the last example?

# ANSWER 
#
# From the documentation for sum and mean (i.e. ?sum and ?mean) we can 
# see that the sum function allows multiple vectors that contain the numbers
# to be passed as separate arguments. However, the mean function requires
# all of the numbers to be averaged to be in a single vector that is passed 
# to the argument named x. It's true that one might expect these functions
# to be more similar in how they are called. However, the designers of the 
# language decided otherwise. The underlying reasons for the difference in
# the design of these functions are irrelevant - the bottom line is that you
# need to know how to call the functions. The place to learn this is 
# in the documentation for the functions (i.e. ?sum and ?mean)
#
# Look at the documentation for sum and for mean (i.e. ?sum and ?mean).
# The "Usage" section shows the names of the arguments and their default values.
# The "Arguments" section explains what each argument is expected to contain.
# The "Value" section explains how the return value for the function is calculated.
#
# It takes some time and practice to be proficient at reading R's help pages.
# However, understanding how to read and interpret R's help pages
# is a critical skill that allows you to become familiar with R's built in
# functions.
#
# An "ellipsis" (i.e. three periods, ... ) in the help pages
# stands for the ability to type several values in place of the 
# ellipsis. For example, the ... in the help page for sum, indicates
# the ability to type several different values to be summed. This is
# described in the ARGUMENTS section where it explains that ... stands
# for "numeric or complex or logical vectors".

# View the help page by typing: 
#
# ?sum    

# USAGE: sum(..., na.rm = FALSE)
# ARGUMENTS: 
# ...      numeric or complex or logical vectors
# na.rm    (see the help page)


# However, for the mean function, there is a single argument named x that 
# is expected to contain the values to be averaged. The ellipsis shown 
# in the help page for mean is used for a more subtle reason. It shows where
# additional arguments, not listed on this help page, might be specified 
# (this is an advanced concept that we'll return to later). 

# View the help page by typing: 
#
# ?mean   

# USAGE: mean(x, trim = 0, na.rm = FALSE, ...)
# ARGUMENTS: 
# x        An R object. (i.e. a vector - these are the numbers)
# trim     (see help page)
# na.rm    (see help page)
# ...        further arguments passed to or from other methods.
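To make the difference concrete, here is a sketch of a function that accepts ... the way sum does and then hands a single combined vector to mean. Writing your own functions has not been covered yet, and averageAll is a made-up name that is not part of R. It is shown only to illustrate what ... can do.

```r
# averageAll is a hypothetical helper, NOT a built-in R function.
# The ... collects however many arguments the caller supplies.
averageAll <- function(...) {
  mean(c(...))   # combine every argument into one vector, then average it
}

averageAll(10, 20, 30)                 # 20
averageAll(c(10, 20, 30), c(40, 50))   # 30 - same as mean(c(x, y)) above
```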


# You can use the c function to combine values from different functions.
# Make sure that you match parentheses correctly.

c(   rep(100,3)  ,   seq(-5,-7)  )   # 100 100 100 -5 -6 -7

[1] 100 100 100  -5  -6  -7

# DON'T FORGET THE c( ... )
rep(100,3), seq(-5,-7)      # ERROR

Error in parse(text = input): <text>:2:11: unexpected ','
1: # DON'T FORGET THE c( ... )
2: rep(100,3),
             ^

6.16 — Practice —

#----------------------------------------------------.
# QUESTION
# What does the following code produce?
# 
#
#    range(   c( rep(100,3)   , seq(990,1005)  ,  seq(-5,-7) )   )
#----------------------------------------------------.

Answer:

range(   c( rep(100,3)   , seq(990,1005)  ,  seq(-5,-7) )   )

[1]   -7 1005

6.17 — Practice —

#----------------------------------------------------.
# QUESTION
# Write R code that takes the average of the first
# 200 even numbers.
#----------------------------------------------------.

Answer:

mean(seq(from=2, to=400, by=2))

[1] 201
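A quick way to convince yourself that this really averages the first 200 even numbers is to store the sequence and inspect it. This is just a sketch, and the variable name evens is made up for the illustration.

```r
evens <- seq(from = 2, to = 400, by = 2)

length(evens)   # 200 - there are exactly 200 values
range(evens)    # 2 400 - the first even number through the 200th
mean(evens)     # 201
```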

6.18 non-vectors (later in the course).

#----------------------------------------------------------------------------.
# Things that aren't vectors (e.g. dataframes, factors, matrices, etc)  ####
#----------------------------------------------------------------------------.

# A vector is the simplest arrangement of values in R.

# R allows for more complex arrangements of data, which we will learn about
# later in the course, such as factors, matrices, dataframes, etc.

# These more complex arrangements of data are created from vectors but are 
# technically not vectors themselves. One example of such an arrangement 
# of data is a data.frame.
# We will cover dataframes later in the course.
# For now, I just want to demonstrate that R has structures that are NOT vectors.



# A dataframe is made up of vectors, but it itself is NOT a vector.
example = data.frame(students = c("joe", "sue", "bob"), 
                      test1 = c(71,85,90),
                      test2 = c(83, 92, 95), stringsAsFactors = FALSE)
example

  students test1 test2
1      joe    71    83
2      sue    85    92
3      bob    90    95

is.vector(example)      # FALSE

[1] FALSE

is.data.frame(example)  # TRUE

[1] TRUE
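As a sketch of the same point, a single column pulled out of a dataframe is a vector, even though the dataframe as a whole is not. The $ notation and the matrix function used below have not been covered yet, and the names scores and m are made up for the illustration.

```r
scores <- data.frame(test1 = c(71, 85, 90), test2 = c(83, 92, 95))

is.vector(scores)         # FALSE - the dataframe itself is not a vector
is.vector(scores$test1)   # TRUE  - one column of the dataframe is a vector

m <- matrix(c(1, 2, 3, 4), nrow = 2)   # a 2 x 2 matrix built from a vector
is.vector(m)                           # FALSE - a matrix is not a vector either
```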

6.19 length( SOME_VECTOR )

#-----------------------------------------------------------------------.
#
# length(vector)  returns the number of values in the vector ####
# 
#-----------------------------------------------------------------------.

# Set the value of grades
grades <- c(72,95,79,85)

# the length function returns the number of values in a vector
length(grades)          #4

[1] 4

length(c(25, 10))       #2

[1] 2

length(c(100,200,300))  #3

[1] 3

# A single value is a vector - but it doesn't need to be surrounded with c()

length(c(100))   # the length of a vector that contains a single item is 1

[1] 1

length(100)      # ... same thing ... don't use the c - it's not necessary

[1] 1

c(100)  # this is the same as just 100, the "c" is not necessary if you have just one value.

[1] 100

100 # same thing - don't use the c for a single value

[1] 100

grades   # show all grades

[1] 72 95 79 85

grades + 5   # show what the values would be if we added 5 to each grade

[1]  77 100  84  90

grades       # however, grades did NOT actually change

[1] 72 95 79 85

# If you want to change the value of grades, you need to 
# use the = sign or the <- or the ->. For example:
grades     # show grades

[1] 72 95 79 85

grades <- grades + 10   # add 10 to each grade and update grades with the new values
grades     # grades now has the new values

[1]  82 105  89  95

prices = c(1.99, 2.99, 3.99)
doublePrices = 2 * prices
doublePrices

[1] 3.98 5.98 7.98

6.20 Counting arguments

#############################################################.
#
# Arguments (AKA "parameters") to a function.   ####
#
# It is important to know how many arguments are being passed 
# to a function. The arguments to a function appear in the (parentheses)
# next to the function name and are separated from each other with commas.
#
#########################################################################.

# Remember that the round function takes TWO arguments
#
#   x is the values to round
#
#   digits is the position to round to

round(100.729, 1)  # 100.7

[1] 100.7

round(100.729, 2)  # 100.73

[1] 100.73

round (100.729)    # 101

[1] 101

# The first argument is allowed to be a vector with multiple values

round (   c(100.729, 200.618)  , 1)  # 100.7  200.6

[1] 100.7 200.6

grades = c(82, 105, 89, 95)

sum(grades)  # one argument - add up all grades (not very useful for grading ...)

[1] 371

sum(c(82,105,89,95))  # also one argument - same exact thing, sum is given 1 vector

[1] 371

sum(82,105,89,95)     # four arguments - sum is given 4 separate one-value vectors, but the answer is the same

[1] 371

# The sum function will sum all of the values in all 
# of its arguments. The following all produce the same
# result (i.e. 306) but in different ways.

sum( c(100,200) , c(1,2,3))   # 2 arguments

[1] 306

sum( c(100,200,1,2,3) )       # 1 argument

[1] 306

sum( 100,200,1,2,3 )          # 5 arguments

[1] 306

6.21 To get an average use the mean function

# IMPORTANT: the mean function works a little differently than the sum function.
#
# The mean function requires that all values being averaged are passed as a single vector. ####

grades        # show all the grades

[1]  82 105  89  95

grades = c(82, 105, 89, 95)  
mean(grades)  # get the average

[1] 92.75

mean( c(82,105,89,95) )  # same thing - there is ONE argument, i.e. the vector c(82,105,89,95)

[1] 92.75

mean(82,105,89,95)    # WRONG - without c() these are 4 separate arguments, so only 82 is averaged

[1] 82

# To summarize: 
# sum and mean are not consistent in the way they handle multiple arguments
sum(1,2,3)   # works as expected

[1] 6

mean(1,2,3)  # does not work as most people would expect - answer is 1

[1] 1

# View the help page by typing: 
#
#   ?mean

# Examine the documentation for mean to see why. The Usage section of the 
# documentation includes the following: mean(x, trim = 0, na.rm = FALSE, ...)
# The "x" corresponds to a single vector that contains
# the values to be averaged. If you pass the values without
# the c() function, then the 2nd value listed is actually
# passed to the "trim" argument of mean. If you want to know
# what the "trim" argument is used for, see the help
# page for "mean". If you don't specify any value for "trim"
# then "mean" will work as you expect.
#
# View the help page by typing: 
#
#   ?mean     # see the documentation for mean

# Arguments passed to mean:
#
#  x      - a vector that contains the values to be averaged
#
#  trim   - a fraction (0 to 0.5) of observations to be ignored (i.e. trimmed) from each end of the sorted values
#
#  na.rm  - WE WILL DISCUSS THIS LATER ...



# Return the average of the numbers in the vector.
#
# Return value is 400 , i.e.  (100+200+300+500+900) / 5
mean(c(100,200,300,500,900))

[1] 400

# the code above does the same as the next line
sum(c(100,200,300,500,900)) / 5

[1] 400

# DO NOT DO THE FOLLOWING !!!!  
# The mean function is being passed a SINGLE value and 
# does nothing meaningful in this case.

mean(sum(100,200,300,500,900) / 5)  # basically same as:  sum(100,200,300,500,900) / 5

[1] 400

# This is because by the time the mean function
# starts working, the value sum(100,200,300,500,900) / 5
# has already been calculated as 400.
# It would be just as ridiculous as running the following code
# which just returns the number 400 - the mean function does
# nothing meaningful in this case. 

mean ( 400 )   # This is the same as 400 / 1

[1] 400

6.21.1 trim argument to mean

# the "trim" argument to mean   ####
#
# trim (ie. remove) 0.2 (ie. 1/5) of the values (ie. 1 value) 
# from each end of the sorted values (i.e. the lowest and the highest)
#
# Return value is 333.333, ie. mean(c(200,300,500))
mean(c(100,200,300,500,900), 0.2)

[1] 333.3333

mean(c(200,300,500)) # same result

[1] 333.3333

grades = c(5, 82, 85, 89, 105)

mean(grades)   # mean ( c(5,82,85,89,105))

[1] 73.2

mean(grades, trim = 0.2)  # mean(c(82,85,89))

[1] 85.33333

grades

[1]   5  82  85  89 105

# trim (ie. remove) 0.4 (ie. 2/5) of the values (i.e. 2 values)
# from each end of the sorted values (i.e. the 2 lowest and the 2 highest)
#
# Return value is 300, i.e. mean(300) 
mean(c(100,200,300,500,900), 0.4)   # trim 0.4 = 2/5 of the values from the beginning and end

[1] 300

mean(c(500,200,300,900,100), 0.4)   # same result - the values are sorted before they are trimmed

[1] 300
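The following sketch reproduces the trimmed mean by hand to show what the trim argument does. It assumes trim is less than 0.5, and it uses [square bracket] indexing, which has not been covered yet. The names values, trim, sorted, dropCount and kept are made up for the illustration.

```r
values <- c(500, 200, 300, 900, 100)
trim <- 0.4

sorted <- sort(values)                      # 100 200 300 500 900
dropCount <- floor(length(values) * trim)   # 2 values to drop from each end

# keep only the values that are left after dropping from both ends
kept <- sorted[(dropCount + 1):(length(sorted) - dropCount)]   # 300

mean(kept)           # 300
mean(values, trim)   # 300 - the same answer
```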

# In the following the result is 100
# This is because the arguments are assigned in the following order
#
# x, ie. the values to be averaged = first argument = 100
# trim = second argument = 200
# na.rm = 3rd argument = 300
# ... = all other arguments = c(500,900)
#
# Other than the x=100, the other arguments are really meaningless so
# the result is the average of 100, which is 100.

mean(100,200,300,500,900)

[1] 100

# PROBLEM:
#
# REMEMBER that mean requires that all values being averaged are in a SINGLE vector
# Therefore to take the average of the values in x and in y the following WILL NOT WORK:

mean(x, y)   # will not work - y is passed to the trim argument, which must be a single number

Error in mean.default(x, y): 'trim' must be numeric of length one

# SOLUTION:
#
# Remember that you can combine multiple vectors into a single vector with 
# the c function.

x <- c(10,20,30)
y <- c(40, 50)

mean(c(x,y))  # combine x and y into a single vector and take the mean of that vector

[1] 30

#--------------------------------------------------------.
# QUESTION : ####
#
# Grades for class1 and class2 are as shown below. 
# 
#   class1grades <- c(80,90,100)
#   class2grades <- c(85, 88)
#
# (a) get the two averages, one for each class
# (b) get the average for all the students in both classes
#--------------------------------------------------------.

Answer:

# ANSWER    
class1grades <- c(80,90,100)         # ANSWER   
class2grades <- c(85, 88)            # ANSWER   
class1average <- mean(class1grades) # ANSWER    
class2average <- mean(class2grades) # ANSWER    
allStudentsAverage <- mean ( c(class1grades, class2grades)) # ANSWER     - remember the c(...)
class1average   # ANSWER

[1] 90

class2average   # ANSWER

[1] 86.5

allStudentsAverage # ANSWER

[1] 88.6
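Note that averaging the two class averages does NOT give the average for all the students, because the classes have different numbers of students. The sketch below shows the difference. The names averageOfAverages and overallAverage are made up for the illustration.

```r
class1grades <- c(80, 90, 100)
class2grades <- c(85, 88)

# Average of the two class averages, i.e. mean(c(90, 86.5))
averageOfAverages <- mean(c(mean(class1grades), mean(class2grades)))
averageOfAverages   # 88.25

# Average of all five students
overallAverage <- mean(c(class1grades, class2grades))
overallAverage      # 88.6
```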