c(100,200,300) + 5 # return a vector that contains c(105, 205, 305)
[1] 105 205 305
A vector is a collection of several values. R streamlines the processing of vectors by working with all the values in a vector simultaneously. This is best explained with an example.
The c() function collects several values into a single “vector” of values. When you perform arithmetic with a vector, each item in the vector is operated upon. In the example above, c(100,200,300) creates a single “vector” that includes the specified values. The number 5 is then added to each value in the vector. This produces a new vector that contains 105, 205 and 305.
The above demonstrates a simple example of “vector arithmetic”. Any arithmetic that is done with vectors operates on all values in the vector. We’ll explore this in more depth soon.
c() is NOT the only function that can create a vector. R has many, many functions that create vectors.
For example, the seq function can create a vector of values that follow some sequence or pattern. Calling seq(11,15) creates a sequence with the numbers 11,12,13,14,15.
We can do vector arithmetic with the results. For example, seq(3,6) + 100 adds 100 to each value in the vector 3,4,5,6, resulting in 103,104,105,106.
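As a minimal sketch of the two seq examples described above (the variable names elevenToFifteen and threeToSixPlus100 are illustrative and do not come from the original text):

```r
# seq(from, to) counts from the first argument up to the second in steps of 1
elevenToFifteen <- seq(11, 15)      # illustrative variable name
elevenToFifteen                     # 11 12 13 14 15

# vector arithmetic: 100 is added to every value in the sequence 3,4,5,6
threeToSixPlus100 <- seq(3, 6) + 100   # illustrative variable name
threeToSixPlus100                      # 103 104 105 106
```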
The rep() function returns a vector that contains repetitions of the specified values. For example, rep(100,3) returns 100 100 100.
You can pass the return value of one function call into another function call.
Each of the above examples returns a single vector. We can do vector arithmetic with the results.
[1] 105 105 105
[1] 1003 1004 1005 1003 1004 1005 1003 1004 1005 1003 1004 1005
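The two results shown above could be produced by calls along the following lines. This is a sketch that assumes rep and seq were the functions involved; the exact original calls are not shown in the text, and the variable names are illustrative.

```r
# rep(100, 3) returns 100 100 100; adding 5 operates on every value
repeatedPlus5 <- rep(100, 3) + 5        # illustrative variable name
repeatedPlus5                           # 105 105 105

# the return value of seq is passed directly into rep,
# and 1000 is then added to every value of the result
nestedPlus1000 <- rep(seq(3, 5), 4) + 1000   # illustrative variable name
nestedPlus1000   # 1003 1004 1005 repeated four times
```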
The is.vector function returns TRUE if its argument is a vector and FALSE otherwise.
As we said above, a vector is a “collection” of values. The simplest vector is just a single value (technically, it is a collection of just one value).
#-----------------------------------------------------------------------.
# The is.vector function returns TRUE if its argument is a vector ####
# and FALSE otherwise.
#-----------------------------------------------------------------------.
#-----------------------------------------------------------------------.
# The simplest vector is just a single value ... ####
# (it is technically a collection of just one value).
#-----------------------------------------------------------------------.
# View the help page by typing:
#
# ?is.vector
is.vector ( 3 )
[1] TRUE
[1] TRUE
# A variable that contains a vector is a vector ####
priceOfApple = 1.99
is.vector(priceOfApple) # TRUE
[1] TRUE
# The c() function is used to combine multiple values into a single vector. ####
#
# You can think of the "c" as standing for the word "combine".
# "c" actually stands for the word "concatenate" which
# is a technical fancy shmancy word for "combine things together".
# The following is a vector with multiple values.
# The c function combines (i.e. "concatenates") the multiple values into a
# single "vector"
c(100,200,300, 50, -2, 25)
[1] 100 200 300 50 -2 25
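is.vector(c(100,200,300, 50, -2, 25)) # TRUE - c(...) returns a single vector
[1] TRUE
is.vector(100,200,300, 50, -2, 25) # ERROR - the values must be combined with c()
Error in is.vector(100, 200, 300, 50, -2, 25): unused arguments (300, 50, -2, 25)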
#-----------------------------------------------------.
# Other functions can also create vectors.
#-----------------------------------------------------.
#.............................................................................
# The range function returns a vector
#
# The range function returns the minimum and maximum values that are in a vector ####
#.............................................................................
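someNumbers = c(100,200,300, 50, -2, 25) # assumed to be the same values as in the c(...) example above
range(someNumbers)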
[1] -2 300
[1] TRUE
# You can also capture the result in a variable
lowestAndHighest = range(someNumbers)
lowestAndHighest # -2 300
[1] -2 300
[1] TRUE
#.............................................................................
# The seq function returns a vector. In its simplest use,
# seq returns the sequence starting with the 1st argument, ending with the 2nd argument ####
#
# NOTE - we will come back to the seq function to learn about
# much more complex ways of using it.
#.............................................................................
# Example 1
seq(5,10) # 5 6 7 8 9 10
[1] 5 6 7 8 9 10
[1] TRUE
[1] 10 9 8 7 6 5
[1] TRUE
[1] 0.5 1.5 2.5
[1] TRUE
[1] 5 6 7 8 9 10
[1] TRUE
[1] 10 9 8 7 6 5
[1] TRUE
[1] 0.5 1.5 2.5
[1] TRUE
[1] 0.5 1.5 2.5
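The outputs above are consistent with calls like the following. This is only a sketch: the exact arguments that produced those outputs are an assumption, and the variable names are illustrative.

```r
# if the first argument is larger than the second, seq counts down by 1
countDown <- seq(10, 5)        # illustrative variable name
countDown                      # 10 9 8 7 6 5

# the starting value does not need to be a whole number
halves <- seq(0.5, 2.5)        # illustrative variable name
halves                         # 0.5 1.5 2.5

# the result of seq is a vector
is.vector(countDown)           # TRUE
is.vector(halves)              # TRUE
```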
#.............................................................................
# The rep function returns a vector ####
#
# In its simplest use, the rep function returns a vector of its first
# argument repeated the number of times specified by its 2nd argument.
#
# NOTE - we will come back to the rep function to learn about
# more complex ways of using it.
#.............................................................................
rep(100,3) # 100 100 100
[1] 100 100 100
[1] 1 2 3 1 2 3
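The first argument to rep can itself be a vector with several values, in which case the whole vector is repeated. The following sketch shows one call that would produce the last output above; whether it is the exact call used originally is an assumption, and the variable name is illustrative.

```r
# repeat the whole vector 1,2,3 two times
repeatedVector <- rep(c(1, 2, 3), 2)   # illustrative variable name
repeatedVector                         # 1 2 3 1 2 3

# the result is a single vector with 6 values
length(repeatedVector)                 # 6
```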
#-------------------------------------------------------------------------------------.
# DO NOT WRITE INDIVIDUAL VALUES WITHOUT COMBINING THEM TOGETHER WITH A FUNCTION CALL!
#-------------------------------------------------------------------------------------.
100,200,300 # ERROR - individual values separated by commas are meaningless to R ####
Error in parse(text = input): <text>:5:4: unexpected ','
4:
5: 100,
^
# REMEMBER - if no other function call is being used, you can use the
# c function to combine individual values
c(100,200,300) # 100 200 300 (no error)
[1] 100 200 300
#-------------------------------------------------------------.
#
# More about the c function ####
#
#-------------------------------------------------------------.
#..............................................................................
# If you "nest" calls to "c", ie. if you combine one vector inside of another
# vector by using the c function, the result is a single vector
#..............................................................................
c(100, 200, c(30, 20, 10), 600) # same as c(100,200,30,20,10,600)
[1] 100 200 30 20 10 600
c(100, 200, 30, 20, 10, 600) # same values without nesting
[1] 100 200 30 20 10 600
#..............................................................................
# You can use the c function to combine multiple vectors into a single vector.
#..............................................................................
x <- c(10,20,30)
y <- c(40, 50)
z <- c(x, y) # combine the values from x and y into z
z
[1] 10 20 30 40 50
z <- x, y # ERROR - you must use the c function to combine x and y
Error in parse(text = input): <text>:1:7: unexpected ','
1: z <- x,
^
# QUESTION ####
# Find the sum of all the values that are in x and y, without using z
# ANSWER
sum(c(x,y)) # This works
[1] 150
sum(x,y) # This also works
[1] 150
# QUESTION ####
#
# Find the average (i.e. mean) of all the values that are in x and y,
# without using z
# ANSWER
mean(c(x,y)) #This works
[1] 30
mean(x,y) # ERROR
Error in mean.default(x, y): 'trim' must be numeric of length one
# QUESTION
# Why did we get an error in the last example?
# ANSWER
#
# From the documentation for sum and mean (i.e. ?sum and ?mean) we can
# see that the sum function allows multiple vectors that contain the numbers
# to be passed as separate arguments. However, the mean function requires
# all of the numbers to be averaged to be in a single vector that is passed
# to the argument named x. It's true that one might expect these functions
# to be more similar in how they are called. However, the designers of the
# language decided otherwise. The underlying reasons for the difference in
# the design of these functions is irrelevant - bottom line is you
# need to know how to call the functions. The place to learn this is
# in the documentation for the functions (i.e. ?sum and ?mean)
#
# Look at the documentation for sum and for mean (i.e. ?sum and ?mean).
# The "Usage" section shows the names of the arguments and their default values.
# The "Arguments" section explains what each argument is expected to contain.
# The "Value" section explains how the return value for the function is calculated.
#
# It takes some time and practice to be proficient at reading R's help pages.
# However, understanding how to read and interpret R's help pages
# is a critical skill that allows you to become familiar with R's built in
# functions.
#
# An "ellipsis" (i.e. three periods, ... ) in the help pages
# stands for the ability to type several values in place of the
# ellipsis. For example, the ... in the help page for sum, indicates
# the ability to type several different values to be summed. This is
# described in the ARGUMENTS section where it explains that ... stands
# for "numeric or complex or logical vectors".
# View the help page by typing:
#
# ?sum
# USAGE: sum(..., na.rm = FALSE)
# ARGUMENTS:
# ... numeric or complex or logical vectors
# na.rm (see the help page)
# However, for the mean function, there is a single argument named x that
# is expected to contain the values to be averaged. The ellipsis shown
# in the help page for mean is used for a more subtle reason. It shows where
# additional arguments, not listed on this help page, might be specified
# (this is an advanced concept that we'll return to later).
# View the help page by typing:
#
# ?mean
# USAGE: mean(x, trim = 0, na.rm = FALSE, ...)
# ARGUMENTS:
# x An R object. (i.e. a vector - these are the numbers)
# trim (see help page)
# na.rm (see help page)
# ... further arguments passed to or from other methods.
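The difference between the two Usage lines can be checked directly. The following is a minimal sketch; the variable names other than x and y are illustrative, and tryCatch is used here only so that the error message can be captured without stopping the script.

```r
x <- c(10, 20, 30)
y <- c(40, 50)

# sum accepts several vectors through its ... argument
sumSeparate <- sum(x, y)         # 150
sumCombined <- sum(c(x, y))      # 150

# mean expects all of the values in the single argument named x
meanCombined <- mean(c(x, y))    # 30

# mean(x, y) passes y to the trim argument, which causes an error;
# tryCatch captures the text of the error message instead of stopping
meanError <- tryCatch(mean(x, y), error = function(e) conditionMessage(e))
meanError
```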
# You can use the c function to combine values from different functions.
# Make sure that you match parentheses correctly.
c( rep(100,3) , seq(-5,-7) ) # 100 100 100 -5 -6 -7
[1] 100 100 100 -5 -6 -7
#----------------------------------------------------------------------------.
# Things that aren't vectors (e.g. dataframes, factors, matrices, etc) ####
#----------------------------------------------------------------------------.
# A vector is the simplest arrangement of values in R.
# R allows for more complex arrangements of data, which we will learn about
# later in the course, such as factors, matrices, dataframes, etc.
# These more complex arrangements of data are created from vectors but are
# technically not vectors themselves. One example of such an arrangement
# of data is a data.frame.
# We will cover dataframes later in the course.
# For now, I just want to demonstrate that R has structures that are NOT vectors.
# A dataframe is made up of vectors, but it itself is NOT a vector.
example = data.frame(students = c("joe", "sue", "bob"),
test1 = c(71,85,90),
test2 = c(83, 92, 95), stringsAsFactors = FALSE)
example
students test1 test2
1 joe 71 83
2 sue 85 92
3 bob 90 95
[1] FALSE
[1] TRUE
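As a sketch of how this can be checked with is.vector: the dataframe as a whole is not a vector, but a single column taken from it is. Using the test1 column here is an assumption made for illustration; the original calls that produced the outputs above are not shown.

```r
example <- data.frame(students = c("joe", "sue", "bob"),
                      test1 = c(71, 85, 90),
                      test2 = c(83, 92, 95), stringsAsFactors = FALSE)

dataframeIsVector <- is.vector(example)        # FALSE - a dataframe is not a vector
columnIsVector <- is.vector(example$test1)     # TRUE  - one column is a vector
isDataframe <- is.data.frame(example)          # TRUE

dataframeIsVector
columnIsVector
```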
#-----------------------------------------------------------------------.
#
# length(vector) returns the number of values in the vector ####
#
#-----------------------------------------------------------------------.
# Set the value of grades
grades <- c(72,95,79,85)
# the length function returns the number of values in a vector
length(grades) #4
[1] 4
[1] 2
[1] 3
# A single value is a vector - but it doesn't need to be surrounded with c()
length(c(100)) # the length of a vector that contains a single item is 1
[1] 1
[1] 1
[1] 100
[1] 100
[1] 72 95 79 85
[1] 77 100 84 90
[1] 72 95 79 85
# If you want to change the value of grades, you need to
# use the = sign or the <- or the ->. For example:
grades # show grades
[1] 72 95 79 85
grades <- grades + 10 # add 10 to each grade and update grades with the new values
grades # grades now has the new values
[1] 82 105 89 95
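The point about assignment can be sketched as follows. Arithmetic on a variable displays a new vector but leaves the variable unchanged unless the result is assigned back. The names displayedOnly and beforeAssignment are illustrative.

```r
grades <- c(72, 95, 79, 85)

# this calculates new values, but grades itself is not changed
displayedOnly <- grades + 5      # 77 100 84 90
beforeAssignment <- grades       # still 72 95 79 85

# assigning the result back to grades is what changes the variable
grades <- grades + 10
grades                           # 82 105 89 95
```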
[1] 3.98 5.98 7.98
#############################################################.
#
# Arguments (AKA "parameters") to a function. ####
#
# It is important to know how many arguments are being passed
# to a function. The arguments to a function appear in the (parentheses)
# next to the function name and are separated from each other with commas.
#
#########################################################################.
# Remember that the round function takes TWO arguments
#
# x contains the values to round
#
# digits is the number of decimal places to round to
round(100.729, 1) # 100.7
[1] 100.7
[1] 100.73
[1] 101
# The first argument is allowed to be a vector with multiple values
round ( c(100.729, 200.618) , 1) # 100.7 200.6
[1] 100.7 200.6
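A short sketch of how the second argument of round changes the result. The variable names are illustrative; the values are the same ones used in the example above.

```r
prices <- c(100.729, 200.618)              # illustrative variable name

roundedToTenths <- round(prices, 1)        # 100.7 200.6
roundedToHundredths <- round(prices, 2)    # 100.73 200.62
roundedToWhole <- round(prices)            # digits defaults to 0: 101 201

# the arguments can also be passed by name
roundedByName <- round(x = prices, digits = 1)   # 100.7 200.6
```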
grades = c(82, 105, 89, 95)
sum(grades) # one argument - add up all grades (not very useful for grading ...)
[1] 371
[1] 371
sum(82,105,89,95) # four arguments - sum is given 4 separate vectors, but the answer is the same
[1] 371
# IMPORTANT: the mean function works a little differently than the sum function.
#
# The mean function requires that all values being averaged are passed as a single vector. ####
grades # show all the grades
[1] 82 105 89 95
[1] 92.75
[1] 92.75
[1] 82
# To summarize:
# sum and mean are not consistent in the way they handle multiple arguments
sum(1,2,3) # works as expected
[1] 6
mean(1,2,3) # does NOT work as expected - only the first value is averaged
[1] 1
# View the help page by typing:
#
# ?mean
# Examine the documentation for mean to see why. The Usage section of the
# documentation includes the following: mean(x, trim = 0, na.rm = FALSE, ...)
# The "x" corresponds to a single vector that contains
# the values to be averaged. If you pass the values without
# the c() function, then the 2nd value listed is actually
# passed to the "trim" argument of mean. If you want to know
# what the "trim" argument is used for, see the help
# page for "mean". If you don't specify any value for "trim"
# then "mean" will work as you expect.
#
# View the help page by typing:
#
# ?mean # see the documentation for mean
# Arguments passed to mean:
#
# x - a vector that contains the values to be averaged
#
# trim - a fraction (0 to 0.5) of observations to be ignored (i.e. trimmed) from each end of the sorted values (i.e. the lowest and the highest values)
#
# na.rm - WE WILL DISCUSS THIS LATER ...
# Return the average of the numbers in the vector.
#
# Return value is 400 , i.e. (100+200+300+500+900) / 5
mean(c(100,200,300,500,900))
[1] 400
[1] 400
# DO NOT DO THE FOLLOWING !!!!
# The mean function is being passed a SINGLE value and
# does nothing meaningful in this case.
mean(sum(100,200,300,500,900) / 5) # basically same as: sum(100,200,300,500,900) / 5
[1] 400
# This is because by the time the mean function starts working,
# the value sum(100,200,300,500,900) / 5
# has already been calculated as 400.
# It would be just as ridiculous as running the following code
# which just returns the number 400 - the mean function does
# nothing meaningful in this case.
mean ( 400 ) # This is the same as 400 / 1
[1] 400
# the "trim" argument to mean ####
#
# trim (i.e. remove) 0.2 (i.e. 1/5) of the values (i.e. 1 value)
# from each end of the sorted values (i.e. the lowest and the highest value)
#
# Return value is 333.333, i.e. mean(c(200,300,500))
mean(c(100,200,300,500,900), 0.2)
[1] 333.3333
[1] 333.3333
[1] 73.2
[1] 85.33333
[1] 5 82 85 89 105
# trim (i.e. remove) 0.4 (i.e. 2/5) of the values (i.e. 2 values)
# from each end of the sorted values (i.e. the 2 lowest and the 2 highest values)
#
# Return value is 300, i.e. mean(300)
mean(c(100,200,300,500,900), 0.4) # trim 0.4 = 2/5 of the values from each end
[1] 300
[1] 300
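Trimming is applied to the values in sorted order, not in the order they were typed. The sketch below uses the same five numbers in a shuffled order to show this, and repeats the calculation by hand with sort. The variable names are illustrative.

```r
# the same numbers as above, deliberately out of order
unsortedValues <- c(900, 100, 500, 200, 300)

# trim = 0.2 drops 1 of the 5 values from each end of the SORTED values
trimmedMean <- mean(unsortedValues, trim = 0.2)

# the same calculation by hand: sort, drop the lowest and highest, then average
sortedValues <- sort(unsortedValues)       # 100 200 300 500 900
manualMean <- mean(sortedValues[2:4])      # mean of 200 300 500

trimmedMean
manualMean
```

Both values are the same, which is why the position of 100 and 900 in the original vector does not matter.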
# In the following the result is 100
# This is because the arguments are assigned in the following order
#
# x, ie. the values to be averaged = first argument = 100
# trim = second argument = 200
# na.rm = 3rd argument = 300
# ... = all other arguments = c(500,900)
#
# Other than the x=100, the other arguments are really meaningless so
# the result is the average of 100, which is 100.
mean(100,200,300,500,900)
[1] 100
# PROBLEM:
#
# REMEMBER that mean requires that all values being averaged are in a SINGLE vector
# Therefore to take the average of the values in x and in y the following WILL NOT WORK:
mean(x, y) # will not work - y is passed to the trim argument, which causes an error
Error in mean.default(x, y): 'trim' must be numeric of length one
# SOLUTION:
#
# Remember that you can combine multiple vectors into a single vector with
# the c function.
x <- c(10,20,30)
y <- c(40, 50)
mean(c(x,y)) # combine x and y into a single vector and take the mean of that vector
[1] 30
#--------------------------------------------------------.
# QUESTION : ####
#
# Grades for class1 and class2 are as shown below.
#
# class1grades <- c(80,90,100)
# class2grades <- c(85, 88)
#
# (a) get the two averages, one for each class
# (b) get the average for all the students in both classes
#--------------------------------------------------------.
# ANSWER
class1grades <- c(80,90,100) # ANSWER
class2grades <- c(85, 88) # ANSWER
class1average <- mean(class1grades) # ANSWER
class2average <- mean(class2grades) # ANSWER
allStudentsAverage <- mean ( c(class1grades, class2grades)) # ANSWER - remember the c(...)
class1average # ANSWER
[1] 90
class2average # ANSWER
[1] 86.5
allStudentsAverage # ANSWER
[1] 88.6
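One reason part (b) needs c(...) is that averaging the two class averages gives a different number when the classes have different sizes. The following sketch illustrates this; the name averageOfAverages is illustrative and not part of the original answer.

```r
class1grades <- c(80, 90, 100)
class2grades <- c(85, 88)

# average over all 5 students
allStudentsAverage <- mean(c(class1grades, class2grades))    # 88.6

# average of the two class averages (90 and 86.5) - NOT the same thing,
# because class1 has 3 students and class2 has only 2
averageOfAverages <- mean(c(mean(class1grades), mean(class2grades)))   # 88.25

allStudentsAverage
averageOfAverages
```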
© 2025 Y. Rosenthal. All rights reserved.