###############################################.###############################################.## INDEXING (i.e. SUBSCRIPTING) A VECTOR###############################################.###############################################.# There are several terms that come up in this section# that will be defined below (e.g. "element", "index", "subscript").# # These terms are not actually typed in R commands.# Rather these terms are used to describe features that are common to many# programming languages (including R) but they are not actually R commands. # For example, you would never type the word "element" or "index" at an R prompt.#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# DEFINITION: an "element" of a vector ###### An "element" of a vector is a single item in the vector. For example, # a single number or a single TRUE or FALSE value. For example given # the code:## > x = c(100,200,300) ## We can say that: # 100 is an element of the vector x.# 200 is also an element of the vector x.# 300 is the third, and final element of x.## Similarly, given the following code, we can say that 50, 60, 70 and 80 are# all elements of the vector y.# # > y = seq(50, length.out=4, by=10)# > y# [1] 50 60 70 80#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~# DEFINITION: "indexing" a vector ###### "Indexing a vector" means writing code that retrieves specific# elements from the vector. For example the following code # retrieves the 2nd element from the vector, stuff.## > stuff = c(10,20,30)## > stuff[2]# [1] 20## > 10 * stuff[2} + 7# [1] 207## Using positive position numbers is just one way to index a vector. There are# actually four different ways to index a vector e.g. using# "negative position numbers" to identify the values you "don't want". # All of the ways to index a vector will be explained in more detail below.### DEFINITION: "subscript"## In general, to index a vector you type the name of the vector followed# by [square brackets] that contain information that identifies which elements# of the vector to retrieve.## The information that appears between the [square brackets] when indexing# a vector is known as a "subscript". For example, in the code above,# stuff[2], the number 2 is a subscript. Some programmers refer to # the act of "indexing a vector" as "subscripting the vector".# When speaking casually, programmers will often refer to code such # as "stuff[2]" as "stuff sub two".### There are several alternative ways to "index" a vector in R.# These include the following ways and will be described in # much more detail below:# # WAYS TO INDEX (AKA to subscript) A VECTOR# - indexing a vector with positive numbers# - index with negative numbers to identify the elements you don't want# - indexing with TRUE/FALSE values # - indexing "named vectors" with the "names" of the elements (we'll cover this later)#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#########################################################.# Indexing with positive position numbers (AKA "index" numbers) ##### (this was shown above, but will be elaborated on in more detail here)## e.g. stuff[2] # stuff[c(2,5)]# stuff[c(5,2,5,2,5,2)]#########################################################.#-----------------------------------------------------------------------------.# To identify specific parts of a vector, type [square brackets].# Inside the [square brackets] type A SINGLE VECTOR that identifies the# values that you want. R has four different ways to specify which values you# want. We will learn all of them soon. For now, we will focus on the first# way ...# # The first way is to specify the values you want is by identifying the # position numbers (AKA "index" numbers) of the specific elements you# are interested in.## EXMAMPLE# > stuff = c(10,20,30,40) # > stuff[2] # get 2nd element - NOTE, 2 is a vector# [1] 20## > stuff[ c(2,4) ] # get the 2nd,4th elements - NOTE the c(2,5) is a SINGLE VECTOR# [1] 20 40#-----------------------------------------------------------------------------.stuff <-c(10,20,30,40,50)stuff
[1] 10 20 30 40 50
stuff[2] # 20 i.e. the 2nd value
[1] 20
stuff # 10 20 30 40 50 i.e. stuff didn't change
[1] 10 20 30 40 50
stuff [c(2,5)] # 20 50 i.e. 2nd and 5th values
[1] 20 50
stuff
[1] 10 20 30 40 50
# MAKE SURE that there is a JUST ONE vector in the [brackets] ####stuff[2,5] # ERROR - 2,5 is TWO VECTORS!
Error in stuff[2, 5]: incorrect number of dimensions
#-------------------------------------------------------.# You may repeat the same index numbers more than once #####-------------------------------------------------------.stuff[c(2,2,2,2)] # 20 20 20 20
[1] 20 20 20 20
stuff # 10 20 30 40 50 (stuff didn't change)
[1] 10 20 30 40 50
stuff[c(5,2,5,2)] # 50 20 50 20 (you can repeat values)
[1] 50 20 50 20
#------------------------------------------------------------.# QUESTION - part a# # Write a command to change the vector named stuff.# After your command runs, the vector should only contain the# first 3 elements that were originally in stuff. # For example:## EXAMPLE 1# > stuff = c(10,20,30,40,50)# > YOUR COMMAND GOES HERE# > stuff# [1] 10 20 30## EXAMPLE 2# > stuff = c(92, 100, 75, 63, 59, 95)# > YOUR COMMAND# > stuff# [1] 92 100 75#------------------------------------------------------------.# ANSWER# EXAMPLE 1# setup some datastuff =c(10,20,30,40,50)#.....................# YOUR CODE GOES HERE#.....................stuff = stuff[1:3] # **ANSWER**stuff # 10 20 30 # **ANSWER**
[1] 10 20 30
# EXAMPLE 2# set up some datastuff =seq(10, 20, by=2)stuff # 10 12 14 16 18 20
[1] 10 12 14 16 18 20
#.........................# YOUR COMMAND GOES HERE (must be the same code as EXAMPLE 1)#.........................stuff = stuff[1:3] # **ANSWER**stuff # 10 12 14 # **ANSWER**
[1] 10 12 14
#------------------------------------------------------------.# QUESTION - part b# # Define a function named, firstThreePlus1 that takes a single argument # named, vec that is expected to be a vector. The function should return# a vector that contains just the first 3 elements of the vec argument.## The values of the elements that are returned should be one more than# the values of the elements that were in the argument, vec.## FOR EXAMPLE:## > stuff = c(10,20,30,40,50)# > firstThreePlus1(stuff)# [1] 11 21 31## > grades = c(92,100,67,85,93,81)# > firstThreePlus1(vec = grades) # "vec=" isn't necessary, but should work# [1] 93 101 68## > firstThreePlus1 ( seq(5, 100, by=5) )# [1] 6 11 16#------------------------------------------------------------.# ANSWERfirstThreePlus1 =function( vec ) { # **ANSWER** vec[1:3] +1# **ANSWER**} # **ANSWER**#.......................# YOUR CODE GOES HERE#.......................# Check your answer:stuff =c(10,20,30,40,50)firstThreePlus1(stuff) # 11 21 31
#----------------------------------------------------------------------------.# Any code that generates a vector can be specified inside the brackets.##### This includes function calls and nested function calls.#----------------------------------------------------------------------------.# You can use any mechanism to generate the vector inside the bracketsstuff =c(10,20,30,40,50)stuff[rep(2,5)] # 20 20 20 20 20 i.e. stuff[c(2,2,2,2,2)]
# original: stuff[ c( rep( seq(3,1) , 2 ) , 5 )]# do seq(3,1) : stuff[ c( rep( c(3,2,1) , 2 ) , 5 ) ]# do rep(c(3,2,1),2) stuff[ c( c(3,2,1,3,2,1) , 5 )]# stuff[ c( 3,2,1,3,2,1, 5 )]# final answer: 30 20 10 30 20 10 50#------------------------------------------------------------.# If you try to access information past the end of a vector# you will get NA, i.e. the information is Not Available#------------------------------------------------------------.stuff
[1] 10 20 30 40 50
stuff[7] # NA (7th value is Not Available)
[1] NA
length(stuff)
[1] 5
stuff[3:8] # 30 40 50 NA NA NA
[1] 30 40 50 NA NA NA
#------------------------------------------------------.# Index the target of an assignment to assign values to specific positions #####------------------------------------------------------.stuff <-c(10,20,30,40,50)stuff
[1] 10 20 30 40 50
stuff[2] <-999# put 999 into the 2nd position in stuff ####stuff
[1] 10 999 30 40 50
#-----------------------------------------------------------------------------.# If the target of the assignment includes [brackets] the assignment happens element by element ##### (see example below)#-----------------------------------------------------------------------------.stuff <-c(10,20,30,40,50)stuff # 10 20 30 40 50
[1] 10 20 30 40 50
stuff[c(2,5)] <-c(999, 888) # put 999 in position 2 and 888 in position 5 ####stuff # 10 999 30 40 888
[1] 10 999 30 40 888
#-----------------------------------------------------------------------------.# This happens as long as TARGET of the assignment uses [square brackets].# This works even if the values being assigned do not use [brackets]#-----------------------------------------------------------------------------.newstuff <-c(999,888)newstuff
[1] 999 888
stuff <-c(10,20,30,40,50)stuff[ c(2,5) ] = newstuff # same thing, even though there newstuff has no bracketsstuff # 10 999 30 40 888
[1] 10 999 30 40 888
#-----------------------------------------------------------------------------.# The left hand side of an assignment that uses brackets (i.e. an index)# must specify a variable that ALREADY EXISTS#-----------------------------------------------------------------------------.thisDoesntExist [c(2,5)] <-c(999,888) # ERROR since thisDoesntExist doesn't exist
Error: object 'thisDoesntExist' not found
#-----------------------------------------------------------------------------.# Assigning a value to a position much past the end of a vector inserts NAs prior to that value.# (see the example)#-----------------------------------------------------------------------------.grades =100grades # 100
[1] 100
grades[2] # NA
[1] NA
grades[5] # NA
[1] NA
grades[c(2,5)] <-c(90, 80) # assign 90 to 2nd position, 80 to 5th positiongrades # 100 90 NA NA 80
[1] 100 90 NA NA 80
#-----------------------------------------------------------.# Assignment uses the recycling rule ##### # If there are more values on the left of the = sign than# on the right, R uses the recycling rule to recycle values# from the vector on the right hand side of the = sign.#-----------------------------------------------------------.#........................................# EXAMPLE 1 - recycling a single value #........................................stuff =c(10,20,30,40,50)stuff
[1] 10 20 30 40 50
stuff[c(1,3,5)] =999# same as stuff[c(1,3,5)] = c(999,999,999)stuff # 999 20 999 40 999
[1] 999 20 999 40 999
# original : stuff[c(1,3,5)] = 999# recycling rule: stuff[c(1,3,5)] = c(999, 999, 999)#........................................# EXAMPLE 2 - recycling multiple values#........................................longerStuff =c(10,20,30,40,50,60,70,80)longerStuff
[1] 10 20 30 40 50 60 70 80
longerStuff[1:8] =c(888,999) # uses the recycling rule (see below)longerStuff # 888 999 888 999 888 999 888 999
[1] 888 999 888 999 888 999 888 999
# original : longerStuff[1:8] = c(888,999) # colon operator : longerStuff[c(1,2,3,4,5,6,7,8)] = c(888,999)# recycling rule : longerStuff[c(1,2,3,4,5,6,7,8)] = c(888,999,888,999,888,999,888,999)#.................................................................# EXAMPLE 3 - left hand side accesses past the end of a vector #.................................................................longerStuff =c(10,20,30)longerStuff
[1] 10 20 30
# In the following line the brackets on the "left hand side" (LHS)# contains 1:6, however prior to this line executing, the length of# the vector was only 3, ie. 10,20,30# That's fine.longerStuff[1:6] =c(888,999) longerStuff # 888 999 888 999 888 999
[1] 888 999 888 999 888 999
#.......................................................# EXAMPLE 4 - WARNING - recycling two vectors where length of# longer is not a mutliple of length of shorter one. # This warning will appear whenever the recylcing rule is used, not just in # assignments.#.......................................................x =c(10,20,30)x
[1] 10 20 30
x[1:3] =c(888,999) # WARNING - vector lengths are not multiples
Warning in x[1:3] = c(888, 999): number of items to replace is not a multiple
of replacement length
x # 888 999 888 - even though we got a "warning" it still works.
[1] 888 999 888
#.......................................................# EXAMPLE 5 - same as above happens with empty brackets#.......................................................x =c(10,20,30)x
[1] 10 20 30
x[] =c(888,999) # WARNING - same reason as previous example
Warning in x[] = c(888, 999): number of items to replace is not a multiple of
replacement length
x # 888 999 888 - even though we got a "warning" it still works.
[1] 888 999 888
#.......................................................# EXAMPLE 6 - complex expression in the brackets#.......................................................longerStuff =c(10,20,30,40,50,60,70,80)# The [brackets] in the following command contain a complex expression. That's fine.longerStuff[seq(2,length(longerStuff),by=2)] =c(888,999) # 10 888 30 999 50 888 70 999longerStuff # 10 888 30 999 50 888 70 999
[1] 10 888 30 999 50 888 70 999
# original : longerStuff[seq(2,length(longerStuff),by=2)] = c(888,999)## length function: longerStuff[seq(2,8,by=2)] = c(888,999)## seq function : longerStuff[c(2,4,6,8)] = c(888,999)## recycling rule : longerStuff[c(2,4,6,8)] = c(888,999,888,999)## final result : 10 888 30 999 50 888 70 999#.......................................................................................# EXAMPLE 7 - entire vector is replaced since target of assignment doesn't use [brackets]## This should be obvious! Just remember this and don't get confused ...# If the left hand side of the = sign is just a variable name without # any indexes then the entire contents of the variable is replaced.#.......................................................................................# The code below just replaces the entire variable longerStuff with c(888,999).# Specifically in 3rd line of code, ie.## longerStuff = c(888,999) # entire vector is replaced## Compare this with EXAMPLE 2 above and compare with the following code.# This is the only line of code that differs between the following example and EXAMPLE 2.# In EXAMPLE2 the code was:## longerStuff[1:8] = c(888,999) ## The extra brackets in EXAMPLE 2 caused the assignment to happen element by # element, rather than replacing the entire vector as in the example below.longerStuff <-c(10,20,30,40,50,60,70,80)longerStuff
[1] 10 20 30 40 50 60 70 80
longerStuff =c(888,999) # no [brackets] in target of assignment - entire vector is replacedlongerStuff # 888 999
[1] 888 999
##################################################.# 2023 - WILF - UP TO HERE - AFTER CLASS 8##################################################.