15  13. “character” values (also known as “strings”), cat(), nchar(), paste(), paste0()

#########################################################################.
#########################################################################.
## 
## character values (also known as "strings")
##
## character values must be surrounded by
## "quotes"      (also known as "double quotes")   or 
## 'apostrophes'  (also known as 'single quotes')
##
#########################################################################.
#########################################################################.

# R has different types (or "modes") of data.
# We've already seen numeric data and logical (TRUE/FALSE) data.
# The other type (or "mode") of data we will learn about now
# is "character" data.
#
# Character data can contain ANY "characters" but the characters must
# be surrounded by "double quotes" (i.e. quotes) or 'single quotes'
# (i.e. apostrophes).

# Character data can be used to store words 
greeting <- "hello"
mode(greeting)          # "character"
[1] "character"
greeting                # "hello"
[1] "hello"
rep(greeting, times=5)  # "hello" "hello" "hello" "hello" "hello"
[1] "hello" "hello" "hello" "hello" "hello"
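# A small sketch for checking whether a value is character data. The
# is.character() function is not covered above; it is a standard base R
# function, shown here only as an illustration alongside mode().

```r
greeting <- "hello"
is.character(greeting)   # TRUE  - greeting holds character data
is.character(5)          # FALSE - 5 is numeric
mode(5)                  # "numeric"
mode(TRUE)               # "logical"
```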

15.1 “double-quotes” vs ‘single-quotes’

# character data MUST be surrounded by "quotes" or 'apostrophes' 
# (they basically do the exact same thing in R)
#
# "quotes" are often called "double quotes"  (i.e. there are 2 lines for each quote)
# 'apostrophes' are often called 'single quotes' (i.e. there is 1 line for each apostrophe)

rep("hello", 5)   # "hello" is a character value, 5 is a numeric value
[1] "hello" "hello" "hello" "hello" "hello"
rep('hello', 5)   # 'hello' is a character value, 5 is a numeric value
[1] "hello" "hello" "hello" "hello" "hello"
# If you leave off the quotes, R will assume hello is a variable name
rep(hello, 5)   # ERROR - variable hello doesn't exist.
Error in eval(expr, envir, enclos): object 'hello' not found

15.2 R always displays character data surrounded with “double-quotes”

#------------------------------------------------------------------------------.
# Even if you use 'single quotes' to create a character vector, R will display
# the information with "double quotes". That is fine. Remember, 'single quotes'
# and "double quotes" are basically the same to R.
#------------------------------------------------------------------------------.
fruit = c("apple", "banana", "pear") # "double-quotes"

fruit      # no surprises:    "apple" "banana" "pear"
[1] "apple"  "banana" "pear"  
vegetables = c('potato', 'carrot')   # assigned with 'single-quotes'

vegetables # but displays with "double-quotes", i.e. "potato" "carrot"
[1] "potato" "carrot"
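# One way to convince yourself that the two kinds of quotes produce the same
# value is to compare them. This sketch uses the base R function identical(),
# which is not covered above; the variable names are made up for illustration.

```r
singleQuoted <- 'potato'
doubleQuoted <- "potato"
identical(singleQuoted, doubleQuoted)   # TRUE - the stored values are the same
```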

15.3 Many functions (e.g. length, rep, etc) work with all types of vectors

#-------------------------------------------------------------------.
# Many functions that you already know (e.g. length, rep)
# work with any type of vector (numeric, logical, character).
#-------------------------------------------------------------------.

people <- c("joe", "sue", "sam", "alice")   # "double quotes"
people
[1] "joe"   "sue"   "sam"   "alice"
length(people)  # 4
[1] 4
rep(people, 2)  # "joe" "sue" "sam" "alice" "joe" "sue" "sam" "alice" 
[1] "joe"   "sue"   "sam"   "alice" "joe"   "sue"   "sam"   "alice"
people          # people didn't change
[1] "joe"   "sue"   "sam"   "alice"
# As with numeric values, you don't need an actual variable.
# You can type character values directly wherever you want 
# to use them. You can use the c() function to create a vector with 
# more than one value.

rep(c("bob", "anne"), 2) # "bob" "anne" "bob" "anne"
[1] "bob"  "anne" "bob"  "anne"
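# A few more illustrations of general-purpose vector tools applied to
# character data. This is only a sketch; it assumes you have already met
# [ ] indexing and the each argument of rep, and it uses rev(), a base R
# function that is not covered above.

```r
people <- c("joe", "sue", "sam", "alice")
people[2]              # "sue"  - indexing works the same as with numbers
rev(people)            # "alice" "sam" "sue" "joe"
rep(people, each = 2)  # "joe" "joe" "sue" "sue" "sam" "sam" "alice" "alice"
```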

15.4 Surround data containing quotes with apostrophes (and vice versa)

#-----------------------------------------------------------------------.
# Mixing "double quotes" and 'single quotes'
#
# Don't place 'apostrophes' inside of 'apostrophes' and
# don't place "quotes" inside of "quotes"
# unless ... (keep reading)
#
#-----------------------------------------------------------------------.

# You may type apostrophes inside of quotes (e.g. "How's it going?")
# and quotes inside of apostrophes (e.g. 'He said "Hello" to Sue').

saying <- c("What's up?")              # apostrophe inside quotes - that's fine
saying
[1] "What's up?"
sentence <- 'He said "Hello" to Sue.'  # "quotes" inside apostrophes is fine too

# When you display a character value that has a "quote" inside of it,
# R will display the quotes that are inside of the value with a
# backslash before the quote, i.e. \"

sentence   # "He said \"Hello\" to Sue."
[1] "He said \"Hello\" to Sue."
# BACKSLASH      \   (used to "escape" quotes or apostrophes)
# FORWARD SLASH  /   (used for division, e.g. 3/4)

15.5 Quoting text that contains the same type of quote leads to errors.

#----------------------------------------------------------------------.
# You will get ERRORS if you type a quote inside of quotes
# or an apostrophe inside of apostrophes. R will get confused.
#
# You can use \" or \' to place a quote inside of quotes 
# or an apostrophe inside of apostrophes
#----------------------------------------------------------------------.

# The next line would cause an error since the entire value
# is surrounded by apostrophes and it also contains an apostrophe
# in the actual text (i.e. in the word How's).
# (See below for an explanation of why this causes an error.)

question <- c('How's it going?')  # ERROR 
Error: <text>:14:20: unexpected symbol
13: 
14: question <- c('How's
                       ^
# How does R interpret this line of code?
# 
#   Step 1: R correctly interprets the 1st apostrophe (i.e. 'How ...)
#           as the start of the quoted text.
#
#   Step 2: R interprets the 2nd apostrophe (i.e. the apostrophe that
#           appears after 'How') as the end of the quoted text.
#
#   Step 3: When R sees the 2nd apostrophe it assumes that the quotation
#           is over and expects to see a closing parenthesis. However, instead
#           of a closing parenthesis, the next symbol after the apostrophe
#           is the letter s. Therefore R gets confused and reports an error.

15.6 Use \" inside double-quotes. Use \' inside single-quotes.

# FIX 1 - surround with "double quotes"
question <- c("How's it going?")  # this is fine
question
[1] "How's it going?"
# FIX 2 - use \' inside the character value
question <- c('How\'s it going?')  # backslash-apostrophe also works
question                           # "How's it going?"  (this is how it is displayed)
[1] "How's it going?"

15.7 Backslash (\) vs Forward Slash (/)

# NOTE:
#   There are two different slashes on most keyboards. 
#
#   BACKSLASH, "\", IS THE CORRECT SLASH:
#     The "backslash" i.e. "\" slants from the top left and towards the lower right.
#     This is the slash to use in this situation.
#
#   FORWARD SLASH, "/", IS THE WRONG SLASH:
#     The "forward slash" i.e. "/" slants from the lower left towards the top right.
#     This is NOT the slash to use in this case. This slash is used for division.
#
#   MNEMONIC TO REMEMBER \BACKSLASH vs /FORWARD SLASH
#     Imagine that someone is walking along next to the text from left to right.
#     Think of the slash as that person.
#     The backslash, i.e. \ , can be viewed as the person falling backwards.
#     The forward slash, i.e. / , can be viewed as the person falling forwards.
#  

15.8 Similarly, don’t type a double-quote inside of double-quotes.

# Don't place quotes inside of quotes.
#
# ERROR: (the next line causes an error)

sentence <- c("Joe said "Hello" to Sue.")  # ERROR - "quote" inside "quotes"
Error: <text>:5:26: unexpected symbol
4: 
5: sentence <- c("Joe said "Hello
                            ^
# FIX 1 - surround the entire value with apostrophes (AKA 'single quotes')
sentence <- c('Joe said "Hello" to Sue.')     # this works

# NOTE: When you display a character value that contains quotation marks, R
# puts a backslash (\) in front of the quotation mark that is inside the quote.
# We'll explain more about this below.
sentence   # "Joe said \"Hello\" to Sue."     (this is how it is displayed)
[1] "Joe said \"Hello\" to Sue."
# FIX 2 - use \" inside the character value
sentence <- c("Joe said \"Hello\" to Sue.")
sentence
[1] "Joe said \"Hello\" to Sue."
# The backslashes demonstrate that the quotes with the backslashes are
# inside of the character value and do not surround (i.e. delimit) the
# character value.
sentence   # "Joe said \"Hello\" to Sue."
[1] "Joe said \"Hello\" to Sue."
# You can use the cat function to display the sentence in a more readable
# way. The output of cat doesn't display any quotes 
# around the sentence. cat also doesn't display the backslashes (\) that
# precede the quotes inside the sentence. In short, cat is used to 
# display the information in a "human friendly" way.
# We will discuss the cat function in more depth below. However, for now
# you can use cat to display the text in a more readable manner. 

cat(sentence)    # Joe said "Hello" to Sue.
Joe said "Hello" to Sue.

15.9 MIXING ‘single quotes’ and “double quotes” in the same text.

#-----------------------------------------------------------------.
# MIXING 'single quotes' and "double quotes" in the same text.
#-----------------------------------------------------------------.

# If your data contains both quotation marks and apostrophes, you 
# should surround your data with either type of quote and then 
# use backslashes in front of that type of quote in your data.
# 
# The following are all valid and all result in the same thing.


# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# One way - surround with "double quotes" (i.e. quotation marks), use \" in the data
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

saying = "He's going to ask Sue \"Will you marry me\". He hopes she says yes."
saying        # this is what is stored
[1] "He's going to ask Sue \"Will you marry me\". He hopes she says yes."
cat(saying)   # this is a "human friendly" view of your data
He's going to ask Sue "Will you marry me". He hopes she says yes.
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# Another way - surround with 'single quotes' (i.e. apostrophes), use \' in the data
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
saying = 'He\'s going to ask Sue "Will you marry me". He hopes she says yes.'
saying        # this is what is stored
[1] "He's going to ask Sue \"Will you marry me\". He hopes she says yes."
cat(saying)   # this is a "human friendly" view of your data
He's going to ask Sue "Will you marry me". He hopes she says yes.
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# Yet another way - the backslashes are ok, even if they aren't necessary
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

# The following uses \' and \" in the data. 
# You don't need \" since the data is surrounded with 'single quotes'.
# The \" is unnecessary in this case but it doesn't hurt.

saying = 'He\'s going to ask Sue \"Will you marry me\". He hopes she says yes.'
saying        # this is what is stored
[1] "He's going to ask Sue \"Will you marry me\". He hopes she says yes."
cat(saying)   # this is a "human friendly" view of your data
He's going to ask Sue "Will you marry me". He hopes she says yes.
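# The claim that all three ways store the same text can be checked directly.
# This is a sketch using the base R function identical(), which is not
# covered above; the names way1, way2 and way3 are made up for illustration.

```r
way1 <- "He's going to ask Sue \"Will you marry me\". He hopes she says yes."
way2 <- 'He\'s going to ask Sue "Will you marry me". He hopes she says yes.'
way3 <- 'He\'s going to ask Sue \"Will you marry me\". He hopes she says yes.'
identical(way1, way2)   # TRUE
identical(way2, way3)   # TRUE
```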

15.10 cat(value1, value2, etc.)

###########################################################################.
# cat( value1, value2, etc. )
#
# WHAT DOES IT DO
#
# The cat function is used to display information to the screen in a 
# user friendly way. The output of cat will NOT include extra backslash characters.
# The output of cat will also not be surrounded with quotes.
#
#
# YOU CANNOT "USE" THE OUTPUT OF CAT (other than to display information)
#
# The output of cat is ONLY used to display information to the screen.
# You cannot capture the output of cat to a variable or use cat in a
# larger expression. It is simply used to display information to the screen.
# See below for an example of what happens if you try to use the output 
# from cat in a larger context (hint ... it doesn't work).
#
#
# SOME TRIVIA - WHY IS IT CALLED CAT? 
#
# If you're wondering why it's called "cat", it is because "cat" is short for
# the English word "concatenate". "concatenate" means to "link things together"
# - https://www.dictionary.com/browse/concatenate . The cat function can take
# multiple pieces of info and display them all together to the screen (see 
# the example code below). Incidentally, the "c()" function is also short 
# for "concatenate" since it "concatenates" (or combines) multiple different
# vectors into a single combined vector.
#
# The command "cat" predated the creation of R. It originated with the
# Unix operating system in the 1970s and was carried over to the Linux
# operating system which is very popular today. As often happens with
# programming languages and technology, newer languages and technology (e.g. R)
# inherit many of the features (i.e. cat) that were present in older
# languages and technology (e.g. Unix). 
###########################################################################.

#----------------------------------------------------------------.
# cat displays values to the screen in a "human friendly" way.
#----------------------------------------------------------------.

# Set a character value ...
sentence <- "Joe said \"Hello\" to Sue."

# Display the results ...
#
# To see the actual character value without the backslashes, you can use the cat function.
# Compare the difference in output of the following two lines.
# Notice that for cat there are no quotes around the text and no backslashes within the text.

sentence       # "Joe said \"Hello\" to Sue."
[1] "Joe said \"Hello\" to Sue."
cat(sentence)  # Joe said "Hello" to Sue.  (no backslashes, no "quotes" surrounding the data)
Joe said "Hello" to Sue.
# You can use cat to output multiple pieces of information to the screen.
# This includes multiple values within a single vector as well as 
# multiple different vectors.
# By default all items are separated with spaces.

people = c("joe", "sue", "bill")
food = c("apple", "orange", "pear")
cat(people, food)   # just outputs all values
joe sue bill apple orange pear
# You can change the separator to anything you like by using the sep argument
cat (people, food, sep="-")      
joe-sue-bill-apple-orange-pear
cat (people, food, sep=":---:")  # the separator can be anything, even multiple characters
joe:---:sue:---:bill:---:apple:---:orange:---:pear
cat (people, food, sep="")       # use the "" empty string to remove the separator
joesuebillappleorangepear
# another example
tf = TRUE
grade = 100
person = "Bill"
cat ( "It is", tf, "that", person, "got", grade, "on the test!")
It is TRUE that Bill got 100 on the test!
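# The values given to cat can also be the results of other function calls.
# A minimal sketch, assuming a small made-up people vector:

```r
people <- c("joe", "sue", "bill")
cat("There are", length(people), "people:", people)
# displays: There are 3 people: joe sue bill
```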

15.12 \n    \t    \\

#########################################################.
# The backslash \ 
#########################################################.

# The backslash character ( \ ) is used to change the meaning of 
# the character that follows it. When used before a "double quote" (i.e. \" ) or
# a 'single quote' (i.e. \' ) it takes the special meaning of the quote away.
# The quote is no longer used to indicate the end of a quotation but becomes instead
# simply part of the text.
# 
# In a similar way, the backslash changes the meaning of some other characters.
# For example
#    \n is used for an ENTER (i.e. a "newline").
#    \t is used for a "tab" character
#    \\ is used for an actual backslash character.
#
# EXAMPLE

cat("This is line 1.\nThis is line 2.")   # displayed on two different lines
This is line 1.
This is line 2.
cat("This is line 1.\n\nWe just skipped a line.")  # use double \n\n to skip a line
This is line 1.

We just skipped a line.
cat("Hi.\tI really like pizza.\n\tI don't like broccoli.")  # two lines, notice the tabs
Hi. I really like pizza.
    I don't like broccoli.
cat("This ... \\ ... is a backslash.")
This ... \ ... is a backslash.
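# The \n escape can also be used as the separator for cat, which places each
# value on its own line. A short sketch with a made-up people vector:

```r
people <- c("joe", "sue", "bill")
cat(people, sep = "\n")   # each name is displayed on its own line
```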

15.13 — Practice —

#...........................................................................
# QUESTION   TOPICS: cat, backslash
#
# (a) Write a single R command that assigns the following message
#     to the variable named message.
#  
#      The '\' is known as a "backslash". 
#      It's used to escape the meaning of
#      the character that follows it.
#
# (b) display the message variable to the screen by just typing the word, message.
# (c) Use the cat command to display the message to the screen.
#...........................................................................
message = "\tThe '\\' is known as a \"backslash\".\n\tIt's used to escape the meaning of\n\tthe character that follows it."

message       # this shows the backslashes, i.e. \n   \t   \"   \'  
[1] "\tThe '\\' is known as a \"backslash\".\n\tIt's used to escape the meaning of\n\tthe character that follows it."
cat(message)  # this changes \n into ENTER, \t into tab, \" into " and \' into '
    The '\' is known as a "backslash".
    It's used to escape the meaning of
    the character that follows it.

15.14 DON’T use cat inside a larger expression.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Do NOT use the output of cat inside a larger expression.  ####
# (you CAN use the output of most other functions in a larger expression)
# 
# See examples below
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#-----------------------------------------------.
# SETUP SOME DATA TO USE IN THE EXAMPLES BELOW
#-----------------------------------------------.

# people and food are parallel vectors 
# - i.e. joe's favorite food is apple, sue's favorite food is orange, etc.

people       = c("joe", "sue", "bill")    
favoriteFood = c("apple", "orange", "pear")

people[2] # use the number 2 to get the 2nd person
[1] "sue"
cat(2)    # this just uses cat to display the number 2 on the screen
2
# . . . .. . . . . . . . . . . . . . . . . . . . . .. . . 
# ONLY USE cat TO DISPLAY INFO TO THE SCREEN.
# DON'T USE CAT FOR ANYTHING ELSE!
# . . . .. . . . . . . . . . . . . . . . . . . . . .. . . 

#...........
# EXAMPLE 1
#...........

# ONLY USE cat TO DISPLAY INFO TO THE SCREEN.
# DON'T USE CAT FOR ANYTHING ELSE - THIS DOESN'T WORK
people[cat(2)]   # doesn't work - cat displays 2, then the result is character(0)
2
character(0)
# HOWEVER, THIS WORKS - use an expression to display the 2nd person
people[ 1 + 1 ]  # sue
[1] "sue"
#...........
# EXAMPLE 2
#...........

# ONLY USE cat TO DISPLAY INFO TO THE SCREEN.
# DON'T USE CAT FOR ANYTHING ELSE - THIS DOESN'T WORK

y = cat(people[3])   # displays bill, but nothing useful is stored in y
bill
y                    # NULL
NULL
# Set the variable x to the name of the 2nd person
x = people[2]   # this works
x
[1] "sue"
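# If you want to both keep a value and display it, one approach is to store
# the value in a variable first and then pass that variable to cat. This is
# a sketch; the variable name third is made up for illustration.

```r
people <- c("joe", "sue", "bill")
third <- people[3]    # keep the value in a variable first
rep(third, 2)         # "bill" "bill" - third can be used in other expressions
cat(third)            # displays: bill
```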

15.15 nchar returns the number of symbols (letters, digits, spaces, etc) in a value.

########################################################.
# a few character functions :    nchar, paste0, paste   
# (there are many more)
########################################################.

#---------------------------------------------------------.
# nchar returns the number of characters in a value.
# All characters, including spaces and punctuation count as part of the value.
#---------------------------------------------------------.

nchar(c("O.K.", "hello", "there", "hello there"))    
[1]  4  5  5 11
weirdGreeting = "hello\n\tthere"
cat(weirdGreeting)
hello
    there
nchar(weirdGreeting)   # 12 - \n and \t each count as ONE character (5 + 1 + 1 + 5)
[1] 12
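# Each backslash sequence is stored as a single character, so nchar counts
# it once. The following lines are a quick sketch of that idea:

```r
nchar("\n")     # 1 - a newline is one character
nchar("\t")     # 1 - a tab is one character
nchar("\\")     # 1 - one actual backslash
nchar("\"")     # 1 - one double quote
nchar("")       # 0 - the empty string has no characters
```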

15.16 paste and paste0 combine character values from different vectors

#-------------------------------------------------------------------.
# paste and paste0 functions are used to combine character values
#-------------------------------------------------------------------.
paste0 ( "hi", "there")   # paste0 does not leave any space between values
[1] "hithere"
# You can capture the output of these functions in a variable

greeting = paste0("hi",  "there", "joe")
greeting
[1] "hitherejoe"
nchar(greeting)
[1] 10
# paste (as opposed to paste0)
paste("hi", "there")
[1] "hi there"
paste("hi", "there", "joe")  # paste by default leaves a space
[1] "hi there joe"
# you can change the separator with a sep argument 
paste("hi", "there", "joe", sep = "-")
[1] "hi-there-joe"
message = paste("hi", "there", "joe", sep = "-")
message
[1] "hi-there-joe"
message2 = cat("hi", "there", "joe", sep = "-")  # cat only displays; message2 is set to NULL
hi-there-joe
message2
NULL
length(message)
[1] 1
length(message2)
[1] 0
# the empty string is "" or ''
paste("hi", "there", "joe", sep = "")   # exact same thing as paste0
[1] "hitherejoe"
# Like most functions, paste0 and paste work with the recycling rule

firstNames = c("joe","chaim","juan")
firstNames
[1] "joe"   "chaim" "juan" 
paste ("hello", firstNames)
[1] "hello joe"   "hello chaim" "hello juan" 
paste0 ("hello", firstNames)
[1] "hellojoe"   "hellochaim" "hellojuan" 
# becomes             paste0(  c("hello","hello","hello"), c("joe","chaim","juan"))
# vector operations   "hellojoe"  "hellochaim"  "hellojuan"
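# paste and paste0 also accept numbers, which are converted to character
# values before being combined. A brief sketch with made-up values, assuming
# you have already seen the 1:3 notation for creating a sequence:

```r
paste0("student", 1:3)        # "student1" "student2" "student3"
paste("room", c(101, 102))    # "room 101" "room 102"
```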

#...........................................................
# Using paste with vectors that have more than one value
#...........................................................
firstNames = c("joe","chaim","juan")
firstNames
[1] "joe"   "chaim" "juan" 
lastNames = c("smith", "schwartz", "rodriguez")
lastNames
[1] "smith"     "schwartz"  "rodriguez"
fullNames = paste (firstNames, lastNames)
fullNames
[1] "joe smith"      "chaim schwartz" "juan rodriguez"
length(firstNames)
[1] 3
length(lastNames)
[1] 3
length(fullNames)
[1] 3
# sep argument is used to separate the values being pasted together
paste(firstNames, lastNames, sep="-")
[1] "joe-smith"      "chaim-schwartz" "juan-rodriguez"
paste(firstNames, lastNames, sep=" ...---... ")
[1] "joe ...---... smith"      "chaim ...---... schwartz"
[3] "juan ...---... rodriguez"
paste(firstNames, lastNames, sep="'s last name is ")
[1] "joe's last name is smith"      "chaim's last name is schwartz"
[3] "juan's last name is rodriguez"
paste(firstNames, lastNames, sep="")   # same as paste0(firstNames, lastNames)
[1] "joesmith"      "chaimschwartz" "juanrodriguez"
# Three (or more) vectors also work
paste(c("abc","def"), c("UVW","XYZ"), c("012","345"))
[1] "abc UVW 012" "def XYZ 345"
paste(c("abc","def"), c("UVW","XYZ"), c("012","345"), sep = "")  # no more spaces
[1] "abcUVW012" "defXYZ345"
paste0(c("abc","def"), c("UVW","XYZ"), c("012","345"))  # same thing
[1] "abcUVW012" "defXYZ345"
# View the help page by typing: 
#
#   ?paste


# Recycling rule 
paste(c("abc","def"), c("RST", "UVW","XYZ"), sep="")  # "abcRST" "defUVW" "abcXYZ"
[1] "abcRST" "defUVW" "abcXYZ"
paste(c("a","b"), c("x","y","z"), "somethingElse", sep=" ")
[1] "a x somethingElse" "b y somethingElse" "a z somethingElse"
#.....................................................................
# collapse argument is used to collapse entire output into a 
# vector that contains a single item
#
# The value of the collapse argument is used to separate between the different
# values that are being collapsed.
#.....................................................................

# example of using collapse
paste(c("abc","def"), c("RST", "UVW","XYZ"), sep="") # without collapse
[1] "abcRST" "defUVW" "abcXYZ"
paste(c("abc","def"), c("RST", "UVW","XYZ"), sep="", collapse="-") # with collapse
[1] "abcRST-defUVW-abcXYZ"
# another example
answer = paste(firstNames, lastNames, sep="---") # without collapse 
answer
[1] "joe---smith"      "chaim---schwartz" "juan---rodriguez"
length(answer)
[1] 3
answer = paste(firstNames, lastNames, sep="---", collapse="@@@") # with collapse
answer
[1] "joe---smith@@@chaim---schwartz@@@juan---rodriguez"
length(answer)
[1] 1
# One or both of sep and collapse arguments can be the "empty string" (i.e. "")
paste(firstNames, lastNames, sep="-", collapse="")
[1] "joe-smithchaim-schwartzjuan-rodriguez"
paste(firstNames, lastNames, sep="", collapse="-")
[1] "joesmith-chaimschwartz-juanrodriguez"
paste(firstNames, lastNames, sep="", collapse="")
[1] "joesmithchaimschwartzjuanrodriguez"
# student's question
paste(firstNames, lastNames, sep=" ", collapse="") # sep is space
[1] "joe smithchaim schwartzjuan rodriguez"
paste(firstNames, lastNames, collapse="")  # same thing (default for sep is a space)
[1] "joe smithchaim schwartzjuan rodriguez"
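# A common use of collapse is turning ONE vector into a single readable
# list. The following is a sketch of that pattern; the "Invited:" label is
# made up for illustration.

```r
firstNames <- c("joe", "chaim", "juan")
paste(firstNames, collapse = ", ")                      # "joe, chaim, juan"
paste("Invited:", paste(firstNames, collapse = ", "))   # "Invited: joe, chaim, juan"
```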

15.17 Can’t do math with “character” values: “1” + “1” = ERROR

#---------------------------------------------------------------------------.
# You cannot do math with character data - even if it looks like a number
#---------------------------------------------------------------------------.

"5" * "3"   # ERROR
Error in "5" * "3": non-numeric argument to binary operator
5 * 3       # that's better
[1] 15
# Another example

charNums = c("85", "19" ,"103",  "100", "22", "099" , "9" , "1000000", "2000")
charNums
[1] "85"      "19"      "103"     "100"     "22"      "099"     "9"      
[8] "1000000" "2000"   
sum(charNums)  # ERROR - no math with character values
Error in sum(charNums): invalid 'type' (character) of argument
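# If character values really do contain numbers, they can be converted
# before doing math. This sketch uses as.numeric(), a base R function that
# is not covered above.

```r
as.numeric("5") * as.numeric("3")    # 15

charNums <- c("85", "19", "103", "100", "22", "099", "9", "1000000", "2000")
sum(as.numeric(charNums))            # 1002437
```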

15.18 review - sort function with numeric vectors

###########################################################.
#
# sort function - how it works with numeric, logical and character values
#
###########################################################.

#------------------------------------.
# numeric - numbers sort as you'd expect
#------------------------------------.

nums = c(85, 19 ,103,  100, 22, 099 , 9 , 1000000, 2000)

sort (nums) # 9  19  22  85  99  100  103  2000  1000000
[1]       9      19      22      85      99     100     103    2000 1000000
# you can sort in decreasing order too
sort (nums, decreasing=TRUE) # 1000000  2000  103  100  99  85  22  19  9
[1] 1000000    2000     103     100      99      85      22      19       9
# View the help page by typing: 
#
#   ?sort  # see the help for more info.

15.19 sort function with logical (i.e. TRUE/FALSE) vectors

#------------------------------------.
# logical - FALSE is less than TRUE
#------------------------------------.
tf = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
sort (tf)   # FALSE FALSE FALSE TRUE TRUE TRUE
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
sort (tf, decreasing = TRUE)   # TRUE TRUE TRUE FALSE FALSE FALSE
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

15.20 sort function with “character” vectors

#----------------------------------------------------------------------------.
# sorting character data
#
# In general, sort will arrange character data in alphabetical order.
#
# The sort order of "special characters" (@%#$ etc), spaces, numbers, and
# CAPITALS vs lowercase depends on the "locale" of your computer,
# e.g. USA, Europe, Israel, etc.
# Discussion of "locale" is beyond the scope of this document. 
#----------------------------------------------------------------------------.

stuff = c("table", "@table", "  table", "book case", "101 dalmatians", "apple", "ape", "apex")
stuff
[1] "table"          "@table"         "  table"        "book case"     
[5] "101 dalmatians" "apple"          "ape"            "apex"          
sort (stuff)  # "  table"        "@table"         "101 dalmatians" "ape"            "apex"           "apple"          "book case"      "table"  
[1] "  table"        "@table"         "101 dalmatians" "ape"           
[5] "apex"           "apple"          "book case"      "table"         
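# If you need a sort order that does not depend on the locale, one option is
# the method argument of sort. This is only a sketch: method = "radix" is not
# covered above, and for character data it orders by the underlying character
# codes, so all CAPITALS come before all lowercase letters. The variable name
# mixedCase is made up for illustration.

```r
mixedCase <- c("b", "A", "a", "B")
sort(mixedCase)                     # order depends on your computer's locale
sort(mixedCase, method = "radix")   # "A" "B" "a" "b" - same on every computer
```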

15.21 sorting “character” numbers, e.g. sort(c(“2”, “100”, “30”)) # “100” “2” “30”

#.................................................................................
# If numbers are stored as characters then they will sort as characters do.
#
# The first character is the most important character for the sorting order. 
# Only when two of the first characters are the same will the 2nd character be important, etc.
#.................................................................................
sort(charNums) # "099"  "100"  "1000000"  "103"  "19"  "2000"  "22"  "85"  "9" 
[1] "099"     "100"     "1000000" "103"     "19"      "2000"    "22"     
[8] "85"      "9"      
# EXPLANATION OF THE SORT ORDER
#
# "099" sorts first because 0 is the "earliest" character in the 
# sequence 0,1,2,3,4,5,6,7,8,9
#
# "100" is next because "1" is after "0" in the sequence 0,1,2,3,4,5,6,7,8,9
#
# "1000000" is next because its first 3 characters, i.e. "100", are the same
# as the previous value, i.e. "100", but "1000000" is longer than "100"
# so it sorts after "100". This is similar to why "be" is followed by "bed" in the dictionary:
# "bed" contains "be" but is longer than "be".
#
# "103" is next because the "10" is the same as the first two characters of the 
# previous value but the third character, i.e. "3", is "later" in the list 0,1,2,3,4,5,6,7,8,9
# than is "0", the third character of "1000000".
#
# "19" is next because "9" is later than the "0" from "103"
#
# "2000" is next because the "2" is later than the "1" from "19"
#
# "22" is next because the 2nd "2" is later than the first "0" from "2000"
#
# "85" is next because the "8" is later than the first "2" from "22"
#
# "9" is last because, of all the values being sorted, it has the latest first
# character in the list 0,1,2,3,4,5,6,7,8,9.
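# To sort such values by their numeric size instead, one option is to convert
# them to numbers first. This is a sketch using as.numeric(), a base R
# function that is not covered above. Note that "099" becomes the number 99.

```r
charNums <- c("85", "19", "103", "100", "22", "099", "9", "1000000", "2000")
sort(as.numeric(charNums))   # 9 19 22 85 99 100 103 2000 1000000
```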

15.22 Some functions automatically convert “1” to 1

##############################################################.
# Sometimes, R implicitly (i.e. automatically) converts
# character values into numeric values. See one example
# shown below with the rep function.
##############################################################.

# The rep function expects a numeric value as its second argument.

rep("hello", 5)  # "hello" is character, 5 is numeric
[1] "hello" "hello" "hello" "hello" "hello"
# If instead you specify a character value that contains a 
# number, the rep function will convert the character into a 
# number.

rep("hello", "5")  #  "5" is converted to 5
[1] "hello" "hello" "hello" "hello" "hello"
# However such conversions are NOT guaranteed to happen for all functions.
# For example, the 2nd argument to the round function is used to specify 
# to what position to round the number. For example the following line
# rounds 3.579 to the 1st decimal place

round(3.579, 1)   # 3.6
[1] 3.6
# If you specify the character value "1" for the 2nd argument 
# instead of the number 1 you will get an ERROR since the 2nd argument
# is supposed to be numeric.

round(3.579, "1")  # ERROR - "1" is a character value, should be a number
Error in round(3.579, "1"): non-numeric argument to mathematical function
# Bottom line - the functions that are built into R are not consistent
# in how they handle this issue.
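# Rather than relying on automatic conversion, you can convert the value
# yourself so that the code behaves the same with every function. A sketch
# using as.numeric(), a base R function that is not covered above:

```r
rep("hello", as.numeric("5"))    # "hello" "hello" "hello" "hello" "hello"
round(3.579, as.numeric("1"))    # 3.6
```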