15.1.1 Download the code with the install.packages() function
Before using functions from any package, you must install the package using the install.packages() function (see more below).
The install.packages() function downloads the code for the R package(s) to your computer from a “repository” known as CRAN, the Comprehensive R Archive Network. CRAN is served by many different websites that are funded and managed by different institutions. Each of these sites is known as a CRAN “mirror”. If you don’t specify where the install.packages() function should get the package, a window might pop up asking you to choose a “CRAN mirror”. Alternatively, you can specify the mirror you want in the repos argument directly in the call to install.packages(). This is what we do below.
You can install the stringr package with the following command.
# The following is the list of major CRAN mirrors. You do NOT need to do this
# part. It is done for this website since the code to generate this website
# is NOT run interactively and if a popup should appear asking us to choose
# a "mirror" the website would not be generated correctly. Therefore we
# use this code to specify a list of mirrors from which install.packages()
# can choose one. If the first one is not available (e.g. it is "down") then
# install.packages() will try the 2nd one, etc.
CRANrepos = c(
  "https://mirror.las.iastate.edu/CRAN/",      # Iowa State University, Iowa
  "http://ftp.ussg.iu.edu/CRAN/",              # Indiana University, Indiana
  "https://repo.miserver.it.umich.edu/cran/",  # University of Michigan
  "https://cran.wustl.edu/",                   # Washington University, Missouri
  "https://archive.linux.duke.edu/cran/",      # Duke University, NC
  "https://cran.case.edu/",                    # Case Western Reserve University, OH
  "https://ftp.osuosl.org/pub/cran/",          # Oregon State University
  "http://lib.stat.cmu.edu/R/CRAN/",           # Carnegie Mellon University, PA
  "https://cran.mirrors.hoobly.com/",          # Hoobly Classifieds, PA
  "https://mirrors.nics.utk.edu/cran/"         # Nat. Inst. 4 Computational Sci, TN
)

# NOTE: If you are running this command interactively you can leave out the
# repos argument.
install.packages("stringr", repos = CRANrepos)
Installing package into '/home/yitz/R/x86_64-pc-linux-gnu-library/4.5'
(as 'lib' is unspecified)
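Since a package only needs to be installed once, a script can check whether it is already present before calling install.packages(). The following is a minimal sketch of that idea. The helper name install_if_missing is hypothetical (it is not part of R), and the default repository URL is an assumption: it points at the general-purpose CRAN address rather than one of the mirrors listed above.

```r
# Hypothetical helper: install a package only when it is not already installed.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE if the package can be loaded,
  # without attaching it to the search path
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("stringr")
```

Running the call a second time does nothing, because the package is found and the download is skipped.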
Note that stringr is part of the tidyverse family of packages. You can install any tidyverse package by itself or install the entire set of tidyverse packages with the command install.packages("tidyverse").
15.1.2 Calling functions from stringr
At this point the code is downloaded to your computer. You can now use the functions in stringr, but you will have to prefix each call to a function with stringr::.
For example, the stringr::str_length() function returns the length of each string (i.e. character value) that is passed to it.
stuff = c("Hi", "there.", "How are you?")
stringr::str_length(stuff)
[1] 2 6 12
Notice that I must include stringr:: in the call. Without the prefix, R cannot find the function and reports an error:
# Determine the length of each value in the vector
stuff = c("Hi", "there.", "How are you?")
str_length(stuff)
Error in str_length(stuff): could not find function "str_length"
15.1.3 library(stringr) or require(stringr)
You can use the library() or require() functions to attach the package, which avoids the need to write stringr:: before each call.
library(stringr)   # require(stringr) will also work
stuff = c("Hi", "there.", "How are you?")
str_length(stuff)
[1] 2 6 12
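The two functions differ mainly in how they behave when the package is not installed: library() stops with an error, while require() returns FALSE (with a warning). The sketch below illustrates this. The package name notARealPackage123 is a made-up name chosen on the assumption that no such package is installed.

```r
library(stringr)

# After library(), the package appears on R's search path
attached <- "package:stringr" %in% search()
attached

# require() returns FALSE instead of stopping when the package is missing.
# "notARealPackage123" is a hypothetical name, assumed not to be installed.
missing_pkg <- "notARealPackage123"
loaded <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
loaded
```

Because of this, library() is usually the safer choice at the top of a script: a missing package is reported immediately rather than surfacing later as a "could not find function" error.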
Note that stringr is part of the tidyverse family of packages. If you’ve already installed the entire tidyverse set of packages with install.packages("tidyverse"), you could then call library(tidyverse) to “load” the entire tidyverse into your R session.
# Extract in reverse sorted order
str_sort(sort_data, decreasing = TRUE)
[1] "sort" "please!" "me"
15.5 stringr: String Interpolation
first <- c("Luke", "Han", "Jean-Luc")
last <- c("Skywalker", "Solo", "Picard")

# Interpolate (insert variable values) into strings with str_glue()
str_glue("My name is {first}. {first} {last}.")
My name is Luke. Luke Skywalker.
My name is Han. Han Solo.
My name is Jean-Luc. Jean-Luc Picard.
minimum_age <- 18
over_minimum <- c(5, 17, 33)

# Interpolate the result of an execution into a string
str_glue("{first} {last} is {minimum_age + over_minimum} years old.")
Luke Skywalker is 23 years old.
Han Solo is 35 years old.
Jean-Luc Picard is 51 years old.
num <- c(1:5)

# Interpolate the result of function calls
str_glue("The square root of {num} is {round(sqrt(num), 3)}.")
The square root of 1 is 1.
The square root of 2 is 1.414.
The square root of 3 is 1.732.
The square root of 4 is 2.
The square root of 5 is 2.236.
fuel_efficiency <- 30

# Interpolate strings using data from a data frame
mtcars %>%
  rownames_to_column("Model") %>%
  filter(mpg > fuel_efficiency) %>%
  str_glue_data("The {Model} gets {mpg} mpg.")
Error in rownames_to_column(., "Model"): could not find function "rownames_to_column"
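The error above occurs because rownames_to_column() comes from the tibble package and filter() (as used here) comes from dplyr, and only stringr has been attached so far. A version that should run, assuming both tibble and dplyr are installed, might look like this:

```r
library(stringr)
library(tibble)  # provides rownames_to_column()
library(dplyr)   # provides filter() for data frames

fuel_efficiency <- 30

# Turn the row names of mtcars into a "Model" column, keep the efficient
# cars, then build one sentence per remaining row
efficient_cars <- mtcars %>%
  rownames_to_column("Model") %>%
  filter(mpg > fuel_efficiency) %>%
  str_glue_data("The {Model} gets {mpg} mpg.")

efficient_cars
```

Calling library(tidyverse) instead of the three separate library() calls would also work, since the tidyverse attaches stringr, tibble and dplyr together.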
15.6 stringr: String Matching
head(data,8)
author score
1 butt_ghost 3
2 buntaro_pup 1
3 iidealized 2
4 [deleted] 1
5 stathibus 6
6 soulslicer0 2
7 swiftsecond 1
body
1 Hdf5. It's structured, it's easy to get data in and out, and it's fast. Plus it will scale if you ever get up there in dataset size.
2 yep, good point.
3 Google must have done (and is doing) serious internal research in ranking. I've heard they're pretty good at that and they've even made some money doing it :P
4 [deleted]
5 Sebastian Thrun's book, Probabilistic Robotics, goes through this in great detail. Get it, read it, make it your bible.
6 This. Such a legendary book. Kalman filters, particle filters, recursive Bayesian filters and a whole bunch of other stuff. I learnt so much. Read these 3 for starts from the book, then come back and ask the questions
7 Do you still need help?
# Detecting the presence of a pattern in strings
str_detect(data$body[1:100], pattern = "deep")
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA NA
[25] NA NA NA NA NA NA NA NA NA NA NA NA
[37] NA NA NA NA NA NA NA NA NA NA NA NA
[49] NA NA NA NA NA NA NA NA NA NA NA NA
[61] NA NA NA NA NA NA NA NA NA NA NA NA
[73] NA NA NA NA NA NA NA NA NA NA NA NA
[85] NA NA NA NA NA NA NA NA NA NA NA NA
[97] NA NA NA NA
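The NA values in the output above come from missing entries in the input: str_detect() returns NA whenever the string it is given is NA. The small example below, which uses a made-up vector rather than the data from this chapter, shows how the matching functions treat missing values.

```r
library(stringr)

# Made-up example vector with one missing value
posts <- c("deep learning rocks", "yep, good point.", NA)

detected <- str_detect(posts, "deep")  # TRUE FALSE NA
indices  <- str_which(posts, "deep")   # 1 (NA entries are never reported as matches)
counts   <- str_count(posts, "deep")   # 1 0 NA

# Drop the missing values first if only real strings should be examined
non_missing <- posts[!is.na(posts)]
str_detect(non_missing, "deep")        # TRUE FALSE
```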
# Get the indices of matched strings
str_inds <- str_which(data$body[1:100], pattern = "deep")
str_inds
integer(0)
# Extract matched strings using detected indices
data$body[str_inds]
character(0)
# Count the number of matches
str_count(data$body[1:100], "deep")
[1] 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# Get the position of matches
str_locate_all(data$body[1], "deep")
[[1]]
start end
# Get a list of the first match found in each string as a vector
str_extract(data$body[1:3], "deep|the|and")
[1] "and" NA "and"
# Get a list of the first match found in each string as matrix
str_match(data$body[1:3], "deep|the|and")
[,1]
[1,] "and"
[2,] NA
[3,] "and"
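With a pattern that has no parentheses, str_extract() and str_match() return the same information in different shapes. The matrix form becomes useful when the pattern contains capture groups, because each group gets its own column. The sketch below uses made-up strings to show the difference.

```r
library(stringr)

# Made-up example strings
versions <- c("stringr 1.5", "dplyr 1.1", "no version here")

# str_extract() returns only the full match, as a character vector
full <- str_extract(versions, "(\\d+)\\.(\\d+)")
full

# str_match() returns a matrix: column 1 is the full match,
# the remaining columns hold each parenthesised capture group
parts <- str_match(versions, "(\\d+)\\.(\\d+)")
parts
```

Strings with no match produce NA in the vector and a row of NA values in the matrix.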
# Get a list of all the matches found in each string as a list of matrices
str_match_all(data$body[1:3], "deep|the|and")
# Get a string subset based on character position
str_sub(data$body[1], start = 1, end = 100)
[1] "Hdf5. It's structured, it's easy to get data in and out, and it's fast. Plus it will scale if you ev"
# Get a string subset based on words
word(data$body[1], start = 1, end = 10)
[1] "Hdf5. It's structured, it's easy to get data in and"
# Get the strings that contain a certain pattern
str_subset(data$body[1:100], pattern = "deep")
character(0)
# Replace a substring with a new string by substring position
str_sub(data$body[1], start = 1, end = 100) <- str_to_upper(str_sub(data$body[1], start = 1, end = 100))
str_sub(data$body[1], start = 1, end = 100)
[1] "HDF5. IT'S STRUCTURED, IT'S EASY TO GET DATA IN AND OUT, AND IT'S FAST. PLUS IT WILL SCALE IF YOU EV"
# Replace first occurrence of a substring with a new string by matching
str_replace(data$body[1], pattern = "deep|DEEP", replacement = "multi-layer")
[1] "HDF5. IT'S STRUCTURED, IT'S EASY TO GET DATA IN AND OUT, AND IT'S FAST. PLUS IT WILL SCALE IF YOU EVer get up there in dataset size."
# Replace all occurrences of a substring with a new string by matching
str_replace_all(data$body[1], pattern = "deep|DEEP", replacement = "multi-layer")
[1] "HDF5. IT'S STRUCTURED, IT'S EASY TO GET DATA IN AND OUT, AND IT'S FAST. PLUS IT WILL SCALE IF YOU EVer get up there in dataset size."
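Since the first post contains no occurrence of "deep", both calls above return the string unchanged. The difference between the two functions is easier to see with a string that does match. The example below uses a made-up sentence, and also shows regex(ignore_case = TRUE) as one possible alternative to listing each capitalisation in the pattern.

```r
library(stringr)

# Made-up example sentence containing the pattern several times
post <- "Deep nets are deep: deep learning stacks layers."

# Only the first lower-case "deep" is replaced
first_only <- str_replace(post, "deep", "multi-layer")
first_only

# Every lower-case "deep" is replaced; "Deep" is left alone
every <- str_replace_all(post, "deep", "multi-layer")
every

# Ignoring case replaces "Deep" as well
any_case <- str_replace_all(post, regex("deep", ignore_case = TRUE), "multi-layer")
any_case
```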
15.8 stringr: Viewing Strings
# Basic printing
print(data$body[1:10])
[1] "HDF5. IT'S STRUCTURED, IT'S EASY TO GET DATA IN AND OUT, AND IT'S FAST. PLUS IT WILL SCALE IF YOU EVer get up there in dataset size."
[2] "yep, good point."
[3] "Google must have done (and is doing) serious internal research in ranking. I've heard they're pretty good at that and they've even made some money doing it :P"
[4] "[deleted]"
[5] "Sebastian Thrun's book, Probabilistic Robotics, goes through this in great detail. Get it, read it, make it your bible."
[6] "This. Such a legendary book. Kalman filters, particle filters, recursive Bayesian filters and a whole bunch of other stuff. I learnt so much. Read these 3 for starts from the book, then come back and ask the questions"
[7] "Do you still need help?"
[8] NA
[9] NA
[10] NA
deep_learning_posts <- data$body[str_which(data$body, "deep learning")]

# View strings in HTML format with the first occurrence of a pattern highlighted
str_view(deep_learning_posts, pattern = "deep")
✖ Empty `string` provided.
# View strings in HTML format with all occurrences highlighted
str_view_all(deep_learning_posts, pattern = "deep")
Warning: `str_view_all()` was deprecated in stringr 1.5.0.
ℹ Please use `str_view()` instead.
✖ Empty `string` provided.
# Format strings into paragraphs of a given width with str_wrap()
wrapped <- str_wrap(data$body[str_which(data$body, "deep learning")][1], width = 50)
wrapped
[1] NA
# Print wrapped string with output obeying newlines
wrapped %>% cat()
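Because no post matched "deep learning" here, wrapped is NA and there is nothing to see when it is printed. The sketch below applies the same two steps to a made-up sentence so that the effect of str_wrap() and cat() is visible.

```r
library(stringr)

# Made-up example text, long enough to need several lines at width 30
text <- "Kalman filters, particle filters and recursive Bayesian filters are all covered in the book."

# str_wrap() inserts newline characters so that lines stay within the width
wrapped_text <- str_wrap(text, width = 30)

# print() shows the newlines as the escape sequence \n ...
print(wrapped_text)

# ... while cat() interprets them and starts a new line at each one
cat(wrapped_text)
```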