8 What is the “tidyverse”?

The tidyverse is a collection of R packages designed for data science that share common design philosophies and work seamlessly together. These packages make data manipulation, visualization, and analysis more intuitive and efficient.

For more information about the entire tidyverse family of packages, see the resources listed in Section 8.4.

8.1 Installation and Loading

To install all tidyverse packages:

install.packages("tidyverse")

To load the core tidyverse packages:

library(tidyverse)

Alternatively, you can install and load each package separately.
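
As a minimal sketch of that alternative, assume you only need dplyr and ggplot2 (an arbitrary pair chosen for illustration). You could install whichever of them is missing and then attach each one on its own:

```r
# Packages chosen only for illustration
pkgs <- c("dplyr", "ggplot2")

# Find the packages that are not installed yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install only what is missing
if (length(to_install) > 0) install.packages(to_install)

# Attach each package individually
library(dplyr)
library(ggplot2)
```

This keeps the search path small, at the cost of one library() call per package.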

8.2 Core Tidyverse Packages

The core tidyverse includes packages that you’ll likely use in almost every analysis. When you run library(tidyverse), these packages are automatically loaded:

8.2.1 ggplot2

Purpose: Creates elegant data visualizations using the Grammar of Graphics
Key features: Layered approach to building plots, extensive customization options
Common uses: Creating statistical graphics, from basic scatter plots to complex multi-layered visualizations
Learn more: https://ggplot2.tidyverse.org
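
The layered approach might look like the sketch below. It uses mpg, an example dataset that ships with ggplot2; the choice of variables and labels is only illustrative.

```r
library(ggplot2)

# Start with data and aesthetic mappings, then add one layer at a time
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +          # layer 1: raw observations
  geom_smooth(method = "lm", se = FALSE) +   # layer 2: linear trend line
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")

p  # printing the object draws the plot
```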

8.2.2 dplyr

Purpose: Data manipulation and transformation
Key features: Functions for filtering, selecting, mutating, summarizing, and joining data
Common uses: Subset rows, select columns, create new variables, calculate summaries
Learn more: https://dplyr.tidyverse.org
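
A short sketch of these verbs chained together, using the built-in mtcars dataset. The threshold and the new column name are arbitrary, and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

result <- mtcars |>
  filter(mpg > 20) |>              # subset rows
  select(mpg, cyl, wt) |>          # select columns
  mutate(wt_lb = wt * 1000) |>     # create a new variable (wt is in 1000 lb)
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n = n())  # calculate summaries per group

result
```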

8.2.3 tidyr

Purpose: Data tidying and reshaping
Key features: Functions for pivoting data between wide and long formats, handling missing values
Common uses: Restructuring data to make it tidy (each variable in a column, each observation in a row)
Learn more: https://tidyr.tidyverse.org
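
To illustrate pivoting, here is a small sketch with a made-up wide table (the country codes and values are hypothetical) that is reshaped to long format and back again:

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- tibble::tibble(
  country = c("A", "B"),
  `2020`  = c(10, 20),
  `2021`  = c(12, 23)
)

# Wide to long: one row per country-year observation
long <- pivot_longer(wide, cols = c(`2020`, `2021`),
                     names_to = "year", values_to = "value")

# Long back to wide
back <- pivot_wider(long, names_from = year, values_from = value)
```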

8.2.4 readr

Purpose: Fast and friendly reading of rectangular data (like CSV files)
Key features: Consistent parsing, progress bars, warnings about parsing problems
Common uses: Reading CSV, TSV, and other delimited files
Learn more: https://readr.tidyverse.org
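
A minimal sketch of reading delimited data. To stay self-contained it parses a literal string wrapped in I() (this assumes readr 2.0 or later) and then round-trips the result through a temporary TSV file; the column names are invented for the example.

```r
library(readr)

# I() marks the string as literal data rather than a file path
csv_text <- I("id,name,score\n1,Ana,9.5\n2,Ben,7\n")

scores <- read_csv(
  csv_text,
  col_types = cols(id = col_integer(), name = col_character(), score = col_double())
)

# Write to a temporary TSV file and read it back
path <- tempfile(fileext = ".tsv")
write_tsv(scores, path)
scores_tsv <- read_tsv(path, show_col_types = FALSE)
```

Declaring col_types up front is optional, but it makes the parsing explicit instead of relying on readr's guesses.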

8.2.5 purrr

Purpose: Functional programming tools
Key features: Functions for working with vectors and lists that can replace many for loops
Common uses: Applying functions to lists, working with nested data structures
Learn more: https://purrr.tidyverse.org
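
As a rough sketch, the map family applies a function to each element of a list and returns a result of a predictable type. The lists below are made up, and the \(x) shorthand assumes R 4.1 or later.

```r
library(purrr)

values <- list(a = 1:3, b = 4:6, c = 7:9)

# map_dbl() always returns a double vector, one value per element
means <- map_dbl(values, mean)

# map() always returns a list; \(x) defines an anonymous function
ranges <- map(values, \(x) range(x))

# Nested data: pull one field out of each inner list by name
people <- list(list(name = "Ana", age = 31), list(name = "Ben", age = 27))
people_names <- map_chr(people, "name")
```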

8.2.6 tibble

Purpose: Modern reimagining of data frames
Key features: Better printing, stricter subsetting, and enhanced creation of data frames
Common uses: Creating and working with data frames in a more consistent way
Learn more: https://tibble.tidyverse.org
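
The sketch below contrasts a tibble with a base data frame on two of these points, creation and subsetting. The column names are arbitrary.

```r
library(tibble)

# Later columns can refer to earlier ones during creation
tb <- tibble(x = 1:3, y = x^2)
df <- data.frame(x = 1:3, y = (1:3)^2)

# Stricter subsetting: [ on a tibble always returns a tibble,
# while a data frame silently drops to a vector for a single column
tb_col <- tb[, "x"]
df_col <- df[, "x"]

class(tb_col)
class(df_col)
```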

8.2.7 stringr

Purpose: String manipulation
Key features: Consistent interface for working with strings
Common uses: Pattern matching, string extraction, modification, and joining
Learn more: https://stringr.tidyverse.org
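
A few of these operations in a short sketch. The input strings are invented; note that every function starts with str_ and takes the string as its first argument.

```r
library(stringr)

x <- c("apple pie", "banana split", "cherry tart")

has_an   <- str_detect(x, "an")         # pattern matching
first    <- str_extract(x, "^[a-z]+")   # extraction of the first word
snake    <- str_replace(x, " ", "_")    # modification
combined <- str_c(x, collapse = "; ")   # joining into one string
```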

8.2.8 forcats

Purpose: Working with categorical variables (factors)
Key features: Tools for reordering factor levels, recoding, and collapsing factors
Common uses: Managing and modifying factor variables
Learn more: https://forcats.tidyverse.org
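
A minimal sketch of reordering, recoding, and collapsing, using a made-up factor of clothing sizes:

```r
library(forcats)

sizes <- factor(c("small", "large", "medium", "small", "small", "large"))
levels(sizes)  # base R orders levels alphabetically

by_freq   <- fct_infreq(sizes)                                  # reorder by frequency
by_hand   <- fct_relevel(sizes, "small", "medium", "large")     # reorder manually
recoded   <- fct_recode(sizes, S = "small", M = "medium", L = "large")  # new = old
collapsed <- fct_collapse(sizes, not_small = c("medium", "large"))      # merge levels
```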

8.2.9 lubridate

Purpose: Working with dates and times
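Note: Part of the core since tidyverse 2.0.0, so library(tidyverse) loads it automatically; in older versions, load it with library(lubridate)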
Learn more: https://lubridate.tidyverse.org
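
As a brief sketch, lubridate parses dates with functions named after the order of their components and offers accessors for the individual parts. The date used here is arbitrary.

```r
library(lubridate)

d <- ymd("2025-02-14")          # parse year-month-day
same_day <- dmy("14/02/2025")   # parse day-month-year

year(d)                  # extract the year
month(d, label = TRUE)   # month as a labelled factor
wday(d, label = TRUE)    # day of the week

next_month <- d + months(1)   # calendar-aware arithmetic
```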

8.3 Non-Core Tidyverse Packages

These packages are part of the tidyverse but aren’t loaded automatically with library(tidyverse). You’ll need to load them individually when needed:

8.3.1 readxl

Purpose: Reading Excel files
Load with: library(readxl)
Learn more: https://readxl.tidyverse.org
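
A small sketch using the example workbook bundled with readxl, so no file of your own is needed. With real data you would pass your own path to read_excel().

```r
library(readxl)

# readxl_example() returns the path to a workbook shipped with the package
path <- readxl_example("datasets.xlsx")

sheets  <- excel_sheets(path)               # list the sheet names
iris_xl <- read_excel(path, sheet = "iris") # read one sheet into a tibble
```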

8.3.2 haven

Purpose: Reading SPSS, Stata, and SAS files
Load with: library(haven)
Learn more: https://haven.tidyverse.org
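
The following sketch writes a small, made-up survey table to a temporary Stata file and reads it back, to show how value labels survive the round trip. The variable names and labels are hypothetical.

```r
library(haven)

# A labelled vector stores integer codes together with their value labels
survey <- tibble::tibble(
  id  = 1:3,
  sex = labelled(c(1L, 2L, 1L), labels = c(Male = 1L, Female = 2L))
)

path <- tempfile(fileext = ".dta")
write_dta(survey, path)   # write a Stata file
back <- read_dta(path)    # read it back, labels included

# Convert the labelled codes into an ordinary factor
sex_factor <- as_factor(back$sex)
```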

8.3.3 hms

Purpose: Working with time-of-day values
Load with: library(hms)
Learn more: https://hms.tidyverse.org
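
A minimal sketch of creating time-of-day values, either from components or from text. The times are arbitrary.

```r
library(hms)

t1 <- hms(seconds = 30, minutes = 15, hours = 9)  # built from components
t2 <- as_hms("17:45:00")                          # parsed from a string

# Internally an hms value is a number of seconds since midnight
elapsed_seconds <- as.numeric(t2) - as.numeric(t1)
```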

8.3.4 jsonlite

Purpose: Working with JSON data
Load with: library(jsonlite)
Learn more: https://cran.r-project.org/package=jsonlite
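
As a short sketch, jsonlite turns a JSON array of objects into a data frame and back. The JSON text below is invented for the example.

```r
library(jsonlite)

json <- '[{"name":"Ana","age":31},{"name":"Ben","age":27}]'

people <- fromJSON(json)                 # array of objects becomes a data frame
json_out <- toJSON(people, pretty = TRUE)  # convert back to JSON text
```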

8.3.5 xml2

Purpose: Working with XML data
Load with: library(xml2)
Learn more: https://xml2.r-lib.org
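
A minimal sketch that parses an XML string and queries it with XPath. The document structure is made up.

```r
library(xml2)

doc <- read_xml(
  "<books>
     <book id='1'><title>R Basics</title></book>
     <book id='2'><title>Tidy Data</title></book>
   </books>"
)

titles <- xml_text(xml_find_all(doc, "//book/title"))  # text of each <title>
ids    <- xml_attr(xml_find_all(doc, "//book"), "id")  # id attribute of each <book>
```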

8.3.6 rvest

Purpose: Web scraping
Load with: library(rvest)
Learn more: https://rvest.tidyverse.org
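
To keep the sketch offline, it builds a tiny page from a string with minimal_html() instead of downloading one; with a real site you would start from read_html() and a URL. The HTML and the CSS selector are invented, and html_elements() assumes rvest 1.0.0 or later.

```r
library(rvest)

page <- minimal_html(
  "<h1>Fruit</h1>
   <ul>
     <li class='item'>apple</li>
     <li class='item'>pear</li>
   </ul>"
)

# Select elements with a CSS selector, then extract their text
items <- html_text2(html_elements(page, "li.item"))
```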

8.4 Finding Help and Learning More

Each package has its own website (links provided above) with comprehensive documentation, vignettes, and examples.
Within R, you can access help for any function using the ? operator:

?select  # Get help for dplyr's select function

Cheatsheets are available for many tidyverse packages:

Access in RStudio: Help > Cheatsheets
Download from: https://posit.co/resources/cheatsheets/

The official tidyverse website (https://www.tidyverse.org) provides additional resources and learning materials.
The “R for Data Science” book (https://r4ds.hadley.nz/) is an excellent resource for learning the tidyverse ecosystem.

8.5 Understanding Core vs Non-Core Packages

The distinction between core and non-core tidyverse packages is primarily about frequency of use and interdependence:

8.5.1 Core Packages

Loaded automatically with library(tidyverse)
Used in most data analysis workflows
Work together seamlessly
Share common design principles and documentation styles
Updated and maintained on a coordinated release schedule

8.5.2 Non-Core Packages

These packages are part of the tidyverse. They are installed with install.packages("tidyverse") but aren’t loaded automatically with library(tidyverse). You’ll need to load them individually with library(PACKAGE_NAME) or require(PACKAGE_NAME).

Must be loaded individually
Serve more specialized purposes
Still follow tidyverse design principles
May be used less frequently
May have different release schedules
Often depend on core packages but are not required for basic data analysis workflows

Remember that being “non-core” doesn’t mean less important—it simply means the package serves a more specialized purpose that isn’t needed in every analysis.
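
One way to see the distinction on your own machine is sketched below. It assumes a fresh R session in which nothing but the tidyverse has been attached, and it relies on tidyverse_packages(), which lists the packages installed alongside the tidyverse.

```r
library(tidyverse)

# Everything installed with the tidyverse, excluding the tidyverse package itself
all_pkgs <- tidyverse_packages(include_self = FALSE)

# Packages currently on the search path
attached <- sub("^package:", "", search())

core     <- intersect(all_pkgs, attached)  # attached by library(tidyverse)
non_core <- setdiff(all_pkgs, attached)    # installed but not attached

core
non_core
```

A non-core function can also be called without attaching its package by using the :: operator, for example readxl::read_excel().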

8.6 Non-Core Tidyverse Packages (as of Feb 2025)

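This section groups the non-core packages by task and describes each in more detail. None of them is loaded by library(tidyverse), so you’ll need to load them individually. Not every package listed here is necessarily installed by install.packages("tidyverse"); as of tidyverse 2.0.0, fs, httr2, devtools, and clock need to be installed separately. Running tidyverse::tidyverse_packages() shows which packages are bundled with your installed version.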

8.6.1 File Reading and Writing

readxl: Efficiently reads Excel files (.xls and .xlsx formats) into R. Provides functions to read entire sheets or specific cells, and handles various Excel formatting quirks.
haven: Specializes in reading data from other statistical software like SPSS, Stata, and SAS. Preserves value labels, variable labels, and other metadata from the original files.
fs: Provides a cross-platform interface for working with files and directories. Includes tools for creating, moving, copying, and deleting files, as well as working with file permissions and paths in a consistent way across operating systems.
jsonlite: Handles reading and writing JSON data with robust encoding support. Specializes in converting between JSON data and R objects while maintaining data types and structures.
xml2: Provides tools for working with XML files. Includes functions for reading, writing, and manipulating XML data, with support for XPath expressions and namespace handling.

8.6.2 Web and Database

httr2: Modern toolkit for working with web APIs and HTTP. Includes features for authentication, request/response handling, and working with different HTTP methods.
rvest: Makes it easy to scrape data from web pages. Provides tools for extracting specific elements from HTML pages, handling forms, and navigating web pages programmatically.
dbplyr: Allows you to work with databases using familiar dplyr syntax. Translates R code into SQL queries behind the scenes, making it easier to work with large datasets stored in databases.
googledrive: Provides a full-featured interface to Google Drive from R. Allows you to upload, download, and manage files on Google Drive programmatically.
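
To illustrate the dbplyr translation without a real database, the sketch below uses lazy_frame() with a simulated SQLite connection. The table and column names are hypothetical, and with an actual database you would start from dplyr::tbl() on a DBI connection instead.

```r
library(dplyr)
library(dbplyr)

# A stand-in for a remote table; no database is contacted
flights_db <- lazy_frame(carrier = "AA", dep_delay = 5, con = simulate_sqlite())

query <- flights_db |>
  filter(dep_delay > 0) |>
  group_by(carrier) |>
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

# Inspect the SQL that dbplyr generates from the dplyr code
show_query(query)
sql_text <- as.character(sql_render(query))
```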

8.6.3 Modeling and Development

modelr: Contains helper functions for modeling and model visualization. Includes tools for data partitioning, cross-validation, and creating model predictions.
broom: Converts statistical model objects into tidy tibbles. Makes it easier to work with model outputs using tidyverse tools by providing consistent data structures.
reprex: Creates reproducible examples for sharing code problems or solutions. Formats R code and its output in a way that’s perfect for posting to GitHub issues, Stack Overflow, or other forums.
devtools: Essential toolkit for R package development. Provides functions for creating, testing, and maintaining R packages, including tools for documentation and testing.
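
As an example of what broom provides, this sketch fits a simple linear model on the built-in mtcars data (the model itself is arbitrary) and converts the result into tibbles:

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

coefs   <- tidy(fit)     # one row per coefficient
summary <- glance(fit)   # one row of model-level statistics
fitted  <- augment(fit)  # original data plus fitted values and residuals
```

Because the results are ordinary tibbles, they can be passed straight to dplyr or ggplot2.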

8.6.4 Time and Dates

hms: Specialized package for working with time-of-day values. Provides a simple class for storing and manipulating time measurements without dates.
clock: Comprehensive toolkit for working with dates and times. Offers more precise control over calendars, time zones, and date-time arithmetic than base R or lubridate.
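
One place where clock's extra control shows up is arithmetic that lands on a date that does not exist. In the sketch below, adding one month to 31 January has no valid answer, so clock asks you to say how to resolve it through the invalid argument. The date is arbitrary.

```r
library(clock)

jan31 <- date_build(2025, 1, 31)

# Without `invalid`, clock raises an error for 2025-02-31;
# here we choose explicitly how to resolve the invalid date
end_of_feb   <- add_months(jan31, 1, invalid = "previous")  # last valid day before
start_of_mar <- add_months(jan31, 1, invalid = "next")      # first valid day after
```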