8 What is the “tidyverse”?

The tidyverse is a collection of R packages designed for data science that share common design philosophies and work seamlessly together. These packages make data manipulation, visualization, and analysis more intuitive and efficient.

See the official tidyverse website (https://www.tidyverse.org) for more information about the entire tidyverse family of packages.

8.1 Installation and Loading

To install all tidyverse packages:

install.packages("tidyverse")

To load the core tidyverse packages:

library(tidyverse)

Alternatively, you can install and load each package separately.
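As a minimal sketch of that alternative, the snippet below installs the tidyverse only when it is missing and then loads two individual packages instead of the whole set. The choice of dplyr and ggplot2 is only an illustration.

```r
# Install the tidyverse only if it is not already available.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load just the packages this script needs.
library(dplyr)
library(ggplot2)
```

Loading packages one by one keeps the list of attached packages short and makes a script's dependencies explicit.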

8.2 Core Tidyverse Packages

The core tidyverse includes packages that you’ll likely use in almost every analysis. When you run library(tidyverse), these packages are automatically loaded:

8.2.1 ggplot2

  • Purpose: Creates elegant data visualizations using the Grammar of Graphics
  • Key features: Layered approach to building plots, extensive customization options
  • Common uses: Creating statistical graphics, from basic scatter plots to complex multi-layered visualizations
  • Learn more: https://ggplot2.tidyverse.org
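To illustrate the layered approach, here is a small sketch that uses the mpg dataset shipped with ggplot2. The variables and labels are chosen only for demonstration.

```r
library(ggplot2)

# Start from data plus aesthetic mappings, then add one layer at a time.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +                        # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: linear trend
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")

print(p)
```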

8.2.2 dplyr

  • Purpose: Data manipulation and transformation
  • Key features: Functions for filtering, selecting, mutating, summarizing, and joining data
  • Common uses: Subset rows, select columns, create new variables, calculate summaries
  • Learn more: https://dplyr.tidyverse.org
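The common uses listed above can be sketched as a single pipeline on the built-in mtcars data. The example assumes R 4.1 or later for the native pipe `|>`, and the threshold and new column are arbitrary illustrations.

```r
library(dplyr)

summary_by_cyl <- mtcars |>
  filter(mpg > 15) |>                      # subset rows
  select(mpg, cyl, wt) |>                  # select columns
  mutate(wt_lb = wt * 1000) |>             # create a new variable (wt is in 1000 lb)
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg)) # calculate summaries per group

print(summary_by_cyl)
```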

8.2.3 tidyr

  • Purpose: Data tidying and reshaping
  • Key features: Functions for pivoting data between wide and long formats, handling missing values
  • Common uses: Restructuring data to make it tidy (each variable in a column, each observation in a row)
  • Learn more: https://tidyr.tidyverse.org
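A short sketch of pivoting between wide and long formats follows. The small table and its column names are made up for the example.

```r
library(tidyr)

# Wide format: one column per year.
wide <- data.frame(
  country = c("A", "B"),
  `2020` = c(10, 20),
  `2021` = c(11, 21),
  check.names = FALSE
)

# Long (tidy) format: one row per country-year observation.
long <- pivot_longer(wide, cols = -country, names_to = "year", values_to = "value")

# And back to wide again.
back <- pivot_wider(long, names_from = year, values_from = value)

print(long)
```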

8.2.4 readr

  • Purpose: Fast and friendly reading of rectangular data (like CSV files)
  • Key features: Consistent parsing, progress bars, warnings about parsing problems
  • Common uses: Reading CSV, TSV, and other delimited files
  • Learn more: https://readr.tidyverse.org
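As a hedged illustration, the sketch below writes a small CSV to a temporary file and reads it back with explicit column types. The file contents are invented for the example.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:3, score = c(90.5, 85, 77.25)), path)

# Declaring column types makes parsing explicit and reproducible.
scores <- read_csv(
  path,
  col_types = cols(id = col_integer(), score = col_double())
)

# problems() lists any values that could not be parsed as declared.
problems(scores)
```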

8.2.5 purrr

  • Purpose: Functional programming tools
  • Key features: Functions for working with vectors and lists, replacing many loop functions
  • Common uses: Applying functions to lists, working with nested data structures
  • Learn more: https://purrr.tidyverse.org
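A minimal sketch of replacing a loop with purrr's map functions is shown here. The list is made up, and the `\(x)` lambda syntax assumes R 4.1 or later.

```r
library(purrr)

values <- list(a = 1:3, b = 4:6, c = 7:9)

# map_dbl() applies a function to each element and returns a numeric vector.
means <- map_dbl(values, mean)

# map() always returns a list, one element per input.
doubled <- map(values, \(x) x * 2)

print(means)
```

The typed variants such as map_dbl() fail with an error if a result is not a single number, which catches mistakes that a hand-written loop might silently pass over.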

8.2.6 tibble

  • Purpose: Modern reimagining of data frames
  • Key features: Better printing, stricter subsetting, and enhanced creation of data frames
  • Common uses: Creating and working with data frames in a more consistent way
  • Learn more: https://tibble.tidyverse.org
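The difference in subsetting can be seen in a small comparison with a base data frame. The values are arbitrary.

```r
library(tibble)

# Columns can refer to columns defined earlier in the same call.
tb <- tibble(x = 1:3, y = x * 2)
df <- data.frame(x = 1:3, y = (1:3) * 2)

# Selecting one column with [ , ]:
class(df[, "x"])  # base data frame drops to a plain vector
class(tb[, "x"])  # tibble stays a tibble

print(tb)
```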

8.2.7 stringr

  • Purpose: String manipulation
  • Key features: Consistent interface for working with strings
  • Common uses: Pattern matching, string extraction, modification, and joining
  • Learn more: https://stringr.tidyverse.org
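The common uses above map onto functions that all start with `str_` and take the string as their first argument. A brief sketch with invented strings:

```r
library(stringr)

x <- c("apple pie", "banana split", "cherry tart")

has_an    <- str_detect(x, "an")         # pattern matching
first     <- str_extract(x, "^[a-z]+")   # extraction of the first word
snake     <- str_replace(x, " ", "_")    # modification
one_line  <- str_c(x, collapse = "; ")   # joining

print(one_line)
```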

8.2.8 forcats

  • Purpose: Working with categorical variables (factors)
  • Key features: Tools for reordering factor levels, recoding, and collapsing factors
  • Common uses: Managing and modifying factor variables
  • Learn more: https://forcats.tidyverse.org
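Here is a small sketch of reordering and collapsing factor levels. The factor and the new level name "big" are made up for illustration.

```r
library(forcats)

size <- factor(c("small", "large", "medium", "small", "large", "small"))
levels(size)  # default order is alphabetical

# Reorder levels by how often each value occurs (most frequent first).
by_freq <- fct_infreq(size)

# Put levels in an explicit, meaningful order.
manual <- fct_relevel(size, "small", "medium", "large")

# Collapse several levels into one.
collapsed <- fct_collapse(size, big = c("medium", "large"))

print(levels(by_freq))
```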

8.2.9 lubridate

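  • Purpose: Working with dates and times
  • Loading: Part of the core since tidyverse 2.0.0, so library(tidyverse) loads it; with older versions, load it separately with library(lubridate)
  • Learn more: https://lubridate.tidyverse.org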
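A minimal sketch of parsing a date, extracting components, and doing date arithmetic follows. The date itself is arbitrary.

```r
library(lubridate)

d <- ymd("2025-02-14")    # parse a year-month-day string into a Date

year(d)                   # extract components
month(d)
wday(d, label = TRUE)     # day of the week as a labelled factor

later <- d + days(30)     # add a period of 30 days
print(later)
```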

8.3 Non-Core Tidyverse Packages

These packages are part of the tidyverse but aren’t loaded automatically with library(tidyverse). You’ll need to load them individually when needed:

8.3.1 readxl

  • Purpose: Reading Excel files
  • Load with: library(readxl)
  • Learn more: https://readxl.tidyverse.org
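For a quick try without your own spreadsheet, readxl ships example workbooks. The sketch below uses one of them.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

excel_sheets(path)                           # list the sheet names
iris_xl <- read_excel(path, sheet = "iris")  # read one sheet into a tibble

print(head(iris_xl))
```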

8.3.2 haven

  • Purpose: Reading SPSS, Stata, and SAS files
  • Load with: library(haven)
  • Learn more: https://haven.tidyverse.org
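As a self-contained sketch, the example below writes a small Stata file to a temporary location and reads it back. The data are invented, and real files would normally come from the other software.

```r
library(haven)

path <- tempfile(fileext = ".dta")

# Write a tiny data frame in Stata format, then read it back as a tibble.
write_dta(data.frame(id = 1:3, score = c(1.5, 2.5, 3.5)), path)
stata_data <- read_dta(path)

print(stata_data)
```

read_sav() and read_sas() play the same role for SPSS and SAS files.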

8.3.3 hms

  • Purpose: Working with time-of-day values
  • Load with: library(hms)
  • Learn more: https://hms.tidyverse.org
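A brief sketch of creating time-of-day values and computing the gap between them is given here. The times are arbitrary.

```r
library(hms)

start  <- as_hms("09:30:00")                          # parse from text
finish <- hms(seconds = 0, minutes = 45, hours = 17)  # build from components

# hms values are stored as seconds since midnight.
elapsed_seconds <- as.numeric(finish) - as.numeric(start)
elapsed <- as_hms(elapsed_seconds)

print(elapsed)
```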

8.3.4 jsonlite

  • Purpose: Working with JSON data
  • Load with: library(jsonlite)
  • Learn more: https://cran.r-project.org/package=jsonlite
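A minimal round trip between a data frame and JSON might look like this. The data are made up.

```r
library(jsonlite)

df <- data.frame(name = c("a", "b"), n = c(1, 2))

json <- toJSON(df)     # data frame -> JSON array of records
back <- fromJSON(json) # JSON -> data frame

print(json)
```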

8.3.5 xml2

  • Purpose: Working with XML data
  • Load with: library(xml2)
  • Learn more: https://xml2.r-lib.org
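The sketch below parses a small XML string and pulls out values with XPath. The document is invented for the example.

```r
library(xml2)

doc <- read_xml(
  "<books>
     <book year='2017'><title>A</title></book>
     <book year='2023'><title>B</title></book>
   </books>"
)

titles <- xml_text(xml_find_all(doc, "//book/title"))  # element text
years  <- xml_attr(xml_find_all(doc, "//book"), "year") # attribute values

print(titles)
```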

8.3.6 rvest

  • Purpose: Web scraping
  • Load with: library(rvest)
  • Learn more: https://rvest.tidyverse.org
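To show the idea without touching a live site, this sketch scrapes an inline HTML fragment. With a real page you would start from read_html() and a URL instead.

```r
library(rvest)

# minimal_html() wraps a fragment in a complete HTML document.
page <- minimal_html("<h1>Fruit</h1><ul><li>apple</li><li>banana</li></ul>")

# Select elements with a CSS selector, then extract their text.
items <- page |>
  html_elements("li") |>
  html_text2()

print(items)
```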

8.4 Finding Help and Learning More

  1. Each package has its own website (links provided above) with comprehensive documentation, vignettes, and examples.

  2. Within R, you can access help for any function using the ? operator:

?select  # Get help for dplyr's select function

  3. Cheatsheets are available for many tidyverse packages:
     • Access in RStudio: Help > Cheatsheets
     • Download from: https://posit.co/resources/cheatsheets/

  4. The official tidyverse website (https://www.tidyverse.org) provides additional resources and learning materials.

  5. The “R for Data Science” book (https://r4ds.hadley.nz/) is an excellent resource for learning the tidyverse ecosystem.
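Beyond the ? operator, base R offers a few other ways to reach the same documentation from the console. A short sketch, using dplyr as the example package:

```r
library(dplyr)

?select                             # help page for a function on the search path
help("select", package = "dplyr")   # same page, naming the package explicitly
help(package = "dplyr")             # index of all help topics in a package
vignette(package = "dplyr")         # list the long-form guides a package ships
```

These calls are meant for interactive use; in RStudio the results open in the Help pane.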

8.5 Understanding Core vs Non-Core Packages

The distinction between core and non-core tidyverse packages is primarily about frequency of use and interdependence:

8.5.1 Core Packages:

  • Loaded automatically with library(tidyverse)
  • Used in most data analysis workflows
  • Work together seamlessly
  • Share common design principles and documentation styles
  • Updated and maintained on a coordinated release schedule

8.5.2 Non-Core Packages:

These packages are part of the tidyverse. They are installed with install.packages("tidyverse") but aren’t loaded automatically with library(tidyverse). You’ll need to load them individually with library(PACKAGE_NAME) or require(PACKAGE_NAME).

  • Must be loaded individually
  • Serve more specialized purposes
  • Still follow tidyverse design principles
  • May be used less frequently
  • May have different release schedules
  • Often depend on core packages but are not required for basic data analysis workflows

Remember that being “non-core” doesn’t mean less important—it simply means the package serves a more specialized purpose that isn’t needed in every analysis.
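To check the distinction on your own installation, the following sketch lists what the tidyverse meta-package bundles and which of those packages are attached. It assumes a fresh session in which no other packages have been loaded manually.

```r
library(tidyverse)

# All packages installed alongside the tidyverse meta-package.
bundled <- tidyverse_packages(include_self = FALSE)

# Packages from that list that are currently attached to the search path.
# In a fresh session this is the core set.
attached <- bundled[paste0("package:", bundled) %in% search()]

# Bundled but not attached: load these with library() when needed.
not_attached <- setdiff(bundled, attached)

print(attached)
print(not_attached)
```

The exact lists depend on the installed tidyverse version, so treat the output as the authoritative answer for your setup.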

8.6 Non-Core Tidyverse Packages (as of Feb 2025)

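This section groups the non-core packages by task and describes each in more detail. None of them is loaded by library(tidyverse), so load each one individually. Note that fs, httr2, devtools, and clock are maintained alongside the tidyverse but, as of tidyverse 2.0.0, are not installed by install.packages("tidyverse"); install them separately if needed.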

8.6.1 File Reading and Writing

  • readxl: Efficiently reads Excel files (.xls and .xlsx formats) into R. Provides functions to read entire sheets or specific cells, and handles various Excel formatting quirks.

  • haven: Specializes in reading data from other statistical software like SPSS, Stata, and SAS. Preserves value labels, variable labels, and other metadata from the original files.

  • fs: Provides a cross-platform interface for working with files and directories. Includes tools for creating, moving, copying, and deleting files, as well as working with file permissions and paths in a consistent way across operating systems.

  • jsonlite: Handles reading and writing JSON data with robust encoding support. Specializes in converting between JSON data and R objects while maintaining data types and structures.

  • xml2: Provides tools for working with XML files. Includes functions for reading, writing, and manipulating XML data, with support for XPath expressions and namespace handling.
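As one illustration from this group, the sketch below uses fs to create, list, and delete files in a temporary directory. The directory prefix and file names are invented.

```r
library(fs)

# Create a scratch directory under the session's temp folder.
tmp <- dir_create(file_temp("demo-"))

# path() builds paths with the right separators on any operating system.
file_create(path(tmp, c("a.csv", "b.csv")))

# List the files that match a glob pattern and keep only their names.
csv_files <- dir_ls(tmp, glob = "*.csv")
csv_names <- sort(as.character(path_file(csv_files)))
print(csv_names)

# Clean up.
dir_delete(tmp)
```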

8.6.2 Web and Database

  • httr2: Modern toolkit for working with web APIs and HTTP. Includes features for authentication, request/response handling, and working with different HTTP methods.

  • rvest: Makes it easy to scrape data from web pages. Provides tools for extracting specific elements from HTML pages, handling forms, and navigating web pages programmatically.

  • dbplyr: Allows you to work with databases using familiar dplyr syntax. Translates R code into SQL queries behind the scenes, making it easier to work with large datasets stored in databases.

  • googledrive: Provides a full-featured interface to Google Drive from R. Allows you to upload, download, and manage files on Google Drive programmatically.
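The dbplyr translation can be sketched with an in-memory SQLite database. This assumes the DBI and RSQLite packages are installed, which the text above does not mention; any DBI-compatible backend would work similarly.

```r
library(dplyr)
library(dbplyr)

# Assumption: RSQLite is available to provide an in-memory database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
cars_db <- copy_to(con, mtcars, "mtcars")

# Ordinary dplyr verbs; nothing runs in the database yet.
query <- cars_db |>
  filter(cyl == 4) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(query)         # the SQL that dbplyr generated
result <- collect(query)  # run the query and bring the result into R

DBI::dbDisconnect(con)
print(result)
```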

8.6.3 Modeling and Development

  • modelr: Contains helper functions for modeling and model visualization. Includes tools for data partitioning, cross-validation, and creating model predictions.

  • broom: Converts statistical model objects into tidy tibbles. Makes it easier to work with model outputs using tidyverse tools by providing consistent data structures.

  • reprex: Creates reproducible examples for sharing code problems or solutions. Formats R code and its output in a way that’s perfect for posting to GitHub issues, Stack Overflow, or other forums.

  • devtools: Essential toolkit for R package development. Provides functions for creating, documenting, testing, and maintaining R packages.
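As an illustration of what "tidy tibbles" means for broom, the sketch below fits a simple linear model on the built-in mtcars data. The model is chosen only for demonstration.

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

coefs   <- tidy(fit)     # one row per model term: estimate, std.error, p.value, ...
quality <- glance(fit)   # one row per model: r.squared, AIC, ...
fitted  <- augment(fit)  # one row per observation: .fitted, .resid, ...

print(coefs)
```

Because each result is a regular tibble, it can be passed straight to dplyr or ggplot2.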

8.6.4 Time and Dates

  • hms: Specialized package for working with time-of-day values. Provides a simple class for storing and manipulating time measurements without dates.

  • clock: Comprehensive toolkit for working with dates and times. Offers more precise control over calendars, time zones, and date-time arithmetic than base R or lubridate.
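One example of the finer control clock offers is its handling of dates that do not exist after arithmetic. A minimal sketch, with an arbitrary start date:

```r
library(clock)

x <- as.Date("2025-01-31")

# Adding one month would land on 31 February, which does not exist.
# clock asks you to say how to resolve that instead of guessing.
prev_day <- add_months(x, 1, invalid = "previous")  # last valid day before
next_day <- add_months(x, 1, invalid = "next")      # first valid day after

print(prev_day)
print(next_day)
```

Without the invalid argument, clock raises an error for this case rather than silently returning a result.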