9 Tibbles and Tribbles

9.1 Introduction to Tibbles

# Tibbles are part of the tibble package, which is included in tidyverse
# You can install/load tibble directly
if(!require(tibble)) {install.packages("tibble");require(tibble);}
Loading required package: tibble
# For more info see:
#
# help(package="tibble")

9.2 What are Tibbles?

Tibbles are a modern reimagining of R’s traditional data.frame. They are designed to make working with data frames easier and more consistent. Here’s how to create a basic tibble:

9.2.1 tibble() function is similar to data.frame() function

Create a tibble with the tibble() function. Its use is very similar to how you would use the data.frame() function to create a dataframe (we’re assuming that you’re familiar with creating dataframes in R).

# Create a tibble directly
my_tibble = tibble(
  x = 1:3,
  y = letters[1:3],
  z = LETTERS[1:3]
)
my_tibble
# A tibble: 3 × 3
      x y     z    
  <int> <chr> <chr>
1     1 a     A    
2     2 b     B    
3     3 c     C    

9.2.2 as_tibble() to convert data.frame or matrix to a tibble

# convert a dataframe to a tibble

df = data.frame(
  numbers = 1:3,
  letters = c("a", "b", "c")
)

tbl = as_tibble(df)
tbl

# convert a matrix to a tibble

mat = matrix(seq(10,120,by=10), nrow=3,ncol = 4) 
tbl = as_tibble(mat)
tbl
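
The matrix above has no column names, so as_tibble() has to invent them and, depending on the tibble version, may warn about it. The sketch below shows two ways to make the naming explicit; the names q1 to q4 are made up for illustration.

```r
library(tibble)

mat = matrix(seq(10, 120, by = 10), nrow = 3, ncol = 4)

# Option 1: give the matrix column names before converting
# (q1..q4 are hypothetical names chosen for this example)
colnames(mat) = c("q1", "q2", "q3", "q4")
tbl_named = as_tibble(mat)
tbl_named

# Option 2: let tibble generate unique names for an unnamed matrix
mat2 = matrix(seq(10, 120, by = 10), nrow = 3, ncol = 4)
tbl_auto = as_tibble(mat2, .name_repair = "unique")
names(tbl_auto)
```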

9.3 Key Differences from data.frames

Tibbles have several important differences from traditional data.frames:

9.3.1 Default Printing Behavior

# Create a wide tibble with 12 columns and 25 rows.

health_data = tibble(
  patient_id = sprintf("P%04d", 1:25),
  admit_date = as.Date("2024-01-01") + sample(0:30, 25, replace = TRUE),
  age = sample(18:95, 25, replace = TRUE),
  heart_rate = round(rnorm(25, 75, 10)),
  blood_pressure_sys = round(rnorm(25, 120, 15)),
  blood_pressure_dia = round(rnorm(25, 80, 10)),
  temperature = round(rnorm(25, 98.6, 0.5), 1),
  oxygen_saturation = round(rnorm(25, 98, 2)),
  cholesterol = round(rnorm(25, 190, 30)),
  glucose = round(rnorm(25, 100, 20)),
  weight_kg = round(rnorm(25, 70, 15), 1),
  height_cm = round(rnorm(25, 170, 10))
)

# Tibbles show only the first 10 rows by default
# and only columns that fit on screen
health_data
# A tibble: 25 × 12
   patient_id admit_date   age heart_rate blood_pressure_sys
   <chr>      <date>     <int>      <dbl>              <dbl>
 1 P0001      2024-01-07    25         70                129
 2 P0002      2024-01-20    48         84                105
 3 P0003      2024-01-11    78         79                116
 4 P0004      2024-01-27    55         65                125
 5 P0005      2024-01-16    54        102                108
 6 P0006      2024-01-24    24         57                122
 7 P0007      2024-01-17    56         57                138
 8 P0008      2024-01-18    89         67                131
 9 P0009      2024-01-08    29         72                119
10 P0010      2024-01-18    49         92                 86
# ℹ 15 more rows
# ℹ 7 more variables: blood_pressure_dia <dbl>,
#   temperature <dbl>, oxygen_saturation <dbl>,
#   cholesterol <dbl>, glucose <dbl>, weight_kg <dbl>,
#   height_cm <dbl>
# Compare to data.frame which tries to print everything
as.data.frame(health_data)
   patient_id admit_date age heart_rate blood_pressure_sys
1       P0001 2024-01-07  25         70                129
2       P0002 2024-01-20  48         84                105
3       P0003 2024-01-11  78         79                116
4       P0004 2024-01-27  55         65                125
5       P0005 2024-01-16  54        102                108
6       P0006 2024-01-24  24         57                122
7       P0007 2024-01-17  56         57                138
8       P0008 2024-01-18  89         67                131
9       P0009 2024-01-08  29         72                119
10      P0010 2024-01-18  49         92                 86
11      P0011 2024-01-08  25         66                120
12      P0012 2024-01-13  25         68                 95
13      P0013 2024-01-02  74         78                106
14      P0014 2024-01-29  40         85                109
15      P0015 2024-01-04  87         50                127
16      P0016 2024-01-17  86         81                106
17      P0017 2024-01-13  80         87                150
18      P0018 2024-01-20  83         54                111
19      P0019 2024-01-14  83         77                122
20      P0020 2024-01-10  65         85                114
21      P0021 2024-01-26  83         76                 96
22      P0022 2024-01-15  64         86                120
23      P0023 2024-01-17  38         88                125
24      P0024 2024-01-24  41         59                118
25      P0025 2024-01-18  63         61                118
   blood_pressure_dia temperature oxygen_saturation
1                  62        98.5                99
2                  69        98.1               102
3                  67        98.2               100
4                  76        98.9                97
5                  87        98.2                99
6                  79        98.1                95
7                 110        98.2                99
8                  68        99.2                98
9                  78        98.4                99
10                 77        98.8                99
11                 98        98.9                99
12                 81        99.2                96
13                 75        97.6                98
14                 78        98.3                96
15                 86        97.8                98
16                101        98.0                95
17                 82        98.9                97
18                 80        98.9                99
19                 69        99.4               100
20                 86        98.7               101
21                 76        98.2                99
22                 73        98.8                95
23                 76        98.6                95
24                 77        98.7                98
25                 82        98.6                99
   cholesterol glucose weight_kg height_cm
1          225     107      73.1       183
2          205      82      51.2       155
3          206     113      61.3       163
4          221     129      73.1       173
5          176     118      67.3       190
6          196     112      56.3       185
7          146      98      61.2       168
8          185     150      58.7       167
9          159      90      70.2       197
10         172      78      69.8       173
11          97     101      63.6       170
12         206      82      59.2       171
13         220     105      63.1       164
14         208     107      59.2       184
15         189     107      73.7       184
16         138      76      94.6       165
17         223      95      49.4       154
18         168      86      39.6       162
19         225     102      84.7       161
20         238      75      81.1       175
21         205     127      55.3       165
22         164      66      76.8       163
23         206      72      79.2       165
24         154     112      43.0       160
25         112      98      83.9       154
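
Because health_data is built with sample() and rnorm(), the values you see will differ from those printed above each time the code runs. A minimal sketch of making such data reproducible with set.seed() follows; make_vitals() and the seed value 42 are hypothetical, chosen only for this illustration.

```r
library(tibble)

# Hypothetical helper: builds a small random tibble from a given seed
make_vitals = function(seed) {
  set.seed(seed)  # fix the random number generator state
  tibble(
    patient_id = sprintf("P%04d", 1:5),
    age = sample(18:95, 5, replace = TRUE),
    heart_rate = round(rnorm(5, 75, 10))
  )
}

run1 = make_vitals(42)
run2 = make_vitals(42)

# Same seed, same data
identical(run1$age, run2$age)
identical(run1$heart_rate, run2$heart_rate)
```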

9.4 Printing More Rows/Columns of a Tibble

# By default, print() shows 10 rows. Use n= to show more rows
print(health_data, n = 20)  # Shows 20 rows
# A tibble: 25 × 12
   patient_id admit_date   age heart_rate blood_pressure_sys
   <chr>      <date>     <int>      <dbl>              <dbl>
 1 P0001      2024-01-07    25         70                129
 2 P0002      2024-01-20    48         84                105
 3 P0003      2024-01-11    78         79                116
 4 P0004      2024-01-27    55         65                125
 5 P0005      2024-01-16    54        102                108
 6 P0006      2024-01-24    24         57                122
 7 P0007      2024-01-17    56         57                138
 8 P0008      2024-01-18    89         67                131
 9 P0009      2024-01-08    29         72                119
10 P0010      2024-01-18    49         92                 86
11 P0011      2024-01-08    25         66                120
12 P0012      2024-01-13    25         68                 95
13 P0013      2024-01-02    74         78                106
14 P0014      2024-01-29    40         85                109
15 P0015      2024-01-04    87         50                127
16 P0016      2024-01-17    86         81                106
17 P0017      2024-01-13    80         87                150
18 P0018      2024-01-20    83         54                111
19 P0019      2024-01-14    83         77                122
20 P0020      2024-01-10    65         85                114
# ℹ 5 more rows
# ℹ 7 more variables: blood_pressure_dia <dbl>,
#   temperature <dbl>, oxygen_saturation <dbl>,
#   cholesterol <dbl>, glucose <dbl>, weight_kg <dbl>,
#   height_cm <dbl>
# To see all rows
print(health_data, n = Inf)
# A tibble: 25 × 12
   patient_id admit_date   age heart_rate blood_pressure_sys
   <chr>      <date>     <int>      <dbl>              <dbl>
 1 P0001      2024-01-07    25         70                129
 2 P0002      2024-01-20    48         84                105
 3 P0003      2024-01-11    78         79                116
 4 P0004      2024-01-27    55         65                125
 5 P0005      2024-01-16    54        102                108
 6 P0006      2024-01-24    24         57                122
 7 P0007      2024-01-17    56         57                138
 8 P0008      2024-01-18    89         67                131
 9 P0009      2024-01-08    29         72                119
10 P0010      2024-01-18    49         92                 86
11 P0011      2024-01-08    25         66                120
12 P0012      2024-01-13    25         68                 95
13 P0013      2024-01-02    74         78                106
14 P0014      2024-01-29    40         85                109
15 P0015      2024-01-04    87         50                127
16 P0016      2024-01-17    86         81                106
17 P0017      2024-01-13    80         87                150
18 P0018      2024-01-20    83         54                111
19 P0019      2024-01-14    83         77                122
20 P0020      2024-01-10    65         85                114
21 P0021      2024-01-26    83         76                 96
22 P0022      2024-01-15    64         86                120
23 P0023      2024-01-17    38         88                125
24 P0024      2024-01-24    41         59                118
25 P0025      2024-01-18    63         61                118
# ℹ 7 more variables: blood_pressure_dia <dbl>,
#   temperature <dbl>, oxygen_saturation <dbl>,
#   cholesterol <dbl>, glucose <dbl>, weight_kg <dbl>,
#   height_cm <dbl>

9.4.1 Controlling Column Width

# The width argument to print() sets the maximum number of characters
# per line of output. In effect, only the columns that fit within that
# width are printed; the remaining columns are listed below the table.
print(health_data, width = 75)
# A tibble: 25 × 12
   patient_id admit_date   age heart_rate blood_pressure_sys
   <chr>      <date>     <int>      <dbl>              <dbl>
 1 P0001      2024-01-07    25         70                129
 2 P0002      2024-01-20    48         84                105
 3 P0003      2024-01-11    78         79                116
 4 P0004      2024-01-27    55         65                125
 5 P0005      2024-01-16    54        102                108
 6 P0006      2024-01-24    24         57                122
 7 P0007      2024-01-17    56         57                138
 8 P0008      2024-01-18    89         67                131
 9 P0009      2024-01-08    29         72                119
10 P0010      2024-01-18    49         92                 86
# ℹ 15 more rows
# ℹ 7 more variables: blood_pressure_dia <dbl>, temperature <dbl>,
#   oxygen_saturation <dbl>, cholesterol <dbl>, glucose <dbl>,
#   weight_kg <dbl>, height_cm <dbl>
# Show all columns by setting width to Inf
print(health_data, width = Inf)
# A tibble: 25 × 12
   patient_id admit_date   age heart_rate blood_pressure_sys blood_pressure_dia temperature oxygen_saturation cholesterol glucose weight_kg height_cm
   <chr>      <date>     <int>      <dbl>              <dbl>              <dbl>       <dbl>             <dbl>       <dbl>   <dbl>     <dbl>     <dbl>
 1 P0001      2024-01-07    25         70                129                 62        98.5                99         225     107      73.1       183
 2 P0002      2024-01-20    48         84                105                 69        98.1               102         205      82      51.2       155
 3 P0003      2024-01-11    78         79                116                 67        98.2               100         206     113      61.3       163
 4 P0004      2024-01-27    55         65                125                 76        98.9                97         221     129      73.1       173
 5 P0005      2024-01-16    54        102                108                 87        98.2                99         176     118      67.3       190
 6 P0006      2024-01-24    24         57                122                 79        98.1                95         196     112      56.3       185
 7 P0007      2024-01-17    56         57                138                110        98.2                99         146      98      61.2       168
 8 P0008      2024-01-18    89         67                131                 68        99.2                98         185     150      58.7       167
 9 P0009      2024-01-08    29         72                119                 78        98.4                99         159      90      70.2       197
10 P0010      2024-01-18    49         92                 86                 77        98.8                99         172      78      69.8       173
# ℹ 15 more rows
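
The n and width arguments can also be combined in a single call. The sketch below uses a small made-up tibble (the column names are hypothetical) and capture.output() to store the printed lines so they can be inspected.

```r
library(tibble)

# Hypothetical data: 30 rows, enough to trigger row truncation
wide = tibble(
  id = 1:30,
  a_rather_long_column_name = id * 2,
  another_long_column_name = id * 3
)

# n limits the rows, width limits the characters per line
print(wide, n = 3, width = Inf)

# capture.output() returns the printed lines as a character vector
printed = capture.output(print(wide, n = 3, width = Inf))
length(printed)
```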

9.4.2 Row Names

# data.frames can have row names
df_rownames = data.frame(
  x = 1:3,
  y = letters[1:3],
  row.names = c("row1", "row2", "row3")
)
df_rownames
     x y
row1 1 a
row2 2 b
row3 3 c
# Tibbles don't support row names.
# By default, as_tibble() drops the row names of a data.frame.
# To keep them, pass rownames = "<name>" and the row names become
# a regular column with that name (here 'id').
as_tibble(df_rownames, rownames = "id")
# A tibble: 3 × 3
  id        x y    
  <chr> <int> <chr>
1 row1      1 a    
2 row2      2 b    
3 row3      3 c    
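
The tibble package also provides helpers for moving row names in and out of a regular column. The following is a short sketch using has_rownames(), rownames_to_column() and column_to_rownames() on the same kind of data.frame as above; the object names with_col and restored are illustrative only.

```r
library(tibble)

df_rownames = data.frame(
  x = 1:3,
  y = letters[1:3],
  row.names = c("row1", "row2", "row3")
)

has_rownames(df_rownames)             # the data.frame has row names
has_rownames(as_tibble(df_rownames))  # dropped by the default conversion

# Move the row names into a column (named "rowname" unless var= is given)
with_col = rownames_to_column(df_rownames)
names(with_col)

# Move that column back into the row names
restored = column_to_rownames(with_col, var = "rowname")
rownames(restored)
```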

9.5 Creating Tibbles

You can create tibbles in several ways:

# Using tibble()
t1 = tibble(
  x = 1:5,
  y = x * 2,  # Note: you can refer to columns created earlier
  z = letters[1:5]
)
t1
# A tibble: 5 × 3
      x     y z    
  <int> <dbl> <chr>
1     1     2 a    
2     2     4 b    
3     3     6 c    
4     4     8 d    
5     5    10 e    
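
Besides letting later columns refer to earlier ones, tibble() is stricter than data.frame() about recycling: only values of length 1 are recycled. A small sketch of that difference follows; the names t2, res and df2 are illustrative only.

```r
library(tibble)

# A length-1 value is recycled to the length of the other columns
t2 = tibble(x = 1:4, y = 0)
t2

# Longer vectors are not recycled: tibble() signals an error,
# which is caught here so the script keeps running
res = tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
res

# data.frame() recycles silently when the lengths divide evenly
df2 = data.frame(x = 1:4, y = 1:2)
df2$y
```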

9.6 Creating a Tibble Row by Row Using tribble()

When reading the raw code that creates a dataframe or a tibble, it can be hard to visualize what the result will look like. This is because each column is typed horizontally in the code but displayed vertically in the output. For example:

# Using tibble(): each column is typed horizontally, as one vector
stuff = tibble(
  col1 = c("a",  "b",   "c"),
  col2 = c( 1,    2,     3),
  col3 = c(TRUE, FALSE, TRUE))

# The code above lays out columns horizontally. 
# The actual dataframe displays columns vertically.
stuff

A “tribble” (i.e. TRansposed tIBBLE) is just a different way of typing the code that becomes a tibble. Each column heading is prefixed with a tilde (~). The columns can be laid out vertically in the code, making the code much more readable. See the example below.

# Using tribble() for transposed input
# Useful for small, manual data entry
stuff = tribble(
  ~col1, ~col2, ~col3,
  "a",   1,     TRUE,
  "b",   2,     FALSE,
  "c",   3,     TRUE
)

# The printed tibble now looks much like the code that created it.
stuff
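
To confirm that tribble() is only a different way of typing the same data, the sketch below builds the tibble both ways and compares the results. The names by_row and by_col are illustrative only.

```r
library(tibble)

# Row-by-row layout with tribble()
by_row = tribble(
  ~col1, ~col2, ~col3,
  "a",   1,     TRUE,
  "b",   2,     FALSE,
  "c",   3,     TRUE
)

# Column-by-column layout with tibble()
by_col = tibble(
  col1 = c("a", "b", "c"),
  col2 = c(1, 2, 3),
  col3 = c(TRUE, FALSE, TRUE)
)

# Both calls should produce the same columns and types
all.equal(by_row, by_col)
sapply(by_row, class)
```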

9.7 Converting Between Tibbles and data.frames

# Convert data.frame to tibble
df = data.frame(
  x = 1:3,
  y = letters[1:3]
)
tbl = as_tibble(df)

# Convert tibble back to data.frame
df_again = as.data.frame(tbl)

# Check classes
class(tbl)
[1] "tbl_df"     "tbl"        "data.frame"
class(df_again)
[1] "data.frame"
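
Instead of reading the class vector by eye, you can test for a tibble directly. A brief sketch using is_tibble() from the tibble package together with base R's is.data.frame():

```r
library(tibble)

tbl = tibble(x = 1:3, y = letters[1:3])
df_again = as.data.frame(tbl)

is_tibble(tbl)           # a tibble
is.data.frame(tbl)       # a tibble is also a data.frame
is_tibble(df_again)      # a plain data.frame is not a tibble
is.data.frame(df_again)
```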

9.8 Other differences between tibbles and dataframes

9.8.1 Variable Names and Subsetting

# data.frames modify non-syntactic names
df_names = data.frame(
  `1` = 1:3,
  `2+2` = 4:6,
  check.names = TRUE  # default behavior
)
names(df_names)  # Names are modified
[1] "X1"   "X2.2"
# Tibbles preserve original names
tbl_names = tibble(
  `1` = 1:3,
  `2+2` = 4:6
)
names(tbl_names)  # Original names kept
[1] "1"   "2+2"
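
Non-syntactic names have to be quoted when you refer to them. The sketch below shows backtick and [[ ]] access on the tibble above, and shows that data.frame() keeps the original names too if check.names = FALSE is passed; df_keep is an illustrative name.

```r
library(tibble)

tbl_names = tibble(
  `1` = 1:3,
  `2+2` = 4:6
)

# Use backticks with $, or a quoted string with [[ ]]
tbl_names$`2+2`
tbl_names[["1"]]

# data.frame() preserves the names when check.names = FALSE
df_keep = data.frame(
  `1` = 1:3,
  `2+2` = 4:6,
  check.names = FALSE
)
names(df_keep)
```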
# Subsetting differences
# data.frame allows partial matching of variable names
df = data.frame(numbers = 1:3, letters = c("a", "b", "c"))
df$num  # Partial matching works
[1] 1 2 3
# Tibbles require exact matching
tbl = tibble(numbers = 1:3, letters = c("a", "b", "c"))
tbl$num  # No partial matching: returns NULL with a warning, not an error
Warning: Unknown or uninitialised column: `num`.
NULL
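
Another subsetting difference concerns single-column selection with [ ]. A data.frame drops to a plain vector by default, while a tibble always returns a tibble. A short sketch, reusing the same column names as above:

```r
library(tibble)

df = data.frame(numbers = 1:3, letters = c("a", "b", "c"))
tbl = as_tibble(df)

# Selecting one column with [ , ] on a data.frame gives a vector
class(df[, "numbers"])

# The same selection on a tibble is still a tibble
class(tbl[, "numbers"])

# To extract the column itself from a tibble, use [[ ]] or $
tbl[["numbers"]]
tbl$numbers
```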