if(!require(jsonlite)){install.packages("jsonlite");require(jsonlite);}
Loading required package: jsonlite
The jsonlite R package is used to read JSON files into R and to write R data out as JSON. The following is the link to the CRAN webpage for the jsonlite package. See the vignettes and reference manual linked there.
jsonlite: A Simple and Robust JSON Parser and Generator for R
Below, I copied the links to some of the vignettes from that page. However, it’s best to check the official CRAN page in case there are modifications or if there are new vignettes added there (this page was compiled in Jan 2025).
IMPORTANT: The following vignette describes how to use the fromJSON and toJSON functions from the jsonlite package:
Getting started with JSON and jsonlite
The following will be covered later when we discuss APIs:
Fetching JSON data from REST APIs (2022-01-16)
The following is nice to know but is something that you can figure out yourself without needing this:
Combining pages of JSON data with jsonlite (2022-01-16)
The following is a more detailed definition of how JSON is converted to R data and how R data is converted to JSON. You should NOT need to look at this but it is nice to know that it is available for reference in case you get stuck.
There are some other vignettes on the CRAN page, but you probably don’t need to look at them.
The above vignettes are great for understanding this material. Below I present some of my own material; however, reading the vignettes linked above is just as good as (or maybe better than) reading my material.
Using R to read/write JSON files and process their contents comes down to knowing how to do the following:
read/write JSON text to/from R vectors using the baseR functions readLines() and writeLines()
convert between JSON text and R objects (lists, dataframes, matrices, arrays, etc.) using the fromJSON() and toJSON() functions from the jsonlite package
knowing how to navigate the R objects that were created from the JSON to retrieve the data you want. This is just standard R knowledge about how to navigate R objects.
These topics are covered below.
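The three skills fit together in a short round trip. The following is a minimal sketch. It assumes a temporary file created with tempfile() rather than a file in your project folder, and a small hypothetical list called `record`.

```r
library(jsonlite)

# A small, hypothetical R object to save
record = list(name = "Alice", scores = c(90, 85))

# Write: R object -> JSON text (toJSON) -> file (writeLines)
path = tempfile(fileext = ".json")   # temporary file, just for this sketch
writeLines(toJSON(record, pretty = TRUE), path)

# Read: file -> character vector (readLines) -> R object (fromJSON)
json_lines = readLines(path)
restored = fromJSON(json_lines)

# Navigate: ordinary R code on the resulting object
str(restored)
restored$scores[2]
```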
To use the toJSON or fromJSON functions, we need to load the jsonlite package.
if(!require(jsonlite)){install.packages("jsonlite");require(jsonlite);}
Loading required package: jsonlite
The jsonlite package in R provides numerous functions for working with JSON. We will discuss the following functions below:
toJSON(): Convert R objects into a JSON-format character vector (you can then write this vector to a file using writeLines() - see above)
fromJSON(): Convert JSON data into R objects
prettify(): Format the JSON text on different lines and indent the text in a readable way
minify(): Convert JSON text to a single line and remove all extra spaces and tabs
Any R object (e.g. vector, list, dataframe, etc.) can be converted to JSON text and written to a JSON file. The steps are:
Convert the R object to JSON text:
jsonText = jsonlite::toJSON(SOME_R_OBJECT)
Write the JSON text to a file:
writeLines(jsonText, filename)
Example:
# create a dataframe
df = data.frame(
  student = c("joe", "sue", "anne"),
  honors = c(FALSE, FALSE, TRUE),
  test1 = c(NA,80,90),
  test2 = c(75,85,95)
)
df
student honors test1 test2
1 joe FALSE NA 75
2 sue FALSE 80 85
3 anne TRUE 90 95
# convert the dataframe to JSON text
json = toJSON(df, pretty=TRUE)
json
[
{
"student": "joe",
"honors": false,
"test2": 75
},
{
"student": "sue",
"honors": false,
"test1": 80,
"test2": 85
},
{
"student": "anne",
"honors": true,
"test1": 90,
"test2": 95
}
]
# Create a new directory (i.e. folder) to contain the json files.
# (We create a folder to contain the file. The folder isn't necessary but
# in general, it's a good idea to organize your files into folders that make sense.)
dir.create("jsonFiles",
showWarnings=FALSE) # don't warn us if the folder already exists
# Create a file that contains the JSON data
writeLines(json, "jsonFiles/sampleJsonFile.json")
# Use dir() to see that the file has been created.
dir("jsonFiles")
[1] "sampleJsonFile.json"
# We can see the text of the file by reading the lines of
# the file into a character vector with the BaseR readLines() function.
lines = readLines("jsonFiles/sampleJsonFile.json")
# A character vector - each entry is one line from the file.
# This can be very hard to read.
lines
[1] "[" " {"
[3] " \"student\": \"joe\"," " \"honors\": false,"
[5] " \"test2\": 75" " },"
[7] " {" " \"student\": \"sue\","
[9] " \"honors\": false," " \"test1\": 80,"
[11] " \"test2\": 85" " },"
[13] " {" " \"student\": \"anne\","
[15] " \"honors\": true," " \"test1\": 90,"
[17] " \"test2\": 95" " }"
[19] "]"
# It is much easier to read the data with cat
cat(lines, sep="\n")
[
{
"student": "joe",
"honors": false,
"test2": 75
},
{
"student": "sue",
"honors": false,
"test1": 80,
"test2": 85
},
{
"student": "anne",
"honors": true,
"test1": 90,
"test2": 95
}
]
Note that the toJSON() call above included the argument “pretty=TRUE”. This formats the JSON with easy-to-read indentation.
By contrast, if you leave out “pretty=TRUE” (or if you specify pretty=FALSE), the JSON is created in “minified” format, i.e. all on one line without any extra “whitespace” (no extra spaces, tabs or newline characters). The minified version is compact and takes up less space, which makes it a good fit when the JSON is only going to be read by programs. The “pretty” version is appropriate when you want people to read and understand the JSON.
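For comparison, here is a sketch of the same dataframe converted without pretty=TRUE. The dataframe is rebuilt so the example runs on its own.

```r
library(jsonlite)

df = data.frame(
  student = c("joe", "sue", "anne"),
  honors = c(FALSE, FALSE, TRUE),
  test1 = c(NA, 80, 90),
  test2 = c(75, 85, 95)
)

# pretty = FALSE is the default, so this produces minified JSON on one line
json_min = toJSON(df)
json_min
# [{"student":"joe","honors":false,"test2":75},{"student":"sue","honors":false,"test1":80,"test2":85},{"student":"anne","honors":true,"test1":90,"test2":95}]

# The minified text is shorter than the pretty version
nchar(json_min) < nchar(toJSON(df, pretty = TRUE))
```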
There are also standalone functions prettify() and minify() that convert JSON text - see the next section.
The jsonlite package also provides standalone functions for formatting JSON text.
jsonlite::prettify() formats JSON text on different lines and indents the text in a readable way.
jsonlite::minify() removes all extra “whitespace” from JSON text. In other words, it converts JSON text to a single line and removes all extra spaces and tabs.
Examples
# Create a compact JSON string
json_string <- '{"name":"Alice","age":25,"hobbies":["reading","hiking"]}'
# Make it pretty
prettify(json_string)
{
"name": "Alice",
"age": 25,
"hobbies": [
"reading",
"hiking"
]
}
# Make it compact
minify(prettify(json_string)) # Converting pretty JSON back to compact form
{"name":"Alice","age":25,"hobbies":["reading","hiking"]}
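prettify() also accepts an indent argument that sets the number of spaces per nesting level (see ?prettify). The sketch below assumes you want 2-space indentation. It also confirms that both functions only change whitespace and leave the data itself untouched.

```r
library(jsonlite)

json_string <- '{"name":"Alice","age":25,"hobbies":["reading","hiking"]}'

# indent = 2 uses two spaces for each nesting level
pretty2 <- prettify(json_string, indent = 2)
pretty2

# Only whitespace changes, so the parsed R data is identical
identical(fromJSON(pretty2), fromJSON(json_string))
```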
The information below reflects the default approach that toJSON() uses to convert R data into JSON. However, toJSON() takes numerous arguments that can be used to adjust exactly what form the resulting JSON takes. For more info, see the R help (?toJSON).
R vectors are converted into JSON arrays
nums = c(10,20,30)
toJSON(nums, pretty=TRUE)
[10, 20, 30]
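Vectors of other basic types follow the same rule. Note that R has no separate scalar type, so a single value is really a vector of length one. toJSON() therefore still writes it as a one-element JSON array.

```r
library(jsonlite)

toJSON(c("a", "b"))     # ["a","b"]
toJSON(c(TRUE, FALSE))  # [true,false]

# A single value is a length-one vector, so it becomes a one-element array
toJSON(5)               # [5]
```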
R dataframes are converted into a JSON array of JSON classes (i.e. JSON objects, written with curly braces { }). Each class represents a row of the dataframe and contains a “name”:value pair for each value in the row. If a row contains NA in a column, the resulting JSON class will be missing the “name”:value pair that corresponds to that column (for an example of NA, see Joe’s grade for test1 in the example below).
Example:
df = data.frame(
  student = c("joe", "sue", "anne"),
  honors = c(FALSE, FALSE, TRUE),
  test1 = c(NA,80,90),
  test2 = c(75,85,95)
)
toJSON(df, pretty=TRUE)
[
{
"student": "joe",
"honors": false,
"test2": 75
},
{
"student": "sue",
"honors": false,
"test1": 80,
"test2": 85
},
{
"student": "anne",
"honors": true,
"test1": 90,
"test2": 95
}
]
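toJSON() has a dataframe argument that controls this layout (see ?toJSON). The following sketch assumes you want column-oriented JSON, with one array per column, in place of the default one-class-per-row form. It uses a smaller dataframe without NAs to keep the output short.

```r
library(jsonlite)

df = data.frame(
  student = c("joe", "sue", "anne"),
  test2 = c(75, 85, 95)
)

# Default (dataframe = "rows"): one JSON class per row
toJSON(df)
# [{"student":"joe","test2":75},{"student":"sue","test2":85},{"student":"anne","test2":95}]

# dataframe = "columns": one JSON array per column
toJSON(df, dataframe = "columns")
# {"student":["joe","sue","anne"],"test2":[75,85,95]}
```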
R matrices are converted into a JSON array of JSON arrays
Example
mat = matrix(seq(10,120,by=10), nrow=3, ncol=4)
mat
[,1] [,2] [,3] [,4]
[1,] 10 40 70 100
[2,] 20 50 80 110
[3,] 30 60 90 120
toJSON(mat, pretty=TRUE)
[
[10, 40, 70, 100],
[20, 50, 80, 110],
[30, 60, 90, 120]
]
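By default, each inner array holds one row of the matrix. toJSON()'s matrix argument can switch this so that each inner array holds one column. Here is a sketch:

```r
library(jsonlite)

mat = matrix(seq(10, 120, by = 10), nrow = 3, ncol = 4)

# Default (matrix = "rowmajor"): each inner array is one row
toJSON(mat)                          # [[10,40,70,100],[20,50,80,110],[30,60,90,120]]

# matrix = "columnmajor": each inner array is one column
toJSON(mat, matrix = "columnmajor")  # [[10,20,30],[40,50,60],[70,80,90],[100,110,120]]
```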
R lists whose values have names are converted into JSON classes.
Example
# named lists
namedList = list( first="joe", middle=NA, last="jones", gpa=3.5, honors=FALSE )
toJSON(namedList, pretty=TRUE)
{
"first": ["joe"],
"middle": [null],
"last": ["jones"],
"gpa": [3.5],
"honors": [false]
}
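Notice that every value above is wrapped in brackets, e.g. ["joe"]. This happens because each value in the list is an R vector of length one, and toJSON() writes vectors as arrays. The auto_unbox argument writes length-one vectors as plain JSON values instead. The following sketch uses a slightly shorter list (without the NA value):

```r
library(jsonlite)

namedList = list(first = "joe", last = "jones", gpa = 3.5, honors = FALSE)

# auto_unbox = TRUE writes length-one vectors as single JSON values
toJSON(namedList, auto_unbox = TRUE)
# {"first":"joe","last":"jones","gpa":3.5,"honors":false}

# Vectors longer than one are still written as arrays
toJSON(list(scores = c(90, 85)), auto_unbox = TRUE)
# {"scores":[90,85]}
```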
Often we start with JSON text - either from a JSON file or JSON data that is returned from a call to a Web API - and we want to convert the JSON into R objects (lists, dataframes, etc) so that we can work with the data in R.
We can use the fromJSON() function to convert JSON text into an R object. fromJSON() can accept (a) a filename that contains JSON text, (b) a Web API URL that returns JSON text, or (c) a character vector that contains JSON text. fromJSON() returns an R object (e.g. list, dataframe, matrix, etc.) that can be manipulated with R code to get at the exact data you want.
After converting the JSON text to an R object, it is helpful to use the str() or View() functions to see the structure of the R object so that you can figure out how to access the specific data you want.
Example:
We use fromJSON() to read the file that we created above from the dataframe and get an R object that contains the data.
x = fromJSON("jsonFiles/sampleJsonFile.json")
x
student honors test2 test1
1 joe FALSE 75 NA
2 sue FALSE 85 80
3 anne TRUE 95 90
In this case it’s easy to understand that fromJSON() returns a dataframe, since we know that we originally created this JSON from an R dataframe. (Notice that test2 now comes before test1: the first class in the file, Joe’s, has no test1 entry, so the test1 column is only encountered later.) However, when reading complex JSON files, it can be helpful to use the str() or View() functions to view the structure of the data so that you can understand how to retrieve the specific data that you want.
The fromJSON() function can accept JSON data from any of the following sources:
From a character vector that contains JSON text. The vector might consist of only a single character value that contains the full JSON text (see the example above). The vector can also contain the JSON split up into multiple character values (e.g. one line of JSON per character value); the JSON is then treated as the concatenation of all the character values (see the following example).
# JSON split across the elements of a character vector
json_text <- c('{',
               '  "name": "Alice",',
               '  "age": 25',
               '}')
rData = fromJSON(json_text)
rData
$name
[1] "Alice"
$age
[1] 25
# see the structure of the R object
str(rData)
List of 2
$ name: chr "Alice"
$ age : int 25
From a JSON file
# Assuming you have a file named 'data.json'
# fromJSON("data.json") # Commented out as file doesn't exist
From a Web API URL
# Example with a public API
# fromJSON("https://api.example.com/data") # Commented out as URL doesn't exist
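When fromJSON() is given a character vector with several elements, it joins them into one JSON text before parsing. Pasting the elements together yourself therefore gives the same result. Here is a quick check using the json_text vector from above, redefined so the sketch runs on its own:

```r
library(jsonlite)

json_text <- c('{', '  "name": "Alice",', '  "age": 25', '}')

# Passing the multi-element vector directly...
a <- fromJSON(json_text)
# ...matches passing one string made by joining the lines
b <- fromJSON(paste(json_text, collapse = "\n"))

identical(a, b)  # TRUE
```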
The general rule is as follows:
The fromJSON function has a few “simplifyXXX” arguments (see below). By default, all of the “simplifyXXX” arguments are set to TRUE. In that case, the fromJSON function usually does the opposite of the toJSON function: if toJSON converts a particular type of R object into a particular type of JSON, then fromJSON converts that type of JSON into the corresponding type of R object. See the discussion of the toJSON function above for details.
This is usually enough to understand what’s going on. To get specific data from the R object, it helps a lot to understand the structure of the object by using the str() or View() functions.
The following goes into more detail, but you really don’t have to study this too deeply. In general, looking at the output of the str() function and navigating through the data using your knowledge of R is usually enough.
More details
Usually JSON consists of a JSON class or a JSON array. However, a single number, character value, true/false or null is also valid JSON. In the unusual case that the JSON consists of just one number, character value or logical value, the corresponding R object is just a single R numeric, character or logical value. For example:
# Returns a single numeric value, not a list
fromJSON('42')
[1] 42
# Returns a single character value, not a list
fromJSON('"apple"')
[1] "apple"
# Returns a single TRUE value, not a list
fromJSON('true')
[1] TRUE
Similarly, if the JSON consists of a single null value, then the result is a single NULL value in R.
# Returns a single NULL value, not a list
fromJSON('null')
NULL
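One consequence of this behavior is that a bare value and a one-element JSON array produce the same R object, because both simplify to a length-one vector. You can check this with identical():

```r
library(jsonlite)

from_scalar <- fromJSON('42')
from_array  <- fromJSON('[42]')

identical(from_scalar, from_array)  # TRUE
```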
If the JSON consists of an array such as ["a","JSON","array"] or a class such as {"JSON": "class"}, then the type of R object that is created depends on the structure of the JSON and on the values of the three “simplify” arguments to the fromJSON function (see the next section).
The help for fromJSON() shows the arguments and their default values:
```
fromJSON(
txt,
simplifyVector = TRUE,
simplifyDataFrame = simplifyVector,
simplifyMatrix = simplifyVector,
flatten = FALSE,
...
)
```
The arguments simplifyVector, simplifyDataFrame and simplifyMatrix control how the R object is created when the JSON consists of a JSON array or a JSON class.
By default all values are TRUE. Notice that by default, simplifyDataFrame and simplifyMatrix are set to the same value that is supplied for simplifyVector. Therefore, to set all the values to FALSE, you can simply pass a single FALSE value as the second argument (simplifyVector).
If all of these values are FALSE, the function returns an R list. For example:
# Returns a list
x = fromJSON('{"a": 1}', FALSE)
x
$a
[1] 1
str(x)
List of 1
$ a: int 1
# Returns a list
x = fromJSON('[1, 2, 3]', FALSE)
x
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
str(x)
List of 3
$ : int 1
$ : int 2
$ : int 3
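Because simplifyVector is the second argument of fromJSON(), passing FALSE by position is the same as naming the argument. The named form is easier to read later:

```r
library(jsonlite)

positional <- fromJSON('[1, 2, 3]', FALSE)
named      <- fromJSON('[1, 2, 3]', simplifyVector = FALSE)

identical(positional, named)  # TRUE
is.list(named)                # TRUE
```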
When the simplifyVector, simplifyDataFrame or simplifyMatrix arguments are TRUE, the R object being returned might be a vector, dataframe, matrix or list (depending on the structure of the JSON).
# JSON array example
json_array <- '["a", "b", "c"]'
# Default behavior (simplifyVector = TRUE) - you get an R vector
fromJSON(json_array)
[1] "a" "b" "c"
# Without simplification (simplifyVector = FALSE) - you get an R LIST
fromJSON(json_array, simplifyVector = FALSE)
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
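With simplifyVector = TRUE, a JSON null inside an array becomes NA in the resulting R vector. The array therefore still simplifies to a vector instead of a list. A small sketch:

```r
library(jsonlite)

v <- fromJSON('[1, null, 3]')
v          # 1 NA 3
is.na(v)   # FALSE TRUE FALSE
```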
When simplifyDataFrame is set to TRUE, JSON data that consists of an array of classes is returned as an R dataframe. Each row of the dataframe comes from the data of a single class in the JSON array. This makes perfect sense when all of the classes contain the same names. Each name in the classes becomes a column in the resulting dataframe.
If some of the classes contain names that don’t appear in other classes, then there will be NAs in the resulting dataframe in those columns for rows that correspond to classes that didn’t contain that name.
See the example:
# A JSON array of JSON classes
jsonText <- '[
  {"first": "Bob", "middle": "Michael", "last": "jones", "age": 30},
  {"first": "Alice", "last": "cohen", "age": 25},
  {"first": "Jane", "last": "springer", "age": 25}
]'
# Default behavior (simplifyDataFrame = TRUE) returns an R dataframe
x = fromJSON(jsonText)
x
first middle last age
1 Bob Michael jones 30
2 Alice <NA> cohen 25
3 Jane <NA> springer 25
str(x)
'data.frame': 3 obs. of 4 variables:
$ first : chr "Bob" "Alice" "Jane"
$ middle: chr "Michael" NA NA
$ last : chr "jones" "cohen" "springer"
$ age : int 30 25 25
# Without simplification (simplifyDataFrame = FALSE)
x = fromJSON(jsonText, simplifyDataFrame = FALSE)
x
[[1]]
[[1]]$first
[1] "Bob"
[[1]]$middle
[1] "Michael"
[[1]]$last
[1] "jones"
[[1]]$age
[1] 30
[[2]]
[[2]]$first
[1] "Alice"
[[2]]$last
[1] "cohen"
[[2]]$age
[1] 25
[[3]]
[[3]]$first
[1] "Jane"
[[3]]$last
[1] "springer"
[[3]]$age
[1] 25
str(x)
List of 3
$ :List of 4
..$ first : chr "Bob"
..$ middle: chr "Michael"
..$ last : chr "jones"
..$ age : int 30
$ :List of 3
..$ first: chr "Alice"
..$ last : chr "cohen"
..$ age : int 25
$ :List of 3
..$ first: chr "Jane"
..$ last : chr "springer"
..$ age : int 25
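With the unsimplified list, each person is a separate inner list, so you usually loop over the elements to pull out a field. The sketch below redefines jsonText so it runs on its own. It also shows that a name missing from a class gives NULL in this form (not NA, as in the dataframe).

```r
library(jsonlite)

jsonText <- '[
  {"first": "Bob", "middle": "Michael", "last": "jones", "age": 30},
  {"first": "Alice", "last": "cohen", "age": 25},
  {"first": "Jane", "last": "springer", "age": 25}
]'
people <- fromJSON(jsonText, simplifyDataFrame = FALSE)

# Extract the "first" value from each inner list
firsts <- sapply(people, function(p) p$first)
firsts   # "Bob" "Alice" "Jane"

# Alice has no "middle" entry, so accessing it returns NULL
is.null(people[[2]]$middle)  # TRUE
```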
# JSON representing matrix-like data
json_matrix <- '[
  [1, 2, 3],
  [4, 5, 6]
]'
# Default behavior (simplifyMatrix = TRUE)
fromJSON(json_matrix)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
# Without simplification (simplifyMatrix = FALSE)
fromJSON(json_matrix, simplifyMatrix = FALSE)
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
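Matrix simplification only applies when every inner array has the same length. If the lengths differ, fromJSON() cannot build a matrix. As far as I can tell, it then falls back to a list of vectors, which you can confirm with str():

```r
library(jsonlite)

ragged <- fromJSON('[[1, 2, 3], [4, 5]]')
is.matrix(ragged)  # FALSE
str(ragged)
```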
When JSON contains nested objects or arrays, fromJSON() can create nested data frames. A typical dataframe contains only simple individual values (e.g. a single number, text value or logical value) in each cell of the dataframe. A nested dataframe can contain an entire “inner” dataframe or list inside the cells of the “outer” dataframe. The data can be accessed via typical R dataframe techniques. However, it’s important to analyze the structure of the data - e.g. using str() or View() - to understand how to access the data you want.
Example:
# JSON with nested structure
nested_json <- '[
  {
"name": "Alice",
"pets": [
{"type": "cat", "name": "Whiskers"},
{"type": "dog", "name": "Rex"}
]
},
{
"name": "Bob",
"pets": [
{"type": "fish", "name": "Bubbles"}
]
}
]'
# Convert to nested data frame
nested_df <- fromJSON(nested_json)
nested_df
name pets
1 Alice cat, dog, Whiskers, Rex
2 Bob fish, Bubbles
# When trying to retrieve the data it is VERY important to
# understand the structure of the R data.
# You can use the following tools to do so:
#
# str() function in baseR
# View() function in baseR (View with a capital "V")
#   dplyr::glimpse() function in the tidyverse's dplyr package - similar to str()
# the variable in the "Environment" window pane in RStudio
# See the structure of the data to understand how to extract specific data
str(nested_df)
'data.frame': 2 obs. of 2 variables:
$ name: chr "Alice" "Bob"
$ pets:List of 2
..$ :'data.frame': 2 obs. of 2 variables:
.. ..$ type: chr "cat" "dog"
.. ..$ name: chr "Whiskers" "Rex"
..$ :'data.frame': 1 obs. of 2 variables:
.. ..$ type: chr "fish"
.. ..$ name: chr "Bubbles"
# By analyzing the output from str() we can understand how to access
# specific data that we might want.
# Get the pet owner's names
nested_df$name
[1] "Alice" "Bob"
# Get info for all pets
nested_df$pets
[[1]]
type name
1 cat Whiskers
2 dog Rex
[[2]]
type name
1 fish Bubbles
# Get all info for pets of person #1
nested_df$pets[[1]]
type name
1 cat Whiskers
2 dog Rex
# Use str to see the structure of this
str(nested_df$pets[[1]])
'data.frame': 2 obs. of 2 variables:
$ type: chr "cat" "dog"
$ name: chr "Whiskers" "Rex"
# See all info for the 1st person's 2nd pet
nested_df$pets[[1]][2,]
type name
2 dog Rex
# Get all info for pets of person #2
nested_df$pets[[2]]
type name
1 fish Bubbles
# Get the types of pets for person #1
nested_df$pets[[1]]$type
[1] "cat" "dog"
In this example, the nested data frame has a column, pets, that is a list containing one data frame per row. Each element of the pets column holds the pets owned by the person in that row of the main data frame.
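Since pets holds one data frame per owner, a common next step is to combine them into a single flat table. The following sketch uses base R's Map(), data.frame() and rbind(). The owner column name is my own choice, and the JSON is redefined so the example runs on its own.

```r
library(jsonlite)

nested_json <- '[
  {"name": "Alice", "pets": [{"type": "cat", "name": "Whiskers"},
                             {"type": "dog", "name": "Rex"}]},
  {"name": "Bob",   "pets": [{"type": "fish", "name": "Bubbles"}]}
]'
nested_df <- fromJSON(nested_json)

# Number of pets per person
pet_counts <- sapply(nested_df$pets, nrow)
pet_counts   # 2 1

# Add each owner's name to their pets data frame, then stack the rows
per_owner <- Map(function(owner, pets) data.frame(owner = owner, pets),
                 nested_df$name, nested_df$pets)
all_pets <- do.call(rbind, unname(per_owner))
rownames(all_pets) <- NULL
all_pets
```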
Some JSON contains JSON arrays nested inside JSON arrays, nested inside a JSON array. When all of the inner arrays have matching lengths, this type of data gets converted into a three-dimensional (3d) R array. The following shows an example of how to extract data from such a structure in R. There is a lot of similarity to how data is extracted from 2d matrices. (NOTE - an R matrix is also known as a 2d array.)
# The following JSON gets converted into a 3-dimensional array in R,
# which is basically an array of matrices
x = fromJSON('[
  [
[1,2,3,4],
[5,6,7,8]
],
[
[9,10,11,12],
[13,14,15,16]
],
[
[17,18,19,20],
[21,22,23,24]
]
]')
# 1st matrix
x[1,,]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
# 2nd matrix
x[2,,]
[,1] [,2] [,3] [,4]
[1,] 9 10 11 12
[2,] 13 14 15 16
# 3rd matrix
x[3,,]
[,1] [,2] [,3] [,4]
[1,] 17 18 19 20
[2,] 21 22 23 24
# The first index specifies which matrix or matrices
# The 2nd index specifies which row or rows
# The 3rd index specifies which column or columns
# all values from 1st row of 3rd matrix
x[3,1,]
[1] 17 18 19 20
# all values from 2nd column of 1st matrix
x[1,,2]
[1] 2 6
# The 2nd matrix, 1st row, 3rd column
x[2,1,3]
[1] 11
# All 3 matrices, just the numbers in the first row and first column
x[,1,1]
[1] 1 9 17
# The 2nd rows from each matrix
x[,2,]
[,1] [,2] [,3] [,4]
[1,] 5 6 7 8
[2,] 13 14 15 16
[3,] 21 22 23 24
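dim() confirms the layout described above: matrices first, then rows, then columns. apply() can summarize along the first index, for example to total each matrix. The sketch below rebuilds the same array so it runs on its own.

```r
library(jsonlite)

x = fromJSON('[
  [[1,2,3,4],[5,6,7,8]],
  [[9,10,11,12],[13,14,15,16]],
  [[17,18,19,20],[21,22,23,24]]
]')

dim(x)            # 3 2 4  (3 matrices, 2 rows, 4 columns)

# Sum of all values in each matrix (MARGIN = 1 is the matrix index)
apply(x, 1, sum)  # 36 100 164
```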