29  29. Get the data - “books database” and other tables

As we learn about databases and SQL we will need to refer to various collections of data (i.e. “tables” of data in SQL terminology). See below for info on how to download and install the various collections of data into R.

29.1 Download all of the files

Download the files by clicking on the links below. To load the various collections of data into R see the instructions in the sections below.

  • The following file contains the grades table:

  • The following file contains several tables of data
    (customers, orders, movies, north_american_cities, boxOffice, buildings, employees)

  • The following files are all associated with the “Books database”:

29.2 Prepare to import the data

In order to read the data into R and to access the data using SQL, we will need to load the R packages “readr” and “sqldf”. We do so with the following commands:

if(!require(readr)){install.packages("readr");require(readr);}
if(!require(sqldf)){install.packages("sqldf");require(sqldf);}
Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror

29.3 import the data: grades

grades.csv is a .csv file that contains information about students and their grades. You can read the contents of the file into R with the read.csv function as shown below.

grades = read.csv("grades.csv")

Once you’ve done so, you can display the data:

sqldf("select * from grades")
Error in sqldf("select * from grades"): could not find function "sqldf"

29.4 import the data: customers, orders, movies, north_american_cities, etc

The file sqlbolt_tables-v007.RData contains the following different collections of information:

  • customers
  • orders
  • movies
  • north_american_cities
  • as well as some others (boxOffice, buildings, employees)

Use the load function as shown below to load the data into R.

load("sqlbolt_tables-v007.RData")

Below are the contents of these tables:

sqldf("select * from grades")
Error in sqldf("select * from grades"): could not find function "sqldf"
sqldf("select * from movies")
Error in sqldf("select * from movies"): could not find function "sqldf"
sqldf("select * from orders")
Error in sqldf("select * from orders"): could not find function "sqldf"
sqldf("select * from customers")
Error in sqldf("select * from customers"): could not find function "sqldf"
sqldf("select * from north_american_cities")
Error in sqldf("select * from north_american_cities"): could not find function "sqldf"

29.5 import the data: “books database”

The “books database” is a collection of several relational database tables. The links to download the files for this database are located at the top of this webpage.

Pay special attention to the file booksDatabase description and Questions which contains a description of the database as well as a discussion of different concepts related to databases in general. You should use this file as a tutorial about the books database. The end of this file contains many questions/answers to test your knowledge of SQL.

Once you’ve downloaded the various files you can run the following commands to import the data into R.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The books database contains information about various books, authors,
# publishers, etc.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Read in the data for the books database - see the 
titles = read_csv("data/booksDatabase/titles.csv", na="NULL", show_col_types=FALSE)
authors = read_csv("data/booksDatabase/authors.csv", na="NULL", show_col_types=FALSE)
publishers = read_csv("data/booksDatabase/publishers.csv", na="NULL", show_col_types=FALSE)
title_authors = read_csv("data/booksDatabase/title_authors.csv", na="NULL", show_col_types=FALSE)
royalties = read_csv("data/booksDatabase/royalties.csv", na="NULL", show_col_types=FALSE)

Below are the contents of these tables:

sqldf("select * from titles")
Error in sqldf("select * from titles"): could not find function "sqldf"
sqldf("select * from publishers")
Error in sqldf("select * from publishers"): could not find function "sqldf"
sqldf("select * from title_authors")
Error in sqldf("select * from title_authors"): could not find function "sqldf"
sqldf("select * from authors")
Error in sqldf("select * from authors"): could not find function "sqldf"
sqldf("select * from royalties")
Error in sqldf("select * from royalties"): could not find function "sqldf"