The following are the different collections of data. Each collection can be downloaded in different formats (e.g. .sqlite, .csv, .RData). Each format should contain the same information for a given dataset. The data is made available in different formats so that you can access the data using different tools, specifically:
a .sqlite database for use with R and many other sqlite tools, including yrSqliteViewer (see below for more info)
a set of .csv files for use with R (and many other tools)
a .RData file for use with R
See the information below for infomation about how to use each of the different formats. For now, if you’re just getting started, I recommend that you download the .sqlite versions for each collection of data and then follow the instructions below for how to download and use the yrSqliteViewer tool.
A collection of a few tables that come from some of the lessons on the website https://sqlbolt.com/. These are stored in the file “sqlbolt_tables-v007.RData”.
This is a collection of several tables (titles, publishers, authors, royalties, title_authors) The following tutorial uses this data to explain relational databases and SQL:
Download as a single ,RData file that contains all tables as R dataframes: booksDatabase-v004.RData
28.1.5 Converting between formats
The following is an R file that can be used to convert between the various formats. You do NOT need to use this, but it may be useful to you now or in the future.
Click here for a set of R functions to convert between formats
Click below to download a single R file that contains all the functions.
# =============================================================================# SQLite Utility Functions# # Helper functions for converting between R data frames and SQLite databases.# Useful for bundling multiple data frames into a single portable file,# or for loading all tables from a SQLite database into R.## Dependencies: DBI, RSQLite, tcltk (for file/folder picker dialogs)### -------------------# Functions Provided# -------------------## dfsToSqlite() - Write data frames from the global environment to a SQLite# database (one table per data frame).## sqliteToDfs() - Load all tables from a SQLite database into R as data frames# (optionally placing them in the global environment).## sqliteToExcel() - Export all tables in a SQLite database to an Excel workbook# with one worksheet per table.## sqliteToRData() - Save all tables from a SQLite database into an .RData file# as individual data frame objects.## rdataToSqlite() - Load an .RData file and write any contained data frames# to a SQLite database (one table per data frame).## excelToSqlite() - Import all worksheets from an Excel file into a SQLite# database, with optional control over where headers start# in each sheet.# =============================================================================if (!require(DBI)) { install.packages("DBI"); require(DBI) }
if (!require(tcltk)) { install.packages("tcltk"); require(tcltk) }
Loading required package: tcltk
# -----------------------------------------------------------------------------# dfsToSqlite# # Writes data frames from the global environment into a SQLite database file.# Each data frame becomes a table, named after the variable.# Non-data-frame objects are silently skipped.## Arguments:# sqliteFilename - (character) Path to the SQLite file to create/overwrite.# If the file already exists, matching tables are overwritten.# objs - (character vector, optional) Names of objects to export.# If omitted, all objects in the global environment are# checked and any data frames found are exported.## Returns:# Nothing. Called for its side effect of writing to a SQLite file.## ==============# Usage Examples# ==============## --- Example 1: Export all data frames in the environment ----------------## movies <- data.frame(title = c("Toy Story", "Cars"), year = c(1995, 2006))# boxoffice <- data.frame(movie_id = c(1, 2), revenue = c(373, 462))## # Writes both 'movies' and 'boxoffice' tables to the SQLite file# dfsToSqlite("my_data.sqlite")### --- Example 2: Export only specific data frames -------------------------## dfsToSqlite("my_data.sqlite", c("movies", "boxoffice"))### --- Example 3: Let the user pick where to save with a file dialog -------## savePath <- tcltk::tclvalue(tcltk::tkgetSaveFile(# defaultextension = ".sqlite",# filetypes = "{{SQLite Files} {.sqlite}} {{All Files} {*}}"# ))# if (nchar(savePath) > 0) {# dfsToSqlite(savePath)# }# -----------------------------------------------------------------------------dfsToSqlite <-function(sqliteFilename, objs) { con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con))# If no object names were passed in, grab everything in the global environmentif (missing(objs)) objs <-ls(envir = .GlobalEnv)for (name in objs) { obj <-get(name, envir = .GlobalEnv)# Only write objects that are data frames; skip everything elseif (is.data.frame(obj)) {dbWriteTable(con, name, obj, overwrite =TRUE) } }}# -----------------------------------------------------------------------------# sqliteToDfs# # Reads all tables from a SQLite database and returns them as data frames.## Arguments:# sqliteFilename - (character) Path to an existing SQLite file.# toGlobalEnv - (logical, default FALSE) If TRUE, each table is also# loaded into the global environment as a separate variable# (e.g., a table named "movies" becomes a variable `movies`).# If FALSE, tables are only returned as a named list.## Returns:# A named list of data frames, one per table. If toGlobalEnv = TRUE,# the list is returned invisibly (and the data frames are also created# as individual variables in the global environment).# ==============# Usage Examples# ==============# --- Example 1: Load tables as a list -----------------------------------## dfs <- sqliteToDfs("my_data.sqlite")# dfs$movies # access the movies table# dfs$boxoffice # access the boxoffice table### --- Example 2: Load tables directly into the global environment ---------## sqliteToDfs("my_data.sqlite", toGlobalEnv = TRUE)# # Now you can use them directly:# head(movies)# summary(boxoffice)### --- Example 3: Let the user pick a SQLite file to load -----------------## filepath <- tk_choose.files(# caption = "Select a SQLite database",# filter = matrix(c("SQLite Files", ".sqlite", "All Files", ".*"),# ncol = 2, byrow = TRUE)# )# if (length(filepath) > 0 && nchar(filepath) > 0) {# sqliteToDfs(filepath, toGlobalEnv = TRUE)# }# -----------------------------------------------------------------------------sqliteToDfs <-function(sqliteFilename, toGlobalEnv =FALSE) {if (!file.exists(sqliteFilename)) {stop("File not found: ", sqliteFilename) } con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con)) tables <-dbListTables(con)# Read each table into a named list dfs <-list()for (name in tables) { dfs[[name]] <-dbReadTable(con, name) }# Optionally inject each data frame into the global environmentif (toGlobalEnv) {list2env(dfs, envir = .GlobalEnv)message("Loaded ", length(dfs), " tables: ", paste(names(dfs), collapse =", "))return(invisible(dfs)) } dfs}# -----------------------------------------------------------------------------# sqliteToExcel## Converts all tables in a SQLite database into an Excel workbook.# Each SQLite table becomes one worksheet.## Arguments:# sqliteFilename - (character) Path to an existing SQLite file.# excelFilename - (character, optional) Output .xlsx file path.# If omitted, the user is prompted to choose a location.## Dependencies:# openxlsx## Returns:# Nothing. Called for its side effect of writing an Excel file.## ==============# Usage Examples# ==============## # Save the data to a specific Excel filename## sqliteToExcel("my_data.sqlite", "my_data.xlsx")## # Or let the user pick the save location:## sqliteToExcel("my_data.sqlite")# -----------------------------------------------------------------------------sqliteToExcel <-function(sqliteFilename, excelFilename) {if (!file.exists(sqliteFilename)) {stop("File not found: ", sqliteFilename) }# Ask user where to save if no filename providedif (missing(excelFilename)) { excelFilename <- tcltk::tclvalue(tcltk::tkgetSaveFile(defaultextension =".xlsx",filetypes ="{{Excel Files} {.xlsx}} {{All Files} {*}}" ))if (nchar(excelFilename) ==0) return(invisible(NULL)) } con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con)) tables <-dbListTables(con) wb <-createWorkbook()for (name in tables) { df <-dbReadTable(con, name)addWorksheet(wb, name)writeData(wb, name, df) }saveWorkbook(wb, excelFilename, overwrite =TRUE)message("Wrote ", length(tables), " tables to: ", excelFilename)}# -----------------------------------------------------------------------------# sqliteToRData## Reads all tables from a SQLite database and saves them into an .RData file.# Each table becomes a data frame object inside the RData file.## Arguments:# sqliteFilename - (character) Path to an existing SQLite file.# rdataFilename - (character, optional) Output .RData file path.# If omitted, the user is prompted to choose a location.## Returns:# Nothing. Called for its side effect of writing an .RData file.## ==============# Usage Examples# ==============## --- Example 1: Convert SQLite database to RData file -------------------## sqliteToRData("my_data.sqlite", "my_data.RData")### --- Example 2: Let the user pick where to save -------------------------## sqliteToRData("my_data.sqlite")# -----------------------------------------------------------------------------sqliteToRData <-function(sqliteFilename, rdataFilename) {if (!file.exists(sqliteFilename)) {stop("File not found: ", sqliteFilename) }# Ask user where to save if not providedif (missing(rdataFilename)) { rdataFilename <- tcltk::tclvalue(tcltk::tkgetSaveFile(defaultextension =".RData",filetypes ="{{RData Files} {.RData}} {{All Files} {*}}" ))if (nchar(rdataFilename) ==0) return(invisible(NULL)) } con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con)) tables <-dbListTables(con) dfs <-list()for (name in tables) { dfs[[name]] <-dbReadTable(con, name) }# Save all tables as objects in the RData filelist2env(dfs, envir =environment())save(list =names(dfs), file = rdataFilename, envir =environment())message("Saved ", length(dfs), " tables to: ", rdataFilename)}# -----------------------------------------------------------------------------# rdataToSqlite## Loads objects from an .RData file and writes any data frames found# into a SQLite database. Each data frame becomes a table with the# same name as the object.## Arguments:# rdataFilename - (character) Path to an existing .RData file.# sqliteFilename - (character, optional) Output SQLite file path.# If omitted, the user is prompted to choose a location.## Returns:# Nothing. Called for its side effect of writing a SQLite file.## ==============# Usage Examples# ==============## --- Example 1: Convert RData file to SQLite database -------------------## rdataToSqlite("my_data.RData", "my_data.sqlite")### --- Example 2: Let the user pick where to save -------------------------## rdataToSqlite("my_data.RData")# -----------------------------------------------------------------------------rdataToSqlite <-function(rdataFilename, sqliteFilename) {if (!file.exists(rdataFilename)) {stop("File not found: ", rdataFilename) }# Ask user where to save if not providedif (missing(sqliteFilename)) { sqliteFilename <- tcltk::tclvalue(tcltk::tkgetSaveFile(defaultextension =".sqlite",filetypes ="{{SQLite Files} {.sqlite}} {{All Files} {*}}" ))if (nchar(sqliteFilename) ==0) return(invisible(NULL)) } env <-new.env()load(rdataFilename, envir = env) objs <-ls(env) con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con)) written <-c()for (name in objs) { obj <-get(name, envir = env)if (is.data.frame(obj)) {dbWriteTable(con, name, obj, overwrite =TRUE) written <-c(written, name) } }message("Wrote ", length(written), " tables: ", paste(written, collapse =", "))}# -----------------------------------------------------------------------------# excelToSqlite## Reads all worksheets from an Excel file and writes them to a SQLite database.# Each worksheet becomes a table. Assumes the first row of the data contains# column headers.## The location of the upper-left header cell can optionally be specified.# If not specified, the function automatically searches for the first non-blank# cell scanning column A, then B, then C, etc.## Arguments:# excelFilename - (character) Path to an Excel (.xlsx/.xls) file.# sqliteFilename - (character, optional) Output SQLite file path.# If omitted, the user is prompted to choose a location.# startCells - (character vector, optional) Excel cell references# indicating the upper-left header cell for each sheet# (e.g., c("A1","B3","A2")).# Each entry corresponds to the matching sheet index.# If fewer entries than sheets are provided, remaining# sheets use automatic detection.## Dependencies:# readxl## Returns:# Nothing. Called for its side effect of writing a SQLite file.# -----------------------------------------------------------------------------excelToSqlite <-function(excelFilename, sqliteFilename, startCells =NULL) {if (!file.exists(excelFilename)) {stop("File not found: ", excelFilename) }if (!require(readxl)) {install.packages("readxl")library(readxl) }# Ask where to save if not providedif (missing(sqliteFilename)) { sqliteFilename <- tcltk::tclvalue(tcltk::tkgetSaveFile(defaultextension =".sqlite",filetypes ="{{SQLite Files} {.sqlite}} {{All Files} {*}}" ))if (nchar(sqliteFilename) ==0) return(invisible(NULL)) } sheets <- readxl::excel_sheets(excelFilename) con <-dbConnect(SQLite(), sqliteFilename)on.exit(dbDisconnect(con))# Helper: detect first non-blank cell scanning columns left→right detectStartCell <-function(sheet) { preview <- readxl::read_excel( excelFilename,sheet = sheet,col_names =FALSE,.name_repair ="minimal" ) preview <-as.data.frame(preview)for (col inseq_len(ncol(preview))) { column <- preview[[col]] idx <-which(!is.na(column) & column !="")[1]if (!is.na(idx)) { colLetter <- openxlsx::int2col(col)return(paste0(colLetter, idx)) } }stop("Could not detect data start in sheet: ", sheet) } written <-c()for (i inseq_along(sheets)) { sheet <- sheets[i]# Determine starting cell start <-NULLif (!is.null(startCells) &&length(startCells) >= i) { start <- startCells[i] } else { start <-detectStartCell(sheet) } df <- readxl::read_excel( excelFilename,sheet = sheet,range = readxl::cell_rows(NULL),skip =0,col_names =TRUE )# Re-read using the detected range df <- readxl::read_excel( excelFilename,sheet = sheet,range =paste0(start, ":1048576") ) tableName <-make.names(sheet)dbWriteTable(con, tableName, as.data.frame(df), overwrite =TRUE) written <-c(written, tableName) }message("Wrote ", length(written), " tables: ", paste(written, collapse =", "))}# =============================================================================# Usage Examples - excelToSqlite()# =============================================================================# --- Example 1: Convert an Excel file directly -------------------------## excelToSqlite("my_data.xlsx", "my_data.sqlite")# --- Example 2: Specify header locations for sheets --------------------## excelToSqlite(# "my_data.xlsx",# "my_data.sqlite",# startCells = c("A1", "B3", "A2")# )# --- Example 3: Let the user choose where to save ----------------------## excelToSqlite("my_data.xlsx")
28.2 How to access the data using the yrSqliteViewer.html
The yrSqliteViewer.html tool is a single file that can be downloaded to your computer that allows you to work with .sqlite files.
In order to read the data into R and to access the data using SQL, we will need to load the R packages “readr” and “sqldf”. We do so with the following commands:
This a .csv file that contains information about students and their grades. You can read the contents of the file into R with the read.csv function as shown below.
grades =read.csv("grades.csv")
Once you’ve done so, you can display the data:
Click here - grades table
sqldf("select * from grades")
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0200 sue fr FALSE 80 80 79 IDS
3 s0300 anne fr TRUE 90 95 92 IDS
4 s0400 frank so TRUE 100 95 91 ACC
5 s0500 bob so FALSE 30 NA NA IDS
6 s0600 samantha so TRUE 100 70 97 IDS
7 s0700 larry ju FALSE 32 80 NA FIN
8 s0800 bill se FALSE 84 90 92 ACC
9 s0900 lucricia fr TRUE 80 100 88 IDS
10 s1000 margaret se FALSE 59 47 NA IDS
11 s1100 dalia se FALSE 85 92 93 FIN
12 s1200 zvi ju TRUE 90 98 92 ACC
13 s1300 mike ju TRUE 90 86 79 IDS
14 s1400 david se TRUE 90 87 87 IDS
15 s1500 dana so FALSE 100 93 91 FIN
A .RData file contains a copy of R’s environment variables. You can save you environment variables to an RData file with the save() function. (see ?save). Once you have an RData file, you can load those variables into your R environment with the load() function (see ?load).
This RData file contains the following different dataframe variables. When using the sqldf() function, each dataframe variable can used as though it were a Relational Database table.
customers
orders
movies
north_american_cities
as well as some others (boxOffice, buildings, employees)
Use the load function as shown below to load the data into R.
load("sqlbolt_tables-v007.RData")
Below are the contents of these tables:
Click here - grades table
sqldf("select * from grades")
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0200 sue fr FALSE 80 80 79 IDS
3 s0300 anne fr TRUE 90 95 92 IDS
4 s0400 frank so TRUE 100 95 91 ACC
5 s0500 bob so FALSE 30 NA NA IDS
6 s0600 samantha so TRUE 100 70 97 IDS
7 s0700 larry ju FALSE 32 80 NA FIN
8 s0800 bill se FALSE 84 90 92 ACC
9 s0900 lucricia fr TRUE 80 100 88 IDS
10 s1000 margaret se FALSE 59 47 NA IDS
11 s1100 dalia se FALSE 85 92 93 FIN
12 s1200 zvi ju TRUE 90 98 92 ACC
13 s1300 mike ju TRUE 90 86 79 IDS
14 s1400 david se TRUE 90 87 87 IDS
15 s1500 dana so FALSE 100 93 91 FIN
Click here - movies table
sqldf("select * from movies")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
CustomerID CustomerName ContactName Address City PostalCode Country
1 1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constituci\xf3n 2222 M\xe9xico D.F. 5021 Mexico
3 3 Antonio Moreno Taquer\xeda Antonio Moreno Mataderos 2312 M\xe9xico D.F. 5023 Mexico
4 4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 5 Berglunds snabbk\xf6p Christina Berglund Berguvsv\xe4gen 8 Lule\xe5 S-958 22 Sweden
6 6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 7 Blondel p\xe8re et fils Fr\xe9d\xe9rique Citeaux 24, place Kl\xe9ber Strasbourg 67000 France
8 8 B\xf3lido Comidas preparadas Mart\xedn Sommer C/ Araquil, 67 Madrid 28023 Spain
9 9 Bon app' Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
10 10 Bottom-Dollar Marketse Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen T2F 8M4 Canada
11 11 B's Beverages Victoria Ashworth Fauntleroy Circus London EC2 5NT UK
12 12 Cactus Comidas para llevar Patricio Simpson Cerrito 333 Buenos Aires 1010 Argentina
13 13 Centro comercial Moctezuma Francisco Chang Sierras de Granada 9993 M\xe9xico D.F. 5022 Mexico
14 14 Chop-suey Chinese Yang Wang Hauptstr. 29 Bern 3012 Switzerland
15 15 Com\xe9rcio Mineiro Pedro Afonso Av. dos Lus\xedadas, 23 S\xe3o Paulo 05432-043 Brazil
16 16 Consolidated Holdings Elizabeth Brown Berkeley Gardens 12 Brewery London WX1 6LT UK
17 17 Drachenblut Delikatessend Sven Ottlieb Walserweg 21 Aachen 52066 Germany
18 18 Du monde entier Janine Labrune 67, rue des Cinquante Otages Nantes 44000 France
19 19 Eastern Connection Ann Devon 35 King George London WX3 6FW UK
20 20 Ernst Handel Roland Mendel Kirchgasse 6 Graz 8010 Austria
21 21 Familia Arquibaldo Aria Cruz Rua Or\xf3s, 92 S\xe3o Paulo 05442-030 Brazil
22 22 FISSA Fabrica Inter. Salchichas S.A. Diego Roel C/ Moralzarzal, 86 Madrid 28034 Spain
23 23 Folies gourmandes Martine Ranc\xe9 184, chauss\xe9e de Tournai Lille 59000 France
24 24 Folk och f\xe4 HB Maria Larsson \xc5kergatan 24 Br\xe4cke S-844 67 Sweden
25 25 Frankenversand Peter Franken Berliner Platz 43 M\xfcnchen 80805 Germany
26 26 France restauration Carine Schmitt 54, rue Royale Nantes 44000 France
27 27 Franchi S.p.A. Paolo Accorti Via Monte Bianco 34 Torino 10100 Italy
28 28 Furia Bacalhau e Frutos do Mar Lino Rodriguez Jardim das rosas n. 32 Lisboa 1675 Portugal
29 29 Galer\xeda del gastr\xf3nomo Eduardo Saavedra Rambla de Catalu\xf1a, 23 Barcelona 8022 Spain
30 30 Godos Cocina T\xedpica Jos\xe9 Pedro Freyre C/ Romero, 33 Sevilla 41101 Spain
31 31 Gourmet Lanchonetes Andr\xe9 Fonseca Av. Brasil, 442 Campinas 04876-786 Brazil
32 32 Great Lakes Food Market Howard Snyder 2732 Baker Blvd. Eugene 97403 USA
33 33 GROSELLA-Restaurante Manuel Pereira 5\xaa Ave. Los Palos Grandes Caracas 1081 Venezuela
34 34 Hanari Carnes Mario Pontes Rua do Pa\xe7o, 67 Rio de Janeiro 05454-876 Brazil
35 35 HILARI\xd3N-Abastos Carlos Hern\xe1ndez Carrera 22 con Ave. Carlos Soublette #8-35 San Crist\xf3bal 5022 Venezuela
36 36 Hungry Coyote Import Store Yoshi Latimer City Center Plaza 516 Main St. Elgin 97827 USA
37 37 Hungry Owl All-Night Grocers Patricia McKenna 8 Johnstown Road Cork Ireland
38 38 Island Trading Helen Bennett Garden House Crowther Way Cowes PO31 7PJ UK
39 39 K\xf6niglich Essen Philip Cramer Maubelstr. 90 Brandenburg 14776 Germany
40 40 La corne d'abondance Daniel Tonini 67, avenue de l'Europe Versailles 78000 France
41 41 La maison d'Asie Annette Roulet 1 rue Alsace-Lorraine Toulouse 31000 France
42 42 Laughing Bacchus Wine Cellars Yoshi Tannamuri 1900 Oak St. Vancouver V3F 2K1 Canada
43 43 Lazy K Kountry Store John Steel 12 Orchestra Terrace Walla Walla 99362 USA
44 44 Lehmanns Marktstand Renate Messner Magazinweg 7 Frankfurt a.M. 60528 Germany
45 45 Let's Stop N Shop Jaime Yorres 87 Polk St. Suite 5 San Francisco 94117 USA
46 46 LILA-Supermercado Carlos Gonz\xe1lez Carrera 52 con Ave. Bol\xedvar #65-98 Llano Largo Barquisimeto 3508 Venezuela
47 47 LINO-Delicateses Felipe Izquierdo Ave. 5 de Mayo Porlamar I. de Margarita 4980 Venezuela
48 48 Lonesome Pine Restaurant Fran Wilson 89 Chiaroscuro Rd. Portland 97219 USA
49 49 Magazzini Alimentari Riuniti Giovanni Rovelli Via Ludovico il Moro 22 Bergamo 24100 Italy
50 50 Maison Dewey Catherine Dewey Rue Joseph-Bens 532 Bruxelles B-1180 Belgium
51 51 M\xe8re Paillarde Jean Fresni\xe8re 43 rue St. Laurent Montr\xe9al H1J 1C3 Canada
52 52 Morgenstern Gesundkost Alexander Feuer Heerstr. 22 Leipzig 4179 Germany
53 53 North/South Simon Crowther South House 300 Queensbridge London SW7 1RZ UK
54 54 Oc\xe9ano Atl\xe1ntico Ltda. Yvonne Moncada Ing. Gustavo Moncada 8585 Piso 20-A Buenos Aires 1010 Argentina
55 55 Old World Delicatessen Rene Phillips 2743 Bering St. Anchorage 99508 USA
56 56 Ottilies K\xe4seladen Henriette Pfalzheim Mehrheimerstr. 369 K\xf6ln 50739 Germany
57 57 Paris sp\xe9cialit\xe9s Marie Bertrand 265, boulevard Charonne Paris 75012 France
58 58 Pericles Comidas cl\xe1sicas Guillermo Fern\xe1ndez Calle Dr. Jorge Cash 321 M\xe9xico D.F. 5033 Mexico
59 59 Piccolo und mehr Georg Pipps Geislweg 14 Salzburg 5020 Austria
60 60 Princesa Isabel Vinhoss Isabel de Castro Estrada da sa\xfade n. 58 Lisboa 1756 Portugal
61 61 Que Del\xedcia Bernardo Batista Rua da Panificadora, 12 Rio de Janeiro 02389-673 Brazil
62 62 Queen Cozinha L\xfacia Carvalho Alameda dos Can\xe0rios, 891 S\xe3o Paulo 05487-020 Brazil
63 63 QUICK-Stop Horst Kloss Taucherstra\xdfe 10 Cunewalde 1307 Germany
64 64 Rancho grande Sergio Guti\xe9rrez Av. del Libertador 900 Buenos Aires 1010 Argentina
65 65 Rattlesnake Canyon Grocery Paula Wilson 2817 Milton Dr. Albuquerque 87110 USA
66 66 Reggiani Caseifici Maurizio Moroni Strada Provinciale 124 Reggio Emilia 42100 Italy
67 67 Ricardo Adocicados Janete Limeira Av. Copacabana, 267 Rio de Janeiro 02389-890 Brazil
68 68 Richter Supermarkt Michael Holz Grenzacherweg 237 Gen\xe8ve 1203 Switzerland
69 69 Romero y tomillo Alejandra Camino Gran V\xeda, 1 Madrid 28001 Spain
70 70 Sant\xe9 Gourmet Jonas Bergulfsen Erling Skakkes gate 78 Stavern 4110 Norway
71 71 Save-a-lot Markets Jose Pavarotti 187 Suffolk Ln. Boise 83720 USA
72 72 Seven Seas Imports Hari Kumar 90 Wadhurst Rd. London OX15 4NB UK
73 73 Simons bistro Jytte Petersen Vinb\xe6ltet 34 K\xf8benhavn 1734 Denmark
74 74 Sp\xe9cialit\xe9s du monde Dominique Perrier 25, rue Lauriston Paris 75016 France
75 75 Split Rail Beer & Ale Art Braunschweiger P.O. Box 555 Lander 82520 USA
76 76 Supr\xeames d\xe9lices Pascale Cartrain Boulevard Tirou, 255 Charleroi B-6000 Belgium
77 77 The Big Cheese Liz Nixon 89 Jefferson Way Suite 2 Portland 97201 USA
78 78 The Cracker Box Liu Wong 55 Grizzly Peak Rd. Butte 59801 USA
79 79 Toms Spezialit\xe4ten Karin Josephs Luisenstr. 48 M\xfcnster 44087 Germany
80 80 Tortuga Restaurante Miguel Angel Paolino Avda. Azteca 123 M\xe9xico D.F. 5033 Mexico
81 81 Tradi\xe7\xe3o Hipermercados Anabela Domingues Av. In\xeas de Castro, 414 S\xe3o Paulo 05634-030 Brazil
82 82 Trail's Head Gourmet Provisioners Helvetius Nagy 722 DaVinci Blvd. Kirkland 98034 USA
83 83 Vaffeljernet Palle Ibsen Smagsl\xf8get 45 \xc5rhus 8200 Denmark
84 84 Victuailles en stock Mary Saveley 2, rue du Commerce Lyon 69004 France
85 85 Vins et alcools Chevalier Paul Henriot 59 rue de l'Abbaye Reims 51100 France
86 86 Die Wandernde Kuh Rita M\xfcller Adenauerallee 900 Stuttgart 70563 Germany
87 87 Wartian Herkku Pirkko Koskitalo Torikatu 38 Oulu 90110 Finland
88 88 Wellington Importadora Paula Parente Rua do Mercado, 12 Resende 08737-363 Brazil
89 89 White Clover Markets Karl Jablonski 305 - 14th Ave. S. Suite 3B Seattle 98128 USA
90 90 Wilman Kala Matti Karttunen Keskuskatu 45 Helsinki 21240 Finland
91 91 Wolski Zbyszek ul. Filtrowa 68 Walla 01-012 Poland
Click here - north_american_cities table
sqldf("select * from north_american_cities")
city country population latitude longitude
1 Guadalajara Mexico 1500800 20.65970 -103.34961
2 Toronto Canada 2795060 43.65323 -79.38318
3 Houston United States 2195914 29.76043 -95.36980
4 New York United States 8405837 40.71278 -74.00594
5 Philadelphia United States 1553165 39.95258 -75.16522
6 Havana Cuba 2106146 23.05407 -82.34519
7 Mexico City Mexico 8555500 19.43261 -99.13321
8 Phoenix United States 1513367 33.44838 -112.07404
9 Los Angeles United States 3884307 34.05223 -118.24368
10 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
11 Montreal Canada 1717767 45.50169 -73.56726
12 Chicago United States 2718782 41.87811 -87.62980
28.4 The “Books Database”
The “books database” is a collection of several tables that contains data relating to books (i.e. “titles”), authors, publishers, etc.
You can download the data for the database in the following ways:
Click here to download all the data for the books datbase as a single “SQLite” database file.
To use this file you can do one of the following:
Install SQLite software on your computer. There are many such programs avaiable. I recommend “DB Viewer for SQLite” which is a free open source program.
This is a website that lets you upload a sqlite3 file and run SQL Select statements against the data. I created this website myself (with the help of some AI coding assistants).
PROs: you don’t need to install anything
CONs: currently only works with SQL SELECT statment but not any other types of sql statements.
a tutorial on how “relational databases” are structured
a tutorial on Entity Relationship Diagrams (ERDs)
a set of questions and answers
Once you’ve downloaded the various CSV files you can run the following commands to import the data into R.
28.4.3 Import the data (books database)
if(!require(readr)){install.packages("readr");require(readr);}# Read in the data for the books database - see the titles =read_csv("titles.csv", na="NULL", show_col_types=FALSE)authors =read_csv("authors.csv", na="NULL", show_col_types=FALSE)publishers =read_csv("publishers.csv", na="NULL", show_col_types=FALSE)title_authors =read_csv("title_authors.csv", na="NULL", show_col_types=FALSE)royalties =read_csv("royalties.csv", na="NULL", show_col_types=FALSE)
The following is an “Entity Relationship Diagram (ERD)” that describes the relationships between the tables. See the Word document linked above for more info:
title_id title_name type pub_id pages price sales pubdate
1 T01 1977! history P01 107 21.99 566 8/1/2000 0:00
2 T02 200 Years of German Humor history P03 14 19.95 9566 4/1/1998 0:00
3 T03 Ask Yor System Administrator computer P02 1226 39.95 25667 9/1/2000 0:00
4 T04 But I Did It Unconciously psychology P01 510 12.99 13001 5/31/1999 0:00
5 T05 Exchange of Platitudes psychology P01 201 6.95 201440 1/1/2001 0:00
6 T06 How About Never? biography P01 473 19.95 11320 7/31/2000 0:00
7 T07 I Blame My Mother biography P03 333 23.95 1500200 10/1/1999 0:00
8 T08 Just Wait Until After School children P01 86 10.00 4095 6/1/2001 0:00
9 T09 Kiss My Boo Boo children P01 22 13.95 5000 5/31/2002 0:00
10 T10 Not Without My Fabrerge Egg biography P05 NA NA NA <NA>
11 T11 Perhaps It's a Glandular Problem psychology P05 826 7.99 94123 11/30/2000 0:00
12 T12 Spontaneous, Not Annoying biography P05 507 12.99 100001 8/31/2000 0:00
13 T13 What Are The Civilian Applications? history P03 802 29.99 10467 5/31/1999 0:00
28.4.6 publishers table
Stores one row for each publisher.
Click here to hide/show
sqldf("select * from publishers")
pub_id pub_name city state country
1 P01 Abatis Publishers New York NY USA
2 P02 Core Dump Books San Francisco CA USA
3 P03 Schandenfreude Press Hamburg <NA> Germany
4 P04 Tneterhooks Press Berkeley CA USA
5 P05 AAA Publishing Berkeley CA USA
28.4.7 title_authors table
Each row in this table represents a relationship between an author and a book that (s)he wrote. If a book has 3 authors, there will be 3 rows in this table for that book. This table has a composite primary key (title_id, au_id). The “royalty_share” field represents the percent of the royalties that this author receives. If there is only one author for a book then this will be 1. If there are two authors for the book, the “royalty_share” could be the same for both or it could be different (e.g. 0.6 and 0.4).
au_id au_fname au_lname phone address city state zip
1 A01 Sarah Buchman 718-496-7223 75 West 205 St Bronx NY 10468
2 A02 Wendy Heydemark 303-986-7020 2922 Baseline Rd Boulder CO 80303
3 A03 Hallie Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco CA 94123
4 A04 Klee Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco CA 94123
5 A05 Christian Kells 212-771-4680 114 Horatio St New York NY 10014
6 A06 Harvey Kellsey 650-836-7128 390 Serra Mall Palo Alto CA 94305
7 A07 Paddy O'Furniture 941-925-0752 1442 Main St Sarasota FL 34236
28.4.9 royalties table
This table the information about royalties that the publisher will pay to the author(s) of a book. “advance” is the fixed amount that the publisher pays the author when (s)he starts to write the book. “royalty_rate” is a decimal value that contains the percent of each book’s sale price that is paid to the author(s). If there is more than one author then the royalties and advances are split between them.