“Common Table Expressions (CTE)” are designed to make complex queries easier to read and think about. Instead of embedding subqueries in the middle of an outer query CTEs allow you to several SELECT statements in an organized way. These SELECT statements are named and can be referred to as if they were tables inside of the main query. See below for exmaples.
(Note - these examples work but were generated by chatgpt. I should modify these to make them more interesting, but the current code works to get across the main ideas of CTEs)
The following is the database that we will be using to demo the CTEs below.
Click here to hide/show this section
34.3 The “Books Database”
The “books database” is a collection of several tables that contains data relating to books (i.e. “titles”), authors, publishers, etc.
You can download the data for the database in the following ways:
Click here to download all the data for the books datbase as a single “SQLite” database file.
To use this file you can do one of the following:
Install SQLite software on your computer. There are many such programs avaiable. I recommend “DB Viewer for SQLite” which is a free open source program.
This is a website that lets you upload a sqlite3 file and run SQL Select statements against the data. I created this website myself (with the help of some AI coding assistants).
PROs: you don’t need to install anything
CONs: currently only works with SQL SELECT statment but not any other types of sql statements.
# Read in the data for the books database - see the titles =read_csv("titles.csv", na="NULL", show_col_types=FALSE)authors =read_csv("authors.csv", na="NULL", show_col_types=FALSE)publishers =read_csv("publishers.csv", na="NULL", show_col_types=FALSE)title_authors =read_csv("title_authors.csv", na="NULL", show_col_types=FALSE)royalties =read_csv("royalties.csv", na="NULL", show_col_types=FALSE)
The following is an “Entity Relationship Diagram (ERD)” that describes the relationships between the tables. See the Word document linked above for more info:
title_id title_name type pub_id pages price
1 T01 1977! history P01 107 21.99
2 T02 200 Years of German Humor history P03 14 19.95
3 T03 Ask Yor System Administrator computer P02 1226 39.95
4 T04 But I Did It Unconciously psychology P01 510 12.99
5 T05 Exchange of Platitudes psychology P01 201 6.95
6 T06 How About Never? biography P01 473 19.95
7 T07 I Blame My Mother biography P03 333 23.95
8 T08 Just Wait Until After School children P01 86 10.00
9 T09 Kiss My Boo Boo children P01 22 13.95
10 T10 Not Without My Fabrerge Egg biography P05 NA NA
11 T11 Perhaps It's a Glandular Problem psychology P05 826 7.99
12 T12 Spontaneous, Not Annoying biography P05 507 12.99
13 T13 What Are The Civilian Applications? history P03 802 29.99
sales pubdate
1 566 8/1/2000 0:00
2 9566 4/1/1998 0:00
3 25667 9/1/2000 0:00
4 13001 5/31/1999 0:00
5 201440 1/1/2001 0:00
6 11320 7/31/2000 0:00
7 1500200 10/1/1999 0:00
8 4095 6/1/2001 0:00
9 5000 5/31/2002 0:00
10 NA <NA>
11 94123 11/30/2000 0:00
12 100001 8/31/2000 0:00
13 10467 5/31/1999 0:00
Click here - publishers table
sqldf("select * from publishers")
pub_id pub_name city state country
1 P01 Abatis Publishers New York NY USA
2 P02 Core Dump Books San Francisco CA USA
3 P03 Schandenfreude Press Hamburg <NA> Germany
4 P04 Tneterhooks Press Berkeley CA USA
5 P05 AAA Publishing Berkeley CA USA
au_id au_fname au_lname phone address city
1 A01 Sarah Buchman 718-496-7223 75 West 205 St Bronx
2 A02 Wendy Heydemark 303-986-7020 2922 Baseline Rd Boulder
3 A03 Hallie Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco
4 A04 Klee Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco
5 A05 Christian Kells 212-771-4680 114 Horatio St New York
6 A06 Harvey Kellsey 650-836-7128 390 Serra Mall Palo Alto
7 A07 Paddy O'Furniture 941-925-0752 1442 Main St Sarasota
state zip
1 NY 10468
2 CO 80303
3 CA 94123
4 CA 94123
5 NY 10014
6 CA 94305
7 FL 34236
Goal: Show each author’s full name, how many books they’ve written, and a list with one row per book and a total count included.
sqldf("WITH author_titles_cte AS ( SELECT a.au_id, a.au_fname || ' ' || a.au_lname AS author, t.title_name FROM authors a JOIN title_authors ta ON a.au_id = ta.au_id JOIN titles t ON ta.title_id = t.title_id),author_book_counts AS ( SELECT au_id, COUNT(*) AS book_count FROM author_titles_cte GROUP BY au_id)SELECT at.author, at.title_name, abc.book_countFROM author_titles_cte atJOIN author_book_counts abc ON at.au_id = abc.au_idORDER BY abc.book_count DESC, at.author, at.title_name")
author title_name book_count
1 Klee Hull But I Did It Unconciously 4
2 Klee Hull Exchange of Platitudes 4
3 Klee Hull I Blame My Mother 4
4 Klee Hull Perhaps It's a Glandular Problem 4
5 Wendy Heydemark How About Never? 4
6 Wendy Heydemark I Blame My Mother 4
7 Wendy Heydemark Not Without My Fabrerge Egg 4
8 Wendy Heydemark Spontaneous, Not Annoying 4
9 Harvey Kellsey Just Wait Until After School 3
10 Harvey Kellsey Kiss My Boo Boo 3
11 Harvey Kellsey Perhaps It's a Glandular Problem 3
12 Sarah Buchman 1977! 3
13 Sarah Buchman 200 Years of German Humor 3
14 Sarah Buchman What Are The Civilian Applications? 3
15 Hallie Hull But I Did It Unconciously 2
16 Hallie Hull Perhaps It's a Glandular Problem 2
17 Christian Kells Ask Yor System Administrator 1
Goal: Find authors who have written books with more than 10,000 sales and show the average sales per author.
sqldf("WITH high_selling_titles AS ( SELECT title_id, sales FROM titles WHERE sales > 10000),authors_of_hits AS ( SELECT a.au_id, a.au_fname || ' ' || a.au_lname AS author, t.sales FROM high_selling_titles h JOIN title_authors ta ON h.title_id = ta.title_id JOIN authors a ON ta.au_id = a.au_id JOIN titles t ON h.title_id = t.title_id)SELECT author, AVG(sales) AS avg_salesFROM authors_of_hitsGROUP BY authorORDER BY avg_sales DESC")
Goal: Find top-earning books and list their authors. Top-earning = (sales × price) > 500,000.
sqldf("WITH book_earnings AS ( SELECT title_id, title_name, sales * price AS revenue FROM titles),top_books AS ( SELECT title_id, title_name, revenue FROM book_earnings WHERE revenue > 500000),top_books_authors AS ( SELECT tb.title_name, tb.revenue, a.au_fname || ' ' || a.au_lname AS author FROM top_books tb JOIN title_authors ta ON tb.title_id = ta.title_id JOIN authors a ON ta.au_id = a.au_id)SELECT * FROM top_books_authorsORDER BY revenue DESC")
title_name revenue author
1 I Blame My Mother 35929790.0 Wendy Heydemark
2 I Blame My Mother 35929790.0 Klee Hull
3 Exchange of Platitudes 1400008.0 Klee Hull
4 Spontaneous, Not Annoying 1299013.0 Wendy Heydemark
5 Ask Yor System Administrator 1025396.7 Christian Kells
6 Perhaps It's a Glandular Problem 752042.8 Hallie Hull
7 Perhaps It's a Glandular Problem 752042.8 Klee Hull
8 Perhaps It's a Glandular Problem 752042.8 Harvey Kellsey
The following is an example from the books database:
Goal: Given an author, find all other authors who co-authored any book with them, directly or through a chain of co-authors.
Let’s assume we start from author A03.
sqldf(" WITH RECURSIVE coauthors_cte(au_id, coauthor_id, level) AS ( -- Anchor: Get direct co-authors of A03 SELECT ta1.au_id, ta2.au_id, 1 FROM title_authors ta1 JOIN title_authors ta2 ON ta1.title_id = ta2.title_id WHERE ta1.au_id = 'A03' AND ta2.au_id != 'A03' UNION -- Recursive: Get co-authors of the previous level SELECT c.coauthor_id, ta2.au_id, level + 1 FROM coauthors_cte c JOIN title_authors ta1 ON c.coauthor_id = ta1.au_id JOIN title_authors ta2 ON ta1.title_id = ta2.title_id WHERE ta2.au_id != c.coauthor_id ) SELECT DISTINCT coauthor_id FROM coauthors_cte ORDER BY coauthor_id;")