flowchart TD
U[Student using browser or app]
W[Web server program: receives requests, runs business logic, returns responses]
D[(Database server program: stores and retrieves data about courses, sections, seats, students, and enrollments)]
U -->|Step 1: request| W
W -->|Step 2: SQL query| D
D -->|Step 3: results/status| W
W -->|Step 4: response| U
27 27. “databases” and “database management systems”
A “database (DB)” is a collection of data that can be accessed by a computer programs (see below) in an accurate and efficient way.
A “Database Management System (DBMS)” is the software that is used to enable the storage and retrieval of the data in a database.
27.1 Common features of databases
Most databases and DBMS software provide the following features:
Able to be contolled by other software - Databases and DBMS software are usually used as part of a larger software system. Software that is coded in programming languages such as R, Python, Java, etc. have the ability to interact with a DBMS in order to store an retrieve data from a database.
For example, websites that manipulate a lot of data generally consist of a “web server software program” that users communicate with. The “web server software” program in turn communicates with a DBMS in order to store and retrieve data for each user. For example, facebook.com makes use of one or more databases to store and retrieve messages, friend information, advertisements, etc. The web server program retrieves the information it needs from a database and uses that information to create a web page that is then returned to the user and displayed in the browser.
Reliable and Accurate Data – Databases have built-in rules to keep data correct and consistent. They prevent mistakes like duplicate entries or missing information by ensuring that all data follows set guidelines.
Multi User Access – Databases allow many people to access and update data at the same time without causing errors or losing information. They use special features to keep everything organized, even when multiple users make changes at once.
Handles Tremendous Amounts of Data – Databases are designed to store and manage huge amounts of data efficiently. They remain fast and responsive even as the amount of data grows.
Security – Databases have security settings to control who can see or change the data.
Reliability - Databases have safeguards to ensure that changes are properly saved and that information isn’t lost, even if something goes wrong (e.g. a blackout, earthquake, etc.).
Note that Excel is not usually considered a “database”. Excel is FANTASTIC as a tool for quickly gaining insights into data. However, Excel does not have most of the features listed above that describe typical databases.
27.2 Excel vs databases: what is the difference?
Excel and relational databases both organize information into rows and columns, so they can look similar at first. But they are designed for different goals.
Excel is primarily person-first software: A human usually opens a workbook, edits cells, adds formulas, and builds charts.
Databases are primarily application-first systems: A software program (website, app, API, report process, etc.) usually sends commands to a DBMS to read and write data.
In other words, Excel is an excellent analysis and presentation tool, while a database is a better long-term system for storing and serving data reliably.
27.2.1 Quick comparison
| Topic | Excel | Relational Database (DBMS) |
|---|---|---|
| Typical use | Interactive analysis by people | Data storage/retrieval for software + teams |
| Main interface | Spreadsheet UI | SQL (Structured Query Language) + admin tools |
| Data size | Great for small-to-medium data | Designed for very large data volumes |
| Number of users editing at once | Limited and can get messy | Built for many concurrent users safely |
| Validation rules | Basic validation options (useful, but limited) | Strong constraints/rules to enforce quality |
| Structure | Workbook with sheets | Database with multiple related tables |
| How related data is organized | Often spread across multiple sheets, with manual links | Related tables are connected with keys, reducing duplication |
| Where data usually lives | Usually a file (local drive/shared folder/cloud file app) | Usually a database server (on a network or in the cloud) |
| Repeatability of data tasks | Heavily dependent on workbook formulas and manual steps | Queries/scripts can be saved and rerun consistently |
| Reliability | Easier to break accidentally | Transaction and recovery features reduce data loss |
| Security | File-level sharing controls | Fine-grained user/role permissions |
27.2.2 Why this matters in practice
Scale
Excel can become slow and difficult to manage as data grows. Databases are engineered to keep performance reasonable even with large datasets.Data quality
Excel does support validation (for example, requiring GPA to be between 0 and 4). But database systems provide stronger, centralized rules (types, keys, constraints, relationships) so all applications follow the same standards.Many people working at once
Spreadsheets can be hard to coordinate when many people update data. Databases are designed for concurrent access and include locking/transaction logic to prevent collisions and inconsistent results.Program integration
A DBMS is designed to be queried by programs using SQL. This makes databases a better fit for websites and business systems that must continuously read and update data.Relationships between topics
In many real systems, data naturally spans multiple related topics (for example: students, courses, and grades). Databases are built to connect these topics cleanly across tables, while spreadsheets often require more manual linking across tabs.Consistency over time
Spreadsheet workflows can drift over time (formula edits, copied tabs, and one-off manual fixes). Database queries and rules are easier to standardize, so teams are more likely to get the same result each time.
27.2.3 A helpful rule of thumb
- Use Excel when the goal is quick exploration, ad hoc analysis, and visualization by a person or a small team.
- Use a database when data must be shared across applications, edited by many users, protected by strong rules, and stored reliably over time.
27.3 Architecture of software that uses a database
Many modern software systems are “multi-user” systems. That means lots of users can use the same system at the same time.
Examples include:
- social media sites (e.g. Facebook)
- shopping sites (e.g. Amazon)
- school systems (e.g. a university course registration system)
- banking and payment systems
Even though these systems can look very different, they usually share a similar high-level architecture:
- A client (usually a browser or mobile app) sends a request.
- A web server program receives that request and runs business logic.
- The web server program asks a database server program (DBMS) for data.
- The DBMS reads/writes data in the database and returns results.
- The web server program sends a response back to the client.
27.3.1 One concrete example: university course registration
Suppose a student logs in and searches for open courses in Fall 2026.
- The student clicks “Search” in a browser.
- The web server receives the request (for example, with filters like department, course level, and meeting time).
- The web server sends a SQL query to the DBMS.
- The DBMS returns matching course records, seat counts, and instructor details.
- The web server formats that information and sends an HTML/JSON response to the browser.
- The student sees a list of available courses.
When the student clicks “Enroll”, another request is sent and the same architecture is used again. The DBMS updates enrollment data and the web server returns a success/failure message.
Where the numbered arrows mean:
Request from client to web server
The student submits an action in the browser/app (for example, search or enroll).Web server asks the DBMS for data/action
The web server sends a SQL command to retrieve data or update records.DBMS returns results to web server
The DBMS returns matching rows or update status (success/failure).Web server sends final response to client
The web server formats the result and returns it to the browser/app.
27.3.2 “Single computer” mental model vs. real deployments
For learning, it is perfectly fine to imagine the web server program and DBMS running on one computer. That simple model helps explain the core idea.
In real production systems, organizations often use technical strategies (that we will not cover here) to run multiple web servers and multiple database servers for scale, reliability, and fault tolerance.
So the simple model is still conceptually correct; real systems are usually larger versions of the same pattern.
27.4 Different categories of databases
There are different types of databases and Database Management System (DBMS) software. These types can be categorized by how their data is structured.
Relational databases (sometimes referred to as “SQL” databases) arrange data in “tables” that are comprised of “rows” and “columns” (similar to an R dataframe or an Excel spreadsheet).
Non-Relational databases (often referred to as “NoSQL” databases) structure data in other forms (e.g. JSON or graphs - see below)
27.5 What is a “Relational Database”
Relational databases store information in “tables” that have rows and columns of information - very similar to the way the data is arranged in an R dataframe. A single relational database is a collection of one or more tables. Each column of a relational database table (a) has a name (b) has a datatype (e.g. numeric, logical, character, etc)
There are MANY different relational database products in use today. Some very popular ones are:
- Postgres
- MySql
- MariaDb
- Microsoft SQL Server
- Oracle Relational Database Management System
- SQLite
- many, many others
27.6 What is SQL (often pronounced “sequel”)
Relational database software uses Structured Query Language (SQL) to manipulate database contents. SQL commands are categorized as follows:
Relational database software uses the Structured Query Language (SQL) (often pronounced “sequel”) to manipulate the contents of the database. Commands in the SQL language are often grouped into the following categories:
(For this class, we will focus on the first two categories listed below, i.e. DDL and DML)
| Category | Commands |
|---|---|
|
DDL (Data Definition Language) Commands Defines and modifies database structure. |
CREATE DATABASE – Creates a new database. CREATE TABLE – Creates a new table in an existing database. DROP TABLE – Removes an entire table from a database.
|
|
DML (Data Manipulation Language) Commands Manipulates data within tables. |
SELECT – retrieves data from database tables. The SELECT statement has many different options that allow the data to be retireved in very customized formats and arrangements. E.g. SELECT can be used to summarize data, retrieve only data that has specific attributes, display data in specific order, etc.INSERT – Adds new rows to a table. DELETE – Removes rows from a table. UPDATE – Modifies existing data in a table.
|
|
DCL (Data Control Language) Commands Manages user permissions. |
GRANT – Give permission to specific users to access specific data in the database.REVOKE – Remove permission from specific users to access specific info in the database.
|
|
TCL (Transaction Control Language) Commands Transactions ensure that a group of operations either all succeed or none take effect. If a failure occurs, the system ensures consistency using transaction control commands. (* see below for more info about transactions) |
|
|
(* transactions explained more fully): Most Relational databases offer “transaction control”. A transaction is a set of operations that are supposed to either all happen as a group or not happen at all. For example, when processing a money transfer between two bank accounts, the following steps must happen: Step 1: remove money from account 1 |
|
(For this class, we will mostly focus on the DDL an DML categories of commands. Most of our time will be spent studying the numerous features of the SELECT statement)
27.7 Different “flavors” of SQL
Each relational database software product has its own “flavor” of the SQL language. You can think of this as different “dialects” of the same language. For example the difference between English as it’s used in the USA vs as it is used in England.
ANSI (American National Standards Institute) publishes a SQL Standard that is generally recognized by all relational database software programs.
However, most relational database products add additional capabilities and commands to its version of SQL to differentiate it from other relational database products in the market.
In this class we will be working with the SQLite version of SQL. However, we will try to focus primarily on the standard SQL features that are included in all relational database products and not on the extensions that are provided only by SQLite. When appropriate, we will try to highlight differences between the way that SQLite and other RDBMS products work.
The ANSI standard was first introduced in 1986 and has been reissued every few years since then. The following gives a flavor of features that were introduced or “deprecated” (keep reading) in different versions of the ANSI standard. To “deprecate” a feature means to recommend that it not be used as the feature might actually be removed in a future version.
27.8 Non-Relational Databases (AKA NoSQL databases)
NoSQL (or non-relational) databases organize information in formats other than rows and columns.
SOME NoSql databases actually CAN be manipulated with the SQL language. However, the term NoSql has come to include any database that stores information in a form other than in strict rows and columns.
Some examples are shown below.
27.9 Document databases - e.g. MongoDB
MongoDb is an example of Database Management software that stores information in “JSON” notation. Other similar products include CouchDB, Firebase and Firestore. We will explore more about JSON later in the semester. However as an example, the above information could be represented in a JSON file as shown below:
[
{
"name" : "annie",
"friends" : [ "sam", "jack" ]
},
{
"name" : "jack",
"friends" : [ "annie" , "mac" , "doug" , "harry" ]
},
{
"name" : "harry",
"friends" : [ "jack" ]
},
{
"name" : "sam",
"friends" : [ "annie" ]
},
{
"name" : "mac",
"friends" : [ "jack" ]
},
{
"name" : "harry",
"friends" : [ "jack" ]
},
{
"name" : "doug",
"friends" : [ "jack" ]
}
]
27.10 Graph databases - e.g. Neo4J
Neo4J is an example of a “graph database”. These types of databases organize their information in a “graph” structure. A graph looks like a “web” of information. It is comprised of “nodes” that contain data which are connected via “edges” that represent relationships between the nodes.
For example:
graph TD
%% Position nodes to match example layout
sam([sam])
annie([annie])
mac([mac])
jack([jack])
doug([doug])
harry([harry])
%% Create bidirectional connections with straight lines
annie <-->|friend| sam
jack <-->|friend| annie
jack <-->|friend| mac
jack <-->|friend| doug
jack <-->|friend| harry
%% Black and white styling
classDef person fill:white,stroke:black,stroke-width:1px;
class sam,annie,mac,jack,doug,harry person;
%% Force specific layout
sam ~~~ annie
mac ~~~ jack ~~~ doug
jack ~~~ harry
%% Make invisible linkage lines transparent
linkStyle 5,6,7 stroke:none;
This example shows a set of people and who is friends with whom. There is a different icon for each person. “Friend arrows” indicate who is friends with whom.
For many types of information a diagram as shown above is much more intuitive and understandable than a table of information. This is just one reason why graph databases are popular.
However, this same information could be converted to the following table and stored in a relational database.
name is_friends_with
---- ---------------
sam annie
annie sam
annie jack
jack annie
jack mac
jack doug
jack harry
doug jack
harry jack
mac jack
The following is version of the graph also includes “company” nodes and relationships indicating where people work. In general, graphs can be quite complex and be used to store all sorts of information.
graph LR
%% Define nodes with type labels
sam([Person: sam])
annie([Person: annie])
mac([Person: mac])
jack([Person: jack])
doug([Person: doug])
harry([Person: harry])
Acme([Company: Acme Inc.])
MikroTek([Company: MikroTek])
%% Friend relationships (bidirectional)
annie <-->|friend| sam
jack <-->|friend| annie
jack <-->|friend| mac
jack <-->|friend| doug
jack <-->|friend| harry
%% Work relationships
sam -->|works at| Acme
annie -->|works at| Acme
mac -->|works at| MikroTek
jack -->|works at| MikroTek
doug -->|works at| MikroTek
%% Styling for nodes - white background with black borders and text
classDef person fill:white,stroke:black,stroke-width:1px,color:brown;
classDef company fill:white,stroke:black,stroke-width:2px,stroke-dasharray:5 2,color:green;
%% Apply styles
class sam,annie,mac,jack,doug,harry person;
class Acme,MikroTek company;
27.11 Other types of NoSQL databases
The are other types of NoSQL databases. The following are also popular. We will not go into these in more detail now.
The main thing that NoSQL databases have in common is that the data is NOT stored in a rectangular form (i.e. rows and columns).
- Key-Value Stores
- Store data as a collection of key-value pairs.
- Fast reads and writes, ideal for caching and session management.
- Examples: Redis, Amazon DynamoDB, Riak
- Store data as a collection of key-value pairs.
- Column-Family Stores
- Organize data into columns rather than rows, optimized for analytical queries.
- Designed for high scalability and distributed storage.
- Examples: Apache Cassandra, HBase, ScyllaDB
- Organize data into columns rather than rows, optimized for analytical queries.