14-Database

Install SQLite

Install Chrome plugin for SQLite Database

SQLite Browser

Relational Databases

http://en.wikipedia.org/wiki/Relational_database

Relational databases model data by storing rows and columns in tables. The power of the relational database lies in its ability to efficiently retrieve data from those tables, in particular when a query involves multiple tables and the relationships between them.

Terminology

• Database - contains many tables

• Relation (or table) - contains tuples and attributes

• Tuple (or row) - a set of fields that generally represents an “object” like a person or a music track

• Attribute (also column or field) - one of possibly many elements of data corresponding to the object represented by the row

A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints. (Wikipedia)

SQL

• Structured Query Language is the language we use to issue commands to the database

• Create a table

• Retrieve some data

• Insert data

• Delete data

Two Roles in Large Projects

• Application Developer - Builds the logic for the application, the look and feel of the application - monitors the application for problems

• Database Administrator - Monitors and adjusts the database as the program runs in production

• Often both people participate in the building of the “Data model”

Database Administrator

http://en.wikipedia.org/wiki/Database_administrator

A database administrator (DBA) is a person responsible for the design, implementation, maintenance, and repair of an organization’s database. The role includes the development and design of database strategies, monitoring and improving database performance and capacity, and planning for future expansion requirements. They may also plan, coordinate, and implement security measures to safeguard the database.

Database Model

http://en.wikipedia.org/wiki/Database_model

A database model or database schema is the structure or format of a database, described in a formal language supported by the database management system. In other words, a “database model” is the application of a data model when used in conjunction with a database management system.

Common Database Systems

• Three major Database Management Systems in wide use

• Oracle - Large, commercial, enterprise-scale, very tweakable

• MySQL - Simpler but very fast and scalable - commercial open source

• SQL Server - Very nice - from Microsoft (also Access)

• Many other smaller projects, free and open source

• HSQL, SQLite, Postgres, ...

SQLite Browser

• SQLite is a very popular database - it is free and fast and small

• SQLite Browser allows us to directly manipulate SQLite files

• http://sqlitebrowser.org/

• SQLite is embedded in Python and a number of other languages
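Because SQLite ships with Python, a quick check from the interpreter needs no separate install. The sketch below assumes the standard-library sqlite3 module and uses an in-memory database; the variable names are illustrative only.

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes when the
# connection closes. Passing a file name instead would create (or open)
# a database file on disk that SQLite Browser can also open.
conn = sqlite3.connect(":memory:")

# Ask the SQLite library which version Python is using.
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
conn.close()

print("SQLite version:", version)
```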

Start Simple - A Single Table

CREATE TABLE Users(

name VARCHAR(128),

email VARCHAR(128)

)
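One way to run this statement is from Python. This is a minimal sketch assuming the standard-library sqlite3 module and an in-memory database (both are choices made here for illustration, not something the slide prescribes). Note that SQLite accepts VARCHAR(128) but does not enforce the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The same CREATE TABLE statement as on the slide.
cur.execute("""
    CREATE TABLE Users(
        name VARCHAR(128),
        email VARCHAR(128)
    )
""")

# sqlite_master is SQLite's built-in catalog of tables and indexes.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur.fetchall()]
conn.close()

print(tables)
```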

SQL: Insert

• The Insert statement inserts a row into a table

INSERT INTO Users (name, email) VALUES ('Kristin', 'kf@umich.edu')
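When the values come from a program rather than being typed by hand, it is usually safer to pass them separately from the SQL text. A hedged sketch using Python's sqlite3 module and its `?` placeholders (the in-memory database is an assumption for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users(name VARCHAR(128), email VARCHAR(128))")

# Each ? is filled in from the tuple, so quotes inside the data
# cannot be mistaken for part of the SQL statement.
cur.execute(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    ("Kristin", "kf@umich.edu"),
)
conn.commit()

cur.execute("SELECT name, email FROM Users")
rows = cur.fetchall()
conn.close()

print(rows)
```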

SQL: Delete

• The Delete statement removes the rows in a table that match a selection criterion

DELETE FROM Users WHERE email='ted@umich.edu'
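A small sketch of the same DELETE run from Python's sqlite3 module. The two sample rows are set up only for the demo; `rowcount` reports how many rows the WHERE clause matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users(name VARCHAR(128), email VARCHAR(128))")
cur.executemany(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    [("Ted", "ted@umich.edu"), ("Kristin", "kf@umich.edu")],
)

# Only rows whose email matches are removed; without a WHERE clause
# every row in the table would be deleted.
cur.execute("DELETE FROM Users WHERE email = ?", ("ted@umich.edu",))
deleted = cur.rowcount
conn.commit()

remaining = cur.execute("SELECT name FROM Users").fetchall()
conn.close()

print(deleted, remaining)
```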

SQL: Update

• The Update statement changes a field in the rows selected by a WHERE clause

UPDATE Users SET name='Charles' WHERE email='csev@umich.edu'
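A sketch of the UPDATE in Python's sqlite3 module. The starting name 'Chuck' is made-up sample data so there is something to change; only the row with the matching email is touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users(name VARCHAR(128), email VARCHAR(128))")
cur.executemany(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    [("Chuck", "csev@umich.edu"), ("Kristin", "kf@umich.edu")],
)

# SET says what to change, WHERE says which rows to change.
cur.execute(
    "UPDATE Users SET name = ? WHERE email = ?",
    ("Charles", "csev@umich.edu"),
)
conn.commit()

users = cur.execute("SELECT name, email FROM Users ORDER BY email").fetchall()
conn.close()

print(users)
```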

Retrieving Records: Select

• The select statement retrieves a group of records - you can either retrieve all the records or a subset of the records with a WHERE clause

SELECT * FROM Users

SELECT * FROM Users WHERE email='csev@umich.edu'
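The two SELECT forms can be compared side by side. A minimal sketch with Python's sqlite3 module, using three sample rows assembled from the names that appear elsewhere in these slides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users(name VARCHAR(128), email VARCHAR(128))")
cur.executemany(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    [
        ("Kristin", "kf@umich.edu"),
        ("Charles", "csev@umich.edu"),
        ("Ted", "ted@umich.edu"),
    ],
)

# No WHERE clause: every row comes back.
all_rows = cur.execute("SELECT * FROM Users").fetchall()

# With a WHERE clause: only the matching subset comes back.
matching = cur.execute(
    "SELECT * FROM Users WHERE email = ?", ("csev@umich.edu",)
).fetchall()
conn.close()

print(len(all_rows), matching)
```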

Sorting with ORDER BY

• You can add an ORDER BY clause to SELECT statements to get the results sorted in ascending or descending order

SELECT * FROM Users ORDER BY email

SELECT * FROM Users ORDER BY name
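The slide shows ascending order only; descending order uses the standard DESC keyword. A short sketch in Python's sqlite3 module with the same kind of sample rows (set up here just for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users(name VARCHAR(128), email VARCHAR(128))")
conn.executemany(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    [
        ("Kristin", "kf@umich.edu"),
        ("Ted", "ted@umich.edu"),
        ("Charles", "csev@umich.edu"),
    ],
)

# Ascending is the default sort direction.
by_email = [row[0] for row in conn.execute(
    "SELECT name FROM Users ORDER BY email")]

# DESC reverses the direction for that column.
by_name_desc = [row[0] for row in conn.execute(
    "SELECT name FROM Users ORDER BY name DESC")]
conn.close()

print(by_email)
print(by_name_desc)
```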

SQL Summary

SELECT * FROM Users

SELECT * FROM Users WHERE email='csev@umich.edu'

UPDATE Users SET name='Charles' WHERE email='csev@umich.edu'

INSERT INTO Users (name, email) VALUES ('Kristin', 'kf@umich.edu')

DELETE FROM Users WHERE email='ted@umich.edu'

SELECT * FROM Users ORDER BY email

This is not too exciting (so far)

• Tables pretty much look like big fast programmable spreadsheets with rows, columns, and commands

• The power comes when we have more than one table and we can exploit the relationships between the tables

Complex Data Models and Relationships

Database Design

• Database design is an art form of its own with particular skills and experience

• Our goal is to avoid the really bad mistakes and design clean and easily understood databases

• Others may tune performance later

• Database design starts with a picture...

Building a Data Model

• Drawing a picture of the data objects for our application and then figuring out how to represent the objects and their relationships

• Basic Rule: Don’t put the same string data in twice - use a relationship instead

• When there is one thing in the “real world” there should be one copy of that thing in the database

For each “piece of info”...

• Is the column an object or an attribute of another object?

• Once we define objects, we need to define the relationships between objects.

Representing Relationships in a Database

Database Normalization (3NF)

• There is *tons* of database theory - way too much to understand without excessive predicate calculus

• Do not replicate data - reference data - point at data

• Use integers for keys and for references

• Add a special “key” column to each table which we will make references to. By convention, many programmers call this column “id”

http://en.wikipedia.org/wiki/Database_normalization
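To make "do not replicate data - reference data" concrete, here is a hedged sketch that takes flat (title, artist name) pairs and stores each artist name once, pointing at it with an integer. The simplified two-table schema, the UNIQUE constraint on the name, and the use of INSERT OR IGNORE are assumptions made for this example, not taken from the slides.

```python
import sqlite3

# Flat data: the artist string is repeated on every row.
flat = [
    ("Black Dog", "Led Zeppelin"),
    ("Stairway", "Led Zeppelin"),
    ("Who Made Who", "AC/DC"),
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Artist (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT UNIQUE
);
CREATE TABLE Track (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title TEXT,
    artist_id INTEGER
);
""")

for title, artist in flat:
    # Store the artist name only if it is not already there ...
    cur.execute("INSERT OR IGNORE INTO Artist (name) VALUES (?)", (artist,))
    # ... then look up its integer key and reference that instead.
    cur.execute("SELECT id FROM Artist WHERE name = ?", (artist,))
    artist_id = cur.fetchone()[0]
    cur.execute(
        "INSERT INTO Track (title, artist_id) VALUES (?, ?)",
        (title, artist_id),
    )
conn.commit()

artist_count = cur.execute("SELECT COUNT(*) FROM Artist").fetchone()[0]
pairs = cur.execute("""
    SELECT Track.title, Artist.name
    FROM Track JOIN Artist ON Track.artist_id = Artist.id
    ORDER BY Track.id
""").fetchall()
conn.close()

print(artist_count, pairs)
```

Three tracks end up referencing two artist rows, and the join reproduces the original flat view on demand.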

Key Terminology

Finding our way around....

Three Kinds of Keys

• Primary key - generally an integer autoincrement field

• Logical key - What the outside world uses for lookup

• Foreign key - generally an integer key pointing to a row in another table

Primary Key Rules

Best practices

• Never use your logical key as the primary key

• Logical keys can and do change, albeit slowly

• Relationships that are based on matching string fields are less efficient than those based on matching integers
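A sketch of why the logical key should not double as the primary key. The Account and Note tables are hypothetical, invented for this example: the email is the logical key, other rows refer to the integer id, and so the email can change without touching any reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Account (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,  -- primary key
    email TEXT UNIQUE                               -- logical key
);
CREATE TABLE Note (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER,  -- refers to Account.id, not to the email
    body TEXT
);
""")

cur.execute("INSERT INTO Account (email) VALUES (?)", ("csev@umich.edu",))
account_id = cur.lastrowid  # the integer key SQLite just assigned
cur.execute(
    "INSERT INTO Note (account_id, body) VALUES (?, ?)",
    (account_id, "hello"),
)

# The logical key changes; only one row in one table is updated.
cur.execute(
    "UPDATE Account SET email = ? WHERE id = ?",
    ("chuck@example.com", account_id),
)
conn.commit()

result = cur.execute("""
    SELECT Account.email, Note.body
    FROM Note JOIN Account ON Note.account_id = Account.id
""").fetchall()
conn.close()

print(result)
```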

Foreign Keys

• A foreign key is a column in one table that holds the primary key of a row in another table.

• When all primary keys are integers, then all foreign keys are integers - this is good - very good
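The slides treat foreign keys as a convention, but SQLite can also check them. The sketch below is an assumption-laden illustration: it adds a REFERENCES clause that the slide schema does not have, assumes a simple Artist table, and turns on `PRAGMA foreign_keys`, which SQLite leaves off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Artist (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        name TEXT
    )
""")
conn.execute("""
    CREATE TABLE Album (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        artist_id INTEGER REFERENCES Artist(id),  -- declared foreign key
        title TEXT
    )
""")

conn.execute("INSERT INTO Artist (name) VALUES ('AC/DC')")           # id 1
conn.execute("INSERT INTO Album (title, artist_id) VALUES ('Who Made Who', 1)")

# artist_id 99 points at no Artist row, so the insert is refused.
try:
    conn.execute("INSERT INTO Album (title, artist_id) VALUES ('Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

album_count = conn.execute("SELECT COUNT(*) FROM Album").fetchone()[0]
conn.close()

print(rejected, album_count)
```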

Relationship Building (in tables)

CREATE TABLE Genre (

id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,

name TEXT

)

CREATE TABLE Album (

id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,

artist_id INTEGER,

title TEXT

)

CREATE TABLE Track (

id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,

title TEXT,

album_id INTEGER,

genre_id INTEGER,

len INTEGER, rating INTEGER, count INTEGER

)
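When rows are inserted from a program, the integer keys do not have to be typed by hand. This sketch creates the tables above with Python's sqlite3 module and wires up the references using `lastrowid`. The Artist table is not shown on the slide; its definition here is an assumption modeled on Genre.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Artist (  -- assumed; not shown on the slide
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT
);
CREATE TABLE Genre (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT
);
CREATE TABLE Album (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    artist_id INTEGER,
    title TEXT
);
CREATE TABLE Track (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title TEXT,
    album_id INTEGER,
    genre_id INTEGER,
    len INTEGER, rating INTEGER, count INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cur = conn.cursor()

# After each INSERT, lastrowid holds the primary key SQLite assigned.
cur.execute("INSERT INTO Artist (name) VALUES (?)", ("AC/DC",))
artist_id = cur.lastrowid
cur.execute("INSERT INTO Genre (name) VALUES (?)", ("Metal",))
genre_id = cur.lastrowid
cur.execute(
    "INSERT INTO Album (title, artist_id) VALUES (?, ?)",
    ("Who Made Who", artist_id),
)
album_id = cur.lastrowid
cur.execute(
    "INSERT INTO Track (title, rating, len, count, album_id, genre_id) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("About to Rock", 5, 313, 0, album_id, genre_id),
)
conn.commit()

track = cur.execute("SELECT title, album_id, genre_id FROM Track").fetchone()
conn.close()

print(track)
```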

Using Join Across Tables

Relational Power

• By removing the replicated data and replacing it with references to a single copy of each bit of data we build a “web” of information that the relational database can read through very quickly - even for very large amounts of data

• Often when you want some data it comes from a number of tables linked by these foreign keys

The JOIN Operation

• The JOIN operation links across several tables as part of a select operation

• You must tell the JOIN how to use the keys that make the connection between the tables using an ON clause
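To see what the ON clause does, it helps to run a JOIN with and without it. In this sketch (Python's sqlite3 module, with a Track table cut down to the columns needed for the demo), omitting ON pairs every track with every genre, while ON keeps only the pairs whose keys match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Genre (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT
);
CREATE TABLE Track (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title TEXT,
    genre_id INTEGER
);
INSERT INTO Genre (name) VALUES ('Rock');
INSERT INTO Genre (name) VALUES ('Metal');
INSERT INTO Track (title, genre_id) VALUES ('Black Dog', 1);
INSERT INTO Track (title, genre_id) VALUES ('Who Made Who', 2);
""")

# Without ON: all combinations (2 tracks x 2 genres).
all_pairs = conn.execute(
    "SELECT Track.title, Genre.name FROM Track JOIN Genre"
).fetchall()

# With ON: only rows where the foreign key matches the primary key.
matched = conn.execute("""
    SELECT Track.title, Genre.name
    FROM Track JOIN Genre ON Track.genre_id = Genre.id
    ORDER BY Track.title
""").fetchall()
conn.close()

print(len(all_pairs), matched)
```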

It can get complex...

select Track.title, Artist.name, Album.title, Genre.name

from Track join Genre join Album join Artist

on Track.genre_id = Genre.id and Track.album_id = Album.id and Album.artist_id = Artist.id
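A runnable sketch of a four-table join using Python's sqlite3 module and the sample rows from the handout. Two things here are assumptions: the Artist table definition (not shown in the slides), and the choice to give each JOIN its own ON clause, which for these inner joins should return the same rows as the single combined ON clause above while being easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT);
CREATE TABLE Genre (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, name TEXT);
CREATE TABLE Album (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
                    artist_id INTEGER, title TEXT);
CREATE TABLE Track (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
                    title TEXT, album_id INTEGER, genre_id INTEGER,
                    len INTEGER, rating INTEGER, count INTEGER);

INSERT INTO Artist (name) VALUES ('Led Zeppelin');
INSERT INTO Artist (name) VALUES ('AC/DC');
INSERT INTO Genre (name) VALUES ('Rock');
INSERT INTO Genre (name) VALUES ('Metal');
INSERT INTO Album (title, artist_id) VALUES ('Who Made Who', 2);
INSERT INTO Album (title, artist_id) VALUES ('IV', 1);
INSERT INTO Track (title, rating, len, count, album_id, genre_id)
    VALUES ('Black Dog', 5, 297, 0, 2, 1);
INSERT INTO Track (title, rating, len, count, album_id, genre_id)
    VALUES ('Stairway', 5, 482, 0, 2, 1);
INSERT INTO Track (title, rating, len, count, album_id, genre_id)
    VALUES ('About to Rock', 5, 313, 0, 1, 2);
INSERT INTO Track (title, rating, len, count, album_id, genre_id)
    VALUES ('Who Made Who', 5, 207, 0, 1, 2);
""")

# Follow each foreign key to the table it points at.
rows = conn.execute("""
    SELECT Track.title, Artist.name, Album.title, Genre.name
    FROM Track
    JOIN Genre  ON Track.genre_id = Genre.id
    JOIN Album  ON Track.album_id = Album.id
    JOIN Artist ON Album.artist_id = Artist.id
    ORDER BY Track.title
""").fetchall()
conn.close()

for row in rows:
    print(row)
```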

Many-To-Many Relationships

Many to Many

• Sometimes we need to model a relationship that is many-to-many

• We need to add a "connection" table with two foreign keys

• There is usually no separate primary key

Insert Users and Courses

INSERT INTO User (name, email) VALUES ('Jane', 'jane@tsugi.org');

INSERT INTO User (name, email) VALUES ('Ed', 'ed@tsugi.org');

INSERT INTO User (name, email) VALUES ('Sue', 'sue@tsugi.org');

INSERT INTO Course (title) VALUES ('Python');

INSERT INTO Course (title) VALUES ('SQL');

INSERT INTO Course (title) VALUES ('PHP');
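A sketch of how the connection table ties these rows together, using Python's sqlite3 module. The table definitions, the name Member for the connection table, and the specific enrollments are all assumptions made for illustration; the slides above show only the INSERT statements for User and Course. The connection table has two foreign keys and uses the pair as its primary key rather than a separate id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT,
    email TEXT UNIQUE
);
CREATE TABLE Course (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title TEXT
);
CREATE TABLE Member (              -- hypothetical connection table
    user_id INTEGER,
    course_id INTEGER,
    PRIMARY KEY (user_id, course_id)
);

INSERT INTO User (name, email) VALUES ('Jane', 'jane@tsugi.org');
INSERT INTO User (name, email) VALUES ('Ed', 'ed@tsugi.org');
INSERT INTO User (name, email) VALUES ('Sue', 'sue@tsugi.org');
INSERT INTO Course (title) VALUES ('Python');
INSERT INTO Course (title) VALUES ('SQL');
INSERT INTO Course (title) VALUES ('PHP');
""")

# Made-up enrollments: Jane (1) takes Python (1) and SQL (2),
# Ed (2) takes Python (1). Each row links one user to one course.
conn.executemany(
    "INSERT INTO Member (user_id, course_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1)],
)
conn.commit()

roster = conn.execute("""
    SELECT User.name, Course.title
    FROM Member
    JOIN User   ON Member.user_id = User.id
    JOIN Course ON Member.course_id = Course.id
    ORDER BY User.name, Course.title
""").fetchall()
conn.close()

print(roster)
```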

Complexity Enables Speed

• Complexity makes speed possible and allows you to get very fast results as the data size grows

• By normalizing the data and linking it with integer keys, the overall amount of data which the relational database must scan is far lower than if the data were simply flattened out

• It might seem like a tradeoff - spend some time designing your database so it continues to be fast when your application is a success

Additional SQL Topics

• Indexes improve access performance for things like string fields

• Constraints on data - (cannot be NULL, etc..)

• Transactions - allow SQL operations to be grouped and done as a unit
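The three topics above can be tried together in a few lines. This is a minimal sketch with Python's sqlite3 module; the table layout and the index name `users_email` are made up for the example. Using the connection as a context manager commits the grouped statements on success and rolls them all back if one fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Constraint: name may not be NULL.
conn.execute("""
    CREATE TABLE Users (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        email TEXT
    )
""")

# Index: speeds up lookups on the email string column.
conn.execute("CREATE INDEX users_email ON Users (email)")

# Transaction: both inserts succeed together or neither is kept.
try:
    with conn:
        conn.execute(
            "INSERT INTO Users (name, email) VALUES (?, ?)",
            ("Kristin", "kf@umich.edu"),
        )
        # Violates NOT NULL, so the whole group is rolled back.
        conn.execute(
            "INSERT INTO Users (name, email) VALUES (?, ?)",
            (None, "nobody@umich.edu"),
        )
    failed = False
except sqlite3.IntegrityError:
    failed = True

count = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
conn.close()

print(failed, count)
```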

Summary

• Relational databases allow us to scale to very large amounts of data

• The key is to have one copy of any data element and use relations and joins to link the data to multiple places

• This greatly reduces the amount of data which must be scanned when doing complex operations across large amounts of data

• Database and SQL design is a bit of an art form

Python for Informatics Database Handout

Download and install Firefox and the SQLite Manager

http://www.mozilla.org/en-US/firefox/new/

https://addons.mozilla.org/en-US/firefox/addon/sqlitemanager/

Queries

insert into Users (name, email) values ('Ted', 'ted@umich.edu')

delete from Users where email='ted@umich.edu'

update Users set name='Charles' where email='csev@umich.edu'

select * from Users

select * from Users where email='csev@umich.edu'

select * from Users order by email

insert into Artist (name) values ('Led Zeppelin')

insert into Artist (name) values ('AC/DC')

insert into Genre (name) values ('Rock')

insert into Genre (name) values ('Metal')

insert into Album (title, artist_id) values ('Who Made Who', 2)

insert into Album (title, artist_id) values ('IV', 1)

insert into Track (title, rating, len, count, album_id, genre_id) values ('Black Dog', 5, 297, 0, 2, 1)

insert into Track (title, rating, len, count, album_id, genre_id) values ('Stairway', 5, 482, 0, 2, 1)

insert into Track (title, rating, len, count, album_id, genre_id) values ('About to Rock', 5, 313, 0, 1, 2)

insert into Track (title, rating, len, count, album_id, genre_id) values ('Who Made Who', 5, 207, 0, 1, 2)

select Track.title, Genre.name from Track join Genre on Track.genre_id = Genre.id

select Track.title, Artist.name, Album.title, Genre.name from Track join Genre join Album join Artist on Track.genre_id = Genre.id and Track.album_id = Album.id and Album.artist_id = Artist.id