May 2
This Thursday : I'm getting my 2nd vaccination on Wed, and so may be out of the loop the dafter after ... and so you may get the day off. I'll send out a note Wed night either way.
... start recording.
Questions about anything?
Search projects due today ... how did that go?
Final project thoughts?
This week I'd like to at least expose you to relational databases and SQL. We aren't going to
do much more than see some of the basic ideas, and I want you to be starting to work on your
final project ... so the assignment is fairly open ended.
"height balanced m-way tree"
main points :
- made up of nodes & leaf-nodes (terminal)
- each node has some number (up to n, depends on settings) of children and some number of keys , which define where to look for children
- "self balancing" : cannot turn into --- but stays spread out
- examples are usually 2-3 (two or three children) but in practical implementations may be many more (100 or so)
- the point is to keep the top parts in main memory, for fast access
- SQL databases may also have multiple b-trees for one table, as specified with an "index" command, to get fast lookup for some columns.
- splitting a node : promote a key upwards; divide into two.
- so tree depth only increases if root splits, promoting something up to new root
Here's a python search & insert code example.
What : SQL implementation models
- client/server systems (MySQL, PostgreSQL) are common parts of backend web applications
- the web app client "connects" (internet or local process communication) to the sql server
- good for scaling to large systems with distributed components
- all the complexities of authentication (who has access to what database)
- NoSQL - alternative types databases ; usually big distributed key:value stores
- SQLite
- free , open source , database=file (so not client/server)
- authentication is same as file access
- local read/write from a process ... simple but some scaling issues
SQLite
concepts
- tables linked with foreign keys
- security : insert values into queries correctly ; see "parameter substitution" and defense against the dark arts . (Don't be that xkcd Mom
- transactions : making sure that a batch of changes either all succeed or all fail
talking the talk
SQL is not one language - each application (SQLite, MySQL, PostgreSQL, ...) has their own variations,
though the basics are the same.
Here are some of the most important commands :
TABLE - define a table
SELECT - fetch some data
INSERT - add a new row of data
UPDATE - modify some data
A few quirks :
- sqlite3 syntax ignores case in commands and table names : "SELECT * FROM Person;" is the same as "select * from person;".
- sqlite3 has special "user" commands that start with a period, such as
.tables
. Most of these are for information or settings.
- All SQL statements end with a semicolon (;).
- The comment charcter is two dashes : -- this is a comment. The C syntax /* ... */ also works.
Combining information from several tables is where the syntax can get tricky. The options are :
- JOIN : connect tables using a primary=foreign key
- subquery : next a SELECT within another SELECT
There are a number of types of joins (left, right, inner, outer), differing mostly in what
to do for data which is in one of the two tables but not the other.
an example - umber (this course management system)
practice
There are some nice lessons at software carpentry ;
I think that lesson 1 (selecting), 7 (combining), and 10 (python) look interesting.
To get started with these on jupyter.bennington.college :
$ wget https://swcarpentry.github.io/sql-novice-survey/files/survey.db
$ sqlite3 survey.db
sqlite>
I'll walk us through some what's on the resources page.
For your homework for next Monday, I'd like you to try some of these and describe what you did.