Questions about anything?
Today's assignment was to try "by hand" some of the graph "shortest" algorithms we looked at last week: Dijkstra's & Floyd-Warshall (shortest path) and Prim's & Kruskal (min spanning tree).
We didn't spend much time on Prim or Kruskal last week - depending on interest we may do a "by hand" example today ... though since we're online and don't have the whiteboard that'll be more awkward - I can draw on my iPad and share that.
For this week, we'll look at a classic data structure and look at what it's used for: one of the most common database technologies, the "SQL" database.
This week I'd like to at least expose you to relational databases and SQL. We aren't going to do much more than see some of the basic ideas, and I want you to be starting to work on your final project ... so the assignment is fairly open ended.
SQL stands for "structured query language"; it's a programming language (yes, another one) used to manage and search relational databases. But before looking at the SQL language, I want to look at the data structure used to store the information in a typical database.
An m-way (max number of children for children per node) balanced tree, with O(log n) cost to search, insert, or delete.
The search is similar to the binary search tree we looked at earlier this term, but each node can have more than two children. This means that the tree can have many more nodes for a given height.
B-trees are often used for databases and file systems. Besides doing all operations fairly fast (log(n)), often the higher nodes can be kept within faster memory, leaving only the bottom tier of nodes on the (slower) disk. There are also variants targeted at specific use cases. For example, a typical database is tuned with "indices" (an auxiliary b-tree that clumps some of the original data together to give a smaller tree), giving faster lookup for some of the database keys, at the cost of maintaining the index.
The short description is that B-tree is a "height balanced m-way tree", together with algorithms for search, insert, delete that ensure it remains balanced.
I'm bringing all this up at this point in our course because (a) these sorts of databases are common and important, and (b) the algorithm uses trees, which are a kind of graph, which is our most recent topic.
main points :
Here's a python search & insert code example.
And here's a "real code" implementation, in SQLite (which we'll look at more later)
Also be aware that there are other similar "balanced" trees with differing specifics: Red-Black trees, AVL trees, 2-3 trees, etc. Some courses on algorithms spend a lot of time on the details of these things. We're not going to ... though you could explore any of these for a final project.
SQL is not one language - each application (SQLite, MySQL, PostgreSQL, ...) has their own variations, though the basics are the same.
Here are some of the most important commands :
TABLE - define a table
SELECT - fetch some data
INSERT - add a new row of data
UPDATE - modify some data
A few quirks :
.tables
. Most of these are for information or settings.Combining information from several tables is where the syntax can get tricky. The options are :
There are a number of types of joins (left, right, inner, outer), differing mostly in what to do for data which is in one of the two tables but not the other.
There are some nice lessons at software carpentry ; I think that lesson 1 (selecting), 7 (combining), and 10 (python) look interesting.
To get started with these on jupyter.bennington.college :
$ wget https://swcarpentry.github.io/sql-novice-survey/files/survey.db
$ sqlite3 survey.db
sqlite>
I'll walk us through some what's on the resources page.
I uploaded the files that I used to explain things during class - min spanning tree , sql tables.
last modified | size | ||
min_spanning_tree.png | Mon Apr 25 2022 02:59 pm | 364K | |
sql_tables.txt | Mon Apr 25 2022 03:00 pm | 425B |