Algorithms
and
Data
Structures

Spring 2022
course
site
-->

April 25

Questions about anything?

Today's assignment was to try "by hand" some of the graph "shortest" algorithms we looked at last week: Dijkstra's & Floyd-Warshall (shortest path) and Prim's & Kruskal (min spanning tree).

We didn't spend much time on Prim or Kruskal last week - depending on interest we may do a "by hand" example today ... though since we're online and don't have the whiteboard that'll be more awkward - I can draw on my iPad and share that.

For this week, we'll look at a classic data structure and look at what it's used for: one of the most common database technologies, the "SQL" database.

SQL

This week I'd like to at least expose you to relational databases and SQL. We aren't going to do much more than see some of the basic ideas, and I want you to be starting to work on your final project ... so the assignment is fairly open ended.

SQL stands for "structured query language"; it's a programming language (yes, another one) used to manage and search relational databases. But before looking at the SQL language, I want to look at the data structure used to store the information in a typical database.

How : the "B-tree" data structure

An m-way (max number of children for children per node) balanced tree, with O(log n) cost to search, insert, or delete.

The search is similar to the binary search tree we looked at earlier this term, but each node can have more than two children. This means that the tree can have many more nodes for a given height.

B-trees are often used for databases and file systems. Besides doing all operations fairly fast (log(n)), often the higher nodes can be kept within faster memory, leaving only the bottom tier of nodes on the (slower) disk. There are also variants targeted at specific use cases. For example, a typical database is tuned with "indices" (an auxiliary b-tree that clumps some of the original data together to give a smaller tree), giving faster lookup for some of the database keys, at the cost of maintaining the index.

The short description is that B-tree is a "height balanced m-way tree", together with algorithms for search, insert, delete that ensure it remains balanced.

I'm bringing all this up at this point in our course because (a) these sorts of databases are common and important, and (b) the algorithm uses trees, which are a kind of graph, which is our most recent topic.

main points :

Here's a python search & insert code example.

And here's a "real code" implementation, in SQLite (which we'll look at more later)

Also be aware that there are other similar "balanced" trees with differing specifics: Red-Black trees, AVL trees, 2-3 trees, etc. Some courses on algorithms spend a lot of time on the details of these things. We're not going to ... though you could explore any of these for a final project.

What : SQL implementation models

concepts

talking the talk

SQL is not one language - each application (SQLite, MySQL, PostgreSQL, ...) has their own variations, though the basics are the same.

Here are some of the most important commands :

   TABLE    - define a table
   SELECT   - fetch some data
   INSERT   - add a new row of data
   UPDATE   - modify some data

A few quirks :

Combining information from several tables is where the syntax can get tricky. The options are :

There are a number of types of joins (left, right, inner, outer), differing mostly in what to do for data which is in one of the two tables but not the other.

an example - umber (this course management system)

practice

There are some nice lessons at software carpentry ; I think that lesson 1 (selecting), 7 (combining), and 10 (python) look interesting.

To get started with these on jupyter.bennington.college :

   $ wget https://swcarpentry.github.io/sql-novice-survey/files/survey.db
   $ sqlite3 survey.db
   sqlite>

I'll walk us through some what's on the resources page.


I uploaded the files that I used to explain things during class - min spanning tree , sql tables.

https://cs.bennington.college /courses /spring2022 /algorithms /notes /april25
last modified Mon April 25 2022 3:00 pm

attachments [paper clip]

  last modified size
TXT min_spanning_tree.png Mon Apr 25 2022 02:59 pm 364K
TXT sql_tables.txt Mon Apr 25 2022 03:00 pm 425B