Thu April 7

Covid has slowed me down this week, and so I'm not finished looking at your midterm projects. I hope to get caught up soon.

My plan for today is to (1) finish our discussion of hash tables, and (2) start our next topic, trees and graphs and algorithms to work with them, which we'll be doing for the next several weeks.

I've posted a fairly open-ended hash table coding project due Monday.

Questions about anything?

hash tables

We started working on these what seems like a long time ago, before I got sick and before the midterm break. So let's start with a brief review. Look over the notes & code from last time, and describe :

a hash function
a collision handling mechanism : "within" the table or "outside" the table
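To jog your memory, here is a minimal sketch of both pieces together, with collisions handled "outside" the table by keeping a list (a bucket) in each slot. The names simple_hash and ChainedTable are made up for this illustration, and the hash function is deliberately simple; it is not necessarily the one in last time's notes.

```python
def simple_hash(key, size):
    """Turn a string key into a table index in range(size).
    (A deliberately simple polynomial hash, for illustration only.)"""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) % size
    return h


class ChainedTable:
    """Hash table with collisions handled "outside" the table:
    each slot holds a list (bucket) of (key, value) pairs."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def set(self, key, value):
        bucket = self.slots[simple_hash(key, self.size)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly a collision)

    def get(self, key, default=None):
        bucket = self.slots[simple_hash(key, self.size)]
        for k, v in bucket:
            if k == key:
                return v
        return default


table = ChainedTable(size=4)             # tiny table, so collisions must happen
for word in ["apple", "banana", "cherry", "date", "elderberry"]:
    table.set(word, len(word))
print(table.get("cherry"))                       # the stored length of "cherry"
print([len(bucket) for bucket in table.slots])   # how full each bucket is
```

Handling collisions "within" the table (open addressing) would instead look for another empty slot in the same array, for example the next one over.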

I want to mention the classic birthday problem, which has a lot to do with how often we get collisions in our hash table indexing.
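The birthday calculation is short enough to do in a few lines. This sketch assumes every key is equally likely to land in any slot, which a real hash function only approximates; the function name collision_probability is my own.

```python
def collision_probability(n_keys, n_slots):
    """Probability that at least two of n_keys land in the same slot,
    assuming each key is equally likely to hash to any of n_slots."""
    p_all_distinct = 1.0
    for i in range(n_keys):
        # the (i+1)th key must avoid the i slots already taken
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


# The classic version: 23 people, 365 possible birthdays.
print(collision_probability(23, 365))    # a bit over one half

# The same thing for a hash table with 1000 slots.
for n in (10, 50, 100):
    print(n, round(collision_probability(n, 1000), 3))
```

The point for hash tables: collisions become likely long before the table is anywhere near full, so a collision handling mechanism is not optional.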

And I'll wave my hands at the code in code/hash_tables, which should give you some ideas of what I have in mind for the homework for Monday.

graphs & trees

Starting up a new topic!

In this context, a "graph" is a bunch of points (also called nodes) connected by lines (also called edges). Think ... Facebook, all your friends, all their friends, and so on. How do you find that person you sort-of remember something about?

To start with, I think it makes sense to get a feel for the landscape here, then go deeper in a few specific places. Please start reading / browsing through the references above over the next week. Our first trees & graphs assignment will be due a week from Monday, after the hash table assignment, so there is some time to dig into this stuff first.

overview

This is a big, important topic with many variations, and different people organize and explain it in different ways. Trees are often treated as their own topic, but really trees are graphs, just a specific kind, and the two share many of the same features.

Places to read about this stuff :

Skiena's "Algorithm Design Manual", chap 8 (graph traversal) & chap 9 (weighted graph algorithms)
- and his slides and videos
runestone pythonds, chap 8 (graphs) and chap 7 (trees)
Data Structures in Python, chapters 20 (graphs), 21 (searching graphs), 16 (trees)
Foundations of Computer Science, chapter 5 - the tree data model and chapter 9 - the graph data model
open data structures chap 12 (graphs)
graph data structures and algorithms (geeksforgeeks.org)
wikipedia graph (discrete mathematics)
wikipedia graph (abstract data type)
wikipedia graph traversal

lots of vocabulary, definitions, properties, and buzzwords :

traversal (i.e. visit everything) ; search (look for something)
depth-first, breadth-first
recursive search; pre- vs post- traversal order
topological sort
properties : directed, weights, connected, acyclic
... and more

some graph search classic problems

topics to read about:

what is a "graph", and how can one be stored in a computer?
- as nodes with links (or pointers) to other nodes
- all in memory (static) or generated dynamically (like chess game positions)
- with an adjacency matrix
- or more often an adjacency list
how can we search a graph?
- breadth-first-search (with a fringe and a queue)
- depth-first-search (with a fringe and a stack)
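The two searches are closer than they look: both keep a "fringe" of nodes waiting to be visited, and the only difference is which end of the fringe the next node comes from. Here is a rough sketch of that idea, using a dict of neighbor lists for the small five-node graph (A through E) drawn further down under "new : graph". The function name fringe_search and the breadth_first flag are my own choices for illustration.

```python
from collections import deque

# Adjacency list as a dict: node name -> list of neighbor names.
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}


def fringe_search(graph, start, breadth_first=True):
    """Return the nodes reachable from start, in the order visited.
    The fringe holds nodes that have been discovered but not yet visited."""
    fringe = deque([start])
    visited = []
    while fringe:
        if breadth_first:
            node = fringe.popleft()    # queue: oldest first
        else:
            node = fringe.pop()        # stack: newest first
        if node in visited:            # may have been added more than once
            continue
        visited.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                fringe.append(neighbor)
    return visited


print(fringe_search(graph, 'A', breadth_first=True))    # ['A', 'B', 'C', 'D', 'E']
print(fringe_search(graph, 'A', breadth_first=False))   # ['A', 'C', 'D', 'E', 'B']
```

Breadth-first visits everything one step away, then two steps away, and so on; depth-first runs as far as it can down one path before backing up. For big graphs, visited would be better as a set, since "in" on a list is slow.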

trees are a subset of graphs ... which start simple and get complicated fast :

recursive search is an intuitive approach (maybe with backtracking)
"balanced" trees are important for O(log n) collections - databases, filesystems
includes game search : tic-tac-toe, solitaire puzzles, ...
some classic tree data structures :

some code examples :

my starting graph examples
my recursion examples

I want us to look at graphs first, and then trees later.

The topics to focus on are :

how to store a graph in a computer, particularly the "adjacency list" or similar "object with neighbors" approach (a generalization of the linked list we did before)
how to search the graph for something, making sure to visit each node

some specifics

what we've done

A linear structure :

A -- B  -- C -- D

can be stored as either

['A', 'B', 'C', 'D']       # a vanilla list


Node('A') -- .next --->   Node('B')      # a linked list

a = Node('A')
b = Node('B')
a.next = b     
# or a.next='B' , along with a way to get from 'B' to b

Searching one of these is straightforward : we start at the beginning and walk through it.
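As a reminder of what that walk looks like for the linked list, here is a small sketch. The Node class follows the Node('A') usage above; the helper names make_linked_list and find are made up for this example.

```python
class Node:
    """One link in a singly linked list."""

    def __init__(self, name):
        self.name = name
        self.next = None       # the following Node, or None at the end


def make_linked_list(names):
    """Build a linked list from a sequence of names; return the first Node."""
    head = None
    for name in reversed(names):   # build back to front
        node = Node(name)
        node.next = head
        head = node
    return head


def find(head, target):
    """Walk from head following .next; return the matching Node or None."""
    node = head
    while node is not None:
        if node.name == target:
            return node
        node = node.next
    return None


head = make_linked_list(['A', 'B', 'C', 'D'])
print(find(head, 'C').name)    # C
print(find(head, 'Z'))         # None
```

Each node has exactly one way forward, so there is no choice about where to go next and no chance of visiting anything twice. Graphs lose both of those guarantees.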

new : graph

A -- B
|    |
C -- D -- E

How do we store this ?

 I) "Adjacency matrix"                                   A B C D E
 Each node gets a number 0,1,2,3,...                  A  . 1 1 0 0
 Put information about possible edge (i,j)            B  1 . 0 1 0
 into that (row,column) of the matrix                 C  1 0 . 1 0
 Good for dense graphs (i.e. lots of edges)           D  0 1 1 . 1
 If N nodes, space is O(N*N).                         E  0 0 0 1 .

 II) "Adjacency list"
 For each node, store its neighbors in a list
 ... so a list of lists, essentially.
 If we number nodes  (A,B,C,D,E) as (0,1,2,3,4) then
 graph = [ [1,2],    # A neighbors : A-B, A-C
           [0,3],    # B neighbors : B-A, B-D
           [0,3],    # C neighbors : C-A, C-D
           [1,2,4],  # D neighbors : D-B, D-C, D-E
           [3]       # E neighbors : E-D
 ]
 Or in python use a dictionary of dictionaries,
 including "weight" of each edge (all weight=1 here)
 { 'A' : {'B':1, 'C':1},
   'B' : {'A':1, 'D':1},
   etc }
 Or use node pointers or references to objects in list or dict,
 rather than just name of node.

 III) node objects with a .neighbors property or method
      which gives either names of or references to other objects
      ... like what we did with the linked lists.
 a = Node('A')
 b = Node('B')
 c = Node('C')
 a.neighbors = [b, c]   # or ['B', 'C']
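Here is one way the three approaches might look in python for the A ... E graph in the diagram, all built from the same list of edges. This is a sketch, not the only way to do it; the variable names (names, edges, index, matrix, adjacency, nodes) are my own, and the matrix diagonal is filled with 0 where the picture above shows a dot.

```python
names = ['A', 'B', 'C', 'D', 'E']
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
index = {name: i for i, name in enumerate(names)}   # 'A' -> 0, 'B' -> 1, ...

# I) adjacency matrix : matrix[i][j] is 1 if there is an edge i -- j
matrix = [[0] * len(names) for _ in names]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1      # undirected, so symmetric

# II) adjacency list, as a dictionary of dictionaries {neighbor: weight}
adjacency = {name: {} for name in names}
for u, v in edges:
    adjacency[u][v] = 1
    adjacency[v][u] = 1


# III) node objects that hold references to their neighbors
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def __repr__(self):
        return f"Node({self.name!r})"


nodes = {name: Node(name) for name in names}
for u, v in edges:
    nodes[u].neighbors.append(nodes[v])
    nodes[v].neighbors.append(nodes[u])

print(matrix[index['D']])        # [0, 1, 1, 0, 1]
print(adjacency['D'])            # {'B': 1, 'C': 1, 'E': 1}
print(nodes['D'].neighbors)      # [Node('B'), Node('C'), Node('E')]
```

All three answer the same question, "who are D's neighbors?". The matrix takes O(N*N) space no matter how few edges there are, while the other two only store the edges that exist.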

How do we search (or "traverse") through a structure like this efficiently, visiting each node once?

Good question. First approach : recursive depth first search.

This is a fundamental starting point for many other graph and tree algorithms.

# See https://en.wikipedia.org/wiki/Depth-first_search
define search(graph, node):
  Mark node as visited.
  Do any other processing: print it, search for goal, etc.
  For each neighbor of node:
    if neighbor is not visited:
      search(graph, neighbor)
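The pseudocode above translates almost line for line into python. Here is one possible version, assuming the graph is stored as a dict of neighbor lists; the extra visited argument is my own way of carrying the "marked" nodes through the recursion, and there are other reasonable choices (a set, or a flag on each node object).

```python
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}


def search(graph, node, visited=None):
    """Recursive depth-first search.
    Returns the list of nodes in the order they were visited."""
    if visited is None:
        visited = []               # first call: nothing visited yet
    visited.append(node)           # mark node as visited
    print(node)                    # any other processing goes here
    for neighbor in graph[node]:
        if neighbor not in visited:
            search(graph, neighbor, visited)
    return visited


order = search(graph, 'A')
print(order)                       # ['A', 'B', 'D', 'C', 'E']
```

Notice that the search goes A, B, D before it ever looks at C, even though C is a direct neighbor of A. That is the "depth first" part.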

class coding exercise

Depending on available time and your preferences, either all of us or in breakout groups or individually:

Write a python program that
(a) includes a representation of a graph, for example the one in the diagram above
(b) implements a recursive depth-first search that visits each node and prints its name.

We'll either finish this today or continue on Monday, along with alternatives to this approach.