word search

Feb 25 2021

The problem is to take a list of n words, and to search for some given word.

What does that plot tell us?

Well, there isn't any obvious trend here. What's going on is that most of the time is spent opening and reading the (large) file, which we're currently doing on every call, no matter how many words we want.

This is O(1), in other words about the same amount of time regardless of n.
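One way to keep the file read out of the measurement is to read it once and reuse the result. The following is only a sketch under assumptions: the names load_words and first_n_words are hypothetical, and the file is assumed to hold one word per line.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_words(path):
    """Read the word file once; repeat calls with the same path reuse the cached tuple."""
    with open(path) as f:
        # Assumption: one word per line; blank lines are skipped.
        return tuple(line.strip() for line in f if line.strip())

def first_n_words(path, n):
    """Return the first n words as a list without re-reading the file."""
    return list(load_words(path)[:n])
```

With this in place, only the first call pays for the file read, so later timings reflect the work that actually depends on n.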

TODO in class

Python review/reminder :

Notebook review/reminder :

How to turn in one of these notebooks as homework :

First, write a function which searches a list of n words for a given word, by starting at i=0, looking at each word[i], and incrementing i.

Then measure how long that takes to run for different values of n.

Discuss : "worst case", "best case", "average case"

Do timing for both (a) can find word, (b) can't find word.
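A minimal timing sketch for the two cases might look like the following. The helper time_search, the stand-in search contains, and the synthetic word list are all assumptions made for illustration; any search function taking (words, target) could be passed in instead.

```python
import time

def time_search(search, words, target, repeats=20):
    """Return the average seconds per call of search(words, target)."""
    start = time.perf_counter()
    for _ in range(repeats):
        search(words, target)
    return (time.perf_counter() - start) / repeats

def contains(words, target):
    # Stand-in search: Python's `in` on a list scans it front to back.
    return target in words

for n in (1_000, 10_000, 100_000):
    words = [f"word{i}" for i in range(n)]                 # synthetic data
    found = time_search(contains, words, words[-1])        # (a) word is present, at the end
    missing = time_search(contains, words, "not-a-word")   # (b) word is absent
    print(n, found, missing)
```

Searching for the last word and for a missing word both force a full scan, so these two timings are both close to the worst case; searching for words[0] would show the best case.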

Make a plot of (time, n).

What O() behavior is this? Why?

Second, repeat all of that with a binary search algorithm.

This only works if the list is sorted. For now we'll just sort the list with Python's built-in sort, and not count that time. Once we have a sorted list, we can use a binary search to look things up, the same way you would look up a name in a phone book.

Look up "binary search" and explain the algorithm.

Do timing and make the plot again. What O() behavior is this? Why?

Discuss number of times through loop if n=128, 64, 32, ...
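To make the halving concrete, here is a small sketch that counts how many times n can be cut in half before one item is left. The function name halvings is hypothetical, and the exact loop count of a particular binary search may differ by one depending on how it is written.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

for n in (128, 64, 32, 16, 8, 4, 2):
    print(n, halvings(n))   # 128 -> 7, 64 -> 6, 32 -> 5, ... 2 -> 1
```

Doubling n adds only one more step, which is the signature of O(log n) growth.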

hash table

Third, repeat all of that with Python's built-in dictionary.

We won't (for now) discuss how the dictionary works, just plop all the data into the dictionary initially. Then look up words and do the timing again. Make a plot. What O() is this?

Then make a plot which shows all three cases on one graph.

Discuss advantages and disadvantages of each approach.


We started here in class


linear search : O(n)

First option: look at each word in the list until we find the one we want. Time is proportional to the number of words.
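A minimal sketch of this approach, written the way the exercise describes it (start at i=0, look at each words[i], increment i). The name linear_search and the choice of -1 for "not found" are assumptions for illustration.

```python
def linear_search(words, target):
    """Return the index of target in words, or -1 if it is not there."""
    i = 0
    while i < len(words):
        if words[i] == target:
            return i        # best case: found at i = 0 after one comparison
        i += 1
    return -1               # worst case: all n words were compared

words = ["pear", "apple", "fig", "cherry"]
print(linear_search(words, "fig"))     # 2
print(linear_search(words, "grape"))   # -1
```

In the worst case (word missing or in the last position) the loop runs n times, which is where the O(n) comes from.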

binary search : O(log n)

If the list of words is already sorted, we can search much faster.
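A sketch of the idea, assuming the list has already been sorted: look at the middle word, then discard the half that cannot contain the target, and repeat. The name binary_search and the -1 return value are assumptions for illustration.

```python
def binary_search(sorted_words, target):
    """Return the index of target in sorted_words, or -1 if it is not there."""
    lo, hi = 0, len(sorted_words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_words[mid] == target:
            return mid
        elif sorted_words[mid] < target:
            lo = mid + 1    # target can only be in the upper half
        else:
            hi = mid - 1    # target can only be in the lower half
    return -1

words = sorted(["pear", "apple", "fig", "cherry"])   # ['apple', 'cherry', 'fig', 'pear']
print(binary_search(words, "fig"))     # 2
print(binary_search(words, "grape"))   # -1
```

Each pass through the loop halves the range still under consideration, so the number of passes grows like log n rather than n.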

Hash Table : O(1)

And finally, if we put the words into a hash table - fairly expensive in terms of memory - we can do the lookup even faster, in essentially the same time regardless of n.
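Python's built-in dict is a hash table, so a sketch of this approach is short. The names build_index and dict_search are hypothetical, and storing each word's list position as the value is just one reasonable choice.

```python
def build_index(words):
    """Build the dictionary once, up front: word -> position in the list."""
    return {word: i for i, word in enumerate(words)}

def dict_search(index, target):
    """Return the stored position of target, or -1 if it is not in the dictionary."""
    return index.get(target, -1)

words = ["pear", "apple", "fig", "cherry"]
index = build_index(words)
print(dict_search(index, "fig"))     # 2
print(dict_search(index, "grape"))   # -1
```

Building the dictionary takes time and memory proportional to n, but that cost is paid once; each lookup afterwards takes roughly constant time on average.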