How long does it take to find a number in a list of numbers?
Jim Mahoney | cs.bennington.college | MIT License | Feb 2022
from matplotlib import pyplot as plt
import numpy as np
from searching import main, average_time, search_linear, search_binary
Which is faster, linear or binary search?
Well, the main()
routine in searching tries them both.
main()
-- searching.py -- a random array: [28, 27, 40, 30, 53, 3, 56, 43] linear search for 10 in [5, 8, 10, 20, 30] : i=2. binary search for 10 in [5, 8, 10, 20, 30] : i=2. n_trials=2, elapsed=0.0 n_trials=4, elapsed=0.4900527000427246 n_trials=8, elapsed=0.8311092853546143 time for linear n=1000000 is 0.30787081. n_trials=2, elapsed=0.0 n_trials=4, elapsed=0.0012392997741699219 n_trials=8, elapsed=0.0005674362182617188 n_trials=16, elapsed=0.0005891323089599609 n_trials=32, elapsed=0.0009806156158447266 n_trials=64, elapsed=0.0007903575897216797 n_trials=128, elapsed=0.0007679462432861328 n_trials=256, elapsed=0.0011363029479980469 n_trials=512, elapsed=0.002196788787841797 n_trials=1024, elapsed=0.0054399967193603516 n_trials=2048, elapsed=0.010576963424682617 n_trials=4096, elapsed=0.016460180282592773 n_trials=8192, elapsed=0.022704362869262695 n_trials=16384, elapsed=0.05892753601074219 n_trials=32768, elapsed=0.11140298843383789 n_trials=65536, elapsed=0.2549874782562256 n_trials=131072, elapsed=0.5946667194366455 n_trials=262144, elapsed=0.9795405864715576 time for binary n=1000000 is 0.00000742.
Clearly binary search is much faster. But what is their O() behavior? Let's do some numerical experiments, make some plots, and see.
# For these values of N, find the search times.
ns = [100, 300, 1000, 3000, 10000, 30000] # ("ns" is the plural of n.)
linear = [0] * len(ns)
binary = [0] * len(ns)
for i in range(len(ns)):
linear[i] = average_time(search_linear, ns[i])
binary[i] = average_time(search_binary, ns[i])
# Here are the linear times.
linear
[8.542288924218155e-06, 2.4722019588807598e-05, 0.00012605218216776848, 0.0003067218931391835, 0.0011086054146289825, 0.003213269170373678]
# And here are the binary times.
binary
[2.185012817790266e-06, 3.1073805075720884e-06, 4.25834150519222e-06, 3.954675776185468e-06, 5.544729901885148e-06, 5.083496944280341e-06]
# Now for the plot/
plt.figure(dpi=220, figsize=(5,3)) # dot-per-inch ; plot size in inches
plt.xlabel('N')
plt.ylabel('time')
plt.title('linear vs binary array search')
# The two sets of points :
plt.scatter(ns, linear, color="blue", label="linear")
plt.scatter(ns, binary, color="green", label="binary")
# And two curves.
# (In both cases, I've multiplied by a constant to get a reasonable scale,
# using the rightmost point to find y_rightmost / x_rightmost.
# Note that in python, array[-1] is the last value.)
# The x axis. ("xs" is the plural of x; a list of many closely spaced x values)
# Runs from the leftmost value of N to the rightmost value of N.
xs = np.linspace(ns[0], ns[-1]) # xs (plural of x), a list of many x values
# O(n) : a straight line
constant_1 = linear[-1]/ns[-1] # rightmost y/x for linear points
y_line = constant_1 * xs # y = C1 * n
plt.plot(xs, y_line, color="magenta", label="O(n)")
# O(log(n)) : np.log2 is numpy.log_base_two
constant_2 = (binary[-1]/np.log2(xs[-1])) # rightmost y/x for binary points
y_log = constant_2 * np.log2(xs) # y = C2 * log(n)
plt.plot(xs, y_log, color="cyan", label="O(log(n))")
plt.xscale('log')
plt.yscale('log')
plt.legend()
None
Are we having fun yet?