This is a toy version of the DCT transform used in jpeg lossy compression.
Jim Mahoney cs.bennington.edu | March 2021
from numpy import *
from matplotlib.pyplot import plot
import matplotlib.pyplot as plt
import numpy as np
Here are the basis vectors that we'll use for this work.
$$ \begin{align} \hat{b}_1 & = [1, 1, 1, 1] \frac{1}{2} \\ \hat{b}_2 & = [1, 0, -1, 0] \frac{1}{\sqrt{2}} \\ \hat{b}_3 & = [0, 1, 0, -1] \frac{1}{\sqrt{2}} \\ \hat{b}_4 & = [1, -1, 1, -1] \frac{1}{2} \\ \end{align} $$As seen in last week's homework, these are orthonormal: unit length and orthogonal to each other.
# And here's what they look like in python numpy.
half = 1/2
sqrt2 = 1/sqrt(2)
b = matrix([ [half, half, half, half],
[sqrt2, 0, -sqrt2, 0],
[0, sqrt2, 0, -sqrt2],
[half, -half, half, -half]
])
print(" The rows of this matrix are the basis vectors. ")
print(b)
print()
print(" The 0'th one is b[0,:] i.e.")
b0 = b[0,:]
b0
The rows of this matrix are the basis vectors. [[ 0.5 0.5 0.5 0.5 ] [ 0.70710678 0. -0.70710678 0. ] [ 0. 0.70710678 0. -0.70710678] [ 0.5 -0.5 0.5 -0.5 ]] The 0'th one is b[0,:] i.e.
matrix([[0.5, 0.5, 0.5, 0.5]])
# Here you can see the (rows, columns) dimensions of the top row.
# (The ':' here means 'all columns).
shape(b0)
(1, 4)
# As a 4 x 1 column vector this is
transpose(b0)
matrix([[0.5], [0.5], [0.5], [0.5]])
# The matrix product of a (1x4) * (4x1) is the "inner" or dot product. We've already seen that.
# Here since these are numpy matrices, they are multiplied with matrix multiplication.
b0 * transpose(b0)
matrix([[1.]])
# The matrix product of a (4x1) * (1x4) is the "outer" product.
# This is one of 16 (4*4) basis images that we will use for our four_by_four transform.
b00 = transpose(b0) * b0
print(b00)
[[0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25]]
# Here are all 16 of them. Here "row" and "column" are in the final array
plt.figure(figsize=(14, 14), dpi=200) # 9" x 12"
for i in range(4):
for j in range(4):
plt.subplot(4, 4, 1 + 4*i + j)
b_i = b[i, :]
b_j= b[j, :]
basis_image = transpose(b_i) * b_j
#print("{}".format(i,j))
#print(basis_image)
plt.imshow(basis_image)
plt.colorbar()
# Or if you like numbers better :
# Here are all 16 of them. Here "row" and "column" are in the final array
for i in range(4):
for j in range(4):
b_i = b[i, :]
b_j= b[j, :]
basis_image = transpose(b_i) * b_j # i.e. bb_ij in a notation I use below.
print("{}".format((i,j)))
print(basis_image)
print()
(0, 0) [[0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25]] (0, 1) [[ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ]] (0, 2) [[ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339]] (0, 3) [[ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25]] (1, 0) [[ 0.35355339 0.35355339 0.35355339 0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 -0.35355339 -0.35355339 -0.35355339] [ 0. 0. 0. 0. ]] (1, 1) [[ 0.5 0. -0.5 0. ] [ 0. 0. 0. 0. ] [-0.5 0. 0.5 0. ] [ 0. 0. 0. 0. ]] (1, 2) [[ 0. 0.5 0. -0.5] [ 0. 0. 0. 0. ] [ 0. -0.5 0. 0.5] [ 0. 0. 0. 0. ]] (1, 3) [[ 0.35355339 -0.35355339 0.35355339 -0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 0.35355339 -0.35355339 0.35355339] [ 0. 0. 0. 0. ]] (2, 0) [[ 0. 0. 0. 0. ] [ 0.35355339 0.35355339 0.35355339 0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 -0.35355339 -0.35355339 -0.35355339]] (2, 1) [[ 0. 0. 0. 0. ] [ 0.5 0. -0.5 0. ] [ 0. 0. 0. 0. ] [-0.5 0. 0.5 0. ]] (2, 2) [[ 0. 0. 0. 0. ] [ 0. 0.5 0. -0.5] [ 0. 0. 0. 0. ] [ 0. -0.5 0. 0.5]] (2, 3) [[ 0. 0. 0. 0. ] [ 0.35355339 -0.35355339 0.35355339 -0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 0.35355339 -0.35355339 0.35355339]] (3, 0) [[ 0.25 0.25 0.25 0.25] [-0.25 -0.25 -0.25 -0.25] [ 0.25 0.25 0.25 0.25] [-0.25 -0.25 -0.25 -0.25]] (3, 1) [[ 0.35355339 0. -0.35355339 0. ] [-0.35355339 0. 0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [-0.35355339 0. 0.35355339 0. ]] (3, 2) [[ 0. 0.35355339 0. -0.35355339] [ 0. -0.35355339 0. 0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. -0.35355339 0. 0.35355339]] (3, 3) [[ 0.25 -0.25 0.25 -0.25] [-0.25 0.25 -0.25 0.25] [ 0.25 -0.25 0.25 -0.25] [-0.25 0.25 -0.25 0.25]]
# Now let's choose an toy "image" to work with :
image = matrix([[0,0,0,0], [0, 100, 100, 0], [0, 100, 100, 0], [0,0,0,0]])
plt.imshow(image)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7fd8408fc7f0>
The transform we have in mind is to turn this into a sum of 16 coefficients times the 16 basis images.
$$ \text{image} = c_{00} * \hat{bb}_{00} + c_{01} * \hat{bb}_{01} + ... + c_{33} * \hat{bb}_{33} $$where each of those $ \hat{b}_{ij} $ is one of the 16 images
This is essentially a 2D Fourier series, built from 4-component basis vectors.
(The DCT transform is very similar, but with 8x8 matrices.)
As we did last week, the trick is to do a dot product of the desired result (the image) with one of the basis vectors (which here are the $bb$ matrices.)
We need to be a bit careful here, because this dot product is treating the whole collection of 16 numbers as a single thing. We want to multiply corresponding terms and add, which is not matrix multiplication here, since the things that we're dotting are matrices. Essentially we want to flatten out the 16 numbers into one long vector, and dot those.
# In other words, these 16 4x4 images (the ones I did pictures of above)
# form a basis under the following dot product.
def bb(i,j):
""" Return one of the 16 basis images, 0<i<3, 0<j<3 """
return transpose(b[i, :]) * b[j, :]
def flatten(a):
""" Return a numpy 1D array from a numpy 2D matrix. """
return array(a).flatten()
def flatdot(a, b):
return dot(flatten(a), flatten(b))
# Here's the evidence that the bb's form an orthonormal bases.
def print_flatdot(i1, j1, i2, j2):
print(" flatdot(bb({},{}), bb({},{})) = {}".format(
i1, j1, i2, j2, flatdot(bb(i1,j1), bb(i2,j2))))
print("basis dot same basis => 1.0 (16 of 'em)")
print_flatdot(0,0, 0,0)
print_flatdot(1,3, 1,3)
print_flatdot(2,0, 2, 0)
print()
print("basis dot different basis => 0.0 (16*15/2 = 120 of 'em)")
print_flatdot(0,0, 0,1)
print_flatdot(1,0, 0,1)
print_flatdot(2,3, 0,2)
basis dot same basis => 1.0 (16 of 'em) flatdot(bb(0,0), bb(0,0)) = 1.0 flatdot(bb(1,3), bb(1,3)) = 0.9999999999999998 flatdot(bb(2,0), bb(2,0)) = 0.9999999999999998 basis dot different basis => 0.0 (16*15/2 = 120 of 'em) flatdot(bb(0,0), bb(0,1)) = 0.0 flatdot(bb(1,0), bb(0,1)) = 0.0 flatdot(bb(2,3), bb(0,2)) = 0.0
Now it's your turn.