This is a toy version of the DCT transform used in jpeg lossy compression.
Jim Mahoney cs.bennington.college | March 2021
from numpy import *
from matplotlib.pyplot import plot
import matplotlib.pyplot as plt
import numpy as np
# Python 3.5.2, Anaconda 4.2.0
Here are the basis vectors that we'll use for this work.
$$ \begin{align} \hat{b}_1 & = [1, 1, 1, 1] \frac{1}{2} \\ \hat{b}_2 & = [1, 0, -1, 0] \frac{1}{\sqrt{2}} \\ \hat{b}_3 & = [0, 1, 0, -1] \frac{1}{\sqrt{2}} \\ \hat{b}_4 & = [1, -1, 1, -1] \frac{1}{2} \\ \end{align} $$As seen in last week's homework, these are orthonormal: unit length and orthogonal to each other.
# And here's what they look like in python numpy.
half = 1/2
sqrt2 = 1/sqrt(2)
b = matrix([ [half, half, half, half],
[sqrt2, 0, -sqrt2, 0],
[0, sqrt2, 0, -sqrt2],
[half, -half, half, -half]
])
print(" The rows of this matrix are the basis vectors. ")
print(b)
print()
print(" The 0'th one is b[0,:] i.e.")
b0 = b[0,:]
b0
The rows of this matrix are the basis vectors. [[ 0.5 0.5 0.5 0.5 ] [ 0.70710678 0. -0.70710678 0. ] [ 0. 0.70710678 0. -0.70710678] [ 0.5 -0.5 0.5 -0.5 ]] The 0'th one is b[0,:] i.e.
matrix([[0.5, 0.5, 0.5, 0.5]])
# Here you can see the (rows, columns) dimensions of the top row.
# (The ':' here means 'all columns).
shape(b0)
(1, 4)
# As a 4 x 1 column vector this is
transpose(b0)
matrix([[0.5], [0.5], [0.5], [0.5]])
# The matrix product of a (1x4) * (4x1) is the "inner" or dot product. We've already seen that.
# Here since these are numpy matrices, they are multiplied with matrix multiplication.
b0 * transpose(b0)
matrix([[1.]])
# The matrix product of a (4x1) * (1x4) is the "outer" product.
# This is one of 16 (4*4) basis images that we will use for our four_by_four transform.
b00 = transpose(b0) * b0
print(b00)
[[0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25]]
# Here are all 16 of them. Here "row" and "column" are in the final array
plt.figure(figsize=(14, 14), dpi=200) # 9" x 12"
for i in range(4):
for j in range(4):
plt.subplot(4, 4, 1 + 4*i + j)
b_i = b[i, :]
b_j= b[j, :]
basis_image = transpose(b_i) * b_j
#print("{}".format(i,j))
#print(basis_image)
plt.imshow(basis_image)
plt.colorbar()
# Or if you like numbers better :
# Here are all 16 of them. Here "row" and "column" are in the final array
for i in range(4):
for j in range(4):
b_i = b[i, :]
b_j= b[j, :]
basis_image = transpose(b_i) * b_j # i.e. bb_ij in a notation I use below.
print("{}".format((i,j)))
print(basis_image)
print()
(0, 0) [[0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25] [0.25 0.25 0.25 0.25]] (0, 1) [[ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ]] (0, 2) [[ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. 0.35355339 0. -0.35355339]] (0, 3) [[ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25] [ 0.25 -0.25 0.25 -0.25]] (1, 0) [[ 0.35355339 0.35355339 0.35355339 0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 -0.35355339 -0.35355339 -0.35355339] [ 0. 0. 0. 0. ]] (1, 1) [[ 0.5 0. -0.5 0. ] [ 0. 0. 0. 0. ] [-0.5 0. 0.5 0. ] [ 0. 0. 0. 0. ]] (1, 2) [[ 0. 0.5 0. -0.5] [ 0. 0. 0. 0. ] [ 0. -0.5 0. 0.5] [ 0. 0. 0. 0. ]] (1, 3) [[ 0.35355339 -0.35355339 0.35355339 -0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 0.35355339 -0.35355339 0.35355339] [ 0. 0. 0. 0. ]] (2, 0) [[ 0. 0. 0. 0. ] [ 0.35355339 0.35355339 0.35355339 0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 -0.35355339 -0.35355339 -0.35355339]] (2, 1) [[ 0. 0. 0. 0. ] [ 0.5 0. -0.5 0. ] [ 0. 0. 0. 0. ] [-0.5 0. 0.5 0. ]] (2, 2) [[ 0. 0. 0. 0. ] [ 0. 0.5 0. -0.5] [ 0. 0. 0. 0. ] [ 0. -0.5 0. 0.5]] (2, 3) [[ 0. 0. 0. 0. ] [ 0.35355339 -0.35355339 0.35355339 -0.35355339] [ 0. 0. 0. 0. ] [-0.35355339 0.35355339 -0.35355339 0.35355339]] (3, 0) [[ 0.25 0.25 0.25 0.25] [-0.25 -0.25 -0.25 -0.25] [ 0.25 0.25 0.25 0.25] [-0.25 -0.25 -0.25 -0.25]] (3, 1) [[ 0.35355339 0. -0.35355339 0. ] [-0.35355339 0. 0.35355339 0. ] [ 0.35355339 0. -0.35355339 0. ] [-0.35355339 0. 0.35355339 0. ]] (3, 2) [[ 0. 0.35355339 0. -0.35355339] [ 0. -0.35355339 0. 0.35355339] [ 0. 0.35355339 0. -0.35355339] [ 0. -0.35355339 0. 0.35355339]] (3, 3) [[ 0.25 -0.25 0.25 -0.25] [-0.25 0.25 -0.25 0.25] [ 0.25 -0.25 0.25 -0.25] [-0.25 0.25 -0.25 0.25]]
# Now let's choose an toy "image" to work with :
image = matrix([[0,0,0,0], [0, 100, 100, 0], [0, 100, 100, 0], [0,0,0,0]])
print(image)
plt.imshow(image)
plt.colorbar()
[[ 0 0 0 0] [ 0 100 100 0] [ 0 100 100 0] [ 0 0 0 0]]
<matplotlib.colorbar.Colorbar at 0x7fcc086a2a60>
The transform we have in mind is to turn this into a sum of 16 coefficients times the 16 basis images.
$$ \text{image} = c_{00} * \hat{bb}_{00} + c_{01} * \hat{bb}_{01} + ... + c_{33} * \hat{bb}_{33} $$where each of those $ \hat{b}_{ij} $ is one of the 16 images
This is essentially a 2D Fourier series, built from 4-component basis vectors.
(The DCT transform is very similar, but with 8x8 matrices.)
As we did last week, the trick is to do a dot product of the desired result (the image) with one of the basis vectors (which here are the $bb$ matrices.)
We need to be a bit careful here, because this dot product is treating the whole collection of 16 numbers as a single thing. We want to multiply corresponding terms and add, which is not matrix multiplication here, since the things that we're dotting are matrices. Essentially we want to flatten out the 16 numbers into one long vector, and dot those.
# In other words, these 16 4x4 images (the ones I did pictures of above)
# form a basis under the following dot product.
def bb(i,j):
""" Return one of the 16 basis images, 0<i<3, 0<j<3 """
return transpose(b[i, :]) * b[j, :]
def flatten(a):
""" Return a numpy 1D array from a numpy 2D matrix. """
return array(a).flatten()
def flatdot(a, b):
return dot(flatten(a), flatten(b))
# Here's the evidence that the bb's form an orthonormal bases.
def print_flatdot(i1, j1, i2, j2):
print(" flatdot(bb({},{}), bb({},{})) = {}".format(
i1, j1, i2, j2, flatdot(bb(i1,j1), bb(i2,j2))))
print("basis dot same basis => 1.0 (16 of 'em)")
print_flatdot(0,0, 0,0)
print_flatdot(1,3, 1,3)
print_flatdot(2,0, 2, 0)
print()
print("basis dot different basis => 0.0 (16*15/2 = 120 of 'em)")
print_flatdot(0,0, 0,1)
print_flatdot(1,0, 0,1)
print_flatdot(2,3, 0,2)
basis dot same basis => 1.0 (16 of 'em) flatdot(bb(0,0), bb(0,0)) = 1.0 flatdot(bb(1,3), bb(1,3)) = 0.9999999999999998 flatdot(bb(2,0), bb(2,0)) = 0.9999999999999998 basis dot different basis => 0.0 (16*15/2 = 120 of 'em) flatdot(bb(0,0), bb(0,1)) = 0.0 flatdot(bb(1,0), bb(0,1)) = 0.0 flatdot(bb(2,3), bb(0,2)) = 0.0
Now it's your turn.
# First create a 4x4 matrix to put the result.
c = matrix([[0,0,0,0],
[0,0,0,0],
[0,0,0,0],
[0,0,0,0]])
# Then for each entry, find the dot product of
# the image with the corresponding basis image.
# This gives the amount of the "projection" of
# the image in that direction.
for i in range(4):
for j in range(4):
c[i,j] = flatdot(image, bb(i,j))
# Then display it. The resulting numbers represent
# how much of each of the 16 basis images is present
# in the original image.
print(c)
plt.imshow(c)
plt.colorbar()
[[100 -70 70 0] [-70 49 -49 0] [ 70 -49 49 0] [ 0 0 0 0]]
<matplotlib.colorbar.Colorbar at 0x7fcbc87cc190>
# Undo that, just to see if we can.
# First set aside space for it.
# (Since each step adds into this, it needs to start
# as a collection of floating point numbers.)
undo_image = matrix([[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0]] )
# Now add to this each coefficient of the matrix c
# as weight for each of the basis images.
for i in range(4):
for j in range(4):
undo_image += c[i,j] * bb(i,j) # matrix += scalar * matrix
# Display the "blurry image".
plt.imshow(undo_image)
plt.colorbar()
# And print out the numbers,
# with 5 decimal digits and without scientific notation.
np.set_printoptions(precision=5, suppress=True)
print(undo_image)
[[ 0.00253 0.5 0.5 0.00253] [ 0.5 98.99747 98.99747 0.5 ] [ 0.5 98.99747 98.99747 0.5 ] [ 0.00253 0.5 0.5 0.00253]]
... which except for roundoff errors is pretty good.
Now let's look at the "blurry" version; removing the highest frequency.
# truncate
c[2,2] = 0.0
plt.imshow(c)
plt.colorbar()
print(c)
[[100 -70 70 0] [-70 49 -49 0] [ 70 -49 0 0] [ 0 0 0 0]]
# Recreate new image
# First set aside space for it.
# (Since each step adds into this, it needs to start
# as a collection of floating point numbers.)
new_image = matrix([[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0],
[0.0 ,0.0 ,0.0 ,0.0]] )
# Now add to this each coefficient of the matrix c
# as weight for each of the basis images.
for i in range(4):
for j in range(4):
new_image += c[i,j] * bb(i,j) # matrix += scalar * matrix
# Display the "blurry image".
plt.imshow(new_image)
plt.colorbar()
# And print out the numbers,
# with 5 decimal digits and without scientific notation.
np.set_printoptions(precision=5, suppress=True)
print(new_image)
[[ 0.00253 0.5 0.5 0.00253] [ 0.5 74.49747 98.99747 25. ] [ 0.5 98.99747 98.99747 0.5 ] [ 0.00253 25. 0.5 -24.49747]]
It's similar to the original : numbers of order 100 in the central four and numbers of (mostly) order 0 along the edge ... well, 25 isn't 0 but it is closer to 0 than 100. :)
# total power in the new and old image :
image_power = flatdot(image, image)
print("total power in old image : {}".format(image_power))
new_image_power = flatdot(new_image, new_image)
print("total power in new image : {}".format(new_image_power))
total power in old image : 40000 total power in new image : 36803.0
We've lost some of the total intensity; not unexpected since we didn't renormalize.