The expression for the calculation of the moments in terms of a sum would be written by a mathematician in the more general form of an integral. This covers the case of a density function over the plane, or indeed over $\mathbb{R}^n$ for any positive integer $n$, as well as the discrete case. The density function does not have to be a characteristic function: we could allow grey levels, so $f_A$ can be supposed to be an integrable function on $\mathbb{R}^n$, although we shall mainly be interested in the case $n = 2$. Integrating most functions over the whole plane does not give sensible (finite) answers, so we assume that the functions are defined only in the unit disk, $D^2 = \{(x,y) : x^2 + y^2 \le 1\}$. We can normalise each region of interest by scaling it down until it just fits into the unit disk.
If we write, for natural numbers $p, q$,

$$\mu_{[p,q]} = \int_{D^2} x^p\, y^q\, f_A(x,y)\, dx\, dy,$$

then $\mu_{[p,q]}$ is the $(p,q)$th moment of $f_A$, the coefficient obtained by integrating $f_A$ against the monomial $x^p y^q$.
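For a digitised image the integral becomes a sum over pixels. The following is a minimal Python/NumPy sketch, assuming a 2-D array holding $f_A$ sampled at pixel centres, and assuming the normalisation is done by moving the centroid to the origin and then dividing by the largest distance from it to a pixel of the shape. The function name `normalised_moment` is hypothetical:

```python
import numpy as np

def normalised_moment(image, p, q):
    """Approximate mu_[p,q] = integral over D^2 of x^p y^q f_A(x, y) dx dy,
    where `image` holds f_A sampled at pixel centres. Assumes the shape has
    at least two pixels, so the scaling radius is non-zero."""
    f = np.asarray(image, dtype=float)
    rows, cols = np.indices(f.shape)
    x = cols.astype(float)
    y = -rows.astype(float)              # rows run downwards; make y point up
    mass = f.sum()
    # Shift the centroid of the shape to the origin.
    x -= (f * x).sum() / mass
    y -= (f * y).sum() / mass
    # Scale so the farthest pixel centre of the shape lands on the unit circle.
    radius = np.sqrt(x**2 + y**2)[f > 0].max()
    x /= radius
    y /= radius
    pixel_area = 1.0 / radius**2         # area of one pixel after scaling
    return float((x**p * y**q * f).sum() * pixel_area)

square = np.zeros((20, 20))
square[5:15, 5:15] = 1.0
print(normalised_moment(square, 0, 0), normalised_moment(square, 2, 0))
```

Because the centroid is moved to the origin, $\mu_{[1,0]}$ and $\mu_{[0,1]}$ come out as zero (up to rounding) for every shape, and for a binary image $\mu_{[0,0]}$ is the area of the scaled shape.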
This leads to the obvious question as to whether there is a better basis of functions to use in order to express the characteristic function of a shape. In particular, why not take the Discrete Fourier Transform of the image? This may be done quickly and efficiently using the FFT (Fast Fourier Transform), and is described in Image Processing courses. Or are there better bases of functions, more suited to the shapes we get in character recognition?
A small amount of reflection on what you are doing when you do a two-dimensional Fourier expansion of the characteristic function of a character suggests that this is not a particularly good way to go. Imagine doing it in one dimension: you have a set which is a couple of intervals, and the characteristic function of the set, whose graph is a pair of rectangular pulses, 1 over the set and 0 elsewhere. You approximate this by adding up sine and cosine waves until you get a reasonable fit. Now go to two dimensions and think of a character, say the letter /E/. You have a function which is of height 1 over the pixels in the character and 0 outside it. Now you want to build this function of two variables up out of sines and cosines in the x direction and in the y direction.
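The one-dimensional thought experiment is easy to try. The following minimal Python/NumPy sketch uses two arbitrarily chosen intervals inside $[-\pi,\pi)$ (our own choice, not taken from the text). It sums the first $K$ harmonics of their characteristic function using the closed-form Fourier coefficients, then reports the RMS error and the largest value reached by the partial sum:

```python
import numpy as np

# Characteristic function of a set made of two intervals inside [-pi, pi).
INTERVALS = [(-2.0, -1.0), (0.5, 2.0)]

def indicator(x):
    """1 on the union of the intervals, 0 elsewhere."""
    return sum(((a <= x) & (x < b)).astype(float) for a, b in INTERVALS)

def fourier_partial_sum(x, K):
    """Constant term plus the first K cosine and sine harmonics of the
    indicator, with coefficients integrated in closed form."""
    total_length = sum(b - a for a, b in INTERVALS)
    s = np.full_like(x, total_length / (2 * np.pi))            # a_0
    for k in range(1, K + 1):
        # a_k = (1/pi) * integral of cos(kx), b_k = (1/pi) * integral of sin(kx)
        a_k = sum(np.sin(k * b) - np.sin(k * a) for a, b in INTERVALS) / (np.pi * k)
        b_k = sum(np.cos(k * a) - np.cos(k * b) for a, b in INTERVALS) / (np.pi * k)
        s += a_k * np.cos(k * x) + b_k * np.sin(k * x)
    return s

x = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
f = indicator(x)
for K in (5, 20, 80):
    S = fourier_partial_sum(x, K)
    rms = np.sqrt(np.mean((S - f) ** 2))
    print(f"K={K:3d}  rms error={rms:.3f}  max of partial sum={S.max():.3f}")
```

The error does fall as $K$ grows, but only slowly (roughly like $1/\sqrt{K}$, because the coefficients of a function with jumps decay only like $1/k$), and the partial sums keep overshooting 1 near each jump (the Gibbs phenomenon). Many terms are needed before the rectangles are reproduced well, and the two-dimensional /E/ only makes matters worse.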
Closing your eyes and brooding about it quietly leads you to the suspicion that it will be a messy business, with rather a lot of terms before anything much like the /E/ emerges from the mess. Of course, the same considerations suggest that using polynomials would be a mistake too, but at least the arithmetic is easier for polynomials.
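On the other hand, the trigonometric functions do have one advantage: they form an orthogonal set of functions. Each coefficient is then just the projection of the function onto one basis function, independent of all the others, and it does not change when more terms are added. This makes the set of coefficients much more informative than the moments, which are taken against the monomials $x^p y^q$, and the monomials are not orthogonal. So we might ask whether we can do anything to improve on the monomials.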
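The lack of orthogonality of the monomials is easy to see numerically. The following minimal Python/NumPy sketch approximates the inner products $\langle f, g\rangle = \int_{D^2} f\,g\,dx\,dy$ of a few monomials by summing over a fine grid of points in the unit disk. The grid size is an arbitrary choice:

```python
import numpy as np

N = 801
t = (np.arange(N) + 0.5) / N * 2 - 1          # grid point centres in [-1, 1]
X, Y = np.meshgrid(t, t)
inside = X**2 + Y**2 <= 1.0                    # keep only points in the unit disk
dA = (2.0 / N) ** 2                            # area of one grid cell

def inner(f, g):
    """Approximate <f, g> = integral over the unit disk of f * g."""
    return float(np.sum((f * g)[inside]) * dA)

monomials = {"1": np.ones_like(X), "x": X, "y": Y,
             "x^2": X**2, "xy": X * Y, "y^2": Y**2}
names = list(monomials)
gram = np.array([[inner(monomials[a], monomials[b]) for b in names] for a in names])
print(names)
print(np.round(gram, 3))
```

Some off-diagonal entries vanish by symmetry, but others do not: $\langle 1, x^2\rangle = \pi/4$ and $\langle x^2, y^2\rangle = \pi/24$, so the moments $\mu_{[0,0]}$, $\mu_{[2,0]}$ and $\mu_{[0,2]}$ carry overlapping information.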
In KKOP, page 277, the reader is shown how to use the Gram-Schmidt Orthogonalisation Process to make the sequence of basis functions
![]()
If we normalise all our shapes so that they are scaled to lie in the unit disk in the plane, and have their centroid at the origin, and if we use polar co-ordinates, we can expand any shape in the plane into the set of basis functions, indexed by the natural numbers p,q:
![]()

The result of applying the Gram-Schmidt Orthogonalisation Process to the above basis of functions gives us the Zernike basis, $\{V_{[p,q]}\}$.
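The orthogonalisation itself can be done in exact arithmetic. The following minimal Python sketch rests on two assumptions of ours. The first is that the basis being orthogonalised consists of the functions $\rho^k e^{iq\theta}$. The second is that it is then enough to orthogonalise the radial factors $\rho^{|q|}, \rho^{|q|+2}, \ldots$ separately for each fixed $q$; functions with different $q$ are already orthogonal, since $\int_0^{2\pi} e^{iq\theta} e^{-iq'\theta}\,d\theta = 0$ for $q \ne q'$. The inner product on the disk then reduces to $\int_0^1 u(\rho)\,v(\rho)\,\rho\,d\rho$. Results are scaled to take the value 1 at $\rho = 1$, and the function names are hypothetical:

```python
from fractions import Fraction

def radial_inner(u, v):
    """Exact <u, v> = integral_0^1 u(rho) v(rho) rho d(rho) for polynomials
    given as coefficient lists (index i holds the coefficient of rho^i).
    The factor rho is the polar area element; the angular integral, 2*pi
    for equal q, is a common constant and is dropped."""
    total = Fraction(0)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            if a and b:
                total += a * b * Fraction(1, i + j + 2)
    return total

def gram_schmidt_radial(q, count):
    """Orthogonalise rho^|q|, rho^(|q|+2), ..., rho^(|q|+2(count-1)) by
    Gram-Schmidt, then scale each result so that its value at rho = 1 is 1."""
    q = abs(q)
    basis = []
    for k in range(count):
        v = [Fraction(0)] * (q + 2 * k) + [Fraction(1)]        # rho^(q + 2k)
        for u in basis:                                        # remove components along earlier ones
            c = radial_inner(v, u) / radial_inner(u, u)
            v = [vi - c * (u[i] if i < len(u) else 0) for i, vi in enumerate(v)]
        basis.append(v)
    # sum(v) is the polynomial evaluated at rho = 1
    return [[coeff / sum(v) for coeff in v] for v in basis]

print(gram_schmidt_radial(0, 3))
```

With this scaling, `gram_schmidt_radial(0, 3)` gives the coefficient lists of $1$, $2\rho^2 - 1$ and $6\rho^4 - 6\rho^2 + 1$, and `gram_schmidt_radial(1, 2)` gives $\rho$ and $3\rho^3 - 2\rho$. These are the standard Zernike radial polynomials of the corresponding orders.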
Although the Zernike functions are fairly unpleasant to compute by hand, some programming in a good symbolic package such as MAPLE, MATHEMATICA or MACSYMA helps. Alternatively, one can get closed-form expressions for the functions by reading the right papers; see Alireza Khotanzad and Jiin-Her Lu in the bibliography. They give the formula:
$$V_{[p,q]}(\rho,\theta) = R_{[p,q]}(\rho)\, e^{iq\theta}$$
where

| Symbol | Meaning |
| --- | --- |
| $p$ | is a positive integer or zero, |
| $q$ | is an integer (positive, negative or zero), subject to the constraints that $p-\vert q\vert$ is even and $\vert q\vert \le p$, |
| $\rho$ | is the distance from the origin to the pixel, |
| $\theta$ | is the angle between the radius line to the pixel and the $x$-axis, |
| $R_{[p,q]}(\rho)$ | is the radial polynomial defined by: |
$$R_{[p,q]}(\rho) = \sum_{s=0}^{(p-\vert q\vert)/2} \frac{(-1)^s\,(p-s)!\;\rho^{p-2s}}{s!\,\left(\frac{p+\vert q\vert}{2}-s\right)!\,\left(\frac{p-\vert q\vert}{2}-s\right)!}$$
Note that $R_{[p,-q]}(\rho) = R_{[p,q]}(\rho)$, since the formula depends on $q$ only through $\vert q\vert$.
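The closed form translates directly into code. The following minimal Python sketch evaluates $R_{[p,q]}(\rho)$ term by term and lists the admissible index pairs; the function names are hypothetical:

```python
from math import factorial

def zernike_indices(max_order):
    """All admissible (p, q): 0 <= p <= max_order, |q| <= p, p - |q| even."""
    return [(p, q) for p in range(max_order + 1)
            for q in range(-p, p + 1) if (p - abs(q)) % 2 == 0]

def radial_poly(p, q, rho):
    """R_[p,q](rho) evaluated from the closed-form sum over s.
    Works for a float rho or a NumPy array of radii."""
    q = abs(q)                       # the formula only involves |q|
    if q > p or (p - q) % 2:
        raise ValueError("need |q| <= p and p - |q| even")
    total = 0.0
    for s in range((p - q) // 2 + 1):
        coeff = ((-1) ** s * factorial(p - s)
                 / (factorial(s) * factorial((p + q) // 2 - s) * factorial((p - q) // 2 - s)))
        total = total + coeff * rho ** (p - 2 * s)
    return total

print(zernike_indices(2))    # [(0, 0), (1, -1), (1, 1), (2, -2), (2, 0), (2, 2)]
print(radial_poly(4, 0, 0.5))  # -0.125, i.e. 6(0.5)^4 - 6(0.5)^2 + 1
```

Two easy checks on such an implementation are that $R_{[p,q]}(1) = 1$ for every admissible pair, a standard property of the Zernike radial polynomials, and that `radial_poly(p, -q, rho)` equals `radial_poly(p, q, rho)`. Counting both signs of $q$, there are $p + 1$ admissible pairs of order $p$, so 21 pairs up to order 5.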
Then the Zernike moment of order p with repetition q for a function f defined in the unit disk (possibly the characteristic function of a set) is
$$A_{[p,q]} = \frac{p+1}{\pi} \int_{D^2} f(x,y)\, V^*_{[p,q]}(\rho,\theta)\, dx\, dy$$
Here $^*$ denotes the complex conjugate. For the digitised case the obvious changes are made, replacing the integral by a sum over the pixels.
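In digitised form, with $V_{[p,q]}(\rho,\theta) = R_{[p,q]}(\rho)\,e^{iq\theta}$ and $A_{[p,q]}$ the moment, a minimal Python/NumPy sketch might look as follows. It assumes a square image whose shape has already been centred and scaled, and maps the pixel centres onto $[-1,1]\times[-1,1]$. It keeps only the pixels whose centres lie in the unit disk and multiplies each term by the pixel area. `zernike_moment` and `radial_poly` are hypothetical names:

```python
import numpy as np
from math import factorial

def radial_poly(p, q, rho):
    """Zernike radial polynomial R_[p,q](rho) from its closed form."""
    q = abs(q)
    return sum((-1) ** s * factorial(p - s)
               / (factorial(s) * factorial((p + q) // 2 - s) * factorial((p - q) // 2 - s))
               * rho ** (p - 2 * s)
               for s in range((p - q) // 2 + 1))

def zernike_moment(image, p, q):
    """Digitised A_[p,q] = (p+1)/pi * sum over pixels in the unit disk of
    f(x, y) * conj(V_[p,q](rho, theta)) * (pixel area).
    The square n x n image is mapped onto [-1, 1] x [-1, 1]."""
    f = np.asarray(image, dtype=float)
    n = f.shape[0]
    t = (np.arange(n) + 0.5) * 2.0 / n - 1.0      # pixel centres
    x, y = np.meshgrid(t, -t)                      # row 0 is the top (y near +1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    v_conj = radial_poly(p, q, rho) * np.exp(-1j * q * theta)
    pixel_area = (2.0 / n) ** 2
    return (p + 1) / np.pi * np.sum(f[inside] * v_conj[inside]) * pixel_area
```

As a check, for the characteristic function of the whole disk $A_{[0,0]}$ should come out close to 1 and $A_{[2,0]}$ close to 0, up to the error of approximating the disk by pixels.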
Projecting down on this basis in the space $L^2(D^2)$ of square-integrable functions defined on the interior and boundary of the unit disk in $\mathbb{R}^2$, or onto the discrete pixelised approximation to it, gives all the Zernike moments. They are, in general, complex numbers.
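Because the basis is orthogonal, the moments can be summed back up to approximate the original function. Writing $A_{[p,q]}$ for the moment against $V_{[p,q]} = R_{[p,q]}(\rho)e^{iq\theta}$, the approximation is $f \approx \sum A_{[p,q]} V_{[p,q]}$ over all admissible $(p,q)$ with $p$ up to some maximum order. The following minimal Python/NumPy sketch assumes a square image mapped onto $[-1,1]^2$, with pixel centres as sample points and pixel sums in place of integrals; all names are hypothetical:

```python
import numpy as np
from math import factorial

def radial_poly(p, q, rho):
    """Zernike radial polynomial R_[p,q](rho) from its closed form."""
    q = abs(q)
    return sum((-1) ** s * factorial(p - s)
               / (factorial(s) * factorial((p + q) // 2 - s) * factorial((p - q) // 2 - s))
               * rho ** (p - 2 * s)
               for s in range((p - q) // 2 + 1))

def disk_grid(n):
    """Polar coordinates of the pixel centres of an n x n image mapped onto
    [-1, 1] x [-1, 1] (row 0 at the top), plus the area of one pixel."""
    t = (np.arange(n) + 0.5) * 2.0 / n - 1.0
    x, y = np.meshgrid(t, -t)
    return np.hypot(x, y), np.arctan2(y, x), (2.0 / n) ** 2

def reconstruct(image, max_order):
    """Approximate image by the sum of A_[p,q] V_[p,q] over all admissible
    (p, q) with p <= max_order. Returns the real part inside the unit disk
    and 0 outside it."""
    f = np.asarray(image, dtype=float)
    rho, theta, area = disk_grid(f.shape[0])
    inside = rho <= 1.0
    approx = np.zeros(f.shape, dtype=complex)
    for p in range(max_order + 1):
        for q in range(-p, p + 1, 2):            # |q| <= p and p - |q| even
            v = radial_poly(p, q, rho) * np.exp(1j * q * theta)
            a = (p + 1) / np.pi * np.sum(f[inside] * np.conj(v[inside])) * area
            approx += a * v
    return np.where(inside, approx.real, 0.0)
```

For a real image, $A_{[p,-q]}$ is the complex conjugate of $A_{[p,q]}$, so the imaginary parts cancel in the sum and taking the real part only discards rounding error. The pixel sums only approximate the integrals, so the discrete basis is not exactly orthogonal and the reconstruction is itself an approximation.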
We have ensured that the objects we are going to expand in this way have a representation which is shift invariant, by translating each one so that its centroid is at the origin, and scale invariant, by normalising it to lie inside the unit disk. The angular part $e^{iq\theta}$ of each basis function makes the corresponding coefficient a complex number: rotating the shape through an angle $\alpha$ multiplies the coefficient with repetition $q$ by $e^{-iq\alpha}$ and leaves its modulus unchanged. So by simply taking the modulus of each coefficient we obtain a representation of the shape which is rotation invariant as well. Naturally, this leads to confusion between commas and single quote marks, and between sixes and nines. The measurement suite has rather more invariance than is good for it, but it is a simple matter to add in a few extra terms which tell you where the character is, how big it is, and which way is up.
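The rotation invariance of the moduli can be checked directly. The following minimal Python/NumPy sketch computes the modulus of each pixel-sum moment against $R_{[p,q]}(\rho)e^{iq\theta}$ for $q \ge 0$, assuming a square image mapped onto $[-1,1]^2$. It then compares an asymmetric test shape with copies rotated by 90° and 180° using `np.rot90`, which maps the pixel grid onto itself exactly. The test shape and the function names are our own illustrative choices:

```python
import numpy as np
from math import factorial

def radial_poly(p, q, rho):
    """Zernike radial polynomial R_[p,q](rho) from its closed form."""
    q = abs(q)
    return sum((-1) ** s * factorial(p - s)
               / (factorial(s) * factorial((p + q) // 2 - s) * factorial((p - q) // 2 - s))
               * rho ** (p - 2 * s)
               for s in range((p - q) // 2 + 1))

def zernike_moduli(image, max_order):
    """|A_[p,q]| for 0 <= q <= p <= max_order with p - q even. Negative q
    are omitted because for a real image |A_[p,-q]| = |A_[p,q]|."""
    f = np.asarray(image, dtype=float)
    n = f.shape[0]
    t = (np.arange(n) + 0.5) * 2.0 / n - 1.0
    x, y = np.meshgrid(t, -t)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0
    area = (2.0 / n) ** 2
    result = {}
    for p in range(max_order + 1):
        for q in range(p % 2, p + 1, 2):
            v_conj = radial_poly(p, q, rho) * np.exp(-1j * q * theta)
            a = (p + 1) / np.pi * np.sum(f[inside] * v_conj[inside]) * area
            result[(p, q)] = abs(a)
    return result

# An asymmetric test shape: a thick letter L on a 101 x 101 grid.
shape = np.zeros((101, 101))
shape[20:80, 30:45] = 1.0
shape[65:80, 30:75] = 1.0

original = zernike_moduli(shape, 6)
quarter_turn = zernike_moduli(np.rot90(shape), 6)
half_turn = zernike_moduli(np.rot90(shape, 2), 6)
print(max(abs(original[k] - quarter_turn[k]) for k in original))
print(max(abs(original[k] - half_turn[k]) for k in original))
```

Both printed differences should be at rounding-error level. The half-turn case is the six-and-nine problem in miniature: a shape and its copy turned upside down have identical moduli, so something beyond the moduli is needed to tell which way is up.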
Experimentally it is found that it is feasible to recover almost complete information about the shape of a point set in $\mathbb{R}^2$ using fewer than 20 terms of the infinite collection of moments, when the point set is something of complexity comparable with a printed character. The general use of moments is favoured because, happily, a small number of moments often suffices to give adequate approximations.