An important part of the problem of binding or chunking is the following: suppose I have seen lots of squares and triangles separately, and I see three sides of a square: how do I arrange to look harder for the fourth side? After all, I should have some expectation that there is a fourth side; so would you, and so, one suspects, would a moderately bright cockroach. Does this emerge from the formalism? How does one deal with the situation in which we have two objects, one partially in front of the other, and we recognise them both? This is, if you reflect upon it, a clever trick.
The case of three sides of a square, where we have formerly seen all four, will do to start with, since the general case of missing data is essentially the same. It will by now be apparent that these issues do not make sense except relative to a body of data, so we shall suppose that many squares of different sizes, orientations and locations have been provided to the system, and also a similar set of triangles. Now we present the system with a three-sided square, that is, one with a single edge absent. The system has to decide whether this is more like a square or a triangle. If it concludes that it is a square, it has to supply the missing side. A related problem is that we give the system a noisy square and require the system to clean it up.
Now the UpWrite of the three-sided square will have a zeroth component saying it has three elements in the set, a centroid consisting of six numbers, and a covariance matrix of 21 numbers. This supposes that we have a resolution radius big enough to encompass all three edges. The six centroid entries will contain first the average of the number of pixels in each edge; these numbers will be very close, squares being what they are. The corresponding variance entry will therefore be small and the centroid entry will be the same as the length of each side in pixels. The next entry in the centroid will be the centroid of the centres of the edges. This will not be in the centre of the square because one side is missing. The centroids of the edges' covariance matrix terms will likewise be distorted from what we might have expected had the fourth term been added in. Likewise, the covariance terms will be affected by the missing edge.
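The counting above (one number, then six, then 21) can be checked with a minimal sketch. It assumes that each edge is itself UpWritten from its pixels to a count, a centroid and a covariance, which is my reading of the six numbers per edge; the helper `edge_pixels`, the toy square with corners at (0,0) and (10,10), and the function name `upwrite` are all hypothetical choices made for illustration.

```python
from itertools import combinations_with_replacement


def upwrite(vectors):
    """Return [count, centroid..., upper triangle of covariance...] for a set of vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    # Population covariance (divide by n). The matrix is symmetric, so only
    # the upper triangle is kept: dim * (dim + 1) / 2 numbers.
    covariance = [
        sum((v[i] - centroid[i]) * (v[j] - centroid[j]) for v in vectors) / n
        for i, j in combinations_with_replacement(range(dim), 2)
    ]
    return [float(n)] + centroid + covariance


def edge_pixels(start, end, n):
    """Hypothetical helper: n evenly spaced pixel positions from start to end."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * t / (n - 1), y0 + (y1 - y0) * t / (n - 1))
            for t in range(n)]


# Toy square with 11 pixels per edge: bottom, right, top, left.
corners = [(0, 0), (10, 0), (10, 10), (0, 10)]
edges = [edge_pixels(corners[k], corners[(k + 1) % 4], 11) for k in range(4)]

# First level: each edge (a set of pixels in the plane) becomes 1 + 2 + 3 = 6 numbers.
edge_codes = [upwrite(edge) for edge in edges]

# Second level: a set of edges (points in 6-space) becomes 1 + 6 + 21 = 28 numbers.
full_square = upwrite(edge_codes)
three_sided = upwrite(edge_codes[:3])  # the left-hand edge is absent

print(len(three_sided))                    # 28
print(three_sided[0])                      # 3.0
print(full_square[2:4], three_sided[2:4])  # centroid of the edge centres
```

Entries 2 and 3 of each result hold the centroid of the edge centres: for the full square it sits at the centre of the square, while for the three-sided square it is pulled towards the right-hand edge, away from the missing one.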
If we look at the (four dimensional) manifold of squares in $\mathbb{R}^{28}$, the six dimensional manifold of all triangles in the same space, and the single point representing the UpWrite of the three-sided square, we can ask: to which manifold is the new point closest? And with this comes the associated question: how do we measure distances? This is clearly vital here. If we give high weight to the first component, we observe that all triangles have three sides and this object has three sides; therefore this object is a triangle. If we give high weight to the centroids as well, we shall conclude that the triangle it `is' must be equilateral. But if we give high weight to the covariance terms, we might complete the square instead of bending two sides to join the remote ends. Similarly, if we had used smaller neighbourhoods to get more levels of UpWrite, we should have been heavily influenced by the fact that we had two right angles in the three-sided square, and that right angles correlate with squares and not triangles. Which is the right way to go?
It depends on the data. There is no answer to the dilemma of which is right: either might be reasonable, and the choice depends largely on what kind of data has been seen before. If there were cases of triangles which had gaps in them at one or more vertices, the case for thinking it a deformed triangle would be strengthened. If there were squares with holes in one or more of the sides, the case for a square would be strengthened. In the absence of such data there is no way to tell, and both are possible.
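How much the verdict hangs on the weighting can be seen in a toy sketch. The function `weighted_distance_sq` and the block layout (1, 6, 21) follow the description of the UpWrite given earlier; the three vectors are made-up stand-ins, not UpWrites of real shapes, chosen only so that the query agrees with the `triangle' reference in its count and with the `square' reference in its covariance block.

```python
def weighted_distance_sq(x, y, weights, blocks=(1, 6, 21)):
    """Squared distance with one weight per block: count, centroid, covariance."""
    if len(x) != sum(blocks) or len(y) != sum(blocks):
        raise ValueError("vectors must match the block layout")
    total, start = 0.0, 0
    for weight, size in zip(weights, blocks):
        stop = start + size
        total += weight * sum((a - b) ** 2 for a, b in zip(x[start:stop], y[start:stop]))
        start = stop
    return total


def nearest(query, references, weights):
    """Name of the reference vector closest to the query under the given weights."""
    return min(references,
               key=lambda name: weighted_distance_sq(query, references[name], weights))


# Made-up 28-vectors (illustrative only): count, six centroid terms, 21 covariance terms.
query = [3.0] + [0.0] * 6 + [0.0] * 21
references = {
    "triangle": [3.0] + [1.0] * 6 + [1.0] * 21,  # same count as the query
    "square": [4.0] + [1.0] * 6 + [0.0] * 21,    # same covariance block as the query
}

print(nearest(query, references, (100.0, 1.0, 1.0)))  # weight on the count
print(nearest(query, references, (1.0, 1.0, 100.0)))  # weight on the covariance
```

With the weight on the count the query is assigned to the triangle reference; with the weight on the covariance block it goes to the square. Nothing in the vectors has changed between the two calls, only the choice of metric.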
To see how to use the extra data, say lots of squares with bits missing, note that the manifold of squares is now made rather blurred along the outlines by the noisy data of partial squares. Local descriptions of the structure of this noisy manifold will give, to second order, local quadratic forms near the centrally fitting manifold. We may use these to obtain an estimate of the metric in the region of the three-sided square. If deformations of this sort occur commonly, then the quadratic forms will have large extensions in the appropriate direction. We can think of them as giving estimates of the Riemannian metric tensor for a suitable metric. This solves, at least in principle, the issue of the best completion of the three-sided square. All we have to do is to find the closest point on the manifold and DownWrite it. Refinements using different levels of the UpWrite will occur to the reflective reader. It will be immediately apparent to such a reader that excluding adventitious line segments added to an existing square is essentially the same problem.
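A minimal two-dimensional sketch of this idea follows. It assumes, as a stand-in for the local quadratic form, the inverse of the sample covariance of the data seen near each candidate (a Mahalanobis-style distance); that choice, the function names and the two small clouds of points are my own illustrative assumptions, not a prescription from the text.

```python
import math


def local_quadratic_form(samples):
    """Mean and inverse covariance of a cloud of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("degenerate cloud: covariance is not invertible")
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inverse


def metric_distance_sq(point, mean, inverse):
    """Squared distance from point to mean under the local quadratic form."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return inverse[0][0] * dx * dx + 2.0 * inverse[0][1] * dx * dy + inverse[1][1] * dy * dy


# Made-up clouds: A has often been seen deformed along x; B is tight in every direction.
cloud_a = [(-3.0, 0.0), (3.0, 0.0), (0.0, -0.1), (0.0, 0.1), (0.0, 0.0)]
cloud_b = [(1.9, 1.0), (2.1, 1.0), (2.0, 0.9), (2.0, 1.1), (2.0, 1.0)]
query = (2.0, 0.2)

mean_a, form_a = local_quadratic_form(cloud_a)
mean_b, form_b = local_quadratic_form(cloud_b)

euclid_a = math.hypot(query[0] - mean_a[0], query[1] - mean_a[1])
euclid_b = math.hypot(query[0] - mean_b[0], query[1] - mean_b[1])
metric_a = metric_distance_sq(query, mean_a, form_a)
metric_b = metric_distance_sq(query, mean_b, form_b)

print("Euclidean choice:", "A" if euclid_a < euclid_b else "B")
print("Learned-metric choice:", "A" if metric_a < metric_b else "B")
```

The query is nearer to B in the ordinary Euclidean sense, but because deformations of A along the x direction are common in the data, the learned quadratic form makes A the closer candidate. In the setting of the text the same comparison would be made in the UpWrite space, and the winning point would then be DownWritten.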
The occlusion problem extends this quite directly. If we have a pyramid occluding part of a cube, as in fig. 8.13, first we find the pyramid.
This is a matter of deciding to exclude some elements in a neighbourhood, as referred to dismissively in the last sentence of the preceding paragraph. (It may take a little longer to compute than to describe.) Next we remove the pixels belonging to the pyramid, which we have by this time classified. We then take the object which is left and classify that. We discover that it may be completed into a cube by consistent and relatively small extensions at lower levels. This means we have to use the appropriate metric at the topmost level. The metric is learnt, either intrinsically from other parts of the same image or extrinsically by comparison with other images. Alternatively, we use a hierarchical probabilistic model to generate relative likelihoods of the two models given the data.