Friday, July 6, 2007

Matlab day

Well, yesterday Larry Ewing, my mentor, helped me out with the scaling problem I had, and we reached the point where the detections are located correctly in f-spot (screenshot here).

Today I'm working on the Matlab code of the algorithm, trying to glue something together, but I don't feel like I understand the whole thing properly. I'll try to pseudocode the algorithm for people to comment on.
First, a somewhat abstract description.
The algorithm uses a cascade of classifiers, where a classifier is described by:
* a rectangular feature (f)
* parity (p)
* threshold (theta)
* weight (alpha), derived from the classifier's training error
The cascade is built of several stages, each of which is made of a number of classifiers. The key property of a stage is that it has a very low false negative rate, so if it decides that the tested region is not a face, we can immediately stop processing that region and move on to the next one.
When a region is classified as a face, it is tested by the following classifiers and stages until it reaches the end of the cascade or something along the way reports that it's not a face after all. That way computation time is greatly reduced for most regions, since most of them are rejected early.
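To make the early-exit idea concrete, here is a minimal sketch in Python (not the Matlab code I'm working on). The names `run_cascade` and `make_stage` and the toy brightness test are made up for illustration. A real stage would evaluate rectangular features instead.

```python
from typing import Callable, List, Sequence

Window = Sequence[Sequence[float]]
Stage = Callable[[Window], bool]  # True means "could still be a face"


def run_cascade(stages: List[Stage], window: Window) -> bool:
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: the remaining stages are never run
    return True


def mean(window: Window) -> float:
    values = [v for row in window for v in row]
    return sum(values) / len(values)


calls: List[str] = []  # records which stages were actually evaluated


def make_stage(name: str, min_brightness: float) -> Stage:
    """Hypothetical toy stage: accepts windows that are bright enough."""
    def stage(window: Window) -> bool:
        calls.append(name)
        return mean(window) >= min_brightness
    return stage


stages = [make_stage("s1", 10), make_stage("s2", 50), make_stage("s3", 90)]
dark = [[0, 5], [5, 0]]
bright = [[100, 100], [100, 100]]
```

Running the cascade on `dark` only ever calls the first stage, while `bright` goes through all three. That is where the time savings come from.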
Now, to detect faces at various scales and locations, Viola and Jones propose a sliding window mechanism: the detector is applied at every location and every scale, with the window shifted by 1 pixel each time. The same face is commonly detected several times, at slightly different locations or scales - V-J propose a simple method of merging these overlapping detections into one.
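A rough sketch of the window scan and the merging step, again in Python. The 24-pixel base window and the 1.25 scale step are the values I remember from the V-J paper, so treat them as assumptions. The greedy grouping in `merge_detections` is my own simplification of "average the overlapping detections", not necessarily what the paper does.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, shift=1):
    """Yield (x, y, size) for every square window position at every scale."""
    size = base
    while size <= min(img_w, img_h):
        for y in range(0, img_h - size + 1, shift):
            for x in range(0, img_w - size + 1, shift):
                yield x, y, size
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(round(size * scale_step)))


def overlaps(a, b):
    """True if two (x, y, size) squares share any area."""
    ax, ay, asz = a
    bx, by, bsz = b
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz


def merge_detections(detections):
    """Greedily group overlapping detections and average each group."""
    groups = []
    for det in detections:
        for group in groups:
            if any(overlaps(det, member) for member in group):
                group.append(det)
                break
        else:
            groups.append([det])  # no overlap with any group so far
    return [tuple(sum(vals) / len(group) for vals in zip(*group))
            for group in groups]
```

For example, `merge_detections([(10, 10, 24), (11, 10, 24), (100, 100, 24)])` collapses the first two hits into one averaged box and leaves the third alone.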
The classifier itself is a simple binary function that returns true if the feature value computed over the examined image window exceeds a given threshold, with the parity deciding the direction of the comparison. The classifier's error has to be accounted for here as well.
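Here is how I currently read this, as a Python sketch. The comparison `p * f < p * theta` and the weighted vote with `alpha = log((1 - error) / error)` follow my reading of the V-J paper, so they are assumptions to be checked rather than a confirmed description. The function names are hypothetical.

```python
import math


def weak_classify(feature_value, parity, theta):
    """Parity (+1 or -1) flips the direction of the threshold test."""
    return parity * feature_value < parity * theta


def alpha_from_error(error):
    """Weight of a classifier: the lower its training error, the larger alpha."""
    return math.log((1.0 - error) / error)


def strong_classify(feature_values, classifiers):
    """Weighted vote over weak classifiers given as (parity, theta, alpha).

    Accepts when the weighted votes reach half of the total weight.
    """
    votes = sum(alpha
                for value, (parity, theta, alpha) in zip(feature_values, classifiers)
                if weak_classify(value, parity, theta))
    total = sum(alpha for _, _, alpha in classifiers)
    return votes >= 0.5 * total
```

With parity 1 the classifier fires when the feature is below theta, with parity -1 when it is above, so "exceeds the threshold" corresponds to parity -1 in this convention.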
As for the feature calculation - I found several implementations, none of them very instructive, the best one probably being the one in Torch. It seems that the feature is a simple sum of pixel intensities over rectangles (computed using the so-called integral image), with different regions of the feature carrying different weights (e.g. 1 or -1). If the weighted rectangular sum exceeds the given threshold, the classifier reports a face.
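A small Python sketch of the integral image trick, to check my understanding. The function names and the left/right sign convention of the two-rectangle feature are my own choices for illustration, not taken from Torch or from the feature set I'm reusing.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle at (x, y) of size w x h: four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Right half weighted +1, left half weighted -1 (w assumed even)."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return right - left


img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
ii = integral_image(img)
```

Once `ii` is built, any rectangle sum costs the same four lookups regardless of its size, which is what makes evaluating the features at every location and scale affordable.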

I'll try to post some pseudocode later, for discussion. As for now, please point out any misunderstandings if you find them.

Edit :
I posted the Matlab code here : http://wytyczak.com/andrzej/upload/MatlabVJDetector/
I'll check for Octave compatibility in a minute and post the code at http://wytyczak.com/andrzej/upload/OctaveVJDetector/



The code does not run in Octave, because Octave does not seem to understand Matlab's sparse arrays, which are used in the feature set I'm using (reusing, actually).

1 comment:

Søren Hauberg said...

Just wondering: what version of Octave did you use? The 2.9.x series has quite good support for sparse matrices.

Søren