Rowley 1998: Neural network-based face detection

http://www.jimdavies.org/summaries/

Rowley, H.A., S. Baluja, & T. Kanade (1998), Neural Network-Based Face Detection, IEEE Transactions on PAMI, 20(1):23-38.

@Article{rowley1998,
  author = 	 {H.A. Rowley and S. Baluja and T. Kanade},
  title = 	 {Neural network-based face detection},
  journal = 	 {IEEE Transactions on PAMI},
  year = 	 {1998},
  OPTkey = 	 {},
  OPTvolume = 	 {20},
  OPTnumber = 	 {1},
  OPTpages = 	 {23--38},
}

Author of the summary: Jim R. Davies, 2000, jim@jimdavies.org

Cite this paper for:

face detection

to get negative (non-face) training examples, use the sub-images that the network incorrectly identifies as faces. [p4]

using a neural network to arbitrate among the outputs of other neural networks.

Purpose: To detect faces in an image.

Results: The system detected 90.5% of the faces in a set of 130 complex images, with an acceptable number of false positives.

How it works

A filter takes a 20x20 pixel window and outputs whether or not it contains a face. This filter is applied at every position in the image. It is also applied at every scale, so that even a face that fills the entire image is detected. To do this, the image is repeatedly subsampled, so that every candidate region is reduced to a 20x20 window.
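A minimal Python sketch of this scan may help. The nearest-neighbour subsampling, the scale step of 1.2, and the names `subsample` and `pyramid_windows` are assumptions made for illustration; the summary does not specify them.

```python
def subsample(image, factor):
    """Shrink a 2-D list of pixel rows by nearest-neighbour sampling (assumed scheme)."""
    height, width = len(image), len(image[0])
    new_h, new_w = int(height / factor), int(width / factor)
    return [[image[int(r * factor)][int(c * factor)] for c in range(new_w)]
            for r in range(new_h)]


def pyramid_windows(image, window=20, scale_step=1.2, stride=1):
    """Yield (scale, row, col, patch) for every window-by-window patch at every scale."""
    scale = 1.0
    current = image
    # Stop once the shrunken image is smaller than the filter window.
    while len(current) >= window and len(current[0]) >= window:
        for r in range(0, len(current) - window + 1, stride):
            for c in range(0, len(current[0]) - window + 1, stride):
                patch = [row[c:c + window] for row in current[r:r + window]]
                yield scale, r, c, patch
        scale *= scale_step
        current = subsample(image, scale)


# Example: a 40x40 image scanned with a scale step of 2.
image = [[(r + c) % 256 for c in range(40)] for r in range(40)]
windows = list(pyramid_windows(image, scale_step=2.0))
```

At the coarsest level of this example the whole 40x40 image has been reduced to a single 20x20 window, which is how a face filling the entire image would still reach the filter.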

The input image is made into a pyramid of progressively subsampled copies, so that the fixed 20x20 window covers regions of different sizes. Each window goes through the following process:

Lighting is corrected. This compensates for uneven illumination, approximating uniform ambient light.
The histogram is equalized. This effectively raises contrast.
The resulting 20x20 window is input to the neural net.
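The two preprocessing steps above can be sketched roughly as follows. This is an illustration under stated assumptions: the summary does not say how lighting is corrected, so subtracting a least-squares linear brightness gradient is assumed here, and the function names are made up for the example.

```python
def correct_lighting(patch):
    """Subtract the best-fit linear brightness gradient (assumed correction method).

    On a full rectangular grid the centred x and y coordinates are orthogonal,
    so the two least-squares slopes can be computed independently.
    """
    h, w = len(patch), len(patch[0])
    ym, xm = (h - 1) / 2, (w - 1) / 2
    sxx = h * sum((x - xm) ** 2 for x in range(w))
    syy = w * sum((y - ym) ** 2 for y in range(h))
    a = sum((x - xm) * patch[y][x] for y in range(h) for x in range(w)) / sxx
    b = sum((y - ym) * patch[y][x] for y in range(h) for x in range(w)) / syy
    return [[patch[y][x] - (a * (x - xm) + b * (y - ym)) for x in range(w)]
            for y in range(h)]


def equalize_histogram(patch, levels=256):
    """Remap intensities through their cumulative distribution to raise contrast."""
    flat = [v for row in patch for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a perfectly flat patch has no contrast to stretch
        return [[0 for _ in row] for row in patch]

    def bin_of(v):
        return min(levels - 1, int((v - lo) / (hi - lo) * (levels - 1)))

    hist = [0] * levels
    for v in flat:
        hist[bin_of(v)] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    return [[round((cdf[bin_of(v)] - cdf_min) / (n - cdf_min) * (levels - 1))
             for v in row] for row in patch]


# Example: a low-contrast 20x20 patch using only intensities 100..110.
patch = [[100 + (x % 11) for x in range(20)] for y in range(20)]
net_input = equalize_histogram(correct_lighting(patch))
```

After equalization the patch spans the full 0 to 255 range regardless of how narrow the original intensity range was.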

This paper's main contribution is how the nn analyzes the image. There are three sets of receptive fields. First, 4 areas that look at 10x10 pixels each. Second, 16 areas that look at 5x5 pixels each. Third, 6 overlapping horizontal bands of 20x5 pixels each. The bands detect horizontally extended facial features, such as a mouth or a pair of eyes.
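To make the layout concrete, here is a small Python sketch that enumerates the 26 receptive fields over a 20x20 window. The band spacing (a new band every 3 rows) is an assumption: six bands 5 pixels tall cannot fit in 20 rows without overlapping, but the summary does not give the exact offsets. The function names are illustrative only.

```python
def receptive_fields():
    """Return (top, left, height, width) for each receptive field in a 20x20 window."""
    fields = []
    # 4 areas of 10x10 pixels: the four quadrants.
    for top in range(0, 20, 10):
        for left in range(0, 20, 10):
            fields.append((top, left, 10, 10))
    # 16 areas of 5x5 pixels: a 4x4 grid.
    for top in range(0, 20, 5):
        for left in range(0, 20, 5):
            fields.append((top, left, 5, 5))
    # 6 horizontal bands, 20 wide and 5 tall, starting every 3 rows (assumed spacing).
    for top in range(0, 16, 3):
        fields.append((top, 0, 5, 20))
    return fields


def pixels_in(patch, field):
    """Return the pixels of a 20x20 patch that one receptive field can see."""
    top, left, height, width = field
    return [row[left:left + width] for row in patch[top:top + height]]


fields = receptive_fields()
```

Each hidden unit would be connected only to the pixels returned by `pixels_in` for its field, rather than to the whole window.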

Training the network

Another big contribution of this paper is how they got representative non-face images. They took scenes containing no faces and ran the nn on them. Wherever the nn falsely identified a face, they used that sub-image as a non-face training example. Thus you get non-faces that one might think were faces. Pretty smart!
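The loop can be sketched in Python as below. To keep the example runnable, each "image" is reduced to a single number and "training" just picks a threshold between the two classes; both are toy stand-ins, not the paper's network, and all names are hypothetical.

```python
def train_threshold(faces, non_faces):
    """Toy stand-in for training: put the cut halfway between the two classes."""
    return (min(faces) + max(non_faces)) / 2


def bootstrap_negatives(faces, initial_non_faces, scenery, max_rounds=10):
    """Repeatedly add false detections on face-free scenery to the non-face set."""
    non_faces = list(initial_non_faces)
    threshold = train_threshold(faces, non_faces)
    for _ in range(max_rounds):
        # Anything in the scenery that scores as a face is, by construction, an error.
        false_positives = [s for s in scenery
                           if s >= threshold and s not in non_faces]
        if not false_positives:
            break
        non_faces.extend(false_positives)
        threshold = train_threshold(faces, non_faces)  # retrain with harder negatives
    return threshold, non_faces


faces = [0.8, 0.9, 1.0]
scenery = [0.2, 0.5, 0.6, 0.7]  # scores of windows taken from face-free scenes
threshold, non_faces = bootstrap_negatives(faces, [0.1], scenery)
```

In this toy run the first classifier wrongly accepts three scenery windows; once they are added as non-face examples, the retrained classifier rejects all of the scenery.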

To reduce false positives, nearby detections are merged. "For each location and scale, the number of detections within a specified neighborhood of that location can be counted. If that number is above a threshold, then that location is classified as a face." This is called "thresholding". Once a face is accepted at a location, any other detections that overlap it are probably errors and are discarded. This is called "overlap elimination."
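A rough Python sketch of the two heuristics follows. Detections are assumed to be (centre x, centre y, window size) tuples in original-image pixels, and the neighbourhood radius, size tolerance, and minimum count are made-up parameters; the summary gives no specific values.

```python
def threshold_detections(detections, radius=2, size_tol=0, min_count=2):
    """Keep detections with at least min_count detections (itself included) nearby."""
    kept = []
    for (x, y, size) in detections:
        count = sum(1 for (x2, y2, size2) in detections
                    if abs(x - x2) <= radius and abs(y - y2) <= radius
                    and abs(size - size2) <= size_tol)
        if count >= min_count:
            kept.append((count, (x, y, size)))
    return kept


def eliminate_overlaps(counted):
    """Accept the best-supported detections first; drop any that overlap an accepted one."""
    accepted = []
    for _count, (x, y, size) in sorted(counted, reverse=True):
        overlaps = any(abs(x - ax) < (size + asize) / 2 and
                       abs(y - ay) < (size + asize) / 2
                       for ax, ay, asize in accepted)
        if not overlaps:
            accepted.append((x, y, size))
    return accepted


detections = [(50, 50, 20), (51, 50, 20), (50, 51, 20), (60, 55, 20), (200, 200, 20)]
faces = eliminate_overlaps(threshold_detections(detections, min_count=3))
```

Here the three tightly clustered detections support each other and collapse to one face, while the two isolated detections fall below the count threshold.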

To further reduce false positives, multiple networks arbitrate. [p5] Arbitration works by ANDing the output pyramids of two networks. Two networks are unlikely to both make the same false detection at the same position and scale. They also trained an arbitration nn that takes the outputs of the other nn's and decides whether there really is a face. This worked about as well as the AND and OR heuristics. [p6]
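The AND and OR heuristics amount to set operations on the two networks' detections. A minimal Python sketch, assuming each detection is an (x, y, pyramid level) tuple and that detections must match exactly to agree (the function names and sample values are hypothetical):

```python
def arbitrate_and(detections_a, detections_b):
    """Keep only detections that both networks report at the same position and scale."""
    return set(detections_a) & set(detections_b)


def arbitrate_or(detections_a, detections_b):
    """Keep detections that either network reports."""
    return set(detections_a) | set(detections_b)


# Both networks find the face at (40, 32, level 1); each also makes one
# false detection, but at different places.
net_a = [(40, 32, 1), (5, 90, 0)]
net_b = [(40, 32, 1), (77, 12, 2)]
agreed = arbitrate_and(net_a, net_b)
either = arbitrate_or(net_a, net_b)
```

ANDing trades some missed faces for fewer false positives, while ORing does the opposite.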

The experimental, other work, and future work sections are not summarized here.

Summary author's notes:

Page numbers are from the pre-print version. Add 23 to each to get the journal page number. :)


Last modified: Fri Mar 3 11:12:16 EST 2000