framED9211 Tutorial 3 : Object Recognition Techniques
Contents
- Thresholding
- Edge Detection
- Clustering Methods
- Histogram based methods
- Region growing methods
- Reference links
In analyzing the given video clip, we need to process each frame to check whether it is blank or features a car, so that the cars can be counted. Initially we only need to detect the presence of a car as an object to support the counting; after that, the frames must be analyzed to identify the number plates of the fast cars.
The techniques used to find objects of interest are usually referred to as segmentation techniques: they segment the foreground from the background. Segmentation techniques are not universal, however, and have to be adapted to the requirements.
Some of the most basic techniques, along with references to useful information, are listed here:
Thresholding
The input to a thresholding operation is typically a grayscale or color image. In the simplest implementation, the output is a binary image representing the segmentation. Black pixels correspond to background and white pixels correspond to foreground (or vice versa). In simple implementations, the segmentation is determined by a single parameter known as the intensity threshold. In a single pass, each pixel in the image is compared with this threshold. If the pixel's intensity is higher than the threshold, the pixel is set to, say, white in the output. If it is less than the threshold, it is set to black.
In more sophisticated implementations, multiple thresholds can be specified, so that a band of intensity values can be set to white while everything else is set to black. For color or multi-spectral images, it may be possible to set different thresholds for each color channel, and so select just those pixels within a specified cuboid in RGB space. Another common variant is to set to black all those pixels corresponding to background, but leave foreground pixels at their original color/intensity (as opposed to forcing them to white), so that that information is not lost.
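The three variants described above (single threshold, intensity band, and background masking) can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as a list of rows of integers; the function names and the choice of treating a pixel equal to the threshold as foreground are assumptions, not part of the original text.

```python
def threshold(image, theta):
    """Binary image: 1 (white) where the pixel is >= theta, else 0 (black)."""
    return [[1 if pixel >= theta else 0 for pixel in row] for row in image]


def band_threshold(image, low, high):
    """Binary image: 1 for pixels whose intensity lies in [low, high], else 0."""
    return [[1 if low <= pixel <= high else 0 for pixel in row] for row in image]


def mask_background(image, theta):
    """Set background pixels (< theta) to 0 but keep foreground intensities."""
    return [[pixel if pixel >= theta else 0 for pixel in row] for row in image]


# A tiny hypothetical 3x3 grayscale frame (values 0..255).
frame = [
    [10, 20, 200],
    [15, 180, 220],
    [12, 14, 90],
]

print(threshold(frame, 128))           # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
print(band_threshold(frame, 80, 190))  # [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mask_background(frame, 128))     # [[0, 0, 200], [0, 180, 220], [0, 0, 0]]
```

For a color image the same comparison could be applied per channel, keeping only pixels that pass in every channel.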
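A parameter $\theta$, called the brightness threshold, is chosen and applied to the image $a[m,n]$ as follows:

$$\text{if } a[m,n] \ge \theta \text{ then } a[m,n] = \text{object} = 1, \text{ else } a[m,n] = \text{background} = 0$$

This yields a simple binary image as output. The form above assumes light objects on a dark background; for dark objects on a light background the comparison is reversed.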
The central question in thresholding is: how do we choose the threshold $\theta$? While there is no universal procedure for threshold selection that is guaranteed to work on all images, there are a variety of alternatives.
Fixed threshold - One alternative is to use a threshold that is chosen independently of the image data. If it is known that one is dealing with very high-contrast images where the objects are very dark and the background is homogeneous and very light, then a constant threshold of 128 on a scale of 0 to 255 might be sufficiently accurate. By accuracy we mean that the number of falsely-classified pixels should be kept to a minimum.
Histogram-derived thresholds - In most cases the threshold is chosen from the brightness histogram of the region or image that we wish to segment. An image and its associated brightness histogram are shown in Figure 1.

(a) Image to be thresholded (b) Brightness histogram of the image
Figure 1: Pixels below the threshold ($a[m,n] < \theta$) will be labeled as object pixels; those above the threshold will be labeled as background pixels.
A variety of techniques have been devised to automatically choose a threshold starting from the gray-value histogram, $\{h[b] \mid b = 0, 1, \ldots, 2^B - 1\}$. Some of the most common ones are presented below. Many of these algorithms can benefit from a smoothing of the raw histogram data to remove small fluctuations, but the smoothing algorithm must not shift the peak positions. This translates into a zero-phase smoothing algorithm, given below, where typical values for W are 3 or 5:

$$h_{smooth}[b] = \frac{1}{W} \sum_{w=-(W-1)/2}^{(W-1)/2} h_{raw}[b-w], \qquad W \text{ odd}$$
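A centered moving average of odd width W is one simple zero-phase smoother, since it is symmetric around each bin and therefore does not shift peaks. The sketch below is illustrative only; the function name and the treatment of bins outside the histogram as zero are assumptions.

```python
def smooth_histogram(hist, window=3):
    """Centered moving average of odd width `window` (zero-phase).

    Bins that fall outside the histogram are treated as 0 (an assumption
    about edge handling; other choices are possible).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the filter stays centered")
    half = window // 2
    n = len(hist)
    smoothed = []
    for b in range(n):
        total = 0
        for w in range(-half, half + 1):
            if 0 <= b - w < n:
                total += hist[b - w]
        smoothed.append(total / window)
    return smoothed


# A single spike is spread out symmetrically, so its center stays at bin 2.
print(smooth_histogram([0, 0, 9, 0, 0], window=3))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```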
Isodata algorithm - The histogram is initially segmented into two parts using a starting threshold value such as $\theta_0 = 2^{B-1}$, half the maximum dynamic range. The sample mean ($m_{f,0}$) of the gray values associated with the foreground pixels and the sample mean ($m_{b,0}$) of the gray values associated with the background pixels are computed. A new threshold value $\theta_1$ is now computed as the average of these two sample means. The process is repeated, based upon the new threshold, until the threshold value does not change any more. In formula:

$$\theta_k = \frac{m_{f,k-1} + m_{b,k-1}}{2} \quad \text{until} \quad \theta_k = \theta_{k-1}$$
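The Isodata iteration can be sketched directly on a histogram. This is a minimal illustration under a few assumptions: the histogram is a list of counts indexed by brightness, the threshold is rounded to an integer at each step, and `max_iter` is a hypothetical safety limit that is not part of the description above.

```python
def isodata_threshold(hist, max_iter=100):
    """Iterate theta = (mean below + mean above) / 2 until it stops changing."""
    levels = len(hist)
    theta = levels // 2  # 2^(B-1) when levels = 2^B
    for _ in range(max_iter):
        low_count = sum(hist[:theta])
        high_count = sum(hist[theta:])
        if low_count == 0 or high_count == 0:
            return theta  # one side is empty, so no mean can be computed
        low_mean = sum(b * hist[b] for b in range(theta)) / low_count
        high_mean = sum(b * hist[b] for b in range(theta, levels)) / high_count
        new_theta = int(round((low_mean + high_mean) / 2))
        if new_theta == theta:
            break
        theta = new_theta
    return theta


# Hypothetical bimodal histogram: 100 pixels at brightness 50, 100 at 200.
hist = [0] * 256
hist[50] = 100
hist[200] = 100
print(isodata_threshold(hist))  # 125
```

Because the new threshold is the average of the two means, it does not matter which side is called foreground and which background.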

Background-symmetry algorithm - This technique assumes a distinct and dominant peak for the background that is symmetric about its maximum. The technique can benefit from the histogram smoothing described above. The maximum peak (maxp) is found by searching for the maximum value in the histogram. The algorithm then searches on the non-object pixel side of that maximum to find a p% point.
In Figure 1b, where the object pixels are located to the left of the background peak at brightness 183, this means searching to the right of that peak to locate, as an example, the 95% value. At this brightness value, 5% of the pixels lie to the right of (are above) that value. This occurs at brightness 216 in Figure 1b. Because of the assumed symmetry, we use as a threshold a displacement to the left of the maximum that is equal to the displacement to the right where the p% point is found. For Figure 1b this means a threshold value given by 183 - (216 - 183) = 150. In formula:

$$\theta = \text{maxp} - (p\% - \text{maxp})$$
This technique can be adapted easily to the case where we have light objects on a dark, dominant background. Further, it can be used if the object peak dominates and we have reason to assume that the brightness distribution around the object peak is symmetric. An additional variation on this symmetry theme is to use an estimate of the sample standard deviation s based on one side of the dominant peak and then use a threshold based on $\theta = \text{maxp} \pm 1.96s$ (at the 5% level) or $\theta = \text{maxp} \pm 2.57s$ (at the 1% level). The choice of "+" or "-" depends on which direction from maxp is being defined as the object/background threshold. Should the distributions be approximately Gaussian around maxp, then the values 1.96 and 2.57 will, in fact, correspond to the 5% and 1% levels.
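The basic p% version of the background-symmetry technique might be sketched as follows, for dark objects on a light, dominant background. It assumes the p% point is measured on the cumulative count of the whole histogram, which is one reading of the description above; the function name, the integer `p_percent` parameter, and the example histogram are hypothetical.

```python
def background_symmetry_threshold(hist, p_percent=95):
    """Mirror the p% point on the right of the main peak to its left side."""
    total = sum(hist)
    # Brightness of the dominant (background) peak.
    maxp = max(range(len(hist)), key=lambda b: hist[b])
    # Walk right from the peak until p% of all pixels are at or below b.
    cumulative = sum(hist[:maxp])
    p_point = len(hist) - 1
    for b in range(maxp, len(hist)):
        cumulative += hist[b]
        if cumulative * 100 >= p_percent * total:
            p_point = b
            break
    return maxp - (p_point - maxp)


# Hypothetical histogram: dark objects near 60, background peak at 183.
hist = [0] * 256
hist[60] = 20
hist[150] = 4
hist[166] = 12
hist[183] = 40
hist[200] = 12
hist[216] = 8
hist[230] = 4
print(background_symmetry_threshold(hist))  # 183 - (216 - 183) = 150
```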
Triangle algorithm - A line is constructed between the maximum of the histogram at brightness $b_{max}$ and the lowest value $b_{min} = (p=0)\%$ in the image. The distance d between the line and the histogram $h[b]$ is computed for all values of b from $b = b_{min}$ to $b = b_{max}$. The brightness value $b_o$ where the distance between $h[b_o]$ and the line is maximal is the threshold value, that is, $\theta = b_o$. This technique is particularly effective when the object pixels produce a weak peak in the histogram.
Figure 2: The triangle algorithm is based on finding the value of b that gives the maximum distance d.
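A minimal sketch of the triangle algorithm is given below. It assumes, as in the description, that the lowest occupied brightness lies to the left of the histogram maximum, and it compares unnormalized perpendicular distances (the normalizing factor is the same for every bin, so it does not affect which bin wins). The function name and example histogram are hypothetical.

```python
def triangle_threshold(hist):
    """Return the bin between b_min and b_max farthest from the connecting line."""
    if sum(hist) == 0:
        raise ValueError("histogram is empty")
    b_max = max(range(len(hist)), key=lambda b: hist[b])
    # Lowest brightness that actually occurs in the image (the p = 0% point).
    b_min = next(b for b, count in enumerate(hist) if count > 0)
    if b_min == b_max:
        return b_max
    x1, y1 = b_min, hist[b_min]
    x2, y2 = b_max, hist[b_max]
    best_b, best_d = b_min, -1
    for b in range(b_min, b_max + 1):
        # Numerator of the point-to-line distance for the point (b, hist[b]).
        d = abs((y2 - y1) * b - (x2 - x1) * hist[b] + x2 * y1 - y2 * x1)
        if d > best_d:
            best_b, best_d = b, d
    return best_b


# Weak object peak at bin 1, dominant background peak at bin 7.
print(triangle_threshold([1, 2, 1, 0, 0, 0, 2, 10]))  # 5
```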
The three procedures described above give the values $\theta = 139$ for the Isodata algorithm, $\theta = 150$ for the background symmetry algorithm at the 5% level, and $\theta = 152$ for the triangle algorithm for the image in Figure 1a.
Edge Detection
Edge detection can be used to mark the contour of objects in the frame. Region boundaries and edges are closely related, since there is often a sharp adjustment in intensity at the region boundaries. Edge detection techniques have therefore been used as the base of another segmentation technique.
One algorithm that can be followed is:
- Go through the image matrix pixel by pixel
- For every pixel, analyze each of the 8 pixels surrounding it
- Record the value of the darkest pixel, and the lightest pixel
if (lightest_pixel_value - darkest_pixel_value) > threshold
then rewrite that pixel as 1;
else rewrite that pixel as 0;
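The steps above can be sketched as a small function. This is an illustrative version only: it assumes a grayscale image stored as a list of rows, writes the result to a separate output image so that rewritten pixels do not affect later neighbourhoods, and simply uses fewer neighbours at the image border. The name `edge_map` is hypothetical.

```python
def edge_map(image, threshold):
    """Mark a pixel as 1 when its neighbourhood contrast exceeds `threshold`."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Up to 8 surrounding pixels; fewer along the image border.
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if neighbours and max(neighbours) - min(neighbours) > threshold:
                edges[r][c] = 1
    return edges


# Dark left half, bright right half: the edge shows up along the boundary.
frame = [[10, 10, 200, 200] for _ in range(4)]
for row in edge_map(frame, 100):
    print(row)  # each row prints [0, 1, 1, 0]
```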
Clustering Methods
The K-means algorithm is an iterative technique that is used to partition an image into K clusters. The basic algorithm is:
- Pick K cluster centers, either randomly or based on some heuristic
- Assign each pixel in the image to the cluster that minimizes the variance between the pixel and the cluster center
- Re-compute the cluster centers by averaging all of the pixels in the cluster
- Repeat steps 2 and 3 until convergence is attained (e.g. no pixels change clusters)
In this case, variance is the squared or absolute difference between a pixel and a cluster center. The difference is typically based on pixel color, intensity, texture, and location, or a weighted combination of these factors. K can be selected manually, randomly, or by a heuristic.
This algorithm is guaranteed to converge, but it may not return the optimal solution. The quality of the solution depends on the initial set of clusters and the value of K.
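A minimal sketch of these steps, clustering on pixel intensity alone, is shown below. The initial centers are passed in explicitly so that the example is deterministic; the function name, the `max_iter` limit, and the choice to leave an empty cluster's center unchanged are assumptions.

```python
def kmeans_intensity(pixels, centers, max_iter=100):
    """Cluster 1-D pixel intensities; returns (labels, final centers)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each pixel to the center with the smallest squared difference.
        new_labels = [
            min(range(len(centers)), key=lambda k: (p - centers[k]) ** 2)
            for p in pixels
        ]
        if new_labels == labels:
            break  # Step 4: no pixel changed cluster
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


pixels = [10, 12, 11, 200, 205, 198]
print(kmeans_intensity(pixels, centers=[0, 255]))
# ([0, 0, 0, 1, 1, 1], [11.0, 201.0])
```

For color or position-aware clustering, the squared difference would be replaced by a distance over the chosen feature vector.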
Histogram-Based Methods
Histogram-based methods are very efficient when compared to other image segmentation methods because they typically require only one pass through the pixels. In this technique, a histogram is computed from all of the pixels in the image, and the peaks and valleys in the histogram are used to locate the clusters in the image. Color or intensity can be used as the measure.
A refinement of this technique is to recursively apply the histogram-seeking method to clusters in the image in order to divide them into smaller clusters. This is repeated with smaller and smaller clusters until no more clusters are formed.
One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the image. Distance metrics and integrated region matching are commonly used in this approach to image classification.
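One simple way to locate peaks and valleys is sketched below, assuming an intensity histogram stored as a list of counts. The helper names are hypothetical, and the strict local-maximum test is deliberately naive: it misses flat-topped peaks and reacts to noise, which illustrates why identifying significant peaks can be difficult.

```python
def find_peaks(hist):
    """Indices of bins strictly higher than both neighbours."""
    return [
        b for b in range(1, len(hist) - 1)
        if hist[b] > hist[b - 1] and hist[b] > hist[b + 1]
    ]


def valleys_between(hist, peaks):
    """Lowest bin between each pair of consecutive peaks."""
    return [
        min(range(left, right + 1), key=lambda b: hist[b])
        for left, right in zip(peaks, peaks[1:])
    ]


def label_by_valleys(image, valleys):
    """Cluster index of a pixel = number of valleys below its intensity."""
    return [[sum(pixel > v for v in valleys) for pixel in row] for row in image]


hist = [0, 5, 1, 0, 2, 8, 1, 0]
peaks = find_peaks(hist)
valleys = valleys_between(hist, peaks)
print(peaks, valleys)                               # [1, 5] [3]
print(label_by_valleys([[1, 5], [2, 6]], valleys))  # [[0, 1], [0, 1]]
```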
Region Growing Methods
The first region growing method was the seeded region growing method. This method takes a set of seeds as input along with the image. The seeds mark each of the objects to be segmented. The regions are iteratively grown by comparing all unallocated neighbouring pixels to the regions. The difference between a pixel's intensity value and the region's mean, δ, is used as a measure of similarity. The pixel with the smallest difference measured this way is allocated to the respective region. This process continues until all pixels are allocated to a region.
Seeded region growing requires seeds as additional input. The segmentation results are dependent on the choice of seeds. Noise in the image can cause the seeds to be poorly placed. Unseeded region growing is a modified algorithm that doesn't require explicit seeds. It starts off with a single region $A_1$; the pixel chosen here does not significantly influence the final segmentation. At each iteration it considers the neighbouring pixels in the same way as seeded region growing. It differs from seeded region growing in that if the minimum δ is less than a predefined threshold T, then the pixel is added to the respective region $A_j$. If not, then the pixel is considered significantly different from all current regions $A_i$ and a new region $A_{n+1}$ is created with this pixel.
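The seeded variant can be sketched as follows. This is a deliberately simple illustration that rescans the image at every step rather than keeping a priority queue; it assumes 4-connected neighbours, distinct seed positions given as (row, column) pairs, and a grayscale image stored as a list of rows. The function name is hypothetical.

```python
def seeded_region_growing(image, seeds):
    """Grow one region per seed; returns a label image of region indices."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    sums, counts = [], []
    for region, (r, c) in enumerate(seeds):
        labels[r][c] = region
        sums.append(image[r][c])
        counts.append(1)

    def neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    remaining = rows * cols - len(seeds)
    while remaining > 0:
        best = None  # (delta, row, col, region)
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] is not None:
                    continue
                for rr, cc in neighbours(r, c):
                    region = labels[rr][cc]
                    if region is None:
                        continue
                    # delta: difference between the pixel and the region mean.
                    delta = abs(image[r][c] - sums[region] / counts[region])
                    if best is None or delta < best[0]:
                        best = (delta, r, c, region)
        if best is None:
            break  # no seeds were given
        _, r, c, region = best
        labels[r][c] = region
        sums[region] += image[r][c]
        counts[region] += 1
        remaining -= 1
    return labels


frame = [
    [10, 11, 200],
    [12, 10, 210],
    [11, 190, 205],
]
print(seeded_region_growing(frame, seeds=[(0, 0), (0, 2)]))
# [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
```

The unseeded variant would add one check before allocation: if the smallest δ is not below the threshold T, the pixel starts a new region instead of joining an existing one.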
Some other algorithms proposed in various papers are listed below, along with references to the papers:
- http://www.issia.cnr.it/~iesiml39/pubblica/wiamis2004.pdf
This paper proposes a motion detection algorithm based on background subtraction and shadow removing. The main idea is to implement a fast and reliable approach for motion detection, able to extract the moving objects without their own shadows, in a single-step algorithm. It is based on the correlation between regions selected from the reference image and the current one.
- http://research.microsoft.com/~jiansun/papers/BgCut_Camera.pdf
This paper introduces background cut, a high quality and realtime foreground layer extraction algorithm. From a single video sequence with a moving foreground object and stationary background, our algorithm combines background subtraction, color and contrast cues to extract a foreground layer accurately and efficiently. The key idea in background cut is background contrast attenuation, which adaptively attenuates the contrasts in the background while preserving the contrasts across foreground/background boundaries. Our algorithm builds upon a key observation that the contrast (or more precisely, color image gradient) in the background is dissimilar to the contrast across foreground/background boundaries in most cases.
- http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V14-41WBFSN-9&_user=779849&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000031018&_version=1&_urlVersion=0&_userid=779849&md5=71d9dd5bf8e7736e8881d87cde6f3c55
This paper presents an algorithm for segmenting and tracking moving objects in a scene. Temporal information provided by a region tracking strategy is integrated for improving frame-to-frame motion segmentation.
- http://rcv.kaist.ac.kr/~shkim/document/FCV2007_shkim.pdf
- http://www.vision.ee.ethz.ch/~bleibe/papers/leibe-interleaved-bmvc03.pdf
- http://homepages.inf.ed.ac.uk/rbf/HIPR2/featops.htm
...and of course there are always the RoboWiki and Forum to help you.
