Image Category Classification Using Bag of Features

This example shows how to use a bag of features approach for image category classification. This technique is also often referred to as bag of words. Visual image categorization is a process of assigning a category label to an image under test. Categories may contain images representing just about anything, for example, dogs, cats, trains, boats.

Download Caltech101 Image Set

To learn about bag of features image category classification, you will first download a suitable image data set. One of the most widely cited and used data sets is Caltech 101, collected by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato.

% Location of the compressed data set
url = '';
% Store the output in a temporary folder
outputFolder = fullfile(tempdir, 'caltech101'); % define output folder

Note that downloading the set from the web can take a very long time depending on your Internet connection. It can be 30 minutes or more since the set contains 126MB of data. The commands below will block MATLAB for that period of time. Alternatively, you can use your web browser to first download the set to your local disk. If you choose that route, re-point the 'url' variable above to the file that you downloaded.

if ~exist(outputFolder, 'dir') % download only once
    disp('Downloading 126MB Caltech101 data set...');
    untar(url, outputFolder);
end

Load Image Sets

Instead of operating on the entire Caltech 101 set, which can be time consuming, use three categories: airplanes, ferry, and laptop. Note that for the bag of features approach to be effective, the majority of each image's area must be occupied by the subject of the category, for example, an object or a type of scene.

rootFolder = fullfile(outputFolder, '101_ObjectCategories');

Construct an ImageDatastore based on the following categories from Caltech 101: 'airplanes', 'ferry', 'laptop'. Use imageDatastore to help you manage the data. Since imageDatastore operates on image file locations, and therefore does not load all the images into memory, it is safe to use on large image collections.

categories = {'airplanes', 'ferry', 'laptop'};
imds = imageDatastore(fullfile(rootFolder, categories), 'LabelSource', 'foldernames');

You can easily inspect the number of images per category as well as category labels as shown below:

tbl = countEachLabel(imds)
tbl=3×2 table
      Label      Count
    _________    _____

    airplanes     800 
    ferry          67 
    laptop         81 

Note that the labels were derived from directory names used to construct the ImageDatastore, but can be customized by manually setting the Labels property of the ImageDatastore object.
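The following minimal base-MATLAB sketch makes the folder-name rule concrete by deriving a label from each file's parent folder. The file paths are hypothetical. The code only illustrates the naming rule and is not how imageDatastore is implemented internally.

```matlab
% Hypothetical file paths laid out like the Caltech 101 category folders
files = {fullfile('101_ObjectCategories', 'airplanes', 'image_0001.jpg'); ...
         fullfile('101_ObjectCategories', 'ferry', 'image_0001.jpg'); ...
         fullfile('101_ObjectCategories', 'laptop', 'image_0002.jpg')};

% The label of each file is the name of the folder that contains it
labels = cell(size(files));
for k = 1:numel(files)
    folder = fileparts(files{k});          % path without the file name
    [~, labels{k}] = fileparts(folder);    % last folder component, e.g. 'ferry'
end
```

You could convert a cell array built this way (or from any other labeling scheme) with categorical and assign it to imds.Labels to customize the labels.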

Prepare Training and Validation Image Sets

Since imds above contains an unequal number of images per category, let's first trim it so that every category has the same number of images and the training set is balanced.

minSetCount = min(tbl{:,2}); % determine the smallest number of images in a category

% Use splitEachLabel method to trim the set.
imds = splitEachLabel(imds, minSetCount, 'randomize');

% Notice that each set now has exactly the same number of images.
countEachLabel(imds)
ans=3×2 table
      Label      Count
    _________    _____

    airplanes     67  
    ferry         67  
    laptop        67  
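The call splitEachLabel(imds, minSetCount, 'randomize') keeps minSetCount randomly chosen files from every label. The base-MATLAB sketch below illustrates the same balancing idea on a hypothetical cell array of labels with the counts reported by countEachLabel (800, 67, 81). It illustrates the idea only and is not the implementation of splitEachLabel.

```matlab
% Hypothetical label list with the same per-category counts as tbl
labels = [repmat({'airplanes'}, 800, 1); repmat({'ferry'}, 67, 1); ...
          repmat({'laptop'}, 81, 1)];

[cats, ~, catIdx] = unique(labels);        % category names and per-image category index
catIdx = catIdx(:);
counts = accumarray(catIdx, 1);            % number of images per category
minSetCount = min(counts);                 % size of the smallest category

% Randomly keep minSetCount images from every category
keep = zeros(minSetCount * numel(cats), 1);
for c = 1:numel(cats)
    members = find(catIdx == c);
    rows = (c - 1) * minSetCount + (1:minSetCount);
    keep(rows) = members(randperm(numel(members), minSetCount));
end
balancedLabels = labels(keep);
```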

Separate the sets into training and validation data. Pick 30% of images from each set for the training data and the remainder, 70%, for the validation data. Randomize the split to avoid biasing the results.

[trainingSet, validationSet] = splitEachLabel(imds, 0.3, 'randomize');
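With 67 images per category, 30 percent is 20.1 images. Assuming the fraction is rounded to the nearest whole image, 20 images per category (60 in total) go to training and the remaining 47 per category (141 in total) go to validation. This agrees with the 60 and 141 images reported in the training and evaluation logs of this example. Here is a hypothetical base-MATLAB sketch of such a per-label random split.

```matlab
% Hypothetical balanced label list: 67 images in each of three categories
labels = [repmat({'airplanes'}, 67, 1); repmat({'ferry'}, 67, 1); ...
          repmat({'laptop'}, 67, 1)];
p = 0.3;                                   % fraction of each category used for training

[cats, ~, catIdx] = unique(labels);
catIdx = catIdx(:);
isTrain = false(size(labels));
for c = 1:numel(cats)
    members = find(catIdx == c);
    nTrain = round(p * numel(members));    % assumed rounding rule: round(0.3*67) = 20
    shuffled = members(randperm(numel(members)));
    isTrain(shuffled(1:nTrain)) = true;    % the remaining images go to validation
end
trainLabels = labels(isTrain);
validationLabels = labels(~isTrain);
```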

The above call returns two imageDatastore objects ready for training and validation tasks. Below, you can see example images from the three categories included in the training data.

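% Find the first instance of an image for each category
airplanes = find(trainingSet.Labels == 'airplanes', 1);
ferry = find(trainingSet.Labels == 'ferry', 1);
laptop = find(trainingSet.Labels == 'laptop', 1);

% Display one example image per category side by side
figure
subplot(1,3,1); imshow(readimage(trainingSet, airplanes))
subplot(1,3,2); imshow(readimage(trainingSet, ferry))
subplot(1,3,3); imshow(readimage(trainingSet, laptop))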


Create a Visual Vocabulary and Train an Image Category Classifier

Bag of words is a technique adapted to computer vision from the world of natural language processing. Since images do not actually contain discrete words, we first construct a "vocabulary" of SURF features representative of each image category.

This is accomplished with a single call to the bagOfFeatures function, which:

  1. extracts SURF features from all images in all image categories

  2. constructs the visual vocabulary by reducing the number of features through quantization of feature space using K-means clustering
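To illustrate step 2, here is a minimal base-MATLAB sketch of plain K-means (Lloyd's algorithm) that turns a set of descriptors into a small vocabulary of cluster centers. The 2-D points, K = 2, and the deterministic initialization are hypothetical choices made for readability. bagOfFeatures clusters SURF descriptor vectors into 500 words, and its initialization and stopping details may differ from this sketch.

```matlab
% Hypothetical 2-D "descriptors" standing in for SURF feature vectors
features = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
K = 2;                                     % vocabulary size (500 in the example)
centers = features([1 4], :);              % deterministic initialization, for the sketch only

for iter = 1:100
    % Squared Euclidean distance from every feature to every center
    d = zeros(size(features, 1), K);
    for k = 1:K
        diffs = bsxfun(@minus, features, centers(k, :));
        d(:, k) = sum(diffs .^ 2, 2);
    end
    [~, assignment] = min(d, [], 2);       % nearest center for each feature

    % Move each center to the mean of the features assigned to it
    newCenters = centers;
    for k = 1:K
        if any(assignment == k)
            newCenters(k, :) = mean(features(assignment == k, :), 1);
        end
    end
    if isequal(newCenters, centers)        % converged: centers no longer move
        break;
    end
    centers = newCenters;
end
vocabulary = centers;                      % each row is one "visual word"
```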

bag = bagOfFeatures(trainingSet);
Creating Bag-Of-Features.
* Image category 1: airplanes
* Image category 2: ferry
* Image category 3: laptop
* Selecting feature point locations using the Grid method.
* Extracting SURF features from the selected feature point locations.
** The GridStep is [8 8] and the BlockWidth is [32 64 96 128].

* Extracting features from 60 images...done. Extracted 247480 features.

* Keeping 80 percent of the strongest features from each category.

* Balancing the number of features across all image categories to improve clustering.
** Image category 2 has the least number of strongest features: 58003.
** Using the strongest 58003 features from each of the other image categories.

* Using K-Means clustering to create a 500 word visual vocabulary.
* Number of features          : 174009
* Number of clusters (K)      : 500

* Initializing cluster centers...100.00%.
* Clustering...completed 31/100 iterations (~3.14 seconds/iteration)...converged in 31 iterations.

* Finished creating Bag-Of-Features
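The log above reports two selection steps before clustering. First, the strongest 80 percent of features in each category are kept. Then every category is trimmed to the smallest resulting count. Here that count is 58003, so 3 × 58003 = 174009 features go into K-means. The following minimal sketch shows that bookkeeping. The strength values are hypothetical, and the rounding of the 80 percent cut is an assumption.

```matlab
% Hypothetical feature strengths for three categories (higher = stronger)
strengths = {(1:100)', (1:75)', (1:90)'};
keepFraction = 0.8;                        % keep the strongest 80 percent per category

numCats = numel(strengths);
kept = cell(1, numCats);
for c = 1:numCats
    s = sort(strengths{c}, 'descend');
    kept{c} = s(1:round(keepFraction * numel(s)));   % assumed rounding rule
end

% Balance the categories: use the strongest minCount features from each one
minCount = min(cellfun(@numel, kept));
balanced = cellfun(@(s) s(1:minCount), kept, 'UniformOutput', false);
totalFeatures = sum(cellfun(@numel, balanced));      % features passed to clustering
```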

Additionally, the bagOfFeatures object provides an encode method for counting the visual word occurrences in an image. It produces a histogram that serves as a new, reduced representation of the image.

img = readimage(imds, 1);
featureVector = encode(bag, img);

% Plot the histogram of visual word occurrences
figure
bar(featureVector)
title('Visual word occurrences')
xlabel('Visual word index')
ylabel('Frequency of occurrence')

This histogram forms a basis for training a classifier and for the actual image classification. In essence, it encodes an image into a feature vector.
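The sketch below shows the idea behind this encoding in base MATLAB. Each descriptor is assigned to its nearest visual word, and the assignments are counted into a histogram. The vocabulary and descriptors are hypothetical. The final L2 normalization is an assumption meant to mirror the default 'Normalization' setting of encode, so check the encode reference page for the exact behavior.

```matlab
% Hypothetical vocabulary of 3 visual words (rows) and 5 descriptors from one image
vocabulary  = [0 0; 5 5; 10 0];
descriptors = [0.5 0; 4 5; 5 6; 9 1; 0 -1];

numWords = size(vocabulary, 1);
d = zeros(size(descriptors, 1), numWords);
for k = 1:numWords
    diffs = bsxfun(@minus, descriptors, vocabulary(k, :));
    d(:, k) = sum(diffs .^ 2, 2);          % squared distance to visual word k
end
[~, word] = min(d, [], 2);                 % nearest visual word for each descriptor

counts = accumarray(word(:), 1, [numWords 1])';   % visual word occurrences (row vector)
featureVector = counts / norm(counts);            % assumed L2 normalization
```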

Encoded training images from each category are fed into a classifier training process invoked by the trainImageCategoryClassifier function. Note that this function relies on the multiclass linear SVM classifier from the Statistics and Machine Learning Toolbox™.

categoryClassifier = trainImageCategoryClassifier(trainingSet, bag);
Training an image category classifier for 3 categories.
* Category 1: airplanes
* Category 2: ferry
* Category 3: laptop

* Encoding features for 60 images...done.

* Finished training the category classifier. Use evaluate to test the classifier on a test set.

The above function uses the encode method of the input bag object to compute a feature vector for each image in trainingSet.
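To show what happens to those feature vectors, here is a hedged base-MATLAB sketch of a multiclass linear SVM. It is built from one-vs-all binary classifiers, each trained by subgradient descent on the regularized hinge loss. This is a swapped-in simplification for illustration only. trainImageCategoryClassifier delegates to the Statistics and Machine Learning Toolbox, whose multiclass scheme and solver may differ. The data, lambda, step size, and iteration count are all hypothetical.

```matlab
% Hypothetical encoded training vectors (rows) and their class indices 1..3
X = [0 0; 0 1; 1 0; 5 0; 6 0; 5 1; 0 5; 0 6; 1 5];
y = [1; 1; 1; 2; 2; 2; 3; 3; 3];
numClasses = 3;
lambda = 0.01;                 % L2 regularization strength (assumed value)
eta = 0.005;                   % constant step size (assumed value)
numIters = 10000;

[n, dim] = size(X);
W = zeros(dim, numClasses);    % one weight vector per class
b = zeros(1, numClasses);      % one bias per class

for c = 1:numClasses
    t = 2 * (y == c) - 1;      % +1 for class c, -1 for every other class
    w = zeros(dim, 1);
    bias = 0;
    for iter = 1:numIters
        margins = t .* (X * w + bias);
        active = margins < 1;  % points with nonzero hinge loss
        gradW = lambda * w - X(active, :)' * t(active) / n;
        gradB = -sum(t(active)) / n;
        w = w - eta * gradW;
        bias = bias - eta * gradB;
    end
    W(:, c) = w;
    b(c) = bias;
end

% Predict the class whose linear score is largest
scores = bsxfun(@plus, X * W, b);
[~, predicted] = max(scores, [], 2);
trainingAccuracy = mean(predicted == y);
```

One-vs-all keeps the sketch short because each class needs only one binary problem. With many similar categories, other multiclass codings can behave differently.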

Evaluate Classifier Performance

Now that we have a trained classifier, categoryClassifier, let's evaluate it. As a sanity check, let's first test it with the training set, which should produce a near-perfect confusion matrix, i.e., values close to one on the diagonal.

confMatrix = evaluate(categoryClassifier, trainingSet);
Evaluating image category classifier for 3 categories.

* Category 1: airplanes
* Category 2: ferry
* Category 3: laptop

* Evaluating 60 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:

KNOWN        | airplanes   ferry   laptop   
airplanes    | 0.95        0.05    0.00     
ferry        | 0.00        1.00    0.00     
laptop       | 0.05        0.00    0.95     

* Average Accuracy is 0.97.
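The printed tables appear to be row-normalized. Each KNOWN row sums to 1, and the reported average accuracy equals the mean of the diagonal: (0.95 + 1.00 + 0.95)/3 ≈ 0.97 here. Assuming that convention, here is a base-MATLAB sketch of how such a matrix can be computed from known and predicted class indices. The index values are hypothetical.

```matlab
% Hypothetical known and predicted class indices for 6 test images
known     = [1 1 2 2 3 3];
predicted = [1 2 2 2 3 1];
numClasses = 3;

% counts(i, j): number of images of known class i predicted as class j
counts = accumarray([known(:) predicted(:)], 1, [numClasses numClasses]);

% Row-normalize so that row i gives the fraction of class i predicted as each class
confMat = bsxfun(@rdivide, counts, sum(counts, 2));

avgAccuracy = mean(diag(confMat));         % mean of the per-class accuracies
```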

Next, let's evaluate the classifier on the validationSet, which was not used during the training. By default, the evaluate function returns the confusion matrix, which is a good initial indicator of how well the classifier is performing.

confMatrix = evaluate(categoryClassifier, validationSet);
Evaluating image category classifier for 3 categories.

* Category 1: airplanes
* Category 2: ferry
* Category 3: laptop

* Evaluating 141 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:

KNOWN        | airplanes   ferry   laptop   
airplanes    | 0.81        0.13    0.06     
ferry        | 0.02        0.98    0.00     
laptop       | 0.00        0.00    1.00     

* Average Accuracy is 0.93.
% Compute average accuracy
avgAccuracy = mean(diag(confMatrix));
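If each row of the confusion matrix is normalized as the printed tables suggest, the average accuracy is the mean of the diagonal entries C_ii of the validation confusion matrix. Here is the calculation using the rounded values printed above.

```latex
\text{Average accuracy} = \frac{1}{3}\sum_{i=1}^{3} C_{ii}
  = \frac{0.81 + 0.98 + 1.00}{3} = \frac{2.79}{3} = 0.93
```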

Additional statistics can be derived from the remaining output arguments of the evaluate function; see the help for imageCategoryClassifier/evaluate. You can tweak the various parameters and continue evaluating the trained classifier until you are satisfied with the results.

Try the Newly Trained Classifier on Test Images

You can now apply the newly trained classifier to categorize new images.

img = imread(fullfile(rootFolder, 'airplanes', 'image_0690.jpg'));
[labelIdx, scores] = predict(categoryClassifier, img);

% Display the string label
categoryClassifier.Labels(labelIdx)
ans = 1x1 cell array
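predict returns labelIdx, an index into categoryClassifier.Labels, together with one score per category. The sketch below assumes the predicted category is the one with the highest score. The predict reference page describes the scores as negated average binary losses, so larger is better. The label list mirrors this example, but the score values are made up.

```matlab
% Category names as in this example; the scores are hypothetical
labels = {'airplanes', 'ferry', 'laptop'};
scores = [-0.12 -0.85 -0.91];              % one score per category, higher is better

[~, labelIdx] = max(scores);               % index of the best-scoring category
predictedLabel = labels{labelIdx};
```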