Speed-efficient classification in MATLAB

I have an RGB image stored as uint8(576,720,3), where I want to classify each pixel into a set of colors. I have converted it from RGB to LAB space using rgb2lab and then removed the L layer, so it is now a double(576,720,2) consisting of the AB channels.

Now, I want to classify this to some colors that I have trained on another image, and calculated their respective AB-representations as:

Cluster 1: -17.7903  -13.1170
Cluster 2: -30.1957   40.3520
Cluster 3:  -4.4608   47.2543
Cluster 4:  46.3738   36.5225
Cluster 5:  43.3134  -17.6443
Cluster 6:  -0.9003    1.4042
Cluster 7:   7.3884   11.5584

Now, in order to classify/label each pixel to a cluster 1-7, I currently do the following (pseudo-code):

for each x
  for each y
    ab = squeeze(im(x,y,2:3)).';             % 1 x 2 AB vector for this pixel
    dist = sqrt(sum((clusters - ab).^2, 2)); % distance from ab to each of the 7 clusters
    [~, idx] = min(dist);                    % idx = assigned cluster ID

However, this is terribly slow (52 seconds) because of the image resolution and because I loop manually over each x and y.

Are there some built-in functions I can use that perform the same job? There must be.

To summarize: I need a classification method that classifies pixel images to an already defined set of clusters.
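For reference, the per-pixel loop above can be translated almost line-for-line into NumPy (a hypothetical 2x2 AB image stands in for the real 576x720 one; the cluster centers are the seven from the question):

```python
import numpy as np

# Tiny stand-in for the 576x720x2 AB image (values are illustrative)
ab_im = np.array([[[-17.0, -13.0], [46.0, 36.0]],
                  [[-0.9,    1.4], [ 7.0, 11.0]]])  # 2 x 2 x 2

# The seven trained cluster centers (AB pairs) from the question
clusters = np.array([[-17.7903, -13.1170],
                     [-30.1957,  40.3520],
                     [ -4.4608,  47.2543],
                     [ 46.3738,  36.5225],
                     [ 43.3134, -17.6443],
                     [ -0.9003,   1.4042],
                     [  7.3884,  11.5584]])

# Brute-force double loop, mirroring the pseudo-code above
labels = np.zeros(ab_im.shape[:2], dtype=int)
for x in range(ab_im.shape[0]):
    for y in range(ab_im.shape[1]):
        dists = np.linalg.norm(clusters - ab_im[x, y], axis=1)
        labels[x, y] = np.argmin(dists) + 1  # 1-based cluster IDs, as in MATLAB
```

The nested loop is exactly what makes the MATLAB version slow; the vectorized answers that follow remove it.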


Approach #1

For an N x 2 array of points/pixels, you can avoid the permute suggested in the other solution by Luis (which could slow things down a bit) by using a kind of "permute-unrolled" version of it. This also lets bsxfun work on a 2D array instead of a 3D array, which should be better for performance.

Thus, assuming clusters to be ordered as an N x 2 sized array, you may try this other bsxfun based approach -

%// Get a's and b's
im_a = im(:,:,2);
im_b = im(:,:,3);

%// Get the minimum indices that correspond to the cluster IDs
[~,idx]  = min(bsxfun(@minus,im_a(:),clusters(:,1).').^2 + ...
               bsxfun(@minus,im_b(:),clusters(:,2).').^2,[],2);
idx = reshape(idx,size(im,1),[]);
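As a sanity check of the idea, here is a rough NumPy equivalent of this unrolled-broadcast approach (names like ab_im are hypothetical; NumPy broadcasting plays the role of bsxfun):

```python
import numpy as np

clusters = np.array([[-17.7903, -13.1170], [-30.1957, 40.3520],
                     [ -4.4608,  47.2543], [ 46.3738, 36.5225],
                     [ 43.3134, -17.6443], [ -0.9003,  1.4042],
                     [  7.3884,  11.5584]])

# Hypothetical 2x2 AB image (same toy values as before)
ab_im = np.array([[[-17.0, -13.0], [46.0, 36.0]],
                  [[-0.9,    1.4], [ 7.0, 11.0]]])

# Broadcast each channel against the cluster columns, exactly like the
# two bsxfun terms: (a - ca)^2 + (b - cb)^2 gives an N x 7 2D array
a = ab_im[:, :, 0].ravel()[:, None]   # N x 1 column of a-values
b = ab_im[:, :, 1].ravel()[:, None]   # N x 1 column of b-values
d2 = (a - clusters[:, 0]) ** 2 + (b - clusters[:, 1]) ** 2

# Minimum index per pixel, reshaped back to image size (1-based IDs)
idx = d2.argmin(axis=1).reshape(ab_im.shape[:2]) + 1
```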

Approach #2

You can try out another approach that leverages fast matrix multiplication in MATLAB and is based on this smart solution -

d = 2; %// dimensionality of the points (the A and B channels)

im23 = reshape(im(:,:,2:3),[],2);

numA = size(im23,1);
numB = size(clusters,1);

A_ext = zeros(numA,3*d);
B_ext = zeros(numB,3*d);
for id = 1:d
    A_ext(:,3*id-2:3*id) = [ones(numA,1), -2*im23(:,id), im23(:,id).^2 ];
    B_ext(:,3*id-2:3*id) = [clusters(:,id).^2 ,  clusters(:,id), ones(numB,1)];
end
[~, idx] = min(A_ext * B_ext',[],2); %//'
idx = reshape(idx, size(im,1),[]); %// Desired IDs
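The same expansion trick can be sketched in NumPy to confirm that the product A_ext * B_ext.' really is the squared distance matrix (toy A and B values, chosen arbitrarily):

```python
import numpy as np

# Toy data: nA points A (pixels) and nB points B (cluster centers), d = 2
A = np.array([[-17.0, -13.0], [46.0, 36.0], [7.0, 11.0]])
B = np.array([[-17.7903, -13.1170], [46.3738, 36.5225]])
nA, d = A.shape
nB = B.shape[0]

# Per dimension, store [1, -2*a, a^2] in A_ext and [b^2, b, 1] in B_ext,
# so each row dot product expands to sum_k (a_k - b_k)^2
A_ext = np.zeros((nA, 3 * d))
B_ext = np.zeros((nB, 3 * d))
for k in range(d):
    A_ext[:, 3*k:3*k+3] = np.column_stack([np.ones(nA), -2*A[:, k], A[:, k]**2])
    B_ext[:, 3*k:3*k+3] = np.column_stack([B[:, k]**2, B[:, k], np.ones(nB)])

dist2 = A_ext @ B_ext.T   # squared distance matrix via one matmul

# Brute-force squared distances for comparison
brute = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
```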

What’s going on with the matrix multiplication based distance matrix calculation?

Let us consider two matrices A and B between whom we want to calculate the distance matrix. For the sake of an easier explanation that follows next, let us consider A as 3 x 2 and B as 4 x 2 sized arrays, thus indicating that we are working with X-Y points. If we had A as N x 3 and B as M x 3 sized arrays, then those would be X-Y-Z points.

Now, if we have to manually calculate the first element of the squared distance matrix, it would look like this:

first_element = ( A(1,1) - B(1,1) )^2 + ( A(1,2) - B(1,2) )^2

which would be:

first_element = A(1,1)^2 + B(1,1)^2 - 2*A(1,1)*B(1,1) + ...
                A(1,2)^2 + B(1,2)^2 - 2*A(1,2)*B(1,2)    ... Equation (1)
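A quick numeric check of this expansion, with arbitrary sample values standing in for A(1,:) and B(1,:):

```python
# Arbitrary sample values for A(1,1), A(1,2) and B(1,1), B(1,2)
a1, a2 = 3.0, -2.0
b1, b2 = 1.0, 4.0

# Direct squared distance vs. the expanded form from Equation (1)
direct = (a1 - b1) ** 2 + (a2 - b2) ** 2
expanded = (a1**2 + b1**2 - 2*a1*b1) + (a2**2 + b2**2 - 2*a2*b2)
```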

Now, according to our proposed matrix multiplication, if you check the output of A_ext and B_ext after the loop in the earlier code ends, their first rows would look like this:

A_ext(1,:) = [ 1,  -2*A(1,1),  A(1,1)^2,  1,  -2*A(1,2),  A(1,2)^2 ]

B_ext(1,:) = [ B(1,1)^2,  B(1,1),  1,  B(1,2)^2,  B(1,2),  1 ]

So, if you perform matrix multiplication between A_ext and the transpose of B_ext, the first element of the product would be the sum of the elementwise multiplication between the first rows of A_ext and B_ext, i.e. the sum of:

[ B(1,1)^2,  -2*A(1,1)*B(1,1),  A(1,1)^2,  B(1,2)^2,  -2*A(1,2)*B(1,2),  A(1,2)^2 ]

The result would be identical to the result obtained from Equation (1) earlier. This continues for every row of A against every row of B, so we end up with the complete squared distance matrix. That's all there is to it!

Vectorized Variations

Vectorized variations of the matrix multiplication based distance matrix calculations are possible, though there weren't any big performance improvements seen with them. Two such variations are listed next.

Variation #1

[nA,dim] = size(A);
nB = size(B,1);

A_ext = ones(nA,dim*3);
A_ext(:,2:3:end) = -2*A;
A_ext(:,3:3:end) = A.^2;

B_ext = ones(nB,dim*3);
B_ext(:,1:3:end) = B.^2;
B_ext(:,2:3:end) = B;

distmat = A_ext * B_ext.';

Variation #2

[nA,dim] = size(A);
nB = size(B,1);

A_ext = [ones(nA*dim,1) -2*A(:) A(:).^2];
B_ext = [B(:).^2 B(:) ones(nB*dim,1)];

A_ext = reshape(permute(reshape(A_ext,nA,dim,[]),[1 3 2]),nA,[]);
B_ext = reshape(permute(reshape(B_ext,nB,dim,[]),[1 3 2]),nB,[]);

distmat = A_ext * B_ext.';

So, these could be considered as experimental versions too.

Use pdist2 (Statistics Toolbox) to compute the distances in a vectorized manner:

ab = im(:,:,2:3);                              % // get A, B components
ab = reshape(ab, [size(im,1)*size(im,2) 2]);   % // reshape into 2-column
dist = pdist2(clusters, ab);                   % // compute distances
[~, idx] = min(dist);                          % // find minimizer for each pixel
idx = reshape(idx, size(im,1), size(im,2));    % // reshape result

If you don't have the Statistics Toolbox, you can replace the third line by

dist = squeeze(sum(bsxfun(@minus, clusters, permute(ab, [3 2 1])).^2, 2));

This gives squared distance instead of distance, but for the purposes of minimizing it doesn't matter.
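That last point can be verified directly: since sqrt is monotonic, the index of the minimum squared distance always matches the index of the minimum distance. A small pure-Python check, using an arbitrary pixel and three of the question's cluster centers:

```python
import math

# Three of the question's cluster centers and an arbitrary test pixel
clusters = [(-17.7903, -13.1170), (46.3738, 36.5225), (7.3884, 11.5584)]
pixel = (6.5, 12.0)

# Squared distances, and their square roots (the actual distances)
sq = [(pixel[0] - ca) ** 2 + (pixel[1] - cb) ** 2 for ca, cb in clusters]
eu = [math.sqrt(s) for s in sq]

# Both rankings pick the same nearest cluster
best_sq = sq.index(min(sq))
best_eu = eu.index(min(eu))
```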

