
Image classification using SVMs in R

I recently ran some Support Vector Machine (SVM) tests in R (a statistical language with functional elements for rapid prototyping and data analysis, somewhat similar to Matlab, but open source ;)) for my current face recognition projects. To get my SVMs up and running in R with image data as input and class labels as output, I wrote a small demo script for classifying images. As test data I used 2 classes of images (lines from top left to bottom right and lines from bottom left to top right), with 10 samples each, like these:

[Images: six sample images from the two classes]
The complete image set is available here.

This demo uses a simple split into a train and a test set. For more sophisticated problems, n-fold cross validation is recommended instead to search for good parameter settings. If you have not worked with SVMs yet, I’d recommend reading up on how to get started with “good” SVM classification, for example the short and easy-to-read “A Practical Guide to Support Vector Classification” by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin (the LIBSVM inventors).
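As a rough illustration of such a parameter search, the sketch below runs a small grid search with 5-fold cross validation using tune.svm from e1071. The toy data, the grid values and the variable names (x, y, tuned, best_model) are assumptions made for illustration only.

```r
library(e1071)

set.seed(42)
# toy data: two Gaussian clusters with 20 samples each and 2 features
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
y <- factor(rep(c('c1', 'c2'), each = 20))

# grid search over cost and gamma, scored by 5-fold cross validation
tuned <- tune.svm(x, y,
                  kernel = 'radial',
                  cost = 2^(-1:3),
                  gamma = 2^(-3:1),
                  tunecontrol = tune.control(cross = 5))

print(tuned$best.parameters)   # parameter combination with the lowest CV error
print(tuned$best.performance)  # the corresponding cross-validated error

# model refitted on all samples with the best parameters
best_model <- tuned$best.model
```

With only a handful of samples per class, as in the image set used here, the cross-validated error will be noisy, so treat the selected parameters as a starting point rather than a final answer.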

Update: added parallel loading of the image data using mclapply from the parallel package (for demonstration purposes only, loading 10 images in parallel does not make a big difference ;)).

print('starting svm demo...')

library('png')
library('e1071')
library('parallel')

# load img data (file names are parsed below as <subject id>_<img nr>.png)
folder <- '.'
file_list <- dir(folder, pattern="\\.png$")
# mclapply forks processes; on Windows use mc.cores=1
data <- mclapply(file.path(folder, file_list), readPNG, mc.cores=2)
# extract subject id + img nr from names
subject_ids <- sapply(file_list, function(file_name) as.numeric(unlist(strsplit(file_name, "_"))[1]), USE.NAMES=FALSE)
# rename subject ids to c1 and c2 for clearer display of results;
# a factor is used so that svm treats the labels as classes
subject_ids <- factor(ifelse(subject_ids == 0, 'c1', 'c2'), levels=c('c1', 'c2'))
img_ids <- sapply(file_list, function(file_name) as.numeric(unlist(strsplit(unlist(strsplit(file_name, "_"))[2], "\\."))[1]), USE.NAMES=FALSE)

# specify which data should be used as test and train by the img nrs
train_test_border <- 7
is_train <- img_ids < train_test_border
# split data into train and test, and bring into matrix form to feed to svm:
# one row per image, one column per pixel
train_in <- t(sapply(data[is_train], as.vector))
train_out <- subject_ids[is_train]
test_in <- t(sapply(data[!is_train], as.vector))
test_out <- subject_ids[!is_train]

# train svm - try out different kernels + settings here
svm_model <- svm(train_in, train_out, type='C-classification', kernel='linear')

# evaluate svm
p <- predict(svm_model, train_in)
print(p)
print(table(p, train_out))
p <- predict(svm_model, test_in)
print(p)
print(table(p, test_out))

print('svm demo done!')
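The confusion tables printed by the script can be condensed into a single classification rate. The following is a minimal sketch of one way to do that; the two label vectors are made-up stand-ins for p and test_out, and the names confusion, classification_rate and error_rate are chosen for illustration.

```r
# made-up stand-ins for the predictions p and the true labels test_out
p        <- factor(c('c1', 'c1', 'c2', 'c2', 'c1', 'c2', 'c2', 'c2'), levels = c('c1', 'c2'))
test_out <- factor(c('c1', 'c1', 'c1', 'c2', 'c1', 'c2', 'c2', 'c2'), levels = c('c1', 'c2'))

# confusion table: rows are predictions, columns are true labels
confusion <- table(p, test_out)

# classification rate = share of samples on the diagonal of the table
classification_rate <- sum(diag(confusion)) / sum(confusion)
error_rate <- 1 - classification_rate

print(confusion)
print(classification_rate)
print(error_rate)
```

Using the diagonal assumes that both factors share the same levels in the same order, which is why the levels are set explicitly here.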
  1. Fabian Zentner
    May 21, 2013 at 11:44

    Hey
    I would like to know what your train_in and train_out are when you train your SVM. I would like to take a satellite image and classify it into just 5 classes with an SVM in R. Do you have any idea how to do that?
    Thanks in advance

    • May 21, 2013 at 13:23

      Hey Fabian,

      train_in is x and train_out is y in the e1071 documentation (svm part):

      x: a data matrix, a vector, or a sparse matrix (object of class Matrix provided by the Matrix package, or of class matrix.csr provided by the SparseM package, or of class simple_triplet_matrix provided by the slam package).

      y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

      In my example, train_in is therefore a 2D matrix holding all training images, with each row holding all pixel values of the corresponding image, and train_out is simply a vector with one class label (c1 or c2) per image.

      You can use the same approach for classifying basically any image, but keep in mind that your data has to be normalized and that you need enough samples. Satellite images won’t be as easy as the samples from above.

      Bg!
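The layout of x and y described in the reply above (one row per image, one label per row) can be sketched with a few toy matrices. The 4x4 random images and the names images, labels and x are assumptions for illustration; real data would come from readPNG.

```r
# three toy 4x4 grayscale "images" standing in for the output of readPNG
set.seed(1)
images <- list(matrix(runif(16), 4, 4),
               matrix(runif(16), 4, 4),
               matrix(runif(16), 4, 4))
labels <- factor(c('c1', 'c2', 'c1'))

# flatten each image to a vector and stack the vectors as rows
x <- t(sapply(images, as.vector))

print(dim(x))          # rows = images, columns = pixels
print(length(labels))  # one label per row of x
```

All images need the same dimensions for this to work, since every row of x must have the same number of columns.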

  2. Fabian Zentner
    May 21, 2013 at 16:29

    Thanks for your fast answer. I understand the e1071 package better now.
    I want, for example, to take a TIFF and classify it using samples (areas of interest, points of interest). Would this be possible?
    Thanks a lot!

    • May 29, 2013 at 12:50

      To read TIFF images you can use e.g. the R packages tiff and rtiff (depending on the type of TIFF).
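As a small sketch of the tiff package route, the example below writes a synthetic grayscale image to a temporary file and reads it back with readTIFF. The image content and the names img, path, loaded and features are assumptions for illustration; a real satellite TIFF may have several channels or a bit depth that needs extra handling.

```r
library(tiff)

# write a small synthetic grayscale image to a temporary TIFF file
img <- matrix(seq(0, 1, length.out = 24), nrow = 4, ncol = 6)
path <- tempfile(fileext = '.tiff')
writeTIFF(img, path)

# read it back; a single-channel image is returned as a matrix
loaded <- readTIFF(path)

# flatten to a feature vector, as done for the PNG images
features <- as.vector(loaded)
print(dim(loaded))
print(length(features))
```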

      Classifying something like an “area of interest” depends on how you normalize your data. What exactly do you want to achieve? Recognizing regions containing a bright star or a group of stars inside a satellite image? If that’s the case, you should think of an SVM as a “discrete-answer-machine”. It’s best at giving discrete answers like “yes” or “no” to a question it was trained for. If you want to get the x/y coordinates and width/height of a region of interest as the answer, you will need other approaches too (sliding window, template scaling) to achieve feasible results.

      Bg!
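The sliding window idea mentioned in the reply above can be sketched as follows: cut the image into overlapping windows, flatten each window into a feature row, and ask a yes/no classifier about every row. The toy image, the window size and stride, and the brightness threshold that stands in for predict(svm_model, patches) are all assumptions for illustration.

```r
# toy 8x8 grayscale image with a bright 4x4 block in the lower right corner
img <- matrix(0, nrow = 8, ncol = 8)
img[5:8, 5:8] <- 1

window <- 4  # window edge length in pixels (assumed)
stride <- 2  # step between neighbouring windows (assumed)

# top-left corners of all windows that fit into the image
corners <- expand.grid(row = seq(1, nrow(img) - window + 1, by = stride),
                       col = seq(1, ncol(img) - window + 1, by = stride))

# cut out every window and flatten it into one feature row
patches <- t(apply(corners, 1, function(pos) {
  as.vector(img[pos['row']:(pos['row'] + window - 1),
                pos['col']:(pos['col'] + window - 1)])
}))

# hypothetical stand-in for predict(svm_model, patches): mean brightness > 0.5
is_hit <- rowMeans(patches) > 0.5

# windows answered with "yes", together with their position and size
hits <- cbind(corners[is_hit, ], width = window, height = window)
print(hits)
```

In a real setup the classifier would be trained on windows of the same size, and several window sizes would be tried to cover regions of different scale.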

  3. Karthik
    August 19, 2015 at 07:20

    Can anyone tell me how to include the error rate/classification rate in this program?
