Posts Tagged ‘svm’

SVM classification example with performance measures using R caret

The caret package (short for Classification And REgression Training)

The caret package (short for Classification And REgression Training)

This example is a followup of hyperparameter tuning using the e1071 package in R. This time we’re using the SVM implementation from the R caret package, a binary class classification problem and some extended features that come in handy for many classification problems. For an easy start with caret take a look at one of the many presentations and intros to caret (like this one by Max Kuhn, maintainer of caret).

Desirable features

  • Hyperparameter tuning using grid search
  • Multicore processing for speedup
  • Weighting of samples to compensate for unequal class sizes (not possible with all classifiers, but possible with SVM)
  • Classifier outputting not only predicted classes but prediction probabilities. These can be used for extended performance measures (e.g. ROC curves)


We use 2 out of the 3 classes from the standard R iris dataset (the versicolor and virginica classes). We further restrict ourselves to use only the first two features (Sepal.Length and  Sepal.Width) to make well performing classification a bit more difficult. If we would use all features classification would be perfect, therefore performance measures would be rather boring.

> head(iris[order(iris$Species, decreasing = T),])

    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
101          6.3         3.3          6.0         2.5 virginica
102          5.8         2.7          5.1         1.9 virginica
103          7.1         3.0          5.9         2.1 virginica
104          6.3         2.9          5.6         1.8 virginica
105          6.5         3.0          5.8         2.2 virginica
106          7.6         3.0          6.6         2.1 virginica


We first split our data into a training and a test partition. Training data will be used for cross validation (CV) hyperparameter tuning and determining+building a model from the best hyperparameters. Test data will be used exclusively to do performance measures. Second, we create weights for our data samples so that each sample is weighted according to it’s corresponding class size (compensates for smaller and bigger class sizes). Samples of bigger classes will be assigned smaller weights and vice versa. As in our problem classes are exactly of equal size, weights will be equal for all samples too.

Next we search for best hyperparameters and create the according model using caret train. For preprocessing we let train auto scale and center our training data. There exist many other, well known and well working approaches for preprocessing, even implemented in caret (e.g. using PCA or IDA) – but we leave them out for the moment. For the parameter grid to be searched we use exponential parameter series. This way the level of detail gets adjusted along with the parameter size (be careful where to arrange for more and less details) and huge parameter spaces can be covered (with coarse detail) with the first run already. For cross validation we use 10 partitions (performance measures internally of caret will be computed as average partition performance). We allow parallel processing – and request train to create the classification model so that it computes prediction probabilities along with predicted classes.

After having found our model parameters and having created our model we let it predict our test data classes. We do that a) with predicting classes to obtain a confusion matrix and b) with prediction probabilities to create a ROC curve. For the ROC curve we have to define one of our classes as the positive class (we choose versicolor here) and the other (virginica) as the negative class. Finally, we extract the ROC curve points of equal error rate (EER), maximum accuracy, maximum kappa and minimum squared error rate (MSER). They all together give you an idea of how well your model is performing on new data.

Script code

# Example of SVM hyperparameter tuning and performance measures for a binary class problem with the caret package
# Rainhard Findling
# 2014 / 07

# optional - enable multicore processing

# create binary problem: only use two classes
x <- iris[,1:2][iris$Species %in% c('versicolor', 'virginica'),]
y <- factor(iris[,5][iris$Species %in% c('versicolor', 'virginica')])

# we see that data will probably not be perfectly seperable using linear separation
featurePlot(x = x, y = y)

# split into train and test data. train = cv parameter grid search and model creation, test = performance analysis
indexes_y_test <- createDataPartition(y = 1:length(y), times = 1, p = 0.3)[[1]]

# creation of weights - also fast for very big datasets
weights <- as.numeric(y[-indexes_y_test])
for(val in unique(weights)) {weights[weights==val]=1/sum(weights==val)*length(weights)/2} # normalized to sum to length(samples)

model <- train(method = 'svmLinear', 
           x = x[-indexes_y_test,], 
           y = y[-indexes_y_test], 
           weights = weights,
           maximize = T,
           tuneGrid = expand.grid(.C=3^(-15:15)),   
           preProcess = c('center', 'scale'),
           trControl = trainControl(method = 'cv', # cross validation
                                    number = 10,   # nr of cv sets
#                                     repeats = 5, # use with method=repeatcv
                                    returnResamp = 'none', # return accuracy per cv partition and parameter setting
                                    classProbs = T, # return prediction probabilities along with predicted classes
#                                     savePredictions=T, # returns all predictions (for all cv paritions) for each tuneGrid parameter set 
                                    returnData = F, # disable return of training data e.g. for big data sets
                                    allowParallel = T
# we see some accuracy around 0.7-0.8
head(with(model, results[order(results$Kappa, decreasing=T),]))
# confusion matrix: model predicting classes of test data
table(predict.train(object = model, newdata = x[indexes_y_test,], type='raw'), y[indexes_y_test])
# prediction probabilities of test data classes
probs <- predict.train(object = model, newdata = x[indexes_y_test,], type='prob')[,1]
isPositiveClass <- y[indexes_y_test] == 'versicolor' # for a ROC curve there is a positive class (true match rate...) - defining that class here
pred <- prediction(probs, isPositiveClass)
perf <- performance(pred, 'tpr', 'fpr')
# plot: either
plot(perf, lwd=2, col=3)
# or
with(attributes(perf), plot(x=x.values[[1]], y=y.values[[1]], type='l')) 

# some metrics: AUC (area under curve), EER (equal error rate), MSER (minimum squared error rate), Cohen's Kappa etc.
AUC <- attributes(performance(pred, 'auc'))$y.value[[1]] # area under curve
df <- with(attributes(pred), data.frame(cutoffs=cutoffs[[1]], tp=tp[[1]], fn=fn[[1]], tn=tn[[1]], fp=fp[[1]], TMR=tp[[1]]/(tp[[1]]+fn[[1]]), TNMR=tn[[1]]/(tn[[1]]+fp[[1]])))
df$MSER <- with(df, sqrt((1-TMR)**2+(1-TNMR)**2))
MSER <- with(df, sqrt(TMR**2+TNMR**2)) # sqrt of minimum squared error rate = eucl. distance to point TMR=TNMR=1
i_eer <- with(df, which.min(abs(TMR-TNMR)))
EER <- with(df[i_eer,], mean(c(TMR,TNMR))) # equal error rate: mean would not be required when using ROCR as it's always exact
df$acc <- with(df, (tp + tn) / length(isPositiveClass)) # observed accuracy
df$acc_expected <- with(df, sum(isPositiveClass) * (tp+fp) / length(isPositiveClass) + sum(!isPositiveClass) * (tn+fn) / length(isPositiveClass)) / length(isPositiveClass) # expected accuracy
df$kappa <- with(df, (acc - acc_expected) / (1 - acc_expected)) # cohen's kappa
# graphical representation 
matplot(df[,6:11], lty=1:6, lwd=2, type='l'); legend('bottomright', legend=names(df[,6:11]), lty=1:6, lwd=2, col=1:6)

# characteristics for settings with best EER, MSER etc.
df[i_eer,] # curve point for equal error rate
df[order(df$acc, decreasing=T)[[1]],] # curve point for max accuracy
df[order(df$kappa, decreasing=T)[[1]],] # curve point for max kappa
df[order(df$MSER, decreasing=F)[[1]],] # curve point for minimum squared error rate


With a test run the confusion matrix for the test data set was

             versicolor virginica
  versicolor         14         6
  virginica           2        10

which points out that 2 out of 16 versicolor samples were incorrectly classified as virginica and 6 out of 16 virginica samples were incorrectly classified as versicolor. Based on prediction probabilities we generated a ROC curve. The true match rate (TMR) indicates the rate of how much test data set samples were correctly identified as being of the positive class (in our case: versicolor). The false match rate (FMR, counterpart to true non match rate TNMR) indicates the rate of how much test data set samples were incorrectly identified as being of the positive class.rocAlthough ROC curves indicate classification performance well it’s hard to compare classifiers based directly on ROC curves. Here, different other metrics come in handy, such as the area under curve (AUC), equal error rate (EER), maximum accuracy, maximum kappa and minimum squared error rate (MSER = square of minimum euclidean distance between ROC curve and top left corner for TMR = TNMR = 1). In our case the AUC is about 0.8672, the other metrics (and their corresponding ROC curve points) are about:

             cutoffs   TMR TNMR accuracy accuracy_expected kappa  MSER
EER            0.550 0.750 0.75     0.75               0.5   0.5 0.354
Max acc.       0.515 0.875 0.75   0.8125               0.5 0.625 0.280
Max kappa      0.515 0.875 0.75   0.8125               0.5 0.625 0.280
Min MSER       0.515 0.875 0.75   0.8125               0.5 0.625 0.280

Image classification using SVMs in R

February 24, 2013 6 comments

Recently I did some Support Vector Machine (SVM) tests in R (statistical language with functional parts for rapid prototyping and data analysis — somehow similar to Matlab, but open source ;)) for my current face recognition projects. To get my SVMs up and running in R, using image data as in- and output, I wrote a small demo script for classifying images. As test data I used 2 classes of images (lines from left top to right bottom and lines from left bottom to right top), with 10 samples each — like these:

The complete image set is available here.

For SVM classification simple train and test sets get used — for more sophisticated problems n-fold cross validation for searching good parameter settings is recommended instead. For everybody who did not yet work with SVMs, I’d recommend reading something about how to start with “good” SVM classification, like the pretty short and easy to read “A Practical Guide to Support Vector Classification” from Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin (the LIBSVM-inventors).

Update: added parallel processing using parallel and mclapply for loading image data (for purpose of demonstration only, loading 10 images in parallel does not make a big difference ;)).

print('starting svm demo...')


# load img data
file_list <- dir(folder, pattern="png")
data <- mclapply(file_list, readPNG, mc.cores=2)
# extract subject id + img nr from names
subject_ids <- lapply(file_list, function(file_name) as.numeric(unlist(strsplit(file_name, "_"))[1]))
# rename subject id's to c1 and c2 for more clear displaying of results
img_ids <- lapply(file_list, function(file_name) as.numeric(unlist(strsplit(unlist(strsplit(file_name, "_"))[2], "\\."))[1]))

# specify which data should be used as test and train by the img nrs
train_test_border <- 7
# split data into train and test, and bring into array form to feed to svm
train_in <- t(array(unlist(data[img_ids < train_test_border]), dim=c(length(unlist(data[1])),sum(img_ids < train_test_border))))
train_out <- unlist(subject_ids[img_ids < train_test_border])
test_in <- t(array(unlist(data[img_ids >= train_test_border]), dim=c(length(unlist(data[1])),sum(img_ids >= train_test_border))))
test_out <- unlist(subject_ids[img_ids >= train_test_border])

# train svm - try out different kernels + settings here
svm_model <- svm(train_in, train_out, type='C', kernel='linear')

# evaluate svm
p <- predict(svm_model, train_in)
print(table(p, train_out))
p <- predict(svm_model, test_in)
print(table(p, test_out))

print('svm demo done!')