Wednesday 10 April 2013

To pedestrian or not to pedestrian

The final step of the algorithm is the classification of Detection Windows (DW) as Pedestrian or Not Pedestrian.

In this entry I'll explain the steps I take for training and testing a classifier. Keep in mind that I'm not yet very well acquainted with all the concepts and parameters involved.

In my problem, each DW to be classified is described by thousands of features. For such problems AdaBoost is usually a preferred approach, since it is a Machine Learning method that combines a large number of weak classifiers (here, the features) into a strong classifier. This is exactly the approach that ChnFtrs proposes, and fortunately OpenCV has an implementation of it.

Step 1: Introducing the data to the algorithm.

For this I need to prepare a .csv file with the category in the first column, followed by the features. For example, if I were using 3 features my file could look like this:

N,1000,1020,900
P,2000,1200,300
P,3300,1235,1000
N,1432,1587,5587
...


The file is generated by running the feature extraction code on multiple images and appending the results for each DW to it.
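
The writing itself is straightforward; a minimal sketch of appending one DW to the file could look like the following (the function and variable names are just placeholders for this post, not my actual code):

  #include <fstream>
  #include <vector>

  // label is 'P' or 'N'; feats holds the feature values of one detection window
  void appendSample(std::ofstream &file, char label, const std::vector<float> &feats)
  {
      file << label;                        // category in the first column
      for (size_t i = 0; i < feats.size(); i++)
          file << "," << feats[i];          // features in the following columns
      file << "\n";
  }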

Step 2: Opening the file

Supposing that our file is called "train.csv", the code looks like this:

  CvMLData cvml;
  cvml.read_csv("train.csv");

CvMLData stands for Computer Vision Machine Learning Data and is a class made specifically for handling machine learning data with OpenCV.

Step 3: Set response index

This is to let OpenCV know in which column the response (the category) is:

  cvml.set_response_idx(0); // column 0

Step 4: Separating response from values

  const CvMat* Resp = cvml.get_responses();
  const CvMat* Values = cvml.get_values();

  Mat RespM(Resp, false);   // convert from the old CvMat to the newer Mat class
  Mat ValM(Values, false);

  Mat trainData = ValM.colRange(1, ValM.cols);  // drop the 1st column, which holds the responses

Step 5: Training and saving a classifier

  CvBoost boost;

  boost.train(trainData,        // data
              CV_ROW_SAMPLE,    // samples in rows
              RespM,            // responses
              Mat(),            // varIdx
              Mat(),            // sampleIdx
              Mat(),            // varType
              Mat(),            // missingDataMask
              CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
              false);

  boost.save("./trained_boost.xml", "boost");

The CvBoost::train method has several parameters, some of which I'm not using yet. It is possible to select a subset of the training data for training, leaving the rest for testing, which gives an immediate grasp of the classifier's performance. It is also possible to have missing fields in the feature pool and let the algorithm fill them in with approximated values.

As it is, I'm training a model of type REAL with 1000 weak classifiers, using all the data in the file and leaving the other parameters at their defaults.
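
For example, if I understood the documentation correctly, the sampleIdx argument (the second of those empty Mat() slots) accepts a mask that selects which rows take part in the training. A rough sketch of training on only the first 80% of the rows, which I haven't tried yet:

  // Sketch only: mark roughly the first 80% of the rows for training
  int nTrain = (trainData.rows * 8) / 10;
  Mat sampleIdx = Mat::zeros(1, trainData.rows, CV_8U);
  sampleIdx.colRange(0, nTrain).setTo(Scalar(1));   // 1 = row used for training

  boost.train(trainData, CV_ROW_SAMPLE, RespM,
              Mat(),          // varIdx: use all features
              sampleIdx,      // train only on the selected rows
              Mat(), Mat(),
              CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
              false);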

Step 6: Classifying new samples

In the main code I have to load the classifier and then run the CvBoost::predict method on samples with the same number of columns as the ones used to train it.

  CvBoost boost;
  boost.load("trained_boost.xml");

  Mat Test;
  int Pedcount = 0, nPedcount = 0;

  for (uint n = 0; n < WindowFtrs.size(); n++)
  {
      // build a 1 x NRFEATURE sample with the features of this window
      Test = Mat::zeros(1, NRFEATURE, CV_32FC1);

      for (uint i = 0; i < NRFEATURE; i++)
          Test.at<float>(0, i) = WindowFtrs[n][i];

      // predict returns the class label assigned to the sample
      float x = boost.predict(Test, Mat(), Range::all(), false, false);

      if (x == 2) nPedcount++;   // classified as Not Pedestrian
      if (x == 1) Pedcount++;    // classified as Pedestrian
  }

  cout << "No Ped: " << nPedcount << " Ped: " << Pedcount << endl;

In this loop, Test is filled with the feature values of each window and then passed to the predict method.

The returned value x is the predicted class (1 for Ped and 2 for nPed), and this way I was able to do some preliminary testing of my first classifier. On 1300 windows with pedestrians, the classifier failed 6 predictions, which is pretty good for a first try. I also tested the classifier on over 14 million DWs without pedestrians, leading to around 290000 false detections, which in terms of blunt accuracy means about 98% correct predictions. However, I'll not discuss these results yet, since this is hardly the best way to evaluate a classifier.
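
Just to make the arithmetic behind those numbers explicit:

  // Rough arithmetic behind the numbers above (counts rounded)
  double missRate = 6.0 / 1300.0;                                    // ~0.5% of pedestrian DWs missed
  double falsePositiveRate = 290000.0 / 14000000.0;                  // ~2% of non-pedestrian DWs flagged
  double accuracy = 1.0 - (6.0 + 290000.0) / (1300.0 + 14000000.0);  // ~98% correct overall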

Regards!
