A set of C++ library classes
for neural network development



pnd_cv

Application for training and testing PND ensembles using cross validation techniques

The application pnd_cv implements Parallel Non-linear Dichotomizers (PND), ensembles of learning machines for classification problems, using cross-validation techniques for training and testing the ensemble. PNDs solve multiclass classification problems using a set of two-class classifiers (dichotomizers).
They are based on Output Coding (OC) decomposition methods, which decompose a K-class (K > 2) classification problem into a set of simpler two-class subproblems and then recompose the outputs of the dichotomizers, coding each class by a suitable codeword.

The implemented methods include the One-Per-Class (OPC), Pairwise Coupling (PWC), Pairwise Coupling Correcting Classifiers (CC) and Error Correcting Output Coding (ECOC) decomposition methods. Each dichotomizer can be a Multi-Layer Perceptron (MLP) or a simple linear perceptron.
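As an illustration of the coding/decoding idea, the following minimal C++ sketch builds the OPC coding matrix (the codeword of class i is the indicator vector of i, so each dichotomizer separates one class from all the others) and recomposes a vector of dichotomizer outputs by choosing the class whose codeword is nearest in Hamming distance. It is a conceptual sketch only and does not use the NEURObjects classes:

    #include <cstdio>
    #include <vector>

    // OPC coding matrix for K classes: row i is the codeword of class i.
    std::vector<std::vector<int> > opc_codewords(int K) {
        std::vector<std::vector<int> > M(K, std::vector<int>(K, 0));
        for (int i = 0; i < K; ++i) M[i][i] = 1;  // dichotomizer i: class i vs. rest
        return M;
    }

    // Recomposition: threshold the dichotomizer outputs and return the
    // class whose codeword is nearest in Hamming distance.
    int decode(const std::vector<double>& out,
               const std::vector<std::vector<int> >& M) {
        int best = 0, bestDist = (int)out.size() + 1;
        for (int i = 0; i < (int)M.size(); ++i) {
            int dist = 0;
            for (int j = 0; j < (int)out.size(); ++j)
                dist += (out[j] > 0.5 ? 1 : 0) != M[i][j];
            if (dist < bestDist) { bestDist = dist; best = i; }
        }
        return best;
    }

    int main() {
        std::vector<std::vector<int> > M = opc_codewords(7);  // 7 classes, as in the examples
        double o[] = {0.1, 0.2, 0.9, 0.1, 0.3, 0.2, 0.1};     // dichotomizer outputs
        std::vector<double> out(o, o + 7);
        std::printf("predicted class: %d\n", decode(out, M)); // prints 2 (classes 0..6)
        return 0;
    }

The ECOC schemes work in the same way, but with longer codewords (option -ndico) whose larger pairwise Hamming distance allows the recomposition step to correct some dichotomizer errors.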
You can select a decomposition method and different learning algorithms and parameters for each dichotomizer. There are many possible options, but any not set by the user take default values. The only mandatory parameter is the name of the training file. Be aware that the input dimension of the dichotomizers must agree with the dimension of the samples in the training and test files.
To generate the files for the cross validation from a single data file you can use the NEURObjects application dofold. See also the application pnd.

Usage: pnd_cv datafile [options]
The parameter datafile corresponds to the base name of the data files used for the cross validation. If you perform a 10-fold cross validation on the data set datafile, the application expects that the files datafile.f1.train, datafile.f2.train, ... , datafile.f10.train and the files datafile.f1.test, datafile.f2.test, ... , datafile.f10.test corresponding to the 10 folds are available.
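Conceptually, an nf-fold run amounts to the loop sketched below: train a PND on each fold's training file, measure its error on the corresponding test file, and average over the folds. This is a sketch of the procedure, not of the pnd_cv sources; train_and_test is a hypothetical placeholder:

    #include <cstdio>
    #include <string>
    #include <sstream>

    // Hypothetical placeholder standing for "train a PND on train_file
    // and return its error rate on test_file" (pnd_cv does the real work).
    double train_and_test(const std::string& train_file,
                          const std::string& test_file) {
        return 0.0;
    }

    // An nf-fold cross validation on base name "base": the file names
    // follow the base.fi.train / base.fi.test convention described above.
    double cross_validate(const std::string& base, int nf) {
        double sum = 0.0;
        for (int i = 1; i <= nf; ++i) {
            std::ostringstream train, test;
            train << base << ".f" << i << ".train";
            test  << base << ".f" << i << ".test";
            sum += train_and_test(train.str(), test.str());
        }
        return sum / nf;  // mean test error over the folds
    }

    int main() {
        std::printf("mean CV error: %g\n", cross_validate("datafile", 10));
        return 0;
    }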

Options:
-res string file summarizing cross-validation results
-decomp string Decomposition schemes:
OPC (one per class)
PWC (pairwise coupling)
CC (correcting classifiers)
ECOC_ex (Error Correcting Output Code - exhaustive algorithm)
ECOC_BCH (Error Correcting Output Code - BCH algorithm)
-ndico unsigned codeword length
-nc unsigned number of classes
-nf unsigned number of folds
-nl unsigned number of layers (including the input layer):
1: one layer (no hidden units)
2: two layers (1 hidden layer)
3: three layers (2 hidden layers)
-h1 unsigned number of hidden units of the first hidden layer of the three-layer MLP
-h2 unsigned number of hidden units of the second hidden layer of the three-layer MLP, or number of hidden units of the two-layer MLP (default 4)
-d unsigned sample dimension/number of input units
-maxit unsigned maximal number of iterations of the backpropagation algorithm (default 500)
-maxerr double maximal normalized root mean square error (RMS) (default 0.1)
-alg string Backpropagation learning algorithm type (the usual update rules are sketched after this list):
gd (gradient descent)
gd_dl (gradient descent linear decrement)
gd_de (gradient descent exponential decrement)
md (gradient descent with momentum)
md_dl (gradient descent with momentum and linear decrement)
md_de (gradient descent with momentum and exponential decrement)
bold (bold driver)
boldmom (bold driver with momentum)
-rate double learning rate (default 0.07)
-mom double momentum rate (default 0.5)
-incbold double bold driver increment rate
-decbold double bold driver decrement rate
-decr double learning rate decrement (for the gradient descent algorithms with linear or exponential decrement)
-seed integer seed for the random initialization of the weights. If 0, the initialization is performed using the current computer time.
-out string save the PND outputs to a file
-p integer 1 = print a point at each iteration; 0 = no printing
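The learning options above map onto standard backpropagation update rules. The sketch below shows the usual form of these rules; the exact decrement and bold driver formulas used by pnd_cv are assumptions here, and the names are illustrative, not the NEURObjects API:

    #include <algorithm>
    #include <cmath>

    // Usual form of the update rules behind the -alg options
    // (illustrative only; not the NEURObjects implementation).
    struct BackpropStep {
        double rate;     // -rate : learning rate
        double mom;      // -mom  : momentum rate
        double decr;     // -decr : learning rate decrement
        double prev_dw;  // last weight change (for the momentum term)

        BackpropStep(double r, double m, double d)
            : rate(r), mom(m), decr(d), prev_dw(0.0) {}

        // One weight update: plain gradient descent ("gd") when mom == 0,
        // the momentum family ("md") when mom > 0.
        double update(double w, double grad) {
            double dw = -rate * grad + mom * prev_dw;
            prev_dw = dw;
            return w + dw;
        }

        // Per-epoch learning rate schedules ("*_dl" and "*_de" variants);
        // the exact formulas used by pnd_cv are assumptions.
        void linear_decrement()      { rate = std::max(rate - decr, 0.0); }
        void exponential_decrement() { rate *= std::exp(-decr); }

        // Bold driver ("bold", "boldmom"): increase the rate (-incbold)
        // while the training error decreases, decrease it (-decbold)
        // when the error grows.
        void bold_driver(double prev_err, double err,
                         double incbold, double decbold) {
            rate *= (err < prev_err) ? incbold : decbold;
        }
    };

Training stops (options -maxit and -maxerr) when the number of iterations reaches maxit or the normalized RMS error of a dichotomizer falls below maxerr.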

Examples:
In all the examples we refer to a data set named foo with 5-dimensional samples and 7 classes.
The application uses a number of training and test sets equal to the number specified by the option -nf. If you want to perform a 10-fold cross validation on a data set foo, the application expects that the files named foo.f1.train, foo.f2.train, ... , foo.f10.train and the files foo.f1.test, foo.f2.test, ... , foo.f10.test corresponding to the 10 folds are available.
These are only a few examples; see the Usage section above for a complete reference of the available parameters. Note that you can place the parameters in any order, and most of them are not mandatory. You can even supply no optional parameters at all, but remember that the number of inputs of each dichotomizer must agree with the dimension of the input examples, and you must also specify the correct number of classes and folds you want to use.

  1. Training and testing a PND using cross validation:
    pnd_cv foo -d 5 -nc 7
    Note that the OPC decomposition is the default one, and that the order of the supplied parameters is not important.
    Note also that the number of inputs of the dichotomizers must agree with the dimension of the input examples, and you must specify the correct number of classes. The dichotomic MLPs of the PND have one hidden layer (default) and use a backpropagation learning algorithm with fixed learning rate (default); the application performs a 10-fold (default) cross validation.
    If you want to perform a cross-validation and summarize the results in a file named myresults:
    pnd_cv foo -d 5 -nc 7 -res myresults

  2. Cross-validated PND selecting a specific decomposition scheme:
    Selecting the Pairwise Coupling Correcting Classifiers (CC) decomposition scheme:
    pnd_cv foo -d 5 -nc 7 -decomp CC
    Selecting an Error Correcting Output Coding (ECOC) decomposition scheme using the exhaustive algorithm for generating the codewords:
    pnd_cv foo -d 5 -nc 7 -decomp ECOC_ex

  3. Training and testing a PND using cross validation with a user defined number of folds:
    pnd_cv foo -d 5 -nc 7 -nf 5
    In such a way you perform a 5-fold cross validation. You must provide the files foo.f1.train, ... , foo.f5.train and foo.f1.test, ... , foo.f5.test corresponding to the 5 folds. See dofold for automatically generating the desired number of folds.

  4. Training and testing a PND using cross validation, varying number of hidden units and/or learning parameters:
    If you want to select a learning rate of, for instance, 0.03:
    pnd_cv foo -d 5 -nc 7 -rate 0.03
    If you want to select 15 neurons for the hidden layer:
    pnd_cv foo -d 5 -nc 7 -h2 15

  5. Performing cross validation using an ECOC-PND with two hidden layers:
    pnd_cv foo -d 5 -nc 7 -nl 3 -decomp ECOC_BCH -ndico 15
    If you want to select a learning rate of, for instance, 0.04:
    pnd_cv foo -d 5 -nc 7 -rate 0.04 -nl 3 -decomp ECOC_BCH -ndico 15
    If you want to select 30 neurons for the first hidden layer and 22 for the second hidden layer:
    pnd_cv foo -d 5 -nc 7 -decomp ECOC_BCH -ndico 15 -nl 3 -h1 30 -h2 22

  6. Performing cross validation using an OPC-PND with dichotomic simple perceptrons:
    pnd_cv foo -d 5 -nc 7 -nl 1 -decomp OPC

  7. Performing a 5-fold cross validation using a CC-PND and selecting a learning algorithm:
    Using backpropagation with a momentum term:
    pnd_cv foo -d 5 -nc 7 -nf 5 -alg md -decomp CC
    If you want to select a learning rate of, for instance, 0.03 and a momentum rate of 0.5:
    pnd_cv foo -d 5 -nc 7 -alg md -rate 0.03 -mom 0.5 -decomp CC
    If you want to select a bold driver learning algorithm with increment rate equal to 1.02 and decrement rate equal to 0.5:
    pnd_cv foo -d 5 -nc 7 -alg bold -incbold 1.02 -decbold 0.5 -decomp CC

  8. Saving the outputs of the PND onto a file:
    For saving the computed outputs of the ensemble onto a file named output:
    pnd_cv foo -d 5 -nc 7 -out output

  9. A more complicated example:
    We want to train and test an OPC-PND with dichotomic MLPs having one hidden layer of 22 hidden neurons, using 5-fold cross validation and a backpropagation algorithm with an exponential decrement of the learning rate over the epochs, starting from an initial rate of 0.3. We also want the learning of each dichotomizer to stop when the normalized RMS error goes below 0.04 or when 2000 iterations are reached, and we want to initialize the seed for the pseudorandom initialization of the weights to -1; finally, we want the results stored in the file results and the outputs on the test sets stored in the file output.
    To obtain all this, type:
    pnd_cv foo -decomp OPC -d 5 -nc 7 -nf 5 -seed -1 -h2 22 -alg gd_de -rate 0.3 -maxerr 0.04 -maxit 2000 -res results -out output

Returns:
0 if there are no errors, 1 if there are some errors, 2 if the maximum number of iterations is reached by at least one dichotomizer in at least one fold.
Output:
Output of the application on standard output
File storing the cross-validation results (option -res)
File storing the PND outputs (option -out)
Input:
Cross-validation data files named as described in Usage



Last Updated February 2001
For comments and suggestions mail to Giorgio Valentini