nn_cv
Application for training and testing MLPs using cross validation techniques
The application nn_cv implements an MLP and a simple perceptron with multiple inputs and outputs for classification problems, using cross validation techniques for training and testing. The classes can be coded according to the classical One-Per-Class (OPC) coding scheme or the Error Correcting Output Coding (ECOC) scheme. Program options allow the user to build arbitrary MLPs with one, two or no hidden layers, using a number of hidden units limited only by the available memory. The user can also select different flavors of backpropagation algorithms and different learning algorithm parameters (e.g. learning rates, momentum). If an option is not set by the user, a default value is used. The NEURObjects application dofold can be used to generate the cross validation files from a data file.
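As an informal illustration of the two coding schemes (the codewords below are examples, not necessarily those generated by the application): with 4 classes, OPC assigns one output unit per class, e.g. class 1 -> 1 0 0 0, class 2 -> 0 1 0 0, and so on, while ECOC assigns each class a longer codeword (its length is set with the -lencw option) chosen so that the codewords are far apart in Hamming distance, letting the decoding step tolerate a few wrong output bits. The following minimal C++ sketch shows the usual Hamming-distance decoding rule; it is only illustrative and is not the NEURObjects implementation:

    #include <cstddef>
    #include <vector>

    // Return the index of the class whose codeword is closest, in Hamming
    // distance, to the network outputs thresholded at 0.5. Assumes every
    // codeword has the same length as the output vector.
    std::size_t decode_ecoc(const std::vector<double>& outputs,
                            const std::vector<std::vector<int> >& codewords)
    {
        std::size_t best_class = 0;
        std::size_t best_dist = outputs.size() + 1;
        for (std::size_t c = 0; c < codewords.size(); ++c) {
            std::size_t dist = 0;
            for (std::size_t b = 0; b < outputs.size(); ++b)
                dist += ((outputs[b] >= 0.5 ? 1 : 0) != codewords[c][b]);
            if (dist < best_dist) { best_dist = dist; best_class = c; }
        }
        return best_class;
    }

With OPC, decoding usually reduces to picking the output unit with the highest activation (winner-take-all).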
Usage: nn_cv datafile [options]
The parameter datafile corresponds to the base name of the data files used for the cross validation: if you perform a 10 fold cross validation on the data set datafile, the application expects that the files datafile.f1.train, datafile.f2.train, ... , datafile.f10.train and the files datafile.f1.test, datafile.f2.test, ... , datafile.f10.test corresponding to the 10 folds are available.
Options:
-res string      file summarizing the cross validation results
-type string     MLP type: normal (MLP-OPC) (default), ecoc (MLP-ECOC)
-lencw unsigned  codeword length
-nc unsigned     number of the classes
-nf unsigned     number of the folds
-nl unsigned     number of the layers (considering also the input layer):
                 1: one layer (no hidden units)
                 2: two layers (1 hidden layer)
                 3: three layers (2 hidden layers)
-h1 unsigned     number of hidden units of the first hidden layer of the three layers MLP
-h2 unsigned     number of hidden units of the second hidden layer of the three layers MLP, or number of hidden units of the two layers MLP (default 4)
-d unsigned      sample dimension/number of input units
-maxit unsigned  maximal number of iterations of the backpropagation algorithm (default 500)
-maxerr double   maximal normalized root mean square (RMS) error (default 0.1)
-alg string      backpropagation learning algorithm type:
                 gd (gradient descent)
                 gd_dl (gradient descent with linear decrement)
                 gd_de (gradient descent with exponential decrement)
                 md (gradient descent with momentum)
                 md_dl (gradient descent with momentum and linear decrement)
                 md_de (gradient descent with momentum and exponential decrement)
                 bold (bold driver)
                 boldmom (bold driver with momentum)
-rate double     learning rate (default 0.07)
-mom double      momentum rate (default 0.5)
-incbold double  bold driver increment rate
-decbold double  bold driver decrement rate
-decr double     gradient decrement (for the gradient descent algorithms with linear or exponential decrement)
-seed            seed for the random initialization of the weights; if 0, the initialization is performed using the current computer time
-s string        save the parameters and weights of all the cross validated MLPs onto a file
-out string      save the neural net outputs onto a file
-w string        read the parameters and weights of all the cross validated MLPs from a file
-p integer       1 = print a point at each iteration; 0 = no printing
Examples:
In all the examples we refer to a data set named foo with 5-dimensional samples and 7 classes.
The application uses a number of training and test sets equal to the number specified by the option -nf. If you want to perform a 10 fold cross validation on a data set foo, the application expects that the files foo.f1.train, foo.f2.train, ... , foo.f10.train and the files foo.f1.test, foo.f2.test, ... , foo.f10.test corresponding to the 10 folds are available.
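Before launching the cross validation you can quickly check that all the fold files are in place, for instance with a plain shell loop (an ordinary shell command, not a feature of nn_cv):

for i in 1 2 3 4 5 6 7 8 9 10; do ls foo.f$i.train foo.f$i.test; done

ls prints an error message for every missing fold file.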
These are only a few examples; see the Usage section above for a complete reference of the available parameters. Note that you can place the parameters in any order and that most of them are not mandatory. More precisely, you can also supply no parameters at all, but remember that the number of inputs of the MLP must agree with the dimension of the input samples, and you must also specify the correct number of classes and folds you want to use.
- Training and testing a standard MLP with one hidden layer using cross validation:
nn_cv foo -d 5 -nc 7
Note that the order of the supplied parameters is not important.
Note also that the number of inputs of the MLP must agree with the dimension of the input samples, and you must also specify the correct number of classes. The selected MLP has one hidden layer (default), uses a backpropagation learning algorithm with a fixed learning rate, and performs a 10 fold (default) cross validation.
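Assuming the defaults listed in the Usage section (two layers, 4 hidden units, 10 folds, learning rate 0.07) and assuming that plain gradient descent (gd) is the default algorithm, the call above should be roughly equivalent to the fully explicit:

nn_cv foo -d 5 -nc 7 -nl 2 -h2 4 -nf 10 -alg gd -rate 0.07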
If you want to select a learning rate equal, for instance, to 0.02:
nn_cv foo -d 5 -nc 7 -rate 0.02
If you want to select 10 neurons for the hidden layer:
nn_cv foo -d 5 -nc 7 -h2 10
Performing cross validation and summarizing the results in a file named myresults:
nn_cv foo -d 5 -nc 7 -res myresults
- Training and testing a standard MLP using cross validation with a user defined number of folds:
nn_cv foo -d 5 -nc 7 -nf 5
In such a way you perform a 5 fold cross validation. You must provide the files foo.f1.train, ... , foo.f5.train and foo.f1.test, ... , foo.f5.test corresponding to the 5 folds. See dofold for automatically generating the desired number of folds.
- Performing cross validation using a standard MLP with two hidden layers:
nn_cv foo -d 5 -nc 7 -nl 3
If you want to select a learning rate equal, for instance, to 0.04:
nn_cv foo -d 5 -nc 7 -rate 0.04 -nl 3
If you want to select 20 neurons for the first hidden layer and 12 for the second hidden layer:
nn_cv foo -d 5 -nc 7 -nl 3 -h1 20 -h2 12
- Performing cross validation using a standard MLP and changing learning algorithms:
Using a backpropagation with a momentum term:
nn_cv foo -d 5 -nc 7 -alg md
If you want to select a learning rate equal, for instance, to 0.1 and a momentum rate equal to 0.4:
nn_cv foo -d 5 -nc 7 -alg md -rate 0.1 -mom 0.4
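With a momentum term, each weight update adds a fraction of the previous update to the current gradient step. In the standard textbook formulation (the NEURObjects implementation may differ in details):

delta_w(t) = -rate * grad(E) + mom * delta_w(t-1)

so -mom 0.4 means that 40% of the previous weight update is carried over into the current one, smoothing the trajectory in weight space.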
If you want to select a bold driver learning algorithm with an increment rate equal to 1.04 and a decrement rate equal to 0.4:
nn_cv foo -d 5 -nc 7 -alg bold -incbold 1.04 -decbold 0.4
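The bold driver heuristic adapts the learning rate at the end of each epoch: when the training error decreases the rate is multiplied by the increment factor, and when the error increases the rate is multiplied by the (much smaller) decrement factor. A minimal C++ sketch of this common formulation follows; it is illustrative only, and the NEURObjects implementation may differ (e.g. it may also undo the last weight update when the error grows):

    // Bold driver learning-rate update, applied once per training epoch.
    double bold_driver_rate(double rate, double err, double prev_err,
                            double incbold,  // e.g. 1.04 (-incbold)
                            double decbold)  // e.g. 0.4  (-decbold)
    {
        return (err < prev_err) ? rate * incbold   // error decreased: speed up
                                : rate * decbold;  // error increased: slow down
    }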
- Cross validation using an ECOC MLP with one hidden layer:
nn_cv foo -d 5 -nc 7 -type ecoc
The same but performing a 3 fold cross validation:
nn_cv foo -nf 3 -d 5 -nc 7 -type ecoc
- Performing cross validation using a standard MLP and saving it onto a file:
nn_cv foo -d 5 -nc 7 -s mymlp
The file mymlp.cvnet stores the parameters and weights of all the cross validated MLPs, i.e. if you perform a 10 fold cross validation, all the parameters and weights of the 10 MLPs are stored in the file. You can reload their weights and parameters and continue the training in a second, possibly different stage:
nn_cv foo -d 5 -nc 7 -alg boldmom -maxerr 0.02 -w mymlp
The MLPs are initialized with the parameters and weights stored in the file mymlp.cvnet, then the training starts using a bold driver with momentum algorithm, and the learning ends when the normalized RMS error drops below 0.02 (or the maximum allowed number of iterations is reached).
- Saving the outputs of the cross validated MLPs:
For saving the computed outputs of the net onto a file named output:
nn_cv foo -d 5 -nc 7 -out output
- A more complicated example:
We want to perform a 5 fold cross validation using an MLP with one hidden layer and 35 hidden neurons, trained with a backpropagation algorithm with an exponential decrement of the learning rate during the epochs and an initial rate of 0.2, initializing it with the weights stored in the file startweights. We also want the learning of each MLP to end when the normalized RMS error goes below 0.02 or 5000 iterations are reached; then we want the resulting MLPs to be stored in the file finishweights and the outputs to be stored in the file output. In order to obtain this result you must type:
nn_cv foo -nf 5 -d 5 -nc 7 -h2 35 -alg gd_de -rate 0.2 -maxerr 0.02 -maxit 5000 -w startweights -s finishweights -out output