A set of C++ library classes
for neural network development



class TrainingSet

Class for reading and loading the training set from file
Attributes are loaded into the matrix data in numerical form.



Public Fields

static const char* const FOLD_TRAIN_SUFFIX
Suffix for the training set file
static const char* const FOLD_TEST_SUFFIX
Suffix for the test set file
static const char BLANK
Blank separator.

Public Methods

TrainingSet (void)
Dummy constructor
TrainingSet (unsigned num_attr, unsigned num_train, char separator = ',')
Constructor.
TrainingSet (unsigned num_attr, unsigned num_train, char* file, char separator = ',', class_pos pos = last)
Constructor.
virtual ~TrainingSet ()
Destructor
void normalize (void)
Normalizes the input data
Each attribute is rescaled to zero mean and unit standard deviation sv:
normalized_x = (x - mean(x)) / sv.
void normalize_var (void)
Normalizes the input data
Each attribute is rescaled to zero mean and unit variance.
void normalize (vect& m, vect& sv)
Normalizes the input data using the given mean and standard deviation vectors
normalized_x = (x - m) / sv.
void calc_mean_stdev (void)
Computes the mean and standard deviation of the data
void FoldRand (unsigned numfold, char* name)
Subdivides a file for k-fold cross-validation
void Subsample_with_replacement (char* name, unsigned numsample, long initseed = 0)
Subsamples the data with replacement and saves the result into a file
It saves into the file name a subset of the data drawn uniformly at random with replacement.
TrainingSet* Subsample_with_replacement (unsigned numsample, long initseed = 0)
Subsamples the data with replacement
It generates a subset of the data drawn uniformly at random with replacement.
void save (char* name)
Stores the data set into a file
void save_light_format (char* name)
Stores the data set into a file in SVM-light format
All classes with a target other than 1 are relabeled as -1.
void set_thresholds (float low_thresh, float hi_tresh)
Sets a lower threshold (floor) and an upper threshold (ceiling) on the data
inline unsigned Ntrain () const
inline unsigned Nattr () const
inline unsigned Nclass () const
inline vect& Read_target ()
inline matrix& Read_data ()
inline void Set_data (matrix& m)
inline vect& Read_mean (void)
inline vect& Read_stdev (void)
inline unsigned Is_normalized () const
inline void print_data ()
inline void print_target ()
inline void print_info ()
inline void print_sep ()
inline char read_sep ()
inline void set_sep (char c)
Sets the data set separator character
void Transpose_data (void)
Transposes the data matrix
It swaps the rows and the columns of the data matrix.
void load_target (char* t)
Loads the file of the targets
Load the file of the targets

Public Members

enum class_pos
Position of the class label in each row of the data file

Protected Fields

unsigned n_train
Training set cardinality
unsigned n_attr
Attribute cardinality
unsigned n_class
Number of classes
char sep
Separator character between attributes in the rows of the input data file
vect target
Numeric vector of the target classes
matrix data
Input data matrix (number of columns = n_attr)
vect mean
Mean of the inputs (dim: n_attr)
vect stdev
Standard deviation of the inputs (dim: n_attr)
unsigned is_normalized

Protected Methods

virtual unsigned read (char* file, class_pos pos = last)
Reads a file containing the training set and loads the input data
It loads the input attributes into the matrix data and the numerical targets identifying the classes into the vector target.
void mean_calc (void)
Computes the mean of the input pattern data
void stdev_calc (void)
Computes the standard deviation of the input pattern data
void DoTestTrainFile (char* name, unsigned n, vector<unsigned>& v)
Saves the training and validation sets for one fold (n-fold cross-validation)
If v[i] = 1, data(i,:) and target t[i] are saved in the file namefntest; otherwise they are saved in the file namefntrain.
void save_pattern (ofstream& fdata, vect& patt, unsigned k)
Saves a single pattern and its class target k on a stream
void save_pattern_light_format (ofstream& fdata, vect& patt, int k)
Saves a single pattern and its class target k on a stream in SVM-light format
void save_subset (char* name, unsigned numsample, vect & subset)
Saves a subset of the data
It saves numsample samples, selecting the rows of data and the entries of target whose indices are stored in subset.
TrainingSet* generate_subsample (unsigned numsample, vect & subset)
Generates a subset of the data
It generates numsample samples, selecting the rows of data and the entries of target whose indices are stored in subset.


Documentation

Class for reading and loading the training set from file
Attributes are loaded into the matrix data in numerical form. Classes are loaded into memory in numerical form in the vector target. Each line of the file must contain one sample, and the attributes of a sample must be separated by the separator character or, when the separator is BLANK, by a sequence of blanks. By default the last field of the line is the class: attr1,attr2, ..., attrn_attr,target
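A minimal sketch of how one row of this format could be split and parsed is shown below. It assumes that BLANK is the space character and mirrors class_pos with a hypothetical ClassPos enum; split_row, parse_row and kBlank are illustrative names, not part of the library, and the actual read() may store targets or report errors differently.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirror of TrainingSet::class_pos (first / last).
enum class ClassPos { First, Last };

// Assumption: the library's BLANK separator is the space character.
constexpr char kBlank = ' ';

// Splits one row into fields. With kBlank, any run of whitespace separates
// fields; otherwise every occurrence of sep ends a field.
std::vector<std::string> split_row(const std::string& line, char sep) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string tok;
    if (sep == kBlank) {
        while (in >> tok) fields.push_back(tok);
    } else {
        while (std::getline(in, tok, sep)) fields.push_back(tok);
    }
    return fields;
}

// Parses n_attr numeric attributes and one integer class label from a row.
// Returns false if the row does not contain exactly n_attr + 1 fields.
// std::stod / std::stoul throw std::invalid_argument on non-numeric fields.
bool parse_row(const std::string& line, char sep, unsigned n_attr, ClassPos pos,
               std::vector<double>& attrs, unsigned& cls) {
    const std::vector<std::string> f = split_row(line, sep);
    if (f.size() != static_cast<std::size_t>(n_attr) + 1) return false;
    const std::size_t first_attr = (pos == ClassPos::First) ? 1 : 0;
    const std::size_t cls_idx = (pos == ClassPos::First) ? 0 : n_attr;
    attrs.clear();
    for (std::size_t i = 0; i < n_attr; ++i)
        attrs.push_back(std::stod(f[first_attr + i]));
    cls = static_cast<unsigned>(std::stoul(f[cls_idx]));
    return true;
}
```

For example, parse_row("0.5,1.5,2", ',', 2, ClassPos::Last, a, c) fills a with {0.5, 1.5} and sets c to 2.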
enum class_pos
Position of the class label in each row of the data file

static const char* const FOLD_TRAIN_SUFFIX
Suffix for the training set file

static const char* const FOLD_TEST_SUFFIX
Suffix for the test set file

static const char BLANK
Blank separator. If the separator is BLANK, attributes and classes can be separated by an arbitrary sequence of blanks.

TrainingSet(void)
Dummy constructor

TrainingSet(unsigned num_attr, unsigned num_train, char separator = ',')
Constructor.
Parameters:
num_attr - attribute cardinality
num_train - training set cardinality
separator - attribute separator in the rows of the input data file

TrainingSet(unsigned num_attr, unsigned num_train, char* file, char separator = ',', class_pos pos = last)
Constructor. It reads the attributes and the class of each pattern from file. If separator is BLANK, the attributes and the class can be separated by an arbitrary sequence of blanks; otherwise they are separated by the selected separator character.
Parameters:
num_attr - attribute cardinality
num_train - training set cardinality
file - name of the input data file
separator - attribute separator in the rows of the input data file
pos - position of the class in the rows of the data file

virtual ~TrainingSet()
Destructor

void normalize(void)
Normalizes the input data
Each attribute is rescaled to zero mean and unit standard deviation sv:
normalized_x = (x - mean(x)) / sv. The mean and standard deviation are computed from the training set values.
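The normalization formula can be sketched on a plain std::vector matrix (rows are patterns, columns are attributes) standing in for the library's vect and matrix types. The helper names are hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption; the library may use the other convention. Constant attributes are only centered, to avoid dividing by zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = patterns, cols = attributes

// Per-attribute mean and population standard deviation (assumption: divide by n).
void column_mean_stdev(const Matrix& x, std::vector<double>& mean,
                       std::vector<double>& sd) {
    const std::size_t n = x.size(), d = x.empty() ? 0 : x[0].size();
    mean.assign(d, 0.0);
    sd.assign(d, 0.0);
    for (const auto& row : x)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j];
    for (std::size_t j = 0; j < d; ++j) mean[j] /= n;
    for (const auto& row : x)
        for (std::size_t j = 0; j < d; ++j)
            sd[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
    for (std::size_t j = 0; j < d; ++j) sd[j] = std::sqrt(sd[j] / n);
}

// z-score: x <- (x - mean) / sd column by column; constant columns are only centered.
void zscore(Matrix& x, const std::vector<double>& mean, const std::vector<double>& sd) {
    for (auto& row : x)
        for (std::size_t j = 0; j < row.size(); ++j) {
            row[j] -= mean[j];
            if (sd[j] > 0.0) row[j] /= sd[j];
        }
}

// Analogue of normalize(void): compute the statistics, keep them, then apply them.
void normalize_in_place(Matrix& x, std::vector<double>& mean, std::vector<double>& sd) {
    column_mean_stdev(x, mean, sd);
    zscore(x, mean, sd);
}
```

Calling zscore with previously stored mean and standard deviation vectors corresponds to the two-argument normalize(vect& m, vect& sv).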

void normalize_var(void)
Normalizes the input data
Each attribute is rescaled to zero mean and unit variance.

void normalize(vect& m, vect& sv)
Normalizes the input data using the given mean and standard deviation vectors instead of computing them from the data:
normalized_x = (x - m) / sv.
Parameters:
m - mean vector of the attributes before normalization
sv - standard deviation vector of the attributes before normalization

void calc_mean_stdev(void)
Computes the mean and standard deviation of the data

void FoldRand(unsigned numfold, char* name)
Subdivides a file for k-fold cross-validation
Parameters:
numfold - number of folds
name - base name of the file
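One plausible way to split the patterns into numfold folds is sketched below: the indices are shuffled with a seeded std::mt19937 and dealt out round-robin, so fold sizes differ by at most one, and test_mask builds a 0/1 vector in the style of the v argument of DoTestTrainFile. The function names and the balancing strategy are assumptions; FoldRand may assign folds differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Assigns each of n patterns to one of k folds: indices are shuffled, then
// dealt round-robin so that fold sizes differ by at most one.
std::vector<unsigned> assign_folds(unsigned n, unsigned k, unsigned seed) {
    std::vector<unsigned> idx(n);
    std::iota(idx.begin(), idx.end(), 0u);
    std::mt19937 gen(seed);
    std::shuffle(idx.begin(), idx.end(), gen);
    std::vector<unsigned> fold(n);
    for (unsigned p = 0; p < n; ++p) fold[idx[p]] = p % k;
    return fold;
}

// Membership vector for fold f, like DoTestTrainFile's v:
// 1 = pattern goes to the test file, 0 = pattern goes to the training file.
std::vector<unsigned> test_mask(const std::vector<unsigned>& fold, unsigned f) {
    std::vector<unsigned> v(fold.size());
    for (std::size_t i = 0; i < fold.size(); ++i) v[i] = (fold[i] == f) ? 1u : 0u;
    return v;
}
```

Looping f from 0 to numfold - 1 yields one training/test pair per fold, and every pattern appears in exactly one test set.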

void Subsample_with_replacement(char* name, unsigned numsample, long initseed = 0)
Subsamples the data with replacement and saves the result into a file
It saves into the file name a subset of the data drawn uniformly at random with replacement.
Parameters:
name - name of the file in which the data are saved
numsample - number of samples
initseed - if equal to 0, the seed of the random generator is initialized from the computer clock; otherwise it is set to initseed

TrainingSet* Subsample_with_replacement(unsigned numsample, long initseed = 0)
Subsamples the data with replacement
It generates a subset of the data drawn uniformly at random with replacement.
Parameters:
numsample - number of samples to be generated
initseed - if equal to 0, the seed of the random generator is initialized from the computer clock; otherwise it is set to initseed
Returns:
the subsampled training set
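Subsampling with replacement reduces to drawing numsample indices uniformly from the patterns and then copying the selected rows, which matches the documented role of the indices passed in subset to save_subset and generate_subsample. The sketch below assumes std::mt19937 as the random generator (the library's own generator is not documented here) and follows the initseed == 0 convention by seeding from the system clock; bootstrap_indices and gather are hypothetical helpers.

```cpp
#include <chrono>
#include <random>
#include <vector>

// Draws numsample indices uniformly with replacement from [0, n); requires n > 0.
// As documented for initseed, 0 means "seed from the clock".
std::vector<unsigned> bootstrap_indices(unsigned n, unsigned numsample, long initseed) {
    const unsigned seed = (initseed == 0)
        ? static_cast<unsigned>(std::chrono::system_clock::now().time_since_epoch().count())
        : static_cast<unsigned>(initseed);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<unsigned> pick(0, n - 1);
    std::vector<unsigned> idx(numsample);
    for (auto& i : idx) i = pick(gen);
    return idx;
}

// Builds the subsample by copying the selected rows (apply it to targets too).
template <typename Row>
std::vector<Row> gather(const std::vector<Row>& rows, const std::vector<unsigned>& idx) {
    std::vector<Row> out;
    out.reserve(idx.size());
    for (unsigned i : idx) out.push_back(rows[i]);
    return out;
}
```

With a nonzero initseed the same indices are drawn on every run, which makes bootstrap experiments reproducible.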

void save(char* name)
Stores the data set into a file
Parameters:
name - file name

void save_light_format(char* name)
Stores the data set into a file in SVM-light format
All classes with a target other than 1 are relabeled as -1.
Parameters:
name - file name
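In SVM-light format each example is one line: the target followed by index:value pairs, with feature indices starting at 1. A minimal formatter for one pattern, including the relabeling of every class other than 1 as -1, might look like this; svm_light_line is a hypothetical helper, and writing zero-valued features explicitly (SVM-light also accepts omitting them) is an assumption about what save_light_format does.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats one pattern as an SVM-light line: "<label> 1:<x1> 2:<x2> ...".
// Feature indices start at 1; every attribute is written, including zeros.
std::string svm_light_line(const std::vector<double>& patt, unsigned cls) {
    std::ostringstream out;
    out << (cls == 1 ? "1" : "-1");  // class 1 stays 1, every other class becomes -1
    for (std::size_t j = 0; j < patt.size(); ++j)
        out << ' ' << (j + 1) << ':' << patt[j];
    return out.str();
}
```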

void set_thresholds(float low_thresh, float hi_tresh)
Sets a lower threshold (floor) and an upper threshold (ceiling) on the data
Parameters:
low_thresh - lower bound: data below it are set to this value
hi_tresh - upper bound (ceiling) of the data
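Applying a floor and a ceiling amounts to clamping every value into [low_thresh, hi_tresh]. The sketch assumes that values below the floor are raised to it and values above the ceiling are lowered to it; clamp_data is a hypothetical helper, and it uses double rather than the float parameters of set_thresholds.

```cpp
#include <algorithm>
#include <vector>

// Clamps every element into [low_thresh, hi_thresh]: values below the floor are
// raised to it, values above the ceiling are lowered to it.
void clamp_data(std::vector<std::vector<double>>& data, double low_thresh, double hi_thresh) {
    for (auto& row : data)
        for (double& x : row)
            x = std::min(std::max(x, low_thresh), hi_thresh);
}
```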

inline unsigned Ntrain() const

inline unsigned Nattr() const

inline unsigned Nclass() const

inline vect& Read_target()

inline matrix& Read_data()

inline void Set_data(matrix& m)

inline vect& Read_mean(void)

inline vect& Read_stdev(void)

inline unsigned Is_normalized() const

inline void print_data()

inline void print_target()

inline void print_info()

inline void print_sep()

inline char read_sep()

inline void set_sep(char c)
Sets the data set separator character
Parameters:
c - separator character

void Transpose_data(void)
Transposes the data matrix
It swaps the rows and the columns of the data matrix.
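Transposition maps element (i, j) to element (j, i), so a matrix with one row per pattern becomes a matrix with one row per attribute. A sketch on a std::vector-based matrix standing in for the library's matrix type (transpose is a hypothetical helper that returns a new matrix rather than working in place):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns the transpose: element (i, j) of m becomes element (j, i).
// Assumes all rows of m have the same length.
Matrix transpose(const Matrix& m) {
    if (m.empty()) return {};
    Matrix t(m[0].size(), std::vector<double>(m.size()));
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j)
            t[j][i] = m[i][j];
    return t;
}
```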

void load_target(char* t)
Loads the file of the targets
Parameters:
t - name of the target file

unsigned n_train
Training set cardinality

unsigned n_attr
Attribute cardinality

unsigned n_class
Number of classes

char sep
Separator character between attributes in the rows of the input data file

vect target
Numeric vector of the target classes

matrix data
Input data matrix (number of columns = n_attr)

vect mean
Mean of the inputs (dim: n_attr)

vect stdev
Standard deviation of the inputs (dim: n_attr)

unsigned is_normalized

virtual unsigned read(char* file, class_pos pos = last)
Reads a file containing the training set and loads the input data
It loads the input attributes into the matrix data and the numerical targets identifying the classes into the vector target. It returns the number of classes, deduced from the largest class number found in the input file.
Parameters:
file - name of the input data file
pos - position of the target class: either at the end (last, the default) or at the beginning (first) of each line of the input file
Returns:
number of classes

void mean_calc(void)
Computes the mean of the input pattern data

void stdev_calc(void)
Computes the standard deviation of the input pattern data

void DoTestTrainFile(char* name, unsigned n, vector<unsigned>& v)
Saves the training and validation sets for one fold (n-fold cross-validation)
If v[i] = 1, data(i,:) and target t[i] are saved in the file namefntest; otherwise they are saved in the file namefntrain.
Parameters:
name - base name of the pair of training and test data files
n - index of the fold
v - vector assigning each pattern to the training or the test set

void save_pattern(ofstream& fdata, vect& patt, unsigned k)
Saves a single pattern and its class target k on a stream
Parameters:
fdata - stream to save data
patt - pattern to be saved
k - target class to be saved
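Assuming save_pattern writes the class in the last position and separates the fields with the current separator character (the layout the reader expects by default), a row writer could be sketched as below. write_pattern and format_pattern are hypothetical names; format_pattern only wraps the writer in a std::ostringstream so the output can be inspected without a file.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one pattern as "x1<sep>x2<sep>...<sep>k\n" (class in the last field).
void write_pattern(std::ostream& out, const std::vector<double>& patt, unsigned k, char sep) {
    for (double x : patt) out << x << sep;
    out << k << '\n';
}

// Convenience wrapper: returns the formatted row as a string.
std::string format_pattern(const std::vector<double>& patt, unsigned k, char sep) {
    std::ostringstream out;
    write_pattern(out, patt, k, sep);
    return out.str();
}
```

Because std::ofstream derives from std::ostream, write_pattern accepts a file stream such as the fdata argument directly.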

void save_pattern_light_format(ofstream& fdata, vect& patt, int k)
Saves a single pattern and its class target k on a stream in SVM-light format
Parameters:
fdata - stream to save data
patt - pattern to be saved
k - target class to be saved; it can only be 1 or -1

void save_subset(char* name, unsigned numsample, vect & subset)
Saves a subset of the data
It saves numsample samples, selecting the rows of data and the entries of target whose indices are stored in subset.
Parameters:
name - name of the file in which the data are saved
numsample - number of samples to be saved
subset - indices of the samples in data and target to be saved

TrainingSet* generate_subsample(unsigned numsample, vect & subset)
Generates a subset of the data
It generates numsample samples, selecting the rows of data and the entries of target whose indices are stored in subset.
Parameters:
numsample - number of samples to be generated
subset - indices of the samples in data and target to be included
Returns:
the generated training set


Direct child classes:
TSetStrClass



Last Updated February 2001
For comments and suggestions mail to Giorgio Valentini