dodata
Application for the automatic generation of synthetic data sets
Application for generating synthetic data sets for classification problems. The data are grouped into clusters centered at predefined or random points in the input space. Each cluster follows a normal distribution, and each cluster can be assigned to a specified class.
Usage: dodata -t type datafile [options]
type = { ranspheric | rs | ranvariable | rv | readinfo | ri | readinfopatt | rip }
It defines the generation mode.
datafile (string) : name of the file generated by the application.
Options:
The available options depend on the selected generation mode.
ranspheric (rs) option: the clusters are randomly distributed, one for each class, according to a spherical normal distribution.
  -nc (unsigned) : number of classes (default=2)
  -d (unsigned) : number of attributes, i.e. dimension of the generated patterns (default=2)
  -np (unsigned) : number of patterns for each class (default=100)
  -rmin (double) : minimum value for the cluster centers (default=1.0)
  -rmax (double) : maximum value for the cluster centers (default=9.0)
  -sigma (double) : standard deviation (default=1.0)
ranvariable (rv) option: the clusters are randomly distributed, one for each class, according to normal distributions with different standard deviations and numbers of patterns.
  -nc (unsigned) : number of classes (default=2)
  -d (unsigned) : number of attributes, i.e. dimension of the generated patterns (default=2)
  -pmin (unsigned) : minimum number of patterns for each class (default=20)
  -pmax (unsigned) : maximum number of patterns for each class (default=100)
  -rmin (double) : minimum value for the cluster centers (default=1.0)
  -rmax (double) : maximum value for the cluster centers (default=9.0)
  -sigmin (double) : minimum allowed standard deviation (default=0.5)
  -sigmax (double) : maximum allowed standard deviation (default=2.0)
readinfo (ri) option: the clusters are normally distributed, with parameters read from a file.
  -ninfo (string) : file storing the parameters
readinfopatt (rip) option: the clusters are normally distributed, with parameters read from a file and a random number of patterns per class.
  -ninfo (string) : file storing the parameters
  -pmin (unsigned) : minimum number of patterns for each class (default=20)
  -pmax (unsigned) : maximum number of patterns for each class (default=100)
Examples:
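The four modes differ mainly in where each cluster's center, standard deviation and pattern count come from. The Python sketch below illustrates that reading of the options; it is not the dodata implementation. Several points are assumptions not stated on this page: center coordinates are drawn uniformly in [rmin, rmax], rv draws one standard deviation per class uniformly in [sigmin, sigmax] and a pattern count uniformly in [pmin, pmax], and every cluster is spherical. All function names are hypothetical, and the ri/rip functions take an in-memory list of parameters instead of parsing the -ninfo file, whose format is not described here.

```python
import random

# Hypothetical sketch of the dodata generation modes; not the tool's own code.
# Each pattern is returned as a tuple (attributes, class_label).

def make_cluster(center, sigma, n_patterns, label, rng):
    """Draw n_patterns points from a spherical normal N(center, sigma^2 I)."""
    return [([rng.gauss(c, sigma) for c in center], label)
            for _ in range(n_patterns)]

def random_centers(nc, d, rmin, rmax, rng):
    """Assumption: every center coordinate is uniform in [rmin, rmax]."""
    return [[rng.uniform(rmin, rmax) for _ in range(d)] for _ in range(nc)]

def ranspheric(nc=2, d=2, n_patterns=100, rmin=1.0, rmax=9.0, sigma=1.0, seed=None):
    """rs mode: one cluster per class, same sigma and pattern count for all."""
    rng = random.Random(seed)
    data = []
    for label, center in enumerate(random_centers(nc, d, rmin, rmax, rng)):
        data.extend(make_cluster(center, sigma, n_patterns, label, rng))
    return data

def ranvariable(nc=2, d=2, pmin=20, pmax=100, rmin=1.0, rmax=9.0,
                sigmin=0.5, sigmax=2.0, seed=None):
    """rv mode: sigma and pattern count are drawn independently per class."""
    rng = random.Random(seed)
    data = []
    for label, center in enumerate(random_centers(nc, d, rmin, rmax, rng)):
        n = rng.randint(pmin, pmax)          # assumption: uniform, bounds inclusive
        sigma = rng.uniform(sigmin, sigmax)  # assumption: uniform in [sigmin, sigmax]
        data.extend(make_cluster(center, sigma, n, label, rng))
    return data

def readinfo(clusters, seed=None):
    """ri mode: clusters is a hypothetical list of (center, sigma, n_patterns,
    label) tuples standing in for the parameters stored in the -ninfo file."""
    rng = random.Random(seed)
    data = []
    for center, sigma, n, label in clusters:
        data.extend(make_cluster(center, sigma, n, label, rng))
    return data

def readinfo_patt(clusters, pmin=20, pmax=100, seed=None):
    """rip mode: like readinfo, but each class gets a random pattern count
    in [pmin, pmax]; any stored pattern count is ignored."""
    rng = random.Random(seed)
    data = []
    for center, sigma, _, label in clusters:
        n = rng.randint(pmin, pmax)
        data.extend(make_cluster(center, sigma, n, label, rng))
    return data
```

Under these assumptions, ranspheric(nc=5, d=12, n_patterns=1000, rmin=5.0, rmax=125.0, sigma=5.7) corresponds to the user-defined rs command-line example on this page and returns 5000 labelled 12-dimensional patterns; passing a seed makes a run reproducible.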
- Generating a data set named data with a spherical normal distribution:
dodata -t rs data
It generates a data set using the default values of all parameters (see Usage).
- Generating a data set named data with a normal distribution:
dodata -t rv data
It generates a data set using the default values of all parameters (see Usage).
- Generating a data set named data with a spherical normal distribution and user-defined parameters:
It generates 5 classes with 1000 12-dimensional patterns per class, with cluster centers varying randomly between 5.0 and 125.0 and a standard deviation of 5.7.
dodata -t rs data -nc 5 -d 12 -np 1000 -rmin 5.0 -rmax 125.0 -sigma 5.7
- Generating a data set named data with a normal distribution and user-defined parameters:
It generates 7 classes of 8-dimensional patterns, with the number of patterns in each class varying randomly from 100 to 300, cluster centers varying randomly between 2.0 and 100.0, and standard deviations varying randomly from 5.0 to 10.0.
dodata -t rv data -nc 7 -d 8 -pmin 100 -pmax 300 -rmin 2.0 -rmax 100.0 -sigmin 5.0 -sigmax 10.0
- Generating a data set named data whose clusters are normally distributed, with parameters read from the file info:
dodata -t ri data -ninfo info
For the format of the file see ...
Output:
The application writes a data file (datafile) generated according to the selected generation mode.