The second step concerns the generation of the artificial
expression data and is governed by the following parameters:
- Number of rows of the matrix, i.e. number of artificial
tissues;
- Number of tissues that must verify
the first EP;
- Total number of considered artificial genes, i.e. number
of columns of the matrix;
- A range for the admitted data values (gene
expression levels);
- A minimum and a maximum level
that can be assumed by genes not belonging to the EPs.
Once the values of the parameters are fixed, the procedure
generates data as follows:
- The expression levels of all the genes are
generated according to a normal distribution with a given mean
and standard deviation.
- Then, the expression levels of genes belonging to the
first EP are changed in the first rows (the tissues that must verify
the first EP) and are chosen so that a sufficient number of genes is
modulated for the EP to be verified.
Similarly, the expression levels of the genes that belong
to the second EP are changed in the remaining tissues, so
that the modulated genes are sufficient to verify the second EP.
The expression level of a generic overexpressed gene is chosen
according to a normal distribution whose mean and variance are
defined in terms of the modulation threshold of that gene, while
the level of an underexpressed gene is chosen according to a
normal distribution with a correspondingly different mean and the
same variance.
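
The generation procedure described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the function name, the representation of an EP as a dictionary mapping a gene index to +1 (overexpressed) or -1 (underexpressed), and the parameters `mu`, `sigma`, `theta`, `shift`, `spread` and `min_modulated` are all hypothetical. In particular, the means used for modulated genes (`theta + shift` and `-(theta + shift)`, with a single threshold `theta` shared by all genes) are an assumption made for the example, since the text above does not fix them.

```python
import numpy as np


def generate_expression_matrix(
    n_tissues, n_first, n_genes, ep1, ep2,
    value_range=(-10.0, 10.0),      # admitted data values (assumed bounds)
    background_range=(-1.0, 1.0),   # bounds for genes outside the EPs (assumed)
    mu=0.0, sigma=0.5,              # background normal distribution (assumed)
    theta=2.0, shift=1.0, spread=0.1,  # modulation threshold and offsets (assumed)
    min_modulated=None, seed=0,
):
    """Return an (n_tissues x n_genes) matrix: rows are tissues, columns genes."""
    rng = np.random.default_rng(seed)

    # Step 1: every gene gets a background level drawn from N(mu, sigma^2).
    data = rng.normal(mu, sigma, size=(n_tissues, n_genes))

    # Genes that belong to neither EP are kept within the background bounds.
    ep_genes = sorted(set(ep1) | set(ep2))
    background = np.setdiff1d(np.arange(n_genes), ep_genes)
    data[:, background] = np.clip(data[:, background], *background_range)

    def imprint(rows, ep):
        # Overwrite the levels of (a subset of) the EP genes in the given rows.
        genes = np.array(list(ep.keys()))
        signs = np.array([ep[g] for g in genes], dtype=float)
        k = len(genes) if min_modulated is None else min(min_modulated, len(genes))
        for r in rows:
            chosen = rng.choice(len(genes), size=k, replace=False)
            # Assumed means: beyond +theta for over-, beyond -theta for underexpression.
            means = signs[chosen] * (theta + shift)
            data[r, genes[chosen]] = rng.normal(means, spread)

    # Step 2: the first n_first tissues verify the first EP, the rest the second.
    imprint(range(n_first), ep1)
    imprint(range(n_first, n_tissues), ep2)

    # Keep every value inside the admitted range.
    return np.clip(data, *value_range)


# Example: 6 tissues (4 for the first EP), 10 genes, two small EPs.
matrix = generate_expression_matrix(6, 4, 10, {0: +1, 1: -1}, {2: +1, 3: -1})
```

Leaving `min_modulated` unset modulates every gene of the EP in each tissue; setting it to a smaller value modulates a random subset of that size, mimicking the "sufficient number of genes" condition.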