Name | Last modified | Size | Description
---|---|---|---
Parent Directory | | - |
DATA_SETS/ | 2018-05-07 13:59 | - |
Subfolder DATASET_I:
HPO.graph10.organ.UA.rda: object of class graphNEL (R package graph) that represents the hierarchy of terms of the HPO subontology Phenotypic abnormality. This DAG has 2154 nodes (HPO terms) and 2641 edges (between-term relationships).
HPO.ann10.organ.UA.rda: annotation table on which the transitive closure of the annotations was performed. Rows correspond to Entrez Gene IDs and columns to HPO terms (subontology Phenotypic abnormality). If T represents the annotation table, i a gene and j an HPO term, T[i,j]=1 means that gene i is annotated with term j, and T[i,j]=0 means that gene i is not annotated with term j. All HPO terms with fewer than 10 annotations have been pruned. Size: 19430 X 2154.
Scores.eav.score.p1.a2.M.hpo.ann.organ.all.10.rda: flat scores matrix representing the likelihood that a given gene i belongs to a given class j: the higher the value, the higher the likelihood. Rows correspond to Entrez Gene IDs and columns to HPO terms (subontology Phenotypic abnormality). This flat scores matrix was obtained by running the RANKS package. Size: 19430 X 2154.
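The annotation tables described here are built by propagating annotations up the DAG (transitive closure) and then dropping rarely annotated terms. The following is a minimal Python sketch of that idea on toy data, not the procedure actually used to produce the .rda files. The term names (`A` to `D`), gene names (`g1` to `g3`), the function names and the order of the two steps (closure first, pruning second) are all assumptions made for illustration. The toy example uses a threshold of 2 instead of 10 so that the effect is visible.

```python
from collections import defaultdict


def ancestors(term, parents, cache):
    """Return every ancestor of `term` in the DAG (the term itself excluded)."""
    if term in cache:
        return cache[term]
    result = set()
    for parent in parents.get(term, ()):
        result.add(parent)
        result |= ancestors(parent, parents, cache)
    cache[term] = result
    return result


def transitive_closure(direct, parents):
    """Extend each gene's direct annotations to all ancestor terms."""
    cache = {}
    closed = {}
    for gene, terms in direct.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents, cache)
        closed[gene] = full
    return closed


def prune_rare_terms(closed, min_annotations=10):
    """Drop terms annotated to fewer than `min_annotations` genes."""
    counts = defaultdict(int)
    for terms in closed.values():
        for term in terms:
            counts[term] += 1
    keep = {t for t, c in counts.items() if c >= min_annotations}
    pruned = {gene: terms & keep for gene, terms in closed.items()}
    return pruned, keep


def to_matrix(annotations, genes, terms):
    """Build the 0/1 table T, with T[i][j] = 1 if gene i has term j."""
    return [[1 if t in annotations[g] else 0 for t in terms] for g in genes]


# Toy DAG (hypothetical): child -> list of parents; "A" is the root.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
# Toy direct annotations (hypothetical): gene -> set of terms.
direct = {"g1": {"D"}, "g2": {"B"}, "g3": set()}

closed = transitive_closure(direct, parents)
pruned, kept_terms = prune_rare_terms(closed, min_annotations=2)

genes = sorted(pruned)
terms = sorted(kept_terms)
T = to_matrix(pruned, genes, terms)
print(terms)
for gene, row in zip(genes, T):
    print(gene, row)
```

After the closure, a gene annotated with a term is also annotated with every ancestor of that term, so column sums can only grow when moving toward the root; this is why pruning by annotation count removes mostly specific, low-level terms.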
Subfolder DATASET_II:
HPO.graph10.string.v91.rda: object of class graphNEL (R package graph) that represents the hierarchy of terms of the whole HPO ontology. This DAG has 2445 nodes (HPO terms) and 3059 edges (between-term relationships).
HPO.ann10.string.v91.rda: annotation table on which the transitive closure of the annotations was performed. Rows correspond to Entrez Gene IDs and columns to HPO terms. If T represents the annotation table, i a gene and j an HPO term, T[i,j]=1 means that gene i is annotated with term j, and T[i,j]=0 means that gene i is not annotated with term j. All HPO terms with fewer than 10 annotations have been pruned. Size: 3412 X 2445.
Scores.holdout.par.type.0.C1.W.hpo.rda: flat scores matrix representing the probability that a given gene i belongs to a given class j: the higher the value, the higher the probability. Rows correspond to Entrez Gene IDs and columns to HPO terms. This flat scores matrix was obtained by running a multicore version of LiblineaR using the doParallel and foreach R packages. Size: 3412 X 2444. NOTE: the fake root node "All" (HP:0000001) has been removed from the flat scores matrix.
test.set.index.rda: integer vector holding the indices of the rows of the scores matrix to be used as the test set. Useful only in holdout experiments. Length: 608.
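Because the root column is missing from the scores matrix (2444 columns against 2445 in the annotation table), the two tables need to be aligned before they are compared, and the test rows then need to be selected. The sketch below illustrates both steps in plain Python on toy data. The function names, the toy values and the column layout are hypothetical, and the indices are assumed to be 1-based, as is usual for indices saved from R.

```python
def drop_column(columns, matrix, name):
    """Return copies of `columns` and `matrix` without the column `name`."""
    j = columns.index(name)
    new_columns = columns[:j] + columns[j + 1:]
    new_matrix = [row[:j] + row[j + 1:] for row in matrix]
    return new_columns, new_matrix


def select_rows(matrix, one_based_indices):
    """Pick rows using 1-based indices (assumed R convention)."""
    return [matrix[i - 1] for i in one_based_indices]


# Toy annotation table (hypothetical values); the first column is the root "All".
ann_columns = ["HP:0000001", "HP:0000118", "HP:0000707"]
ann = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
]

# Toy flat scores matrix (hypothetical values), already without the root column.
score_columns = ["HP:0000118", "HP:0000707"]
scores = [
    [0.90, 0.20],
    [0.80, 0.70],
    [0.10, 0.05],
]

# Step 1: remove the root from the annotation table so the columns match.
ann_columns, ann = drop_column(ann_columns, ann, "HP:0000001")

# Step 2: keep only the rows listed in the (toy) test index vector.
test_index = [1, 3]
ann_test = select_rows(ann, test_index)
scores_test = select_rows(scores, test_index)

print(ann_columns == score_columns)
print(ann_test)
print(scores_test)
```

Checking that the column names of the two tables are identical after dropping the root is a cheap safeguard against comparing scores and labels of different terms.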
Marco Notaro, Max Schubach, Peter N. Robinson and Giorgio Valentini, Prediction of Human Phenotype Ontology terms by means of Hierarchical Ensemble methods, BMC Bioinformatics 2017
Giorgio Valentini, Giuliano Armano, Marco Frasca, Jianyi Lin, Marco Mesiti and Matteo Re, RANKS: a flexible tool for node label ranking and classification in biological networks, Bioinformatics, first published online June 2, 2016
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research 9, 1871-1874 (2008)