next up previous
Next: Download software and documentation Up: Application of clusterv to Previous: Analysis of cluster reliability

Analysis of cluster reliability in lung tumor patients.

Here we present the results of applying clusterv to the analysis of lung tumor patients, using a DNA microarray data set composed of 203 histologically defined specimens: 186 lung tumors and 17 normal lung (NL) specimens [6]. The tumors include 127 lung adenocarcinomas (AD), 21 squamous cell lung carcinomas (SQ), 20 pulmonary carcinoids (COID) and 6 small-cell lung carcinomas (SMCL). Of the 12600 original genes of the U95A Affymetrix oligonucleotide array, 3312 passed the filter (genes whose standard deviation was below 50 expression units were excluded), following the procedures described in [6]; the gene expression levels were then normalized with respect to the mean and standard deviation. In this case too we implemented the pre-processing procedures with R scripts. We evaluated the reliability of the discovered clusters on both normalized and non-normalized (with respect to the mean and standard deviation) data.
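The filtering and normalization steps described above can be sketched as follows. This is only an illustration in Python, not the R scripts used for the analysis: the function name, the toy data, the use of the sample standard deviation and the choice of normalizing each gene separately are all assumptions made for the example.

```python
import statistics


def filter_and_standardize(expr, sd_threshold=50.0):
    """Drop low-variance genes, then rescale the rest to mean 0 and sd 1.

    expr maps a gene name to its expression values (one per specimen).
    Assumption: the sample standard deviation is used and each gene is
    normalized separately; the source does not state either detail.
    """
    kept = {}
    for gene, values in expr.items():
        sd = statistics.stdev(values)
        if sd < sd_threshold:
            continue  # gene does not pass the variability filter
        mean = statistics.fmean(values)
        kept[gene] = [(v - mean) / sd for v in values]
    return kept


# Toy data (hypothetical values, not taken from the lung tumor data set)
toy = {
    "gene_a": [100.0, 300.0, 500.0],  # sd = 200, passes the filter
    "gene_b": [200.0, 210.0, 220.0],  # sd = 10, excluded
}
result = filter_and_standardize(toy)
print(result)
```

With the toy input, only gene_a survives the filter and its values are mapped to -1, 0 and 1.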

Bhattacharjee et al. discovered distinct subclasses of lung adenocarcinoma using DNA microarray data [6]. We applied our stability measures, based on PMO random projections and Ward's hierarchical clustering, to analyze the reliability of the discovered subclasses.
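To give an idea of the kind of perturbation involved, the following Python sketch projects data with a random matrix whose entries are +1 or -1, scaled by 1/sqrt(d'), where d' is the projected dimension. It assumes that PMO stands for plus-minus-one projections of this form; the function names and the toy data are hypothetical, and this is not the clusterv implementation.

```python
import math
import random


def pmo_projection(data, new_dim, seed=0):
    """Project each row of data to new_dim dimensions with a random +1/-1 matrix."""
    rng = random.Random(seed)
    old_dim = len(data[0])
    scale = 1.0 / math.sqrt(new_dim)  # keeps squared distances unbiased on average
    matrix = [[rng.choice((-1.0, 1.0)) for _ in range(old_dim)]
              for _ in range(new_dim)]
    return [[scale * sum(r * x for r, x in zip(row, point)) for row in matrix]
            for point in data]


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data: 4 random points in 1000 dimensions (not microarray data)
rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
projected = pmo_projection(points, new_dim=200, seed=2)

# Ratio between a projected distance and the corresponding original distance
ratio = euclidean(projected[0], projected[1]) / euclidean(points[0], points[1])
print(f"distance ratio after projection: {ratio:.3f}")
```

The ratio should stay close to 1, which is why clusterings computed on several such projections can be compared with the clustering of the original data.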

The results summarized in Tab. 4 and Fig. 6 partially confirm that the clusters defined by the established histological classes [6] are quite reliable. First, the overall stability indices suggest that pulmonary carcinoid tumors (COID) form a well-defined cluster, clearly separated from the other classes of lung tumors. Indeed, the highest overall stability index is obtained with N=2 clusters, and the first cluster, which collects all the COID patients, has an individual stability index very close to 1. Moreover, with N=3 clusters the first (COID) cluster is again highly supported by the s index (Tab. 4).

However, partitions with a larger number of clusters also show relatively high values of the overall stability index, supporting the thesis of Bhattacharjee et al. that there are distinct subclasses of lung adenocarcinoma. For instance, with N=4 clusters, the COID and normal lung (NL) subclasses are classified as reliable by the s index; the large second cluster, composed of several adenocarcinomas (AD) together with small-cell lung carcinomas (SMCL) and some normal specimens, is scored as quite reliable (stability index s=0.8168); and the fourth cluster, which groups together adenocarcinomas and squamous cell lung carcinomas (SQ), is scored as less reliable (s=0.7157) (Tab. 4 and Fig. 6). With N=8 clusters, the first two subclasses of COID patients are highly reliable, as is the sixth cluster (normal lung). Interestingly, clusters 3, 4 and 5 represent three distinct subclasses of adenocarcinoma patients, with a relatively high individual cluster stability (Tab. 4, N=8). Cluster 7 is a further cluster of adenocarcinomas that also contains SQ and SMCL specimens, although its individual stability index is lower (s=0.7692). These results partially confirm the hypothesis of distinct subclasses among lung adenocarcinomas [6].

Nevertheless, the stability indices also show that the subclasses are not so clearly delineated: the results of clustering algorithms should be considered with caution, especially when complex and noisy data, such as DNA microarray data, are analyzed. A stability analysis using other clustering algorithms (for instance with the functions Random.kmeans.validity, Random.fuzzy.kmeans.validity and Random.PAM.validity of the clusterv package) could provide more insight into this problem.
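As an illustration of how an individual stability index s and an overall index S can be derived from repeated perturbed clusterings, the Python sketch below uses a pairwise co-clustering matrix. It rests on assumptions that are not stated in this section: M[i][j] is taken to be the fraction of perturbed clusterings in which examples i and j fall in the same cluster, s is the mean of M over the distinct pairs of a cluster, and S is the mean of s over the clusters. All names and the toy labelings are hypothetical.

```python
from itertools import combinations


def cocluster_frequency(labelings, n_items):
    """M[i][j] = fraction of labelings in which items i and j share a cluster."""
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in labelings:
        for i, j in combinations(range(n_items), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def individual_stability(cluster, m):
    """Mean co-clustering frequency over all distinct pairs of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # convention chosen for this sketch: a singleton is stable
    return sum(m[i][j] for i, j in pairs) / len(pairs)


# Reference clustering of 5 items and 4 clusterings of perturbed data (toy values)
reference = [[0, 1, 2], [3, 4]]
perturbed = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]

m = cocluster_frequency(perturbed, n_items=5)
s = [individual_stability(cluster, m) for cluster in reference]
overall = sum(s) / len(s)
print(s, overall)
```

In this toy case the first cluster obtains s = 2.5/3 and the second s = 0.75: a cluster whose members are split in some perturbed clusterings is scored as less reliable.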

Figure 6: Hierarchical clustering of lung tumor examples (Ward method). Gray dotted lines cut the dendrogram so that exactly k clusters are produced, for k=2,3,4,8. Considering 8 clusters, the first two refer to pulmonary carcinoid patients (COID), the third to a group of lung adenocarcinoma patients together with small-cell lung carcinoma patients (SMCL), the fourth to a first group of lung adenocarcinoma patients (AD I), the fifth to a second group of adenocarcinoma patients with 3 normal patients (AD II + NL), the sixth to normal (NL) patients, the seventh to a third group of adenocarcinoma patients (AD III) and the last to squamous cell lung carcinomas (SQ). See Table 4 for the corresponding stability indices.


Table 4: Lung Tumor: Estimate of cluster stability

Overall stability index S

  N        eps=0.5   eps=0.4   eps=0.3   eps=0.2   eps=0.1
  2        0.9017    0.9376    0.9705    0.9708    0.9883
  3        0.7550    0.7723    0.7964    0.8057    0.8611
  4        0.7381    0.7571    0.7928    0.8074    0.8698
  5        0.7198    0.6994    0.7497    0.7815    0.8294
  6        0.6706    0.6797    0.7264    0.7602    0.8273
  7        0.6777    0.6982    0.7381    0.7750    0.8225
  8        0.6330    0.6575    0.7030    0.7460    0.8096
  9        0.6123    0.6355    0.6870    0.7282    0.8098
  10       0.5970    0.6304    0.6850    0.7336    0.8105
  20       0.5769    0.6271    0.6700    0.7346    0.8056

Individual stability index s

  N   Cl.  eps=0.5   eps=0.4   eps=0.3   eps=0.2   eps=0.1
  2   1    0.9185    0.9292    0.9684    0.9724    0.9940
      2    0.8849    0.9459    0.9726    0.9692    0.9826
  3   1    0.9034    0.9236    0.9624    0.9723    0.9940
      2    0.7025    0.7455    0.7600    0.7401    0.8475
      3    0.6592    0.6479    0.6667    0.7047    0.7420
  4   1    0.8976    0.9236    0.9624    0.9723    0.9940
      2    0.6371    0.6857    0.7237    0.7119    0.8448
      3    0.7997    0.7861    0.8469    0.9020    0.9247
      4    0.6180    0.6331    0.6384    0.6435    0.7157
  5   1    0.8875    0.9236    0.9624    0.9723    0.9940
      2    0.5690    0.5597    0.6132    0.6664    0.6992
      3    0.7630    0.7211    0.7930    0.8588    0.9107
      4    0.7637    0.6811    0.7267    0.6971    0.7341
      5    0.6157    0.6116    0.6532    0.7132    0.8093
  6   1    0.8495    0.9144    0.9624    0.9723    0.9940
      2    0.7804    0.7993    0.8321    0.8891    0.8861
      3    0.4322    0.4648    0.5111    0.5181    0.6642
      4    0.7270    0.7150    0.7767    0.8261    0.9107
      5    0.6690    0.6275    0.6675    0.6772    0.7236
      6    0.5655    0.5573    0.6086    0.6786    0.7850
  8   1    1.0000    1.0000    1.0000    1.0000    1.0000
      2    0.7884    0.8430    0.9345    0.9509    0.9900
      3    0.5514    0.6352    0.6965    0.7776    0.8360
      4    0.4865    0.4828    0.5510    0.6283    0.7813
      5    0.3904    0.4505    0.4588    0.5073    0.5258
      6    0.7064    0.7144    0.7755    0.8255    0.9107
      7    0.5949    0.5957    0.6078    0.6132    0.6484
      8    0.5455    0.5385    0.5998    0.6649    0.7847
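The values in Table 4 appear consistent with the overall index S being the arithmetic mean of the individual indices s of the same partition. The short Python check below uses the eps=0.1 columns of the table; it is only a consistency check on the reported numbers, under that assumption, and the variable names are hypothetical.

```python
# Individual stability indices s from Table 4 (eps=0.1), keyed by N
individual = {
    2: [0.9940, 0.9826],
    3: [0.9940, 0.8475, 0.7420],
    4: [0.9940, 0.8448, 0.9247, 0.7157],
    5: [0.9940, 0.6992, 0.9107, 0.7341, 0.8093],
    6: [0.9940, 0.8861, 0.6642, 0.9107, 0.7236, 0.7850],
    8: [1.0000, 0.9900, 0.8360, 0.7813, 0.5258, 0.9107, 0.6484, 0.7847],
}

# Overall stability index S from Table 4 (eps=0.1), keyed by N
overall = {2: 0.9883, 3: 0.8611, 4: 0.8698, 5: 0.8294, 6: 0.8273, 8: 0.8096}

# Absolute gap between the mean of s and the tabulated S for each N
gaps = {n: abs(sum(values) / len(values) - overall[n])
        for n, values in individual.items()}

for n, gap in sorted(gaps.items()):
    print(f"N={n}: |mean(s) - S| = {gap:.5f}")
```

Every gap is below the rounding precision of the table (four decimal digits).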


Giorgio 2006-08-16