A general stability-based algorithmic scheme for assessing the reliability of clustering solutions may be summarized in the following way:

1. Randomly perturb the data many times according to a given perturbation procedure.
2. Apply a given clustering algorithm to the perturbed data.
3. Apply a given clustering similarity measure to multiple pairs of k-clusterings obtained from steps 1 and 2.
4. Use appropriate similarity indices to assess the stability of the given clustering.
5. Repeat steps 1 to 4 for multiple values of k and select the most stable clustering(s) as the most reliable.
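As a deliberately minimal sketch, the five steps above might be wired together as follows; the toy data, the jitter perturbation, and the contiguous-split "clusterer" are illustrative placeholders, not the procedures of the cited works. Note that comparing labellings across runs assumes the perturbation keeps one entry per original point (as noise injection does); bootstrap sampling would require an extra step assigning the original points to the clusters found on each sample.

```python
import random
from itertools import combinations
from statistics import mean

def pair_agreement(a, b):
    """Fraction of point pairs on which two labellings agree (the Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    same = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return same / len(pairs)

def stability_select(data, perturb, cluster, ks, runs=10, seed=0):
    """Steps 1-5: perturb, cluster, compare pairs, score, pick the most stable k.
    Assumes `perturb` returns one (perturbed) entry per original point."""
    rng = random.Random(seed)
    scores = {}
    for k in ks:
        labellings = [cluster(perturb(data, rng), k) for _ in range(runs)]
        # Step 4: average pairwise similarity as a simple stability score.
        scores[k] = mean(pair_agreement(a, b)
                         for a, b in combinations(labellings, 2))
    # Step 5: the most stable k is taken as the most reliable.
    return max(scores, key=scores.get), scores

# Toy usage: two well-separated 1-D groups; a sort-and-split "clusterer"
# stands in for a real algorithm.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]

def jitter(xs, rng):
    return [x + rng.gauss(0.0, 0.05) for x in xs]

def split_sorted(xs, k):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    labels = [0] * len(xs)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(xs)
    return labels

best_k, scores = stability_select(data, jitter, split_sorted, ks=[2, 3])
```

On this toy data the two-group structure survives every perturbation, so k = 2 scores a perfect pairwise agreement, while k = 3 must split one of the tight groups arbitrarily and scores lower.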

Several approaches have been proposed to implement the first step: a random "perturbation" of the data may be obtained through bootstrap samples drawn from the available data [15,18], through random noise injection into the data [17], or through random projections onto lower-dimensional subspaces [20,7].

The application of a given algorithm (step 2) reflects a choice based on "a priori" knowledge or assumptions about the characteristics of the data. To estimate the similarity between clusterings (step 3), classical measures such as the Rand index [19], the Jaccard coefficient, and the Fowlkes and Mallows coefficient [14], or their equivalent dot-product representations [4], may be applied.
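All three pair-counting coefficients can be derived from the same 2x2 table of point-pair counts, which makes a compact joint implementation possible; a hedged sketch (the direct pair-counting form, not the dot-product representation of [4]) could look like:

```python
from itertools import combinations
from math import sqrt

def pair_counts(u, v):
    """Count point pairs: n11 co-clustered in both labellings, n10 only in u,
    n01 only in v, n00 separated in both."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            n11 += 1
        elif same_u:
            n10 += 1
        elif same_v:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def rand_index(u, v):
    n11, n10, n01, n00 = pair_counts(u, v)
    return (n11 + n00) / (n11 + n10 + n01 + n00)

def jaccard(u, v):
    n11, n10, n01, _ = pair_counts(u, v)
    return n11 / (n11 + n10 + n01)

def fowlkes_mallows(u, v):
    n11, n10, n01, _ = pair_counts(u, v)
    return n11 / sqrt((n11 + n10) * (n11 + n01))

# Example: two labellings of four points that disagree on point 2.
u, v = [0, 0, 1, 1], [0, 0, 0, 1]
r, j, fm = rand_index(u, v), jaccard(u, v), fowlkes_mallows(u, v)
```

For this example the pair counts are n11 = 1, n10 = 1, n01 = 2, n00 = 2, giving a Rand index of 0.5 and a Jaccard coefficient of 0.25; identical labellings score 1 under all three measures.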

Several stability indices for model order selection have been proposed in the literature (see, e.g., [17,20,18,16]): very schematically, they can be divided into indices that use summary statistics of the similarity measures [20,7] and indices that use their overall empirical distribution [4].
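The distinction can be sketched with two toy indices applied to the same collection of pairwise similarity values for a fixed k: one summarizes the values with a statistic, the other looks at how the empirical distribution is concentrated near 1 (the 0.9 threshold below is an illustrative choice, not a value from the cited papers):

```python
from statistics import mean

def mean_similarity_index(sims):
    """Summary-statistic index (in the spirit of [20,7]): a single statistic,
    here the mean, of the pairwise similarities collected for one k."""
    return mean(sims)

def concentration_index(sims, threshold=0.9):
    """Distribution-based index (in the spirit of [4]): the fraction of the
    empirical distribution of similarities lying near 1."""
    return sum(s >= threshold for s in sims) / len(sims)

# Hypothetical similarity values: a stable k versus an unstable k.
sims_stable = [0.98, 0.99, 0.97, 1.0, 0.96]
sims_unstable = [0.60, 0.95, 0.70, 0.99, 0.65]
```

Both indices rank the stable solution above the unstable one here; in general they need not agree, since a similarity distribution can have a high mean yet be far from concentrated near 1.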

The last step, that is, the selection of the most stable/reliable clustering given a set of similarity measures and the related stability indices, has usually been approached by choosing the best-scoring clustering according to the chosen stability index. A major problem in this last step is estimating the statistical significance of the discovered solutions.