................ SHORT DOC .............................................
MCG2: Optimal Partitioning (numerical variables).
Ascending hierarchical clustering with the
"Modified Clustering Gain" as stop criterion.
/* MCG2 is a new version of MCG in which two bugs have been corrected
/* in the module reading the input data (the clustering routine is ok).
/* The functionalities of MCG and MCG2 are identical.
Reference:
J.E. Meslamani, F. Andre, M. Petitjean
J. Chem. Inf. Model. 2009, 49[2], 330-337.
DOI: 10.1021/ci800275k
mailto: petitjean.chiral@gmail.com
MCG2 reads either an (N*N) array of distances or dissimilarities,
or an array containing N lines (individuals) and P columns
(numerical variables), which is then converted into an
(N*N) array of Euclidean distances.
Then MCG2 computes an optimal partition.
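The conversion from N lines and P columns to an (N*N) array of Euclidean
distances can be sketched as follows. This is only an illustration in Python,
not the code of MCG2; the function name euclidean_distance_matrix is
hypothetical.

```python
import math

def euclidean_distance_matrix(rows):
    """Return the (N*N) array of Euclidean distances between N rows of P values."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Square root of the sum of squared coordinate differences.
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d  # the array is symmetric
    return dist
```

For instance, the two individuals (0, 0) and (3, 4) are at distance 5 from
each other, and the diagonal of the array is zero.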
Input parameters: N , P , (S,C,A)
(these 3 parameters are entered at the keyboard on a single line)
(allowed separators: space, comma, semicolon, slash)
-----------------------------------------------------
N: number of individuals
P: number of numerical variables
Entering P = 0 means that the (N*N) array of distances or
dissimilarities is to be read (N lines, N columns).
Entering P > 0 means that the array of N lines and P columns
is to be read, and the array of Euclidean distances
will be computed.
(S,C,A): indicates the linkage mode for the hierarchical clustering:
Entering S sets the single linkage mode
Entering C sets the complete linkage mode (default)
Entering A sets the quadratic average linkage mode
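As an illustration of the parameter line described above, the following
Python sketch splits a line on the four allowed separators and checks the
values. It is not the parser used by MCG2. The name parse_parameters is
hypothetical, and three behaviours are assumptions: consecutive separators
are collapsed, the linkage letter is accepted in lower case, and an omitted
linkage letter falls back to the documented default C.

```python
import re

def parse_parameters(line):
    """Parse 'N P mode' from one line; returns (n, p, mode)."""
    # Allowed separators: space, comma, semicolon, slash.
    # Assumption: runs of separators count as a single separator.
    fields = [f for f in re.split(r"[ ,;/]+", line.strip()) if f]
    if len(fields) < 2:
        raise ValueError("expected at least N and P")
    n, p = int(fields[0]), int(fields[1])
    # Assumption: a missing third field selects the default mode C.
    mode = fields[2].upper() if len(fields) > 2 else "C"
    if n < 1 or p < 0:
        raise ValueError("N must be positive and P must be non-negative")
    if mode not in ("S", "C", "A"):
        raise ValueError("linkage mode must be S, C or A")
    return n, p, mode
```

With this sketch, the line "150, 4, S" yields N = 150, P = 4 and single
linkage, while "10;0" yields N = 10, P = 0 (distances are read directly)
and complete linkage.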
Input file name:
Name of the input data file.
(an empty name indicates that the data are to be entered at the keyboard)
Output: the optimal partition, plus the mean point in each cluster.
For each individual, the sum of squared distances to its
mean cluster point is computed.
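One way to picture this output is sketched below in Python. The selection
rule is an assumption, not taken from MCG2: the mean point of a cluster is
taken here to be the member that minimizes the sum of squared distances to
all members of that cluster. The names cluster_mean_point and
squared_distances_to_mean_point are hypothetical.

```python
def cluster_mean_point(dist, members):
    """Index of the member minimizing the sum of squared distances to the cluster.

    Assumption: this selection rule is illustrative, not MCG2's actual rule.
    """
    return min(members, key=lambda c: sum(dist[c][m] ** 2 for m in members))

def squared_distances_to_mean_point(dist, members):
    """Return the mean point and each member's squared distance to it."""
    center = cluster_mean_point(dist, members)
    return center, {m: dist[m][center] ** 2 for m in members}
```

For three individuals located at 0, 1 and 5 on a line and grouped in one
cluster, the second individual is selected, and the squared distances to it
are 1, 0 and 16.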
Remarks:
- The dissimilarities are treated as distances,
and are assumed to be non-negative.
- The computed mean points are always taken among the N initial ones,
even in the Euclidean case.
- In the case of P numerical variables, it is sometimes useful to center
and/or scale the data before calling MCG2.
- The computing time grows as O(N^3).
- The current maximal value of N is 10000.
- The line buffer is set to 255*255 characters per record in the file.
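The centering and scaling mentioned in the remarks can be done before
writing the input file. A minimal Python sketch follows, assuming that each
column is shifted to zero mean and divided by its population standard
deviation; other scalings are equally legitimate. The name center_and_scale
is hypothetical, and mapping a constant column to zeros is a choice made
here.

```python
import math

def center_and_scale(rows):
    """Return a copy of the N*P data with zero-mean, unit-variance columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    # Population standard deviation of each column (divisor N, an assumption).
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / n)
            for k in range(p)]
    return [
        # A constant column has zero deviation; it is mapped to zeros.
        [(r[k] - means[k]) / stds[k] if stds[k] > 0 else 0.0 for k in range(p)]
        for r in rows
    ]
```

Scaling matters because the Euclidean distances computed from the P columns
are otherwise dominated by the variables with the largest numerical range.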
................ END SHORT DOC .........................................