© Michel Petitjean, 2014

Author's professional address:

MTi, INSERM UMR-S 973, Université Paris 7

35 rue Hélène Brion, 75205 Paris Cedex 13, France.

petitjean.chiral@gmail.com

Formerly: CEA/DSV/iBiTec-S/SB2SM (CNRS URA 2096), Saclay, France

(and formerly: ITODYS, UMR 7086, CNRS, Université Paris 7).

Other topics:

- The Mathematical Theory of Chirality.
- Symmetry, Chirality, Symmetry Measures and Chirality Measures: General Definitions
- An Asymmetry Coefficient for Multivariate Distributions
- A Simple Dataset to Test and Compare Chirality Measures
- The Random-Disorder Paradox
- Classification: Computing an Optimal Partition
- Mining Data in Graph Databases
- The Parity Phenomenon in Large Chemical Databases
- The Graphic Mendeleev Table
- The Radius Diameter Diagram
- Virtual Screening of 3D Chemical Data Bases
- [6.6]Chiralane: The chemical nomenclature challenge
- Molecular Symmetry and Chirality
- Molecular shape descriptors: the cylindrical model
- Spheres Unions and Intersections
- Analytical Computation of van der Waals Surfaces and Volumes
- Freewares

Shape recognition, i.e. geometric pattern recognition, is a problem commonly encountered in various areas: data analysis, robotics, molecular modeling, biochemistry, etc.

There are two basic problems:

Both problems may be solved with the same tool: the objects are represented (modelized) by colored mixtures.Similarity: the experimentalist looks for identical parts among a family of at least two objects. Most time, this is done by computer via an optimal superposition of the objects. In the simplest situation, the objects are finite sets of points. Data analysis methods of optimal superposition of two sets of points are known as Procrustes methods. In chemistry and biochemistry, the optimal superposition of sets of points via least squares techniques are known as RMS (Root Mean Square) methods.

Docking: the experimentalist looks for the geometric complementarity of two objects, as in the key-lock model. This recognition of shape complementarity may be done by computer, but it occurs spontaneously in the real life, such as for enzymatic recognition.

The idea is to work with distributions of random variables in a space which is little more complicated than the ordinary euclidean space (see [1] for a rigorous presentation).

A colored mixture is a random variable X taking values in the product of the d-dimensional euclidean space by a measurable space called the space of colors. The marginal distribution of X in R

In order to understand the use of this concept, we consider some simple situations. First imagine three equally weighted gaussian distributions in R: a red one, a green one, and a blue one. It means that the marginal distribution of X in the space of colors is Prob(red)=1/3, Prob(green)=1/3, and Prob(blue)=1/3. These three marginal distributions in R exist together, and constitute a mixture [2]. An other example is the molecular model of the charges distributions in a neutral molecule: it is a mixture of two equally probable distributions: the distribution of the negative charges, and the distribution of the positive charges (additivity of the effects of the charge are handled via an adequate operator). A last example is the case of a set of n points in R

Now, for physical applications, we consider two or more than two colored mixtures such that the marginal components of the colored mixtures in the space of colors are almost surely equal: this is the colored mixture model.

In other words, it means that, for all mixtures of marginal distributions in R

The best way to evaluate the similarity between two objects is to define a distance in the space of the objects. The Wasserstein metric is a well-known distance between distributions in R

D

The distance D has the following properties in the case of two finite sets of n colored points [1]:

- It is the Procrustes distance in the Procrustes method of optimal superposition
of the two sets.

Note that minimizing for orthogonal transformations or linear transformations rather than for rotations and translations are just trivial changes to the method. - It is equal to the RMS, in the RMS method of spatial alignment of the two sets.
- When all colors are identical, we get both the Procrustes and the RMS methods in the case of an unspecified correspondence. It means that we are looking for the optimal correspondence (among n!) which minimizes D. This correspondence is associated to the permutation matrix [nW].

The Monge-Kantorovitch transportation problem, in which a distribution of mass is transportated at minimal cost onto an other distribution of mass, is known to have deep relations with the Wasserstein metric [4]. There is now a formal link between the transportation problem and the optimal superposition problems, in which we move optimally a distribution onto an other one.

A very important application of the similarity theory occurs when the two objects are a pair of inverted images in a mirror: see the The Mathematical Theory of Chirality.

An analytical solution of the optimization problem has been given in some situations, discarding if the distributions of X and Y are inverted mirror images or not [5].

The freeware ARMS performs the optimal RMS superposition of two rigid conformers, provided that the pairwise correspondence is known or implicit. The computing time is linearly related to the size of the data.

The freeware CSR detects automatically the maximal common 3D motif of two rigid conformers and performs the optimal RMS superposition [6]. The computing time is quadratically related to the size of the data.

The shape complementarity could be viewed from the generalization of the parallelism concept to non trivial situations. Although the parallelism could be obvious for lines, planes, concentric cylinders, concentric spheres, etc., providing a general definition would require drastic conditions on the existence and the properties of the surfaces. Unfortunately, many practical situations involve sets or distributions for which the concept of surface does not make sense, such as for a finite discrete set of points. Note that in this latter situation, the polyhedral convex hull has little use in a docking context. So, there is a need to exhibit a general docking criterion, working for vrious kinds of sets and distributions.

The variance criterion hereafter suffices in many cases to measure the degree of complementarity of two distributions. Considering two colored mixtures X and Y and any of their joint distribution W, the variance is minimized for all rotations R and translations t of Y:

Z = Inf

The docking coefficient Z is formally analog to the minimized squared Wasserstein distance D

An analytical solution of the optimization problem has been given in some situations [1], including the case of two sets of n colored points in R

- PETITJEAN M.

*From Shape Similarity to Shape Complementarity: Toward a Docking Theory.*

J. Math. Chem. 2004,**35**[3],147-158.

(DOI 10.1023/B:JOMC.0000033252.59423.6b)

- EVERITT B.S., HAND D.J.

*Finite Mixture Distributions.*, chap. 1.

Chapman and Hall, London 1981.

- RACHEV S.T.

*Probability Metrics and the Stability of Stochastic Models.*

Wiley, New-York 1981.

- RACHEV S.T., RÜSCHENDORF L.

*Mass Transportation Problems.*Vol. I and II.

Springer-Verlag, New-York 1998.

- PETITJEAN M.

*Chiral Mixtures.*

J. Math. Phys. 2002,**43**[8],4147-4157. (DOI 10.1063/1.1484559)

Copyright AIP (American Institute of Physics), 2002. Download PDF paper (for personal use only. Other uses: see copyright information)

- PETITJEAN M.

*Interactive Maximal Common 3D Substructure Searching with the Combined SDM/RMS Algorithm.*

Comput.Chem. 1998,**22**[6],463-465.

(DOI 10.1016/S0097-8485(98)00017-5)

- PETITJEAN M.

*Solving the Geometric Docking Problem for Planar and Spatial Sets.*

Internet Electron. J. Mol. Des. 2002,**1**[4],185-192.

Download PDF paper (open access paper posted with permission from the Publisher)