Antonio Mucherino's Research


Keywords: bioinformatics, data mining, combinatorial and global optimization, meta-heuristics, parallel computing

The Molecular Distance Geometry Problem. The distance geometry problem is the problem of finding the coordinates of a given set of points in the three-dimensional space when some of the relative distances between pairs of such points are known. An interesting application of this problem is to protein conformations. Experimental techniques, such as the Nuclear Magnetic Resonance (NMR), are able to find estimates of some of the distances between the atoms of the molecule which are closer than about 6 Angstrom. The problem of finding all the coordinates of the atoms of a protein, starting from the information obtained by NMR, is a distance geometry problem.
We are working on a combinatorial reformulation of the distance geometry problem and on an ad-hoc algorithm for its solution. The discretization of the problem is possible when some particular assumptions are satisfied. We proposed two reformulations. The first reformulation is based on the structure of protein conformations, and we refer to the reformulated combinatorial problem as the Discretizable Molecular Distance Geometry Problem (DMDGP). More recently, we proposed another reformulation for the problem which is based on weaker assumptions that are not related to molecular conformations. We named the reformulated problem Discretizable Distance Geometry Problem (DDGP). For solving both the combinatorial problems, we employ a Branch & Prune (BP) algorithm, which is strongly based on the combinatorial structure of the two problems.
We are developing a software tool called MD-jeep, which is distributed under the GNU General Public License (v.2). The sources of MD-jeep can be downloaded here. This work is in collaboration with Leo Liberti, Carlile Lavor and Nelson Maculan.


Data Mining. Data mining is the nontrivial extraction of previously unknown, potentially useful and reliable patterns from a given set of data. Data mining techniques can be mainly divided into two categories: classification techniques and clustering techniques. Classification techniques can be used when there is some previous knowledge about the data, whereas clustering techniques try to discover inherent patterns hidden in the data without using any additional information. Classification techniques include the k Nearest Neighbor method (kNN), the Artificial Neural Networks (ANNs), the Support Vector Machines (SVMs). The most used clustering technique is k-means, which is very simple to apply, and that has many variants. In many practical applications, clustering techniques need to be applied, because the knowledge about the data is usually limited. Most data mining techniques bring to the formulation of a global optimization problem.
I'm co-author with Panos Pardalos and Petraq Papajorgji of the textbook Data Mining in Agriculture. The book is devoted to data mining techniques applied to the agricultural field. Both classification and clustering techniques are considered in the book, and many examples of applications in agriculture are discussed. The implementations in Matlab of simple algorithms help the reader understand the topics and allow to perform specific experiments. The book Data Mining in Agriculture can be purchased on Springer.com.
Among the most recent data mining techniques, supervised biclustering techniques seem to be promising. Given a set of data, biclustering aims at finding simultaneous partitions in biclusters of its samples and of the features which are used for representing the samples. Consistent biclusterings can be exploited for performing supervised classifications, and the problem of finding consistent biclusterings of training sets can be formulated as a 0–1 linear fractional optimization problem. I'm working on a bilevel reformulation of this optimization problem, and on a heuristic algorithm which is based on the bilevel reformulation.


Exploring the optimal conformations of clusters of molecules. Finding the ground state conformations of clusters of molecules is one of the most difficult global optimization problems. Such clusters are regulated by potential energy functions, such as the Lennard Jones and the Morse energy. The Lennard Jones energy is very simple but sufficiently accurate for noble gas, and it also provides approximations for structures of nickel and gold. The Morse energy has a parameter which determines the width of the potential well. Morse clusters show a wide variety of structural behaviors in function of this parameter. Both the potential energies have a lot of local minima in which methods for optimization may get stuck at. For instance, a Lennard Jones cluster formed by 13 atoms has about 1000 local minima, and the number of local minima increases with the size of the cluster. The Lennard Jones and Morse optimal clusters usually satisfy particular geometric motifs, which can be exploited during the optimization process. The Cambridge Cluster Database contains experimentally obtained clusters. Some of them may be just a local minimum, and hence the database is open to newly generated clusters having a lower energy function value.


Climbing trees like a monkey. A monkey can survive in a forest of trees because it is able to focus its food researches where it already found good food to eat. Onur Seref and I proposed a novel method for global optimization which is inspired by the behavior of a monkey. We named this method Monkey Search. The trees the monkey explores contain solutions to the optimization problem to be solved, and branches of the tree represent perturbations able to transform one solution into another. At each step, the monkey chooses the branches to climb by applying a random mechanism, which is based on the objective function values of the new corresponding solutions. The monkey prefers better solutions, but it is not forced to choose them.
Rather than a completely new meta-heuristic approach for global optimization, the Monkey Search can be considered as a general framework (describing the monkey behavior) where other ideas and strategies for global optimization are also employed. Perturbations can be taken from other methods for optimization (such as Genetic Algorithms, Harmony Search, Ant Colony Optimization, etc) and other perturbations can also be inspired by the optimization problem to be solved, leading to a wide range of variants for this meta-heuristic method. We implemented the Monkey Search in C programming language for solving particular classes of optimization problems. Moreover, Alla Kammerdiner implemented the Monkey Search algorithm for solving the Multidimensional Assignment Problem (MAP).


Parallel Computing. There are several applications that require high computing resources. Parallel computing allows to exploit the CPU power of many processors simultaneously for solving difficult problems. There exist parallel computers with different architectures, and the most used parallel computers are nowadays the MIMD machines with distributed memory. I recently developed a parallel version of the Branch & Prune algorithm which we use for solving the DMDGP. The experimental testing of the parallel algorithm has been performed on the French nation-wide grid infrastructure Grid5000.


Proteins described by a mathematician. Proteins are heteropolymers. They are formed by smaller molecules called amino acids, that are bound together to form a "chain". Living organisms contain plenty of proteins, which perform many functions of vital importance. Each amino acid along the protein chain contains a Carbon atom called C_alpha, which is chemically bound to three different aggregates of atoms. Two of these aggregates are common to each amino acid: the amino group (NH2) and the carboxyl group (COOH). Just one aggregate of atoms makes each amino acid different by one another: the so-called side chain R. Just 20 amino acids are involved into the protein synthesis. These molecules are under genetic control: the amino acid sequence of a protein is contained into the corresponding gene on the DNA molecule. The three-dimensional conformation of a protein molecule is of interest, because it defines its dynamics in the cell, and therefore its function.
Different properties of the side chains allow a classification the amino acids on the basis of different features. A very important feature of the amino acids is the polarity of their side chains R. A side chain is polar if it has a global or local electric charge. There is an attraction between positive and negative charged molecules and there is rejection between molecules with the same charge. Since the protein solvent, i.e. water, is polar and globally uncharged, there is attraction between the water and the polar R groups. It follows that the amino acids having a polar side chain are hydrophilic, because they tend to interact with the solvent. Other amino acids are buried inside the protein because of the hydrophilic amino acid behavior. Therefore, they are called hydrophobic. This feature of the amino acids plays a fundamental role in the stabilization of the protein molecules. The attractive and repulsive Van der Waals forces are important as well. They can be modeled, for instance, by the Lennard Jones energy (see above). Click here to have more information on the amino acids and on their properties.
Each amino acid along the protein chain is bound to the previous and the successive amino acid through a peptide bond. More precisely, the carboxyl group of the i-th amino acid is bound to the amino group of the (i+1)-th amino acid, i.e. a covalent bond keeps bound the C atom of the first group and the N atom of the second one. The peptide bond is strong enough to force every atom between two consecutive C_alpha atoms on the same plane. Therefore, the movement possibilities of the amino acids on the chain are limited.
More information about proteins and the computational methods for finding their stable conformations can be found on this web page on the Arnold Neumaier web site. Experimentally determined protein conformations can be downloaded from the Protein Data Bank (PDB) which currently contains 66828 entries.


Why should one work on proteins? C. M. Dobson, A. Sali and M. Karplus wrote: "An understand of folding is important for the analysis of many events involved in cellular regulation, the design of proteins with novel functions, the utilization of sequence information from the various genome projects, and the development of novel therapeutic strategies for treating or preventing debilitating human diseases that are associated with the failure of proteins to fold correctly".

Simulating the conformation of protein molecules. The final solution of the protein folding problem would be the identification of a strategy for correlating the known chemical structure of a protein to its three-dimensional conformation. I performed various analyses on protein conformations, where particular attention was given to their geometric aspects. I also worked on a model for protein simulations which is mainly based on geometric properties of protein conformations. Differently from other models for protein prediction, the considered model does not rely on chemical and physical forces involved into the folding process, but it rather attempts the simulation of protein conformations by exploiting their geometric properties. Computational experiments proved that the geometric properties of protein conformations play an important rule in the folding process. Indeed, among the conformations we simulated, there are conformations which are very similar to the ones protein molecules actually have in nature. The similarities between protein conformations have been evaluated through Root Mean Square Deviations (RMSD).


Back Home