The Distance Geometry Problem (DGP) asks whether a simple weighted undirected graph
G=(V,E,d) can be embedded in a K-dimensional Euclidean space so that the
distance between every pair of embedded vertices u and v belonging to V
corresponds to the weight d(u,v) assigned to the edge (u,v), when available.
The distance information can be given either through an exact value, or by a real-valued interval.
The DGP is NP-hard. One of the most interesting applications of the DGP is in biology.
Experiments of Nuclear Magnetic Resonance (NMR) are able to find estimates of some distances between atom
pairs of molecular conformations, and the problem of finding these conformations by using this distance
information is, in fact, a DGP. Over the last years, in collaboration with
Carlile Lavor and
and other people, we have been working on the discretization of the DGP.
Data mining is the nontrivial extraction of previously unknown, potentially useful and reliable patterns from a given set of data. Data mining techniques can be mainly divided into two categories: classification techniques and clustering techniques. Classification techniques can be used when there is some previous knowledge about the data, whereas clustering techniques try to discover inherent patterns hidden in the data without using any additional information. Classification techniques include the k-Nearest Neighbor (kNN), Artificial Neural Networks (ANNs), and Support Vector Machines (SVMs). The most used clustering technique is k-means, that is very simple to apply, and which has many variants. Most data mining techniques bring to the formulation of global optimization problems.
Proteins described by a mathematician.
Proteins are heteropolymers. They are formed by smaller molecules called amino acids, that are bonded together to form a sort of "chain".
Living organisms contain plenty of proteins, which perform several functions of vital importance.
Each amino acid along the protein chain contains a Carbon atom called Cα,
which is chemically bonded to three different aggregates of atoms.
Two of these aggregates are common to each amino acid: the amino group (NH2) and the carboxyl group (COOH).
Only one aggregate of atoms makes each amino acid different from one another: the so-called side chain R.
Only 20 amino acids are involved into the protein synthesis.
These molecules are under genetic control: the amino acid sequence of a protein is contained into the
corresponding gene of the DNA molecule. The three-dimensional conformation of a protein molecule is of great interest,
because it defines its dynamics in the cell, and therefore its function.
Why are we so interested in proteins?
C. M. Dobson, A. Sali and M. Karplus wrote:
"An understand of folding is important for the analysis of many events involved in cellular regulation, the design of proteins with novel functions, the utilization of sequence information from the various genome projects, and the development of novel therapeutic strategies for treating or preventing debilitating human diseases that are associated with the failure of proteins to fold correctly".
Exploring the optimal conformations of clusters of molecules. Finding the ground state conformations of clusters of molecules is one of the most difficult global optimization problems. Such clusters are regulated by potential energy functions, such as the Lennard Jones and the Morse potential energy. The Lennard Jones potential is very simple but sufficiently accurate for noble gas, and it also provides approximations for structures of nickel and gold. The Morse potential has a parameter which determines the width of the potential well; Morse clusters show a wide variety of structural behaviors in function of this parameter. Both potential functions have several local minima in which methods for optimization may get stuck at. For instance, a Lennard Jones cluster formed by 13 atoms has about 1000 local minima, and the number of local minima increases exponentially with the size of the cluster. The Lennard Jones and Morse optimal clusters usually have particular geometric motifs, which can be exploited during the optimization process. The Cambridge Cluster Database contains experimentally obtained clusters. Some of them may just be local minima: the database is open to newly generated clusters having a lower energy function value.
Climbing trees like a monkey.
A monkey can survive in a forest of trees because it is able to focus its food researches where it already found good food to eat.
During my period of work in the United States,
and I proposed a novel method for global optimization which is inspired by the behavior of a monkey.
We named this method Monkey Search. The trees the monkey explores contain solutions to the optimization problem to be solved,
and branches of the tree represent perturbations able to transform one solution into another. At each step,
the monkey chooses the branches to climb by applying a random mechanism, which is based on the objective function values of
the new corresponding solutions. The monkey prefers better solutions, but it is not forced to choose them.
Parallel Computing. There are several applications that require high computing resources. Parallel computing allows to exploit the CPU power of many processors simultaneously for solving difficult problems. There exist parallel computers with different architectures, and the most used parallel computers are nowadays the MIMD machines with distributed memory. We are developing parallel versions of the Branch & Prune algorithm, which we employ for solving the distance geometry problem (see above). The experimental testing of the parallel algorithm has been performed on the French nation-wide grid infrastructure Grid5000.