Abstract
Metabolisms are highly organized systems whose strong regulation obeys the mass conservation principle. As a consequence, a whole chemical resource is shared competitively among several pathways at both the intra- and inter-molecular scales. Statistically, this sharing of a whole resource can be expressed as a constant unit-sum constraint, which is the basis of the simplex mixture rule. In this work, a new simplex-based simulation approach was developed to extract scaffold information on the metabolic processes controlling molecular diversity from a wide set of observed chemical structures. Starting from a large dataset of chemical structures initially classified into p clusters, a machine learning process was applied in which the p clusters (indexed by j) were linearly combined through N samplings of a constant number n of molecules, with the clusters' weights (wj/w) given by Scheffé's mixture matrix. At the output of the mixture design, the N linear combinations of molecules yield N barycentric molecules, each integrating the characteristics of the differently weighted clusters. The mixture design was iterated using a bootstrap technique to explore chemical variability extensively between and within clusters. Finally, the K response matrices resulting from the K iterated mixture designs were averaged into a smoothed matrix containing scaffold information on the regulation processes responsible for molecular diversification at the inter- and intra-molecular (atomic) scales. This matrix served as a backbone for graphical analysis of multidirectional positive and negative trends between atomic characteristics (chemical substitutions) at both scales. The new simplex approach is illustrated with cycloartane-based saponins of the genus Astragalus by combining three desmosylation clusters characterized by the relative glycosylation levels of different aglycone carbons.
Keywords: Computational chemistry, Simulation, Training, Molecular diversity, Cycloartane, Glycosylation, Desmosylation.
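
The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the Scheffé mixture matrix is taken to be a simplex lattice of integer weights wj summing to w, the sample size n is set equal to w, molecules are drawn with replacement, and each molecule is a plain list of numeric descriptors. The function names (simplex_lattice, barycentric_molecule, smoothed_response) and the toy clusters are hypothetical and serve only as illustration.

```python
import random
from itertools import product


def simplex_lattice(p, w):
    """Integer weight vectors (w_1..w_p) with w_j >= 0 and sum(w_j) == w.

    Dividing each row by w gives mixture proportions w_j / w that sum to 1
    (the unit-sum constraint). Assumed form of the mixture matrix.
    """
    return [c for c in product(range(w + 1), repeat=p) if sum(c) == w]


def barycentric_molecule(clusters, counts, rng):
    """Draw counts[j] molecules from cluster j and average their descriptors."""
    drawn = []
    for cluster, k in zip(clusters, counts):
        # Sampling with replacement is an assumption of this sketch.
        drawn.extend(rng.choice(cluster) for _ in range(k))
    n = len(drawn)
    n_features = len(drawn[0])
    return [sum(mol[f] for mol in drawn) / n for f in range(n_features)]


def smoothed_response(clusters, w, K, seed=0):
    """Average K response matrices, each built from one pass over the design.

    Returns the design (N weight vectors) and the smoothed N x features matrix.
    """
    rng = random.Random(seed)
    design = simplex_lattice(len(clusters), w)
    n_features = len(clusters[0][0])
    total = [[0.0] * n_features for _ in design]
    for _ in range(K):  # K iterated mixture designs
        for i, counts in enumerate(design):  # N mixtures per design
            bary = barycentric_molecule(clusters, counts, rng)
            for f in range(n_features):
                total[i][f] += bary[f]
    return design, [[v / K for v in row] for row in total]


# Toy data only: three clusters, each molecule described by three
# hypothetical relative glycosylation levels (not taken from the paper).
toy_clusters = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    [[0.0, 0.2, 0.8], [0.1, 0.1, 0.8]],
]
design, smoothed = smoothed_response(toy_clusters, w=4, K=50)
for weights, row in zip(design, smoothed):
    print(weights, [round(v, 3) for v in row])
```

Each row of the smoothed matrix corresponds to one mixture of the design. Comparing columns across rows is the kind of input on which trends between descriptors could then be examined graphically.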