Activity Prediction


In drug discovery research, determining which three-dimensional (3D) shapes of a molecule (its conformers) are responsible for its observed biological activity is challenging but highly important. Because of its structural flexibility, a molecule can adopt a wide range of conformers, and identifying the bioactive ones is essential for understanding how small molecules are recognized by proteins, which is central to drug discovery and development. The most reliable way to obtain the bioactive conformer is from the X-ray crystal structure of a ligand-protein complex. Drug activity prediction aims to predict the activity of proposed drug compounds by learning from the observed activity of previously synthesized compounds.

Two families of activity prediction methods are described below.

3D-QSAR (three-dimensional quantitative structure-activity relationship) methods

The 3D-QSAR methods do not use instance-based embedding. Three widely used classification algorithms in these methods are the decision tree (DT), the 1-norm support vector machine (SVM), and the random forest (RF).

The decision tree is a greedy method based on recursive partitioning. The classification trees were built with the 'classregtree' function in Matlab R2011b. Tree-based classification can account well for multiple binding mechanisms. Gini's diversity index was used as the splitting criterion for recursive partitioning, and the minimum number of molecules per leaf was set to 3 to stop tree growth.
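The recursive partitioning idea can be illustrated with a short plain-Python sketch. It is not the Matlab classregtree implementation: it assumes binary fingerprint features, splits on one bit at a time, and uses hypothetical helper names (gini, best_split, grow_tree, predict). The min_leaf parameter plays the role of the "minimum of 3 molecules per leaf" stopping rule, and Gini's diversity index is computed as 1 minus the sum of squared class proportions.

```python
from collections import Counter


def gini(labels):
    """Gini's diversity index: 1 - sum_k p_k^2 over class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_leaf):
    """Return (feature, weighted impurity) of the best split, or None.

    X holds binary fingerprint rows (lists of 0/1). A split sends rows with
    bit == 1 to the right child. Splits leaving fewer than min_leaf
    molecules in either child are rejected.
    """
    best = None
    n = len(y)
    for f in range(len(X[0])):
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        # Size-weighted Gini impurity of the two children.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best


def grow_tree(X, y, min_leaf=3):
    """Greedy recursive partitioning; each leaf predicts its majority class."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y, min_leaf)
    # Stop if the node is pure, no admissible split exists, or nothing improves.
    if gini(y) == 0.0 or split is None or split[1] >= gini(y):
        return {"leaf": majority}
    f = split[0]
    left = [(r, lab) for r, lab in zip(X, y) if r[f] == 0]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] == 1]
    return {
        "feature": f,
        "left": grow_tree([r for r, _ in left], [lab for _, lab in left], min_leaf),
        "right": grow_tree([r for r, _ in right], [lab for _, lab in right], min_leaf),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["right"] if row[tree["feature"]] == 1 else tree["left"]
    return tree["leaf"]
```

Raising min_leaf makes the tree stop earlier, which is one simple guard against the overfitting that single trees are prone to.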
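Unlike the tree-based method, the 1-norm SVM is grounded in statistical learning theory: it follows the structural risk minimization principle and Vapnik-Chervonenkis (VC) dimension theory. Because it penalizes the 1-norm of the weight vector, many weights are driven to exactly zero, so the model performs feature selection and classification at the same time.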
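For reference, one common formulation of the 1-norm SVM is given below. It is stated here as an assumption, since the exact variant used in the cited work is not spelled out; the symbols are likewise introduced for illustration: training samples x_i with labels y_i in {-1, +1}, weight vector w with d components, bias b, slack variables ξ_i, and trade-off parameter C.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \sum_{j=1}^{d} |w_j| + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 - \xi_i, \qquad
\xi_i \ge 0, \qquad i = 1, \dots, n.
```

Writing each weight as w_j = u_j - v_j with u_j, v_j ≥ 0 turns this into a linear program. Compared with the standard SVM, which penalizes the squared 2-norm of w, the 1-norm penalty tends to set many w_j exactly to zero, and the features with nonzero weights are the selected ones.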
The major drawback of DT is its limited predictive accuracy caused by overfitting. Random forest, an ensemble learning method, improves prediction while retaining the appealing properties of tree-based methods. It is a collection of decision trees, each grown without pruning from a bootstrap sample of the original data, with a random subset of features considered at each split, and it is widely regarded as one of the most powerful tools available for data exploration. The Matlab implementation randomforest-matlab (v0.02) was used with default parameters.
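The two ingredients that distinguish a random forest from a single tree, bootstrap sampling and randomized feature choice at each split, followed by a majority vote, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not randomforest-matlab: features are assumed to be binary fingerprint bits, roughly sqrt(d) candidate features are tried per node (a common default, assumed here), and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini's diversity index of a non-empty label list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_unpruned_tree(X, y, rng, n_sub):
    """Grow a tree to purity (no pruning) on binary features.

    Features are examined in random order; the search stops after n_sub
    candidates once at least one valid split has been found.
    """
    majority = Counter(y).most_common(1)[0][0]
    if gini(y) == 0.0:
        return {"leaf": majority}
    features = list(range(len(X[0])))
    rng.shuffle(features)
    best = None
    for k, f in enumerate(features):
        if k >= n_sub and best is not None:
            break
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        if not left or not right:
            continue  # this bit is constant within the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[1]:
            best = (f, score)
    if best is None:  # identical fingerprints with mixed labels
        return {"leaf": majority}
    f = best[0]
    il = [i for i, row in enumerate(X) if row[f] == 0]
    ir = [i for i, row in enumerate(X) if row[f] == 1]
    return {
        "feature": f,
        "left": grow_unpruned_tree([X[i] for i in il], [y[i] for i in il], rng, n_sub),
        "right": grow_unpruned_tree([X[i] for i in ir], [y[i] for i in ir], rng, n_sub),
    }


def random_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees unpruned trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_sub = max(1, int(d ** 0.5))  # assumed default: about sqrt(d) features per node
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n molecules with replacement
        forest.append(grow_unpruned_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_sub))
    return forest


def predict_tree(tree, row):
    while "leaf" not in tree:
        tree = tree["right"] if row[tree["feature"]] == 1 else tree["left"]
    return tree["leaf"]


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

Each individual tree may overfit its bootstrap sample, but averaging the votes of many decorrelated trees reduces variance, which is why the ensemble usually predicts better than a single tree.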

Multiple-instance learning (MIL)

In multiple-instance learning, each molecule is treated as a bag of instances (its conformers) that carries a single activity label; a molecule is considered active if at least one of its conformers is bioactive. Fu et al. (2012) encoded the 3D structures as pharmacophore fingerprints, which are binary strings, and performed instance-based embedding using calculated dissimilarity distances. Four dissimilarity measures were employed and their performances compared. The 1-norm SVM was used for joint feature selection and classification. The approach was applied to four data sets, and the best model for each data set was chosen as the one built with the dissimilarity measure that yielded the smallest number of selected features. Based on external validation, the approach produced the best predictive model for one data set and the second-best predictive models for the other three.
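Instance-based embedding can be sketched as follows. Each molecule is a bag of conformer fingerprints, and every training conformer serves as a prototype; coordinate k of a molecule's embedded vector is the smallest dissimilarity between any of its conformers and prototype k. The Tanimoto distance used here is an assumption (it is one common dissimilarity for binary fingerprints, not necessarily one of the four measures compared in the cited study), and the function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def embed_bag(bag, prototypes, dist=tanimoto_distance):
    """Map a molecule (a bag of conformer fingerprints) to a fixed-length vector.

    Coordinate k is the smallest distance between any conformer in the bag
    and prototype k, i.e. how closely the molecule can match that conformer.
    """
    return [min(dist(inst, p) for inst in bag) for p in prototypes]


def embed_dataset(bags):
    """Use every training conformer as a prototype and embed each bag."""
    prototypes = [inst for bag in bags for inst in bag]
    return [embed_bag(bag, prototypes) for bag in bags], prototypes


# Usage: two molecules, the first with two conformers, the second with one.
bags = [[[1, 1, 0, 0], [0, 0, 1, 1]], [[1, 0, 1, 0]]]
vectors, prototypes = embed_dataset(bags)
```

The embedded vectors have one column per training conformer, so a feature-selecting classifier such as the 1-norm SVM can be trained on them directly; columns that keep nonzero weights point to the conformers most associated with activity.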

Why Choose BOC Sciences?

BOC Sciences provides high-quality, low-cost, high-tech products to customers around the world. We have a dedicated team of professional chemists to help you develop the most efficient process for your program, and each step of product synthesis is subject to BOC Sciences' stringent quality control. Our experienced staff will perform drug activity prediction with different models according to your requirements, and we provide a 100% guaranteed service.

Predict compound activity with BOC Sciences' advanced modeling

Our activity prediction services use statistical models, AI, and cheminformatics to estimate bioactivity, guiding compound selection and prioritization efficiently.

Submit your inquiry to request a custom solution.

References

García, G. C., Ruiz, I. L., & Gómez-Nieto, M. Á. (2011, June). Prediction of drug activity using molecular fragments-based representation and RFE support vector machine algorithm. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (pp. 396-405). Springer, Berlin, Heidelberg.
Fu, G., Nan, X., Liu, H., Patel, R. Y., Daga, P. R., Chen, Y., & Doerksen, R. J. (2012). Implementation of multiple-instance learning in drug activity prediction. BMC Bioinformatics, 13(Suppl. 15), S3.
Hermann, J. C., Marti-Arbona, R., Fedorov, A. A., Fedorov, E., Almo, S. C., Shoichet, B. K., & Raushel, F. M. (2007). Structure-based activity prediction for an enzyme of unknown function. Nature, 448(7155), 775.