# QSPR Models of Physicochemical Properties of Natural Amino Acids by Using Topological Indices and MLR Method.

Byline: Afsaneh Safari and Fatemah Shafiei

Summary: Mathematics has provided very useful tools for relating molecular structures to molecular properties. Topological indices are numerical parameters of a graph which characterize its topology and are usually graph invariant. In this study, the relationship of the Randić (1X), Balaban (J), Szeged (Sz), Harary (H), Wiener (W), Hyper-Wiener (WW) and Wiener Polarity (Wp) indices to the thermal energy (Eth, kJ/mol), heat capacity (Cv, J/mol K) and entropy (S, J/mol K) of 19 natural amino acids is presented.

The physicochemical properties were obtained from quantum mechanical calculations at the HF level using the ab initio 6-31G basis set. Multiple linear regression (MLR) and the backward method (with significance at the 0.05 level) were employed to derive the QSPR models.

After the MLR analysis, we examined the linearity between the molecular descriptors in the best models for the studied properties. The predictive power of the models was assessed by cross-validation.

The results show that a single descriptor (W) can be used efficiently for estimating the entropy (S) and heat capacity (Cv), and that the Wiener Polarity index can be used for modeling and predicting the thermal energy of the respective compounds.

Keywords: Topological indices; Natural amino acids; Multiple linear regression; QSPR; Validation.

Introduction

Amino acids (AAs), the building blocks of proteins, are molecules that contain amine and carboxyl functional groups. They play important roles in biology, for example in the synthesis of proteins [1, 2], as intermediates of metabolic pathways [3], as neurotransmitters [4, 5] and as antibiotics [6, 7]. The standard amino acids have been the most extensively investigated.

Quantitative Structure-Property Relationship (QSPR) and Quantitative Structure-Activity Relationship (QSAR) methods are based on the hypothesis that a change in the chemical structure of a compound is reflected in a change in its physicochemical properties [8].

Topological indices (TIs), proposed by mathematical chemists, are the final result of a logical and mathematical procedure that translates chemical constitution into numerical values for QSPR/QSAR studies. Topological descriptors should be calculable for any set of molecules and must be derived from well-established procedures, which enable molecular. The use of topological descriptors requires knowledge of chemoinformatics, chemometrics, statistics and the principles of the QSPR approaches.

Revived interest in TIs is apparent from numerous recent reviews [9-13].

QSAR modeling of peptides and their analogues has been used to study antimicrobials and angiotensin-converting enzyme (ACE) inhibitors [14].

The oxytocin vasopressin analogues of 20 natural amino acids have been investigated by using semi-quantitative descriptors [15].

A two-dimensional descriptor for QSAR studies of amino acids, comprising ten orthogonal factors (TOF), has been developed [16].

QSAR studies predicting the hydrophobic, steric and electronic properties of the 20 coded amino acids, using a new set of descriptors and multiple regression combined with partial least squares (PLS) regression, have been reported [17].

Amino acid descriptors such as Divided Physico-chemical Property Scores (DPPS) and Vectors of Hydrophobic, Steric and Electronic properties (VHSE) were applied to a QSAR study of 214 tripeptides, in which the PLS method was used to predict the antioxidant activity of the tripeptides [18].

A new topological scale (T-scale) for the studies on structure-activity of 135 amino acids has been examined [19].

New topological descriptors were calculated for predicting the relationship between structure and biological activity of 87 amino acids [20].

New molecular descriptors based on quantum topological molecular similarity (QTMS) parameters of amino acids have been applied to the modeling of peptides, and the best models were established [21].

The main aim of this work is to illustrate the usefulness of topological descriptors in the QSPR study of natural amino acids. As far as we are aware, this is the first QSPR study to predict physicochemical properties such as the heat capacity (Cv), thermal energy (Eth) and entropy (S) of natural amino acids using topological indices.

Experimental

Quantum Chemical Computations

The chemical structures of the AAs were drawn with GaussView and then transferred into the Gaussian 03 program for optimization and calculation of the physicochemical properties of the molecules at the RHF/6-31G level of theory.

The heat capacity (Cv), entropy (S) and thermal energy (Eth) of the nineteen natural AAs, taken from the quantum mechanical calculations, are listed in Table-1.

Table-1: Amino acids and their heat capacity, entropy and thermal energy used in present study.

| Comp. No. | Compound | S (J/mol K) | Cv (J/mol K) | Eth (kJ/mol) |
|---|---|---|---|---|
| 1 | Alanine | 335.929 | 97.657 | 320.347 |
| 2 | Arginine | 504.868 | 198.993 | 617.992 |
| 3 | Asparagine | 389.901 | 136.38 | 408.779 |
| 4 | Aspartic acid | 383.088 | 127.92 | 382.800 |
| 5 | Cysteine | 370.331 | 117.332 | 323.607 |
| 6 | Glutamine | 433.294 | 151.179 | 494.042 |
| 7 | Glutamic acid | 473.731 | 168.88 | 457.129 |
| 8 | Glycine | 309.169 | 76.356 | 237.779 |
| 9 | Histidine | 429.624 | 148.54 | 480.917 |
| 10 | Isoleucine | 415.053 | 155.851 | 570.842 |
| 11 | Leucine | 403.027 | 141.447 | 567.268 |
| 12 | Lysine | 456.556 | 173.871 | 623.819 |
| 13 | Methionine | 441.638 | 157.783 | 495.184 |
| 14 | Phenylalanine | 439.163 | 163.759 | 564.124 |
| 15 | Serine | 351.032 | 108.282 | 338.889 |
| 16 | Threonine | 388.372 | 135.515 | 418.309 |
| 17 | Tryptophan | 469.355 | 195.908 | 653.915 |
| 18 | Tyrosine | 458.989 | 181.909 | 578.499 |
| 19 | Valine | 389.576 | 137.643 | 486.677 |

Topological Indices

Hundreds of topological indices, suitable for describing different properties, have been reported in the literature. The topological indices (TIs) used for the QSPR analysis were the Wiener (W) [22], Szeged (Sz) [23], first-order molecular connectivity (1X) [24], Balaban (J) [25], Hyper-Wiener (WW) [26], Wiener Polarity (Wp) [27] and Harary (H) [28] indices. Many investigations have been carried out with such descriptors [29, 31].

All topological indices were calculated on hydrogen-suppressed graphs, obtained by deleting from the structures of the amino acids all hydrogens bonded to carbon as well as those bonded to heteroatoms. The descriptors were calculated with the Chemicalize program [32]. The seven topological indices tested in the present study are recorded in Table-2.
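As an illustration of how distance-based indices follow from a hydrogen-suppressed graph, the sketch below computes W, WW, Wp, H and 1X for alanine using their standard graph-theoretical definitions. The atom labelling and edge list are assumptions made for this example (they are not taken from the Chemicalize output), and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt

# Hydrogen-suppressed graph of alanine (assumed atom labelling):
# 0 = N, 1 = C-alpha, 2 = C-beta, 3 = carboxyl C, 4 and 5 = carboxyl O atoms.
ALANINE_EDGES = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]


def adjacency(edges):
    """Build an adjacency list from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: topological distance from source to every atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_indices(edges):
    """Return W, WW, Wp, H and 1X for a connected molecular graph."""
    adj = adjacency(edges)
    dist = {u: distances_from(adj, u) for u in adj}
    pair_d = [dist[u][v] for u, v in combinations(sorted(adj), 2)]
    return {
        "W": sum(pair_d),                           # sum of all pair distances
        "WW": sum(d + d * d for d in pair_d) / 2,   # hyper-Wiener index
        "Wp": sum(1 for d in pair_d if d == 3),     # pairs at distance 3
        "H": sum(1 / d for d in pair_d),            # sum of reciprocal distances
        # Randic connectivity: sum over bonds of 1/sqrt(degree_a * degree_b)
        "1X": sum(1 / sqrt(len(adj[a]) * len(adj[b])) for a, b in edges),
    }


indices = topological_indices(ALANINE_EDGES)
print({name: round(value, 2) for name, value in indices.items()})
```

For this assumed graph the rounded values agree with the alanine entries of Table-2 (W = 29, WW = 47, Wp = 4, H = 9.33, 1X = 2.64).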

Statistical Analysis

Structure-property models (MLR models) were generated using the multiple linear regression procedure of SPSS version 16.0, with backward stepwise regression used to construct the QSPR models. The graphs of the results were drawn with Microsoft Office Excel 2010.

Regression Analyses

The heat capacity (Cv, J/mol K), entropy (S, J/mol K) and thermal energy (Eth, kJ/mol) were used as the dependent variables and the 1X, J, H, W, Wp, WW and Sz indices as the independent variables. The criteria for selecting the best multiple linear regression model were the following statistics: correlation coefficient (R), squared multiple correlation coefficient (R2), adjusted squared correlation coefficient (R2adj), Fisher ratio (F), root mean square error (RMSE), Durbin-Watson value (DW) and significance (Sig).
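The selection statistics listed above can be sketched in a few lines of plain Python. The example below fits Cv on (Wp, W, Sz) by ordinary least squares, using the values of Table-1 and Table-2, and reports R, R2, R2adj, F and RMSE. It is a minimal illustration and not the SPSS procedure used in this work: the helper names are hypothetical, RMSE is taken here as sqrt(RSS/n) (an assumption, since some packages divide by n - k - 1 instead), and the output need not coincide with the reported values.

```python
from math import sqrt

# Descriptors from Table-2 and Cv from Table-1, compounds 1..19.
WP = [4, 11, 8, 8, 6, 9, 9, 2, 10, 10, 8, 9, 8, 13, 6, 8, 20, 15, 8]
W = [29, 247, 96, 96, 46, 136, 136, 18, 165, 92,
     96, 143, 102, 212, 46, 65, 369, 268, 65]
SZ = [29, 247, 96, 96, 46, 136, 136, 18, 182, 92,
      96, 143, 102, 293, 46, 65, 518, 376, 65]
CV = [97.657, 198.993, 136.38, 127.92, 117.332, 151.179, 168.88, 76.356,
      148.54, 155.851, 141.447, 173.871, 157.783, 163.759, 108.282,
      135.515, 195.908, 181.909, 137.643]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(columns, y):
    """Least-squares fit with intercept; returns coefficients and statistics."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)]
           for p in range(k + 1)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k + 1)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - rss / tss
    return {
        "coefficients": beta,  # [intercept, then one slope per column]
        "R": sqrt(r2),
        "R2": r2,
        "R2adj": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "F": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "RMSE": sqrt(rss / n),  # assumed definition, see lead-in
    }


stats = fit_mlr([WP, W, SZ], CV)
for name in ("R", "R2", "R2adj", "F", "RMSE"):
    print(name, round(stats[name], 3))
```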

Results and Discussion

Several linear QSPR models involving three to seven descriptors were established, and the strongest multivariable correlations were identified by the backward stepwise regression routine implemented in SPSS 20.0, which was used to develop the linear models for the prediction of the physicochemical properties.

Table-2: Amino acids and their topological indices, used in present study.

| Comp. No. | 1X | J | H | W | WW | Wp | Sz |
|---|---|---|---|---|---|---|---|
| 1 | 2.64 | 2.99 | 9.33 | 29 | 47 | 4 | 29 |
| 2 | 5.54 | 3.2 | 26.92 | 247 | 739 | 11 | 247 |
| 3 | 4.04 | 3.38 | 17.97 | 96 | 206 | 8 | 96 |
| 4 | 4.57 | 3.38 | 17.97 | 96 | 206 | 8 | 96 |
| 5 | 3.18 | 3.14 | 12 | 46 | 83 | 6 | 46 |
| 6 | 5.07 | 3.3 | 20.87 | 136 | 330 | 9 | 136 |
| 7 | 5.07 | 3.3 | 20.87 | 136 | 330 | 9 | 136 |
| 8 | 2.27 | 2.54 | 6.67 | 18 | 28 | 2 | 18 |
| 9 | 5.2 | 1.98 | 25.6 | 165 | 397 | 10 | 182 |
| 10 | 4.09 | 3.58 | 18.23 | 92 | 188 | 10 | 92 |
| 11 | 4.04 | 3.38 | 17.97 | 96 | 206 | 8 | 96 |
| 12 | 4.63 | 3.13 | 20.45 | 143 | 368 | 9 | 143 |
| 13 | 4.18 | 3.16 | 17.55 | 102 | 235 | 8 | 102 |
| 14 | 6.24 | 2.2 | 29.17 | 212 | 542 | 13 | 293 |
| 15 | 3.72 | 3.14 | 12 | 46 | 83 | 6 | 46 |
| 16 | 3.55 | 3.46 | 15.17 | 65 | 122 | 8 | 65 |
| 17 | 7.72 | 1.76 | 42.96 | 369 | 1018 | 20 | 518 |
| 18 | 6.09 | 2.05 | 32.99 | 268 | 735 | 15 | 376 |
| 19 | 4.09 | 3.46 | 15.17 | 65 | 122 | 8 | 65 |

QSPR Models for the Heat Capacity (Cv), Entropy (S) and Thermal Energy (Eth)

The best linear model for each of Cv, S and Eth contains three topological descriptors, namely the Wiener polarity (Wp), Wiener (W) and Szeged (Sz) indices.

The statistics of the best three-descriptor models were as follows:

Model 1

Cv = 60.243 + 7.396(Wp) + 0.829(W) - 0.595(Sz)    (1)

N = 19, R = 0.963, R2 = 0.928, R2adj = 0.913

RMSE = 75.686, F = 64.120, Sig = 0.000, DW = 1.115

Model 2

S = 290.743 + 7.773(Wp) + 1.668(W) - 1.102(Sz)    (2)

N = 19, R = 0.944, R2 = 0.890, R2adj = 0.868

RMSE = 117.576, F = 40.627, Sig = 0.000, DW = 1.084

Model 3

Eth = 150.806 + 34.627(Wp) - 1.957(Sz) + 2.326(W)    (3)

N = 19, R = 0.894, R2 = 0.799, R2adj = 0.759

RMSE = 257.991, F = 19.857, Sig = 0.000, DW = 2.375
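As a worked example of how equations (1)-(3) are applied, the sketch below evaluates them for alanine, whose descriptors in Table-2 are Wp = 4, W = 29 and Sz = 29. The dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Coefficients of equations (1)-(3) as reported above.
MODELS = {
    "Cv": {"intercept": 60.243, "Wp": 7.396, "W": 0.829, "Sz": -0.595},
    "S": {"intercept": 290.743, "Wp": 7.773, "W": 1.668, "Sz": -1.102},
    "Eth": {"intercept": 150.806, "Wp": 34.627, "W": 2.326, "Sz": -1.957},
}

# Descriptors of alanine (compound 1 in Table-2).
ALANINE = {"Wp": 4, "W": 29, "Sz": 29}


def predict(model, descriptors):
    """Evaluate a linear three-descriptor model for one compound."""
    return model["intercept"] + sum(
        model[name] * value for name, value in descriptors.items()
    )


predictions = {prop: predict(m, ALANINE) for prop, m in MODELS.items()}
for prop, value in predictions.items():
    print(prop, round(value, 3))
```

The three-descriptor models give Cv = 96.613 J/mol K, S = 338.249 J/mol K and Eth = 300.015 kJ/mol for alanine, against the Table-1 values of 97.657, 335.929 and 320.347.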

We studied the relationship of the topological indices to the thermal energy (Eth), heat capacity (Cv) and entropy (S) of 19 natural amino acids.

To find the best model for predicting each of these properties, we applied the checks described in the following sections.

Multicollinearity

In regression analysis, collinearity occurs when two predictor variables in a multiple regression have a non-zero correlation. Multicollinearity occurs when more than two predictor variables are inter-correlated. Multicollinearity was tested on the basis of the variance inflation factor (VIF) reported by SPSS. If the VIF value lies between 1 and 10, there is no multicollinearity; if the VIF is greater than 10, there is multicollinearity.

In all our models multicollinearity is present, because the correlations between independent variables are close to one and the VIF values do not lie between 1 and 10.
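The VIF test can be illustrated with the descriptors of Table-2. The sketch below uses the identity that the VIF of predictor j, 1/(1 - Rj^2), equals the j-th diagonal element of the inverse of the predictor correlation matrix. It is a plain-Python illustration with hypothetical helper names, not the SPSS collinearity routine, so the numbers need not match Tables 3 and 4 exactly.

```python
from math import sqrt

# Wp, W and Sz from Table-2, compounds 1..19.
WP = [4, 11, 8, 8, 6, 9, 9, 2, 10, 10, 8, 9, 8, 13, 6, 8, 20, 15, 8]
W = [29, 247, 96, 96, 46, 136, 136, 18, 165, 92,
     96, 143, 102, 212, 46, 65, 369, 268, 65]
SZ = [29, 247, 96, 96, 46, 136, 136, 18, 182, 92,
      96, 143, 102, 293, 46, 65, 518, 376, 65]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def variance_inflation_factors(columns):
    """Return the VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[p], columns[q]) for q in range(k)] for p in range(k)]
    vifs = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        vifs.append(solve(corr, unit)[j])  # j-th entry of the j-th inverse column
    return vifs


names = ["Wp", "W", "Sz"]
vif = dict(zip(names, variance_inflation_factors([WP, W, SZ])))
for name in names:
    flag = "multicollinear" if vif[name] > 10 else "acceptable"
    print(name, round(vif[name], 2), flag)
```

Dropping a collinear descriptor from the list and recomputing shows how the remaining VIF values fall.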

Verification and Validation

Verification and validation are the primary processes for quantifying and building confidence (or credibility) in numerical models [33].

In this section for verification and validation of the regression models, we will focus on the Durbin-Watson statistic and unstandardized predicted and residual values. One of the assumptions of regression is that the observations are independent. If observations are made over time, it is likely that successive observations are related. If there is no autocorrelation (where subsequent observations are related), the Durbin-Watson statistic should be between 1.5 and 2.5.
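The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below applies this definition to the Cv residuals of Table-5, taken in the order in which they are listed. That ordering is an assumption of this example: the statistic depends on the order of the observations, so the result need not match the DW values reported with the models.

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; always between 0 and 4."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def in_accepted_range(dw, low=1.5, high=2.5):
    """Apply the 1.5-2.5 rule of thumb quoted in the text."""
    return low <= dw <= high


# Cv residuals from Table-5, in the listed order (assumed observation order).
CV_RESIDUALS = [-18.55, -36.52, 10.55, 4.98, -13.07, 8.43, -6.57, 19.50,
                20.32, 2.62, -8.79, -8.55, -23.11, -7.78, -4.02, 23.19,
                -0.09, 20.59, 16.87]

dw_cv = durbin_watson(CV_RESIDUALS)
print(round(dw_cv, 3), in_accepted_range(dw_cv))
```

Values near 0 indicate positive autocorrelation of the residuals, values near 4 negative autocorrelation, and values near 2 no autocorrelation.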

In all our models, the value of the Durbin-Watson statistic is close to 2 (see Eqs. 1-3), and hence the errors are uncorrelated.

We examined the linearity between the molecular descriptors in models 1, 2 and 3. The Pearson correlation coefficients and collinearity statistics obtained with SPSS are given in Tables 3 and 4.

Table-3: Correlation between the molecular descriptors (models 1, 2).

| Descriptor | Pearson correlation with Sz | Pearson correlation with Wp | Pearson correlation with W | Tolerance | VIF | Corrected model VIF |
|---|---|---|---|---|---|---|
| Sz | 1 | -0.390 | -0.816 | 0.031 | 31.990 | - |
| Wp | | 1 | -0.185 | 0.090 | 11.077 | - |
| W | | | 1 | 0.036 | 28.097 | 1 |

Table-4: Correlation between the molecular descriptors (model 3).

| Descriptor | Pearson correlation with Sz | Pearson correlation with Wp | Pearson correlation with W | Tolerance | VIF | Corrected model VIF |
|---|---|---|---|---|---|---|
| Sz | 1 | -0.390 | -0.816 | 0.031 | 31.990 | - |
| Wp | | 1 | -0.185 | 0.090 | 11.077 | 1 |
| W | | | 1 | 0.036 | 28.097 | - |

For models 1-3, the Pearson correlation between Sz and W is close to one in magnitude, and VIF (Sz), VIF (W) and VIF (Wp) all exceed 10; therefore Sz and W are collinear. After removing Sz from these models, we corrected models 1 and 2 as follows:

(Equations)

Similarly to models 1 and 2, we obtained the corrected model 3 as follows:

(Equation)

In equations (4-6), Q2LOO is the squared leave-one-out cross-validation coefficient.

We computed Q2LOO (Eq. 7) using a randomly selected 50% of the data; the resulting values are positive and less than one.

(Equation)

In equation (7), the notation i|i indicates that the response of the i-th sample is predicted by a model estimated when the i-th sample was left out of the training set.
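The leave-one-out procedure described above can be sketched as follows, assuming the standard definition Q2LOO = 1 - PRESS/TSS, where PRESS is the sum of squared differences between each observed value and its prediction from a model fitted without that sample, and TSS is the total sum of squares about the mean. The example fits Cv on the single descriptor W with the values of Table-1 and Table-2. It uses all 19 compounds rather than a random 50% subset, and the function names are hypothetical, so the result is illustrative only.

```python
# W from Table-2 and Cv from Table-1, compounds 1..19.
W = [29, 247, 96, 96, 46, 136, 136, 18, 165, 92,
     96, 143, 102, 212, 46, 65, 369, 268, 65]
CV = [97.657, 198.993, 136.38, 127.92, 117.332, 151.179, 168.88, 76.356,
      148.54, 155.851, 141.447, 173.871, 157.783, 163.759, 108.282,
      135.515, 195.908, 181.909, 137.643]


def fit_line(x, y):
    """Least-squares intercept and slope of y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def total_sum_of_squares(y):
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r_squared(x, y):
    """Ordinary R2 of the one-descriptor model fitted on all samples."""
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - rss / total_sum_of_squares(y)


def q2_loo(x, y):
    """Leave-one-out Q2: each sample is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    return 1 - press / total_sum_of_squares(y)


r2_cv = r_squared(W, CV)
q2_cv = q2_loo(W, CV)
print("R2 =", round(r2_cv, 3), "Q2LOO =", round(q2_cv, 3))
```

Because each left-out prediction is made without the sample itself, Q2LOO cannot exceed the R2 of the model fitted on all samples; a large gap between the two suggests overfitting.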

Regular Residuals

The residual is the difference between the observed and predicted values. A comparison between the predicted and observed values of Cv, S and Eth of the respective natural amino acids is shown in Table-5.

Figs. 1-3 show the linear correlation between the observed and the predicted heat capacity, entropy and thermal energy values obtained using equations (4-6), respectively.

Table-5: Comparison between predicted and observed values of the heat capacity (Cv), entropy (S) and thermal energy (Eth) of the respective AAs.

| Comp. No. | Observed Cv (J/mol K) | Predicted Cv (J/mol K) | Residual | Observed S (J/mol K) | Predicted S (J/mol K) | Residual | Observed Eth (kJ/mol) | Predicted Eth (kJ/mol) | Residual |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 97.66 | 116.21 | -18.55 | 335.929 | 368.412 | -32.483 | 320.347 | 352.170 | -31.823 |
| 2 | 76.36 | 112.88 | -36.52 | 309.169 | 363.470 | -54.301 | 237.779 | 303.635 | -65.856 |
| 3 | 137.64 | 127.09 | 10.55 | 389.576 | 384.587 | 4.989 | 486.677 | 449.240 | 37.437 |
| 4 | 141.45 | 136.47 | 4.98 | 403.027 | 398.514 | 4.513 | 567.268 | 449.240 | 118.028 |
| 5 | 108.28 | 121.35 | -13.07 | 351.032 | 376.050 | -25.018 | 338.889 | 400.705 | -61.816 |
| 6 | 135.52 | 127.09 | 8.43 | 388.372 | 384.587 | 3.785 | 418.309 | 449.240 | -30.931 |
| 7 | 181.91 | 188.48 | -6.57 | 458.989 | 475.792 | -16.803 | 578.499 | 619.113 | -40.614 |
| 8 | 157.78 | 138.28 | 19.50 | 441.638 | 401.210 | 40.428 | 495.184 | 449.240 | 45.944 |
| 9 | 168.88 | 148.56 | 20.32 | 473.731 | 416.486 | 57.245 | 457.129 | 473.508 | -16.379 |
| 10 | 151.18 | 148.56 | 2.62 | 433.294 | 416.486 | 16.808 | 494.042 | 473.508 | 20.534 |
| 11 | 148.54 | 157.33 | -8.79 | 429.624 | 429.515 | 0.109 | 480.917 | 497.775 | -16.858 |
| 12 | 127.92 | 136.47 | -8.55 | 383.088 | 398.514 | -15.426 | 382.800 | 449.240 | -66.440 |
| 13 | 195.91 | 219.02 | -23.11 | 469.355 | 521.170 | -51.815 | 653.915 | 740.450 | -86.535 |
| 14 | 163.76 | 171.54 | -7.78 | 439.163 | 450.632 | -11.469 | 564.124 | 570.578 | -6.454 |
| 15 | 117.33 | 121.35 | -4.02 | 370.331 | 376.050 | -5.719 | 323.607 | 400.705 | -77.098 |
| 16 | 173.87 | 150.68 | 23.19 | 456.556 | 419.631 | 36.925 | 623.819 | 473.508 | 150.311 |
| 17 | 136.38 | 136.47 | -0.09 | 389.901 | 398.514 | -8.613 | 408.779 | 449.240 | -40.461 |
| 18 | 155.85 | 135.26 | 20.59 | 415.053 | 396.717 | 18.336 | 570.842 | 497.775 | 73.067 |
| 19 | 198.99 | 182.13 | 16.87 | 504.868 | 466.357 | 38.511 | 617.992 | 522.043 | 95.949 |

Conclusion

In this study, QSPR models were developed for predicting the heat capacity (Cv), entropy (S) and thermal energy (Eth) of amino acids by the MLR method, based on topological descriptors calculated from the molecular structure alone.

The MLR model proved to be a useful tool for the prediction of Cv, S and Eth. Cross-validation was used to evaluate the quality and predictive ability of the MLR models. The results show that a single topological index (W) is sufficient for predicting Cv and S, and that the Wiener polarity index is a good topological index for modeling the thermal energy.

References

1. M. Szymanski and J. Barciszewski, The Genetic Code-40 years on, Acta Biochim. Pol. 54, 51 (2007).

2. T. Yoshimoto, Biochemistry and Structural Biology of Microbial Enzymes and their Medical Applications, Yakugaku Zasshi. 27, 1035 (2007).

3. C. Cunchillos, G. Lecointre, Ordering Events of Biochemical Evolution, Biochimie. 89, 555 (2007).

4. L. E. Trudeau, R. Gutierrez, On Cotransmission and Neurotransmitter Phenotype Plasticity, Mol. Interv. 7, 138 (2007).

5. V. Demidchik and F. J. M. Maathuis, Physiological Roles of Nonselective Cation Channels in Plants: From Salt Stress to Signalling and Development, New. Phytol. 175, 387 (2007).

6. T. Degenkolb, J. Kirschbaum and H. Bruckner, New Sequences, Constituents, and Producers of Peptaibiotics: An Updated Review, Chem. Biodivers. 4, 1052 (2007).

7. R. Mazurkiewicz, A. Kuznik, M. Grymel and A. Pazdzierniok-Holewa, α-Amino Acid Derivatives with a Cα-P Bond in Organic Synthesis, Arkivoc. 6, 193 (2007).

8. Tu. Le, V. Chandana Epa, F. R. Burden and D. A. Winkler, Quantitative Structure-Property Relationship Modeling of Diverse Materials Properties, Chem. Rev. 112, 2889 (2012).

9. I. Gutman, S. Klavžar, An Algorithm for the Calculation of the Szeged Index of Benzenoid Hydrocarbons, J. Chem. Inf. Comput. Sci. 35, 1011 (1995).

10. M. Randić, On Characterization of Molecular Branching, J. Am. Chem. Soc. 97, 6609 (1975).

11. D. J. Klein, I. Lukovits, I. Gutman, On the Definition of the Hyper-Wiener Index for Cycle-Containing Structures, J. Chem. Inf. Comput. Sci. 35, 50 (1995).

12. F. Shafiei, Relationship between Topological Indices and Thermodynamic Properties of the Monocarboxylic Acids Applications in QSPR, Iranian J. Math. Chem. 6, 15 (2015).

13. R. Todeschini, V. Consonni, "Molecular Descriptors for Chemoinformatics", WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim, p.46 (2009).

14. L. Yang, M. Shu, K. Ma, H. Mei, Y. Jiang and Z. Li, ST-Scale as a Novel Amino Acid Descriptor and its Application in QSAM of Peptides and Analogues. Amino Acids. 38, 805 (2010).

15. P. H. A. Sneath, Relations between Chemical Structure and Biological Activity in Peptides. J. Theor. Biol. 12, 157 (1966).

16. A. Kidera, Y. Konishi, M. Oka, T. Ooi, H. A. Scheraga, Statistical Analysis of the Physical Properties of the 20 Naturally Occurring Amino Acids. J. Protein Chem. 4, 23 (1985).

17. H. Mei, Z. H. Liao, Y. Zhou and S. Z. Li, A New Set of Amino Acid Descriptors and its Application in Peptide QSARs. Biopolymers. 80, 775 (2005).

18. L. Yao. Wang, L. Bo, H. Jiguo and Q. Ping, Quantitative Structure-Activity Relationship Study of Antioxidative Peptide by Using Different Sets of Amino Acids Descriptors, J. Mol. Struct. 998, 53 (2011).

19. F. Tian, P. Zhou, Z. Li, T-Scale as a Novel Vector of Topological Descriptors for Amino Acids and its Application in QSARs of Peptides, J. Mol. Struct. 830, 106 (2007).

20. M. Sandberg, L. Eriksson, J. Jonsson, M. Sjostrom, S. Wold, New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids, J. Med. Chem. 41, 2481 (1998).

21. B. Hemmateenejad, M. Shamsipur, A. R. Mehdipour, Novel Amino Acids Indices Based on Quantum Topological Molecular Similarity and their Application to QSAR Study of Peptides, Amino Acids. 40, 1169 (2011).

22. P. V. Khadikar, N. V. Deshpande, P. P. Kale, A. Dobrynin, I. Gutman and G. Domotor, The Szeged Index and an Analogy with the Wiener Index, J. Chem. Inf. Comput. Sci. 35, 547 (1995).

23. I. Gutman, S. Klavžar, An Algorithm for the Calculation of the Szeged Index of Benzenoid Hydrocarbons, J. Chem. Inf. Comput. Sci. 35, 1011 (1995).

24. M. Randić, Generalized Molecular Descriptors, J. Math. Chem. 7, 155 (1991).

25. A. T. Balaban, Highly Discriminating Distance Based Topological Indices, Chem. Phys. Lett. 89, 399 (1982).

26. I. Gutman, A New Hyper-Wiener index, Croat. Chem. Acta. 77, 61 (2004).

27. M. Liu and B. Liu, On the Wiener Polarity Index, MATCH Commun. Math. Comput. Chem. 66, 293 (2011).

28. C. K. Das, B. Zhou and N. Trinajstic, Bounds on Harary Index, J. Math. Chem. 1369 (2009).

29. L. Blaha, J. Damborsky, M. Nemec, QSAR for Acute Toxicity of Saturated and Unsaturated Halogenated Compounds, Chemosphere. 36, 1345 (1998).

30. F. Shafiei, H. Hosseini, Quantitative Structure Property Relationship Models for the Prediction of Gas Heat Capacity of Benzene Derivatives Using Topological Indices, MATCH Commun. Math. Comput. Chem. 75, 583 (2016).

31. Z. Slanina, M. C. Chao, S. L. Lee and I. Gutman, "On Applicability of the Wiener Index to Estimate Relative Stabilities of the Higher-fullerene IPR Isomers", J. Serb. Chem. Soc. 62, 211 (1997).

32. Web search engine developed by ChemAxon; software available at http://www.chemicalize.org.

33. P. J. Roache, Verification and Validation in Computational Science and Engineering, Hermosa Publishers, Albuquerque, NM, 1998.

Publication: | Journal of the Chemical Society of Pakistan |
---|---|

Article Type: | Report |

Date: | Oct 31, 2017 |