A general quantitative structure-property relationship treatment for dielectric constants of polymers.
The electrical properties of polymers are important in many applications, such as insulation of cables, encapsulants for electric components, interlayer dielectrics, and printed wiring board materials (1). The dielectric constant, also called the relative permittivity [epsilon], is a measure of the polarization of the medium between two charges when this medium is subjected to an electric field. A larger dielectric constant implies greater polarization of the medium between the two charges. Thus, the dielectric constant represents the ability of a substance to separate charge and/or to orient its molecular dipoles in an external electric field.
The dielectric constant is an important fundamental molecular property, which can be a useful predictor of other electrical properties of polymers (1-3). Nevertheless, accurate experimental [epsilon] values of polymers are often unavailable. Theoretical prediction of dielectric constants is therefore valuable in the molecular design of new polymeric materials. The ability to make fast and reliable predictions over a wide range of diverse chemical structures would substantially increase the productivity and speed of research. However, the theoretical calculation of the dielectric constant of a polymer from first principles is a complex problem, since this property depends on many factors, such as temperature and rate (frequency) of measurement; structure and composition of the polymer; morphology of the specimens; and impurities, fillers, plasticizers, other additives, and moisture (water molecules) in the polymer.
Alternatively, the quantitative structure-property relationship (QSPR) approach provides a promising way to estimate the [epsilon] values of polymers, using descriptors derived solely from the molecular structure to fit experimental data. The QSPR approach is based on the assumption that the variation in the behavior of compounds, as expressed by any measured physicochemical property, can be correlated with numerical changes in the structural features of the compounds, termed "molecular descriptors" (4-7). Molecular descriptors are formal mathematical representations of a molecule, obtained by a well-specified algorithm applied to a defined molecular representation or by a well-specified experimental procedure: the molecular descriptor is the final result of a logical and mathematical procedure that transforms chemical information encoded within a symbolic representation of a molecule into a useful number, or the result of some standardized experiment (8). Each molecular descriptor takes into account a small part of the whole chemical information contained in the real molecule. Molecular descriptors play a fundamental role in developing QSPR models. The advantage of the QSPR approach is that it requires only knowledge of the chemical structure and does not depend on any experimental properties. Once a correlation is established, it can be applied to predict the property of new compounds that have not yet been synthesized or found. Thus, the QSPR approach can expedite the development of new molecules and materials with desired properties. The QSPR approach has been successfully used to predict many properties of polymers, such as refractive index (1), (9-14), glass transition temperature (10), (15-23), cohesive energy (24), thermal decomposition temperature (25), and solubility parameter (26).
Meanwhile, several QSPR models for the dielectric constants of small organic molecules have also been reported in the literature (3), (27-30). However, there have been relatively few attempts to correlate and predict the dielectric constants of polymers (1), (31). Liu et al. (31) introduced a model with a squared correlation coefficient [R.sup.2] of 0.9086 and a standard error (s) of 0.00103 for 22 polyalkenes by using three descriptors, but the [epsilon] values in this case span only the range from 2.154 to 2.165. Bicerano (1) developed a QSPR model with [R.sup.2] of 0.9580 and s of 0.0871 to relate [epsilon] to 32 topological and constitutional descriptors for 61 polymers. This model involves too many descriptors: improving results by increasing the number of descriptors in the correlation should be considered with care, because it invites overfitting and chance correlations. In addition, neither of these two models has been validated with a test set. In fact, validation is a crucial aspect of any QSPR/QSAR (quantitative structure-activity relationship) modeling (32).
The aim of this study was to produce a robust QSPR model for predicting dielectric constant values from a training set of 45 polymers with extensive structural diversity and to validate the model with a test set of 12 polymers.
MATERIALS AND METHODS
A total of 57 polymers with extensive structural diversity were selected as the working dataset (see Table 1). The polymers in the working dataset included polyethylenes, polyacrylates, polystyrenes, polyethers, polysulfones, polyacrylonitrile, polyamides, polysiloxanes, and polyoxides. The functionalities present in the side chains included halides, acetates, ethers, cyanides, hydrocarbon chains, aromatic rings, and nonaromatic rings. The experimental [epsilon] values, measured at room temperature (298 K), were taken from the book by Bicerano (1) and ranged from 2.10 to 4.00.
TABLE 1. Polymers involved in this study with experimental and predicted dielectric constants.

No.  Compound                                                    Expt.  Pred.  Residual  Leverage
  1  bisphenol-A polycarbonate                                    2.90   2.78      0.12    0.1513
  2  ethyl cellulose                                              2.70   2.70      0.00    0.5193
  3  poly[1,1-ethane bis(4-phenyl)carbonate]                      2.90   2.84      0.06    0.1484
  4  poly(1,4-butadiene)                                          2.51   2.58     -0.07    0.0573
  5  poly(1,4-cyclohexylidene dimethylene terephthalate)          3.00   2.86      0.14    0.1490
  6  poly(1-butene)                                               2.27   2.22      0.05    0.0994
  7  poly[2,2'-(m-phenylene)-5,5'-bibenzimidazole] (a)            3.30   3.27      0.03    0.0851
  8  poly(3,4-dichlorostyrene)                                    2.94   2.77      0.17    0.1426
  9  poly(4,4'-diphenoxy di(4-phenylene)sulfone)                  3.44   3.38      0.06    0.0878
 10  poly(4,4'-isopropylidene diphenoxy di(4-phenylene)sulfone)   3.18   3.26     -0.08    0.0734
 11  poly(4-methyl-1-pentene)                                     2.13   2.18     -0.05    0.1350
 12  poly(a,a,a',a'-tetrafluoro-p-xylylene) (a)                   2.35   2.34      0.01    0.8585
 13  polyacrylonitrile (a)                                        4.00   3.94      0.06    0.9008
 14  poly(a-methyl styrene)                                       2.57   2.49      0.08    0.0431
 15  poly(a-vinyl naphthalene)                                    2.60   2.54      0.06    0.0491
 16  poly(b-vinyl naphthalene)                                    2.51   2.61     -0.10    0.0415
 17  poly(chloro-p-xylylene)                                      2.95   2.80      0.15    0.0440
 18  polychlorotrifluoroethylene (a)                              2.60   2.80     -0.20    0.3367
 19  poly(cyclohexyl methacrylate)                                2.58   2.75     -0.17    0.0452
 20  poly(dimethyl siloxane)                                      2.75   2.83     -0.08    0.1475
 21  poly(e-caprolactam)                                          3.50   3.37      0.13    0.1700
 22  poly(ether ether ketone)                                     3.20   3.19      0.01    0.0609
 23  poly(ethyl a-chloroacrylate)                                 3.10   3.28     -0.18    0.0815
 24  poly(ethyl methacrylate)                                     3.00   2.89      0.11    0.0711
 25  poly(ethylene terephthalate)                                 3.25   3.32     -0.07    0.0658
 26  polyethylene                                                 2.30   2.27      0.03    0.0787
 27  poly(hexamethylene adipamide)                                3.50   3.39      0.11    0.2317
 28  poly(hexamethylene sebacamide)                               3.20   3.27     -0.07    0.3281
 29  poly(isobutyl methacrylate) (a)                              2.70   2.80     -0.10    0.1115
 30  polyisobutylene (a)                                          2.23   2.16      0.07    0.1329
 31  polyisoprene                                                 2.37   2.44     -0.07    0.0707
 32  poly(m-chloro styrene)                                       2.80   2.76      0.04    0.0506
 33  poly(methyl a-chloroacrylate)                                3.40   3.53     -0.13    0.0992
 34  poly(methyl methacrylate)                                    3.10   3.05      0.05    0.0570
 35  poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] (a)          3.50   3.64     -0.14    0.1479
 36  poly(n-butyl methacrylate)                                   2.82   2.81      0.01    0.0928
 37  poly(N-vinyl carbazole)                                      2.90   2.90      0.00    0.0481
 38  poly(o-methyl styrene) (a)                                   2.49   2.50     -0.01    0.0558
 39  poly(oxy-2,2-dichloromethyltrimethylene)                     3.00   2.93      0.07    0.5542
 40  polyoxymethylene                                             3.10   3.10      0.00    0.5193
 41  poly(p-chloro styrene)                                       2.65   2.81     -0.16    0.0456
 42  poly(p-hydroxybenzoate)                                      3.28   3.31     -0.03    0.0660
 43  poly(p-methoxy-o-chloro styrene) (a)                         3.08   3.10     -0.02    0.0594
 44  polypropylene                                                2.20   2.22     -0.02    0.1095
 45  poly(p-xylylene)                                             2.65   2.63      0.02    0.0462
 46  polystyrene                                                  2.55   2.59     -0.04    0.0309
 47  polytetrafluoroethylene (a)                                  2.10   2.00      0.10    0.2963
 48  poly(tetramethylene terephthalate)                           3.10   3.06      0.04    0.1410
 49  poly(vinyl acetate)                                          3.25   3.21      0.04    0.0476
 50  poly(vinyl chloride)                                         2.95   2.85      0.10    0.0861
 51  poly(vinyl cyclohexane)                                      2.25   2.32     -0.07    0.0620
 52  poly(vinylidene chloride) (a)                                2.85   2.79      0.06    0.1901
 53  poly[1,1-cyclohexane bis(4-phenyl)carbonate]                 2.60   2.72     -0.12    0.1632
 54  poly[4,4'-sulfone diphenoxy di(4-phenylene)sulfone] (a)      3.80   3.62      0.18    0.1197
 55  poly[oxy(2,6-dimethyl-1,4-phenylene)]                        2.60   2.68     -0.08    0.2760
 56  poly[oxy(2,6-diphenyl-1,4-phenylene)]                        2.80   2.89     -0.09    0.0458
 57  poly[thio(p-phenylene)]                                      3.10   3.07      0.03    0.0716

(a) Test-set polymers (12 entries); the remaining 45 polymers form the training set.
Descriptors cannot be calculated directly for entire macromolecules because polymers have high molecular weights and broad molecular weight distributions (15). If the polymer chain is long enough, the terminal groups make up only a very small proportion of the polymer, and their effect on the dielectric constant can be ignored. Most QSPR studies of polymers resolve this problem with one of two approaches: (1) using the monomer structure as representative of the corresponding polymer (10), (12), (16), (33) or (2) using the repeat unit end-capped with two hydrogen atoms as representative (9), (11), (15-17), (23). These methods fail to account for the interactions between neighboring repeating units, which are of great importance to polymeric properties. In this work, the cyclic dimer structure (one repeating unit connected to another, as shown in Fig. 1), which has been described previously (14), was used to calculate descriptors for the corresponding polymer. It has been demonstrated that these structures represent the polymers more realistically than the monomer or repeating unit structure and produce more accurate models.
The cyclic dimer structures of all polymers were drawn in the HYPERCHEM program (34) and preoptimized using the MM+ molecular mechanics method (Polak-Ribiere algorithm). The final geometries of the minimum energy conformation were obtained by the semi-empirical AM1 method at the restricted Hartree-Fock level with no configuration interaction, applying a gradient norm limit of 0.01 kcal/(Å mol) as a stopping criterion. Then a total of 1630 molecular descriptors for each polymer were calculated from the optimized molecular geometries using the DRAGON software (35). These descriptors include (1) 0D constitutional descriptors (atom and group counts); (2) 1D functional groups and atom-centered fragments; (3) 2D topological descriptors, BCUTs, walk and path counts, autocorrelations, connectivity indices, information indices, topological charge indices, and eigenvalue-based indices; and (4) 3D Randic molecular profiles from the geometry matrix, geometrical, WHIM, and GETAWAY descriptors.
To reduce redundant and nonuseful information, descriptors with constant or near-constant values and descriptors found to be highly correlated pairwise (one of any two descriptors with a correlation greater than 0.99) (36) were excluded in a prereduction step. The 770 remaining descriptors underwent subsequent descriptor selection and model development.
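The prereduction step can be sketched as follows. This is a minimal illustration, not the procedure implemented in the DRAGON software: the function name `prereduce`, the variance tolerance `var_tol` used to decide what counts as "near constant", and the choice to keep the first descriptor of each highly correlated pair are all assumptions made for the example.

```python
import numpy as np


def prereduce(X, names, corr_limit=0.99, var_tol=1e-12):
    """Drop (near-)constant descriptors, then one of each highly correlated pair.

    X      : (n_samples, n_descriptors) array of descriptor values
    names  : list of descriptor names, one per column
    Returns the reduced matrix and the names of the retained descriptors.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: discard constant or near-constant columns (assumed tolerance).
    candidates = [j for j in range(X.shape[1]) if np.var(X[:, j]) > var_tol]

    # Step 2: greedy pairwise filter. A column is kept only if its absolute
    # correlation with every column already kept is at most corr_limit.
    selected = []
    for j in candidates:
        redundant = False
        for i in selected:
            r = np.corrcoef(X[:, i], X[:, j])[0, 1]
            if abs(r) > corr_limit:
                redundant = True
                break
        if not redundant:
            selected.append(j)

    return X[:, selected], [names[j] for j in selected]
```

Because the filter is greedy, the surviving descriptor of a correlated pair depends on column order; a different tie-breaking rule would give a different, equally valid reduced pool.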
Kennard and Stone Algorithm
The Kennard and Stone algorithm (37) has been widely used for splitting datasets into two subsets. The algorithm starts by finding the two samples that are farthest apart from each other in the space of the input variables. These two samples are removed from the original dataset and put into the calibration set. Each subsequent step moves into the calibration set the remaining sample whose distance to its nearest already-selected sample is largest, and the procedure is repeated until the desired number of samples has been selected. The advantages of this algorithm are that the calibration samples always map the measured region of the input variable space completely with respect to the induced metric and that no validation samples fall outside the measured region. The Kennard and Stone algorithm has been considered one of the best ways to build training and test sets (38), (39). Using this algorithm, the entire set was divided into two subsets: a training set of 45 polymers and a test set comprising the remaining 12 polymers.
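A compact sketch of this kind of split is given below. It assumes Euclidean distances on the descriptor matrix as supplied (any scaling of the descriptors is left to the caller); the function name `kennard_stone` and its arguments are illustrative and are not taken from the original work.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices for a max-min distance split."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must lie between 2 and the number of samples")

    # Full matrix of pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    # Seed the calibration set with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the remaining sample whose nearest selected neighbor
    # is farthest away (max-min criterion).
    while len(selected) < n_select:
        nearest = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return selected, remaining
```

For a dataset of 57 samples, `kennard_stone(X, 45)` would return 45 calibration indices and the 12 left-over indices for the test set.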
[FIGURE 1 OMITTED]
Model Development and Validation
Stepwise multilinear regression analysis (MLRA) with leave-one-out (LOO) cross-validation was used to select descriptors for the QSPR models on the training set. F-to-enter and F-to-remove were 4 and 3, respectively. The models were assessed by [R.sup.2], the adjusted [R.sup.2], the cross-validated [R.sup.2], the F ratio, the standard error s, and the significance level p. The adjusted [R.sup.2] is calculated using the following formula:
$$R_{adj}^{2} = 1 - \frac{n - 1}{n - m - 1}\,(1 - R^{2}) \qquad (1)$$
where n is the number of samples in the training set and m is the number of descriptors involved in the correlation. The adjusted [R.sup.2] is a better measure of the proportion of variance in the data explained by the correlation than [R.sup.2] (especially for correlations developed using small datasets) because [R.sup.2] is somewhat sensitive to changes in n and m. The adjusted [R.sup.2] corrects for the artificiality introduced when m approaches n through a penalty function that scales the result. The F ratio is defined as the ratio between the model sum of squares and the residual sum of squares, each divided by its degrees of freedom, and thus compares the model-explained variance with the residual variance: high values of the F ratio indicate reliable models. A variance inflation factor (VIF) was calculated to test whether multicollinearities existed among the descriptors. It is defined as
$$\mathrm{VIF} = \frac{1}{1 - R_{j}^{2}} \qquad (2)$$
where [R.sub.j.sup.2] is the squared correlation coefficient obtained when the jth descriptor is regressed against all the other descriptors in the model. Models were not accepted if they contained descriptors with VIFs above five (40).
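The statistics named in this section can be computed with a few lines of NumPy. The sketch below is illustrative only: all function names are hypothetical, the F ratio is computed from mean squares (sums of squares divided by m and n - m - 1 degrees of freedom), and the leave-one-out [R.sup.2] uses the standard hat-matrix shortcut for ordinary least squares rather than refitting n models.

```python
import numpy as np


def _fit(X, y):
    """Ordinary least squares with an intercept; returns design matrix and coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A, beta


def r_squared(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    A, beta = _fit(X, y)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n, m):
    """Eq. 1: penalize R^2 for the number of descriptors m at sample size n."""
    return 1.0 - (n - 1) / (n - m - 1) * (1.0 - r2)


def f_ratio(r2, n, m):
    """Model mean square over residual mean square, expressed through R^2."""
    return (r2 / m) / ((1.0 - r2) / (n - m - 1))


def vif(X):
    """Eq. 2: regress each descriptor on all the others (X must be 2D)."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        r2_j = r_squared(np.delete(X, j, axis=1), X[:, j])
        values.append(1.0 / (1.0 - r2_j))
    return np.array(values)


def loo_q_squared(X, y):
    """Leave-one-out cross-validated R^2 via PRESS residuals e_i / (1 - h_ii)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    A, beta = _fit(X, y)
    hat = A @ np.linalg.pinv(A.T @ A) @ A.T
    press = np.sum(((y - A @ beta) / (1.0 - np.diag(hat))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

A descriptor set would be rejected under the rule above whenever `vif(X).max()` exceeds five.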
Randomization tests were also carried out to check for possible chance correlation. To do this, the dependent variable was randomly scrambled and used in place of the original response. Models were then investigated with all members of the descriptor pool to find the most predictive models. The resulting models obtained on the training set with the randomized [epsilon] values should have significantly lower [R.sup.2] values than the proposed one because the relationship between structure and property is broken. This supports the validity of the proposed model, as it can reasonably be excluded that the originally proposed model was obtained by chance correlation.
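A simplified form of such a randomization (y-scrambling) test is sketched below. Note the stated simplification: here the descriptor set is held fixed and only the response is permuted, whereas the procedure described above re-runs descriptor selection on the whole pool for each scrambled response. The function names and the default of 100 permutations are assumptions made for the example.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)


def y_randomization(X, y, n_trials=100, seed=0):
    """Compare the true R^2 with R^2 values obtained for permuted responses."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r2_true = r_squared(X, y)
    # Each trial breaks the structure-property link by shuffling y.
    r2_perm = np.array([r_squared(X, rng.permutation(y)) for _ in range(n_trials)])
    return r2_true, r2_perm
```

If the original model is not a chance correlation, `r2_true` should sit well above the bulk of `r2_perm`.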
Validation of the models was further performed by using the test set. The external [R.sub.CV,ext.sup.2] for the test set is determined with Eq. 3:
$$R_{CV,ext}^{2} = 1 - \frac{\sum_{i}\left(\varepsilon_{i} - \hat{\varepsilon}_{i}\right)^{2}}{\sum_{i}\left(\varepsilon_{i} - \bar{\varepsilon}_{tr}\right)^{2}} \qquad (3)$$
where $\varepsilon_{i}$ and $\hat{\varepsilon}_{i}$ are the experimental and calculated values of the dielectric constant for the ith sample in the test set, respectively; $\bar{\varepsilon}_{tr}$ is the averaged value of the dielectric constants for the training set; and the summation runs over all samples in the test set. According to Golbraikh and Tropsha (32), a QSPR model is successful if it satisfies several criteria as follows:
$$R_{CV,ext}^{2} > 0.5 \qquad (4a)$$
$$r^{2} > 0.6 \qquad (4b)$$
$$\frac{r^{2} - r_{0}^{2}}{r^{2}} < 0.1 \quad \text{or} \quad \frac{r^{2} - r_{0}^{\prime 2}}{r^{2}} < 0.1 \qquad (4c)$$
$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15 \qquad (4d)$$
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5a)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5b)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5c)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5d)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5e)
where r is the correlation coefficient between the calculated and experimental values in the test set; [r.sub.0.sup.2] (calculated versus observed values) and [r.sub.0.sup.'2] (observed versus calculated values) are the coefficients of determination for regressions through the origin; k and k' are the slopes of the regression lines through the origin of calculated versus observed and observed versus calculated values, respectively; $\varepsilon_{i}^{r_{0}}$ and $\hat{\varepsilon}_{i}^{r_{0}}$ are defined as $\varepsilon_{i}^{r_{0}} = k\hat{\varepsilon}_{i}$ and $\hat{\varepsilon}_{i}^{r_{0}} = k'\varepsilon_{i}$, respectively; and the summations are over all samples in the test set.
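The external-validation criteria can be checked numerically along the following lines. Because Eqs. 5a-5e are not reproduced in this text, the expressions for k, k', [r.sub.0.sup.2], and [r.sub.0.sup.'2] below are assumptions: they follow one commonly used form of the through-the-origin regressions and may differ in detail from the formulas in the original article. The function and key names are hypothetical.

```python
import numpy as np


def external_validation(y_obs, y_pred, y_train_mean):
    """Statistics for a test set: external R^2, r^2, and through-origin terms."""
    y = np.asarray(y_obs, dtype=float)
    yh = np.asarray(y_pred, dtype=float)

    # External R^2 referenced to the mean of the training-set responses.
    r2_ext = 1.0 - np.sum((y - yh) ** 2) / np.sum((y - y_train_mean) ** 2)

    # Squared Pearson correlation between observed and predicted values.
    r2 = np.corrcoef(y, yh)[0, 1] ** 2

    # Slopes of regressions through the origin (assumed forms).
    k = np.sum(y * yh) / np.sum(yh ** 2)        # y  ~ k  * yh
    k_prime = np.sum(y * yh) / np.sum(y ** 2)   # yh ~ k' * y

    # Coefficients of determination for those regressions (assumed forms).
    r0_2 = 1.0 - np.sum((y - k * yh) ** 2) / np.sum((y - y.mean()) ** 2)
    r0p_2 = 1.0 - np.sum((yh - k_prime * y) ** 2) / np.sum((yh - yh.mean()) ** 2)

    return {"r2_ext": r2_ext, "r2": r2, "k": k, "k_prime": k_prime,
            "r0_2": r0_2, "r0_prime_2": r0p_2}


def passes_criteria(s):
    """Apply the four acceptance conditions (4a-4d) to the statistics dict."""
    cond_c = ((s["r2"] - s["r0_2"]) / s["r2"] < 0.1
              or (s["r2"] - s["r0_prime_2"]) / s["r2"] < 0.1)
    cond_d = 0.85 <= s["k"] <= 1.15 or 0.85 <= s["k_prime"] <= 1.15
    return bool(s["r2_ext"] > 0.5 and s["r2"] > 0.6 and cond_c and cond_d)
```

The training-set mean is passed in explicitly so that the function never needs the training data themselves.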
[FIGURE 2 OMITTED]
The applicability domain of a QSPR model (38), (41) must be defined if the model is to be used for screening new compounds. Only predictions for compounds that fall within this domain may be considered reliable. The extent of extrapolation (38) is one simple approach to defining the applicability domain. It is based on the calculation of the leverage [h.sub.i] (42) for each compound whose property is predicted with the QSPR model:
$$h_{i} = x_{i}^{T}\left(X^{T}X\right)^{-1}x_{i} \qquad (6)$$
where [x.sub.i] is the descriptor vector of the i-th compound, [x.sub.i.sup.T] is the transpose of [x.sub.i], X is the descriptor matrix, and [X.sup.T] is the transpose of X. The warning leverage h* is generally fixed at 3(m + 1)/n, where n is the total number of samples in the training set and m is the number of descriptors involved in the correlation. A leverage greater than the warning leverage h* means that the predicted response is the result of a substantial extrapolation of the model and may not be reliable.
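The leverage calculation can be illustrated with the short sketch below. It is not the authors' code: the function names are hypothetical, the pseudo-inverse is used for numerical robustness, and whether the descriptor matrix should carry an additional column of ones for the intercept is left to the caller, since the text does not specify it.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """h_i = x_i^T (X^T X)^-1 x_i for each query row, with X from the training set."""
    X = np.asarray(X_train, dtype=float)
    Q = X if X_query is None else np.asarray(X_query, dtype=float)
    core = np.linalg.pinv(X.T @ X)
    # Row-wise quadratic form q_i^T * core * q_i.
    return np.einsum("ij,jk,ik->i", Q, core, Q)


def warning_leverage(n, m):
    """h* = 3(m + 1)/n for n training samples and m descriptors."""
    return 3.0 * (m + 1) / n


def outside_domain(X_train, X_query, m=None):
    """Boolean mask of query compounds whose leverage exceeds h*."""
    X = np.asarray(X_train, dtype=float)
    m = X.shape[1] if m is None else m
    return leverages(X, X_query) > warning_leverage(len(X), m)
```

With 45 training samples and nine descriptors, `warning_leverage(45, 9)` evaluates 3 x 10 / 45, the threshold against which the leverages in Table 1 are compared.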
RESULTS AND DISCUSSION
The experimental [epsilon] values in Table 1 were divided into the training and test sets according to the Kennard and Stone algorithm. Stepwise MLRA with LOO cross-validation was applied to the training set to select the descriptors for the best model, and the number of descriptors in the final QSPR model was determined on the basis of the dataset size, the correlation coefficient R, the adjusted [R.sup.2], the significance test F, and the standard error s. The [R.sup.2] and s results during the stepwise MLRA are shown in Fig. 2. Clearly, [epsilon] is not linearly correlated with any single molecular descriptor, since the univariate correlations between [epsilon] and the individual descriptors have poor [R.sup.2] and s values. The [R.sup.2] increases gradually with the number of descriptors. When adding another descriptor did not significantly improve the statistics of a model, the optimum subset size was considered to have been reached. To avoid over-parameterized models, which contain an excess of descriptors and are difficult to interpret in terms of physical interactions, an increase in [R.sup.2] of less than 0.01 was chosen as the breakpoint criterion. In addition, from a statistical viewpoint, the ratio of the number of samples (n) to the number of descriptors (m) should not be too low; usually, n/m [Greater than or equal to] 5 is recommended. In this work, with 45 samples in the training set, nine descriptors were selected (n/m = 5). The final correlation equation is the following:
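$$\varepsilon = -0.0701\,[\mathrm{nX}] - 0.00376\,[\mathrm{T(F..F)}] - 2.102\,[\mathrm{X0Av}] + 1.728\,[\mathrm{AAC}] + 0.0863\,[\mathrm{MATS6e}] - 4.756\,[\mathrm{JGI6}] + 0.0427\,[\mathrm{J3D}] - 0.115\,[\mathrm{H\text{-}048}] - 0.0930\,[\mathrm{O\text{-}060}] + 2.045 \qquad (7)$$
n = 45, [R.sup.2] = 0.9384, [R.sub.cv.sup.2] = 0.9329, [R.sub.adj.sup.2] = 0.9370, s = 0.0873, F = 655.0, p < 0.00001.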
Here, nX is the number of halogen atoms; T(F..F) is the sum of topological distances between fluorine atoms; X0Av is the average valence connectivity index chi-0; AAC is the mean information index on atomic composition; MATS6e is the Moran autocorrelation of lag 6 weighted by atomic Sanderson electronegativities; JGI6 is the mean topological charge index of order 6; J3D is the 3D-Balaban index; H-048 is the number of hydrogen atoms attached to [C.sup.2](sp3)/[C.sup.1](sp2)/[C.sup.0](sp), where the superscript represents the formal oxidation number; and O-060 is the number of oxygen atoms in Al-O-Ar/Ar-O-Ar/R..O..R/R-O-C=X. More information about these descriptors can be found in the Dragon software user's guide (35) and the references therein.
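For readers who want to apply the correlation, Eq. 7 translates directly into a small function. The coefficients and intercept are those of Eq. 7 and Table 2; the function name `predict_epsilon` and the dictionary layout are illustrative. The descriptor values themselves must still be computed from the optimized cyclic dimer structure with descriptor software, which this sketch does not attempt.

```python
# Regression coefficients of Eq. 7, keyed by descriptor name.
COEFFICIENTS = {
    "nX": -0.0701,
    "T(F..F)": -0.00376,
    "X0Av": -2.102,
    "AAC": 1.728,
    "MATS6e": 0.0863,
    "JGI6": -4.756,
    "J3D": 0.0427,
    "H-048": -0.115,
    "O-060": -0.0930,
}
INTERCEPT = 2.045


def predict_epsilon(descriptors):
    """Evaluate Eq. 7 for a mapping of descriptor name -> descriptor value."""
    missing = sorted(set(COEFFICIENTS) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptor values: {missing}")
    return INTERCEPT + sum(coef * descriptors[name]
                           for name, coef in COEFFICIENTS.items())
```

A prediction should be trusted only if the compound also falls inside the applicability domain of the model.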
In general, the larger the magnitude of the F ratio, the better the model predicts the property values in the training set. The large F ratio of 655.0 indicates that Eq. 7 does an excellent job of predicting the [epsilon] values. Equation 7 has an adjusted [R.sup.2] value of 0.9370, which indicates very good agreement between the correlation and the variation in the data. The cross-validated correlation coefficient [R.sub.cv.sup.2] = 0.9329 illustrates the reliability of the model by focusing on its sensitivity to the elimination of any single data point. The model was further validated by applying the randomization test; the [R.sup.2] values obtained are plotted against the correlation coefficient between the original and permuted response data in Fig. 3. The lower [R.sup.2] values of the randomized models indicate that the good results of the original model are not due to chance correlation or structural dependency of the training set. Some important statistical parameters (given in Table 2) were used to evaluate the descriptors involved. The t-value of a descriptor measures the statistical significance of its regression coefficient. The high absolute t-values shown in Table 2 indicate that the regression coefficients of the descriptors in the MLR model are significantly larger than their standard deviations. The t-probability of a descriptor describes its statistical significance when combined with the other descriptors within the overall QSPR model (i.e., descriptors' interactions). Descriptors with t-probability values below 0.05 (95% confidence) are usually considered statistically significant in a particular model, which means that their influence on the response variable is not merely due to chance (43). The smaller the t-probability, the more significant the descriptor. The t-probability values of all nine descriptors are very small, indicating that all of them are highly significant. The VIF values (Table 2) and the correlation matrix (Table 3) suggest that these descriptors are only weakly correlated with each other. Thus, the model can be regarded as an optimal regression equation.
TABLE 2. Characteristics of descriptors in the best MLRA model.

Descriptor  Descriptor type                     X       DX  t-value  t-probability    VIF
Constant                                    2.045    0.127   16.083          0.000
nX          Constitutional descriptors    -0.0701   0.0113   -6.191          0.000  2.812
T(F..F)     Topological descriptors      -0.00376  0.00089   -4.206          0.000  2.736
X0Av        Connectivity indices           -2.102    0.178  -11.804          0.000  1.627
AAC         Information indices             1.728    0.071   24.330          0.000  1.524
MATS6e      2D autocorrelations            0.0863    0.021    4.191          0.000  1.181
JGI6        Topological charge indices     -4.756    1.089   -4.369          0.000  1.258
J3D         Geometrical descriptors        0.0427   0.0126    3.383          0.001  1.291
H-048       Atom-centered fragments        -0.115   0.0195   -5.888          0.000  1.147
O-060       Atom-centered fragments       -0.0930   0.0122   -7.589          0.000  1.349

TABLE 3. Correlation matrix between the selected descriptors and [epsilon].

                  nX   T(F..F)      X0Av       AAC    MATS6e      JGI6       J3D     H-048     O-060 [epsilon]
nX            1.0000
T(F..F)       0.7419    1.0000
X0Av         -0.1041   -0.3164    1.0000
AAC           0.2166    0.0364   -0.0253    1.0000
MATS6e       -0.1677   -0.1316   -0.0545    0.0649    1.0000
JGI6          0.0578    0.0206   -0.0992    0.3412   -0.1896    1.0000
J3D          -0.0661   -0.0541    0.4187   -0.0666    0.0111   -0.0300    1.0000
H-048        -0.0844   -0.0394   -0.0648    0.1157   -0.0652   -0.0851    0.0312    1.0000
O-060        -0.1887   -0.1141   -0.1995    0.2638   -0.0595    0.1819    0.0054   -0.1054    1.0000
[epsilon]    -0.2102   -0.2702   -0.2722    0.7422    0.3798    0.1295   -0.1101    0.0148    0.1403    1.0000
[FIGURE 3 OMITTED]
The calculated results of the [epsilon] values from Eq. 7 for the training and test sets are shown in Table 1 and Fig. 4. The distributions of errors for the entire dataset are also given in Fig. 5. As the errors are distributed on both sides of the zero line, one may conclude that there is no systematic error in the model development. The min/max values of the relative error (RE) are 0.09%/6.42% and 0.20%/7.84% for the training and test sets, respectively. Among the entire dataset, there are 33 samples with RE less than 3.00% and only five samples have RE higher than 5.00%. The mean relative error is 2.67% for the entire dataset. The following statistical parameters were obtained for the test set, which obviously satisfy the generally accepted condition and thus demonstrate the predictive power of the present model:
$$R_{CV,ext}^{2} = 0.9699 > 0.5$$
$$r^{2} = 0.9699 > 0.6$$
$$\frac{r^{2} - r_{0}^{2}}{r^{2}} = \frac{0.9699 - 0.9999}{0.9699} < 0.1 \quad \text{or} \quad \frac{r^{2} - r_{0}^{\prime 2}}{r^{2}} = \frac{0.9699 - 1.0000}{0.9699} < 0.1$$
$$0.85 \le k = 1.0007 \le 1.15 \quad \text{or} \quad 0.85 \le k' = 0.9982 \le 1.15 \qquad (8)$$
It should be pointed out that no matter how robust and well validated a QSPR model may be, it cannot be expected to reliably predict the modeled property for the entire universe of compounds. Therefore, before a QSPR model is used for screening compounds, its applicability domain must be defined, and only predictions for compounds that fall within this domain can be considered reliable. The extent of extrapolation method was applied to the 57 polymers that constitute the entire dataset. The leverages for all 57 polymers were computed (as listed in Table 1), and two polymers, poly(a,a,a',a'-tetrafluoro-p-xylylene) and polyacrylonitrile, were found to fall outside the domain of the model (warning leverage limit 0.6667).
To further test the suitability of the QSPR model developed in our study, the statistical parameters obtained were compared with those calculated from Bicerano's model (1). The performance of Bicerano's model (1) ([R.sup.2] = 0.9580 and s = 0.0871) is slightly better than that of the present model ([R.sup.2] = 0.9520 and s = 0.0912 for the entire dataset). However, there are nine descriptors in the present model, whereas Bicerano's QSPR model consisted of 32 topological and constitutional descriptors. Improvement of results by increasing the number of descriptors in the correlation should be considered with care, since overfitting and chance correlations may in part be due to such an approach.
On the basis of a previously described procedure (44), (45), the relative contributions of the nine descriptors to the model were determined and are plotted in Fig. 6. Nine descriptors were needed in the QSPR model from a training set of 45 polymers, showing that the analyzed dataset is quite "noisy" within a small data range (2.10-4.00), although this does not violate the rule of thumb for building a linear model, namely that there must be at least five data points (samples) per descriptor in the model. The significance of the descriptors involved in the model decreases in the following order: AAC > X0Av > O-060 > H-048 > nX > JGI6 > T(F..F) > J3D > MATS6e.
The most important descriptor is the information index AAC, which accounts for 27.4% of the total contribution and itself correlates relatively strongly (R = 0.7422) with the target experimental [epsilon] values. The AAC descriptor describes each atom by its own atom type and by the bond types and atom types of its first neighbors. AAC is a measure of atomic composition related to molecular complexity: the larger the molecule and the more complex its elemental composition, the larger this descriptor. The positive coefficient of AAC indicates that polymers with larger values of this descriptor, and accordingly a more complex composition, have larger [epsilon] values. Thus, this descriptor could serve as an indicator of polymers with a large [epsilon] value.
The second most important descriptor is the connectivity index X0Av, which accounts for 13.7% of the contributions. The X0Av descriptor is derived directly from the molecular structural formula and encodes information about the size, branching, cyclization, unsaturation, and heteroatom content of the molecule. As the connectivity index increases, the branching, size, heteroatom content, number of double and triple bonds, and molecular volume also increase. This descriptor has a negative effect on the [epsilon] value. The atom-centered fragments O-060 and H-048 are calculated from the molecular composition and atom connectivities. The presence of O-060, H-048, and nX in the model reflects the influence of certain atoms on the [epsilon] values. The importance of intramolecular charge transfer for the [epsilon] values is apparent from the presence of JGI6 in the model. T(F..F) represents the sum of topological distances between fluorine atoms in the molecule and measures the position of the fluorine atoms with respect to each other. The coefficient of the descriptor T(F..F) has a negative sign in the model, which indicates that a decrease in the distance between fluorine atoms is favorable to the [epsilon] value. The 3D-Balaban index J3D is calculated by the same formula as the Balaban distance connectivity index, using geometric distance degrees in place of topological distance degrees, and encodes the compactness of the molecule. The positive coefficient of J3D in the model indicates that polymers with higher molecular compactness have larger [epsilon] values. The presence of MATS6e in the equation reflects the influence of atomic electronegativities on the [epsilon] values, because this descriptor encodes electronegativity information associated with atom pairs separated by six bonds.
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
[FIGURE 6 OMITTED]
In this article, a general QSPR model with [R.sup.2]= 0.9384 and s = 0.0873 was obtained to predict dielectric constants for a diverse set of polymers, based on descriptors calculated from the cyclic dimer structures of polymers. Several validation techniques illustrated the reliability of the present model. The mean relative error is 2.67% for the entire dataset, indicating that the present model is predictive. The model relies solely on descriptors derived from the chemical structure and thus it is applicable to regular polymers of any chemical structure. Therefore, this QSPR model should be useful in the development of new polymers with desired dielectric constants.
The authors gratefully wish to express their thanks to the reviewers for critically reviewing the manuscript and making important suggestions.
(1.) J. Bicerano, Prediction of Polymer Properties, Marcel Dekker, New York (1996).
(2.) G. Hougham, G. Tesoro, and A. Viehbeck, Macromolecules, 29, 3453 (1996).
(3.) R.C. Schweitzer and J.B. Morris, J. Chem. Inf. Comput. Sci., 40, 1253 (2000).
(4.) J. Devillers and A.T. Balaban, Eds., Topological Indices and Related Descriptors in QSAR and QSPR, Gordon and Breach, The Netherlands (1999).
(5.) M. Karelson, Molecular Descriptors in QSAR/QSPR, Wiley Online Library, New York (2000).
(6.) X.J. Yao, Y.W. Wang, X.Y. Zhang, R.S. Zhang, M.C. Liu, Z.D. Hu, and B.T. Fan, Chemom. Intell. Lab. Syst., 62, 217 (2002).
(7.) J. Xu, B. Guo, B. Chen, and Q. Zhang, J. Mol. Model., 12, 65 (2005).
(8.) R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim (2000).
(9.) A. Katritzky, S. Sild, and M. Karelson, J. Chem. Inf. Comput. Sci., 38, 1171 (1998).
(10.) R. Garcia-Domenech and J.V. Julian-Ortiz, J. Phys. Chem. B, 106, 1501 (2002).
(11.) J. Xu, B. Chen, Q. Zhang, and B. Guo, Polymer, 45, 8651 (2004).
(12.) X. Yu, B. Yi, and X. Wang, J. Comput. Chem., 28, 2336 (2007).
(13.) J. Gao, J. Xu, B. Chen, and Q. Zhang, J. Mol. Model., 13, 573 (2007).
(14.) J. Xu, H. Liang, B. Chen, W. Xu, X. Shen, and H. Liu, Chemom. Intell. Lab. Syst., 92, 152 (2008).
(15.) A. Katritzky, S. Sild, V. Lobanov, and M. Karelson, J. Chem. Inf. Comput. Sci., 38, 300 (1998).
(16.) B.E. Mattioni and P.C. Jurs, J. Chem. Inf. Comput. Sci., 42, 232 (2002).
(17.) C. Cao and Y. Lin, J. Chem. Inf. Comput. Sci., 43, 643 (2003).
(18.) A. Afantitis, G. Melagraki, K. Makridima, A. Alexandridis, H. Sarimveis, and O. Igglessi-Markopoulou, J. Mol. Struct. (Theochem), 716, 193 (2005).
(19.) X. Yu, X. Wang, H. Wang, A. Liu, and C. Zhang, J. Mol. Struct. (Theochem), 766, 113 (2006).
(20.) X. Yu, B. Yi, X. Wang, and Z. Xie, Chem. Phys., 332, 115 (2007).
(21.) C. Bertinetto, C. Duce, A. Micheli, R. Solaro, A. Starita, and M.R. Tine, Polymer, 48, 7121 (2007).
(22.) C. Duce, A. Micheli, A. Starita, M.R. Tine, and R. Solaro, Macromol. Rapid Commun., 27, 711 (2006).
(23.) X. Yu, Fiber Polym., 11, 757 (2010).
(24.) J. Xu, B. Chen, H. Liang, W. Xu, and W. Cui, Polimery, 54, 19 (2009).
(25.) D. Ajloo, A. Sharifian, and H. Behniafar, Bull. Korean Chem. Soc., 31, 2009 (2008).
(26.) X. Yu, X. Wang, H. Wang, X. Li, and J. Gao, QSAR Comb. Sci., 25, 156 (2006).
(27.) M. Cocchi, P.G.D. Benedetti, R. Seeber, L. Tassi, and A. Ulrici, J. Chem. Inf. Comput. Sci., 39, 1190 (1999).
(28.) R.C. Schweitzer and J.B. Morris, Anal. Chim. Acta, 384, 285 (1999).
(29.) S. Sild and M. Karelson, J. Chem. Inf. Comput. Sci., 42, 360 (2002).
(30.) J.-P. Liu, W.V. Wilding, N.F. Giles, and R.L. Rowley, J. Chem. Eng. Data, 55, 41 (2010).
(31.) A. Liu, X. Wang, L. Wang, H. Wang, and H. Wang, Eur. Polym. J., 43, 989 (2007).
(32.) A. Golbraikh and A. Tropsha, J. Mol. Graph. Model., 20, 269 (2002).
(33.) A. Afantitis, G. Melagraki, H. Sarimveis, P.A. Koutentis, J. Markopoulos, and O. Igglessi-Markopoulou, Polymer, 47, 3240 (2006).
(34.) HYPERCHEM, Version 6.01, Hypercube, Inc., Gainesville, USA (2000).
(35.) R. Todeschini, V. Consonni, A. Mauri, and M. Pavan, DRAGON for Windows (Software for Molecular Descriptor Calculations), Version 5.4, TALETE srl, Milan (2006).
(36.) H. Liu and P. Gramatica, Bioorgan. Med. Chem., 15, 5251 (2007).
(37.) R.W. Kennard and L.A. Stone, Technometrics, 11, 137 (1969).
(38.) A. Tropsha, P. Gramatica, and V.K. Gombar, QSAR Comb. Sci., 22, 69 (2003).
(39.) W. Wu, B. Walczak, D.L. Massart, S. Heuerding, F. Erni, I.R. Last, and K.A. Prebble, Chemom. Intell. Lab. Syst., 33, 35 (1996).
(40.) A.J. Holder, D.M. Yourtee, D.A. White, A.G. Glaros, and R. Smith, J. Comput. Aid. Mol. Des., 17, 223 (2003).
(41.) M. Shen, C. Beguin, A. Golbraikh, J.P. Stables, H. Kohn, and A. Tropsha, J. Med. Chem., 47, 2356 (2004).
(42.) A. Atkinson, Plots, Transformations, and Regression, Clarendon Press, Oxford, UK (1985).
(43.) L.F. Ramsey and W.D. Schafer, The Statistical Sleuth, Wadsworth Publishing Company, USA (1997).
(44.) F. Zheng, E. Bayram, S.P. Sumithran, J.T. Ayers, C.-G. Zhan, J.D. Schmitt, L.P. Dwoskin, and P.A. Crooks, Bioorg. Med. Chem., 14, 3017 (2006).
(45.) R. Guha and P.C. Jurs, J. Chem. Inf. Model., 45, 800 (2005).
Jie Xu, (1) Lei Wang, (1) Guijie Liang, (1), (2) Luoxin Wang, (1) Xiaolin Shen (1)
(1.) Key Lab of Green Processing and Functional Textiles of New Textile Materials, Ministry of Education, Wuhan Textile University, 430073, Wuhan, China
(2.) Department of Materials Chemistry and Physics, College of Materials Science and Engineering, Xi'an Jiao Tong University 710049, Xi'an, China
Correspondence to: Jie Xu; e-mail: firstname.lastname@example.org
Contract grant sponsor: Natural Science Foundation of Hubei Province; contract grant number: 2008CDB261; contract grant sponsor: Key Project of Science and Technology Research of Ministry of Education; contract grant number: 208089; contract grant sponsor: Educational Commission of Hubei Province; contract grant number: Q20101606; contract grant sponsor: Natural Science Foundation of China; contract grant number: 51003082.
Published online in Wiley Online Library (wileyonlinelibrary.com). © 2011 Society of Plastics Engineers
|Author:||Xu, Jie; Wang, Lei; Liang, Guijie; Wang, Luoxin; Shen, Xiaolin|
|Publication:||Polymer Engineering and Science|
|Date:||Dec 1, 2011|