# Applying Learning Analytics for the Early Prediction of Students' Academic Performance in Blended Learning.

Introduction

Blended learning, also known as hybrid learning or mixed-mode instruction, incorporates one or more online learning strategies into traditional classroom teaching. As early as 1960, computer-assisted courses relied on computer networks to deliver digital learning materials to students; for example, Programmed Logic for Automatic Teaching Operations (PLATO), developed at the University of Illinois (Hart, 1995), provided teaching activities that could be conducted on a large scale, enabling a single instructor to simultaneously teach a large number of students.

In recent years, blended learning has become a popular teaching strategy because of the development of data analysis and computation; for example, Ellis, Pardo, and Han (2016) integrated social networking into a one-semester course and monitored the behaviors of over 220 undergraduate engineering students. The researchers used the students' interactive records to examine how to help them succeed in a collaboratively driven course. Hong et al. (2016) adopted a web game to develop ten teaching scenarios. After 6 weeks of experimentation on 110 elementary school students, the researchers indicated that the students were highly motivated by the combination of game-based learning and traditional classroom activities. Huang, Yang, Chiang, and Su (2016) improved students' learning motivations and performance in an English course by incorporating a mobile-based vocabulary feedback application into a traditional classroom environment.

To gain benefits from blended learning, many educators have adopted the Online Assessment System (OAS) or Massive Open Online Courses (MOOCs) in their course design; for example, Awang and Zakaria (2013) integrated the OAS into an integral course for 101 college students. The results indicated that the OAS improved the students' learning performance. Lu, Huang, Huang, and Yang (2017) incorporated MOOCs with a well-defined intervention strategy into a course; the course not only facilitated the students' learning achievements but also increased their level of engagement. Although the aforementioned studies have explained the advantages of blended learning, many researchers have asserted that in blended courses, monitoring students' learning behaviors and habits is difficult because of the complex learning environment (Ellis et al., 2016; Hong et al., 2016; Huang et al., 2016). Furthermore, at-risk students cannot be identified, and thus timely interventions cannot be conducted to facilitate learning success (Tempelaar, Rienties, & Giesbers, 2015).

To help students achieve classroom success, educators in Europe and the United States have recently applied learning analytics. In 2011, the Horizon Report, a report on educational trends, investigated the benefits and future trends of learning analytics (Johnson, Smith, Willis, Levine, & Haywood, 2011). The report defined learning analytics as an ideal framework for improving learning performance based on data on students' learning history. Because of the limitations of data analysis and computation at the time, learning analytics remained largely a conceptual framework after 2011. With the rise of big data technology, a 2016 special issue of the Horizon Report on learning analytics highlighted that the optimal time to incorporate learning analytics into classroom settings had arrived (Johnson et al., 2016).

In recent years, learning analytics has served as a conceptual framework for the analysis of course characteristics, including the prediction of students' learning performance, educational data analysis process development (Hwang, Chu, & Yin, 2017), data collection, and timely intervention (Hwang, 2014). To develop a conceptual framework for learning analytics, many researchers have designed and implemented courses with strategies for learning analytics. Lu et al. (2017) measured student engagement in a virtual learning environment and intervened in the students' learning activities according to the engagement score. The results showed improvements in the students' final academic performance and their self-regulated learning abilities after applying learning analytics. Hachey, Wladis, and Conway (2014) collected the learning data of 962 students to determine the factors that influence their grade point averages (GPAs). The results showed that students with no experience of online learning obtained low retention rates and had low GPA scores. The researchers concluded that online learning and practice must be offered to students without relevant experience before the beginning of a course (Papamitsiou & Economides, 2014).

In our research, learning analytics serves as a conceptual framework and, as part of our Precision education approach, is used to analyze and predict students' performance and to provide timely interventions based on student learning profiles. The idea of Precision education parallels that of the Precision Medicine Initiative (see https://obamawhitehouse.archives.gov/node/333101), proposed by President Obama in his 2015 State of the Union address as a new research effort to revolutionize the medical treatment of disease. As the Initiative notes, most treatments have been designed for the average patient; such one-size-fits-all treatments can be successful for some patients but not for others. Following the same philosophy, Precision medicine aims to improve the diagnosis, prediction, treatment, and prevention of disease, and we define the objective of Precision education as the improvement of the diagnosis, prediction, treatment, and prevention of learning outcomes.

The previous studies have shown that the development of big data technology has enabled learning analytics to become a suitable method for facilitating student success. The advantage of blended learning is that large quantities of learning data can be collected through learning management systems (LMSs) to enrich personal learning data. However, few case studies have been conducted on the effects of applying learning analytics in blended courses because of the complexity of learning environments and the diversity of data. To provide timely interventions for at-risk students through learning analytics in blended learning, the present study not only implemented a MOOC- and OAS-enabled Calculus course but also proposed a process for the early identification of at-risk students. To predict students' final academic performance, many studies have used only one data set: a subset of a blended course. To improve prediction performance, critical factors may need to be identified and prediction accuracy may need to be compared using a data set combining online and traditional learning activities. The following research questions were proposed:

RQ1. How early can we predict students' final academic performance?

RQ2. Which are the most critical factors that affect students' final academic performance in blended learning?

RQ3. Which type of data set (blended vs. online vs. traditional) is more effective for predicting students' final academic performance in blended learning?

Literature review

Identification of at-risk students

According to the learning analytics executive reports by Arroway, Morgan, O'Keefe, and Yanosky (2015) and Kuzilek, Hlosta, Herrmannova, Zdrahal, and Wolff (2015), the first stage of implementing learning analytics is to identify at-risk students. Moreover, at-risk student identification must be conducted as early as possible to allow sufficient time for instructors to conduct educational interventions to facilitate students' learning achievements. Early at-risk student identification originated from the implementation of an open course that yielded a high dropout rate (Yang, Huang, & Huang, 2017).

Many researchers have defined dropout as a risk of MOOCs and have designed prediction methods to identify the dropout group. Xing, Chen, Stein, and Marcinkowski (2016) collected data on 3,617 students' video-watching behaviors in 2014 and developed a classification model to identify the students likely to drop out by the following week. The results suggested that the retention rate would have been higher if the instructors had conducted timely interventions based on the prediction results. Lara, Lizcano, Martinez, Pazos, and Riera (2014) collected historical data on 100 students in a virtual learning environment consisting of five variables and proposed a knowledge discovery system for dividing students into dropout and non-dropout groups. The researchers reached 90% classification accuracy through a verification process involving 100 students. Thammasiri, Delen, Meesad, and Kasap (2014) compared several resampling algorithms on 7 years of student interaction data to address data imbalance: 80% of the target data were labeled true, indicating freshmen who continued their studies, and 20% false, indicating those who dropped out. The results show that the combination of the synthetic minority oversampling technique (SMOTE) and a support vector machine yielded a classification accuracy of 90%, an improvement on the 86% accuracy without resampling in 10-fold cross validation. In addition to online courses, numerous researchers have incorporated student learning performance prediction into traditional classroom settings. Hachey et al. (2014) used a unique combination of variables to construct several classification models and verified the models with historical data collected from a learning management system. The results indicated that if the goal is to predict the learning outcomes of students with online course experience, retention rate is a more useful variable than GPA; for all other goals, GPA is more favorable.
The results of the aforementioned studies show that at-risk students can be identified through classification methods if at-risk is defined as potential course dropout. However, in contrast to some studies, which have used data from open courses and pure online courses, another group of researchers defined at-risk as students who failed or obtained low grades at the end of a course. Many researchers have since adopted this approach for predicting students' final academic performance.

Students' final academic performance prediction

To identify at-risk students based on their final grades, scores, or learning outcomes, educational data mining can be used to identify students' behavioral patterns and predict their grades (Romero & Ventura, 2010). Romero, Lopez, Luna, and Ventura (2013) collected data on 114 students from an online discussion forum and separated them into several data subsets on a weekly basis before evaluating each data set's predictive accuracy through several data-mining methods. Romero et al. (2013) used the sequential minimal optimization classification algorithm and student interaction data before a midterm exam to achieve the highest accuracy for predicting student learning performance. Hu, Lo, and Shih (2014) developed an early warning system by using a decision tree classifier. The model was constructed from data on 300 students and contained 13 online variables, including how long each student had used the system and how many documents each student had read in the preceding week. The results revealed 95% accuracy in predicting whether students would pass or fail based on 1-4 weeks of data from a skewed data set. To verify which critical factors affect prediction performance, Villagra-Arnedo, Gallego-Duran, Compan, Llorens-Largo, and Molina-Carmona (2016) determined 8 variables for student behavior and 53 for learning activity from a learning management system. Villagra-Arnedo et al. (2016) designed four experiments to validate a data set with different variable combinations. The results demonstrated that a data set with particular variables had the highest correlation coefficient with grades and could attain higher prediction accuracy than the others.

In addition to predicting student learning outcomes, some studies have used students' grades as prediction labels and marked students as at-risk if their predicted grades were below average. Meier, Xu, Atan, and van der Schaar (2016) used regression to design a neighborhood selection process to predict students' grades. The researchers claimed that the proposed algorithm achieved 76% accuracy. Asif, Merceron, and Pathan (2014) used a naive Bayes classifier to demonstrate that students' grades in their final year of university could be predicted based on student data collected during freshman year. In addition, the researchers executed a feature selection process before classification, and the results showed that the data set from which socioeconomic and demographic variables had been removed was reasonably accurate. Huang and Fang (2013) used students' final grades as prediction targets. To evaluate the prediction results, the researchers designed two quantitative indicators to transform the regression mean square error into prediction accuracy. The final results showed that the students' final exam scores were predictable to 88% accuracy based on eight variables collected from a learning management system. Previous studies have explained that "at-risk" can generally be used to describe students who drop out, fail, or achieve low grades in courses. We can fulfill the critical requirement of learning analytics by using students' final grades or scores as prediction indicators and designing a data-mining methodology based on classification or regression for the early prediction of those indicators.

Recent studies have used data collected from entire course periods, which is problematic because students can then only be identified as at-risk after the conclusion of a course, which is ineffective in real scenarios. Moreover, recent studies have used single data sets collected from virtual learning environments or classroom activities, which limits the applicability of the results to blended courses that combine online and face-to-face learning. Therefore, we referred to recent studies to define the following four aspects for consideration: First, data must be divided into sub data sets based on duration (Hu et al., 2014; Romero et al., 2013). Second, critical factors must be identified to improve prediction accuracy (Asif et al., 2014; Villagra-Arnedo et al., 2016); for example, Villagra-Arnedo et al. (2016) reduced the number of variables from 61 to 23 without losing prediction accuracy. Third, we adopted a regression model used in previous studies, principal component regression (PCR) (Agudo-Peregrina, Iglesias-Pradas, Conde-Gonzalez, & Hernandez-Garcia, 2014; Cevik, 2015; Huang & Fang, 2013; Meier et al., 2016), which was also implemented and evaluated in our previous study. PCR involves performing principal component analysis (PCA) to calculate the principal components, some of which are then used as variables in multiple linear regression. Fourth, indicators and acceptance criteria must be designed to evaluate prediction performance. Although the regression model provides several indicators of goodness of fit, it does not provide any accuracy indicator. Therefore, following the concept of prediction accuracy proposed by Huang and Fang (2013), we applied the cross-validation mechanism proposed by Golub, Heath, and Wahba (1979) to design indicators for evaluating prediction performance.
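The PCR procedure described above (PCA on the predictors, then multiple linear regression on the leading components) can be sketched as follows. This is a minimal numpy illustration under assumed choices (standardizing predictors with training statistics, PCA via SVD), not the authors' exact implementation, and the data are synthetic stand-ins:

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_components):
    """Principal component regression: standardize the predictors, run
    PCA via SVD, then fit ordinary least squares on the leading
    component scores."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0                  # guard constant columns
    Z_train = (X_train - mu) / sigma
    Z_test = (X_test - mu) / sigma
    _, _, Vt = np.linalg.svd(Z_train, full_matrices=False)
    V = Vt[:n_components].T                  # loadings of leading PCs
    T_train, T_test = Z_train @ V, Z_test @ V
    A = np.column_stack([np.ones(len(T_train)), T_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(T_test)), T_test]) @ beta

# Synthetic stand-in: 59 students, 21 variables, scores near 70
rng = np.random.default_rng(0)
X = rng.normal(size=(59, 21))
y = 70 + 10 * X[:, 1] + rng.normal(scale=2, size=59)
pred = pcr_fit_predict(X[:50], y[:50], X[50:], n_components=12)
print(pred.shape)  # (9,)
```

Varying `n_components` from 1 to the number of variables yields the family of regression models evaluated per data set.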
Moreover, in recent studies, the acceptance of prediction accuracy ranged from 75% (Villagra-Arnedo et al., 2016) to 95% (Hu et al., 2014).

Method and experiments

Participants and learning activities

The participants in this study were 33 male and 26 female students. The experiment was conducted in a Calculus course that ran from September 2015 to February 2016. This study utilized MOOCs and the OAS to improve freshman students' learning outcomes at a university in Northern Taiwan.

The Calculus course lasted for 18 weeks and included six learning activities (Figure 1). During the course, the participants used MOOCs to preview Calculus content through Open edX (see https://open.edx.org/about-open-edx) and practiced Calculus by using the OAS through Maple T.A. (see http://www.maplesoft.com/). To improve participants' mathematics ability, an instructor provided weekly after-school tutoring for each participant. To encourage the participants to continue studying Calculus, the instructor assigned paper homework exercises. To evaluate the students' learning performance for each topic, the instructor administered quizzes for specific weeks. The weekly quizzes, homework assignments, and course content are listed in Table 1 and Table 2.

Data sets of learning activities and variables

The MOOC- and OAS-enabled Calculus course collected participant learning profiles, which consisted of their video-viewing behaviors, out-of-class practice, homework assignments, and quiz scores. In particular, this study collected data on video-viewing behaviors from Open edX and data on out-of-class practice from Maple T.A. Both types of data were categorized as online behavior. Table 3 lists the definitions of the data variables for the Calculus course.

Process for predicting students' final academic performance

At-risk students can be identified as those with a predicted final academic performance of lower than 60. In the blended Calculus course, we applied a final academic performance prediction process with PCR consisting of data preprocessing, modeling, and evaluation phases. The data preprocessing phase consisted of data integration and data set separation. Data integration focused on integrating the learning data derived from MOOCs, the OAS, homework, quiz scores, and after-school tutoring. This study defined 21 variables from the blended learning environment, consisting of online and traditional learning data; the details of the variables are described in Table 3. In data set separation, the duration of the collected learning data was identified; the proposed accumulated and duration data sets are described in the following section. In the modeling phase, a prediction model for students' final academic performance was generated through PCR. The evaluation phase focused on measuring the goodness of fit and predictive effectiveness of the regression model: this study measured not only the goodness of fit of the regression model by using the mean squared error (MSE), coefficient of determination (R^2), and quantile-quantile (Q-Q) plot but also the predictive performance of the regression model by using the predictive MSE (pMSE) and predictive mean absolute percentage correction (pMAPC), both of which were proposed in our previous study.

Experimental data set description

To investigate the influence of data set duration on predictive effectiveness, this study proposed accumulated and duration data sets. An accumulated data set records learning data collected from the first week to a specified week, whereas a duration data set records the participants' learning behaviors during specific weeks. W_i^j denotes the data set that collects data on the participants' learning behaviors from week i to week j. The accumulated data sets were W_1^6, W_1^12, and W_1^18, which recorded students' learning behaviors from weeks 1-6, 1-12, and 1-18, respectively; the duration data sets were W_7^12 and W_13^18, which recorded students' learning behaviors from weeks 7-12 and 13-18, respectively. The statistics for variables X_1-X_21 based on the accumulated (W_1^6, W_1^12, and W_1^18) and duration (W_7^12 and W_13^18) data sets are listed in Table 4.
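Building the accumulated and duration data sets amounts to aggregating per-week records over different week ranges. The sketch below uses synthetic weekly records and assumes aggregation by summing over weeks (score-type variables might instead be averaged); real values would come from the Open edX and Maple T.A. logs and classroom records:

```python
import numpy as np

# Synthetic weekly records: weekly[s, w, v] is the value of variable v
# for student s in week w (18 weeks, 21 variables)
rng = np.random.default_rng(1)
weekly = rng.random(size=(59, 18, 21))

def make_dataset(weekly, first_week, last_week):
    """Aggregate weekly records from first_week to last_week (1-based,
    inclusive) into one row per student by summing over weeks."""
    return weekly[:, first_week - 1:last_week, :].sum(axis=1)

W_1_6 = make_dataset(weekly, 1, 6)      # accumulated: weeks 1-6
W_1_12 = make_dataset(weekly, 1, 12)    # accumulated: weeks 1-12
W_7_12 = make_dataset(weekly, 7, 12)    # duration: weeks 7-12
print(W_1_6.shape)  # (59, 21)
```

Under this additive aggregation, an accumulated set decomposes into duration sets, e.g. W_1^12 equals W_1^6 plus W_7^12 elementwise.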

In Table 4, "Scale" denotes the variable range from the minimum to maximum value. "Mean" and "SD" indicate the average and standard deviation values of the 59 students, respectively. In the Calculus course, the average and standard deviation of the participants' scores were 70.05 and 19.2, respectively. The minimum and maximum Calculus scores were 25 and 100, respectively.

Regression model estimation

The performance indicators for evaluating the prediction results in this study were the pMSE and pMAPC, both of which were proposed in our previous study. In the present study, we introduced 10-fold cross validation with shuffling to calculate the pMSE and pMAPC values. We used the testing data obtained from the 10-fold cross validation to calculate the prediction performance. The pMSE and pMAPC equations are as follows:

$\mathrm{pMSE} = \frac{1}{n}\sum_{i=1}^{n}(a_i - p_i)^2$ (1)

$\mathrm{pMAPC} = 1 - \frac{1}{n}\sum_{i=1}^{n}\frac{|a_i - p_i|}{\bar{a}}$ (2)

The symbols a_i and p_i represent the actual and predicted scores, respectively, of student s_i. A = {a_1, a_2, ..., a_n} records each student's actual Calculus score, and P = {p_1, p_2, ..., p_n} records the predicted Calculus scores in the testing data. The symbol ā represents the average score of all students in the blended Calculus course. A lower pMSE value and a higher pMAPC value indicate higher predictive performance and higher predictive accuracy, respectively. Therefore, our objective was to find a regression model with a lower pMSE and a higher pMAPC.
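The two indicators and the shuffled 10-fold cross validation can be sketched as follows. This is an illustrative numpy implementation under assumptions (the fold's own average score stands in for ā, and a mean-predicting baseline stands in for the PCR model), not the authors' exact code:

```python
import numpy as np

def pmse(actual, predicted):
    # Predictive mean squared error over the testing students
    return np.mean((actual - predicted) ** 2)

def pmapc(actual, predicted):
    # 1 minus the mean absolute error relative to the average actual
    # score, so higher values indicate higher predictive accuracy;
    # using the fold's own average as a-bar is an assumption here
    return 1.0 - np.mean(np.abs(actual - predicted)) / np.mean(actual)

def cv_indicators(X, y, fit_predict, k=10, seed=0):
    """Shuffled k-fold cross validation; returns the average pMSE and
    pMAPC over the k testing folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        pred = fit_predict(X[train], y[train], X[fold])
        scores.append((pmse(y[fold], pred), pmapc(y[fold], pred)))
    return np.mean(scores, axis=0)

# Baseline demonstration: always predict the training-set mean score
rng = np.random.default_rng(2)
X = np.zeros((59, 1))                               # placeholder predictors
y = np.clip(rng.normal(70.05, 19.2, 59), 25, 100)   # scores like the course's
mean_model = lambda Xtr, ytr, Xte: np.full(len(Xte), ytr.mean())
avg_pmse, avg_pmapc = cv_indicators(X, y, mean_model)
print(avg_pmse > 0, 0 < avg_pmapc < 1)  # True True
```

Replacing `mean_model` with a PCR fit-and-predict function evaluates an actual prediction model with the same indicators.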

Experimental results and discussion

Earliness of students' final academic performance prediction

Regression Model Estimation

We applied PCR to the five data sets and generated 21 final academic performance prediction models for each data set. Table 5 lists the average values and scale of the R^2, adjusted R^2, and Durbin-Watson statistic for each data set. The Durbin-Watson values indicate that the 21 learning variables are independent. The ranges of the average R^2 and adjusted R^2 values across the data sets are 0.34-0.47 and 0.30-0.38, respectively. These results are similar to those of previous studies (Agudo-Peregrina et al., 2014; Cevik, 2015), which indicates that the explanatory power of each regression model in the present study was acceptable. Regarding the scale of the R^2 and adjusted R^2, the scale ranges of the accumulated data sets are all higher than those of the duration data sets, which suggests that the explanatory power of the regression models using the accumulated data sets was higher than that of the regression models using the duration data sets.

Regarding testing of the regression models, Table 6 lists the values of the F-test and corresponding significance level for each data set. Data sets W_1^6, W_1^12, W_1^18, W_7^12, and W_13^18 had 21, 20, 20, 16, and 17 regression models, respectively. According to the conventional estimation results in Table 5 and Table 6, the accumulated data sets had regression models with better goodness of fit than those of the duration data sets.

Predictive performance of the five data sets

Table 7 lists the prediction indicators for the five data sets. The pMSE and pMAPC ranges among the data sets are 214-248 and 0.82-0.83, respectively. Regarding the mean of the pMSE, the accumulated data sets all had slightly lower means than did the duration data sets. However, according to the pMSE values, the predictive error for each participant's final academic performance in each of the five data sets was close to 15. By contrast, the mean range of the pMAPC among the accumulated and duration data sets was 0.82-0.83. Regarding the average pMSE and pMAPC values, predictive performance was fairly similar in the accumulated and duration data sets because some information may have been lost when computing the average. To solve this problem, this study conducted Wilcoxon signed-rank testing for the 21 regression models for each data set.

The results of Wilcoxon signed-rank testing of the five data sets are listed in Table 7. The Wilcoxon signed-rank test results for pMSE and pMAPC are listed in the lower and upper triangular matrices, respectively. For both pMSE and pMAPC, the accumulated data sets W_1^6 and W_1^18 differed significantly from the duration data sets W_7^12 and W_13^18, suggesting that predictive performance differed significantly between the data set types. Furthermore, we applied box plots to determine which accumulated data set had the highest predictive performance.
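A pairwise comparison of this kind can be sketched with `scipy.stats.wilcoxon`, which performs a paired non-parametric test on the 21 per-model indicator values of two data sets. The numbers below are hypothetical stand-ins, with the duration models shifted upward to mimic worse predictive error:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical pMSE values for the 21 models of two data sets
rng = np.random.default_rng(3)
pmse_accumulated = rng.normal(215, 5, size=21)
pmse_duration = pmse_accumulated + rng.normal(25, 5, size=21)

# Paired non-parametric comparison of the two sets of 21 models
stat, p_value = wilcoxon(pmse_accumulated, pmse_duration)
print(p_value < 0.05)  # True: significantly different performance
```

Running this test for every pair of data sets fills the lower (pMSE) and upper (pMAPC) triangles of a comparison matrix like Table 7.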

Figure 2 shows a box plot comparison of the different data sets based on the pMSE and pMAPC results. For each data set, we used box plots to describe the distribution of pMSE and pMAPC values for the 21 regression models obtained using PCR. The bottom and top lines represent the minimum and maximum values, respectively. From bottom to top, the three lines in the box indicate the lower quartile, median quartile, and upper quartile, respectively. Figure 2 shows that the box plots of the duration data sets are longer than those of the accumulated data sets, which indicates that the predictive performance of the accumulated data sets was more stable than that of the duration data sets. In addition, the minimum pMSE values of the accumulated data sets are lower than those in the duration data sets and the maximum pMAPC values of the accumulated data sets are higher than those of the duration data sets. The results of the pMSE and pMAPC comparison show that the accumulated data sets have better prediction ability than do the duration data sets.

The results of the pMAPC and pMSE comparison matrix show that among the accumulated data sets, W_1^18 and W_1^6 had better predictive performance than did W_1^12. Compared with W_1^6, W_1^18 had a higher maximum value and higher median quartile for pMAPC, as well as a lower median quartile for pMSE. However, W_1^6 had the lowest pMSE value. These results show that W_1^18 had slightly higher predictive performance and accuracy than did W_1^6. Because of outliers in the maximum value of pMSE and minimum value of pMAPC, the stability of W_1^18 was lower than that of W_1^6. In a real scenario, PCR generates as many regression results as there are principal components, and only one prediction result could be randomly selected from the results; this could cause issues if the data set has a wide range of prediction accuracy, or in a data set with high average accuracy but a few outliers, such as W_1^18. Therefore, a convergent or stable data set is necessary even if its average accuracy is lower than that of other data sets. Thus, W_1^6 was determined to be the most suitable data set for real scenarios.

Linear regression residual analysis

According to the results of conventional regression and predictive performance estimation presented in the previous section, the accumulated data set W_1^6 had the highest stability and accuracy for predicting students' final academic performance. A final test was required to verify the characteristics of normality, independence, and homogeneity in the residuals. However, because PCA projects the data into a vector space whose dimension equals the number of variables, 21 models were estimated for each data set. To follow up on W_1^6, we had to select the most predictive components from the 21 PCR results.

Figure 3 shows the pMSE and pMAPC results for each number of principal components in data set W_1^6. The optimal pMSE and pMAPC values (178.94 and 83.5%, respectively) were obtained with 12 components. Figure 4 shows the results of linear regression residual analysis by using a Q-Q plot of the 12-principal-component model of W_1^6. The distribution of all residuals closely resembles a straight line, which indicates that the distribution of the differences between the predicted and actual values supports the characteristics of normality, independence, and homogeneity.
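The Q-Q check of residual normality can be sketched with `scipy.stats.probplot`, which returns the theoretical-vs-ordered quantile pairs that a Q-Q plot draws, plus the fit line. The residuals below are synthetic stand-ins for the 12-component model's errors (an assumed standard deviation of about 13 roughly matches a pMSE near 179):

```python
import numpy as np
from scipy import stats

# Synthetic residuals standing in for the 12-component model's errors
rng = np.random.default_rng(4)
residuals = rng.normal(0.0, 13.0, size=59)

# probplot computes the Q-Q quantile pairs and the fit line; a
# correlation r near 1 indicates approximately normal residuals
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(r > 0.95)  # True for approximately normal residuals
```

Plotting `osm` against `osr` with the line `slope * osm + intercept` reproduces a Q-Q plot like Figure 4.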

To answer RQ1 (How early can we predict students' final academic performance?), the results of the conventional and predictive performance estimations indicate that students' final academic performance can be predicted by the sixth week of the semester. The PCR model from data set W_1^6 had the highest stability and prediction accuracy, which is consistent with the findings of previous studies that achieved early identification of at-risk students after one third of the course period had been completed (Hu et al., 2014) and before the midterm exam (Romero et al., 2013). Data set W_1^18 had similar predictive accuracy and stability for predicting students' final academic performance because performance can be calculated using quiz or homework scores throughout the whole semester. However, Hu et al. (2014) asserted that to identify at-risk students within the learning analytics framework, offering intervention based on an 18-week prediction result is too late. Therefore, the present study recommends using accumulated data set W_1^6 to predict students' final academic performance. In addition, we found that the predictive performance of the duration data sets is inferior to that of the accumulated data sets, which indicates that the completeness of data collection is crucial for data analysis.

Determining critical factors that affect students' final academic performance in blended learning

According to the summary of the literature review, the first step in predicting students' final academic performance is to determine as many variables as possible. Subsequently, rules should be applied to select the variables that provide higher prediction ability. Moreover, according to the summary in the previous section, data set W_1^6 had the highest stability and predictive accuracy, and thus we used this data set to determine the critical factors that affect students' learning performance. Table 8 shows the regression model estimation results. Components 1, 2, 5, 7, 9, 10, and 12 had a significant influence on students' final academic performance. For each significant component, we selected the variables with higher coefficients as critical factors; for example, variable X_2 was selected as the critical factor for Component 1 because of the substantial differences between the coefficient of X_2 and those of the other variables.
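The selection rule just described, taking the variable with the dominant coefficient for each significant component, can be sketched as follows. The loading matrix and the argmax rule here are hypothetical simplifications of the paper's coefficient comparison:

```python
import numpy as np

# Hypothetical loading matrix: 21 variables x 12 components
rng = np.random.default_rng(5)
loadings = rng.normal(size=(21, 12))

# 0-based indices of the components found significant (components
# 1, 2, 5, 7, 9, 10, and 12 in the paper's 1-based numbering)
significant = [0, 1, 4, 6, 8, 9, 11]

# For each significant component, pick the variable with the largest
# absolute loading as that component's critical factor
critical = {c + 1: int(np.argmax(np.abs(loadings[:, c])) + 1)
            for c in significant}
print(sorted(critical))  # [1, 2, 5, 7, 9, 10, 12]
```

With the paper's actual loadings, this mapping would return one critical variable (e.g. X_2 for Component 1) per significant component.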

To address RQ2 (Which are the most critical factors that affect students' final academic performance in blended learning?), this study determined seven critical factors that affect students' final academic performance, namely X_2 (number of activities a student engages in per week), X_9 (number of times a student clicks "Play" during video viewing per week), X_11 (number of times a student clicks "Backward seek" during video viewing per week), X_18 (student's weekly practice score), X_19 (student's weekly homework score), X_20 (student's weekly quiz score), and X_21 (number of times a student participates in after-school tutoring per week).

X_18, X_19, and X_20 are critical factors that affect students' final academic performance because of the evident relationships between each of these three variables and learning performance. The results are consistent with the findings of Huang and Fang (2013), who determined that exam scores and homework scores can predict students' final academic performance. Xing et al. (2016) asserted that online learning behaviors can predict dropout only in online courses. Based on our identification of four online variables, X_2, X_9, X_11, and X_18, as critical factors that affect students' final academic performance, dropout and students' final academic performance may be related.

Ability of different data sets (blended vs. online vs. traditional) to predict students' final academic performance in blended learning

As mentioned in the previous section, we identified seven critical factors that affect students' final academic performance in the MOOC- and OAS-enabled blended course. Within W1-6 (the weeks 1-6 data set), these seven factors can be categorized into blended, online, and traditional data sets. Table 9 lists the category of each factor and the PCR results; W1-6(O), W1-6(T), and W1-6(B) denote the online, traditional, and blended data sets, respectively.

The R^2, F-test, and Durbin-Watson results demonstrate that each indicator was acceptable for each data set (Table 9), which also lists the independent variables of the three data sets. The regressions for W1-6(O), W1-6(T), and W1-6(B) contained three, three, and five significant variables, respectively, indicating that the selected critical factors are crucial for predicting students' final academic performance. In addition, the number of best components for each of the online, traditional, and blended data sets equaled its number of independent variables, showing that each data set required all of its independent variables to achieve optimal predictive performance. The blended data set W1-6(B) obtained the best pMSE and pMAPC values, 159.17 and 0.82, respectively. Figure 3 illustrates that the best pMSE for the full data set W1-6 was 178.94, inferior to that of the blended data set W1-6(B) (higher pMSE is worse). These results show that the selected critical factors not only reduce the number of variables for PCR but also improve prediction performance.
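The two prediction metrics used throughout can be sketched in a few lines. This assumes pMSE is the mean squared prediction error and pMAPC is one minus the mean absolute percentage error, which matches the 0-1 scale reported for pMAPC here; the paper's exact pMAPC formula may differ.

```python
# Sketch of the two prediction metrics (assumed definitions, see lead-in).

def pmse(actual, predicted):
    """Mean squared prediction error: lower is better."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pmapc(actual, predicted):
    """One minus the mean absolute percentage error: higher is better.
    Assumes actual scores are nonzero."""
    return 1 - sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

actual = [80.0, 60.0, 70.0]      # final scores (illustrative)
predicted = [72.0, 66.0, 63.0]   # model predictions (illustrative)
print(round(pmse(actual, predicted), 2))   # 49.67
print(round(pmapc(actual, predicted), 2))  # 0.9
```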

To answer RQ3 (Which type of data set (blended vs. online vs. traditional) is more effective for predicting students' final academic performance in blended learning?): the blended data set obtained the most favorable predictive performance, outperforming both the online and the traditional data sets. This result is consistent with the findings of Agudo-Peregrina et al. (2014), who revealed that students' interactions with online learning environments influence their academic performance. In addition, the present study followed previous studies in using critical factors to improve predictive performance (Asif et al., 2014; Romero et al., 2013; Villagra-Arnedo et al., 2016).
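The comparison behind RQ3 can be sketched as follows: fit a predictor on a training split for each candidate variable set, then compare holdout pMSE (lower is better). A one-variable least-squares fit stands in here for the paper's PCR, and the data are synthetic, for illustration only.

```python
# Sketch: compare candidate predictor sets by holdout pMSE.

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def holdout_pmse(train_x, train_y, test_x, test_y):
    """Fit on the training split, report squared error on the holdout."""
    slope, intercept = fit_line(train_x, train_y)
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(test_x, test_y)) / len(test_y)

# Synthetic example: a strongly predictive variable vs. a weakly
# predictive one, evaluated on the same held-out student.
strong = holdout_pmse([1, 2, 3, 4], [10, 20, 30, 40], [5], [50])
weak = holdout_pmse([1, 2, 3, 4], [40, 10, 30, 20], [5], [50])
print(strong < weak)  # True
```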

Conclusion

This study collected student profiles from a MOOC- and OAS-enabled blended Calculus course and applied PCR to evaluate five data sets separated from the collected data. The experimental results demonstrate that students' final academic performance in a blended Calculus course can be predicted with high stability and accuracy from a data set containing only the data from weeks 1-6 of the course. In other words, through well-identified online and traditional variables, we were able to predict students' final academic performance as early as one-third of the way through the semester. Seven critical factors that influence students' learning performance were identified by the regression model to improve prediction performance. However, explaining the relationship between these critical factors and learning performance would require investigation through interviews with educational experts. Furthermore, to improve students' learning performance, the student performance prediction model proposed in this study must be integrated with a well-defined intervention strategy into a learning analytics framework; the complete framework could then be applied to predict student learning outcomes in the second semester of such a Calculus course.

Acknowledgments

This work was supported by the Ministry of Science and Technology, Taiwan, under grants MOST-104-2511-S-008-006-MY2, MOST-105-2511-S-008-003-MY3, MOST-105-2622-S-008-002-CC2, and MOST-106-2511-S-008-004-MY3.

References

Agudo-Peregrina, A. F., Iglesias-Pradas, S., Conde-Gonzalez, M. A., & Hernandez-Garcia, A. (2014). Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior, 31, 542-550.

Arroway, P., Morgan, G., O'Keefe, M., & Yanosky, R. (2015). Learning analytics in higher education: Research report. Louisville, CO: ECAR.

Asif, R., Merceron, A., & Pathan, M. K. (2014). Predicting student academic performance at degree level: A case study. International Journal of Intelligent Systems and Applications, 7(1), 49-61.

Awang, T. S., & Zakaria, E. (2013). Enhancing students' understanding in integral calculus through the integration of Maple in learning. Procedia-Social and Behavioral Sciences, 102, 204-211.

Cevik, Y. D. (2015). Predicting college students' online information searching strategies based on epistemological, motivational, decision-related, and demographic variables. Computers & Education, 90, 54-63.

Ellis, R. A., Pardo, A., & Han, F. (2016). Quality in blended learning environments-Significant differences in how students approach learning collaborations. Computers & Education, 102, 90-102.

Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2), 215-223.

Hachey, A. C., Wladis, C. W., & Conway, K. M. (2014). Do prior online course outcomes provide more information than GPA alone in predicting subsequent online course grades and retention? An observational study at an urban community college. Computers & Education, 72, 59-67.

Hart, R. S. (1995). The Illinois PLATO foreign languages project. CALICO Journal, 12(4), 15-37.

Hong, J.-C., Hwang, M.-Y., Wu, N.-C., Huang, Y.-L., Lin, P.-H., & Chen, Y.-L. (2016). Integrating a moral reasoning game in a blended learning setting: Effects on students' interest and performance. Interactive Learning Environments, 24(3), 572-589.

Hu, Y.-H., Lo, C.-L., & Shih, S.-P. (2014). Developing early warning systems to predict students' online learning performance. Computers in Human Behavior, 36, 469-478.

Huang, C. S. J., Yang, S. J. H., Chiang, T. H. C., & Su, A. Y. S. (2016). Effects of situated mobile learning approach on learning motivation and performance of EFL students. Educational Technology & Society, 19(1), 263-276.

Huang, S., & Fang, N. (2013). Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61, 133-145.

Hwang, G.-J. (2014). Definition, framework and research issues of smart learning environments-a context-aware ubiquitous learning perspective. Smart Learning Environments, 1(1), 4.

Hwang, G.-J., Chu, H.-C., & Yin, C. (2017). Objectives, methodologies and research issues of learning analytics. Interactive Learning Environments, 25(2), 143-146.

Johnson, L., Adams Becker, S., Cummins, M., Estrada, V., Freeman, A., & Hall, C. (2016). NMC Horizon Report: 2016 higher education edition. Austin, TX: The New Media Consortium.

Johnson, L., Smith, R., Willis, H., Levine, A., & Haywood, K. (2011). The 2011 Horizon Report. Austin, TX: The New Media Consortium.

Kuzilek, J., Hlosta, M., Herrmannova, D., Zdrahal, Z., & Wolff, A. (2015). OU analyse: Analysing at-risk students at The Open University. Learning Analytics Review, 1-16.

Lara, J. A., Lizcano, D., Martinez, M. A., Pazos, J., & Riera, T. (2014). A System for knowledge discovery in e-learning environments within the European Higher Education Area-Application to student data from Open University of Madrid, UDIMA. Computers & Education, 72, 23-36.

Lu, O. H. T., Huang, J. C. H., Huang, A. Y. Q., & Yang, S. J. H. (2017). Applying learning analytics for improving students engagement and learning outcomes in an MOOCs enabled collaborative programming course. Interactive Learning Environments, 25(2), 220-234.

Meier, Y., Xu, J., Atan, O., & van der Schaar, M. (2016). Predicting grades. IEEE Transactions on Signal Processing, 64(4), 959-972.

Papamitsiou, Z. K., & Economides, A. A. (2014). Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Educational Technology & Society, 17(4), 49-64.

Romero, C., Lopez, M.-I., Luna, J.-M., & Ventura, S. (2013). Predicting students' final performance from participation in online discussion forums. Computers & Education, 68, 458-472.

Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601-618.

Tempelaar, D. T., Rienties, B., & Giesbers, B. (2015). In search for the most informative data for feedback generation: Learning analytics in a data-rich context. Computers in Human Behavior, 47, 157-167.

Thammasiri, D., Delen, D., Meesad, P., & Kasap, N. (2014). A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Systems with Applications, 41(2), 321-330.

Villagra-Arnedo, C., Gallego Duran, F. J., Compan, P., Llorens-Largo, F., & Molina-Carmona, R. (2016). Predicting academic performance from behavioural and learning data. International Journal of Design & Nature and Ecodynamics, 11(3), 239-249.

Xing, W., Chen, X., Stein, J., & Marcinkowski, M. (2016). Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior, 58, 119-129.

Yang, S. J. H., Huang, J. C. H., & Huang, A. Y. Q. (2017). MOOCs in Taiwan: The movement and experiences of open education. In Open Education: From OERs to MOOCs (pp. 101-116). New York, NY: Springer.

Owen H. T. Lu (1), Anna Y. Q. Huang (1), Jeff C. H. Huang (2), Albert J. Q. Lin (1), Hiroaki Ogata (3) and Stephen J. H. Yang (1) *

(1) Department of Computer Science and Information Engineering, National Central University, Taiwan // (2) Department of Computer Science and Information Engineering, Hwa Hsia University of Technology, Taiwan // (3) Graduate School of Informatics, Kyoto University, Japan // cfleul98@gmail.com // anna.yuqing@gmail.com // jeff@cc.hwh.edu.tw // snailsmall612@gmail.com // hiroaki.ogata@gmail.com // jhyang@csie.ncu.edu.tw

* Corresponding author

Caption: Figure 1. Calculus course learning activities

Caption: Figure 2. Comparison of the pMSE and pMAPC results of different data sets

Caption: Figure 3. Results of pMSE and pMAPC for each component of data set W1-6 (weeks 1-6)

Caption: Figure 4. Q-Q plot of the 12 components of data set W1-6 (weeks 1-6)

Notation: Wm-n denotes the data set covering weeks m through n (e.g., W1-6 = weeks 1-6); (O), (T), and (B) mark its online, traditional, and blended subsets.

Table 1. Homework and quiz execution weeks
Weeks 1-11: homework H1-H6 and quizzes Q1-Q6; weeks 12-18: homework H7-H9 and quizzes Q7-Q9 (nine homework assignments and nine quizzes in total).

Table 2. Course content presented over 18 weeks (see http://mathweb.math.ncu.edu.tw/calc/maple-tutorial.html)
Week 1: Function Limitation; 2: Differentiation; 3: Newton's Method; 4: Integral; 5: Piecewise Function; 6: Arc Length; 7: Anti-differentiation; 8: Number Integral; 9: Harmonic Series; 10: Taylor Error; 11: Fourier Series; 12: Polar; 13: Vector Space; 14: Curve in Space; 15: Surface; 16: Scalar Field; 17: Multiple Integral; 18: Line Integral.

Table 3. Variable definitions for the Calculus course
Variable | Description | Category | Learning environment
X1 | Number of days a student exhibits activity (*) per week | Online | MOOCs
X2 | Number of activities (*) a student engages in per week | Online | MOOCs
X3 | Number of days a student watches videos per week | Online | MOOCs
X4 | Number of videos a student watches per week (**) | Online | MOOCs
X5 | Number of videos a student completely watches (***) per week | Online | MOOCs
X6 | Number of times a student clicks "Forward seek" or "Backward seek" during video viewing per week | Online | MOOCs
X7 | Number of videos during which a student clicks "Pause" per week | Online | MOOCs
X8 | Number of videos during which a student clicks "Stop" per week | Online | MOOCs
X9 | Number of times a student clicks "Play" per week | Online | MOOCs
X10 | Number of times a student clicks "Forward seek" per week | Online | MOOCs
X11 | Number of times a student clicks "Backward seek" per week | Online | MOOCs
X12 | Number of times a student clicks "Pause" per week | Online | MOOCs
X13 | Number of times a student clicks "Stop" per week | Online | MOOCs
X14 | Number of times a student engages in online practice per week | Online | OAS
X15 | Number of Calculus units a student practices per week | Online | OAS
X16 | Number of days a student engages in online practice per week | Online | OAS
X17 | Sum of days of practiced Calculus units per week | Online | OAS
X18 | Student's weekly practice score | Online | OAS
X19 | Student's weekly homework score | Traditional | Paper
X20 | Student's weekly quiz score | Traditional | Paper
X21 | Number of times a student participates in after-school tutoring per week | Traditional | Classroom
Y | Student's final academic performance | |
Note. (*) MOOC activity refers to logging in to watch videos or browse course content. (**) Counted only once if repeated; unfinished video viewing is included. (***) "Completely" refers to more than 95%.

Table 4. Statistics of variables for accumulated data sets (W1-6, W1-12, and W1-18)
Each cell gives scale, mean, SD.
Variable | W1-6 | W1-12 | W1-18
X1 | 0.0-4.17, 2.33, 0.96 | 0.0-3.67, 1.86, 0.8 | 0.0-3.22, 1.67, 0.75
X2 | 0.0-1410.33, 482, 254.34 | 0.0-839.0, 321.6, 176.13 | 0.0-594.39, 257.13, 142.16
X3 | 0.0-3.0, 1.26, 0.66 | 0.0-2.0, 1.04, 0.54 | 0.0-2.11, 0.94, 0.51
X4 | 0.0-10.33, 4.26, 2.67 | 0.0-10.42, 3.74, 2.42 | 0.0-8.61, 3.3, 2.15
X5 | 0.0-10.0, 2.7, 2.3 | 0.0-9.42, 2.33, 1.94 | 0.0-7.5, 2.1, 1.69
X6 | 0.0-7.33, 2.42, 1.86 | 0.0-6.83, 2.11, 1.63 | 0.0-6.22, 1.9, 1.42
X7 | 0.0-7.83, 3.07, 2.05 | 0.0-7.08, 2.69, 1.75 | 0.0-6.5, 2.45, 1.61
X8 | 0.0-9.67, 2.37, 2.21 | 0.0-8.92, 2.05, 1.86 | 0.0-7.11, 1.78, 1.55
X9 | 0.0-309.33, 48.96, 55.58 | 0.0-255.33, 43.42, 47.65 | 0.0-220.5, 40.68, 42.1
X10 | 0.0-154.83, 13.99, 23.36 | 0.0-85.08, 10.83, 16.2 | 0.0-57.61, 8.85, 11.94
X11 | 0.0-28.5, 4.92, 5.71 | 0.0-21.17, 4.26, 4.77 | 0.0-21.33, 4.34, 4.5
X12 | 0.0-43.5, 11.47, 10.34 | 0.0-30.67, 9.38, 7.67 | 0.0-32.78, 9.57, 7.71
X13 | 0.0-11.5, 2.61, 2.5 | 0.0-10.25, 2.25, 2.08 | 0.0-8.22, 1.95, 1.73
X14 | 0.0-8.5, 4, 2 | 0.0-7.08, 3.03, 1.54 | 0.0-7.17, 2.53, 1.54
X15 | 0.0-2.17, 1.55, 0.62 | 0.0-1.83, 1.15, 0.48 | 0.0-1.61, 0.89, 0.41
X16 | 0.0-2.33, 1.09, 0.51 | 0.0-1.67, 0.83, 0.4 | 0.0-1.22, 0.64, 0.33
X17 | 0.0-3.17, 1.8, 0.79 | 0.0-2.25, 1.34, 0.63 | 0.0-1.94, 1.03, 0.52
X18 | 0.0-9.12, 5.99, 2.33 | 0.0-8.91, 5.55, 2.07 | 0.0-8.89, 5.41, 1.97
X19 | 0.0-9.99, 9.09, 1.61 | 0.0-9.99, 9.12, 1.55 | 0.0-9.98, 9.06, 1.63
X20 | 0.0-9.94, 7.83, 1.85 | 0.0-9.94, 7.67, 1.9 | 0.0-9.89, 7.33, 2.02
X21 | 0.0-4.0, 0.14, 0.6 | 0.0-4.0, 0.14, 0.6 | 0.0-4.0, 0.14, 0.6

Table 5. Statistics of variables for duration data sets (W7-12 and W13-18)
Each cell gives scale, mean, SD.
Variable | W7-12 | W13-18
X1 | 0.0-3.33, 1.38, 0.85 | 0.0-3.0, 1.3, 0.9
X2 | 0.0-537.33, 161.21, 151.9 | 0.0-436.17, 128.19, 113.98
X3 | 0.0-2.5, 0.82, 0.65 | 0.0-2.5, 0.73, 0.61
X4 | 0.0-10.5, 3.21, 2.97 | 0.0-6.83, 2.44, 2.03
X5 | 0.0-8.83, 1.95, 2.2 | 0.0-5.5, 1.63, 1.55
X6 | 0.0-7.83, 1.79, 1.83 | 0.0-5.0, 1.49, 1.33
X7 | 0.0-7.67, 2.32, 2.11 | 0.0-6.0, 1.97, 1.68
X8 | 0.0-8.17, 1.74, 2.08 | 0.0-4.83, 1.23, 1.23
X9 | 0.0-247.33, 37.87, 50.74 | 0.0-261.0, 35.2, 43.91
X10 | 0.0-68.83, 7.68, 13.8 | 0.0-26.5, 4.87, 6.51
X11 | 0.0-30.33, 3.6, 5.46 | 0.0-21.67, 4.51, 5.13
X12 | 0.0-32.67, 7.28, 7.98 | 0.0-49.83, 9.96, 10.45
X13 | 0.0-9.0, 1.89, 2.26 | 0.0-5.17, 1.34, 1.36
X14 | 0.0-5.67, 2.06, 1.51 | 0.0-12.5, 1.55, 2.22
X15 | 0.0-1.5, 0.75, 0.46 | 0.0-1.17, 0.39, 0.38
X16 | 0.0-1.67, 0.56, 0.4 | 0.0-0.83, 0.27, 0.27
X17 | 0.0-2.33, 0.88, 0.62 | 0.0-1.33, 0.42, 0.43
X18 | 0.0-8.7, 5.12, 2.05 | 0.0-8.85, 5.14, 1.91
X19 | 0.0-9.99, 9.15, 1.52 | 0.0-9.97, 8.94, 1.88
X20 | 0.0-9.94, 7.52, 2.06 | 0.0-9.89, 6.65, 2.48
X21 | 0.0-4.0, 0.14, 0.6 | 0.0-4.0, 0.14, 0.6

Table 5. R^2, adjusted R^2, and Durbin-Watson values for five data sets
Each cell gives mean (scale).
Data set | R^2 | Adjusted R^2 | Durbin-Watson
Accumulated: W1-6 | 0.47 (0.16~0.66) | 0.37 (0.15~0.52) | 1.70 (1.4~1.99)
Accumulated: W1-12 | 0.47 (0.11~0.69) | 0.36 (0.08~0.52) | 1.77 (1.4~2.06)
Accumulated: W1-18 | 0.48 (0.10~0.72) | 0.38 (0.08~0.56) | 1.87 (1.47~2.18)
Duration: W7-12 | 0.34 (0.01~0.70) | 0.31 (0.02~0.53) | 1.69 (1.49~1.88)
Duration: W13-18 | 0.43 (0.03~0.59) | 0.30 (0.01~0.43) | 1.92 (1.51~2.18)

Table 6. F-test values and corresponding significance levels for five data sets
Data set | F value: mean (scale) | p value of F-test: mean (scale) | Not sig. | Sig.
Accumulated: W1-6 | 4.93 (3.29~11.24) | 0.001 (1.92E-6~0.008) | 0 | 21
Accumulated: W1-12 | 4.50 (2.32~7.25) | 0.006 (3.32E-6~0.068) | 1 | 20
Accumulated: W1-18 | 4.75 (2.21~6.53) | 0.007 (7.63E-6~0.08) | 1 | 20
Duration: W7-12 | 3.43 (0.55~5.31) | 0.12 (4.73E-5~0.65) | 5 | 16
Duration: W13-18 | 3.43 (0.72~5.90) | 0.07 (5.84E-5~0.54) | 4 | 17

Table 7. Results of predictive performance for the five data sets
Data set | Mean pMSE | Mean pMAPC
Accumulated: W1-6 | 214.85 | 0.82
Accumulated: W1-12 | 230.70 | 0.82
Accumulated: W1-18 | 217.06 | 0.83
Duration: W7-12 | 239.62 | 0.82
Duration: W13-18 | 248.33 | 0.82
pMSE \ pMAPC (Wilcoxon signed-rank test p values):
 | W1-6 | W1-12 | W1-18 | W7-12 | W13-18
W1-6 | -- | 0.00 ** | 0.61 | 0.01 * | 0.00 **
W1-12 | 0.54 | -- | 0.03 * | 0.07 | 0.04 *
W1-18 | 0.05 * | 0.00 ** | -- | 0.00 ** | 0.00 ***
W7-12 | 0.01 * | 0.07 | 0.00 *** | -- | 0.07
W13-18 | 0.00 ** | 0.16 | 0.00 ** | 0.99 | --
Note. * p < .05, ** p < .01, *** p < .001.

Table 8. Variable estimation results of PCR for 12 components obtained using data set W1-6
Columns are Components 1-12.
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
X1 | 0 | -0.01 | 0.01 | 0 | 0 | -0.01 | -0.06 | 0.04 | 0.18 | -0.14 | 0.21 | 0.01
X2 | 0.99 | -0.17 | 0.01 | 0.03 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0
X3 | 0 | 0 | 0.01 | -0.01 | 0.02 | 0.03 | -0.02 | -0.06 | 0.16 | -0.03 | 0.05 | 0.04
X4 | 0.01 | 0 | 0.03 | -0.13 | 0.2 | 0.34 | -0.2 | -0.08 | 0.5 | 0.12 | 0.03 | 0.14
X5 | 0.01 | 0 | 0.05 | -0.12 | 0.23 | 0.35 | -0.13 | 0.08 | -0.19 | -0.04 | 0.09 | 0.08
X6 | 0.01 | 0 | -0.02 | -0.08 | 0.15 | 0 | -0.14 | -0.15 | 0.37 | 0.28 | -0.12 | -0.36
X7 | 0.01 | 0 | 0.03 | -0.14 | 0.05 | 0.14 | -0.1 | -0.05 | 0.38 | 0.03 | -0.05 | 0.33
X8 | 0 | -0.01 | 0.05 | -0.12 | 0.24 | 0.36 | -0.14 | 0.09 | -0.26 | -0.04 | 0.04 | -0.02
X9 | 0.16 | 0.95 | 0.26 | 0.06 | -0.01 | -0.01 | -0.03 | 0.02 | 0 | 0.01 | 0 | 0
X10 | 0.06 | 0.26 | -0.94 | -0.17 | -0.06 | 0.12 | 0.03 | 0 | -0.01 | -0.03 | 0.01 | 0
X11 | 0.02 | 0.03 | -0.08 | -0.24 | 0.63 | -0.65 | -0.29 | -0.04 | -0.09 | -0.04 | 0.09 | 0.07
X12 | 0.03 | 0.01 | 0.19 | -0.9 | -0.29 | -0.06 | 0.22 | -0.05 | -0.04 | -0.05 | 0.01 | -0.05
X13 | 0.01 | -0.01 | 0.06 | -0.12 | 0.3 | 0.39 | -0.17 | 0.15 | -0.29 | -0.06 | -0.06 | -0.22
X14 | 0 | -0.02 | 0 | -0.02 | -0.21 | -0.09 | -0.31 | 0.67 | 0.24 | -0.42 | 0.25 | 0
X15 | 0 | 0 | 0 | -0.01 | -0.07 | -0.03 | -0.13 | 0.12 | 0 | 0.06 | -0.06 | 0.05
X16 | 0 | 0 | 0 | 0 | -0.05 | -0.01 | -0.07 | 0.08 | -0.02 | 0.02 | -0.08 | -0.02
X17 | 0 | -0.01 | 0 | -0.02 | -0.1 | -0.03 | -0.14 | 0.15 | 0.02 | 0.04 | -0.11 | 0
X18 | 0 | -0.02 | -0.02 | -0.07 | -0.31 | -0.09 | -0.49 | 0.2 | -0.21 | 0.62 | -0.22 | 0.08
X19 | 0 | 0 | 0.01 | 0 | -0.17 | 0 | -0.41 | -0.36 | -0.03 | -0.55 | -0.59 | 0
X20 | 0 | -0.01 | 0.01 | 0.03 | -0.27 | 0.04 | -0.43 | -0.5 | -0.13 | -0.01 | 0.65 | -0.11
X21 | 0 | 0 | 0 | 0 | 0 | 0.03 | 0.01 | -0.09 | -0.1 | 0.01 | 0.04 | 0.81
p value | 0 *** | 0.009 ** | 0.881 | 0.637 | 0.02 * | 0.81 | 0.006 ** | 0.114 | 0.033 * | 0.001 ** | 0.099 | 0.003 **
Note. * p < .05, ** p < .01, *** p < .001.

Table 9. PCR results of blended, online, and traditional learning data sets
W1-6(B) (blended data set: online and traditional critical factors):
  Variables (p value): X2 0.00 ***, X9 0.01 **, X11 0.15, X18 0.00 ***, X19 0.1, X20 0.11 **, X21 0.01 *
  pMSE 159.17; pMAPC 0.82; best components 7 (DF = 7); R^2 0.56; F-test p 0.00 ***; Durbin-Watson 1.62
W1-6(O) (data set of online critical factors):
  Variables (p value): X2 0.00 ***, X9 0.03 *, X11 0.40, X18 0.00 ***
  pMSE 181.16; pMAPC 0.82; best components 4 (DF = 4); R^2 0.39; F-test p 0.00 ***; Durbin-Watson 1.42
W1-6(T) (data set of traditional critical factors):
  Variables (p value): X19 0.00 **, X20 0.00 ***, X21 0.03 *
  pMSE 186.99; pMAPC 0.80; best components 3 (DF = 3); R^2 0.40; F-test p 0.00 ***; Durbin-Watson 1.70
Note. * p < .05, ** p < .01, *** p < .001.
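Several tables above report a Durbin-Watson statistic alongside R^2 and the F-test to check regression residuals for autocorrelation (values near 2 suggest uncorrelated residuals; values near 0 or 4 indicate strong positive or negative autocorrelation). A minimal sketch of the statistic:

```python
# Durbin-Watson statistic: ratio of the summed squared differences of
# successive residuals to the summed squared residuals.

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Perfectly alternating residuals (strong negative autocorrelation)
# push the statistic well above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```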

Publication: Educational Technology & Society, April 1, 2018.