
Visualization analytics for second language vocabulary learning in virtual worlds.


Language skills play a critical role in our globalized world for a person's career development. A foundation of language learning is the acquisition of vocabulary (Barcroft, 2004; Lan, 2013; Llach & Gomez, 2007). Learning success in the acquisition of vocabulary will affect a student's future skills in language learning in general, including, for example, reading skills (Cobb, 2007; Kern, 1989) and listening skills (Kelly, 1991; Smidt & Hegelheimer, 2004).

Many researchers (Krashen, 1985; Lan, Kan, Hsiao, Yang, & Chang, 2013) have pointed out the importance of learning a language in authentic contexts for faster and more effective acquisition. Several studies have also reported that learning in context facilitates learners' acquisition of vocabulary (e.g., Snow, 2005). However, authentic contexts in the real world often involve travel and other time-consuming efforts. Virtual worlds provide an alternative to real-world environments. Complex and perceptually rich environments in virtual worlds can increase the student's sense of immersion just as real environments do (Warburton, 2009). Lan (2015) noted that, in order to build an authentic and immersive learning environment, the government in Taipei funded 12 elementary schools to construct the English Village for language learning. However, such a program is very costly and still fails to meet the needs of all elementary school students. If Second Life is adopted as an alternative for building virtual learning environments, positive results can be obtained that benefit all students (see Lan, 2015). Dalgarno and Lee (2010) also suggested that virtual worlds provide the third dimension via 3D virtual reality and enhance visual authenticity. Among the 3D virtual worlds available, Second Life (henceforth SL; Linden Research Inc.) has high flexibility and expandability (Kaplan & Haenlein, 2009) and has been very commonly used for teaching and learning. An educator or teacher can use SL to design virtual environments tailored to the needs of a class.

Vocabulary acquisition in virtual worlds

In the context of second language learning, the acquisition of vocabulary is considered the most fundamental (Barcroft, 2004; Lan, 2013; Llach & Gomez, 2007) because it affects how well a learner could perform in listening, speaking, reading, and writing. In order to carry out a conversation, a learner must acquire a certain amount of vocabulary and have an awareness of how each word could be appropriately used. However, the acquisition of vocabulary can be challenging (Lan, 2013; Li, 2015; Meara, 1980), given the large amount of words in a foreign language and the complex meaning-to-form mappings (Malt, Li, Pavlenko, Zhu, & Ameel, 2015; Zinszer, Malt, Ameel, & Li, 2014). To date, a large number of studies have examined how to best enhance the learning outcomes of vocabulary acquisition.

With the increasing use of virtual worlds, more researchers have been studying vocabulary learning in 3D virtual worlds. One reason could be that context-based learning allows L2 learners to build a direct link between vocabulary and its concepts, which facilitates L2 vocabulary learning (Cohen & Aphek, 1980; Kroll & Curley, 1988; Snow, 2005). Accordingly, many researchers have found that students who learn in virtual worlds achieve positive outcomes in vocabulary acquisition (Lan, 2015; Lan, Fang, Legault, & Li, 2015; Rankin, Gold, & Gooch, 2006). For example, Lan et al. (2015) divided students into two groups, one learning vocabulary words through structured programs on computers and the other through virtual worlds. Their findings showed that although participants were exposed to the same words fewer times in the 3D virtual worlds than in computer-assisted learning, the two groups performed equally well at the end of the training. However, the researchers did not use additional data analytics to understand participants' language learning strategies (LLSs). As Lan (2013) pointed out, data analytics on LLSs could help reveal additional insights into language learning. The current study is designed to examine the LLSs used for vocabulary learning by the same participants reported by Lan et al. (2015).

Analysis of language learning strategies (LLSs) in virtual worlds

Language Learning Strategies or LLSs are defined as "specifications, behaviors, steps, or techniques--such as seeking out conversation partners, or giving oneself encouragement to tackle a difficult language task--used by students to enhance their own learning" (Scarcella & Oxford, 1992, p. 63). LLSs can be divided into six types: cognitive, metacognitive, memory-related, compensatory, affective, and social, according to Oxford (2003). On the basis of Oxford's (1990) research on LLSs, Lan (2013) further outlined 12 vocabulary learning strategies: practicing, note-taking, key word, contextualization, grouping, imagery, recombination, deduction, analysis, physical response, translation, and transfer. In this study, an LLSs questionnaire based on Lan's (2013) 12 learning strategies was used to analyze the participants' learning data and to identify the LLSs used by the participants.

As Sagarra and Alba (2006) stated, how students learn vocabulary words influences how well they understand and use the words. Although previous studies indicated that learning outcomes are linked to students' learning logs, which cover all of the students' learning activities (Agudo-Peregrina, Iglesias-Pradas, Conde-Gonzalez, & Hernandez-Garcia, 2014; Lan et al., 2013; Rankin et al., 2006), they focused on the interactions among different users in a digital learning environment. The current research, however, highlights the interactions between the student learners and the target vocabulary words.

Various methods can be used to analyze videotapes, questionnaires, interviews, and the database of learning logs displayed in pie charts, line graphs, etc. (Heer, Bostock, & Ogievetsky, 2010). In past research, the questionnaire has been the most commonly used instrument for evaluating the performance of vocabulary learning (Gu & Johnson, 1996; Kojic-Sabo & Lightbown, 1999). However, few studies have recorded detailed learning logs. The limitation of self-report is that learners themselves cannot report everything in detail during the learning process, and therefore these methods cannot provide an in-depth understanding of how learners actually learn in virtual worlds. At the same time, simply relying on databases based on virtual worlds is not the best choice because 3D virtual worlds are much more complex than 2D environments. For example, Cruz-Benito et al. (2014) provided teachers and students with learning analytics displayed in texts and statistics; Camilleri, de Freitas, Montebello and McDonagh-Smith (2013) visualized the interactions in virtual worlds through statistical graphs; Tashiro, Hung and Martin (2010) visualized the learning paths of individual learners. These studies are useful, but none of them captured the huge amount of data from students' learning behaviors. Although it is possible to manually analyze learning logs, e.g., how students interact with one another and how students interact with virtual learning objects, such manual methods are only suitable for handling a small portion of the big data. Furthermore, they can be time-consuming and resource-demanding. To solve this problem, big data analytic tools are required.

With the rapid development of big data analytics, tools and methods have emerged to examine the learning data from virtual worlds. However, these analyses so far have been mostly developed for analyzing 2D websites. For example, Minovic and Milovanovic (2013) produced a platform which provides educators with the ability to define a 2D adventure game, and with a graphical representation of what students know. This technique is also used to assist the management of virtual worlds (Perera, Allison, Ajinomoh, & Miller, 2012), but it is rarely used to analyze the users' strategies in virtual worlds. In this study we follow the approach used in several recent studies to deal with big data in e-learning. For example, Bargel, Schrock, Szentes, and Roller (2012) used a visualization technique to study the learning paths of students taking online courses; Lin, Yeh, Hung, and Chang (2013), aiming to optimize creativity performance, used decision-tree data mining to provide personalized learning paths in a web-based learning environment. These are encouraging studies, but their methods cannot be directly used to examine learning paths that occur in 3D virtual worlds.

In the current study, a learning path is defined as the chosen route of navigation in a virtual learning environment which enables the learner to build the target knowledge progressively. As Walkowiak, Foulsham and Eardley (2015) showed, learners' navigational strategies can affect which learning paths they take. The strategy for deciding learning paths is called the navigation strategy, which may influence performance on independent navigation tasks (Rodgers, Sindone III, & Moffat, 2011). Navigational strategy involves how users want to shift between locations and how they want to operate a virtual vehicle (Martens & Antonenko, 2012). More specifically, the use of effective navigation can affect the learners' degree of satisfaction as well as the learning outcomes (Martens & Antonenko, 2012). In virtual worlds, the acquisition of vocabulary can be viewed as a navigation learning task; thus, navigation strategy is considered as a type of LLSs and it tends to influence learning outcomes (Gu & Johnson, 1996).

In sum, the current study aims at examining the LLSs in an attempt to understand the role of individual differences in second language vocabulary acquisition. Specifically, using visualization analytic tools, the authors would identify student learning paths to better understand the LLSs that may be related to learning outcomes. The key research questions are: (1) What LLSs do learners adopt in the virtual worlds while learning a new vocabulary word? (2) How do the LLSs influence learners' performance in the acquisition of vocabulary in virtual worlds?



Fourteen students aged between 19 and 28 (mean age = 22; 8 females) from the Pennsylvania State University participated in the study. They completed the Language History Questionnaire (Li, Zhang, Tsai, & Puls, 2014), used to understand their language learning background and language proficiency level before the study, and the results showed that all the learners were native speakers of English and had no prior experience with Mandarin Chinese as a foreign language. See Lan et al. (2015) for details. The study of Lan et al. (2015) also contained another group of 17 students who learned Mandarin Chinese in a computer-assisted picture-word association paradigm, and data from those participants are not included in the current analyses, as those data are not subject to the visualization analytics presented here.


Learning materials

Three virtual world contexts were used in this study: a supermarket, a kitchen, and a zoo. Figure 1 presents an example of the virtual zoo. Learners were required to visit all three contexts to learn 90 new Chinese words, 30 words in each context. In the virtual world they were free to take any learning path within the time limit (see below) and could click on any virtual object to learn its Chinese name. Once the participants clicked an object, they would hear the pronunciation of the corresponding word. All words to be learned were two-syllable Chinese words. The participants were told that the objects labelled with "TELL" in blue (above the objects) were the target words to be learned, because there were other objects in the virtual worlds that were not learning targets. Once an object was clicked, its label would turn red, indicating that it had been learned at least once. The learner could click on the same object as many times as needed within the 5-min time window. See Lan et al. (2015, pp. 675-676) for further details.

Learning logs

To record and analyze the digital learning data, the authors used LSL (Linden Scripting Language), PHP, and MySQL to develop the Second Life Learning Database (SLLDB), which recorded all learners' event logs, including learner information, words learned, learning dates, time, and the 3D locations of the objects/words learned. (The SLLDB is independent of the Second Life platform and was developed by the authors before the participants took part in the study.) In addition, the authors transformed the learning logs into data that could be read by the visualization tools used in this study.
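The transformation from event logs to tool-readable data can be sketched as follows. This is a minimal illustration, not the SLLDB schema or the authors' conversion script: the `ClickEvent` fields, the learner ID "L01", and the click data are all hypothetical, and the `Source,Target,Weight` column layout is simply one common edge-table format that graph tools can import.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    """One row of a hypothetical learning log (field names are assumptions)."""
    learner_id: str
    session: int
    context: str      # e.g. "kitchen"
    word: str         # pinyin of the clicked object
    timestamp: float  # seconds since the start of the session


def transitions(events):
    """Count word-to-word transitions within each learner, session and context."""
    ordered = sorted(
        events, key=lambda e: (e.learner_id, e.session, e.context, e.timestamp)
    )
    counts = Counter()
    for prev, cur in zip(ordered, ordered[1:]):
        same_walk = (prev.learner_id, prev.session, prev.context) == (
            cur.learner_id, cur.session, cur.context)
        if same_walk:
            counts[(prev.word, cur.word)] += 1
    return counts


def to_edge_csv(counts):
    """Serialize transition counts as a Source,Target,Weight edge table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up clicks for one hypothetical learner in the virtual kitchen.
events = [
    ClickEvent("L01", 1, "kitchen", "chahu", 1.0),
    ClickEvent("L01", 1, "kitchen", "chabei", 4.5),
    ClickEvent("L01", 1, "kitchen", "chahu", 9.0),
    ClickEvent("L01", 1, "kitchen", "chabei", 12.0),
]
print(to_edge_csv(transitions(events)))
```

Keeping learner, session, and context in the sort key prevents the last click of one walk from being joined to the first click of the next.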


After the completion of all the learning and testing sessions, the participants were required to take part in an interview so that the authors could explore their learning strategies. The interview contained 12 questions and was compiled on the basis of the 12 vocabulary learning strategies proposed by Lan (2013). The complete Word Learning Strategy Questionnaire can be found here (


The learning and testing sessions lasted 18 days, during which each participant attended 7 learning sessions, each consisting of 3 learning contexts. Before the experiment, all participants completed the informed consent form approved by the Institutional Review Board of the Pennsylvania State University. They then did a pre-test with practice trials, in which they made their own avatars walk around the virtual environments and clicked on different objects (which were similar but not identical to the objects used in the experimental sessions). The practice session included fewer words and had a shorter duration (3 minutes). During the experimental sessions, each participant was given 5 minutes to learn 30 vocabulary words in each of the three learning contexts (see Learning materials), and there was a 3-min break between two learning contexts. The total duration of a session was therefore 21 minutes (three 5-min contexts plus two 3-min breaks). Each learning session was followed by a test to evaluate the student's learning outcome. During the test, participants were required to listen to audio files and to select the correct images corresponding to the audio files. All the images were screenshots of the objects in the three SL virtual learning contexts corresponding to the 90 target words required for learning. Information on how they answered each question, including the length of time taken to answer a question and their accuracy, was recorded by the E-Prime software (Schneider, Eschman, & Zuccolotto, 2012). See Lan et al. (2015; pp. 676-677) for further details. The complete procedure of each learning and testing session is displayed in Figure 2.

The steps from "Supermarket" to "Test" were repeated 7 times in 3 weeks. Three weeks after the end of the 7 learning sessions, the participants were asked to do a post-test and an interview. Based on their post-test performance, they were divided into a high-achieving group and a low-achieving group. The high-achieving group included 8 students who scored over 85% in accuracy in the post-test (mean: 93.6%); 7 of them scored at least 80% in the 3rd test, and 1 student scored 78% in the 3rd test and 94.44% in the 4th test. The students in the low-achieving group scored less than 85% in the post-test (mean: 67%), and only one of them reached over 80% in the 4th and the 5th tests.

Data analysis

All participants' learning logs were saved in the SLLDB. Adjacency matrices and clusters produced by the learning paths were used to identify the commonly used learning strategies, the differences between the strategies used by the high-achieving and low-achieving students, and the possible explanation for such differences. The complete and detailed method used to compute the adjacency matrices and the clusters for each participant can be found here ( See Figure 3(a-b) for an example of adjacency matrix.
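As a rough illustration of what such an adjacency matrix contains, the sketch below counts directed transitions in a single click sequence. It is not the authors' procedure (their detailed method is documented elsewhere); the function name and the click sequence are hypothetical, and only the pinyin labels are taken from the supermarket words mentioned in this article.

```python
def adjacency_matrix(words, click_sequence):
    """Directed count matrix: row = word just learned, column = word clicked next."""
    index = {word: i for i, word in enumerate(words)}
    matrix = [[0] * len(words) for _ in words]
    for source, target in zip(click_sequence, click_sequence[1:]):
        matrix[index[source]][index[target]] += 1
    return matrix


# Words are listed in their (assumed) spatial order in the virtual world.
words = ["baicai", "yumi", "qincai", "luobo"]
# Made-up click sequence for one learner.
clicks = ["baicai", "yumi", "qincai", "yumi", "luobo", "luobo"]

matrix = adjacency_matrix(words, clicks)
for word, row in zip(words, matrix):
    print(f"{word:>7}", row)
```

Because rows and columns follow the spatial order of the objects, transitions between neighboring objects fall close to the diagonal, and a repeated click on the same object lands on the diagonal itself.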

Because of the large number of learning logs in virtual worlds, the R statistical software was adopted (R version 3.2.4; Development Core Team, 2004), and several modifications were made based on Lincoln's (2014) research. With the data format employed in Lincoln's research, the researchers compiled the nodes and edges. Three more algorithms were used to analyze the data collected: label propagation community, leading eigenvector community, and infomap community. Label propagation is a method based on Social Network Analysis (SNA; Otte & Rousseau, 2002). Although powerful, this method cannot provide further details through adjacency matrices about the learning paths among nodes, such as the frequency and the directions of learning for specific words (Wong, Mackey, Foote, & May, 2013). Thus, to understand how students tended to select certain words for learning, Gephi (Bastian, Heymann, & Jacomy, 2009) was used to further observe the participants' learning paths for the acquisition of vocabulary. Gephi is an open graph visualization platform (see It shows the clusters in a network and the directional patterns and strength of any interrelationships. Gephi 0.91 (Bastian et al., 2009) was adopted; each learning object was treated as an individual in the network and the learning paths as the interrelationships. Compared to other layout algorithms, it produces high-quality results (Khokhar, 2015). It aims at exploring real-world data with high-quality visualization graphs in the output (see for further details). More details about label propagation and Gephi can be found here (
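To give a feel for how label propagation groups words, here is a minimal plain-Python sketch of the asynchronous variant on an undirected weighted graph. It is a stand-in for illustration only, not the R code used in the study; the edge weights are made up, and the tie-breaking rule (keep the current label when it is among the best) is an assumption of this sketch.

```python
import random
from collections import defaultdict


def label_propagation(edges, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight). Returns {node: label}.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        if a == b:
            continue  # self-loops say nothing about neighbouring words
        neighbours[a][b] = neighbours[a].get(b, 0) + weight
        neighbours[b][a] = neighbours[b].get(a, 0) + weight

    labels = {node: node for node in neighbours}  # every word starts alone
    nodes = sorted(neighbours)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            score = defaultdict(float)
            for other, weight in neighbours[node].items():
                score[labels[other]] += weight
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            if labels[node] in candidates:
                continue  # keep the current label on ties
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # every node already carries a best-scoring label
    return labels


# Hypothetical transition weights: two tight triangles joined by one weak path.
edges = [
    ("chahu", "chabei", 3), ("chabei", "canzhuo", 3), ("chahu", "canzhuo", 3),
    ("zhengguo", "dunguo", 3), ("dunguo", "kaoxiang", 3), ("zhengguo", "kaoxiang", 3),
    ("canzhuo", "zhengguo", 1),
]
communities = label_propagation(edges)
for word in sorted(communities):
    print(word, "->", communities[word])
```

Each word repeatedly adopts the label that carries the most weight among its neighbors, so words linked by frequent paths end up sharing a label while the weak bridge is not enough to merge the two groups.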


Learning strategies in the virtual world

Figure 3(a-b) shows the adjacency matrices of objects in the virtual supermarket for low-achieving vs. high-achieving learners, respectively, with the vocabulary words ordered according to where they were located in the virtual world.

The different colors represent different clusters formed by the participants' learning paths. For example, the green block at the bottom left in Figure 3(a) indicates a cluster with words baicai ("cabbage"), yumi ("corn"), and qincai ("celery"). Since the order of the words was arranged according to how they were located in the virtual world, each colored block in Figure 3(a-b) indicates the words that the learners grouped together as being close. The size of the block indicates the number of words categorized into a single group, and the color of the block indicates a single group of words.

This result shows that the participants tended to select the word nearest to the previously learned word. For the naming of the labels, the distance-based method is a common one used by many researchers. We employed the data analysis method used by Kirsten, Wrobel, and Horvath (2001), Kuncheva and Bezdek (1998), and Biswas and Robinson (2008) to analyze the data. The "nearest neighbor" method is considered instance-based learning or "lazy learning" because, by using the nearest neighbors, the students do not have to make active choices but simply wait for the nearest object (Kirsten et al., 2001). In the current data, this "nearest strategy" was used by many students.
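One simple way to quantify the "nearest strategy" from logged 3D locations is to ask how often a learner's next click landed on the object closest to the one just clicked. The sketch below is an assumption-laden illustration rather than the method of the cited papers: the coordinates, the click sequence, and the function name are hypothetical.

```python
import math


def nearest_step_share(positions, click_sequence):
    """Fraction of moves that went to the object closest to the one just clicked."""
    nearest_moves = 0
    total_moves = 0
    for source, target in zip(click_sequence, click_sequence[1:]):
        if source == target:
            continue  # a repeated click on the same object is not a move
        distances = {
            other: math.dist(positions[source], positions[other])
            for other in positions
            if other != source
        }
        shortest = min(distances.values())
        total_moves += 1
        if math.isclose(distances[target], shortest):
            nearest_moves += 1
    return nearest_moves / total_moves if total_moves else 0.0


# Hypothetical 3D object locations (x, y, z) in the virtual supermarket.
positions = {
    "baicai": (0.0, 0.0, 1.0),
    "yumi": (1.0, 0.0, 1.0),
    "qincai": (1.8, 0.0, 1.0),
    "naiyou": (8.0, 3.0, 1.0),
    "mianbao": (0.0, 6.0, 1.0),
}
clicks = ["baicai", "yumi", "qincai", "naiyou", "mianbao"]
share = nearest_step_share(positions, clicks)
print(f"share of nearest-object moves: {share:.2f}")
```

In this made-up sequence the first two moves go to the nearest object, while the jumps to naiyou and then to mianbao do not, which is the kind of path a functional grouping rather than proximity would produce.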

Figure 3(a) shows that luobo ("carrot") formed a group by itself. There are two conditions under which a single word can form a group: (a) there is no specific route connecting the target word and other words, and (b) the target word has been clicked on several times. Given these two conditions, a single-word group indicates a "focus strategy," in contrast to the "nearest strategy."

A third strategy identified is the "cluster strategy," which means that the students selected words that belong to the same group but are not nearby. For example, as shown in Figure 3(b), the words naiyou ("butter") and mianbao ("bread") belonged to the same group not because they were located nearby but because the student linked butter with bread functionally (i.e., people commonly put butter on bread). Further, the "cluster strategy" was divided into categories based on the virtual objects' functions, appearances, pronunciations, and the combination of function and pronunciation.

The remaining grey blocks and white areas shown in Figure 3(a-b) indicate no specific strategies among the learning paths for the words. That is, those were the words randomly picked by the students for learning, which is not considered as a strategy in this study.

Different strategies used by high- versus low-achieving learners

Based on the three strategies described above and the results shown in Figure 3(a-b), Table 1 summarizes the distribution of type of strategies adopted by the high- and the low-achieving learners.

Table 1 shows that the most frequently used strategy is the "nearest strategy," occupying 54.65% of all the strategies used. The percentage of the "focus strategy" is 34.88%. The frequency of the use of this strategy is twice as high in the high-achieving learners (23.26%) as in the low-achieving learners (11.63%). A chi-square test revealed a significant difference in the type of strategy adopted by the two groups of students (p < .05; df=2).
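The chi-square test of independence on a 2 x 3 table (two groups, three strategies) can be sketched in plain Python. The counts below are illustrative only: the focus and cluster counts are back-calculated from the percentages above under the assumption of 86 coded strategy instances, and the split of the "nearest strategy" between the two groups is not reported in this text and is invented here. The result therefore should not be read as a reproduction of the reported test. For exactly 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), which avoids any statistics library.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


# Columns: nearest, focus, cluster. ASSUMED counts, see the note above.
counts = [
    [20, 20, 8],  # high-achieving learners (nearest count is invented)
    [27, 10, 1],  # low-achieving learners (nearest count is invented)
]
statistic, dof = chi_square_independence(counts)
print(f"chi-square = {statistic:.2f}, df = {dof}, p = {p_value_df2(statistic):.4f}")
```

With three strategy categories and two groups the degrees of freedom are (2 - 1) x (3 - 1) = 2, which matches the df reported above.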

Additionally, the two groups differed in their adoption of the "cluster strategy": it accounted for 9.30% of all strategy use among the high-achieving learners and 1.16% among the low-achieving learners. The detailed results, which can be found here (, show that groupings created by the high-achieving students were more meaningful. Kojic-Sabo and Lightbown (1999) also adopted the cluster analysis method, but their data were based on the results of questionnaires and test scores. Nevertheless, the analyses based on the visualization analytic methods are consistent with the basic patterns of Kojic-Sabo and Lightbown (1999), showing that learning strategies are clearly connected with learning outcomes.

A further analysis, in which the block colors were set by edge weight, is shown in Figure 4(a-b). The horizontal x-axis shows the starting point, while the vertical y-axis shows the destination point. The colored blocks indicate where the starting and the ending points meet, showing the learning paths between different words. Warm colors are red and yellow, and cool colors are blue and green. The warmer the color of a block, the more learning paths there are between the two words. As displayed in Figure 4(a-b), the high-achieving learners tended to learn vocabulary words through repetitive clicks on the same objects. For example, as Figure 4(b) shows, the color of the block for qiezi ("eggplant") to huanggua ("cucumber") is as warm as the color of the block for qiezi ("eggplant") to qiezi ("eggplant"). By contrast, in Figure 4(a), for the low-achieving students, the color of the block for qiezi ("eggplant") to huanggua ("cucumber") is warmer than the color of the block for qiezi ("eggplant") to qiezi ("eggplant").
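The comparison between a word's self-transitions and its transitions to other words can be made concrete by counting edge weights, including self-loops. The following is a hedged sketch with a made-up click sequence and hypothetical function names; it only illustrates the kind of quantity that the block colors encode.

```python
from collections import Counter


def edge_weights(click_sequence):
    """Count directed transitions, keeping repeated clicks as self-loops."""
    return Counter(zip(click_sequence, click_sequence[1:]))


def self_loop_ratio(weights, word):
    """Share of the transitions leaving `word` that return to the same word."""
    outgoing = sum(w for (source, _), w in weights.items() if source == word)
    return weights[(word, word)] / outgoing if outgoing else 0.0


# Made-up clicks: the learner repeats qiezi before moving on to huanggua.
clicks = ["qiezi", "qiezi", "huanggua", "qiezi", "qiezi", "huanggua"]
weights = edge_weights(clicks)
print(dict(weights))
print("self-loop share for qiezi:", self_loop_ratio(weights, "qiezi"))
```

A learner whose self-loop weight for a word matches the weight of the path to the next word would show equally warm blocks for both, as described for the high-achieving group.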

Visualization analyses of high- versus low-achieving learners

Although the user strategies could be identified through adjacency matrices as discussed above, the complete learning logs could be used to provide further detailed analyses of individual differences between groups. An SQL script was created to link the MySQL database of SLLDB with Gephi to visualize the data. As an example, Figure 5(a-b) shows the visualization graphs produced by Gephi with the force atlas layout for the learning logs of the high- versus low-achieving learners in the virtual kitchen. Although the results from the other virtual contexts are not displayed here, they are consistent with the following analysis based on the data from the virtual kitchen.

Figure 5(a-b) shows the visualization graph produced by Gephi through the force atlas layout. All of the learning paths and their directions are shown in lines with arrows. The thickness of lines indicates the frequency of such learning paths. The force atlas could set the repulsion strength, attraction strength, and gravity according to the size of words and thickness of lines, and the position of words and the distance between words are then adjusted. It is important to note here that the thickness does not represent an absolute value. Rather, the thickness was decided upon the frequency of certain learning paths within the group. That is, the different levels of thickness presented in Figure 5(a) and (b) should not be directly compared.
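Why within-group scaling prevents direct comparison can be shown with a small sketch. It assumes a simple linear rescaling in which the busiest path of each group receives the maximum line width; the actual scaling rule used by Gephi is not described in this article, and the path counts below are made up.

```python
def relative_thickness(weights, max_width=10.0):
    """Scale raw path counts so the busiest path in this group gets max_width."""
    heaviest = max(weights.values())
    return {edge: max_width * count / heaviest for edge, count in weights.items()}


# Hypothetical raw path counts for two groups; group_b is three times busier.
group_a = {("canzhuo", "shuicao"): 4, ("shuicao", "zhengguo"): 2}
group_b = {("chahu", "chabei"): 12, ("zhengguo", "dunguo"): 6}

widths_a = relative_thickness(group_a)
widths_b = relative_thickness(group_b)
print(widths_a)
print(widths_b)
```

Both groups end up with the same pair of line widths even though the underlying counts differ threefold, which is why thickness can only be read relative to other lines in the same panel.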

The force atlas could also distribute the attractive force along outbound links. Thus, words would be pushed away if their attraction strength is lower than that of their neighbors. In this way, the clusters of words would be clearer. As can be seen, the number of clusters created by the high-achieving group was greater than the number of clusters created by the low-achieving students, and the clusters are also clearer in the former than they are in the latter: the average clustering coefficient of high-achieving students is 0.759, while that of the low-achieving students is 0.718. In other words, the high-achieving students were more able to tell the difference between different words and create more meaningful links, while the learning paths created by the low-achieving students were quite evenly distributed, indicating that for them, most words were similar in features. Figure 5(b) also shows that high-achieving students created much thicker lines with two-way arrows, and more meaningful clusters of words were formed since only those words with a greater number of interactions in between could form clusters.
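For readers unfamiliar with the average clustering coefficient, the sketch below computes the standard local clustering coefficient for an undirected, unweighted graph and averages it over all nodes. This is the textbook definition and an assumption on my part about what is being reported; how the study's tool treated edge direction and weights may differ. The example edges are hypothetical.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient of an undirected, unweighted graph."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    coefficients = []
    for node, adjacent in neighbours.items():
        degree = len(adjacent)
        if degree < 2:
            coefficients.append(0.0)  # no pair of neighbours to connect
            continue
        linked_pairs = sum(
            1 for u, v in combinations(adjacent, 2) if v in neighbours[u]
        )
        coefficients.append(2.0 * linked_pairs / (degree * (degree - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


# A triangle of words plus one word attached to a single corner (made-up paths).
edges = [
    ("chahu", "chabei"), ("chabei", "canzhuo"), ("chahu", "canzhuo"),
    ("canzhuo", "shuicao"),
]
print(f"average clustering coefficient: {average_clustering(edges):.3f}")
```

A value near 1 means that the words a learner moves to from a given word are usually also connected to each other, so tighter word clusters push the average upward.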

Further, the colored lines indicate back-and-forth learning paths between two words. For example, the green lines in Figure 5(a-b) indicate that learners went back and forth between the two words twice, while the blue lines indicate three times. The fact that more blue and green lines can be found in Figure 5(b) means that the high-achieving learners tended to click back and forth between words more often.

Finally, the size of the words reflected the number of times the word was clicked on, with bigger fonts indicating more frequent clicks/learning by the learner. Similar to the description of thickness, the size of the words does not represent an absolute value and is based on the results within a single group; thus, the sizes cannot be directly compared between Figure 5(a) and (b). The results show that the low-achieving group tended to click on objects which were more easily seen. For example, the sizes of canzhuo ("table"), shuicao ("sink"), zhengguo ("steamer"), and dunguo ("stewpot") in Figure 5(a) are bigger. This could be because these virtual objects were relatively larger than all the other objects and could be more easily seen and clicked on. In addition, the words next to canzhuo ("table") also received more frequent clicks. On the other hand, the high-achieving group clicked on virtual objects more selectively than did the low-achieving group. Specifically, the high-achieving learners did not simply click frequently on a group of words that are close in physical distance in the virtual environment.

Using Gephi, the current study imported and analyzed not only the source, target, and edge information but also the number of repeated learning paths and the learner who created each path. Thus, by using the filter function of Gephi (, this study obtained results restricted to specific conditions. Figure 6(a-b) shows the learning paths in which learners went back and forth between the same pair of words more than 3 times.
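The filtering condition can be sketched outside of any visualization tool. The code below counts round trips (A to B and back to A) per word pair and keeps only the pairs that exceed a threshold. The exact counting rule used in the study is not spelled out here, so treating each A-B-A triple as one "back and forth" is an assumption of this sketch, and the click sequence is made up.

```python
from collections import Counter


def back_and_forth_counts(click_sequence):
    """Count A -> B -> A round trips for every unordered pair of distinct words."""
    counts = Counter()
    triples = zip(click_sequence, click_sequence[1:], click_sequence[2:])
    for first, second, third in triples:
        if first == third and first != second:
            counts[frozenset((first, second))] += 1
    return counts


def frequent_pairs(counts, minimum=4):
    """Keep pairs with at least `minimum` round trips (that is, more than 3)."""
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n >= minimum}


# Made-up click sequence for one hypothetical learner in the virtual kitchen.
clicks = [
    "chahu", "chabei", "chahu", "chabei", "chahu", "chabei",
    "shuicao", "kaoxiang", "shuicao",
]
counts = back_and_forth_counts(clicks)
print(frequent_pairs(counts))
```

Running the same filter per learner ID would show which learner produced each repeated pair, which is the information the filtered graphs attach to individual students.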

As can be seen in Figure 6(a-b), the high-achieving students created much more meaningful clusters. For example, as Figure 6(b) shows, Student #3032 (all student learners were numbered by random ID numbers) from the high-achieving group repeated the clicks on the pair of [TEXT NOT REPRODUCIBLE IN ASCII] ("chair") and dengzi ("stool") and the pair of chahu ("teapot") and chabei ("teacup"), indicating that this student applied the "cluster strategy" of pronunciation over 3 times (perhaps because of the similar syllables involved in the two pairs of words). Similarly, Student #3039 also used the "cluster strategy" of pronunciation by repeatedly clicking on zhengguo ("steamer") and dunguo ("stewpot") over 3 times. In contrast, Figure 6(a) shows that while Student #3036 showed patterns similar to those of students from the high-achieving group for zhengguo ("steamer") and dunguo ("stewpot"), Student #3045 repeatedly clicked on the pair of shoutao ("mitten") and kaoxiang ("oven") and the pair of shuicao ("sink") and kaoxiang ("oven"). Since the latter pair of words were located next to each other, this suggests that Student #3045 linked the two words simply on the basis of the objects' positions. A more in-depth analysis of the interview also revealed similar results. The detailed and further explanation of the results can be found here (

Although some findings are not significant enough due to the small sample size, these visualization analyses provide informative data about students' learning behaviors, and they could also serve as a useful reference for instruction, for example, for teachers who want to know what words the students are focusing on in learning. Meaningful repetition of words is an important indicator of whether the learner uses strategic learning. The results show that 5.14% of the learning paths created by the high-achieving students can be regarded as "repetition," while the percentage of "repetition" created by the low-achieving students is 3.72%. Out of 36,734 repetitions, 99% occurred fewer than 3 times, while 27 repetitions were found to occur over 3 times. Among the word pairs that had occurrences over 3 times, the repetitive clicks between songshu ("squirrel") and tuzi ("rabbit") (cluster by appearance) reached as many as 6 times by the high-achieving students, and those between daxiang ("elephant") and tuoniao ("ostrich") as many as 11 times by the low-achieving student. The repetitions created by the high-achieving students are considered more meaningful because they are based on similarity of either the pronunciation or the appearance of the virtual objects.

When interviewed, the participants from the two groups provided quite different answers. The main difference appeared when they were asked whether they had found any rules for Chinese words. All of the students in the high-achieving group answered "Yes," while only half of the students in the low-achieving group answered "Yes." This finding could indicate that the high-achieving students could identify either the link or the rules between two words, and it may also explain why the students from the two groups adopted different learning strategies (cluster strategy by the high-achieving group vs. nearest strategy by the low-achieving group).


The current study combines data visualization tools with a database of learning logs recorded in virtual worlds (SLLDB) to analyse learners' behaviours while they learned Mandarin vocabulary as a foreign language in a virtual world. It should be noted that the three contexts included in this research differ in how their space is arranged. Thus, when presenting the results of the adjacency matrices, the authors selected the results obtained from the supermarket because of its "U"-shaped arrangement. Presumably, learners in such a layout would be likely to click on words that are close to the words previously learned, and this was clearly one of the results obtained from the low-achieving students. However, the high-achieving students were able to conduct more strategic learning by creating different clusters that were not based merely on the physical positions of the objects in the virtual worlds. When presenting the results produced with Gephi, the authors chose the results from the kitchen because students from both groups could easily create clusters there: the objects in the kitchen were arranged without a specific order, forcing the students to rely on specific LLSs to help decide their learning paths. The zoo was designed as a round area, in which students were more likely to walk around and along the ring. Fewer clusters were therefore formed, because it would be inconvenient for students to click back and forth between objects. That being said, the results obtained from the different contexts were consistent with each other regardless of which visualization tool, R or Gephi, was used.
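To make the idea of an adjacency matrix built from learning logs concrete, the following is a minimal Python sketch. It is not the R code used in the study: the object names, the log format (one ordered list of clicked objects per learner), and the function name are all hypothetical.

```python
def adjacency_matrix(sequences, objects):
    """Build a directed transition-count matrix from click sequences.

    matrix[i][j] is the number of times a click on objects[i] was
    immediately followed by a click on objects[j], summed over learners.
    """
    index = {name: i for i, name in enumerate(objects)}
    size = len(objects)
    matrix = [[0] * size for _ in range(size)]
    for clicks in sequences:
        for prev, curr in zip(clicks, clicks[1:]):
            matrix[index[prev]][index[curr]] += 1
    return matrix


# Hypothetical objects and logs for one group of learners.
objects = ["apple", "bread", "milk"]
group_logs = [
    ["apple", "bread", "milk"],
    ["apple", "bread", "apple"],
]
matrix = adjacency_matrix(group_logs, objects)
```

Summing over all learners in a group gives a group-level matrix, while passing a single learner's log gives an individual one. Ordering the rows and columns by the objects' physical positions would make position-driven paths show up near the diagonal.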

The visualization tools allowed us to effectively examine the links between students' choice of learning strategies and their learning paths. The three main learning strategies are the "nearest strategy," the "focus strategy," and the "cluster strategy." The "cluster strategy" is further divided into three types: by appearance, by pronunciation, and by function. The high-achieving learners tended to employ a particular learning strategy for vocabulary learning, while the low-achieving learners were more inclined to learn words either by using the nearest strategy or randomly. The analyses thus indicate a difference in the learning strategies adopted by the high- versus the low-achieving learners.
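One possible way to operationalize the "nearest strategy" is to measure how often a learner's next click lands on the object physically closest to the current one. The sketch below assumes that 2D coordinates for each object are available. The coordinates, object names, and function names are hypothetical, and this is not presented as the coding scheme used in the study.

```python
import math


def nearest_neighbour(current, positions):
    """Return the object closest to `current` by Euclidean distance."""
    others = (name for name in positions if name != current)
    return min(others, key=lambda name: math.dist(positions[current], positions[name]))


def nearest_share(clicks, positions):
    """Fraction of transitions that go to the nearest neighbouring object."""
    transitions = list(zip(clicks, clicks[1:]))
    if not transitions:
        return 0.0
    hits = sum(
        1 for prev, curr in transitions
        if curr == nearest_neighbour(prev, positions)
    )
    return hits / len(transitions)


# Hypothetical layout: four objects placed along a line.
positions = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (6, 0)}
share = nearest_share(["A", "B", "A", "C", "D"], positions)
```

A share close to 1 would suggest a path driven mainly by physical proximity, whereas a lower share leaves room for focus or cluster strategies. `math.dist` requires Python 3.8 or later.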

The findings also revealed several features of the students' learning paths. Previous studies such as Camilleri et al. (2013) also provided visualization analysis results, which mostly covered visit time and traffic. The work reported here demonstrates that results based on social network analytic methods can provide more detailed information, including visualizations of the links among different elements/words/objects. Other studies such as Tashiro et al. (2010) used PathFinder to visualize student learning paths, but their graphs show only the learning paths of individual students. The analyses reported here can provide graphs not only of individual learners but also of different learning groups. With this type of graph analysis, researchers can effectively capture student learning in self-exploratory environments such as the SL virtual worlds, and further provide teachers with the whole picture of students' behaviours as well as individual learning paths.

The visualization method used in this study provides useful information for designing learning environments that can more effectively guide students in learning vocabulary or other aspects of languages. It is suggested that researchers design environments based on the features of object clustering, and make the best use of the space and the arrangement of virtual objects in virtual worlds. For example, if the virtual environment places objects that are similar in appearance, function, or pronunciation in the same area, learners can more easily navigate among similar words and build learning paths that are shorter and more effective.

One limitation of the current exploratory work is the small number of participants and the short training/learning period, which limit how far the results can be generalized to all students. Another limitation is that this research has focused on the learning of Mandarin, and whether similar patterns would be obtained for other languages learned in virtual worlds is as yet unclear. Finally, it would also be interesting to find out whether learners' spatial ability affects their performance in language learning in virtual worlds. Despite these limitations, it is hoped that this work will serve as an initial effort to analyze language learning big data by employing visualization analytics to identify learning paths, learning strategies, and their relations with learning outcomes.


We would like to thank two anonymous reviewers and editors for their valuable comments and suggestions for improving this article. We thank the Ministry of Science and Technology, Taiwan, R.O.C., under Grant Nos. MOST 103-2628-S-003-002-MY3, MOST 105-2511-S-003-018-MY3, MOST 105-2911-I-003-301, and the US National Science Foundation grants, BCS-1349110 and BCS-1338946, for financially supporting this research. We are also grateful that this research was partially supported by the Aim for the Top University Project and Center of Learning Technology for Chinese of National Taiwan Normal University (NTNU), sponsored by the Ministry of Education, Taiwan, R.O.C., and the International Research-Intensive Center of Excellence Program of NTNU.


Agudo-Peregrina, A. F., Iglesias-Pradas, S., Conde-Gonzalez, M. A., & Hernandez-Garcia, A. (2014). Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior, 31, 542-550.

Barcroft, J. (2004). Second language vocabulary acquisition: A Lexical input processing approach. Foreign Language Annals, 37(2), 200-208.

Bargel, B. A., Schrock, J., Szentes, D., & Roller, W. (2012). Using learning maps for visualization of adaptive learning path components. International Journal of Computer Information Systems and Industrial Management Applications, 4(1), 228-235.

Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An Open source software for exploring and manipulating networks. In Proceedings of the Third International ICWSM conference (Vol. 8, pp. 361-362). Retrieved from

Biswas, P., & Robinson, P. (2008). Modelling user interfaces for special needs. In Accessible Design in the Digital World (ADDW) 2008. Retrieved from

Camilleri, V., de Freitas, S., Montebello, M., & McDonagh-Smith, P. (2013). A Case study inside virtual worlds: Use of analytics for immersive spaces. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (pp. 230-234). New York, NY: ACM.

Cobb, T. (2007). Computing the vocabulary demands of L2 Reading. Language Learning & Technology, 11(3), 38-63.

Cohen, A. D., & Aphek, E. (1980). Retention of second-language vocabulary over time: Investigating the role of mnemonic associations. System, 8(3), 221-235.

Cruz-Benito, J., Theron, R., Garcia-Penalvo, F. J., Maderuelo, C., Perez-Blanco, J. S., Zazo, H., & Martin-Suarez, A. (2014). Monitoring and feedback of learning processes in virtual worlds through analytics architectures: A Real case. In 2014 9th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1-6). doi:10.1109/CISTI.2014.6877097

Dalgarno, B., & Lee, M. J. W. (2010). What are the learning affordances of 3-D virtual environments? British Journal of Educational Technology, 41(1), 10-32.

R Development Core Team. (2004). R: A Language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Gu, Y., & Johnson, R. K. (1996). Vocabulary learning strategies and language learning outcomes. Language Learning, 46(4), 643-679.

Heer, J., Bostock, M., & Ogievetsky, V. (2010). A Tour through the visualization zoo. Communications of the ACM, 53(6), 59-67.

Kaplan, A. M., & Haenlein, M. (2009). The Fairyland of Second Life: Virtual social worlds and how to use them. Business Horizons, 52(6), 563-572.

Kelly, P. (1991). Lexical ignorance: The Main obstacle to listening comprehension with advanced foreign language learners. IRAL, 29(2), 135-149.

Kern, R. G. (1989). Second language reading strategy instruction: Its Effects on comprehension and word inference ability. The Modern Language Journal, 73(2), 135-149.

Khokhar, D. (2015). Gephi cookbook. Birmingham, UK: Packt Publishing Ltd.

Kirsten, M., Wrobel, S., & Horvath, T. (2001). Distance based approaches to relational learning and clustering. In S. Dzeroski & N. Lavrac (Eds.), Relational Data Mining (pp. 213-232). doi:10.1007/978-3-662-04599-2_9

Kojic-Sabo, I., & Lightbown, P. M. (1999). Students' approaches to vocabulary learning and their relationship to success. The Modern Language Journal, 83(2), 176-192.

Krashen, S. D. (1985). Second language acquisition and second language learning (Reprinted). Oxford, UK: Pergamon Press.

Kroll, J. F., & Curley, J. (1988). Lexical memory in novice bilinguals: The Role of concepts in retrieving second language words. Practical Aspects of Memory, 2, 389-395.

Kuncheva, L. I., & Bezdek, J. C. (1998). Nearest prototype classification: Clustering, genetic algorithms, or random search? IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 28(1), 160-164.

Lan, Y.-J. (2013). The Effect of technology-supported co-sharing on L2 vocabulary strategy development. Journal of Educational Technology & Society, 16(4), 1-16.

Lan, Y.-J. (2015). Action research contextual EFL learning in a 3D virtual environment. Announcements & Call for Papers, 16.

Lan, Y.-J., Fang, S.-Y., Legault, J., & Li, P. (2015). Second language acquisition of Mandarin Chinese vocabulary: Context of learning effects. Educational Technology Research and Development, 63(5), 671-690.

Lan, Y.-J., Kan, Y.-H., Hsiao, I. Y. T., Yang, S. J. H., & Chang, K.-E. (2013). Designing interaction tasks in Second Life for Chinese as a foreign language learners: A Preliminary exploration. Australasian Journal of Educational Technology, 29(2). doi:10.14742/ajet.144

Lin, C. F., Yeh, Y., Hung, Y. H., & Chang, R. I. (2013). Data mining for providing a personalized learning path in creativity: An application of decision trees. Computers & Education, 68, 199-210.

Lincoln, M. D. (2014, December 20). Adjacency matrix plots with R and ggplot2. Retrieved from

Li, P. (2015). Bilingualism as a dynamic process. In B. MacWhinney & W. O'Grady (Eds.), The Handbook of Language Emergence (pp. 511-536). Malden, MA: John Wiley & Sons, Inc.

Li, P., Zhang, F., Tsai, E., & Puls, B. (2014). Language history questionnaire (LHQ 2.0): A New dynamic web-based research tool. Bilingualism: Language and Cognition, 17(03), 673-680.

Llach, M. del P. A., & Gomez, A. B. (2007). Children's characteristics in vocabulary acquisition and use in the written production. Revista Espanola de Linguistica Aplicada, (20), 9-26.

Malt, B. C., Li, P., Pavlenko, A., Zhu, H., & Ameel, E. (2015). Bidirectional lexical interaction in late immersed Mandarin-English bilinguals. Journal of Memory and Language, 82, 86-104.

Martens, J., & Antonenko, P. D. (2012). Narrowing gender-based performance gaps in virtual environment navigation. Computers in Human Behavior, 28(3), 809-819.

Meara, P. (1980). Vocabulary acquisition: A Neglected aspect of language learning. Language Teaching, 13(3-4), 221-246.

Minovic, M., & Milovanovic, M. (2013). Real-time learning analytics in educational games. In Proceedings of the First International Conference on Technological Ecosystem for Enhancing Multiculturality (pp. 245-251). New York, NY: ACM.

Otte, E., & Rousseau, R. (2002). Social network analysis: A Powerful strategy, also for the information sciences. Journal of Information Science, 28(6), 441-453.

Oxford, R. L. (1990). Language learning strategies: What every teacher should know. New York, NY: Newbury House Publisher.

Oxford, R. L. (2003). Language learning styles and strategies. Retrieved from

Perera, I., Allison, C., Ajinomoh, O., & Miller, A. (2012). Managing 3D multi user learning environments--A Case study on training disaster management. International Journal of Emerging Technologies in Learning (iJET), 7(3). doi:10.3991/ijet.v7i3.2046

Rankin, Y., Gold, R., & Gooch, B. (2006). 3D role-playing games as language learning tools. In Eurographics (Education Papers) (pp. 33-38). Retrieved from g_Tools/links/54b9b[B.sub.2]40cf24e50e93dc9ca/3D-Role-Playing-Games-as-Language- Learning-Tools.pdf

Rodgers, M. K., Sindone III, J. A., & Moffat, S. D. (2011). Navigation strategy as a predictor of navigation performance. In Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (pp. 2770-2775). Austin, TX: Cognitive Science Society. Retrieved from

Sagarra, N., & Alba, M. (2006). The Key is in the keyword: L2 vocabulary learning methods with beginning learners of Spanish. The Modern Language Journal, 90(2), 228-243.

Scarcella, R. C., & Oxford, R. L. (1992). The Tapestry of language learning: The Individual in the communicative classroom. Boston, MA: Heinle & Heinle Publishers.

Schneider, W., Eschman, A., & Zuccolotto, A. (2012). E-Prime user's guide. Pittsburgh, PA: Psychology Software Tools Inc.

Smidt, E., & Hegelheimer, V. (2004). Effects of online academic lectures on ESL listening comprehension, incidental vocabulary acquisition, and strategy use. Computer Assisted Language Learning, 17(5), 517-556.

Snow, M. A. (2005). A Model of academic literacy for integrated language and content instruction. In Handbook of Research in Second Language Teaching and Learning (pp. 693-712). Mahwah, NJ: Lawrence Erlbaum.

Tashiro, J., Hung, P. C. K., & Martin, M. V. (2010). ROAD-MAP for educational simulations and serious games. In P. Tsang, S. K. S. Cheung, V. S. K. Lee, & R. Huang (Eds.), Hybrid Learning (pp. 186-204). doi:10.1007/978-3-642-14657-2_18

Walkowiak, S., Foulsham, T., & Eardley, A. F. (2015). Individual differences and personality correlates of navigational performance in the virtual route learning task. Computers in Human Behavior, 45, 402-410.

Warburton, S. (2009). Second Life in higher education: Assessing the potential for and the barriers to deploying virtual worlds in learning and teaching. British Journal of Educational Technology, 40(3), 414-426.

Wong, P. C., Mackey, P., Foote, H., & May, R. (2013). Visual matrix clustering of social networks. IEEE Computer Graphics and Applications, 33(4), 88-96.

Zinszer, B. D., Malt, B. C., Ameel, E., & Li, P. (2014). Native-likeness in second language lexical categorization reflects individual language history and linguistic community norms. Frontiers in Psychology, 5. doi: 10.3389/fpsyg.2014.01203

Indy Y. T. Hsiao (1), Yu-Ju Lan (1)*, Chia-Ling Kao (1) and Ping Li (2)

(1) Department of Applied Chinese Language and Culture, National Taiwan Normal University, Taiwan // (2) Department of Psychology and Center for Brain, Behavior, and Cognition, Pennsylvania State University, University Park, PA, USA

* Corresponding author

Caption: Figure 1. The virtual zoo for L2 word learning; the learner avatar can move around in the zoo and tap any of the 30 animals to listen to the corresponding L2 term/word referring to the animal

Caption: Figure 2. The session procedure for each participant

Caption: Figure 3(a). The adjacency matrices of objects in the virtual supermarket from low-achieving learners

Caption: Figure 3(b). The adjacency matrices of objects in the virtual supermarket from high-achieving learners

Caption: Figure 4(a). Set color of objects in the virtual supermarket used by low-achieving learners

Caption: Figure 4(b). Set color of objects in the virtual supermarket used by high-achieving learners

Caption: Figure 5(a). Gephi displays of objects in the virtual kitchen used by low-achieving learners

Caption: Figure 5(b). Gephi displays of objects in the virtual kitchen used by high-achieving learners

Caption: Figure 6(a). Gephi displays of objects in the virtual kitchen clicked repeatedly more than 3 times by low-achieving learners

Caption: Figure 6(b). Gephi displays of objects in the virtual kitchen clicked repeatedly more than 3 times by high-achieving learners

Table 1. Distribution of type of strategies adopted by high- and low-achieving learners

Type of strategy           Frequency      High-achieving   Low-achieving   Number of the same clusters by high- and low-achieving

Nearest strategy           47 (54.65%)    21 (24.42%)      26 (30.23%)      8 (18.60%)
Focus strategy             30 (34.88%)    20 (23.26%)      10 (11.63%)      7 (16.28%)
Cluster (Appearance)        4 (4.65%)      4 (4.65%)        0 (0.00%)       0 (0.00%)
Cluster (Function)          1 (1.16%)      1 (1.16%)        0 (0.00%)       0 (0.00%)
Cluster (Pronunciation)     2 (2.33%)      1 (1.16%)        1 (1.16%)       1 (2.33%)
Cluster (Function and       2 (2.33%)      2 (2.33%)        0 (0.00%)       0 (0.00%)
Total                      86 (100%)      49 (56.98%)      37 (43.02%)     16 (37.21%)
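
The percentages in Table 1 can be checked with a short calculation. The sketch below assumes that the Frequency, High-achieving, and Low-achieving percentages share the overall total of 86 as denominator. The last column's percentages are numerically consistent with a denominator of 43 (for example, 8/43 = 18.60%), although what that number represents is not stated here, so it is treated as an assumption. The variable and function names are illustrative.

```python
# (high-achieving count, low-achieving count) per strategy, from Table 1.
# The last label is truncated in the source and is kept as-is.
counts = {
    "Nearest strategy": (21, 26),
    "Focus strategy": (20, 10),
    "Cluster (Appearance)": (4, 0),
    "Cluster (Function)": (1, 0),
    "Cluster (Pronunciation)": (1, 1),
    "Cluster (Function and": (2, 0),
}

total = sum(high + low for high, low in counts.values())


def pct(n, denominator):
    """Percentage rounded to two decimals, as reported in the table."""
    return round(100 * n / denominator, 2)


# For each strategy: (overall %, high-achieving %, low-achieving %).
rows = {
    name: (pct(high + low, total), pct(high, total), pct(low, total))
    for name, (high, low) in counts.items()
}

# Assumed denominator for the "same clusters" column (not stated in the text).
SHARED_DENOMINATOR = 43
shared_total_pct = pct(16, SHARED_DENOMINATOR)
```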
COPYRIGHT 2017 International Forum of Educational Technology & Society
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Hsiao, Indy Y.T.; Lan, Yu-Ju; Kao, Chia-Ling; Li, Ping
Publication:Educational Technology & Society
Geographic Code:9TAIW
Date:Apr 1, 2017