Publications
2024
- Reducing Redundancy in Japanese-to-English Translation: A Multi-Pipeline Approach for Translating Repeated Elements in Japanese. Qiao Wang, Yixuan Huang, and Zheng Yuan. In Proceedings of the Ninth Conference on Machine Translation (WMT), 2024.
This paper presents a multi-pipeline Japanese-to-English machine translation (MT) system designed to address the challenge of translating repeated elements from Japanese into fluent and lexically diverse English. The system was developed as part of the Non-Repetitive Translation Task at WMT24, which focuses on minimizing redundancy while maintaining high translation quality. Our approach utilizes MeCab, the de facto Natural Language Processing (NLP) tool for Japanese, to identify repeated elements, and Claude Sonnet 3.5, a Large Language Model (LLM), for translation and proofreading. The system effectively accomplishes the shared task by identifying and translating in a diversified manner 89.79% of the 470 repeated instances in the test dataset and achieving an average translation quality score of 4.60 out of 5, significantly surpassing the baseline score of 3.88. The analysis also revealed challenges, particularly in identifying standalone noun-suffix elements and occasional cases of consistent translations or mistranslations.
- Sequence Tagging Approach in Grammar Error Detection: Identifying Areas of Improvement for the State-of-the-Art. Qiao Wang and Zheng Yuan. Preprint, 2024.
This study provides a qualitative evaluation of Seqtagger, a state-of-the-art machine learning-based sequence-tagging model developed for grammatical error detection (GED) and correction (GEC). The model’s performance is evaluated on error detection against human benchmarks, with academic texts written by Japanese university students. Through human annotation and subsequent thematic analysis of failures in error detection, this study reveals that Seqtagger performs well in detecting errors related to simpler grammatical rules such as adverb position and prepositions in fixed collocations, but performs worse on errors possibly influenced by the Japanese language, macro-structure errors and errors where human judgment is required. The underlying reasons for failures in detection are identified as a narrow context window that fails to capture broader textual information, insufficient training data, particularly data that fully represents the linguistic characteristics of the Japanese students, and overgeneralization of patterns from the training data. These findings highlight the need for sequence-tagging GED and GEC tools to widen their context window, to be more adaptable to the diverse linguistic features of global learners, and to better handle the linguistic complexities of the English language.
- Effectiveness of Large Language Models in Automated Evaluation of Argumentative Essays: Finetuning vs. Zero-Shot Prompting. Qiao Wang and John Gayed. Computer Assisted Language Learning, 2024.
To address the long-standing challenge facing traditional automated writing evaluation (AWE) systems in assessing higher-order thinking, this study built an AWE system for scoring argumentative essays by finetuning the GPT-3.5 Large Language Model and compared the system’s effectiveness with that of the non-finetuned GPT-3.5 and GPT-4 base models via zero-shot prompting methods. The dataset used was the TOEFL Public Writing Dataset provided by Educational Testing Service, containing 480 argumentative essays with ground truth scores under two essay prompts. Three finetuned models were generated: two finetuned exclusively on either prompt and one on both. All finetuned and base models were used to score the remaining essays after finetuning and their scoring effectiveness was compared with ground truth scores as the benchmark. The impact of the variety of finetuning prompts and the robustness of finetuned models were also explored. Results showed 100% consistency for all models across two scoring sessions. More importantly, the finetuned models significantly outperformed the base models in accuracy and reliability. The best-performing model, finetuned on prompt 1, showed an RMSE of 0.57, a percentage agreement (score discrepancy ≤ 0.5) of 84.72% and a QWK of 0.78. Further, the model finetuned on both prompts did not exhibit enhanced performance, and the two models finetuned on one prompt remained robust when scoring essays from the alternative prompt. These results suggest 1) task-specific finetuning for AWE is beneficial; 2) finetuning does not require a large variety of essay prompts; and 3) finetuned models are robust to unseen essays.
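As a rough illustration of the three agreement metrics named in the abstract above (RMSE, percentage agreement within 0.5, and quadratic weighted kappa), the following sketch computes them on toy data. It is not the authors' evaluation code: the function names, the half-point 1.0 to 5.0 score scale, and the sample scores are all assumptions made for the example.

```python
import math

def rmse(pred, gold):
    """Root mean squared error between predicted and ground truth scores."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))

def percentage_agreement(pred, gold, tolerance=0.5):
    """Share of essays (in %) whose score discrepancy is within the tolerance."""
    hits = sum(1 for p, g in zip(pred, gold) if abs(p - g) <= tolerance)
    return 100.0 * hits / len(gold)

def quadratic_weighted_kappa(pred, gold, labels):
    """QWK over an ordered list of score labels (lowest to highest).

    Assumes every score appears in `labels` and that the scores are not
    all identical (otherwise the denominator would be zero).
    """
    index = {label: i for i, label in enumerate(labels)}
    k, n = len(labels), len(gold)
    # observed[i][j]: essays with gold label i and predicted label j
    observed = [[0.0] * k for _ in range(k)]
    for p, g in zip(pred, gold):
        observed[index[g]][index[p]] += 1
    gold_totals = [sum(row) for row in observed]
    pred_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * gold_totals[i] * pred_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected

# Toy data (hypothetical): scores on an assumed half-point scale from 1.0 to 5.0.
labels = [1.0 + 0.5 * i for i in range(9)]
gold = [3.0, 4.0, 2.5, 5.0]
pred = [3.0, 3.5, 2.5, 4.0]

print(rmse(pred, gold))
print(percentage_agreement(pred, gold))
print(quadratic_weighted_kappa(pred, gold, labels))
```

RMSE penalizes large misses, percentage agreement reports how often the model lands within half a point, and QWK corrects for chance agreement while weighting distant disagreements more heavily, which is why the three are usually reported together.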
- Assessing the Efficacy of Grammar Error Correction: A Human Evaluation Approach in the Japanese Context. Qiao Wang and Zheng Yuan. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024.
In this study, we evaluated the performance of the state-of-the-art sequence tagging grammar error detection and correction model (SeqTagger) using Japanese university students’ writing samples. With an automatic annotation toolkit, ERRANT, we first evaluated SeqTagger’s performance on error correction with human expert correction as the benchmark. Then a human-annotated approach was adopted to evaluate SeqTagger’s performance in error detection using a subset of the writing dataset. Results indicated a precision of 63.66% and a recall of 20.19% for error correction in the full dataset. For the subset, after manual exclusion of irrelevant errors such as semantic and mechanical ones, the model showed an adjusted precision of 97.98% and an adjusted recall of 42.98% for error detection, indicating the model’s high accuracy but also its conservativeness. Thematic analysis of errors undetected by the model revealed that determiners and articles, especially the latter, were predominant. Specifically, in terms of context-independent errors, the model occasionally overlooked basic ones and faced challenges with overly erroneous or complex structures. Meanwhile, context-dependent errors, notably those related to tense and noun number, as well as those possibly influenced by the students’ first language (L1), remained particularly challenging.
- Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5. arXiv preprint arXiv:2403.02078, 2024.
A common way of assessing language learners’ mastery of vocabulary is via multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of test items can be laborious for individual teachers or in large-scale language programs. In this paper, we evaluate a new method for automatically generating these types of questions using large language models (LLMs). The VocaTT (vocabulary teaching and training) engine is written in Python and comprises three basic steps: pre-processing target word lists, generating sentences and candidate word options using GPT, and finally selecting suitable word options. To test the efficiency of this system, 60 questions were generated targeting academic words. The generated items were reviewed by expert reviewers who judged the well-formedness of the sentences and word options, adding comments to items judged not well-formed. Results showed a 75% rate of well-formedness for sentences and a 66.85% rate for suitable word options. This is a marked improvement over the generator used earlier in our research, which did not take advantage of GPT’s capabilities. Post-hoc qualitative analysis reveals several points for improvement in future work, including cross-referencing part-of-speech tagging, better sentence validation, and improving GPT prompts.
2023
- Mapping the research trends of digital game-based language learning (DGBLL): a scientometrics review. Computer Assisted Language Learning, 2023.
Research on digital game-based language learning (DGBLL) continues to grow, but a comprehensive account of its development over the past two decades is still lacking. Therefore, the study presents a scientometrics review of the field based on the bibliometric records retrieved from the Web of Science Core Database. A total of 205 publications with their references were included in this review. The science mapping software CiteSpace was employed to compute the properties of the publications with a view to outlining the major research areas and detecting the trends of DGBLL. The document co-citation analysis has identified major research clusters as educational game, MMORPG, out-of-school gameplay, vocabulary, game-based learning and theoretical underpinnings. The content of the major clusters was examined to gain a deeper understanding of the field and the burstness analysis has revealed that over the years, research on MMORPG has experienced fluctuations in activity, whereas the interest in vocabulary learning remained stable. The findings also highlighted the need for more research into the educational vs. commercial adoption in the context of language learning and teaching. As the first scientometrics review in the field, the study supplements traditional reviews by tracing the development of DGBLL over time via data-driven analysis. The discussion concludes by identifying gaps in the literature and offering suggestions for future research.
- The role of computer games in Chinese students’ narrative writing: A case study with The Sims. Qiao Wang and Ke Li. Osaka JALT Journal, Dec 2023.
This single-participant study explores the role of story-rich computer games in an EFL narrative writing course for Chinese university students. In a two-month English writing class using the game The Sims 4, a participant received instruction on gameplay techniques and language, completed game quests set by the teacher, wrote 14 narratives based on gameplay events, and received all-encompassing corrective feedback on her writing samples. A pre-test on game vocabulary and two writing pre- and post-tests were also administered. The researchers evaluated the participant’s EFL narrative writing performance with an in-depth analysis of linguistic features using NLP tools which included syntax, lexicon, cohesion, and content. A follow-up interview was conducted to complement the results from the writing evaluation and to provide information on the participant’s perception and attitude towards the GBW class. Results show that while there was no consistent improvement in the participant’s EFL narrative writing performance, the rich stories and contextualized vocabulary in the game contributed to the participant’s content and lexicon and helped the participant to better understand how to write narratives.
- The role of live transcripts in synchronous online L2 classrooms: Learning outcomes and learner perceptions. Qiao Wang and Yijun Chen. Education and Information Technologies, Apr 2023.
This study explored the role of live transcripts in online synchronous academic English classrooms by focusing on how automatically generated live transcripts influence the learning outcomes of lower-proficiency and higher-proficiency learners and on their perceptions towards live transcripts. The study adopted a 2 × 2 factorial design, with the two factors being learner proficiency (high vs. low) and availability of live transcription (presence vs. absence). The participants were 129 second-year Japanese university students from four synchronous classes taught on Zoom by the same teacher under an academic English reading course. Learning outcomes in this study were evaluated according to the course syllabus through grades and participation in class activities. A questionnaire consisting of nine Likert-scale questions and a comment box was administered to explore participants’ perceived usefulness of, perceived ease of use of, and perceived reliance on live transcripts. Results showed that contrary to previous studies reporting the effectiveness of captioned audiovisual materials in L2 learning, live transcripts as a special type of captions were not effective in promoting the grades of learners of either proficiency. However, they significantly improved the activity participation of lower-proficiency learners, but not that of higher-proficiency learners. Questionnaire results showed that there were no significant differences between learners of the two proficiency levels in their perceptions towards live transcription, which contradicts previous findings that lower-proficiency learners tend to rely more on captions. Besides enhancement of lecture comprehension, participants reported innovative uses of live transcripts such as screenshots with transcripts for notetaking purposes and transcripts downloaded for later review.
- A content-controlled monolingual comparable corpus approach to comparing learner and proficient argumentative writing. Qiao Wang, Laurence Anthony, and Nurul Ihsan Arshad. Research Methods in Applied Linguistics, Aug 2023.
This mixed-methods study approaches the differences between learner and proficient argumentative writing by building a content-controlled monolingual comparable corpus (CCMCC) that contains learner-teacher sample pairs of the same semantic content. Twenty-seven learner samples were collected from 27 Chinese university students who each wrote on one topic from the second writing task of IELTS Academic. To generate content-controlled teacher samples, an experienced teacher revised or rewrote each learner sample after confirming the ideas learners intended to express through individual and face-to-face communication with each learner. Then, a native speaker checked the language of the teacher samples. In data analysis, each learner-teacher sample pair was analyzed using Coh-Metrix to generate statistics in 45 indices under text length, syntax, lexicon, and cohesion, after which a shortlist of indices of both statistically and practically significant differences was identified. Qualitatively, the researchers identified the differences through side-by-side comparisons of sample pairs and coded the important patterns in the differences to explore their underlying reasons. This approach generated different quantitative results from previous corpus-based comparative writing studies, such as the ineffectiveness of cohesion indices to distinguish learner and proficient writing. Qualitative analysis further revealed noteworthy findings including the lack of concision in learner writing and learners’ unfamiliarity with using prepositional phrases to express actions. The advantages, limitations and implications of this approach are discussed.
- The Use of Network-Based Virtual Worlds in Second Language Education: A Research Review. Mark Peterson, Qiao Wang, and Maryam Sadat Mirzaei. Dec 2023.
This chapter reviews 28 learner-based studies on the use of network-based social virtual worlds in second language learning published during the period 2007-2017. The purpose of this review is to establish how these environments have been implemented and to identify the target languages, methods used, research areas, and important findings. Analysis demonstrates that research is characterized by a preponderance of small-scale studies conducted in higher education settings. The target languages most frequently investigated were English, Spanish, and Chinese. In terms of the methodologies adopted, analysis reveals the majority of studies were qualitative in nature. It was found that the investigation of learner target language production, interaction, and affective factors represents the primary focus of research. Although positive findings relating to the above areas have been reported, the analysis draws attention to gaps in the current research base. The researchers provide suggestions for future research.
2022
- The use of semantic similarity tools in automated content scoring of fact-based essays written by EFL learners. Qiao Wang. Education and Information Technologies, Jun 2022.
This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard was produced by a native expert. A shortlist of carefully selected tools, including InferSent, spaCy, DKPro, ADW, SEMILAR and Latent Semantic Analysis, generated semantic similarity scores between student writing samples and the expert sample. Three teachers who were lecturers of the course manually graded the student samples on content. To ensure validity of human grades, samples with discrepant agreement were excluded and an inter-rater reliability test was conducted on remaining samples with quadratic weighted kappa. After the grades of the remaining samples were proven valid, a Pearson correlation analysis between semantic similarity scores and human grades was conducted and results showed that InferSent was the most effective tool in predicting the human grades. The study further pointed to the limitations of the six tools and suggested three alternatives to traditional methods in turning semantic similarity scores into reporting grades on content.
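The final comparison step described in the abstract above, correlating each tool's similarity scores with human grades and picking the strongest predictor, can be sketched as follows. This is only an illustration under stated assumptions: the tool names `tool_a` and `tool_b`, the grades, and the similarity scores are invented for the example and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: human content grades for five essays, and the
# similarity score each (made-up) tool assigned to the same essays.
human_grades = [2, 3, 3, 4, 5]
tool_scores = {
    "tool_a": [0.2, 0.3, 0.3, 0.4, 0.5],
    "tool_b": [0.5, 0.1, 0.4, 0.2, 0.3],
}

# Rank tools by how strongly their scores correlate with the human grades.
ranking = sorted(
    tool_scores,
    key=lambda name: pearson_r(tool_scores[name], human_grades),
    reverse=True,
)
for name in ranking:
    print(name, round(pearson_r(tool_scores[name], human_grades), 3))
```

A high correlation only shows that a tool orders essays similarly to the human raters; turning similarity scores into reportable grades is a separate step, as the abstract notes.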
- Evaluation Dataset of Multiple-Choice Cloze Items for Vocabulary Training and Testing. In Proceedings of the 2022 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Sep 2022.
Vocabulary learning is a typical part of nearly any second language learning curriculum. This entails methodologies and materials for training and testing vocabulary knowledge in learners. In large-scale programs, the preparation of such materials can be labor intensive and thus automatic means of generation are desirable. VocaTT (Vocabulary Training and Testing) is an ongoing project to use machine learning methods to generate novel multiple choice cloze (i.e., fill-in-the-blank) items for use in second language learning programs. This paper describes the ongoing creation of a gold standard set of multiple-choice cloze items to be used in training a machine learning algorithm. Machine-generated multiple choice cloze items were reviewed by two experienced language teachers, who evaluated each item for well-formedness (i.e., suitability as multiple-choice cloze test item) with three options: reject as unsalvageable, keep as-is, or revise into a well-formed item as they thought best. Results for a 600-item set that both checkers evaluated show moderate agreement on the question of rejection but slight agreement for keeping as-is. For revised items, the agreement on what type of revisions to make was slight to fair. In an expanded set of 2,792 items, checkers judged most items as needing revision but made varying kinds of revisions to yield well-formed items. Interested researchers may contact the authors to inquire about how they may access and use the evaluation dataset.
- Out-of-school language learning through digital gaming: a case study from an activity theory perspective. Ke Li, Mark Peterson, and Qiao Wang. Computer Assisted Language Learning, May 2022.
This study applies Activity Theory to describe and analyze an out-of-school project in which eight Chinese university students utilized a massively multiplayer online game (MMOG) to learn English. Based on data collected through questionnaires, gaming journals, gaming recordings and interviews, thematic analysis was performed to identify the recurrent themes, which were then mapped onto the activity system. Four contradictions were identified in the process. Temporary contradictions dominated the early phase of the project and were easily resolved. However, inherent contradictions, mainly manifesting themselves through inadequate competence and learner variation, remained unresolved. Efforts to overcome these tensions resulted in the evolvement of the activity system. In terms of the actual outcomes, there was evidence for the development and exercise of autonomy. Learners also reported enhanced confidence and gains in vocabulary, listening and oral fluency. The study contributes new knowledge to the field by revealing how non-gamers make use of digital gaming for language learning in an informal setting. Pedagogical implications for digital game-based language learning are discussed and suggestions for future research are also provided.
- A Review of Research on the Application of Digital Games in Foreign Language Education. May 2022.
The use of digital games represents an expanding domain in computer-assisted language learning (CALL) research. This chapter reviews the findings of 26 learner-based studies in this area that are informed by cognitive and social accounts of SLA. The analysis shows that massively multiplayer online role-playing games (MMORPGs) are the most frequently investigated game type and the majority of studies involved EFL learners in higher education. Mixed methods were the most frequent research approach utilized by researchers. Limitations of current research include the preponderance of small-scale experimental studies that investigated only a limited number of factors. Although the research is not conclusive, findings indicate that game play facilitates collaboration, the production of target language output, and vocabulary learning, and reduces the influence of factors that inhibit learning. This chapter concludes by identifying promising areas for future research.
2021
- Using Community of Inquiry to Scaffold Language Learning in Out-of-School Gaming: A Case Study. Ke Li, Mark Peterson, and Qiao Wang. International Journal of Game-Based Learning, Jan 2021.
This paper reports on a project that draws upon the framework of the community of inquiry to support game-based language learning outside the classroom. A case study design was employed to collect and analyze both qualitative and quantitative data, with a view to investigating the participants’ language development, participation, and perception. This study spanned a 6-week period and involved 11 intermediate English learners in China. The volunteer participants played an interactive adventure game in an out-of-class setting, with the instructor present and scaffolds available online. Results showed that the participants gained statistically significant vocabulary development and believed they made progress in listening and reading. Moreover, it was found that the participants were the most active in the first two and final weeks. The findings also showed general satisfaction and improved learning autonomy, highlighting the pivotal role of the instructor. The paper concludes by discussing its limitations and identifying future research directions.
2020
- The Role of Classroom-Situated Game-Based Language Learning in Promoting Students’ Communicative Competence. Qiao Wang. International Journal of Computer-Assisted Language Learning and Teaching, Apr 2020.
The study is the second in a series of mixed-methods studies on the integration of The Sims 4, a life-simulation game, into language classrooms. In this study, the researcher explores the effect of game-based language learning (GBLL) on students’ English communicative competence from three aspects, interaction, fluency and content, in a Japanese university. In class, students received instruction from the teacher on game language and gameplay skills, played the game on their own and presented gameplay stories. The presentations were recorded for evaluation. Surveys were also administered to gather students’ perceptions of the GBLL classroom. Results showed that no clear improvement in communicative competence was suggested by quantitative evaluation. Qualitative data, however, indicated that the game afforded students interesting events and proper expressions in presentations and that the teacher played a vital role in ensuring ample interactional opportunities and linguistic support. Suggestions for future research in classroom-situated GBLL were also proposed.
2019
- Classroom intervention for integrating simulation games into language classrooms: An exploratory study with the SIMS 4. Qiao Wang. CALL-EJ, Apr 2019.
This study explored three forms of classroom intervention: teacher instruction, peer interaction and in-class activities, for the purpose of integrating simulation games into a vocabulary-focused English classroom. The aim was to establish which intervention is most effective, as well as what improvements should be made for future application. The study took the form of a controlled experiment and evaluation of the interventions was based on concurrently collected quantitative and qualitative data. The researcher concluded that while quantitative data failed to confirm any statistical significance between the two groups, qualitative data suggested two forms of intervention, teacher instruction and in-class activities, were effective. Peer interaction, however, did little to promote vocabulary acquisition. The researcher proposes implementing more diversified in-class activities and game quests relating to curriculum goals in existing classroom interventions. The discussion concludes by highlighting promising areas for future research.
- The Use of Network-Based Virtual Worlds in Second Language Education: A Research Review. Mark Peterson, Qiao Wang, and Maryam Sadat Mirzaei. Apr 2019.
This chapter reviews 28 learner-based studies on the use of network-based social virtual worlds in second language learning published during the period 2007-2017. The purpose of this review is to establish how these environments have been implemented and to identify the target languages, methods used, research areas, and important findings. Analysis demonstrates that research is characterized by a preponderance of small-scale studies conducted in higher education settings. The target languages most frequently investigated were English, Spanish, and Chinese. In terms of the methodologies adopted, analysis reveals the majority of studies were qualitative in nature. It was found that the investigation of learner target language production, interaction, and affective factors represents the primary focus of research. Although positive findings relating to the above areas have been reported, the analysis draws attention to gaps in the current research base. The researchers provide suggestions for future research.