1. Introduction
The growth of English as an additional language (EAL) education worldwide (observed by, e.g.,
Smith, 2015) has led to an increase in EAL learners in countries such as the UK (
Murphy & Unthiah, 2015) and the US (
US Department of Education, National Center for Education Statistics, 2022), highlighting the need for effective EAL language instruction. Previous studies that have focused on the needs of EAL learners (
Hessel & Murphy, 2019;
Spencer et al., 2017;
Townsend et al., 2016) have shown that English academic vocabulary is one vital area where teachers can support EAL learners. While there have been fewer studies conducted in the international school context, existing research indicates that EAL learners in this context also often lack the vocabulary skills to engage with the academic material in their classes (
Coxhead & Boutorwick, 2018).
Providing research-backed support for learners studying in an English-medium instruction (EMI) context in Japan is urgent, given the significant increase in the number of international schools operating in the country. Globally, International Baccalaureate (IB) schools experienced a 33.3% increase from 2016 to 2022, with Japan seeing commensurate levels of growth during this period (
International Baccalaureate, 2023). International schools in Japan often comprise learners from diverse L1 backgrounds with varying degrees of English language proficiency (
Brooks et al., 2021), making it challenging for teachers to support EAL learners in the classroom. In international schools, this support is even more crucial because the challenges learners face are compounded by their multilingualism: many speak multiple languages, and some are required to learn both English and the language of the wider community where the school is located (
Carder, 2007). Consequently, learning English becomes significantly more complex in this context. This added complexity can greatly affect the English language proficiency of EAL learners in Japan, causing them to struggle to understand classroom materials (
Brooks, 2023).
To better support EAL learners in the classroom, it is necessary to understand both the range of vocabulary with which they are likely to be familiar at different educational stages and the vocabulary composition of the academic materials they are expected to engage with during their studies. This study investigates the gaps in vocabulary knowledge that EAL learners studying in the international school context may demonstrate across grade levels and subjects. To explore EAL learners’ vocabulary proficiency across grade levels, we compared the vocabulary knowledge of two cohorts of learners enrolled in international schools in Japan against the vocabulary profiles of the textbooks prescribed for their classes.
4. Methodology
4.1. Design
The study compared the vocabulary knowledge of EAL and FLE learners across seven grade levels. A total of 139 participants, representing diverse linguistic backgrounds, were included in the final analysis. Each group’s vocabulary knowledge was assessed using one of two versions of the Vocabulary Levels Test (VLT). Their VLT scores were then compared with an analysis of the vocabulary coverage of the textbooks used by the participants in their International Baccalaureate (IB) subjects.
The independent variables in the analysis were learner type (with two levels: (1) FLE and PL2 learners, and (2) EAL learners) and school level (with two levels: pre-IB, which covers Grades 6 to 9, and IB, covering Grades 10 to 12). The dependent variable was vocabulary knowledge, as measured by the VLT scores across the first five 1000-word bands. The analysis included an interaction term to examine whether the difference in vocabulary knowledge between FLE/PL2 and EAL learners varied across school levels.
The corpus analysis focused on five subject areas (literature, maths, physics, chemistry, and biology), examining the frequency and coverage of vocabulary in textbooks used in these subjects. Vocabulary comprehension was measured in terms of how well the learners’ existing knowledge aligned with the vocabulary demands of these textbooks.
4.2. Participants
The study took place at two international schools in Japan. The initial sample (
N = 142) comprised learners from diverse linguistic and cultural backgrounds. While Japanese and English were the two most commonly spoken languages, 22 different languages were represented in our dataset. We used a Rasch analysis (
Beglar, 2010) to evaluate participant scores on the uVLT and the NVLT. The majority of VLT scores demonstrated a good fit for the Rasch model. However, we removed three learners with very high outfit scores (
Zstd > 8.76) from the dataset. After these exclusions, the final dataset consisted of 139 participants (68 male and 71 female learners ranging in age from 13 to 18 years old).
Following the procedure used in previous studies (
Coxhead & Boutorwick, 2018), we categorised the participants as FLE learners, Proficient L2 (PL2) learners, or EAL learners based on nationality, time spent in English-speaking countries, languages spoken at home, and teacher assessments of English proficiency. For this study, EAL learners were defined as those who did not speak English at home and required additional language support in the classroom. FLE learners were participants who were both proficient in English and either spoke English predominantly at home or had significant exposure to English-speaking environments. PL2 learners were proficient in English but primarily spoke a language other than English at home. The term PL2 was chosen over non-native speaker (NNS) to avoid the deficit model associated with native and non-native speaker labels.
Table 1 details how many EAL, PL2, and FLE participants there were at each grade level.
Given that our study was primarily concerned with the vocabulary sizes of EAL learners and investigating the relationship between their vocabulary knowledge and their ability to engage with and comprehend classroom readings, as highlighted in previous studies (
Coxhead & Boutorwick, 2018;
Marianne & Coxhead, 2023;
Murphy & Unthiah, 2015), we grouped the FLE and PL2 learners for the purpose of analysis. This decision was based on several factors. Firstly, this approach mirrored that of earlier studies (
Brooks, 2023;
Coxhead & Boutorwick, 2018) that have focused on EAL learners and compared their vocabulary knowledge and progress with those of their FLE and PL2 peers. Secondly, teachers considered both groups to be similar in the classroom context, and additional language support, such as pull-out classes, was reserved exclusively for EAL learners. Thirdly, distinguishing between FLE and PL2 learners was challenging because their survey responses were very similar. For example, a Grade 10 student who had lived in the US for the first eight years of his life but spoke Japanese at home selected Japanese as his first language, while another learner who had bilingual parents but spoke predominantly English at home and had never lived overseas identified English as her L1. Despite initially being placed in different groups, the FLE and PL2 learners closely resembled each other both in their responses to the survey questions and in how their teachers evaluated their language skills in the classroom. Given all these factors, we combined the FLE and PL2 learners, allowing us to compare the EAL learners with the rest of the cohort more effectively.
4.3. Materials
4.3.1. The Vocabulary Levels Tests
The learners’ vocabulary knowledge was measured using the Vocabulary Levels Test (VLT). The VLT is a standardised assessment instrument designed to evaluate a learner’s vocabulary knowledge across different frequency bands to gain a deeper understanding of their language proficiency (
Schmitt et al., 2001). In this study, the participants were given one of two versions of the VLT. Both tests cover the first five 1000-word bands of the BNC/COCA. These tests were chosen over previous VLTs (e.g.,
Nation, 1983;
Schmitt et al., 2001) because they cover a greater number of frequency bands and are based on a more recently compiled set of word lists, the BNC/COCA (
Nation, 2020). In total, 49 participants (31 EAL learners, 6 FLE learners, and 12 PL2 learners) took the new Vocabulary Levels Test (NVLT) and 90 participants (72 EAL learners, 5 FLE learners, and 13 PL2 learners) took the updated Vocabulary Levels Test (uVLT).
We administered the new Vocabulary Levels Test (
McLean & Kramer, 2016) to the first group. This vocabulary assessment tool is a multiple-choice test consisting of 24 questions for each of the first five bands. The test also includes an academic vocabulary section based on
Coxhead’s (
2000) AWL. Each of the individual items consists of the target word, which is given by itself and in context, along with four possible responses. The examinee has to select the correct response, which can be either a single word or a phrase closest in meaning to the target word (see
Figure 1).
We administered
Webb et al.’s (
2017) updated Vocabulary Levels Test to the second group of participants. As with the NVLT, this test measures knowledge of the first five 1000-word bands. The uVLT also requires examinees to match a word with its definition or explanation. However, it differs from the NVLT in that the items are not presented individually but grouped into clusters, with each cluster containing three related words. There are 10 clusters for each of the first five 1000-word bands. The test taker must match each word to its corresponding definition or explanation (see
Figure 2). The uVLT includes a representative portion of nouns, verbs, and adjectives (15, 9, and 6 items per level, respectively) selected from each of
Nation’s (
2020) BNC/COCA bands. Because the uVLT does not include a section for
Coxhead’s (
2000) Academic Word List, we supplemented the test with the AWL section of the NVLT.
We acknowledge that it would have been preferable to use the same assessment tool for both groups of learners. However, the assessments were conducted as part of two larger studies (e.g.,
Brooks et al., 2021) and, in the context of this paper, we felt that it was important to include the data from both assessments, as doing so allows us to examine larger trends within the population. Both tests assess learners’ knowledge of the words in each frequency band, and their effectiveness in demonstrating learners’ mastery of frequency bands has been shown in studies conducted by the authors of each assessment tool (
McLean & Kramer, 2016;
Webb et al., 2017). Both tests have also been used in this capacity by other researchers (e.g.,
Ha, 2021;
Kremmel et al., 2023;
Xodabande & Hashemi, 2023). It is crucial to clarify that these scores are not being compared to any other scores or utilised as a measure of linguistic proficiency but are being used to provide us with a picture of the level of vocabulary knowledge EAL learners in this context are likely to possess. Given this, we feel that any potential variance between the tests would not invalidate the picture they are able to give us of the average level of vocabulary knowledge of the different groups of students in this study. Although there may be a slight variation between the tests, the benefits of gathering information about the vocabulary knowledge of a larger number of students in diverse settings outweigh any potential drawbacks related to differences between the assessments themselves.
4.3.2. The Corpus
To create a corpus representative of the textbooks used by learners in the classroom, we compiled a set of domain-specific corpora sourced from a variety of textbooks taken from across the different subjects the participants were studying at the two schools. The corpus comprises five subjects (
Table 2): literature, maths, physics, chemistry, and biology. The corpus focuses on the textbooks used during the IB diploma and does not include the handouts and textbooks the participants were using at the middle school level. The rationale for this was twofold. Firstly, these textbooks aided in building a strong vocabulary foundation for EAL learners in the IB program. Secondly, teachers at the middle school level relied more on handouts and teacher-created materials than on prescribed textbooks. While there were no middle-school textbooks in the corpus, we feel that the vocabulary in the corpus is relevant to these grades because, in the IB setting, one of the primary goals of the middle-school grades is to prepare students for the IB program. Part of this preparation involves having students learn the vocabulary that they need to succeed in the IB context.
We digitised the textbooks by scanning them, then reviewed and corrected the scanned texts to fix errors introduced during scanning. This cleaning was carried out in a text editor, following the procedures suggested by
Nation (
2016). We used R and Excel to deal with errors not addressed during the initial cleaning phase. Errors that resulted in non-words were corrected by exporting a CSV file of all the off-list words and manually checking these against the original PDFs. This was carried out by the primary researcher and a group of research assistants. However, despite the extensive data cleaning, it was not possible to eliminate all noise from such a large corpus, and some optical character recognition (OCR) errors remain, such as those involving mathematical and chemical formulas, numbers (e.g., “1” being recognised as “I”), or punctuation. Although this may restrict the use of the data to analyse features of the text such as paragraph length or mean sentence length, the focus on removing incorrectly scanned words means that the impact of this noise on vocabulary measures was minimal.
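The off-list check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the R and Excel workflow used in the study; the function names, the tiny `known_words` set, and the sample sentence are all hypothetical stand-ins for the full word lists and the scanned textbooks.

```python
import csv
import re
from collections import Counter


def off_list_counts(text, known_words):
    """Count tokens that do not appear in the reference word list.

    `known_words` is a hypothetical stand-in for the full set of
    word forms covered by the reference lists; it is NOT the real list.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tok for tok in tokens if tok not in known_words)


def write_off_list_csv(counts, path):
    """Write off-list words, most frequent first, for manual checking."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["word", "frequency"])
        for word, freq in counts.most_common():
            writer.writerow([word, freq])


# Toy example: "tbe" and "celI" mimic typical OCR misreadings.
known_words = {"the", "cell", "divides", "and", "grows"}
sample = "The cell divides and tbe celI grows and tbe cell divides"
counts = off_list_counts(sample, known_words)
```

Calling `write_off_list_csv(counts, "off_list.csv")` (the filename is arbitrary) would produce the kind of file that could then be checked by hand against the original PDFs.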
4.4. Procedures
As noted above, the participants for the study were recruited from two international schools in Japan. This was carried out through a collaboration with the EAL teachers and the principals at both schools, which helped to facilitate the recruitment process. All students in Grades 6 through 11 were administered the Vocabulary Levels Tests (VLTs) as part of their regular classroom activities. Due to time constraints, not all Grade 12 students were able to participate, as they were heavily engaged in preparing for their International Baccalaureate (IB) exams and final assessments.
At the start of the project, we obtained consent from both the participants and their parents or guardians. This process was carried out collaboratively with the research team, the students, the parents, and the school administration. Consent was required before the administration of the VLT, and only students from whom consent was received were included in the final dataset.
The VLT was administered in the classroom by the students’ regular teachers. The test was paper-based, and students were given one hour to complete the VLT along with a short survey on their language background. All participants finished within the allotted time.
At the same time they took the VLT, participants also filled in a short survey which included questions on nationality, languages spoken at home, and time spent overseas. Additionally, students were asked to self-identify their first language (L1), with the option to select multiple L1s, acknowledging the number of bilingual and multilingual learners in the study.
6. Results
Research Question 1: To what extent does the vocabulary knowledge of English as an additional language (EAL) and first-language English (FLE) learners in an international school context vary across different grade levels?
An examination of the VLT scores (
Table 3) shows that the majority of the participants had not attained mastery of the AWL or the BNC/COCA mid-frequency bands. These results are consistent with previous studies (e.g.,
Coxhead & Boutorwick, 2018). Additionally, a considerable portion of the participants had not mastered the high-frequency words of the 2000-word band before reaching Grade 9, and fewer than half of both groups had mastered the 3000-word band before Grade 12. Mastery of the AWL also posed a significant challenge for learners at all grade levels, with fewer than 25% of the participants mastering these words before the 12th grade. Because of how important the general academic words found on the AWL are for comprehending academic texts such as school textbooks (
Greene & Coxhead, 2015;
Hu et al., 2021), the lower levels of proficiency the participants displayed on vocabulary items from the AWL suggest that they are likely to encounter difficulties reading texts appropriate for their grade level.
We conducted separate analyses for EAL learners and PL2/FLE learners, which showed that EAL participants had significantly lower levels of proficiency with both high-frequency and mid-frequency vocabulary (
Table 4) compared to PL2/FLE learners (
Table 5). We found that the majority of PL2/FLE participants were able to master the first five 1000-word BNC/COCA frequency bands. In contrast, fewer than 50% of EAL participants had mastered any band beyond the 2000-word band. Even at Grade 12, only 50% of EAL participants had mastered the 3000-word band, and just 33% had mastered the 5000-word band. However, despite their higher levels of vocabulary proficiency, many of the PL2/FLE learners still struggled with the AWL, suggesting that vocabulary, particularly academic vocabulary, remains a challenge for even this more adept group of students.
The descriptive statistics of the scores for each of the Grade levels for the EAL and FLE/PL2 groups are given in
Table 6.
A linear regression model was used to compare the vocabulary knowledge of EAL learners with that of FLE and PL2 learners across different grade levels (
Table 7). Given the small number of participants at some grade levels, the participants were divided up into pre-IB, covering Grades 6 to 9, and IB, covering Grades 10 to 12. The results showed that FLE/PL2 learners consistently outperformed EAL learners on the Vocabulary Levels Test (VLT). On average, FLE/PL2 learners scored about 19 percentage points higher than EAL learners. This significant difference in vocabulary knowledge was evident across both the pre-IB (Grades 6–9) and IB (Grades 10–12) levels. Additionally, EAL learners in pre-IB scored lower than those in the IB level, indicating an improvement in vocabulary knowledge as they advanced in grade levels. However, the gap between EAL and FLE/PL2 learners remained consistent across grade levels, suggesting that while all learners develop their vocabulary over time, EAL learners continue to lag behind their FLE/PL2 peers throughout their schooling.
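The structure of this model can be illustrated with a short sketch. With two dummy-coded factors and their interaction, a 2 × 2 linear model is saturated, so its least-squares coefficients reduce to differences between cell means. The Python code below computes them that way; the function name and the scores are invented for illustration and are not the study's data or its fitting procedure.

```python
from statistics import mean


def saturated_2x2_coefficients(rows):
    """Return dummy-coded coefficients for a 2 x 2 design with interaction.

    Each row is (learner_type, school_level, score). EAL and pre-IB are
    the reference levels. Because the model is saturated, each
    coefficient is a simple difference of cell means.
    """
    cells = {}
    for learner, level, score in rows:
        cells.setdefault((learner, level), []).append(score)
    m = {key: mean(values) for key, values in cells.items()}

    intercept = m[("EAL", "pre-IB")]
    learner_effect = m[("FLE/PL2", "pre-IB")] - intercept
    level_effect = m[("EAL", "IB")] - intercept
    # Interaction: does the learner-type gap change between school levels?
    interaction = (m[("FLE/PL2", "IB")] - m[("EAL", "IB")]) - learner_effect
    return {
        "intercept": intercept,
        "learner_type": learner_effect,
        "school_level": level_effect,
        "interaction": interaction,
    }


# Invented percentage scores, NOT the study data.
toy_rows = [
    ("EAL", "pre-IB", 50), ("EAL", "pre-IB", 60),
    ("EAL", "IB", 65), ("EAL", "IB", 75),
    ("FLE/PL2", "pre-IB", 72), ("FLE/PL2", "pre-IB", 78),
    ("FLE/PL2", "IB", 88), ("FLE/PL2", "IB", 92),
]
coefs = saturated_2x2_coefficients(toy_rows)
```

In this toy example the interaction coefficient is zero, which is the pattern a constant gap between the two learner groups across school levels would produce. A full analysis would also need standard errors and significance tests, which this sketch omits.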
Research Question 2: To what extent does the vocabulary in subject-specific textbooks at an international school align with BNC/COCA frequency bands, and how does this alignment vary across IB courses?
The vocabulary profiles of the domain-specific corpora (
Table 8) show that the literature textbooks were likely to be the most accessible for EAL learners. Based on the coverage of the BNC/COCA over this corpus, we would expect learners to achieve the 95% coverage threshold required for comprehension if they could master the first five 1000-word bands from the BNC/COCA. The next most accessible group of textbooks were those from the maths corpus, where the coverage provided by the first five 1000-word bands was 94.28%, very close to the minimum threshold of 95%. However, for all the other corpora, the coverage provided by the first 5000 words fell well below the 95% mark.
The first two 1000-word frequency bands provided notably low coverage, particularly in the chemistry and biology texts, where less than 78% of all tokens came from these bands. This is considerably lower than the coverage these bands have been found to provide over texts written specifically for English language learners, where the first 2000 words have been found to provide over 92% coverage (
Sun & Dang, 2020). These findings suggest that EAL learners would struggle with these textbooks given their vocabulary mastery level.
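For readers unfamiliar with lexical profiling, the idea behind these coverage figures can be sketched as follows. The Python example assumes a hypothetical mapping, `band_of`, from each word to its 1000-word band; the band assignments and the ten-token sample are invented and do not reproduce the BNC/COCA lists or the software used in the study.

```python
from collections import Counter


def band_coverage(tokens, band_of):
    """Return the percentage of running words (tokens) in each band.

    `band_of` maps a word to its 1000-word band number (1, 2, 3, ...).
    Words missing from the mapping are counted as "off-list".
    """
    counts = Counter(band_of.get(tok, "off-list") for tok in tokens)
    total = len(tokens)
    return {band: 100 * n / total for band, n in counts.items()}


def cumulative_coverage(coverage, up_to_band):
    """Sum the coverage of bands 1..up_to_band, ignoring off-list words."""
    return sum(
        pct for band, pct in coverage.items()
        if band != "off-list" and band <= up_to_band
    )


# Invented band assignments for a ten-token toy text.
band_of = {"the": 1, "and": 1, "in": 1, "cell": 2, "grows": 2, "divides": 3}
tokens = "the cell divides and the cell grows in the mitochondrion".split()
coverage = band_coverage(tokens, band_of)
first_two = cumulative_coverage(coverage, 2)
```

Here the first two bands cover 8 of the 10 tokens, so a reader who knew only those bands would meet an unknown word roughly once in every five running words.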
Research Question 3: To what extent can EAL learners be expected to comprehend the vocabulary encountered in the textbooks used for different IB subjects?
We estimated the percentage of vocabulary in the textbooks that EAL learners would likely know by analysing their scores on the NVLT or uVLT. We then looked at the coverage provided by each of the bands of the BNC/COCA over the various domain-specific corpora (
Table 9). By looking at the coverage provided by the words from the frequency levels that an individual participant had displayed mastery of, we were able to determine the overall vocabulary coverage a learner would likely have over the texts from different subjects. Using these numbers, we were able to infer what percentage of words the participants would likely know from the different corpora.
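The estimation step described above amounts to summing, for each learner, the coverage contributed by the bands that learner has mastered. A minimal Python sketch is given below; the per-band coverage values and the learner profile are invented, and how other word categories (for example, proper nouns or the AWL) were handled is not assumed here.

```python
def expected_coverage(mastered_bands, band_coverage_pct):
    """Sum the corpus coverage of the bands a learner has mastered.

    `band_coverage_pct` maps band number -> percentage of tokens in a
    given subject corpus that come from that band.
    """
    return sum(band_coverage_pct[band] for band in sorted(mastered_bands))


# Invented per-band coverage for a hypothetical subject corpus.
toy_corpus = {1: 70.0, 2: 8.0, 3: 5.0, 4: 2.5, 5: 1.5}

# A hypothetical learner who mastered bands 1, 2, and 4 but not 3 or 5.
learner_bands = {1, 2, 4}
learner_coverage = expected_coverage(learner_bands, toy_corpus)
```

Because the sum runs only over mastered bands, a learner with a non-sequential profile (such as the one above) is credited for band 4 but not for band 3.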
To assess whether the differences in the coverage that learners were likely to achieve across subjects, as identified in the descriptive analysis, were statistically significant, a repeated-measures ANOVA was conducted, followed by pairwise comparisons using the Tukey adjustment to control for multiple comparisons. The ANOVA revealed a significant difference in coverage between academic subjects, F(4, 408) = 4858.89, p < 0.001, showing that the coverage learners would be likely to have over the different sub-corpora varied between subjects. The subsequent post-hoc analysis showed that the largest difference was between biology and literature, with students achieving significantly higher coverage in literature (M = 88.98, SE = 0.79) than in biology (M = 82.39, SE = 0.83; Mean Difference = 6.58, p < 0.001). The only non-significant difference was between maths (M = 87.54, SE = 0.83) and physics (M = 87.54, SE = 0.85; Mean Difference = 0.01, p = 0.912), suggesting learners would have similar vocabulary coverage across these two subjects. These findings suggest that, with one exception, the coverage provided by the BNC/COCA word lists is domain-specific, indicating that learners require varying levels of vocabulary support depending on the subject they are studying.
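To make the test statistic concrete, the following Python sketch computes a one-way repeated-measures ANOVA by hand on invented data (three learners, three conditions). It is not the software or the data used in the study, and it omits the post-hoc comparisons.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    `scores` holds one row per learner, with that learner's value in
    every condition (here, each subject corpus).
    Returns (F, df_effect, df_error).
    """
    n = len(scores)      # number of learners
    k = len(scores[0])   # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Removing between-learner variance leaves the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Invented coverage-like values, NOT the study data.
toy_scores = [
    [10, 12, 15],
    [11, 14, 15],
    [9, 13, 18],
]
f_stat, df_cond, df_error = repeated_measures_anova(toy_scores)
```

The same degrees-of-freedom formulas give k − 1 = 4 and (n − 1)(k − 1) = 408 for five subjects and 103 learners, which would be consistent with the reported F(4, 408) if the analysis included the 103 EAL learners (31 + 72) described in Section 4.3.1.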
Figure 3 presents a histogram of the vocabulary coverage participants would likely achieve for each corpus. The data reveal that most EAL participants lack the vocabulary needed to read discipline-specific textbooks, with biology and chemistry being the most challenging subjects and maths and literature the easiest. Given the vocabulary levels displayed by EAL learners, even in Grades 11 and 12, we expect them to struggle to fully comprehend the textbooks used in their classes.
Research Question 4: Using the BNC/COCA, how much additional vocabulary would EAL learners need to acquire to understand the language used in textbooks they would typically encounter within the international school context?
To answer this fourth and final question, we need to look at the gap that exists between the learners’ current knowledge and the number of words they would need to learn, using existing word lists, to achieve 95% or greater coverage. From what we have discussed above, it is evident that learners often lack proficiency in frequency bands exceeding the first 2000 words. We also know that they are not likely to have mastered the Academic Word List. Since learners would have to master the 5000 to 9000 most frequently used words in the BNC/COCA to adequately cover various subjects, the number of words they would need to learn solely based on the BNC/COCA is likely too large for them to acquire within their available time (
Table 10). For example, a typical 10th-grade EAL student would need to learn around 7000 new words to achieve the threshold necessary to comprehend a biology or chemistry textbook successfully.
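The arithmetic behind such an estimate can be sketched as follows. The Python example first finds the band at which cumulative coverage reaches the 95% threshold and then counts the words in every band up to that point that the learner has not yet mastered. The cumulative coverage values are invented, chosen only so that the toy result mirrors the example above, and each band is treated as 1000 items.

```python
def band_needed(cumulative_pct, target=95.0):
    """Return the first band at which cumulative coverage meets the target."""
    for band in sorted(cumulative_pct):
        if cumulative_pct[band] >= target:
            return band
    return None  # the target is not reached within the listed bands


def words_to_learn(mastered_bands, required_band, band_size=1000):
    """Count the words in unmastered bands up to the required band."""
    missing = [
        band for band in range(1, required_band + 1)
        if band not in mastered_bands
    ]
    return len(missing) * band_size


# Invented cumulative coverage for a hypothetical science corpus.
toy_cumulative = {
    1: 70.0, 2: 78.0, 3: 84.0, 4: 87.5, 5: 90.0,
    6: 91.8, 7: 93.2, 8: 94.3, 9: 95.2,
}
required = band_needed(toy_cumulative)
gap = words_to_learn({1, 2}, required)
```

For a hypothetical learner who has mastered only the first two bands, this toy profile requires the first nine bands and leaves a gap of 7000 words.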
Examples from our study further illustrate this issue: participant S04 was able to master the 4000- and 5000-word bands but not the 3000-word band. Participant S09 mastered the 5000-word band but not the 3000- or 4000-word bands. According to previous studies (e.g.,
Schmitt & Schmitt, 2014;
Teng, 2019), learners are likely to acquire words based on the number of exposures to those words. Given this, our finding suggests that, beyond the first 2000 high-frequency words, the BNC/COCA frequency list may not accurately represent the frequency of the vocabulary items in the textbooks that EAL learners are being asked to read for their classes. This misalignment calls into question the effectiveness of using traditional frequency lists for supporting EAL learners’ vocabulary acquisition. In light of this misalignment and the strong and significant correlation between vocabulary knowledge and reading comprehension shown by other studies (e.g.,
Brooks et al., 2021;
Laufer & Ravenhorst-Kalovski, 2010), more research is needed regarding EAL learners’ vocabulary knowledge and needs in an EMI context. Additionally, there is a pressing need to develop tools that effectively meet these vocabulary needs.
7. Discussion
The current study was designed to examine the vocabulary knowledge of EAL learners in an international school context in Japan and compare it to the vocabulary found in the textbooks these learners use in class, using the BNC/COCA word list. The aim was to gain insight into the vocabulary used in these textbooks to identify potential challenges EAL learners may face while reading them. The study was driven by four research questions, each of which we discuss in turn below.
Regarding Research Question 1, the data strongly support the findings of previous research (
Coxhead & Boutorwick, 2018), indicating a widespread challenge among learners, especially those with English as an additional language, in mastering academic vocabulary. Similar challenges were noted by
Dixon et al. (
2020), who found that while EAL learners do acquire vocabulary, they are often not able to do so with sufficient speed to keep up with their FLE peers. The finding that fewer than 25% of participants could master the AWL before Grade 12 is concerning, as academic vocabulary is essential for reading comprehension and success in the classroom (
Brooks, 2023;
Green & Lambert, 2018;
Greene & Coxhead, 2015). Studies conducted by
Coxhead and Boutorwick (
2018) and
Marianne and Coxhead (
2023) yielded similar findings, indicating low levels of academic vocabulary mastery among EAL learners in comparable educational contexts. Given their difficulties with academic texts, it is probable that EAL learners will face substantial challenges in their academic pursuits, particularly as they progress to higher grades where the academic demands intensify.
For Research Question 2, the results reinforce the need for domain-specific vocabulary lists, a need that has been extensively discussed in earlier research (
Green & Lambert, 2018;
Greene & Coxhead, 2015). In line with previous studies on vocabulary (
Coxhead, 2017), it appears that the difficulty of the vocabulary found in a corpus can vary markedly across domains. More proficient learners may possess sufficient vocabulary resources to handle content in subjects like literature and mathematics, which were found to be relatively accessible, with approximately 95% coverage achieved using the first five 1000-word BNC/COCA bands. Specialised subjects such as biology and chemistry, however, present greater lexical challenges, resulting in a notable drop in coverage. This builds upon previous studies that have shown both the difficulty of the vocabulary in textbooks from a specific domain, such as the sciences (
Hu et al., 2021), as well as the importance of this vocabulary because of the strong connection between content knowledge and technical vocabulary (
Coxhead, 2017;
Woodward-Kron, 2008). The results of our research corroborate the findings of previous studies and underscore the distinct obstacles posed by subject-specific terminology, emphasising the necessity of providing tailored lexical assistance to students grappling with such domains.
Although the 5000 most frequent words offer a wide range of coverage, the BNC/COCA falls short in adequately capturing the specialised terminology prevalent in the fields of biology, chemistry, and physics. This deficiency echoes similar findings from prior studies on the vocabulary demands faced by EAL learners, highlighting the need for a more comprehensive approach to vocabulary acquisition in these subjects (
Coxhead & Boutorwick, 2018;
Coxhead et al., 2010). The low coverage in these specialised domains emphasises the difficulties faced by EAL learners, especially considering that only a minority (e.g., 33% of Grade 12 participants) demonstrated mastery of the requisite vocabulary bands. This further underscores the importance of developing targeted vocabulary lists to support comprehension in more specialised academic texts, as earlier studies have suggested (
Green & Lambert, 2018;
Greene & Coxhead, 2015).
Research Question 3 investigated the vocabulary coverage EAL learners could be expected to achieve over discipline-specific textbooks, based on their VLT scores. This analysis reveals a concerning trend: most EAL learners possess insufficient vocabulary to fully comprehend the content of these textbooks. This supports earlier research, which has shown that these learners tend to struggle with reading comprehension in an academic setting (
Brooks, 2023;
Coxhead et al., 2010;
Murphy & Unthiah, 2015).
While the EAL learners in this study demonstrated improvement in vocabulary knowledge across grade levels, the rate of this growth did not match the rapid gains reported by other researchers, such as
Coxhead et al. (
2015), who found that 15–16-year-old FLE speakers enrolled in New Zealand high schools learned an average of over 1300 word families per year. These improvements were also less substantial than those observed in a study by
Coxhead and Boutorwick (
2018), where some EAL learners achieved comparable vocabulary knowledge levels to their native-speaking peers within a shorter timeframe: four years for high-frequency vocabulary and five years for academic vocabulary. Our research reveals a persistent gap in vocabulary knowledge, even among upper-grade learners, particularly in subjects like biology and chemistry. Even Grade 12 EAL learners show a significant lack of proficiency in essential vocabulary bands, highlighting a substantial discrepancy between their vocabulary knowledge and the demands of these disciplines. This presents a significant barrier to their academic success in these subjects (
Marianne & Coxhead, 2023;
Woodward-Kron, 2008).
Although the BNC/COCA list could potentially help address these vocabulary gaps, our analysis for Research Question 4 reveals that EAL learners’ vocabulary profiles deviate significantly from the typical profile predicted by this frequency list (
Nation, 2016), and further suggests that the number of words they would need to learn based on these lists is too extensive to be realistically managed. These lists organise words by how often learners encounter them in real-world contexts, with higher-frequency words appearing more frequently in texts. As a result, learners are expected to learn higher-frequency words sooner due to repeated exposure. However, previous studies have shown that textbooks in subjects like maths and science can have very different vocabulary profiles and loads than other texts, such as novels or English language textbooks (
Groves, 2016;
Hu et al., 2021), which could result in EAL learners acquiring the vocabulary in a different order than ESL or EFL learners. The results of our study, which support previous research, indicate a notable disparity in vocabulary knowledge growth, particularly within the 3000-word band. In this band, our participants demonstrated a mastery level more than 10% lower than the level they displayed in the 4000-word band.
Table 9 highlights this pattern with examples from individual learners. This misalignment between the expected sequence, in which learners acquire bands in frequency order, and the sequence actually observed underscores the challenges of using the BNC/COCA to teach mid- to low-frequency vocabulary to this particular group of learners.
Furthermore, given that previous studies have shown that vocabulary acquisition takes place at a relatively steady pace (
Dixon et al., 2020;
Webb & Chang, 2012), it is unlikely that learners would be able to acquire the vocabulary knowledge required to gain the 95% to 98% coverage necessary for comprehension during their time at school using the BNC/COCA alone. This supports calls from researchers (
Coxhead & Boutorwick, 2018;
Nation, 2016) that highlight the need to develop domain-specific word lists for EAL learners in order to help them acquire the vocabulary knowledge they need to succeed academically.
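As a minimal illustration of how the coverage figures discussed above can be calculated, the following Python sketch computes cumulative lexical coverage of a text by 1000-word frequency band and reports the first band at which a given threshold is reached. It is not the procedure used in this study. The function names, the toy token list, and the band assignments are hypothetical and are not taken from the BNC/COCA lists.

```python
from collections import Counter


def cumulative_coverage(tokens, band_of):
    """Return {band: cumulative % of tokens covered up to that band}.

    band_of maps a lower-cased word to its 1000-word band (1, 2, 3, ...).
    Words missing from band_of are treated as off-list (band 0): they
    count towards the total but are never covered.
    """
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(band_of.get(tok.lower(), 0) for tok in tokens)
    coverage = {}
    running = 0
    for band in sorted(b for b in counts if b > 0):
        running += counts[band]
        coverage[band] = 100.0 * running / total
    return coverage


def bands_needed(coverage, threshold):
    """Return the first band whose cumulative coverage meets the threshold,
    or None if the threshold is never reached."""
    for band in sorted(coverage):
        if coverage[band] >= threshold:
            return band
    return None


# Hypothetical toy data: band assignments are invented for illustration.
toy_band_of = {"the": 1, "energy": 2, "membrane": 3, "mitochondrion": 5}
toy_tokens = (["the"] * 14 + ["energy"] * 3
              + ["membrane"] * 2 + ["mitochondrion"])

toy_coverage = cumulative_coverage(toy_tokens, toy_band_of)
print(toy_coverage)
print(bands_needed(toy_coverage, 95), bands_needed(toy_coverage, 98))
```

With this toy input, 95% coverage is reached at the third band and 98% only at the fifth, which mirrors in miniature how a small proportion of low-frequency subject-specific words can push the required vocabulary size well beyond the high-frequency bands.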
8. Implications and Limitations
The current study has several limitations that should be considered when interpreting its findings. One potential issue, as previously discussed, is the use of two different vocabulary assessment tools to measure the vocabulary knowledge of the participants in the study. In the future, it would be preferable to conduct a similar study using the same type of assessment for both groups of learners. However, it is important to keep in mind that both assessment tools used in this study were developed for adult EFL learners, and neither may be entirely suitable for EAL learners. Unfortunately, few assessment tools are available for use with this group of learners. While tools do exist for younger EAL learners, such as the Peabody Picture Vocabulary Test (
Dunn & Dunn, 2012), developed for young FLE speakers, and the English Picture Vocabulary Test (
Güngör & Önder, 2023), developed for young L2 learners of English, there are no tools that have been specifically designed to measure the vocabulary knowledge of the age group of EAL learners participating in this study. The development of such a test is important, based on both the findings of this study and those of previous research (
Green & Lambert, 2018;
Greene & Coxhead, 2015), which have shown that the vocabulary needed in the EMI classroom is different from the vocabulary found in frequency lists developed for adult learners. A further limitation of the current study relates to the content of the corpus. The corpus used for this study was composed entirely of textbooks developed for the classes being taught. While it would have been beneficial to include a spoken component in the corpus, the logistics of making recordings in the classroom and transcribing a sufficient number of those recordings to develop a spoken corpus of a comparable size to the written corpus made it impossible to undertake as part of this study. Given the acknowledged differences between spoken and written English (
Dang et al., 2017), we feel that it would be beneficial to conduct a follow-up study using a corpus of spoken English. A final limitation that needs to be addressed is the effect of L1 backgrounds, which the literature has shown can affect learners’ English vocabulary knowledge (
Booth & Clenton, 2020). While our dataset includes assessments from learners with diverse linguistic backgrounds, we were unable to evaluate how these backgrounds might impact their knowledge of academic English. We would like to pursue this in the future should we be able to include a sufficient number of learners from similar L1 backgrounds. We would also like to expand the scope of our study to include EAL learners studying in schools outside of Japan.
Despite the limitations discussed above, the current study offers some important implications with regard to supporting EAL learners in the classroom. First, as suggested by previous studies (
Coxhead & White, 2012;
Green & Lambert, 2019;
Greene & Coxhead, 2015), educators looking to support EAL learners should prioritise enhancing their students’ vocabulary knowledge. The data from this study highlight a pressing need to enable learners to expand their vocabulary knowledge in order to comprehend the textbooks they are asked to read. Additionally, there is an urgent requirement to develop word lists specifically tailored for EAL learners. While lists such as the AWL (
Coxhead, 2000) and the BNC/COCA (
Nation, 2020) can provide EAL learners in the international school context with some support in acquiring the vocabulary they need to understand their textbooks, they are often either too broad (in the case of the BNC/COCA) or too limited (in the case of the AWL) in scope to provide the support these learners require. Although it includes important words, the AWL was developed using a corpus of articles written for adult learners and may not include all of the words that EAL learners need in their context. The BNC/COCA, on the other hand, contains too many words, requiring EAL learners to acquire upwards of 7000 new words before they can achieve sufficient coverage to understand textbooks in subjects such as biology or chemistry.
Furthermore, both sets of word lists, by design, cover vocabulary from multiple domains, making it difficult for teachers to select the words most appropriate for a single subject. This is important because research shows (e.g.,
Teng, 2019;
van Zeeland, 2013) that it is more effective to teach words in context. A set of domain-specific word lists covering the subjects that EAL learners in the international school context are likely to take would allow teachers to focus on a manageable list of words and teach them within the context of the subject where those words are required.