Pronunciation, Autonomy, and Automatic Speech Recognition

We all know pronunciation is important, but it is a skill that often falls by the wayside. There are a lot of reasons for this. Some of us lack the proper training. For students, progress is slow. Pronunciation teaching is often unplanned, reactive, or one-off. Other “big” skills like listening, grammar, or vocabulary often seem more important. Clearly, students need to learn skills and strategies for improving their pronunciation on their own. They learn such strategies for other skills, so why not pronunciation? Of course, it’s easier said than done. In the article below, McCrocklin shows that introducing Automatic Speech Recognition software may be a simple and minimally time-intensive way of getting students to improve not only their pronunciation accuracy but their autonomous pronunciation learning as well.

Article

McCrocklin, S. M. (2016). Pronunciation learner autonomy: The potential of Automatic Speech Recognition. System, 57, 25-42. Retrieved from http://www.sciencedirect.com/science/article/pii/S0346251X15001980.

Pronunciation, Autonomy, Technology

Most pronunciation teaching is heavily teacher-focused. The teacher is often the source of instruction, modelling, feedback, and evaluation. The student plays a very small role, if any, in monitoring and correcting their own pronunciation. This makes sense since students often have difficulty identifying their own pronunciation errors. Past research on student autonomy has highlighted its importance in contributing to successful language learning. However, there has been very little research that has paired pronunciation practice and autonomous learning. Modern speech recognition technology now may make such learning possible.

In the past, ASR (automatic speech recognition) software was unreliable, especially for foreign accents; however, more recent ASR software, such as Siri, Windows Speech Recognition, and even Google Voice Typing, is much more accurate. Several studies have shown that use of ASR can lead to pronunciation improvement (Hincks, 2003; Neri, Cucchiarini, & Strik, 2006; Neri, Mich, Gerosa, & Giuliani, 2008). McCrocklin’s article looks at how ASR can be used to improve autonomy rather than accuracy.

Teaching students to be autonomous is a gradual process. It happens through training, guidance, and feedback from a teacher, coupled with students’ autonomous experimentation and reflection. As McCrocklin states (p. 27):

In pronunciation learning, students are likely to be at a loss if simply handed the reins of their language learning.

Study

McCrocklin devised a mixed-methods experimental study that investigated whether instruction and practice with ASR lead to autonomy. Her study consisted of:

Three-week pronunciation workshop within an advanced ESL listening course at a large US university
Two classes per week of 50 minutes each (6 classes in total)
Focused on /ɛ/ vs. /æ/, /ɔ/ vs. /ʌ/, /i/ vs. /ɪ/, /ɹ/, /θ/, /ʒ/, and /dʒ/
48 participants total
Pre-, post-, and delayed post-surveys; language learning logs; interviews

Group type	Control	Experimental	Experimental
Name	CONV	STRAT	HYBRID
Size	15	17	16
Description	Face to face course	Face to face with some strategy training	Face to face with some strategy training and a focused technology day
Session 1	Listening and production activities following Celce-Murcia et al.’s framework¹	Listening and production activities following Celce-Murcia et al.’s framework; independent pronunciation practice strategies: focused listening², covert rehearsal³	Listening and production activities following Celce-Murcia et al.’s framework; independent pronunciation practice strategies: focused listening, covert rehearsal, ASR
Session 2	Listening and production activities following Celce-Murcia et al.’s framework	Listening and production activities following Celce-Murcia et al.’s framework	Recorded listening practice; controlled and communicative ASR activities⁴

Findings

Both experimental groups (STRAT, HYBRID) improved in their beliefs about autonomy (no statistical difference between these groups)
The delayed post-survey showed significant differences in time spent on activities after the three-week workshop had concluded. Students in the HYBRID group spent more time using autonomous pronunciation strategies, used more strategies (such as covert rehearsal), and did significantly more ASR activities.
- That HYBRID had greater autonomous practice points to the need for gradual introduction to autonomy, as well as training
The interviews revealed that:
- Many (84% across all groups) found ASR useful, despite some frustrations
- ASR is frustrating, especially initially. Some programs are better than others.
- Students were not aware of ASR they already had access to on their phones
- Those who liked ASR pointed to its ability to give feedback as one reason
- Some used dictionaries to listen to pronunciation in order to get the correct feedback from ASR
- One student did minimal pair activities in sentences rather than in isolation because the ASR software recognized their words better that way
- Students reported using Google Voice Search, Dragon Dictation, and Siri

Conclusion and Takeaway

Previous research suggests that ASR can be effective, and the present study showed that, given minimal training and “spending minimal time (a few minutes per week)”, students can begin to use ASR autonomously and possibly reap many benefits. McCrocklin provides a great explanation of the implications of her research (p. 35):

Given that teachers often do not have sufficient time to cover pronunciation in class, it may be easy to see ASR work as an extra burden and teachers may be hesitant to turn courses into hybrids. Instead, ASR should be seen as a potential solution to this problem. ASR work could easily be turned into homework for students in speaking classes, giving students a chance to practice and focus on their pronunciation outside of class, freeing up class time for other speaking activities.

Students already have access to such tools on their phones and computers. I have personally tried Google Voice Typing (available within Google Docs) myself and with international students. I found it frustrating at times and very accurate at others, especially given a clear microphone and a quiet room. McCrocklin calls for more research, and I think teachers can easily answer this call with their own classroom experimentation.

Here are some ideas that you could use with ASR:

Minimal pairs practice – provide students with a list of minimal pairs of a difficult set of phonemes
Tongue twisters – provide students with a tongue twister with a target sound or pair of sounds
Dictation – give students a short passage and get them to read it aloud, trying to voice type it verbatim
Thought groups – get students to divide a short passage into thought groups, and then get students to voice type each thought group as fluently and accurately as possible
Speak, speak again! – students respond to a question by voice typing without stopping. Then, they look over their answer, highlight their errors, and practice the correct pronunciation (perhaps using Forvo to find exemplars). Finally, students voice type their answer again, trying to avoid the initial errors.
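For the dictation and “Speak, speak again!” activities, students or teachers may want a quick way to see which words the ASR software heard differently. The following is only a rough sketch and is not part of McCrocklin’s study. It assumes the target passage and the voice-typed transcript are both available as plain text (for example, copied out of a document), and the function names `compare_to_target` and `word_accuracy` are made up for illustration.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def compare_to_target(target, transcript):
    """Return (expected, heard) pairs where the transcript differs from the target.

    Hypothetical helper: 'target' is the passage the student meant to say,
    'transcript' is whatever the ASR tool typed.
    """
    target_words = tokenize(target)
    heard_words = tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=target_words, b=heard_words, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            expected = " ".join(target_words[i1:i2])
            heard = " ".join(heard_words[j1:j2])
            mismatches.append((expected, heard))
    return mismatches


def word_accuracy(target, transcript):
    """Share of target words that the ASR transcript reproduced in order."""
    target_words = tokenize(target)
    heard_words = tokenize(transcript)
    if not target_words:
        return 0.0
    matcher = difflib.SequenceMatcher(a=target_words, b=heard_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target_words)


if __name__ == "__main__":
    target = "The ship will leave the beach at three."
    transcript = "the sheep will live the beach at tree"
    # Each pair is (word the student intended, word the software typed).
    print(compare_to_target(target, transcript))
    print(word_accuracy(target, transcript))
```

A mismatch only shows that the software heard something else. It could reflect a pronunciation issue, background noise, or a recognition error, so the list is best treated as a starting point for the student’s own checking rather than a score.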

Notes

1. Celce-Murcia et al.’s framework: a framework for pronunciation instruction that moves through five stages: description and analysis, listening discrimination, controlled practice, guided practice, and communicative practice (Celce-Murcia, Brinton, & Goodwin, 2010).
2. Focused listening: listening for specific pronunciation features during authentic listening (e.g. to videos)
3. Covert rehearsal: private speaking while monitoring pronunciation
4. ASR activities: these included minimal pairs, reading a dialogue, and freely speaking while focusing on target sounds (these are available in the article’s appendix)

References

Celce-Murcia, M., Brinton, D., & Goodwin, J. (2010). Teaching pronunciation (2nd ed.). Cambridge: Cambridge University Press.

Hincks, R. (2003). Speech technologies for pronunciation feedback and evaluation. ReCALL, 15(1), 3-20.

Neri, A., Cucchiarini, C., & Strik, H. (2006). ASR-based corrective feedback on pronunciation: Does it really work? In Proceedings of the ISCA Interspeech 2006, Pittsburgh, PA (pp. 1982-1985).

Neri, A., Mich, O., Gerosa, M., & Giuliani, D. (2008). The effectiveness of computer assisted pronunciation training for foreign language learning by children. Computer Assisted Language Learning, 21(5), 393-408.

Anthony Schmidt

English language Instructor at University of Tennessee, Knoxville

Anthony Schmidt is editor of ELT Research Bites. He also has his own blog at anthonyteacher.com. Offline, he is a full-time English language instructor in a university IEP program. He is interested in all aspects of applied linguistics, in particular English for Academic Purposes.

3 thoughts on “Pronunciation, Autonomy, and Automatic Speech Recognition”

David Oberlin, Vice President Global Education, Carnegie Speech says:

February 2, 2017 at 4:49 am

Anthony, I am grateful for your contribution to this field. I have been working for 10 years in the realm of speech-recognition and language learning for Carnegie Speech, and your article confirms what I have observed with thousands of English language learners. Thank you for this article.
Deborah Cordier PhD says:

January 30, 2017 at 11:08 am

Anthony,

Thanks for presenting this research. The qualitative findings of my dissertation work (2009) supported this approach to extracting the highest learner FL pronunciation value from ASR. I am happy to see ASR has resurfaced here with a focus on its practical use.

Again, thanks.

Anthony Schmidt says:

January 30, 2017 at 8:32 pm

Thanks for reading! For your dissertation, what ASR software did you use? How was it implemented?
