The value of aviation English rater training

Since ICAO adopted the revised Standards and Recommended Practices in Annex 1 of the Chicago Convention in 2003, the ICAO Rating Scale has been in use for the assessment of aviation language proficiency for pilot and ATCO licensing worldwide. Each year, thousands of personnel undergo a wide variety of aviation English tests, and their performances are assessed by hundreds of raters working for airlines, ANSPs, ATOs, universities, colleges and commercial aviation English test providers. For this global endeavour to meet its objective - safe radio communications - not only do test instruments need to be fit for purpose, but the personnel administering them also need to be adequately trained.

Recognising the important role that raters play in aviation language testing, ICAO Document 9835 calls for some 40 hours of initial training, and annual recurrent training of between 24 and 40 hours. Accordingly, many aviation authorities have established national regulations for rater training, with some stipulating training courses provided by a foreign entity. Each year, in addition to refresher training for existing raters, dozens of English language and aviation professionals enter the field of aviation English driven by curiosity, interest and a desire to develop and diversify their professional knowledge and skills for future employment, with rater training seen as a clear route to entry. This combination of national regulations and individual motivations has created a demand for generic aviation English rater training courses, a demand which is met by a number of providers offering courses of varied shape and form. As rater training is largely unregulated, and as there is little guidance on what an aviation English rater training course should include, this article briefly explores this niche area of activity, suggests content that training courses should cover, and evaluates the usefulness of generic rater training courses.

The ICAO Language Proficiency Requirements (LPRs) cover the language of all aeronautical radiotelephony communication, calling for adherence to standard phraseology in the first instance, and the availability of plain English when phraseology does not suffice. As the LPRs introduced a universal level of language proficiency for all personnel who use the radiotelephone in international operations, it is tempting to believe that the Rating Scale and the language of aeronautical communication are universal too. This is true, but only in part.

On one hand, the work of the rater is inextricably connected to the test instrument. Language performances vary from test to test, task to task, and accordingly, the guidelines that Test Service Providers develop for the use of the Rating Scale vary too. All raters (and to an extent, aviation English teachers as well!) require background aviation language assessment literacy (this is where good generic rater training courses work well), but this doesn’t mean that rating skills are transferable from test to test. Just like pilots who require type rating and controllers who require unit endorsements, raters require test-specific training on the instrument to be used.

On the other hand, while the ICAO Rating Scale covers all users of the radiotelephone, the language of radio communications is not as homogeneous as one might assume. The ‘aeronautical community’ represents a kaleidoscope of people, cultures and first language backgrounds with diverse roles and levels of expertise, operating different equipment in airspace with varied structure and procedures. Consider, for example, the diversity that exists between an Omani Search and Rescue helicopter pilot, a Brazilian B777 first officer, a Japanese student doing a PPL course in the USA, a Turkish drone operator, a Chinese radar approach controller and an Irish oceanic controller. True: all have to meet the same standard of language proficiency for international operations, but given the myriad contexts in which language is used on the radio, the language that constitutes ICAO level 4 varies from user to user and region to region.

To take a musical analogy, an orchestra achieves harmony from the coordination of dozens of musicians playing different parts on strings, brass, woodwind and percussion. Each musician must meet standards for musicianship for harmony to be achieved. The same can be said in aviation communications: the Rating Scale provides the standard, but the standard is applicable to different users in different contexts, contexts which need to be well-understood by both the test developer and the rater for assessment to be meaningful. Indeed, ICAO Document 9835 states that ‘raters must ... understand the criteria and the context in which the criteria occur’1. Thus, an important stop on any rater training journey is a thorough exploration of the varied characteristics of the Target Language Use (TLU) domain, in our case, aeronautical radiotelephony communications.

Varying TLU domains also need to be considered in the relationship between test design and the Rating Scale. Tasks elicit a performance. The performance is assessed by the rater with reference to the scale. But for rating to be meaningful, good tasks are a prerequisite. Let's return to our musical analogy. To achieve harmony, the orchestra comprises different instruments playing specific parts. There’s little use instructing the violins to play the clarinet part or the trumpets to play the timpani - discord would ensue! Similarly, in radio communications, although pilots and ATCOs share two ends of the radio frequency, the parts they play differ. Likewise, the language that a recreational PPL holder uses differs from that of an airline captain. These differences need to be captured in the design of test tasks so that they elicit role-relevant performances which enable meaningful assessment in accordance with the Rating Scale. Quality in language testing is a thread which runs from TLU to task to assessment, and an examination of this thread is a critical component in any rater training course. All raters need to understand the nature of the performance that different tasks elicit. A rater’s work is only as good as the sample they work with. Task design comes first.

Then we move to the ICAO Rating Scale itself. Notoriously vague in its wording, the Rating Scale is full of peculiarities which require careful analysis and definition. Let’s take some descriptors as examples:

  • Pronunciation ... rarely interfere[s] with ease of understanding
  • Basic grammatical structures ... are consistently well controlled
  • Vocabulary range and accuracy are sufficient to communicate effectively
  • Produces stretches of language at an appropriate tempo
  • Comprehension is accurate on common, concrete, and work related topics
  • Responses are usually immediate, appropriate, and informative

What do the qualifiers ‘rarely’, ‘usually’ and ‘consistently’ mean? What is ‘a stretch of language’? What do we mean by ‘range and accuracy’? Definitions from a linguistics point of view are useful, but there is more to it than that.

Although ICAO Document 9835 states that the Rating Scale ‘has a distinct aeronautical radiotelephony focus’2, the descriptors above lack what language testers refer to as ‘explicitness’. In other words, they could be readily interpreted in any professional setting, from the deck of an ocean liner to the hospital theatre or even the hairdresser’s salon! Therefore, rater training not only needs to explore what the descriptors mean from a linguistics perspective, but what they mean specifically for pilot-controller communication. For measurement to be meaningful, the Rating Scale requires domain-specific interpretation.

The core component of a rater training course is, as one would expect, lots of opportunity to practise applying the six criteria of the scale to spoken performances at a variety of levels. This is typically presented initially through standard setting exercises, possibly with reference to the ICAO Rated Speech Samples Training Aid3, before moving on to ‘table-top’ and individual rating exercises with plenty of time for facilitated analysis and discussion. As we know, rating quality is inextricably linked to the quality of the performance itself, so the samples used for training purposes are front-and-centre. As discussed in a previous blog post, raters need to work with performances which are representative of radiotelephony communications, both in terms of what the test-takers listen to and what they say.

Group rating exercises generate a lot of ‘rating data’, and as participants progress through training and begin to produce more consistent ratings, we can look at this data and what it means. Given the fundamental importance of reliability in language assessment, a useful rater training course will provide a practical introduction to the use of statistics for measuring rater performance, adding value by developing the skills that raters need to implement procedures for rater monitoring within their own assessment systems once training is complete.
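To give a flavour of the statistics involved, here is a minimal sketch. The rater names and scores below are invented for illustration, not real training data; the sketch shows two common measures of inter-rater reliability on the six-level ICAO scale: exact agreement, and quadratically weighted Cohen’s kappa, a chance-corrected measure which gives partial credit for near-misses on an ordinal scale.

```python
from collections import Counter

# Hypothetical ICAO levels (1-6) awarded by two trainee raters
# to the same ten recorded performances.
rater_a = [4, 4, 3, 5, 4, 3, 4, 5, 2, 4]
rater_b = [4, 3, 3, 5, 4, 4, 4, 5, 3, 4]

def exact_agreement(r1, r2):
    """Proportion of performances given the same level by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def weighted_kappa(r1, r2, levels=range(1, 7)):
    """Quadratically weighted Cohen's kappa: agreement corrected for
    chance, penalising large disagreements more than adjacent ones."""
    n = len(r1)
    # Observed disagreement, weighted by squared distance between levels.
    observed = sum((a - b) ** 2 for a, b in zip(r1, r2)) / n
    # Disagreement expected by chance from each rater's marginal totals.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[a] * c2[b] * (a - b) ** 2
                   for a in levels for b in levels) / (n * n)
    return 1 - observed / expected

print(f"exact agreement: {exact_agreement(rater_a, rater_b):.2f}")  # 0.70
print(f"weighted kappa:  {weighted_kappa(rater_a, rater_b):.2f}")   # 0.76
```

In operational monitoring one would of course use established statistical tools rather than hand-rolled code, but the principle is the same: track whether each rater agrees with their peers (and with benchmark ratings) consistently over time, and flag drift for retraining.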

Although the ICAO LPRs have been in place for nearly two decades, their implementation has been uneven, and not always successful. The social, political, economic and regulatory challenges are many and varied. Research over the years has highlighted problems not only with the policy and its implementation, but with the Rating Scale itself, and research continues to lead to a more mature understanding of what aeronautical communication is and how it should be tested. More recently, new guidance material on the design of aviation language tests has been published, further assisting the community in the move towards higher standards. No rater training course would be complete without a critical examination of the current status of implementation of the LPRs, an introduction to test quality (validity and reliability) and an overview of current perspectives and guidance on best practice in the field. Rater training should be more than a box-ticking exercise; it should add value by raising awareness of issues in aviation English testing and developing the broader language assessment literacy of course participants.

On a final note (and here the musical analogy concludes!) a word of caution: course participants will naturally look for a certificate as proof of completion of training for presentation to regulators and prospective employers. However, certificates need to be worded carefully by course providers, and interpreted carefully by course participants and their sponsors and regulators. Generic rater training courses can offer real learning value to those with a regulatory requirement to fulfil or those stepping into aviation English assessment for the first time. Nevertheless, generic rater training course certificates should not be presented by course providers, nor perceived by stakeholders, as a licence or approval to rate any and all language proficiency for personnel licensing. For this very high-stakes assessment activity which has a direct impact on the careers of individuals and on flight safety, we need to keep in mind that a rater without a carefully-constructed test instrument is as useful as a pilot without an aircraft. As ICAEA’s Code of Professional Practice4 sets out:

Rater or interlocutor training, even when based on the ICAO LPR Rating Scale, needs to be tailored to a specific test instrument. Test takers interact differently with different test tasks on different tests. How tests produce samples of language for rating purposes also differs. Raters and interlocutors need specific training on how to deliver and rate the test they are required to administer.

Henry Emery is Latitude’s Managing Director. He was project manager for the development of the English Test for Aviation for flight crew, which was the first test in the world to receive an endorsement from ICAO in 20125. He also chaired the project to develop the ICAO Rated Speech Samples Training Aid for the International Civil Aviation English Association.

For information on Latitude’s 40-hour blended online Rater and test developer training course, click here or email courses@latitude-aes.aero.


1 ICAO Document 9835 Manual on the Implementation of Language Proficiency Requirements section 2.5

2 ibid. section 4.5.5 (b)

3 https://cfapps.icao.int/RSSTA/

4 https://www.icaea.aero/about/code-of-professional-practice/

5 The English Test for Aviation held a conditional ICAO endorsement from 2012-2013.