***All technical information was sourced from the PTE Academic Automated Scoring White Paper (ctfassets.net); all opinions expressed are purely my own and contain no malicious intent.***
Why? I will explain soon, but let’s learn a bit more about how PTE Academic Speaking and Writing are assessed. The IEA (Intelligent Essay Assessor) can apparently be tuned to understand and evaluate text in ANY subject area, and it includes built-in detectors for off-topic responses or other situations that may need to be referred to human readers.
How is it used in PTE Academic Writing? Responses to the 200–300-word essays and 50–70-word summaries are submitted to the IEA. It evaluates the meaning of the response as well as the mechanical aspects of the writing. It then COMPARES the response with a large set of training responses, computes similarities and assigns a score based on content. Grammar, structure and coherence are also compared against the large set of training responses. Then the response is ranked.
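To make that comparison step a little more concrete, here is a minimal sketch of how similarity-based content scoring could work in principle: a new essay is compared against a bank of pre-scored training essays and inherits a weighted average of the scores of its nearest neighbours. This is purely my own illustration, assuming TF-IDF vectors and cosine similarity; the function and variable names are hypothetical and this is not Pearson’s actual IEA code.

```python
# Minimal sketch: similarity-based content scoring against pre-scored training essays.
# Purely illustrative; NOT Pearson's IEA implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_by_similarity(response, training_essays, human_scores, k=10):
    """Return a content score as a similarity-weighted average of the human scores
    given to the k training essays most similar to the new response."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(training_essays + [response])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()  # similarity to each training essay
    top = np.argsort(sims)[-k:]                                # indices of the k most similar essays
    weights = sims[top] / sims[top].sum()                      # normalise similarities to weights
    return float(np.dot(weights, np.array(human_scores)[top]))

# Toy usage: score = score_by_similarity(new_essay, essay_bank, score_bank)
```

The real IEA is reported to build on latent semantic analysis rather than raw TF-IDF, but the underlying idea of scoring a response by its proximity to already-rated responses is the same.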
How did they “train the IEA”? More than 50,000 written responses were collected for the training of the IEA and scored first by two HUMAN raters (and then by a third if the first two did not agree). The scores from the human raters were then used as INPUT FOR TRAINING the IEA. Pearson states that the reliability of the measure of writing in PTE Academic is 0.89. What does this mean? It means that the scores given by the IEA are very reliable (a score of 0.89 is considered good), so we can safely assume that test takers are being marked as fairly and accurately as test takers who are marked by humans. My anecdotal experience leads me to believe this is not the case.
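For context, reliability coefficients like 0.89 are correlation-style statistics reported on a 0–1 scale. The white paper does not spell out here exactly how the figure was computed, but as a purely illustrative example of the kind of calculation involved, agreement between two sets of scores (say, human and machine) can be measured like this, using made-up numbers:

```python
# Illustrative only: agreement between two sets of scores on made-up data.
# The 0.89 figure comes from Pearson's own (much larger) dataset and method.
import numpy as np

human   = np.array([65, 72, 58, 80, 90, 61, 77, 69])   # hypothetical human-rater scores
machine = np.array([63, 74, 60, 79, 88, 64, 75, 71])   # hypothetical automated scores

r = np.corrcoef(human, machine)[0, 1]                  # Pearson correlation coefficient
print(f"human-machine agreement: r = {r:.2f}")
```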
What about PTE Academic Speaking? Another fancy name, the Pearson Ordinate technology, is used to score PTE Academic Speaking.
Ordinate tech is the result of years of research in speech recognition, statistical modelling, linguistics, and testing theory. It is designed to analyse and automatically score speech from NATIVE and NON-NATIVE speakers of English.
How did they “teach” the Ordinate tech to score spoken language? The PTE Academic white paper explains it this way: first, a HUMAN gave the Ordinate tech a list of things to listen to. Then the human scored a speech sample and shared the parameters he/she used to score it with the Ordinate tech. This was done by many expert scorers (humans), who fed scores from hundreds of test takers into the Ordinate tech. The speech processor for PTE Academic has been trained on more than 126 different accents and can deal with all accents equally – or so they say. Independent studies have proven that Ordinate’s automated scoring system can be more objective and dependable than “many” of the best human-rated tests, including one-on-one proficiency interviews.
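In machine-learning terms, that training process boils down to fitting a model that maps measurable properties of a recording to the scores expert human raters gave it, so the model can later reproduce those judgements on new responses. Here is a heavily simplified sketch of that idea; the feature names, numbers and choice of a ridge regression are my own assumptions, not Ordinate’s actual (proprietary) pipeline.

```python
# Simplified sketch of "teaching" an automated scorer from human-rated speech samples.
# Features, numbers and model choice are hypothetical; Ordinate's real system is proprietary.
import numpy as np
from sklearn.linear_model import Ridge

# Each row: [words_per_second, pause_ratio, pronunciation_accuracy, intonation_range]
# extracted from one test taker's recorded response (toy values).
features = np.array([
    [3.1, 0.12, 0.91, 0.45],
    [2.4, 0.30, 0.70, 0.20],
    [3.6, 0.08, 0.95, 0.60],
    [2.0, 0.40, 0.55, 0.15],
])
human_scores = np.array([78, 55, 88, 40])             # scores assigned by expert human raters

model = Ridge(alpha=1.0).fit(features, human_scores)  # learn how the humans weighted the features
new_response = np.array([[2.8, 0.18, 0.82, 0.35]])
print("predicted score:", round(model.predict(new_response)[0], 1))
```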
The Ordinate scoring system collects information from test takers’ spoken responses (e.g. pace, timing and rhythm, vocal power, emphasis, intonation, and pronunciation accuracy). Apparently, it is able to recognise the words that speakers select EVEN IF THEY ARE MISPRONOUNCED and evaluates the content, relevance and coherence of the response. My experience teaching? NOPE. If you don’t say a word exactly the way it needs to be said, the technology just WILL NOT DETECT THE WORD YOU HAVE SAID!
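The claim about recognising mispronounced words usually comes down to some tolerance threshold when matching what the recogniser heard against the words it expected, and my classroom experience suggests that tolerance is narrower than advertised. Here is a crude, entirely hypothetical illustration of the idea (and of why a word can fall outside the tolerance and simply go undetected):

```python
# Toy illustration of tolerant word matching: a heard (possibly mispronounced) token is
# accepted only if it is "close enough" to the expected word. Not Ordinate's actual method.
from difflib import SequenceMatcher

def matches(expected: str, heard: str, threshold: float = 0.75) -> bool:
    """Accept the heard token if its string similarity to the expected word clears the threshold."""
    return SequenceMatcher(None, expected.lower(), heard.lower()).ratio() >= threshold

print(matches("environment", "enviroment"))   # True  - a small slip is still recognised
print(matches("environment", "invairmint"))   # False - too far off: the word is "not detected"
```

Real recognisers work on acoustic and phonetic models rather than spelling, but the threshold behaviour is the same: drift too far from the expected form and the word is never credited.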
Why the need for AUTOMATED SCORING?
Yes! You guessed it. Automated, AI scoring was created to deliver IMPARTIAL, JUDGEMENT-FREE and BIAS-FREE SCORING. A direct quote from the PTE white paper: “This means that the system is not “distracted” by language-irrelevant factors such as a test taker’s appearance, personality or body language (as can happen in spoken interview tests). Such impartiality means that test takers can be confident that they are being judged solely on their language performance, and stakeholders can be confident that a test taker’s scores are “generalizable” – that they would have earned the same score if the test had been administered in Beijing, Brussels or Bermuda.”
Well… as “generalizable” as it may seem, it has produced some rather distasteful tricks and unwanted habits amongst test takers.
My final thoughts
I know this must seem like an attack on Pearson and PTE Academic, but it really isn’t. I love the language, I appreciate the need for proficiency tests, and I see the role that artificial intelligence like KAT, IEA and the Ordinate tech plays in the future of language testing, but I will always stand by this one, true fact: all language is learnt for the sake of effective communication, and there is no true communication without the human element – the English language is no exception. I do not believe for a second that IEA and Ordinate tech, at their current level, are effectively contributing to an increase in communicative skills.
My anecdotal experience teaching has shown that candidates will do almost anything to convince the software that they are effective communicators, when quite often they are not at the level the AI pegs them at.
Using templates, mindlessly regurgitating rote-learned vocabulary, rushing through question types they believe do not contribute heavily to their overall scores, and focusing purely on obtaining the score they need instead of improving their general command of English is NOT what Pearson should be condoning.
What can be done? Combine AI marking with human marking. Instead of simply claiming that the test “fits a critical gap by providing a state-of-the-art test that accurately measures the English language speaking, listening, reading and writing abilities of non-native speakers”, Pearson needs to come up with a way to guarantee (especially to all bodies that accept results from PTE Academic) that a test taker’s command of English is sufficient for TRUE, MEANINGFUL and EFFECTIVE COMMUNICATION.
If you don’t agree or would like to express your opinions, you’re welcome to do so. I would love to learn about other teachers’ experiences and would even love to hear from someone within Pearson!