AI-Assisted (Pre-TP) Feedback on the CELTA Course
- Mohamed Oummih
Can you Teach a Language Speaker to Speak (or a Teacher to Teach)?
If you were to ask a Cambridge CELTA trainee how much support they want to help them succeed on the course, you could expect the following answer: as much as possible. And with reason: the CELTA is a notoriously challenging course. As challenging as it is objectively, though, it can feel even more so subjectively. In spite of a consistently low fail rate (across all parts of the world), the course is perceived as particularly stressful because of what feels like a shift in the way candidates are expected to learn.
Indeed, most candidates come from an academic background, where the ability to process and repackage knowledge is the measure of success; the CELTA runs on an entirely different premise, wherein success is measured by the candidate’s ability to practice a craft. The analogy I prefer is the one most language teachers will understand instinctively: it’s possible (and doubtless useful) for a language learner to understand the grammar of the language they’re trying to learn, but actually learning the language requires risk-taking, experimentation and learning from mistakes.
The Trainer’s Moment to Pounce
The part of the CELTA course where candidates (are expected to) make mistakes the most is Teaching Practice (TP). This is when trainees teach real language students while being observed; candidates then receive detailed oral and written feedback from their tutor.
Before each TP, candidates submit a detailed lesson plan: their lesson aims, stage aims, materials, procedures, anticipated problems and solutions, and more. These plans represent the CELTA candidate’s initial efforts to think like a teacher, and like any piece of writing, there is a strong and definite feedback loop: the candidate shapes the plan, and is in turn shaped by the effort of writing it. This feedback loop is at the heart of the experiential learning paradigm, which itself is the cornerstone of how the Cambridge CELTA was designed.
The planning process on the CELTA follows these three steps:
· The candidate receives “TP points,” which are documents detailing the lesson materials (usually specific pages and exercises from the course book), the lesson focus (e.g., writing, grammar, etc.), and relevant information about staging, interaction patterns, and problems and solutions that may come about in this lesson. As the course progresses, the detail in these documents decreases, until by the end of the course candidates are capable of planning without any TP points at all.
· Candidates come to “assisted lesson planning” (ALP) having closely read their TP points. The trainer helps the trainee to stage and plan the lesson in a discussion that usually lasts between 10 and 15 minutes. While the focus of the discussion is each candidate’s individual lesson, it is held in the company of all the trainees in that TP group, so that everyone can benefit from everyone else’s planning process (in addition to their own).
· The candidate goes away and writes their plan. On a part-time course where trainees teach once per week, they typically have one full week to write and submit their lesson plan.
Given this set-up, if further support were to be provided, when exactly would be the best moment for the trainer to pounce? I believe that it’s between ALP and submission of the plan, mainly because this is where the trainee is on their own for the longest period of time, with the highest stakes. Once the lesson plan is submitted, assessment for that lesson has begun.
The Experiential Paradigm vs the Supportive Trainer
This does not, however, answer the question of how much further support the trainee should get.
The experiential training paradigm which I mentioned before holds that you learn by doing, even (especially?) when what you’re doing is wrong. The trainee makes mistakes, understands through feedback, and tries again. To switch metaphors: how many rotten tomatoes is too many before the trainer intervenes? Can we really teach a clown how to stumble and fall effectively? Or can a clown only gauge their effectiveness through their interaction with the audience?
My personal belief is that more support is always better than less, but with a caveat: that support should be provided in a way that safeguards the trainee’s ability to make decisions for themselves, and thereby own both the lesson they are going to teach and the mistakes they are going to make. The aim isn’t to remove the possibility of making mistakes; it’s to help candidates deepen their understanding of the mechanisms at play, thereby improving their ability to learn from the mistakes they are going to make.
It was the desire to provide candidates with this extra support, without undermining the experiential paradigm at the CELTA’s foundation, that led me to design and build (with the help of Claude) the EfA CELTA TP Plan Reviewer.
A Pre-Teaching Plan Review: The Right Tool at the Right Time
The TP Plan Reviewer is a web application that gives candidates structured feedback on their lesson plans before they teach. It is the trainee’s opportunity to check which aspects of their plan are working well, and which might need improving. The app does this by reading the plan and laying out its feedback in four columns:

· Area: The aspect of the plan being referenced, including the CELTA syllabus assessment criteria relevant to that area.
· What the plan shows: A description of what the lesson plan says.
· Principles to consider: Tenets of communicative language teaching methodology that will support the candidate in rewriting this part of the plan.
· Why it matters: The theory behind this action point.
The above is preceded and followed by the plan’s strengths. This is the first thing trainees see after uploading their plan:

[screenshot: the plan’s strengths, listed at the top of the results page]

And this is at the bottom of the page:

[screenshot: the plan’s strengths, repeated at the bottom of the page]
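For the technically curious, one way to picture the structure behind that table is a simple record per row, with the strengths framing the rows on either side. This is only an illustrative sketch; the field names are my own shorthand, not the app’s actual code.

```python
from dataclasses import dataclass

@dataclass
class FeedbackRow:
    """One row of the review table (field names are illustrative only)."""
    area: str                    # aspect of the plan, e.g. stage aims or interaction patterns
    criteria: list[str]          # the CELTA syllabus assessment criteria relevant to that area
    what_the_plan_shows: str     # a description of what the lesson plan says
    principles_to_consider: str  # CLT principles to support rewriting this part of the plan
    why_it_matters: str          # the theory behind the action point

@dataclass
class PlanReview:
    """A full review: the action points framed by the plan's strengths."""
    opening_strengths: list[str]
    rows: list[FeedbackRow]
    closing_strengths: list[str]
```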
Not too Late, Not too Early, Just on (Part-) Time
There are pitfalls that I tried to avoid when designing this tool. I wanted to make sure candidates didn’t use it the night before TP and then spend all night re-writing their plan, thus leading to a great plan but poor performance due to lack of sleep (if you’re a trainer, you know that this happens infuriatingly often).
To prevent this from happening, each candidate has a login, and the TP rota (the document that says when each candidate will teach) is uploaded for each course. The tool only accepts a plan up to 48 hours before that candidate teaches; any later than that, and it refuses. We don’t want trainee teachers to use this tool too late.
And we don’t want them to use it too early. Using the tool too early in the course would provide candidates with feedback they won’t be able to act upon, because they haven’t yet become familiar with the terminology, much less the concepts, used during feedback. At TP 1, for example, much of the feedback would be useless to most candidates. That is why the tool is only available from TP 2 onwards.
Finally, it is designed to be used only on part-time courses. This goes back to the fact that candidates need time to act upon the feedback the tool provides. On intensive courses there are often just 48 hours between ALP and TP, and the tool only works on full lesson plans, which in that case are generally completed less than 24 hours before candidates teach. Getting feedback at that stage can lead to panicked and often counter-productive re-writing of the plan, not to mention an over-focus on the plan. Being convinced of the excellence of their plan is the bane of the inexperienced teacher, because it reduces their wariness of where the real problems occur: in the classroom.
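For readers who like to see the moving parts, the three gates above can be pictured roughly as follows. This is a minimal sketch under my own assumptions about how the rota might be stored; the function and field names are hypothetical, not the app’s actual code.

```python
from datetime import datetime, timedelta

# Hypothetical shape for one entry of the uploaded TP rota; the real app's
# data model is almost certainly different.
ROTA = {
    "trainee@example.com": {"tp_number": 3, "tp_start": datetime(2025, 11, 14, 18, 0)},
}

def can_request_review(email: str, course_is_part_time: bool, now: datetime) -> tuple[bool, str]:
    """Apply the three gates: part-time courses only, TP 2 onwards,
    and no later than 48 hours before the candidate teaches."""
    if not course_is_part_time:
        return False, "The reviewer is only available on part-time courses."
    slot = ROTA.get(email)
    if slot is None:
        return False, "You are not on the TP rota for this course."
    if slot["tp_number"] < 2:
        return False, "The reviewer opens from TP 2 onwards."
    if now > slot["tp_start"] - timedelta(hours=48):
        return False, "Less than 48 hours to your TP: work from your ALP notes instead."
    return True, "Plan accepted for review."
```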
The Calibration: One Tool for One CELTA Trainer
This is the first tool I have designed using AI. I discovered that the heart of a tool like this is its system prompt: the set of instructions that tells the AI how to review a plan. Writing and refining that prompt was the most important part of the development process, and its most important ingredient is real data: actual lesson plans and actual written feedback from previous EfA courses.
In order to build the app, I uploaded numerous lesson plans from previous courses, along with my written feedback, and instructed the AI to focus only on feedback related to the plan, not to the teaching. It used this raw data to extract the principles, language, and tone I tend to use when giving candidates feedback on their lesson plans. The lesson plans were anonymized and used only to extract the principles that would go into the system prompt.
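To give a sense of what “going into the system prompt” might look like, here is a minimal sketch of a prompt assembled from a base set of instructions plus the distilled principles and tone. The wording and structure are my own illustration under those assumptions, not the app’s actual prompt.

```python
# A minimal sketch of how the reviewer's system prompt might be assembled.
# The wording here is illustrative, not the app's actual prompt.

BASE_INSTRUCTIONS = """\
You review CELTA lesson plans before Teaching Practice.
Comment only on what the plan shows, never on the teaching itself.
Present each action point in four columns: Area (with the relevant CELTA
assessment criteria), What the plan shows, Principles to consider,
and Why it matters. Open and close the review with the plan's strengths."""

def assemble_system_prompt(distilled_principles: list[str], tone_notes: str) -> str:
    """Combine the base instructions with the principles and tone extracted
    from anonymized plans and written feedback on previous courses."""
    principles_block = "\n".join(f"- {p}" for p in distilled_principles)
    return (
        f"{BASE_INSTRUCTIONS}\n\n"
        f"Apply these principles, drawn from the trainer's past feedback:\n"
        f"{principles_block}\n\n"
        f"Match this tone:\n{tone_notes}"
    )
```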
Right now there is only one version of the app, which I’m planning to roll out on our next CELTA course. What I’m aiming for, though, is for each of our trainers to calibrate their own app based on their own written feedback from previous courses. I think this is important because I want to avoid major discrepancies between the feedback provided by the tool and the feedback that will be provided by that particular trainer. Not all trainers focus equally on all things, nor do they use the same tone or the same approach. Consistency is important because dissonance between feedback from the tool and feedback from the trainer can lead to course participants feeling misled.
Does it Work?
I’ve never been trained in programming, and I know only as much about programming as I’ve had to learn in order to get this project done. And it was fun!
Yeah, okay—but does it work?
After using plans from previous courses to calibrate the tool, I have been testing it on other plans and comparing Claude’s feedback to my own. Right now, the results are pretty good, but still not good enough. As I’ve been testing, I’ve been adding to the tool’s system prompt, usually by editing it directly in the backend using a local API (I swear I kind of know what those terms mean).
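In case it helps to picture that testing loop, here is a rough sketch of the kind of check involved: run one held-out plan through Claude with the current system prompt and eyeball the output against the feedback I actually gave. It uses the Anthropic Python SDK; the model name and function here are illustrative, and the real backend is wired up differently.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def test_reviewer(system_prompt: str, plan_text: str, trainer_feedback: str) -> None:
    """Run one held-out lesson plan through the reviewer and print the tool's
    feedback alongside the trainer's original feedback, to spot false positives."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=2000,
        system=system_prompt,              # the prompt being adjusted between runs
        messages=[{"role": "user", "content": plan_text}],
    )
    print("=== Tool feedback ===")
    print(response.content[0].text)
    print("=== Trainer feedback ===")
    print(trainer_feedback)
```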
It’s a real kick to see how adjusting the system prompt just a bit can lead to the elimination of false positives. Indeed, so far, these have been the major weakness of the app: it makes action points out of things that aren’t at all worth focusing on, or else are just not action points at all.
To Roll-out or to Roll Not
If I don’t manage to eliminate these false positives, I will not roll the tool out, because it could do more harm than good.
But if I do manage to eliminate the false positives, I think I’ll have taken a small step towards increasing learner autonomy without compromising the experiential learning principles upon which the CELTA course is built. And, with luck, we’ll be a bit further along in answering the question we all face as trainers and teachers: how do we support without taking ownership away?
I’ll let you know!