🗣️ TeLeS: Making AI More Honest About What It Hears


🎧 What If AI Could Say, “I’m Not Sure”?

Automatic Speech Recognition (ASR) is at the heart of voice assistants, transcription tools, and smart speakers. But here’s the problem: these systems often act overconfident — even when they get things wrong.

That’s dangerous when accuracy matters — like in healthcare, law, or low-resource languages.

So this paper tackles a fundamental problem: How can we make ASR models know when they might be wrong?

The answer? A clever new score called TeLeS — a blend of time alignment + word similarity — that teaches AI to estimate how confident it really is.


🧠 The Innovation: TeLeS = Temporal + Lexeme Similarity

Most confidence estimation models use binary logic:

  • ✅ Right word → score = 1
  • ❌ Wrong word → score = 0

But that’s… too blunt. What about:

  • Minor typos? (e.g. “president” → “presidant”)
  • Timing errors?
  • Mixed accents?

TeLeS (Temporal Lexeme Similarity) solves this by assigning a score between 0 and 1 based on:

  • 🕰️ Temporal similarity: Did the spoken and predicted words occur at the same time?
  • 🔤 Lexeme similarity: How similar are the actual and predicted words in spelling (character-level overlap)?

Then it trains a separate word-level confidence model (WLC) using this fine-grained score as the training target.
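To make the two ingredients concrete, here is a minimal Python sketch of both similarity terms and one way to blend them. The edit-distance lexeme term, the interval-overlap temporal term, and the equal weighting `alpha=0.5` are illustrative assumptions, not the paper's exact formulas.

```python
def lexeme_similarity(ref: str, hyp: str) -> float:
    """Character-level similarity: 1 - normalized Levenshtein distance."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # one rolling row of the edit-distance table
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return 1.0 - dp[n] / max(m, n, 1)

def temporal_similarity(ref_span, hyp_span) -> float:
    """Overlap of the reference and predicted word's time spans (IoU)."""
    start = max(ref_span[0], hyp_span[0])
    end = min(ref_span[1], hyp_span[1])
    inter = max(0.0, end - start)
    union = (ref_span[1] - ref_span[0]) + (hyp_span[1] - hyp_span[0]) - inter
    return inter / union if union > 0 else 0.0

def teles_score(ref, hyp, ref_span, hyp_span, alpha=0.5):
    """Blend temporal and lexeme similarity (illustrative weighting)."""
    return (alpha * temporal_similarity(ref_span, hyp_span)
            + (1 - alpha) * lexeme_similarity(ref, hyp))
```

For example, "president" vs. "presidant" differ by one character out of nine, so the lexeme term is about 0.89 rather than a blunt 0.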


⚙️ How It Works (Simplified)

  1. The ASR model transcribes the input speech into a sequence of words.
  2. TeLeS aligns the ASR output with the ground truth and computes:
    • Lexeme similarity (character overlap between each predicted word and its reference word)
    • Temporal similarity (start/end time of words)
  3. These become the target scores for a new model that learns to predict how confident it should be.
  4. During testing, this confidence score helps identify:
    • What the AI thinks it got right
    • What it’s unsure about
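The alignment-and-scoring step above can be sketched as follows, using Python's `difflib.SequenceMatcher` as a stand-in word aligner and a character-ratio similarity for substituted words. The paper's actual alignment procedure and score computation may differ; this only illustrates the idea of one fine-grained target per predicted word.

```python
from difflib import SequenceMatcher

def char_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for the lexeme term)."""
    return SequenceMatcher(None, a, b).ratio()

def teles_targets(ref_words, hyp_words):
    """One training target per hypothesis word: 1.0 for exact matches,
    a partial similarity for substitutions, 0.0 for spurious insertions."""
    targets = [0.0] * len(hyp_words)
    aligner = SequenceMatcher(None, ref_words, hyp_words)
    for op, i1, i2, j1, j2 in aligner.get_opcodes():
        if op == "equal":
            for j in range(j1, j2):
                targets[j] = 1.0
        elif op == "replace":
            for k, j in enumerate(range(j1, j2)):
                ref_w = ref_words[min(i1 + k, i2 - 1)]
                targets[j] = char_sim(ref_w, hyp_words[j])
        # "insert" opcodes (hypothesis words with no reference match) stay 0.0
    return targets
```

For instance, aligning ["the", "president", "spoke"] against the prediction ["the", "presidant", "spoke"] yields targets of 1.0 for the two correct words and a partial score for the near-miss in the middle.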

🚀 Bonus: TeLeS Learns Smarter with “Shrinkage Loss”

Because most words in a transcript are correct, models can become biased toward high confidence.

To fix this, the paper uses shrinkage loss:

  • Focuses on learning from hard-to-learn (wrong or borderline) examples
  • Down-weights the “too easy” (obviously correct) ones instead of letting them dominate

This makes the confidence model more balanced and robust.
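A minimal sketch of the shrinkage loss idea, following the commonly used form L = l² / (1 + exp(a·(c − l))) where l is the absolute regression error. The constants a and c below are illustrative defaults, not the paper's settings.

```python
import math

def shrinkage_loss(pred: float, target: float,
                   a: float = 10.0, c: float = 0.2) -> float:
    """Squared loss scaled by a sigmoid 'shrinkage' factor:
    small errors (easy examples) are heavily down-weighted,
    large errors (hard examples) keep nearly the full l**2 penalty.
    a controls how fast the shrinkage kicks in, c at what error level."""
    l = abs(pred - target)
    return (l * l) / (1.0 + math.exp(a * (c - l)))
```

An easy example with error 0.05 contributes far less than its plain squared loss, while a hard example with error 0.8 keeps almost all of it, which is exactly the rebalancing the blog describes.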


💡 Active Learning: TeLeS-A Knows What to Ask

The authors go a step further with TeLeS-A — an active learning system that picks:

  • 🤔 Uncertain predictions to send to human annotators
  • 🤖 Confident ones to self-label and add to training data

This human-in-the-loop setup improves the ASR over time — using TeLeS scores to guide what to learn next.
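The routing logic of such a loop can be sketched in a few lines. The confidence threshold and the use of a per-utterance mean word confidence are assumptions for illustration, not values from the paper.

```python
def partition_by_confidence(samples, threshold=0.8):
    """Route ASR outputs in an active-learning round.
    `samples` is a list of (utterance_id, hypothesis, mean_word_confidence).
    Confident hypotheses become pseudo-labels; uncertain ones go to humans."""
    pseudo_labeled, to_annotate = [], []
    for utt_id, hyp, conf in samples:
        bucket = pseudo_labeled if conf >= threshold else to_annotate
        bucket.append((utt_id, hyp))
    return pseudo_labeled, to_annotate
```

Each round, the pseudo-labeled pool and the freshly annotated pool are both added to the training data, and the ASR model is retrained.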


🌍 Tested in 3 Indian Languages

This isn’t theory — it’s been tried on real-world datasets in:

  • 🇮🇳 Hindi (Prasar Bharati and Common Voice)
  • 🏛️ Tamil (IISc-MILE)
  • 🎙️ Kannada (IISc-MILE)

And even on mismatched domains (i.e., data very different from training), TeLeS held up beautifully — better than state-of-the-art.


📊 Results: More Trustworthy Predictions

Compared to previous methods, TeLeS:

  • Had better calibration (the gap between confidence and accuracy was smaller)
  • Handled subtle errors better than binary-label approaches
  • Achieved lower Word Error Rates (WER) in active learning settings
| Method | WER ↓ | Calibration Error ↓ | Score Quality ↑ |
| --- | --- | --- | --- |
| Class-Prob | ❌ High | ❌ Poor | ⚠️ Basic |
| Entropy-Based | ❌ Mid | ⚠️ So-so | ⚠️ Inconsistent |
| Binary Labels | ⚠️ Mid | ⚠️ Inaccurate | ⚠️ Blunt scoring |
| TeLeS | ✅ Low | ✅ Well-calibrated | ✅ Fine-grained |

🧵 TL;DR: TeLeS Makes Voice AI More Honest

By fusing how closely a predicted word matches the reference text with when it was spoken, this paper introduces a confidence scoring system that helps speech recognition systems:

  • Be more aware of their own errors
  • Learn smarter, with fewer annotations
  • Work better across languages and accents

It’s a step toward trustworthy AI that knows its limits — and asks for help when it’s unsure.
