Solve Real-World Problems. Compete. Win.
Noise event detection and removal in real-world Indic speech, built for robust and inclusive speech AI.
Welcome to Datathon@IndoML 2026 - a research-oriented data science competition held in conjunction with IndoML 2026. Building on the success of previous editions, this year's datathon challenges participants to tackle noise event detection and removal in real-world Indic speech - a critical problem for inclusive, robust Automatic Speech Recognition across Indian languages.
The competition is organised into two tightly coupled tracks. Track 1 (Detection): detect noise events with precise timestamps, making effective use of data annotated at different levels. Track 2 (Removal): suppress the detected events while preserving the underlying speech. The dataset is a curated subset of the Vaani corpus: ~150 hours of real-world Indic audio annotated at three levels of quality.
Top-performing teams will be invited to attend IndoML 2026 and present their solutions to leading researchers and professionals from academia and industry. These teams will also receive exciting cash prizes.
Most academic speech enhancement benchmarks are built around stationary noise (white, pink, café hum) or studio-recorded mixtures. Field recordings from rural and semi-urban India contain bursty, semantically rich events — a passing motorbike, a hen, a pressure-cooker whistle, a doorbell, a TV in another room — that current denoisers either smear over or treat as speech.
Two consequences follow:
This challenge targets that gap directly. It asks participants to treat noise as a first-class, labelled, time-localised object — and then to suppress it without harming the speech.
The official website for Datathon@IndoML 2026 is now live. Registration details, task description, dataset, and timeline will be announced soon.
Robust, Inclusive Speech Processing for Real-World Indic Speech
Speech recordings collected in real Indian environments are dominated by non-stationary background events - vehicle horns, dogs barking, children crying, doorbells, ringtones, kitchen appliances, and devotional music. These events degrade downstream Automatic Speech Recognition (ASR).
This challenge invites participants to build a two-stage system on the Vaani dataset that (i) detects noise events with precise timestamps, and (ii) removes them while preserving the underlying speech. The challenge is framed under the Responsible AI theme, with explicit emphasis on robustness, linguistic inclusivity across multiple Indian languages, and methodological transparency.
Participants may enter either track independently or both.
Detect noise events in Indic speech recordings with precise onset/offset timestamps. Effectively utilise data annotated at different levels.
{onset: 1.24, offset: 3.81},
{onset: 4.31, offset: 4.71},
{onset: 5.04, offset: 5.41}
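The exact submission schema has not been published yet. As a minimal sketch, the events above could be serialised as follows, assuming a JSON object that maps each clip filename to a list of `onset`/`offset` pairs in seconds. The function name `build_submission`, the clip name `clip_0001.wav`, and the top-level layout are all hypothetical.

```python
import json


def build_submission(predictions):
    """Map each clip name to a time-sorted list of {"onset", "offset"} events.

    `predictions` is a dict: clip filename -> list of (onset, offset) tuples
    in seconds. The top-level layout is an assumption, not the official schema.
    """
    submission = {}
    for clip, events in predictions.items():
        cleaned = []
        for onset, offset in sorted(events):
            if offset <= onset:
                raise ValueError(f"{clip}: offset {offset} must exceed onset {onset}")
            cleaned.append({"onset": round(onset, 2), "offset": round(offset, 2)})
        submission[clip] = cleaned
    return submission


# Hypothetical clip name; the events are the three shown above, given out of order.
predictions = {"clip_0001.wav": [(4.31, 4.71), (1.24, 3.81), (5.04, 5.41)]}
submission = build_submission(predictions)
print(json.dumps(submission, indent=2))
```

Sorting by onset and rejecting zero-length or inverted events up front keeps malformed predictions out of the file before it reaches the scorer.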
Suppress the detected noise events while preserving the underlying speech signal - output clean, intelligible audio.
16 kHz mono WAV - one per test clip, original filename retained
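For illustration, an enhanced clip could be written in this format using only the Python standard library. This is a sketch under assumptions: the page specifies 16 kHz mono but not the bit depth, so 16-bit PCM is assumed here, and `write_mono_wav`, `describe_wav`, and the output filename are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def write_mono_wav(path, samples, sample_rate=16000):
    """Write float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against overflow on conversion
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit PCM (assumed bit depth)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))


def describe_wav(path):
    """Return (channels, sample rate, frame count) for a quick format check."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()


# One second of a 440 Hz tone stands in for an enhanced clip.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
out_path = os.path.join(tempfile.mkdtemp(), "clip_0001.wav")
write_mono_wav(out_path, tone)
wav_info = describe_wav(out_path)
print(wav_info)
```

Running a check like `describe_wav` over every output file before submitting is a cheap way to catch stereo or resampled clips.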
Track 1: Submit a JSON file with onset/offset events per clip.
Track 2: Submit cleaned 16 kHz mono WAV files.
Automated scoring on held-out test clips. Track 1: F1 + Dice. Track 2: SI-SDR + ΔWER, then PESQ for top-5.
Live rankings published after each submission window. Final standings adjusted by expert Novelty Score for top-5 entries.
Top teams invited to present at IndoML 2026 and receive cash prizes. Code release required for prize-eligible entries.
A prediction is correct when its temporal extent overlaps with ground truth within ±20% of event duration.
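The matching rule is not fully specified on this page, so the sketch below shows only one possible reading: a predicted event counts as correct when both its onset and its offset fall within 20% of the reference event's duration, and each reference event can be matched at most once. The function `event_f1` and the greedy matching order are assumptions, not the official scorer.

```python
def event_f1(predicted, reference, tolerance=0.2):
    """Event-level F1 under an assumed onset/offset tolerance rule.

    Events are (onset, offset) tuples in seconds. A prediction matches a
    reference event when both boundaries lie within tolerance * duration
    of the reference boundaries. Matching is greedy and one-to-one.
    """
    matched = set()
    true_positives = 0
    for p_on, p_off in predicted:
        for i, (r_on, r_off) in enumerate(reference):
            if i in matched:
                continue
            slack = tolerance * (r_off - r_on)
            if abs(p_on - r_on) <= slack and abs(p_off - r_off) <= slack:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = [(1.24, 3.81), (4.31, 4.71), (5.04, 5.41)]
predicted = [(1.30, 3.70), (4.50, 4.90), (5.00, 5.45)]
f1 = event_f1(predicted, reference)
print(round(f1, 3))
```

In this example the second prediction is shifted by 0.19 s against a 0.40 s event, which exceeds the 0.08 s slack, so two of three predictions match and precision and recall are both 2/3.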
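Temporal overlap between predicted and reference event segments: 2 · |P ∩ G| / (|P| + |G|), where P is the set of predicted segments, G is the set of ground-truth segments, and |·| denotes total duration.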
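The Dice formula can be sketched over lists of time intervals as below. It assumes that |P| and |G| are total durations after overlapping segments within each list are merged, and it treats two empty lists as perfect agreement. Both choices are assumptions, and the helper names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Total overlap between two sorted, disjoint interval lists."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def dice(predicted, reference):
    p, g = merge(predicted), merge(reference)
    denominator = total_length(p) + total_length(g)
    if denominator == 0:
        return 1.0  # assumption: no events on either side counts as agreement
    return 2 * intersection_length(p, g) / denominator


score = dice([(1.0, 3.0)], [(2.0, 4.0)])
print(score)  # 0.5: one second of overlap, two seconds on each side
```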
Scale-Invariant Signal-to-Distortion Ratio between enhanced signal and synthetic clean reference.
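For reference, a common formulation of SI-SDR projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The sketch below follows that formulation in plain Python; removing the mean first and the `eps` guard are conventional choices, not details confirmed by the organisers.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Optimal scaling of the reference so that the residual is orthogonal to it.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


# A sine reference plus an orthogonal cosine at one tenth of the amplitude.
n = 1600
clean = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
noise = [0.1 * math.cos(2 * math.pi * 10 * i / n) for i in range(n)]
enhanced = [c + d for c, d in zip(clean, noise)]
value = si_sdr(enhanced, clean)
print(round(value, 2))
```

Because the residual here has one hundredth of the reference energy, the result is 20 dB, and rescaling `enhanced` by any non-zero constant leaves it unchanged.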
A frozen multilingual Indic ASR model is run on both the noisy and the enhanced clips. ΔWER = WER_noisy − WER_enhanced. Higher is better.
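The ΔWER computation can be illustrated with a word-level edit distance. This sketch assumes whitespace tokenisation and a corpus-level WER (total errors over total reference words). The official text normalisation and aggregation are not stated here, and the transcripts below are made-up stand-ins for ASR output.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer(pairs):
    """Corpus-level WER over (reference, hypothesis) pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0


def delta_wer(references, noisy_hyps, enhanced_hyps):
    """WER on noisy audio minus WER on enhanced audio; positive means improvement."""
    return wer(zip(references, noisy_hyps)) - wer(zip(references, enhanced_hyps))


references = ["the rose is placed here for decoration"]
noisy_hyps = ["the nose is place here decoration"]      # 3 errors out of 7 words
enhanced_hyps = ["the rose is placed here decoration"]  # 1 error out of 7 words
improvement = delta_wer(references, noisy_hyps, enhanced_hyps)
print(round(improvement, 4))
```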
Perceptual Evaluation of Speech Quality - intelligibility & naturalness vs. clean references. Evaluated only for the top-5 initial entries.
This challenge is positioned under the Responsible AI track on three explicit axes:
Real-world Indic recordings - not curated studio mixtures - are the evaluation distribution. Systems are scored on actual ASR improvement (ΔWER), not signal-level metrics alone.
Vaani spans multiple Indian languages and a wide range of speakers. The eval set is monitored for language-wise and class-wise balance so no sub-population is under-represented.
Top-5 submissions must release code and document pre-trained dependencies. The frozen ASR used for ΔWER is publicly identified for independent reproducibility.
A large-scale, openly released Indic speech dataset spanning multiple Indian languages, collected across districts of India. Learn more at vaani.iisc.ac.in or read the paper.
The dataset consists of three types of annotated noise events, totalling ~150 hours of training audio. An additional 10 hours of noise events with clean timestamps will be provided for final evaluation. Effectively utilising data annotated at different quality levels is a key part of the challenge.
Sample Preview on HuggingFace

| # | Annotation Type | Duration (hrs) | Description |
|---|---|---|---|
| 1 | Clean Timestamps (🥇 Gold) | 20 | Noise events with precise timestamps where mutual agreement between multiple annotators has been verified |
| 2 | Noisy Timestamps (🥈 Silver) | 100 | Annotated noise events with timestamps, but agreement between annotators is not verified |
| 3 | No Timestamps (🥉 Bronze) | 30 | Only noise event tags present in the transcript, with no onset/offset timestamps |
| | Training Total | 150 | |
Three levels of annotation quality make up the training set: Gold (20 hours), Silver (100 hours), and Bronze (30 hours).
Clean timestamp data is the smallest subset. Effectively leveraging noisy and tag-only annotations alongside clean labels is a key part of the challenge.
Noise events with precise onset/offset timestamps where mutual agreement between multiple annotators has been verified.
[ { "category": "vehicle_traffic", "tag": "<horn>", "start": "2.714", "end": "3.761" }, { "category": "human_non_speech", "tag": "[breathing]", "start": "4.938", "end": "5.410" }, { "Verification_status": "Verified" } ]
Annotated noise events with timestamps, but agreement between annotators is not verified.
[ { "category": "vehicle_traffic", "tag": "<horn>", "start": "2.714", "end": "3.761" }, { "category": "human_non_speech", "tag": "[breathing]", "start": "4.938", "end": "5.410" } ]
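A loader for the Gold and Silver samples shown above might look like the sketch below. It assumes the layout seen in those samples: a JSON list of event objects with string-valued `start`/`end` fields, optionally followed by an object carrying `Verification_status`. The function name `parse_annotation` and the output field names are hypothetical.

```python
import json


def parse_annotation(raw):
    """Parse one annotation string into (events, verified).

    Timestamps arrive as strings in the samples, so they are converted to
    floats. A missing "Verification_status" entry is treated as unverified.
    """
    events = []
    verified = False
    for item in json.loads(raw):
        if "Verification_status" in item:
            verified = item["Verification_status"] == "Verified"
            continue
        events.append({
            "category": item["category"],
            "tag": item["tag"],
            "onset": float(item["start"]),
            "offset": float(item["end"]),
        })
    return events, verified


gold_raw = (
    '[ { "category": "vehicle_traffic", "tag": "<horn>", "start": "2.714", "end": "3.761" }, '
    '{ "category": "human_non_speech", "tag": "[breathing]", "start": "4.938", "end": "5.410" }, '
    '{ "Verification_status": "Verified" } ]'
)
gold_events, gold_verified = parse_annotation(gold_raw)
print(gold_verified, gold_events[0])
```

Carrying the `verified` flag alongside the events makes it straightforward to weight Gold and Silver examples differently during training.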
Only noise event tags present in the transcript — no onset/offset timestamps provided.
<noise> सजावट के <horn> </horn> लिए यहाँ एक गुलाब भी<horn> </horn> लगाया गया है। </noise>
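Tag-only transcripts like the one above can still supply clip-level labels. The sketch below pulls the opening tags out of a transcript and recovers the plain text, assuming tags are written in angle brackets as in the sample. Counting the outer `<noise>` wrapper as a tag is also an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

OPEN_TAG = re.compile(r"<([a-z_]+)>")   # matches <horn> but not </horn>
ANY_TAG = re.compile(r"</?[a-z_]+>")    # matches opening and closing tags


def extract_tags(transcript):
    """Count opening noise tags in a tagged transcript."""
    return Counter(OPEN_TAG.findall(transcript))


def strip_tags(transcript):
    """Remove all tags and collapse the leftover whitespace."""
    return " ".join(ANY_TAG.sub(" ", transcript).split())


transcript = "<noise> सजावट के <horn> </horn> लिए यहाँ एक गुलाब भी<horn> </horn> लगाया गया है। </noise>"
tags = extract_tags(transcript)
text = strip_tags(transcript)
print(dict(tags))
print(text)
```

Here the counter reports two `horn` events inside one `noise` span, which is enough for weak, clip-level supervision even though no timestamps are available.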
Dates for all timeline milestones: TBA
All deadlines will be at 12:00 Noon IST (Indian Standard Time).
Exciting cash prizes await top-performing teams. Detailed prize distribution will be announced along with the registration opening.
Top teams will be invited to present their solutions at IndoML 2026, in front of leading researchers from academia and industry.
An expert panel award for the most original methodological contribution: new architectures, novel training regimes, or unsupervised approaches.
Students and early-career professionals are welcome. Each team must include at least one member affiliated with an Indian university or research institution.
There is no restriction on team size. However, each participant may only join one team.
We follow an open model and open data policy. Teams may use any publicly available, closed-source, or proprietary models, along with additional data or augmentation strategies.
Evaluation criteria will be announced alongside the task description. Stay tuned!
Top-performing teams will be invited to present at IndoML 2026. Details regarding travel support will be communicated later.
A public benchmark for noise-event-aware speech enhancement on Indic audio — a gap that currently has no widely adopted dataset.
Open-sourced winning systems, raising the floor of available denoising tools for Indian-language ASR.
A reusable evaluation harness pairing event-detection metrics with downstream ΔWER on real audio.
Check out last year's edition: Datathon@IndoML 2025 - Evaluating LLM-Powered AI Tutors