Solve Real-World Problems. Compete. Win.
Noise event detection and removal in real-world Indic speech, built for robust and inclusive speech AI.
Welcome to Datathon@IndoML 2026 - a research-oriented data science competition held in conjunction with IndoML 2026. Building on the success of previous editions, this year's datathon challenges participants to tackle noise event detection and removal in real-world Indic speech - a critical problem for inclusive, robust Automatic Speech Recognition across Indian languages.
The competition is organised into two coupled tracks: Track 1 (Detection) - detect noise events with precise timestamps - and Track 2 (Removal) - suppress detected events while preserving the underlying speech. The dataset is a curated subset of the Vaani corpus, spanning ~167 hours of labelled real-world Indic audio across seven noise categories.
Top-performing teams will be invited to attend IndoML 2026 and present their solutions to leading researchers and professionals from academia and industry. These teams will also receive exciting cash prizes.
The official website for Datathon@IndoML 2026 is now live. Registration details, task description, dataset, and timeline will be announced soon.
Stay tuned for updates on registration, task details, and important dates.
Robust, Inclusive Speech Processing for Real-World Indic Speech
Speech recordings collected in real Indian environments are dominated by non-stationary background events - vehicle horns, dogs barking, children crying, doorbells, ringtones, kitchen appliances, and devotional music. These events degrade downstream Automatic Speech Recognition (ASR).
This challenge invites participants to build a two-stage system on the Vaani dataset that (i) detects noise events with precise timestamps, and (ii) removes them while preserving the underlying speech. The challenge is framed under the Responsible AI theme, with explicit emphasis on robustness, linguistic inclusivity across multiple Indian languages, and methodological transparency.
Participants may enter either track independently or both.
Detect noise events in Indic speech recordings with precise onset/offset timestamps.
{onset: 1.24, offset: 3.81},
{onset: 4.31, offset: 4.71},
{onset: 5.04, offset: 5.41}
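The exact submission schema has not been published here, so the following is only a sketch of how per-clip events like those above might be serialised. It assumes a top-level object keyed by clip filename, each holding a list of `onset`/`offset` pairs in seconds; the function name `build_submission` and the filename `clip_0001.wav` are hypothetical.

```python
import json


def build_submission(predictions):
    """Turn {clip filename: [(onset, offset), ...]} into a JSON-ready dict.

    The layout (filename -> list of events) is an assumption, not the
    official format.
    """
    submission = {}
    for clip, events in predictions.items():
        cleaned = []
        for onset, offset in sorted(events):  # chronological order
            if offset <= onset:
                raise ValueError(f"{clip}: offset {offset} must exceed onset {onset}")
            cleaned.append({"onset": round(onset, 2), "offset": round(offset, 2)})
        submission[clip] = cleaned
    return submission


# Hypothetical clip name; timestamps taken from the example events above.
example = build_submission(
    {"clip_0001.wav": [(4.31, 4.71), (1.24, 3.81), (5.04, 5.41)]}
)
print(json.dumps(example, indent=2))
```

Note that `json.dumps` quotes the keys (`"onset"`, `"offset"`), as strict JSON requires.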
Suppress the detected noise events while preserving the underlying speech signal - output clean, intelligible audio.
16 kHz mono WAV - one per test clip, original filename retained
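As a rough illustration of the output format, the sketch below writes a list of float samples as a 16 kHz mono WAV using only Python's standard library and then reads the header back to check it. The 16-bit PCM sample width is an assumption (the bit depth is not stated here), and the helper names and the temporary file path are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16_000  # 16 kHz, as required for Track 2 output


def write_mono_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as 16-bit PCM mono (bit depth assumed)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = [int(round(s * 32767)) for s in clipped]
    frames = struct.pack(f"<{len(pcm)}h", *pcm)  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def check_format(path):
    """Return (channels, sample rate, frame count) read from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()


# One second of a quiet 440 Hz tone stands in for an enhanced clip.
tone = [0.1 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
path = os.path.join(tempfile.mkdtemp(), "clip_0001.wav")  # hypothetical filename
write_mono_wav(path, tone)
info = check_format(path)
print(info)
```

Running a check like `check_format` over every file before submitting is a cheap way to catch stereo or wrongly resampled output.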
Track 1: Submit a JSON file with onset/offset events per clip.
Track 2: Submit cleaned 16 kHz mono WAV files.
Automated scoring on held-out test clips. Track 1: F1 + Dice. Track 2: SI-SDR + ΔWER, then PESQ for top-5.
Live rankings published after each submission window. Final standings adjusted by expert Novelty Score for top-5 entries.
Top teams invited to present at IndoML 2026 and receive cash prizes. Code release required for prize-eligible entries.
A prediction is correct when its temporal extent overlaps with ground truth within ±20% of event duration.
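The matching rule above leaves some details open, so the following sketch fixes one possible reading: a prediction counts as correct when both its onset and its offset lie within 20% of the reference event's duration, and each reference event can be matched at most once. This interpretation, the greedy matching, and the function names are assumptions, not the official scorer.

```python
def is_match(pred, ref, tolerance=0.2):
    """Assumed rule: onset and offset both within tolerance * reference duration."""
    collar = tolerance * (ref[1] - ref[0])
    return abs(pred[0] - ref[0]) <= collar and abs(pred[1] - ref[1]) <= collar


def event_f1(predicted, reference, tolerance=0.2):
    """Event-based F1 with greedy one-to-one matching of (onset, offset) pairs."""
    unmatched = list(reference)
    true_pos = 0
    for pred in sorted(predicted):
        for ref in unmatched:
            if is_match(pred, ref, tolerance):
                unmatched.remove(ref)  # each reference event is used once
                true_pos += 1
                break
    false_pos = len(predicted) - true_pos
    false_neg = len(reference) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 1.0


reference = [(1.24, 3.81), (4.31, 4.71), (5.04, 5.41)]
predicted = [(1.30, 3.70), (4.60, 4.90), (5.00, 5.45)]
score = event_f1(predicted, reference)
print(f"F1 = {score:.3f}")
```

In this toy case two of the three predictions fall inside the collar, so precision and recall are both 2/3. Short events are the hard part: a 0.4 s event only tolerates 0.08 s of boundary error under this reading.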
Temporal overlap between predicted (P) and reference (G) event segments: 2·|P ∩ G| / (|P| + |G|).
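A minimal sketch of this overlap score on time intervals follows. It assumes the score is computed per clip over the union of all predicted segments and the union of all reference segments; whether the official scorer aggregates per clip or per event is not stated here, and the function names are illustrative.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Total overlap between two sorted, disjoint interval lists."""
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            overlap += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def dice(predicted, reference):
    """Dice = 2 * |P intersect G| / (|P| + |G|), measured in seconds."""
    p, g = merge(predicted), merge(reference)
    denom = total_length(p) + total_length(g)
    return 2 * intersection_length(p, g) / denom if denom else 1.0


# Prediction covers 1-3 s, reference covers 2-4 s: 1 s overlap out of 2 s + 2 s.
value = dice([(1.0, 3.0)], [(2.0, 4.0)])
print(value)
```

Unlike the event-level match, this score degrades smoothly as boundaries drift, so it rewards partially correct segments.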
Scale-Invariant Signal-to-Distortion Ratio between enhanced signal and synthetic clean reference.
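For reference, SI-SDR is commonly computed by projecting the zero-mean estimate onto the zero-mean reference and comparing the energy of that projection with the energy of the residual. The pure-Python sketch below follows that standard definition; it is not the organisers' scoring code, and the synthetic sine signals are purely illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    n = len(reference)
    if len(estimate) != n or n == 0:
        raise ValueError("estimate and reference must be non-empty and equal length")
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    alpha = _dot(e, r) / (_dot(r, r) + eps)       # optimal scaling of the reference
    target = [alpha * x for x in r]               # part of the estimate explained by it
    residual = [x - t for x, t in zip(e, target)]  # everything else counts as distortion
    return 10 * math.log10((_dot(target, target) + eps) / (_dot(residual, residual) + eps))


N = 1000
clean = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
noise = [0.1 * math.sin(2 * math.pi * 50 * t / N) for t in range(N)]
enhanced = [c + d for c, d in zip(clean, noise)]

score = si_sdr(enhanced, clean)
rescaled = si_sdr([0.5 * x for x in enhanced], clean)
print(f"SI-SDR = {score:.2f} dB, after rescaling = {rescaled:.2f} dB")
```

The residual here has one hundredth of the target's energy, so the score is about 20 dB, and halving the estimate's gain leaves it unchanged, which is the point of the scale-invariant form.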
A frozen multilingual Indic ASR is run on both noisy and enhanced clips. ΔWER = WER_noisy - WER_enhanced. Higher is better.
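To make the arithmetic concrete, the sketch below computes word error rate as word-level edit distance divided by reference length, then takes the difference between the noisy and enhanced transcripts. The transcripts are invented English placeholders, the ASR step itself is not shown, and any text normalisation the official scorer applies is unknown.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    prev = list(range(len(hyp) + 1))  # edit distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def delta_wer(reference, noisy_hypothesis, enhanced_hypothesis):
    """Positive values mean the enhanced audio was transcribed more accurately."""
    return (word_error_rate(reference, noisy_hypothesis)
            - word_error_rate(reference, enhanced_hypothesis))


# Invented transcripts standing in for ASR output on one clip.
truth = "the train leaves at nine"
from_noisy = "the rain leaves nine"          # one substitution, one deletion
from_enhanced = "the train leaves at night"  # one substitution

improvement = delta_wer(truth, from_noisy, from_enhanced)
print(f"Delta WER = {improvement:.2f}")
```

Here WER drops from 2/5 on the noisy clip to 1/5 on the enhanced clip. A negative value would mean the enhancement removed speech along with the noise.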
Perceptual Evaluation of Speech Quality - intelligibility & naturalness vs. clean references. Evaluated only for the top-5 initial entries.
This challenge is positioned under the Responsible AI track on three explicit axes:
Real-world Indic recordings - not curated studio mixtures - are the evaluation distribution. Systems are scored on actual ASR improvement (ΔWER), not signal-level metrics alone.
Vaani spans multiple Indian languages and a wide range of speakers. The eval set is monitored for language-wise and class-wise balance so no sub-population is under-represented.
Top-5 submissions must release code and document pre-trained dependencies. The frozen ASR used for ΔWER is publicly identified for independent reproducibility.
A large-scale, openly released Indic speech dataset spanning multiple Indian languages, collected across districts of India. Learn more at vaani.iisc.ac.in or read the paper. The challenge subset provides ~167 hours of training audio across seven noise-event categories, with a ~16.7-hour evaluation set (proportional to training distribution, ~10% of each category).
Annotations include onset/offset timestamps and category labels. The training set is imbalanced and reflects the natural frequency of events in real Vaani recordings - Animal, Vehicle/Traffic, and Human Non-Speech (~40 hrs each) dominate, while Appliance/Machine is comparatively rare (~1.89 hrs). Handling this long-tailed distribution is part of the task.
Sample Preview on HuggingFace

| # | Category | Example Events | Train (hrs) | Eval (hrs) |
|---|---|---|---|---|
| 1 | Animal | Barking, mooing, bird chirps, insect noise, cat, hen, goat | ~40 | ~4 |
| 2 | Vehicle / Traffic | Horns, engines, motorbikes, sirens, train, general traffic | ~40 | ~4 |
| 3 | Baby / Child | Crying, babbling, yelling, playing, child laughter | ~24.2 | ~2.42 |
| 4 | Singing / Music | Background music, singing, instruments, prayer, devotional | ~13.4 | ~1.34 |
| 5 | Phone / Signal / Alarm | Ringtones, beeps, alarms, sirens, bells, doorbells | ~7.98 | ~0.798 |
| 6 | Appliance / Machine | Fans, mixers, TVs, mics, typing, clocks, machinery | ~1.89 | ~0.189 |
| 7 | Human Non-Speech | Breathing, lip smacks, coughs, sneezes, snoring, throat clearing | ~40 | ~4 |
| | Total | | ~167.47 | ~16.7 |
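One common way to approach a long-tailed distribution like the one in the table is to weight each class inversely to its share of the training data. The sketch below derives such weights from the approximate training hours listed above. It is only one possible strategy, not something the challenge prescribes, and hours are a rough proxy for the number of training examples.

```python
# Approximate training hours per category, copied from the table above.
TRAIN_HOURS = {
    "Animal": 40.0,
    "Vehicle / Traffic": 40.0,
    "Baby / Child": 24.2,
    "Singing / Music": 13.4,
    "Phone / Signal / Alarm": 7.98,
    "Appliance / Machine": 1.89,
    "Human Non-Speech": 40.0,
}

total_hours = sum(TRAIN_HOURS.values())

# Inverse-frequency weights, scaled so a perfectly balanced class would get 1.0.
class_weights = {
    name: total_hours / (len(TRAIN_HOURS) * hours)
    for name, hours in TRAIN_HOURS.items()
}

for name, weight in sorted(class_weights.items(), key=lambda item: item[1]):
    print(f"{name:<24} {TRAIN_HOURS[name]:>6.2f} h   weight {weight:.2f}")
```

With raw inverse frequency the rarest class gets a weight more than twenty times that of the largest classes. In practice such weights are often softened, for example by taking a square root, to avoid overfitting the few rare examples.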
Seven noise-event categories are annotated with onset/offset timestamps. The training set is intentionally imbalanced: Appliance/Machine (~1.89 hrs) is the rarest class, and handling this long tail is part of the challenge.
[Chart: long-tailed distribution of training hours across the seven categories]
Animal: Barking, mooing, bird calls & chirps, cricket/insect noise, cat meows, hen, rooster, goats
Vehicle / Traffic: Horns, engines, motorcycles, cars, trains, ambulance sirens, general traffic noise
Baby / Child: Crying, babbling, yelling, screaming, laughing, playing
Singing / Music: Background music, singing, instruments, whistling melodies, drums, flute, prayer/devotional music
Phone / Signal / Alarm: Phone ringing, ringtones, beeps, vibrations, alarms, sirens, buzzers, bells, doorbells, notifications
Appliance / Machine: Fans, mixers, TVs, mics/loudspeakers, typing, clocks, washing machines, generic machinery
Human Non-Speech: Breathing, lip smacks, coughs, sneezes, yawns, throat-clearing, snoring, hiccups
Important dates: TBA
All deadlines will be at 12:00 Noon IST (Indian Standard Time).
Exciting cash prizes await top-performing teams. Detailed prize distribution will be announced along with the registration opening.
Top teams will be invited to present their solutions at IndoML 2026, in front of leading researchers from academia and industry.
An expert panel award for the most original methodological contribution: new architectures, novel training regimes, or unsupervised approaches.
Students and early-career professionals are welcome. Each team must include at least one member affiliated with an Indian university or research institution.
There is no restriction on team size. However, each participant may only join one team.
We follow an open model and open data policy. Teams may use any publicly available, closed-source, or proprietary models, along with additional data or augmentation strategies.
Evaluation criteria will be announced alongside the task description. Stay tuned!
Top-performing teams will be invited to present at IndoML 2026. Details regarding travel support will be communicated later.
Check out last year's edition: Datathon@IndoML 2025 - Evaluating LLM-Powered AI Tutors