HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition
Published in Findings of EACL, 2026
HiKE is the first publicly available hierarchical evaluation framework for Korean-English code-switching (CS) speech recognition, filling a gap in multilingual ASR evaluation.
HiKE provides high-quality, natural code-switched speech data with loanword-aware annotations and a hierarchical labeling scheme spanning word-, phrase-, and sentence-level code-switching. This structure enables systematic and fine-grained analysis of how ASR models handle distinct linguistic manifestations of code-switching.
Key highlights:
- Introduces hierarchical CS-level annotations (word / phrase / sentence) for precise ASR evaluation.
- Includes loanword labeling, enabling targeted analysis of lexical and phonetic transfer effects.
- Benchmarks diverse multilingual ASR models, revealing consistently weak zero-shot CS-ASR performance.
- Demonstrates that fine-tuning with synthetic code-switched data substantially improves CS-ASR capability.
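To illustrate how level-wise labels can support fine-grained evaluation, the sketch below groups ASR outputs by CS level and reports an aggregate error rate per level. It is a minimal illustration, not HiKE's actual evaluation code: the field names (`cs_level`, `reference`, `hypothesis`), the whitespace-token edit-distance metric, and the three toy utterances are all assumptions made for this example and are not taken from the benchmark.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate_by_level(samples):
    """Total edits divided by total reference tokens, per CS level."""
    edits = defaultdict(int)
    tokens = defaultdict(int)
    for s in samples:
        ref = s["reference"].split()
        hyp = s["hypothesis"].split()
        edits[s["cs_level"]] += edit_distance(ref, hyp)
        tokens[s["cs_level"]] += len(ref)
    return {lvl: edits[lvl] / tokens[lvl] for lvl in edits if tokens[lvl] > 0}


# Toy, made-up utterances (hypothetical; not drawn from HiKE).
samples = [
    {"cs_level": "word",
     "reference": "오늘 meeting 몇 시야",
     "hypothesis": "오늘 미팅 몇 시야"},
    {"cs_level": "phrase",
     "reference": "그건 out of scope 인 것 같아",
     "hypothesis": "그건 out of scope 인 것 같아"},
    {"cs_level": "sentence",
     "reference": "I will send it tomorrow 그때 확인해 줘",
     "hypothesis": "I will send it tomorrow 그때 확인해"},
]

print(error_rate_by_level(samples))
# {'word': 0.25, 'phrase': 0.0, 'sentence': 0.125}
```

In the word-level toy example, the model transcribes the English word "meeting" as its Korean transliteration "미팅", which counts as a substitution under this metric. Loanword labels could be used to decide whether such a transliteration should be penalized, though how HiKE handles that case is not specified here.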
Recommended citation: Gio Paik, Yongbeom Kim, Soungmin Lee, Sangmin Ahn, Chanwoo Kim. (2026). "HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition." Findings of EACL 2026.
