HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Published in Findings of EACL, 2026

HiKE is the first publicly available, hierarchical evaluation framework for Korean-English code-switching (CS) speech recognition, addressing a longstanding gap in multilingual ASR evaluation.

HiKE provides high-quality, natural code-switched speech data with loanword-aware annotations and a hierarchical labeling scheme spanning word-, phrase-, and sentence-level code-switching. This structure enables systematic and fine-grained analysis of how ASR models handle distinct linguistic manifestations of code-switching.

Key highlights:

  • Introduces hierarchical CS-level annotations (word / phrase / sentence) for precise ASR evaluation.
  • Includes loanword labeling, enabling targeted analysis of lexical and phonetic transfer effects.
  • Benchmarks diverse multilingual ASR models, revealing consistently weak zero-shot CS-ASR performance.
  • Demonstrates that fine-tuning with synthetic code-switched data substantially improves CS-ASR capability.
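
To make the idea of level-wise evaluation concrete, the sketch below groups ASR outputs by CS level and computes one error rate per group. It is only an illustration under stated assumptions: the field names (`cs_level`, `reference`, `hypothesis`), the whitespace tokenization, the plain word-level edit distance, and the toy utterances are all hypothetical and are not taken from the HiKE data format or from the metric used in the paper.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate_by_level(samples):
    """Aggregate token errors per CS level and divide by reference length.

    Each sample is a dict with hypothetical keys:
    'cs_level', 'reference', 'hypothesis'.
    """
    errors = defaultdict(int)
    ref_tokens = defaultdict(int)
    for s in samples:
        ref = s["reference"].split()   # assumption: whitespace tokenization
        hyp = s["hypothesis"].split()
        errors[s["cs_level"]] += edit_distance(ref, hyp)
        ref_tokens[s["cs_level"]] += len(ref)
    return {lvl: errors[lvl] / ref_tokens[lvl]
            for lvl in errors if ref_tokens[lvl] > 0}


# Toy utterances invented for illustration only (not from the benchmark).
samples = [
    {"cs_level": "word",
     "reference": "오늘 meeting 몇 시야",
     "hypothesis": "오늘 미팅 몇 시야"},
    {"cs_level": "word",
     "reference": "그 file 보내줘",
     "hypothesis": "그 file 보내줘"},
    {"cs_level": "sentence",
     "reference": "나 지금 가 see you soon",
     "hypothesis": "나 지금 가 see you"},
]

rates = error_rate_by_level(samples)
for level, rate in sorted(rates.items()):
    print(f"{level}: {rate:.3f}")
```

Pooling errors and reference tokens within each level, rather than averaging per-utterance rates, keeps short utterances from dominating a level's score. The first toy sample also shows why loanword labels matter: a transliterated output such as "미팅" for "meeting" counts as an error under naive string matching, which is the kind of case a loanword-aware annotation lets an evaluator treat separately.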

Recommended citation: Gio Paik, Yongbeom Kim, Soungmin Lee, Sangmin Ahn, Chanwoo Kim. (2026). "HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition." Findings of EACL 2026.