Description
Current pruning methods for large language models (LLMs) achieve high compression post-training while preserving performance. However, most existing work calibrates on English data, despite the multilingual nature of modern LLMs and their widespread use in non-English languages. This poster presents the first study of how the calibration language impacts pruning for multilingual LLMs in monolingual applications. The analysis spans multiple languages, tasks, models, and pruning techniques, with further examination of latent subspaces, pruning masks, and individual neurons. The results reveal a critical limitation: while target-language calibration preserves general language modeling capabilities (perplexity), it does not reliably improve downstream task performance involving reasoning or knowledge retrieval. This is because pruning preserves language-specific features influenced by calibration while uniformly impairing the language-agnostic representations associated with higher-level capabilities.
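To make the role of the calibration language concrete, below is a minimal sketch of one common calibration-dependent pruning criterion (a Wanda-style magnitude-times-activation score); the poster does not specify its exact methods here, and all names, shapes, and the toy data are illustrative assumptions.

```python
# Sketch of calibration-dependent pruning for one linear layer.
# Importance score = |weight| * L2 norm of the calibration activations
# (Wanda-style); the calibration language enters through those activations.
import numpy as np

def prune_layer(weight: np.ndarray, calib_acts: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the lowest-scoring weights of a (out_features, in_features) matrix.

    calib_acts: (num_tokens, in_features) activations gathered by running
                calibration text (e.g., English vs. target-language) through the model.
    sparsity:   fraction of weights to remove per output row, e.g., 0.5.
    """
    # Per-input-feature activation norm from the calibration data; this is
    # where the choice of calibration language influences the pruning mask.
    act_norm = np.linalg.norm(calib_acts, axis=0)        # (in_features,)
    scores = np.abs(weight) * act_norm                   # importance of each weight
    # Row-wise threshold: drop the k lowest-scoring weights in each output row.
    k = int(weight.shape[1] * sparsity)
    cutoff = np.partition(scores, k, axis=1)[:, k:k + 1]
    mask = scores >= cutoff
    return weight * mask

# Toy usage: the same weights pruned with different calibration activations
# can produce different masks, which is the effect the poster analyzes.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
acts_en = rng.normal(size=(128, 16))    # stand-in for English calibration text
acts_tgt = rng.normal(size=(128, 16))   # stand-in for target-language calibration text
changed = (prune_layer(W, acts_en, 0.5) != 0) ^ (prune_layer(W, acts_tgt, 0.5) != 0)
print("weights whose pruning decision changed:", int(changed.sum()))
```

In this sketch, only the mask construction depends on the calibration data, which is why different calibration languages can preserve different language-specific features while the rest of the pruning procedure stays identical.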