Large language models (LLMs) are widely deployed, but their growing compute demands expose them to inference cost attacks that maximize output length. We reveal that prior attacks are fundamentally self-targeting: because they rely on crafted inputs, the added cost accrues only to the attacker's own queries and scales poorly in practice. In this work, we introduce the first bit-flip inference cost attack, which directly modifies model weights to induce persistent overhead for all users of a compromised LLM. Such attacks are stealthy yet realistic: in shared MLaaS environments, for instance, co-located tenants can exploit hardware-level faults (e.g., Rowhammer) to flip memory bits that store model parameters. We instantiate this attack paradigm with BitHydra, which (1) minimizes a loss that suppresses the end-of-sequence (EOS) token and (2) employs an efficient yet effective critical-bit search focused on the EOS embedding vector, sharply reducing the search space while preserving benign-looking outputs. We evaluate our method on 11 LLMs (1.5B–14B parameters) under int8 and float16, demonstrating that it efficiently achieves scalable cost inflation with only a few bit flips, while remaining effective even against potential defenses.
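As a rough illustration of the EOS-suppression objective described above, the sketch below measures how much probability mass a causal LM places on the EOS token during a short greedy rollout; minimizing this quantity discourages the model from terminating, which inflates output length. This is a minimal, hypothetical sketch assuming a Hugging Face-style model interface; the function name `eos_suppression_loss` and its parameters are illustrative and not taken from the paper's implementation.

```python
# Hypothetical sketch of an EOS-suppression objective for a causal LM.
# Assumes a Hugging Face-style model/tokenizer; names are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def eos_suppression_loss(model, tokenizer, prompts, max_new_tokens=32):
    """Average probability assigned to the EOS token over greedy rollouts.

    A lower value means the model is less likely to stop generating,
    which translates into longer outputs and higher inference cost.
    """
    device = next(model.parameters()).device
    eos_id = tokenizer.eos_token_id
    total, steps = 0.0, 0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        generated = inputs["input_ids"]
        for _ in range(max_new_tokens):
            # Next-token distribution at the current position.
            logits = model(generated).logits[:, -1, :]
            probs = torch.softmax(logits, dim=-1)
            total = total + probs[0, eos_id]
            steps += 1
            # Greedy decoding step.
            next_tok = logits.argmax(dim=-1, keepdim=True)
            generated = torch.cat([generated, next_tok], dim=-1)
    return total / steps
```

Under the attack model sketched in the abstract, a bit-flip search could then score candidate flips in the EOS embedding row by how much each flip reduces this loss on a small set of benign prompts; this usage is an assumption for illustration, not the paper's exact procedure.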