Yes Opus 4.7 Burns Tokens in German. Half the Story Is Hype. | Aravind Arumugam
ai · 6 min read
A post on r/ClaudeAI blew up this week. A Pro user ran the same finance prompt in English and German. The English run used 37 per cent of his Opus 4.7 session limit. The German run hit 100 per cent in seconds. People started calling it a stealth downgrade.
I read the whole thread. Then I read Anthropic's release notes. Then I read the academic papers everyone is citing without linking. Here is what I actually found.
The OP is half right. The half he got wrong is the more interesting story.
What is actually happening
Two things are stacked here and both are real.
Thing one. The new 4.7 tokenizer eats more tokens.
Anthropic shipped a new tokenizer with Opus 4.7 on 16 April 2026. They did not hide it. The official release notes say the new tokenizer typically uses 12 to 18 per cent more tokens than the 4.6 tokenizer on English prose, and up to 35 per cent more on some content types. Pricing stayed the same. So your per-task cost goes up unless you adjust. MindStudio's review and Caylent's deep dive both confirm this. Not sneaky. In the release notes.
Thing two. The language tax. Bigger and older than 4.7.
Tokenizers compress English well because English dominates training data. Other languages do not get the same compression. This has a name. It is called the "language tax" and it has been studied for years.
Tamil, my first language, usually sits between Russian and Greek on token-inflation charts. I have watched LLMs eat my Tamil prompts since 2021, quietly costing me more tokens than the English equivalent. This is not news to me. It is news to most of LinkedIn.
Stack 4.7's thirstier baseline on top of the German language tax. Add adaptive thinking mode and the heavy Excel output the OP was asking for. A 37 per cent English session becoming 100 per cent in German is exactly what you would expect.
What the Reddit post got wrong
Three things I want to call out plainly.
One. It is not sneaky. Anthropic mentioned the tokenizer change in their release notes. The language tax has been studied since 2023. Calling this a stealth downgrade is just not reading the docs. I do not love when "gotcha" framing distracts from a real problem.
Two. It is not Opus 4.7-specific. Every LLM has this. GPT, Gemini, Llama, Mistral. Run the same test on any of them in Tamil or Burmese and the ratios are worse than the German example in the Reddit screenshot. Programmer Raja's post from last year walks through exactly this with the title "How Your LLM Costs 5X More If You Don't Speak English". I would have liked to see that linked in the thread.
Three. Non-English actually got better in places. This is the part nobody is sharing and it is the most interesting bit. The new 4.7 tokenizer uses a denser vocabulary for non-Latin scripts. MindStudio and Caylent both measured 20 to 35 per cent token count REDUCTION for Mandarin, Japanese, Korean, Arabic and Hindi. If your product serves users in those languages, 4.7 is a meaningful cost cut.
The catch is real though. German, French and Spanish, the Latin-script European languages, did not get the same upgrade. They still pay the old language tax and now sit on top of a thirstier English baseline. Worst of both worlds.
The OP's frustration was valid. The framing was off.
What I think you should do
Five things. No conspiracy required.
Measure your actual cost per language. Pick five real prompts from your product. Run them on 4.7 in your top three user languages. Compare to 4.6. Log everything. Your finance team will care soon if you have a multilingual user base. I have not seen one team in the last six months that has actually done this audit.
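The audit itself is a small amount of code. Here is a minimal sketch of the comparison step. The token counts below are made up for illustration; in practice you would fill them in from your provider's token-counting endpoint for each prompt, language and model version.

```python
# Per-language cost audit sketch. Numbers are illustrative placeholders,
# not real measurements.

def audit(counts):
    """counts: {(prompt_id, language): {"old": int, "new": int}}.
    Returns the average new/old token ratio per language."""
    totals = {}
    for (prompt_id, lang), c in counts.items():
        old_sum, new_sum = totals.get(lang, (0, 0))
        totals[lang] = (old_sum + c["old"], new_sum + c["new"])
    return {lang: round(new / old, 2) for lang, (old, new) in totals.items()}

# Hypothetical measurements for one prompt in three languages.
counts = {
    ("invoice_summary", "en"): {"old": 1200, "new": 1380},  # modest increase
    ("invoice_summary", "de"): {"old": 1900, "new": 2200},  # tax plus new baseline
    ("invoice_summary", "hi"): {"old": 2600, "new": 1800},  # denser vocab helps
}
print(audit(counts))  # {'en': 1.15, 'de': 1.16, 'hi': 0.69}
```

A ratio above 1.0 is a language that got more expensive on the new version; below 1.0 is a language that got cheaper. Run it across your top languages and the routing decision below writes itself.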
For complex tasks, prompt in English even if your user is German. I know it feels wrong. You lose some cultural nuance, you gain predictable cost. For internal tools especially, this is the answer the Reddit thread eventually landed on too.
Cap your thinking budget. Opus 4.7 reasons longer when you let it. Set a thinking budget cap when you do not need 30 seconds of internal reasoning per call. Anthropic added the parameter for exactly this reason.
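In request terms, that cap is one field. A sketch, assuming the shape of Anthropic's extended-thinking parameter (`thinking` with a `budget_tokens` field); the model id string here is a placeholder, not a confirmed identifier:

```python
# Build request parameters with an optional thinking budget cap.

def build_request(prompt, *, thinking_cap=None, max_tokens=4096):
    params = {
        "model": "claude-opus-4-7",  # placeholder model id
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_cap is not None:
        # Only pay for long internal reasoning when the task needs it.
        params["thinking"] = {"type": "enabled", "budget_tokens": thinking_cap}
    return params

simple = build_request("Summarise this invoice.")                 # no extended thinking
hard = build_request("Reconcile these ledgers.", thinking_cap=2000)  # capped reasoning
```

The point is that the cap is per call, so a cheap classification request and a hard reconciliation request do not have to share one setting.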
Route by language and use case. If you serve Mandarin, Japanese, Korean, Arabic or Hindi users, send them to 4.7. They actively benefit. If you serve German, French or Spanish, stay on 4.6 for now. If you do web research-heavy work, also stay on 4.6 because that is where 4.7 regressed according to MindStudio.
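That routing rule is simple enough to sketch directly. The model ids are placeholders and the language sets mirror the measurements cited above; substitute your own audit numbers before relying on it.

```python
# Minimal language-and-use-case router.

DENSER_ON_NEW = {"zh", "ja", "ko", "ar", "hi"}  # non-Latin scripts that got cheaper
STAY_ON_OLD = {"de", "fr", "es"}                # Latin-script European languages

def pick_model(language, web_research=False):
    if web_research:
        return "opus-4.6"   # 4.7 reportedly regressed on web research
    if language in DENSER_ON_NEW:
        return "opus-4.7"   # actively cheaper on the new tokenizer
    if language in STAY_ON_OLD:
        return "opus-4.6"   # avoid tax plus thirstier baseline
    return "opus-4.7"       # default to the current release

print(pick_model("hi"))                     # opus-4.7
print(pick_model("de"))                     # opus-4.6
print(pick_model("hi", web_research=True))  # opus-4.6
```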
Be honest with your customers. If you sell a flat per-day message limit, non-English users hit the cap sooner. That is quietly charging them more for the same product. Either calibrate the limit by language or be upfront about it. They will figure it out eventually. Trust costs less to keep than to rebuild.
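One concrete way to be upfront: derive the effective per-language message allowance from the shared token budget and show users that number, instead of advertising a single cap that only holds for English. A sketch with illustrative multipliers, not measured ones:

```python
# Effective daily message allowance implied by a shared token budget.
# Multipliers are illustrative averages (tokens per message vs English).

TOKEN_MULTIPLIER = {"en": 1.0, "de": 1.6, "my": 4.5}

def effective_messages(daily_token_budget, avg_english_tokens, language):
    per_message = avg_english_tokens * TOKEN_MULTIPLIER.get(language, 1.0)
    return int(daily_token_budget // per_message)

budget, avg_en = 100_000, 2_000  # hypothetical plan numbers
for lang in ("en", "de", "my"):
    print(lang, effective_messages(budget, avg_en, lang))
# en 50 / de 31 / my 11
```

Showing "31 messages today" to a German user is less comfortable than showing "50", but it beats the user discovering the gap from a hard cutoff.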
Why I am writing this
18+ years in tech, the last few in security. I started by soldering ISRO satellite PCBs. I have watched this exact pattern play out before. Web character encoding bias. Unicode handling. Mobile keyboard support for non-English scripts. International phone number parsing. Speech recognition for non-English languages. Every single one of these went through the same arc. English-first infrastructure ships, the world routes around it, the providers catch up later.
The LLM language tax is the next round of the same story. The 4.7 release is actually evidence that Anthropic is taking the non-Latin script gap seriously. The Mandarin and Hindi improvements are a real cost cut for users in those markets. The Latin-script European gap will close in a future release. The Burmese 15x ratio will probably take another two or three model generations to compress.
In the meantime, the practical question is whether your team is measuring the cost asymmetry honestly or hoping nobody notices. The Reddit post made the problem visible for German speakers. The question for you is whether you find out from a customer ticket or from your own dashboard.
What is your team doing about per-language AI cost?