Same token ≠ same experts
Hold the emitted token id fixed at a repeated anchor: the experts that generate it still separate task identity, trajectory history, and reasoning-effort mode — under 5% cross-problem and 1% cross-effort routing leakage.
Fudan University · Shanghai Innovation Institute
MoE routing as signal for reasoning control
One token is just one door — what matters is which experts light up behind it.
The one idea
Multi-rollout reasoning bets that correct paths recur. But aggregation happens over a visible answer string that the controller must extract, canonicalize, and vote over. That works for short answers and is ill-defined for code or open-ended outputs. Sparse Mixture-of-Experts (MoE) models expose another signal: which experts the router engaged to produce each token. Routing Agreement Decoding (RAD) reads that signal, not the text.
Near answer-opening delimiter anchors, routing neighborhoods line up with final-answer basins — already at a marker-only readout, and strongest in a short answer-opening window.
RAD returns the center of the densest Weighted-Jaccard k-NN route basin. It never parses, normalizes, executes, or votes over answer strings; the chosen answer is read only after the routing-only decision.
What's behind the door
To produce each token, a sparse MoE router engages a few experts with routing weights — a record of which experts drove that token's generation, produced online from the hidden state, never decoded to text.
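As a rough illustration of what such a routing record looks like, the sketch below picks the top-k experts from a vector of router logits and renormalizes their weights. The name `route_token`, the `top_k` value, and the softmax-over-selected-experts normalization are assumptions made for illustration; real MoE models differ in these details.

```python
import math

def route_token(router_logits, top_k=4):
    """Return {expert_index: weight} for the top_k experts of one token.

    Assumption: the softmax is taken over the selected logits only,
    which is one common convention, not necessarily the models' own.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)[:top_k]
    peak = max(router_logits[e] for e in chosen)  # subtract for stability
    exp = {e: math.exp(router_logits[e] - peak) for e in chosen}
    total = sum(exp.values())
    return {e: v / total for e, v in exp.items()}

# Hypothetical logits for a 6-expert layer; two experts are engaged.
record = route_token([0.1, 2.0, -1.0, 1.5, 0.3, 0.0], top_k=2)
```

The resulting `record` is sparse: it names only the engaged experts and the mass each one received, which is the kind of signal the rest of the pipeline compares.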
We align rollouts at repeated anchor token ids and compare routing at those fixed positions. Even with the emitted token id held fixed, the generating experts retain task family, problem identity, and reasoning-budget mode. The anchor token can be the same while the route that produced it is not.
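A minimal sketch of this alignment, assuming each rollout is stored as a list of token ids plus a parallel list of sparse routing records (`{expert_index: weight}`). The helper names and the toy data are hypothetical.

```python
def anchor_positions(token_ids, anchor_id):
    """Indices at which the anchor token id was emitted."""
    return [i for i, tok in enumerate(token_ids) if tok == anchor_id]

def experts_at_anchor(token_ids, routes, anchor_id):
    """Expert sets engaged at each occurrence of the anchor token."""
    return [frozenset(routes[i])
            for i in anchor_positions(token_ids, anchor_id)]

# Hypothetical toy rollouts: token id 7 plays the role of the anchor.
rollout_a = {"tokens": [3, 7, 5],
             "routes": [{0: 1.0}, {1: 0.6, 2: 0.4}, {0: 1.0}]}
rollout_b = {"tokens": [7, 9],
             "routes": [{4: 0.7, 5: 0.3}, {0: 1.0}]}

at_a = experts_at_anchor(rollout_a["tokens"], rollout_a["routes"], anchor_id=7)
at_b = experts_at_anchor(rollout_b["tokens"], rollout_b["routes"], anchor_id=7)

# Same emitted token id, different generating experts.
same_token_different_route = at_a[0] != at_b[0]
```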
How RAD works
Average the sparse MoE routing vectors over a fixed W=16 window beginning at the answer-opening delimiter.
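The averaging step might look like the sketch below. It assumes routing records are sparse `{expert_index: weight}` dicts over a single flattened expert index, and that a window cut short by the end of the rollout is averaged over the tokens that exist. Neither detail is specified here, and `window_signature` is a hypothetical name.

```python
def window_signature(routes, anchor_pos, num_experts, window=16):
    """Mean dense routing vector over routes[anchor_pos : anchor_pos + window]."""
    span = routes[anchor_pos:anchor_pos + window]
    if not span:
        raise ValueError("anchor window is empty")
    signature = [0.0] * num_experts
    for record in span:
        for expert, weight in record.items():
            signature[expert] += weight
    # Assumption: divide by the tokens actually present, not by `window`.
    return [s / len(span) for s in signature]

# Toy example with 3 experts and a window of 2 tokens.
toy_routes = [{0: 1.0}, {1: 1.0}, {2: 1.0}]
sig = window_signature(toy_routes, anchor_pos=0, num_experts=3, window=2)
```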
Pairwise Weighted-Jaccard (WJ) similarities between the averaged routing vectors form an N×N consensus matrix over the N rollouts. WJ preserves routed expert mass rather than binarizing which experts fired.
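For reference, the standard weighted Jaccard similarity of two non-negative vectors is the sum of elementwise minima divided by the sum of elementwise maxima. The sketch below applies that definition to build the matrix; the function names are illustrative, and the zero-vector convention is an assumption.

```python
def weighted_jaccard(a, b):
    """sum(min(a_i, b_i)) / sum(max(a_i, b_i)) for non-negative vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0 else 0.0  # assumption: two zero vectors -> 0

def consensus_matrix(signatures):
    """N x N matrix of pairwise weighted Jaccard similarities."""
    n = len(signatures)
    return [[weighted_jaccard(signatures[i], signatures[j])
             for j in range(n)] for i in range(n)]

# Both rollouts fire experts 0 and 1, so a binarized Jaccard would be 1.0,
# but the routed mass is distributed very differently.
u, v = [0.9, 0.1], [0.1, 0.9]
sim = consensus_matrix([u, v])
```

In this toy case the off-diagonal entry is about 0.11 rather than 1.0, which is the difference between comparing routed mass and comparing which experts fired.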
Each rollout's local agreement density is the summed similarity to its k=10 nearest routes. RAD returns the densest rollout: a representative of one high-mass basin, not a global medoid pulled between basins.
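The selection rule can be sketched as follows, given a precomputed similarity matrix. Excluding self-similarity, capping k at N−1, and breaking ties by lowest index are assumptions of this sketch, and `densest_rollout` is a hypothetical name.

```python
def densest_rollout(sim, k=10):
    """Index of the rollout with the largest summed similarity
    to its k most similar other rollouts."""
    n = len(sim)
    k = min(k, n - 1)  # assumption: cap k when there are few rollouts
    best, best_density = 0, float("-inf")
    for i in range(n):
        # Assumption: a rollout is not counted as its own neighbour.
        neighbours = sorted((sim[i][j] for j in range(n) if j != i),
                            reverse=True)[:k]
        density = sum(neighbours)
        if density > best_density:  # ties keep the lowest index
            best, best_density = i, density
    return best

# Toy matrix: rollouts 0-2 form one basin, rollouts 3-4 a smaller one.
basin = [0, 0, 0, 1, 1]
toy_sim = [[1.0 if i == j else (0.9 if basin[i] == basin[j] else 0.1)
            for j in range(5)] for i in range(5)]
choice = densest_rollout(toy_sim, k=2)
```

With k=2, every member of the larger basin has density 1.8 while members of the smaller basin reach only 1.0, so the selector returns a rollout from the larger basin.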
The selector may inspect token ids only to locate anchors, and routing only inside the fixed window. It uses no correctness labels, execution results, majority labels, or answer-token contents. Majority counts answer strings; RAD aggregates routing agreement.
Where it lands
Problem-weighted accuracy on the well-posed math + GPQA pool, reading no answer strings. The edge is small and not statistically significant (RAD−Majority +0.28 pp, McNemar p=0.52; RAD+DC−Majority +0.57 pp, p=0.18, n=3180).
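For readers unfamiliar with the test, a two-sided exact McNemar p-value depends only on the two discordant counts: problems one selector gets right and the other gets wrong, and vice versa. Which variant of the test was used and the actual discordant counts are not given here, so the sketch below uses the exact binomial form and purely hypothetical counts.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, not taken from the reported results.
p_balanced = mcnemar_exact_p(3, 3)   # disagreements split evenly
p_one_sided = mcnemar_exact_p(0, 5)  # all disagreements favour one selector
```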
Raw code outputs are near-unique strings, so an exact-string Majority collapses to singletons (“—”). The same selector still returns a direct pass@1 — and the strict marker-only readout, conditioning on no answer-region tokens, already adds +4.0 pp over the random floor.
Candidates are multi-step trajectories whose outputs are code patches scored pass/fail by a hidden test suite. Re-anchored at the thinking→action boundary, RAD improves best-of-16 selection by +4.7 / +4.0 / +5.4 pp over random on the decidable subset — across three agentic backbones, no per-model tuning.
RAD's lift over chance is a clean S-curve in how many rollouts are correct: below chance when correct answers are a minority, large-positive once they are a majority. It tracks the most populated route basin, not truth — a dense wrong basin can win, exactly as textual Majority can follow a dense wrong consensus.
Evaluated across 10 MoE configurations · 6 datasets · 64 rollouts / problem · 310,400 generated runs
gpt-oss (20B / 120B), Qwen3-30B-A3B, Qwen3-Next-80B-A3B · AIME24/25, BRUMO25, HMMT25, GPQA, LiveCodeBench v5
Cite
@article{chen2026rad,
title = {Does the Same Token Mean the Same State?
MoE Routing as Signal for Reasoning Control},
author = {Chen, Kang and Yu, Mingshen and Nian, Junjie and
Wang, Yaoning and Cao, Yixin and Jiang, Yugang},
year = {2026},
note = {Routing Agreement Decoding (RAD)},
url = {https://CckFdu.com/RAD},
}
Answers live in the text; consensus, in the routes.