Fudan University  ·  Shanghai Innovation Institute

Does the same token mean the same state?

MoE routing as signal for reasoning control

RAD · Routing Agreement Decoding

One token is just one door — what matters is which experts light up behind it.

Kang Chen · Mingshen Yu · Junjie Nian · Yaoning Wang · Yixin Cao · Yugang Jiang

The one idea

The answer is in the text. The agreement is in the routes.

Multi-rollout reasoning bets that correct paths recur. But aggregation happens over a visible answer string the controller must extract, canonicalize, and vote over — fine for short answers, ill-defined for code or open-ended outputs. Sparse Mixture-of-Experts models expose another signal: which experts the router engaged to produce each token. RAD reads that, not the text.

01

Same token ≠ same experts

Hold the emitted token id fixed at a repeated anchor: the experts that generate it still separate task identity, trajectory history, and reasoning-effort mode — under 5% cross-problem and 1% cross-effort routing leakage.

02

Routes cluster into answer basins

Near answer-opening delimiter anchors, routing neighborhoods line up with final-answer basins — already at a marker-only readout, and strongest in a short answer-opening window.

03

Pick the densest basin — read no strings

RAD returns the center of the densest Weighted-Jaccard K-NN route basin. It never parses, normalizes, executes, or votes over answer strings; the chosen answer is read only after the routing-only decision.

What's behind the door

A token id is not a sufficient statistic for the router state.

To produce each token, a sparse MoE router engages a few experts with routing weights — a record of which experts drove that token's generation, produced online from the hidden state, never decoded to text.
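To make the "routing record" concrete, here is a minimal sketch of how a sparse router could turn one hidden state into a routing-weight vector over experts. The top-k softmax gating, the `top_k=4` default, and the names `routing_vector`, `hidden`, and `router_weight` are assumptions for illustration; the actual gating details differ between MoE models and are not specified here.

```python
import numpy as np

def routing_vector(hidden, router_weight, top_k=4):
    """Return a sparse routing-weight vector over experts for one token.

    hidden:        (d,) hidden state at this token position
    router_weight: (num_experts, d) router projection (hypothetical name)
    Top-k softmax gating is an assumption of this sketch.
    """
    logits = router_weight @ hidden
    top = np.argsort(logits)[-top_k:]               # indices of the k largest logits
    gate = np.exp(logits[top] - logits[top].max())  # numerically stable softmax
    gate /= gate.sum()                              # renormalise over the selected experts
    r = np.zeros(router_weight.shape[0])
    r[top] = gate                                   # all other experts keep weight 0
    return r

rng = np.random.default_rng(0)
r = routing_vector(rng.normal(size=16), rng.normal(size=(32, 16)))
```

The resulting vector `r` is what gets compared across rollouts: it is computed from the hidden state alone and never passes through the decoded text.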

We align rollouts at repeated anchor token ids and compare routing at those fixed positions. Even with the emitted token id held fixed, the generating experts retain task family, problem identity, and reasoning-budget mode. The anchor token can be the same while the route that produced it is not.

  • Trajectory anchors — discourse markers (So, Now, therefore)
  • Boundary anchors — the final-response transition (</think>)
  • Delimiter anchors — what opens the answer (\boxed{ or a code fence)
t-SNE of pairwise Weighted-Jaccard routing distances at a fixed token id: generating experts cluster by problem identity and by reasoning-effort mode.
Fig. 1. Same emitted token, different generating experts. Routing histograms at one fixed token id cluster by problem (left) and by reasoning-effort regime (right).
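Aligning rollouts at an anchor only requires finding where the anchor's token ids occur in each rollout. The sketch below is one way to do that; `find_anchor` is a hypothetical helper, the token ids are made up, and taking the last match rather than the first is an assumption of this sketch.

```python
def find_anchor(token_ids, anchor_ids, last=True):
    """Return the start index of anchor_ids inside token_ids, or None.

    token_ids and anchor_ids are plain lists of ints. An anchor such as a
    delimiter may span several tokens, so it is matched as a subsequence.
    Choosing the last occurrence is an assumption, not a documented rule.
    """
    n, m = len(token_ids), len(anchor_ids)
    hits = [i for i in range(n - m + 1) if token_ids[i:i + m] == anchor_ids]
    if not hits:
        return None
    return hits[-1] if last else hits[0]

# Three toy rollouts (made-up token ids); the anchor is the single id 9.
rollouts = [[5, 9, 7, 9, 2], [9, 1], [3, 4]]
positions = [find_anchor(r, [9]) for r in rollouts]
```

Rollouts where the anchor is absent come back as `None` and would have to be skipped or handled separately before any routing comparison.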

How RAD works

Locate an anchor. Read its routing window. Select the densest basin.

RAD pipeline: 64 rollouts are aligned by token id at a fixed delimiter anchor; the routing window is averaged into a per-rollout vector; pairwise Weighted-Jaccard similarities form a consensus matrix; RAD selects the rollout with the highest K-NN agreement density.
Fig. 2. For each problem, draw N=64 rollouts; align them by token id at a fixed delimiter anchor, read over a W=16 window (delimiter@16); average that window into a vector $z_i$; build the Weighted-Jaccard consensus matrix; select the highest-density route basin. The answer string is read only once, after selection.
  1. $z_i$

    Represent each rollout by its anchor-window routing

    Average the sparse MoE routing vectors over a fixed W=16 window beginning at the answer-opening delimiter.

    $z_i = \frac{1}{|W_i|} \sum_{t \in W_i} R_i(t)$
  2. $S_{ij}$

    Compare routes with Weighted Jaccard

    Pairwise similarities form an N×N consensus matrix. WJ preserves routed expert mass rather than binarizing which experts fired.

    $\mathrm{WJ}(u,v) = \frac{\sum_e \min(u_e, v_e)}{\sum_e \max(u_e, v_e)}$
  3. $q_i$

    Select the densest K-NN route basin

    Each rollout's local agreement density is the summed similarity to its k=10 nearest routes. RAD returns the densest: a representative of one high-mass basin, not a global medoid pulled between basins.

    $q_i = \sum_{j \in \mathrm{KNN}_k(i)} S_{ij}, \quad i^\star = \arg\max_i q_i$
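The three steps above can be sketched in a few lines of NumPy. This is an illustrative reading of the formulas, not the authors' implementation: the function names are hypothetical, excluding a rollout from its own neighbour set is an assumption, and the toy routing vectors are made up.

```python
import numpy as np

def window_mean(routing, start, width=16):
    """Step 1: mean routing vector over the window starting at the anchor.
    routing is a (T, E) array of per-token routing weights; a window that
    runs past the end of the rollout is averaged over the tokens available."""
    return routing[start:start + width].mean(axis=0)

def weighted_jaccard(u, v):
    """Step 2: sum of elementwise minima over sum of elementwise maxima."""
    denom = np.maximum(u, v).sum()
    return float(np.minimum(u, v).sum() / denom) if denom > 0 else 0.0

def rad_select(Z, k=10):
    """Step 3: index of the rollout with the highest K-NN agreement density.
    Z is an (N, E) array of nonnegative window-averaged routing vectors."""
    N = len(Z)
    if N < 2:
        return 0, np.zeros(N)
    S = np.array([[weighted_jaccard(Z[i], Z[j]) for j in range(N)]
                  for i in range(N)])
    np.fill_diagonal(S, -np.inf)                 # assumption: no self-neighbour
    k = min(k, N - 1)
    q = np.sort(S, axis=1)[:, -k:].sum(axis=1)   # summed similarity to k nearest
    return int(np.argmax(q)), q

# Toy example: four rollouts share one route basin, two sit in another.
Z = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.9, 0.1],
])
best, q = rad_select(Z, k=2)
```

In this toy case the selected index falls in the four-member basin. Only after `best` is fixed would the caller look up that rollout's answer text, which keeps the decision itself independent of any answer string.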

The selector may inspect token ids only to locate anchors, and routing only inside the fixed window. It uses no correctness labels, execution results, majority labels, or answer-token contents. Majority counts answer strings; RAD aggregates routing agreement.

Where it lands

On par where voting works. Defined where voting breaks.

73.6  Majority
73.9  RAD
74.2  RAD+DC

Problem-weighted accuracy on the well-posed math + GPQA pool, reading no answer strings. The edge is small and not statistically significant (RAD−Majority +0.28 pp, McNemar p=0.52; RAD+DC−Majority +0.57 pp, p=0.18, n=3180).
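For readers unfamiliar with McNemar's test: it compares two selectors on the same problems using only the discordant pairs, that is, problems one selector gets right and the other gets wrong. The sketch below computes a two-sided exact (binomial) p-value. Which variant of the test was used for the numbers above is not stated, so the exact form is an assumption, and the counts in the example are hypothetical rather than taken from the evaluation.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant counts.

    b: problems only selector A gets right
    c: problems only selector B gets right
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only.
p_balanced = mcnemar_exact(5, 5)    # evenly split disagreements
p_one_sided = mcnemar_exact(0, 10)  # all disagreements favour one selector
```

Problems on which both selectors agree do not enter the test, which is why a small accuracy gap over thousands of problems can remain far from significance.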

Code: voting is undefined, RAD still selects

Raw code outputs are near-unique strings, so an exact-string Majority collapses to singletons (“—”). The same selector still returns a direct pass@1 — and the strict marker-only readout, conditioning on no answer-region tokens, already adds +4.0 pp over the random floor.
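The collapse of exact-string voting is easy to reproduce. In this small sketch, `exact_string_majority` is a hypothetical helper and the outputs are made up: short math answers repeat verbatim, while functionally equivalent code rarely does.

```python
from collections import Counter

def exact_string_majority(outputs):
    """Return (winner, votes), or (None, 1) when every output is unique."""
    winner, votes = Counter(outputs).most_common(1)[0]
    return (winner, votes) if votes > 1 else (None, votes)

math_outputs = ["42", "42", "17", "42"]
code_outputs = [
    "def f(x):\n    return x + 1",
    "def f(n):\n    return n + 1",   # same behaviour, different string
    "f = lambda x: x + 1",           # same behaviour, different string
]

math_vote = exact_string_majority(math_outputs)
code_vote = exact_string_majority(code_outputs)
```

Here `math_vote` names a winner with three votes, while `code_vote` has no winner because every string is a singleton, even though all three programs compute the same function.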

Agentic SWE-bench Verified: no answer string at all

Candidates are multi-step trajectories whose outputs are code patches scored pass/fail by a hidden test suite. Re-anchored at the thinking→action boundary, RAD improves best-of-16 selection by +4.7 / +4.0 / +5.4 pp over random on the decidable subset — across three agentic backbones, no per-model tuning.

A consensus signal, not a verifier

RAD's lift over chance is a clean S-curve in how many rollouts are correct: below chance when correct answers are a minority, large-positive once they are a majority. It tracks the most populated route basin, not truth — a dense wrong basin can win, exactly as textual Majority can follow a dense wrong consensus.

Evaluated across 10 MoE configurations · 6 datasets · 64 rollouts / problem · 310,400 generated runs
gpt-oss (20B / 120B), Qwen3-30B-A3B, Qwen3-Next-80B-A3B · AIME24/25, BRUMO25, HMMT25, GPQA, LiveCodeBench v5

Watch

One token, many routes — a visual essay

A silent vector essay, drawn live in your browser — no video file. Routes leave one door, drift, and condense into glowing basins; RAD selects the densest.

Cite

Reference

Kang Chen¹, Mingshen Yu¹, Junjie Nian¹, Yaoning Wang¹, Yixin Cao¹,²,†, Yugang Jiang¹

¹ Fudan University    ² Shanghai Innovation Institute    † Corresponding author

Correspondence: yxcao@fudan.edu.cn  ·  first-author contact kchen24@m.fudan.edu.cn

@article{chen2026rad,
  title   = {Does the Same Token Mean the Same State?
             MoE Routing as Signal for Reasoning Control},
  author  = {Chen, Kang and Yu, Mingshen and Nian, Junjie and
             Wang, Yaoning and Cao, Yixin and Jiang, Yugang},
  year    = {2026},
  note    = {Routing Agreement Decoding (RAD)},
  url     = {https://CckFdu.com/RAD},
}

Answers live in the text; consensus, in the routes.