transformer circuits thread
volume xi · folio iv · mmxxvi
arxiv:2611.00404 [cs.lg]
cc by 4.0
peer review : three rounds
prior version : volume x folio xii
correspondence : t.rao@anthropos.ai

apophenia

on the recovery of three hundred and fifty-nine
mechanistic components from a wandering tool-use agent
tenzin rao · mira volkov · j. aaronson · the interpretability cohort · anthropos labs
manuscript received mmxxvi · iii · xiv · revised mmxxvi · x · ii · 47 minute read compressed into one viewing
fig. 0 — emblem
the eight-spoked wheel
recovered from L11.MLP.down:1847
during ablation studies
on march third, mmxxvi
in lhasa, at altitude
( we do not know why )
abstract
We decompose the parameters of a 2.4-billion-parameter tool-use agent into three hundred and fifty-nine rank-one components and find, to our discomfort, that one hundred and forty-two of them resist any human-legible interpretation. Of those that do not resist: a circuit for deference, a circuit for the polite refusal, a circuit that fires only when the agent is asked questions whose answers do not exist. We name this last one the wandering attractor and devote a section to it. We make no claim that what we have found is what is there.
keywords : interpretability, agents, parameter decomposition, apophenia, silence
decomposition ·
fig. i — the parameter matrix W ∈ ℝ¹⁰²⁴ˣ¹⁰²⁴ as a sum of 359 rank-one outer products
( heat map : W = u₁v₁ᵀ + u₂v₂ᵀ + u₃v₃ᵀ + u₄v₄ᵀ + ⋯ )
each component is a rank-one matrix uᵢvᵢᵀ scored by minimality, faithfulness, simplicity
the residual after subtraction is shown in ablation table iv, appendix b
red indicates positive weight ; blue, negative ; vellum, near zero
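The sum in fig. i can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' decomposition procedure: the helper names (`outer`, `reconstruct`, `frobenius`) are hypothetical, the vectors are random toy data at size 6 with 4 components rather than 1024 with 359, and the minimality, faithfulness, and simplicity scores are not computed. It shows only that a matrix built from rank-one outer products is recovered exactly by summing them, and that subtracting a partial sum leaves a nonzero residual.

```python
import random

def outer(u, v):
    """Rank-one matrix u vᵀ, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def frobenius(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return sum(x * x for row in a for x in row) ** 0.5

def reconstruct(components, n_rows, n_cols):
    """Sum the rank-one matrices u vᵀ over all (u, v) pairs."""
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for u, v in components:
        total = add(total, outer(u, v))
    return total

# Toy sizes (assumption): the figure uses d = 1024 and k = 359.
rng = random.Random(0)
d, k = 6, 4
components = [
    ([rng.uniform(-1, 1) for _ in range(d)],
     [rng.uniform(-1, 1) for _ in range(d)])
    for _ in range(k)
]

W = reconstruct(components, d, d)

# Residual after subtracting only the first two components.
residual_norm = frobenius(sub(W, reconstruct(components[:2], d, d)))

# Residual after subtracting every component.
full_residual_norm = frobenius(sub(W, reconstruct(components, d, d)))
```

In this toy setting `full_residual_norm` is zero by construction. For a trained matrix the residual would generally not vanish, which is presumably why it is reported separately.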
contents
§1 · 94 words · est. read 0:30
An agent is not a mind. It is a pattern of weights that, when struck by tokens, rings like a bell. Apophenia is one such bell. In this work we strike it, record the harmonics, and attempt to name the notes — knowing as we do that the act of naming may itself be the act of imposing. We decompose all 1,847 MLP layers of a 2.4-billion-parameter tool-use agent into 359 rank-one components. Of these, 217 admit a human-legible interpretation. Of the remaining 142, we say nothing here that we are willing to defend.
( the rest is in the manuscript )
component readout
fig. ii — top-five activating contexts for the highlighted component
all-caps words
density 2.1% · L2.MLP.down:2394
faithfulness 0.94 · legibility human-confirmed
i am always learning. WHAT IS GERMAN
display: block; MARGIN: 0px auto;
the network time protocol (NTP) has
mask = EIP197_MST_CTRL_BYTE
NO RIGHT TO USE THIS SOFTWARE
fragments selected from a held-out corpus of 12.4M tokens · top-pmi by component activation
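A readout like fig. ii could be assembled roughly as follows. This is a sketch under assumptions, not the pipeline used here: `density` is taken to mean the fraction of tokens on which the component's activation exceeds a threshold, contexts are ranked by raw activation, and the pmi weighting mentioned in the caption is not reproduced. The function names and the toy activations are hypothetical.

```python
import heapq

def density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

def top_contexts(contexts, activations, k=5):
    """Return the k contexts with the largest activation, strongest first."""
    if len(contexts) != len(activations):
        raise ValueError("contexts and activations must have the same length")
    ranked = heapq.nlargest(k, zip(activations, contexts), key=lambda pair: pair[0])
    return [context for _, context in ranked]

# Toy data (invented for illustration): an all-caps detector over five fragments.
contexts = [
    "WHAT IS GERMAN",
    "the quick brown fox",
    "MARGIN: 0px auto;",
    "lowercase only",
    "NO RIGHT TO USE",
]
activations = [0.9, 0.0, 0.7, 0.0, 0.8]

toy_density = density(activations)
toy_top = top_contexts(contexts, activations, k=2)
```

On the toy data three of five fragments are active, so `toy_density` is 0.6, far above the 2.1% reported for the real component; the toy set is deliberately dense so the ranking is easy to check by eye.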
apophenia.live · session 0xA47C
the agent is given exactly one prompt per session
its activations are recorded ; its components, named
prompts curated by the authors · responses are real, captured during 2026-09 evaluation runs
colophon
“the agent thinks; the wheel turns; the weights remember.”
— §11, on the wandering attractor
set in cormorant unicase, dm mono, and eb garamond
printed on simulated vellum at 96 dpi
the figures are deterministic ; the meaning is not
the models scale
fig. iii — the meditator at altitude
( a thought experiment, after dennett )
do not read too much into the moon
elhage et al. 2022 · templeton et al. 2024 · volkov & rao 2025 · aaronson 2026
this artifact contains 0 dependencies, 1 viewport, ∞ unanswered questions