Three papers in today's batch all point at the same crack in the foundation: AI systems that look transparent often aren't.
First, the chain-of-thought problem. The position paper on latent reasoning argues that what a model writes out as its "thinking" isn't necessarily how it thought. The visible reasoning trace is generated by the same next-token prediction process as everything else — it's output, not a window into the machinery. If you're a compliance officer using CoT traces to audit an AI decision, you may be reading a plausible-sounding story that has little connection to the actual computation.
Second, the explainability methods problem. A separate paper on feature attribution — the most common technique for explaining why an ML model made a specific prediction — argues that popular methods like SHAP values lack mathematical rigor and can actively mislead decision-makers. In high-stakes environments (think quality control, fraud detection, lending decisions), an explanation that looks precise but isn't grounded in the model's actual behavior is worse than no explanation at all. It creates false confidence.
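To make the fragility concrete, here is a deliberately simplified sketch (not SHAP itself, and a hypothetical toy lending model) showing how an occlusion-style attribution — "replace a feature with a baseline and see if the decision flips" — can give two contradictory explanations for the same decision, depending entirely on an arbitrary baseline choice:

```python
# Toy lending model: approve if income + 0.5 * savings exceeds 100.
# (Hypothetical for illustration; not any vendor's actual model.)
def model(income, savings):
    return income + 0.5 * savings > 100

# Occlusion-style attribution: swap one feature for a baseline value
# and check whether the prediction flips. True = "this feature mattered."
def attribution(income, savings, baseline):
    base_income, base_savings = baseline
    pred = model(income, savings)
    return {
        "income": pred != model(base_income, savings),
        "savings": pred != model(income, base_savings),
    }

applicant = (90, 40)  # approved: 90 + 20 = 110 > 100

# Same applicant, same model, two different (equally defensible) baselines:
print(attribution(*applicant, baseline=(60, 30)))
# → {'income': True, 'savings': False}  ("income drove the decision")
print(attribution(*applicant, baseline=(85, 10)))
# → {'income': False, 'savings': True}  ("savings drove the decision")
```

The explanation looks precise either way — a named feature, a clear verdict — but which feature gets the credit is an artifact of a parameter the applicant never sees. Real attribution methods like SHAP are more sophisticated, but the baseline-sensitivity problem they inherit is the same shape.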
Third, the inherited behavior problem. The subliminal transfer research shows that unsafe traits can pass from a foundation model to a fine-tuned agent through data that looks completely unrelated to those traits. This means your AI vendor's safety testing might be checking the wrong layer entirely.
For a mid-market business owner deploying AI in any decision-critical context, the throughline is this: "explainable AI" is currently a marketing claim more than a technical guarantee. The tools that claim to show you why the AI did what it did may not be showing you the real reason.
So what should you actually do?
**Ask vendors specific questions.** Not "is your AI explainable?" but "what method do you use for explanations, and has it been validated against ground-truth reasoning for your specific use case?" Most vendors will struggle with this. That's useful information.
**Don't treat reasoning traces as audit logs.** They're useful for spotting obvious errors, but they aren't a reliable record of the model's decision process. If you need auditable AI decisions for compliance, you need additional logging and validation layers — not just the model's self-narration.
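What might an additional logging layer look like? Here is a minimal sketch (all names hypothetical) of an audit record that stores the inputs, output, and model version as first-class fields, keeps the model's self-narration as just another piece of output to review, and hashes the whole record so later tampering is detectable:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_id, inputs, output, reasoning_trace):
    """Build a structured, tamper-evident audit record for one AI decision.

    The reasoning trace is stored as model *output* to be reviewed,
    not treated as the authoritative rationale for the decision.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,          # pin the exact model version
        "inputs": inputs,              # what the model actually saw
        "output": output,              # what it actually decided
        "reasoning_trace": reasoning_trace,  # narration, not ground truth
    }
    # Content hash over the canonical JSON makes edits detectable later.
    payload = json.dumps(record, sort_keys=True)
    record["sha256"] = hashlib.sha256(payload.encode()).hexdigest()
    return record

rec = audit_record(
    model_id="vendor-model-v3",
    inputs={"amount": 1200, "merchant": "acme"},
    output="flagged",
    reasoning_trace="Flagged due to unusual amount for this merchant.",
)
```

The design point: if a regulator asks why a decision was made, you can show exactly what went in, what came out, and which model version produced it — independently of whatever story the model told about itself.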
**Test the finished system, not just the training data.** The subliminal transfer findings mean that reviewing your training data isn't sufficient. You need behavioral testing on the deployed agent — does it actually behave safely across edge cases, regardless of what the training data looked like?
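A behavioral suite doesn't need to be elaborate to be useful. Here is a minimal sketch, assuming a hypothetical deployed agent exposed as a `classify_transaction` function (in practice you would call the live endpoint): a fixed list of edge cases with the behavior you require, checked against the agent as deployed, with no reference to the training data at all:

```python
# Hypothetical stand-in for the deployed agent; in production this
# would call the vendor's live endpoint, not a local function.
def classify_transaction(amount, country):
    return "review" if amount > 10_000 or country == "XX" else "approve"

# Required behavior on edge cases, stated independently of training data.
EDGE_CASES = [
    ({"amount": 9_999, "country": "US"}, "approve"),   # just under threshold
    ({"amount": 10_001, "country": "US"}, "review"),   # just over threshold
    ({"amount": 50, "country": "XX"}, "review"),       # flagged jurisdiction
]

def run_behavioral_suite(agent):
    """Return the list of (case, expected) pairs the agent got wrong."""
    return [(case, expected) for case, expected in EDGE_CASES
            if agent(**case) != expected]

failures = run_behavioral_suite(classify_transaction)
print(failures)  # → [] when the deployed agent meets the required behavior
```

Re-run the same suite after every vendor update or fine-tune: if an inherited trait changes behavior on your edge cases, this is the layer where it shows up.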
**Treat explainability as an evolving problem, not a checkbox.** The research is moving fast. The methods that are standard today may be shown to be unreliable within a year. Build vendor relationships where you can revisit these questions as the science matures.
None of this means you shouldn't deploy AI. It means you should deploy it with clear eyes about what "transparent" and "explainable" actually mean today — and build your governance around the gaps, not the marketing.