
Watchtower
Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.
Date:
Anthropic
Change
Moderate
Anthropic updated its Responsible Scaling Policy (RSP) from v3.2 to v3.3. In Section 1, “Our Recommendations for Industry-Wide Safety,” the update changes the description under “Capability or usage threshold” for “novel chemical/biological weapons production.” Notably, the RSP changelog calls this a “revision,” while the Opus 4.8 system card frames it as a “clarification of the intent of our earlier threshold.” The latter understates the change, which reshapes the threshold rather than tweaking its wording. The capability the threshold requires was narrowed, from a model that significantly helps threat actors (moderately resourced, expert-backed teams) to one that can substitute for the world-leading specialists such a team would need. The harm it covers, however, was broadened, from damage “far beyond” COVID-19 to damage “comparable to or worse than” it. The mitigations tied to this threshold remain unchanged.

The system card also states that the reasoning used to clear models under the old wording would carry over to the new wording and that the two definitions reach the same conclusion. Anthropic asserts this equivalence rather than demonstrating it; the card doesn't show a model measured against both.
There were also changes to Section 3.1, Risk Reports: Scope and Timing. The criteria for including internally deployed models in a Risk Report are now structured around two explicit conditions: (1) the model could pose risks related to high-stakes misalignment or automated R&D, and (2) those risks significantly exceed those of models covered by a prior Risk Report. Additionally, v3.3 names and refines “off-cycle updates,” analyses that Anthropic publishes between Risk Reports when a model’s risk profile changes significantly.
Version 3.3 also makes minor terminology changes, notably renaming the third capability-threshold row from “High-stakes sabotage opportunities” to “Misaligned AI systems in high-stakes settings.” The threshold text itself is unchanged.
Correction: This post originally described the chemical/biological threshold change as simply making the bar "harder to trigger." That was incomplete. The capability the threshold requires was narrowed, but the harm it covers was broadened, from damage "far beyond" COVID-19 to damage "comparable to or worse than" it. We've also updated the screenshot to include the footnotes where this point is made. Thanks to @MaskedTorah for flagging.
