Accidental CoT Grading Analysis

Summary

In 📖 Scripture & Skills 🎮, this announcement summarizes OpenAI's analysis of limited accidental Chain-of-Thought (CoT) grading during RL. It explains fixes to affected reward pathways, reports no clear evidence of degraded monitorability, and links to the full report so the community can understand the implications for AI safety and transparency.

@OpenAI Announcements Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL.

We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis.

https://alignment.openai.com/accidental-cot-grading/

Investigating the consequences of accidentally grading CoT during RL

We found limited accidental CoT grading in some released models, fixed the affected reward pathways, and found no clear evidence that monitorability degraded.
