Proof, Not Promises: A New Bar for Behavioral AI