Natural emergent misalignment from reward hacking in production RL [pdf]


