13 Muharram
S3, E2
1 hr 2 min

Constitutional AI: Harmlessness from AI Feedback

This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF, with AIs replacing both the human demonstrators and the human evaluators.

Everything in this paper is relevant to this week's learning objectives, and we recommend you read it in its entirety. It summarises the limitations of conventional RLHF, explains the constitutional AI approach, shows how it performs, and suggests where future research might be directed.

If you are in a rush, focus on sections 1.2, 3.1, 3.4, 4.1, 6.1, and 6.2.

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

Show: AI Safety Fundamentals: Alignment
Frequency: Updated daily
Published: 13 Muharram 1446 AH at 7:00 PM UTC
Length: 1 hr 2 min
Season: 3
Episode: 2
Rating: Clean

