Human interactions are shaped by emotions, temperament, and affection, which often conflict with individuals' underlying preferences. Without explicit knowledge of those preferences, judging whether a behaviour is appropriate becomes guesswork, leaving us highly prone to misinterpretation. Yet such understanding is critical if autonomous agents are to collaborate effectively with humans. We frame the problem as multi-agent inverse reinforcement learning and show that even a simple model, in which agents weigh their own welfare against that of others, can capture a wide range of social behaviours. Using novel Bayesian techniques, we find that intrinsic rewards and altruistic tendencies can be reliably identified by placing agents in different groups. Crucially, disentangling intrinsic motivation from altruism enables the synthesis of new behaviours aligned with any desired level of altruism, even when the demonstrations are drawn from a restricted set of behaviour profiles.
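As a concrete illustration of such a welfare-weighting model (a minimal sketch under our own assumptions; the symbols $r_i$, $\alpha_i$, and $\tilde{r}_i$ are introduced here for exposition and are not taken from the abstract), each agent's effective reward can be written as a convex combination of its own intrinsic reward and the mean reward of the other $N-1$ agents:

$$\tilde{r}_i \;=\; (1 - \alpha_i)\, r_i \;+\; \alpha_i \cdot \frac{1}{N-1} \sum_{j \neq i} r_j, \qquad \alpha_i \in [0, 1].$$

Under this reading, $\alpha_i = 0$ corresponds to a purely selfish agent and $\alpha_i = 1$ to a fully altruistic one; the identification problem described above amounts to recovering both $r_i$ and $\alpha_i$ from observed behaviour across different group placements.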