Reward Function Papers - Zhuanzhi (专知)

Reward Functions

Multi-Objective Planning with Contextual Lexicographic Reward Preferences

Arxiv

0+ reads · November 3

A Reinforcement Learning Framework for Resource Allocation in Uplink Carrier Aggregation in the Presence of Self Interference

Arxiv

0+ reads · November 22

Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models

Arxiv

0+ reads · December 4

BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

Arxiv

0+ reads · November 27

Automatic Reward Shaping from Multi-Objective Human Heuristics

Arxiv

0+ reads · December 17

Statistical analysis of Inverse Entropy-regularized Reinforcement Learning

Arxiv

0+ reads · December 7

DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Arxiv

0+ reads · November 14

Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning

Arxiv

0+ reads · November 30

Differentiable Evolutionary Reinforcement Learning

Arxiv

0+ reads · December 15

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Arxiv

0+ reads · October 24

Provably Efficient Reward Transfer in Reinforcement Learning with Discrete Markov Decision Processes

Arxiv

0+ reads · October 22

Non-Stationary Lipschitz Bandits

Arxiv

0+ reads · October 22

Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

Arxiv

0+ reads · October 21

Expressive Reward Synthesis with the Runtime Monitoring Language

Arxiv

0+ reads · October 21

Expressive Reward Synthesis with the Runtime Monitoring Language

Arxiv

0+ reads · October 17
