This paper presents a deep reinforcement learning (DRL) framework for dynamic portfolio optimization under market uncertainty and risk. The proposed model integrates a Sharpe ratio-based reward function with direct risk control mechanisms, including maximum drawdown and volatility constraints. Proximal Policy Optimization (PPO) is employed to learn adaptive asset allocation strategies over historical financial time series. Model performance is benchmarked against mean-variance and equal-weight portfolio strategies using backtesting on high-performing equities. Results indicate that the DRL agent successfully stabilizes volatility but suffers from degraded risk-adjusted returns due to overly conservative policy convergence, highlighting the challenge of balancing exploration, return maximization, and risk mitigation. The study underscores the need for improved reward shaping and hybrid risk-aware strategies to enhance the practical deployment of DRL-based portfolio allocation models.
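The abstract's reward design can be illustrated with a short sketch. The paper's exact formulation is not given here, so the penalty form (hinge-style penalties on constraint violations) and all limits and weights below are illustrative assumptions, not the authors' values:

```python
import numpy as np

def risk_adjusted_reward(returns, max_dd_limit=0.2, vol_limit=0.02,
                         dd_penalty=1.0, vol_penalty=1.0, eps=1e-8):
    """Sharpe-style reward with maximum-drawdown and volatility penalties.

    `returns`: per-step portfolio returns over the current window.
    The limits and penalty weights are placeholder assumptions.
    """
    returns = np.asarray(returns, dtype=float)

    # Sharpe-style term: mean return over volatility (eps avoids division by zero).
    sharpe = returns.mean() / (returns.std() + eps)

    # Maximum drawdown of the cumulative wealth curve.
    wealth = np.cumprod(1.0 + returns)
    peak = np.maximum.accumulate(wealth)
    max_dd = np.max((peak - wealth) / peak)

    # Penalize only violations of the drawdown and volatility constraints.
    reward = sharpe
    reward -= dd_penalty * max(0.0, max_dd - max_dd_limit)
    reward -= vol_penalty * max(0.0, returns.std() - vol_limit)
    return reward
```

Such a reward is what a PPO agent would maximize per episode; the trade-off the abstract reports (stable volatility but depressed risk-adjusted returns) corresponds to the penalty terms dominating the Sharpe term when the weights are set too aggressively.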