High-dimensional reinforcement learning suffers from high computational cost and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem size. While neural network-based approaches such as Deep Q-Networks have shown success, recent tensor-based methods using low-rank decomposition offer more parameter-efficient alternatives. Building on these tensor-based methods, we propose Tensor-Efficient Q-Learning (TEQL), which enhances low-rank tensor decomposition via improved block coordinate descent on discretized state-action spaces and incorporates novel exploration and regularization mechanisms. The key innovation is an exploration strategy that combines the approximation error with a visit-count-based upper confidence bound to prioritize actions with high uncertainty, avoiding wasteful random exploration. Additionally, we add a frequency-based penalty term to the objective function to encourage exploration of less-visited state-action pairs and reduce overfitting to frequently visited regions. Empirical results on classic control tasks demonstrate that TEQL outperforms conventional matrix-based methods and deep RL approaches in both sample efficiency and total reward, making it suitable for resource-constrained applications such as space and healthcare, where sampling costs are high.
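To make the exploration mechanism concrete, the sketch below illustrates (under stated assumptions, not as the authors' implementation) how a Q tensor over a discretized state-action grid could be held in a rank-R CP factorization and how actions could be ranked by the Q estimate plus a bonus mixing the local approximation error with a visit-count upper confidence term. The discretization, CP rank, constants `c_err`/`c_ucb`, and all function names are hypothetical.

```python
# Minimal illustrative sketch of a TEQL-style action selection rule:
# a low-rank (CP) Q tensor plus an exploration bonus combining the local
# approximation error with a visit-count UCB term. Not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretization: two state dimensions and one action dimension.
dims = (10, 10, 4)          # (state_bins_1, state_bins_2, n_actions)
rank = 3                    # assumed CP rank of the low-rank Q tensor

# CP factors: one factor matrix per tensor mode, shape (dim, rank).
factors = [0.1 * rng.standard_normal((d, rank)) for d in dims]
visits = np.zeros(dims)     # visit counts N(s, a)
targets = np.zeros(dims)    # last observed TD targets, used as an error proxy


def q_values(s1, s2):
    """Q(s, .) reconstructed from the CP factors for a discretized state."""
    # Q(s, a) = sum_r U1[s1, r] * U2[s2, r] * U3[a, r]
    state_part = factors[0][s1] * factors[1][s2]   # shape (rank,)
    return factors[2] @ state_part                  # shape (n_actions,)


def select_action(s1, s2, t, c_err=1.0, c_ucb=1.0):
    """Pick the action maximizing Q + c_err*|approx error| + c_ucb*UCB bonus."""
    q = q_values(s1, s2)
    n = visits[s1, s2] + 1e-8                       # avoid division by zero
    approx_err = np.abs(targets[s1, s2] - q)        # per-action error proxy
    ucb = np.sqrt(np.log(t + 1.0) / n)              # visit-count bonus
    return int(np.argmax(q + c_err * approx_err + c_ucb * ucb))


# Toy usage: one exploration step with a synthetic TD target.
s1, s2, t = 4, 7, 1
a = select_action(s1, s2, t)
visits[s1, s2, a] += 1
targets[s1, s2, a] = 1.0    # stand-in for r + gamma * max_a' Q(s', a')
print("chosen action:", a, "Q estimate:", q_values(s1, s2)[a])
```

In a full method, the CP factors would be refit by block coordinate descent on the collected targets, with the frequency-based penalty down-weighting heavily visited entries; the sketch only shows the action-selection step.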