A Study of Q-Learning in the Taxi-v3 Environment: Reinforcement Learning for Optimal Navigation

Authors

DOI:

https://doi.org/10.51153/kjcis.v8i1.256

Keywords:

reinforcement learning, Q-learning, Taxi-v3, hyperparameter tuning

Abstract

Reinforcement Learning (RL) has demonstrated its effectiveness across a variety of domains, including healthcare, robotics, gaming, and autonomous driving. In RL, an agent learns to navigate an environment by trying different actions and receiving feedback in the form of rewards and penalties, iteratively learning to take the actions that yield the most reward. A widely used model-free RL algorithm is tabular Q-learning, which aims to identify an optimal policy by selecting actions that maximize expected cumulative reward. This study examines the application of Q-learning in Taxi-v3, a popular benchmark environment for evaluating RL algorithms. Specifically, we focus on the algorithm's hyperparameters and their optimization, and on how they affect the agent's performance in Taxi-v3. To assess performance, we track the reward the agent obtains in each episode, the number of steps it takes to finish the task in each episode, and loss values that indicate how well the agent predicts the optimal action for a given state. With the initial hyperparameter values, the agent attained its highest reward at episode 1,196; with the fine-tuned values, it reached the same reward at episode 248. This suggests that optimizing the hyperparameters enables the agent to learn an optimal policy much earlier in training and improves the effectiveness of the Q-learning algorithm on grid-based navigation tasks.
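
For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of tabular Q-learning in Taxi-v3. It assumes the gymnasium package (the maintained successor to OpenAI Gym), and the values chosen for alpha, gamma, and epsilon are illustrative placeholders, not the initial or fine-tuned values used in the paper.

import numpy as np
import gymnasium as gym

env = gym.make("Taxi-v3")
# One Q-value per (state, action) pair: Taxi-v3 has 500 states and 6 actions.
q_table = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate (placeholders)

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection: explore with probability epsilon, otherwise exploit.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_error = reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        q_table[state, action] += alpha * td_error
        state = next_state

In a loop like this, hyperparameter tuning amounts to searching over alpha, gamma, and epsilon (for example, with a framework such as Optuna) and comparing how early the agent first reaches its best per-episode reward, which is the comparison the abstract reports: episode 1,196 with the initial values versus episode 248 with the fine-tuned ones.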

Published

2025-07-01

How to Cite

A Study of Q-Learning in the Taxi-v3 Environment: Reinforcement Learning for Optimal Navigation. (2025). KIET Journal of Computing and Information Sciences, 8(1). https://doi.org/10.51153/kjcis.v8i1.256
