---
title: "master's thesis"
weight: 20
---

# Reinforcement Learning<br/>Theory and Implementation in a Custom Environment

---

You can find the thesis [here](/pdf/mthesis.pdf) and the code [here](https://github.com/aethrvmn/GodotPneumaRL).

## Abstract

Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.

In this thesis, we present some core concepts of Reinforcement Learning.

First, we introduce the mathematical foundation of Reinforcement Learning through Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, with state transitions influenced by the agent’s actions. Then, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent’s actions.
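
For reference, a minimal sketch of the standard textbook formulation (general notation, not specific to this thesis): an MDP is a tuple of states, actions, transition dynamics, rewards, and a discount factor, and the two branches differ in which object they estimate.

```latex
% Standard MDP definitions in textbook notation (not thesis-specific).
% An MDP is a tuple of states, actions, transition kernel, reward, discount:
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad \gamma \in [0, 1)

% Value-based methods estimate, e.g., the optimal action-value function,
% which satisfies the Bellman optimality equation:
Q^*(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
  \left[ R(s, a) + \gamma \max_{a'} Q^*(s', a') \right]

% Policy-based methods instead follow the policy gradient directly:
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}
  \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, A^{\pi_\theta}(s, a) \right]
```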

We focus on Proximal Policy Optimization (PPO), the de facto baseline algorithm in modern RL literature thanks to its robustness and ease of implementation, and discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyperparameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.
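
As a concrete illustration, the clipped surrogate objective at the core of PPO fits in a few lines of PyTorch. This is a minimal sketch in generic notation, assuming log-probabilities and advantage estimates are computed elsewhere; the function name and default clip coefficient are illustrative, not taken from the thesis code.

```python
import torch

def ppo_clip_loss(new_logp: torch.Tensor,
                  old_logp: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(new_logp - old_logp)
    # Unclipped and clipped surrogate objectives
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise) minimum of the two;
    # negate so the result can be minimized with gradient descent.
    return -torch.min(unclipped, clipped).mean()
```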

We demonstrate the application of these concepts within Pneuma, a custom-made environment designed specifically for this thesis. Pneuma aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements such environments must meet to support MARL effectively, and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.
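
To make this kind of requirement concrete, below is a generic sketch of the per-agent interface a MARL environment typically exposes (hypothetical names, not the actual Pneuma API): every agent submits an action each tick and receives its own observation, reward, and done flag from the shared world.

```python
# Generic per-agent MARL environment interface (illustrative only;
# not the actual Pneuma API). Each agent observes, acts, and is rewarded
# independently while sharing the same underlying world state.
from typing import Any, Dict, Tuple

class MultiAgentEnv:
    def reset(self) -> Dict[str, Any]:
        """Return the initial observation for every agent, keyed by agent id."""
        raise NotImplementedError

    def step(self, actions: Dict[str, int]) -> Tuple[
        Dict[str, Any],    # next observation per agent
        Dict[str, float],  # reward per agent
        Dict[str, bool],   # done flag per agent
    ]:
        """Advance the shared world by one tick given every agent's action."""
        raise NotImplementedError
```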

Finally, we discuss the potential for future enhancements to the Pneuma environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.