moved theses to own page
parent 19ad6583cd
commit b092deebed
3 changed files with 84 additions and 0 deletions
19
content/theses/_index.md
Normal file
@@ -0,0 +1,19 @@
---
title: "theses"
weight: 10
---
# theses
---
## master-thesis
title: reinforcement learning theory and implementation in a custom environment

you can find the thesis abstract [here](/theses/master-thesis), the pdf [here](/pdf/mthesis.pdf), and the code [here](https://github.com/aethrvmn/GodotPneumaRL)

topic: reinforcement learning

## bachelor-thesis
title: the one–dimensional heisenberg model, rg methods and numerical simulation of the sdrg process

you can find the thesis abstract [here](/theses/bachelor-thesis), the pdf [here](/pdf/bthesis.pdf), and the code [here](https://github.com/aethrvmn/1d-RandAFHeisenberg-SDRG)

topic: random af spin-1/2 heisenberg model and the sdrg method
|
17
content/theses/bachelor-thesis.md
Normal file
@@ -0,0 +1,17 @@
---
title: "bachelor thesis"
weight: 30
---
# The One–Dimensional Heisenberg Model<br/>RG Methods and Numerical Simulation of the SDRG Process
---
you can find the thesis [here](/pdf/bthesis.pdf) and the code [here](https://github.com/aethrvmn/1d-RandAFHeisenberg-SDRG)

## Abstract

The Strong Disorder Renormalisation Group (SDRG) method, first introduced by Dasgupta, Ma, and Hu, and later greatly expanded by Fisher, yields asymptotically exact results in distributions where the disorder grows without limit at large scales; Fisher also calculated limiting values as well as scaling factors for random spin chains.

These results were the first of many yielded by the intense research that followed, first in random quantum systems and later in classically disordered systems as well. The Real Space RG methods used previously treated the whole space as homogeneous, allowing the grouping of spins into super-spins; although this homogeneity is physically justifiable in systems free of randomness, it comes into question in disordered systems. The SDRG method renormalises space in a non-homogeneous way, so it can better handle local disorder.

More specifically, the XX chain, presented by Fisher, can be used to obtain exact results for the behaviour of phases dominated by randomness, as well as the critical behaviour near the various zero-temperature phase transitions that occur. Studying the properties of antiferromagnetic Heisenberg spin-1/2 chains with random bonds, we analyse the low-energy behaviour by decimating the strongest bond and replacing it with a new effective bond between its nearest neighbours. As the procedure is repeated, the distribution of couplings becomes extremely broad, improving the accuracy of the approximation.
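
To make the decimation step concrete, here is a minimal Python sketch of the procedure for an open random AF spin-1/2 Heisenberg chain, using the standard Ma–Dasgupta rule J' = J_left · J_right / (2 · J_max). It is an illustration of the textbook rule under those assumptions, not the code from the repository linked above.

```python
import numpy as np

def sdrg_decimate(couplings):
    """Repeatedly decimate the strongest bond of a random antiferromagnetic
    spin-1/2 Heisenberg chain (open boundaries), following the Ma-Dasgupta
    rule, and return the sequence of decimated energy scales."""
    J = list(couplings)
    scales = []
    while J:
        i = int(np.argmax(J))                 # strongest bond sets the energy scale
        scales.append(J[i])
        has_left = i > 0
        has_right = i < len(J) - 1
        if has_left and has_right:
            # second-order perturbation theory: the two decimated spins form a
            # singlet, and their neighbours are joined by a weaker effective bond
            J_eff = J[i - 1] * J[i + 1] / (2.0 * J[i])
            J[i - 1:i + 2] = [J_eff]
        else:
            # edge bond: the singlet simply drops out of the chain
            J[max(i - 1, 0):i + 2] = []
    return scales

# toy run: 200 bonds drawn uniformly from [0, 1)
rng = np.random.default_rng(42)
scales = sdrg_decimate(rng.uniform(0.0, 1.0, 200))
print(scales[:5], scales[-1])
```

Because each effective bond is weaker than the bonds it replaces, the coupling distribution broadens as the chain shrinks, which is why the approximation improves at low energies.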

The structure of the thesis is as follows. First, we introduce the Heisenberg model and its relation to the Ising and Free Fermion models, solve it exactly for the ferromagnetic case using the Bethe Ansatz, and introduce the Block RG method for the antiferromagnetic case. Afterwards, we present the Strong Disorder RG method, using a modernised version of Fisher's process to solve the random AF XX chain. Finally, we present the methods we created to simulate the process.
48
content/theses/master-thesis.md
Normal file
@@ -0,0 +1,48 @@
---
title: "master's thesis"
weight: 20
---
# Reinforcement Learning<br/>Theory and Implementation in a Custom Environment
---
you can find the thesis [here](/pdf/mthesis.pdf) and the code [here](https://github.com/aethrvmn/GodotPneumaRL)

## Abstract

Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.

In this thesis, we present some core concepts of Reinforcement Learning.

First, we introduce the mathematical foundation of RL through Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, involving state transitions influenced by actions. Then, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent’s actions.
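
As a toy illustration of these ideas, not taken from the thesis itself, the sketch below writes a small MDP as transition and reward tables and runs value iteration, a prototypical value-based method; the numbers and the discount factor are made up for the example.

```python
import numpy as np

# A tiny illustrative MDP: 3 states, 2 actions.
# P[s, a, s'] is the transition probability, R[s, a] the expected immediate reward.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.6, 0.4], [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 is absorbing
])
R = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
])
gamma = 0.95

# Value iteration: repeatedly apply the Bellman optimality operator.
V = np.zeros(3)
for _ in range(500):
    Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy with respect to the converged values
print("V* =", V.round(3), "policy =", policy)
```

A policy-based method would instead parameterise the policy directly and follow the gradient of expected return, as PPO does below.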

We focus on Proximal Policy Optimization (PPO), which is the de facto baseline algorithm in modern RL literature due to its robustness and ease of implementation, and discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyper-parameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.
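
For reference, here is a minimal PyTorch sketch of PPO's clipped surrogate objective, the quantity this fine-tuning revolves around; the tensor names and the clip coefficient are illustrative and not taken from the Pneuma code.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from Schulman et al. (2017).

    new_log_probs: log pi_theta(a_t | s_t) under the current policy
    old_log_probs: log pi_theta_old(a_t | s_t), collected during the rollout
    advantages:    advantage estimates A_t (e.g. from GAE), same shape
    """
    ratio = torch.exp(new_log_probs - old_log_probs)                     # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # maximise the surrogate -> minimise its negation
    return -torch.min(unclipped, clipped).mean()

# toy usage with random tensors (in training, new_log_probs comes from the
# current policy network and carries gradients)
old = torch.randn(64)
new = old + 0.05 * torch.randn(64)
adv = torch.randn(64)
print(ppo_clipped_loss(new, old, adv))
```

The clipping keeps each update close to the data-collecting policy, which is the source of PPO's stability and a reason its behaviour depends on the clip range and other hyper-parameters.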

We demonstrate the application of these concepts within Pneuma, a custom-made environment specifically designed for this thesis. Pneuma aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements for such environments to support MARL effectively and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.

Finally, we discuss the potential for future enhancements to the Pneuma environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.