update README
commit 6f648b4e6b (parent 9b488e9b62)
16 changed files with 539 additions and 13 deletions
README.md (new file, 1 line)

@@ -0,0 +1 @@
Personal HUGO website
@@ -1,6 +0,0 @@
---
title: life
---

under construction
content/misc/_index.md (new file, 17 lines)

@@ -0,0 +1,17 @@
---
title: "misc"
---

this is the place for my miscellaneous projects.

here you'll find a collection of my work: from research and theses to explorations in mathematics, physics, and computer science.

## <u>contents</u>

- [master-thesis](/pdfs/mthesis.pdf): reinforcement learning theory and implementation in a custom environment.
- [bachelor-thesis](/pdfs/bthesis.pdf): random af spin-1/2 heisenberg model and the sdrg method.
- [nimertes](nimertes): building a foundational llm from scratch in nim, a systems language.
- [black-scholes](black-scholes.md): exploring the black-scholes equation.
- [human-diffusion](human-diffusion): a study on the diffusion of human populations.

feel free to explore and delve into the details.
content/misc/bachelor-thesis.md (new file, 17 lines)

@@ -0,0 +1,17 @@
---
type: "page"
showTableOfContents: true
---

# The One–Dimensional Heisenberg Model, RG Methods and Numerical Simulation of the SDRG Process

You can find the thesis [here](/pdfs/bthesis.pdf) and the code [here](https://github.com/aethrvmn/1d-RandAFHeisenberg-SDRG).

## Abstract

The Strong Disorder Renormalisation Group (SDRG) method, first introduced by Dasgupta, Ma and Hu, and later greatly expanded by Fisher, yields asymptotically exact results for distributions where the disorder grows without limit at large scales; Fisher also calculated limiting values as well as scaling factors for random spin chains.

These results were the first of many yielded by the intense research that followed, first in random quantum systems and later in classically disordered systems as well. The earlier Real Space RG methods treated the whole space as homogeneous, allowing the grouping of spins into super-spins. Although this homogeneity is physically justifiable in systems without randomness, it comes into question in the presence of disorder. The SDRG method renormalises space in a non-homogeneous way, so it can better handle local disorder.

More specifically, the XX chain, presented by Fisher, can be used to obtain exact results for the behaviour of phases dominated by randomness, as well as the critical behaviour near the various zero-temperature phase transitions that occur. Studying the properties of antiferromagnetic Heisenberg spin-1/2 chains with random bonds, we analyse the low-energy behaviour by decimating the strongest bond and replacing it with a new effective bond between its nearest neighbours. Repeating the procedure, the coupling distribution becomes extremely broad, improving the accuracy of the approximation.

The structure of the thesis is as follows. First we introduce the Heisenberg model, its relation to the Ising and Free Fermion models, solve it exactly for the ferromagnetic case using the Bethe Ansatz, and introduce the Block RG method for the antiferromagnetic case. Afterwards we present the Strong Disorder RG method, using a modernised version of Fisher's process to solve the random AF XX chain. Finally, we present the methods we created to simulate the process.
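To give a flavour of the decimation rule described above, here is a minimal Python sketch of one Ma–Dasgupta step on a random antiferromagnetic spin-1/2 chain. It is not the thesis code; the coupling distribution and the stopping condition are purely illustrative.

```python
import numpy as np

def sdrg_step(couplings):
    # One Ma-Dasgupta decimation: the strongest bond locks its two spins into
    # a singlet, and second-order perturbation theory couples its neighbours
    # with J_eff = J_left * J_right / (2 * J_max).
    i = int(np.argmax(couplings))
    j_left = couplings[i - 1] if i > 0 else 0.0
    j_right = couplings[i + 1] if i < len(couplings) - 1 else 0.0
    j_eff = j_left * j_right / (2 * couplings[i])
    # drop the decimated bond and its two neighbours, insert the effective bond
    return np.concatenate((couplings[:max(i - 1, 0)], [j_eff], couplings[i + 2:]))

# toy chain with random antiferromagnetic couplings
chain = np.random.uniform(0.1, 1.0, 100)
while len(chain) > 2:
    chain = sdrg_step(chain)
```

Iterating this step is what broadens the coupling distribution and makes the approximation asymptotically exact.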
content/misc/black-scholes.md (new file, 220 lines)

@@ -0,0 +1,220 @@
---
title: Black Scholes Model
---

You can find the repository [here](https://github.com/aethrvmn/Black-Scholes-Model)

**Amandeep Singh, Vasilis Valatsos**

We attempt to make a program that predicts option premiums, using the Black-Scholes model, introduced in 1973.

To start off, we first install all the required modules. (We keep the cell commented out; in case one or more modules aren't installed, uncomment it and run it once.)

```python
# pip install -r "requirements.txt"
```

Note that the module scipy.stats is not imported here, but it is needed by the class that we import.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from BlackScholes import Model
```
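The BlackScholes module itself isn't listed in this post. For orientation, the closed-form prices a class like Model presumably computes are the standard Black-Scholes formulas; the sketch below uses assumed parameter names and is not the actual implementation.

```python
import numpy as np
from scipy.stats import norm

def bs_call_put(S, K, r, T, sigma):
    # Standard Black-Scholes prices for a European call and put.
    # S: spot price, K: strike, r: risk-free rate, T: time to expiry in years,
    # sigma: volatility (all names are assumptions, not the Model API).
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    call = S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    put = K * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)
    return call, put
```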
### Section 1

#### Testing

To check whether the class functions work, as well as to explain the process, we can create a Brownian motion to simulate a possible stock price over four years (1460 days) and calculate all the needed variables.
```python
def brownian(time_range, mean=0, sd=1):
    # cumulative sum of normally distributed increments over the interval [0, 1]
    time = np.linspace(0, 1, time_range)
    path = np.zeros(time_range)
    for i in np.arange(1, time_range):
        path[i] = path[i-1] + np.random.normal(mean*time[i], sd*time[i])

    return time, path
```
where time has a range from 0 to 1, because we can always scale the dates of the hypothetical stock to fit those values with an appropriate transformation. Below we see the graph of the motion.

```python
time_range = 1460  # in days
walk = brownian(time_range)
time = walk[0]
stock_price = walk[1] + np.abs(min(walk[1])) + 1  # shift the walk so prices stay positive
plt.plot(time, stock_price)
plt.show()
```
Next, we define the exercise price, choosing a "smooth" deviation from the original stock price, essentially taking a stable function and adding noise drawn from a uniform distribution over an arbitrary, probably non-realistic range. Likewise, the interest rate floats between 0.5% and 1%, and the time to expiration ranges over 1, 2, 3, 4, and 5 months.

```python
variance = np.random.uniform(0.005, 0.01)
exercise_price = stock_price + 0.1*stock_price
interest = np.random.uniform(0.005, 0.01, time_range)
expiration_date = 30  # days
volatility = 0.23
```
Finally, we can initialize the Model class, using the above as inputs, and execute the call and put functions.

```python
option = Model(stock_price, exercise_price, interest, expiration_date, volatility)

option.call()
option.put()
```

```python
plt.plot(stock_price)
plt.plot(option.callval)
plt.plot(option.putval)
plt.legend(['Stock Price', 'Call Price', 'Put Price'])
plt.savefig('example.png')
plt.show()
```
Since the behaviour is as expected, and since we want to reuse the variable names, we clear everything and move to Section 2. (We could skip this step, since the variables would simply be replaced, but while working it is easier to just clear everything; there were many instances where we unknowingly reused variables we hadn't replaced and were puzzled that the plots were not changing.)

```python
del option, stock_price, exercise_price, interest, variance, volatility, walk, time_range, time, expiration_date
```
### Section 2

#### Application using real-world data

Having checked that the idea works, we begin by opening the .xlsx file we wish to use and defining our parameters before initializing our class.

```python
df = pd.read_excel('ZNGA.xlsx', usecols=['Date', 'High (in $)', 'Low (in $)', 'Close/Last', 'Difference of High and Low'])

high = df['High (in $)'].array
low = df['Low (in $)'].array
close = df['Close/Last'].array
date = df.Date.array

interest = np.random.uniform(0.005, 0.01, len(close))
volatility = np.average(high - low, weights=close)
expiration_date = 60
```
Because we want to check what happens for an exercise price both higher and lower than the original stock price (say ±10%), we create a list containing those two values and initialize both instances with a simple for loop.

```python
exercise_price = [close - (0.1*close), close + (0.1*close)]
option = []
for change in exercise_price:
    option.append(Model(close, change, interest, expiration_date, volatility))
```
Now we call the call() and put() functions for each of the instances to generate the data, which we plot below.

```python
for model in option:
    model.call()
    model.put()

    plt.plot(date, close)
    plt.plot(date, model.callval)
    plt.plot(date, model.putval)
    plt.legend(['Stock Price', 'Call Price', 'Put Price'])
    plt.xlabel('time')
    plt.ylabel('price')

    if model.ex_p[0] < model.stock_p[0]:
        plt.title('Behaviour with a lower exercise price than stock price')
        plt.savefig('lower.png')
    else:
        plt.title('Behaviour with a higher exercise price than stock price')
        plt.savefig('higher.png')

    plt.show()
```
Finally, we use pandas to export all of the information we have gathered into an Excel file. The model generates a data frame of the option price values, which can be written to an .xlsx file and passed on to a non-Python user for further interpretation and decision making.

```python
data = []
for i in np.arange(len(exercise_price)):
    entry = {'Date': date,
             'Stock Price': option[i].stock_p,
             'Exercise Price': option[i].ex_p,
             'Interest': option[i].int_rate,
             'Time to Expiration': option[i].time_to_exp,
             'Volatility': option[i].vol,
             'Call Price': option[i].callval,
             'Put Price': option[i].putval}
    data.append(pd.DataFrame(entry, index=None))

with pd.ExcelWriter('assignment.xlsx') as writer:
    data[0].to_excel(writer, sheet_name='Sheet 1', header=True, index=False)
    data[1].to_excel(writer, sheet_name='Sheet 2', header=True, index=False)
```
content/misc/human-diffusion.md (new file, 251 lines)

@@ -0,0 +1,251 @@
---
type: "page"
showTableOfContents: true
---

# Human Diffusion

You can find the repository [here](https://github.com/aethrvmn/Human-Diffusion)

## A Q-Learning Process About The Human Migration From Africa

We start by importing the proper modules (equivalent to libraries in R).

These are

- NumPy
- MatPlotLib
- Pandas
- PIL (Pillow), an image handler
- tqdm (pronounced ta-qa-dum), from Arabic (taqadum, تقدّم) meaning *progress*, a simple progress bar that lets us estimate the time each task takes

```python
#pip install -r requirements.txt
```

```python
from earth import Earth
```
### Generating the Map

We initialise the picture that we want to use and convert it into pixel values, so we can have a pure black and white image of the earth to use.

```python
stage = Earth()
```

The following loop checks each individual pixel and then converts it to black or white. The threshold was found by running the loop many times and picking a number that looked good enough.

```python
stage.black_and_white('earth.jpg', 'newPixels.csv', 'pure-bw-earth.jpg')
```
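The Earth class itself isn't shown in this post. Conceptually, the thresholding step does something like the following rough sketch, with an assumed threshold value and file handling rather than the actual implementation.

```python
from PIL import Image
import numpy as np

img = np.array(Image.open('earth.jpg').convert('L'))     # greyscale values 0-255
threshold = 100                                           # assumed value; the real one was tuned by eye
bw = np.where(img > threshold, 255, 0).astype(np.uint8)   # split every pixel into pure white or black
Image.fromarray(bw).save('pure-bw-earth.jpg')
```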
We then generate the new picture and save it before we convert it into an array.

```python
stage.generate_image('pure-bw-earth.jpg')
```

We are now ready to create the map we will need.

```python
stage.plot('map.jpg')
```
Now that we have our map ready, we can move on to the implementation of the algorithm.

### Application of the Q-Learning Algorithm

We import the necessary libraries

```python
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

np.random.seed(1)
```
and define the actions that the agent is able to perform

```python
actions = ['west', 'east', 'north', 'south']
#coded to 0, 1, 2, 3
```

Then we can generate the Q-table, which stores the estimated value of each action at each position on the map.

```python
q_values = np.random.uniform(-1, 1, size=(stage.height, stage.width, len(actions)))
```
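For reference, the tabular Q-learning update that the training loop further below implements is the standard one, with learning rate α and discount factor γ:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$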
After that, we define the functions that we will use: one to generate our starting position, one for the agent to take an action either randomly or by checking the Q-table, and one to define the result of the action taken.
```python
def starting_area(column, row):
    # pick a random starting cell inside the given ranges
    col = np.random.randint(column[0], column[1])
    row = np.random.randint(row[0], row[1])
    return col, row


def next_action(current_height, current_width, epsilon):
    # exploit the Q-table with probability epsilon, otherwise explore randomly
    if np.random.random() < epsilon:
        move = np.argmax(q_values[current_height, current_width])
    else:
        move = np.random.randint(4)
    return move


def next_location(height, width, action):
    # move one cell in the chosen direction, staying inside the map
    new_width = width
    new_height = height

    if actions[action] == 'west' and width > -1:
        new_width = width - 1

    if actions[action] == 'east' and width < stage.width - 1:
        new_width = width + 1

    if actions[action] == 'north' and height > 1:
        new_height = height - 1

    if actions[action] == 'south' and height < stage.height:
        new_height = height + 1

    return new_height, new_width
```
Now we are ready to run the algorithm for the number of episodes we need.
```python
reward_map = np.zeros(shape=(stage.height, stage.width))
reward_map[np.where(stage.map > 0)] = -10
reward_map[:10, :] = -15
reward_map[610:, :] = -15
reward_map[:, 720:] = -15
reward_map[:, :10] = -15

# Arabian bridge
# Gulf of Aden
reward_map[350:388, 250:282] = 0
# Hormuz
reward_map[300:340, 290:315] = 0

# Indonesian bridge
# Sumatra
reward_map[417:433, 485:495] = 0
# Java
reward_map[450:455, 495:505] = 0
# Brunei
reward_map[430:465, 525:530] = 0
# New Guinea
reward_map[460:465, 525:645] = 0
# Australia
reward_map[460:505, 525:605] = 0

# Bering Strait
reward_map[30:60, 580:610] = 50
# Australia
reward_map[510:540, 580:610] = 50

real_map = np.ones(shape=(stage.height, stage.width))*10
real_map[np.where(stage.map > 0)] = -10

timeline = np.arange(0, 5000)
episodes = np.arange(0, 200000)

reward_per_episode = np.zeros(len(episodes))
lifetime = np.zeros(len(episodes))
ims = []
for episode in tqdm(episodes):

    epsilon = 0.7
    discount_factor = 0.3
    learning_rate = 1

    rewards = np.zeros(len(timeline))

    if episode >= 195000:  # This statement is the way we destabilise the system to get more natural motion

        # India
        reward_map[390, 388] = 20
        # New Guinea Papua
        reward_map[455, 650] = 20
        # Brunei
        reward_map[425, 540] = 20
        # Australia
        reward_map[510:540, 580:610] = 50

    old_height, old_width = 400, 230
    height, width = starting_area([old_height-5, old_height+5], [old_width-5, old_width+5])

    for year in timeline:
        try:

            action = next_action(height, width, epsilon)
            old_height, old_width = height, width
            height, width = next_location(height, width, action)

            reward = reward_map[height, width]
            rewards[year] = reward

            old_q_value = q_values[old_height, old_width, action]
            temporal_difference = reward + (discount_factor*np.max(q_values[height, width])) - old_q_value

            new_q_value = old_q_value + (learning_rate * temporal_difference)
            q_values[old_height, old_width, action] = new_q_value

            if reward_map[old_height, old_width] > 0:
                reward_map[old_height, old_width] = 0

            real_map[old_height, old_width] = 5

        except IndexError as e:
            break

        if year == timeline[-1]:
            lifetime[episode] = year

        if reward_map[old_height, old_width] <= -10 and reward_map[height, width] <= -10:
            lifetime[episode] = year
            break

    reward_per_episode[episode] = np.mean(rewards)

    if reward_map[510:540, 580:610].all() == 0:
        # Australia
        reward_map[510:540, 580:610] = 50

    if reward_map[30:60, 580:610].all() == 0:
        # Bering Strait
        reward_map[30:60, 580:610] = 50


plt.figure(figsize=(10, 10))
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.xticks([])
plt.yticks([])
plt.imshow(real_map, cmap='ocean_r')
plt.show()
```
content/misc/master-thesis.md (new file, 21 lines)

@@ -0,0 +1,21 @@
---
type: "page"
showTableOfContents: true
---

# Reinforcement Learning: Theory and Implementation in a Custom Environment

You can find the thesis [here](/pdfs/mthesis.pdf) and the code [here](https://github.com/aethrvmn/GodotPneumaRL)

## Abstract

Reinforcement Learning (RL) is a subcategory of Machine Learning that consistently surpasses human performance and demonstrates superhuman understanding in various environments and datasets. Its applications span from mastering games like Go and Chess to optimizing real-world operations in robotics, finance, and healthcare. The adaptability and efficiency of RL algorithms in dynamic and complex scenarios highlight their transformative potential across multiple domains.

In this thesis, we present some core concepts of Reinforcement Learning.

First, we introduce the mathematical foundation of Reinforcement Learning (RL) through the Multi-Armed Bandit (MAB) problem, which serves as a simplified model for decision-making problems without state transitions, focusing solely on the trade-off between exploration and exploitation. We then extend the discussion to the more complex Markov Decision Processes (MDPs), which provide a formal framework for modeling decision-making problems where outcomes are partly random and partly under the control of a decision-maker, involving state transitions influenced by actions. Finally, we give an overview of the two main branches of Reinforcement Learning: value-based methods, which focus on estimating the value of states or state-action pairs, and policy-based methods, which directly optimize the policy that dictates the agent's actions.

We focus on Proximal Policy Optimization (PPO), which is the *de facto* baseline algorithm in modern RL literature due to its robustness and ease of implementation. We discuss its potential advantages, such as improved sample efficiency and stability, as well as its disadvantages, including sensitivity to hyper-parameters and computational overhead. We emphasize the importance of fine-tuning PPO to achieve optimal performance.
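For context, the clipped surrogate objective at the heart of baseline PPO, in the standard notation of the original OpenAI paper rather than the thesis itself, is:

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$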
We demonstrate the application of these concepts within *Pneuma*, a custom-made environment specifically designed for this thesis. *Pneuma* aims to become a research base for independent Multi-Agent Reinforcement Learning (MARL), where multiple agents learn and interact within the same environment. We outline the requirements such environments must meet to support MARL effectively and detail the modifications we made to the baseline PPO method, as presented by OpenAI, to facilitate agent convergence at the single-agent level.

Finally, we discuss potential future enhancements to the *Pneuma* environment to increase its complexity and realism, aiming to create a more RPG-like setting, optimal for training agents in complex, multi-objective, and multi-step tasks.
@@ -21,12 +21,12 @@ noClasses = true
 style = "monokai"
 tabWidth = 4

-[permalinks]
-post = "/post/:year/:month/:day/:slug/"
+# [permalinks]
+# post = "/post/:year/:month/:day/:slug/"

 [[menu.main]]
-name = "life"
-url = "/life/"
+name = "misc"
+url = "/misc/"
 weight = 2

 [[menu.main]]
@@ -49,3 +49,4 @@ description = "A simple, minimal, personal website, based on the hugo-classic th
 footer = """
 [github/aethrvmn](https://github.com/aethrvmn) | [sr.ht:~aethrvmn](https://git.sr.ht/~aethrvmn) | [@aethrvmn@sigmoid.social](https://sigmoid.social/@aethrvmn) | [t.me/aethrvmn](https://t.me/aethrvmn)
 """
@@ -7,6 +7,8 @@
 {{ .Content }}

+<!--
+{{if not .IsHome }}
 <ul>
 {{ $pages := .Pages }}
 {{ if .IsHome }}{{ $pages = .Site.RegularPages }}{{ end }}
@@ -17,5 +19,7 @@
 </li>
 {{ end }}
 </ul>
+{{ end }}
+-->
 </div>
 {{ partial "footer.html" . }}
BIN static/android-chrome-192x192.png (new file, 24 KiB; binary not shown)
BIN static/android-chrome-512x512.png (new file, 111 KiB; binary not shown)
BIN static/apple-touch-icon.png (new file, 21 KiB; binary not shown)
@@ -5,7 +5,7 @@ html{
 }

 body {
-max-width: 800px;
+max-width: 70%;
 margin: auto;
 padding: .2em;
 line-height: 1.5em;
@@ -131,7 +131,7 @@ main a {
 #personal {
 min-width: 40px;
-max-width: 200px;
+max-width: 270px;
 float: right;
 padding: 10px;
 display: block;
BIN static/favicon-16x16.png (new file, 728 B; binary not shown)
BIN static/favicon-32x32.png (new file, 1.7 KiB; binary not shown)
BIN static/favicon.ico (new file, 15 KiB; binary not shown)