---
title: Human Diffusion
---

You can find the repository [here](https://github.com/aethrvmn/Human-Diffusion).

## A Q-Learning Approach to Human Migration from Africa

We start by importing the proper modules. These are:

- NumPy
- Matplotlib
- Pandas
- PIL (Pillow), an image handler
- tqdm (pronounced *ta-qa-dum*, from the Arabic *taqadum*, تقدّم, meaning *progress*), a simple progress bar that lets us estimate the time each task takes

```python
# pip install -r requirements.txt
```

We also import the `Earth` helper class from the repository.

```python
from earth import Earth
```

### Generating the Map

We initialise the picture we want to use and convert it into pixel values, so that we end up with a pure black-and-white image of the Earth.

```python
stage = Earth()
```

The following step loops over each individual pixel and converts it to black or white. The threshold was found by running the loop several times and picking a value that looked good enough.

```python
stage.black_and_white('earth.jpg', 'newPixels.csv', 'pure-bw-earth.jpg')
```

We then generate the new picture and save it before converting it into an array.

```python
stage.generate_image('pure-bw-earth.jpg')
```

We are now ready to create the map we will need.

```python
stage.plot('map.jpg')
```

Now that we have our map ready, we can move on to the implementation of the algorithm.

### Application of the Q-Learning Algorithm

We import the necessary libraries

```python
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

np.random.seed(1)
```

and define the actions that the agent is able to perform.

```python
actions = ['west', 'east', 'north', 'south']  # coded as 0, 1, 2, 3
```

Then we can generate the Q-table, which holds the estimated value of each action at every position, initialised with random values.

```python
q_values = np.random.uniform(-1, 1, size=(stage.height, stage.width, len(actions)))
```

Next, we define the functions that we will use: one to generate our starting position, one for the agent to take an action either randomly or by consulting the Q-table, and one to determine the result of the action taken.
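For reference, these pieces feed into the standard Q-learning update applied at every step of the training loop below,

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],
$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor (the `learning_rate` and `discount_factor` variables in the code).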
```python
def starting_area(column, row):
    # Pick a random starting cell inside the given (height, width) ranges
    col = np.random.randint(column[0], column[1])
    row = np.random.randint(row[0], row[1])
    return col, row


def next_action(current_height, current_width, epsilon):
    # With probability epsilon follow the current Q-table, otherwise explore randomly
    if np.random.random() < epsilon:
        move = np.argmax(q_values[current_height, current_width])
    else:
        move = np.random.randint(4)
    return move


def next_location(height, width, action):
    # Move one cell in the chosen direction, staying inside the grid
    new_width = width
    new_height = height
    if actions[action] == 'west' and width > 0:
        new_width = width - 1
    if actions[action] == 'east' and width < stage.width - 1:
        new_width = width + 1
    if actions[action] == 'north' and height > 0:
        new_height = height - 1
    if actions[action] == 'south' and height < stage.height - 1:
        new_height = height + 1
    return new_height, new_width
```

Now we are ready to run the algorithm for the number of episodes we need.

```python
# Penalise cells flagged in the black-and-white map and the borders of the grid
reward_map = np.zeros(shape=(stage.height, stage.width))
reward_map[np.where(stage.map > 0)] = -10
reward_map[:10, :] = -15
reward_map[610:, :] = -15
reward_map[:, 720:] = -15
reward_map[:, :10] = -15

# Arabian bridge
# Gulf of Aden
reward_map[350:388, 250:282] = 0
# Hormuz
reward_map[300:340, 290:315] = 0

# Indonesian bridge
# Sumatra
reward_map[417:433, 485:495] = 0
# Java
reward_map[450:455, 495:505] = 0
# Brunei
reward_map[430:465, 525:530] = 0
# New Guinea
reward_map[460:465, 525:645] = 0
# Australia
reward_map[460:505, 525:605] = 0

# Bering Strait
reward_map[30:60, 580:610] = 50
# Australia
reward_map[510:540, 580:610] = 50

# Map used only for visualising where the agent has been
real_map = np.ones(shape=(stage.height, stage.width)) * 10
real_map[np.where(stage.map > 0)] = -10

timeline = np.arange(0, 5000)
episodes = np.arange(0, 200000)

reward_per_episode = np.zeros(len(episodes))
lifetime = np.zeros(len(episodes))
ims = []

for episode in tqdm(episodes):
    epsilon = 0.7
    discount_factor = 0.3
    learning_rate = 1
    rewards = np.zeros(len(timeline))

    if episode >= 195000:
        # This statement is the way we destabilise the system to get more natural motion
        # India
        reward_map[390, 388] = 20
        # New Guinea (Papua)
        reward_map[455, 650] = 20
        # Brunei
        reward_map[425, 540] = 20
        # Australia
        reward_map[510:540, 580:610] = 50

    old_height, old_width = 400, 230
    height, width = starting_area([old_height-5, old_height+5], [old_width-5, old_width+5])

    for year in timeline:
        try:
            action = next_action(height, width, epsilon)
            old_height, old_width = height, width
            height, width = next_location(height, width, action)

            reward = reward_map[height, width]
            rewards[year] = reward

            # Q-learning temporal-difference update
            old_q_value = q_values[old_height, old_width, action]
            temporal_difference = reward + (discount_factor * np.max(q_values[height, width])) - old_q_value
            new_q_value = old_q_value + (learning_rate * temporal_difference)
            q_values[old_height, old_width, action] = new_q_value

            # A positive reward is consumed once it has been collected
            if reward_map[old_height, old_width] > 0:
                reward_map[old_height, old_width] = 0

            # Mark the visited cell on the visualisation map
            real_map[old_height, old_width] = 5

        except IndexError:
            break

        if year == timeline[-1]:
            lifetime[episode] = year

        # End the episode if both the previous and current cells are penalised
        if reward_map[old_height, old_width] <= -10 and reward_map[height, width] <= -10:
            lifetime[episode] = year
            break

    reward_per_episode[episode] = np.mean(rewards)

    # Replenish the Australia target once any of its cells has been collected
    if reward_map[510:540, 580:610].all() == 0:
        reward_map[510:540, 580:610] = 50

    # Replenish the Bering Strait target once any of its cells has been collected
    if reward_map[30:60, 580:610].all() == 0:
        reward_map[30:60, 580:610] = 50

plt.figure(figsize=(10, 10))
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.xticks([])
plt.yticks([])
plt.imshow(real_map, cmap='ocean_r')
plt.show()
```
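The loop above also records `reward_per_episode` and `lifetime`, which are not visualised. As a minimal sketch of how one might plot these diagnostics (the 1000-episode smoothing window is an arbitrary choice, not part of the original code):

```python
# Assumes the training loop above has already run (NumPy and Matplotlib are imported there)
window = 1000  # arbitrary smoothing window for the noisy per-episode rewards
smoothed = np.convolve(reward_per_episode, np.ones(window) / window, mode='valid')

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Rolling mean of the reward collected per episode
ax1.plot(smoothed)
ax1.set_xlabel('Episode')
ax1.set_ylabel('Mean reward')

# How many simulated years each episode lasted before it ended
ax2.plot(lifetime)
ax2.set_xlabel('Episode')
ax2.set_ylabel('Episode length (years)')

plt.tight_layout()
plt.show()
```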