jaclearn.rl.envs.maze.maze#

Classes

CustomLavaWorldEnv

A maze similar to Lava World in OpenAI Gym

MazeEnv

Create a maze environment.

Class CustomLavaWorldEnv

class CustomLavaWorldEnv[source]#

Bases: MazeEnv

A maze similar to Lava World in OpenAI Gym

__init__(map_size=15, mode=None, **kwargs)[source]#
Parameters:
  • map_size – A single int or a tuple (h, w), representing the map size.

  • visible_size – A single int or a tuple (h, w), representing the visible size. The agent will be at the center of the visible window, and the out-of-border part will be filled with the obstacle color.

  • obs_ratio – Obstacle ratio (how many obstacles will be in the map).

  • enable_path_checking – Enable path computation during map construction. Turn it off only when you are sure the maze is valid.

  • random_action_mapping – Whether to enable random action mapping. If enabled, the effect of each action is shuffled. If a single bool True is provided, a random shuffle is used; otherwise, it should be a list of the same length as the action space (5 when no-action is enabled, 4 otherwise).

  • enable_noaction – Whether to enable no-action operation.

  • dense_reward – Whether the reward is dense.

  • reward_move – Reward for a valid move. In the dense reward setting, it should be a positive number; in the sparse reward setting, it is expected to be non-positive.

  • reward_noaction – Reward for a no-action.

  • reward_final – Reward when you arrive at the final point.

  • reward_error – Reward when you perform an invalid move.

  • state_mode – State mode, either ‘DEFAULT’ or ‘RENDER’.
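
A minimal construction sketch using the parameters above. The import path is assumed from the module name (jaclearn.rl.envs.maze.maze) and the values are illustrative, not prescriptive:

    # Sketch only: the import path is assumed from the module name above.
    from jaclearn.rl.envs.maze.maze import CustomLavaWorldEnv

    env = CustomLavaWorldEnv(map_size=15)    # map_size may also be an (h, w) tuple
    env.restart()                            # build a fresh maze
    print(env.start_point, env.final_point)  # (r, c) start and finish points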

__new__(**kwargs)#
action(action)#
append_stat(name, value)#
clear_stats()#
evaluate_one_episode(func)#
finish(*args, **kwargs)#
play_one_episode(func, ret_states=False, ret_actions=False, restart_kwargs=None, finish_kwargs=None, max_steps=10000)#
restart(obstacles=None, start_point=None, final_point=None)[source]#
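
A hedged usage sketch for play_one_episode, continuing the construction above. The func callback is assumed here to receive the current state and return an action index; the exact callback contract is defined by SimpleRLEnvBase and should be checked against the source:

    import random

    def random_policy(state):
        # Assumption: func maps the current state to an action index.
        # With enable_noaction=False (the default), there are 4 actions.
        return random.randrange(4)

    env.restart()
    env.play_one_episode(random_policy, max_steps=100)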
property action_delta#

The (dy, dx) offset applied to the current point when you perform action i

Type:

Action deltas
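
A small sketch of how action_delta can be used to predict where an action would move the agent; indexing action_delta by the action id is an assumption here:

    r, c = env.current_point
    dy, dx = env.action_delta[0]   # assumed: (dy, dx) offset of action 0
    candidate = (r + dy, c + dx)   # the cell action 0 would move to, if valid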

property action_mapping#

If random action mapping is enabled, return the internal mapping

property action_space#
property canvas#

Return the raw canvas (full)

property canvas_size#

Canvas size

property current_point#

Current point (r, c)

property current_state#
property distance_mat#

Distance matrix

property distance_prev#

Distance-prev matrix

property final_point#

Finish point (r, c)

property inv_distance_mat#
property inv_distance_prev#
property lv_finals#
property lv_obstacles#
property lv_starts#
property map_size#

Map size

property obstacles#
property origin_canvas#

Return the original canvas (at time 0, full)

property quick_distance_mat#

Computed during the first run of SPFA; it is safe to use only if all valid points lie in the same connected component

Type:

Distance matrix

property quick_distance_prev#

See also quick_distance_mat

Type:

Distance-prev matrix

property rewards#

A tuple of 4 values, representing the rewards for each outcome: (move, no-action, arrive at final point, invalid move)

property shortest_path#

One of the shortest paths from start to finish, as a list of (r, c) points
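
Since the path is documented as a list of (r, c) points, its length gives an optimal episode length; the sketch below assumes the start point is included in the list:

    path = env.shortest_path       # list of (r, c) points, per the description above
    optimal_moves = len(path) - 1  # assumption: the start point is included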

property start_point#

Start point (r, c)

property stats#
property unwrapped#
property visible_size#

Visible size

Class MazeEnv

class MazeEnv[source]#

Bases: SimpleRLEnvBase

Create a maze environment.

__init__(map_size=14, visible_size=None, obs_ratio=0.3, enable_path_checking=True, random_action_mapping=None, enable_noaction=False, dense_reward=False, reward_move=None, reward_noaction=0, reward_final=10, reward_error=-2, state_mode='DEFAULT')[source]#
Parameters:
  • map_size – A single int or a tuple (h, w), representing the map size.

  • visible_size – A single int or a tuple (h, w), representing the visible size. The agent will be at the center of the visible window, and the out-of-border part will be filled with the obstacle color.

  • obs_ratio – Obstacle ratio (how many obstacles will be in the map).

  • enable_path_checking – Enable path computation during map construction. Turn it off only when you are sure the maze is valid.

  • random_action_mapping – Whether to enable random action mapping. If enabled, the effect of each action is shuffled. If a single bool True is provided, a random shuffle is used; otherwise, it should be a list of the same length as the action space (5 when no-action is enabled, 4 otherwise).

  • enable_noaction – Whether to enable no-action operation.

  • dense_reward – Whether the reward is dense.

  • reward_move – Reward for a valid move. In the dense reward setting, it should be a positive number; in the sparse reward setting, it is expected to be non-positive.

  • reward_noaction – Reward for a no-action.

  • reward_final – Reward when you arrive at the final point.

  • reward_error – Reward when you perform an invalid move.

  • state_mode – State mode, either ‘DEFAULT’ or ‘RENDER’.
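
A construction sketch for MazeEnv using a partial observation window and dense rewards. The import path is assumed from the module name and the values are illustrative:

    # Sketch only: the import path is assumed from the module name above.
    from jaclearn.rl.envs.maze.maze import MazeEnv

    env = MazeEnv(
        map_size=(14, 14),     # (h, w) tuple form
        visible_size=7,        # agent-centered 7x7 window
        obs_ratio=0.3,
        dense_reward=True,
        reward_move=1,         # positive, as recommended for the dense setting
    )
    env.restart()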

__new__(**kwargs)#
action(action)#
append_stat(name, value)#
clear_stats()#
evaluate_one_episode(func)#
finish(*args, **kwargs)#
play_one_episode(func, ret_states=False, ret_actions=False, restart_kwargs=None, finish_kwargs=None, max_steps=10000)#
restart(obstacles=None, start_point=None, final_point=None)[source]#
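
restart also accepts an explicit maze layout. The exact format of obstacles is not documented here; the sketch below assumes an iterable of (r, c) cells and should be verified against the source:

    # Assumption: obstacles is an iterable of (r, c) cells; start/final are (r, c) points.
    env.restart(
        obstacles=[(1, 2), (2, 2), (3, 2)],
        start_point=(1, 1),
        final_point=(5, 5),
    )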
property action_delta#

The (dy, dx) offset applied to the current point when you perform action i

Type:

Action deltas

property action_mapping#

If random action mapping is enabled, return the internal mapping

property action_space#
property canvas#

Return the raw canvas (full)

property canvas_size#

Canvas size

property current_point#

Current point (r, c)

property current_state#
property distance_mat#

Distance matrix

property distance_prev#

Distance-prev matrix

property final_point#

Finish point (r, c)

property inv_distance_mat#
property inv_distance_prev#
property map_size#

Map size

property obstacles#
property origin_canvas#

Return the original canvas (at time 0, full)

property quick_distance_mat#

Computed during the first run of SPFA; it is safe to use only if all valid points lie in the same connected component

Type:

Distance matrix

property quick_distance_prev#

See also quick_distance_mat

Type:

Distance-prev matrix

property rewards#

A tuple of 4 values, representing the rewards for each outcome: (move, no-action, arrive at final point, invalid move)
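
Since rewards is a 4-tuple, it can be unpacked in the documented order:

    # Order per the description above: (move, no-action, arrive at final point, invalid move).
    reward_move, reward_noaction, reward_final, reward_error = env.rewards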

property shortest_path#

One of the shortest paths from start to finish, as a list of (r, c) points

property start_point#

Start point (r, c)

property stats#
property unwrapped#
property visible_size#

Visible size