Autonomous Car Parking Simulator using Unity MLAgents
How do we train an AI to find a parking spot and then park the car?
Problem Statement
The problem is simple: there is a car in a parking lot, and a parking spot at a random position. The job of the AI is to find that parking spot. A person may also be walking by, so the AI has to avoid hitting the person as well as the other parked cars.
This seems like a perfect case for reinforcement learning, and Unity can even help us simulate the environment and watch it in real time as the AI parks the car while avoiding the obstacles.
Link to the video walkthrough of the project
Environment Setup
Using a few free assets from the Unity Asset Store (I found this simple vehicle pack and this simple human asset), I created the environment that looks like this:
Overview of the environment
The environment consists of a parking lot with some cars already parked, and the agent car that is looking for a parking spot. To ensure there is no single fixed vacant spot, I randomize the vacant spot and the other parked cars, and the agent car spawns at different positions in the parking lot and in different orientations as well.
Note: All the cars and the vacant spots are generated randomly, with an 85% chance that each spot is occupied. So there might be some episodes with no vacant spot in the lot, which encourages the agent to search the whole parking lot for a spot.
Building the environment
I created the parking lot as a square-shaped plane with walls on the sides. The area has multiple spawn positions for randomly spawning the parked cars and the agent car, handled by the script ParkingArea.cs. This script also contains the spawn-area margin multiplier that controls where the agent car spawns: the larger the margin, the more difficult the training becomes, since the agent car spawns at more varied locations.

- Parking Area (ParkingArea.cs)
    - randomSpawnCarsAndParking(): randomly spawns either a car or a vacant parking spot at each of the positions marked with blue diamonds in the image above.
    - clearParking(): clears the area for the next random spawn once the episode ends.
    - resetArea(): calls the two methods above one after the other; it is invoked from the agent script.
- Collision Detection Scripts (sketched below, after this list)
    - obstacleDetect.cs: detects when the agent car collides with any of the spawned parked cars and calls the hitACar() method inside the agent script.
    - wallDetect.cs: detects when the agent car collides with the side walls of the parking lot and calls the hitAWall() method inside the agent script.
    - parkingDetect.cs: detects when the agent car properly finds the empty parking spot and drives into it, and then calls the parked() method inside the agent script.
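To make the setup concrete, here is a minimal sketch of what ParkingArea.cs and one of the detector scripts could look like. The method names (randomSpawnCarsAndParking(), clearParking(), resetArea(), hitACar()) come from the description above; all field names, tags, and prefab handling are assumptions, not the project's actual code.

```csharp
using System.Collections.Generic;
using UnityEngine;

public class ParkingArea : MonoBehaviour
{
    public Transform[] spawnPoints;        // the blue-diamond positions (assumed field)
    public GameObject[] parkedCarPrefabs;  // cars from the vehicle pack (assumed field)
    public GameObject parkingSpotPrefab;   // trigger volume marking a vacant spot (assumed)

    [Range(0f, 1f)]
    public float occupiedChance = 0.85f;   // 85% chance a position holds a parked car

    private readonly List<GameObject> spawned = new List<GameObject>();

    // Randomly spawn either a parked car or a vacant spot at every position.
    public void randomSpawnCarsAndParking()
    {
        foreach (Transform point in spawnPoints)
        {
            GameObject prefab = Random.value < occupiedChance
                ? parkedCarPrefabs[Random.Range(0, parkedCarPrefabs.Length)]
                : parkingSpotPrefab; // some episodes may end up with no vacant spot at all
            spawned.Add(Instantiate(prefab, point.position, point.rotation, transform));
        }
    }

    // Clear the area for the next random spawn once the episode ends.
    public void clearParking()
    {
        foreach (GameObject go in spawned) Destroy(go);
        spawned.Clear();
    }

    // Called from the agent script at the start of every episode.
    public void resetArea()
    {
        clearParking();
        randomSpawnCarsAndParking();
    }
}

// One of the detector scripts, e.g. obstacleDetect.cs: it sits on the agent
// car and forwards collisions with parked cars to the agent. The tag name
// is an assumption.
public class obstacleDetect : MonoBehaviour
{
    public CarAgent agent; // the agent script, described later in the article

    private void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("parkedCar"))
            agent.hitACar();
    }
}
```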
Configuring the Agent
This is our agent car, the car that needs to find its parking spot. To make the car behave the way it should, I set it up as follows.

I added wheel colliders to the front wheels of the car to give it realistic driving behavior:

- Steer Wheels: the two front wheels have the steering ability.
- Driving Wheels: since this is a front-wheel-drive car, the driving ability is on the front wheels as well. I also added a box collider to the car, with a rigidbody weighing around 1500 kg, just to make the car behave realistically.
- Ray Perception Sensors: I added 3D ray perception sensors to the car, in the front and in the back, which can sense objects within a 10 m range.
- Behaviour Parameters
    - Vector Action Space Type: Continuous. Since the agent should be able to partially steer the car, we keep the vector actions continuous.
    - Vector Action Space Size: 2. The car can go forward or reverse, which constitutes one action, and it can turn left or right, which constitutes the other; hence we use two as the vector action space size (illustrated in the sketch below).
    - Decision Requester: we keep the decision period at 5, as we don't want the AI to make a decision before every single action, and we also allow the AI to take actions between decisions.
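As a concrete illustration of the two continuous actions, here is a minimal, hypothetical Heuristic() that maps keyboard input to the action vector. The class name is a placeholder; the ActionBuffers signature is the one from recent ML-Agents packages, while older releases pass a float array instead.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class CarHeuristicSketch : Agent
{
    // Action 0: drive forward/reverse in [-1, 1].
    // Action 1: steer left/right in [-1, 1].
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var act = actionsOut.ContinuousActions;
        act[0] = Input.GetAxis("Vertical");   // W/S or up/down arrows
        act[1] = Input.GetAxis("Horizontal"); // A/D or left/right arrows
    }
}
```

A heuristic like this is also what lets us drive the car by hand when recording the GAIL demonstrations later.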
Agent Script (carAgent.cs)
Now we can go ahead and write the agent script. The following are the important functions defined in the script:

- GetRandomSpawnPos(): gets a random spawn position for the agent car in the center area of the parking lot, depending on the spawn-area margin multiplier.
- hitACar(): punishes the agent for hitting another parked car; we add a reward of -0.1f.
- hitAWall(): punishes the agent for hitting a wall of the parking lot; we add a reward of -0.1f.
- hitAHuman(): punishes the agent for hitting the human walking by; we add a reward of -0.3f.
- parked(): positive reward (+5.0f) for finding the empty parking spot.
- OnActionReceived(): when we receive an action from the agent, we call the moveAgent() method. We also add a small per-step penalty so the agent learns to park in the fewest possible number of actions.
- moveAgent(): this function actually drives the car; depending on the action received, it steers and accelerates the car.
- OnEpisodeBegin(): every episode needs to start from a clean, standard state for the AI to learn properly. The car should be stationary at the beginning, so we remove all forces left over from the previous episode; we also reset the area and randomly respawn the parked cars and the empty parking spots using this function.

A condensed sketch combining these functions is shown below.
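The reward values and method names in this sketch come from the list above; the torque and steering constants, the MaxStep-based step penalty, and ending the episode on a successful park are assumptions, not the project's actual code.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class CarAgent : Agent
{
    public WheelCollider frontLeftWheel;   // assumed fields: the two front
    public WheelCollider frontRightWheel;  // wheels that steer and drive
    public float maxMotorTorque = 400f;    // assumed value
    public float maxSteerAngle = 30f;      // assumed value
    public ParkingArea parkingArea;        // the area script from earlier

    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        moveAgent(actions.ContinuousActions);
        // Small per-step penalty so the agent parks in as few actions as
        // possible (assumed form of the penalty).
        if (MaxStep > 0) AddReward(-1f / MaxStep);
    }

    // Steer and accelerate the car from the two continuous actions.
    private void moveAgent(ActionSegment<float> act)
    {
        float motor = maxMotorTorque * act[0]; // forward / reverse
        float steer = maxSteerAngle * act[1];  // left / right
        frontLeftWheel.motorTorque = motor;
        frontRightWheel.motorTorque = motor;
        frontLeftWheel.steerAngle = steer;
        frontRightWheel.steerAngle = steer;
    }

    // Called by the collision-detection scripts:
    public void hitACar()   { AddReward(-0.1f); }
    public void hitAWall()  { AddReward(-0.1f); }
    public void hitAHuman() { AddReward(-0.3f); }

    public void parked()
    {
        AddReward(+5.0f);
        EndEpisode(); // assumption: a successful park ends the episode
    }

    public override void OnEpisodeBegin()
    {
        // The car must start stationary, so drop forces from the last episode.
        rb.velocity = Vector3.zero;
        rb.angularVelocity = Vector3.zero;

        // Respawn the parked cars and vacant spots, then place the agent.
        parkingArea.resetArea();
        transform.position = GetRandomSpawnPos();
    }

    private Vector3 GetRandomSpawnPos()
    {
        // Placeholder: the real script samples a random position and rotation
        // in the central area, scaled by the spawn-area margin multiplier.
        return parkingArea.transform.position + Vector3.up * 0.5f;
    }
}
```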
Note: Please refer to the script CarAgent.cs (path: \CarParking\Assets\Scripts\CarAgent.cs); it is commented with all the details about the functions.
Training the Agent
Now that I have configured the agent and the environment, I can move on to the training phase. We need to train for a significant amount of time, so we want to run multiple instances of the environment in parallel; I created about 12 instances.
Hyperparameters for PPO
```yaml
default:
    trainer: ppo
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    learning_rate_schedule: linear
    max_steps: 5.0e5
    memory_size: 128
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 10000
    use_recurrent: false
    vis_encode_type: simple
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99

# Behavior-specific overrides; the key must match the behavior name set in
# the agent's Behaviour Parameters (exact name assumed here):
CarAgent:
    summary_freq: 30000
    time_horizon: 512
    batch_size: 512
    buffer_size: 2048
    hidden_units: 256
    num_layers: 3
    beta: 1.0e-2
    max_steps: 1.0e7
    num_epoch: 3
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 256
```
I added an extrinsic reward and a curiosity reward signal to the agent, so that it explores the parking lot a bit more. With gamma at 0.99, both the curiosity and the extrinsic rewards are discounted more heavily the further they lie in the future.
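For reference, running several concurrent instances with mlagents-learn requires a built executable passed via --env; a launch along these lines (config filename, build path, and run id are placeholders) starts the training: `mlagents-learn trainer_config.yaml --env=<path-to-build> --num-envs=12 --run-id=parking`.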
Training Using Generative Adversarial Imitation Learning (GAIL)
For the AI to understand the environment early on, we can provide some demonstrations, so I decided to record 100 demo episodes for our AI. I played 100 episodes myself (in ML-Agents, demonstrations are recorded with the Demonstration Recorder component on the agent) and provided the recorded demonstrations for learning by adding their path to the PPO hyperparameters while training.
The gail block goes under reward_signals in the behavior's configuration:

```yaml
gail:
    strength: 0.02
    gamma: 0.99
    encoding_size: 128
    use_actions: true
    demo_path: ../demos/ExpertParker.demo
```
Results
I trained the agent for 10M steps. Here is a short clip of inference with the trained AI.
Analysis of the results
- Cumulative Reward: if the agent is learning, this should increase, as the agent finds the parking spot more efficiently.
    - After 30K steps: Training 1: 0.2871, Training 2: 0.08488
    - After 9.99M steps: Training 1: 3.252, Training 2: 4.29
- Episode Length: should decrease, which would mean the AI is finding the parking spot faster.
    - After 30K steps: Training 1: 840.2, Training 2: 896.2
    - After 9.99M steps: Training 1: 370, Training 2: 179.1
- Value Loss: correlates to how well the model can predict the value of each state. This should increase while the agent is learning, as the policy keeps changing after every episode, and then decrease once the reward stabilizes, which means the policy is good at choosing actions and is getting the expected rewards from the states. We could train this agent even longer, as the value loss hasn't decreased significantly yet.
- Policy Extrinsic Value Estimate: the mean value estimate for all states visited by the agent. It should increase during a successful training session.
- GAIL Reward: should decrease, as the AI stops relying on our demonstrations and learns on its own.
- Learning Rate: how large a step the training algorithm takes as it searches for the optimal policy. It should decrease over time, and it coincides for both training runs.
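All of the statistics above come from TensorBoard; during or after training they can be inspected with something like `tensorboard --logdir=summaries` (the output directory name depends on the ML-Agents version, e.g. results in newer releases).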
Conclusion
- Can be trained for longer: even though training converges at about 4M steps with good enough results, the agent still sometimes crashes into a human or a parked car. The current run took 10M steps and about 11 hours on a 4th-generation Intel i5 processor.
- Sensors can be added on the left and right as well, so the agent gets a better understanding of the environment. It currently cannot "see" to its left and right, so the human walking back and forth sometimes crashes into it. This confuses the agent about why it was punished and can skew the policy in odd ways, because the agent cannot tell that it is cutting off a walking human.