Autonomous Car Parking Simulator
using Unity ML-Agents

How do we simulate an AI that finds a parking spot and then parks the car?

Problem Statement

The problem is simple: there is a car in a parking lot, and a vacant parking spot at a random position. The job of the AI is to find that spot and park in it. There may be a person walking by, so the AI has to avoid hitting the person as well as the other parked cars.

This seems like a perfect case for reinforcement learning. Unity can simulate the environment and lets us watch in real time as the AI parks the car while avoiding the obstacles.

Link to the GitHub repo

Link to the video walkthrough of the project

Environment Setup

I used a few free assets from the Unity Asset Store: this simple vehicle pack and this simple human asset. Once I had the basic assets ready, I created the environment, which looks like this:

Parking Lot Environment

Overview of the environment

The environment consists of a parking lot with some cars already parked and the agent car, which is looking for a parking spot. So that no single spot is always the vacant one, I randomized the vacant spot and the parked cars. The agent car is also spawned at different positions in the parking lot and in different orientations.

Note: The parked cars and the vacant spots are generated randomly, with an 85% chance that a given spot is occupied. As a result, some episodes may have no vacant spot at all, which encourages the agent to search the whole parking lot.
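To get a feel for how often a lot ends up completely full, here is a minimal Python sketch. It assumes each spot is filled independently with probability 0.85, which the note above suggests but does not state outright. The function names and the lot size of 20 are hypothetical, and the real spawning logic lives in the Unity project.

```python
import random


def sample_occupancy(num_spots, p_occupied=0.85, rng=None):
    """Return a list of booleans, True where a spot holds a parked car."""
    rng = rng or random.Random()
    return [rng.random() < p_occupied for _ in range(num_spots)]


def prob_no_vacancy(num_spots, p_occupied=0.85):
    """Probability that every spot is occupied, assuming independent spots."""
    return p_occupied ** num_spots


if __name__ == "__main__":
    num_spots = 20  # hypothetical lot size, not taken from the project
    layout = sample_occupancy(num_spots, rng=random.Random(0))
    print("vacant spots this episode:", layout.count(False))
    print("chance of a full lot:", prob_no_vacancy(num_spots))
```

Under these assumptions a 20-spot lot is completely full in roughly 4% of episodes, and the chance drops quickly as the lot grows.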

Building the environment

Configuring the Agent

This is our agent car, the car that needs to find its parking spot. To make sure the car behaves the way it should, I set it up as follows:

Agent car

I added wheel colliders to the front wheels of the car to give it realistic driving behavior.

Agent car

Training the Agent

Now that the agent and the environment are configured, I can move on to the training phase. The agent needs to train for a significant amount of time, so to speed things up I created about 12 instances of the environment.

Multiple instances of the environment for faster training

Hyperparameters for PPO
default:
      trainer: ppo
      batch_size: 1024
      beta: 5.0e-3
      buffer_size: 10240
      epsilon: 0.2
      hidden_units: 128
      lambd: 0.95
      learning_rate: 3.0e-4
      learning_rate_schedule: linear
      max_steps: 5.0e5
      memory_size: 128
      normalize: false
      num_epoch: 3
      num_layers: 2
      time_horizon: 64
      sequence_length: 64
      summary_freq: 10000
      use_recurrent: false
      vis_encode_type: simple
      reward_signals:
          extrinsic:
              strength: 1.0
              gamma: 0.99
      # Behavior-specific overrides of the defaults above (the behavior-name key is not shown)
      summary_freq: 30000
      time_horizon: 512
      batch_size: 512
      buffer_size: 2048
      hidden_units: 256
      num_layers: 3
      beta: 1.0e-2
      max_steps: 1.0e7
      num_epoch: 3
      reward_signals:
          extrinsic:
              strength: 1.0
              gamma: 0.99
          curiosity:
              strength: 0.02
              gamma: 0.99
              encoding_size: 256
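Several keys appear twice in the config above. A plausible reading, and it is only an assumption, is that the second group (from the second summary_freq onward) is a behavior-specific section that overrides the default section key by key. The Python sketch below merges the two groups under that assumption to show the values that would actually be used. The dictionary names are hypothetical and the numbers are copied from the config.

```python
# Values copied from the config; only the split into two groups is assumed.
default = {
    "trainer": "ppo",
    "batch_size": 1024,
    "beta": 5.0e-3,
    "buffer_size": 10240,
    "epsilon": 0.2,
    "hidden_units": 128,
    "lambd": 0.95,
    "learning_rate": 3.0e-4,
    "learning_rate_schedule": "linear",
    "max_steps": 5.0e5,
    "memory_size": 128,
    "normalize": False,
    "num_epoch": 3,
    "num_layers": 2,
    "time_horizon": 64,
    "sequence_length": 64,
    "summary_freq": 10000,
    "use_recurrent": False,
    "vis_encode_type": "simple",
    "reward_signals": {"extrinsic": {"strength": 1.0, "gamma": 0.99}},
}

overrides = {
    "summary_freq": 30000,
    "time_horizon": 512,
    "batch_size": 512,
    "buffer_size": 2048,
    "hidden_units": 256,
    "num_layers": 3,
    "beta": 1.0e-2,
    "max_steps": 1.0e7,
    "num_epoch": 3,
    "reward_signals": {
        "extrinsic": {"strength": 1.0, "gamma": 0.99},
        "curiosity": {"strength": 0.02, "gamma": 0.99, "encoding_size": 256},
    },
}

# Shallow merge: any key present in overrides replaces the default value.
effective = {**default, **overrides}

# Rough count of gradient updates per filled buffer:
# minibatches per pass over the buffer, times the number of passes.
updates_per_buffer = (
    effective["buffer_size"] // effective["batch_size"]
) * effective["num_epoch"]

if __name__ == "__main__":
    for key in ("batch_size", "buffer_size", "hidden_units", "num_layers", "max_steps"):
        print(key, "=", effective[key])
    print("updates per buffer ~", updates_per_buffer)
```

Keys that are not overridden, such as learning_rate and epsilon, keep their default values in this reading.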
      

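I added a curiosity reward signal alongside the extrinsic reward so that the agent explores more of the parking lot. With gamma set to 0.99 for both signals, a reward received k steps in the future is discounted by a factor of 0.99^k, so later rewards count for less than immediate ones.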
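The effect of gamma can be made concrete with a small Python sketch of a discounted return. The function names and the example reward sequence are hypothetical, chosen only to illustrate the arithmetic. They are not the rewards used in this project.

```python
def discount_weight(k, gamma=0.99):
    """Weight applied to a reward received k steps in the future."""
    return gamma ** k


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step k is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Made-up episode: a small penalty per step, then +1 for parking at the end.
    rewards = [-0.01] * 99 + [1.0]
    print("weight of a reward 100 steps away:", discount_weight(100))
    print("discounted return of the episode:", discounted_return(rewards))
```

With gamma = 0.99 a reward 100 steps away still carries a weight of about 0.37, so the agent can look a fair distance ahead while still preferring to park sooner.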

Training Using Generative Adversarial Imitation Learning (GAIL)

To help the AI understand the environment early on, we can provide some demonstrations, so I decided to record 100 demo instances. I played 100 episodes myself and provided the recorded demo file for learning by adding its path to the PPO hyperparameters used for training.

      gail:
        strength: 0.02
        gamma: 0.99
        encoding_size: 128
        use_actions: true
        demo_path: ../demos/ExpertParker.demo
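The strength values decide how much each reward signal counts relative to the others. The Python sketch below scales one step's raw rewards by the strengths from the configs above (1.0 for extrinsic, 0.02 for curiosity and GAIL). The function name and the raw reward values are made up for illustration, and treating the signals as a simple per-signal scaling is a simplification of what the trainer does internally.

```python
# Strengths copied from the reward_signals entries in the configs above.
STRENGTHS = {"extrinsic": 1.0, "curiosity": 0.02, "gail": 0.02}


def scale_rewards(raw_rewards, strengths=STRENGTHS):
    """Multiply each raw reward by the strength of its signal."""
    return {name: strengths[name] * value for name, value in raw_rewards.items()}


if __name__ == "__main__":
    # Made-up raw rewards for a single step.
    raw = {"extrinsic": 1.0, "curiosity": 0.5, "gail": 0.5}
    print(scale_rewards(raw))
```

With these strengths the environment reward dominates, while curiosity and GAIL act as small nudges toward exploration and toward the recorded demonstrations.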
      
Training at timescale=1

Results

I trained the agent for 10M steps (max_steps: 1.0e7). Here is a short clip of inference with the trained AI:

Running inference with the trained AI

Environment Cumulative Reward
Parking Environment Episode Length
Policy Value Loss
Policy Extrinsic Reward
Policy Extrinsic Value Estimate
Policy GAIL Reward

Analysis of the results

Conclusion