
AWS DeepRacer — Ultimate Beginner Guide v2

Unlock your virtual racing potential with a wealth of tips and strategies

TinyMightyByte
Sep 27, 2024

A while back, I started writing a detailed guide on AWS DeepRacer (Link). But after some reflection, I realized something: you can only learn so much from a guide.

The real magic? It happens when you start experimenting, diving in, and learning from each race, mistake, and victory.

So, let’s skip the theory and get into the stuff that really matters.

Whether you’re a seasoned coder or a curious beginner eager to dip your toes into reinforcement learning, this guide is your roadmap. We’ll explore critical concepts, break down key strategies, and provide all the essential tips you need to leave your competition in the dust.

Ready to get started?

What is AWS DeepRacer?

Imagine this: an AI-powered, 1/18th-scale race car zooming around a track, learning from every lap.

That’s AWS DeepRacer in a nutshell. Designed by Amazon Web Services, it’s an incredibly fun way for developers to learn about reinforcement learning (RL). You get to train your car in a virtual simulator and race it in global competitions for both glory and prizes.

But what makes DeepRacer more than just a tech toy?

At its core, DeepRacer is a learning tool, teaching you the fundamentals of RL while making the process enjoyable and tangible. You’re not just watching code run — you’re watching it race.

The Magic of Reinforcement Learning

At the heart of every race car’s success lies reinforcement learning. But what exactly is RL?

Put simply, RL is a type of machine learning where an agent learns by doing. The agent makes decisions, gets feedback (rewards or penalties), and uses this feedback to improve. Over time, it learns what works and what doesn’t.

Reinforcement Learning — Flow chart

Here’s a breakdown:

  • Agent: That’s your car, figuring out how to navigate the twists and turns of the track.
  • Environment: The track itself, with all its curves and challenges.
  • Action: Your car’s decisions, like whether to steer left, right, or speed up.
  • Reward: Positive feedback for good behavior (staying on track), and penalties for missteps (going off track).
  • Policy: This is the strategy your car follows, deciding what to do in each situation.

Over time, your car improves — not because you told it what to do, but because it learned through trial and error, just like a human driver finding the best racing line.
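If it helps to see that loop written out, here is a minimal sketch of the agent-environment cycle. The env and agent objects are hypothetical placeholders used to illustrate the idea, not part of the DeepRacer API:

def training_loop(env, agent, num_episodes):
    # Generic reinforcement learning loop: act, get feedback, learn, repeat.
    for episode in range(num_episodes):
        state = env.reset()                                 # e.g. the first camera frame of a lap
        done = False
        while not done:
            action = agent.choose_action(state)             # the policy picks steering/speed
            next_state, reward, done = env.step(action)     # the environment gives feedback
            agent.learn(state, action, reward, next_state)  # the policy improves from that feedback
            state = next_state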

Training Algorithms

AWS DeepRacer gives you two powerful algorithms to train your car: Proximal Policy Optimization (PPO) and Soft Actor Critic (SAC).

  • PPO: Works with both discrete and continuous actions, and is an on-policy algorithm, meaning it learns from its current experience.
  • SAC: Focuses on continuous actions and is off-policy, allowing it to learn from a variety of experiences, even older ones.

Both algorithms have their strengths. The challenge is finding the balance between exploration (trying new things) and exploitation (relying on what your car already knows).

Too much exploration wastes time, while too much exploitation might keep your car stuck in a less-than-perfect strategy.
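Neither PPO nor SAC uses this exact mechanism (PPO relies on a stochastic policy, SAC on an entropy bonus), but a simple epsilon-greedy sketch makes the trade-off concrete: with a small probability you explore, otherwise you exploit what you already know.

import random

def choose_action(q_values, epsilon=0.1):
    # Explore with probability epsilon: try a random action to gather new experience
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest estimated value so far
    return max(range(len(q_values)), key=lambda i: q_values[i])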

AWS DeepRacer Service Architecture

Behind every AWS DeepRacer model is a robust architecture of AWS services: Amazon SageMaker (for training), AWS RoboMaker (for simulations), and Amazon S3, among others. Together, they create a smooth ecosystem that lets you train, evaluate, and tweak your models, all while tracking progress.

Why does this matter?

Understanding this architecture helps you make the most of AWS DeepRacer’s full capabilities. Think of it like knowing your car’s engine — it’s not essential to race, but knowing it gives you an edge.

Underlying Architecture of AWS DeepRacer

Approaching Code Writing for DeepRacer

What is happening under the hood?

Your car’s “brain” is a Convolutional Neural Network (CNN), interpreting the world through its front-facing camera. The key takeaway?

The image captured by the car’s camera is the state or observation the car learns from.

This is how the car “sees” the track. As a developer, your code is what teaches the car how to react to those observations.

This is what the car sees through the camera
Where is the focus? — An image from the log analysis

Crafting the Reward Function

The reward function is the heart of your DeepRacer model.

It’s where you define how your car learns. If the car makes a good move, it gets rewarded; if it messes up, it gets penalized.

Simple, right?

But here’s the trick: the simpler your reward function is, the better you’ll be able to understand and plan your next move. Start small.

Think about it: you want your car to stay on track, so which parameters can you use most easily to reward that?

Here’s a basic reward function to get you started (from the AWS documentation):

def reward_function(params):
    # Full reward when all four wheels are on the track, a tiny reward otherwise
    if params['all_wheels_on_track']:
        reward = 1.0
    else:
        reward = 1e-3
    return float(reward)

At every step, this checks whether all wheels are on the track. If they are, the car receives the full reward; if not, the reward drops to a tiny value, so driving off track is discouraged.

As you get better, you can gradually introduce more parameters like speed, steering angles, or how far it is from the centerline.
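For example, a centerline-based reward (closely following one of the AWS sample functions) uses the track_width and distance_from_center input parameters to pay more for staying near the middle of the road:

def reward_function(params):
    # Reward the car for staying close to the centerline
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three bands around the centerline, from tight to loose
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to going off track

    return float(reward)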

Tips for Better Performance

These tips will give your DeepRacer the edge it needs:

1. Racing Line Calculations:

The fastest path around the track is key.

  • Draw the optimal line on paper — it helps you visualize the ideal path and code accordingly.
  • Incorporating waypoints in your reward function encourages your car to stick to this path (see the sketch after the diagram below).

Waypoint diagram
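As an illustration, a simple waypoint-based reward can compare the car's heading with the direction of the track at its current position, using the standard waypoints, closest_waypoints, and heading input parameters (the 10-degree threshold is just an example to tune):

import math

def reward_function(params):
    # Penalize the car when its heading drifts away from the track direction
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Direction of the track segment between the two nearest waypoints, in degrees
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    )

    # How far the car's heading deviates from that direction
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    reward = 1.0
    if direction_diff > 10.0:
        reward *= 0.5

    return float(reward)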

2. Start Simple, Add Complexity Later:
A complex reward function might seem powerful, but it can easily confuse the model.

  • Begin with simple behaviors like staying on track, then introduce more layers as you go.
  • Learn all the parameters and their behaviors, then filter out the ones you actually need.

Want to write the simplest reward function?

def reward_function(params):
    # The reward is simply the car's current speed
    return float(params['speed'])

According to the Boltron Racing Team YouTube channel, this one-line reward function has good potential (given longer training hours).

3. Simulation Log Analysis:

  • Don’t just run simulations blindly. Dig into the logs, figure out what worked, what didn’t, and use that data to improve your model.
  • Search for the AWS DeepRacer log analysis notebooks and use them, following the provided guidance (a rough do-it-yourself sketch follows this list).
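If you want a quick look before diving into the full notebooks, here is a small sketch. It assumes you have already exported per-episode results to a CSV file with episode and reward columns; the file name and column names are hypothetical, and the official notebooks do far more than this:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export: one row per episode with its total reward
df = pd.read_csv('training_episodes.csv')

# Smooth the reward curve to see the trend instead of the noise
df['reward_smoothed'] = df['reward'].rolling(window=20, min_periods=1).mean()

df.plot(x='episode', y=['reward', 'reward_smoothed'])
plt.title('Reward per episode')
plt.show()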

4. Action Space Optimization:

  • Limit the action space to the most effective steering angles and speeds for your track. A smaller, focused action space can accelerate training and improve performance.
  • Select a suitable action space.
    - In PPO, you define a discrete action space with specific actions (like turning left, right, or going straight).
    - Continuous action space can be used in both PPO and SAC, allowing for more flexible, fine-tuned control over speed and steering.

In my experience, even when I wanted to use a discrete action space, I always ended up choosing a continuous action space because of how smooth it is on the track! The sketch below shows the conceptual difference.
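Conceptually, a discrete action space is a fixed menu of steering/speed pairs, while a continuous one is just a pair of ranges. You configure this in the DeepRacer console rather than in code, but the structures below illustrate the difference (the specific numbers are arbitrary examples):

# Discrete: a fixed menu of (steering_angle, speed) combinations
discrete_action_space = [
    {'steering_angle': -30, 'speed': 1.0},
    {'steering_angle': -15, 'speed': 2.0},
    {'steering_angle': 0,   'speed': 3.0},
    {'steering_angle': 15,  'speed': 2.0},
    {'steering_angle': 30,  'speed': 1.0},
]

# Continuous: min/max ranges the model can pick from freely
continuous_action_space = {
    'steering_angle': {'low': -30.0, 'high': 30.0},  # degrees
    'speed': {'low': 0.5, 'high': 4.0},              # metres per second
}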

5. Hyperparameter Tuning:

  • Adjusting hyperparameters like the learning rate, batch size, and number of epochs controls how fast your model learns and can significantly affect your car’s performance. The values below are a rough baseline.
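For orientation, these are roughly the PPO defaults the console starts you with (the keys are shortened here for readability, and defaults may change, so treat this as a baseline rather than a spec):

# Approximate PPO defaults in the DeepRacer console; double-check yours
hyperparameters = {
    'gradient_descent_batch_size': 64,    # samples used per gradient update
    'learning_rate': 0.0003,              # step size of each update
    'entropy': 0.01,                      # encourages exploration
    'discount_factor': 0.999,             # how much future rewards count
    'loss_type': 'huber',                 # alternative: mean squared error
    'number_of_epochs': 10,               # passes over each batch of experience
    'number_of_experience_episodes': 20,  # episodes gathered between policy updates
}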

6. Transfer Learning:

Don’t start from scratch! Clone pre-trained models and tweak them for your specific track.

  • Clone a successful model, fine-tune its reward function, and adjust hyperparameters.

7. Advanced Reward Shaping:

  • Combine multiple factors when you are ready to make your reward function more complex — speed, heading, and steering — and assign weights to emphasize the ones that matter most for your specific track (see the sketch after this list).
  • Use geometry — consider Python libraries such as numpy, scipy, and shapely.
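A weighted combination might look like the sketch below. The weights, the 4.0 m/s speed normalizer, and the 20-degree steering threshold are arbitrary examples you would tune per track:

def reward_function(params):
    # Illustrative weighted reward: favour speed and staying on track, discourage sharp steering
    speed = params['speed']                        # metres per second
    steering = abs(params['steering_angle'])       # degrees, sign ignored
    all_wheels_on_track = params['all_wheels_on_track']

    speed_reward = speed / 4.0                     # assumes roughly 4 m/s top speed
    on_track_reward = 1.0 if all_wheels_on_track else 0.0
    steering_penalty = 0.5 if steering > 20.0 else 0.0

    reward = 0.5 * speed_reward + 1.0 * on_track_reward - steering_penalty
    return float(max(reward, 1e-3))                # never return a zero or negative reward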

8. Visualization Tools:

A reasonably good reward graph
  • Always watch the reward graph during training. Whenever learning seems to stall or regress, adjust your reward function, action space, or hyperparameters accordingly.

9. Avoiding Overfitting:

  • Don’t let your car over-learn a single track.
  • A well-trained model should adapt to any track, focusing on general features like road edges and lanes, not specific landmarks.
Reward function of overfitted model

But if you know the competition is designed around a single track, overfitting can be a deliberate choice.

10. Training Steps:

Use your GPU instead of the cloud service for training. A good local GPU allows for longer, more frequent training sessions without the AWS costs.

Train in short bursts. Start with a 1-hour session, then gradually increase training time while making small tweaks.

11. Stay Organized:

  • Name your models clearly.
  • Store resources in directories with proper naming.
  • Take screenshots of evaluation charts for easy comparison.

12. Collaborate with the Community:

Join forums, participate in community races, and engage with other DeepRacer enthusiasts. You’ll learn faster and have more fun along the way.

Conclusion

Mastering AWS DeepRacer is all about experimentation and continuous learning.

These tips will help you on your journey, but the real key is in the hands-on tweaking and learning. So, what are you waiting for?

Rev up your engines, fine-tune your reward function, and race your way to the top of the leaderboard!

Happy Racing!
