Autonomous driving (AD) is arguably one of the most complex engineering challenges of our time. It requires a seamless fusion of advanced perception, split-second planning, and precise control. In a 2023-2025 project, researcher Chris Sunny Thaliyath documented the journey of building a complete, end-to-end autonomous vehicle stack from the ground up.

The project’s goal was to create a “Software-in-the-Loop” (SIL) system to develop, train, and test modern autonomous driving models in a high-fidelity simulator. This digital testbed is the essential first step before any code ever touches a real vehicle.

The Toolkit: A Virtual Car in a Virtual World

To build a self-driving “brain,” you first need a car and a world to test it in. The project leveraged an industry-standard toolkit:

  • Simulator: CARLA was used as the virtual environment. It provides a hyper-realistic world, complete with a full suite of vehicle sensors: 6 cameras, LIDAR, RADAR, IMU, and GPS.
  • The “Nervous System”: The Robot Operating System (ROS) was used to pipe all this data around. It’s the software backbone that connects the car’s “eyes” (sensors) to its “brain” (the AI models); a minimal sketch of such a sensor subscriber follows this list.
  • The “Textbooks”: The AI models were trained on massive, real-world datasets like nuScenes, KITTI, and the Waymo Open Dataset, which provide the ground truth for perception.
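
To make the “nervous system” idea concrete, here is a minimal sketch of a ROS node that listens to a LIDAR topic published by the CARLA–ROS bridge and simply reports how many points arrive per sweep. The topic name and node layout are assumptions for illustration; the actual topics depend on how the bridge is configured.

```python
# Minimal ROS 2 node that subscribes to a LIDAR point-cloud topic from the
# CARLA-ROS bridge. The topic name below is an assumption; adjust it to the
# bridge configuration (e.g. the ego vehicle's role_name).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

LIDAR_TOPIC = "/carla/ego_vehicle/lidar"  # assumed topic name


class LidarListener(Node):
    def __init__(self):
        super().__init__("lidar_listener")
        self._sub = self.create_subscription(
            PointCloud2, LIDAR_TOPIC, self.on_cloud, 10
        )

    def on_cloud(self, msg: PointCloud2):
        # width * height gives the number of points in this sweep.
        self.get_logger().info(f"received {msg.width * msg.height} LIDAR points")


def main():
    rclpy.init()
    node = LidarListener()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```

In the full stack, the perception models subscribe to topics like this one (plus the six camera image topics) and publish their detections back out for the planner to consume.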

The Core of the Stack: Perception and Planning

The project’s work log shows a systematic build-up of a state-of-the-art perception stack.

  1. From Point Clouds to Bird’s-Eye View (BEV): A core component of modern autonomy is the Bird’s-Eye View (BEV). The project successfully implemented a pipeline to take raw, 3D LIDAR point cloud data and “squash” it into a 2D top-down map; a minimal sketch of this rasterization step follows the list. This BEV representation is what the AI uses to “see” the world and plan a path.

  2. Building the Perception Models: The project integrated and trained key 3D object detection models, including PointPillars, a foundational model for processing LIDAR data. The work culminated in a major milestone: getting BEVFusion running. This is a state-of-the-art model that fuses data from both the LIDAR and all 6 cameras, creating a much richer and more robust understanding of the environment.

  3. Achieving Real-Time Performance: A perception model is useless if it’s too slow. A critical achievement was optimizing the massive BEVFusion model with TensorRT (NVIDIA’s high-performance inference optimizer); a sketch of a typical export workflow also follows this list. This allowed the complete, multi-modal perception stack to run in real time, publishing its results over ROS.
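
As referenced in step 1, the “squash” from a 3D point cloud to a BEV map is essentially a rasterization step. The sketch below shows one minimal way to do it with NumPy, producing a two-channel grid (point count and maximum height per cell); the grid extents and resolution are illustrative assumptions, not the project’s actual values.

```python
# Rasterize a LIDAR point cloud (N x 3 array of x, y, z in the ego frame)
# into a top-down BEV grid. Extents and resolution are illustrative only.
import numpy as np

X_RANGE = (-50.0, 50.0)  # metres, forward/backward (assumed)
Y_RANGE = (-50.0, 50.0)  # metres, left/right (assumed)
RESOLUTION = 0.25        # metres per BEV cell (assumed)


def lidar_to_bev(points: np.ndarray) -> np.ndarray:
    """Return a 2-channel BEV image: point count and max height per cell."""
    n_x = int((X_RANGE[1] - X_RANGE[0]) / RESOLUTION)
    n_y = int((Y_RANGE[1] - Y_RANGE[0]) / RESOLUTION)
    bev = np.zeros((2, n_x, n_y), dtype=np.float32)

    # Keep only points that fall inside the grid extents.
    mask = (
        (points[:, 0] >= X_RANGE[0]) & (points[:, 0] < X_RANGE[1])
        & (points[:, 1] >= Y_RANGE[0]) & (points[:, 1] < Y_RANGE[1])
    )
    pts = points[mask]

    # Map metric coordinates to integer cell indices.
    ix = ((pts[:, 0] - X_RANGE[0]) / RESOLUTION).astype(np.int64)
    iy = ((pts[:, 1] - Y_RANGE[0]) / RESOLUTION).astype(np.int64)

    # Channel 0: number of points per cell. Channel 1: maximum z per cell
    # (cells with no points, or only below-ground points, stay at 0).
    np.add.at(bev[0], (ix, iy), 1.0)
    np.maximum.at(bev[1], (ix, iy), pts[:, 2])
    return bev


# Example: random points standing in for one LIDAR sweep.
cloud = np.random.uniform(-50.0, 50.0, size=(100_000, 3)).astype(np.float32)
print(lidar_to_bev(cloud).shape)  # (2, 400, 400)
```

A grid like this (or a learned equivalent, as in PointPillars and BEVFusion) is what gives the planner a map-like view it can reason over.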
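
For step 3, a common route to real-time inference is to export the trained network to ONNX and then build an optimized TensorRT engine from it. The sketch below shows that generic workflow on a stand-in torchvision model; BEVFusion itself has multiple input branches and custom ops, so the project’s actual export was necessarily more involved.

```python
# Generic PyTorch -> ONNX export. The model and input shape are stand-ins,
# not the real BEVFusion network (which takes camera and LIDAR branches).
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # placeholder network
dummy_input = torch.randn(1, 3, 224, 224)                 # placeholder input

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```

The resulting model.onnx can then be compiled into an engine with NVIDIA’s trtexec tool (for example, trtexec --onnx=model.onnx --saveEngine=model.engine --fp16), trading a little numerical precision for a large latency reduction.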

Closing the Loop: From AI Decision to Actuation

The final piece of the puzzle was to “close the loop.” It’s not enough for the AI to just see the world (open-loop); its decisions must actually control the car.

This was accomplished by integrating a Model Predictive Controller (MPC). This advanced controller takes the desired path from the AI’s planning module and translates it into smooth steering and acceleration commands for the CARLA vehicle.
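
In general terms, an MPC repeatedly solves a short-horizon optimization of the standard form below (this is the textbook formulation, not necessarily the exact cost used in the project):

```math
\min_{u_0,\dots,u_{N-1}} \;\; \sum_{k=0}^{N-1}
  \left\lVert x_k - x_k^{\mathrm{ref}} \right\rVert_Q^2
  + \left\lVert u_k \right\rVert_R^2
\qquad \text{subject to} \qquad
x_{k+1} = f(x_k, u_k), \quad u_k \in \mathcal{U}
```

Here x_k is the predicted vehicle state, x_k^ref the corresponding point on the planner’s desired path, u_k the steering/acceleration input, f a vehicle model (for example a kinematic bicycle), and Q, R weighting matrices. Only the first input of the optimal sequence is applied before the whole problem is re-solved at the next step, which is what makes the resulting commands both smooth and reactive.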

By connecting the real-time BEVFusion model’s output to the MPC controller’s input, the project achieved a true Closed-Loop Testing scenario. The AI “drives” the car, and the car’s new position is fed back to the AI in a continuous, real-time loop.
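
A skeleton of that loop might look like the sketch below, using CARLA’s synchronous mode so simulation time only advances one step per control decision. The perceive, plan, and mpc_control functions are placeholders standing in for the project’s BEVFusion + planner + MPC stack, and the host/port, spawn point, and control rate are assumptions.

```python
# Closed-loop skeleton: sense -> perceive -> plan -> control -> step the world.
# perceive(), plan() and mpc_control() are placeholders for the real stack.
import carla


def perceive(world, ego):   # placeholder: would run BEVFusion on sensor data
    return {}


def plan(detections, ego):  # placeholder: would output a reference trajectory
    return []


def mpc_control(path, ego): # placeholder: would solve the MPC problem
    return carla.VehicleControl(throttle=0.3, steer=0.0)


def main():
    client = carla.Client("localhost", 2000)  # assumed host and port
    client.set_timeout(10.0)
    world = client.get_world()

    # Synchronous mode: the simulator waits for tick(), so each control
    # command is applied for exactly one fixed-length simulation step.
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = 0.05       # 20 Hz loop (assumed)
    world.apply_settings(settings)

    blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
    spawn_point = world.get_map().get_spawn_points()[0]
    ego = world.spawn_actor(blueprint, spawn_point)

    try:
        for _ in range(1000):                  # roughly 50 simulated seconds
            detections = perceive(world, ego)  # "see": BEV map + detections
            path = plan(detections, ego)       # "think": reference trajectory
            control = mpc_control(path, ego)   # "act": steering + throttle
            ego.apply_control(control)
            world.tick()                       # advance the simulation one step
    finally:
        ego.destroy()
        settings.synchronous_mode = False
        world.apply_settings(settings)


if __name__ == "__main__":
    main()
```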

The Result: A Powerful Testbed for Autonomous Driving

The final system is a powerful and complete testbed for validating end-to-end autonomous driving stacks. It allows for the benchmarking of various models (like VAD, PPGeo, or UniAD) and measures them against the metrics that matter (a sketch of how these might be computed follows the list):

  • Route Completion (RC): Can the car successfully navigate to its target?
  • Collisions: How many collisions does it have per kilometer?
  • Smoothness: Is the ride comfortable, or is it jerky and unsafe?
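
As a rough illustration, the sketch below computes these three numbers from a logged episode using common conventions (route completion as a fraction of route length, collisions normalized per kilometre, smoothness as mean absolute jerk); the project’s exact metric definitions may differ.

```python
# Episode-level driving metrics from a simple log. The conventions here are
# common ones and may not match the project's exact definitions.
import numpy as np


def route_completion(distance_driven_m: float, route_length_m: float) -> float:
    """Fraction of the planned route that was covered, in [0, 1]."""
    return min(distance_driven_m / route_length_m, 1.0)


def collisions_per_km(num_collisions: int, distance_driven_m: float) -> float:
    """Collision events normalized by distance driven."""
    return num_collisions / (distance_driven_m / 1000.0)


def mean_abs_jerk(speeds_mps: np.ndarray, dt: float) -> float:
    """Smoothness proxy: mean absolute jerk (m/s^3) from a speed trace."""
    accel = np.diff(speeds_mps) / dt
    jerk = np.diff(accel) / dt
    return float(np.mean(np.abs(jerk)))


# Example with made-up numbers for a single simulated run at 20 Hz.
speeds = np.clip(np.cumsum(np.random.normal(0.05, 0.2, 600)), 0.0, 15.0)
print(route_completion(820.0, 1000.0))   # 0.82
print(collisions_per_km(1, 820.0))       # ~1.22 collisions per km
print(mean_abs_jerk(speeds, dt=0.05))
```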

This project serves as a comprehensive “how-to” for building the brain of a self-driving car, starting from raw data and ending with a fully autonomous agent driving itself in a virtual world.

