Articles by Shu Ishida

LangProp: A code optimization framework using Large Language Models applied to driving

Shu Ishida

Summary: LangProp is a framework that optimizes code with an LLM (Large Language Model) according to a training objective. Code candidates are evaluated on a training dataset, re-ranked by performance, and updated by the LLM. We applied LangProp to optimize the code of a self-driving agent, which outperformed many human-written driving systems on the CARLA benchmark.
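
To make the training loop concrete, here is a minimal sketch of a LangProp-style update step. The names and interfaces below (e.g. `training_step`, `rewrite_with_llm`) are illustrative assumptions, not the actual LangProp API: candidate policies are plain Python source strings, each is scored on the training data, the candidates are re-ranked by score, and the best ones are handed back to an LLM for rewriting.

```python
from typing import Callable, List, Tuple

Dataset = List[Tuple[list, int]]  # (input features, expected output) pairs


def evaluate(candidate_code: str, dataset: Dataset) -> float:
    """Execute a candidate's source and score the `predict` function it defines."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        predict = namespace["predict"]
        return sum(predict(x) == y for x, y in dataset) / len(dataset)
    except Exception:
        return 0.0  # candidates that crash are ranked last


def training_step(rewrite_with_llm: Callable[[str, float], str],
                  candidates: List[str], dataset: Dataset,
                  num_keep: int = 4) -> List[str]:
    """One update: rank candidates by score, keep the best, and have the LLM rewrite them."""
    ranked = sorted(candidates, key=lambda c: evaluate(c, dataset), reverse=True)
    survivors = ranked[:num_keep]
    rewritten = [rewrite_with_llm(code, evaluate(code, dataset)) for code in survivors]
    return survivors + rewritten
```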

[ Paper · Poster · Project Page · Code ]
An overview of LangProp. LangProp is a framework for optimizing code generated by LLMs. We evaluated LangProp on CartPole, on solving a generalized M x N Sudoku, and on autonomous driving in the CARLA simulator.

Can we use ChatGPT to drive a car?

You have probably used ChatGPT to write your emails, summarize documents, look up information, or help you debug your code. But can we go a step further and make ChatGPT drive a car?

This was the question we wanted to answer when I started my work placement at Wayve in March last year. Wayve is an autonomous driving startup in London, applying end-to-end learning to the challenging problem of urban driving. At the time, the company was just about to launch its LLM research team, which has since successfully developed LINGO-1 and LINGO-2. AutoGPT had just come out, and Voyager had not come out yet. Even so, the disruption caused by LLMs was palpable. The question was: how can we apply this new technology to driving, a domain where language isn't the main modality?

In this blog post, I would like to give an overview of our paper LangProp, which we presented at the LLM Agents workshop at ICLR (the International Conference on Learning Representations) in May 2024.

Shu Ishida, 29 June 2024

CALVIN — a neural network that can learn to plan and navigate unknown environments

Shu Ishida

Summary: CALVIN is a neural network that can plan, explore and navigate in novel 3D environments. It learns tasks such as solving mazes purely from expert demonstrations. Our work builds upon Value Iteration Networks (VIN) [1], a type of recurrent convolutional neural network that builds plans dynamically. While VINs work well in fully known environments, CALVIN also works in unknown environments, where the agent has to explore in order to find a target.
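
To give a feel for the planning computation that a VIN unrolls inside a convolutional network, below is a toy value-iteration example on a small, fully known grid. This is plain NumPy written for illustration, not the learned CALVIN model (which replaces the hand-specified rewards, obstacles and transitions with learned ones).

```python
import numpy as np


def value_iteration(reward: np.ndarray, obstacles: np.ndarray,
                    gamma: float = 0.95, iterations: int = 50) -> np.ndarray:
    """Propagate value outwards from rewarding cells on a 2D grid."""
    value = np.zeros_like(reward)
    for _ in range(iterations):
        # Best value reachable with one up/down/left/right move: a max over
        # shifted copies of the value map (a tiny hand-written max-pool).
        padded = np.pad(value, 1, constant_values=-np.inf)  # treat the border as walls
        best_next = np.max([padded[:-2, 1:-1], padded[2:, 1:-1],
                            padded[1:-1, :-2], padded[1:-1, 2:]], axis=0)
        value = reward + gamma * best_next
        value[obstacles] = -np.inf  # obstacles are never worth entering
    return value


# Example: a 5x5 grid with a goal in one corner and a wall across the middle.
reward = np.zeros((5, 5))
reward[4, 4] = 1.0
obstacles = np.zeros((5, 5), dtype=bool)
obstacles[2, 1:4] = True
print(value_iteration(reward, obstacles))
```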

[ Paper · Project Page · Code ]

The Problem

The problem we address is visual navigation from demonstrations. A robotic agent must learn how to navigate, given a fixed set of expert trajectories consisting of RGB-D images and the actions taken. While it is easy to plan with a top-down map that marks which cells are obstacles and which are targets, it is much harder when the agent has to learn what obstacles and targets look like from the RGB-D images alone.

A sequence of images and actions that the agent sees as expert demonstrations

Another important aspect of navigation is exploration. Our agent starts without any knowledge of the new environment, so it has to build a map as it navigates and learn to explore the areas that are most likely to lead to the target.

The agent learns to predict rewards that best explain the expert demonstrations. High values are bright (yellow) and low values are dark; the expert's trajectory is dashed and the agent's trajectory is solid.

For the agent to be able to navigate in environments it hasn’t been trained on, it has to learn some general knowledge applicable across all environments. In particular, we focus on learning a shared transition model and reward model that best explain expert demonstrations, which can then be applied to new setups.
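
As a rough sketch of this idea, the snippet below plugs a learned reward head and a learned per-action transition kernel into an unrolled value iteration, and trains them end-to-end to imitate the expert's action at the agent's position. The architecture and names are assumptions made for illustration, not the exact CALVIN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedPlanner(nn.Module):
    """Unrolled value iteration with a learned reward map and transition kernel."""

    def __init__(self, feat_channels: int, num_actions: int = 8, iters: int = 20):
        super().__init__()
        self.reward_head = nn.Conv2d(feat_channels, 1, kernel_size=1)
        # One 3x3 kernel per action: where does each move take the agent locally?
        self.transition = nn.Conv2d(1, num_actions, kernel_size=3, padding=1, bias=False)
        self.iters = iters

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        reward = self.reward_head(features)            # (B, 1, H, W)
        value = torch.zeros_like(reward)
        for _ in range(self.iters):
            q = self.transition(value) + reward        # (B, A, H, W): a value per action
            value = q.max(dim=1, keepdim=True).values  # greedy backup over actions
        return q                                       # per-action values, used as logits


# One imitation-learning step: match the expert action at the agent's position.
planner = LearnedPlanner(feat_channels=16)
optimizer = torch.optim.Adam(planner.parameters(), lr=1e-3)

features = torch.randn(4, 16, 32, 32)       # stand-in for mapped RGB-D features
agent_pos = torch.randint(0, 32, (4, 2))    # (row, col) of the agent in each map
expert_action = torch.randint(0, 8, (4,))   # expert's chosen move

q = planner(features)
logits = q[torch.arange(4), :, agent_pos[:, 0], agent_pos[:, 1]]  # (4, num_actions)
loss = F.cross_entropy(logits, expert_action)
loss.backward()
optimizer.step()
```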

The agent learns motion dynamics that are reusable across all environments. Each panel shows the probability of landing in a local neighbourhood around the agent when taking a move action in each of 8 cardinal directions; standing still would correspond to a single high probability (bright value) at the centre of a panel.
Shu Ishida, 2 June 2022