I glimpsed the future of coding

By Will Knight | 07.18.24

This week I tested out an AI agent that enhances the abilities of software engineers. Coding may be the first of many areas where AI agents prove their worth.

The Future of Coding Involves a Lot More AI 💻

I am by no means a skilled coder, but thanks to a free program called SWE-agent, I was just able to debug and fix a gnarly problem involving a misnamed file within different code repositories on the software-hosting site GitHub.

I pointed SWE-agent at an issue on GitHub and watched as it went through the code and reasoned about what might be wrong. It correctly determined that the root cause of the bug was a line that pointed to the wrong location for a file, then navigated through the project, located the file, and amended the code so that everything ran properly. It’s the kind of thing that an inexperienced developer (such as myself) might spend hours trying to debug.

Many coders already use artificial intelligence to write software more quickly. GitHub Copilot was among the first widely used tools to bring AI into integrated development environments, and lots of IDEs will now automatically complete chunks of code when a developer starts typing. You can also ask AI questions about code or have it offer suggestions on how to improve what you’re working on.

Last summer, John Yang and Carlos Jimenez, two Princeton PhD students, began discussing what it would take for AI to become a real-world software engineer. This led them and others at Princeton to come up with SWE-bench, a set of benchmarks for testing AI tools across a range of coding tasks. After releasing the benchmark in October, the team developed its own tool—SWE-agent—to master these tasks.

SWE-agent (“SWE” is shorthand for “software engineering”) is one of a number of considerably more powerful AI coding programs that go beyond just writing lines of code and act as so-called software agents, harnessing the tools needed to wrangle, debug, and organize software. The startup Cognition AI went viral with a video demo of one such tool, called Devin, in March.

A team at OpenAI recently helped the Princeton crew improve a benchmark for measuring the reliability and efficacy of tools like SWE-agent, suggesting that the company might also be honing agents for writing code or doing other tasks on a computer.

Ofir Press, a member of the Princeton team, says that SWE-bench could help OpenAI test the performance and reliability of software agents. “It’s just my opinion, but I think they will release a software agent very soon,” Press says.

OpenAI declined to comment, but another source with knowledge of the company’s activities, who asked not to be named, told WIRED that “OpenAI is definitely working on coding agents.”

Just as GitHub Copilot showed that large language models can write code and boost programmers’ productivity, tools like SWE-agent may prove that AI agents can work reliably, starting with building and maintaining code.

A number of companies are testing agents for software development. At the top of the SWE-bench leaderboard, which ranks coding agents by how they score across a variety of tasks, is one from Factory AI, a startup, followed by AutoCodeRover, an open source entry from a team at the National University of Singapore.

Big players are also wading in. A software-writing tool called Amazon Q is another top performer on SWE-bench. “Software development is a lot more than just typing,” says Deepak Singh, vice president of software development at Amazon Web Services.

He adds that AWS has used the agent to translate entire software stacks from one programming language to another. “It’s like having a really smart engineer sitting next to you, writing and building an application with you,” Singh says. “I think that’s pretty transformative.”

Singh says that a number of customers are already building complex back-end applications using Q. My own experiments with SWE-agent suggest that anyone who codes will soon want to use agents to enhance their programming prowess, or risk being left behind.

Will Knight, Senior Writer

Need to Know

Hackers Claim to Have Leaked 1.1 TB of Disney Slack Messages

A hacker group called “NullBulge” says it stole more than a terabyte of Disney’s internal Slack messages and files from nearly 10,000 channels in an apparent protest over AI-generated art.

The Hidden Ties Between Google and Amazon’s Project Nimbus and Israel’s Military

A WIRED investigation found that public statements from officials detail a much closer link between Project Nimbus and the Israel Defense Forces than previously reported.

The Metaverse Was Supposed to Be Your New Office. You’re Still on Zoom

Tech founders painted a vision of employees clocking into virtual workplaces. But the adoption of VR at work has been slow.

Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI

“It’s theft.” A WIRED investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Anthropic, Nvidia, Apple, and Salesforce to train AI.

So, This Happened

The latest AI antitrust investigation sees the UK scrutinizing Microsoft’s ties to the startup Inflection. (The Wall Street Journal)

The Chinese government is backing domestic AI companies but also imposing stringent rules for testing their language models. (The Wall Street Journal)

The AI chip boom continues, with ASML, a key manufacturer of chip-making equipment, expecting a jump in orders for its gear. (Reuters)

A Senate investigation finds that Amazon’s Prime Day is a major cause of workplace injuries at the ecommerce giant. (The Washington Post)

Until Next Time

That’s it for another week. I’ll just leave you with a surreal yet fascinating clip of Logan Paul asking Donald Trump for his views on AI last month. The former president displays a decent grasp of many of the key issues, admits that he edited a speech using ChatGPT, and refers to AGI as “super-duper AI.”

Thanks for reading.


You’re receiving this email because you signed up for the Fast Forward newsletter from WIRED.
