Communicative Agents for Software Development

Original paper ยท Qian et al 2023

ChatDev, your own virtual chat-powered software engineering team that transforms your typical user-requirements into comprehensive software solutions - encompassing source code, environment dependencies and user manuals. The authors present a robust software engineering pipeline, powered purely by GPT-3.5 Turbo agents that use natural langauge to communicate product needs, design choices, bugs, etc. Replicating that of typical software companies.

Turning a LLM into a software company

The selling point is solid and it's a really interesting concept but in reality the scale isn't quite there yet. Automatize agents were popularized by the AutoGPT project back in March and they were all the hype in spring. I'm going to be frank and say that I haven't peered down this hole before because I wanted to wait for some of the hype dust to settle. I know miHoYo, the company behind Genshin Impact, co-authored a survey paper on LLM based agents that was published just a few weeks ago and I'm excited to read that (or skim at the very least, its 90 pages...). Anyway, we're here to talk about ChatDev.

GPT is great at generating specific, isolated code. Anyone who's asked ChatGPT for a well-defined short-ish function knows this. However, asking it to generate an entire codebase from scratch is almost impossible. It starts hallucinating, forgets dependencies and implements tedious bugs that are difficult to diagnose. As a solution to this, I've always treated coding with GPT-ADA as a pair-programming session, working alongside ADA by asking questions, breaking the code down in testable chunks, and stepping through the project; updating GPT's memory of what's working and what's not. This sequential development model is exactly what ChatDev mimics. By implementing multiple agents and following the classic waterfall model, the project progresses in controlled chunks, leaving substantially less space for errors to grow. Within the chat chain, each node represents a specific subtask, and two roles engage in context-aware, multi-turn discussions to propose and validate solutions. This approach ensures that client requirements are analyzed, creative ideas are generated, prototype systems are designed and implemented, potential issues are identified and addressed, debug information is explained, appealing graphics are created, and user manuals are generated.

I find this to be a very clever set up because it also allows the user to step in between each phase and check the work, modify input/output and adjust future prompts. Now, this is still at a very early stage - the problems that ChatDev has been asked to solve usually involves generating about 300 lines of source code. This is great, but its actual usability still seems quite low. It is however incredible cheap, remember that this project was run with OpenAI's API so they payed per token, but on average the solutions were generated for less than a dollars worth of tokens, wow! I'm really excited to see how this space looks in a year or so. Not sure of software companies is the space that will expand the most but gaming seems like such a low hanging fruit for these agents.