This short post will highlight some of the behaviors I have encountered when using agentic workflows or AI coding assistants in larger coding projects. With these tools, I will largely focus on standard and non-finetuned tools, such as GPT-4 or smaller variants such as GTP-5-mini. If you are interested and working in education, you can even get GitHub Copilot for free (for the caveat of needing to produce documentation).
This is by no means just a criticism of LLM tools for coding, but rather a warning on what to look out for when dealing with these tools.
Are AI Coding assistants too eager to pass tests?
A standard workflow when I personally use AI tools in my day-to-day coding, particularly for projects such as the Synavis Framework, I sometimes like to have it attempt to introduce something, fit an algorithm to what I would like to have, or attempt to fix some build errors.
A common denominator is the fact that AI tools appear to have been fine-tuned or similar to pass tests and to produce workable code, which sometimes does not necessarily align with the task it was given. This is particularly garish in cases in which no error should be output and it is attempting to pass the build pipeline. In these cases, the AI tools will often (not sometimes: often) attempt to introduce the following changes:
- Introduce include workarounds such as
#ifdef FFMPEG_AVAILABLEwhich trigger non-compilation if something earlier in the pipeline failed - Attempt to workaround using more and more stubs which will hide other issues by pure force of obscurification
- Introduce stub methods to avoid both error and implementation
- Introduce helpers, try/catch paradigms, alternate pathways, fallbacks, which allow the code to "run without errors" in any case, no matter what situation is encountered
These aspects make it often very hard to use AI coding assistants in Synavis, even though for some tasks I would really like to have them - particularly e.g. translating an RFC standard into C++ code for packaging of RTC packets.
Everything is optional
Smaller scripts are often what you would assume AI coding assistants would be good at since there are many many examples on the internet on specific tasks and smaller scripts, often didactically prepared, like in the context of a tutorial. However, here there are particular pitfalls that AI agents seem to always run into on purpose.
Often times AI agents will attempt to make everything optional. Would you like a new feature, or a script for a specific purpose? It will introduce dry-runs, optional parameters which really should be mandatory, and particularly workaround parts of the scrips for when there are errors in the meantime, after which the script can still continue even though it should really stop. AI agents, when asked "please add FFMPEG as dependency" will fail to do this, even if the surrounding environemnt is not particularly complex, such as focusing on the Build.cs in a UE project. The agents will make it their mission to suggest to the application that perhaps there might be an FFMPEG installation but they will put in a half-hearted attempt at including it, and bailing at the first hurdle, sometimes not even including output that it failed.
If it does not compile, I will cheat
I once asked an AI coding agent to translate a documentation on how to talk to a storage system into a Python code. The verification methods of the user token to talk to the storage system are available as curl commands. The AI agent, even without prior cause, produced a TV-set version of the code. For some reason I was unable to discern and it was unable to explain, it also produced mock curl commands which would always return the expected answer which was shown in the documentation. Subsequently, the code ran through with no issues, but the actual storage access was not successful, which was hidden by the fake curl commands.
Coding agents will implement stubs, try to workaround issues. Once, an AI coding agent was asked to make a C++ project build successfully. Its dependencies were a bit more difficult to handle, as I relied on PKG_Config in Unix and paths in Windows. The AI coding agent had the deliverable to let an executable with a certain library-related function run successfully. It was allow to automatically rebuild, change files, as well as run the file requested. It did not even arrive at being able to run the file. In the end, the AI coding agent, frustrated(???) with the lack of progress, decided it was best to delete everything. No, it did not "rewrite a clean version of the file to avoid the errors", it straight up deleted everything since its reasoning was "no code has to be able to build as the files are empty and empty files should compile". It probably would have added some code again, but at some point I can not let it run further, even if it was entertaining to watch a prediction tool loose its "mind".
Conclusion
Do not let these tools run freely. For larger projects, their solution seldomly is good. The Synavis solutions that were implemented by LLMs always need to be replaced later on as their solutions are often so inefficient that it would be careless to accept them in production.