By Howard Dodd
OpenAI’s ChatGPT and similar generative AI (GenAI) chatbots such as Gemini and Claude have demonstrated the ability to translate software code from one language to another, sparking interest among software engineering leaders in their potential for code transformation and modernization. For instance, they can ostensibly transform a COBOL application running on a mainframe into a Java application running in the cloud. The output of these chatbots’ code translation is readable, well-structured and seemingly correct.
However, despite its promising capabilities, GenAI cannot be fully trusted for these tasks.
Understanding the Limitations of ChatGPT in Code Transformation
While GenAI’s ability to translate code might seem like a breakthrough for code transformation initiatives, it has significant limitations. Because of the limitations described below, most enterprises should consider GenAI unacceptable for automated code transformation.
- Inaccurate and Incomplete Translation
When modernizing applications, code transformation typically requires like-for-like translation. The resulting applications must be exactly equivalent in terms of functionality and data, and this equivalence must be proven. Without this proof, the applications necessitate full coverage testing and manual correction of any gaps, inaccuracies or unacceptable code. This requirement for functional and data equivalence presents a significant challenge when using ChatGPT or other generative AI solutions.
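To make the equivalence requirement concrete, here is a minimal sketch of a differential check that runs the original and the translated routine on the same inputs and reports any disagreement. Both functions and the test inputs are hypothetical stand-ins invented for illustration: one truncates fractional cents and the other rounds, which is the kind of subtle difference a readable translation can hide.

```python
def legacy_interest(balance_cents: int, rate_bp: int) -> int:
    """Hypothetical stand-in for the original routine: truncates fractional cents."""
    return balance_cents * rate_bp // 10_000


def translated_interest(balance_cents: int, rate_bp: int) -> int:
    """Hypothetical stand-in for a plausible-looking translation: rounds instead."""
    return round(balance_cents * rate_bp / 10_000)


def find_mismatches(old, new, cases):
    """Return (inputs, old result, new result) for every case where the two disagree."""
    mismatches = []
    for args in cases:
        expected, actual = old(*args), new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Illustrative inputs only: (balance in cents, rate in basis points).
cases = [(10_000, 250), (9_999, 250), (1, 9_999), (0, 500)]
mismatches = find_mismatches(legacy_interest, translated_interest, cases)

for args, expected, actual in mismatches:
    print(f"inputs={args}: legacy={expected}, translated={actual}")
```

A check like this can only find counterexamples. An empty result over sampled inputs is not proof of equivalence, which is why unproven translations fall back on full coverage testing.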
GenAI is not well-suited to ensuring functional and data equivalence. Although Generative Pre-trained Transformer 4 (GPT-4) is more reliable than its predecessors and can handle more complex situations, it still suffers from several key issues:
- Hallucinations and Factual Errors: GenAI can generate seemingly correct code that, upon closer inspection, contains significant errors and bugs.
- Training Data Issues: The models reflect the deficiencies of their training corpus, including any inaccurate, biased or prohibited data.
- Incomplete Results: Some portions of the code might not be transformed, leading to incomplete outputs.
- Lack of Transparency and Explainability: Generative AI models are complex and operate as “black boxes,” making it challenging to understand how they reach their decisions. This lack of transparency hinders the ability to justify or challenge their assessments.
- Inability to Handle Scope of Transformation
An application’s source code typically runs to hundreds of thousands or even millions of lines, which is too much for GenAI models such as ChatGPT or Gemini to process in one pass. The prompt has a size limit, which caps the number of lines of code that can be submitted. Compared with GPT-3, GPT-4 allows eight to 10 times more user content to be included along with a prompt, but it still has a limit, and that limit is smaller than the codebase of most applications.
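The scale mismatch can be illustrated with back-of-the-envelope arithmetic. The sketch below estimates how many separate prompts a codebase would have to be split into. Every number in it is an assumption chosen for illustration (tokens per line, context size, space reserved for instructions and output), not a specification of any particular model.

```python
import math


def chunks_needed(total_lines: int, tokens_per_line: float,
                  context_tokens: int, reserved_tokens: int = 0) -> int:
    """Estimate how many prompts are needed to pass an entire codebase to a model.

    All parameters are caller-supplied assumptions, not published model limits.
    """
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")
    return math.ceil(total_lines * tokens_per_line / usable)


# Illustrative figures only: a 1,000,000-line application, roughly 10 tokens
# per line, an 8,000-token prompt limit, and 2,000 tokens set aside for
# instructions and the generated output.
estimate = chunks_needed(1_000_000, 10.0, 8_000, reserved_tokens=2_000)
print(estimate)
```

Under these assumed figures the application would need well over a thousand separate prompts, none of which sees the rest of the system.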
GenAI models can transform modules individually, but attempting to transform an application incrementally is not recommended. Large applications have many explicit and implicit dependencies between code and data that must be understood and considered during a transformation. The context of a full system is required to generate a working, stable and secure application.
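A small sketch shows why module-by-module transformation loses context. It walks a hypothetical dependency map (the module names are invented for illustration) and collects everything a single module relies on, directly or indirectly.

```python
from collections import deque

# Hypothetical explicit dependencies: module -> modules it calls or includes.
deps = {
    "billing": ["customer", "rates"],
    "customer": ["address"],
    "rates": [],
    "address": [],
    "reports": ["billing"],
}


def transitive_dependencies(graph, start):
    """Return every module reachable from `start` by following dependencies."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


print(sorted(transitive_dependencies(deps, "reports")))
```

In this toy map, translating `reports` correctly requires knowledge of four other modules. A map like this captures only explicit dependencies; implicit ones, such as shared data layouts, do not appear in it at all and are even easier to miss.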
- Unsecure and Bad Code
GenAI-enabled large language models have been trained not only on high-quality code but also on code with security vulnerabilities, bugs and copyright restrictions. The code these models generate may therefore include proprietary, open-source and unsecure coding patterns, as well as dangerous or malicious code segments.
What’s Next: Generative AI in Application Modernization
Generative AI and other AI technologies are expected to enhance code transformation and modernization tools, though fully automated and unattended code transformation will not be available soon. Vendors in this space have long utilized AI techniques like machine learning and are now working to mitigate generative AI’s limitations. Integrating generative AI with deterministic rule-based models and transpilers is expected to improve the efficiency and outcomes of code transformation efforts.
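One way to picture such a combination is sketched below. It is an illustrative assumption, not a description of any vendor's tool: deterministic rules translate the statements they recognize, and everything else is set aside for a generative step followed by human review. The two rules and the toy COBOL-like statements are invented for the example.

```python
import re

# Hypothetical deterministic rules: (pattern for a toy COBOL-like statement,
# Java-like replacement template). Real transpilers use full parsers.
RULES = [
    (re.compile(r"^MOVE (\w+) TO (\w+)\.$"), r"\2 = \1;"),
    (re.compile(r"^ADD (\w+) TO (\w+)\.$"), r"\2 += \1;"),
]


def translate(lines):
    """Apply the deterministic rules; collect unmatched statements for review."""
    translated, needs_review = [], []
    for number, line in enumerate(lines, start=1):
        stmt = line.strip()
        for pattern, template in RULES:
            if pattern.match(stmt):
                translated.append(pattern.sub(template, stmt))
                break
        else:
            # No rule matched: route to a generative step plus human review.
            needs_review.append((number, stmt))
    return translated, needs_review


source = ["MOVE A TO B.", "ADD X TO TOTAL.", "PERFORM CALC-TAX UNTIL DONE."]
translated, needs_review = translate(source)
print(translated)
print(needs_review)
```

The design choice is that the rule-based portion produces the same output every time and can be verified once, so review effort concentrates on the statements the rules could not handle.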
(The author of the article is Howard Dodd, Sr Director Analyst at Gartner, and the views expressed in this article are his own)