Unlocking the Potential of AI in Software Development

Generative AI has been adopted so quickly that it is now used in almost every area of the enterprise. Services such as transcription and content creation are perhaps the most widely known applications, because anybody can use them and they are department agnostic. But Generative AI is also exerting considerable influence at a more fundamental level and, as a result, posing difficult questions about its role in coding and software development.

Author: Josep Prat, Open Source Engineering Director, Aiven

Addressing how AI can be best adopted without hampering creativity or overstepping the line when it comes to copyright or licensing laws is one of the major challenges facing software developers. For instance, the Intellectual Property Office (IPO), the Government body responsible for overseeing intellectual property rights in the UK, confirmed recently that it has been unable to facilitate an agreement for a voluntary code of practice which would govern the use of copyright works by AI developers.

The perfect match of AI and OSS

Today, most AI models are trained on open source software (OSS) projects because they can be accessed without the restrictions associated with proprietary software. This is something of a perfect match. It gives AI an ideal training environment, with access to a huge number of standard code bases running in infrastructures around the world. At the same time, OSS is exposed to the acceleration and improvement that working with AI can provide.

Developers, too, are benefiting massively from AI because they can ask questions, get answers and, whether those answers are right or wrong, use them as a basis to create something to work with. This major productivity gain is helping to refine code at a rapid rate. Developers are also using AI to complete mundane tasks quickly, get inspiration, or find alternatives to something they thought was a perfect solution.

Total certainty and transparency

However, it’s not all upside. The integration of AI into OSS has complicated licensing. The GNU General Public Licenses (GPL) are a series of widely used free software licenses, known as copyleft licenses (there are others too), that guarantee end users four freedoms: to run, study, share, and modify the software. Under these licenses, any modified version of the software that is distributed must be released under the same license. If code is licensed under the GPL, any modification to it also needs to be GPL licensed.

Therein lies the issue. Unless there is total transparency about how an AI model has been trained, it is impossible to be certain which licensing requirements apply to the code it produces or, indeed, how to license that code in the first place. This makes traceability paramount if copyright infringement and other legal complications are to be avoided. Additionally, there is the ethical question: if a developer has taken a piece of code and modified it, is it still the same code?

So the pressing issue is this: what practical steps can software developers take to protect themselves against the risks in the code they produce, and what role can the rest of the software community (OSS platforms, regulators, enterprises and AI companies) play in helping them do that?

Where foundations offer guidance

Integrity and confidence in traceability matter more when it comes to OSS because everything is out in the open. A mistake or oversight in proprietary software might happen but, because it is a closed system, the chances of exposure are practically zero. Developers working in OSS are operating in full view of a community of millions. They need certainty about where third-party source code originates: is it from a human, or from AI?

There are foundations in place. The Apache Software Foundation has a directive that says developers shouldn’t take source code produced by AI. They can be assisted by AI, but the code they contribute is their own responsibility. If it turns out that there is a problem, then it is the developer’s issue to resolve. We have a similar protocol at Aiven. Our guidelines state that our developers may use only pre-approved, constrained Generative AI tools. In any case, developers are responsible for the outputs, which need to be scrutinized and analyzed and not simply taken as they are. This way we can ensure we are complying with the highest standards.

Beyond this, organizations using OSS can also play a role, mitigating their own risks in the process. This includes establishing an internal AI Tactical Discovery team: a team set up specifically to focus on the challenges and opportunities created by AI. In this case it would involve a project specifically designed to critique OSS code bases, using tools like Software Composition Analysis to analyze the AI-generated codebase and compare it against known open source repositories and vulnerability databases.
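To make the comparison step more concrete, here is a minimal sketch of how AI-generated code might be checked against known open source code. It is not how any particular Software Composition Analysis product works. The tokenizer, the five-token window, the 0.8 threshold, and the `KNOWN_CORPUS` entry are all assumptions chosen for illustration; real tools index far larger corpora and use more robust matching.

```python
import hashlib
import re


def fingerprints(source: str, window: int = 5) -> set[str]:
    """Hash every run of `window` consecutive tokens in the source."""
    # Crude tokenizer (an assumption): identifiers, numbers, single punctuation marks.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", source)
    return {
        hashlib.sha256(" ".join(tokens[i:i + window]).encode()).hexdigest()
        for i in range(len(tokens) - window + 1)
    }


def overlap(candidate: str, known: str, window: int = 5) -> float:
    """Fraction of the candidate's fingerprints that also occur in the known code."""
    cand = fingerprints(candidate, window)
    if not cand:
        return 0.0  # candidate is shorter than one window, nothing to compare
    return len(cand & fingerprints(known, window)) / len(cand)


# Hypothetical corpus: project name -> (license identifier, source text).
KNOWN_CORPUS = {
    "example-gpl-lib": (
        "GPL-3.0-only",
        "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n",
    ),
}


def flag_matches(candidate: str, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (project, license, score) for corpus entries the candidate resembles."""
    hits = []
    for name, (license_id, source) in KNOWN_CORPUS.items():
        score = overlap(candidate, source)
        if score >= threshold:
            hits.append((name, license_id, score))
    return hits
```

Because the comparison works on tokens rather than raw text, reformatting a snippet does not hide a match. A flagged match is only a prompt for a human to review the snippet and its license, not a legal conclusion.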

Creating a root of trust in AI

Work on new licensing and laws around the role of AI in software development is under way, but it will take time, not least because consensus is required on the specifics of that role and the terminology used to describe it. This is made more challenging because AI, and the way it is being applied in code bases, is developing at a much quicker pace than those trying to put parameters in place to control it can move.

When it comes to assessing whether AI has reproduced OSS code as part of its output, proper attribution, license compatibility, and the availability of the corresponding open source code and modifications are all absolutely necessary. It would also help if AI companies started adding traceability to their source code. This would create a root of trust that has the potential to unlock significant benefits in software development.
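As a rough illustration of the checks listed above, the sketch below reviews a provenance record for a single contributed snippet. Everything in it is hypothetical: the `SnippetRecord` fields, the `ALLOWED_LICENSES` policy, and the wording of the findings are assumptions for the example, not a standard or any vendor’s format. Which licenses are actually compatible with a given project is a legal question that a simple allowlist cannot settle.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical project policy: license identifiers this project accepts.
ALLOWED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}


@dataclass
class SnippetRecord:
    path: str                   # file the snippet was contributed to
    origin: str                 # e.g. "human" or "ai-assisted"
    license_id: Optional[str]   # license of any matched upstream code
    attribution: Optional[str]  # credit to the upstream project, if any
    source_url: Optional[str]   # where the corresponding source can be obtained


def review(record: SnippetRecord) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    if record.license_id is None:
        return []  # no upstream match recorded, so nothing to check here
    findings = []
    if record.license_id not in ALLOWED_LICENSES:
        findings.append(f"{record.path}: license {record.license_id} is not in project policy")
    if not record.attribution:
        findings.append(f"{record.path}: missing attribution for upstream code")
    if not record.source_url:
        findings.append(f"{record.path}: no link to corresponding source")
    return findings
```

A record like this is only as trustworthy as the information that feeds it, which is why traceability from the AI tools themselves would matter.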