December 6, 2022

In recent months, we’ve marveled at the quality of computer-generated faces, cat pictures, videos, essays, and even art. Artificial intelligence (AI) and machine learning (ML) have also quietly slipped into software development, with tools like GitHub Copilot, Tabnine, Polycode, and others taking the logical next step of putting existing code autocomplete functionality on AI steroids. Unlike cat pics, though, the origin, quality, and security of application code can have wide-reaching implications — and at least for security, research shows that the risk is real.

Prior academic research has already shown that GitHub Copilot often generates code with security vulnerabilities. More recently, hands-on analysis from Invicti security engineer Kadir Arslan showed that insecure code suggestions are still the rule rather than the exception with Copilot. Arslan found that suggestions for many common tasks included only the absolute bare bones, often taking the most basic and least secure route, and that accepting them without modification could result in functional but vulnerable applications.
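
To make "functional but vulnerable" concrete, here is a minimal Python sketch of the kind of bare-bones suggestion that works on normal input but is open to SQL injection, next to a safer variant. It is a hypothetical illustration, not an actual Copilot suggestion and not taken from Arslan's analysis; the `users` table, the function names, and the demo data are all assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_insecure(conn, username):
    # The "most basic" route: user input is concatenated into the SQL text,
    # so a crafted username can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safer(conn, username):
    # Parameterized query: the driver passes the value separately from the SQL,
    # so the input is always treated as data, never as SQL syntax.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_insecure(conn, payload))  # returns every row in the table
    print(find_user_safer(conn, payload))     # returns no rows
```

Both functions behave identically for an ordinary name such as `alice`, which is why the insecure version can easily pass a quick manual check and still reach production.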

A tool like Copilot is (by design) autocompletion turned up a notch, trained on open source code to suggest snippets that could be relevant in a similar context. This makes the quality and security of suggestions closely tied to the quality and security of the training set. So the bigger questions are not about Copilot or any other specific tool but about AI-generated software code in general.

It’s reasonable to assume Copilot is only the tip of the spear and that similar generators will become commonplace in the years ahead. This means we, the technology industry, need to start asking how such code is being generated, how it’s used, and who will take responsibility when things go wrong.

Satnav Syndrome

Traditional code autocompletion that looks up function definitions to complete function names and remind you what arguments you need is a massive time-saver. Because these suggestions are merely a shortcut to looking up the docs for yourself, we’ve learned to implicitly trust whatever the IDE suggests. Once an AI-powered tool comes in, its suggestions are no longer guaranteed to be correct — but they still feel friendly and trustworthy, so they are more likely to be accepted.

Especially for less experienced developers, the convenience of getting a free block of code encourages a shift in mindset from “Is this code close enough to what I would write?” to “How can I tweak this code so it works for me?”

GitHub very clearly states that Copilot suggestions should always be carefully analyzed, reviewed, and tested, but human nature dictates that even subpar code will occasionally make it into production. It’s a bit like driving while looking more at your GPS than the road.

Supply Chain Security Issues

The Log4j security crisis has moved software supply chain security and, specifically, open source security into the limelight, with a recent White House memo on secure software development and a new bill on improving open source security. With these and other initiatives, any open source code in your applications could soon need to be listed in a software bill of materials (SBOM), which is only possible if you knowingly include a specific dependency. Software composition analysis (SCA) tools also rely on that knowledge to detect and flag outdated or vulnerable open source components.

But what if your application includes AI-generated code that, ultimately, originates from an open source training set? Theoretically, if even one substantial suggestion is identical to existing code and accepted as-is, you could have open source code in your software but not in your SBOM. This could lead to compliance issues, not to mention the potential for liability if the code turns out to be insecure and results in a breach — and SCA won’t help you, as it can only find vulnerable dependencies, not vulnerabilities in your own code.

Licensing and Attribution Pitfalls

Continuing that train of thought, to use open source code, you need to comply with its licensing terms. Depending on the specific open source license, you will at least need to provide attribution or sometimes release your own code as open source. Some licenses forbid commercial use altogether. Whatever the license, you need to know where the code came from and how it’s licensed.

Again, what if you have AI-generated code in your application that happens to be identical to existing open source code? If you had an audit, would it find that you are using code without the required attribution? Or maybe you need to open source some of your commercial code to remain compliant? Perhaps that’s not yet a realistic risk with current tools, but these are the kinds of questions we should all be asking today, not in 10 years’ time. (And to be clear, GitHub Copilot does have an optional filter to block suggestions that match existing code to minimize supply chain risks.)
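
As a rough idea of what checking for such matches could involve, the Python sketch below fingerprints whitespace-normalized snippets and looks them up in an index of known open source code. This is only an assumption-laden illustration: it is not how Copilot's filter or any audit tool is known to work, the function names and the `example-lib/util.py (MIT)` entry are hypothetical, and exact-match hashing would miss any snippet that has been renamed or lightly edited.

```python
import hashlib
from typing import Dict, Optional


def normalize(code: str) -> str:
    # Collapse runs of whitespace and drop blank lines so that purely
    # cosmetic differences do not change the fingerprint.
    lines = [" ".join(line.split()) for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def fingerprint(code: str) -> str:
    # SHA-256 over the normalized text; identical snippets get identical hashes.
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def build_index(known_snippets: Dict[str, str]) -> Dict[str, str]:
    # Map fingerprint -> origin label (hypothetical "path (license)" strings).
    return {fingerprint(src): origin for origin, src in known_snippets.items()}


def find_origin(index: Dict[str, str], candidate: str) -> Optional[str]:
    # Return the recorded origin if the candidate matches known code, else None.
    return index.get(fingerprint(candidate))


if __name__ == "__main__":
    index = build_index({
        "example-lib/util.py (MIT)": "def add(a, b):\n    return a + b\n",
    })
    suggestion = "def add(a, b):\n\n  return   a + b"
    print(find_origin(index, suggestion))
```

A match would tell you which origin and license to record for attribution or SBOM purposes; the absence of a match proves nothing, since this sketch only catches near-verbatim copies.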

Deeper Security Implications

Going back to security, an AI/ML model is only as good (and as bad) as its training set. We’ve seen that in the past — for example, in cases of face recognition algorithms showing racial biases because of the data they were trained on. So if we have research showing that a code generator frequently produces suggestions with no consideration for security, we can infer that this is what its learning set (i.e., publicly available code) was like. And what if insecure AI-generated code then feeds back into that code base? Can the suggestions ever be secure?

The security questions don’t stop there. If AI-based code generators gain popularity and start to account for a meaningful proportion of new code, it’s likely somebody will try to attack them. It’s already possible to fool AI image recognition by poisoning its training set. Sooner or later, malicious actors will try to plant deliberately vulnerable code in public repositories in the hope that it comes up in suggestions and eventually ends up in a production application, opening it up to an easy attack.

And what about monoculture? If multiple applications end up using the same highly vulnerable suggestion, whatever its origin, we could be looking at vulnerability epidemics or maybe even AI-specific vulnerabilities.

Keeping an Eye on AI

Some of these scenarios may seem far-fetched today, but they are all things that we in the tech industry need to discuss. Again, GitHub Copilot is in the spotlight only because it currently leads the way, and GitHub provides clear warnings about the caveats of AI-generated suggestions. As with autocomplete on your phone or route suggestions in your satnav, they are only hints to make our lives easier, and it’s up to us to take them or leave them.

With their potential to greatly improve development efficiency, AI-based code generators are likely to become a permanent part of the software world. In terms of application security, though, this is yet another source of potentially vulnerable code that needs to pass rigorous security testing before being allowed into production. We’re looking at a brand new way to slip vulnerabilities (and potentially unchecked dependencies) directly into your first-party code, so it makes sense to treat AI-augmented codebases as untrusted until tested, and that means testing everything as often as you can.

Even relatively transparent ML solutions like Copilot already raise some legal and ethical questions, not to mention security concerns. But just imagine that one day, some new tool starts generating code that works perfectly and passes security tests, except for one tiny detail: Nobody knows how it works. That’s when it’s time to panic.
