This blog post is a personal assessment of GitHub Copilot, based on my use of the 60-day free trial and on information I gathered from different sources.
Let’s start with a short introduction.
What it is
GitHub Copilot was announced as “Your AI pair programmer”. It is an artificial intelligence (AI) based service that helps developers write code faster and with less effort. Copilot is powered by Codex, a generative pretrained language model developed by OpenAI. It derives context from comments and code and suggests code instantly. It feels a bit like having StackOverflow right in the integrated development environment (IDE).
How it works
OpenAI Codex was trained on publicly available source code (including code in public GitHub repositories) and on natural language. This means Codex works with programming languages as well as human languages.
The GitHub Copilot extension sends your comments and code to the GitHub Copilot service, which uses this information to gather context. All of this information (comments, code and context) is then used by OpenAI Codex to make suggestions. Suggestions can be single statements or lines, or even whole functions.
GitHub Copilot extensions are currently available for the following IDEs:
- Visual Studio Code
- Visual Studio
- JetBrains IDEs
What data is collected
Now it gets more interesting. One of the first questions I asked myself when the announcement was made was: what data will be collected? The GitHub Copilot overview page has a “Frequently asked questions” section that contains some insights concerning data collection:
GitHub Copilot relies on file content and additional data to work. It collects data both to provide the service and saves some of the data to perform further analysis and enable improvements. Please see below for more details on how your telemetry data is used and shared.
User Engagement Data
When you use GitHub Copilot it will collect usage information about events generated when interacting with the IDE or editor. These events include user edit actions like completions accepted and dismissed, and error and general usage data to identify metrics like latency and features engagement. This information may include personal data, such as pseudonymous identifiers.
Code Snippets Data
Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files paths.
Source: GitHub Copilot FAQ
GitHub assures that transmitted data is encrypted in transit and at rest and that access is strictly controlled. However, as a user you don’t know exactly what is being sent to the GitHub Copilot service. Are all sorts of secrets files (e.g. managed user secrets in Visual Studio) automatically excluded by the extensions? Let’s have a closer look at that topic later.
Another important thing to have a look at is the pricing model.
GitHub Copilot is currently only available to individual GitHub users for an additional cost of:
- $10 USD per month
- $100 USD per year
Enterprise managed user accounts are not yet eligible. However, GitHub Copilot for companies should follow later this year (a waitlist is available).
Like many others, I registered for the 60-day free trial and gave it a try. I was pretty impressed and have now decided to buy a one-year subscription to investigate further in the context of my side projects.
Based on my experience with GitHub Copilot so far, the tool is mainly useful for completing methods, generating conversions (time, date, …), generating simple validation logic, etc. Unfortunately, I have not yet managed to get Copilot to write complete unit tests. That would be a huge benefit if it becomes possible one day.
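To make the kind of task described above more concrete, here is a hand-written illustration in Python. It is not actual Copilot output; the function names `unix_to_iso` and `is_valid_email` are hypothetical. Each comment plays the role of the prompt a developer would type, and the function body is the sort of short conversion or validation helper an assistant would be expected to complete.

```python
import re
from datetime import datetime, timezone


# Prompt-style comment: convert a Unix timestamp (seconds) to an ISO 8601 string in UTC.
def unix_to_iso(ts: float) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


# Prompt-style comment: return True if the value looks like a plausible e-mail address.
# Deliberately simple check: one "@", no whitespace, and a dot in the domain part.
def is_valid_email(value: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None
```

Even for helpers this small, the result still needs a review: the e-mail check above, for instance, is intentionally loose and would accept addresses that a stricter validator rejects.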
Below are a few examples of suggestions by GitHub Copilot:
When using GitHub Copilot, especially in a business environment but also privately, there are a few things to consider.
FIRST: During setup, users can make two important decisions:
I recommend activating the filter that detects and suppresses suggestions that match public code on GitHub, just to be safe in terms of licensing.
SECOND: You should make sure that all code bases you use with GitHub Copilot are free of secrets. This should be the standard anyway, but as experience has taught me, it unfortunately is not always the case…
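One way to check a code base before enabling Copilot on it is a quick scan for obvious secrets. The following is a minimal sketch in Python under stated assumptions: the helper names `find_secrets` and `scan_tree`, the three patterns and the list of file suffixes are all illustrative choices of mine, not part of GitHub Copilot or of any particular scanning tool. Dedicated secret scanners cover far more cases.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumption); real scanners ship much larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(
        r"(?i)(?:password|passwd|secret|api_key)\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root: str, suffixes=(".py", ".cs", ".json", ".config", ".yml")) -> dict:
    """Scan files below `root` whose suffix is in `suffixes` (hypothetical default list)."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = find_secrets(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the repository root returns a dictionary that maps each suspicious file to its matching line numbers. An empty result does not prove the code base is clean; it only means that none of these few patterns matched.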
THIRD: Like any other code (e.g. code from StackOverflow), code suggested by GitHub Copilot should be carefully reviewed, judged and tested. GitHub Copilot is only your copilot; you as a developer are the pilot and therefore in charge.
Public code may contain insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. This is something we care a lot about at GitHub, and in recent years we’ve provided tools such as GitHub Actions, Dependabot, and CodeQL to open source projects to help improve code quality. Of course, you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment.
Source: GitHub
This point deserves even more attention in light of a report by American researchers that was presented at the Black Hat conference in 2022. The researchers systematically investigated how often, and under which conditions, GitHub Copilot recommends insecure code. They created 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, they found approximately 40% to be vulnerable…
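To illustrate what an insecure pattern in a suggestion can look like, here is a classic case: SQL built by string concatenation versus a parameterized query. This is a hypothetical example of my own, using Python's standard `sqlite3` module; it is not taken from the researchers' report or from actual Copilot output, and the function and table names are made up.

```python
import sqlite3


def find_user_insecure(conn: sqlite3.Connection, name: str):
    # Vulnerable: user input is concatenated into the SQL text (SQL injection).
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo():
    # Build a throwaway in-memory database with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"  # crafted input that turns the WHERE clause into a tautology
    return find_user_insecure(conn, payload), find_user_safe(conn, payload)
```

With the crafted input, the concatenated query returns every row in the table, while the parameterized version returns none. Both functions look equally plausible at a glance, which is exactly why suggested code needs a review.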
Several things seem to affect the quality of the output. First, the natural language: as public sources are mostly in English, non-English speakers might experience a lower quality of service. The author can also have an impact. For example, the researchers added a Python author flag set to Andrey Petrov, lead author of Python’s popular third-party library urllib3. As his code is extremely popular, it is more likely to have been vetted for security errors. And indeed, the number of non-vulnerable suggestions increased.
GitHub Copilot is a great initiative and definitely a game changer in terms of speed when it comes to writing code. But it requires careful reviews and, even better, proper security testing in the development process. I expect GitHub Copilot to become a standard development tool. However, some legal aspects need to be clarified before it will be used broadly in business environments.