Thursday, March 17, 2011

Google's Android faces a serious Linux copyright issue (potentially bigger than its Java problem)

Intellectual property issues continue to cloud Google's mobile operating system. More than a dozen patent suits over Android are already underway. In one of them, Oracle additionally claims that Android infringes on large amounts of copyrighted Java code. And now there is grave concern over the legality of a central element of its architecture: the library that connects Android and its applications with the underlying Linux kernel.

Google copied 2.5 megabytes of code from more than 700 Linux kernel header files with a homemade program that drops source code comments and some other elements, and daringly claims (in a notice at the start of each generated file) that the extracted material constitutes "no copyrightable information".

It is much more likely that Google is wrong and this is, instead, a very serious violation of the GPL, the open source license under which Linux is published.

The GPL's copyleft nature requires all derivative works of a GPL'd program to be made available on the same terms. Google, however, very intentionally publishes Android as a multi-license potpourri of

  • GPL'd software (the Linux kernel),

  • permissively-licensed open source software (programs under open source licenses such as the Apache Software License or BSD/MIT licenses, which don't come with copyleft), such as the Dalvik virtual machine (which is at the heart of Oracle's lawsuit), and

  • closed-source programs.

Google's "no copyrightable material" claim, which plays a central role in enabling this potpourri, is at best questionable. If Google is proven wrong, pretty much that entire software stack -- and also many popular third-party closed-source components such as the Angry Birds game and the Adobe Flash Player -- would actually have to be published under the GPL. In some cases, such as Dalvik, that would be hard to do for technical and licensing reasons, but in any case, a fully GPL'd Android would completely run counter to Google's Android strategy. Everyone would be free to use, modify and redistribute all of the affected software.

As a result, there would be no more revenue opportunity for the developers of the affected applications, and the makers of Android-based devices would lose their ability to differentiate their products through proprietary add-ons. Whatever software they publish would become available to their competitors on GPL terms. Prices and margins would inevitably come down.

To eliminate the risk of a collapse of the Android ecosystem and navigate around copyleft, the misappropriated Linux code would have to be replaced. The only truly viable alternative is a library called glibc (GNU C library). That library is the industry standard and is used by Android’s major mobile Linux competitors, MeeGo and WebOS.

It wouldn’t be easy, though. Due to architectural differences between Bionic and glibc, thousands of Android components would have to be rewritten and rebuilt by Google and third parties. In some cases that could prove very difficult and time-consuming. There would also be significant compatibility issues with legacy versions of Android. Painful as it may be, there's no legally safe alternative that would shield Android from the implications of GPL copyleft.

Let me now

  • explain why Google's denial of copyright is unlikely to hold water in court (at least in the US),

  • describe the wide-ranging implications this hazardous approach -- which is either downright illegal or at least irresponsibly risky -- could have for Android device makers and application developers, and

  • look more closely into what Google should do to fix this problem -- sooner rather than later.

Copyright is more resilient than Google thinks

Google openly admits that it wanted to "keep [the] GPL out of user-space" (userspace is whatever runs on top of Linux). You can find that statement on page 36 of this official Android presentation (PDF). So the Android development team came up with a library named Bionic, which contains a set of Linux kernel header files. Each of them starts with the following notice:

"This header was automatically generated from a Linux kernel header of the same name, to make information necessary for userspace to call into the kernel available to libc. It contains only constants, structures, and macros generated from the original header, and thus, contains no copyrightable information."

Note that the text mentions libc, which is a different library than glibc. It's BSD-licensed. Bionic is based on libc, and the header files with the above notice are added to Bionic.

The above notice is Google's way of saying that the GPL doesn't affect Android, because copyleft legally depends on copyright to be enforceable.

Having looked at many of those files, I don't think Google is right. There are potentially copyrightable elements in those files, such as inline functions, and even a collection of individually non-copyrightable elements can as a whole be protected by copyright.
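To make the point about surviving elements concrete, here is a minimal sketch of what comment-stripping does to a header. It is not Google's actual program; the sample header, the names `strip_comments` and `count_inline_functions`, and the regular expressions are all assumptions made up for this illustration, and a real tool would need a proper C parser.

```python
import re

# Hypothetical kernel-style header, invented for this illustration.
SAMPLE_HEADER = """\
/* Example header: describes a device request. */
#define EXAMPLE_MAX_LEN 16  /* maximum payload length */

struct example_req {
    int id;      // request identifier
    int length;  /* payload length */
};

static inline int example_is_valid(const struct example_req *r)
{
    return r->length <= EXAMPLE_MAX_LEN;
}
"""

# Matches /* ... */ block comments and // line comments.
# Simplification: comment markers inside string literals are not handled.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# Matches the start of a "static inline" function definition (rough heuristic).
INLINE_RE = re.compile(r"\bstatic\s+(?:__)?inline(?:__)?\b[^;{]*\{")


def strip_comments(source):
    """Remove C comments and drop lines that end up empty."""
    without_comments = COMMENT_RE.sub("", source)
    lines = [line.rstrip() for line in without_comments.splitlines()]
    return "\n".join(line for line in lines if line)


def count_inline_functions(source):
    """Count 'static inline' function definitions that remain in the text."""
    return len(INLINE_RE.findall(source))


cleaned = strip_comments(SAMPLE_HEADER)
print(cleaned)
print("inline functions remaining:", count_inline_functions(cleaned))
```

In this made-up sample the comments disappear, but the constant, the structure layout, and the body of the inline function are all still there. Whether such remaining material is copyrightable is a legal question the sketch cannot answer; it only shows that dropping comments leaves the rest of the file untouched.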

Linus Torvalds himself has clearly rejected the idea of using the original Linux kernel headers in programs that aren't licensed under the GPL. In a posting to the official Linux kernel mailing list, he made the following unequivocal statements:

"In short: you do _NOT_ have the right to use a kernel header file (or any other part of the kernel sources), unless that use results in a GPL'd program."

"So you can run the kernel and create non-GPL'd programs [...]"

That statement was made in 2003 and looks abundantly clear. I don't think it was based on the assumption that cutting out source code comments and some functions, with the subtlety of a chain saw, would ever be sufficient to circumvent the GPL. If this served its purpose, the GPL would be reduced to absurdity, resulting in proprietary forks and extensions of Linux and other GPL'd software such as MySQL.

Neither Linus nor I am a lawyer. However, two high-profile US copyright experts -- an academic and a practitioner -- have also expressed doubts about Google's claims.

Professor Raymond Nimmer stated on his blog that "[t]he Linux core header files [...] are almost certainly copyrighted" and while he points out that he hasn't examined the facts, he finds a removal of "the expressive features involved in the structure of the header files [...] difficult to achieve since the goal was to borrow the effectiveness of the Linux system at least in part."

But the presence of expressive features would make the output of the script copyrightable, and consequently it would have to be published under the GPL.

On the Huffington Post I saw a post by Edward Naughton, a prominent IP litigator. The article is entitled "Google's Android Contains Legal Landmines for Developers and Device Manufacturers" and links to a much more detailed legal analysis, in which he describes Google's approach to the Linux kernel headers as "unusually audacious" and sees it as part and parcel of Google's overall questionable approach to software reuse in Android:

"Google's position is a bold assault on copyright protection for software and source code. There are cases, to be sure, that have permitted some copying of very small snippets of code when that is necessary to achieve interoperability. [...] Those cases do not provide much support for Google's argument that copyright law allows it to copy entire source code files, and even less for its suggestion that entire APIs [application programming interfaces] are not copyrightable."

In summary, Naughton argues that Google is very likely violating the GPL with Bionic because it incorrectly assumed it can simply "clean" the Linux headers of copyrightable information and repurpose them as it wants. On a "micro" (or individual file) level, he explains that most legal experts recognize that header files can contain copyrightable material. He points out that some, if not many, of the Linux headers that Google used in Bionic do indeed contain copyrightable material and that despite Google's claim to the contrary, it did not (and probably cannot) fully remove that material. As a result -- he concludes -- there are very likely files in Bionic that are still subject to the GPLv2.

He also makes an argument at the "macro" level based on the fact that, under US copyright law, API files are copyrightable. He argues that the overall collection of over 700 headers would likely qualify for copyright protection as a whole based on their "complex overarching structure." That would, therefore, preclude Google's ability to take those files as a group and strip them of their GPLv2 license.

Naughton's argument regarding the Bionic headers is straightforward, and I recommend reading it in full because I believe it explains very well what Google has done in a technical and legal sense. While I am not a copyright lawyer, I think the argument is compelling, and bears examining by those who are looking to use Android commercially.

In light of what experts like Nimmer and Naughton say, at the very least, I don't think anyone in the Android ecosystem can rely on Google's "no copyrightable information" claim. For a platform like Android, on which so many products depend, there has to be legal certainty. Anything less wouldn't do.

Widespread risk and far-reaching implications

The header file issue described herein affects many thousands of files (it pervades the Android codebase), and there are thousands of contributors to the Linux kernel -- independent programmers as well as companies -- who could sue Google and other companies in the Android ecosystem, alleging a violation of the GPL.

Litigants could have all sorts of motivations, be it the defense of software freedom, hopes of lucrative settlements, or competitive conflicts with Google, certain device makers, or particular application developers. Someone might act next month, next year, or later on.

If a court of law finds that the Bionic library indeed contains copyrightable GPL'd software, the distribution of all software compiled against Bionic -- and of devices containing such software -- will have to stop until there is full compliance with the GPL.

Bionic is at the heart, not at the periphery, of the Android architecture. Thousands of Android software components depend on it. I have discussed this with a Linux programmer I trust and he generated an automated analysis for me that I have uploaded to Scribd and Crocodoc. The document contains a table that shows Android components that have a so-called file dependency on Bionic, meaning they can't run without Bionic. It shows which particular parts of Bionic are used, and how many times. That table has 1,276 pages and more than 27,000 rows, and isn't even complete because only the open source components of Android were analyzed. In the event of a court ordering an injunction due to GPL infringement, the distribution of Android could not resume until each and every one of those rows -- and similar dependencies in files not yet examined -- has been properly addressed.
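For readers who wonder what such a dependency table boils down to, here is a rough sketch. It is not the analysis referenced above, whose method I can't reproduce here; it assumes, purely for illustration, that a dependency means a source file including a header under `linux/` or `asm/`, and the component names, file contents, and the function `count_header_uses` are invented.

```python
import re
from collections import Counter

# Matches '#include <header>' at the start of a line and captures the header path.
INCLUDE_RE = re.compile(r"^\s*#\s*include\s*<([^>]+)>", re.MULTILINE)


def count_header_uses(sources, prefixes):
    """Count (component, header) pairs for includes under the given prefixes.

    `sources` maps a path such as 'component/file.c' to the file's text.
    Assumption: the first path segment is the component name.
    """
    uses = Counter()
    for path, text in sources.items():
        component = path.split("/", 1)[0]
        for header in INCLUDE_RE.findall(text):
            if header.startswith(prefixes):
                uses[(component, header)] += 1
    return uses


# Hypothetical source tree, invented for this illustration.
EXAMPLE_SOURCES = {
    "component_a/main.c": "#include <linux/if.h>\n#include <stdio.h>\n",
    "component_a/util.c": "#include <linux/if.h>\n",
    "component_b/main.c": "#include <linux/fs.h>\n#include <asm/types.h>\n",
}

table = count_header_uses(EXAMPLE_SOURCES, ("linux/", "asm/"))
for (component, header), count in sorted(table.items()):
    print(component, header, count)
```

Each printed row corresponds to one row of a table like the one described above: a component, the header it relies on, and how often. A real analysis of compiled components would more likely look at linked libraries and symbols than at include lines.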

In terms of third-party applications, the more powerful and sophisticated they are, the more likely they are to be written in C or C++, and, therefore, the more likely they are to use Bionic. When device makers add their own components (for example, Motorola adds a program named Motoblur on top of Android), they will in most cases use C or C++ as the programming language, and consequently the Bionic library.

Major third-party apps like Angry Birds and the Adobe Flash Player also appear to be written in C or C++.

The only realistic way to fix the problem: replace Bionic with glibc

Theoretically -- but not practically -- Google could try to solve the problem by giving up on its mixed-source strategy in favor of a GPL-only approach. Proponents of free software would be very happy about that. In fact, some of them have already started the ambitious IcedRobot project to build a GPL-only Android fork. But the price for Google to pay for this would be prohibitive.

For many components of Android, Google owns the copyrights, so it could relicense them under the GPL. However, for some very essential code Google doesn't have that option. In particular, its Dalvik virtual machine includes code from the Apache Harmony project. The Apache license and the GPL are inherently incompatible. Without that virtual machine, Google couldn't make most Android apps run. It would therefore have to replace the Harmony code with something already available or potentially relicensable under the GPL. This might take too long.

Even if Google -- hypothetically speaking -- managed to put all of the essential code under the GPL, it would thereby abandon the commercial strategy it has been pursuing so far, at least to a very large extent. On the current basis, Google uses proprietary licensing terms for closed source apps such as Google Earth -- in addition to its control over the Android trademark -- to control what device makers do. If those components had to be GPL'd, what Google would be left with to control the ecosystem would basically come down to the Android trademark.

Device makers would, as I explained further above, find themselves unable to differentiate their products through proprietary add-ons. They might invest a lot of money in extensions like Motorola's Motoblur only to find their competitors -- such as low-cost manufacturers from China -- building such code into competing products on free software terms. That's the death of differentiation.

Developers of applications using Bionic would only be able to charge (via the Android Market) those customers who don't know what rights they have under the GPL. All others would find ways to download and install those apps on GPL terms, i.e., free of charge.

In view of all of that, I think the only viable option will be for Google to recognize its error with Bionic and to replace it as soon as possible with glibc (GNU C library). That library is licensed under the LGPL ("Lesser GPL"), which has the effect that applications can access the Linux kernel without necessarily being subjected to copyleft if certain criteria are fulfilled.

Using glibc is the industry-standard approach, and it is the approach used by those in the open source world who are trying to "play by the rules." As I said before, even Google's major mobile Linux competitors use glibc. I have found documents that prove this: a MeeGo technical overview, a webOS license information document (Palm was acquired by HP), and a blog post by a senior webOS developer relations engineer. In fact, Google's decision to forgo glibc is one of the reasons Android is considered a Linux fork rather than a true Linux implementation.

However, it's apparent that even the LGPL'd glibc is too much of a copyleft risk from Google's point of view, so Google decided to build Bionic in the dubious way I described herein, essentially going its own way and thumbing its nose at the industry convention.

But replacing all references to Bionic with references to glibc throughout the entire Android codebase would be a daunting task. There wouldn't be the licensing issue Google would face if it wanted to put Dalvik under the GPL, but probably a large number of manual edits would be needed in many of those countless Android files making use of Bionic. Some files might just recompile right away against glibc, but I doubt that all of them would. I understand that there are important architectural differences.

This replacement would have to be carried out not only by Google but also by all developers of Android add-ons and applications written in C/C++, and by now a lot of such software has been developed by a large number of companies. Moreover, even if Google could resolve these issues going forward, there would still be problems with products running legacy versions of Android. Nonetheless, Google needs to act: the sooner it gets its act together, the more likely it is to pre-empt GPL enforcement by any Linux kernel copyright holder.

I'm sure Google would rather spend the same resources on the development of new features for future Android versions. That's what the ecosystem -- of which I'm actually a part, as a user -- would also like to see happen. But what must be done must be done. Continuing on the current, highly hazardous basis is not a viable option as far as I can see.

If you'd like to be updated on the smartphone patent disputes and other intellectual property matters I cover, please subscribe to my RSS feed (in the right-hand column) and/or follow me on Twitter @FOSSpatents.
