Tuesday, July 9, 2013

Oracle and Google's disagreement on the notion of 'thin' (or 'weak') copyright protection for software

On Friday Oracle's reply brief in its Android-Java copyright appeal against Google was published. I said in my first post that there's more stuff in it that's worth discussing. Here's a follow-up.

The parties disagree, obviously, on the breadth of copyright protection for software. Oracle's reply brief makes clear that they disagree not only on how broad or narrow the scope of protection is but also on the relevance of that breadth at each particular level of the analysis: copyrightability, infringement analysis, and fair use.

This is an important issue in the high-profile Android-Java case and far beyond. It matters to policy makers, legal professionals, and (last but not least) software developers alike. It's really worth giving it more thought, including a comparison with the meaning of "breadth" in connection with patent claims, which the first section will draw. After a high-level discussion of the differentiated meaning of breadth, let's look in the second part at the arguments Oracle and Google have exchanged in this regard (with a particular focus on their references to statutory law, case law, and commentary). In the final part I'll outline my position on the innovation policy question of what breadth (or narrowness) is desirable to afford just the right level of protection to software developers -- including, for example, mobile app developers. I have been fighting against counterproductive overprotection for some time (most recently in connection with the enforcement of standard-essential patents), but I also oppose underprotection.

The meaning of breadth -- and how it relates to strength

When it comes to the breadth of a patent claim, less is more and brevity is the soul of breadth. Every claim limitation is another hurdle for an infringement finding. A patent claim reading on a "mobile telephone with a touch screen" is shorter and broader -- in terms of its ability to cast a wide net for capturing infringements -- than one reading on a "mobile telephone with a touch screen and a pre-installed travel expenses manager application connected to an Internet server, characterized in that [etc.]". That's because a given claim is infringed only if each and every limitation is practiced by the accused technology.
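To make the "every limitation must be practiced" rule concrete, here is a minimal sketch that treats a claim as a set of limitations and an accused product as a set of features. The function name, the feature labels and the sample phone are assumptions made up for illustration; real claim construction is, of course, not a string comparison.

```python
def infringes(claim_limitations, accused_features):
    """Return True only if every claim limitation is found in the accused product."""
    return set(claim_limitations) <= set(accused_features)


# Hypothetical claims modeled on the two examples in the text.
broad_claim = {"mobile telephone", "touch screen"}
narrow_claim = broad_claim | {
    "pre-installed travel expenses manager",
    "connection to an Internet server",
}

# Hypothetical accused product.
phone = {"mobile telephone", "touch screen", "camera"}

print(infringes(broad_claim, phone))   # True
print(infringes(narrow_claim, phone))  # False
```

The shorter claim captures the phone; the longer one does not, because two of its limitations are missing from the accused product.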

The same logic also applies to certain types of copyright. If the courts in a given country routinely hold the copying of six consecutive musical notes to be an infringement, then they're more restrictive than the courts of a country in which it takes seven notes in a row to infringe and in which you can, as a result, copy six notes and steer clear of infringement by replacing any one of the seven notes of the original sequence.
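The six-versus-seven-notes comparison can be sketched as a threshold on the longest run of consecutive notes that two melodies share. This is only an illustration under simplifying assumptions: the note sequences, the function names and the idea of a fixed numeric threshold are all hypothetical and not taken from any court's actual test.

```python
def longest_shared_run(original, copy):
    """Length of the longest run of consecutive notes that appears in both sequences."""
    best = 0
    prev = [0] * (len(copy) + 1)
    for a in original:
        curr = [0] * (len(copy) + 1)
        for j, b in enumerate(copy, start=1):
            if a == b:
                # Extend the run that ended at the previous note of both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def infringes(original, copy, notes_required):
    """Hypothetical rule: infringement needs at least `notes_required` copied notes in a row."""
    return longest_shared_run(original, copy) >= notes_required


original = ["C", "D", "E", "F", "G", "A", "B"]
copy = ["C", "D", "E", "F", "G", "A", "C"]  # last note replaced

print(infringes(original, copy, 6))  # True
print(infringes(original, copy, 7))  # False
```

The same copy is caught under the six-note rule and escapes under the seven-note rule, which is the sense in which the lower threshold gives the right holder a broader right.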

Unfortunately, software copyright is more complicated than this. Intuitively one might be led to think that if a district court finds nine lines of code copyrightable, a body of 7,000 lines -- at issue in the very same litigation -- would be a no-brainer. Not so in Oracle v. Google. The nine-line rangeCheck function and a set of source files containing only a few dozen lines per file were deemed protected. 7,000 lines of declaring API code were not. Both decisions are being appealed.
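For readers less familiar with the term: "declaring code" is the part of a function that states its name, parameters and return type, as opposed to the body that implements it. The sketch below shows that split in Python rather than Java, with two made-up functions (`max_of` and `max_of_alt` are hypothetical names, not code from the case) whose declarations match while their implementing code differs.

```python
import inspect


def max_of(x: int, y: int) -> int:   # declaring part: name, parameters, return type
    return x if x > y else y         # implementing part: the body


def max_of_alt(x: int, y: int) -> int:
    return sorted((x, y))[1]         # different body, same behavior


# The declarations agree (apart from the function name) ...
print(inspect.signature(max_of))                                   # (x: int, y: int) -> int
print(inspect.signature(max_of) == inspect.signature(max_of_alt))  # True

# ... and so do the results, although the implementing code differs.
print(max_of(3, 7), max_of_alt(3, 7))                              # 7 7
```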

In this case, the nine lines of code that were held protectable indeed constitute, relatively speaking, a broader right than the 7,000 lines. But what makes software copyright complicated is the idea/expression dichotomy. It's in the statute. 17 U.S.C. §102 is the statute defining the scope of copyrightable subject matter:

(a) Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. Works of authorship include the following categories:

(1) literary works; [...]

(b) In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.

Some of what (a) giveth, (b) taketh. Software developers say they "write" code. And it is indeed copyrightable like literary works. But to what extent? Protecting ideas, systems, methods of operation etc. is not what copyright is meant to do. Some of this can be protected by patents if the criteria for patentability are met. Patents can be broader than copyright. They typically are.

The exceptions are needed to counter overbroad claims. In the 19th-century Baker v. Selden case, the author of a book on an accounting method wanted to own that method. In other words, he wanted copyright to be the equivalent of a financial business method patent (but without examination, and longer-lasting). This is a clear case of excessive breadth. If Oracle had claimed that its copyright in the Java documentation affords it a monopoly over a wide range of write-once-run-anywhere virtual machines with a Java-like set of characteristics at an abstract level, then Baker v. Selden would clearly apply.

Resolving the idea/expression dichotomy isn't easy for the courts. Ultimately, any copyrightable work consists of non-copyrightable smaller elements. A single musical note isn't copyrightable; but a sequence of musical notes is. None of the words used in this blog post is copyrightable; the post as a whole is. A pixel in a given color isn't copyrightable; a painting usually is. Basically, every copyrightable work is a structure, sequence and organization of non-copyrightable atoms.

In 1992, the Second Circuit's Computer Associates v. Altai ruling proposed the Abstraction-Filtration-Comparison test, which has been widely adopted since. It sounds very systematic: the first step, Abstraction, breaks an allegedly-copyrighted work down into its structural elements at successive levels of abstraction; the second step, Filtration, identifies and removes the non-protectable elements so as to focus the third step, Comparison, on the protected elements (and not to be confused by similarities between an asserted work and an accused work that are due to non-protectable elements). But the ruling merely describes these considerations and, at a high level, their application to the software at issue in that case -- and then states a conclusion. There would be a lot more clarity about this if every court using this test fully documented it every step of the way, such as in the form of color-coded source code printouts. This isn't done. And it wasn't done by Judge Alsup either. So we're just left with verbal -- and thus interpretable -- abstractions, filtrations, and comparisons.
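The three steps can be sketched as a small pipeline. This is a toy model under loud assumptions: the `Element` type, the level names and the sample data are all hypothetical, and the `protectable` flag is simply typed in by hand. That flag is exactly the judgment call courts make verbally; no code can compute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    level: str         # hypothetical level of abstraction, e.g. "idea" or "code"
    content: str       # stand-in for the element itself
    protectable: bool  # legal judgment, supplied by hand


def abstraction(work):
    """Step 1: group the elements of the asserted work by level of abstraction."""
    levels = {}
    for element in work:
        levels.setdefault(element.level, []).append(element)
    return levels


def filtration(levels):
    """Step 2: drop every element that was judged non-protectable."""
    return {e.content for elements in levels.values() for e in elements if e.protectable}


def comparison(protected, accused_contents):
    """Step 3: compare only what survived filtration with the accused work."""
    return protected & set(accused_contents)


asserted = [
    Element("idea", "sort a list", False),
    Element("structure", "package layout P", True),
    Element("code", "line block X", True),
]
accused = ["sort a list", "package layout P", "line block Y"]

protected = filtration(abstraction(asserted))
print(sorted(comparison(protected, accused)))  # ['package layout P']
```

Both works share the idea of sorting a list, but that similarity never reaches the comparison step because it was filtered out beforehand.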

In the context of breadth and strength of software copyright, the fact that non-protectable elements must be filtered out for the infringement analysis (which also includes the analysis of the "fair use" defense) is key. If we think of an asserted work by a software developer as a hierarchical structure, it means that there are parts that aren't protectable on their own. But the way they've been combined and arranged may be.

When analyzing the breadth/strength or thinness/weakness of software copyright, the first thing to do is to distinguish between the different steps: copyrightability; infringement in a narrow sense; fair use.

Generally, an intellectual property right is broader if there's a smaller combination of elements each of which must be infringed in order to find infringement on the bottom line. If, for example, the structure, sequence and organization of the Java APIs is found protected on its own, while the particular lines of declaring code are not, then that's a broader IPR than one involving both elements. For example, if Google had used the same SSO, but invented different names for each function, then it would still infringe the SSO, but not a combination of the SSO plus the names. But... this is copyright, not patents. For a patent, it's an item-by-item analysis. You infringe all, or if there's even one element you don't infringe, then you don't infringe the patent. For copyright, the comparison itself is more flexible, and more vague. You can claim that select parts of your copyrighted work (provided that a given part is copyrightable on its own) are infringed, so if you also own the names, you may have an infringement case against someone using the names under a different structure. And, very importantly, there are different degrees of similarity that are required at the infringement stage. The applicable degree depends on the specifics of a given case. The greater the degree of similarity that you have to prove, the easier for your opponent to defend by persuading a court or jury that despite some undeniable similarities there's enough of a difference to let him get away with what he's done.

Let's say someone translates a copyrighted article (such as this blog post) but also rearranges and rephrases parts of it, uses only some parts of it and adds some others of his own. There could still be a copyright infringement (though there wouldn't be a patent infringement if one or more claim elements are left out). So the standard of similarity -- as opposed to just the extent to which elements are filtered out for copyrightability purposes -- is key to the breadth/strength (or thinness/weakness) of software copyright. If 100% of an asserted work is deemed copyrightable, but the infringement standard is almost as strict as for a patent, then copyright is, in this case, weak/thin. It's weak/thin because of its limited ability to capture infringement -- even though the part of the asserted work found copyrightable is as fat/strong as it gets -- 100% of it.

If an infringement is found, the "fair use" defense usually comes up right away. Fair use is about weighing multiple factors, and the extent and scope of someone's copying influences the overall analysis. The closer someone is to literal copying, the harder it is for him to prevail on "fair use". This, too, has to do with the breadth/strength or thinness/weakness of software copyright with respect to the right holder's ability to prevail over accused infringers.

One final point on the meaning of "thin" or "weak": this is a relative concept, not an absolute one, so it depends on the comparison one has in mind. Relative to the degree of protection software can receive in the form of (potentially broad) patents, copyright is certainly "thin" or "weak". For literature there's no equivalent to a patent. Copyright is the only form of protection (apart from trademarks) available for it. But the aspect of software -- expressive code -- that is treated under U.S. copyright law like literary works enjoys a reasonable degree of protection.

The argument over "thin" (or "weak") software copyright in Oracle v. Google

There's a popular misbelief. Some believe, because they've been led to believe, that Oracle needs the appeals court to adopt an extremely broad scope of copyrightability -- basically, turning copyright into another kind of patent right in terms of breadth. But that's absolutely not true.

Just imagine, for a moment, that you're in Oracle's shoes. You have that Java licensing business and you want Google to respect the same rules that your ecosystem largely accepts. You want them to take a commercial license from you or to use it under the GPL free and open source software license (which you, or the company you've acquired, have made possible). In order to achieve this purpose, all you need is a sufficient scope of copyrightable subject matter. It has to be sufficient in terms of being enough stuff for Google to need a license. Of course, you'll try to prevail on as much as possible, like any other party to a dispute like this. But your ability to achieve your objectives does not hinge on a superbroad scope of copyrightability.

By contrast, Google needs more or less an affirmance of Judge Alsup's wholesale anti-copyrightability ruling. It needs some kind of software exceptionalism that makes software less than a second-class citizen in the realm of copyright. That's because Google uses the structure, sequence and organization of those Java APIs; and it uses the declaring code including all of the names. If, hypothetically speaking, Oracle prevailed on SSO, Google would need a license. If Oracle prevailed, still hypothetically speaking, only on the set of names, Google would need a license, too. Neither the district court nor the parties distinguished between more and less complex lines of declaring code in the sense of proposing different rulings based on complexity, but if we nevertheless imagined an outcome along those lines, there would likely still be too much copyrightable material here for Google to escape an infringement finding (and for it to be able to do without a license).

Google argues that Oracle's Harry Potter analogy doesn't make sense in a software copyright case. It says that software is, because of its functional nature, fundamentally less protectable by copyright than fictional literature. But it merely points to that functional nature and fails to explain which software would remain copyrightable if the appeals court adopted its proposed set of rules.

Google makes a "thin" (or "weak") copyright argument with respect to all three stages of the analysis: copyrightability, infringement (in a narrow sense), fair use. It conflates those stages of the analysis throughout its responsive brief, and Oracle criticizes it for this in its reply brief. In particular, Oracle notes that Google points to Sega v. Accolade. Here's the relevant passage from the Sega ruling:

"Borrowing from antitrust principles, Sega attempts to label Accolade a 'free rider' on its product development efforts. In Feist Publications, however, the Court unequivocally rejected the 'sweat of the brow' rationale for copyright protection. 111 S. Ct. at 1290-95. Under the Copyright Act, if a work is largely functional, it receives only weak protection. 'This result is neither unfair nor unfortunate. It is the means by which copyright advances the progress of science and art.' Id. at 1290 [...] Here, while the work may not be largely functional, it incorporates functional elements which do not merit protection. The equitable considerations involved weigh on the side of public access."

This paragraph is part of the "fair use" analysis in Sega. The "weak protection" quote from Feist v. Rural should also be read in context:

"This principle, known as the idea-expression or fact-expression dichotomy, applies to all works of authorship. As applied to a factual compilation, assuming the absence of original written expression, only the compiler's selection and arrangement may be protected; the raw facts may be copied at will. This result is neither unfair nor unfortunate. It is the means by which copyright advances the progress of science and art."

The word "weak" doesn't appear in Feist, a copyrightability case, and where Feist calls copyright "thin", it does so with respect to factual compilations. What Feist says is that "the raw facts [in this case, telephone directory data] may be copied at will". The term "raw facts" means pre-existing, real-world data. None of the Java API code is comparable to such pre-existing, real-world data.

Google doesn't point to copyrightability case law that explicitly labels software copyright as "thin" or "weak". It quotes Berkeley Professor and EFF Vice Chairwoman Pamela Samuelson's writings, such as Why Copyright Law Excludes Systems and Processes from the Scope of Its Protection, which even admits that there are different positions in academia on whether software receives "thin" protection. As an example of a proponent of the "thin" protection theory, she quotes Professor Paul Goldstein's Infringement of Copyright in Computer Programs, which according to Professor Samuelson describes software as receiving "very thin" copyright protection. Remember what I said about the point of comparison? The abstract of Professor Goldstein's writing says this:

"Copyright, with its low standards, long term and thin layer of protection, is far from an appropriate vehicle for attracting optimal research and development investment to functional subject matter like computer software. Patent law, with its much higher standards and level of protection may be more appropriate, but is also far from perfect."

He doesn't say software copyright is thinner than other copyright. No, he says copyright itself is thin -- as compared to patents. Not compared to the protection afforded to other categories of copyrightable material.

Even Professor Samuelson acknowledges that there's disagreement on how thin copyright is: "Some commentators have been skeptical of the 'thin' protection doctrine, although without close analysis of § 102(b)."

Anyway, Google's whole argument about "thin" protection does not, in my view, counterbalance Axiom 1 of Oracle's appellate argument:

"The Copyright Act's threshold for copyright protection is very low. Any 'creative spark' counts, 'no matter how crude [or] humble.' Feist Publ’ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 345 (1991) (internal quotation marks omitted)."

That is, again, the telephone directory case. In that case, there was no such creative spark with respect to the "raw data". In the Java case, there's far more than a spark.

What's the right degree of protection for software developers?

After two rather long sections I can keep this third one short. I'm not concerned that copyright protection for 7,000 lines of declaring API code and/or its structure, sequence and organization poses a threat to honest software developers. The statistical probability of an innocent person infringing, by happenstance, a sufficient portion of these APIs to be considered a copyright infringer (assuming that, depending on the jurisdiction, an independent creation defense is unavailable or unavailing) is almost zero. If it happened once (to one programmer in the world) every ten million years, that would be unusually frequent.

By contrast, think of all the little guys who got sued by Lodsys or MacroSolve over patents that these companies claim to read on pretty fundamental features like online updates or Internet forms.

Many app developers access the Java APIs, but they don't have anything to fear from Oracle v. Google, which is not about Oracle trying to prevent Google from writing Java apps -- but about Google's creation of a cannibalizing platform. But app developers also need some protection for their own works, and copyright is cheaper, faster and narrower than patents.

I certainly don't have a problem with intellectual property protection that can't be infringed inadvertently. Everyone can sit in front of their computer and write code without worrying about it. If Oracle prevailed 100% on its appeal, nothing -- I repeat, nothing -- would change for honest people. But if IP can be used against independent authors who don't copy or plagiarize, then the question of desirability requires further analysis. This is a copyright-focused blog post. It's not the time and place for such further analysis.
