Prepared for submission to Computer Law & Security Report
Version of 16 December 2005
© Xamax Consultancy Pty Ltd, 2005
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.anu.edu.au/people/Roger.Clarke/II/Gurgle0512.html
Google began as yet another search-engine. Its owners have found a successful business model, have established additional lines of business, and have achieved quite dramatic growth and profitability. The now multi-faceted Google corporation is a 'newly big business', and is forthrightly challenging both 'old big business' and copyright, competition, consumer and privacy laws.
The World Wide Web exploded into popular consciousness in 1993. The Web enabled access to content. As the volume of content increased, the informal exchange of URLs became an inadequate way for the escalating user-population to find what they were interested in accessing. Discovery services were needed, to complement the basic Web mechanism.
Techniques were already available to index large volumes of text. The earliest indexing had been biblical concordances, the first of which has been attributed to Hugh of St. Cher in the 13th century. During the 1970s and 1980s, a range of software was developed to support the indexing of text and the discovery of documents that matched users' search-terms. What came to be known as 'search-engines' applied these established techniques, adapted them to the new context, and drove their effectiveness, efficiency and usability to new levels.
A large number of search-engines have been established, and many are still in existence. Altavista had the greatest impact in the period 1996-98, but fumbled its intended transition from gratis service to 'monetiser' and 'wealth-generator'.
Since then, a late entrant by the name of Google has grown very rapidly, and achieved dominance. It appears to cover a larger proportion of the Web than its competitors (although others appear to have better coverage in specific areas). It has several innovative features that users value, and it has attracted the capital it needs to continue to crawl vast volumes of data frequently.
The Web of the mid-1990s was successful because it was very, very simple. A decade later, the Web is still nowhere near the sophisticated eLibrary envisaged by the hypertext pioneers between 1945 and 1970 - Bush, Engelbart and Nelson. There are many more steps to be taken, but a variety of innovators are easing the Web towards a more mature set of information services.
Google is one of those innovators. It began as yet another search-engine. It was useful. It attracted large amounts of venture capital, achieved brand-recognition, projected an image of corporate responsibility, and 'captured eyeballs'. It devised ways to generate revenue based firstly on targeted advertising, and then on intermediated advertising through its AdSense service. And it has raised huge amounts of cash through public share-offerings.
Google has been successful, in terms of growth, profitability and share-price. After a mere 7 years, it has already attracted Business School hagiography (e.g. Vise & Malseed 2005). But those are not the reasons for this article. Rather, Google is important because of the features it has provided in its basic service, the further services it has introduced alongside its search-engine, and the many ways in which the now multi-faceted Google corporation has run close to the wind.
The purposes of this paper are to provide brief overviews firstly of the lines of business that the corporation has been developing, and then of the ways in which it is challenging copyright, competition, consumer and privacy laws.
At this stage, Google continues to be fundamentally a search-engine, with trimmings. But the corporation has diversified into a wide range of additional lines of business. Some it has developed itself, in some cases by replicating existing ideas and adding new features, in others by outright innovation; others it has acquired by purchase and takeover. Consistent with conventional business strategy, these lines of business generally appear to be intended to 'cross-leverage' one another.
For the purposes of the analysis conducted in the later parts of this article, it is useful to cluster Google's services into three segments.
Google's foundation service was, and remains, the search-engine. This depends on a 'web crawler' called Googlebot, which accesses and indexes a substantial proportion of the content accessible on the Web.
The index generated by Googlebot is then made available to all comers by means of a search-engine. Users key in 'search terms'. The richness of the enquiry language available to the user varies between suppliers, and Google's is among the weakest of all. This reflects its intention to appeal to the masses, rather than to the librarian or even the moderately well-educated researcher.
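The indexing and look-up steps described above can be illustrated with a minimal sketch of an inverted index. This is a generic textbook technique, not a description of Googlebot or of Google's actual index. The page names, the sample text and the function names `build_index` and `search` are hypothetical, and the enquiry language is assumed to be the simplest possible one, in which every search-term must appear in the page.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the pages matching the first term, then narrow down.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled pages (assumed data, for illustration only).
pages = {
    "a.html": "privacy law and search engines",
    "b.html": "copyright law and libraries",
    "c.html": "search engines index the web",
}
index = build_index(pages)
print(search(index, "search engines"))
```

A richer enquiry language (phrase search, OR, NOT, proximity operators) would be layered on top of the same structure, which is where the differences between suppliers arise.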
The search-engine responds with a list of the pages that match the search-terms that the user provided. Given the massive scale of the Web, most search-terms generate very large numbers of 'hits'. The key challenge is to rank the pages in a manner that will be likely to be helpful to the user. This is an area in which Google's demonstrated performance has excelled. Its particular 'precedence algorithm' or PageRank technique sorts the hits into a sequence that is usually helpful, and sometimes uncannily accurate. Some years ago, it cheekily began offering an 'I'm Feeling Lucky' option, which delivers the page that comes first in its rankings. On the one hand, this reflects the fact that a very large proportion of users are entertainment-oriented surfers rather than people on a mission to discover specific information; but, on the other, it signals the confidence that the company has in often being able to deliver something of perceived value.
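The general idea behind link-based ranking can be sketched as follows. This is a simplified power-iteration version of the PageRank idea as it is commonly described in the public literature, and it makes no claim to reflect Google's production ranking. The link graph, the damping factor of 0.85, the iteration count and the function name `pagerank` are all assumptions made for illustration. The sketch also assumes that every link target appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a rank for each page from the link graph by power iteration.

    links maps each page to the list of pages it links to. Every target
    is assumed to appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on in equal shares to its targets.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web (assumed data, for illustration only).
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)
```

In this toy graph the page that attracts the most inbound links sorts first, and the page that nothing links to sorts last. An option that simply delivers the first-ranked page would, under these assumptions, return `ordered[0]`.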
The company has developed a range of extensions to the basic service. Many of these are merely restrictions of the search-space to sub-sets of the index, such as searches only for images, or only of blogs. Another feature is that any web-site that has attracted the attention of the crawler can offer its visitors a local site-search utilising Google. This takes about half-an-hour to implement, and is valued by many small organisations.
A recent extension of particular interest is Google Scholar. In this case, the search is restricted to materials of relevance in the academic world. This service differs somewhat from other scope-restricted searches in that it has been necessary to ensure that the crawler visits appropriate sites; and features have been added, such as a citation-count, and discovery of and navigation to papers that cite each paper.
Without yet quite becoming a content-provider, various of Google's services are moving a little beyond mere discovery of content provided by others.
One extension of interest is Google News, a consolidator service, which provides users not only with links to current news, but also allows a degree of customisation of the selection, and displays from its cache the headline, source, and first line of text.
Another extension that merges into content-provision is Google Earth. This provides satellite imagery of the earth's surface, already at fairly high levels of resolution, although currently of uncertain age. It is accompanied by client-side software, and offers several levels of service and data precision, some gratis and some for-fee.
A further initiative that involves content on a very large scale is Google Library, which is part of a broader Google Print project. This scheme, in collaboration with five leading libraries, involves the scanning of many books, the extraction of the text, the indexing of that text, and support for users in discovering segments of that text that satisfy their search-terms.
Use of Google's search-engine service is gratis. The company earns considerable revenues from advertising, however, because the advertising is able to be targeted based in particular on the search-terms nominated by the user (Tyacke & Higgins 2004). And the growth imperative demands that the corporation extract more revenue from advertisers.
The third cluster of services was summed up by a statement by Google's CEO to financial analysts: "We are moving to a Google that knows more about you" (reported in The New York Times of 10 February 2005, and subsequently in many other places).
As will be discussed below, the Google services described earlier already provide the company with a massive amount of information about its users, which may provide opportunities for substantial financial returns. But the company is making sure that there will be more, of both.
In 2004, it launched a gratis web-mail service dubbed Gmail. This has features that differentiate it from its predecessors such as Hotmail and Yahoo. They include huge storage space, enabling long-term retention of messages, and auto-selection of ads for display to the subscriber. The ad-selection is understood to be based on text in the message from the subscriber's correspondent, but it could easily be upgraded to reflect the subscriber's accumulated profile.
A new category of web-based services has recently emerged (in a manner and of a form reminiscent of the halcyon days before 'the dot.com bust'). So-called 'social networking services' (SNS) provide spaces where people can establish and expand small networks and larger communities. SNS involve participants creating profiles for themselves that are variously honest, creative and downright dishonest. In addition, many SNS encourage participants to provide personal data about other people, and even to upload the contents of their address books. Google has an entrant in this market, called Orkut.
A further service that might later migrate into this cluster is Google Desktop. This is Google's version of a tool to provide search capabilities across the user's own storage. Such features have previously been provided on Macintoshes, but not yet by Microsoft for the dominant Wintel workstation environments. At least at present, Google Desktop appears to run entirely on the user's own machine, without any linkage out to Google's site. Data about its users therefore appears to be currently unavailable to Google.
But there appears to be very little to ensure that this remains the case. In December 2005, the Google Desktop Terms and Conditions still contained no link to any Privacy Policy, and were easily read as enabling Google to use any personal data that it gathered in any manner it sees fit, now or at any time in the future.
The following sections consider ways in which the various Google services present challenges to the operation of the law, and to the interests of the many different categories of party that have an interest in contemporary ePublishing. The first topic addressed is competition law.
Google dominates search-engine usage, particularly for the general public. But it is far from alone, and it does not appear to have any natural advantage that makes its market uncompetable. Similarly, its other lines of business, such as Earth, Desktop and Orkut, do not have the field to themselves.
Google's moves into content, on the other hand, have given rise to concerns in some quarters that digitisation of old works creates the risk of some kind of monopoly over the content of published works, and hence of monopoly rents.
There have been prior initiatives to digitise the world's literature, starting with Project Gutenberg in 1971. Google's deal with major libraries has stimulated parallel initiatives. Some are competitive with Google but collaborative among many other players, such as the European Digital Library mooted in May 2005, and the Open Content Alliance (OCA) announced in October 2005. One that appears to be head-to-head competitive is the British Library / Microsoft project announced in November 2005.
Particularly in view of what appears at this stage to be a virile response from elsewhere in the private sector, it is not clear that a reasonable argument can be mounted that Google's Print and Library campaigns are anti-competitive. That view could be subject to review if, for example, Google were able to use its patents on scanning technology to lock competitors out of the market for an extended period. Otherwise, this would appear to be a classic case of that (remarkably rare) phenomenon, successful first-mover advantage.
Discussions of anti-competitive effects draw attention towards the now-dramatic excesses of copyright law in favour of owners. A stronger and longer monopoly has now been granted on every novel, biography, manual and learned work than has ever been the case in the past. Inter-supplier competition is only one dimension. Owners have been granted by the U.S. Congress, and subsequently by other parliaments unmindful of their own nations' self-interest, the ability to wield dramatic market power over both consumers and content-intermediaries.
This is against the interests of a young, busy and rich content-intermediary, the Google corporation. The following section considers Google's challenge to the legal rights of copyright-owners.
From the outset, there was considerable potential for fundamental practices of search-engines to be found to be copyright-infringing. In particular:
Those early battles over what can and cannot be done have been resolved, and the boundaries around each of the above practices appear to be fairly clear. American law showed that (provided that very large dollops of money were available) it is capable on at least some occasions of adapting reasonably quickly, and finding a balance appropriate to new capabilities delivered by new digital technologies.
There have also been trademark battles over the AdWords technique used by Google, because trademarked terms have been used as keywords, and not infrequently acquired by competitors of the trademark-owner (e.g. Atlee & McMahon 2005).
The copyright and trademark wars are far from over, however, as Google continues to push at the boundaries. In March 2005, the extension service Google News came under attack in the courts by Agence France-Presse, which believes that the manner in which the service is designed infringes its copyright (e.g. Wright 2005).
The Google Earth web-site is almost devoid of information about copyright, or even terms of use, although marks asserting Google's ownership of copyright appear on many images. It appears that the service has been structured using conventional contract, agency and copyright licensing agreements. If that assumption is correct, then it would be reasonable to expect that such copyright disputes as arise in relation to the use of Google Earth will be settled under existing laws.
But that appears not to be the case with at least some of the complex of services in the Google Print / Google Library cluster.
In many cases (e.g. out-of-copyright works held by organisations such as the Bodleian Library, and works for which the copyright-owner provides a licence), there would appear to be no sense in which the Google Library initiative could reasonably be claimed to infringe copyright law. Some uncertainties exist in relation to works whose copyright-owner is uncertain or cannot be located (referred to at least in the U.S. as 'orphan works', and the subject of current consideration by the U.S. Copyright Office).
Much greater contentiousness arises in relation to Google Library's handling of works that are in-copyright, and whose copyright-owners have not provided licences, and are not prepared to do so. Most of Google's partners appear to be restricting the arrangement to out-of-copyright books. But at least one is making in-copyright books available for scanning. The University of Michigan Library states that "Google will digitally scan and make searchable virtually the entire collection of the U-M library". It very much appears that Google is flexing its now-considerable muscle and proposing to scan, index and enable search over copyright works.
This undertaking involves several steps that can be argued to breach copyright:
Unsurprisingly, the aggrieved have resorted to litigation to defend their position. Two separate actions have been launched in the U.S. District Court, one in September 2005 by the Authors Guild, and the other in October 2005 by five major book-publishers (TechLawJournal 2005, Band 2006). The locations of the protagonists' Head Offices and of the lawsuit could hardly be a better vindication of the Lessig (2000) thesis about 'West Coast Code vs. East Coast Code'.
Most content-expressors want to retain control over at least some aspects of the content they create. Copyright law has long provided rights to originators over copying and republication, and over adaptation and republication of adapted works.
Because of the works' value, and their economic muscle, large, for-profit corporations have come to control much of the content that has the potential to attract revenue. Individual originators have been relegated to the role of employee or contractor. In addition, a few forms of content, such as feature films and light entertainment series, require considerable investment, and result from creative work by teams rather than a single primary originator. As a result, copyright in such works is commonly owned by corporations from the outset.
During the last 15 years, as the digital era has threatened their monopoly profits, copyright owners, particularly in the music and film industries, have lobbied the U.S. Congress for stronger protections. The U.S. Administration is now lobbying, on their behalf, for these extended powers to be given effect in other countries. Remarkably, many countries appear incapable of recognising the disadvantages to themselves in doing so, and are falling into line. Australia, an ultra-loyal ally in American foreign policy, ignored or was ignorant of its own economic interests, and was among the first to do so, through the US-Australia 'Free Trade' Agreement.
The stage is set for enormous tensions between the interests of content-accessors and copyright-owners, with the legal dice already very heavily loaded in favour of the latter.
Google was formed in 1998, but already has $4-5 billion annually in revenue and a market capitalisation of $125 billion. It is a 'newly big business'; and its interests are strongly divergent from those of 'old big business'. Whereas old big business sits fatly, and exploits and arranges extensions to its monopolies, 'newly big business' makes its money by adapting quickly to new contexts, and creating new monopolies that it can dominate from the outset.
From the discussion to date, it might appear that Google is aligned with the interests of content-accessors. The following section considers, however, the challenges that Google presents to consumers and consumer protection laws.
The Web and search-engines began life in the mid-1990s as gratis services, socially-oriented and socialist or communitarian in nature. The patterns have changed a great deal during the following decade, as business enterprises have sought ways to make money from the vast volumes of content and traffic. The social dimension is far from dead, but there is now a substantial economic dimension that threatens to swamp it.
Even since the commercialisation of the Internet, consumers have benefitted greatly from the Web, and the flood of readily accessible content made available on it. Search-engines, not least Google, have made golden needles discoverable within an increasingly large haystack. The first round unarguably brought massive consumer benefit, particularly given that the actual access to content was gratis - for those people with access to the necessary infrastructure.
But the contemporary context is economic to the point of being anti-social. It's therefore necessary to consider the extent to which the interests of consumers are holding up against the interests of the generally much more powerful corporations that control much of the available content. Of particular concern are the interests of the less powerful consumers, i.e. individuals, associations, and small business enterprises.
In many circumstances, consumer rights are simply not respected by commercial providers of content on the Web. Commonly, terms are not negotiable, and in many cases are not even transparent. They are changeable at short notice. They do not survive takeover, or even change in management, or just in management policy. Old versions of terms are simply over-written, and cease to be discoverable. Communications from consumers to providers are ignored, and in many cases barriers are created to make it difficult for the consumer to work out how to send them in the first place. Recourse and enforcement are almost non-existent, not only across jurisdictions, but even within historically consumer-friendly jurisdictions. One interpretation is that the U.S. has successfully exported its marketer-friendly / consumer-hostile / low-regulation approach to the rest of the world.
Google looms as one of the most arrogant of the new generation of content-intermediaries, and appears set to carry its attitudes across into its content-provision business lines. Its terms are abrupt and invariant. Changes are made without notice, as the company sees fit. Consumers who use its AdSense facility find similarly hostile terms, applied inflexibly and without correspondence being entered into. Its terms for Gmail have drawn particular criticism.
No expansion of consumer protection to cope with these abuses is currently in prospect. As content-expressors increasingly flex the powers granted to them by the U.S. Congress and subsidiary parliaments such as Australia's, the interests of content consumers seem likely to be lost in the surge of copyright supremacist activity. There is little comfort to be had in the recent W.S.I.S. Tunis Declaration that "We call for the development of national consumer protection laws and practices, and enforcement mechanisms where necessary ..." (WSIS 2005, para. 47), not least when the document is published by the I.T.U.
The final section shifts back from the economic dimension of content-access and consumption, to the social dimension, and considers the challenges that Google presents for privacy law.
The Web and search-engines have tended to undermine the longstanding protection of privacy through obscurity. See Clarke (1998, 1999), Noguchi (2004) and Aljifri & Sánchez Navarro (2004). Many people conduct research into other people, drawing on court reports, mentions in the media, letters to the editor, records of participation in events, and postings to lists, fora and blogs. The motivations for some of these activities are constructive (e.g. to prepare for a forthcoming meeting), but in other cases they are less so (such as stalking, harassment, and extortion).
The sensitivity of personal data is a serious enough concern, but to that must be added the huge problem of pitifully low data quality. Web content is commonly out-of-date, incomplete, uncorroborated, unsourced, or lifted out of its original context without so much as a reference to what that context was. Some of it is simply inaccurate. Some of it is spurious, relating to another person with a similar name.
Some of it is scurrilous, as captured by the sceptical epithet 'It's on the Web; it must be true'. The splendid case study of the John Seigenthaler entry in Wikipedia, May-December 2005, highlighted the weaknesses and strengths of the Wikipedia model and process.
The Google search-engine is among the most intrinsically intrusive of all Web facilities, not because of any evil intent, but simply because its catchment is so large. Another service that harbours both enormous benefits and great privacy threats is the Internet Archive Wayback Machine.
The privacy implications of email are often overlooked. Most email content was written in unguarded moments, in what may prove to be unwarranted expectations of limited distribution and ephemerality. Yet there is an increasing risk of email becoming available to indexing software. For example, private email may escape onto lists by being forwarded by recipients to other parties; and it may be subject to pre-trial discovery, subpoena or search warrant, and hence find its way into court records.
Every user's Internet Access Provider (IAP) maintains logs of traffic, in some cases including content. Every user's email-Internet Service Provider (ISP) maintains an email database. In the case of webmail-only services (such as Hotmail, Yahoo and Gmail), the retention-period is highly uncertain. In all of these cases, the traffic-details and text are subject to unexpected use and to both legally authorised and unauthorised disclosure, often without notification to the individual(s) who thought it was 'their' mail.
Google's Gmail represents the extremity of untrustworthiness in email-provision. For one thing, it refuses to explain the circumstances under which it releases its subscribers' information, and the number of occasions on which it has done so. For another, Gmail's special features have considerably extended the list of risks. Its subscribers are subject to targeted ads based on text from senders. Google is in a strong position to correlate the ads with other data it holds, including, if and when it chooses to do so, with the content of outbound emails. How rich a profile does an advertiser need to enable the manipulation of consumer behaviour?
Importantly, the threats extend beyond Gmail subscribers to the individuals who send messages to Gmail subscribers. The text of their emails is examined, it is retained long-term, and it is subject to largely uncontrolled use and disclosure. The doctrine of privity of contract and the manifold weaknesses and patchiness of privacy laws together suggest that the correspondents simply have no rights at all in relation to the content of those emails. The result has been that some people decline to correspond with other people via Gmail addresses. Perhaps more should do so.
It was noted earlier that one of Google's affiliate businesses is a social networking service called Orkut. Clarke (2004) examined the privacy implications of such schemes, and expressed serious concerns about them.
The whole is, in this case, potentially far greater than the sum of the parts. Google is structuring its business portfolio in order to achieve cross-leveraging, and a particularly valuable form of cross-leveraging is the consolidation of information about the behaviour of users of multiple Google-provided services. At this stage in its development, Google the corporation has the following streams of data about its users available to it:
There is no evidence that the Google corporation has yet moved to mine this data; but this would in any case be a strategically unwise manoeuvre at this early stage. There are various protections nominated in the various privacy policies, none of which are anything like adequate, and all of which are malleable at the will of the company.
Google is a newcomer to the big end of town. New money is always brash; but Google is big new money. The courts are assured of good sport in the next few years, as elephants battle over copyright.
The apparent alignment of user interests with Google in the copyright arena does not carry over into consumer rights and privacy. The early, socially-oriented era of the Web is being swamped by the contemporary dominance of corporate interests. The tensions among human, corporate and government interests on the Internet are now very high, and are mostly being resolved against the interests of individuals.
Google promises to be a major player in a range of battles. Its claim that "You can make money without doing evil" is being put to the test, as its growth and diversification put enormous temptations in front of its executives.
An examination of the epithet is instructive. Google is emphatically not built on an assumption that the company should not do evil. Moreover, a corollary is easily formulated: "But you can make more money by doing evil". Given the obligations of corporations under law, that implies that evil should be done, in order to make more money. It is only reasonable to conclude that Google will see it as being in the company's best interests to gather more personal data, to cross-correlate it and mine it, and to continue to exercise market power over its users.
Aljifri H. & Sánchez Navarro D. (2004) 'Search engines and privacy' Computers & Security 23, 5 (July 2004) 379-388
Atlee S.D. & McMahon B.F. (2005) 'Search Terms: The Use of Trademarked Terms by Web Search Pages Has Challenged Traditional Boundaries of Trademark Protection' Los Angeles Lawyer 28 (November, 2005) 38
Band J. (2006) 'Copyright owners v. The Google Print Library Project' Ent. L.R. 2006, 17(1), 21-24
Clarke R. (1998) 'Information Privacy On the Internet: Cyberspace Invades Personal Space' Telecomm. J. Aust. 48, 2 (May/June 1998)
Clarke R. (1999) 'Internet Privacy Concerns Confirm the Case for Intervention' Commun. ACM 42, 2 (February 1999) 60-67
Clarke R. (2004) 'Very Black 'Little Black Books'' Xamax Consultancy Pty Ltd, February 2004
Lessig L. (2000) 'Code and Other Laws of Cyberspace' Basic Books, 2000
Noguchi Y. (2004) 'Online Search Engines Help Lift Cover of Privacy' Washington Post, Monday, February 9, 2004; Page A01
TechLawJournal (2005) 'Major Book Publishers Sue Google for Digitizing Copyrighted Books' TechLawJournal October 19, 2005
Tyacke N. & Higgins R. (2004) 'Searching for trouble - keyword advertising and trade mark infringement' Computer Law & Security Report 20, 6 (November-December 2004) 453-465
WSIS (2005) 'Tunis Agenda for the Information Society' World Summit on the Information Society, WSIS-05/TUNIS/DOC/6(Rev.1)-E, 18 November 2005
Vise D. & Malseed M. (2005) 'The Google Story' Delacorte, 2005
Wright N. (2005) 'Copyright infringement case brought against Google by AFP' EarthTimes Sat, 19 Mar 2005
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
My thanks to Matthew Rimmer of the Law Faculty at the Australian National University, who stimulated this paper by inviting me to present user perspectives in a seminar on 'Google: Infinite Library, Copyright Pirate, or Monopolist?', at the National Institute of Social Sciences and Law, A.N.U., Canberra, on 9 December 2005. My thanks also to the other presenters at that seminar, for the challenges they presented.
Created: 8 December 2005 - Last Amended: 16 December 2005 by Roger Clarke - Site Last Verified: 15 February 2005