Hard lessons in dataset licensing to create commercial products: 77m v Ordnance Survey
If you are interested in database licensing, the intrigue of how complex geospatial services are developed, electronic mapping and polygons, the legality of scraping, how online terms governing databases are construed, and database rights, then the recent UK decision in 77m Ltd v Ordnance Survey Ltd [2019] EWHC 3007 (Ch) (08 November 2019) is for you.
The dispute in the case was between a start-up company, 77m, and Ordnance Survey (OS), the national mapping agency of Great Britain. 77m created a dataset called Matrix, containing 28 million records, which consists of an up-to-date, detailed and accurate list of the geospatial coordinates of all the residential and non-residential addresses in Great Britain.
77m created Matrix by accessing, combining, and processing data from a wide range of datasets which were either publicly available for free or which 77m paid to access. 77m accessed over 50 million records from the various sources. At least 18 datasets from different sources were accessed. One of the sources used to create Matrix was data from Her Majesty’s Land Registry (HMLR). 77m entered into various contracts with HMLR to receive data from or have access to HMLR databases. Another source was the Registers of Scotland (RoS).
Although there was a dispute about exactly how Matrix was created (77m was found not to be forthright and even to have lied about what it did), the court described the steps as i) process address data; ii) process geospatial data; iii) link address data and geospatial data together; and iv) generate more data associated with geolocated addresses.
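To make the first three steps concrete, here is a minimal sketch of how address records and geospatial records might be linked through a shared identifier. It is an illustration only and not a description of what 77m actually did, which was disputed at trial. The field names (parcel_id, easting, northing), the normalisation rule, and the sample data are all hypothetical, and step (iv) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class GeolocatedAddress:
    address: str
    parcel_id: str   # hypothetical shared identifier linking the two sources
    easting: float
    northing: float


def normalise_address(raw: str) -> str:
    """Step (i): upper-case, drop commas and collapse whitespace."""
    return " ".join(raw.upper().replace(",", " ").split())


def link_addresses(
    address_rows: Iterable[Tuple[str, str]],
    parcel_points: Dict[str, Tuple[float, float]],
) -> List[GeolocatedAddress]:
    """Steps (ii) and (iii): attach a coordinate pair to each distinct address.

    address_rows holds (parcel_id, raw_address) pairs from one source;
    parcel_points maps parcel_id to (easting, northing) from another source.
    Rows with no matching parcel, and duplicate addresses, are skipped.
    """
    linked: List[GeolocatedAddress] = []
    seen = set()
    for parcel_id, raw in address_rows:
        address = normalise_address(raw)
        point = parcel_points.get(parcel_id)
        if point is None or address in seen:
            continue
        seen.add(address)
        linked.append(GeolocatedAddress(address, parcel_id, point[0], point[1]))
    return linked
```

Even in a toy version like this, every input passed to link_addresses comes from some source dataset, and each source may carry its own license terms. That is why the provenance of each dataset mattered so much in the case.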
Matrix was developed to compete with an existing OS product called AddressBase. 77m did not contract with OS for access to AddressBase. The central question in the case was whether 77m had succeeded in legally creating Matrix without breaching any licenses used to develop it or infringing any database rights held by OS or others. The court ruled it had not.
The decision of Birss J. is lengthy, running to 343 paragraphs. It goes into detail about all of the datasets accessed and how they were combined to create Matrix. It also carefully analyzes the various license terms, their scope, and whether they were breached.
The summary below highlights portions of the decision focusing on the licensing issues. I also provide commentary and takeaways for practitioners engaged in licensing or using datasets to create commercial products.
The INSPIRE Download Terms
One of the license issues was whether 77m complied with the online download terms for the INSPIRE polygon dataset. In electronic mapping, parcels of land are defined by polygons. An individual polygon is defined by the set of coordinates of its vertices.
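For readers less familiar with electronic mapping, the following minimal sketch shows what "a polygon defined by the coordinates of its vertices" looks like in practice, and how a single representative point (a centroid) can be computed from those vertices using the standard shoelace formula. The function name and the sample rectangle are my own illustration and are not taken from the INSPIRE dataset or the judgment.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of one vertex


def polygon_area_and_centroid(vertices: List[Point]) -> Tuple[float, Point]:
    """Return (area, centroid) of a simple polygon via the shoelace formula.

    Vertices are listed in order around the boundary; the last vertex is
    implicitly joined back to the first.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * signed_area) = sum / (3 * twice_area)
    centroid = (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid


# A hypothetical 4 x 2 rectangular parcel
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
area, centroid = polygon_area_and_centroid(parcel)
```

The list of vertices is the polygon's geometry. A value computed from it, such as the centroid, is a different piece of data from the vertex list itself.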
HMLR offered an INSPIRE download service on the “INSPIRE Download Terms”. These terms provided a license under the Open Government Licence (OGL). The OGL enables public bodies to make their data available free of charge for reuse “for commercial or non-commercial purposes”. However, the license was subject to two exceptions. The licensee was not permitted to:
- use the polygons (including the associated geometry, namely x,y co-ordinates) for a purpose other than personal, non-commercial use or commercial or non-commercial use within your organisation; or
- sub-license, distribute, sell or make available the polygons (including the associated geometry, namely x,y co-ordinates) to third parties.
The general license language was broad and expressly included combining the data with other information “including it in your own product or application”.
The dispute between 77m and OS was whether the data could be used to make available a competing service to OS’ AddressBase. OS contended that the term “use within your organisation” (or “internal business use”) was a term of art in the industry which restricted use of the licensed data for the internal administration and operation of the licensee’s business as opposed to use for the creation of products or services for licensing or supply to third parties (whether on a commercial basis or otherwise). The court rejected this interpretation.
Instead, the court agreed with 77m that the terms did not expressly prohibit the use of data derived from the polygons in a product or service. Rather, the words focused on preventing use of the polygons outside the organisation and making available the polygons themselves (including the associated geometry). That was consistent with a prohibition on resale of the polygons but not a prohibition against the sale of a product or service based on the polygons but which did not include them. Specifically, the terms did not prohibit providing a service that disclosed derived data, but not the polygons or specific geometry in the dataset.
Takeaway: A licensor that wants to prevent re-use of data in a competing service needs to do so in an explicit manner. Even a license restricted to “internal business purposes” is not enough where a competing service may use the data to create “derived data” that is made available on a commercial basis to the public.
The A1 Match licence
Another issue was how to construe a bespoke license that granted 77m a licence to use the INSPIRE data information supplied by HMLR on 77m’s internal systems “to confirm whether 910,000 INSPIRE IDs relate to non addressable sites.” OS contended that the license was limited to the purposes disclosed at the time of contracting, namely, to cleanse garbage polygons/non-addressable sites from INSPIRE polygon data held by 77m. 77m argued that the license also covered the use it actually made of the data which included using the “link between the INSPIRE ID and address given in the A1 property description for addressable sites (my emphasis) as a way of specifying the geospatial coordinates of its own list of addresses in the Master Address List. It did this to create its anchor points.” The court held this activity was not permitted by the license.
Takeaway: 77m did not disclose its real purpose for licensing the INSPIRE data or negotiate terms that actually covered the intended use. This strategy did not fare well for 77m.
FAP licences and scraping of A1 property descriptions
Another service HMLR offered was called “Find a Property” or FAP. It was a statutory service which allowed people to search for titles on the internet. 77m admitted that it used the FAP service to match INSPIRE IDs to addresses for about 480,000 address records that had been manually obtained from the FAP service. However, the trial judge found that an additional 3.5 million address records were obtained using FAP. The court concluded this was done by scraping, that is, the use of an automated tool to access records via the FAP website.
OS contended that scraping was expressly prohibited by the service terms. Whether it was depended on which terms applied to the service. As it happened, the public website from which the data was made available had two sets of terms that could have been applicable. One set, the OGL terms, was less prominent to users, was available via a hyperlink, and permitted scraping. The more prominent and more specific terms governing the use of FAP (also available via a link) prohibited it.
The court ruled that the question of which set of terms governed was to be approached objectively, from the point of view of a reasonable user coming to the website to use it.
Neither side drew my attention to any authority on this but in principle it seems to me that the way a question like this should be approached must be to consider objectively the point of view of a reasonable user coming to the website to use it. That is because these licence terms are not open for negotiation. They are standard terms which the operator of the website is putting forward to all users.
On this basis, the court determined that the more prominent terms applied. Since the address records were obtained by scraping, these 3.5 million records were obtained in breach of those licence terms.
Takeaway: There are legal theories in various jurisdictions that can create liability for scraping data, including the tort of trespass, laws that prohibit unauthorized access to and copying of data from computers, laws against circumvention of technical measures that are used to prevent access to or copying of databases protected by copyright, and sui generis protection of databases such as that in the European Union. (I deal extensively with these in Chapter 3 of my book Sookman: Computer, Internet and Electronic Commerce Law.) In addition, many database licensors expressly prohibit data scraping in website terms or in bespoke license agreements. Anyone engaging in scraping to create a rival database (including in other contexts such as scraping financial data for “open banking” purposes) needs to be aware of these and the other legal theories that can create liability. 77m had argued in this case that it had manually copied the address records by using offshore resources in Pakistan. However improbable this may have been for 3.5 million records, the alleged records of how these addresses were obtained had been deleted. Clearly, these types of records have to be retained in a project where scraping is contractually prohibited but manual extraction and re-utilization is not. Conversely, in this case OS could have avoided the manual extraction defense by including permitted use or download quantity limitations in its license.
The RoS Land Values licence
77m also used RoS Land Values data in creating Matrix. This data combined addresses and a geolocation for each address. This data was subject to a license in respect of the defined term “Data” which referred to “Land Values plus House Type”. The Data was licensed to allow users “to develop a web service containing house sale information”. Other data made available could only be used “for internal modelling purposes.”
The challenge for 77m was that it didn’t want the data to offer a service pertaining to land values and house types. Rather, the usefulness was the combination of centroid/geolocation and address data to ascribe good geolocations to 77m’s own Master Address List. But the only license for this use of this data was for internal modelling purposes. 77m argued that the centroid/geolocation data could be used as part of a commercial product. The court agreed with that but rejected 77m’s contention that the license permitted 77m to use the data to produce a new geolocation for all or substantially all of the properties in the dataset. That was not an act of internal modelling. Accordingly, 77m’s use of the RoS Land Values data was not licensed by RoS.
Takeaway: Many datasets are licensed for specific purposes, with some data licensed for a broad purpose and other parts of the data for other or more limited purposes. One cannot simply assume that, because data is made available under license, any use of it is licensed. Nor can one go ahead and use the data for a prohibited purpose without obtaining another license that covers the use.
Infringement of database right by 77m
The court also engaged in a detailed analysis of whether 77m’s extraction and use of various datasets infringed the sui generis database right under the EU Database Directive 96/9/EC, which was transposed into UK law by the Copyright and Rights in Databases Regulations 1997 (SI 1997/3032). The court reviewed the leading UK and CJEU cases construing this right, including Football Dataco Ltd v Sportradar GmbH  FSR 30, British Horseracing Board Ltd v William Hill Organisation Ltd (C-203/02), Directmedia Publishing GmbH v Albert-Ludwigs-Universität Freiburg (C-304/07), Apis-Hristovich EOOD v Lakorda AD (C-545/07), and Innoweb B.V. v Wegener ICT Media (C-202/12).
The court concluded that 77m’s use of the centroids from the RoS Land Values dataset and the addresses acquired from HMLR via the A1 Match licence and scraping the FAP service amounted to acts of infringement of database right. However, 77m’s use of the addresses manually downloaded from FAP, which numbered about 480,000, was within the authorised extraction defence and therefore did not infringe any OS database rights.
In the result, while 77m had achieved a measure of success, it lost the case on several grounds. The case illustrates hard lessons in dataset licensing.
This article was first published on barrysookman.com.
The judgment in 77m was released by Justice Birss on November 8, 2019. A week earlier (November 1, 2019) he released reasons in a significant copyright decision, Warner Music UK Ltd & Ors v Tunein Inc [2019] EWHC 2923 (Ch) (01 November 2019), in which he found the Internet music radio service Tunein to have infringed the copyrights of record labels. The infringements were, inter alia, by communicating to the public in the UK webcasts and simulcasts of sound recordings that had not been licensed for the UK marketplace. This is a carefully reasoned decision and a must read for developers of applications and web services that use copyright protected content that is made available in one jurisdiction and that target and make it available to users in other jurisdictions for which it is not licensed. The case had originally been heard by the well-respected judge Henry Carr before he passed away.