Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Harding Earley used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to add similar charges to its original suit against Health Advocate, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
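As a rough illustration of the convention described above, the sketch below shows how a cooperating crawler might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The file contents, the crawler name "ia_archiver", the name "OtherBot" and the example.com addresses are assumptions chosen for illustration; none of them comes from the court filings or from Healthcare Advocates' actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration only):
# one named crawler is shut out of the whole site, while every other
# crawler is asked to stay out of a single directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://example.com/index.html"),
        ("OtherBot", "http://example.com/index.html"),
        ("OtherBot", "http://example.com/private/notes.html"),
    ]:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

The check runs inside the crawler itself. Nothing on the Web site's server enforces the rules; the file takes effect only if the program reading it chooses to obey.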

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear that any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times
