Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Harding Earley used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to amend its original suit against Health Advocate to add similar charges, and the judge had denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.
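
The crawl-and-store cycle can be pictured with a small in-memory sketch. It is a hypothetical illustration, not the archive's actual storage design: `SnapshotStore`, its methods and the example.com URL are assumptions, and recall here simply returns the latest copy captured on or before the requested date.

```python
from bisect import bisect_right, insort
from datetime import datetime


class SnapshotStore:
    """Hypothetical in-memory archive: each URL maps to dated copies of its page."""

    def __init__(self):
        self._times = {}   # url -> sorted list of capture times
        self._bodies = {}  # (url, capture time) -> stored page content

    def save(self, url: str, captured: datetime, body: str) -> None:
        """Record one crawl of `url`; a repeat capture at the same time overwrites."""
        times = self._times.setdefault(url, [])
        if (url, captured) not in self._bodies:
            insort(times, captured)  # keep capture times in sorted order
        self._bodies[(url, captured)] = body

    def recall(self, url: str, when: datetime):
        """Return the latest copy captured at or before `when`, or None."""
        times = self._times.get(url, [])
        i = bisect_right(times, when)
        if i == 0:
            return None  # nothing had been captured yet at that date
        return self._bodies[(url, times[i - 1])]


if __name__ == "__main__":
    store = SnapshotStore()
    store.save("http://example.com/", datetime(1999, 5, 1), "1999 home page")
    store.save("http://example.com/", datetime(2002, 3, 1), "2002 home page")
    print(store.recall("http://example.com/", datetime(2000, 1, 1)))  # prints "1999 home page"
```

Keeping capture times sorted lets each lookup be a binary search, so recall stays cheap even when a page has been crawled many times.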

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.
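
A cooperating crawler's check can be sketched with Python's standard `urllib.robotparser` module. The robots.txt contents and example.com URLs are hypothetical, and `ia_archiver` merely stands in for an archiving crawler's user-agent name; treat it as an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: one named archiving crawler
# is shut out entirely; every other robot is asked to skip /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Answer the question a cooperating crawler asks before fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # load the rules so can_fetch() can use them
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://example.com/index.html"),
        ("Googlebot", "http://example.com/index.html"),
        ("Googlebot", "http://example.com/private/notes.html"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Nothing in this check prevents the request itself: `can_fetch()` only answers a question the crawler chooses to ask, which is why the convention depends on robots that cooperate.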

Most search engines program their Web crawlers to recognize a robots.txt file and follow its commands. The Internet Archive goes a step further: it lets Web site administrators use the robots.txt file not only to control the archiving of current content but also to block access to older versions that were stored in the archive's database before the file was put in place.
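
That retroactive blocking can be pictured as a check made when an archived page is requested rather than when it was crawled. The following is a hypothetical sketch, not the archive's code: `replay`, `ARCHIVE` and the `ia_archiver` agent name are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

ARCHIVE_AGENT = "ia_archiver"  # assumption: agent name the replay check tests against

# Hypothetical stored copies, keyed by (URL, capture year), all made
# before the site owner published any exclusion.
ARCHIVE = {
    ("http://example.com/", 1999): "<html>1999 home page</html>",
    ("http://example.com/", 2002): "<html>2002 home page</html>",
}


def replay(url: str, year: int, current_robots_txt: str):
    """Serve an archived copy only if the site's *current* robots.txt allows it."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    if not parser.can_fetch(ARCHIVE_AGENT, url):
        return None  # hidden from view, though the copy stays in storage
    return ARCHIVE.get((url, year))


if __name__ == "__main__":
    print(replay("http://example.com/", 1999, ""))
    print(replay("http://example.com/", 1999, "User-agent: ia_archiver\nDisallow: /\n"))
```

Because the check consults today's file, adding it hides copies made years earlier; in this sketch, removing it would make them viewable again.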

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits circumventing "technological measures" designed to protect copyrighted materials. The suit further contends that, among other violations, the firm infringed copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, whose firm is named in the suit along with the archive, said that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if the firm had bypassed the block, it is unclear whether any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times
