Keeper of Expired Web Pages Is Sued

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Beyond its utility for Internet historians, the Web page database, searchable with a form called the Wayback Machine, is also routinely used by intellectual property lawyers to help learn, for example, when and how a trademark might have been historically used or violated.

That is what brought the Philadelphia law firm of Harding Earley Follmer & Frailey to the Wayback Machine two years ago. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor.

In preparing the case, representatives of Harding Earley used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

The lawsuit, filed in Federal District Court in Philadelphia, seeks unspecified damages for copyright infringement and violations of two federal laws: the Digital Millennium Copyright Act and the Computer Fraud and Abuse Act.

"The firm at issue professes to be expert in Internet law and intellectual property law," said Scott S. Christie, a lawyer at the Newark firm of McCarter & English, which is representing Healthcare Advocates. "You would think, of anyone, they would know better."

But John Earley, a member of the firm being sued, said he was not surprised by the action, because Healthcare Advocates had tried to amend its original suit against Health Advocate to add similar charges, but the judge denied the motion. Mr. Earley called the action baseless, adding: "It's a rather strange one, too, because Wayback is used every day in trademark law. It's a common tool."

The Internet Archive uses Web-crawling "bot" programs to make copies of publicly accessible sites on a periodic, automated basis. Those copies are then stored on the archive's servers for later recall using the Wayback Machine.

The archive's repository now has approximately one petabyte - roughly one million gigabytes - worth of historical Web site content, much of which would have been lost as Web site owners deleted, changed and otherwise updated their sites.

The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

Under popular Web convention, such a file - known as robots.txt - dictates what parts of a site can be examined for indexing in search engines or storage in archives.

Most search engines program their Web crawlers to recognize a robots.txt file, and follow its commands. The Internet Archive goes a step further, allowing Web site administrators to use the robots.txt file to control the archiving of current content, as well as block access to any older versions already stored in the archive's database before a robots.txt file was put in place.
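The retroactive blocking described above can be illustrated with a minimal Python sketch. It is not the Internet Archive's actual code, which the article does not describe; the crawler name `example-archiver`, the `serve_snapshot` helper, and the sample URLs and rules are all hypothetical. The only real API used is the standard library's `urllib.robotparser`. The idea is that the archive checks the site's current robots.txt rules before handing out a copy it stored earlier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, invented for this illustration.
AGENT = "example-archiver"


def parse_rules(robots_txt):
    """Parse robots.txt text into a rule set (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def serve_snapshot(snapshots, url, current_robots_txt):
    """Return a stored copy only if the site's current rules allow the archiver.

    Assumption: the archive applies today's robots.txt to pages it
    captured before the file existed, as the article describes.
    """
    if not parse_rules(current_robots_txt).can_fetch(AGENT, url):
        return None  # blocked retroactively
    return snapshots.get(url)


# A page captured years ago, before any blocking rule was published.
url = "http://example.com/about.html"
snapshots = {url: "<html>1999 copy</html>"}

open_rules = "User-agent: *\nDisallow: /private/\n"
blocking_rules = "User-agent: example-archiver\nDisallow: /\n"

print(serve_snapshot(snapshots, url, open_rules))      # the stored copy
print(serve_snapshot(snapshots, url, blocking_rules))  # None
```

In this sketch the block depends entirely on the archive reading and applying the current rules at the moment of each request.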

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt file blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

The Internet Archive, meanwhile, is accused of breach of contract and fiduciary duty, negligence and other charges for failing to honor the robots.txt file and allowing the archived pages to be viewed.

Brewster Kahle, the director and a founder of the Internet Archive, was unavailable for comment, and no one at the archive was willing to talk about the case - although Beatrice Murch, Mr. Kahle's assistant and a development coordinator, said the organization had not yet been formally served with the suit.

Mr. Earley, the lawyer whose firm is named along with the archive, said, however, that no breach was ever made. "We wouldn't know how to, in effect, bypass a block," he said.

Even if they had, it is unclear whether any laws would have been broken.

"First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."
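Mr. Koster's point that the mechanism is voluntary can be sketched in a few lines of Python. The rules, URLs, robot name and both helper functions below are hypothetical, and only `urllib.robotparser` is a real standard-library API. The sketch assumes the only difference between a cooperating robot and any other robot is whether its own code consults the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking every robot to stay out of /old/.
ROBOTS_TXT = "User-agent: *\nDisallow: /old/\n"

URLS = [
    "http://example.com/index.html",
    "http://example.com/old/1999.html",
]


def cooperating_plan(robots_txt, agent, urls):
    """URLs a cooperating robot would request: it filters its own list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


def non_cooperating_plan(urls):
    """A robot that never reads robots.txt; the file itself cannot stop it."""
    return list(urls)


print(cooperating_plan(ROBOTS_TXT, "polite-bot", URLS))
print(non_cooperating_plan(URLS))
```

The server plays no part in either function. The file expresses the site owner's wishes, and a robot honors them only if it was written to do so.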

William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.

He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.

"The archive here, they were being the good guys," Mr. Patry said, referring to the archive's recognition of robots.txt commands. "They didn't have to do that."

Mr. Patry also noted that despite Healthcare Advocates' desire to prevent people from seeing its old pages now, the archived pages were once posted openly by the company. He asserted that gathering them as part of fending off a lawsuit fell well within the bounds of fair use.

Whatever the circumstances behind the access, Mr. Patry said, the sole result "is that information that they had formerly made publicly available didn't stay hidden."

By TOM ZELLER Jr.
The New York Times