Behind the Scenes of the Archive.org Digital Library

Through the first half of 2020, the Internet Archive has seen unprecedented traffic and faced challenges we’ve never experienced before. Thanks to our generous supporters and the tireless work of our team, however, we’ve been able to make improvements to our systems, increase our bandwidth, and stay online.

The Internet Archive has long been an indispensable resource for researchers, students, journalists, and curious citizens everywhere… and when the COVID-19 pandemic hit, Archive.org saw a huge spike in traffic. With quarantines, curfews, and stay-at-home orders in effect around the world, people have been relying on the Internet Archive for entertainment, education, and the preservation of history as it unfolds. The Archive community is reading books, watching movies, listening to music, playing games, and tuning in to old time radio.

A year ago, the users of the Archive.org web site were consuming just over 9 petabytes of data per month; in June 2020, it’s over 18 petabytes per month, which is equivalent to 9 trillion pages of printed text. Although most of the Archive.org staff is working remotely, they managed to get new infrastructure up and running, expanding bandwidth and ensuring that everybody can continue to access the Internet Archive. You can read more about the people who made it possible on the Archive blog!

What does this increased bandwidth provide access to? Here are a few fun facts:

  • The Internet Archive currently holds 21 million books and texts, 3.3 million movies and videos, 400,000 software programs, 7.5 million audio files, and 384 billion web pages in the Wayback Machine.

  • All that adds up to about 60 petabytes of data. If you stored all 60 petabytes of data as MP3s, they would take about 120,000 years to play. If you put it all on a series of 1 GB flash drives and laid them end-to-end, they would stretch more than 1,800 miles.

  • The Internet Archive stores all this data itself, rather than contracting it out to corporations. The primary copy of the Internet Archive is stored in its headquarters, a former church in San Francisco, but it also has backups in Canada, the Netherlands, and Egypt.
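
For readers who want to check the 60-petabyte comparisons in the list above, here is a rough back-of-the-envelope sketch. The article does not say how the figures were derived, so the inputs are assumptions: decimal units (1 PB = 10^15 bytes), an MP3 bitrate of 128 kbit/s, and a flash drive about 5 cm long. The function names are made up for illustration.

```python
# Rough sanity check of the 60 PB comparisons.
# ASSUMPTIONS (not from the article): decimal units, 128 kbit/s MP3s,
# and flash drives roughly 5 cm long.

PETABYTE = 10**15  # bytes, decimal definition
GIGABYTE = 10**9   # bytes, decimal definition

ARCHIVE_BYTES = 60 * PETABYTE  # total size quoted in the article

MP3_BITRATE_BPS = 128_000            # assumed bitrate, in bits per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

DRIVE_LENGTH_M = 0.05                # assumed length of one flash drive, in meters
METERS_PER_MILE = 1609.344


def mp3_playback_years(total_bytes, bitrate_bps=MP3_BITRATE_BPS):
    """Years needed to play total_bytes of audio at a constant bitrate."""
    seconds = total_bytes * 8 / bitrate_bps
    return seconds / SECONDS_PER_YEAR


def flash_drive_miles(total_bytes, drive_bytes=GIGABYTE, drive_length_m=DRIVE_LENGTH_M):
    """Length in miles of enough drives to hold total_bytes, laid end to end."""
    drives = total_bytes / drive_bytes
    return drives * drive_length_m / METERS_PER_MILE


if __name__ == "__main__":
    print(f"MP3 playback time: {mp3_playback_years(ARCHIVE_BYTES):,.0f} years")
    print(f"Flash drive chain: {flash_drive_miles(ARCHIVE_BYTES):,.0f} miles")
```

Under these assumptions, both results land close to the figures quoted above; a different bitrate or drive length would shift them proportionally.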

Right now, the Internet Archive is getting over 4.2 million unique visitors every day. Its usage statistics are public, and you can view them at https://archive.org/stats/.