Saturday, August 20, 2011
How the Library of Congress is Building the Twitter Archive
In April 2010, Twitter announced it was donating its entire archive of public tweets to the Library of Congress. Every tweet since Twitter's inception in 2006 would be preserved. The donation may have been in part a symbolic act, a recognition of Twitter's cultural significance. When the donation was announced, users were creating about 50 million tweets per day. As of Twitter's fifth anniversary several months ago, that number had increased to about 140 million tweets per day.
The Library of Congress is quite adept at preserving digital materials, having handled projects of this kind for more than a decade. The library has been archiving congressional and presidential campaign websites since 2000, for example, and it currently holds more than 200 terabytes of web archives. It also has hundreds of terabytes of digitized newspapers, and petabytes of data from other sources, such as film archives and materials from the Folklife Center.