Monday, January 26, 2009

Preserving Web sites

I really wish that I had been able to find more time last week to blog about everything that happened . . . .

First, we had some momentous changes at the federal level. As any archivist who hasn't had his or her head in the sand knows, the very first executive order that President Obama signed overturns President Bush's dread E.O. 13233 and should facilitate the timely release of presidential records. President Obama signed a memorandum reminding the heads of all federal agencies that "the Freedom of Information Act should be administered with a clear presumption: In the face of doubt, openness prevails." The archival blogosphere and listservs have been chock-full of commentary about these developments, and I really don't have much to add to the discussion at this point; suffice it to say that I'm really, really glad that E.O. 13233 is gone.

Yesterday, an opinion piece penned by Lynne Brindley, the head of the British Library, appeared in the Guardian. Noting that "personal digital disorder" -- our unwillingness or inability to save the digital photos and other electronic materials we create in a way that ensures their long-term survival -- threatens "to leave our grandchildren bereft," she asserts:
As chief executive of the British Library, it's my job to ensure that this does not extend to our national memory. At the exact moment Barack Obama was inaugurated, all traces of President Bush vanished from the White House website, replaced by images of and speeches by his successor. Attached to the website had been a booklet entitled 100 Things Americans May Not Know About the Bush Administration - they may never know them now. When the website changed, the link was broken and the booklet became unavailable.

The 2000 Sydney Olympics was the first truly online games with more than 150 websites, but these sites disappeared overnight at the end of the games and the only record is held by the National Library of Australia.

These are just two examples of a huge challenge that faces digital Britain. There are approximately 8 million .uk domain websites and that number grows at a rate of 15-20% annually. The scale is enormous and the value of these websites for future research and innovation is vast, but online content is notoriously ephemeral.

If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics - perhaps exacerbated by the current economic climate that is killing companies - the memory of the nation disappears too. Historians and citizens of the future will find a black hole in the knowledge base of the 21st century.

Brindley goes on to point out that, popular assumptions to the contrary, Google and other commercial entities are simply not capturing and preserving the "notoriously ephemeral" but immensely valuable information found on the Web:
. . . . The task of capturing our online intellectual heritage and preserving it for the long term falls, quite rightly, to the same libraries and archives that have over centuries systematically collected books, periodicals, newspapers and recordings and which remain available in perpetuity, thanks to these institutions.
She then details the British Library's efforts to digitize some of its paper treasures, ensure the preservation of Web sites relating to the 2012 Olympic Games in London, and, "with appropriate regulation . . . create a comprehensive archive of materials from the UK Web domain."

Brindley is absolutely right, but there really is something missing from this article: an explanation of why libraries and archives must carry out this particular mission. I'm not faulting Brindley for this omission. The editors of the Guardian no doubt had a substantial amount of say in determining the length and overall content of this piece, and Brindley is using her ration of words to link the British Library's activities to a highly anticipated government report on the future of "digital Britain."

However, without explicit discussion of the role that libraries and archives play in preserving cultural heritage materials over very long periods of time and ensuring that the materials in their possession are authentic and unaltered, many people simply won't grasp why it's important that institutions such as the British Library are seeking to preserve electronic materials. A cursory glance at the comment section associated with this article illustrates precisely why this focus on the long term and on authenticity is needed: one of the commenters states that a copy of "100 Things Americans May Not Know About the Bush Administration" is currently available on another publicly accessible Web site and that the British Library simply doesn't know how to use Google. However, the commenter doesn't seem to have thought about whether the copy found on this Web site has been altered or whether the site itself will be around in 10 years, let alone 100 or 1000 years; given that s/he also links to a parody of this booklet, s/he may simply have tongue planted firmly in cheek, but there is no way to tell from the comment alone.

Unless librarians and archivists do more to jolt people out of their present-minded view of the Web and digital materials generally and to underscore the importance of safeguarding the integrity of digital information that warrants long-term preservation, we're going to find it harder and harder to secure the resources we need to preserve and provide access to it. There simply is no alternative to emphasizing -- loudly and insistently -- that we seek both to serve today's researchers and to lay the foundation needed to ensure that future generations of archivists and librarians will be able to serve future generations of researchers.
