Friday, August 16, 2013

CoSA-SAA 2013: The Web of Sites

 I had the good fortune to attend three great hour-long sessions today:
  • Session 304, Training in Place: Upgrading Staff Capabilities to Manage and Preserve Electronic Records, in which Richard Pearce-Moses (Clayton State University) discussed how online graduate education programs can benefit working archival professionals, Lori Lindberg (San Jose State University) highlighted SAA's new Digital Archives Specialist program, and Sarah Grimm (Wisconsin Historical Society) discussed the educational offerings developed by CoSA's State Electronic Records Initiative project.
  • Session 407, The Web of Sites: Creating Effective Web Archiving and Collection Development Policies, which is discussed in greater detail below.
  • Session 504, Records Management Training Gumbo for the Digital Age, in which Cheryl Stadel-Bevans (Office of the Inspector General, U.S. Dept. of Housing and Urban Development) facilitated a series of lightning talks given by Jane Zhang (Catholic University of America), Donna Baker (Middle Tennessee State University), Daniel Noonan (Ohio State University), and Lorraine Richards (University of North Carolina at Chapel Hill).
However, I'm desperately in need of sleep, so this post is going to focus solely on Session 407.  The Web of Sites: Creating Effective Web Archiving and Collection Development Policies drew a standing-room-only crowd, and with good reason.  The three panelists represented three very different institutions with three very different goals:
  • Olga Virakhovskaya discussed how one collecting repository, the University of Michigan's Michigan Historical Collections (MHC), devised a Web archiving policy that dovetails with its collecting policy, which calls for aggressive collecting and broad documentation of the state's history and culture.  In an effort to balance topical importance and the quality of information found on a given site, MHC staff identify sites that are created by individuals and organizations that MHC seeks to document, that fill gaps in its holdings, or that contain material that falls outside MHC's collecting priorities but nonetheless warrants preservation, and then determine whether the sites contain content that is rich, unique, or new.  If a site meets these requirements, MHC will archive it.  MHC, which uses the California Digital Library's Web Archiving Service, stops archiving sites when no new content has been added for three consecutive years; it will also cease archiving sites upon creator request.
  • Jennifer Wright of the Smithsonian Institution Archives discussed the Archives' efforts to ensure that the 257 websites, 10 mobile sites, 89 blogs, 26 apps, and 578 social media accounts maintained by various Smithsonian entities are managed and preserved appropriately.  The Archives is responsible for providing retention guidance to creators, maintaining periodic snapshots of Smithsonian Web resources, and maintaining a registry of Smithsonian social media accounts. It has developed distinct approaches to preserving websites, Intranet sites, and social media accounts:
    • Public websites are generally treated as permanent records, and the Archives tries to crawl them annually, before and after major redesigns, and on days of major events.  However, it will attempt to configure Archive-It's crawler to exclude content that is being transferred to the Archives in other formats, is the responsibility of other Smithsonian units, consists of collections (as opposed to organizational records), or merely points to other Web content.  Crawls of public sites are made publicly accessible almost immediately after completion.
    • Intranet sites are appraised individually. Given that most Intranet sites block Web crawlers, Intranet content is transferred to the Archives via FTP, hard drive, or another non-crawling mechanism.
    • Most social media accounts are captured once in order to document their existence and show how they are used. After this initial capture, staff reappraise each account annually and recapture it if significant new content is present. Social media content often resists capture, so the Archives uses multiple tools (Archive-It, export tools, and screenshots) as needed.  These captures are not made available online.
  • Rachel Taketa discussed how she created the California Tobacco Control Web Archive (CTCWA), a topical collection of archived sites that complements the University of California, San Francisco's Legacy Tobacco Documents Library (LTDL), which consists of 14 million internal business records created by major tobacco companies.  The archive consists of about 90 sites that were captured with the California Digital Library's Web Archiving Service and complement materials found within the LTDL, but most focus on the other side of the tobacco control movement:  they were created by public health advocacy organizations and anti-smoking campaigns or relate to proposed tobacco control legislation.  A written scope statement establishes the archive's geographic focus (California and anti-smoking campaigns in the state's large metropolitan counties) and collecting priorities (original and/or unique content found in blog posts, interviews, multimedia, sites of established tobacco control groups, and local government sites).  Site captures cease when a given site hasn't been updated for a year or when a given issue is no longer relevant; as one might expect, reappraising sites consumes a lot of time.
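 The dormancy rules that two of the panelists described (three years without new content for MHC, one year for the tobacco control archive) boil down to a date comparison plus a manual override.  Here is a minimal sketch in Python of how such a rule might be written down.  The function names, the program labels, and the `manual_stop` flag (standing in for a creator request or an issue that is no longer relevant) are my own illustrative assumptions, not anything the panelists showed:

```python
from datetime import date

# Dormancy thresholds in whole years, as described in the session.
# The keys are hypothetical labels chosen for this sketch.
DORMANCY_YEARS = {"MHC": 3, "CTCWA": 1}


def years_between(earlier: date, later: date) -> int:
    """Return the number of whole years elapsed from earlier to later."""
    years = later.year - earlier.year
    # Subtract one if the anniversary has not yet been reached this year.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


def should_stop_capturing(program: str, last_updated: date, today: date,
                          manual_stop: bool = False) -> bool:
    """Stop on a manual decision, or when the site has been dormant long enough."""
    if manual_stop:
        return True
    return years_between(last_updated, today) >= DORMANCY_YEARS[program]
```

 The hard part, of course, is knowing the date of the last real update, and that still takes a human appraiser looking at the site.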
 My key takeaways from this session:
  • Your Web archiving policy should, to the extent that your resources and Web archiving tools allow, align with your main collecting policy.
  • Just as collecting policies vary from one institution to another, Web archiving policies will vary from one institution to another.
  • Given the speed with which sites change and the frequency with which once-active sites become dormant, reappraisal is a must.  However, it's incredibly time-consuming and we need some tools that will help us analyze the evolution (or lack thereof) of site content over time.
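 As a thought experiment, the simplest version of such a tool might fingerprint the text of each capture and report how long the content has gone unchanged.  The sketch below is in Python and assumes that the text of each capture has already been extracted by some other means; the function names are hypothetical.  Exact hashing treats any change at all (a rotating date stamp, for example) as new content, so a real tool would need a more forgiving comparison:

```python
import hashlib
from datetime import date
from typing import List, Optional, Tuple


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace, so spacing changes alone are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unchanged_since(captures: List[Tuple[date, str]]) -> Optional[date]:
    """Return the date of the first capture in the latest run of identical captures.

    Returns None when there are no captures at all.
    """
    since = None
    previous = None
    for captured_on, text in sorted(captures, key=lambda c: c[0]):
        current = fingerprint(text)
        if current != previous:
            # Content differs from the prior capture: a new run starts here.
            since = captured_on
            previous = current
    return since
```

 Feeding in a site's capture history would give a rough "nothing new since" date that an archivist could then check by eye before deciding whether to keep crawling.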
Image: traces of a rainbow over the West Bank Crescent City Connection, the twin cantilever bridges that span the Mississippi River, in New Orleans, Louisiana, 16 August 2013. Thanks to my friend S.G. for pointing it out to me; I never would have noticed it otherwise.
