I was really looking forward to the first session, "Preservation and Conservation of Captured and Born Digital Materials," and it more than met my expectations. However, I must state upfront that I slept so wretchedly that I've been making dumb mistakes all day. The following post may contain a few more. Caveat lector!
Isaiah Beard of Rutgers University's Scholarly Communications Center, which oversees the university's FEDORA-based institutional repository, kicked off the session by furnishing a definition of the still-mysterious concept of digital curation (per the Digital Curation Centre, it's the creation, preservation, maintenance, collection, and archiving of digital objects) and highlighting the factors that make digital objects more fragile than their analog counterparts:
- the ease with which electronic files can be deleted or destroyed
- file format and software dependence (a particular problem with the highly proprietary niche formats that house vast quantities of research data)
- the speed with which storage media become technologically obsolete
- the distance and disconnection with which many creators regard materials that don't have appreciable physical form (a pervasive and, in my opinion, all too often overlooked problem)
In keeping with emerging best practices, Beard and his colleagues migrate some files to new formats in order to increase the chance that they'll remain accessible over time, but always retain a preservation master of the file in its original format and do any needed migration work on derivative copies.
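For readers who want to see what "keep the master, work on a derivative" can look like in practice, here is a minimal Python sketch. It is not Rutgers' actual workflow, which wasn't described at that level of detail; the function names (`sha256_of`, `make_working_copy`) and the choice of SHA-256 are my own assumptions. The idea is simply to copy the preservation master, confirm via checksum that the copy is bit-for-bit identical, and then do any migration work on the copy only.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(master: Path, workdir: Path) -> Path:
    """Copy the preservation master into workdir and verify the copy.

    Hypothetical helper: the master itself is only ever read, never
    modified. Migration tools should be pointed at the returned copy.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    master_checksum = sha256_of(master)
    working_copy = workdir / master.name
    shutil.copy2(master, working_copy)  # copy2 also preserves timestamps
    if sha256_of(working_copy) != master_checksum:
        raise IOError(f"Checksum mismatch while copying {master}")
    return working_copy
```

Recording the master's checksum at this point also gives you a fixity value you can recheck later to confirm the original hasn't silently changed.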
Tim Pyatt of Pennsylvania State University's Special Collections Library highlighted some of the problems associated with current mechanisms for making digitized and born digital materials accessible. At present, many archives provide access to some materials via their traditional research rooms and to others via their online catalogs, their own Web sites, Web sites hosted by creators, social media, and sites hosted by service providers such as the Internet Archive and OCLC; with the exception of linking to sites maintained by creators, my own institution is doing all of these things. As we all know, from an end user's perspective, the proliferation of information silos is mystifying and frustrating. Pyatt discussed some of Penn State's strategies for reducing the chaos -- ensuring that every image placed on Flickr has detailed metadata pointing back to Special Collections, and adding a link to an archival Web site now maintained on Penn State's servers to the finding aid describing the collection to which the site belongs -- and then identified several repositories that are doing a better job of unifying access:
- "Good": Duke University's Rare Book and Manuscript Library pulls item-level metadata from finding aids and creates discovery pages that furnish access to digital surrogates of paper-based archival materials. However, at present, none of these discovery pages provide access to born-digital objects.
- "Better": the University of North Carolina at Chapel Hill's Special Collections Library finding aid platform fully integrates digitized content into finding aids. Clicking on a folder listing in the finding aid will bring up any digital surrogates of items present in the physical folder.
- "Best": Duraspace's Hypatia application, which is currently under development and which promises to provide a single application that will support accessioning, arrangement, description, discovery, delivery, and long-term preservation of born-digital archival collections.
The University of Virginia is currently focusing on collection development and accessioning and is establishing policies and developing preliminary workflows. At present, it's revising its donor and depositor agreements to address copyright, access, and ownership issues; in a digital world in which numerous identical copies of a given file may exist, ownership issues are a particular challenge. It's also developing a feasibility testing procedure that addresses a lot of questions that will have to be answered in order to take in and care for digital materials (e.g., file formats, hardware and software needs, need for file format migration or normalization). It will then move on to developing transfer procedures.
While all of this work is going on, Gueguen and her University of Virginia colleagues are also taking steps to deal with the vast array of damaged and obsolete media currently lurking within their collections. They're in the midst of inventorying their legacy media and trying to get data off these media and into a safe and readily accessible (at least to staff) place. (Hunting down legacy media was one of the first things I did when I was an electronic records archivist, but my repository helped to pioneer the More Product, Less Process approach to processing paper records, and as a result my colleagues and I still find floppies and Zip disks lurking in boxes every now and then. We've also discovered that a sizable percentage of these newly discovered media contain non-record material such as retirement party fliers. However, we're a government archives; a special collections unit might have cause to keep similar files found within collections of personal papers.)
When pulling data off legacy and damaged media, Gueguen and her colleagues use a nifty Forensic Recovery of Evidence Device that has a host of SCSI and other ports, built-in drives (5.25" and 3.5" floppy disk, tape, CD/DVD/Blu-ray, and others), and 2 TB of storage. It runs Forensic Toolkit (FTK) digital forensics software, which they use to reveal hidden and deleted files (which the University of Virginia doesn't accession), look for possible Social Security numbers, credit card numbers, and other sensitive data, and extract some metadata. Both the device and the software are expensive, and the software's output is encoded in proprietary XML. However, the enterprising archivist can build a similar (albeit far less elegant) hardware array out of component parts, and the Mellon-funded BitCurator project, which may result in the creation of an open source, archivally oriented analytic tool, might prove to be an alternative to FTK and other proprietary digital forensics tools (I suspect that, for the time being, some of the Open Source Digital Forensics tools might be the best option for archives with limited budgets). They're also using Archivematica to create preservation metadata and access derivatives.
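To give a rough sense of what "looking for possible Social Security numbers and credit card numbers" involves, here is a small Python sketch. To be clear, this is not how FTK works internally (I have no idea how it does it); it's just the generic pattern-matching approach, and the function names and regular expressions are my own assumptions. It flags strings shaped like U.S. SSNs (NNN-NN-NNNN) and runs of 13 to 16 digits that pass the Luhn checksum used by payment card numbers.

```python
import re

# Assumed patterns: a dashed U.S. SSN, and 13-16 digits that may be
# separated by single spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for possible SSNs and card numbers."""
    hits = [("ssn", match.group()) for match in SSN_PATTERN.finditer(text)]
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):  # discard digit runs that fail the checksum
            hits.append(("card", match.group()))
    return hits
```

Anything a scan like this turns up is only a candidate: plenty of nine-digit strings aren't SSNs, and a human still has to review the hits before deciding what to restrict or redact.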
Photo: The Dr. Henry Hunt House at 209 Congress Place, Cape May, New Jersey, 13 April 2012. Cape May is renowned for its Victorian architecture, and this George Stretch-built home, which was built in 1881 and augmented in the 1890s, is a fine example. Can you spot the bunny?