My colleague Michael Martin opened the session by discussing how the New York State Archives typically conducts appraisals. Regardless of format, we compile information about the history of the unit that created or currently maintains the records, the disposition of similar records created by other agencies, similar records already in our holdings, and published research that makes use of similar records. We also look for records disposition schedules for similar or related records, and pertinent state and federal laws and regulations. We then meet with creators to determine the contents of the files, identify any major gaps, examine blank forms or computer reports, and assess the environment in which the records are housed. All of this research forms the basis for formal appraisal reports that assess the legal, administrative, environmental, and research value of the records, identify major preservation and access issues, and recommend specific records management, accessioning, and preservation actions.
When appraising electronic records, we push against creator assumptions that aren't always accurate: that gaps won't exist, that volume won't be an issue, that everything can be easily found, and that passively managed records will remain accessible over time. We also complete a supplemental technical appraisal. We make it a point to speak not only to agency records managers and records creators but also to agency IT personnel. We gather information about the name of the system in which the records are housed, the type(s) of records present, ownership of the records, the hardware and software environment, the size of the system, the physical location of the hardware housing the system, how often records are retrieved and used, the accuracy and completeness of the data, and the existence and location of backup copies. The technical appraisal also assesses the long-term resource commitments needed to ensure that the records will remain accessible over time.
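For readers who track this sort of thing in a script or spreadsheet, the technical appraisal checklist above could be sketched as a simple record. This is only an illustration: the class name, the field names, and the sample values are all hypothetical, and it does not represent any form the New York State Archives actually uses.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class TechnicalAppraisal:
    """Hypothetical record of the questions asked in a technical appraisal."""

    system_name: Optional[str] = None
    record_types: list[str] = field(default_factory=list)
    records_owner: Optional[str] = None
    hardware_software_environment: Optional[str] = None
    system_size: Optional[str] = None
    hardware_location: Optional[str] = None
    retrieval_frequency: Optional[str] = None
    accuracy_and_completeness: Optional[str] = None
    backup_copies: Optional[str] = None

    def unanswered(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example values, for illustration only.
appraisal = TechnicalAppraisal(
    system_name="Example Licensing System",
    record_types=["case files"],
    records_owner="Example Agency",
)
print(appraisal.unanswered())
```

Listing the unanswered fields makes it easy to see what still needs to be raised with the agency's IT personnel before the appraisal report is written.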
Sibyl Shaefer and Laura Montgomery of the Rockefeller Archive Center focused on the accessioning and ingestion of electronic records. The Rockefeller Archive Center has a sizable backlog of unprocessed records, some of which consist of a mix of paper records and electronic records on legacy media. The digital archivists are searching through boxes, removing legacy media, and producing basic preservation copies of the electronic records, but the paper records may not be processed for some time after this sifting takes place. As a result, the possibility that the relationship between the paper and electronic records will be permanently severed is quite real. In order to ensure that this doesn't happen, Shaefer and Montgomery document the removal of the electronic media in the Resources module of their instance of the Archivist's Toolkit, because the Accessioning module isn't sufficiently flexible. (Our accessioning workflow is still paper-centric, so for now we're documenting separations of this nature on paper.) When the repository receives new accessions, staff conduct a quick survey of the collection, remove the digital media, attach tracking sheets to each piece of media, and create a collection record in the Archivist's Toolkit that documents the removal of the media.
The Rockefeller Archive Center uses Archivematica to ingest electronic records and create item-level preservation and administrative metadata and Submission Information Package-level description metadata. At present, rights issues are a real concern: many of the collections that consist of a mix of paper and electronic records are covered by old donor agreements that make no reference to electronic records, online access, or related issues. Staff eventually hope to enter all information about rights issues into Archivematica at the point of ingest and have it reflected in the PREMIS metadata that Archivematica creates upon ingest.
Jeanne Kramer-Smyth of the World Bank Archives (and author of the always awesome Spellbound Blog) concluded the session with a provocative assessment of issues relating to access. Noting that records aren't truly accessible unless they're also understandable and meaningful, she highlighted the importance of making sure that preservation actions don't inadvertently alter the significant properties of records. For example, the New York Public Library archivist who processed the papers of Jonathan Larson, the creator of the musical Rent, discovered a mystifying one-line inconsistency in the Microsoft Word 5.1 file containing the lyrics to one of the songs: when opened in an emulator, the line read "before the virus [HIV] strikes." When opened in Microsoft Word 5.1, the line was completely different. Only after opening the file in a hex editor did the archivist figure out what was going on: Microsoft Word 5.1 had a save feature that embedded revisions at the end of the file, but the emulator wasn't configured to read and apply these changes. Had the archivist not taken the precaution of opening the file in its native environment, he or she might have decided that the emulator was a reliable preservation and access tool for Microsoft Word 5.1 files.
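To give a rough sense of what looking at a file in a hex editor can reveal, here is a minimal Python sketch that pulls runs of readable text, along with their byte offsets, out of arbitrary binary data. It is not a parser for the Microsoft Word 5.1 format: the function name `extract_text_runs` is hypothetical, the sample bytes are synthetic, and the sketch assumes that the text of interest is plain ASCII.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# Synthetic bytes standing in for a binary document (NOT real Word 5.1 data):
# a header, an earlier version of a line, and a revision stored later in the file.
sample = (
    b"\x00\x01\x02" + b"HEADER" + b"\x00\x00"
    + b"first draft of the line" + b"\x00\x03\x04"
    + b"\xff\xfe" + b"revised text appended later" + b"\x00"
)

runs = extract_text_runs(sample)
for offset, text in runs:
    print(f"{offset:08x}  {text}")
```

In this toy example, both the earlier wording and the later revision show up as separate text runs at different offsets, which is the kind of clue that would prompt a closer look at how the original application stored its changes.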
As Kramer-Smyth pointed out, migrating files from one format to another can also cause problems: loss of information, loss of fidelity (i.e., changes in appearance or behavior), loss of authenticity/legal admissibility, and the likelihood that migration will have to be performed repeatedly. Moreover, in some instances, it may not be possible to migrate files. In others, one may have to pull records into an emulated environment prior to migrating them.
Kramer-Smyth also highlighted a couple of intriguing emulation environments. Basilisk II emulates older Macintosh environments. Dioscuri provides a universal virtual computer that enables you to run a variety of operating systems and software applications; to keep it usable, all you need to do is migrate its interface over time. However, she stressed once again that emulation has its limitations: you need to mimic hardware (a particular concern when attempting to replicate the original user experience), you need to preserve the original operating system and application software, and software licensing issues are a matter of enduring concern.
Despite the limitations of migration and emulation, in the end we will probably have to embrace both approaches: migration can keep electronic files accessible in the relatively short term, and emulation will likely be needed in the longer term.
In closing, Kramer-Smyth offered a few intriguing thoughts about end user access:
- In most instances, we will not construct electronic reading rooms akin to the onsite reading rooms that enable us to provide access to paper materials. However, in instances in which specialized hardware is called for or we want to ensure that users don't copy or disseminate materials that are legally restricted or have intellectual property restrictions, we may require users to visit our physical repositories.
- We may create virtual reading rooms at some point in the future, but at present most of us have neither the technological resources nor the volume of electronic files needed to make this approach workable.
- NARA and Maine's Office of GIS allow users to download electronic records in a variety of formats, and we may want to consider embracing this user-centered approach.
Photo: the Joseph and John Steiner Cottages at 22 and 24 Congress Street, Cape May, New Jersey, 13 April 2012. These homes, which have signs indicating that they were built in 1848, aren't as large or as ornate as many other Cape May Victorians, but they have a sweet charm all their own.