Showing posts with label MARAC Spring 2012. Show all posts

Monday, April 16, 2012

MARAC Spring 2012: a few tidbits

As promised, here are a few of the interesting snippets of knowledge I learned at the Spring 2012 meeting of the Mid-Atlantic Regional Archives Conference, which was held in Cape May, New Jersey last week.
  • Isiah Beard (Rutgers University Libraries, Center for Scholarly Communication): There are at least 27 commonly used digital audio file formats and approximately 90 codecs that are used to encode and decode various types of audio file formats. The number of digital video file formats and codecs is even greater. (I already knew that the profusion of audio and video formats and codecs is a big preservation problem, but hadn't quantified the problem. Sobering numbers, aren't they?) (Session 1, "Preservation and Conservation of Captured and Born Digital Materials")
  • Laura Hortz Stanton (Conservation Center for Art and Historic Artifacts): Arson is the number one cause of fire in archives and libraries, and library book drops are a particular point of vulnerability. (Session 8, "Fundamentals of Emergency Preparedness: Conducting Risk Assessments")
  • Laura Zucconi (Richard Stockton College of New Jersey) and several colleagues are using archival records to develop a role-playing game, Pox and the City, that will teach students enrolled in middle schools, high schools, undergraduate programs, and medical schools about the history of medicine by focusing upon the spread of smallpox in early 19th-century Edinburgh. When the game is finished, players will be able to play the role of a doctor seeking to build up a practice, an Irish immigrant trying to avoid the disease, or the smallpox virus as it spreads from one person to another. Not surprisingly, everyone wants to play the virus. (Session 13, "Digital Humanities in the Archives")
  • Nelson Johnson (author, Boardwalk Empire: The Birth, High Times, and Corruption of Atlantic City) was a member of Atlantic City's planning board and initially began researching the city's history because he wanted to understand how its government became so messed up. After doing a lot of preliminary research, he concluded not only that the African-Americans who worked in the city's hotels and other establishments were integral to the city's history but also that the city's corruption was organic and essential to its survival. The secret to success in a resort community is repeat business, and the working-class Philadelphians who flocked to Atlantic City during the 19th and early 20th centuries didn't want wholesome, morally uplifting entertainment. In the words of one of the people Johnson interviewed, they sought out "booze, broads, and gambling," and the city gave them what they wanted. Although Johnson's conclusion doesn't have much to do with records, I kept thinking about it as I walked around Cape May, which is also a tourist town. Cape May's appeal currently centers around its well-maintained beach, immaculately maintained Victorian architecture, civility (motorists readily yield to pedestrians), and generally family-friendly atmosphere. It survives because it gives people -- or, more correctly, a specific subset of people -- the vacation experience they seek. (Session 19, "From the Pages of History to the Screen: The Role of Archives in HBO's Boardwalk Empire")
  • Heather Perez and Shannon O'Neill (Atlantic City Free Public Library): The HBO show Boardwalk Empire has resulted in a 75 percent increase in reference questions, and the volume spikes immediately after season premieres. In an effort to meet the public demand for information about the city's Prohibition-era history, the library has developed a Web site that uses the show as an entry point into the city's history and has stepped up its collection of 1920s materials. (Session 19, "From the Pages of History to the Screen: The Role of Archives in HBO's Boardwalk Empire")
Photo: The former Bell Shields House, built ca. 1880, at the corner of Hughes and Decatur Streets, Cape May, New Jersey, 14 April 2012. This massive residence is now called "The Empress." The current owners refurbished the home -- and added a lot of decorative woodwork to the exterior -- with the intent of turning it into a bed and breakfast, but they were so taken with the finished result that they opted to keep it to themselves and to their friends and relatives, at least for a little while. Click here for interesting "before" and "after" photos.

Saturday, April 14, 2012

MARAC Spring 2012: Fundamentals of Electronic Records

The Spring 2012 meeting of the Mid-Atlantic Regional Archives Conference featured two sessions focusing on electronic records, and the second session, "Fundamentals of Electronic Records," took place earlier today.

My colleague Michael Martin opened the session by discussing how the New York State Archives typically conducts appraisals. Regardless of format, we compile information about the history of the unit that created or currently maintains the records, the disposition of similar records created by other agencies, similar records already in our holdings, and published research that makes use of similar records. We also look for records disposition schedules for similar or related records, and pertinent state and federal laws and regulations. We then meet with creators to determine the contents of the files, identify any major gaps, examine blank forms or computer reports, and assess the environment in which the records are housed. All of this research forms the basis for formal appraisal reports that assess the legal, administrative, environmental, and research value of the records, identify major preservation and access issues, and recommend specific records management, accessioning, and preservation actions.

When appraising electronic records, we push against creator assumptions that aren't always accurate: that gaps won't exist, that volume won't be an issue, that everything can be easily found, and that passively managed records will remain accessible over time. We also complete a supplemental technical appraisal. We make it a point to speak not only to agency records managers and records creators but also agency IT personnel, and we gather information about the name of the system in which the records are housed, the type(s) of records present, ownership of the records, the hardware and software environment, the size of the system, the physical location of the hardware housing the system, how often records are retrieved and used, the accuracy and completeness of the data, and the existence and location of backup copies. The technical appraisal also assesses the long-term resource commitments needed to ensure that the records will remain accessible over time.

Sibyl Shaefer and Laura Montgomery of the Rockefeller Archive Center focused on the accessioning and ingestion of electronic records. The Rockefeller Archive Center has a sizable backlog of unprocessed records, some of which consist of a mix of paper records and electronic records on legacy media. The digital archivists are searching through boxes, removing legacy media, and producing basic preservation copies of the electronic records, but the paper records may not be processed for some time after this sifting takes place. As a result, the possibility that the relationship between the paper and electronic records will be permanently severed is quite real. In order to ensure that this doesn't happen, Shaefer and Montgomery document the removal of the electronic media in the Resources module (the Accessioning module isn't sufficiently flexible) in their instance of the Archivists' Toolkit (our accessioning workflow is still paper-centric, so for now we're documenting separations of this nature on paper). When the repository receives new accessions, staff conduct a quick survey of the collection, remove the digital media, attach tracking sheets to each piece of media, and create a collection record in the Archivists' Toolkit that documents the removal of the media.

The Rockefeller Archive Center uses Archivematica to ingest electronic records and create item-level preservation and administrative metadata and Submission Information Package-level description metadata. At present, rights issues are a real concern: many of the collections that consist of a mix of paper and electronic records are covered by old donor agreements that make no reference to electronic records, online access, or related issues. Staff eventually hope to enter all information about rights issues into Archivematica at the point of ingest and have it reflected in the PREMIS metadata that Archivematica creates upon ingest.

Jeanne Kramer-Smyth of the World Bank Archives (and author of the always awesome Spellbound Blog) concluded the session with a provocative assessment of issues relating to access. Noting that records aren't truly accessible unless they're also understandable and meaningful, she highlighted the importance of making sure that preservation actions don't inadvertently alter the significant properties of records. For example, the New York Public Library archivist who processed the papers of Jonathan Larson, the creator of the musical Rent, discovered a mystifying one-line inconsistency in the Microsoft Word 5.1 file containing the lyrics to one of the songs: when opened in an emulator, the line read "before the virus [HIV] strikes." When opened in Microsoft Word 5.1, the line was completely different. Only after opening the file in a hex editor did the archivist figure out what was going on: Microsoft Word 5.1 had a save feature that embedded revisions at the end of the file, but the emulator wasn't configured to read and apply these changes. Had the archivist not taken the precaution of opening the file in its native environment, he or she might have decided that the emulator was a reliable preservation and access tool for Microsoft Word 5.1 files.
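
For readers who haven't used a hex editor, here is a rough illustration in Python of the kind of inspection described above: it prints the last bytes of a file as offset, hex values, and printable text, which is where appended revisions would show up. The function name `hex_tail` and the sample bytes are my own inventions for the sake of the example, and the sketch makes no attempt to parse the actual Word 5.1 format.

```python
def hex_tail(data: bytes, length: int = 64, width: int = 16) -> list[str]:
    """Return hex-dump lines for the last `length` bytes of `data`.

    Hypothetical helper: each line shows the byte offset, the hex values,
    and the printable ASCII characters (non-printable bytes become '.').
    """
    tail = data[-length:] if length > 0 else b""
    start = len(data) - len(tail)  # offset of the first byte shown
    pad = width * 3 - 1            # column width of the hex section
    lines = []
    for i in range(0, len(tail), width):
        chunk = tail[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + i:08x}  {hex_part.ljust(pad)}  {text_part}")
    return lines


if __name__ == "__main__":
    # Made-up bytes standing in for a file with extra text at the end.
    sample = b"old line\x00new line"
    for line in hex_tail(sample, length=8):
        print(line)
```

To look at a real file, one would read it with `open(path, "rb").read()` and pass the bytes to `hex_tail`; text that appears in the dump but not in the rendered document is a hint that the viewing tool is skipping part of the file.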

As Kramer-Smyth pointed out, migrating files from one format to another can also cause problems: loss of information, loss of fidelity (i.e., changes in appearance or behavior), loss of authenticity/legal admissibility, and the likelihood that migration will have to be performed repeatedly. Moreover, in some instances, it may not be possible to migrate files. In others, one may have to pull records into an emulated environment prior to migrating them.

Kramer-Smyth also highlighted a couple of intriguing emulation environments. Basilisk II emulates older Macintosh environments, and Dioscuri provides a universal virtual computer that enables you to run a variety of operating systems and software applications, and all you need to do in order to keep it usable is migrate its interface over time. However, she stressed once again that emulation has its limitations: you need to mimic hardware (a particular concern when attempting to replicate the original user experience), you need to preserve the original operating system and application software, and software licensing issues are a matter of enduring concern.

Despite the limitations of migration and emulation, in the end we will probably have to embrace both approaches: migration can keep electronic files accessible in the relative short term, and emulation will likely be needed in the longer term.

In closing, Kramer-Smyth offered a few intriguing thoughts about end user access:
  • In most instances, we will not construct electronic reading rooms akin to the onsite reading rooms that enable us to provide access to paper materials. However, in instances in which specialized hardware is called for or we want to ensure that users don't copy or disseminate materials that are legally restricted or have intellectual property restrictions, we may require users to visit our physical repositories.
  • We may create virtual reading rooms at some point in the future, but at present most of us have neither the technological resources nor the volume of electronic files needed to make this approach workable.
  • NARA and Maine's Office of GIS allow users to download electronic records in a variety of formats, and we may want to consider embracing this user-centered approach.
I'm heading back to Albany in a little while, but tomorrow I'll put together a post that highlights some of the other tidbits I picked up at MARAC and the beauty that is Cape May. If you ever get the chance to visit this charming little city, by all means do so.

Photo: the Joseph and John Steiner Cottages at 22 and 24 Congress Street, Cape May, New Jersey, 13 April 2012. These homes, which have signs indicating that they were built in 1848, aren't as large or as ornate as many other Cape May Victorians, but they have a sweet charm all their own.

Friday, April 13, 2012

MARAC Spring 2012: Preservation and Conservation of Captured and Born Digital Materials

I'm in Cape May, New Jersey for the Spring 2012 meeting of the Mid-Atlantic Regional Archives Conference and am temporarily closing the Electronic Records Archivists Local 0011000 Hiring Hall so that I can blog about some of the conference sessions and the loveliness of Cape May.

I was really looking forward to the first session, "Preservation and Conservation of Captured and Born Digital Materials," and it more than met my expectations. However, I must state upfront that I slept so wretchedly that I've been making dumb mistakes all day. The following post may contain a few more. Caveat lector!

Isiah Beard of Rutgers University's Scholarly Communications Center, which oversees the university's FEDORA-based institutional repository, kicked off the session by furnishing a definition of the still-mysterious concept of digital curation (per the Digital Curation Centre, it's the creation, preservation, maintenance, collection, and archiving of digital objects) and highlighting the factors that make digital objects more fragile than their analog counterparts:
  • the ease with which electronic files can be deleted or destroyed
  • file format and software dependence (a particular problem with the highly proprietary niche formats that house vast quantities of research data)
  • the speed with which storage media become technologically obsolete
  • the distance and disconnection with which many creators regard materials that don't have appreciable physical form (a pervasive and, in my opinion, all too often overlooked problem)
He then focused on the digital curation lifecycle, a multi-tiered, continuous, and iterative process in which digital objects are evaluated, preserved, maintained, verified, and re-evaluated as the hardware and software environment evolves. Beard and his colleagues often begin the evaluation process by meeting with the creators and asking them to discuss how the materials were created and used, and then engage in a "controlled chaos" (what an apt description of electronic records work!) of evaluating the materials, taking stock of the software, systems, and recording apparatus needed to keep them accessible. They also attempt to determine the file format that will best keep the content accessible over time (which sometimes means keeping them in industry standard proprietary formats) and how users will access the materials. This work culminates in the production of file format-specific guides that outline how incoming materials encoded in a given file format will be handled. All of these guides are periodically reexamined and revised.

In keeping with emerging best practices, Beard and his colleagues migrate some files to new formats in order to increase the chance that they'll remain accessible over time, but always retain a preservation master of the file in its original format and do any needed migration work on derivative copies.
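
The "keep the master, migrate a copy" practice can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a description of the Rutgers workflow: the function names `sha256_of` and `make_working_copy` and the directory layout are hypothetical, and the use of a SHA-256 checksum to confirm the copy is my choice of a common fixity check.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def make_working_copy(master: Path, work_dir: Path) -> Path:
    """Copy the preservation master into work_dir and verify the copy.

    Hypothetical helper: any format migration is then performed on the
    returned copy, so the master stays in its original format.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / master.name
    if copy_path.resolve() == master.resolve():
        raise ValueError("work_dir must not be the master's own directory")
    shutil.copy2(master, copy_path)  # copy2 also carries over timestamps
    if sha256_of(copy_path) != sha256_of(master):
        raise IOError(f"checksum mismatch while copying {master}")
    return copy_path
```

Recording the master's checksum at the time of transfer and recomputing it later is one simple way to confirm that the original has not changed while derivatives come and go.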

Tim Pyatt of Pennsylvania State University's Special Collections Library highlighted some of the problems associated with current mechanisms for making digitized and born digital materials accessible. At present, many archives provide access to some materials via their traditional research rooms and to others via their online catalogs, their own Web sites, Web sites hosted by creators, social media, and sites hosted by service providers such as the Internet Archive and OCLC; with the exception of linking to sites maintained by creators, my own institution is doing all of these things. As we all know, from an end user's perspective, the proliferation of information silos is mystifying and frustrating. He discussed some of Penn State's strategies for reducing the chaos -- ensuring that every image placed on Flickr has detailed metadata pointing back to Special Collections, including links to an archival Web site now maintained on Penn State's servers in the finding aid describing the collection to which it belongs -- and then identified several repositories that are doing a better job of unifying access:
  • "Good": Duke University's Rare Book and Manuscript Library pulls item-level metadata from finding aids and creates discovery pages that furnish access to digital surrogates of paper-based archival materials. However, at present, none of these discovery pages provide access to born-digital objects.
  • "Better": the University of North Carolina at Chapel Hill's Special Collections Library finding aid platform fully integrates digitized content into finding aids. Clicking on a folder listing in the finding aid will bring up any digital surrogates of items present in the physical folder.
  • "Best": Duraspace's Hypatia application, which is currently under development and which promises to provide a single application that will support accessioning, arrangement, description, discovery, delivery, and long term preservation of born-digital archival collections.
Gretchen Gueguen of the University of Virginia's Special Collections Library discussed Born Digital Collections: An Inter-Institutional Model for Stewardship (AIMS), a two-year, Mellon-funded initiative to develop a framework for stewardship of born-digital materials found in personal papers held by collecting repositories (and which is also responsible for development of Hypatia). The framework focuses on collection development (i.e., policy and infrastructure), accessioning (physical and intellectual control), arrangement and description, and discovery and access; given that many other initiatives have focused on digital preservation, the project partners decided not to focus on this aspect of stewardship.

The University of Virginia is currently focusing on collection development and accessioning and is establishing policies and developing preliminary workflows. At present, it's revising its donor and depositor agreements to address copyright, access, and ownership issues; in a digital world in which numerous identical copies of a given file may exist, ownership issues are a particular challenge. It's also developing a feasibility testing procedure that addresses a lot of questions that will have to be answered in order to take in and care for digital materials (e.g., file formats, hardware and software needs, need for file format migration or normalization). It will then move on to developing transfer procedures.

While all of this work is going on, Gueguen and her colleagues are also taking steps to deal with the vast array of damaged and obsolete media currently lurking within their collections. They're in the midst of inventorying their legacy media and trying to get data off this media and into a safe and readily accessible (at least to staff) place. (Hunting down legacy media was one of the first things I did when I was an electronic records archivist, but my repository helped to pioneer the More Product, Less Process approach to processing paper records, and as a result my colleagues and I still find floppies and Zip disks lurking in boxes every now and then. We've also discovered that a sizable percentage of this newly discovered media contains non-record material such as retirement party fliers. However, we're a government archives; a special collections unit might have cause to keep similar files found within collections of personal papers.)

When pulling data off legacy and damaged media, Gueguen and her colleagues use a nifty Forensic Recovery of Evidence Device that has a host of SCSI and other ports, built-in drives (5.25" and 3.5" floppy disk, tape, CD/DVD/BluRay, and others), and 2 TB of storage, and that uses Forensic Toolkit (FTK) digital forensics software to reveal hidden and deleted files (which the University of Virginia doesn't accession), look for possible Social Security Numbers, credit card numbers, and other sensitive data, and extract some metadata. The software is expensive and its output is encoded in proprietary XML, and the device itself is expensive. However, the enterprising archivist can build a similar (albeit far less elegant) hardware array out of component parts, and the Mellon-funded BitCurator project, which may result in creation of an open source, archivally oriented analytic tool, might prove to be an alternative to FTK and other proprietary digital forensics tools (I suspect that, for the time being, some of the Open Source Digital Forensics tools might be the best option for archives with limited budgets). They're also using Archivematica for creation of preservation metadata and access derivatives.
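
To give a sense of what "looking for possible Social Security Numbers" can mean at its very simplest, here is a minimal Python sketch that flags text shaped like NNN-NN-NNNN. It is not how FTK works (I don't know its internals); the function name `find_ssn_like` and the sample text are hypothetical, and a pattern match of this kind will produce false positives and miss numbers written without hyphens, so anything it flags still needs human review.

```python
import re

# Matches three digits, two digits, four digits separated by hyphens,
# and not embedded in a longer run of digits or letters.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list[str]:
    """Return every substring of `text` shaped like NNN-NN-NNNN."""
    return SSN_LIKE.findall(text)


if __name__ == "__main__":
    # Made-up sample text; the number is not a real SSN.
    sample = "Call 555-1234 about the form listing 123-45-6789."
    print(find_ssn_like(sample))
```

The phone-style number in the sample is ignored because it doesn't fit the three-two-four grouping.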

Photo: The Dr. Henry Hunt House at 209 Congress Place, Cape May, New Jersey, 13 April 2012. Cape May is renowned for its Victorian architecture, and this George Stretch-built home, which was built in 1881 and augmented in the 1890s, is a fine example. Can you spot the bunny?