Saturday, April 18, 2009

MARAC: A Peek at the Portfolio

Last night, I broke one of the vows I made to myself when I started this blog: I didn’t write about all of the conference sessions I attended on the same day that I attended them. I stayed in the hospitality suite a little too long, and by the time I got back to my room I wasn’t in the mood to do anything more than tweak what I had written about Ken Thibodeau’s plenary address. However, I’m not feeling too remorseful: I really enjoyed getting the chance to talk with archivists I rarely see at places other than MARAC.

Moreover, I’m spending a leisurely afternoon in Charleston while waiting for my parents, who are en route from Ohio, to meet me here; we’re going to spend the next few days visiting relatives in Mercer County, West Virginia. As I write this, I’m sitting at a window table at Taylor Books, Charleston’s gem of an independent bookstore, café, and gallery. I can’t think of a better place to atone -- at least in part -- for last night’s fall from grace . . . .

Yesterday morning, I sat in on A Peek Inside the Portfolio, which focused on several initiatives that may facilitate the preservation of electronic records.

Don McLaughlin of West Virginia University discussed the SLASH2 project, and in the process gave a terrifying overview of the exponential increase in the amount of scientific data being generated (e.g., the Large Hadron Collider at Switzerland’s CERN generates approximately 1 GB of data per second) and scientists’ need to preserve this data for future reuse. It’s simply not possible to build a single centralized resource that can store, preserve, and support analysis of immense datasets, so people who work with and preserve the data must spend a lot of time moving it across multiple, often geographically distributed systems and dealing with corrupted files, bad disk drives and tapes, and other problems. SLASH2, a data management system developed by the Pittsburgh Supercomputing Center (PSC) and currently being tested by PSC and West Virginia University, will automate much of this work and should improve system performance.

As the archivist daughter of an engineer, I was really taken with the presentation given by Victor Mucino of West Virginia University. Mucino, who is an engineer, is examining the IDEF0 standard and other options for adding contextual information to STEP and other current standards for the exchange of electronic engineering and design information. These standards can express how a given thing can be produced, but they don’t explain why it was produced as it was; for example, the size of a key piece of the new Robert C. Byrd (!) Telescope in Green Bank, WV was determined by the size of the smallest tunnel between the site of its production and Green Bank, but the telescope’s design documentation omits this basic fact. They also omit the results of failure analyses and other tests. Mucino is exploring how STEP and other standards can be expanded to include this sort of contextual information, which facilitates troubleshooting, subsystem design and replacement work, and subsequent innovation.

Richard Marciano of the Data Intensive Cyber Environments (DICE) Group at the University of North Carolina-Chapel Hill discussed two projects that make use of the DICE Group’s iRODS (Integrated Rule Oriented Data System) data grid management system: the Transcontinental Persistent Archive Prototype, which is a partnership with the National Archives and Records Administration (NARA), and Distributed Custodial Archival Preservation Environments (DCAPE), which seeks to develop a digital preservation service for state archives and state university archives (and in which I’m actively involved). iRODS assumes that collaborators are at multiple sites and have different policies, storage systems, and naming conventions, and it makes it possible to store and access data in any format, on any type of storage system, anywhere on a wide area network. It also allows users to specify high-level policies governing management of data, then breaks down those policies into rules that can be followed by computers and microservices that execute the rules. Richard concluded by noting that iRODS is both top-down and consensus-driven: any community that wishes to use iRODS needs to get together, determine its data management, preservation, and access policies, and translate these policies into rules and microservices that iRODS can understand.
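To make the policy-to-rules-to-microservices idea a bit more concrete, here is a minimal Python sketch. It is not iRODS code and does not use the iRODS rule language or API; the policy itself, the names Record, Rule, compute_checksum, replicate, and apply_rules, and the site labels "site-a" and "site-b" are all invented for illustration. The assumed policy is: "every ingested record gets a checksum and one stored copy, and records marked for permanent retention get a second copy at another site."

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """A hypothetical electronic record with descriptive metadata."""
    name: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    replicas: List[str] = field(default_factory=list)


# "Microservices": small, single-purpose operations (hypothetical).
def compute_checksum(record: Record) -> None:
    """Store a SHA-256 digest so later corruption can be detected."""
    record.metadata["sha256"] = hashlib.sha256(record.content).hexdigest()


def replicate(record: Record, site: str) -> None:
    """Note that a copy is held at the named site (no real copying here)."""
    if site not in record.replicas:
        record.replicas.append(site)


@dataclass
class Rule:
    """A machine-followable rule: on this event, if the condition holds, act."""
    event: str
    condition: Callable[[Record], bool]
    actions: List[Callable[[Record], None]]


# The written policy, translated into rules that call microservices.
RULES = [
    Rule("ingest", lambda r: True,
         [compute_checksum, lambda r: replicate(r, "site-a")]),
    Rule("ingest", lambda r: r.metadata.get("retention") == "permanent",
         [lambda r: replicate(r, "site-b")]),
]


def apply_rules(event: str, record: Record, rules: List[Rule] = RULES) -> Record:
    """Run every rule that matches the event and whose condition is true."""
    for rule in rules:
        if rule.event == event and rule.condition(record):
            for action in rule.actions:
                action(record)
    return record


if __name__ == "__main__":
    minutes = Record("minutes.pdf", b"board minutes", {"retention": "permanent"})
    apply_rules("ingest", minutes)
    print(minutes.replicas, minutes.metadata["sha256"][:12])
```

The point of the sketch is only the layering: the policy is stated in plain language, each rule is explicit enough for a machine to follow, and the actual work is done by small reusable operations. A community would have to agree on the policy before anyone could write the rules.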

After Richard’s presentation ended, Mark Conrad of the National Archives, who moderated the session, made a really important point: every archives has policies, but articulating those policies is a real challenge, particularly when electronic records are involved. He’s absolutely right. Working on the DCAPE and PeDALS projects and dealing with some things that have recently come up at work have really driven home the importance of defining and documenting policies governing the processing of electronic records. We’re so comfortable with paper records that we don’t often question our processing, description, etc., practices -- or explain to students or paraprofessional staff the underlying rationale for their assignments. This is not good practice, and Mark is absolutely right that we as a profession need to devote ourselves to documenting our policies.
