Friday, June 26, 2009

NDIIPP project partners meeting, day three

Union Station, Washington, DC, at around 3:00 PM today.

The Library of Congress (LC) National Digital Information Infrastructure and Preservation Program (NDIIPP) partners meeting wrapped up this afternoon. This morning’s presentations concerned the Unified Global Format Registry, the PREMIS metadata schema, the Federal Digitization Standards Working Group, and the LC’s proposed National Digital Stewardship Alliance.

Right after breakfast, Andrea Goethals (Harvard University) discussed the Unified Global Format Registry (UGFR) and the importance of file format registries generally. One of the main goals of digital preservation is to ensure that digital information remains useful over time, and as a result we must determine whether a given resource has become or is likely to become unusable. In order to do so, we need to answer a series of questions:
  • Which file format is used to encode the information?
  • What current technologies can render the information properly?
  • Does the format have sustainability issues (e.g., intellectual property restrictions)?
  • How does the digital preservation community view the format?
  • What alternative formats could be used for this information?
  • What software can transform the information from its existing format to an alternative format?
  • Can emulation software provide access to the existing format?
  • Is there enough documentation to write a viewing/rendering application that can access this format?
Format registries avoid the need to reinvent the wheel: they pool knowledge so that repositories can make use of each other’s research and expertise.
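To make the first question on that list concrete, here is a minimal sketch of signature-based format identification, the kind of lookup a format registry's data can support. The table and the `identify_format` function are hypothetical illustrations, not part of PRONOM or any registry's actual software, and the handful of well-known signatures shown is nowhere near a complete list.

```python
# Hypothetical signature table: a few widely documented "magic numbers".
# A real registry holds far more signatures plus much richer format metadata.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify_format(data: bytes) -> str:
    """Return a format name based on the leading bytes, or "unknown"."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

Passing the first few bytes of a file to `identify_format` answers only the "which format?" question; the remaining questions (rendering software, sustainability, migration paths) are exactly the knowledge a shared registry is meant to pool.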

The UGFR was created with the April 2009 merger of the two largest format registry initiatives: PRONOM, which has been publicly accessible for some time, and the Global Digital Format Registry (GDFR), which was still under development at the time of the merger. The new registry will make use of PRONOM’s existing software and data and the GDFR’s support use cases, data model, and distributed architectural model. Moreover, it will incorporate local registry extensions for individual repositories and support distributed data input. At present, it’s governed by an interim group of academic institutions, national archives, and national libraries; a permanent governing body will be established in November 2009.

I’ve used PRONOM quite a bit, so I’m really looking forward to seeing the UGFR.

Rebecca Guenther (LC) then furnished a brief overview of the Preservation Metadata: Implementation Strategies (PREMIS) schema and recent PREMIS-related developments.

The PREMIS schema, which was completed in 2005, is meant to capture all of the information needed to make sure that digital information remains accessible, comprehensible, and intact over time. It is also meant to be practical: it is system/platform neutral, and each metadata element is rigorously defined, supported by detailed usage guidelines and recommendations, and (with very few exceptions) meant to be system-generated, not human-created.
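As a rough illustration of what "system-generated, not human-created" can mean in practice, the sketch below gathers a few file-level facts (size and a checksum) automatically. The `describe_file` function and its dictionary keys are my own hypothetical names, not PREMIS element names, and the choice of SHA-256 is just an assumption for the example.

```python
import hashlib
import os


def describe_file(path: str) -> dict:
    """Collect basic file-level facts without any human data entry."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "identifier": os.path.basename(path),  # illustrative key names only
        "size_bytes": os.path.getsize(path),
        "digest_algorithm": "SHA-256",
        "digest": digest.hexdigest(),
    }
```

A repository would still need to map values like these onto the schema's actual elements, but nothing here requires a person to type anything in.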

I’m sort of fascinated by PREMIS and have drawn from it while working on the Persistent Digital Archives and Library System (PeDALS) project, but I haven’t really kept up with recent PREMIS developments. It was interesting to learn that the schema is now extensible: externally developed metadata (e.g., XML-based electronic signatures, format-specific metadata schemes, environment information, other rights schemas) can now be contained within PREMIS.

I was also happy to learn that the PREMIS development group is working on incorporating controlled vocabularies for at least some of the metadata elements and that this work will be available via the Web.

The group is also working on a variety of other things, including:
  • Draft guidelines for using PREMIS with the Metadata Encoding and Transmission Standard (METS)
  • A tool that will convert PREMIS to METS and vice versa
  • An implementers registry
  • Development of a tool (most likely a self-assessment checklist) that will verify PREMIS implementers are using the schema correctly
  • A tool for extracting metadata and populating PREMIS XML schemas
Guenther also shared one tidbit of information that I found really interesting: although PREMIS allows metadata to be kept at the file, representation, and bitstream levels, repositories may opt to maintain only file-level or file- and representation-level metadata. I hadn’t interpreted the schema in this manner, and someone else at the meeting was similarly surprised.

A quick update on the work of the Federal Digitization Standards Working Group followed. Carl Fleischhauer (LC) explained that the group, which consists of an array of federal government agencies, is assembling objectives and use cases for various types of digitization efforts (e.g., production of still image master copies). To date, the group’s work has focused largely on still images: it has put together a specification for TIFF header information and will look at the Extensible Metadata Platform (XMP) schema. In an effort to verify that scanning equipment faithfully reproduces original materials, it is also developing device and object targets and DICE, a software application (currently in beta form).

The group is also working on a specification for digitization of recorded sound and developing audio header standards. However, it is waiting for agencies to gain more experience before it tackles video.

The meeting ended with a detailed overview of LC’s plan to establish a group that will sustain NDIIPP's momentum. The program has just achieved permanent status in the federal budget, and all of the grant projects that it funded will end next year.

In an effort to sustain the partnerships developed during the grant-driven phase of NDIIPP’s existence, LC would like to create an organization that it is tentatively calling the National Digital Stewardship Alliance. Meg Williams of LC’s Office of Counsel outlined what the organization’s mission and governance might look like; before creating the final draft charter, LC will host a series of conference calls and develop an online mechanism that will enable the partners to provide input.

LC anticipates that this alliance, which is intended to be low-cost, flexible, and inclusive, will help to sustain existing partnerships and form new ones. In order to ensure that the organization remains viable, LC envisions that the organization will consist of LC itself, members, and associates:
  • Organizations willing to commit to making sustained contributions to digital preservation research would, at the invitation of LC, become full members of the alliance and would enjoy full voting rights. Member organizations would not have to agree to undertake specific actions or projects, but they would have to commit to remaining involved in the alliance over time.
  • Individuals and organizations that cannot commit to making ongoing, sustained contributions to digital preservation research but have an abiding interest in digital preservation, support the alliance’s mission, and are willing to share their expertise would, at LC’s invitation, become associates. Associates will not have voting status.
  • LC itself will serve as the alliance’s chair and secretariat and will use program funding to support its activities; there will be no fees for members or associates. It will also maintain a clearinghouse and registry of information about content, standards, practices and procedures, tools, services, and training resources. It will also facilitate connections between members and associates who have common interests, convene stakeholders to develop shared understanding of digital stewardship principles and practices, report periodically on digital stewardship, and provide grant funding if such monies are available.
LC projects that the alliance will have several standing committees responsible for researching specific areas of interest:
  • Content: contributing significant materials to the “national collection” to be preserved and made available to current and future generations.
  • Standards and practices: developing, following, and promoting effective methods for identifying, preserving, and providing access.
  • Infrastructure: developing and maintaining curation and preservation tools, providing storage, hosting, migration and other services, building a collection of open source tools.
  • Innovation: encouraging and conducting research, periodically describing a research agenda.
  • Outreach and education: for hands-on practitioners, other stakeholders, funders, and the general public.
  • Identifying new roles: as needed.
LC also sees these committees as having a governance role: at present, it envisions that the alliance’s Governing Council will consist of the Librarian of Congress, the LC Director of New Initiatives, the chairs of all of the standing committees, and a few members at large.

Williams closed by asking everyone present to think about this proposal, consider how to define “success” and “failure” for the alliance, identify benefits of participation for their own institutions and for others, and supply feedback to LC. LC hopes to have a final draft charter finished later this year.

At this point, I think that creating some sort of formal organization makes a lot of sense but don’t have any strong ideas one way or another about the specifics of LC’s proposal. The past few days have been jam-packed. Even though I relished the opportunity to hear about what’s happening with NDIIPP and to meet face-to-face with my PeDALS project colleagues -- several people told me that the PeDALS group struck them as really hard-working and really fun, and they’re right -- I’m really feeling the need to get home (I’m writing this post on the train), get some sleep, and reflect on everything that took place over the past couple of days. I’ll keep you posted . . . .

George Washington Bridge, Hudson River, as seen from Amtrak train no. 243 at around 8:30 PM.
