Monday, June 29, 2009

Stonewall miscellany

This past weekend marked the 40th anniversary of the start of the Stonewall riots, which erupted in the wee hours of 28 June 1969 and ultimately became a key symbol of the LGBT civil rights movement.

The mass media has been covering the anniversary quite extensively. In doing so, it has highlighted the existence of relevant historical records and created some records of its own. Among the highlights:
  • As I noted a few days ago, eminent historian Jonathan Ned Katz has created an online exhibit featuring New York City Police Department records concerning the riots. The exhibit includes an interview with Raymond Castro, one of the individuals whose arrest is documented in the records. Over the weekend, MSNBC.com published a feature article highlighting Castro's memories of gay life in New York in the 1960s, his views on the LGBT civil rights movement, and the quiet, pleasant life he now leads in suburban Florida.
  • Bay Windows features the recollections of David Bermudez, who was at the Stonewall Inn when the police raided the bar.
  • The New York Daily News interviews Tommy Lanigan-Schmidt, Ellen Shumsky, and Jerry Hoose, all of whom participated in the protests that followed the raid.
  • WNYC-FM's The Brian Lehrer Show interviews Danny Garvin and Tommy Lanigan-Schmidt, who took part in the Stonewall riots, historian David Carter, and Seymour Pine, the New York City Police Department official who authorized the raid. Pine still defends the raid, stating that it took place not because the bar was a gathering place for gay people but because it was controlled by the Mafia, served drinks in dirty glasses, and allowed patrons to violate prevailing standards of dress.
In addition, a couple of articles highlight the work of archivists seeking to document the history of the LGBT community:
  • David Williams, the community-based LGBT archivist who collected the materials that now comprise the Williams-Nichols Collection at the University of Louisville, discusses LGBT activism in Kentucky and the roots of his archival work.
Kudos to both!

Friday, June 26, 2009

NDIIPP project partners meeting, day three

Union Station, Washington, DC, at around 3:00 PM today.

The Library of Congress (LC) National Digital Information Infrastructure and Preservation Program (NDIIPP) partners meeting wrapped up this afternoon. This morning’s presentations concerned the Unified Digital Format Registry, the PREMIS metadata schema, the Federal Digitization Standards Working Group, and LC’s proposed National Digital Stewardship Alliance.

Right after breakfast, Andrea Goethals (Harvard University) discussed the Unified Digital Format Registry (UDFR) and the importance of file format registries generally. One of the main goals of digital preservation is to ensure that digital information remains useful over time, and as a result we must determine whether a given resource has become or is likely to become unusable. In order to do so, we need to answer a series of questions:
  • Which file format is used to encode the information?
  • What current technologies can render the information properly?
  • Does the format have sustainability issues (e.g., intellectual property restrictions)?
  • How does the digital preservation community view the format?
  • What alternative formats could be used for this information?
  • What software can transform the information from its existing format to an alternative format?
  • Can emulation software provide access to the existing format?
  • Is there enough documentation to write a viewing/rendering application that can access this format?
Format registries avoid the need to reinvent the wheel: they pool knowledge so that repositories can make use of each other’s research and expertise.
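
To make the idea concrete, here’s a rough sketch of the kind of lookup a format registry enables: identify a file by its signature, then pull up the community’s pooled knowledge about that format. The magic numbers below are real, but the identifiers and registry entries are invented for illustration -- they are not actual PRONOM or UDFR data.

```python
# Toy illustration of a format-registry lookup. The format identifiers and
# registry entries are made up for this example, not UDFR/PRONOM data.
MAGIC_NUMBERS = {
    b"%PDF": "fmt/demo-pdf",
    b"\x89PNG\r\n\x1a\n": "fmt/demo-png",
    b"GIF89a": "fmt/demo-gif",
}

REGISTRY = {
    "fmt/demo-pdf": {
        "name": "Portable Document Format",
        "renderers": ["any conforming PDF viewer"],
        "sustainability_notes": "openly documented; some features encumbered",
        "migration_targets": ["PDF/A"],
    },
    "fmt/demo-png": {
        "name": "Portable Network Graphics",
        "renderers": ["web browsers", "most image viewers"],
        "sustainability_notes": "open specification, no known IP restrictions",
        "migration_targets": ["TIFF"],
    },
}

def identify_format(path):
    """Match a file's leading bytes against known signatures."""
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, format_id in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return format_id
    return None

def assess(path):
    """Crudely answer the questions above for a single file."""
    format_id = identify_format(path)
    if format_id is None or format_id not in REGISTRY:
        return "unknown format -- a preservation risk in itself"
    entry = REGISTRY[format_id]
    return (f"{entry['name']}: renderable by {', '.join(entry['renderers'])}; "
            f"{entry['sustainability_notes']}; "
            f"possible migrations: {', '.join(entry['migration_targets'])}")
```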

The UDFR was created with the April 2009 merger of the two largest format registry initiatives: PRONOM, which has been publicly accessible for some time, and the Global Digital Format Registry (GDFR), which was still under development at the time of the merger. The UDFR will make use of PRONOM’s existing software and data and the GDFR’s use cases, data model, and distributed architectural model. Moreover, it will incorporate local registry extensions for individual repositories and support distributed data input. At present, it’s governed by an interim group of academic institutions, national archives, and national libraries; a permanent governing body will be established in November 2009.

I’ve used PRONOM quite a bit, so I’m really looking forward to seeing the UDFR.

Rebecca Guenther (LC) then furnished a brief overview of the Preservation Metadata: Implementation Strategies (PREMIS) schema and recent PREMIS-related developments.

The PREMIS schema, which was completed in 2005, is meant to capture all of the information needed to make sure that digital information remains accessible, comprehensible, and intact over time. It is also meant to be practical: it is system/platform neutral, and each metadata element is rigorously defined, supported by detailed usage guidelines and recommendations, and (with very few exceptions) meant to be system-generated, not human-created.

I’m sort of fascinated by PREMIS and have drawn from it while working on the Persistent Digital Archives and Library System (PeDALS) project, but I haven’t really kept up with recent PREMIS developments. It was interesting to learn that the schema is now extensible: externally developed metadata (e.g., XML-based electronic signatures, format-specific metadata schemes, environment information, other rights schemas) can now be contained within PREMIS.
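
To illustrate the extension mechanism, here’s a minimal sketch of a PREMIS object record that carries externally governed metadata inside an extension container. The element names reflect my reading of the PREMIS 2.x schema, and the embedded “mix” block is a made-up stand-in for a real format-specific schema such as MIX -- check the data dictionary before borrowing any of this.

```python
# Sketch of a PREMIS 2.x object record with an extension container holding
# externally defined metadata. Element names follow my reading of the
# PREMIS 2.x schema; the embedded block uses a placeholder namespace.
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS)

def q(tag):
    """Return a namespace-qualified tag for the PREMIS namespace."""
    return f"{{{PREMIS}}}{tag}"

obj = ET.Element(q("object"))
ident = ET.SubElement(obj, q("objectIdentifier"))
ET.SubElement(ident, q("objectIdentifierType")).text = "local"
ET.SubElement(ident, q("objectIdentifierValue")).text = "example-0001"

chars = ET.SubElement(obj, q("objectCharacteristics"))
fixity = ET.SubElement(chars, q("fixity"))
ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
ET.SubElement(fixity, q("messageDigest")).text = "0f3e..."  # truncated for the example
ET.SubElement(chars, q("size")).text = "102400"
fmt = ET.SubElement(chars, q("format"))
desig = ET.SubElement(fmt, q("formatDesignation"))
ET.SubElement(desig, q("formatName")).text = "image/tiff"

# The extension container: externally governed metadata rides along inside
# the PREMIS record instead of being forced into PREMIS elements.
ext = ET.SubElement(chars, q("objectCharacteristicsExtension"))
ext.append(ET.fromstring(
    '<mix xmlns="http://example.org/placeholder-mix">'
    "<imageWidth>2400</imageWidth></mix>"
))

print(ET.tostring(obj, encoding="unicode"))
```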

I was also happy to learn that the PREMIS development group is working on incorporating controlled vocabularies for at least some of the metadata elements and that this work will be made available via the Web at id.loc.gov.

The group is also working on a variety of other things, including:
  • Draft guidelines for using PREMIS with the Metadata Encoding and Transmission Standard (METS)
  • A tool that will convert PREMIS to METS and vice versa
  • An implementers’ registry (www.loc.gov/premis/premis-registry.html)
  • Development of a tool (most likely a self-assessment checklist) that will verify that PREMIS implementers are using the schema correctly
  • A tool for extracting metadata and populating PREMIS XML schemas
Guenther also shared one tidbit of information that I found really interesting: although PREMIS allows metadata to be kept at the file, representation, and bitstream level, repositories may opt to maintain only file-level or file- and representation-level metadata. I hadn’t interpreted the schema in this manner, and someone else at the meeting was similarly surprised.

A quick update on the work of the Federal Digitization Standards Working Group followed. Carl Fleischauer (LC) explained that the group, which consists of an array of federal government agencies, is assembling objectives and use cases for various types of digitization efforts (e.g., production of still image master copies). To date, the group’s work has focused largely on still images: it has put together a specification for TIFF header information and will look at the Extensible Metadata Platform (XMP) schema next. In an effort to verify that scanning equipment faithfully reproduces original materials, it is also developing device and object targets and DICE, a software application currently in beta.
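
For anyone wondering what “TIFF header information” covers, here’s a small sketch that reads the fixed eight-byte header every TIFF file carries. It isn’t the working group’s specification -- just an illustration of the kind of structure such a specification governs.

```python
# Read the fixed 8-byte TIFF header. Real validation would go on to walk
# the IFDs and check required tags; this sketch stops at the header.
import struct

def read_tiff_header(path):
    with open(path, "rb") as f:
        header = f.read(8)
    if header[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif header[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:    # every valid TIFF carries the magic number 42
        raise ValueError("bad TIFF magic number")
    return {"byte_order": header[:2].decode(), "first_ifd_offset": ifd_offset}
```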

The group is also working on a specification for digitization of recorded sound and developing audio header standards. However, it is waiting for agencies to gain more experience before it tackles video.

The meeting ended with a detailed overview of LC’s plan to establish a group that will sustain NDIIPP's momentum. The program has just achieved permanent status in the federal budget, and all of the grant projects that it funded will end next year.

In an effort to sustain the partnerships developed during the grant-driven phase of NDIIPP’s existence, LC would like to create an organization that it is tentatively calling the National Digital Stewardship Alliance. Meg Williams of LC’s Office of Counsel outlined what the organization’s mission and governance might look like; before creating the final draft charter, LC will host a series of conference calls and develop an online mechanism that will enable the partners to provide input.

LC anticipates that this alliance, which is intended to be low-cost, flexible, and inclusive, will help to sustain existing partnerships and form new ones. To ensure that the alliance remains viable, LC envisions that it will consist of LC itself, members, and associates:
  • Organizations willing to commit to making sustained contributions to digital preservation research would, at the invitation of LC, become full members of the alliance and would enjoy full voting rights. Member organizations would not have to agree to undertake specific actions or projects, but they would have to commit to remaining involved in the alliance over time.
  • Individuals and organizations that cannot commit to making ongoing, sustained contributions to digital preservation research but have an abiding interest in digital preservation, support the alliance’s mission, and are willing to share their expertise would, at LC’s invitation, become associates. Associates will not have voting status.
  • LC itself will serve as the alliance’s chair and secretariat and will use program funding to support its activities; there will be no fees for members or associates. It will also maintain a clearinghouse and registry of information about content, standards, practices and procedures, tools, services, and training resources. It will also facilitate connections between members and associates who have common interests, convene stakeholders to develop shared understanding of digital stewardship principles and practices, report periodically on digital stewardship, and provide grant funding if such monies are available.
LC projects that the alliance will have several standing committees responsible for researching specific areas of interest:
  • Content: contributing significant materials to the “national collection” to be preserved and made available to current and future generations.
  • Standards and practices: developing, following, and promoting effective methods for identifying, preserving, and providing access.
  • Infrastructure: developing and maintaining curation and preservation tools, providing storage, hosting, migration, and other services, and building a collection of open source tools.
  • Innovation: encouraging and conducting research, periodically describing a research agenda.
  • Outreach and education: for hands-on practitioners, other stakeholders, funders, and the general public.
  • Identifying new roles: as needed.
LC also sees these committees as having a governance role: at present, it envisions that the alliance’s Governing Council will consist of the Librarian of Congress, the LC Director of New Initiatives, the chairs of all of the standing committees, and a few members at large.

Williams closed by asking everyone present to think about the proposal, consider how to define “success” and “failure” for the alliance, identify the benefits of participation for their own institutions and for others, and supply feedback to LC. LC hopes to have a final draft charter finished later this year.

At this point, I think that creating some sort of formal organization makes a lot of sense, but I don’t have any strong ideas one way or another about the specifics of LC’s proposal. The past few days have been jam-packed. Even though I relished the opportunity to hear about what’s happening with NDIIPP and to meet face-to-face with my PeDALS project colleagues -- several people told me that the PeDALS group struck them as really hard-working and really fun, and they’re right -- I’m really feeling the need to get home (I’m writing this post on the train), get some sleep, and reflect on everything that took place over the past couple of days. I’ll keep you posted . . . .

George Washington Bridge, Hudson River, as seen from Amtrak train no. 243 at around 8:30 PM.

Thursday, June 25, 2009

NDIIPP project partners meeting, day two

The National Digital Information Infrastructure and Preservation Program (NDIIPP) partners meeting enables recipients of NDIIPP grant monies to discuss their work, seek feedback, and identify areas of common interest.

Today’s sessions began with Michael Nelson (Old Dominion University), who outlined the Synchronicity project, which will enable end users to recover data that has “vanished” from the Web. Synchronicity is a Firefox Web browser extension that catches the “File Not Found” messages that appear when a user attempts to access a page that no longer exists or no longer contains the information that the user seeks. It then searches Web search engine caches, the Internet Archive, and various research project data caches and retrieves copies of the desired information.

If these sources fail to provide the desired information, Synchronicity then generates a search engine query based upon what the missing page was “about” and attempts to find the information elsewhere on the Web. These queries are built from “lexical signatures” -- small sets of terms that characterize a page’s content -- and page titles, and preliminary research indicates that such searches are successful about 75 percent of the time. Nelson and his colleagues are currently exploring other methods of locating “lost” content and ways of handling pages whose content has changed over time.
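
Here’s a toy sketch of the lexical-signature idea as I understand it. Real implementations rank candidate terms with TF-IDF against a large corpus; this sketch uses plain term frequency just to keep things short, so treat it as an illustration rather than Nelson’s actual method.

```python
# Toy lexical signature: pick the most distinctive terms from a lost
# page's last known text and use them as a search engine query.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
             "on", "that", "this", "with", "as", "it", "by", "are", "when"}

def lexical_signature(text, k=5):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(k)]

cached = """The Stonewall riots erupted in June 1969 when patrons of a
Greenwich Village bar resisted a police raid, and the riots galvanized
the gay rights movement."""
print(" ".join(lexical_signature(cached)))  # terms to paste into a search engine
```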

I’ve often used the Internet Archive and Google’s search caches to locate information that has vanished from the Web, and I’m really looking forward to installing the Synchronicity plug-in once it becomes available.

Michelle Kimpton of DuraSpace then discussed the DuraCloud project, which seeks to develop a trustworthy, non-proprietary cloud computing environment that will preserve digital information. In cloud computing environments, massively scalable and flexible IT-related capabilities are provided “as a service” over the Internet. They offer unprecedented flexibility and scalability, economies of scale, and ease of implementation. However, cloud computing is an emerging market, providers are motivated by profit, information about system architectures and protocols is hard to come by, and as a result cultural heritage institutions are rightfully reluctant to trust providers.

DuraCloud will enable institutions that maintain DSpace and FEDORA institutional repositories to preserve the materials in their repositories in a cloud computing environment; via a Firefox browser extension, it will also allow users to identify content that should be preserved. A Web interface will enable users to monitor their data and, possibly, run services.

DuraCloud will enable members to create and manage multiple, geographically distributed copies of their holdings, monitor their digital content to verify that it has not been inadvertently or deliberately altered, and take advantage of the cloud’s processing power when doing indexing and other heavy processing jobs. It will also provide search, aggregation, video streaming, and file migration services and will enable institutions that don’t want to maintain their institutional repositories locally to host them within a cloud environment instead.
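
The content-monitoring piece of this boils down to fixity checking. Here’s a generic sketch of the underlying idea -- recompute a file’s checksum and compare it against the value recorded at ingest. To be clear, this is not the DuraCloud API (which hasn’t been released); it’s just the basic mechanism.

```python
# Generic fixity check: recompute a checksum and compare it with the
# value recorded at ingest. Not the DuraCloud API, just the idea.
import hashlib

def sha256(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, recorded_digest):
    """True if the stored copy still matches the checksum taken at ingest."""
    return sha256(path) == recorded_digest
```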

The DuraCloud software, which is open source, will be released next month, and in a few months DuraSpace itself will conduct pilot testing with a select handful of cloud computing providers (Sun, Amazon, Rackspace, and EMC) and two cultural heritage institutions (the New York Public Library and the Biodiversity Heritage Library).

Fascinating project. We’ve known for some time that DSpace and FEDORA are really access systems, but lots of us have used them as interim preservation systems because we lack better options.

The next session was a “breakout” that consisted of simultaneous panels focusing on one or two NDIIPP projects. The Persistent Digital Archives and Library System (PeDALS) project was featured in a session that focused on digital preservation contracts and agreements. The first half of the session consisted of an overview of the contracts and agreements that support a variety of collaborative digital preservation initiatives:
  • Vicki Reich discussed the CLOCKSS Archive, which brings together libraries and publishers on equal terms and provides free public access to materials in the archive that are no longer offered for sale.
  • Julie Sweetkind-Singer detailed the provider agreements and content node agreements that govern the operations of the National Geospatial Digital Archive.
  • Myron Gutman discussed the development of Data-PASS, which grew out of previous collaborations between the project’s partners and lengthy experience preserving social science data.
  • Dwayne Buttler, an attorney who was instrumental in crafting the agreements that support the operations of the MetaArchive Cooperative, emphasized that contracts, which focus on enforceability, grow out of a lack of trust and allow for simultaneous sharing and control; in contrast, agreements articulate goals.
The second half of the session focused solely on PeDALS. Richard Pearce-Moses (Arizona State Library and Archives; principal investigator), Matt Guzzi (South Carolina Department of Archives and History), Alan Nelson (Florida State Library and Archives), Abbie Norderhaug (Wisconsin Historical Society), and Yours Truly (New York State Archives) informally discussed some of the lessons that we’ve learned as the project unfolded. Among them:
  • People involved in long-distance collaborative projects need structured, consistent activities and expectations of involvement; both are key to fostering a sense of project ownership.
  • Lack of face-to-face interaction makes it harder for people to feel engaged; conference calls and other tools can help bridge the gap, but nothing really takes the place of getting to know other people.
  • Working in smaller teams capitalizes upon our strengths -- provided that we make sure that the right mix of IT, archival, and library personnel are involved.
  • Team members must be open to learning as they go, and they must be creative and innovative.
  • Working on this project has brought to light a number of challenges: communication and collaboration over long distances and multiple time zones, differences in organizational cultures, responsibilities, and IT infrastructures, learning to speak each other’s languages, and finding the right IT consultant.
  • We are nonetheless rowing in the same direction: we’ve learned to balance local practice with common requirements, and individual partners are beginning to incorporate PeDALS principles and standards into their current cataloging and other work.
In addition, Alan Nelson discussed how the IT personnel involved in the group have adopted the Agile Scrum process . . . and illustrated the difference between involvement and commitment.

The second “breakout” session took place after lunch, and the session I attended focused on building collaborative digital preservation partnerships:
  • Bill Pickett discussed the Web History Center’s efforts to provide online access to archival materials documenting the development of the World Wide Web and the organization’s need for partners.
  • David Minor outlined the work of the Chronopolis consortium, which is striving to build a national data grid that supports a long-term preservation (but not access) service.
  • Martin Halbert detailed the work of the MetaArchive, a functioning distributed digital preservation network and non-profit collaborative.
  • Beth Nichol discussed the Alabama Digital Preservation Network, which grew out of work with the MetaArchive and a strong history of informal statewide collaboration.
During the follow-up discussion, Martha Anderson (Library of Congress) made a really interesting point: according to an IBM study that LC commissioned, the strongest digital preservation organizations are focused on content; weaker groups are focused on tools. The study also found that tool-building works really well when there is a community interest in a tool and a central development team and that natural networks that grow out of years of other collaborative work also lead to the creation of strong organizations; however, there are other ways to build trust.

The end of the day brought all of the attendees back together. Abby Smith of NDIIPP provided an update on the work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, which focuses on materials in which there is clear public interest and seeks to frame digital preservation and access as a sustainable economic activity (i.e., a deliberate, ongoing resource allocation over long periods of time), articulate the problems associated with digital preservation, and provide practical recommendations and guidelines.

Economic sustainability requires recognition of the benefits of preservation, incentives for decision-makers to act, well-articulated criteria for selection of materials for preservation, mechanisms to support ongoing, efficient allocation of resources, and appropriate organization and governance. As a result, the task force’s work -- an interim report released last December and a forthcoming final report -- is directed at people who make decisions about the allocation of resources, not people who are responsible for the day-to-day work of preserving digital information.

Smith wrapped up by making a series of thought-provoking points:
  • Preservation is a derived demand, and people will not pay for it. However, they will pay for the product itself. We need to think of digital information as being akin to a car: it’s something that has a long life but requires periodic maintenance.
  • Everything in the digital preservation realm is dynamic and path-dependent: content changes over time, users change over time, and uses change over time. Decisions made now close off future options.
  • Librarians and archivists are the defenders of the interests of future users, and we need to emphasize that we are accountable to future generations.
  • Fear that digital preservation and access are too big to take on is a core problem.
IMHO, Smith's last point succinctly identifies the biggest barrier to digital preservation.

Wednesday, June 24, 2009

NDIIPP project partners meeting, day one

I’m in Washington, DC for the National Digital Information Infrastructure and Preservation Program (NDIIPP) grant partners meeting. NDIIPP is a program of the Library of Congress (LC), and we learned today that it has just been awarded permanent status in the federal budget. As a result, the program should receive an annual appropriation and become a permanent part of the digital preservation landscape.

In response to this change, LC is thinking of creating a National Digital Stewardship Alliance (the name may change), which would allow current NDIIPP partners to continue working with LC and attract new partners. Organizations that are willing and able to direct resources to NDIIPP initiatives will have a voice in the operations of the alliance, and other interested institutions and individuals can become observers. I’ll be sure to post more information about this alliance as it becomes available.

Martha Anderson (LC) opened the meeting by furnishing a quick overview of NDIIPP’s progress to date and in the process of doing so highlighted a simple fact that reinforces the conclusions the PeDALS project partners and many other people are reaching: “metadata is your worldview.” In other words, no two organizations use metadata in precisely the same way, and while there may be broad agreement as to how standards should be used, there must always be room for local practice.

The keynote speaker, Clay Shirky, noted that in many respects this persistence of local practice is a very good thing: the existence of different preservation regimes, practices, and approaches reduces the risk of catastrophic error.

Shirky’s work focuses on the social and economic impact of Internet technologies, and his address and follow-up comments highlighted the social nature of digital preservation problems.

The Internet has enabled just about anyone who wishes to create media to do so. Instead of the top-down, one-to-one or one-to-many models that have characterized the production of information since the invention of the printing press, we are seeing the emergence of a many-to-many model. Traditional media have high upfront costs, but the cost of disseminating information via the Internet is negligible. Instead of asking “why publish?” we now ask “why not publish?”

The profusion of Internet media has helped to popularize the notion of “information overload,” but our problem is in fact “filter failure.” Information overload has existed since the invention of the printing press, but we generally didn’t notice it because bookstores, libraries and other institutions created systems for facilitating access to printed information. However, on the Internet, information is now like knowledge itself: loosely arranged and variably available.

Shirky asserted that in the “why not publish?” era, librarians, archivists, and others seeking to preserve digital resources no longer need to decide which social uses for a given piece of information should be privileged by cataloging. Instead, they should actively seek to incorporate user-supplied information into the descriptive metadata (i.e., the filters) that they maintain. For example, user-created Flickr tags indicate that a given Smithsonian image of a fish is of demonstrated interest to both an ichthyologist and to a crafter who placed an image of the fish on a purse. Prior to the rise of the Internet, cataloguers would give the scientist’s use of this image more weight than that of the craftsperson. However, as long as metadata is creating value for some group of people, why not allow it to be applied as broadly as possible? In other words, the question we must answer is no longer “why label it this way?” but “why not label it this way?”

The incorporation of user-supplied metadata challenges librarians and archivists who fear losing control over the ways in which information is presented to researchers. However, as Shirky pointed out, this loss of control has already happened: it’s a mistake to believe that we can control how our institutions and holdings will be discussed. All we can really do is decide how to participate in these discussions. President Obama’s 2008 campaign provides a good example of active participation: the campaign understood right away that providing a really clear vision for Obama would empower supporters to talk about Obama without the campaign’s assistance. It then made use of the best user-generated content. In order to do so, it had to accept that some people would make critical and even bigoted use of the material it made available.

Shirky also noted that digital preservation itself has to be social. The “Invisible College,” a seventeenth-century group of intellectuals who established a set of principles for investigating the natural world and sharing their research, is a good model: its expectation that the results of research would be available for review and the development of further inquiry gave rise to modern science, and we are now starting to think of digital preservation as an endeavor requiring collaboration, sharing, and community-building.

The social dimension of preservation also extends to end users. One of the big mental shifts in the NDIIPP project has been from “light” (i.e., totally open) and “dark” (i.e., completely inaccessible) archives to “dim” archives. The more secret something is, the harder it is to preserve. In some cases, we have no choice but to bear the costs of preserving inaccessible materials. However, to the degree that we can turn up the dimmer switch, we should do so. If we allow someone to view a movie -- even in five-minute snippets -- s/he will at least be able to tell us if a snippet has gone bad. Even a little exposure lowers the costs of preservation, and lowering costs increases the possibility that something will be preserved. Moreover, if we develop simple, low-cost tools that enable end users to take an active role in preserving information that is important to them, we’ll get a clearer picture of what they find important and increase the chance that information of enduring value is preserved.

All in all, a really vivid, thought-provoking presentation; this summary doesn’t do it justice.

After a lengthy break, Katherine Skinner and Gail MacMillan of the MetaArchive Cooperative furnished a fascinating overview of the results of two digital preservation surveys, one of which focused on electronic theses and dissertations and the other on cultural heritage institutions of all kinds.

The surveys were meant to identify institutions that were collecting digital materials, types of materials being collected, how these materials are stored, barriers to preservation, and the most desired preservation offerings. Respondents self-selected to take these surveys.

As Skinner and MacMillan noted, the findings reveal some unsettling problems:
  • Most institutions are actively collecting digital materials, and survey respondents hold an average of 2 TB of data.
  • Most respondents hold many different types of file formats and genres of information.
  • Storage protocols vary widely. Some respondents are using purpose-built preservation environments (e.g., iRODS), others are relying upon access systems to preserve materials, while others have home-grown systems. Some respondents simply store materials on creator-supplied portable media.
  • The manner in which materials are organized also varies widely, and in many instances organizational schemes (or the lack thereof) pose preservation challenges.
  • Respondents are actively engaging with digital preservation ideas, have a high level of knowledge about community-based approaches to digital preservation, and still feel responsible for preservation.
  • Preservation readiness is low -- most institutions aren’t even backing up files, and most also lack preservation plans and policies -- but desire is high. People want training, independent assessments of their capacity, and the ability to manage their own digital preservation solutions. People don’t want to outsource digital preservation; however, some outsourcing will be needed, particularly for smaller institutions.
  • Respondents themselves identified insufficient preservation resources as the biggest threat; inadequate policies and plans, deteriorating storage media, and technological obsolescence were also mentioned.
  • Interestingly, the preservation offerings that respondents most desired did not address the threats that they identified. Cultural heritage institutions wanted training provided by professional organizations, independent study/assessment, local courses in computer or digital technology, new staff with digital knowledge and experience, consultants, and training from vendors. Colleges and universities responsible for electronic theses and dissertations wanted a cooperative preservation framework, standards, training on best practices, model policies, conversion or migration services, preservation services provided by third-party vendors, and access services.
Skinner and MacMillan concluded that the most effective preservation strategies incorporate replication of content, geographic distribution, secure storage locations, and private networks of trusted partners. However, most respondents seem to have fallen prey to “cowpath syndrome”: they have idiosyncratic, ad-hoc data storage structures that grew out of pressing needs, but these structures are increasingly difficult to expand and maintain over time, and some sort of triage will eventually become necessary. Moreover, there is a disconnect between administrators and the people who are actually responsible for hands-on preservation work: administrators want to keep things in-house and under control, but hands-on people see the value of collaboration and distributed storage.

I suspect that everyone at this meeting faces at least some of these challenges and shortcomings and that many of us are going to go home and discuss at least some of these findings with our colleagues and managers . . . .

Tuesday, June 23, 2009

Stonewall riot police reports available online

This month, OutHistory.org features a new online exhibit that was made possible by the New York State Freedom of Information Law (FOIL).

The Stonewall riots -- a series of spontaneous demonstrations that broke out after police raided a Greenwich Village gay bar and that became the founding symbol of the modern gay, lesbian, bisexual and transgender rights movement -- erupted during the wee hours of 28 June 1969. In honor of the fortieth anniversary of this event, Jonathan Ned Katz, the pioneering scholar of LGBT history, has created an online exhibit that features digital images and transcriptions of nine New York City Police Department (NYPD) records documenting the protests.

With the assistance of historian David Carter, Katz obtained copies of seven of the documents by submitting a FOIL request to the NYPD in May 2009. When responding to his request, the NYPD opted not to redact the names of the people who were arrested in connection with the Stonewall riots. As Katz notes, these records identify protesters and police officers whose involvement has not been documented in other sources and suggest avenues for further research.

They also highlight how the NYPD's attitudes have changed as time has passed (or, perhaps, because time has passed): the other two documents at the center of the exhibit were released in 1988 to another researcher who filed a FOIL request (and ultimately sued the NYPD), and at that time, the NYPD blacked out the names of arrestees.

Although I visit OutHistory.org from time to time, I learned about the addition of these records to the site via a New York Times City Room blog post highlighting the new exhibit. This post also includes a brief interview with David Carter, whose 2004 account of Stonewall and its immediate aftermath is widely regarded as definitive.

Sunday, June 21, 2009

A day at the Bronx Zoo

Yesterday, my friend Gloria and I took a trip to the Bronx Zoo. The story of how this trip came to be is kind of complicated, so I'll simply say that I'm glad that it happened. Although I am in many respects ambivalent about zoos, I appreciated having the opportunity to see animals I will likely never see in the wild. I also relished being able to catch up with Gloria, who worked for a long time as an archivist but whose work now centers upon research and grant writing. I don't see her as frequently as I did when we worked in adjacent offices, and yesterday's outing was a real treat.

We took a chartered bus to the zoo, and for some reason the driver opted not to take the New York State Thruway all the way down to the Bronx. Instead, we headed into New Jersey and then crossed the George Washington Bridge into Manhattan. I've driven over the GW several times, but have never been able to check out the New York skyline while doing so; it's not really possible to do any sightseeing while driving on a two-level, fourteen-lane toll bridge. Grey as the day was, it was nice to have the opportunity to take in the view.

When we got to the zoo itself, we began in the Amphibian House, which is housed in Zoo Center, a Beaux Arts structure that features splendid sculptures of animals. One of the most striking things about this particular zoo is the attractiveness of its grounds. Many of the zoo's 19th- and early 20th-century buildings are still standing, and the grounds around them are still carefully landscaped. The newer buildings are less ornate but nonetheless complementary, and the grounds have a sylvan, tranquil quality that makes it easy to forget that one is in an urban environment.

The Wildlife Conservation Society, which operates the Bronx Zoo, was founded in 1895. It maintains its own archives, and detailed information about its holdings is available through the New York State Historical Documents Inventory.

We then moved on to the Monkey House, which features a host of captivating small monkeys from South America. The Silvery Marmoset is native to the Brazilian Amazon.

I kept my camera's flash turned off during my visit to the zoo. Although I'm sure that the animals are used to flash photography and all sorts of other disruptions -- kids sometimes yell at animals, tweens and teens sometimes strike out on their own, and parents don't always instruct or supervise -- I thought it only fair to minimize the amount of disturbance I caused.

We then moved on to the Madagascar exhibit, which is quite new. Fish swam around this partially submerged Nile Crocodile . . .

. . . which was resting its head on rocks above the water line.

This Day Gecko is also a native of Madagascar.

The lemur area sits at the center of the Madagascar exhibit. The Bronx Zoo has at least two Collared Lemurs and at least half a dozen Ring-tailed Lemurs. Photographing them was really difficult: they are fast, active animals!

We moved on to the Birds of Prey area. I loved the almost comically grim look of these Cinereous Vultures, which are native to southern Europe and northern Africa and can be found as far east as China. I vividly recall a sequence of Peanuts strips in which Snoopy pretended to be a vulture . . . until a real vulture landed beside him and scared him senseless. I think that Charles Schulz used a Cinereous Vulture as the model for his drawing of said vulture.

Although we somehow managed to miss the great apes, we spent quite a bit of time in the Africa section of the zoo. The baby Giraffe on the right, who was born in February of this year and dubbed Margaret by zoo personnel, was keeping a close eye on her mother and the other adults.

It's just about impossible to see the entirety of the Bronx Zoo in a single day, and we didn't have the chance to take the monorail that allows visitors to see the elephants and many other large African animals. We saw only this skull, which belonged to a male African Elephant that in 1989 was illegally killed for its tusks. The zoo has situated this skull on the periphery of the Giraffe habitat and posted signs that discuss the problem of poaching and the Wildlife Conservation Society's efforts to combat it.

This Lion cub, named Moxie on account of her playful, gutsy personality, was relaxing in the shade with her mother. Her father was resting nearby.

I mentioned above that I have mixed feelings about zoos, and seeing this Polar Bear brought all of my negative feelings to the fore. As the zoo's Web site indicates, Polar Bears are solitary, highly mobile creatures: a given bear's home range is between 93 and 186 square miles, and adults travel alone unless they are seeking a mate or raising young. The habitat of the bear pictured above is smaller than my apartment. The bear itself seemed listless and bored; toys, occasional treats, and the opportunity to smell noisy humans and their food are poor substitutes for the freedom to roam, hunt, and reproduce at will.

The Grizzly Bears, who live next to the Polar Bear, seemed slightly happier, if only because they are more social and can be housed as a group and because their habitat is slightly larger than that of their next-door neighbor. Moreover, as the zoo's signage makes plain, these bears cannot live in the wild. All of them originally came from Montana or Wyoming, and all of them were deemed "nuisance bears" because they persistently visited areas settled by humans. If the Bronx Zoo hadn't taken them in, wildlife authorities would have killed them.

The zoo's signage also indicates that most of these bears carry permanent reminders of their past encounters with humans: x-rays have revealed buckshot or bullets embedded in their flesh. The bear pictured above constantly shakes its head as it walks -- not normal for a bear -- and Gloria and I wondered whether this behavior is the result of past head trauma.

The Tiger habitat, which is much newer than that of the bears, is much larger. The Tigers have a relatively large wooded area in which to roam, and the zoo makes a conscious effort to keep them mentally and physically active. "Enrichment" sessions involving toys or treats (or toys containing hard-to-get-at treats) take place several times a day.

These Père David's Deer exemplify the good work that zoos can do: originally native to China, these deer are now extinct in the wild. The Bronx Zoo and other zoos that hold captive populations of these animals are now trying to reintroduce these deer to their native habitat.

The Bronx Zoo has a lengthy history of helping to rebuild wild animal populations. The American Bison who live at the zoo are descendants of a small group of animals brought to the zoo in 1899. Other descendants of the zoo's American Bison were returned to the West, and most of the 20,000 wild American Bison that roam through Yellowstone National Park and other protected areas trace their lineage to the Bronx Zoo herd.

We ended our day with a late lunch at the zoo's cafe, where we were joined by one of the resident Indian Peafowl. Native to the Indian subcontinent, these beautiful, noisy creatures have the run of the zoo and move freely from habitat to habitat; however, they seem to have sense enough to avoid the areas housing the lions, tigers, and bears. The peafowl and ducks, which are also free to move about the zoo's grounds, have little to no fear of people, and this peacock was not shy about seeking handouts from the humans seated outside the cafe.

We left the Bronx Zoo tired and wet (spring in the Northeast has been cool and very wet, and yesterday was no exception), but very glad we came and enthused about the prospect of making a return trip at some point in the future.

Tuesday, June 16, 2009

Electronic records potpourri

Lots and lots of stuff relating to electronic records (and Web 2.0) this week . . . and it's only Tuesday.

This week's New York Times Sunday Magazine features a lengthy article about data centers -- those huge, ever-growing, geographically dispersed, and interconnected banks of servers that support Facebook, Flickr, Twitter, Blogger (l'Archivista's host), and Web-based e-mail, gaming, and applications. If you've ever wondered what makes all this stuff work (and how much power it consumes), this article furnishes a good non-technical overview . . . .

. . . . And if you've ever wondered about the robustness of the networks that support our online lives, check out what's happening in Iran: the government may be limiting access to social media in an effort to hamstring the opposition, which has been using Twitter and other tools to coordinate protests and disseminate information. (More info here, here, and here. If you want to see how these tools are being used, Andrew Sullivan and crew have been compiling and linking to English-language information sent out by Iranians involved in the protests. You can also follow real-time updates on Twitter.)

The Spring/Summer 2009 issue of the American Archivist is out, and it includes articles concerning cell phone-generated archival records, collaborative efforts to preserve electronic data, and conversion of electronic records to preservation formats.

The U.S. National Archives and Records Administration anticipates that President Obama's administration will create petabytes of archival electronic records.

Ian Wilson, the just-retired Librarian and Archivist of Canada, reflects upon Library and Archives Canada's digitization program, efforts to capture the federal government's Web presence, and records management initiatives.

Finally, every electronic records archivist has to contend with vendors that believe (or at least hope) that they've managed to develop a permanent storage medium for electronic records. Invariably, each vendor is the sole source of the medium it trumpets, and after a while the high-pressure sales pitches and breathless news releases make one a bit cynical -- which is why Bruce Sterling's snarky reaction to a giddy news article about a nanoscale storage research project brought a big smile to my face.

Sunday, June 14, 2009

Catching up: Cologne Archives; the Stasi and recent German history


I planted three "Tiny Bee" Asiatic lily cultivars last year, and they didn't do very well. I didn't expect them to survive, and was very pleasantly surprised when they sprang to life earlier this spring. As of today, they have more than a dozen full blooms and a like number of buds. Interesting things -- most of which aren't as pretty as these lilies -- sometimes pop up unexpectedly . . . .

Sorry for the light blogging over the past week. I've been struggling to meet multiple deadlines at work, combating (organically and non-lethally) the squirrels that are attacking the lettuce, beets, and other produce in the container garden, and getting used to the feeling of having a trio of staples in my scalp; there's nothing like a minor household accident to keep an archivist's life interesting.

Next week's blogging may be similarly light: I'm planning to attend a National Digital Information Infrastructure and Preservation Program grant partners meeting during the last full week of June, and as a result must devote next week to wrapping up some loose ends, meeting some additional deadlines, and preparing for departure.

However, before any more time elapses, I wanted to comment on a couple of archives-related developments that have taken place in Germany. Neither one is particularly new, but the first is really encouraging and the second highlights the role of archives in shaping -- and, in this instance, destabilizing -- collective understandings of history.

The first piece of news concerns the Historical Archive of the City of Cologne, whose building collapsed on 3 March of this year: the archive's staff and other experts on the site have been stunned and pleased by the condition of the records that have been recovered to date. As of 1 June, approximately 85 percent of the archive's holdings have been recovered (the remaining 15 percent is submerged in groundwater), and roughly 75 percent of the recovered material is relatively intact. Although archivists still anticipate that it will take approximately 30 years to recover from this disaster, they also expect that digital technology will aid the process: special software developed to reassemble documents shredded by the Stasi, the East German secret police, in the days before the collapse of the East German government may help them reassemble torn and badly damaged documents.

The other archives-related development of note also relates to the Stasi. The records concern an event that took place on 2 June 1967, when police officer Karl-Heinz Kurras shot and killed Benno Ohnesorg, an unarmed student, at a political protest. Ohnesorg's death triggered mass protests throughout the nation and helped to shape the political views of countless young Germans who saw Kurras as a far-right extremist. Forty years after Ohnesorg's death, many Germans identify 2 June 1967 as an important date in their nation's history: the upheaval that followed in the wake of Ohnesorg's death profoundly affected life in the Federal Republic of Germany. Some Germans believe that it made the Federal Republic more democratic and more open, while others are convinced that it ushered in an era of social decay, but everyone agrees that it was significant.

On 21 May, two historians who were conducting research in the vast archives of the Stasi announced that they had inadvertently discovered compelling evidence that Kurras had been on the Stasi's payroll since 1955. Kurras, who was tried but never convicted of any crime, roundly denies that the Stasi ordered him to shoot Ohnesorg, and to date, no records indicating that the Stasi ordered Kurras to kill have surfaced. However, it is widely known that the East German government actively sought to destabilize the Federal Republic, and questions about Kurras's motivations have led many Germans to ponder their nation's recent past. If Kurras's Stasi ties had come to light sooner, would the mass student and women's movements fueled, directly or indirectly, by outrage over Ohnesorg's death have been as large or as influential? Would the left-wing terrorist groups that plagued the Federal Republic in the 1970s have existed had young radicals known about Kurras's true political beliefs? Would the Federal Republic be better or worse off?

In sum, a handful of archival records may ultimately cause an entire nation to reassess and reinterpret its recent past.

Wednesday, June 10, 2009

International Archives Day

International Archives Day 2009 poster from Japan. Image courtesy of the International Council on Archives.

Today is the second International Archives Day. Most American archives and archivists have yet to focus on International Archives Day, but repositories and archival professional associations around the world are taking the opportunity to highlight the importance of archives to collective memory and governmental accountability and transparency.

Among the many archives that have chosen to observe International Archives Day are:

International Archives Day likely hasn't gotten much traction in the United States because of its newness and the established nature of American Archives Month, but it would be great to see lots of American repositories organize around International Archives Day 2010: as the International Council on Archives, which established International Archives Day, points out, it offers "countries which already have well-established celebrations at other times of year . . . another chance to reinforce key messages about the significance of archives."

Tip o' the hat to Felipe Diez of Solidarity Köln Historisches Archiv.

Saturday, June 6, 2009

New York Archives Conference, day two

The 2009 New York Archives Conference wrapped up yesterday afternoon, and everyone in attendance seemed to have a great time.

The first morning session I attended, “Exploring the Possibilities of Web 2.0 for Cultural Heritage Websites,” gave attendees an introduction to the world of Web 2.0 and some of the ways in which archivists could make use of it.

Greg Bobish (University at Albany, SUNY) provided an overview of some of Web 2.0’s core concepts and then noted the characteristics that Web 2.0 technologies such as blogs, wikis, and social networking sites share: they are available online from almost any computer (or other device), require minimal technical skills, and encourage participation and the creation and editing of content. Bobish’s presentation, which is a great introduction to Web 2.0 principles, is available online.

Nancy Cannon and Kay Benjamin (both from the SUNY College at Oneonta) then outlined how Web 2.0 technology could be used to make primary source materials freely available to students, teachers, and researchers. They obtained permission from the Delaware County Historical Association to reproduce materials that shed light on life in the county prior to the Civil War, and Cannon drafted historical essays that placed the primary source materials in context. Cannon and Benjamin then used basic HTML coding to create their site, Voice of the People: Daily Life in the Antebellum Rural Delaware County New York Area.

Cannon and Benjamin used Google Maps to add interactivity to sections of the site documenting an 1851 sea voyage from New York to California and a Delhi family's 1823 journey through upstate New York. Benjamin then gave a practical demonstration of how to set up a Google Maps account and then combine maps with text, images, and multimedia materials. As she noted, Google Maps can be of great use to archivists and librarians who want to create interactive online content on a shoestring.
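
For anyone who wants to try something similar, here's one shoestring approach sketched in code: generate a KML file of placemarks and import it into Google Maps (or Google Earth). The journey stops below are invented placeholders, not data from the Voice of the People site.

```python
# Generate a KML file of placemarks that can be imported into Google Maps
# or Google Earth. The stops below are invented placeholders.
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def make_kml(stops):
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, description, lon, lat in stops:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = description
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

stops = [
    ("Delhi, NY", "Journey begins", -74.916, 42.278),
    ("Albany, NY", "A stop along the way", -73.757, 42.653),
]
print(make_kml(stops))
```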

I next attended “Digitizing Audio and Video Materials.” My colleague Monica Gray opened the session by explaining how the New York State Archives used a one-time allocation of $25,000 to outsource the digitization of 53 motion picture films, 98 video recordings, and 34 audio recordings.

In preparation for digitization, Gray conducted an inventory of holdings, did a lot of background research into digitization standards and best practices, and worked with colleagues and vendors to select materials that were of interest to researchers or in formats on the verge of obsolescence. She stressed that archivists need to specify exactly what they want from their vendors, determine in advance whether to add title frames, etc., and anticipate the need to provide access to the resulting files.

As a result of this project, the State Archives now manages preservation master copies (.wav format, 44.1 kHz, 16 bit) and access copies (.mp3 format) of audio recordings, as well as preservation master copies (.avi format) and access copies (.wmv format) of moving image materials. It is now focusing on providing access to these use copies.
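
Producing .mp3 access copies from .wav masters is the sort of task that's easy to script. Here's a hedged sketch using ffmpeg (which must be installed separately); the bitrate is my own illustrative choice, not the State Archives' actual setting.

```python
# Batch-generate .mp3 access copies from .wav preservation masters using
# ffmpeg. The bitrate is an illustrative choice, not an archival standard.
import subprocess
from pathlib import Path

def make_access_copies(master_dir, access_dir, bitrate="192k"):
    Path(access_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(master_dir).glob("*.wav")):
        mp3 = Path(access_dir) / (wav.stem + ".mp3")
        subprocess.run(
            ["ffmpeg", "-i", str(wav),
             "-codec:a", "libmp3lame", "-b:a", bitrate, str(mp3)],
            check=True,
        )

# Example: make_access_copies("masters", "access")
```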

Gray also outlined some easy preservation measures that all archivists can undertake:
  • Store media vertically, not horizontally.
  • Rewind all recordings to the start.
  • Remove all record tabs from video and audio cassettes.
  • Remove papers from film canisters (dust is the great enemy of tape and film).
  • Use film strips that measure the extent of vinegar syndrome in motion picture film.
Andrea Buchner (Gruss Lipper Digital Laboratory, Center for Jewish History) then discussed the results of her repository’s year-long, grant-funded pilot digitization project. Staff digitized 94 oral histories held on 142 audio cassettes and 79 hours’ worth of recordings held on 193 reel-to-reel tapes; they also produced transcripts of 23 oral history interviews. Each hour of preservation master recordings comprises 1 GB of data, and Buchner determined that it cost $80 to produce, catalog, and store one hour of digital audio data.

Library staff created preservation master files of each recording (PCM .wav format, 2-channel stereo, 48 kHz, 24 bit). Derivative access copies were produced in .mp3 format. They also created a MARC21 catalog record for each recording and incorporated data captured during the digitization process into each record.
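
Those unit costs make back-of-envelope planning straightforward; here’s a quick sketch using the figures Buchner reported.

```python
# Back-of-envelope planning using the unit costs Buchner reported:
# roughly 1 GB and $80 per hour of preservation-master audio.
GB_PER_HOUR = 1.0
COST_PER_HOUR = 80.00

def estimate(hours_of_audio):
    return {"storage_gb": hours_of_audio * GB_PER_HOUR,
            "cost_usd": hours_of_audio * COST_PER_HOUR}

print(estimate(79))  # the reel-to-reel portion alone: about 79 GB and $6,320
```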

Buchner noted that the digitization process itself was easy compared to other challenges that staff encountered:
  • Unreliable metadata: people hadn’t listened to these tapes in decades, and existing catalog records weren’t always accurate.
  • Copyright: in some instances, staff had to make use of the “library exception” in U.S. copyright law; i.e., they made a limited number of copies and must restrict access to onsite users, include a copyright notice, and inform users that they should not exceed the fair use provision of U.S. copyright law.

Melinda Dermody (Belfer Audio Laboratory and Archive, Syracuse University) then outlined how her repository digitized some of its approximately 22,000 cylinder recordings, 12,000 of which are unique titles. The Belfer Audio Archive received a $25,000 grant for this ongoing three-year project; a gift that made possible the purchase of a new digital soundboard has made it much easier for staff to work on this project.

The project’s core team includes Dermody, a music librarian, the core metadata librarian, the digital initiatives librarian, and the Belfer's sound engineer. The group’s goal is to make 6,000 audio files available online (300 are currently available) and to create preservation master (.wav format, 44.1 kHz, 24 bit) and access (.mp3 format) copies of each recording.

The group determined which cylinders had already been digitized by another university, identified cylinders in fragile condition, and assessed the interests of music faculty and researchers. The digitization of selected recordings is being done by Belfer Audio Archive staff, and staff have created or revised a MARC record for each recording. They use a MARC-to-Dublin Core crosswalk to populate the metadata fields of CONTENTdm, which is being used to provide access to the use copies of the recordings.
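
Here’s a deliberately simplified sketch of what such a crosswalk looks like. Real MARC parsing involves subfields, indicators, and repeatable fields, none of which this flat mapping attempts; it just shows the shape of the tag-to-element translation.

```python
# Highly simplified MARC-to-Dublin Core crosswalk. Real MARC records have
# subfields, indicators, and repeats; this flat mapping only shows the idea.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "260": "date",        # simplification: only the imprint date survives
    "650": "subject",
    "520": "description",
}

def crosswalk(marc_record):
    """Map a flat {tag: value} MARC stand-in onto Dublin Core elements."""
    dc = {}
    for tag, value in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element:
            dc.setdefault(element, []).append(value)
    return dc

record = {"245": "Edison cylinder no. 42", "100": "Unknown performer",
          "650": "Popular music -- 1901-1910"}
print(crosswalk(record))
```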

After the second session ended, all of the attendees convened for lunch and a great talk by Syracuse University Archivist Ed Galvin, who outlined how the Syracuse University Archives was drawn into the production of The Express (2008), a film about the life of alumnus Ernie Davis, the first African-American winner of the Heisman Trophy.

Preparations for the filming of The Express brought Universal’s production designers and other Hollywood personnel to the SU campus, and Galvin and his staff spent the next 18 months responding to their requests. The filmmakers were intent upon reconstructing Davis’s life on campus as faithfully as they could, and developed a wide-ranging and sometimes surprising list of items they sought and questions they wished to have answered. Galvin and his colleagues supplied detailed information about uniforms and other aspects of campus life and gave production staff access to yearbooks, copies of the student newspaper and football programs, other campus publications and memorabilia, and images of the coach’s office and other SU facilities.

The SU Archives also led licensing negotiations with Universal on behalf of the entire university; however, much of the SU material in the film came from departments other than the archives.

Completion of the film, most of which was shot in Chicago, brought additional challenges. The film’s world premiere was held in Syracuse, prompting SU’s marketing unit and development office and a California film marketing firm to request additional materials from the SU Archives. Three days after the film’s premiere, Universal asked the archives to locate footage that could be used to produce a bonus featurette for the film’s DVD release. The archives also received requests for materials from alumni, politicians, History Day students, and other interested individuals.

Galvin made it plain that he and his staff often enjoyed working on this project, but also emphasized that archives approached by film studios should draw up detailed contracts and specify fees before any work begins; SU received only $4,000-$5,000 -- which did not even cover reproduction costs -- for 18 months of intense work on The Express.

NYAC conferences typically don't have overarching themes, but it struck me on the way home that just about every presentation I heard at this year's meeting centered upon clearly articulating one's expectations -- about security measures, vendor deliverables, project specifications and outcomes -- and documenting whether or not they have been met. We as a profession haven't always excelled at doing so, and it was really heartening to hear so many colleagues assert the need for this sort of activity.

Friday, June 5, 2009

New York Archives Conference, day one

Grewen Hall, LeMoyne College

Yesterday was the first day of the New York Archives Conference (NYAC), which is being held at LeMoyne College in Syracuse. One of the things I really like about NYAC is its informality: many people either know each other or know of each other’s work, and the atmosphere is intimate and convivial as a result.

The day was jam-packed with sessions and other activities. It started with a plenary session led by Maria Holden (New York State Archives), who outlined how the State Archives has responded to a recent internal theft and left the attendees with the following advice:
  • It is up to you to take ownership of security. It isn’t something that just happens or is the concern of a handful of people.
  • Do not wait until something bad happens. Addressing security issues before trouble occurs helps to avert problems and makes it easier to manage change and secure staff support.
  • Become familiar with security standards and guidelines relating to cultural heritage institutions.
  • Do your due diligence: develop policies and procedures, and document what you have done to improve security.
  • Remember that security is as much about protecting the innocent as it is about protecting collections. Employees need to understand that good security practices help to ensure that they will not become suspects in the event that a theft takes place.
The “Everyday Ethics (or What Do I Do Now?)” session touched on a host of related issues, and all of the panelists made some great points.
  • Geoff Williams (University at Albany, SUNY) asserted that archivists need to question whether they should use their own holdings when conducting their own scholarly research; any archivist who does so will have to figure out what to do when other scholars want access to the results of that research and when other users want to see the records upon which it is based.
  • Kathleen Roe (New York State Archives) discussed the thorny issue of archivists who personally collect manuscripts, ephemera, and artifacts that fall within their own institution’s collecting parameters. She concluded that the safest course of action is to avoid collecting anything that, broadly defined, falls within the collecting scope of one’s employer; this approach avoids both the actuality and the appearance of impropriety -- and frees one to develop new collecting interests.
  • Trudy Hutchinson (Bellevue Alumnae Center for Nursing History, Foundation of New York State Nurses) discussed how her nursing background informs her understanding of archival ethics and how, as an undergraduate majoring in public history, she had been required to develop a written personal code of ethics. She has since updated and expanded this code, which she discussed with her current employer during the interview process, and she would like to see all archives students develop such codes. (This is a great idea for current professionals, too.)
  • Patrizia Sione (Kheel Center, Cornell University) discussed a variety of ethical issues that she has confronted, and noted that she would like to see employers develop written policies relating to scholarly research undertaken by staff. She also emphasized the importance of working with donors to ensure that the privacy of correspondents, etc., is appropriately protected; doing so will ensure that appropriate access restrictions are spelled out in deeds of gift. Finally, she noted that archivists need to be sensitive to the ways in which the pressure to assist researchers with ties to high-level administrators can conflict with their ethical obligation to treat all users equitably.
In the next session, “Can We Afford Not to Act? Strategies for Collection Security in Hard Times,” Richard Strassberg (independent archival consultant) and Maria Holden (New York State Archives) outlined a wide array of low-cost security measures.

Richard Strassberg noted that the recession makes protection of collections particularly important: instances of shoplifting and employee theft are on the rise, and archivists and researchers face the same financial pressures as everyone else. He also noted that the increasing prevalence of online finding aids and digitized images has had mixed results: although they make it easier for honest dealers and collectors to identify stolen materials, they also make it easier for dishonest individuals to home in on valuable materials.

He then outlined what he called “minimal level protection” strategies for cultural institutions, all of which require staff time but don’t cost much:
  • Have a crime prevention specialist employed by the local or state police do an assessment of your facility.
  • Establish links with the local police so that they know that you hold valuable materials.
  • Have a fire inspection conducted (but make sure that your management knows in advance that you’re planning to do so -- the fire department will close your facility if it finds serious problems that management isn’t able to fix).
  • Get a security equipment quote; even if you don’t have the money, the cost might be lower than you expect, and having the quote will give you a fundraising target.
  • Do an insurance review and have your holdings appraised; doing so will help you in the event that you suffer a loss.
  • Protect your perimeter by tightly controlling keys and, if possible, screwing window sashes shut.
  • Avoid drawing attention to valuable materials. Don’t put up red-flag labels (e.g., “George Washington letter”) in your stacks and be cautious about what you display to VIPs and other visitors.
  • Tighten up on hiring. Conduct background checks if you can, and carefully check references by phone.
Strassberg emphasized that these measures will protect collections from “conditionally honest visitors,” but will not guard against thefts by staff. Moreover, they are not sufficient for repositories that hold materials of particular interest to thieves (e.g., collections relating to politics, sports, Native Americans, African Americans, and literary figures); such institutions will likely have to invest in electronic anti-theft technology.

In the event that a theft occurs or is suspected, contact, in the following order: your supervisor (or, if s/he is the suspect, his/her boss), the police, the donor (if applicable and he/she is still around), and your staff. Staff must be cautioned not to talk about the theft with family, friends, or co-workers. Also, develop a local phone tree -- external thieves tend to hit all of the repositories in a region within a short amount of time, and your colleagues will appreciate being informed. Avoid sending out e-mail alerts; you don’t want to document suspicions that might be unfounded.

Strassberg concluded by noting that librarians and archivists must be trained to confront suspected thieves in a legal and appropriate manner -- or to set the process of confrontation in motion by contacting security or the police. They also need to know that they cannot physically prevent anyone from leaving the research room; in New York State, they might be guilty of battery if they attempt to do so.

Maria Holden then focused upon internal theft, which is the most common security threat that archives face. Employee theft is a complex problem, and a full understanding of it is hard to come by. Theft is motivated by a variety of factors: personality disorders, gambling or substance abuse problems, retaliation for actual or perceived slights, and feelings of being undervalued.

We need to create a work environment that discourages theft and to control when, where, and how people interact with records; doing so protects not only the records but also innocent people who might otherwise be suspected of wrongdoing. There are several ways we can do so:
  • Hiring should be done with due diligence. The references of prospective employees should be checked carefully, their collecting habits should be scrutinized, and the results of these checks should be documented. Many archives require staff to adhere to codes of ethics and sign disclosure statements regarding their collecting and dealing habits; the code of ethics developed by the Association of Research Libraries might be a good model.
  • A number of recent thefts have been perpetrated by interns and volunteers. Develop a formal application process for interns and volunteers, document the process, and supervise interns and volunteers at all times.
  • Keep order in your house. There is growing evidence in the literature that disordered environments can encourage delinquent behavior. Order begets respect for collections.
  • Keep collections in the most restricted space possible. The State Archives has looked at every space in which records might be found (research room, scanning lab, etc.) and figured out when it’s appropriate to bring records into a given space and how long they should remain there. Develop overarching rules governing the removal of records from the stacks and their return.
  • Keep collections in the most secure space possible, grant access rights thoughtfully, designate spaces for storage, work, and research, and establish parameters for working hours; many internal thefts occur during off-hours.
During the question and answer period, Kathleen Roe made an important point: Sometimes, people start out honest, then fall prey to gambling or other addictions or personal problems. We have to make it difficult for desperate people to steal from our holdings.

Richard Strassberg also emphasized that research shows that most people are conditionally honest, i.e., they won’t steal from their friends. We need to create work environments that make people feel valued.

I took part in one of the late afternoon sessions, “The Challenge of the New: Archivists and Non-Traditional Records,” which focused on various electronic records projects at the New York State Archives. Ann Marie Przybyla discussed our new e-mail management publication, Michael Martin detailed our Web crawling activities, and I discussed the processing and description of a series of records relating to the “Troopergate” scandal.

At the end of the day, we went to a reception and a great tour of the LeMoyne College Archives led by College Archivist Fr. Bill Bosch. Afterward, I went out to dinner with my State Archives colleagues Monica Gray and Pamela Cooley, Capital Region Documentary Heritage Program Archivist Susan D’Entremont, and Nathan Tallman, who just graduated from the University at Buffalo’s library school and is a project archivist at the Herschell Carousel Museum. We had a great time, and all of us would recommend Phoebe’s to anyone visiting Syracuse.

Thursday, June 4, 2009

New York State Cyber Security Conference

I’m doubling up on conferences this week. Yesterday, I got the chance to sit in on a couple of sessions of the New York State Cyber Security Conference, which is always held in Albany. I then headed for Syracuse to attend the annual meeting of the New York Archives Conference, which started today.

The first session I attended, Acquiring Computer Communications: Often a Treacherous Task, focused on the use of electronic communications as evidence in legal or disciplinary proceedings. Stephen Treglia, an Assistant District Attorney with the Nassau County District Attorney’s Office, highlighted the many problems that employers and law enforcement agencies in New York State must confront. The legal terrain is laden with pitfalls.

The search and seizure of electronic communications (e.g., e-mail) has been the subject of a substantial amount of case law, and many of the non-computer issues relating to search and seizure translate well to computer issues. However, to date, most of the case law pertaining specifically to electronic communications has focused on child pornography. The courts are only now turning their attention to search and seizure of electronic communications relating to white-collar crime, and archivists and records managers should note that very little case law focuses upon search and seizure of electronic communications that document improper recordkeeping.

As if the situation weren’t murky enough, most of the case law is federal. New York State law tends to be more respectful of individual rights than federal law, and not all federal case law is applicable in New York.

Treglia then provided an overview of current case law, with a particular focus on the workplace. He emphasized that current case law regarding employer searches of staff computers indicates that office policies trump individual privacy concerns. However, the court that handed down the prevailing opinion noted that the employee did not assert that he did not know about the policy, and future defendants may make this argument. As a result, employers should establish computer and Internet use policies and have each employee sign a statement indicating that s/he is aware of these policies and of the penalties for violating them.

Treglia’s presentation, which highlighted many inconsistencies and oddities in case law, made it plain that legislators and the courts have a lot of work to do to bring the law into line with the age of the Internet and that law enforcement personnel, attorneys, employers, schools -- and even some parents attempting to monitor their children’s Internet and cell phone usage -- will find themselves stumbling across uncertain terrain for some time to come.

The next session I attended, Incident Response Using Open Source Forensic Tools, focused on the New York State Digital Forensics Workgroup’s testing of open source alternatives to commercial forensics packages such as EnCase. The Digital Forensics Workgroup is headed by the New York State Police and consists of staff employed by many other agencies. Tom Hrbanek of the State Police, who initiated the discussion, noted that many agencies struggle to find the resources needed to do forensics work, and the workgroup wanted to see whether open source software would lower training and other costs. It also wanted to determine whether open source tools would make it easier for the workgroup to expand its focus to include live capture of evidence as well as post facto incident response.

John Griffin of the New York State Multi-Agency Digital Forensics Analysis Center, which focuses on state employee misconduct, explained how the workgroup conducted its tests. It spent about $400 to purchase a desktop computer that ran Linux and installed several open source forensic tools. It then downloaded and ran a hypothetical hacking scenario created by the National Institute of Standards and Technology (NIST). The scenario is accompanied by 31 questions that forensic analysts should be able to answer, and the testing team was able to answer all 31 with the open source tools and to validate the results with commercial forensics applications.

Mike Gibbs of the New York State Office of Children and Family Services then outlined some of the technical dimensions of the project, including some of the problems the team encountered with PTK, the forensics tool it used, which runs on a variety of Linux distributions and on Mac OS X. He also directed attendees to more information about the project and the software used.

Tom Hrbanek concluded by noting that work on the project continues and that it will expand to include capturing live memory dumps and data moving across networks, etc., and that the group will present its findings in detail at the International Conference on Digital Forensics & Cyber Crime, which will be held at the University at Albany, SUNY in September.

Although some components of this presentation exceeded my technical expertise, it was fascinating to hear that forensics personnel focus on issues of authenticity and integrity and rely on some of the same techniques (e.g., fixity checking, keeping computers offline) that we use. There are of course huge differences between the two fields -- they're trying to put away bad guys, and we’re trying to keep records intact and accessible across time. It’s always striking to see how the digital era has forced professions that formerly had little in common to focus on some of the same concerns.
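
For readers unfamiliar with fixity checking, the core of the technique fits in a few lines. Here is a minimal sketch using Python's standard hashlib module; the file name is hypothetical:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# On ingest (or when a forensic image is acquired), record the checksum...
master = Path("oral_history_001.wav")  # hypothetical preservation master
stored_checksum = sha256_of(master)

# ...and on every subsequent audit, recompute and compare. A mismatch
# means the file has been altered or corrupted since ingest.
if sha256_of(master) != stored_checksum:
    print(f"Fixity failure: {master} has changed!")
else:
    print(f"{master} is intact.")
```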