Tuesday, April 28, 2009

MARAC: Wikis Here, There, and Everywhere

Well, here it is, a mere ten days late: my final MARAC Spring 2009 post. I think I’m going back to the daily post style that I used at SAA 2008 -- unless, of course, anyone out there has a better idea . . . .

The last session I attended highlighted the many different ways in which archives are using wikis. I learned a few things about the varied uses to which wikis can be put . . . and a few things about why my own experiences with them have been less than satisfactory.

Kate Colligan outlined her use of a wiki to support the University of Pittsburgh's processing of the records (1887-1973) of the Allegheny County (Pa.) Coroner. Approximately 30 people, most of them undergraduate interns, ultimately participated in this project, which involved the flattening, rehousing, and indexing of approximately 220,000 trifolded documents.

In order to sustain the interns’ interest in the project and satisfy the writing component of their internships, Colligan created the Coroner Case File Documentation Wiki. This wiki allowed the interns to share in real time interesting things they found within the records, add descriptive tags, supply file arrangement information, and document their responses to files concerning murders, suicides, and accidents. Colligan also gave students research assignments that broke up the monotony of (and sometimes disrupted) processing, and this research is reflected in the wiki’s detailed timeline of life in Pittsburgh.

Colligan concluded that when working with wikis, immediacy is a more important goal than perfect writing and presentation. One should also have a clear sense of one’s target readership. In the final analysis, the core readership of this wiki seems to have been the project staffers themselves; however, the wiki has been discussed in genealogical chat rooms and has gotten a fair amount of international traffic.

Finally, Colligan noted that the creation of the wiki means that the preservation issues associated with this project have grown to encompass digital materials. She isn’t sure what the future holds for this wiki, but it has survived a recent migration from an older version of the wiki software (PBWiki) to a newer one (PBWorks).

Jean Root Green succinctly discussed the Binghamton University Libraries’ internal staff wiki. The wiki (created with MediaWiki) has been in place since 2005, and its unveiling was accompanied by a lot of staff training and the development of style guides, templates, and resources that made it easier for staff to use the wiki appropriately. She stressed that the careful planning that went into the development of the wiki and its supporting materials is crucial to the wiki’s success: even people who generally aren’t comfortable with technology feel comfortable making use of the wiki.

The wiki enables staff to discuss internal matters candidly and collaborate on policy and other documents, and it automatically records and tracks changes. It has pages for all projects, committees, task forces, etc., and includes documentation for and links to additional information about all of the libraries’ information technology systems. In addition, it enables staff to publicize collections internally and post reports about conference sessions and other professional development events that they have attended.

David Anderson detailed how George Washington University’s Special Collections Research Center used MediaWiki to create the George Washington University and Foggy Bottom Historical Encyclopedia. Unlike paper encyclopedias, which fade from consciousness soon after publication, this encyclopedia is online, constantly updated, and frequently consulted.

Work on the encyclopedia began in 2006, when Anderson created templates and instructions for adding content, and to this day it adheres more closely to the traditional scholarly model of encyclopedia production than to the interactive Wikipedia model: two editors initially oversaw the development of the encyclopedia, and Anderson now serves as the gatekeeper for all additions and revisions. I suspect that Anderson and his colleagues were drawn to MediaWiki not because it can incorporate user-generated content but because it’s free and easy to use.

Scanned documents, articles written by faculty, staff, and students, timelines, and other materials are regularly added to the encyclopedia. At this time, there are 2,910 items in the database and 648 legitimate content pages; each photo is counted as a separate page, hence the discrepancy. There have been over 2 million page views to date. The most popular pages are the main page, the A-Z listing of campus buildings, and pages dedicated, among other things, to football (the university hasn’t fielded a team since 1966), distinguished alumni, Muhammad Ali (who once spoke on campus), various aspects of student life, and cheerleading.

Anderson noted that Google and other search engines have indexed these pages, and as a result he and his colleagues have gotten some non-historical reference inquiries; in response, he has modified some pages to include pointers to, e.g., campus events calendars.

I’m glad I attended this session. Wikis really are suited to the sort of internal information-sharing that Jean Green discussed, and can readily serve as the backbone of scholarly Web projects of the sort that David Anderson developed. Kate Colligan’s processing wiki is also a great use of the technology; such wikis can capture information that might otherwise remain unrecorded.

However, wikis also have their limits, and this session led me to realize that my colleagues and I have sometimes used wikis not because they were the best tool for the job but because they were the least awful of the available IT options. In some instances, what we actually need is something that combines the best features of, e.g., Microsoft Word (i.e., ability to create long, complex, highly formatted documents) with the ease of use and change tracking features of the best wiki software -- without the clutter and chaos of, e.g., Track Changes. If you have any suggestions, I would be most appreciative.

MARAC: Flickr: An Image is Worth a Thousand Views

Flickr is an online photo sharing site that enables users to “tag” (i.e., provide descriptive and other information about) images. In this great session, archivists working in a variety of settings highlighted its practical value to archives.

Barbara Natonson discussed a pilot project undertaken by the Library of Congress (LC), which wanted to learn how social tagging could help cultural institutions and participate in an online community. LC chose Flickr because of its popularity and because its application programming interface (API) facilitated batch loading of photos. LC’s experience should be of interest to many larger repositories.

LC determined at the outset that every image it placed on Flickr would be available via its own site and that it would post only those images that lacked known copyright restrictions. It then did some custom programming that made batch loading practical and made its copyright statement (developed in consultation with the U.S. Copyright Office) appear whenever one of its photos was displayed. It also purchased a Flickr Pro account ($24/year) that allowed it to add large numbers of images and view access statistics.

LC’s first photos went online in early 2008, and LC adds new photos on a weekly basis. As of mid-March 2009, LC’s Flickr images have gotten roughly 15 million views. Most of the traffic comes from Flickr itself, but some of it arrives via search engines, which index user comments.

To date, approximately 4,500 users have commented on at least one LC image. However, 40 percent of the tags are supplied by a small group of people, and most of the comments concerning images accompanied by good descriptive information simply repeat that information or document emotional/aesthetic responses. Images that lack such information produce the informative tags and comments that LC seeks.

A core group of approximately 20 “power commenters” corrects place names, supplies additional descriptive information, does historical detective work, and incorporates LC images into Wikipedia, etc., entries. These commenters have also highlighted how places have changed over time; photos documenting changes and links to GoogleEarth accompany some of these discussions.

LC actively monitors its Flickr photosets for uncivil discourse, and staff incorporate user-supplied information into LC’s descriptive resources and periodically update Flickr users on LC’s work; this work takes about 15-20 hours per week, and staff rotate responsibility for it. LC has also started incorporating links to Flickr versions of its images into its online catalog.

Natonson noted that there are some risks to Flickr (and, by extension, other Web 2.0 technologies):
  • Disrespect for collections -- Flickr privileges the individual image
  • Loss of meaning/contextual information -- LC links Flickr images to its descriptive information in an effort to remedy this
  • Reduced revenue from photo sales
  • Undigitized collections are by definition excluded
However, there are also substantial benefits:
  • Collections are made more widely available
  • LC gets additional information about its collections
  • The visibility of specific photos is increased
  • LC’s Flickr presence helps win support for cultural heritage institutions
  • Users can mix past and present -- thus leading to a more informed world
Natonson also discussed The Commons, which Flickr has developed specifically for cultural heritage institutions wishing to provide access to images lacking known copyright restrictions and which tries to address the individual biases in Flickr’s existing terms of service. At present, 24 institutions are members of The Commons.

The other presenters highlighted how smaller repositories could make use of Flickr. Judy Silva discussed how the Slippery Rock University Archives, which uses CONTENTdm to manage its digital collections, has used Flickr to reach out to new audiences and experiment with Web 2.0 technology. Slippery Rock’s Flickr project, which made use of the university library’s existing Flickr account, centered on 41 digitized photographs taken by an alumnus during his time in service during the Second World War.

It took Silva one afternoon to load the images into Flickr and do some very basic (i.e., non-LC) tagging, and the rewards have been substantial: to date, Slippery Rock has gotten over 700 comments on these photographs, and one commenter forwarded the obituary of one of the people depicted in one of the images.

Owing to the success of this project, Silva is thinking of adding more recent images in an effort to get information from people who might Google themselves or their friends.

Malinda Triller was not able to come to Charleston, so her colleague Jim Gerenscer discussed how Dickinson College's Archives and Special Collections department, which also uses CONTENTdm, is using Flickr to publicize and obtain more information about its photographic holdings.

By design, the archives’ Flickr project was simple enough to be completed largely by undergraduates. The archivists identified images that lacked copyright restrictions, had appeal outside of the Dickinson community, and had basic contextual metadata, and students scanned the images and added them to Flickr.

Unlike LC and many other repositories, which create high-resolution master images in TIFF format and mount lower-resolution JPEG derivatives on Flickr and their own Web sites, Dickinson didn’t want to manage TIFF files. Students thus scanned the images in JPG format, at 100 dpi, and in grayscale or color as appropriate; in the future, the archives will rescan the images as needed. Project work is documented in a basic spreadsheet that contains the unique identifier, description (collection-derived or student-supplied), and title of each image.

To date, Dickinson’s Flickr photosets, which consist of images of an 1895 family trip to Europe, the 1893 Columbian Exposition, a school for Native American children, and construction of a major road in Alaska, have received 66,000 hits, which is a remarkable amount of exposure for a college archives; however, the archives recently learned that its Flickr account settings greatly limited the number of people who could comment upon the images, and it corrected this error a short time ago. The archives is really pleased with the project and is planning to add another set of images to Flickr.

I think that a lot of archivists are hesitant to embrace Flickr and other interactive Web technologies because they either don’t grasp their potential or fear that they’ll find themselves in the midst of a digital Wild West. This session highlights how repositories of varying sizes can use Web 2.0 technology without being consumed by it or losing physical or intellectual control of their holdings, and many of the attendees seemed really intrigued by these presentations. I suspect that The Commons will grow as a result of this session . . .

Sunday, April 26, 2009

MARAC: Will the Fruit Be Worth the Harvest? Pros and Cons of End of Presidential Term Web Harvesting Projects

We as a profession are still trying to figure out how to deal with Web sites, which exist in the netherworld separating archives and libraries and pose a host of preservation challenges, and this session furnished interesting insight into the contrasting approaches of the U.S. National Archives and Records Administration (NARA) and the Library of Congress (LC).

Session chair Marie Allen (NARA) noted that NARA’s handling of Web records has consistently engendered controversy. Its 2000-01 decision to compel Federal agencies to copy their Web site files, at their expense, at the end of President Clinton’s second term of office and transfer them to NARA within eight days of doing so angered agencies, and its 2008 decision not to take a Web snapshot (i.e., a one-time copy) of federal agency sites at the end of President George W. Bush’s second term aroused public concern.

Susan Sullivan (NARA) pointed out that in 2004 NARA had contracted with the Internet Archive to copy publicly accessible federal government Web sites that it had identified and to provide access to the copies, then explained the rationale for NARA’s 2008 decision: it has determined that Web records are subject to the Federal Records Act and must be scheduled and managed appropriately. It issued Guidance on Managing Web Records in January 2005 and has since offered a lot of training and assistance to agencies; some of this information is available on NARA’s Toolkit for Managing Electronic Records, an internet portal to resources created by NARA and many other entities.

Sullivan emphasized that snapshots are expensive, have technical and practical shortcomings, and encourage the agency misperception that NARA is managing Web records. In fact, there is no authoritative list of federal government sites, which means that snapshots fail to capture at least some sites. Moreover, snapshots capture sites as they existed at a given point in time, cannot capture Intranet or “deep Web” content, and are plagued by broken links and other technical limitations. In sum, snapshots do not document agency actions or functions in a systematic and complete manner.

NARA is still copying Congressional and Presidential Web sites, which are not covered by the Federal Records Act. Although these snapshots have all of the problems outlined above, NARA regards them as permanent.

Abbie Grotke (LC) then outlined LC’s response to NARA’s 2008 decision: in partnership with the Internet Archive, the California Digital Library, the University of North Texas, and the Government Printing Office, it opted to take snapshots of publicly accessible federal government sites. All of the partners seek to collect and preserve at-risk born-digital government information, and all of them believed that the sites had significant research potential.

The partners developed a list of URLs of publicly accessible federal government sites in all three branches of government; they placed particular emphasis on identifying sites that were likely to disappear or change dramatically in early 2009. They then asked a group of volunteer government information specialists to identify sites that were out of scope (e.g., commercial sites) or particularly worthy of crawling (e.g., sites focusing on homeland security). This process ultimately yielded a list of approximately 500 sites.

The partners took a series of comprehensive snapshots and a number of supplemental snapshots focusing on high-priority sites. Much of this work centered on two key dates -- Election Day and Inauguration Day -- but some copying is still taking place.

Grotke outlined the project’s challenges, which will be familiar to any veteran of a multi-institutional collaborative project. The partners had no official funding for this project and thus have had to divert staff and resources from day-to-day operations. They have also had a difficult time managing researcher expectations: users want immediate access to copied sites, but the indexing process is time-consuming. The partners have also had to accept that, owing to the technical limitations of their software and the possibility that some sites escaped their notice, they could not fully capture every federal government site.

The snapshots have nonetheless captured a vast quantity of information that might otherwise be lost, and the project is also paving the way for future collaborations.

Thomas Jenkins (NARA) then explained how Web sites fit into NARA’s three-step appraisal process, which is guided by Directive 1441 (some of which is publicly accessible):
  • Data gathering. When appraising Web sites, an archivist visits each site and analyzes the information found on it, interviews agency Web administrators, assesses the recordkeeping culture of the creating agency, and determines how the site’s content relates to permanent records in NARA’s holdings.
  • Drafting of appraisal memorandum. The archivist prepares a detailed report that assesses the extent to which the site documents significant actions of federal officials, the rights of citizens, or the “national experience.” The report also examines the site’s relationship to other records identified as permanent (i.e., is the Web site the best and most comprehensive source of information?)
  • Stakeholder review. Each appraisal memorandum is circulated within NARA and then published in the Federal Register in order to solicit agency and public input.
Using a site created by the U.S. Department of Justice as an example, Jenkins highlighted how this process works and why NARA ultimately determined that this site, which contains only a fraction of the information contained within other series deemed archival, did not warrant permanent retention. In contrast, NARA has determined that the site of the U.S. Centennial of Flight Commission warrants permanent preservation because it contains significant information not found in other series.

In response to a comment concerning whether Web snapshots capture how an agency presents itself to the public, Jenkins stated that NARA assesses whether the information presented on a given site is unique. Moreover, NARA is aware that other entities are crawling federal government sites. Although there is a risk that this crawling activity will cease, a risk analysis indicated that archival records and other sources of information amply document the agency’s activities.

Although this session illuminated how and why NARA and LC reached such sharply contrasting decisions and highlighted some resources that somehow escaped my attention, it underscored precisely why the profession hasn't reached any sort of consensus and is unlikely to do so in the near future. Many if not most state and local government archives lack the degree of regulatory authority afforded by the Federal Records Act, and as a result many of them will not want to rely upon the kindness of site creators. Archivists working in repositories with broad collecting missions may have great difficulty ensuring that creators properly maintain, copy, and transfer site files. Moreover, some archivists will doubtless differ with NARA's conclusion that documenting how site creators presented themselves to the public is not sufficient reason to take periodic Web site snapshots or otherwise preserve sites comprehensively. As a result, many of us will likely find LC's approach to federal government sites or NARA's handling of Congressional and Presidential Web sites more relevant to our own circumstances than NARA's treatment of executive-branch agency sites.

Thursday, April 23, 2009

2009 Best Practices Exchange: call for proposals

The University at Albany, SUNY has issued the call for proposals for the 2009 Best Practices Exchange. It's making its way onto listservs targeting state government electronic records archivists, digital librarians, electronic records managers, and IT professionals. Just in case you haven't seen it yet, here it is . . . .


We are seeking proposals for sessions to be presented at the 4th annual Best Practices Exchange (BPE), which will be held in Albany, New York, at the University at Albany, SUNY, on September 2-4, 2009. The BPE is a conference that focuses on the management of digital information in state government, and it brings together practitioners to discuss their real-world experiences, including best practices and lessons learned. The theme of this year's BPE is "Tackling Technology Together." Its focus will be on collaboration between and within branches of state government, and between librarians, archivists, records managers, information technology professionals, and others concerned with managing state digital assets.

This year's conference has four tracks. Each track is enumerated below, along with a list of themes embraced by each track. We ask that potential speakers be guided, but not limited, by the themes indicated. Each session will be 90 minutes long.

1) Finding Funding: securing support, developing a marketing strategy, unexpected funding sources, and advocacy

2) Creative Collaboration: finding common ground, a seat at the table, and unexpected partners; crossing professional boundaries; fostering leadership; building communities; and sustaining collaboration

3) Educating Each Other: learning new technical skills and new "soft" skills, learning each others' language, and ensuring professional development

4) Living Without Closure: morphing from project to program, defining "finished," planning for an unknown future, finding new uses for old ideas and tools, and managing change

Please send all session proposals to Brian Keough, Head of the M.E. Grenander Department of Special Collections and Archives, University at Albany, SUNY, bkeough[at]uamail.albany.edu. The deadline for submission is July 15, 2009.

Catching up

Bluestone Lake, Summers County, West Virginia, 20 April 2009.

I was planning to spend this evening blogging about the remainder of the Spring 2009 MARAC sessions that I attended, but I'm still trying to catch up on all of the work and personal stuff that accumulated while I was at MARAC and with family in Mercer County, West Virginia. As a result, I'll likely need to wait until the weekend to catch up on all of my planned MARAC posts; all of the sessions were really good, and all of them warrant more thought than I'm capable of mustering at this time.

Wednesday, April 22, 2009

MARAC: There and Back Again: Nazi Anthropological Data at the Smithsonian

I wrote this post during a long layover at the Detroit Metro Airport on 21 April 2009, and finished around 8:35 PM, but simply wasn't prepared to pay $8.00 for the privilege of accessing DTW's wireless connection.

I attended this session simply because the topic seemed interesting, and I’m glad I did: the records at the center of this session are inherently interesting (albeit in a disturbing sort of way) and have a complicated, transnational provenance, and processing them, reformatting them, and determining where they should be housed posed real challenges. Although most of us will never encounter a situation quite as complex, many of us eventually encounter records of uncertain or disputed provenance, materials that lack discernable order, or multi-stage reformatting projects. The decisions that the Smithsonian made and the lessons that it learned thus ought to be of interest to many archivists.

The records in question were created by the Institut für Deutsche Ostarbeit (IDO; Institute for German Work in the East), which the Nazis created in 1940 to settle all questions relating to occupation of Eastern Europe. Edie Hedlin (Smithsonian Institution Archives), Beth Schuster (Thomas Balch Library), and Ruth Selig (Smithsonian) took turns discussing the records’ complicated custodial history and the Smithsonian’s involvement with them.

The IDO had many sections, including one that focused on “racial and national traditions” and researched Polish ethnic groups; however, apart from one study completed in the Tarnow ghetto, the IDO’s racial and national section did not study Jews. The section gathered or created data forms (e.g., personal and family histories), photographs of people and objects, and bibliographic and reference cards, and it published articles based on some of this research.

U.S. and British troops captured the IDO’s records in 1945, and the U.S. Army brought the records to the United States in 1947. The War Department’s intelligence division and the Surgeon General’s medical intelligence unit went through the records (in the process destroying whatever original order may have existed) and then offered them to the Smithsonian. The Smithsonian accepted the records, but then transferred some of them to the Library of Congress, the National Gallery of Art, and the Pentagon (which then sent some of the records to the National Archives). As a result, there are small pieces of the collection all over Washington, DC.

The IDO records held by the Smithsonian were not used for research until 1997, when a cultural anthropologist reorganized some of them, created the collection’s first detailed finding aid, and eventually published a book based on her research.

In 2003, the Polish Embassy requested that the IDO records be returned to Poland. It took the Smithsonian about two years to figure out how to respond to this request, and its response was the product of repeated consultation between various units of the Smithsonian, the State Department’s Holocaust studies unit, and the Library of Congress, which had received competing requests from the German and Polish governments for materials that had been created by German authorities but which concerned Poland; the State Department, which noted that the Smithsonian’s decision might set a precedent, wanted the governments to reach some sort of agreement concerning the materials in LC’s possession.

In order to determine how it would respond to the Polish government’s request, the Smithsonian set up a task force that examined:
  • Accepted archival principles and guidelines;
  • Whether the U.S. Army had acted legally when it took the records and gave them to the Smithsonian;
  • Whether the other Allied nations had any legal claim to the records;
  • The Smithsonian’s authority to acquire, hold, and de-accession archival collections;
  • The records’ unique characteristics and potential research uses;
  • Whether various other parties—the U.S. Army, the Bundesarchiv and other German government agencies, the U.S. National Archives and Records Administration, the U.S. Holocaust Memorial Museum, the Polish government, and the U.S. State Department—had any interest in the records;
  • The impact of any precedents that the Smithsonian’s actions would establish upon the Smithsonian itself, the Library of Congress, the Hoover Institution (which holds most of the records of the Polish government in exile), and U.S. government agencies.
The process of determining whether other parties had any interest in the records required tact and discretion. However, the Smithsonian eventually determined that neither the U.S. Army nor the Bundesarchiv objected to returning the records to Poland, and the State Department, which was extremely helpful throughout the process, determined that the German government had no interest in the records.

In September 2005, the Smithsonian decided that it would make copies of the records and then transfer the originals to the Jagiellonian University Archives, which agreed to make them publicly accessible. It opted to digitize the records and then produce microfilm from the scans, and needed to raise a lot of money to do so. It initially requested funding from a private foundation, which deferred giving an answer for approximately a year. When the Polish Embassy inquired about the status of the project, the Smithsonian seized the opportunity to cc: approximately 20 other people and institutions in its response. As a result of this e-mail exchange, the U.S. Holocaust Memorial Museum offered funding for digitization and for conservation and allowed the Smithsonian to use its standing digitization contract; the Polish university to which the records were headed also offered some support.

The Smithsonian engaged Schuster, an archival intern fluent in German, to process the records and oversee their digitization. Schuster humidified, flattened, and cleaned the records, which were trifolded and covered in coal dust and other contaminants, and rehoused them in boxes suitable for A4-sized paper. She imposed order upon them, which was no small challenge. The anthropologist who prepared the initial finding aid had attempted to arrange the records geographically; however, she was chiefly interested in the IDO’s Tarnow ghetto and Krakow studies, and as a result most of the collection was unarranged. Schuster ultimately organized the records by type. In order to preserve the initial arrangement of the records (which was reflected in the anthropologist’s published citations), she created an Access database that tracked the original and new order of each document in the collection and generated container lists that contained crosswalks between the two arrangements.
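A crosswalk of the kind Schuster describes can be sketched in a few lines of Python. This is only an illustration, assuming a SQLite table in place of her Access database; the table name, the column names, the function names, and the sample identifiers are all hypothetical and are not drawn from the Smithsonian's actual database.

```python
import sqlite3

def build_crosswalk(rows):
    """Load (doc_id, old_box, old_folder, new_box, new_folder) tuples
    into an in-memory table. All names here are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crosswalk ("
        "doc_id TEXT PRIMARY KEY, "
        "old_box INTEGER, old_folder INTEGER, "
        "new_box INTEGER, new_folder INTEGER)"
    )
    conn.executemany("INSERT INTO crosswalk VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def container_list(conn):
    """List documents in their new order, noting where each one used to be."""
    cur = conn.execute(
        "SELECT doc_id, old_box, old_folder, new_box, new_folder "
        "FROM crosswalk ORDER BY new_box, new_folder, doc_id"
    )
    return [
        f"Box {nb}, Folder {nf}: {doc} (formerly Box {ob}, Folder {of})"
        for doc, ob, of, nb, nf in cur
    ]

def find_by_old_location(conn, old_box, old_folder):
    """Resolve a citation to the old arrangement into new locations."""
    cur = conn.execute(
        "SELECT doc_id, new_box, new_folder FROM crosswalk "
        "WHERE old_box = ? AND old_folder = ? ORDER BY doc_id",
        (old_box, old_folder),
    )
    return cur.fetchall()

# Made-up sample data, purely for illustration.
sample_rows = [
    ("IDO-0001", 3, 12, 1, 1),
    ("IDO-0002", 1, 4, 2, 7),
    ("IDO-0003", 3, 12, 1, 2),
]
```

A researcher following one of the anthropologist's published citations would use something like find_by_old_location to get from the old box and folder to the new ones, while container_list produces the side-by-side listing for the finding aid.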

Schuster also shared a couple of lessons she learned during the digitization phase of the project:
  • Digitization should begin only after a collection is completely conserved and reprocessed. Project deadlines led the Smithsonian to start digitizing as soon as possible, and as a result, the image files had to be renamed after processing.
  • Do not underestimate the amount of time and effort needed for good quality control. The Smithsonian needed accurate, complete surrogates and to ensure that every original had been scanned, and as a result Schuster needed to examine each image and count the number of pages in each folder. She had to send back to the vendor many originals that were scanned crookedly or were missed, and she has a jaundiced view of outsourcing as a result.
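The page-counting half of that quality control work lends itself to a simple script. The sketch below is my own illustration, not anything the Smithsonian used: it assumes that someone has hand-counted the pages in each folder and that the vendor's image files are named '<folder>_<page>.tif', which is a hypothetical naming convention. It catches missing or extra scans only; crooked scans still need human eyes.

```python
from collections import Counter

def find_scan_gaps(expected_pages, image_names):
    """Compare hand-counted pages per folder with the images received.

    expected_pages: dict mapping a folder label to its hand-counted pages.
    image_names: file names assumed (hypothetically) to look like
                 '<folder>_<page>.tif'.
    Returns a dict mapping each mismatched folder to (expected, found).
    """
    # Everything before the last underscore is treated as the folder label.
    found = Counter(name.rsplit("_", 1)[0] for name in image_names)
    gaps = {}
    for folder, expected in expected_pages.items():
        if found.get(folder, 0) != expected:
            gaps[folder] = (expected, found.get(folder, 0))
    # Images that belong to no counted folder are flagged as well.
    for folder, count in found.items():
        if folder not in expected_pages:
            gaps[folder] = (0, count)
    return gaps
```

An empty result means that the counts agree, not that the scans are good.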
The project wrapped up in late September 2007, when the records were sent to Poland via diplomatic pouch; however, Schuster continued to rename the image files and correct the finding aid, and the Smithsonian finished producing microfilm from the digital surrogates in April 2009. The transfer deeply pleased the Polish government: within a few months of the transfer, it tracked down people who had taken part in IDO studies as children and completed a short film highlighting their recollections.

Ruth Selig concluded by making a very important point: the transfer was successful because the Smithsonian committed to working through a complicated process in a very deliberate, step-by-step manner. Many different institutions were brought together in interesting and unanticipated ways, and everyone was pleased with the outcome. Even the State Department was pleased; the initial request was technically issued by Jagiellonian University and directed to the Smithsonian, which is not a government agency, so the Smithsonian’s transfer decision really isn't precedent-setting.

All in all, a good session full of practical tips for dealing with a wide array of complex issues.

Saturday, April 18, 2009

Radio silence . . . for a while

I'm heading off to the Internet-free zone of my relatives' home for a few days, so I won't have the chance to post anything else about MARAC until sometime next week. Look for a flurry of posts when I get back. . . .

MARAC: A Peak at the Portfolio

Last night, I broke one of the vows I made to myself when I started this blog: I didn’t write about all of the conference sessions I attended on the same day that I attended them. I stayed in the hospitality suite a little too long, and by the time I got back to my room I wasn’t in the mood to do anything more than tweak what I had written about Ken Thibodeau’s plenary address. However, I’m not feeling too remorseful: I really enjoyed getting the chance to talk with archivists I rarely see at places other than MARAC.

Moreover, I’m spending a leisurely afternoon in Charleston while waiting for my parents, who are en route from Ohio, to meet me here; we’re going to spend the next few days visiting relatives in Mercer County, West Virginia. As I write this, I’m sitting at a window table at Taylor Books, Charleston’s gem of an independent bookstore, café, and gallery. I can’t think of a better place to atone -- at least in part -- for last night’s fall from grace . . . .

Yesterday morning, I sat in on A Peak Inside the Portfolio, which focused on several initiatives that may facilitate the preservation of electronic records.

Don McLaughlin of West Virginia University discussed the SLASH2 project, and in the process gave a terrifying overview of the exponential increase in the amount of scientific data being generated (e.g., the Large Hadron Collider at Switzerland’s CERN generates approximately 1 GB of data per second) and scientists’ need to preserve this data for future reuse. It’s simply not possible to build a single centralized resource that can store, preserve, and support analysis of immense datasets, so people who work with and preserve the data must spend a lot of time moving data across multiple, often geographically distributed systems and dealing with corrupted files, bad disk drives and tapes, and other problems. SLASH2, a data management system developed by the Pittsburgh Supercomputing Center (PSC) and currently being tested by PSC and West Virginia University, will automate much of this work and should improve system performance.

As the archivist daughter of an engineer, I was really taken with the presentation given by Victor Mucino of West Virginia University. Mucino, who is an engineer, is examining the IDEF0 standard and other options for adding contextual information to STEP and other current standards for the exchange of electronic engineering and design information. These standards can express how a given thing can be produced, but they don’t explain why it was produced as it was; for example, the size of a key piece of the new Robert C. Byrd (!) Telescope in Green Bank, WV was determined by the size of the smallest tunnel between the site of its production and Green Bank, but the telescope’s design documentation omits this basic fact. Design records expressed in these standards also omit the results of failure analyses and other tests. Mucino is exploring how STEP and other standards can be expanded to include this sort of contextual information, which facilitates troubleshooting, subsystem design and replacement work, and subsequent innovation.

Richard Marciano of the Data Intensive Cyber Environments (DICE) Group at the University of North Carolina-Chapel Hill discussed two projects that make use of the DICE Group’s iRODS (Integrated Rule Oriented Data System) data grid management system: the Transcontinental Persistent Archive Prototype, which is a partnership with the National Archives and Records Administration (NARA), and Distributed Custodial Archival Preservation Environments (DCAPE), which seeks to develop a digital preservation service for state archives and state university archives (and in which I’m actively involved). iRODS assumes that collaborators are at multiple sites and have different policies, storage systems, and naming conventions, and it makes it possible to store and access data in any format, on any type of storage system, anywhere on a wide area network. It also allows users to specify high-level policies governing management of data, then breaks down those policies into rules that can be followed by computers and microservices that execute the rules. Richard concluded by noting that iRODS is both top-down and consensus-driven: any community that wishes to use iRODS needs to get together, determine its data management, preservation, and access policies, and translate these policies into rules and microservices that iRODS can understand.
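
To make the policy-to-rule-to-microservice idea a little more concrete, here is a toy sketch. It is emphatically not iRODS code and does not use the iRODS rule language or any iRODS API; the event name, rule table, and function names are all hypothetical. It only illustrates the general pattern Richard described: a human-readable policy ("every ingested file gets a checksum and two copies") becomes a rule tied to an event, and the rule is carried out by small, single-purpose functions.

```python
import hashlib

REPLICA_COUNT = 2  # hypothetical policy value: keep two copies of everything


def compute_checksum(record):
    """Microservice: record a SHA-256 checksum of the file's content."""
    record["checksum"] = hashlib.sha256(record["content"]).hexdigest()


def schedule_replicas(record):
    """Microservice: note how many copies the policy requires.

    A real system would copy the data to other storage; this only
    records the intent.
    """
    record["replicas"] = REPLICA_COUNT


# Rules: each event maps to the ordered list of microservices to run.
RULES = {
    "on_ingest": [compute_checksum, schedule_replicas],
}


def apply_rules(event, record, rules=RULES):
    """Run every microservice registered for the given event, in order."""
    for microservice in rules.get(event, []):
        microservice(record)
    return record


# Hypothetical usage.
record = apply_rules("on_ingest", {"name": "minutes.pdf", "content": b"example"})
print(record["checksum"], record["replicas"])
```

The point of the separation is that a community can argue about the policy in plain language and change the rule table without rewriting the functions that do the work.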

After Richard’s presentation ended, Mark Conrad of the National Archives, who moderated the session, made a really important point: every archives has policies, but articulating those policies is a real challenge, particularly when electronic records are involved. He’s absolutely right. Working on the DCAPE and PeDALS projects and dealing with some things that have recently come up at work have really driven home the importance of defining and documenting policies governing the processing of electronic records. We’re so comfortable with paper records that we don’t often question our processing, description, etc., practices -- or explain to students or paraprofessional staff the underlying rationale for their assignments. This is not good practice, and Mark is absolutely right that we as a profession need to devote ourselves to documenting our policies.

Friday, April 17, 2009

MARAC: plenary session

The Spring 2009 meeting of the Mid-Atlantic Regional Archives Conference started this morning with the plenary address, which was delivered by Ken Thibodeau, director of the Electronic Records Archives (ERA) initiative at the U.S. National Archives and Records Administration (NARA). He sought to put ERA in an archival context, and two main threads -- the exponential increase in the volume and complexity of federal electronic records and NARA’s efforts to avoid a catastrophic system design failure akin to those experienced by other agencies and numerous software firms -- shaped his presentation.

Thibodeau noted that for more than two decades, NARA’s technical capacity was limited to copying data from one tape to another and generating printouts of the data. However, the volume of electronic records transfers to NARA continually increased, and legal requests for Oliver North’s e-mail and the impending departure of President Clinton revealed that this approach was no longer workable: NARA estimated that tapes transferred by the Clinton White House would reach the end of their lifespan before NARA could finish copying all of the data on them to newer media. The ERA initiative was born out of this realization.

After outlining differences in how ERA will handle executive branch agency and Presidential records, Thibodeau discussed how ERA has altered NARA’s workflows, and it seems as if NARA made some really smart decisions during the design process; during Thibodeau's talk, I started contemplating how we might streamline some of our own workflows. ERA will manage the scheduling and transfer of all records, regardless of format, and reduce the number of forms that agencies must complete. It will also require agencies to supply additional information about records, particularly those in electronic format, and specify how and when permanent records will be transferred to NARA; at present, agencies don’t have to do so. ERA will also support the online transfer of electronic records and manage all of the metadata documenting the acquisition, processing, and dissemination of electronic records. In the future, it will also support review and redaction and the long-term preservation of electronic records.

In assessing the likelihood of ERA’s success, Thibodeau underscored a key point: NARA has been extremely consistent about its expectations for the system. It outlined its requirements in a planning document that has not been changed substantially since 2003, and as a result has been able to focus on securing what it wants; the FBI’s Virtual Case File project and several other large-scale system design initiatives failed in part because managers could never settle on core requirements. Although the order in which some ERA components will be rolled out has changed, NARA is on track to acquire all of the functionality outlined in the 2003 planning document by the planned date of 2012.

Thibodeau concluded by discussing the broader impact of ERA, which he admitted might not be as dramatic or as wide-ranging as some in the archival community had initially hoped. Although it might be legally possible for state and local governments to use ERA, they would have to adopt NARA’s scheduling process and hire someone to integrate ERA’s off-the-shelf components into their existing IT environments. However, it might be possible for them to use ERA as a preservation service, and NARA may develop open-source preservation tools that others could employ. In addition, NARA’s involvement in the federal government’s IT procurement process will help to ensure that archival concerns are increasingly reflected in system design.

All in all, a good, thought-provoking talk.

Thursday, April 16, 2009

A day in Charleston, West Virginia

I'm in Charleston for the Spring 2009 meeting of the Mid-Atlantic Regional Archives Conference, which starts tomorrow, but when I was making travel arrangements I decided to arrive a day early so that I could spend a little time exploring the city. I've passed through Charleston countless times while traveling to see my mother's family in Mercer County, but I've never actually stopped to see the city itself, and I decided to seize the opportunity to do so.

The conference hotel is on the banks of the Kanawha River in the heart of downtown Charleston, and I spent the morning taking an impromptu walking tour.

I first headed to the Kanawha County Courthouse, which was built in 1892 and sits on the corner of Court and Virginia Streets. Its masonry construction and Romanesque Revival style are common features of downtown Charleston's streetscape.

The much newer Robert C. Byrd United States Courthouse is also on Virginia Street, roughly opposite the county courthouse. Senator Byrd has been a tireless champion of West Virginia, and has used his power and influence to bring federal dollars to the state. As a result, many, many things in West Virginia are named after Senator Byrd or his late wife; in fact, earlier today, Senator Byrd was in Huntington for the dedication of the Erma Ora Byrd Center for Educational Technologies at Marshall University.

The city's Municipal Auditorium, which has been restored to its Art Deco glory, sits on Virginia Street next to the Byrd Courthouse.

The gleaming domes of St. George's Orthodox Cathedral, which is on Court Street a few blocks away from the courthouses, fit quite well with the adjacent glass-box skyscraper.

Taylor Books is a really nice independent bookstore, gallery, and cafe situated on Capitol Street, in the heart of the Downtown Charleston Historic District. I pumped a little money into the local economy while here.

Capitol Street was home to the state capitol (which has moved around quite a bit) between 1885 and 1921, when the building burned to the ground. All that remains at the site is a piece of the stonework from the building's portico.

This sleekly modern Stone and Thomas building was on the West Virginia Department of History and Culture's 2005 list of the state's most endangered historic sites. The Stone and Thomas department store chain was a West Virginia institution, and it vanished in 1998 when the Ohio-based Elder-Beerman chain purchased Stone and Thomas and renamed all of the stores.

Downtown Charleston is home to many, many historic churches, but the mid-day sun made it difficult to get decent pictures of many of them. The First Presbyterian Church, which sits on Leon Sullivan Way (formerly Broad Street), was one of the few that I photographed well. If you would like to see pictures of these houses of worship, all of which are really lovely, check out the Wikipedia page devoted to the National Register of Historic Places listings in Kanawha County, West Virginia; another MARAC attendee is devoting himself to populating this page with images.

The Masonic Temple, which sits at the corner of Virginia and Hale streets, is a gleaming neo-Gothic beauty.

In the afternoon, I took a guided tour -- just for MARAC folks -- of the State Capitol and the West Virginia Cultural Center, which are situated a couple of miles to the east of downtown Charleston. I have countless girlhood memories of seeing the dome of the Capitol (then all gold) from I-77, and I'm elated that I got to see it up close.

The interior of the Capitol dome is as lovely as the exterior.

Unlike many state capitols, the West Virginia State Capitol is relatively devoid of murals and other artwork, which is confined to select areas of the building. As a result, the building, which was designed by Cass Gilbert, has an elegant simplicity that is really appealing. The hall leading to the Senate chamber exemplifies the understated opulence of the place.

A statue of U.S. Senator Robert C. Byrd occupies a prominent position in the capitol rotunda. The head is disproportionately large. Is this some sort of sly commentary, or a sign of the artist's limitations?

As we were preparing to enter the Governor's press room in the Capitol, West Virginia's First Lady, Gayle Conelly Manchin, walked past our tour group and greeted us. A few minutes afterward, we went to the Governor's Mansion. The Governor and First Lady live in the upper floors of the building, but the first floor and part of the second floor are open to the public.

One of the most notable features of the interior is the double staircase leading from the Reception Hall to the second floor.

Our tour ended at the West Virginia Cultural Center, which houses the State Archives, State Library Commission, and State Museum. The State Museum is in the final stages of a mammoth renovation, and we got a behind-the-scenes tour of the new exhibits, which are slated to open on 20 June of this year. My pictures really don't do them justice, so I'm not posting any of them; however, the State Museum has an interactive update on the progress of the renovation on its site. The people of West Virginia are going to have a wonderful new museum.

One of the memorials on the Capitol grounds honors West Virginia's coal miners. It's a nicely executed statue, and one that resonates (even if there is a badly-placed lamp post behind it): my grandfather and two of my uncles were miners. However, later today, I learned that one of the plaques on the far side of the statue's base commemorates the mining technique known as mountaintop removal. It's basically strip mining on steroids, and it's widely used throughout Appalachia. The mining companies assert that mountaintop removal is the only economically feasible way to remove the coal from the earth, but many Appalachian residents bitterly oppose it because it destroys the landscape, contaminates wells, and may contribute to flooding problems.

I ended the day on the tranquil banks of the Kanawha River. Our hotel is opposite a small park, and there's a paved riverfront path that's at least a couple of miles long. It's the perfect place for an evening stroll.

Tuesday, April 14, 2009

Heading off to Charleston

I'm heading off to Charleston, West Virginia for the Spring meeting of the Mid-Atlantic Regional Archives Conference, so I probably won't do a lot of blogging during the next couple of days.

In the meantime, if the Susan Boyle craze has passed you by, you owe it to yourself to check out her staggering performance (embedding disabled) on Britain's Got Talent. When Boyle, who is 47, unemployed, and quite ordinary-looking, walked onstage, both the audience and the judges snickered at her. Then she began singing "I Dreamed a Dream" from Les Miserables, and moved them to tears. Susan Boyle is a beautiful woman and a gifted artist.

Monday, April 13, 2009

NYS e-records symposium available online

Last year, the New York State Archives co-sponsored a series of electronic records symposia that highlighted how local governments and state agencies were addressing various electronic records issues. I was able to attend and blog about one symposium, Taming the Wild Frontier: EDMS Implementations for State and Local Government, but my synopsis really didn't do the sessions justice. Happily, these sessions were taped, and anyone who has Flash Video Player installed on his or her computer can view the slides and listen to the speakers via the State Archives' Web site; closed captioning is available.

These presentations have actually been up on the State Archives site for quite some time, but I somehow managed to overlook them -- which is deeply embarrassing given that there's a prominent link to them on the home page! However, these presentations are so good that I'm willing to look a bit foolish in order to trumpet their existence. If you're interested in electronic document management systems, electronic records management, or recordkeeping in state and local government environments, by all means check them out.

Sunday, April 12, 2009

Spring comes to upstate New York

Although today was chilly and windy, spring has definitely arrived. Green shoots of grass are starting to appear, and when I took out the recyclables yesterday I was surprised to find half a dozen daffodils peeking through the detritus in the back yard. They were an unexpected and most welcome find.

Wherever you are and whatever your weather's like, I hope that you are having a nice Easter or Passover.

Saturday, April 11, 2009

Records of Guatemala's disappeared

Every now and then, I come across a story that really makes me think about the importance of archives -- and the dangers that sometimes confront archivists and other people committed to preserving the historical record and demanding accountability from their government.

Earlier today, the Washington Post published an article highlighting the Guatemalan government's efforts to provide access to records documenting the actions of the army and national police during the country's long and extremely bloody civil war. Over 200,000 people were killed during the conflict, and approximately 40,000 disappeared without a trace. The fates of many people who disappeared are documented in records that were created by the Guatemalan national police, who subsequently tossed the documents into a disused munitions depot and left them to rot there. Civilian authorities inadvertently discovered the records in 2005, and archivists then began cataloging and scanning them; to date, roughly 7.5 million of an estimated 80 million documents have been digitized.

Approximately two weeks ago, the government began making the scanned images publicly accessible. Some of the digitized records consist of photographs of arrested students and labor leaders, and others provide detailed directions about how to spy on people who were subsequently kidnapped and murdered.

Although relatives of people who disappeared or were killed are glad that these records are being made accessible, other people fear being named as informants or perpetrators -- and at least a few of the latter are intent on keeping old crimes buried. On 24 March, Sergio Morales, the Guatemalan government's human rights ombudsman, released a public report concerning the records. The next day, his wife was kidnapped, drugged, tortured for several hours, and then released; her release may have been an attempt to lure Morales to a secluded place where he could be killed, but he and his wife were reunited without incident.

Morales and the archivists are nonetheless pressing on: records the archivists have processed have led to the issuance of arrest warrants for several people, and given that they have processed less than 10 percent of the records in the archives, it's highly likely that their work will result in many more criminal cases.

Given the horrific experience of Morales's wife, I think it's fair to say that the archivists responsible for processing the Guatemalan police records are working in a fairly risky environment. Those of us who are fortunate enough to work in stable democracies speak quite frequently about the importance of archives in holding government accountable, but we tend to focus on corruption and general stupidity, not mass torture and murder. We would do well to keep in mind that the accountability concerns of colleagues in many countries do center upon torture and murder -- and they might well pay a very high price for upholding our profession's ideals.

It is possible that the archivists working on the Guatemalan police records will be able to finish their work without incident. As the article points out, the very survival of the archives is a sign that Guatemala is moving toward the rule of law. At the time of its discovery by civil authorities, the archives was guarded by Ana Corado, a police officer who had been given the task because she had spurned her superior's advances. She began trying to care for the files, and when her supervisor ordered her to burn the documents, she refused on the grounds that unauthorized destruction of records was against the law. Despite her defiance, Corado is still a police officer, which would not have been the case a short time ago:
"If this had happened 20 years ago, I wouldn't be alive," Corado said. "I would be disappeared."
Moreover, the Guatemalan government is also seeking to declassify military records documenting the army's campaigns against leftist guerillas, which often resulted in the indiscriminate slaughter of civilians. Although the army is resisting this effort, the government has established an archives to house these records and continues to press for their release.

Thursday, April 9, 2009

SAA Election

The deadline for voting in the 2009 SAA election is April 11 at 11:59 PM Eastern Time, which means that anyone who hasn't voted has, at the time of this writing, approximately 55 hours to do so!

If you were an individual or student SAA member in good standing as of 28 February of this year or were the primary contact of an institutional member in good standing as of 28 February, you're eligible to vote in this election.

The online ballot can be accessed here (use your SAA user ID and password to log in), and the candidates' statements are available here.

Trying to make up your mind? To the best of my knowledge, Kate at ArchivesNext is the only person who has posted detailed endorsements online. Some of my choices differ from hers, but, as usual, she clearly outlines the reasons for her choices. If you're on the fence, you might want to see what she has to say.

Tuesday, April 7, 2009

Bettina Schmidt-Czaia, Cologne Archives

I honestly don't know how I missed it, but on 27 March the Guardian Weekly published Bettina Schmidt-Czaia's first-person account of the collapse of the Historical Archive of the City of Cologne. Schmidt-Czaia is the director of the Historical Archive, and her matter-of-fact narrative is wrenching.

3 March 2009 began well for Schmidt-Czaia, who spent part of the morning reflecting upon the progress that she and her growing staff were making: they were improving the care of the collections and creating new programs and exhibits that raised the repository's public profile. However, shortly after lunch, an alarm went off, everyone was told to leave the building immediately, and everything literally fell apart:
I opened the door of the reading room on the ground floor, and was confronted with panic. People were running in different directions or frantically packing up their belongings . . . . The last thing I remember doing was shouting "Outside, everybody, outside!" And then I ran.

A few seconds after I reached the back door and ran out onto the street, the school yard behind our building collapsed. When I turned around I saw the upper part of the archive bursting into a big, brown cloud.

I screamed. A terrible pain grew in my stomach as I realised what was happening. Cultural remains, collected for much longer than hundreds of years, were being destroyed. And it was taking place within seconds.
Schmidt-Czaia discusses in detail the recovery effort taking place at the site and the impact of the collapse on the archive's donors and its users, which include not only scholars but also school groups, genealogists, and the many people who saw the archive's local history exhibits. Finally, she outlines her hopes for building a new archival facility in a different part of the city.

Schmidt-Czaia has all of the hallmarks of a truly top-notch archivist: love of the records in her care, deep concern for her staff, determination to carry on despite suffering inconceivable loss, and a clear vision for the future. No archivist worth his or her salt will fail to be humbled and moved by her simple, eloquent account of living through and responding to the unthinkable.

Friday, April 3, 2009

News from Cologne

Today, one month to the day after the collapse of the building housing the Historical Archives of the City of Cologne, city officials began looking for a suitable site for a new archival facility. It seems that the new archives will be situated in the city center.

The recovery effort at the site of the collapse will likely continue for several more months. Earlier today, the Express posted a gallery of images of archivists and emergency personnel painstakingly searching through the rubble.

Finally, residents of and firms located in Cologne have to date donated 135,000 euros for the victims of the collapse, which demolished several adjacent apartment buildings, and tomorrow charitable officials in the city will begin disbursing a second round of assistance. Two young men died as a result of the collapse, and a number of other people were left homeless.

Wednesday, April 1, 2009

Security in Archives and Manuscript Repositories workshop

On Monday, I attended a one-day version of SAA's Security in Archives and Manuscript Repositories workshop that Mimi Bowling and Richard Strassberg developed specifically for New York State Archives, New York State Library, and New York State Museum staff. About fifteen months ago, we learned that the Archives and Library were victims of a major internal theft, and this special workshop is part of our ongoing response to this discovery.

Sadly, theft of some sort is something that every archivist will likely cope with at some point in his or her career, and I haven't met an archivist who wasn't changed, in ways large and small, good and bad, by the experience of recovering from a theft. Mimi and Richard, who are both nationally recognized security experts and incredibly helpful and practical people, offer a wealth of information about reducing the risk of theft and how to respond when a theft comes to light.

I don't want to go into a ton of detail about the workshop, largely because I think that every archivist should take it, but I do want to share a few thought-provoking things that Richard and Mimi discussed:
  • Theft and sale of cultural materials is global and extremely profitable; only drug trafficking and computer crime are more lucrative.
  • Although there are no studies specifically focusing on archives and manuscript repositories, a 2001 FBI study of art museum thefts found that museum personnel were responsible for 82% of them. Law enforcement is aware of this study and will invariably treat archives and manuscript repository staff as prime suspects. Good security policies and practices enable investigators to clear innocent people as quickly as possible; they also minimize thefts of opportunity.
  • Unlike public library staff, who are often trained to deal with people who are upset or attempt to destroy or steal materials, archivists and manuscript curators generally don't know how to confront suspected thieves or vandals. We need written policies and training that will enable us to handle such situations lawfully and effectively.
Richard and Mimi also highlighted some resources and recent developments that should be of interest to the security-minded archivist (i.e., every archivist):
  • The Rare Books and Manuscripts Section (RBMS) of the Association of College and Research Libraries maintains an online listing of known thefts of library and archival materials. It's sobering reading.
  • A few weeks ago, RBMS issued the final draft of its Guidelines Regarding Security and Theft in Special Collections. Although Mimi and Richard differ with some of the advice found in this publication, they do see it as a valuable resource.
  • A bill (H.R. 1166) that would make it a felony to sell stolen property via the Internet is currently making its way through the House of Representatives. Although this legislation is being pushed by big-box stores concerned about the theft and subsequent resale of electronics and other big-ticket items, it could be used to prosecute people who steal cultural heritage materials and sell them on eBay, Amazon, etc.
  • It's still difficult to convince prosecutors that crimes against cultural heritage materials warrant serious penalties. In particular, theft and destruction of public library materials is generally regarded as a minor matter; at least in New York State, prosecution is generally contingent on the cash value of the materials, which means that these crimes are typically treated as misdemeanors. Unless the library community and its friends mobilize and agitate for change, this situation likely won't improve.
Take this workshop. You won't regret it.