Wednesday, April 20, 2011

Installing Archivematica

Last week, my intrepid colleague Michael and I started playing around with Archivematica, the first open-source, Open Archival Information System Reference Model-compliant digital preservation system that can be installed on a desktop computer; it's fully scalable, so it also works well in a large-scale Linux server environment. Archivematica, which is being developed by Artefactual Systems in collaboration with the UNESCO Memory of the World's Subcommittee on Technology, the City of Vancouver Archives, the University of British Columbia Library, the Rockefeller Archive Center, and several other collaborators, is still in alpha testing mode, but it integrates a lot of open-source digital preservation tools, including BagIt, JHOVE, and the Metadata Extraction Tool developed by the National Library of New Zealand, and it uses PREMIS, METS, Dublin Core, and other widely used metadata standards.

My intrepid colleague Michael and I have wanted to play around with Archivematica for some time, and last week we finally got around to downloading and installing it. The process went a lot more smoothly than we anticipated -- in large part because we read Angela Jordan's candid Practical E-Records post about her experiences and Michael J. Bennett's detailed Archivematica installation instructions as well as some of the instructions provided on the Archivematica site -- but we did hit a few sticking points. I'm sharing what we learned in hopes of helping other archivists who are interested in experimenting with Archivematica.

Archivematica is designed to operate within an Ubuntu Linux environment, but Mac and Windows users can easily install a virtual appliance that makes it possible to set up an Ubuntu environment on their computers. We opted to install Oracle VirtualBox, which is recommended by Archivematica's developers, and we were both really impressed by the clearly written, logically organized, and complete instructions that accompanied the software. I've encountered a lot of bad installation instructions and user manuals, and it's always a pleasant surprise when I run across manuals produced by good, careful technical writers. However, the manual didn't mention one thing that we and Angela Jordan encountered: as you install VirtualBox on a Windows machine, Windows will repeatedly warn you that you are attempting to install non-verified software and ask you whether you're certain you want to do so. Be prepared to click through lots of dire dialog box warnings.

After we set up VirtualBox, we followed Michael Bennett's instructions for installing Xubuntu 10.04. The installation process was simpler than we anticipated -- we basically clicked through a setup wizard -- but we had to stop work for the day a few minutes after the installation was complete.

Installing Archivematica itself was a bit more challenging. It took us a little while to figure out that we really did have to install it via the Web; much to our dismay, copying the files on the Archivematica Launchpad onto a DVD -- something that we had done several days before -- and then installing Archivematica via the DVD simply doesn't work.

Moreover, Michael and I are both completely new to Ubuntu, so we were a bit flummoxed by the Ubuntu Repository Package instructions that appear on the Archivematica site. I did a little Googling and discovered that we had to access Ubuntu's command line interface to install Archivematica and that we could do so via Terminal. We also found Michael Bennett's step-by-step instructions, which highlight some trouble spots, really helpful. However, Bennett's instructions illustrate how to copy the installation commands from the Archivematica Web site and paste them into the Terminal interface, and for some reason we simply couldn't paste the text we copied into Terminal. We were a little pressed for time, so in lieu of troubleshooting our copy/paste problem, we opted to type all of the installation commands into Terminal -- and hit a few trouble spots of our own as a result.

We hesitantly entered the command to add the first Archivematica PPA, and were gratified to discover that it apparently worked: the screen displayed a few lines of text, the word "error" didn't appear anywhere, and we were prompted to enter another command. We ran the second Archivematica PPA command and the trio of archivematica-shotgun commands without incident, but we had real problems running the vmInstaller-environment.sh. After about half a dozen error messages, we figured out what we were doing wrong: our all-too-human minds led us to read "enviroment," the last element in the command, as "environment."

There is only one "n" in "enviroment"!
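
For anyone else who ends up typing the commands by hand, a slip like this can be located mechanically by comparing what was typed against the published text. The following is a minimal Python sketch and not part of the Archivematica instructions; the function name `first_difference` is hypothetical, and the two strings are simply the two spellings discussed above.

```python
def first_difference(expected: str, typed: str) -> int:
    """Return the index of the first differing character, or -1 if identical."""
    for i, (a, b) in enumerate(zip(expected, typed)):
        if a != b:
            return i
    # One string is a prefix of the other: the difference starts where the shorter ends.
    if len(expected) != len(typed):
        return min(len(expected), len(typed))
    return -1


expected = "enviroment"   # spelling used in the command, per the post
typed = "environment"     # what our minds wanted to type

pos = first_difference(expected, typed)
if pos >= 0:
    # Print both strings with a caret under the first mismatch.
    print(expected)
    print(typed)
    print(" " * pos + "^")
```

The caret lands under the seventh character, where the command has "m" and the dictionary spelling has "n".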

Entering the flock (i.e., file lock) call also posed a few problems. Because we were typing, not copying and pasting, the commands, we first had to figure out whether the five asterisks at the start of the call were separated by spaces; they are. Then we had to figure out how to access the end of the flock call, which is hard to see on the Archivematica Web site. Fortunately, M.J. Bennett's instructions revealed that the text was indeed there, and we could view it when we highlighted it.

The highlighted segment of the flock call reads: /sharedDirectory/watchedDirectories/quarantined" Note the presence of the quotation mark at the end.
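
Five space-separated asterisks at the start of a line look like standard cron schedule fields (minute, hour, day of month, month, day of week), where an asterisk means "every". Assuming that is what they are here, the Python sketch below shows why the spaces matter and how a missing closing quotation mark could be caught. The sample entry is a made-up placeholder, not the actual Archivematica flock call, and the helper names are hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_entry(line: str):
    """Split a cron-style line into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most five splits: five fields + the rest
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return dict(zip(CRON_FIELDS, parts[:5])), parts[5]


def has_balanced_quotes(command: str) -> bool:
    """True if double quotation marks in the command come in pairs."""
    return command.count('"') % 2 == 0


# Hypothetical placeholder entry; NOT the real Archivematica command.
sample = '* * * * * some-command "/hypothetical/path"'

schedule, command = split_cron_entry(sample)
print(schedule)                      # every field is "*", i.e. run every minute
print(command)
print(has_balanced_quotes(command))  # a dropped trailing quote would make this False
```

Typed without spaces, the asterisks collapse into a single field and `split_cron_entry` raises an error, which is the mistake we were worried about making.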

After we rebooted our Ubuntu virtual machine, we were able to access Archivematica without any problems . . . but had to shut it down immediately and make our way to a previously scheduled event.

Michael and I estimated that it took a total of about four hours to install VirtualBox, Xubuntu 10.04, and Archivematica, and I'm pretty sure that the fumbles outlined above and our repeated readings of various installation manuals took up approximately one hour of that time. Moreover, a lot of the Archivematica installation time was taken up by sitting around and waiting for the commands to execute -- be prepared to see many, many lines of text appear in Ubuntu's Terminal -- and we could have done a little light work (e.g., proofreading draft MARC records, completing travel paperwork) while waiting to enter the next command.

I'm out of the office at the moment and Michael's going to have to focus on other projects during the next couple of weeks, but we'll start experimenting with Archivematica as soon as we get the chance. In the coming months, I'll put up at least a couple of posts outlining our findings.

Tuesday, April 12, 2011

Digital Preservation Management workshop, Albany, NY, 5-10 June 2011

On June 5-10, the University at Albany, SUNY will host the Digital Preservation Management: Implementing Short-Term Solutions for Long-Term Problems workshop.

This workshop, which is a radically revised and expanded version of the workshop (with accompanying online tutorial) that Nancy McGovern and Anne Kenney developed at Cornell University in 2003, is aimed at managers at organizations of all kinds who are or will be responsible for managing digital content over time. Nancy McGovern will be the lead instructor, and three other instructors will teach sections of the workshop. Theresa Pardo of the Center for Technology in Government will deliver a keynote address.

The workshop will begin on the evening of Sunday, June 5, continue Monday-Thursday from 9:00 AM-5:00 PM, and conclude on Friday, June 10, at noon. The cost of registration is $950.00.

Additional information about the content and instructors is available at: www.icpsr.umich.edu/dpm/workshops/fiveday.html.

N.B.: Prospective attendees must submit an application. The application will be made available at http://www.regonline.com/DPMworkshop-Albany2011 at 1:00 PM ET on Wednesday, 13 April (i.e., tomorrow!) and will be available until all 24 workshop slots have been filled. Applicants will receive notification of acceptance or denial within five business days of applying, and successful applicants will be able to complete the registration process in early May.

Unfortunately, I won't be submitting an application: I'm not a manager, training funds are scarce these days, and I have a conflicting commitment. However, everyone who has attended one of these workshops has raved about the experience, and I obsessively studied the accompanying online tutorial when I first became an electronic records archivist. If you manage an archives, you should attend this workshop if at all possible. If you work with electronic records, you should strongly encourage your boss to attend this workshop.

Monday, April 4, 2011

"Newly found documents shed light on MLK's convicted killer"

Forged Canadian passport used by James Earl Ray, who was apprehended at Heathrow Airport in London on June 8, 1968 and extradited shortly afterward. Image courtesy of the Shelby County, Tennessee Register of Deeds.

I was perusing CNN's Web site this weekend, and the above headline jumped out at me; looking for records-related news is one of the occupational hazards of being an archivist and one of the chief avocational hazards of being an archivist blogger. I clicked the link, and my curiosity was amply rewarded.

Forty-three years ago today, civil rights leader Martin Luther King, Jr., was slain in Memphis, Tennessee, where he was lending his support to striking sanitation workers. James Earl Ray was apprehended approximately two months after the shooting and ultimately entered a guilty plea in order to avoid undergoing a jury trial.

Until recently, little was known about Ray's state of mind in the months following his arrest or the inner workings of the investigations into King's murder. However, several years ago, staff from the Shelby County, Tennessee Register of Deeds who were processing unidentified materials in the Shelby County Archives found a large bundle of photographs, documents, and other materials relating to Ray, who later recanted his confession and unsuccessfully sought a jury trial.

All of these materials -- crime scene photographs, letters between Ray and members of his family, audio files, court records, and records of local, state, and federal prosecutors and law enforcement agencies -- have been digitized and are now accessible via the Web site of the Shelby County Register of Deeds.

These records have a broader context that is reflected in the rich holdings of the King Collection at Morehouse College, the Dr. Martin Luther King, Jr. Archive at Boston University, and the Martin Luther King, Jr. Research and Education Institute at Stanford University -- and in countless collections that document the lives and work of civil rights activists and white supremacists, slaveowners, abolitionists, and slaves, and countless other Americans of all races, ethnicities, religions, and social and political beliefs.

These materials nonetheless add significantly to our understanding of the environment in which Martin Luther King, Jr. lived and worked, our knowledge of the man who was convicted of murdering him, and our understanding of how the police and the courts dealt with a murder that had a profound and lasting impact upon American society. (And to think that people -- including those who really should know better -- sometimes assume local government records are dull or inconsequential!) Thanks to Shelby County Register of Deeds Tom Leatherwood and his staff for devoting a lot of time, effort, and resources to making these important records widely available.