Wednesday, April 20, 2011

Installing Archivematica

Last week, my intrepid colleague Michael and I started playing around with Archivematica, the first open-source, Open Archival Information System Reference Model-compliant digital preservation system that can be installed on a desktop computer; it's fully scalable, so it also works well in a large-scale Linux server environment. Archivematica, which is being developed by Artefactual Systems in collaboration with the UNESCO Memory of the World's Subcommittee on Technology, the City of Vancouver Archives, the University of British Columbia Library, the Rockefeller Archive Center, and several other collaborators, is still in alpha testing mode, but it integrates a lot of open source digital preservation tools, including BagIt, the Metadata Extraction Tool developed by the National Library of New Zealand, and JHOVE, and it uses PREMIS, METS, Dublin Core, and other widely used metadata standards.

My intrepid colleague Michael and I have wanted to play around with Archivematica for some time, and last week we finally got around to downloading and installing it. The process went a lot more smoothly than we anticipated -- in large part because we read Angela Jordan's candid Practical E-Records post about her experiences and Michael J. Bennett's detailed Archivematica installation instructions as well as some of the instructions provided on the Archivematica site -- but we did hit a few sticking points. I'm sharing what we learned in hopes of helping other archivists who are interested in experimenting with Archivematica.

Archivematica is designed to operate within an Ubuntu Linux environment, but Mac and Windows users can easily install a virtual appliance that makes it possible to set up an Ubuntu environment on their computers. We opted to install Oracle VirtualBox, which is recommended by Archivematica's developers, and we were both really impressed by the clearly written, logically organized, and complete instructions that accompanied the software. I've encountered a lot of bad installation instructions and user manuals, and it's always a pleasant surprise when I run across manuals produced by good, careful technical writers. However, the manual didn't mention one thing that we and Angela Jordan encountered: as you install VirtualBox on a Windows machine, Windows will repeatedly warn you that you are attempting to install non-verified software and ask you whether you're certain you want to do so. Be prepared to click through lots of dire dialog box warnings.

After we set up VirtualBox, we followed Michael Bennett's instructions for installing Xubuntu 10.04. The installation process was simpler than we anticipated -- we basically clicked through a setup wizard -- but we had to stop work for the day a few minutes after the installation was complete.

Installing Archivematica itself was a bit more challenging. It took us a little while to figure out that we really did have to install it via the Web; much to our dismay, copying the files on the Archivematica Launchpad onto a DVD -- something that we had done several days before -- and then installing Archivematica via the DVD simply doesn't work.

Moreover, Michael and I are both completely new to Ubuntu, so we were a bit flummoxed by the Ubuntu Repository Package instructions that appear on the Archivematica site. I did a little Googling and discovered that we had to access Ubuntu's command line interface to install Archivematica and that we could do so via Terminal. We also found Michael Bennett's step-by-step instructions, which highlight some trouble spots, really helpful. However, Bennett's instructions illustrate how to copy the installation commands from the Archivematica Web site and paste them into the Terminal interface, and for some reason we simply couldn't paste the text we copied into Terminal. We were a little pressed for time, so in lieu of troubleshooting our copy/paste problem, we opted to type all of the installation commands into Terminal -- and hit a few trouble spots of our own as a result.

We hesitantly entered the command to add the first Archivematica PPA, and were gratified to discover that it apparently worked: the screen displayed a few lines of text, the word "error" didn't appear anywhere, and we were prompted to enter another command. We ran the second Archivematica PPA command and the trio of archivematica-shotgun commands without incident, but we had real problems running the command that followed. After about half a dozen error messages, we figured out what we were doing wrong: our all-too-human minds led us to read "enviroment," the last element in the command, as "environment."

There is only one "n" in "enviroment"!
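
For anyone else who ends up hand-typing commands, a character-by-character comparison against the documented text can surface this kind of slip faster than rereading. What follows is only a minimal Python sketch, not part of the Archivematica instructions; the helper name first_difference is made up for illustration.

```python
from itertools import zip_longest


def first_difference(expected, typed):
    """Return the index of the first character where the two strings differ,
    or None if they are identical."""
    for i, (a, b) in enumerate(zip_longest(expected, typed)):
        if a != b:
            return i
    return None


# The spelling used in the documented command versus what we kept typing.
documented = "enviroment"
typed = "environment"

pos = first_difference(documented, typed)
if pos is None:
    print("The strings match.")
else:
    print(f"First mismatch at character {pos + 1}:")
    print(f"  documented: {documented}")
    print(f"  typed:      {typed}")
    # Both labels above are 14 characters wide, so the caret lines up.
    print(" " * 14 + " " * pos + "^")
```

The same comparison works on a whole command line: paste the documented command into one string and the typed version into the other.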

Entering the flock (i.e., file lock) call also posed a few problems. Because we were typing, not copying and pasting, the commands, we first had to figure out whether the five asterisks at the start of the call were separated by spaces; they are. Then we had to figure out how to access the end of the flock call, which is hard to see on the Archivematica Web site. Fortunately, M.J. Bennett's instructions revealed that the text was indeed there, and we could view it when we highlighted it.

The highlighted segment of the flock call reads: /sharedDirectory/watchedDirectories/quarantined" Note the presence of the quotation mark at the end.
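
The five space-separated asterisks suggest that the flock call is a cron-style schedule line, that is, five schedule fields followed by a command; that reading is an assumption, since the instructions quoted here do not say so explicitly. Under that assumption, the two things that tripped us up (the spacing and the trailing quotation mark) can be checked with a rough Python sketch like the one below. The function names are hypothetical, and the sample line is a placeholder, not the real Archivematica entry.

```python
def split_cron_line(line):
    """Split a cron-style line into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most 5 splits: five fields plus the rest
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return parts[:5], parts[5]


def quotes_balanced(command):
    """Return True when the command has an even number of double quotes.

    This is a rough check: it does not account for escaped quotes.
    """
    return command.count('"') % 2 == 0


# Placeholder line for illustration only (NOT the real Archivematica entry).
example = '* * * * * echo "placeholder command"'

fields, command = split_cron_line(example)
print("schedule fields:", fields)
print("command:", command)
print("quotes balanced:", quotes_balanced(command))
```

If the asterisks are typed without spaces, as in `*****`, this placeholder line splits into too few parts and split_cron_line raises an error; leaving off the closing quotation mark makes quotes_balanced return False.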

After we rebooted our Ubuntu virtual machine, we were able to access Archivematica without any problems . . . but had to shut it down immediately and make our way to a previously scheduled event.

Michael and I estimated that it took a total of about four hours to install VirtualBox, Xubuntu 10.04, and Archivematica, and I'm pretty sure that the fumbles outlined above and our repeated readings of various installation manuals took up approximately one hour of that time. Moreover, a lot of the Archivematica installation time was taken up by sitting around and waiting for the commands to execute -- be prepared to see many, many lines of text appear in Ubuntu's Terminal -- and we could have done a little light work (e.g., proofreading draft MARC records, completing travel paperwork) while waiting to enter the next command.

I'm out of the office at the moment and Michael's going to have to focus on other projects during the next couple of weeks, but we'll start experimenting with Archivematica as soon as we get the chance. In the coming months, I'll put up at least a couple of posts outlining our findings.


Evelyn said...

Hi Bonnie,

As systems archivist on the Archivematica project I read your post with great interest! Thank you for highlighting some of the issues you encountered while installing the software. My colleagues and I will make some changes to address your comments...including changing the spelling of "enviroment".

Evelyn McLellan

Tundra2 said...

Hey Bonnie,

If your Archivematica install ever blows up or for whatever reason you need to re-install from scratch, no need to hand-type all of those Linux terminal entries. To get around the problematic copy/paste issue, just open Firefox within Ubuntu running in your VirtualBox VM, and you'll find that you can download my directions from there (to copy/paste from) and/or copy/paste from the AM wiki's instructions without a problem. That should help avoid some of the fumble fingered manual typing errors that you describe.

MJ Bennett

Jules Lauwerier said...

In case it might help: paste in a Terminal-window under Linux usually is done via CTRL-SHIFT-V and not by the more common CTRL-V

l'Archivista said...

Evelyn, M.J., and Jules: thank you for the tips!

card scanner said...

I was having some problems with Archivematica and don't know what the cause was. But then I reinstalled it from scratch, as you suggested, Tundra2, and it is working great. Thank you very much.