
Sunday, August 28, 2011

Practical Approaches to Born-Digital Records: Archivematica


Archivists mingle around a full-sized skeleton cast of Sue, the largest, most complete, and best preserved Tyrannosaurus rex fossil ever discovered, during a Society of American Archivists reception at the Field Museum, Chicago, Illinois, 26 August 2011. Sue is 42 feet (12.8 m) long and 12 feet (3.66 m) high at the hip.

I’ve always feared getting sick at the annual meeting of the Society of American Archivists, and yesterday it happened. I stayed in bed and missed all of yesterday’s sessions and a section meeting. I somehow dragged myself to the last few minutes of the evening reception at the Field Museum, but I felt quite like Sue, the magnificent T. rex who presided over the festivities: an empty-headed and mildly scary-looking dead thing.

I was still a bit shaky today, and I managed to miss all of this morning’s first session and part of Session 610, Practical Approaches to Born-Digital Records: What’s Coming Next, which focused on Archivematica. Archivematica is a digital preservation platform that brings together a wide array of open source anti-virus, metadata extraction, file conversion, and other tools and supports automated processing of archival electronic records. We’ve just started experimenting with Archivematica, and I really wanted to hear about other archivists’ experiences with it.

I missed Peter Van Garderen of Artefactual Systems discussing Archivematica’s development and plans for future enhancements, and I came in just as Glenn Dingwall (City of Vancouver Archives) was wrapping up his presentation.

In lieu of recapping the presentations of Paul Jordan (International Monetary Fund) and Angela Jordan (University of Illinois Urbana-Champaign) or summarizing the question-and-answer component of this session, I’m simply going to highlight the most interesting points that arose during its second half. I think that Archivematica holds great promise, and many of the presenters and audience members were of the same opinion, so don’t let this post deter you from investigating it yourself. However, you should keep in mind that Archivematica:
  • Is not a complete digital preservation system. It creates Archival Information Packages (AIPs) that can be preserved over the long term, but it doesn’t provide for storage of these AIPs.
  • Is designed with scalability in mind. It can be run on a desktop in a small repository or on a very large server array. From a technical point of view, the chief bottlenecks limiting large-scale implementations are processing speed and capacity; the other major constraint is the amount of staff time needed to establish intellectual control over the materials.
  • Will be of particular interest to small repositories; however, not all of them will be able to meet the platform’s hardware requirements or acquire the requisite technical knowledge.
  • Requires some degree of technical know-how and quite a bit of willingness to get one’s hands dirty. Archivematica requires a real or virtual Linux environment. Most archivists aren’t familiar with Linux and must be willing to learn. Moreover, the installation process isn’t as straightforward as it could be. Fortunately, Michael Bennett has written really useful installation instructions and Angela Jordan has posted about her experience; FWIW, I’ve also posted about our own installation experience.
  • May require customization. For example, the International Monetary Fund will have to figure out how to keep classified documents that should be included in AIPs out of the Dissemination Information Packages that Archivematica creates.
  • Requires some additional development. (Given that it has yet to reach the beta stage of development, this need isn't surprising.) Session participants articulated several desired improvements that would give archivists the ability to specify which preservation/normalization formats will be employed, enable them to reinsert or otherwise deal with files or folders that Archivematica rejects, and shed light upon why the ingest process sometimes stalls. Participants also wanted to see Archivematica support creation of Submission Information Packages, improve processing of e-mail, and integrate records management.
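The International Monetary Fund example comes down to deriving two different packages from the same set of files. As a purely hypothetical sketch (not how Archivematica or the IMF actually handles it), assuming each file carries a `classified` flag assigned by the repository, the split might look like this:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecordFile:
    path: str
    classified: bool  # hypothetical access flag supplied by the repository, not by Archivematica


def split_packages(files: list[RecordFile]) -> tuple[list[str], list[str]]:
    """Return (aip_paths, dip_paths).

    Every file goes into the preservation package (AIP), but classified
    files are withheld from the access package (DIP).
    """
    aip_paths = [f.path for f in files]
    dip_paths = [f.path for f in files if not f.classified]
    return aip_paths, dip_paths


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    records = [RecordFile("minutes.pdf", False), RecordFile("board-memo.doc", True)]
    aip, dip = split_packages(records)
    print("AIP:", aip)
    print("DIP:", dip)
```

Leaving the AIP list unfiltered reflects the point above: classified records still belong in the preservation package, and only the access copies are restricted.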

Wednesday, April 20, 2011

Installing Archivematica

Archivematica is the first open-source, Open Archival Information System (OAIS) Reference Model-compliant digital preservation system that can be installed on a desktop computer; it's fully scalable, so it also works well in a large-scale Linux server environment. Archivematica, which is being developed by Artefactual Systems in collaboration with the UNESCO Memory of the World's Subcommittee on Technology, the City of Vancouver Archives, the University of British Columbia Library, the Rockefeller Archive Center, and several other partners, is still in alpha testing, but it integrates a lot of open source digital preservation tools, including BagIt, the Metadata Extraction Tool developed by the National Library of New Zealand, and JHOVE, and it uses PREMIS, METS, Dublin Core, and other widely used metadata standards.
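BagIt, one of the tools Archivematica integrates, implements a simple packaging specification (now published as RFC 8493). The Python sketch below is not Archivematica's code; assuming the basic layout the BagIt specification describes (a `data/` payload directory, a `bagit.txt` declaration, and a checksum manifest), it shows what a bag looks like on disk and how its fixity can be rechecked later. The function names and sample files are hypothetical, the payload is assumed to be a flat list of file names, and a real bag would usually also include tag files such as `bag-info.txt`.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write a minimal BagIt-style bag: payload under data/, a bagit.txt
    declaration, and a SHA-256 payload manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line pairs a checksum with the file's path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}\n")

    # The bag declaration states the BagIt version and the tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with its manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "sample_bag"
        # Hypothetical file names and contents, for illustration only.
        make_bag(bag, {"letter.txt": b"Dear colleague", "notes.txt": b"Draft notes"})
        print("bag valid:", verify_bag(bag))
```

Because the manifest travels with the files, anyone can later recompute the checksums and detect silent changes, which is why packaging formats like this sit at the heart of long-term preservation workflows.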

My intrepid colleague Michael and I have wanted to play around with Archivematica for some time, and last week we finally got around to downloading and installing it. The process went a lot more smoothly than we anticipated -- in large part because we read Angela Jordan's candid Practical E-Records post about her experiences and Michael J. Bennett's detailed Archivematica installation instructions as well as some of the instructions provided on the Archivematica site -- but we did hit a few sticking points. I'm sharing what we learned in hopes of helping other archivists who are interested in experimenting with Archivematica.

Archivematica is designed to operate within an Ubuntu Linux environment, but Mac and Windows users can easily install virtualization software that makes it possible to run Ubuntu as a virtual machine on their computers. We opted to install Oracle VirtualBox, which Archivematica's developers recommend, and we were both really impressed by the clearly written, logically organized, and complete instructions that accompanied the software. I've encountered a lot of bad installation instructions and user manuals, and it's always a pleasant surprise when I run across manuals produced by good, careful technical writers. However, the manual didn't mention one thing that both we and Angela Jordan encountered: as you install VirtualBox on a Windows machine, Windows will repeatedly warn you that you are attempting to install unverified software and ask whether you're certain you want to proceed. Be prepared to click through lots of dire dialog box warnings.

After we set up VirtualBox, we followed Michael Bennett's instructions for installing Xubuntu 10.04. The installation process was simpler than we anticipated -- we basically clicked through a setup wizard -- but we had to stop work for the day a few minutes after the installation was complete.

Installing Archivematica itself was a bit more challenging. It took us a little while to figure out that we really did have to install it via the Web; much to our dismay, copying the files on the Archivematica Launchpad onto a DVD -- something that we had done several days before -- and then installing Archivematica via the DVD simply doesn't work.

Moreover, Michael and I are both completely new to Ubuntu, so we were a bit flummoxed by the Ubuntu Repository Package instructions that appear on the Archivematica site. I did a little Googling and discovered that we had to access Ubuntu's command line interface to install Archivematica and that we could do so via Terminal. We also found Michael Bennett's step-by-step instructions, which highlight some trouble spots, really helpful. However, Bennett's instructions illustrate how to copy the installation commands from the Archivematica Web site and paste them into the Terminal interface, and for some reason we simply couldn't paste the text we copied into Terminal. We were a little pressed for time, so in lieu of troubleshooting our copy/paste problem, we opted to type all of the installation commands into Terminal -- and hit a few trouble spots of our own as a result.

We hesitantly entered the command to add the first Archivematica PPA and were gratified to discover that it apparently worked: the screen displayed a few lines of text, the word "error" didn't appear anywhere, and we were prompted to enter another command. We ran the second Archivematica PPA command and the trio of archivematica-shotgun commands without incident, but we had real problems running the vmInstaller-environment.sh script. After about half a dozen error messages, we figured out what we were doing wrong: our all-too-human minds led us to read "enviroment," the last element in the command, as "environment."

There is no "n" between the "o" and the "m" in "enviroment"!
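Retyping commands by hand invites exactly this kind of misreading, because the eye silently "corrects" an unfamiliar spelling. One low-tech safeguard, sketched here in Python under the assumption that you have the published command and your typed version as plain strings, is to diff them character by character with the standard library's `difflib`. The `compare_commands` function is hypothetical and is not part of Archivematica.

```python
from __future__ import annotations

import difflib


def compare_commands(expected: str, typed: str) -> list[str]:
    """List each place where a typed command differs from the published one."""
    notes = []
    # autojunk=False stops difflib from ignoring "popular" characters in long commands.
    matcher = difflib.SequenceMatcher(a=expected, b=typed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            notes.append(f"extra {typed[j1:j2]!r} typed at position {j1}")
        elif tag == "delete":
            notes.append(f"missing {expected[i1:i2]!r} at position {i1}")
        else:  # "replace"
            notes.append(f"typed {typed[j1:j2]!r} instead of {expected[i1:i2]!r} at position {i1}")
    return notes


if __name__ == "__main__":
    # Compare the spelling in the published command with the one our eyes supplied.
    print(compare_commands("enviroment", "environment"))
```

For this pair it reports a single extra "n" at position 6 (counting from zero), which is exactly the letter our eyes kept inserting.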

Entering the flock (i.e., file lock) call also posed a few problems. Because we were typing the commands rather than copying and pasting them, we first had to figure out whether the five asterisks at the start of the call were separated by spaces; they are. Then we had to figure out how to see the end of the flock call, which is hard to make out on the Archivematica Web site. Fortunately, Michael Bennett's instructions revealed that the text was indeed there, and we could view it when we highlighted it.

The highlighted segment of the flock call reads: /sharedDirectory/watchedDirectories/quarantined" Note the quotation mark at the end.
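The five leading asterisks suggest that this line is a cron entry: in standard crontab syntax, five whitespace-separated fields (minute, hour, day of month, month, day of week) precede the command, and an asterisk in every field means "run every minute." Assuming that reading is right, the Python sketch below checks a retyped line for both pitfalls we hit: asterisks run together without spaces and a missing closing quotation mark. The sample line is hypothetical, not Archivematica's actual command, and the field check is deliberately simplified (it ignores named months and days and @-shortcuts).

```python
from __future__ import annotations

import shlex

# Characters allowed in a simple numeric cron schedule field (simplified check).
CRON_FIELD_CHARS = set("0123456789*/,-")


def check_cron_line(line: str) -> tuple[list[str], list[str]]:
    """Split a crontab-style line into its five schedule fields and its command tokens.

    Raises ValueError if the schedule fields are not separated by whitespace
    or if the command contains an unclosed quotation mark.
    """
    parts = line.split(maxsplit=5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule, command = parts[:5], parts[5]
    for field in schedule:
        if not set(field) <= CRON_FIELD_CHARS:
            raise ValueError(f"unexpected schedule field {field!r}; are the asterisks separated by spaces?")
    # shlex.split raises ValueError ("No closing quotation") when a quote is left open.
    return schedule, shlex.split(command)


if __name__ == "__main__":
    # Hypothetical line for illustration only; it is not Archivematica's actual command.
    sample = '* * * * * flock -n /tmp/example.lock -c "echo /sharedDirectory/watchedDirectories/quarantined"'
    print(check_cron_line(sample))
    try:
        check_cron_line(sample[:-1])  # drop the closing quotation mark
    except ValueError as err:
        print("rejected:", err)
```

A stricter check would also validate the numeric ranges in each field; this one only looks for the two mistakes described above.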

After we rebooted our Ubuntu virtual machine, we were able to access Archivematica without any problems . . . but had to shut it down immediately and make our way to a previously scheduled event.

Michael and I estimated that it took a total of about four hours to install VirtualBox, Xubuntu 10.04, and Archivematica, and I'm pretty sure that the fumbles outlined above and our repeated readings of various installation manuals took up approximately one hour of that time. Moreover, a lot of the Archivematica installation time was taken up by sitting around and waiting for the commands to execute -- be prepared to see many, many lines of text appear in Ubuntu's Terminal -- and we could have done a little light work (e.g., proofreading draft MARC records, completing travel paperwork) while waiting to enter the next command.

I'm out of the office at the moment and Michael's going to have to focus on other projects during the next couple of weeks, but we'll start experimenting with Archivematica as soon as we get the chance. In the coming months, I'll put up at least a couple of posts outlining our findings.