Monday, June 14, 2010

NYAC/ARTNY: E-mail management and preservation

The Hudson River, as seen from Walkway Over the Hudson State Historic Park, Poughkeepsie, New York, 4 June 2010.

This post concerns something that happened 11 days ago, which I suppose makes me a full-fledged slow blogger . . . .

I noted a little while ago that I thought that Session 1, which focused on the Archivists’ Toolkit, was the highlight of the recent joint meeting of the New York Archives Conference/Archivists Roundtable of Metropolitan New York. However, Session 7, Management and Preservation of E-mail, was a very, very close second.

Nancy Adgent of the Rockefeller Archive Center (RAC) got things rolling by discussing one of my favorite preservation projects, the Collaborative Electronic Records Project (CERP) undertaken by the RAC and the Smithsonian Institution Archives (SIA).

Using RAC-held e-mail in a variety of formats and SIA Microsoft Outlook .pst files, the CERP team developed separate workflows that took into account their institutional differences. They also tested off-the-shelf conversion tools and developed a parser that converts e-mail messages and attachments to an XML-based preservation format; over 99 percent of the 89,000 testbed messages parsed successfully.
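
To give a rough sense of what converting a message to XML involves, here is a minimal Python sketch. It is not the CERP parser and does not follow the CERP schema: the element names (Message, Headers, Header, Body, Attachment) and the function name message_to_xml are invented for illustration, and only the standard library's email and xml.etree.ElementTree modules are used.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def message_to_xml(raw: str) -> ET.Element:
    """Convert one raw message into a small, made-up XML structure.

    The element names are hypothetical; they are NOT the CERP schema.
    """
    msg = message_from_string(raw)
    root = ET.Element("Message")

    # Copy a handful of headers verbatim.
    headers = ET.SubElement(root, "Headers")
    for name in ("From", "To", "Date", "Subject"):
        value = msg.get(name)
        if value is not None:
            ET.SubElement(headers, "Header", name=name).text = value

    # Walk the MIME tree: plain-text parts become Body elements,
    # parts with a file name become Attachment elements.
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            att = ET.SubElement(root, "Attachment", name=filename)
            att.set("contentType", part.get_content_type())
            # Keep the payload in its transfer encoding (e.g. base64 text).
            att.text = part.get_payload()
        elif part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            body = part.get_payload(decode=True).decode(charset, errors="replace")
            ET.SubElement(root, "Body").text = body
    return root


SAMPLE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Date: Fri, 04 Jun 2010 10:00:00 -0400\n"
    "Subject: Test\n"
    "\n"
    "Hello, Bob.\n"
)

if __name__ == "__main__":
    print(ET.tostring(message_to_xml(SAMPLE), encoding="unicode"))
```

A real preservation format would also need to record things this sketch ignores, such as the full header set, folder structure, and fixity information for attachments.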

Along the way, they learned a host of lessons:
  • Transfers from active systems go most smoothly when an archivist and an IT person work together.
  • Minor problems will arise. Some attachments that should have accompanied messages were missing, most likely because they were stored on a central server and the messages were copied from individual users’ desktops. The dates of some individual messages were replaced by the date on which they were bundled into a .pst file, and the name of the author was sometimes changed to that of the archivist who examined the file. Most strikingly, the installation of new GIS software on testbed equipment changed the display of some message fonts.
  • E-mail requires some processing before it’s converted to XML. The CERP team used a variety of off-the-shelf tools in order to do so.
  • Different anti-virus tools sometimes yield different results. The CERP team used Kaspersky and Symantec, each of which detected a few viruses that the other didn’t find.
  • Searching file names is not a foolproof means of identifying “sensitive” materials. Although both repositories conducted such searches and either removed sensitive information from access copies or documented its existence in finding aids and metadata, they realized that some sensitive information was still lurking in the messages.
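
The last lesson is easy to illustrate. The Python sketch below compares a file-name search with a search of the message text itself. Everything in it is an assumption made for illustration: the list of "sensitive" terms, the pattern for Social Security Number-like strings, and the two sample messages are invented, and neither repository's actual search method is described in my notes.

```python
import re

# Hypothetical terms an archivist might look for in file names.
SENSITIVE_NAME_TERMS = ("salary", "medical", "confidential")

# Rough pattern for strings shaped like U.S. Social Security Numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def flagged_by_name(filename: str) -> bool:
    """True if the file name contains one of the sensitive terms."""
    lowered = filename.lower()
    return any(term in lowered for term in SENSITIVE_NAME_TERMS)


def flagged_by_content(text: str) -> bool:
    """True if the text contains an SSN-like string."""
    return SSN_PATTERN.search(text) is not None


# Invented sample data.
messages = [
    {"filename": "salary_review.txt", "body": "Figures attached."},
    {"filename": "lunch.txt", "body": "My SSN is 123-45-6789; please update the form."},
]

# Messages that a name-only search would miss.
missed = [
    m["filename"]
    for m in messages
    if flagged_by_content(m["body"]) and not flagged_by_name(m["filename"])
]

if __name__ == "__main__":
    print(missed)
```

Content scanning is no cure-all either: a pattern like this one will miss sensitive information that doesn't follow a predictable format.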
Paul Szwedo of the New York State Office of Real Property Services (ORPS) then discussed how his 300-person agency began managing its e-mail. To the best of my knowledge, ORPS has made a lot more progress than any other New York State agency, and I hope that other agencies follow its example.

Prompted by recent changes in the Federal Rules of Civil Procedure, ORPS decided that staff had to take responsibility for their own information. It first reviewed, updated, and consolidated its e-mail, Internet, and IT equipment use policies so that they harmonized with the state’s information security policy, ethics law, and relevant executive orders. It then amended its e-mail retention policy, which now mandates that users who want to keep their e-mail longer than 120 days must move it into a centralized archive.

ORPS already owned EMC’s DiskXtender archiving product and purchased the EmailXtender (now SourceOne) component (which the Obama White House also uses) for e-mail archiving. Instead of keeping every message, the agency opted to create folders based on length of retention period and rely upon staff to file messages appropriately. Folder access is customized by unit, so units that create records with longer retention periods can manage them properly and those that don’t can’t keep records forever.
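
The logic of this arrangement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how EmailXtender works: the folder names, retention periods, unit names, and the idea that unfiled mail simply expires after 120 days are all made up or inferred for the example. Only the 120-day figure comes from the ORPS policy described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# From the policy described in the post: unfiled mail is kept 120 days.
MAILBOX_LIMIT = timedelta(days=120)

# Hypothetical retention folders and periods.
RETENTION_FOLDERS = {
    "1-year": timedelta(days=365),
    "6-year": timedelta(days=6 * 365),
}

# Hypothetical mapping of units to the folders they may use.
UNIT_FOLDERS = {
    "legal": {"1-year", "6-year"},
    "helpdesk": {"1-year"},
}


@dataclass
class Message:
    received: date
    folder: Optional[str] = None  # None means still in the user's mailbox


def can_file(unit: str, folder: str) -> bool:
    """A unit may file only into the folders assigned to it."""
    return folder in UNIT_FOLDERS.get(unit, set())


def is_expired(msg: Message, today: date) -> bool:
    """Unfiled mail expires after 120 days; filed mail follows its folder."""
    limit = MAILBOX_LIMIT if msg.folder is None else RETENTION_FOLDERS[msg.folder]
    return today - msg.received > limit


if __name__ == "__main__":
    today = date(2010, 6, 14)
    print(is_expired(Message(date(2010, 1, 1)), today))
    print(is_expired(Message(date(2010, 1, 1), "1-year"), today))
    print(can_file("helpdesk", "6-year"))
```

The point of the unit-to-folder mapping is the one made above: a unit without long-retention records has no long-retention folder to file into.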

After the system was installed, ORPS began providing training and guidance to staff. Staff had eight weeks to manage their existing e-mail, and Lotus Notes’ save-to-local-drive option was disabled. Project leaders sent out reminders to staff and stressed that managers were responsible for ensuring that unit records were managed properly and that unit staff knew how to do so.

Paul identified a number of key success factors:
  • Backing of senior management. (In a brief conversation we had after the session, he indicated that senior managers’ support was the single most important factor. If only figuring out how to secure the backing of senior management were as simple.)
  • Policy development preceded implementation. Business needs should drive IT investment, not the other way around.
  • Staff educated themselves via State Archives workshops and discussions with other agencies.
  • Availability of funding.
  • Records management liaisons served as a test group, which facilitated identification of problems and prepared the liaisons to handle questions from other staff.
  • All tutorials, videos, and communications relating to the project were placed online.
The challenges were nonetheless substantial:
  • Turning a “save everything” organization into an organization that manages its information requires a lot of effort
  • Some people saw e-mail management as a distraction from their “real” work
  • Everyone wanted more time to review and sort messages
  • Upper managers retired, resulting in loss of momentum
  • Networking staff had other responsibilities thrust upon them
  • People save e-mail for easy reference, and don’t necessarily think of it as a record
Project staff also learned several lessons along the way:
  • Don’t assume people are paying attention. Despite repeated warnings and reminders, one person did not review and organize his/her messages and as a result lost all of them.
  • Elicit concerns, and do so upfront if at all possible. (As Fynette Eaton has pointed out, this is a key principle of change management.)
  • Weigh overhead against policy. On several occasions, ORPS had to tweak its policies because they were placing an undue burden on network personnel.
Christine Midwood of Iron Mountain Digital ended the session by highlighting how new products and services can help address the challenges associated with e-mail:
  • Legal risks and discovery. New products can provide consistent, rapid search across e-mail archives, apply litigation holds by message (and apply multiple holds to a given message), and manage e-discovery cases so that teams can access only those messages responsive to the case they’re working on.
  • Expense. New products can apply retention schedules, reduce costs by outsourcing storage to a cloud environment (my take: the cloud isn’t ready for state and local government records), and consolidate archiving, business continuity, security, anti-virus, and other functions into a single product.
  • Data loss. Technology can provide a tight, documented chain of custody, capture complete delivery information, and consolidate or eliminate message files stored on individual users’ hard drives.
  • Privacy. Software can now block or quarantine e-mail that contains prohibited or suspect content (e.g., Social Security Numbers) and provide role-based access to e-mail (whole message, metadata only, no access).
  • Productivity. New products offer “continuity” features that minimize e-mail outages, eliminate e-mail quotas, and automate application of retention policies.
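
The per-message litigation holds mentioned in the first bullet lend themselves to a small Python sketch. It is purely illustrative and says nothing about how Iron Mountain's products are built: the class name HoldRegistry, its methods, and the message and case identifiers are all hypothetical.

```python
from collections import defaultdict


class HoldRegistry:
    """Tracks which cases have placed a hold on which messages (hypothetical design)."""

    def __init__(self):
        # message id -> set of case ids holding that message
        self._holds = defaultdict(set)

    def place(self, message_id: str, case_id: str) -> None:
        """Place a hold on a message for one case; a message may have several."""
        self._holds[message_id].add(case_id)

    def release(self, message_id: str, case_id: str) -> None:
        """Release one case's hold; other cases' holds stay in place."""
        self._holds[message_id].discard(case_id)

    def is_held(self, message_id: str) -> bool:
        """A message is protected as long as at least one hold remains."""
        return bool(self._holds.get(message_id))

    def messages_for_case(self, case_id: str) -> list:
        """Only the messages held for this case, e.g. for a case team's view."""
        return sorted(m for m, cases in self._holds.items() if case_id in cases)


if __name__ == "__main__":
    registry = HoldRegistry()
    registry.place("msg-1", "case-A")
    registry.place("msg-1", "case-B")
    registry.place("msg-2", "case-A")
    registry.release("msg-1", "case-A")
    print(registry.is_held("msg-1"))
    print(registry.messages_for_case("case-A"))
```

Tracking holds per message and per case is what lets one case close without exposing a message that another case still needs.
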
In response to a question from an audience member, she made a really interesting point: Iron Mountain and other vendors work with companies that are keenly aware of the risks of keeping information too long, and as a result they get few inquiries about how to handle e-mail that has a permanent retention period. She noted that allowing end users to sort and classify their own messages might open the door to permanent retention, but I suspect that something more (e.g., migration/conversion, preservation metadata) is going to be needed.


Anonymous said...

Hi Bonnie,
Thanks for the shout-out about CERP.

We co-developed the XML preservation schema with the North Carolina State Archives. I am happy to say that we fully implemented the parser and schema as part of email collection processes here at the Smithsonian Institution Archives.

Clarification: SIA used PST files for the project and RAC used email in other formats such as LotusNotes and Apple Mail.

Lynda Schmitz Fuhrig
Smithsonian Institution Archives

l'Archivista said...

Thanks, Lynda! I've updated the post to correct my mixup; the information in my original notes was correct, but I mixed things up when drafting the post.

I initially included some information about the role that the North Carolina-led EMCAP project -- another favorite of mine -- played in developing the schema, but ended up cutting it out in an effort to reduce the length of this post.

Info about the EMCAP project is available at:

records management said...

This is a very good concept for managing e-mails and also preserving them. I like the image you have shown in this post. It's very beautiful. This blog gives knowledge on various interesting topics. I like your post a lot. Keep up the good work.