Thursday, December 31, 2009

A little end-of-year electronic records housekeeping . . . .

Governor Nelson A. Rockefeller and First Lady Margaretta "Happy" Rockefeller with guests on New Year's Eve, 31 December 1970. At noon on this day, Governor Rockefeller was sworn in as governor for the fourth and final time. New York (State). Governor. Public information photographs, 1910-1992. Series 13703-83, Box 3, Number 4408_23. Image courtesy of the New York State Archives.

I'm planning to devote part of the long weekend to tackling some long-neglected domestic chores, and I'm going to start by tidying up some loose ends on this blog. Owing to a combination of time pressures and a slow-to-heal (but steadily improving) injury, I let a few significant electronic records developments slide by without comment. All of them will remain relevant in 2010, so this is a good time to draw attention to them.
  • In October, the Arizona Supreme Court ruled (Lake v. City of Phoenix) "that if a public entity maintains a public record in an electronic format, then the electronic version, including any embedded metadata, is subject to disclosure under [the state's] public records laws." This ruling may seem a bit obvious to any archivist or records manager, but it's actually quite significant: earlier this month, an attorney who works for New York State's Committee on Open Government noted that, until now, neither case law nor legislation has really specified whether metadata is covered by state and federal freedom of information laws. As a result, Washington State's Supreme Court, which is set to hear a similar case, and courts in other jurisdictions will likely devote a lot of attention to this ruling. If you're curious about some of the potential implications of this ruling, check out what Ars Technica and Inside Counsel have to say about it.
  • Since 2003, ARMA and Cohasset Associates (and, sometimes, AIIM) have conducted annual surveys of electronic records management practices. The results of the 2009 survey were released in October, and as you might expect, they are a mix of good and bad -- very bad -- news: organizations are starting to take action to correct their electronic records and information management problems, but most of them still have a long, long way to go before all of their problems are solved. Moreover, today's electronic records and information management shortcomings are so severe that they may well "jeopardize the future reliability, availability, and trustworthiness of many records." If you want to figure out how your organization's policies and practices compare to those of others or have any sort of interest in electronic records management, the report (executive summary here, full text here) is an interesting, sobering read.
  • AIIM regularly offers free Webinars focusing on records, content, and business process management, and archived Webinars are available via its Web site; registration is required. Recently archived Webinars focus on the current state of electronic records management, determining responsibility for records/content management, managing government content in the cloud, and other topics relating to electronic records management. If you're finding it harder and harder to get permission to travel to conferences or training sessions or simply want to keep pace with new developments, you might want to check out these Webinars.
  • The New York State Historical Records Advisory Board recently launched a new Web resource, 9/11 Memory and History, that is designed to help survivors and people who lost friends or family on 11 September 2001 preserve photos or letters, drawings or paintings, scrapbooks, sound or video recordings, computer files or digital images, articles of clothing, or other objects. In addition to text-based instructions, the site includes a number of short videos. Hofstra University archivist Geri Solomon and family member Margie Miller discuss what to save and how to save it, and staff from the New York State Archives discuss other preservation-related concerns: Director of Operations Kathleen Roe talks about donating materials to a repository, Paper Conservator Sue Bove details how to care for photographs, drawings, and newspapers, and Electronic Records Archivist Bonita Weddle (i.e., Yours Truly) discusses preservation of digital files. Although this site is really targeted to the 9/11 community, other people interested in preserving family history materials and other personal materials should also find it useful.
  • Andrew Sniderman has written a reflective, thought-provoking piece about the psychic cost of GMail and other services that unwittingly lead users to document their lives more fully than ever before: digital archives of e-mails, texts, etc., may make it harder for users to deceive themselves about their motives and actions, but they also make it easier for them to fixate on old wounds and regrets. An ever-growing number of people will no doubt agree with Sniderman's assertion that "preservation gives the past more weight than it sometimes deserves," and many professional archivists will no doubt regard their own personal digital archives with at least some ambivalence. However, as Cal Lee pointed out at SAA earlier this year, archivists are ethically obligated to furnish guidance to people who are struggling to care for their personal digital records, so we really should start thinking about what we'll say to friends, relatives, prospective donors, and others who come to us for help.

Thursday, December 24, 2009

A Christmas Carol: crowdsourcing analysis of the mss.

Christmas tree being cut down at Cherry Plain, New York, for President Roosevelt and Governor Lehman, December 1934. 57L275REF. Image courtesy of the New York State Archives.

A little while ago, the New York Times worked with the Morgan Library and Museum, which owns the original manuscript copy of Charles Dickens's A Christmas Carol, to digitize the manuscript and invited its readers to identify the changes Dickens made while writing and revising it. Over 100 Times readers rose to the challenge, and their findings are featured today. The most interesting finding concerns Tiny Tim's name . . . .

Lots of repositories have manuscripts or records that would lend themselves to this sort of crowdsourcing analysis. However, right now we should probably focus on our families and friends. Whatever you're celebrating or not celebrating, I hope that you're with people you love and that you're having a great time.

Tuesday, December 22, 2009

Reading Archives reading group

Kate T. at ArchivesNext is a font of superb ideas about archival use of Web 2.0 technology, and her latest initiative is an online "group read" of Rand Jimerson's Archives Power: Memory, Accountability, and Social Justice. Kate's started a new blog, Reading Archives Power, that will serve as the vehicle for the group's discussion, and so far approximately 35 people -- a multinational mixture of new graduates and seasoned professionals -- have publicly expressed interest in participating. Rand Jimerson will also participate, so the discussion ought to be really stimulating.

There is no formal signup for the group, but if you are interested -- and I hope you are -- you will need to obtain a copy of Archives Power and start reading it before the discussion begins on 11 January 2010. (Tip from Kate T.: the Society of American Archivists bookstore is charging a lot less than Amazon and other retailers.) Although it's not absolutely necessary, you should also consider introducing yourself to the other members of the group and reviewing Kate's proposed discussion schedule.

I'm really looking forward to this discussion, and I hope to see you over at Reading Archives Power.

Wednesday, December 16, 2009

Bush White House e-mail settlement

News that 22 million lost e-mail messages sent or received by the Bush White House have been recovered and that the National Security Archive (NSA) and Citizens for Responsibility and Ethics in Washington (CREW) have settled their 2007 lawsuit against the Executive Office of the President (EOP) has been all over the media for the past couple of days.

A lot of the media coverage is focusing on a couple of items in the settlement document. First, for reasons of cost, the White House will focus on recovering e-mails sent or received on select days, not every "missing" e-mail that the Bush White House created. Second, the U.S. National Archives and Records Administration (NARA) will take custody of the e-mails and manage them in accordance with the Presidential Records Act, which means that the e-mails won't be disclosed to researchers for years.

However, a quick review of the settlement document itself reveals that, with a handful of exceptions, the media isn't calling attention to a provision that ought to interest anyone who wants to know how the White House does business or how EOP manages its electronic records:
4. Description of Current EOP System: Defendants [EOP and NARA] will provide Plaintiffs [NSA and CREW] with a publicly releasable letter describing in as much detail as possible the current EOP computer system, including its email archiving and backup systems. This document will include a detailed description of the controls in the system that prevent the unauthorized deletion of records.

a. Prior to sending the letter, Defendants will review with Plaintiffs draft(s) of the letter and the Parties will agree upon a final version.

b. Defendants recognize that Plaintiffs intend to release the letter publicly, and Defendants do not object to such a release.

c. Defendants will produce this letter by January 15, 2010.
Although I'm sorry that the EOP and NARA personnel charged with producing this document will likely have to curtail their holiday breaks, I'm looking forward to the end result. It should make for interesting reading.

Thursday, December 10, 2009

And you thought government records were boring . . . .

Government archives sometimes get a bad rap: a lot of people are under the impression that they're stuffed full of old policy documents and other legally necessary but deadly dull stuff, and even archivists who work in other settings sometimes think that government records are pretty dry.

I could go into my standard spiel about how government records document the rights of citizens to hold property, receive benefits that they have earned, and participate in civic life and how government archives promote government transparency and accountability. I could also go on about how government archives attract not only academic historians and genealogists but also biologists, engineers, historic preservationists, linguists, epidemiologists, attorneys, documentary filmmakers, and all sorts of other users. All of these things are true.

However, one of the things I most like about working in government archives is that even the most humdrum-seeming of records series can contain the unexpected. Seven or eight years ago, several colleagues and I were moving a very large and red rot-plagued series of 19th century financial records and discovered that the series included a little volume bearing a crudely inked title: "No-Good Lawyers." It was a listing of Victorian-era attorneys who had, in various ways, run afoul of the New York State Banking Department -- and a welcome little diversion from a laborious and dirty task.

Sometimes the unanticipated finds are amusing, and sometimes they're horrifying. When I was still in grad school, I was examining a series of photographs taken at a psychiatric facility and found that, in addition to images of female patients playing around with cosmetics and staff-patient softball games, it included a series of photographs documenting the administration of electroconvulsive therapy (ECT). It took me a little while before I figured out precisely what was going on in those photos, and when I finally did, I was completely unnerved. I hastily put the photos back in their box, told the reference archivist what I had found, and fled the research room. I haven't seen those photographs in over a decade, but the thought of them is still unsettling.

And sometimes, of course, the finds are hilarious. Every day, the Web site of the U.S. National Archives and Records Administration (NARA) highlights one of the records in NARA's holdings. Many of the featured records are historically significant; for example, yesterday's featured document consists of the U.S. Congress's official copy of the Twelfth Amendment, which it passed on 9 December 1803. However, today's document, which was issued on 10 December 1959, concerns a less weighty matter: the United States government's efforts to find the Yeti.

Clicking on the image below will bring up a much larger and more legible version. I'm particularly fond of regulation no. 2.

"Regulations Governing Mountain Climbing Expeditions in Nepal - Relating to Yeti"; UD-WW, 1454, , Box 252, Accession #64-9-0814, folder 5.1 Political Situation - General, File ended Dec 31, 1959; Records of the Agency for International Development; Record Group 286; National Archives. Image courtesy of the U.S. National Archives and Records Administration.

Tuesday, December 8, 2009

TSA's bad PDF redaction . . . and tips on redacting PDFs properly

The Transportation Security Administration (TSA) is the latest in a long line of Fortune 500 companies and federal government agencies to discover that information can all too easily be recovered from an improperly redacted PDF document. On Sunday, blogger The Wandering Aramean announced that the TSA had posted a copy of its Screening Management Standard Operating Procedure manual, which provides detailed information about how TSA personnel screen passengers and luggage, on a federal contract solicitation Web site.

Portions of the manual, which is identified as containing Sensitive Security Information, were redacted, but . . . whoever did the redactions simply used Adobe Acrobat or other PDF-compatible software to draw black boxes over the information that should have been redacted. As I've noted before, it doesn't take tons of computer know-how to recover the information hiding under those black boxes, and The Wandering Aramean and lots of other people were able to do so. The TSA has pulled the manual off the federal contract site, but you can find a complete and unredacted copy here and on lots of other sites.

The TSA has stated that the version of the manual it posted has been superseded repeatedly, that it was never actually used by TSA personnel, and that TSA security procedures have changed substantially since it was written. However, the damage has been done: the blogosphere and the news media are having a field day, and Congress is demanding an investigation. I know that beating up on the TSA is something of a sport (and, believe me, I have some issues with its 3-1-1 policy), but I really do feel for the folks at TSA HQ who have to clean up this mess.

Putting poorly redacted PDFs on the Web seems to be something of a fad these days -- Google did it a few weeks ago -- but I don't want to see archivists or records managers fall prey to the pitfalls that have ensnared so many others. If you're trying to figure out how to provide access to PDFs that contain information restricted by law or donor agreement, here are a few pointers:
  • If you're working with a PDF file, never, ever use Adobe Acrobat's Draw or Annotate tools (or comparable tools in other programs) to place black, white, etc. boxes over the information you wish to redact. All a savvy user needs to do is to copy the PDF in its entirety and paste it into a word processing document. Moreover, someone with ready access to Adobe Acrobat or comparable software can skip the copying and pasting and simply open the PDF and remove the boxes that you drew. Don't think that locking your PDF will keep this from happening: shareware that promises to unlock PDFs is all over the Interwebs.
  • If you're working with a word processing document that you plan to convert to PDF format, never, ever attempt to redact information by changing the font color to white or using a shading or highlighting feature to obscure the text and then converting the document to PDF format. The copy-and-paste technique outlined above will reveal the hidden text; users might have to play with the font colors a bit, but doing so won't take them more than a few seconds.
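To see why drawing a box over text doesn't redact anything, here is a minimal Python sketch. It assumes a toy, uncompressed page content stream with simple text operators; the sample text, the `extract_text` and `draw_black_box` helpers, and the coordinates are all hypothetical and made up for illustration. Real PDFs are usually compressed and more complicated, but the principle is the same: the box is just one more drawing instruction, and the text underneath is still in the file.

```python
import re

# Hypothetical example data: a tiny, uncompressed PDF page content stream.
# BT/ET bracket a text object; "(...) Tj" paints a string on the page.
PAGE = b"""BT
/F1 12 Tf
72 700 Td
(Public paragraph) Tj
0 -20 Td
(SECRET: screening threshold) Tj
ET
"""

# Simplification: matches plain "(text) Tj" operators only. It ignores
# escaped parentheses, TJ arrays, and font encodings found in real PDFs.
TEXT_SHOW = re.compile(rb"\((.*?)\)\s*Tj", re.DOTALL)


def extract_text(stream):
    """Return every string painted by a Tj operator, in stream order."""
    return [match.decode("latin-1") for match in TEXT_SHOW.findall(stream)]


def draw_black_box(stream, x, y, width, height):
    """Mimic a drawing tool: append a filled black rectangle to the page.

    Nothing is deleted; the rectangle is simply painted after the text.
    """
    box = f"0 0 0 rg\n{x} {y} {width} {height} re\nf\n".encode("ascii")
    return stream + box


if __name__ == "__main__":
    boxed = draw_black_box(PAGE, 70, 675, 220, 16)
    # The "redacted" page still yields exactly the same text as the original.
    print(extract_text(boxed))
```

Copying and pasting from a PDF viewer does essentially what `extract_text` does here, which is why the covered text comes right back. A proper redaction tool has to remove the text operators themselves, not paint over them.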
At present, there are several good tools for redacting PDF files, and you'll need to assess your current software setup, the amount of redaction work you'll have to do, and your budget in order to decide which one works best for you.
  • If you've got an older version of Acrobat, two third-party plug-ins for Adobe Acrobat, Redax and Redact-It, are time-tested and have substantial followings in the legal community.
  • If you are using an older version of Adobe Acrobat and can't or don't want to upgrade or purchase an add-on tool, the National Security Agency has produced a document that outlines a laborious but effective redaction procedure.
  • If you've got an old version of Acrobat, no money for an upgrade or a plug-in, and only a handful of documents to redact, you might want to consider printing out the documents, whipping out a black magic marker, and redacting information the old-fashioned way. Photocopy the redacted printouts to reduce the chance that the text can be read through the marker, then scan the photocopies.
If you do commit to redacting documents electronically:
  • Make sure you know how to use your chosen redaction tool. Most of them are pretty straightforward, but slip-ups are possible, and you don't want slip-ups circulating on the Web. All of the software tools listed above are well-documented, so take the time needed to review and digest said documentation.
  • Prepare a test file and familiarize yourself with your chosen software tool before you start working with real live documents. If you can get a disinterested third party (preferably one with lots of IT or digital forensics experience) to review your test file and verify that the information you've redacted really is gone, by all means do so.
  • This may seem a bit obvious, but someone once asked me, so I'm going to come right out and say it: don't redact your original e-documents. Chances are, your documents will one day be fully disclosable, so make electronic copies of them, redact the copies, and keep both the copies and the originals. Doing so increases your storage and preservation commitments, but there really aren't any good alternatives, particularly for records warranting permanent retention.
  • Keep abreast of the relevant legal and digital forensics literature: people are trying to figure out how to "break" all of the tools listed above and recover information redacted with these tools. One of them may eventually succeed, at which point all bets are off.
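The copy-then-redact and test-file pointers above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `_redacted` file-naming convention, and the idea of recording a SHA-256 checksum are hypothetical choices, not features of any of the tools mentioned, and `find_leaks` expects text that you have already pulled out of the redacted copy with whatever extraction tool you trust.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute a SHA-256 checksum so later changes to a file are detectable."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original, redaction_dir):
    """Copy the original into a separate folder and return (copy, checksum).

    Redaction happens on the copy only; the checksum is of the original.
    """
    original = Path(original)
    redaction_dir = Path(redaction_dir)
    redaction_dir.mkdir(parents=True, exist_ok=True)
    copy_path = redaction_dir / f"{original.stem}_redacted{original.suffix}"
    if copy_path.exists():
        # Refuse to overwrite earlier redaction work by accident.
        raise FileExistsError(f"working copy already exists: {copy_path}")
    shutil.copy2(original, copy_path)
    return copy_path, sha256_of(original)


def original_is_unchanged(original, recorded_checksum):
    """Confirm the original still matches the checksum recorded at copy time."""
    return sha256_of(original) == recorded_checksum


def find_leaks(extracted_text, sensitive_terms):
    """List the sensitive terms that still appear in text extracted
    from the redacted copy (case-insensitive substring match)."""
    lowered = extracted_text.lower()
    return [term for term in sensitive_terms if term.lower() in lowered]
```

An empty result from `find_leaks` is a useful smoke test, not proof: it only covers the terms you thought to list and the text your extraction tool managed to read, so a review by someone with digital forensics experience is still worth having.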
Finally, a gentle disclaimer: the above information is . . . simply information, not legal, financial, medical, dental, or any other kind of advice. As is the case with everything on this blog, it's not necessarily reflective of the opinions and policies of my employer, either. It does reflect my own knowledge at the time of this writing, but, as is the case with all things electronic, electronic redaction technology and best practices change rapidly. It's really up to you to investigate the options for yourself and to make sure that the electronic information you redact really can't be recovered.

Happy redacting!

Monday, December 7, 2009

"So far, it's the best job in the country"

Last week, David Ferriero, the new Archivist of the United States, delivered his first State of the Archives address. I was particularly cheered by his continuing emphasis on the challenges posed by electronic records and electronic records management, which he likened to the problems faced by Robert Digges Wimberly Connor, the first Archivist of the United States, who fought valiantly to ensure that the nation's long-neglected records were properly housed:
. . . . It seems to me that we are at a similar crossroads in the history of the Archives in the challenges we face with the electronic records of the agencies we serve. Varieties of technology, platforms, software, practice, and lack of standards complicate the work of ingesting, preserving, and making available the records of the government. The work we have undertaken with Lockheed Martin is, of course, being watched closely by our funders, our stakeholders, and the rest of the archival community who is grappling with similar issues of born digital records. We have to get this right.

I also see the Electronic Records Archives initiative as a vehicle for reestablishing our oversight of the records management programs of each agency—working with agencies to establish protocols, practices, and annual audits.
I also like that Ferriero recognizes the larger archival community's interest in the Electronic Records Archives, and I hope that he continues predecessor Allen Weinstein's effort to bring the U.S. National Archives and Records Administration into closer alignment with archival professional organizations and other repositories throughout the nation.

If you want a sense of Ferriero's background and personality, check out the lengthy profile in today's Washington Post, which highlights his decades of work in libraries and includes video footage of him examining materials in the stacks of the Archives I facility in Washington, DC. The video's only 42 seconds long, but it reveals that the new Archivist has a puckish sense of humor.

Sunday, December 6, 2009

An archivist responds to Jon Stewart

As you all know, a few weeks ago, Jon Stewart had a little fun at the archival profession's expense. Now, the Woody Guth3 (who may or may not be archivist/lyricist David Kay) explain -- for the benefit of Mr. Stewart and all the other uninformed souls out there -- what we do and why most of us have at least one graduate degree.

Thursday, December 3, 2009

Something to ponder

The Post Carbon Institute is a think tank that seeks to supply "individuals, communities, businesses, and governments with the resources needed to understand and respond to the interrelated economic, energy, and environmental crises that define the 21st century."

A couple of months ago, the Institute published "Our Evanescent Culture and the Awesome Duty of Librarians," in which Richard Heinberg outlined the macro-level threats to the survival of digital information. Among them: failure to maintain reliable sources of power generation and delivery, nuclear war, and the systemic vulnerabilities associated with living in an increasingly interconnected world.

Those of us charged with preserving digital information are often so focused on very real short-term threats such as file corruption, hardware failure, and software obsolescence that we forget that, as Heinberg asserts, "digitization represents a huge bet on society’s ability to keep the lights on forever."

Can we keep the lights on forever? Even if you think that the Post Carbon Institute's being overly alarmist about global warming and fossil fuel supplies, you have to admit that we're taking an awfully big gamble. A number of years ago, an historian of medicine told me that, statistically speaking, humanity is really overdue for a pandemic that combines the mortality rate of AIDS with the contagiousness of the common cold -- and for the social, political, and economic havoc that such pandemics wreak. We haven't experienced a "hot" global war for over sixty years, but in the larger scheme of things, sixty years is the mere blink of an eye.

And, of course, it's quite likely that one day our culture will be known chiefly through archaeological digs and a few surviving artworks and texts. Woe betide the 30th century archaeologist who unearths a cache of data tapes!

Heinberg concludes that, given the very real risk that digital information will be lost, librarians (and, by extension, archivists) should be mindful of the importance of "conservation of essential cultural knowledge in non-digital form." Maybe he's right: perhaps we should devote a little effort to leading the public discussion about how our culture should be remembered and making sure that at least some information about our values and accomplishments is preserved in human-readable form.

Read the whole article. It's really good.

(Hat-tip: Alan's Notes on Digital Preservation.)