Showing posts with label e-records. Show all posts

Saturday, August 6, 2016

SAA day two: electronic records

Comb jellyfish at the Georgia Aquarium, Atlanta, Georgia, 2 August 2016.
Even though I always make it a point -- at least when I'm paying my own way -- to attend a few Society of American Archivists conference sessions that have nothing to do with my current job responsibilities, I also seek out electronic records sessions that intrigue me or push me a little past my comfort zone. I attended two such sessions this morning: session 309, "DWG, RVT, BIM: A New Kind of Alphabet Soup, with a Lot More Heartburn," and session 409, "Working Together to Manage Digital Records: A Congressional Archives Perspective."

Monday, October 19, 2015

Best Practices Exchange 2015: day one

Light fixture, Pennsylvania State Museum, Harrisburg, Pennsylvania, 19 October 2015.
The 2015 Best Practices Exchange (BPE) got underway at the Pennsylvania State Museum in Harrisburg earlier today. The BPE is a conference that brings together archivists, librarians, information technologists, and other people who seek to preserve born-digital state government information, and it emphasizes sharing lessons learned (i.e., lessons taught by failure) as well as success stories. It's my favorite conference, and I always leave the BPE feeling energized and inspired.

I'm a little under the weather and am still thinking through some of the things I heard about today, so this post is going to be brief. However, I did want to pass on something that really piqued my interest:
  • A group of Michigan archivists and librarians doing hands-on digital preservation work have formed a grassroots organization, Mid-Michigan Digital Practitioners, that meets twice a year to exchange information. The group has no institutional sponsor, has no formal leadership structure, and charges no membership dues; however, the website of Michigan State University's Archives and Historical Collections includes information about and presentations delivered at past meetings. Mid-Michigan Digital Practitioners has capped its size in an effort to ensure that it remains small enough to allow members to form a tightly knit, geographically concentrated community of practice, and I think that this is a good thing. Local and regional professional organizations and regional, national, and international communities of practice are all incredibly valuable, but local, less formalized communities can propel enduring collaboration and can be far less intimidating to people who are just beginning to grapple with digital preservation issues. I would love to see lots of little, unstructured, and locally based digital preservation groups pop up all over the place.
I also want to share a couple of key points that a pair of experienced professionals made about making the case for electronic records management and digital preservation:
  • The technologies we will use to manage and preserve archival records are the same technologies we will use to preserve records that are not permanent but which have lengthy retention periods. When making the case for digital preservation to CIOs and other high-ranking officials, we should consider focusing less on the former and emphasizing that we can help care for the latter. If we create an environment in which people are comfortable sending records that have long retention periods to an archives-governed storage facility -- just as they are currently comfortable sending paper records that have long retention periods to a different archives-operated storage facility -- we can easily take care of preserving those records that warrant permanent preservation.
  • All too often, we think in terms of what records creators must do in order to comply with regulations, laws, or records management best practices. We should instead assess the environment in which records creators operate, identify the problems with which creators are struggling, and then stress how we can help to solve these problems.
 Finally, one attendee made a comment that struck me as being so basic that it's often overlooked:
  • When we talk about "electronic records," many people simply assume that we're advocating scanning paper documents and then getting rid of all paper records. We need to make sure that people understand that we're focusing on those materials that are created digitally and will be managed and preserved in digital format. How do we do this?
More tomorrow.

Saturday, August 22, 2015

SAA 2015: making born-digital records accessible


Terminal Tower, Public Square, Cleveland, Ohio, 2015-08-21. Until 1991, Terminal Tower was the tallest building in the city of Cleveland and the state of Ohio. I have loved this building as long as I can remember.
SAA 2015 is in full swing. Today, I sat in on two sessions -- Arrangement and Description and Access for Digital Archives (session 401) and Out of the Frying Pan and Into the Reading Room (session 507) -- that focused on providing access to born-digital materials. I arrived late to the first and had to leave the second early in order to travel to an offsite meeting, so what follows is a partial listing of things I found interesting or useful.
  • One repository is providing access to a born-digital body of materials that is subject to varying copyright and donor restrictions by loading copies of the files onto a laptop that is not connected to any network and has disabled USB ports. This approach isn't perfect, but archivists shouldn't wait for perfection to start making their holdings accessible. (Moreover, as another archivist pointed out, this approach requires minimal IT support.) 
  • No two collections are the same, and processing is always time-consuming. Another repository assesses each collection of born-digital materials for quality of data, authenticity of data, complexity of the access restrictions associated with copyright and donor stipulations, and anticipated level of use. Records that contain high quality and authentic data, lack complicated access restrictions, and will likely receive high use receive more intensive processing than those that don't meet these criteria. 
  •  The amount of processing work we do will likely vary. One institution has some born-digital collections that consist of flat groupings of items and some collections that consist of files arranged in directory structures. In other instances, collections are mixtures of analog and digital items, and the archives wants the arrangement of the digital materials to correspond to that of the analog. 
  •  We don't yet have a firm sense of what our users want. Some of our users are comfortable with doing keyword or other types of searches, and others are accustomed to box-and-folder hierarchies. We may discover that we need to try to meet the needs of both groups. 
  • Access solutions are varied, constantly changing, and have a way of emerging in response to pressing user requests. We need to remain flexible and mindful of the fact that solutions that work at one institution might not work at another. 
  • We need to publicize our born-digital holdings, and we need to make sure that colleagues who do reference work are comfortable working with these materials and highlight their existence to researchers when appropriate.
The question of making restricted materials available online also came up, and one presenter recommended making use of the redaction functionality being incorporated into BitCurator and informing end users of their responsibilities regarding inappropriate disclosure of information that may be subject to various restrictions. The latter approach was also explored quite extensively in a Thursday afternoon pop-up session that centered on issues raised by recent events at the University of Oregon, and the discussion included making access contingent upon entering into formal, online agreements.

I find this an intriguing approach, but most government archives will likely be very slow to embrace it. Some state open records laws specify that records creators and archives cannot impose limitations on the use of information that is disclosed in response to freedom of information requests; if a record contains restricted information, the creating agency or the archivist must redact it prior to disclosing it. Moreover, governments tend to be risk-averse -- sometimes excessively, and sometimes with good reason. However, I can envision some scenarios in which government archives might well adopt this approach; using a click-through agreement to highlight the presence of records potentially covered by copyright isn't quite the same thing as hoping a researcher will abide by an agreement prohibiting disclosure of information found within psychiatric case files.

Finally, in response to a question concerning whether we should embed all of the metadata we're creating as we work with digital materials into our finding aids, one of the panelists in session 401 said something that's been on my mind for some time: we need to start thinking about moving away from document-based finding aids. I like Encoded Archival Description (and well-crafted MARC records make me feel as if there is an inner logic and order to the world), but it's high time we stopped thinking of archival description solely in terms of "fast paper."

Thursday, January 8, 2015

Jump In: electronic records

Do you lack hands-on electronic records experience? Are you growing more and more concerned about the floppy disks, CDs, and other portable media lurking in your paper records? Do you work best when you have a firm deadline? Do you like winning prizes?

If you answered "yes" to most or all of the above questions, you need to know that  the Manuscript Repositories Section of the Society of American Archivists (SAA) is sponsoring its third Jump In initiative, which supports archivists taking those essential first steps with electronic records. Using OCLC Research's excellent You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media as a guide, Jump In participants will inventory some or all of their physical media holdings and write a brief report outlining their findings and, possibly, next steps.

Jump In participants will be granted access to a dedicated listserv, and every participant will be entered into a raffle to win a free seat ($185 value) in any one-day Digital Archives Specialist workshop offered by SAA. In addition, select participants will be invited to present their findings at the Section's 2015 annual meeting in Cleveland.

The deadline for committing to Jump In is 16 January 2015, and the deadline for submitting reports is 1 May. For more information about the survey process, possible report topics, and rules of participation, consult the Jump In 3: Third Time's a Charm announcement.

This is a great way to start taking some electronic records baby steps; even if you're a lone arranger, you should be able to craft a survey project that can easily be completed by 1 May. Kudos to the Manuscript Repositories Section for creating and sustaining this initiative.

Tuesday, August 12, 2014

CoSA SERI PERTTS Portal


If you know what the above means, feel free to skip this post. If you don't, here's an explanation:
This afternoon, I attended the first half of a two-part CoSA workshop focusing on the new portal, which was developed by the SERI Best Practices and Tools Subcommittee. I've been aware of its development -- I'm a member of the SERI Education Subcommittee, which has developed some content for it -- but I haven't had the chance to check it out until today. It's still something of a beta build and will be expanded considerably in the coming months, but it already contains a wealth of information:
  • Information about CoSA's electronic records webinars, including a schedule of upcoming sessions and links to recordings and slides from past webinars. 
  • Handouts and slides prepared by instructors of the July 2013 SERI Introductory Electronic Records Institute: Mike Wash (U.S. National Archives and Records Administration), Doug Robinson (National Association of State Chief Information Officers), Pat Franks (San Jose State University), and Cal Lee (University of North Carolina - Chapel Hill).
  • How-to guides and short videos that explain how to complete various processes or use specific tools. Areas covered: file authentication and integrity processes, detection of duplicate files, file format conversions, identifying file properties, renaming files, and ingest/accessioning processes.
  • Links to electronic records training opportunities offered by other organizations.
  • Information about the State Electronic Records Program Framework, which is based upon the Digital Preservation Capability Maturity Model and enables state archives (and anyone else interested in doing so) to assess their preservation infrastructure and identify areas for improvement. If you're employed by a state archives and took the SERI self-assessment, you'll be particularly interested in the portal's discussion of the tangible steps needed to advance from Level 0 to Level 4 within each of the framework's 15 components and in its practical tips for completing the self-assessment the next time it's offered.
  • An ever-expanding and keyword searchable database of summary information about and links to resources relating to virtually every aspect of electronic records management and preservation. If you create a free PERTTS portal account, you'll be able to comment upon these resources; if you would prefer not to create an account, you'll still be able to access them. CoSA will also develop a simple form that will enable you to suggest resources that should be added to the portal.
  • An electronic records glossary that draws from a wide array of sources.
  • Brief case studies and examples of real-world implementations of metadata standards, security protocols, Archival Information Package construction, and other facets of electronic records work.
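To give a flavor of the "detection of duplicate files" process mentioned above, here is a minimal Python sketch. It is not taken from the portal's how-to guides; the function names (`sha256_of`, `find_duplicates`) are hypothetical, and the choice of SHA-256 checksums as the test for "duplicate" is an assumption made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by checksum; keep only groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("some_folder")` returns a dictionary that maps each shared checksum to the list of files carrying it. The sketch only reports byte-for-byte copies; deciding which copy to keep is still a job for a person.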
This is a great resource, and I think it's going to expand and evolve in some really interesting ways. Check it out.

Thursday, October 24, 2013

More Podcast, Less Process

Well, this is cool: More Podcast, Less Process is a new podcast that features "archivists, librarians, preservationists, technologists, and information professionals [speaking] about interesting work and projects within and involving archives, special collections, and cultural heritage." The first episode, CSI Special Collections: Digital Forensics and Archives, featured Mark Matienzo of Yale University and Donald Mennerich of the New York Public Library and debuted at the start of this month. The second, How to Preserve Change: Activist Archives & Video Preservation, was released yesterday. In it, Grace Lile and Yvonne Ng of WITNESS discuss the challenges associated with preserving video created by human rights and other activists, producing activist video in ways that support long-term preservation, and WITNESS's impressive new publication, The Activists’ Guide to Archiving Video.
Hosted by Jefferson Bailey (Metropolitan New York Library Council) and Joshua Ranger (AudioVisual Preservation Solutions), More Podcast, Less Process is part of the Metropolitan New York Library Council's Keeping Collections project. Keeping Collections provides a wide array of "free and affordable services to any not-for-profit organization in the metropolitan New York area that collects, maintains, and provides access to archival materials." This podcast greatly extends the project's reach.

Given the mission and interests of its creators, I suspect that quite a few More Podcast, Less Process episodes will focus on the challenges of preserving and providing access to born-digital or digitized resources. I'm waiting with bated breath.

More Podcast, Less Process is available via iTunes, the Internet Archive, Soundcloud, and direct download. There's also a handy RSS feed, so you'll never have to worry about missing an episode. Consult the More Podcast, Less Process webpage for details.

Full disclosure: Keeping Collections is supported in part by the New York State Documentary Heritage Program (DHP), which is overseen by the New York State Archives (i.e., my employer). However, I'm plugging More Podcast, Less Process not because of its DHP connections but because it's a great resource.

Sunday, August 18, 2013

CoSA-SAA 2013: Thinking Beyond the Box

The 2013 joint annual meeting of the Council of State Archivists (CoSA) and the Society of American Archivists (SAA) ended at around 1:00 PM today. I'm feeling a bit crispy around the edges and a bit sad about not getting to see everyone or everything I wanted to see, but I'm nonetheless happy. The sessions I attended were all excellent, and most of the people with whom I spoke were also pleased with this year's meeting.

I particularly enjoyed Session 610, Thinking Beyond the Box: How Military Archivists Are Meeting 21st Century Challenges, which started at 8:00 AM. (N.B.: I would be remiss if I failed to mention that I am not a morning person. As enthused as I was about this session, I suspect I didn't catch some of the details.)

I asked to serve as the Program Committee's liaison to this session because I thought it would be really interesting, and I was not disappointed. In a society that has for four decades relied upon an all-volunteer military, it's all too easy for those who don't have deep connections to individual military personnel or to the armed forces as institutions to overlook the size, scope, and complexity of the military and the volume, richness, and variety of the records generated by the armed forces and the personal papers created by individual military personnel. This is a problem: if we're to gather and maintain a documentary record that does justice to American society, we need to give the military its due. As today's session emphasized, military records also help to document other aspects of our history and culture. Moreover, the approaches that military archivists have developed to ensure that the documentary record is sufficiently comprehensive and that vast quantities of electronic records are processed quickly and appropriately ought to be of broad professional interest.

Anthony Crawford (Kansas State University) emphasized the value of military records and personal papers of individual servicemen and -women to scholars researching a wide array of subjects:
  • Papers of medical personnel are of interest to historians of medicine and, in the case of women who served, historians of women and gender.
  • Military records and personal papers also document the history of the communities in which military personnel served. A historic preservationist seeking to preserve a British refugee facility that had originally been a military hospital made extensive use of the personal papers of a member of the U.S. Army Nurse Corps who had been stationed there during the Second World War.
  • Artwork that appears in military publications and on military posters is of interest to historians of art. Hollywood has often sought assistance from the military, films that depict the armed forces in a positive light are sometimes shot on military bases and use soldiers as extras, and historians of film will find these relationships documented in military records.
  • Historians of food and foodways will find that the military has reached out to experts of various kinds to obtain information about the nutritional needs of troops and to supply information about the nation's food supply. Menus documenting the meals served to troops are also of interest to these researchers.
James Ginther (Library of the Marine Corps) detailed his repository's efforts to ensure that the Marine Corps's involvement in the recent conflict in Iraq is appropriately documented. The Marine Corps views command chronologies prepared by commanders as the official record of unit-level involvement in conflicts, but many of these chronologies lack essential detail. Marine Corps archivists devised a variety of strategies to overcome these deficiencies – and did so in ways that will be of interest to other institutional archivists seeking to encourage improved recordkeeping:
  • They assembled lists of the personnel responsible for preparing command chronologies. Recognizing that units engaged in combat had other priorities, they didn't press those responsible. However, they did start sending letters of acknowledgement to commanders, who for a long time thought that the reports were disappearing into a black hole in Washington; the letters also indicated that archivists could help them obtain historical information about their units. Once commanders realized that their reports were being read, their reports became more detailed.
  • They trained captains who attended the annual Expeditionary Warfare School and stressed that command chronologies constitute the official record of a unit's activities: the Marine Corps assumes that anything not mentioned in the reports didn't happen. They also emphasized that the Marine Corps uses command chronologies to set budgets and grant awards and that the Veterans Administration (VA) also consults them.
  • They began collecting personal papers and other materials that supplemented the command chronologies. A friend of Ginther's who was deployed to Iraq took a vast number of photographs and conducted oral histories that formed the basis of an award-winning book and donated all of the materials to the Library of the Marine Corps.
  • They also reach out to visiting groups of veterans and other people. When visitors learn about the archives' holdings, they often donate personal papers or agree to an oral history interview with a Marine Corps archivist.
Joel Westphal, who was until recently employed by the United States Central Command (CENTCOM), detailed how CENTCOM is preserving the joint headquarters records created as a result of Operation Iraqi Freedom. The Iraqi conflict is significant in that it marked the first time in military history that the majority of records (more than 95 percent) were created in digital format, and the joint headquarters records were at the center of the largest single transfer of electronic data from a war zone during an ongoing military operation. At the present time, the records, which comprise approximately 52 TB of data, constitute the largest single collection of electronic war records ever assembled; however, the records documenting joint headquarters operations in Afghanistan will ultimately comprise roughly 150 TB of data.

Efforts to preserve these records grew out of a previous failure: only a small percentage of Gulf War records were ever transferred to the U.S. National Archives and Records Administration (NARA), and both CENTCOM and NARA were intent on ensuring that Operation Iraqi Freedom was documented appropriately. CENTCOM began working on records preservation projects as early as 2003, and NARA began asking about Operation Iraqi Freedom records in 2009. As a result of NARA's inquiries, a war records group was established and United States Forces-Iraq was pushed to establish a records management program and to transfer its records to CENTCOM.

 In April 2010, a five-day assessment of United States Forces-Iraq recordkeeping practices was completed. Although some of the published findings of this assessment turned out to be inaccurate, its estimate of the volume of records was both accurate and extremely important. The records were then inventoried, and CENTCOM established a technical transfer team and a technology team to prepare for the transfer of 52 TB of data.

On August 31, 2010, President Obama declared that Operation Iraqi Freedom had ended, and CENTCOM focused on copying the records onto a storage array and transferring the storage array to CENTCOM headquarters in Tampa, Florida; a full backup copy of the unprocessed data was conveyed to NARA.

A team of three CENTCOM staffers is currently processing the records and sending those identified as permanent to NARA, and the team's processing decisions will be of interest to anyone attempting to apply More Product, Less Process to born-digital records:
  • The team was adamant that the original order of the records be preserved at all costs, which saved vast amounts of time; the team can now process 175,000 records per staff member per month.
  • Millions of the records are e-mail messages, and many of them are of transitory value or are non-record material. In order to speed processing and avoid retaining an unmanageable mass of records, the processing team decided that e-mails of generals, admirals, and colonels who held important positions are permanent and that all e-mails of lower-level personnel are retained for 6 years and then destroyed.
  • The team is working with a document analytics vendor whose tools could weed out redundant or near-redundant records, empty folders and zero-byte files, executable files lurking in data-only directories, and other materials that clearly don't warrant preservation.
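The simpler weeding criteria in that last point (empty folders, zero-byte files, and executables sitting among data files) can be sketched in a few lines of Python. This is purely an illustration and not a description of the vendor's tools or of CENTCOM's workflow: the function name `weeding_candidates` and the list of executable extensions are hypothetical, and the sketch makes no attempt at the harder problem of spotting redundant or near-redundant records.

```python
import os
from pathlib import Path

# Assumption: a hypothetical set of extensions treated as executables.
EXECUTABLE_SUFFIXES = {".exe", ".dll", ".bat", ".com", ".msi"}


def weeding_candidates(root):
    """Walk root and report zero-byte files, empty folders, and executables."""
    report = {"zero_byte": [], "empty_dirs": [], "executables": []}
    for dirpath, dirnames, filenames in os.walk(root):
        # A folder with no subfolders and no files is empty.
        if not dirnames and not filenames:
            report["empty_dirs"].append(Path(dirpath))
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size == 0:
                report["zero_byte"].append(path)
            if path.suffix.lower() in EXECUTABLE_SUFFIXES:
                report["executables"].append(path)
    return report
```

The function only builds a report and deletes nothing, so the decision about what actually gets weeded stays with the processing staff.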
One final word about this session:  it was assembled by the Military Archives Roundtable, which was established last year.  I took a few minutes today to read the petition to SAA Council seeking permission to form the roundtable, and it's a pretty impressive document.  I expect all manner of interesting things from this group.

Image: the Beauregard-Keyes House, 1113 Chartres Street, New Orleans, 17 August 2013.  This home, which was built in 1826, is an elevated center hall colonial -- a bit of an odd sight in the French Quarter.  Confederate general Pierre Gustave Toutant Beauregard lived in the house in 1860 and from 1866-68.

Sunday, June 23, 2013

New York Archives Conference 2013 recap

Earlier this month, I had the privilege of attending the joint 2013 meeting of the New York Archives Conference and the Archivists Roundtable of Metropolitan New York, which was held at the C.W. Post campus of Long Island University. I was initially scheduled to give one presentation and agreed at the last minute to speak twice, so I didn't get the chance to attend as many sessions or explore the surrounding area as much as I would have liked. However, I did learn a few interesting things:
  • I attended the Society of American Archivists' Privacy and Confidentiality Issues in Digital Archives workshop, which was held the day before the conference began, and I'm pleased to report that both the workshop and instructor Heather Briston (University of California, Los Angeles) are fantastic. I've spent a substantial amount of time working with records that contain information that is restricted in accordance with various state and federal laws, and I still learned quite a bit. If you get the chance to take this workshop, by all means do so.
  • Jason Kucsma, the executive director of the Metropolitan New York Library Council, delivered a thought-provoking plenary address, "(Re)Building: Opportunities for Collaboration for New York's Cultural Heritage Institutions," in which he used post-Hurricane Sandy recovery efforts as an entry point for discussing the concept of collaboration. I was particularly struck by his analysis of why collaboration, which involves sharing of risk, is so difficult: it forces us to admit what we don't know, it makes us confront ambiguity and fluidity, it requires discussion and deliberation, it compels us to share information that we may view as proprietary, it has the potential to expose us to even more conflict than we currently experience, and it makes us worry about who's going to get credit for the successes and blame for the failures. I've been involved in a number of collaborative projects over the years, and some of them went belly-up as a result of some or all of the problems that Kucsma identified. The successful ones worked because people were willing to get out of what he referred to as "emotional, cultural, and institutional silos," embrace uncertainty, define achievable goals, and entertain the possibility of working with unconventional partners. As Kucsma pointed out, Hurricane Sandy is merely a dramatic example of a problem that's too large and too complex for any one organization to take on by itself. Archivists and librarians face a growing number of such problems, and we need to figure out how to tackle them together.
  • Kucsma also highlighted the existence of a recent report that somehow escaped my attention. I2NY: Envisioning an Information Infrastructure for New York State was prepared at the behest of New York's regional library associations, and it assesses the state's current library information landscape, which already features some collaborative initiatives, and outlines how the library associations can move toward building a fully comprehensive, fully collaborative information infrastructure. The report doesn't discuss born-digital archival records, but it does envision the expansion of the collaborative archival digitization efforts led by the regional library associations (which are now exploring how to incorporate digital surrogates of archival materials into the Digital Public Library of America). It also calls for creating innovative professional development opportunities.
  • I do not envy curators seeking to preserve born-digital works of art. In addition to worrying about all of the hardware and software, data integrity, storage, metadata, information security, and other technical concerns that anyone seeking to preserve digital resources must address, they also have the unenviable task of sussing out the artist's intent and preserving significant properties that may be unique to each viewer/listener or dependent upon external resources.  The interactive (and very cool) short film The Wilderness Downtown requires that each viewer enter an address and then pulls data from Google Street View to create visual content. Static and distortions present on an analog recording of an experimental television show may be the result of media degradation . . . or may be the result of the creator's deliberate manipulations. 
  • Cornell University's Rose Goldsen Archive of New Media Art holds a host of analog video, old CDs and DVDs that require Mac OS 9 or other obsolete software or hardware, and Internet art.  At present, the archive maintains an array of older hardware and software and focuses on documenting playback requirements, digitizing analog content, archiving Web sites, and developing emulation software. It's also using National Endowment for the Humanities grant funding to preserve CD-ROM-based works of art.  This grant project should allow Cornell to identify how to conduct technical analyses of digital artworks, develop generalizable user profiles for new media art, create a viable data object model and associated PREMIS or RDF metadata profile, and identify a Submission Information Package structure that will support long-term preservation.
  • The Museum of Modern Art (MoMA) is developing a Digital Repository for Museum Collections that currently houses 60 TB of artwork that was originally stored on floppy disks, CDs, and other portable media.  Archivematica will supply this repository's core processing services, and a conservation management application will be created to house descriptive information and document software and other dependencies.  MoMA is also exploring using emulation to make digital artworks accessible not only to people who visit MoMA's physical exhibit spaces but also to people who access MoMA's website.  MoMA is also in the midst of completing a formal study that compares the fidelity of emulation vs. native hardware and software, and I'm really looking forward to seeing the findings arising from this study.
  • Documentary filmmaker Jonathan Minard, whose work in progress Archive examines the future of long-term digital storage, the development of the Internet, and Internet preservation efforts, highlighted an essential but frequently overlooked truth:  the Internet is a utility, not a library, and its operations are governed chiefly by market considerations. Cultural heritage professionals disregard this truth at their peril. (BTW, part one of Archive, which focuses on the work of the Internet Archive, is available online.)
  • The National Digital Stewardship Alliance (NDSA), a Library of Congress-led membership organization of individuals and organizations seeking to preserve digital cultural heritage materials, is developing Levels of Digital Preservation, a simple, tiered set of guidelines that will allow institutions to assess how well they're caring for their digital holdings. It addresses storage and geographical redundancy, file fixity and data integrity, information security, metadata, and file format issues, and the NDSA group developing it would appreciate your feedback.
  • If you want a DSpace-powered institutional repository but lack the IT resources needed to maintain your own DSpace installation, you're in luck:  DuraSpace, the non-profit organization that guides the development of DSpace and several other digital access and preservation tools, is now offering DSpace Direct, a hosted DSpace service. For approximately $4,000 a year, you can quickly set up your own DSpace institutional repository, select the language customization and other features that meet your needs, and allow DuraSpace to take care of storing and backing up your data (via Amazon Web Services) and upgrading your DSpace software.
Image: This building, now known as Winnick House, was formerly Hillwood, the house that anchored the Gold Coast estate of Post cereal heiress Marjorie Merriweather Post and her second husband, financier E.F. Hutton. The estate was sold to Long Island University in 1951. Winnick House, which is by far the grandest structure on Long Island University's C.W. Post campus, houses the university's administrative offices. This photograph was taken on 4 June 2013.

Thursday, February 28, 2013

Electronic records disaster preparedness workshops in New York State

Owing to the extensive damage that eastern New York State suffered as a result of the remnants of Hurricanes Irene and Lee and that New York City and Long Island experienced as a result of Hurricane Sandy, archives throughout the state are devoting a lot more attention to disaster preparedness and recovery.  They're also trying to fill in some gaps in the existing professional literature, which is pretty squarely focused on paper and film-based records.  In an age in which ever-increasing quantities of archival records are created and housed digitally and repositories create and maintain access tools electronically,  archivists and records managers need to know how to protect their digital assets and recover data stored on electronic media.

In an effort to give archivists and records managers the tools they need, two organizations are offering electronic records disaster preparedness and recovery workshops next month.

First, on 7 March, the Metropolitan New York Library Council and New York University's Moving Image Archiving and Preservation program are offering an all-day Disaster Preparedness and Response Bootcamp for Mixed Media Collections workshop in New York City:
Description
When a disaster strikes and valuable collections are damaged, the clock begins ticking. The actions taken in the first few hours and days are critical to the long-term recovery of the material. Yet this is also the time when more damage can be done due to chaos, carelessness, and lack of preparation. Disaster preparedness plans can provide guidance, but every disaster is different and disaster plans need to be adapted to the specific response scenario. This workshop will focus on disaster preparedness planning and first response, and will provide participants with the opportunity to think on their feet, get hands-on handling experience, discuss challenges, and learn from real-world case studies.

Learning Outcomes

Participants will be introduced to critical first response steps as well as logistics considerations and operational requirements of a salvage and recovery scenario for cultural heritage collections. Participants will also learn how to improve their disaster preparedness plans so that when the next disaster strikes, caretakers will be ready to respond. While handling and recovery procedures for different media types will be discussed, it will not go into great detail on conservation procedures for specific media types. Recovery procedures for media such as video, audio, and film will be emphasized, due to the unique requirements of these media, and lack of available literature.


Disclaimer
Portions of this workshop will be videotaped.
By registering to participate in this workshop, you grant METRO and MIAP the right to record and distribute through audio/video recording your image and/or comments or questions that may result from your participation.

Please be advised that you will get dirty during the course of this workshop. Please dress accordingly.
This workshop is being taught by Kara van Malssen, who is a Senior Consultant for AudioVisual Preservation Solutions and a graduate of the Moving Image Archiving and Preservation Program who first started doing multimedia disaster recovery work in Katrina-stricken New Orleans.  (Check out her master's thesis -- it's superb.)

The registration fee for this workshop, which is partially supported by the Institute of Museum and Library Services, is $45.00.  The workshop will be held at the Metropolitan New York Library Council's Training Center, which is located at 57 East 11th Street, 4th Floor, New York, NY 10003.

Second, on 26 March, the New York State Archives is offering a three-hour Electronic Records Disaster Planning & Response workshop in Albany:
Description
This workshop specifically focuses on electronic records disasters. Electronic records are susceptible to damage from water from floods or fires, heat from fires, power surges, computer viruses, and accidental or intentional destruction of data. Participants will learn how to mitigate these risks and respond should disaster occur.

Goals
This workshop will explain:
  • How to assess your organization's risk of experiencing an electronic records disaster
  • How to reduce the chances that a disaster will damage your electronic records
  • How to salvage various types of electronic media
  • How to recover data stored on damaged electronic media
Although this workshop has been customized for Records Management Officers employed by New York State government agencies, anyone may attend and most of the information contained within it will be of use to archivists and records managers working in a wide array of settings.

Although my employer's website doesn't identify any workshop instructor by name, I developed and am teaching this particular offering (apologies for the shameless self-promotion).  If you can't make it to Albany on 26 March, please note that we're planning to offer it again in the reasonably near future -- perhaps in person, quite possibly online.

There is no registration fee for this workshop, which will be held in the 11th floor conference room of the Cultural Education Center, Albany, NY 12230.

Friday, October 12, 2012

Day of Digital Archives

Today is the second annual Day of Digital Archives, which seeks to:
raise awareness of digital archives among both users and managers. On this day, archivists, digital humanists, programmers, or anyone else creating, using, or managing digital archives are asked to devote some of their social media output (i.e. tweets, blog posts, youtube videos, etc.) to describing their work with digital archives. By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?
For me, this particular Day of Digital Archives was -- with the exception of this blog post -- completely devoid of digital archives.  I'm visiting my parents at the moment, and today has been more about talking with my mom and dad, driving around, buying food, and going through stuff (physical objects, not digital files or emotional issues) than anything else.  However, I do have a few minutes in which to dash off a quick post, so I'll outline the things that I've done during the last five working days:
  • Since 2006, my repository, the New York State Archives, has been using OCLC's Heritrix-based Web Harvester to capture state government Web sites.  We've now documented three (!) gubernatorial transitions and a host of other changes in state government, so now is a good time to step back, assess what we've captured, and determine whether we should capture specific sites more frequently, less frequently, or at roughly the same rate.  To that end, a colleague and I have been sifting through a subset of our captured sites and preparing a draft report and recommendations.
  • Our preservation copies of our Web captures are housed in OCLC's Digital Archive, and we're starting to explore the possibility of using the Digital Archive for remote storage of some of our other electronic records.  OCLC's Digital Archive documentation is pretty good, but it doesn't answer all of our questions, so earlier this week, one of my colleagues and I sat down for a conference call with an OCLC staffer.
  • I put together the first draft of a document that discusses the basics of electronic records disaster preparedness, particularly for small organizations that aren't likely to have full-fledged disaster preparedness or business continuity plans, and outlines how to salvage and stabilize damaged electronic media in the wake of a disaster.  Several colleagues are currently reviewing it.
That's what I did at the office.  I've also been focusing on a couple of extracurricular projects:
  • I coordinated the assembly of a session proposal for the 2013 joint annual meeting of the Council of State Archivists and the Society of American Archivists that focuses on records management and digital preservation in cloud computing environments.
  • My former colleague Jim Tammaro is now teaching an Advanced Archives Management course at SUNY Buffalo, and next Tuesday I'm speaking to his students about archival preservation of Web sites and social media content; in addition to various policy issues, I'm going to highlight Heritrix, HTTrack, and various other tools.  I began working on my slides and handouts several weekends ago, and I'll put the finishing touches on them tomorrow and Sunday.
  • At the upcoming Mid-Atlantic Regional Archives Conference (MARAC) meeting in Richmond, Virginia, Paul Wester and Arian Ravanbaksh of the U.S. National Archives and Records Administration (NARA) and I will be taking part in a session focusing on the recent Presidential Memorandum on Managing Government Records.  Paul and Arian will talk about the memorandum, which heavily stresses the need for appropriate management of federal electronic records, and NARA's efforts to provide advice and guidance to federal agencies seeking to comply with this directive.  I'll discuss the implications of the memorandum for state governments, and I devoted a couple of evenings to pulling together an initial outline and compiling background statistics re:  recent changes in state archives staffing levels in the MARAC region.

Friday, August 10, 2012

SAA 2012: electronic records in political collections

I spent most of yesterday afternoon contending with a migraine, so my memories of yesterday's “Share a Byte! A Practical, Collaborative Approach to Electronic Records in Modern Political Collections” session are a bit vague in spots, but I was so impressed with both presentations that I feel compelled to write about them. I only hope I can do them justice.

The first presenter, Jennifer Huebscher of the Minnesota Historical Society, discussed her repository's quick-and-dirty but highly effective approach to increasing access to electronic records. She focused on the electronic records of former Governor Timothy Pawlenty, who was exploring a presidential run at the time the records were transferred, and on the records of a gubernatorial redistricting commission.

From the start, the Minnesota Historical Society's collaborative relationship with the Office of the Governor smoothed the way. The records were covered by a retention schedule that was devised during the tenure of Pawlenty's predecessor, and as a result Pawlenty's staff knew that certain types of records should be kept. Toward the end of Pawlenty's second term, his staff contacted the Minnesota Historical Society and then arranged an in-person meeting to discuss the impending transfer.

At the end of the year, the Office of the Governor placed the records on portable media and gave them to the Minnesota Historical Society. Owing to their discussions with the governor's staff, the archivists had a clear sense of what to expect, were able to compare the files they had in hand with the list of files they anticipated receiving, and were able to obtain a missing set of files from Pawlenty's staff.
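The comparison described above (files in hand versus files anticipated) is easy to script. What follows is a minimal sketch, not a description of the Minnesota Historical Society's actual procedure, which the presentation didn't detail. The function name `compare_to_manifest` and the idea of an expected-file list expressed as relative paths are assumptions made for illustration.

```python
from pathlib import Path


def compare_to_manifest(expected, transfer_root):
    """Return (missing, unexpected) relative paths for a transfer.

    expected: iterable of relative paths the repository anticipated receiving
              (a hypothetical list agreed upon with the records creator).
    transfer_root: directory holding the files actually received.
    """
    root = Path(transfer_root)
    # Every file actually present, expressed relative to the transfer root.
    received = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    anticipated = {Path(e).as_posix() for e in expected}
    missing = sorted(anticipated - received)      # promised but not delivered
    unexpected = sorted(received - anticipated)   # delivered but not promised
    return missing, unexpected
```

Anything that shows up in the first list is a candidate for a follow-up request to the creating office; anything in the second list may merit a closer look before accessioning.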

The files consisted of image files, sound files, and one moving image file. The 1,740 image files, all of them digital photographs of Governor Pawlenty and the First Lady, were transferred on two DVD-R discs; most were in JPEG format, but some were TIFF, PDF, or BMP files. Images of the Governor were placed on one disc, images of the First Lady were placed on the other, and each disc contained nine folders – one for each year the Governor was in office. The file names ranged from descriptive to vague, and the naming conventions used for images of the Governor differed from those used for images of the First Lady.

Some of the sound files were transferred on CD and DVD, and the Minnesota Historical Society harvested others from the Web using HTTrack. Most of the 410 files were in MP3 format, but others were WAV or CDA files, and one was an MP4 file.

Minnesota Historical Society's electronic records archivist copied the files onto a secure Storage Area Network maintained by the state's Enterprise Technology department, and staff continue to run checksums periodically; however, a full-fledged framework for preserving these files has yet to be developed.
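For readers who haven't run periodic checksums before, here is a minimal sketch of the general technique. It is not the Minnesota Historical Society's tooling: the presentation didn't say which algorithm or software staff use, so SHA-256 and the function names below are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_checksums(root, baseline):
    """Compare current checksums to a stored baseline.

    Returns (changed, missing): files whose contents no longer match the
    baseline, and files that have disappeared since it was recorded.
    """
    current = record_checksums(root)
    changed = sorted(
        name for name in baseline
        if name in current and current[name] != baseline[name]
    )
    missing = sorted(name for name in baseline if name not in current)
    return changed, missing
```

The idea is to call `record_checksums` once at ingest, store the result somewhere other than the storage being monitored, and rerun `verify_checksums` on a schedule.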

Internal collaboration made the records broadly accessible. The electronic records archivist produced a set of copies that cataloging staff processed and described, and the two worked together to figure out how best to provide access to them. Owing to significant public and media interest in the files, Huebscher and her colleagues sought to apply the principles of More Product, Less Process processing. They didn't alter file names or the overall arrangement of the files unless duplication or other problems made doing so absolutely necessary, they didn't create a set of preservation masters in normalized formats, they didn't add any extra metadata, and they didn't do any additional research that would have enhanced description of files that had non-descriptive or undated file names. They created the finding aids describing the records by using a simple template, extracting file names, and matching the hierarchical arrangement of the finding aids to the hierarchical arrangement of the files themselves.

The finding aids also facilitate access to the files themselves. The sound recordings finding aid covers a mix of born-digital files and physical cassettes and CDs, the display is simple and uncluttered, and the access copy of each born-digital file is hyperlinked in a field so users can easily download the files; the finding aid also includes file sizes so that users can estimate download times. The photographs finding aid includes a thumbnail illustration for each photograph (housed within a tag), and the amount of description varies depending upon information provided with each photo.
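Extracting file names, mirroring the folder hierarchy, and recording file sizes are all things a short script can do. The sketch below is only an illustration of that general idea; the presentation didn't describe the template or tools Huebscher and her colleagues used, and the function name `build_listing` and the plain indented-text output are assumptions.

```python
import os


def build_listing(root):
    """Return indented text lines that mirror the folder hierarchy under root.

    Folders appear with a trailing slash; each file appears beneath its
    folder along with its size in bytes.
    """
    lines = []
    root = os.path.abspath(root)
    for current, dirs, files in os.walk(root):
        dirs.sort()  # sorting in place keeps the traversal order predictable
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if rel != ".":
            lines.append("  " * (depth - 1) + os.path.basename(current) + "/")
        for name in sorted(files):
            size = os.path.getsize(os.path.join(current, name))
            lines.append("  " * depth + f"{name} ({size} bytes)")
    return lines
```

The resulting lines could be pasted into a finding aid template as a starting point, with names and arrangement left exactly as the creator supplied them.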

The Minnesota Historical Society took a similar approach to making accessible geospatial data created by a gubernatorial redistricting commission, and it plans to use the procedures it developed when processing the Pawlenty records and the redistricting commission files to make other records transferred on disc accessible via the Web.

I was in pretty bad shape by the time Jim Williams of Middle Tennessee State University's Albert Gore Research Center began discussing his institution's efforts to rescue two U.S. Representatives' constituent service files, so my notes and my memory of his presentation are both deficient; as a result, I'm limiting my comments to the portions of the presentation I remember semi-clearly.

The constituent services files Williams and his colleagues sought to preserve were created using Lockheed Martin's Intranet Quorum (IQ) application. The offices of many U.S. Representatives use IQ to track correspondence, store constituent contact information, and track the progress of constituent cases. IQ is proprietary, and each office that uses it pays roughly $60,000 per year to do so. Lockheed can convert the data in IQ systems to a more user-friendly format, but there is a cost associated with doing so.

Middle Tennessee State University was able to persuade the U.S. Representatives who donated their records to pay for the conversion of their IQ data, but other repositories may find themselves forced to pay for conversion or to convert the data themselves. As a result, the university hopes to take the lead in developing ways to reconstruct IQ databases and to collaborate with other archives seeking to do the same thing. Anyone interested in participating in a consortium devoted to preserving IQ data should contact Williams at Jim.Williams-at-mtsu.edu.

Image:  Light fixtures in Sapphire Room OP, Hilton San Diego Bayfront, 10 August 2012.

Thursday, May 3, 2012

NARA releases 2011 records management assessment

Since 2009, the U.S. National Archives and Records Administration (NARA) has conducted annual surveys of federal government agencies' records management practices.  All of these surveys have revealed that electronic records management is a particular challenge for the federal government, and the 2011 assessment, the results of which NARA released earlier this week, is no exception.  Although NARA identified some modest successes, most notably increased transfers of archival electronic records, it's plain that management of electronic records remains an area of particular concern.  NARA found that:

< snip >
  • Many respondents do not know or understand key terms and concepts pertaining to electronic records;
  • Many respondents consider various aspects of electronic records management to be the purview of information technology staff;
  • A significant number of agencies do not have migration procedures in place to ensure that electronic records are retrievable and usable to conduct agency business;
  • Many respondents believe that media neutral records schedules eliminate the need for records management policies and procedures specific to electronic records;
  • A significant number of agencies use backup tapes, which NARA does not consider a recordkeeping system, to preserve electronic documents and e-mail records;
  • A third of agencies are using an ERMS [Electronic Records Management System] or RMA [Records Management Application] to manage their electronic records;
  • Over 40 percent of agencies use e-mail archiving applications to manage e-mail messages . . . .
< /snip >

These findings are depressing but not particularly surprising.  Electronic records management remains a real challenge for many public- and private-sector organizations.  I would be willing to bet that the feds are actually ahead of most (but by no means all) state and local governments, and I suspect that many corporations -- even those whose stock in trade is digital information -- are similarly challenged. Earlier this week, I blogged about the near-disaster that Pixar (which should be applauded for its candor) experienced, and Twentieth-Century Fox and Paramount have discarded or lost digital files that have monetary and artistic value.  A host of other corporations are probably hoping that their records and information management nightmares remain out of the public eye.

What does NARA propose to do about the sorry state of federal records management?  Appendix I of the recently released report offers a detailed plan of action, and I encourage you to read it -- and the rest of the report -- in its entirety.  However, I will say that I'm particularly pleased that NARA wants agencies to incorporate records management plans -- with benchmarks and resource allocations -- into their annual budget submissions to the Office of Management and Budget (OMB).  I'm also glad that NARA wants to work with OMB to ensure that records management and archival functions are incorporated into new electronic recordkeeping systems and into the federal "IT governance process."  When a fiscal control entity demands something, government agencies tend to listen.

Friday, April 27, 2012

How Toy Story 2 was almost lost


Even the pros have close calls sometimes. In this video, two Pixar employees explain how the files that comprised the film Toy Story 2 were almost lost as a result of an erroneous delete command and a backup routine that had stopped working properly. The only thing that saved Pixar from having to devote a year to reconstructing the lost files: the film's technical director was doing a lot of work at home and had a copy of the files on her home computer.

Moral of the story: verify that your backup routine is producing readable backups -- and be very, very careful when typing Unix/Linux "rm" commands!
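What does verifying a backup look like in practice? Here's a minimal sketch that assumes, purely for illustration, that the backup is a tar archive; the video doesn't describe Pixar's backup format, and the function names are hypothetical. The point is simply that a backup isn't verified until something has actually read it back.

```python
import tarfile


def readable_members(archive_path):
    """Read every regular file in a tar archive; return (name, size) pairs.

    Raises tarfile.TarError, OSError, or EOFError if the archive is damaged.
    """
    results = []
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            total = 0
            # Read the full contents so that damaged data surfaces as an error.
            while True:
                chunk = handle.read(1024 * 1024)
                if not chunk:
                    break
                total += len(chunk)
            results.append((member.name, total))
    return results


def backup_looks_usable(archive_path, expected_names):
    """True only if the archive reads cleanly and holds every expected file."""
    try:
        found = {name for name, _ in readable_members(archive_path)}
    except (tarfile.TarError, OSError, EOFError):
        return False
    return set(expected_names) <= found
```

A check like this catches an archive that can't be read or that is missing files you expected; it does not prove that the contents are current, so comparing checksums against the live files is a sensible companion step.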

Saturday, April 14, 2012

MARAC Spring 2012: Fundamentals of Electronic Records

The Spring 2012 meeting of the Mid-Atlantic Regional Archives Conference featured two sessions focusing on electronic records, and the second session, "Fundamentals of Electronic Records," took place earlier today.

My colleague Michael Martin opened the session by discussing how the New York State Archives typically conducts appraisals. Regardless of format, we compile information about the history of the unit that created or currently maintains the records, the disposition of similar records created by other agencies, similar records already in our holdings, and published research that makes use of similar records. We also look for records disposition schedules for similar or related records, and pertinent state and federal laws and regulations. We then meet with creators to determine the contents of the files, identify any major gaps, examine blank forms or computer reports, and assess the environment in which the records are housed. All of this research forms the basis for formal appraisal reports that assess the legal, administrative, environmental, and research value of the records, identify major preservation and access issues, and recommend specific records management, accessioning, and preservation actions.

When appraising electronic records, we push against creator assumptions that aren't always accurate: that gaps won't exist, that volume won't be an issue, that everything can be easily found, and that passively managed records will remain accessible over time. We also complete a supplemental technical appraisal. We make it a point to speak not only to agency records managers and records creators but also agency IT personnel, and we gather information about the name of the system in which the records are housed, the type(s) of records present, ownership of the records, the hardware and software environment, the size of the system, the physical location of the hardware housing the system, how often records are retrieved and used, the accuracy and completeness of the data, and the existence and location of backup copies. The technical appraisal also assesses the long-term resource commitments needed to ensure that the records will remain accessible over time.

Sibyl Shaefer and Laura Montgomery of the Rockefeller Archive Center focused on the accessioning and ingestion of electronic records. The Rockefeller Archive Center has a sizable backlog of unprocessed records, some of which consist of a mix of paper records and electronic records on legacy media. The digital archivists are searching through boxes, removing legacy media, and producing basic preservation copies of the electronic records, but the paper records may not be processed for some time after this sifting takes place. As a result, the possibility that the relationship between the paper and electronic records will be permanently severed is quite real. In order to ensure that this doesn't happen, Shaefer and Montgomery document the removal of the electronic media in the Resources module (the Accessioning module isn't sufficiently flexible) in their instance of the Archivist's Toolkit (our accessioning workflow is still paper-centric, so for now we're documenting separations of this nature on paper). When the repository receives new accessions, staff conduct a quick survey of the collection, remove the digital media, attach tracking sheets to each piece of media, and create a collection record in the Archivist's Toolkit that documents the removal of the media.

The Rockefeller Archive Center uses Archivematica to ingest electronic records and create item-level preservation and administrative metadata and Submission Information Package-level description metadata. At present, rights issues are a real concern: many of the collections that consist of a mix of paper and electronic records are covered by old donor agreements that make no reference to electronic records, online access, or related issues. Staff eventually hope to enter all information about rights issues into Archivematica at the point of ingest and have it reflected in the PREMIS metadata that Archivematica creates upon ingest.

Jeanne Kramer-Smyth of the World Bank Archives (and author of the always awesome Spellbound Blog) concluded the session with a provocative assessment of issues relating to access. Noting that records aren't truly accessible unless they're also understandable and meaningful, she highlighted the importance of making sure that preservation actions don't inadvertently alter the significant properties of records. For example, the New York Public Library archivist who processed the papers of Jonathan Larson, the creator of the musical Rent, discovered a mystifying one-line inconsistency in the Microsoft Word 5.1 file containing the lyrics to one of the songs: when opened in an emulator, the line read "before the virus [HIV] strikes." When opened in Microsoft Word 5.1, the line was completely different. Only after opening the file in a hex editor did the archivist figure out what was going on: Microsoft Word 5.1 had a save feature that embedded revisions at the end of the file, but the emulator wasn't configured to read and apply these changes. Had the archivist not taken the precaution of opening the file in its native environment, he or she might have decided that the emulator was a reliable preservation and access tool for Microsoft Word 5.1 files.
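If you've never used a hex editor, the following minimal sketch shows the basic idea: look at a file's raw bytes rather than at what an application chooses to display. It makes no claims about the internal structure of Microsoft Word 5.1 files; the function names and the choice of Latin-1 as the search encoding are assumptions for illustration.

```python
def hex_dump(data, width=16):
    """Return hex-editor-style lines: offset, hex bytes, printable text."""
    pad = width * 3 - 1  # width of a full row of two-digit bytes plus spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return lines


def find_text(data, phrase, encoding="latin-1"):
    """Return every byte offset at which phrase occurs in data."""
    needle = phrase.encode(encoding)
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `find_text` will show whether a phrase is stored in the file at all and where; text that sits far from the body of the document is a hint that the file holds more than one rendering of it.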

As Kramer-Smyth pointed out, migrating files from one format to another can also cause problems: loss of information, loss of fidelity (i.e., changes in appearance or behavior), loss of authenticity/legal admissibility, and the likelihood that migration will have to be performed repeatedly. Moreover, in some instances, it may not be possible to migrate files. In others, one may have to pull records into an emulated environment prior to migrating them.

Kramer-Smyth also highlighted a couple of intriguing emulation environments. Basilisk II emulates older Macintosh environments, and Dioscuri provides a universal virtual computer that enables you to run a variety of operating systems and software applications, and all you need to do in order to keep it usable is migrate its interface over time. However, she stressed once again that emulation has its limitations: you need to mimic hardware (a particular concern when attempting to replicate the original user experience), you need to preserve the original operating system and application software, and software licensing issues are a matter of enduring concern.

Despite the limitations of migration and emulation, in the end we will probably have to embrace both approaches: migration can keep electronic files accessible in the relative short term, and emulation will likely be needed in the longer term.

In closing, Kramer-Smyth offered a few intriguing thoughts about end user access:
  • In most instances, we will not construct electronic reading rooms akin to the onsite reading rooms that enable us to provide access to paper materials. However, in instances in which specialized hardware is called for or we want to ensure that users don't copy or disseminate materials that are legally restricted or have intellectual property restrictions, we may require users to visit our physical repositories.
  • We may create virtual reading rooms at some point in the future, but at present most of us have neither the technological resources nor the volume of electronic files needed to make this approach workable.
  • NARA and Maine's Office of GIS allow users to download electronic records in a variety of formats, and we may want to consider embracing this user-centered approach.
I'm heading back to Albany in a little while, but tomorrow I'll put together a post that highlights some of the other tidbits I picked up at MARAC and the beauty that is Cape May. If you ever get the chance to visit this charming little city, by all means do so.

Photo: the Joseph and John Steiner Cottages at 22 and 24 Congress Street, Cape May, New Jersey, 13 April 2012. These homes, which have signs indicating that they were built in 1848, aren't as large or as ornate as many other Cape May Victorians, but they have a sweet charm all their own.

Tuesday, February 21, 2012

Electronic records roundup

In no particular order, some electronic records news that may be of interest:
  • The thoughtful and hard-working folks at the South Carolina Department of Archives and History have explained some of the challenges of preserving the state's digital history. (As you'll recall, gubernatorial e-mail management practices recently gave rise to controversy in the Palmetto State.)
  • The archivists at Queen's University (Canada) are grappling with similar issues.
  • David Pogue and CBS Sunday Morning drew attention to "data rot" -- the problems associated with hardware and software obsolescence. (N.B.: Pogue thinks that the word "archivist" contains a long "i.")
  • If you're interested in the evolution of cybersecurity, be sure to check out the short films that were shown at the annual conferences attended by Bell Labs executives. You'll find them on YouTube courtesy of the AT&T Archives. (And if you're interested in the history of hacking, be sure to check out Ron Rosenbaum's fascinating 1971 article on "phone phreaking," which captured the imagination of a generation of computer enthusiasts -- Steve Jobs among them.)
  • A computer science Ph.D. student has found that, less than a year after the revolution in Egypt, approximately 10 percent of the social media posts documenting it have vanished from the live Web. A variety of factors account for this situation. People sometimes post things, regret doing so, and then delete them. Others get tired of maintaining their accounts and delete or deactivate them. Others were almost certainly the target of government repression and either removed content under duress or had content removed without their consent. The student's overarching conclusion: we need to become a lot more proactive about capturing Web content that documents the unfolding of historically significant events. (He'll get no argument from me.)

Tuesday, January 3, 2012

Need help addressing your e-records issues?

If one of your New Year's resolutions involves finally doing something about your electronic records, be sure to check out Preserving Electronic Records in Colleges and Universities: Getting Your Program off the Ground. This online workshop, which records and knowledge management expert Steve Goodfellow developed for the New York State Historical Records Advisory Board, will take about two hours to complete and covers a host of topics:
  • E-records to be aware of in your environment
  • Awareness of the issues
  • Standards and other e-preservation initiatives
  • Goals and strategies for your preservation efforts
  • Disaster preparation and recovery planning
  • Developing an action plan
As the title suggests, this non-technical workshop was originally developed for college and university archivists in New York State. However, the information it contains is relevant to archivists working in a variety of settings who are trying to figure out how to start addressing their electronic records issues, and the workshop videos -- which can be viewed in one sitting or in ten-minute increments as time permits -- and supporting materials are freely available to everyone with an Internet connection.

Thursday, December 15, 2011

Catching up

A few things you might have missed:
  • Late last month, President Obama issued a memorandum directing each federal government agency to perform a comprehensive review of its records management program and then prepare a report for the Archivist of the United States and the Director of the Office of Management and Budget that outlines its plans to maintain and improve its program, "particularly with respect to managing electronic records, including email and social media, deploying cloud based services or storage solutions, and meeting other records challenges." These reports are due on 27 March 2012.
  • Paper records created during an internal military investigation of a November 2005 massacre of civilians in the Iraqi city of Haditha were slated for destruction. However, the records, many of them marked as being secret, ended up in trailers purchased by a local businessman, who hauled the trailers to a Baghdad junkyard. Several weeks ago, a New York Times reporter covering the American withdrawal from Iraq inadvertently found them there. At present, it is unclear whether the military will open an investigation into the handling of these records.
  • After a legal review, the Massachusetts State Archives has decided to open approximately 460 boxes of paper records of former Governor and current Presidential candidate Mitt Romney to researchers. Staff will review the files prior to disclosure and either remove or redact legally restricted information. The repository initially restricted access to the records as a result of a court ruling stating that gubernatorial records were exempt from the state's freedom of information law. As you'll recall, during the last days of the Romney administration, all of the files on its e-mail servers were deleted, several high-ranking officials were allowed to purchase the state-owned hard drives they used, and leased computer equipment was replaced.
  • The administration of South Carolina Governor Nikki Haley routinely deletes internal e-mails. The administration claims that it does so in order to free up storage space on its server, but Erik Emerson, Director of the state's Department of Archives and History, asserts that it violates state records laws.
  • OccupyArchive is an effort by George Mason University's Roy Rosenzweig Center for History and New Media to capture digital items documenting Occupy Wall Street and other Occupy movements throughout the world. As Rosenzweig Center director Sharon Leon notes, they're "documenting a post-print movement" -- something that archivists must do if they want to ensure a complete and accurate documentary record.
  • Finally, on a lighter note, here's why we need to caution teens about sexting: sooner or later, their sexts will be all over the Internet for everyone to read.

Tuesday, December 6, 2011

Salvage and recovery of water-damaged solid-state electronic media

In the wake of tropical storms Irene and Lee, I've done some research into how to salvage and recover data housed on flood-damaged electronic media. There are some great, media-specific resources out there.
However, at present, information about how to salvage and recover data housed on solid-state media such as flash drives and digital camera and smartphone memory cards and solid-state devices such as portable music players (sometimes used to record audio), tablet devices, and computers with solid-state drives (e.g., MacBook Airs) isn't readily available. As a result, I contacted several vendors who specialize in recovering data from electronic media and devices damaged in floods, fires, and other disasters and asked for their advice. What follows is an initial summary of these conversations. I hope that it fills a gap in the existing professional literature -- and that no one who reads this blog ever has cause to make use of the following advice.

First, a few general guidelines:
  • Restoring data from backups is always easier and cheaper than recovering data housed on damaged electronic media. Back up your data!
  • A good disaster management plan will reduce the risk that your media will be damaged. For more information about developing such plans, consult the New York State Archives publication Preparing for the Worst: Managing Records Disasters.
  • Floods and burst pipes aren't the only water-based disasters. First responders use water to fight fire and to keep down dust from collapsed structures. If your media is burned or crushed and wet, treat it as water-damaged.
  • In some instances, you may have no choice but to try to recover data from damaged media. Backups may be incomplete or become corrupt, and sometimes records created immediately before disaster strikes (e.g., photographs documenting a crime scene) are so valuable that the time and expense associated with recovery is warranted.
  • When disaster strikes, salvage damaged media and stabilize it long enough to determine whether your backups are complete and intact. If your backups are complete and readable or the records on the damaged media are less than essential, don't attempt to recover the data stored on the damaged media; however, as noted below, the cost of attempting to recover non-essential data from water-damaged flash drives and memory cards is so low that you might want to give recovery a shot. If the records are essential and backups don’t exist, are incomplete, or have been corrupted, attempt to recover the data housed on the damaged media.
  • Actions suitable for water-damaged paper records may destroy electronic media. Although solid-state media should be air- or rice-dried (see below), some types of electronic media (e.g., hard drives) should be kept wet. Freeze- or vacuum-drying will likely destroy most forms of electronic media, and using heat to speed air-drying may also damage or destroy media.
  • Protect yourself. Before you enter a flooded area, consult with emergency personnel and make sure that it's safe to enter. Contaminated water and live electricity -- keep in mind that uninterruptible power supplies attached to hardware may be live well after the power goes off -- pose serious safety risks, and noxious gases can build up, particularly in basements. Wear appropriate protective gear.
  • Be prepared to document the disaster. If you need to file an insurance claim, your insurer will likely want photographs illustrating the extent of the damage. If the disaster is small (e.g., you drop a thumb drive housing important records into a cup of coffee), you may want pictures for your own records. If you're an archivist, records manager, or conservator, you may also want images to incorporate into presentations, publications, or other training materials. You may also need to take notes about the scope of the disaster and the location of hardware and media (first responders sometimes disconnect stuff and move it around).
Now, down to the nitty-gritty of salvaging and recovering water-damaged solid-state media and devices. If you're confronted with water-damaged solid-state media or devices, the following guidelines will maximize your chances of recovering your data.

Before you begin your initial salvage and stabilization effort, make sure you have the appropriate supplies on hand. For solid-state media and device(s), you'll need, at minimum, some clean, dry, lint-free cotton cloths (in a pinch, old bedsheets or garments will do) and some gallon- or quart-sized zippered plastic storage bags. Odd as it may seem, you may also want to have some uncooked white rice on hand.

Salvage and stabilization of flash drives and memory cards
  • Remove memory cards from devices and disconnect drives from powered-down hardware.
  • Wipe off any surface dirt and water with a clean, dry, lint-free cloth and then air-dry the media as soon as possible: place the media on a clean, dry, lint-free cotton cloth and prop it up in a way that speeds drainage.
  • You may use fans and dehumidifiers to facilitate the drying process.
Salvage and stabilization of solid-state devices (e.g., cell phones, tablet devices, computers with solid-state drives)
  • Unplug or remove the battery as soon as possible and gently shake the device to remove water lodged in ports and other openings.
  • Wipe off surface dirt and water with a clean, dry, lint-free cotton cloth and then air-dry or "rice-dry" the device. To air-dry the device, place it on a clean, dry, lint-free cotton cloth and prop it up in a way that facilitates drainage. You may use fans and dehumidifiers to speed the process. To rice-dry the device, place it in a zippered plastic storage bag and then fill the bag with uncooked white rice. If you must retain the device for more than 2-3 days, replace the rice to reduce the risk of mold growth. (FYI, this "rice-dry" technique may also bring water-damaged cell phones or digital cameras back to life . . . but I don't think I would trust such a device in a mission-critical situation.)
After you've salvaged and stabilized the media or device(s), assess whether recovery is warranted. Do you have complete, uncorrupted backups of the records stored on the media or device? If you do, restore the data from the backups and discard your damaged media. If you don't, how valuable are the records? Are they essential to your business operations or a court proceeding? Are they of immense historical (or, in the case of personal files, sentimental) value? How great is the cost of recovering the data? As noted below, the cost of attempting to recover non-essential data from a flash drive or memory card is quite low. The cost of having a vendor recover data from a solid-state device can be quite high. You have to determine whether the value of the records warrants the cost of recovering them.

If you determine that the data is essential and warrants the cost of recovery, you'll need to contract with a vendor that specializes in data recovery work. Many state archives maintain lists of such vendors, and a quick Web search will identify many others.

If the data is non-essential, discard the media or device appropriately; however, if the data is stored on a flash drive or memory card, you may want to try to recover it yourself. Damaged flash drives and memory cards that house legally restricted or sensitive data should be physically destroyed (by a recycling vendor or with a hammer or shredder), and damaged devices that house such data should be sent to a vendor that will destroy their drives and recycle their other components. Damaged media and devices that don't contain such data can probably be recycled by vendors who specialize in processing electronic waste.
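For readers who like to see a decision process written out, the assessment above can be sketched as a small Python function. This is only an illustration of the reasoning in this post, not a tool anyone I spoke with uses: the `DamagedItem` class, its field names, and the wording of the returned steps are all my own assumptions, and the "essential" branch presumes you've already decided the records are worth the cost of vendor recovery.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DamagedItem:
    """Hypothetical description of one salvaged, stabilized item."""
    kind: str             # "media" (flash drive, memory card) or "device" (phone, tablet, etc.)
    backups_intact: bool  # complete, uncorrupted backups exist
    essential: bool       # records are essential and worth the cost of recovery
    sensitive: bool       # holds legally restricted or sensitive data


def plan(item: DamagedItem) -> Tuple[str, Optional[str]]:
    """Return (recovery step, disposal step) for one item."""
    if not item.backups_intact and item.essential:
        # Essential records with no usable backup go to a vendor; nothing is discarded yet.
        return ("send to data recovery vendor", None)

    if item.backups_intact:
        recovery = "restore from backups"
    elif item.kind == "media":
        # Non-essential data on a flash drive or memory card is cheap to attempt yourself.
        recovery = "optionally try reading the files yourself"
    else:
        recovery = "none"

    if item.sensitive:
        if item.kind == "media":
            disposal = "physically destroy"
        else:
            disposal = "send to vendor that destroys drives"
    else:
        disposal = "recycle as electronic waste"
    return (recovery, disposal)
```

For example, `plan(DamagedItem("media", False, False, True))` describes a memory card holding sensitive but non-essential files with no backup: you may try to read it yourself, and then it should be physically destroyed.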

Recovering data from flash drives and memory cards
  • If the data is essential, send the drive or card to a qualified disaster recovery vendor.
  • If the data is non-essential, attempt to read the files on the damaged device. If you are successful, copy the files onto new media and discard the damaged media. If you are not successful, admit defeat and discard the media or, if you are attempting to recover data from a memory card, decide whether the purchase of commercial recovery software (prices begin at around $30.00) is warranted.
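If you're comfortable with a little scripting, the "read what you can and copy it to new media" step for non-essential data can be sketched in Python as follows. This is a minimal sketch under several assumptions of mine: the dried drive or card mounts normally on your computer, the destination folder is on different, healthy media, and the function name `copy_readable_files` is purely illustrative. It uses only Python's standard library and simply skips any file the operating system refuses to read. It is not a substitute for commercial recovery software or a vendor.

```python
import shutil
from pathlib import Path


def copy_readable_files(source_root, dest_root):
    """Copy every file that can still be read from source_root to dest_root.

    Returns two lists of paths relative to source_root: (copied, failed).
    """
    source_root, dest_root = Path(source_root), Path(dest_root)
    copied, failed = [], []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue  # skip directories and anything that no longer looks like a file
        relative = path.relative_to(source_root)
        target = dest_root / relative
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copies contents and, where possible, timestamps
            copied.append(relative)
        except OSError:
            # Unreadable file: note it and keep going rather than stopping the whole copy.
            failed.append(relative)
    return copied, failed


# Usage (both paths are hypothetical):
#   copied, failed = copy_readable_files("/Volumes/DAMAGED_CARD", "/Users/me/recovered")
```

Anything that ends up in the `failed` list is a candidate for recovery software or, if you change your mind about its value, a vendor.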
Recovering data from solid-state devices
  • Air- or rice-dry the device(s) and then send the device(s) to a qualified disaster recovery vendor. These devices are difficult to open and require special handling. Do not attempt to recover the data yourself.
Establish a relationship with your disaster recovery vendor as quickly as possible. Most vendors have 24/7 phone coverage, and they may be able to provide additional stabilization and recovery advice, offer pickup service (particularly in major metropolitan areas), and provide special handling or packing instructions. Moreover, the sooner recovery begins, the greater the chance it will be successful.

The U.S. National Archives and Records Administration has a great list of security and other considerations that should be discussed with prospective vendors and incorporated into service contracts. I have only one thing to add: be honest about the nature of your disaster. If your media or device came into contact with water that may have contained biological or chemical hazards, tell the vendor about it. Vendors have the protective gear and equipment needed to work with contaminated material and they deal with embarrassing situations (e.g., "I dropped my camera in the toilet!") all the time, but they need to know what's coming their way.

As far as sending the media or device(s) to the vendor is concerned, follow the instructions provided by the vendor. However, you will probably be asked to do the following:
  • Place each piece of media and each device into a zippered plastic storage bag.
  • Surround each bagged piece of media or device with bubble wrap.
  • Pack the media or device(s) appropriately.
  • If sending portable media to a vendor, you may be able to use a rigid shipping envelope. You can also use a box at least twice as large as the media.
  • If sending device(s) to a vendor, use a box at least twice as large as the device.
  • If using a box, immobilize the media or device(s) with packing material. (N.B.: some vendors will request that each piece of media and each device be placed in its own box.)
  • Ship to the vendor via overnight delivery service.

Disclaimer: I am not liable for any losses or damages resulting from following any of the advice contained within this post.

Thursday, November 17, 2011

State government electronic records in the news

Two stories relating to the management and continued accessibility of state government records popped up on my radar screen earlier today. Both of them warrant watching; it doesn't seem as if either situation will be resolved any time soon.

The first involves gubernatorial records, an ever-present matter of interest and concern. Earlier today, the Boston Globe reported that during the last days of Republican presidential candidate Mitt Romney's tenure as governor of Massachusetts, eleven of his high-ranking staffers used personal funds to purchase their state-supplied hard drives and laptops, staff replaced all of the other computers in the governor's office, and all Romney-era e-mail was deleted from the office's e-mail servers. When Deval Patrick, a Democrat, took office, he and his staffers found an electronic blank slate.

Romney's position is that staffers who purchased hardware did so openly and that he and his staffers complied with all records laws. It does seem that the Romney administration did transfer a substantial body of records to the Massachusetts Archives: according to the Globe, the repository holds 700-800 boxes of paper records documenting the Romney administration. However, it's not clear whether these records include print copies of the e-mails. The Globe doesn't provide detailed information about them, and the Massachusetts Archives doesn't have an online catalog or detailed online finding aids.

Secretary of State Bill Galvin, who oversees the Massachusetts Archives, told the Globe that the hardware purchases strike him as odd and that the gubernatorial e-mail should have come to the archives: "Electronic records are held to the same standard as paper records. There’s no question. They’re not in some lesser standard."

Romney's campaign manager asserts that the Patrick administration is making a stink about the hardware purchases, computer replacement, and e-mail deletion because it is acting as "an opposition research arm of the Obama reelection campaign." After the Globe story appeared this morning, he filed a state Freedom of Information Act request seeking "all email correspondence, phone logs, and visitor logs" documenting contacts between Patrick administration staffers and prominent Obama political advisers David Plouffe, David Axelrod, and Jim Messina. Governor Patrick’s chief legal counsel has stated that staff will "be happy to fulfill" this request.

I'm not an expert on Massachusetts records laws, so I'm going to have to wait for the experts to weigh in on whether the actions of Governor Romney and his staff were legal. Do I wish that the e-mail had been preserved? Of course I do. I'm an archivist, and my job is to preserve records of enduring value and to provide access to them. Gubernatorial correspondence and internal memoranda, regardless of format, do have enduring value. Do I think that Governor Romney should be pilloried for destroying the e-mail? If he violated the law, I hope he gets what's coming to him. If he didn't, I hope that Governor Patrick and other Massachusetts politicians focus on strengthening laws concerning the retention and disposition of gubernatorial records.

Do I think that Governor Patrick brought up these issues in an effort to give President Obama a boost? I don't know. Patrick and Obama are close allies, so it's possible. However, I'm also under the impression that Governor Patrick has his own reasons for disliking Governor Romney, and I'm open to the possibility that he and his staffers are discussing the matter because they keep getting freedom of information requests for Romney-era records. I must admit that I am curious as to how well the Patrick administration is managing its own records.

The second relates to an outrage. As anyone who's been paying even the slightest attention to the American news media knows, former Penn State assistant football coach Jerry Sandusky was recently arrested on charges that he sexually molested eight young boys. Two university administrators have been charged with perjury, and the university's president and football coach have lost their jobs.

Questions as to precisely what the president, the coach, and other university administrators knew about Sandusky and when they knew it are rampant. However, Pennsylvania's Right to Know Law, which was extensively revised in 2008, explicitly exempts most records created by Penn State, Lincoln University, the University of Pittsburgh, and Temple University. As a result, there is a distinct possibility that only those e-mails, phone records, and other Penn State records introduced in open court will be disclosed to the public -- unless, as the New York Times urged earlier today, the Pennsylvania legislature and governor move to lift this exemption.

Publicly funded universities in many other states -- New York included -- are subject to freedom of information laws. For what it's worth, I really don't see why Penn State, Pitt, Temple, and Lincoln should be granted such sweeping exemptions, and I hope that Pennsylvania's law changes. At the very least, I hope that Penn State's new administrators recognize that openness and honesty are essential to restoring the university's good name and start releasing records of their own accord as soon as prosecutors permit them to do so.

Yes, I know that Penn State is going to be hit with civil lawsuit after civil lawsuit and that its lawyers would probably jump for joy if a fire or flood destroyed a ton of university records. However, the lawsuits will come and the cost of settling them will be staggeringly high no matter what the university does.

Of course, Penn State is not the only entity that has relevant records: Sandusky met the boys he is accused of sexually assaulting through The Second Mile, a charitable organization that he founded. However, earlier today, the New York Times reported that investigators have yet to locate some important Second Mile records:
Officials at the Second Mile . . . reported that several years of the organization’s records were missing and had perhaps been stolen. The missing files, investigators worry, may limit their ability to determine if Sandusky used charity resources — expense accounts, travel, gifts — to recruit new victims, or even buy their silence . . . .

Much of the [charity's] older paperwork was stored at an off-site records facility. The travel and expense records, for instance, had been sent over several years earlier. But select members of the charity’s board of directors were alarmed to learn recently that when the records facility went to retrieve them, some of those records — from about 2000 to 2003 — were missing.

. . . . Subsequently, the [Second Mile] foundation located apparently misfiled records from one of the years, but the rest seem to have disappeared.
As awful as the Sandusky-Penn State situation currently appears, I can't help but think that we've seen only the tip of the iceberg. All the more reason to be as honest and as open as possible. The sooner the truth comes out, the sooner the victims can focus on rebuilding their lives and the sooner Penn State can focus on rebuilding itself.