Sunday, May 29, 2011

Government social media records

Local, state, and federal governments are increasingly using social media to convey important information and to solicit feedback from citizens. However, governments and officials are still struggling to adapt to a Web 2.0 world. Michigan, for example, is actually taking down some social media content as a result of legal considerations and resource limitations. U.S. Representative Patrick McHenry (R-NC) and the U.S. Food and Drug Administration have recently learned that citizen feedback isn't limited to "likes" or approval. At this moment, U.S. Representative Anthony Weiner (D-NY) is probably wishing that he had never gotten a Twitter account. And Kent County, Delaware has recently issued a policy that bars employees from posting negative comments about their colleagues or county government -- whether via county-owned computers while at work or via their own computers and cell phones while off the clock.

What to do? I can't help you strike the right balance between the need to present an appropriate face to the public and the free speech rights of your employees -- the courts will probably do that -- but if you're a government official or employee contemplating using social media, be sure to check out the following resources:
Even a cursory glance at these resources will underscore the fact that, in most jurisdictions, social media content typically meets the statutory definition of a "public record" and must thus be managed properly. For tips on how to do so, consult the following:
At present, just about everyone seems to agree that most social media content has a short retention period. Unfortunately -- but not surprisingly -- there is no consensus regarding how best to capture and preserve content that has enduring value. There are lots of tools out there, and all of them have different features and save content differently. Given that most social media content will likely be destroyed within a relatively short timeframe, this isn't as big a problem as it might be. However, I suspect that those of us charged with capturing and preserving content deemed archival may run into some preservation problems a few years down the road -- and I hope that the federal government's 2009

My own experience is limited to capturing content created by others, which poses some additional challenges: some social media capture tools are expressly designed to help people preserve their own content and require full login rights. In such a situation, use of a Web crawler may be the best approach. I've experimented -- with decidedly mixed results -- with using OCLC's Heritrix-based Web Harvester to capture Facebook, IdeaScale, Twitter, and YouTube content, and I know several people have had somewhat greater success with the Heritrix-based Archive-It service. If you're interested in exploring Web crawling of social media content, check out this nice list of Web crawling software and services.

If you're looking to preserve your own content, other options are available:
  • Several low- and no-cost tools that support capture and archiving of one's own Facebook and Twitter content are out there. For more information, consult April Edmonds's superb overview.
  • TwapperKeeper enables you to capture and preserve tweets (i.e., individual Twitter posts) that contain specified hashtags or keywords. Using this tool to capture all of the tweets created by a specific office may be a challenge, but it can be used to capture all of the tweets related to a specific subject or event. Sadly, the "download and export" and "API" features present within the Web-based version of TwapperKeeper were recently removed at the behest of Twitter. However, it's still possible to install an open source version of the software that includes these features on your own server.
  • A growing number of software companies, among them Arkovi, Backupify, LiveOffice, Smarsh, Sonian, and Symantec, are creating social media archiving tools or incorporating them into larger e-mail archiving products. If you're already using an e-mail archiving product, investigate whether it also supports social media archiving. If you're not, a stand-alone commercial product or service may meet your needs.
Finally, please note that, at present, the imperative to manage state and local government social media records may conflict with the terms of service agreements governing usage of social media services such as Facebook or Twitter; in many instances, these agreements limit the extraction or repurposing of content. The federal government has negotiated special agreements with many social media service providers, and the National Association of State Chief Information Officers has negotiated a model Terms of Service agreement for state and local government Facebook users and is currently seeking to develop similar agreements with other social media service providers, but it's likely going to be some time before the legal issues that might affect our ability to manage social media records are resolved conclusively.

Saturday, May 28, 2011

New NARA trustworthy digital repositories guidance

I've made it back to Albany and am wading through piles of mail and other stuff that either accumulated in my absence or simply wasn't dealt with before I left town. Included in that pile are a few nuggets of information that I wanted to pass on to you, and over the next few days, I'm going to do just that.

The U.S. National Archives and Records Administration has just released Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open Archival Information System (OAIS) Standard Reference Model. It contains a brief overview of the six core business processes -- ingest, archival storage, data management, administration, preservation planning, and access -- outlined in the OAIS Reference Model and a series of nicely thought-out questions that enable agencies to assess how well their current high-level policies, practices, and procedures support each of the six processes and to identify needed improvements.

As might be expected, this publication is directed at federal agency CIOs, program managers, and records officers. However, anyone charged with developing and implementing policies, procedures, and processes that support the long-term preservation of college/university, local or state government, or corporate digital materials should examine it closely. It's only twelve pages long and refreshingly free of jargon, which means that non-archivists and non-records managers may actually read it, and almost all of the questions it asks are broadly applicable. Strongly recommended.

Tuesday, May 24, 2011

Ohio public records controversy

Apologies for the virtually non-existent posting during the past few weeks. I've spent three of the past four weeks attending to family matters, and I've simply had to put this blog and a bunch of other things on the back burner.

At the moment, I'm in Ohio, which is the site of a fascinating court case centering upon management of and access to public records. Ohio's Public Records Law currently allows citizens to receive civil forfeitures from local governments that have improperly destroyed requested records, and late last month the Ohio Supreme Court heard oral arguments in a case involving Timothy Rhodes, who originally sought approximately $5 million in damages from the city of New Philadelphia, which destroyed audio recordings of roughly 20 years of 911 calls.

Rhodes initially submitted identical requests to several Ohio communities, all of which had destroyed their recordings. However, New Philadelphia was the only one that did not have an Ohio Historical Society-approved records schedule that gave it the legal right to destroy the recordings after they reached the end of their retention period.

New Philadelphia contends that Rhodes' multiple requests prove that he had no interest in the content of the records themselves and was seeking to exploit the Public Records Law for personal gain. Rhodes asserts that even though he did have a legitimate need for the information in the records -- he was involved in a group that opposed tax increases earmarked for countywide 911 services -- the Ohio Public Records Act states very plainly that records requesters do not need to explain why they want to see records or what they plan to do with the information the records contain.

A lower-court jury determined that Rhodes was seeking financial gain and ruled in favor of the city, an appeals court overturned the verdict on the grounds that Rhodes' motives were irrelevant, and now the matter is in the hands of the Supreme Court. Local governments throughout the state are eagerly awaiting the Supreme Court's ruling. Several other small cities are currently being sued by requesters seeking millions of dollars in civil penalties, and many, many others fear that a ruling in Rhodes' favor will lead to a deluge of requests from citizens seeking to turn bad records management practices into a source of personal income.

For what it's worth, I'm really of two minds about this case. I understand the fear of officials in New Philadelphia -- which is about an hour southeast of my current location -- and other localities about the ramifications of a ruling in Rhodes' favor. I don't have ready access to any statistics regarding open records requests, but my own experience and that of many colleagues at all levels of government suggest that public records requests are increasing in both volume and complexity. My experience also suggests that the local governments that would be most negatively affected by such a ruling are those that are least able to bear the burden: northern cities battered by decades of deindustrialization and southern small towns that struggled even when the rest of the state prospered and now compete to attract minimum-wage jobs.

At the same time, I believe that governments should take their records management responsibilities seriously, that citizens have the right to access public records, and that they shouldn't have to explain why they wish to access records. It's been a while since I've been in contact with anyone at the Ohio Historical Society, which serves as Ohio's state archives and records management agency, but it seems to me that the financial penalties outlined in the Public Records Act are a powerful inducement to get one's records management house in order; of course, one might ask whether the financial penalties for non-compliance might best be directed to a dedicated records management improvement fund or otherwise amended to reduce profiteering.

I have no idea when the Ohio Supreme Court will hand down its ruling, but I'm certainly looking forward to reading it.

Saturday, May 7, 2011

MARAC Spring 2011: New Tools to Address Electronic Records Challenges

Fifth order Fresnel lens used in the Jones Point lighthouse, Alexandria, Virginia, during the 19th century and now held by The Lyceum, Alexandria, Virginia, as seen on 5 May 2011.

The first session of the Spring 2011 Mid-Atlantic Regional Archives Conference focused on three electronic records research projects sponsored by the U.S. National Archives and Records Administration. All of them are intriguing, and all of them promise to help many electronic records archivists do their work.

Peter Bajcsy (National Center for Supercomputing Applications) detailed the cloud-based solutions that he and his colleagues have developed in order to address the challenges associated with the increasing number and complexity of file formats, the increasing volume of electronic records, growing hardware and software complexity, and ephemeral support for proprietary software. I haven’t had the opportunity to check out these tools, but I certainly will do so as soon as I get the chance:

  • Conversion Software Registry: a registry and freely accessible search tool that enables users seeking to convert files from one format to another to specify the format of the records with which they’re working and the desired preservation format and then review a list of appropriate conversion tools. Over 2,000 software packages are documented in the registry.
  • Polyglot: a cloud-based, open source conversion tool suitable for classified and proprietary information.
  • Versus [in development]: a tool that can compare original and converted versions of the same digital object -- simple and complex -- and evaluate resulting information losses. The results of these comparisons can be used to determine which preservation approach results in the least loss.
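
To make the idea of comparing an original with its converted versions a bit more concrete, here is a minimal sketch. It is not the NCSA tool and says nothing about how Versus works internally. It assumes that the original and each conversion have already been reduced to plain text, and the function name `text_retention` and the sample strings are hypothetical. The similarity score comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def text_retention(original: str, converted: str) -> float:
    """Return a 0.0-1.0 similarity score between two extracted texts.

    1.0 means the texts are identical; lower scores suggest that the
    conversion dropped or altered content.
    """
    return SequenceMatcher(None, original, converted).ratio()


# Hypothetical text extracted from an original file and two conversions.
original = "Memorandum for the Director. Subject: Budget request for 1994."
conversion_a = "Memorandum for the Director. Subject: Budget request for 1994."
conversion_b = "Memorandum for the Director."

scores = {
    "conversion_a": text_retention(original, conversion_a),
    "conversion_b": text_retention(original, conversion_b),
}

# Prefer the conversion path that retains the most of the original text.
best = max(scores, key=scores.get)
print(best, scores)
```

A real comparison would also have to weigh formatting, embedded objects, and metadata, none of which a plain-text score can see.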

Bajcsy and his team are also interested in developing a Universal File Viewer, a cloud-based service that could provide a preview of files encoded in any format.

Bajcsy also posed a few questions for the audience to consider:
  • His team can deliver, on average, 1537.41 file conversions in one hour (50% utilization of a single CPU virtual machine and 50% virtual uptime of the virtual machine). Does this conversion rate meet archival needs?
  • How many file formats have you personally encountered in your work?
  • Would the Universal File Viewer provide an added value?
  • Is data-driven file format selection for preservation a viable approach?
  • Is software robustness evaluation a viable approach to determining whether a given file is well-formed? (i.e., determining how many applications can open a given file might be a more practical means of determining well-formedness than comparing the file to the format specification.)
  • What is the value of data-driven evaluation of quality of software input/output functionality?
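
As a rough way to think about the first of these questions, the arithmetic below scales the quoted rate of 1537.41 conversions per hour up to a whole collection. The collection size (one million files) and the number of virtual machines are hypothetical, and the sketch assumes that throughput scales linearly with the number of machines, which the presentation did not claim.

```python
RATE_PER_HOUR = 1537.41  # conversions per hour on one VM, as quoted above


def conversion_days(file_count: int, machines: int = 1) -> float:
    """Estimate the days needed to convert file_count files.

    Assumes (hypothetically) that throughput scales linearly with the
    number of virtual machines.
    """
    hours = file_count / (RATE_PER_HOUR * machines)
    return hours / 24


# Hypothetical collection of one million files.
for machines in (1, 10):
    days = round(conversion_days(1_000_000, machines), 1)
    print(machines, "VM(s):", days, "days")
```

Under these assumptions, a single virtual machine would need roughly 27 days of continuous work to convert a million files.
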
William Underwood (Georgia Tech Research Institute) then discussed his work on new tools for identifying file formats, identifying document types, and extracting metadata.

Archivists must identify file formats for a variety of reasons: assessing compliance with submission agreements/transfer memoranda, reading/playing files, conversion to standard or preservation formats, extracting information from archive files (e.g., .zip, .arc), password recovery and decryption, and repairing damaged files. In some instances, it may be possible to use external identifiers (e.g., file extensions, MIME types) to identify unknown formats. However, in some instances, external indicators are not sufficient, and the most popular analytical tools, the Linux file command and magic file, have some limitations: their output is sometimes ambiguous, they test output metadata as well as file types, and their tests for character set and language of text files are less than perfect.

Underwood and his colleagues are refining the Linux file command and magic file so that they produce file format signatures that can be compared to signatures of known file formats. To date, they have defined roughly 850 file format signatures and have collected examples of approximately 700 different file format types. They have also created a file signature database and, as moderator Mark Conrad noted afterward, contributed file signatures to the National Archives of the United Kingdom (NAUK) PRONOM file format registry; these signatures have been incorporated into DROID, NAUK’s open source file format identification tool.
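
For readers who haven't seen signature-based identification in action, here is a minimal sketch of the general technique. It is not Underwood's software, the Linux file command, or DROID. The function names are hypothetical, and the table holds only a handful of well-known leading byte sequences.

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
# Real registries such as PRONOM hold far more, and far richer, signatures.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive (or a ZIP-based format)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None  # unknown: fall back to extensions or manual review


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4\n"))
print(identify(b"PK\x03\x04\x14\x00"))
print(identify(b"plain text"))
```

The ZIP entry hints at the ambiguity problem mentioned above: many other formats are ZIP containers under the hood, so a leading-bytes match alone can't tell them apart.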

Underwood and his colleagues are also testing new techniques for recognizing document types and extracting descriptive metadata. Their focus is on legacy documents that do not conform to XML document type definitions. They examine the intellectual form (i.e., structure) of these documents and then construct “intellectual grammars” for each document type (e.g., memoranda) and use intellectual extraction techniques to pull out names, dates, and other metadata elements.
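
A very simplified way to picture this kind of extraction is sketched below. It is not the grammar-based approach Underwood described; it substitutes a single regular expression for a document grammar. The sample memorandum, its field labels, and the function name `extract_memo_metadata` are all hypothetical.

```python
import re

# Hypothetical memorandum text; the field labels are assumptions about form.
MEMO = """\
MEMORANDUM
TO: Jane Smith
FROM: Robert Jones
DATE: March 3, 1992
SUBJECT: Records transfer schedule

Please review the attached schedule.
"""

# One pattern per header line: a label, a colon, then the value.
FIELD = re.compile(r"^(TO|FROM|DATE|SUBJECT):[ \t]*(.+)$", re.MULTILINE)


def extract_memo_metadata(text: str) -> dict:
    """Pull labeled header fields out of a memo-like document."""
    return {label.lower(): value.strip() for label, value in FIELD.findall(text)}


metadata = extract_memo_metadata(MEMO)
print(metadata)
```

Legacy documents are rarely this tidy, which is presumably why a grammar that captures the overall structure of a document type is more robust than a pattern for individual lines.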

Underwood noted in passing that after he and his colleagues have extracted this metadata, they can write rules that enable us to create item-level descriptions. From those item-level descriptions, they can write rules that enable us to create file-level and then series-level descriptions. I was really struck by this statement, which suggests that automation is going to lead to some really intriguing -- and to many people unsettling -- changes in archival descriptive practice.
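
Here is a minimal sketch of what rolling item-level descriptions up into a file-level description might look like. The rules, field names, and sample items are hypothetical; Underwood did not describe his rules at this level of detail.

```python
from datetime import date

# Hypothetical item-level descriptions, e.g. produced by metadata extraction.
items = [
    {"type": "memorandum", "author": "Robert Jones", "date": date(1992, 3, 3)},
    {"type": "memorandum", "author": "Jane Smith", "date": date(1992, 5, 17)},
    {"type": "letter", "author": "Robert Jones", "date": date(1993, 1, 9)},
]


def describe_file(title: str, items: list) -> dict:
    """Roll item-level descriptions up into one file-level description."""
    dates = [item["date"] for item in items]
    return {
        "title": title,
        "extent": f"{len(items)} items",
        "date_range": (min(dates).year, max(dates).year),
        "document_types": sorted({item["type"] for item in items}),
        "creators": sorted({item["author"] for item in items}),
    }


file_level = describe_file("Director's correspondence", items)
print(file_level)
```

A series-level description could be built the same way, by applying similar summarizing rules to a set of file-level descriptions.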

Underwood and his team hope to apply induction techniques to examples of a particular document type and generate a “document grammar” automatically and to expand their extraction techniques to include physical elements of documentary form (e.g., fonts) and document grammars of physical layouts. Cool stuff.

I really can’t do justice to the third presentation. Maria Esteva (Texas Advanced Supercomputing Center) and her colleagues are exploring possible archival uses of visualization technology, and, not surprisingly, her presentation included a lot of illustrations and multimedia material. If you want to get a sense of what these materials look like -- and I recommend that you do so -- some of them are featured on the Texas Advanced Supercomputing Center site and in this month’s issue of Discover magazine; the team has also outlined its findings here.

Visualization tools can be used to depict, compare, and contrast many different types of data. Esteva and her colleagues hope that visualization, which is often easier for the mind to grasp than lengthy textual or statistical analyses, will ultimately help to guide archival processing decisions, facilitate analysis of large quantities of electronic records that consist of multiple document types and complex digital objects, and enhance access to large, complex groupings of electronic records.

Using an electronic records testbed supplied by NARA, Esteva and her colleagues use a variety of automated techniques to identify groupings of files and data objects related by provenance and extract information about content and organization, and then place the resulting data in a relational database. They then use data mining, alignment algorithms, natural language processing, data distributions, and information classes to compare, contrast, and identify intellectual relationships between records and use visualization tools to create graphic representations of the results of their analyses: pie charts, network graphs, and, in particular, tree diagrams.

Esteva presented two visualization case studies that drew upon the testbed. The first highlighted how visualization could help archivists process electronic records by highlighting intellectual content and relationships that were not immediately apparent, assessing preservation needs, and identifying other salient characteristics of the records. The other showed how visualization could help users identify collections that were particularly relevant to their research needs. Researchers searching for materials that contain specific intellectual content, have a specific provenance, were created at a specific time, exhibit specific patterns, or have some combination of these and other characteristics could visually assess which collections would be the most fruitful.

Maybe it’s a sign of advancing age (not to mention my fondness for the written word), but at this point I’m not quite convinced that researchers seeking records that possess a specific characteristic or cluster of characteristics will invariably prefer analyzing a tree diagram. However, I am intrigued by visualization technology and I think that we’ll soon come to accept that it can help us identify materials that have particular preservation needs, are responsive to specific freedom of information requests, or have specific traits or patterns that might have otherwise escaped our attention. In addition, I think that researchers will embrace visualization technology as an analytical tool; for example, many researchers will likely use relationship graphs to highlight patterns of interaction embedded within large clusters of e-mail messages.
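
To give a sense of the data that sits behind a relationship graph, here is a minimal sketch that counts who wrote to whom. It is not drawn from Esteva's work. The addresses and the function name `build_edges` are hypothetical, and a real project would parse message headers and hand the resulting edge list to a graph-drawing tool.

```python
from collections import Counter

# Hypothetical (sender, recipients) pairs taken from message headers.
messages = [
    ("alice@example.org", ["bob@example.org", "carol@example.org"]),
    ("bob@example.org", ["alice@example.org"]),
    ("alice@example.org", ["bob@example.org"]),
]


def build_edges(messages) -> Counter:
    """Count messages per directed sender -> recipient pair."""
    edges = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
    return edges


edges = build_edges(messages)

# The heaviest edges are the strongest correspondence relationships.
for (sender, recipient), weight in edges.most_common():
    print(f"{sender} -> {recipient}: {weight}")
```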

Friday, May 6, 2011

MARAC Spring 2011: Archival Ethics and the Call of Justice

1315 Duke Street, Alexandria, Virginia, 5 May 2011. Between 1828 and 1861, this unassuming brick building was used as a holding pen for slaves awaiting sale in Natchez, New Orleans, or elsewhere; neighboring structures were also part of the city's slave trade district. It is now home to the Northern Virginia Urban League and its Freedom House Museum, which documents the lives of the men, women, and children who were imprisoned here.

The Spring 2011 meeting of the Mid-Atlantic Regional Archives conference got off to a roaring start with Rand Jimerson’s thought-provoking plenary address, "Archival Ethics and the Call of Justice." Jimerson’s words have been bouncing around my head since this morning, and this post is an effort to nail some of them down. First, however, a disclaimer is in order. I’m a little sleep-deprived at the moment, and as a result some of the first half of Jimerson’s address bounced right off my benumbed skull. In other words, this post may not be fully faithful to his remarks. However, what I heard (or think I heard) got at least a few of my mental wheels spinning.

Jimerson began by summarizing several propositions put forth at a 2005 colloquium sponsored by the Nelson Mandela Foundation:
  • Archivists must avoid allowing normative conceptions of society to color the ways in which they select, acquire, and furnish access to materials.
  • Archivists must fight against destruction or neglect of records that document oppression.
  • (Oppressive regimes tend to be really good at documenting their crimes but attempt to destroy their records when their demise is imminent.)
  • Archivists must proactively create archives that reflect the full diversity of their societies.
  • Archivists should not be passive documenters of society but active participants in efforts to achieve social justice.
All of these propositions are new (or relatively new) to the archival profession, which has traditionally seen itself as objective and neutral. However, archives have traditionally served and reinforced the interests of entrenched power: their holdings reflect the words and deeds of the powerful, the successful, and the educated, and people and groups lacking one or more of these characteristics have either remained undocumented or documented only by records creators opposed to or indifferent to their experiences and perspectives. In recent years, archivists have consciously started making an effort to make the documentary record more inclusive, but our emphasis upon provenance and upon the written word ensures that we are subtly biased toward the powerful and the influential.

Jimerson noted that many archivists might have trouble accepting that their work and their holdings reflect and perpetuate existing relations of power and might be deeply wary of the "call to justice" articulated in Johannesburg in 2005. However, he noted that it is possible to maintain professional standards of objectivity while at the same time accepting the impossibility of being personally neutral: as historian Thomas Haskell has asserted, a commitment to telling the truth does not prevent one from engaging in advocacy, but it does place certain intellectual limits on one’s advocacy. Moreover, answering the "call to justice" does not mandate that one adopt a particular partisan affiliation. However, it does mandate that one embrace and defend democratic values (e.g., government openness and transparency, the right of all citizens to participate fully in the life of their society and to have their histories and perspectives documented).

Jimerson then offered a variety of ways in which archivists can answer this "call of justice":
  • Ensure diversity in the archival record. The Society of American Archivists has recently identified the need for diversity in the record and in the profession as one of three key priorities, and this is a step in the right direction.
  • Welcome the stranger into the archives. We seek to include previously marginalized groups in archival documentation and ensure that they are full partners in the recordkeeping process. In the end, the entire community must be the provenance.
  • Base selection and appraisal decisions upon clearly articulated and widely accessible criteria. We need to document our decisions.
  • Listen for oral testimony. Many peoples throughout the world -- including some residing in Canada and the U.S. -- do not write down their histories. If we do not seek out oral testimony and conduct oral histories, we will not know large parts of the world from the inside.
  • Make archival description sensitive to power relationships and conscious of the coded language that describes the social dynamics that led to their creation.
  • Make records accessible freely and openly, within the bounds established by privacy concerns and cultural concerns (e.g., access to tribal records).
  • Embrace new technologies. Social media and electronic records make it easier to make information widely available. Moreover, we need to embrace Kate Theimer’s conception of Archives 2.0: be open, flexible, user-centered, efficient, and assessment-oriented.
  • Support open government, transparency, and democratic values.
  • Engage in public advocacy, which may include becoming whistleblowers when powerful people and groups try to destroy or alter records.
As noted above, Jimerson’s address was provocative. First, it made me painfully aware of the manner in which I still privilege the written word and literary aptitude. I came to archives as an aspiring labor historian seeking to recover the experiences of men and women who created few written records. My earliest work in archives focused on increasing the inclusivity of the documentary record, and I will argue to death the importance of ensuring the comprehensiveness of the historical record. I am nonetheless unduly impressed by people who “write well” and can be quite uncharitable toward people who are not proficient writers (especially if they’re hard-partying or unfocused undergraduates -- hence my decision not to finish my Ph.D. and go into academe).

I don’t think I will ever overcome this bias -- and in some respects I don’t really want to -- but Jimerson’s words were a stinging reminder that I need to be aware of it and to ensure that I go out of my way to treat with respect records creators, researchers, and other people who don’t embrace the written word as I do, to understand how they understand the world and document their histories, and to do what I can to ensure that they are equitably represented in the documentary record.

I also started thinking about the ways in which Jimerson’s ideas seem to be rooted in relatively recent developments in historical scholarship. The historians who pioneered the "new social history" -- "history from the bottom up" -- in the 1970s and 1980s began scouring records created by elites for information about the lives and perspectives of non-elite people: slaves, laborers, women of all classes, and racial, ethnic, and religious minorities. Barbara Hanawalt’s superb The Ties that Bound: Peasant Families in Medieval England, which mines records of royal inquiries into unnatural deaths for evidence of everyday peasant life, is a superb example of this sort of reading against the grain: Hanawalt was able to reconstruct how these largely illiterate men and women bathed and washed their clothes (yes, they did these things!), cared for children and the elderly, attempted to regulate sexual relationships and negotiate internal social hierarchies, distributed food and other essential resources, and grew crops, tended animals, and produced various necessities of life. Charles Joyner’s Down by the Riverside: A South Carolina Slave Community, which draws upon plantation owners’ diaries and records in addition to oral histories of and narratives written by former slaves, is another stellar example.

I can easily envision a scenario in which this sort of historical inquiry might be viewed as oppressive in and of itself. For example, one person whose life is partially documented in the records of government social service agencies might welcome the sort of inquiry undertaken by a social historian intent upon treating his or her subjects respectfully, but another might view it as yet another unwelcome and painful intrusion perpetrated by yet another educationally, socially, and economically privileged person. However, it strikes me that the philosophical commitments of the new social historians (e.g., belief in the inherent dignity and value of all persons, desire for a comprehensive and equitable historical record) are closely related to those of Jimerson’s justice-focused archivists. The new social history is still reshaping the archival worldview -- and, in my view, that’s a very good thing.

Thursday, May 5, 2011

Southern archives in need

As you all know, the southeastern United States experienced a record-breaking and devastating series of tornadoes. Approximately 350 people were killed, entire neighborhoods were destroyed, and recovering from the physical damage wreaked by the storms will likely take years.

Given the loss of life and property and the profound psychological impact of this catastrophe, worrying about the fate of historical records may seem like a trivial thing. However, records are essential to the recovery process: government records document the civic and property rights of citizens struggling to rebuild their lives, and other types of records can help to reconstruct the sense of place and context that disaster tears asunder.

Folks on the ground have the best grasp of the situation, but here's what I've been able to find via the Web:
  • The Pratt City branch of the Birmingham Public Library was severely damaged and will likely have to be torn down. However, the Pratt City Historical Archives, which was housed in the building, was not damaged.
  • Public libraries in Alabama, Georgia, Mississippi, and Tennessee were destroyed or badly damaged, and staffers at some other libraries have lost their homes. The Federal Emergency Management Agency recently added public libraries to its list of community resources that merit immediate assistance following a disaster, but these libraries and their communities have a long road ahead of them.
  • Some of the archival and other records created by the city of Riverside, Alabama suffered water damage.
  • The genealogy and local history collections of the Dade County (Georgia) Public Library System were stored in a facility that suffered terrible damage. A contractor is currently sifting through the collections and identifying salvageable materials.
  • The Alabama Department of Archives and History is still trying to contact local governments in the hardest-hit parts of the state.
If past disasters are any guide, the coming weeks will bring a mix of news that is good and very, very bad. Please consider making a gift to a charitable organization helping people rebuild their lives and a gift to the National Disaster Recovery Fund for Archives.