Wednesday, January 5, 2011

The infopocalypse is upon us

Last week, the Boston Phoenix published an article by Chris Faraone highlighting how local, state, and federal governments are struggling to manage ever-increasing amounts of digital data. Provocatively titled "Infopocalypse: The Cost of Too Much Data," Faraone notes that:
The United States Census Bureau alone maintains about 2560 terabytes of information -- more data than is contained in all the academic libraries in America, and the equivalent of about 50 million four-drawer filing cabinets of documents.
Other federal agencies have similarly mind-boggling quantities of data, and state and local governments are also amassing vast stores of digital information.

Not surprisingly, "public data remains, by and large, a disorganized mess." Governments don't know precisely what they have or how to make best use of it, and old, paper-centered ways of responding to freedom of information requests and performing other essential functions persist.

Why does this situation exist? In my humble opinion, Faraone has nailed the root causes:
There is too much data. Digital storage is not a natural resource. The amount of information that government agencies may be required to keep — from tweets and e-mails to tax histories — is growing faster than the capacity for storage.
There's not enough manpower to manage all this data. The Obama administration hopes to make more information freely available online. But in the meantime, the old method of requesting data from the government -- filing a FOIA request -- is bogged down due to an insufficient workforce and long request backlogs.
Private companies are storing public data. This trend in outsourcing, largely the result of too much data and too little manpower, is a potential threat to both access and security, as resources that belong to the people are entrusted to outside vendors, raising new privacy concerns.
What to do about this situation? As Faraone notes, the data center consolidation strategy being pushed by Vivek Kundra, the Chief Information Officer of the United States, may help, but it's only a start. Faraone also suggests -- correctly -- that hiring additional staff to process freedom of information requests, and proactively posting online any data that is free of legal restrictions (or, in the federal environment, classification), would also improve things a bit.

However, none of these things will solve the problem, which, as Sunlight Foundation policy director John Wonderlich pointed out to Faraone, is in many ways akin to that posed by the explosive growth of paper government records during the first two-thirds of the 20th century:
"Back then [government agencies] didn't know what to throw out, what to standardize, or how to organize. The challenges we face in data are in similar scope -- that's why it's so important that these issues are addressed head-on before it's too late."
Surprisingly, Faraone makes no mention of the U.S. National Archives and Records Administration (NARA), which works with agencies to figure out how to standardize and organize their records and how and when to dispose of records that have reached the end of their useful life, or of the role that agency records managers have -- or, as is all too often the case, should have -- in ensuring that all agency records are properly managed. Hiring some records management personnel -- at NARA, the 50 state archives, larger local governments, and larger government agencies -- would no doubt help to reduce agency storage pressures.

However, the more I work with electronic records, the less convinced I am that simply hiring a few more records managers will make everything better. We forget sometimes that formalized records management theory and practice are not mere outgrowths of common sense. They were practical responses to the challenges posed by the deluge of paper records created by ever-larger and ever more complex organizations. The infopocalypse that we face is in some respects quite similar to that which confronted our mid-20th century predecessors, but it is also, at least in some respects, unique. Addressing the challenges associated with our infopocalypse successfully will likely mean a shift in thinking no less monumental than that which propelled the rise of records management as a discipline and More Product, Less Process archival processing.

What will this shift in thinking look like? I don't know. I anticipate that we're going to focus less on one-on-one guidance and more on standards development and on automating tasks now performed by humans. I also expect that our definitions of "record" and "records series" will be altered significantly, and I suspect that, at some point in the future, they will be discarded altogether.

Yeah, I'm scared, too. However, our mid-20th century predecessors were as shaken by the changes in their record-keeping environment as we are by the changes in ours. They chose to meet those challenges head-on, and, after a lot of hard work and mistakes along the way, eventually developed workable solutions to complex problems. If we have any interest in surviving -- which may well mean evolving from "archivists" and "records managers" into "digital preservationists" or "data curators" or somesuch -- we'll take our lumps and do the same.
