Tuesday, December 4, 2012

2012 Best Practices Exchange, day one


Today was the first day of the 2012 Best Practices Exchange (BPE), an annual event that brings together archivists, librarians, IT professionals, and other people interested in preserving born-digital state government information. The BPE is my favorite professional event, in no small part because it encourages presenters to discuss not only their successes but also the ways in which unexpected developments led them to change direction, the obstacles that proved insurmountable, and the lessons they learned along the way.

As I explained last year, those of us who blog and tweet about the BPE are obliged to use a little tact and discretion when making information about the BPE available online. Moreover, in some instances, what's said is more important than who said it. As a result, I'm going to refrain from tying some of the information that appears in this and subsequent posts re: the BPE to specific attendees.

I'm also going to keep this post short. Our opening discussion began at 8:30, the last session ended at 4:45, and I was in a Persistent Digital Archives and Library System (PeDALS) meeting until almost 6:00. The PeDALS crew then hit the streets of Annapolis, Maryland. We started off with ice cream at the Annapolis Ice Cream Company (yum!), and then three of us split off from the group, rested a bit and caught up on the day's e-mail, and had dinner at the Ram's Head Tavern (also yum!). The BPE resumes tomorrow at 8:30, and I'm presenting at the end of the day, so I'm going to highlight the most interesting tidbits of information that I picked up today and then head to bed.

Doug Robinson, executive director of the National Association of State Chief Information Officers (NASCIO), was this morning's plenary speaker, and he made a couple of really interesting points:
  • CIOs are juggling a lot of competing priorities. They're concerned about records management and digital preservation, but, as a rule, they're not worried enough to devote substantial attention or resources to improving records management or addressing preservation issues.
  • Cloud computing is now the number one concern of state CIOs, and CIOs are starting to think of themselves less as providers of hardware and software than as providers of services. The cloud is also attractive because it reduces diversity and complexity, which drive up IT costs. Robinson suspects that most states will eventually develop private cloud environments; indeed, a recent NASCIO survey indicates that 31 percent of states have moved or plan to move digital archives and records management into the cloud.
  • CIOs are really struggling with Bring Your Own Device (BYOD) issues and mobile technology, and the speed with which mobile technology changes is frustrating their efforts to come to grips with the situation. Citizens want to interact with state government via mobile apps, but the demand for app programmers is such that states can't retain employees who can create apps; at present, only one state has app programmers on its permanent payroll.
  • Cybersecurity is an increasingly pressing problem. States collect and create a wealth of data about citizens, and criminals (organized and disorganized) and hacktivists are increasingly interested in exploiting it. Spam, phishing, hacking, and network probe attempts are increasingly frequent. Governors don't always grasp the gravity of the threats or the extent to which their own reputations will be damaged if a large-scale breach occurs. Moreover, states aren't looking for ways to redirect existing resources to protecting critical information technology infrastructure or training staff.
  • Most states allocate less than two percent of their annual budgets to IT. Most large corporations devote approximately five percent of their annual budgets to IT.
I had the privilege of moderating one of the afternoon sessions, "Tearing Down the Borders: Coast-to-Coast Archives; Record-Keeping in the Cloud," in which Oregon State Archivist Mary Beth Herkert discussed her state's development of a cloud-based electronic records management system for state agencies and local governments, and Bryan Smith of the Washington State Digital Archives detailed some of the Digital Archives' recent technical innovations. They then discussed their joint, National Historical Publications and Records Commission-funded effort to explore expanding Oregon's records management system to Washington State and ingesting Oregon's archival electronic records into Washington's Digital Archives.

I was really struck by Mary Beth's explanation of the cost savings Oregon achieved by moving to the cloud. In 2007, the Oregon State Archives developed an HP TRIM-based electronic records management system for its parent agency, the Office of the Secretary of State. It wanted to expand this system, which it maintained in-house, to all state agencies and local governments, but it couldn't find a way to push the cost of doing so below $100 per user per month. However, the State Archives found a data center vendor in a small Oregon town that would host the system for $37 per user per month. When the total number of users reaches 20,000, the cost will drop to $10 per user per month.
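
Those figures are easy to sanity-check. Below is a minimal back-of-the-envelope sketch in Python comparing annual costs at the three per-user rates Mary Beth cited; the sample user counts (other than the 20,000-user threshold) are hypothetical, chosen purely for illustration.

```python
# Back-of-the-envelope comparison of the per-user hosting rates cited above.
# The rates come from the presentation; the sample user counts (other than
# the 20,000-user threshold) are hypothetical.

RATES = {
    "in-house estimate": 100.00,        # dollars per user per month
    "hosted (current)": 37.00,
    "hosted (at 20,000 users)": 10.00,
}

def annual_cost(rate_per_user_month, users):
    """Total yearly cost at a given per-user monthly rate."""
    return rate_per_user_month * users * 12

for label, rate in RATES.items():
    for users in (1_000, 5_000, 20_000):
        print(f"{label:>25}: {users:>6,} users -> ${annual_cost(rate, users):>13,.2f}/year")
```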

Bryan made a couple of really intriguing points about the challenges of serving as a preservation repository for records created by multiple states. First, partners who don't maintain technical infrastructure don't always realize that problems may be lurking within their digital content. Washington recently led a National Digital Information Infrastructure and Preservation Program (NDIIPP) grant project that explored whether its Digital Archives infrastructure could support the creation of a regional digital repository, and the problems that Digital Archives staff encountered when attempting to ingest data submitted by partner states led to the creation of tools that enable partners to verify the integrity of their data and address any hidden problems within their files and accompanying metadata prior to ingest.
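
Tools of this kind generally come down to fixity checking: comparing every file in a submission against a checksum recorded in an accompanying manifest before anything is handed over for ingest. The Python sketch below is not the Washington tool itself, just a minimal illustration of the idea, and it assumes a simple tab-separated manifest of SHA-256 hashes.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path):
    """Check every file listed in a 'checksum<TAB>relative path' manifest.

    Returns a list of (path, problem) pairs; an empty list means the
    submission is ready for ingest.
    """
    base = Path(manifest_path).parent
    problems = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split("\t", 1)
        target = base / rel_path
        if not target.exists():
            problems.append((rel_path, "missing file"))
        elif sha256_of(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems

# Hypothetical usage:
# for path, problem in verify_manifest("submission/manifest.txt"):
#     print(f"{path}: {problem}")
```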

Second, the NDIIPP project and the current Washington-Oregon project really underscored the importance of developing common metadata standards. The records created in one state may differ in important ways from similar records created in another state, but describing records similarly lowers system complexity and increases general accessibility. Encoding metadata in XML makes it easier to massage metadata as needed and gives creators the option of supplying more than the required minimum of elements.
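
As a rough illustration of that flexibility, the sketch below assembles a small XML record from a required minimum of elements plus whatever optional elements a creator chooses to supply. The element names are hypothetical placeholders, not drawn from any particular metadata standard.

```python
import xml.etree.ElementTree as ET

def build_record(required, optional=None):
    """Build a minimal XML metadata record.

    `required` holds the agreed-upon minimum of elements; `optional` holds
    any extra elements a creator chooses to supply.  The element names used
    here are placeholders, not a specific standard.
    """
    record = ET.Element("record")
    for name, value in {**required, **(optional or {})}.items():
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")

print(build_record(
    required={"title": "Meeting minutes", "creator": "Dept. of Examples", "date": "2012-11-30"},
    optional={"rights": "Public record", "language": "eng"},
))
```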

I'm going to wrap up this post by sharing a couple of unattributed tidbits:
  • One veteran archivist has discovered that the best way to address state agency electronic records issues is to approach the agency's CIO first, then speak with the agency's head, and then talk to the agency's records management officer. In essence, this archivist is focusing first on the person who has the biggest headache and then on the person who is most concerned about saving money -- and thinking in terms of business process, not records management.
  • "If you're not at the table, you're going to be on the menu."
Image:  Maryland State House, Annapolis, Maryland, 4 December 2012.  The State House, which was completed in 1779, is the first state capitol building completed after the American Revolution, the oldest state capitol that has been in continuous legislative use and the only state house that has an all-wooden dome.

Friday, October 21, 2011

BPE 2011: emerging trends


The 2011 Best Practices Exchange (BPE) proceeds apace, and today I’m going to focus upon yesterday’s plenary session, which featured Leslie Johnston, the Director of Repository Development at the Library of Congress (LC). Johnston devoted a lot of time to discussing ViewShare, LC’s new visualization and metadata augmentation tool, but I’ll discuss ViewShare in a forthcoming post about some of the new tools discussed at this year’s BPE. Right now, I want simply to furnish an overview of her exhilarating and somewhat unsettling assessment of the changing environment in which librarians and archivists work:
  • Users do not use digital collections in the same way as they use paper collections, and we cannot guess how digital collections will be used. For example, LC assumed that researchers would want textual records, but a growing number of researchers want image files of textual records.
  • Until recently, stewardship organizations have talked about collections, series, etc., but not data. Data is not just generated by satellites, experiments, or surveys; publications and archival records also contain data.
  • We also need to start thinking in terms of “Big Data.” The definition of Big Data -- data that can’t easily be manipulated with common tools or managed and stewarded by any one institution -- is rather fluid, but we need to start thinking in these terms. We also need to be aware that Big Data may have commercial value, as evidenced by the increasing interest of firms such as Ancestry.com in the data found in our holdings.
  • More and more, researchers want to use collections as a whole and to mine and organize the collections in novel ways. They use algorithms and new visualization tools that transform data into knowledge. For example, the Digging into Data project examined ways in which many types of information, including images, film, sound, newspapers, maps, art, archaeology, architecture, and government records, could be made accessible to researchers. One researcher wanted to digitally mine information from millions of digitized newspaper pages and see whether doing so could enhance our understanding of the past. LC’s experience with archiving Web sites also underscores this point. LC initially assumed that researchers would browse through the archived sites. However, researchers want access to all of the archived site files and to use scripts to search for the information they want. They don’t want to read Web pages. Owing to the large size of our collections, the lack of good tools, and the permissions we secured when LC crawled some sites, this is a challenge.
  • The sheer volume of the electronic data cultural stewardship organizations need to keep is a challenge. LC has acquired the Twitter archive, which currently consists of 37 billion individual tweets and will expand to approximately 50 billion tweets by year’s end. The archive grows by 6 million tweets an hour (a quick sanity check of these figures appears after this list). LC is struggling to figure out how best to manage, preserve, and provide comprehensive access to this mass of data, which researchers have already used to study the geographic spread of news, the spread of epidemics, and the transmission of new uses of language.
  • We have to switch to a self-serve model of reference services. Growing numbers of researchers do not want to come to us, ask questions of us, and then use our materials in our environment. They want to find the materials they need and then pull them out of our environment and into their own workspaces. We need to create systems and mechanisms that make it easy for them to do so. As a result, we need to figure out how to support real-time querying of billions of full-text items and the frequent downloading by researchers of collections that may be over 200 TB each. We also need to think about providing tools that support various forms of collection analysis (e.g., visualization).
  • We can’t be afraid of cloud computing. Given the volumes of data coming our way and mounting researcher demands for access to vast quantities of data, the cloud is the only feasible mechanism for storing and providing access to the materials that will come our way. We need to focus on developing authentication, preservation, and other tools that enable us to keep records in the cloud.
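
The Twitter figures mentioned above hang together arithmetically; here is a quick sanity check, assuming steady growth from late October through the end of the year (the number of remaining days is my approximation, not something Johnston cited).

```python
# Rough sanity check of the Twitter archive figures cited above.
# Assumes a constant growth rate; the days-remaining value is an
# approximation for late October through December 31.

current_tweets = 37_000_000_000   # ~37 billion tweets in the archive now
growth_per_hour = 6_000_000       # ~6 million new tweets per hour
days_remaining = 70               # roughly late October -> December 31

added = growth_per_hour * 24 * days_remaining
projected = current_tweets + added

print(f"Added by year's end: {added / 1e9:.1f} billion tweets")
print(f"Projected archive:   {projected / 1e9:.1f} billion tweets")
# Roughly 10 billion tweets added and ~47 billion total, in line with the
# "approximately 50 billion" figure cited in the talk.
```
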
There’s lots and lots of food for thought here -- including a few morsels that will doubtless induce indigestion in more than a few people -- and it’s just a taste of what’s coming our way. If we don’t come to terms with at least some of these changes, we as a profession will really suffer in the coming years. Let's hope that we have the will and the courage to do so.

A bottle of locally brewed Kentucky Bourbon Barrel Ale at Alfalfa Restaurant, Lexington, Kentucky, 20 October 2011. I highly recommend both the ale and the restaurant, but please note that Kentucky Bourbon Barrel Ale is approximately 8 percent alcohol. Just like the BPE, it's a little more intoxicating than one might expect.

Thursday, October 20, 2011

BPE 2011: ERA and the move to the cloud


This week, I’m spending a little time with my parents in Ohio and at the 2011 Best Practices Exchange (BPE) in Lexington, Kentucky. The BPE, which brings together state government, academic, and other archivists and librarians, as well as other people seeking to preserve state government information of enduring value, is my favorite archival conference. The Society of American Archivists annual meeting is always first-rate, but it’s gotten a little overwhelming, and I love the Mid-Atlantic Regional Archives Conference (MARAC), but nothing else has the small size, tight focus on state government records, informality, and openness that characterize the BPE.

Before I start detailing today’s highlights, I should say a few things about the content of these posts. For the past few years, those of us who have attended the BPE have tried to adhere to the principle that “what happens at BPE, stays at BPE.” This doesn’t mean that we don’t share what we’ve learned at the BPE (hey, I’m blogging about it!), but it does mean that we’re sensitive to the fact that candor is both essential and risky. The BPE encourages people to speak honestly about how and why projects or programs went wrong and what they learned from the experience. Openness of this sort is encouraging; all too often, we think that we’re alone in making mistakes. It's also helpful: pointing out hidden shallows and lurking icebergs helps other people avoid them. However, sometimes lack of senior manager commitment, conflicts with IT personnel, and other internal problems contribute to failure, and colleagues and supervisors occasionally regard discussion of internal problems as a betrayal. As a result, BPE attendees should exercise some discretion, and those of us who blog about the BPE should be particularly careful; our posts are a single Web search away. As a result, in a few instances I may write about the insights and observations that attendees have shared but obscure identifying details.

Moving on to this year's BPE itself, I'm going to devote the rest of this post to the insights and predictions offered up by U.S. National Archives and Records Administration (NARA) Chief Information Officer Mike Wash, who spoke this morning about the Electronic Records Archives (ERA), NARA’s complex, ambitious, and at times troubled electronic records system, and some changes that are on the horizon.

At present, ERA sort of works: staff use it to take in, process, and store electronic records, and it currently holds approximately 130 TB of data. The Office of Management and Budget wants NARA to take in 10 TB of data per quarter, and NARA is working with agencies to meet this benchmark. However, ERA lacks an integrated access mechanism, and it consists of multiple modules: the Base module handles executive agency data, the EOP module handles presidential records (and includes some internal access mechanisms), the Classified module holds classified records, and several other modules were built to deal with specific problems.

Building ERA taught NARA several lessons:
  • Solution architecture is critical. ERA’s multiple modules are a sign of a failed system architecture. Anyone building such a system must carefully consider the business and technical architecture during the planning stage and must manage that architecture carefully over time.
  • The governance process must be clear and should start with business stakeholders. What do they really need the system to do, and how do you ensure that everyone stays on the same page throughout the process? Information technology invariably challenges control and authority, but if you set up your governance process properly, you should be able to retain control over system development.
  • Over-communicate. Funders and other powerful groups need frequent updates; failure to keep feeding information to them can be profoundly damaging.
  • You must manage the project. The federal government tends to hire contractors to develop IT systems, and contractor relationships tend to deteriorate about six months after the contract is awarded. Most federal agencies cede authority to contractors because they are loath to be seen as responsible in the event that a project fails, but staying in control of the project increases your chances that you'll get the system you want.
  • Watch costs closely. Cost-escalating provisions have a way of sneaking into contracts.
  • Be mindful of intellectual property issues. The federal government typically reserves the right to all intellectual property created as a result of contracts, but this doesn’t always happen, and the vendor that built the first iteration of ERA has asserted that it controls some of the technology that now makes the system work; NARA will be much more assertive in working with future ERA vendors.
Wash also made some intriguing observations about some of the challenges that NARA and other archives are confronting:
  • At present, our ability to acquire data is constrained by bandwidth. It takes more than three days to convey 20 TB of data over a 1 Gbps data line and at least a month to convey it via the Internet (a rough transfer-time estimate appears after this list). NARA recently took custody of 330 TB of 2010 Census data, and it did so by accepting a truckload of hardware; at present, there are no alternatives to this approach.
  • The rate of data creation continues to accelerate. The administration of George W. Bush created 80 TB of records over the course of 8 years, but the Obama administration likely created more than 80 TB of data during its first year.
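
The transfer-time figure above is easy to estimate. A minimal sketch, assuming a dedicated line and an effective throughput of about 60 percent of the nominal rate (that efficiency factor is my assumption, not a figure Wash cited):

```python
# Rough transfer-time estimates for moving archival data over a network.
# The effective-throughput factor is an assumption meant to account for
# protocol overhead and real-world conditions.

def transfer_days(terabytes, line_gbps, efficiency=0.6):
    """Days needed to move `terabytes` of data over a `line_gbps` link."""
    bits = terabytes * 8e12                      # TB -> bits (decimal units)
    effective_bps = line_gbps * 1e9 * efficiency
    return bits / effective_bps / 86_400

print(f"20 TB over a dedicated 1 Gbps line:  {transfer_days(20, 1):.1f} days")
print(f"330 TB (2010 Census) over that line: {transfer_days(330, 1):.0f} days")
```
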
Wash indicated that NARA is starting to think that federal records should be created and maintained in a cloud computing environment and that transfer of custody from the creating agency to NARA should be effected by changing some of the metadata associated with the records being transferred.
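
Wash didn't describe an implementation, but the idea is simple to sketch: in a shared cloud environment, "transfer" could mean rewriting a record's custody metadata rather than copying the content anywhere. The field names below are invented for illustration and are not drawn from ERA or any NARA specification.

```python
from datetime import datetime, timezone

# Hypothetical illustration of custody transfer by metadata change: the
# record's content stays where it is in the shared cloud environment, and
# only the custody fields are rewritten.  All field names are invented.

record = {
    "identifier": "agency-xyz/2011/0001",
    "storage_uri": "cloud://federal-records/agency-xyz/2011/0001",
    "custodian": "Creating Agency XYZ",
    "custody_history": [],
}

def transfer_custody(rec, new_custodian):
    """Record a change of custody without moving the underlying content."""
    rec["custody_history"].append({
        "from": rec["custodian"],
        "to": new_custodian,
        "transferred_at": datetime.now(timezone.utc).isoformat(),
    })
    rec["custodian"] = new_custodian
    return rec

transfer_custody(record, "National Archives and Records Administration")
print(record["custodian"], record["custody_history"])
```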

Wash noted that the move to cloud computing will bring to the fore new preservation and authentication concerns. It also struck me that the transition that Wash envisions assumes the existence of a single federal government cloud that has adequate storage, security, and access controls and that, at least at this time, many states aren’t yet thinking of constructing such environments. Individual state agencies may be thinking of moving to the cloud, but most states don't seem to be preparing to move to a single, statewide cloud environment. Moreover, owing to its sheer size, the federal government is better able to negotiate favorable contract terms than state or local governments; the terms of service agreements that the feds hammered out with various social media providers are an excellent example. I have the uneasy feeling that some governments -- out of lack of knowledge, desperate financial straits, or inability to negotiate optimal terms -- will accept public cloud service contracts that prove problematic or outright disastrous.

It's nonetheless apparent that government computing will move into the cloud, that this transition offers both new challenges and new opportunities for managing and preserving records, and that archivists and records managers are going to have to come to grips with these changes. The next decade promises to be most interesting.

The Lexington Laundry Company building on West Main Street, Lexington, Kentucky, 20 October 2011. This little gem was built ca. 1929, is an outstanding example of Art Deco architecture in the city, and is part of Lexington's protected Downtown Commercial District. It now houses an art gallery.