The 2011 Best Practices Exchange (BPE) proceeds apace, and today I’m going to focus upon yesterday’s plenary session, which featured Leslie Johnston, the Director of Repository Development at the Library of Congress (LC). Johnston devoted a lot of time to discussing ViewShare, LC’s new visualization and metadata augmentation tool, but I’ll discuss ViewShare in a forthcoming post about some of the new tools discussed at this year’s BPE. Right now, I want simply to furnish an overview of her exhilarating and somewhat unsettling assessment of the changing environment in which librarians and archivists work:
- Users do not use digital collections in the same way as they use paper collections, and we cannot guess how digital collections will be used. For example, LC assumed that researchers would want textual records, but a growing number of researchers want image files of textual records.
- Until recently, stewardship organizations have talked about collections, series, etc., but not data. Data is not just generated by satellites, experiments, or surveys; publications and archival records also contain data.
- We also need to start thinking in terms of “Big Data.” The definition of Big Data -- which depends on what can be easily manipulated with common tools and what can be managed and stewarded by any one institution -- is rather fluid, but we need to start thinking in these terms. We also need to be aware that Big Data may have commercial value, as evidenced by the increasing interest of firms such as Ancestry.com in the data found in our holdings.
- More and more, researchers want to use collections as a whole and to mine and organize the collections in novel ways. To do so, they use algorithms and new visualization tools that transform data into knowledge. For example, the Digging into Data project examined ways in which many types of information, including images, film, sound, newspapers, maps, art, archaeology, architecture, and government records, could be made accessible to researchers. One researcher wanted to digitally mine information from millions of digitized newspaper pages and see whether doing so could enhance our understanding of the past. LC’s experience with archiving Web sites also underscores this point. LC initially assumed that researchers would browse through the archived sites. However, researchers want access to all of the archived site files and to use scripts to search for the information they want. They don’t want to read Web pages. Meeting this demand is a challenge owing to the large size of our collections, the lack of good tools, and the permissions we secured when LC crawled some sites.
- The sheer volume of the electronic data that cultural stewardship organizations need to keep is a challenge. LC has acquired the Twitter archive, which currently consists of 37 billion individual tweets and will expand to approximately 50 billion tweets by year’s end. The archive grows by 6 million tweets an hour. LC is struggling to figure out how best to manage, preserve, and provide comprehensive access to this mass of data, which researchers have already used to study the geographic spread of news, the spread of epidemics, and the transmission of new uses of language.
- We have to switch to a self-serve model of reference services. Growing numbers of researchers do not want to come to us, ask questions of us, and then use our materials in our environment. They want to find the materials they need and then pull them out of our environment and into their own workspaces. We need to create systems and mechanisms that make it easy for them to do so. As a result, we need to figure out how to support real-time querying of billions of full-text items and the frequent downloading by researchers of collections that may be over 200 TB each. We also need to think about providing tools that support various forms of collection analysis (e.g., visualization).
- We can’t be afraid of cloud computing. Given the volumes of data coming our way and mounting researcher demands for access to vast quantities of data, the cloud is the only feasible mechanism for storing and providing access to the materials that will come our way. We need to focus on developing authentication, preservation, and other tools that enable us to keep records in the cloud.
A bottle of locally brewed Kentucky Bourbon Barrel Ale at Alfalfa Restaurant, Lexington, Kentucky, 20 October 2011. I highly recommend both the ale and the restaurant, but please note that Kentucky Bourbon Barrel Ale is approximately 8 percent alcohol. Just like the BPE, it's a little more intoxicating than one might expect.