
Monday, September 21, 2009

BPE 2009: collaboration

Robert Vitello and Bill Travis detail the origins and goals of the New York State Economic Security and Human Services Advisory Board, Best Practices Exchange, 3 September 2009.

[I had hopes of wrapping up my Best Practices Exchange blogging last week, but life had other plans. I really wish I could say that I'm slow blogging, but unfortunately I'm merely late blogging -- and at present there's no manifesto for that.]

One of the most interesting Best Practices Exchange sessions I attended highlighted a couple of really productive collaborations.

The first presenter, Nancy Adgent of the Rockefeller Archive Center (RAC), discussed the Collaborative Electronic Records Project (CERP), which allowed the RAC and the Smithsonian Institution Archives (SIA) to develop tools for the preservation of e-mail.

Although the two institutions had some common strengths -- forward-thinking and pro-active directors, similar collecting policies, and above-average staffing levels -- they differed in their governance structures, level of authority over records creators, funding streams, staffing levels, and the e-mail formats for which they were responsible. They also had to contend with the challenges posed by physical distance, the need to develop a new knowledge base, various administrative and staffing problems, and the SIA's quasi-governmental status, which eliminated several sources of funding that the RAC could have otherwise pursued.

These differences and challenges forced the RAC and the SIA to develop e-mail tools that could handle a variety of e-mail formats. The collaboration also exposed a number of issues that other archives might encounter: inadvertent changes wrought by global software upgrades pushed out to the SIA's networked CERP computers (but not to the RAC's machines, which remained offline), and differences in the capacity of various virus detection applications.

Nancy then provided a brief overview of the tools that CERP uses to process e-mail, among them Aid4Mail, which converts Microsoft PST files to Microsoft .msg format and allows staff to identify and remove non-record messages, and other tools that convert messages in a range of formats to the MBOX format, which CERP's parser then converts to XML for preservation purposes. She also discussed how CERP and the E-mail Collection and Preservation (EMCAP) project, which also sought to use XML to preserve e-mail, developed a common XML schema.
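
Nancy didn't walk through the parser's code, but the underlying MBOX-to-XML idea is easy to sketch. The Python below is my own minimal illustration rather than the CERP parser itself, and the element names are placeholders, not the CERP/EMCAP schema:

    # Rough sketch of converting an MBOX mailbox to simple XML.
    # Element names are placeholders, not the CERP/EMCAP schema.
    import mailbox
    import xml.etree.ElementTree as ET

    def mbox_to_xml(mbox_path, xml_path):
        account = ET.Element("Account")
        for msg in mailbox.mbox(mbox_path):
            m = ET.SubElement(account, "Message")
            for header in ("From", "To", "Date", "Subject", "Message-ID"):
                value = msg.get(header)
                if value:
                    ET.SubElement(m, header.replace("-", "")).text = str(value)
            # Keep only single-part bodies, stored as-is; real tools preserve
            # every MIME part, attachments included.
            if not msg.is_multipart():
                ET.SubElement(m, "Body").text = msg.get_payload()
        ET.ElementTree(account).write(xml_path, encoding="utf-8", xml_declaration=True)

    mbox_to_xml("transfer.mbox", "transfer.xml")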

Nancy made a really great closing point: odd couples can produce some good offspring! Even though the RAC and the SIA produced different guidance products tailored to the needs of their respective donor communities and their own institution-specific workflow processes, procedures, and forms, they developed and tested common tools for processing and preserving e-mail. And they look like really great tools! We're anticipating a transfer of e-mail pretty soon, and I'm really looking forward to giving CERP's parser a spin.

The next presentation was delivered by two New York State agency CIOs -- Bill Travis of the Office of Children and Family Services and Robert Vitello of the Department of Labor -- and focused on the work of the New York State Economic Security and Human Services Advisory Board. It underscored how shared problems can sometimes give rise to really effective collaboration.

Several years before the State CIO took office, the State had purchased a suite of out-of-the-box products intended to manage various human services programs and services. CIOs of agencies that were using these products had begun meeting to discuss the problems they encountered as they tried to make these products fit the State's county-administered, state-supervised model of service provision.

The agencies ultimately informed the State CIO that they would not use these products, and she accepted their decision. However, she also challenged them to develop an enterprise-wide approach. For years, the federal government forced state human services agencies to construct IT silos, but the situation has changed in recent years, and there is real potential for cost savings: the board's member agencies account for 70 percent of the State's IT expenditures -- approximately $1 billion per year.

The board has established a series of guiding principles:
  • Provide for interoperability using open standards and seamless data sharing through common enterprise systems.
  • Deploy an "Open New York" community approach to facilitate peer review and enhance quality control.
  • Leverage prior IT investments with software reuse when feasible to achieve greater cost efficiencies.
  • Implement agile systems development approaches to improve speed to market.
  • Establish strong enterprise governance to ensure alignment of technology plans with business goals.
  • Seek innovative collaborations to leverage State enterprise IT resources and assets.
More information about these guiding principles is outlined in the board's January 2008 strategy document, and information about the board's work appears in its September 2009 progress report.

I was really struck by how Travis, Vitello, and the other board members were able to capitalize on their willingness to pool their expertise and share information. Thanks to this willingness -- plus strong support from the State CIO -- they've been able to make real headway, and it will be interesting to see how their work evolves. I get the sense that my employer will be well-positioned to contribute: the board is just starting to focus on e-discovery and its relationship to records management.

Thursday, June 25, 2009

NDIIPP project partners meeting, day two

The National Digital Information Infrastructure and Preservation Program (NDIIPP) partners meeting enables recipients of NDIIPP grant monies to discuss their work, seek feedback, and identify areas of common interest.

Today’s sessions began with Michael Nelson (Old Dominion University), who outlined the Synchronicity project, which will enable end users to recover data that has “vanished” from the Web. Synchronicity is a Firefox Web browser extension that catches the “File Not Found” messages that appear when a user attempts to access a page that no longer exists or no longer contains the information that the user seeks. It then searches Web search engine caches, the Internet Archive, and various research project data caches and retrieves copies of the desired information.

If these sources fail to provide the desired information, Synchronicity generates a search engine query based upon what the missing page was “about” and attempts to find the information elsewhere on the Web. These queries are based on “lexical signatures” (e.g., MD5 and SHA-1 message digests) and page titles, and preliminary research indicates that these searches are successful about 75 percent of the time. Nelson and his colleagues are currently exploring other methods of locating “lost” content and how to handle pages whose content has changed over time.
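
I haven't seen Synchronicity's code, but the first step -- checking whether the Internet Archive already holds a copy of a missing page -- can be approximated with the Wayback Machine's availability API. This is just my own illustration of that lookup, not anything from the project, and the URL at the bottom is a made-up example:

    # Check the Internet Archive for an archived copy of a vanished URL.
    # Not Synchronicity's code -- just the public Wayback availability API.
    import json
    import urllib.parse
    import urllib.request

    def find_archived_copy(missing_url):
        api = "http://archive.org/wayback/available?url=" + urllib.parse.quote(missing_url, safe="")
        with urllib.request.urlopen(api) as response:
            data = json.load(response)
        snapshot = data.get("archived_snapshots", {}).get("closest")
        return snapshot["url"] if snapshot else None

    # Hypothetical example; returns None if no snapshot exists.
    print(find_archived_copy("http://www.example.com/vanished-page.html"))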

I’ve often used the Internet Archive and Google’s search caches to locate information that has vanished from the Web, and I’m really looking forward to installing the Synchronicity plug-in once it becomes available.

Michelle Kimpton of DuraSpace then discussed the DuraCloud project, which seeks to develop a trustworthy, non-proprietary cloud computing environment that will preserve digital information. In cloud computing environments, massively scalable and flexible IT-related capabilities are provided “as a service” over the Internet. They offer unprecedented flexibility and scalability, economies of scale, and ease of implementation. However, cloud computing is an emerging market, providers are motivated by profit, information about system architectures and protocols is hard to come by, and as a result cultural heritage institutions are rightfully reluctant to trust providers.

DuraCloud will enable institutions that maintain DSpace and FEDORA institutional repositories to preserve the materials in their repositories in a cloud computing environment; via a Firefox browser extension, it will also allow users to identify content that should be preserved. A Web interface will enable users to monitor their data and, possibly, run services.

DuraCloud will enable members to create and manage multiple, geographically distributed copies of their holdings, monitor their digital content and verify that it has not been inadvertently or deliberately altered, and take advantage of the cloud’s processing power when doing indexing and other heavy processing jobs. It will also provide search, aggregation, video streaming, and file migration services and will enable institutions that don’t want to maintain their institutional repositories locally to do so within a cloud environment.
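
The monitoring piece boils down to good old fixity checking: record a checksum for each file at deposit time and recompute it later to confirm that nothing has changed. Here's a minimal sketch of that general idea -- the manifest layout, path, and checksum below are my own inventions, not DuraCloud's:

    # Minimal fixity check: compare current MD5 checksums against a stored manifest.
    # The manifest layout is my own invention, not DuraCloud's.
    import hashlib
    import os

    def md5_of(path):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def check_fixity(root_dir, manifest):
        """manifest maps relative paths to checksums recorded at deposit time."""
        problems = []
        for rel_path, recorded in manifest.items():
            if md5_of(os.path.join(root_dir, rel_path)) != recorded:
                problems.append(rel_path)
        return problems

    # Hypothetical example; an empty list means every file still matches.
    print(check_fixity("/data/holdings", {"letters/1969-07-20.pdf": "d41d8cd98f00b204e9800998ecf8427e"}))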

The DuraCloud software, which is open source, will be released next month, and in a few months DuraSpace itself will conduct pilot testing with a select handful of cloud computing providers (Sun, Amazon, Rackspace, and EMC) and two cultural heritage institutions (the New York Public Library and the Biodiversity Heritage Library).

Fascinating project. We’ve known for some time that DSpace and FEDORA are really access systems, but lots of us have used them as interim preservation systems because we lack better options.

The next session was a “breakout” that consisted of simultaneous panels focusing on one or two NDIIPP projects. The Persistent Digital Archives and Library System (PeDALS) project was featured in a session that focused on digital preservation contracts and agreements. The first half of the session consisted of an overview of the contracts and agreements that support a variety of collaborative digital preservation initiatives:
  • Vicki Reich discussed the CLOCKSS Archive, which brings together libraries and publishers on equal terms and provides free public access to materials in the archive that are no longer offered for sale.
  • Julie Sweetkind-Singer detailed the provider agreements and content node agreements that govern the operations of the National Geospatial Digital Archive.
  • Myron Gutman discussed the development of Data-PASS, which grew out of previous collaborations between the project’s partners and lengthy experience preserving social science data.
  • Dwayne Buttler, an attorney who was instrumental in crafting the agreements that support the operations of the MetaArchive Cooperative, emphasized that contracts, which focus on enforceability, grow out of a lack of trust and allow for simultaneous sharing and control; in contrast, agreements articulate goals.
The second half of the session focused solely on PeDALS. Richard Pearce-Moses (Arizona State Library and Archives; principal investigator), Matt Guzzi (South Carolina Department of Archives and History), Alan Nelson (Florida State Library and Archives), Abbie Norderhaug (Wisconsin Historical Society), and Yours Truly (New York State Archives) informally discussed some of the lessons that we’ve learned as the project unfolded. Among them:
  • People involved in long-distance collaborative projects need structured, consistent activities and expectations of involvement; both are key to fostering a sense of project ownership.
  • Lack of face-to-face interaction makes it harder for people to feel engaged; conference calls and other tools can help bridge the gap, but nothing really takes the place of getting to know other people.
  • Working in smaller teams capitalizes upon our strengths -- provided that we make sure that the right mix of IT, archival, and library personnel are involved.
  • Team members must be open to learning as they go, and they must be creative and innovative.
  • Working on this project has brought to light a number of challenges: communication and collaboration over long distances and multiple time zones, differences in organizational cultures, responsibilities, and IT infrastructures, learning to speak each other’s languages, and finding the right IT consultant.
  • We are nonetheless rowing in the same direction: we’ve learned to balance local practice with common requirements, and individual partners are beginning to incorporate PeDALS principles and standards into their current cataloging and other work.
In addition, Alan Nelson discussed how the IT personnel involved in the group have adopted the Agile Scrum process . . . and illustrated the difference between involvement and commitment.

The second “breakout” session took place after lunch, and the session I attended focused on building collaborative digital preservation partnerships:
  • Bill Pickett discussed the Web History Center’s efforts to provide online access to archival materials documenting the development of the World Wide Web and the organization’s need for partners.
  • David Minor outlined the work of the Chronopolis consortium, which is striving to build a national data grid that supports a long-term preservation (but not access) service.
  • Martin Halbert detailed the work of the MetaArchive, a functioning distributed digital preservation network and non-profit collaborative.
  • Beth Nichol discussed the Alabama Digital Preservation Network, which grew out of work with the MetaArchive and a strong history of informal statewide collaboration.
During the follow-up discussion, Martha Anderson (Library of Congress) made a really interesting point: according to an IBM study that LC commissioned, the strongest digital preservation organizations are focused on content, while weaker groups are focused on tools. The study also found that tool-building works really well when there is community interest in a tool and a central development team, and that the natural networks that grow out of years of other collaborative work also lead to the creation of strong organizations; however, there are other ways to build trust.

The end of the day brought all of the attendees back together. Abby Smith of NDIIPP provided an update on the work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, which focuses on materials in which there is clear public interest and seeks to frame digital preservation and access as a sustainable economic activity (i.e., a deliberate, ongoing resource allocation over long periods of time), articulate the problems associated with digital preservation, and provide practical recommendations and guidelines.

Economic sustainability requires recognition of the benefits of preservation, incentives for decision-makers to act, well-articulated criteria for selection of materials for preservation, mechanisms to support ongoing, efficient allocation of resources, and appropriate organization and governance. As a result, the task force’s work -- an interim report released last December and a forthcoming final report -- is directed at people who make decisions about the allocation of resources, not people who are responsible for the day-to-day work of preserving digital information.

Smith wrapped up by making a series of thought-provoking points:
  • Preservation is a derived demand, and people will not pay for it. However, they will pay for the product itself. We need to think of digital information as being akin to a car: it’s something that has a long life but requires periodic maintenance.
  • Everything in the digital preservation realm is dynamic and path-dependent: content changes over time, users change over time, and uses change over time. Decisions made now close off future options.
  • Librarians and archivists are the defenders of the interests of future users, and we need to emphasize that we are accountable to future generations.
  • Fear that digital preservation and access are too big to take on is a core problem.
IMHO, Smith's last point succinctly identifies the biggest barrier to digital preservation.

Tuesday, April 28, 2009

MARAC: Wikis Here, There, and Everywhere

Well, here it is, a mere ten days late: my final MARAC Spring 2009 post. I think I’m going back to the daily post style that I used at SAA 2008 -- unless, of course, anyone out there has a better idea . . . .

The last session I attended highlighted the many different ways in which archives are using wikis. I learned a few things about the varied uses to which wikis can be put . . . and a few things about why my own experiences with them have been less than satisfactory.

Kate Colligan outlined her use of a wiki to support the University of Pittsburgh's processing of the records (1887-1973) of the Allegheny County (Pa.) Coroner. Approximately 30 people, most of them undergraduate interns, ultimately participated in this project, which involved the flattening, rehousing, and indexing of approximately 220,000 trifolded documents.

In order to sustain the interns’ interest in the project and satisfy the writing component of their internships, Colligan created the Coroner Case File Documentation Wiki. This wiki allowed the interns to share in real time interesting things they found within the records, add descriptive tags, supply file arrangement information, and document their responses to files concerning murders, suicides, and accidents. Colligan also gave students research assignments that broke up the monotony of (and sometimes disrupted) processing, and this research is reflected in the wiki’s detailed timeline of life in Pittsburgh.

Colligan concluded that when working with wikis, immediacy is a more important goal than perfect writing and presentation. One should also have a clear sense of one’s target readership. In the final analysis, the core readership of this wiki seems to have been the project staffers themselves; however, the wiki has been discussed in genealogical chat rooms and has gotten a fair amount of international traffic.

Finally, Colligan noted that the creation of the wiki means that the preservation issues associated with this project have grown to encompass digital materials. She isn’t sure what the future holds for this wiki, but it has survived a recent migration from an older version of the wiki software (PBWiki) to a newer one (PBWorks).

Jean Root Green succinctly discussed the Binghamton University Libraries’ internal staff wiki. The wiki (created with MediaWiki) has been in place since 2005, and its unveiling was accompanied by a lot of staff training and the development of style guides, templates, and resources that made it easier for staff to use the wiki appropriately. She stressed that the careful planning that went into the development of the wiki and its supporting materials is crucial to the wiki’s success: even people who generally aren’t comfortable with technology feel comfortable making use of the wiki.

The wiki enables staff to discuss internal matters candidly and collaborate on policy and other documents, and it automatically records and tracks changes. It has pages for all projects, committees, task forces, etc., and includes documentation for and links to additional information about all of the libraries’ information technology systems. In addition, it enables staff to publicize collections internally and post reports about conference sessions and other professional development events that they have attended.
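
Since the wiki runs on MediaWiki, that change tracking is also available programmatically. Binghamton's wiki is internal, so the URL below is just a placeholder, but any MediaWiki installation exposes its recent changes through the same standard API:

    # Pull the most recent changes from a MediaWiki installation's API.
    # The URL is a placeholder; an internal staff wiki wouldn't be publicly reachable.
    import json
    import urllib.parse
    import urllib.request

    def recent_changes(api_url, limit=10):
        params = urllib.parse.urlencode({
            "action": "query",
            "list": "recentchanges",
            "rcprop": "title|user|timestamp|comment",
            "rclimit": limit,
            "format": "json",
        })
        with urllib.request.urlopen(api_url + "?" + params) as response:
            data = json.load(response)
        return data["query"]["recentchanges"]

    for change in recent_changes("https://wiki.example.edu/api.php"):
        print(change["timestamp"], change["title"], change["user"])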

David Anderson detailed how George Washington University’s Special Collections Research Center used MediaWiki to create the George Washington University and Foggy Bottom Historical Encyclopedia. Unlike paper encyclopedias, which fade from consciousness soon after publication, this encyclopedia is online, constantly updated, and frequently consulted.

Work on the encyclopedia began in 2006, when Anderson created templates and instructions for adding content, and to this day it adheres more closely to the traditional scholarly model of encyclopedia production than to the interactive Wikipedia model: two editors initially oversaw the development of the encyclopedia, and Anderson now serves as the gatekeeper for all additions and revisions. I suspect that Anderson and his colleagues were drawn to MediaWiki not because it can incorporate user-generated content but because it’s free and easy to use.

Scanned documents, articles written by faculty, staff, and students, timelines, and other materials are regularly added to the encyclopedia. At this time, there are 2,910 items in the database and 648 legitimate content pages; each photo is counted as a separate page, hence the discrepancy. There have been over 2 million page views to date. The most popular pages are the main page, the A-Z listing of campus buildings, and pages dedicated, among other things, to football (the university hasn’t fielded a team since 1966), distinguished alumni, Muhammad Ali (who once spoke on campus), various aspects of student life, and cheerleading.

Anderson noted that Google and other search engines have indexed these pages, and as a result he and his colleagues have gotten some non-historical reference inquiries; in response, he has modified some pages to include pointers to, e.g., campus events calendars.

I’m glad I attended this session. Wikis really are suited to the sort of internal information-sharing that Jean Green discussed, and can readily serve as the backbone of scholarly Web projects of the sort that David Anderson developed. Kate Colligan’s processing wiki is also a great use of the technology; such wikis can capture information that might otherwise remain unrecorded.

However, wikis also have their limits, and this session led me to realize that my colleagues and I have sometimes used wikis not because they were the best tool for the job but because they were the least awful of the available IT options. In some instances, what we actually need is something that combines the best features of, e.g., Microsoft Word (i.e., the ability to create long, complex, highly formatted documents) with the ease of use and change tracking features of the best wiki software -- without the clutter and chaos of, e.g., Track Changes. If you have any suggestions, I would be most appreciative.