CDNLAO Newsletter
No. 58, March 2007
- Introduction
- National Library of New Zealand and British Library partnership
- International context
- The Web Curator Tool project
- The tool at work in collecting organisations
- Conclusion
Introduction
More and more of the world's documentary heritage is available only online. Users find this content valuable and convenient, but its impermanence, lack of clear ownership and dynamic nature pose significant challenges to any collecting professional or researcher attempting to acquire it.
To address these problems, the National Library of New Zealand Te Puna Mātauranga o Aotearoa and the British Library initiated a project to design and build a Web Curator Tool. The tool supports the selection, harvesting and quality assessment of online material by users working collaboratively in a library environment. It enables selective web harvesting, in which a selector identifies all or part of a website for harvest, usually in relation to a focused subject area or a significant event or theme. This is a high-quality, carefully crafted approach to harvesting web content.
The Web Curator Tool is the latest development in the practice of web harvesting: using software to 'crawl' through a specified section of the World Wide Web and gather 'snapshots' of websites, including the images and documents posted on them. The tool is a further advance in the race to ensure that the world's digital heritage is preserved for future generations and not lost through obsolescence and the temporary nature of the web.
National Library of New Zealand and British Library partnership
The partnership between the National Library of New Zealand and the British Library was formed under the auspices of the International Internet Preservation Consortium (IIPC). The IIPC envisioned a desktop solution to the challenge of collecting web material, one that would allow web harvesting to be implemented widely without requiring a high level of technical expertise within organisations.
The National Library of New Zealand and the British Library agreed to develop a solution that would manage the web harvesting process and could be adapted for all consortium members and other institutions. The project was funded entirely by the two national libraries, with IIPC members contributing to the initial requirements.
International context
The value of harvesting web material has been recognised by several national libraries, cultural institutions and consortia where web archiving programmes have been instituted. These organisations include the PANDORA consortium (National Library of Australia and partners); the UK Web Archiving Consortium (UKWAC) (British Library, National Archives, National Library of Wales, National Library of Scotland, the Joint Information Systems Committee, and the Wellcome Trust); the Library of Congress; the National Library of France (Bibliothèque Nationale de France); the Nordic Web Archive (national libraries of Denmark, Iceland, Norway, and Sweden); Library and Archives Canada (Bibliothèque et Archives Canada); and the Internet Archive.
The Web Curator Tool project
The project had two formal goals:
- to produce a combined requirements specification for the IIPC; and
- to design and build a Web Curator Tool that:
  - a. meets the needs of the National Library of New Zealand;
  - b. meets the needs of the British Library; and
  - c. can be extended to meet the needs of the National Library of Australia and other IIPC members.
The Web Curator Tool has been developed as an enterprise-class solution. It is interoperable with other organisational systems and has a user-centred design. It enables users to select, describe and harvest online publications without requiring in-depth knowledge of web harvesting technology. The tool is auditable and workflow-driven: it identifies content for archiving and then manages it through each stage, including permissions, selection, description, scoping, harvesting and quality review.
The tool supports a workflow comprising a series of specialised tasks: selecting an online resource; seeking permission to harvest it and make it publicly accessible; describing it; determining its scope and boundaries; scheduling a web harvest or a series of web harvests; performing the harvests; performing quality review and endorsing or rejecting the harvested material; and depositing endorsed material in a digital repository or archive.
Following the completion of testing, the Web Curator Tool was released to the web archiving community as an open-source project in September 2006. It can be downloaded from http://webcurator.sf.net/.
The tool at work in collecting organisations
There is a growing appreciation of the value of web material, and a corresponding interest in implementing processes to build collections of it. Libraries and archives have a vital role in collecting material to serve the short-, medium- and long-term needs of researchers.
Increasing numbers of researchers conducting advanced or academic research are using web resources that they want to cite permanently. The issue of persistent citation affects every discipline and every area of research that needs to be auditable, verifiable and scholarly. Other researchers study web material itself, and require permanent records of the state of the Internet.
The Web Curator Tool enables collecting organisations (such as national libraries, university libraries, special libraries, archives and research libraries) to implement collecting initiatives that:
- are collaborative (drawing on widely spread expertise within or across organisations);
- provide an organisation-specific approach to collecting web material; and
- respond to national interests, relate to academic specialities, focus on specific subjects or topical research areas, and are of interest to the organisation's research community.
Some examples of selective web harvesting and their potential research value are:
- Political party websites (during a general election): political studies.
- Dating, friendship, blog or community websites: gender studies or social networking.
- Recreational or professional listservs or websites: leisure studies or industrial studies.
- Trading or content-sharing websites: e-commerce or marketing.
Conclusion
'It has been very exciting to be involved with the British Library and the IIPC in this flagship project. It is also a great pleasure for us to contribute to a project that will benefit all participants in the digital preservation space,' says Penny Carnaby, National Librarian and Chief Executive of the National Library of New Zealand.
The tool was launched and demonstrated by representatives of the National Library of New Zealand and the British Library at the 6th International Web Archiving Workshop, held at the European Digital Libraries conference in Spain in 2006. The project team is now working to bring the tool to a wider audience and to publicise the opportunity to evaluate, trial, implement and contribute to this collaborative open-source innovation.
More information
- About the National Library of New Zealand: www.natlib.govt.nz
- About the Web Curator Tool: http://webcurator.sf.net/
- About the British Library: http://www.bl.uk/
- International Internet Preservation Consortium: http://netpreserve.org/about/index.php
- 'Web Curator Tool' by Philip Beresford, British Library. Ariadne Issue 50, 30 January 2007: www.ariadne.ac.uk/issue50/beresford/
- 'Building a Web Curator Tool for the National Library of New Zealand' by Gordon Paynter and Ingrid Mason, National Library of New Zealand. Paper given at the 2006 LIANZA conference, Wellington: http://opac.lianza.org.nz/cgi-bin/koha/opac-detail.pl?bib=121
Copyright (C)2007 National Library of New Zealand
