
CDNLAO


CDNLAO Newsletter

No. 58, March 2007

Special topic: Archiving and Preservation of Online Publications

National Diet Library's Projects of Collecting, Preserving and Providing Internet Resources

by National Diet Library

  1. Experimental projects of collecting, archiving and providing Internet information resources
  2. Report of the Legal Deposit System Council
  3. Creation of the Digital Archive

In May 1998 the National Diet Library (NDL) set out the "National Diet Library Electronic Library Concept", which states that the library should collect, archive and provide information resources available on the Internet. In 2000 the NDL formulated the "Basic Implementation Plan for Electronic Library Services" and launched experimental projects in 2002. In February 2004 the NDL drew up the "National Diet Library Digital Library Medium Term Plan for 2004" and has since embarked on projects to build digital archives in line with it.

Based on these plans, we have continued to develop projects for collecting and archiving Internet information resources. The main projects we have worked on are described below.

1. Experimental projects of collecting, archiving and providing Internet information resources

In FY2002, in tandem with the opening of the Kansai-kan, the NDL started experimental projects for collecting, archiving and providing Internet information resources. The following three projects were conducted as testbeds intended to inform the design of an operational model.

  • WARP (Web Archiving Project) (http://warp.ndl.go.jp)

In WARP, the NDL selectively collects Internet resources, archives them, and makes them available to the public, provided that the copyright holders grant the library permission to do so. The project started on an experimental basis in FY2002 and moved into the operational stage in FY2006.

Through WARP the NDL has collected the websites of: national administrative agencies and institutions; prefectural governments; cities, towns and villages, together with their consolidation councils, before and after municipal consolidation; public-interest corporations/organizations and national universities prior to their reorganization as independent agencies; and various events such as the FIFA World Cup 2002. The NDL has also collected online periodicals published on the Internet by organizations of these kinds.

These resources are collected mainly with a web crawler, with permission from the copyright holders. We assign metadata to each website or online periodical as the unit of organization (a "title"), and link to it the versions (the "items") collected on different dates. As of January 2007, we had archived the following Internet resources.

Table: WARP contents (as of January 2007)

| Type | Number of titles | Number of items | Number of files (millions) | Volume of data (GB) |
|---|---|---|---|---|
| Online periodicals (total) | 1,499 | 6,837 | 4.57 | 560 |
| Websites (total) | 1,922 | 7,521 | 54.37 | 3,403 |
| Websites: National agencies | 38 | 298 | 8.85 | 683 |
| Websites: Prefectures | 8 | 98 | 8.31 | 495 |
| Websites: Cities, towns, and villages to be consolidated | 1,687 | 6,095 | 21.33 | 1,436 |
| Websites: Public-interest corporations/organizations | 85 | 780 | 12.73 | 615 |
| Websites: Universities | 71 | 106 | 2.61 | 155 |
| Websites: Events | 26 | 102 | 0.47 | 18 |
| Websites: Others | 7 | 42 | 0.07 | 1 |
| Total | 3,421 | 14,358 | 58.94 | 3,963 |
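The title/item organization described above, where metadata is attached once per title and each dated harvest is linked to it as an item, can be sketched as a small data model. This is a minimal illustration in Python, assuming dataclasses; the class and field names (`Title`, `Item`, `harvested_on`, `count_titles_and_items`) and the example URL are hypothetical and do not describe WARP's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    """One harvested version of a title (hypothetical fields)."""
    harvested_on: date
    file_count: int
    size_bytes: int


@dataclass
class Title:
    """The unit of organization: one website or online periodical.

    Descriptive metadata is held once per title; each harvest adds an Item.
    """
    name: str
    url: str
    category: str
    items: list[Item] = field(default_factory=list)

    def add_item(self, item: Item) -> None:
        # Keep versions in harvest order so the latest capture is last.
        self.items.append(item)
        self.items.sort(key=lambda i: i.harvested_on)

    def latest(self) -> Item | None:
        return self.items[-1] if self.items else None


def count_titles_and_items(titles: list[Title]) -> tuple[int, int]:
    """Return (number of titles, number of items), as counted in the table."""
    return len(titles), sum(len(t.items) for t in titles)


if __name__ == "__main__":
    # Hypothetical title with two captures added out of order.
    city = Title("Example City website", "http://www.example.jp/", "Cities, towns, and villages")
    city.add_item(Item(date(2005, 3, 1), 1_200, 80_000_000))
    city.add_item(Item(date(2004, 9, 1), 1_000, 60_000_000))
    print(city.latest().harvested_on)      # 2005-03-01
    print(count_titles_and_items([city]))  # (1, 2)
```

Keeping items sorted by harvest date means the most recent capture is always available without a separate index; a production archive would more likely store this in a database, but the one-to-many shape is the same.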

  • Survey on Comprehensive Collection, Storage, and Archiving of Japanese Web Sites

This survey was conducted from October 2004 to March 2005 to study the feasibility and methodology of collecting, storing, and archiving Japanese websites. It targeted domestic websites, including those in the JP domain. Based on the survey results, we estimated the total volume of Japanese websites at approximately 18.4 TB, comprising about 450 million files. For more information on the survey, please see the summary on the NDL website:
http://www.ndl.go.jp/en/aboutus/bulkresearch2005summary_e.html
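To put the survey estimates in perspective, dividing the estimated volume by the estimated number of files gives an average file size of roughly 41 KB. The short Python calculation below shows the arithmetic; it assumes decimal units (1 TB = 10^12 bytes), which the survey figures quoted here do not specify, and the function name `average_file_size` is purely illustrative.

```python
def average_file_size(total_bytes: float, file_count: float) -> float:
    """Average size per file, in bytes."""
    return total_bytes / file_count


# Survey estimates quoted above; reading "TB" as 10**12 bytes is an assumption.
survey_bytes = 18.4e12
survey_files = 450e6

avg = average_file_size(survey_bytes, survey_files)
print(f"average file size: {avg / 1e3:.1f} KB")  # average file size: 40.9 KB
```

If the survey used binary units (1 TB = 2^40 bytes), the same calculation would give about 45 KB, so the result should be read as an order of magnitude rather than an exact figure.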

  • Dnavi (Database Navigation Service)

A large part of Internet resources in fact consists of databases, whose contents cannot be collected by web crawlers and therefore cannot be found through search engines. Dnavi guides users to the gateways of databases in Japan. It started as an experimental project in FY2002 and moved into the operational stage in FY2006, together with WARP. As of January 2006, Dnavi listed 9,600 databases.


2. Report of the Legal Deposit System Council

In March 2002 the Chief Librarian of the NDL asked the Legal Deposit System Council for its views on the following questions: Should networked electronic publications issued within Japan be incorporated into the legal deposit system? If not, what selection criteria should be applied to them, and by what means should they be collected?

In December 2004 the Council submitted its report, titled "Concept of the Acquisition System for the Networked Electronic Publications," to the Chief Librarian. The report concluded that incorporating networked electronic publications into the legal deposit system would not be appropriate in light of the fundamental principles of that system, but it also proposed a framework for collecting such publications by another method outside the legal deposit system. The summary and full text of the report are available at: http://www.ndl.go.jp/en/aboutus/deposit_council_book.html

3. Creation of the Digital Archive


The NDL formulated the “National Diet Library Digital Library Medium Term Plan 2004” in February 2004. This plan sets out the objectives of the NDL's digital library services, one of which is to make the NDL a major hub of digital archives in Japan.

The NDL has been developing the NDL Digital Archive System (NDL DA System) as the infrastructure for its digital archives. The system is intended to support the entire workflow for digital information, from collection and organization through provision and preservation, and to ensure its long-term availability. Digital information here includes not only Internet information resources but also packaged electronic publications such as CD-ROMs, as well as information digitized from paper publications. The NDL DA System is based on the Open Archival Information System (OAIS) reference model (ISO 14721:2003) and consists of three layers: application, preservation and mass storage. Digital information is to be preserved for the long term as information packages based on the Metadata Encoding and Transmission Standard (METS), and we intend to use the Metadata Object Description Schema (MODS) as our standard for descriptive metadata. We are developing the system with the aim of starting operation in FY2009. The WARP system currently in operation will be converted to the NDL DA System.
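For readers unfamiliar with METS and MODS, the sketch below shows how a minimal METS document carrying MODS descriptive metadata might be assembled with Python's standard `xml.etree.ElementTree` module. It is a generic illustration, not the NDL DA System's actual METS profile, which this article does not describe. It wraps a MODS title in a descriptive metadata section (`dmdSec`), lists content files in a file section (`fileSec`), and links them through the structural map (`structMap`) that METS requires. A real OAIS information package would also carry administrative and preservation metadata (`amdSec`), which is omitted here. The function name `build_package`, the IDs and the example URL are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Use the conventional prefixes when the package is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_package(title: str, files: list[tuple[str, str]]) -> ET.Element:
    """Return a minimal METS document for one object.

    files: (URL, MIME type) pairs for the content files.
    """
    mets = ET.Element(q(METS_NS, "mets"))

    # dmdSec: descriptive metadata, wrapped inline as a MODS record.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # fileSec lists the content files; structMap says how they form the object.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), DMDID="DMD1")

    for n, (url, mimetype) in enumerate(files, start=1):
        file_id = f"FILE{n}"
        f = ET.SubElement(file_grp, q(METS_NS, "file"), ID=file_id, MIMETYPE=mimetype)
        ET.SubElement(f, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})
        ET.SubElement(div, q(METS_NS, "fptr"), FILEID=file_id)
    return mets


if __name__ == "__main__":
    pkg = build_package(
        "Example City website, capture of 2005-03-01",
        [("http://www.example.jp/index.html", "text/html")],
    )
    print(ET.tostring(pkg, encoding="unicode"))
```

The sections are created in the order the METS schema expects (`dmdSec`, then `fileSec`, then `structMap`), and each `fptr` in the structural map points back to a file by its `ID`, which is how METS ties the logical object to its stored files.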

The NDL is also working on creating a digital archive portal for Japan. A prototype of the NDL Digital Archive Portal has been publicly available since 2004, and we plan to launch the full-scale service in 2007.

The NDL aims to collect Internet information in the Japanese web domain on the basis of future legislation. Realizing this goal will require adapting current legislation to address outstanding issues, including intellectual property rights. We do not expect a short-term solution, but we will continue working to help bring about such legislation.

To keep collected digital information resources accessible over the long term, we must also solve technological problems such as the obsolescence of the systems needed to play them back. The NDL has been conducting feasibility studies on various preservation technologies, including migration and emulation. In addition, we need to promote standardization in areas such as playback technologies and metadata. To that end, the NDL intends to cooperate with other digital archives, including libraries and archives in Japan, as well as with the private sector and researchers.
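Migration and emulation tackle obsolescence in different ways: migration converts content into a current format, while emulation recreates the original playback environment so that content can be used unchanged. As a rough illustration of how the two might be combined in preservation planning, the Python sketch below assigns each stored file an action from a format registry. The registry, its placeholder format identifiers, the `Policy` class and the `plan` function are all hypothetical; they do not represent the NDL's feasibility studies or any real assessment of file formats.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    strategy: str              # "keep", "migrate" or "emulate"
    target: str | None = None  # target format, used only when migrating


# Hypothetical registry: format identifiers are placeholders, not real risk assessments.
REGISTRY: dict[str, Policy] = {
    "x-format/current": Policy("keep"),
    "x-format/legacy-document": Policy("migrate", "x-format/current"),
    "x-format/legacy-application": Policy("emulate"),
}


def plan(files: dict[str, str], registry: dict[str, Policy] = REGISTRY) -> list[tuple[str, str, str | None]]:
    """Map each stored file (path -> format) to (path, action, target format).

    Formats missing from the registry are flagged for manual review.
    """
    steps = []
    for path, fmt in sorted(files.items()):
        policy = registry.get(fmt)
        if policy is None:
            steps.append((path, "review", None))
        else:
            steps.append((path, policy.strategy, policy.target))
    return steps


if __name__ == "__main__":
    stored = {
        "report.doc": "x-format/legacy-document",
        "viewer.app": "x-format/legacy-application",
        "page.html": "x-format/current",
        "blob.bin": "x-format/unknown",
    }
    for step in plan(stored):
        print(step)
```

Flagging unknown formats for review rather than guessing reflects the point made above: decisions about playback and format support depend on shared standards and registries, which is one reason standardization matters for long-term access.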

We will continue to work hard on collecting and preserving Internet resources. We hope to promote standardization and cooperation within Japan and to contribute to the international community.


Copyright (C)2007 National Diet Library, Japan

