
CDNLAO


CDNLAO Newsletter

No. 58, March 2007

Special topic: Archiving and Preservation of Online Publications

A Web Archiving System of the National Library of Korea: OASIS

by National Library of Korea, Seoul, Republic of Korea

  1. Introduction
  2. OASIS Approach for Web Resource Collection
  3. OASIS Workflow and Process
  4. Future Development Direction

1. Introduction

With the rapid development of the information and communication environment, numerous intellectual works are now available in digital format on the Internet, yet such digital resources tend to disappear soon after they appear. Digital archiving is the long-term process of processing, managing and preserving digital objects that are considered to have lasting value. Since the 1990s, many countries, including Australia, the United States, and European nations, have pursued the online preservation of digital resources as long-term national projects, led by their national libraries in cooperation with other institutions and organizations.

In response to the changing status of libraries in the digital information era, the National Library of Korea (NLK) has planned an efficient national information service that collects quality online digital information, provides it to the public, and preserves these intellectual records for future generations.

In preparation for the opening of the National Digital Library of Korea in 2008, NLK is working on OASIS (Online Archiving & Searching Internet Sources, www.OASIS.go.kr), a project to collect and preserve online digital resources, including various web content. The OASIS system was developed in December 2005 to preserve online digital resources for future generations, to collect and preserve the national digital cultural heritage, and to establish standard management policies for digital resources.


The OASIS Project, one of the four innovation brand policies of Korea's Ministry of Culture and Tourism, has received considerable attention within the government. The government supported the project with about $1 million over the two-year period of 2005 and 2006 for the collection and preservation of online digital resources. According to the current mid- to long-term strategic plan, from 2007 the government will support more systematic development and increase the budget, aiming at the opening of the National Digital Library in 2008 and a collection of 1 million web resources in 2010.

2. OASIS Approach for Web Resource Collection

2.1 Selective Collection of Web Resources

NLK's approach to web archiving is basically selective collection. Currently we collect two types of objects: web sites and individual digital resources on the web. They are selected according to an established collection development policy. We will gradually expand the target objects to include video, image, and audio.


Some of the candidate objects may already exist in printed versions, but currently we collect them according to the collection development policy regardless of the potential duplication.

2.2 OASIS Collection Target and Collection Policy

Target resources are selected based on their utility for current or future information needs, the author's popularity, the uniqueness of the information, academic content, the currency of the information, the frequency of updates, and accessibility.


To be selected as a national preservation resource, a digital resource should deal with something important related to Korea's society, politics, culture, religion, science or economy, and be authored by Koreans. It should also be written by recognized authorities in their field, such as well-known professors and researchers at universities in Korea, and be considered to have contributed to its discipline nationally or internationally.


Examples include digital resources on current hot issues, such as the national parliamentary election and the new executive capital city, that are considered valuable for collection and preservation because of their currency, scarcity, and utility. They also include articles in journals that are evaluated by reputable and authoritative international organizations.

2.3 OASIS Collection Steps

There are six steps in NLK's collection development for valuable online digital resources on the web.


The first step is the selection review process, carried out either according to the selection policy or by the committee for digital resource collection and preservation, which consists of experts from each subject area. The second step is to clear any copyright on the selected target objects and to collect them with the OASIS system. The third step is to catalog the collected digital resources using basic Dublin Core elements such as title, URL, publisher and abstract, together with subject analysis. The fourth step is for subject experts to review the catalog records, correct errors, and make final decisions about each resource's value for preservation. The fifth step is the preservation process, in which the collected digital resources are converted into a preservation file format, preservation media are selected, and the collections are moved to the media.


The sixth and final step is to prepare for service to users those online digital resources whose copyright issues have been resolved.
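The cataloging step can be illustrated with a small sketch. It assumes a record holding only the elements the text names (title, URL, publisher, abstract, subject) and maps them to the Dublin Core elements title, identifier, publisher, description and subject. The class name `CatalogRecord`, the function `to_dublin_core_xml`, the wrapping `record` element and the sample values are hypothetical and are not taken from the OASIS system.

```python
from dataclasses import dataclass, field
from typing import List
from xml.etree import ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"


@dataclass
class CatalogRecord:
    """Hypothetical catalog record; not the actual OASIS schema."""
    title: str
    identifier: str            # the resource URL
    publisher: str = ""
    description: str = ""      # the abstract
    subjects: List[str] = field(default_factory=list)


def to_dublin_core_xml(record: CatalogRecord) -> str:
    """Serialize a record as simple Dublin Core XML, skipping empty fields."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption

    def add(name: str, value: str) -> None:
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

    add("title", record.title)
    add("identifier", record.identifier)
    add("publisher", record.publisher)
    add("description", record.description)
    for subject in record.subjects:
        add("subject", subject)
    return ET.tostring(root, encoding="unicode")


example = CatalogRecord(
    title="Example report",
    identifier="http://example.org/report.pdf",
    publisher="Example Institute",
    description="Short abstract of the report.",
    subjects=["elections"],
)
print(to_dublin_core_xml(example))
```

A subject expert reviewing the catalog in the fourth step would correct fields on such a record before it is passed on for preservation.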

2.4 OASIS Annual Resource Collection Statistics

Collection started in 2004, and OASIS currently holds 156,798 resources in total. The collection size is about 2.4 terabytes.

Table 1. OASIS Resources Collection Statistics (Number of Titles)

Type of Resources             2004     2005     2006     Total
Individual Digital Resource   43,861   45,280   42,958   132,099
Web Site                       1,218    2,716   20,765    24,699
Total                         45,079   47,996   63,723   156,798

Individual digital resources are document files created by government organizations, other public institutions, research institutions, associations, and individuals. For web sites, we collected across all subject areas, including sites about the new executive capital city, election sites and local festival sites. The collection aims to archive 1 million web resources in 2010, and the target areas will be expanded to video, image, and sound.

Regarding copyright agreements for the collection and preservation of the collected resources, in 2005, 209 of the 1,002 institutions asked gave their agreement, a rate of about 20%, while in 2006 only 112 of 650 agreed, about 17%.
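The agreement rates can be recomputed from the counts given in the text. The short sketch below uses a hypothetical helper, `agreement_rate`, and only the figures quoted above. It gives 20.9% for 2005 and 17.2% for 2006, in line with the rounded figures in the text.

```python
def agreement_rate(agreed: int, asked: int) -> float:
    """Return the percentage of institutions that agreed."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    return 100.0 * agreed / asked


# (year, institutions that agreed, institutions asked), as quoted in the text
requests = [(2005, 209, 1002), (2006, 112, 650)]

for year, agreed, asked in requests:
    print(f"{year}: {agreement_rate(agreed, asked):.1f}%")
# prints:
# 2005: 20.9%
# 2006: 17.2%
```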

Because understanding of digital archiving is lacking and the rate of copyright clearance by copyright holders is low, it is necessary to encourage government and other major organizations to participate voluntarily in national projects such as digital archiving.

3. OASIS Workflow and Process

OASIS workflows and processes are described for web sites and individual digital resources respectively.

The process for web sites does not end with a single mirroring cycle, because web sites change their contents continuously. Their resources need to be collected and preserved at certain intervals. However, it is impossible for a manager to monitor changes to numerous web sites manually, and collecting and preserving every resource unconditionally at a fixed interval, for example every one, two, or six months, is considered a waste of resources.

Fig. 1. Workflow for Website Archiving

The OASIS system lets collecting robots continuously collect the resources of registered sites, monitor their changes, and compare the current state with the previously saved one to quantify those changes. Based on this figure, the manager decides whether or not the new collection will be preserved.

The general workflow and process for web archiving is shown in Fig. 1. Based on users' recommendations, authors' donations or the manager's own selection, the basic information about a target site and its collection schedule are defined. A web robot then mirrors the site according to the schedule for the first collection.

The manager reviews the first collection and makes a preservation copy of it. Later, a web robot performs the second collection according to the schedule and reports the change rate relative to the first preservation copy. The manager checks the change rate to decide whether a second preservation copy should be made. The third collection is compared with the second copy to show the change rate, if any change has occurred.
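The text does not say how OASIS computes the change rate, so the following is only a minimal sketch under stated assumptions: a collection is modeled as a mapping from page path to page content, pages are compared by SHA-256 digest, and the change rate is the share of paths that were added, removed or modified. The names `snapshot`, `change_rate` and `flag_for_review`, and the 10% threshold, are hypothetical.

```python
import hashlib
from typing import Dict


def snapshot(pages: Dict[str, bytes]) -> Dict[str, str]:
    """Reduce a mirrored site to a mapping of page path -> content digest."""
    return {path: hashlib.sha256(body).hexdigest() for path, body in pages.items()}


def change_rate(previous: Dict[str, str], current: Dict[str, str]) -> float:
    """Fraction of paths that were added, removed or modified (0.0 to 1.0)."""
    all_paths = set(previous) | set(current)
    if not all_paths:
        return 0.0
    changed = sum(1 for path in all_paths if previous.get(path) != current.get(path))
    return changed / len(all_paths)


def flag_for_review(rate: float, threshold: float = 0.10) -> bool:
    """Hypothetical helper: highlight collections whose change rate is notable.

    The final decision stays with the manager, as described in the text.
    """
    return rate >= threshold


first_copy = snapshot({"/": b"home v1", "/about": b"about v1"})
second_crawl = snapshot({"/": b"home v1", "/about": b"about v2", "/news": b"news"})

rate = change_rate(first_copy, second_crawl)
print(f"change rate: {rate:.0%}, flag for review: {flag_for_review(rate)}")
```

In this sketch one page is unchanged, one is modified and one is new, so two of the three paths count as changed. Each new crawl would be compared against the preservation copy named in the workflow above.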

The selected individual digital resources are collected by a robot. The robot collects the target resources, checks for duplicates, automatically classifies them according to the classification system, and extracts abstract information. For the processed individual resources, the manager enters various metadata, then reviews and corrects the records to produce the final catalog for preservation.
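The duplicate check mentioned above is not specified in the text. One common technique, shown here purely as an illustrative assumption, is to index each collected file by a digest of its bytes, so that a file with identical content reached through a different URL is recognized. The class `DuplicateIndex` and the sample URLs are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class DuplicateIndex:
    """Hypothetical index that detects byte-identical resources."""

    def __init__(self) -> None:
        # Maps content digest -> URL where that content was first collected.
        self._first_seen: Dict[str, str] = {}

    def register(self, url: str, content: bytes) -> Optional[str]:
        """Record a resource.

        Returns the URL of the earlier identical resource if this one is a
        duplicate, or None if the content has not been seen before.
        """
        digest = hashlib.sha256(content).hexdigest()
        earlier_url = self._first_seen.get(digest)
        if earlier_url is None:
            self._first_seen[digest] = url
        return earlier_url


index = DuplicateIndex()
print(index.register("http://example.org/a.pdf", b"report body"))
print(index.register("http://example.org/mirror/a.pdf", b"report body"))
```

The first call returns None because the content is new, and the second returns the first URL. Exact hashing only catches identical files. Near-duplicates, such as the same report re-exported with different metadata, would need a different technique.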

4. Future Development Direction

As knowledge and information resources migrate from paper to digital formats, the need to collect and preserve them at the national level is growing. Recognizing that digital resources are short-lived, NLK runs the OASIS system at the national level to collect and preserve valuable digital resources so that the current generation can pass them on to the next as digital cultural heritage.

To accomplish this mission, the OASIS system provides national standard models for the submission of online digital resources to the authority in the future digital environment and for the standardization of collection and preservation systems for online digital resources.

Major technologies are being developed and applied to OASIS at the levels of collection, preservation, management, and public service. For the collection process, they include web robot agents and techniques for using them, automatic classification, and automatic abstracting. For the preservation process, periodic management of recording media and backup technology are required. For public service, search technology for the copyright-cleared resources needs further refinement.

As a major subsystem of the National Digital Library, which will open in 2008, the OASIS system will be distributed as a standard system through a cooperative network of related organizations led by NLK. The distributed system will assign web resource collection processes to each subject area.

National Library of Korea,
Seoul, Republic of Korea oasis@mail.nl.go.kr

Copyright (C)2007 National Library of Korea

