CDNLAO Newsletter
No. 58, March 2007
- Introduction
- OASIS Approach for Web Resource Collection
- OASIS Workflow and Process
- Future Development Direction
1. Introduction
With the rapid development of the information and communication environment, numerous intellectual works are available in digital format on the Internet, yet these digital resources tend to disappear soon after they appear. Digital archiving is the long-term process of handling, managing, and preserving digital objects that are considered to have lasting value. Since the 1990s, many countries, including Australia, the United States, and European nations, have pursued online preservation of digital resources as long-term national projects, led by their national libraries in cooperation with other institutions and organizations.
In response to the changing status of libraries in the digital information era, the National Library of Korea (NLK) has planned an efficient national information service that collects quality online digital information, provides it as a public service, and preserves these intellectual records for generations to come.
In preparation for the opening of the National Digital Library of Korea in 2008, NLK is working on a project to collect and preserve online digital resources of various kinds: OASIS (Online Archiving & Searching Internet Sources, www.OASIS.go.kr). The OASIS system was developed in December 2005 to preserve online digital resources for future generations, to collect and preserve the national digital cultural heritage, and to establish standard management policies for digital resources.
The OASIS Project, one of the four innovation brand policies of Korea's Ministry of Culture and Tourism, receives considerable attention across the government. The government supported the project with about $1 million over the two-year period of 2005 and 2006 for its online digital resource collection and preservation process. According to the current mid-to-long-term strategic plan, from 2007 the government will support more systematic development and increase the budget, aiming at the opening of the National Digital Library in 2008 and a collection of 1 million web resources in 2010.
2. OASIS Approach for Web Resource Collection
2.1 Selective Collection of Web Resources
NLK's approach to web archiving is basically selective collection. Currently there are two types of objects to collect: web sites and individual web digital resources. They are collected selectively according to an established collection development policy. We will gradually expand the target objects to include video, image, and audio.
Some of the candidate objects may already exist in printed versions, but currently we collect them according to the collection development policy regardless of the potential duplication.
2.2 OASIS Collection Target and Collection Policy
The selection of target resources was based on their utility for current or future information needs, the author's popularity, the uniqueness of the information, academic content, the currency of the information, the frequency of updating, and accessibility.
To be selected as a national preservation resource, a digital resource should deal with something important to Korea's society, politics, culture, religion, science, or economy, and be authored by Koreans. It should also be written by recognized authorities in their field, such as well-known professors and researchers at universities in Korea, and be considered to have contributed to its discipline nationally or internationally.
Examples include digital resources on current issues, such as the national parliamentary election and the new executive capital city, that are considered valuable for collection and preservation because of their currency, scarcity, and utility. They also include articles in journals evaluated by reputable and authoritative international organizations.
2.3 OASIS Collection Steps
There are six steps in NLK's collection development for valuable online digital resources on the web.
The first step is the selection review process, which is carried out in two ways: by applying the selection policy, or by the committee for digital resource collection and preservation, which consists of experts from each subject area. The second step is to clear copyright on the selected target objects and to collect them with the OASIS system. The third step is to catalog the collected digital resources using basic Dublin Core elements such as title, URL, publisher, or abstract, together with subject analysis. In the fourth step, subject experts review the catalog records, correct errors, and make the final decision about each resource's value for preservation. The fifth step is the preservation process, in which the collected digital resources are converted into a preservation file format, preservation media are selected, and the collections are moved to the media.
The sixth and final step is to prepare for user service those online digital resources whose copyright issues have been resolved.
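The cataloging in the third step can be illustrated with a minimal sketch of a catalog record. The class name `DublinCoreRecord`, the `to_dc` method, and the sample values are hypothetical and are not taken from the OASIS system; only the element names (title, identifier, publisher, description, subject) come from the Dublin Core element set, with the URL stored as the identifier and the abstract as the description.

```python
from dataclasses import dataclass, field


@dataclass
class DublinCoreRecord:
    """Hypothetical catalog record using basic Dublin Core element names."""
    title: str
    identifier: str                  # the resource URL
    publisher: str = ""
    description: str = ""            # the abstract
    subject: list = field(default_factory=list)

    def to_dc(self) -> dict:
        """Return the non-empty elements keyed with a 'dc:' prefix."""
        pairs = {
            "dc:title": self.title,
            "dc:identifier": self.identifier,
            "dc:publisher": self.publisher,
            "dc:description": self.description,
            "dc:subject": list(self.subject),
        }
        return {key: value for key, value in pairs.items() if value}


# Sample values are invented for illustration only.
record = DublinCoreRecord(
    title="Example report",
    identifier="http://example.org/report.pdf",
    subject=["digital preservation"],
)
print(record.to_dc())
```

Empty elements are dropped on output so that a reviewer in the fourth step could see at a glance which fields still need to be filled in.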
2.4 OASIS Annual Resource Collection Statistics
Collection started in 2004, and OASIS currently holds 156,798 resources in total. The collection size is about 2.4 terabytes.
| Type of Resources | 2004 | 2005 | 2006 | Total |
| Individual Digital Resource | 43,861 | 45,280 | 42,958 | 132,099 |
| Web Site | 1,218 | 2,716 | 20,765 | 24,699 |
| Total | 45,079 | 47,996 | 63,723 | 156,798 |
Individual digital resources are document files created by government organizations, other public institutions, research institutions, associations, and individuals. For web sites, we collected across all subject areas, including sites about the new executive capital city, election sites, and local festival sites. The collection aims to archive 1 million web resources in 2010, and the target types will be expanded to video, image, and sound.
As for copyright agreements covering collection and preservation of the collected resources, in 2005, 209 of the 1,002 institutions asked gave their agreement, a rate of about 21%, while in 2006 only 112 of 650 agreed, about 17%.
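The agreement rates can be checked with a short calculation. This is only an arithmetic sketch using the counts reported above; the function name `agreement_rate` is our own.

```python
def agreement_rate(agreed: int, asked: int) -> float:
    """Percentage of institutions that agreed, out of those asked."""
    return 100.0 * agreed / asked


# Counts as reported in the text: (agreed, asked) per year.
reported = {2005: (209, 1002), 2006: (112, 650)}
rates = {year: agreement_rate(*counts) for year, counts in reported.items()}

for year, rate in rates.items():
    print(f"{year}: {rate:.1f}%")
# prints "2005: 20.9%" and "2006: 17.2%"
```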
Because digital archiving is poorly understood and the agreement rate for copyright clearance among copyright holders is low, it is necessary to encourage government and other major organizations to participate voluntarily in national projects such as digital archiving.
3. OASIS Workflow and Process
OASIS workflows and processes are described for web sites and individual digital resources respectively.
The process for web sites does not end after one mirroring cycle, because web sites change their contents continuously. Their resources need to be collected and preserved at certain intervals. However, a manager cannot manually monitor changes on numerous web sites, and collecting every resource unconditionally at a fixed interval, for example every one, two, or six months, is considered a waste of resources.
Fig. 1. Workflow for Website Archiving
The OASIS system lets collecting robots continuously collect the resources of registered sites, monitor their changes, and compare the current state with the previously saved one to quantify those changes. Based on that number, the manager decides whether the new collection will be preserved.
The general workflow and process for web archiving is shown in Fig. 1. Based on users' recommendations, authors' donations, or the manager's own selection, the basic information about a target site and its collection schedule are defined. A web robot then mirrors the site according to the schedule for the first collection.
The manager reviews the first collection and makes a preservation copy of it. Later, a web robot performs the second collection according to the schedule and reports the change rate relative to the first preservation copy. The manager checks the change rate to decide whether a second preservation copy should be made. The third collection is compared with the second copy to show the change rate, if any change has occurred.
The selected individual digital resources are collected by a robot. The robot collects the target resources, checks for duplicates, automatically classifies them according to the classification system, and extracts abstract information. For the processed individual resources, the manager enters various metadata, then reviews and corrects the records to produce the final catalog for preservation.
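As an illustration of the comparison described above, the following sketch computes a change rate between two collections of the same site. It is not the OASIS implementation: this article does not specify how the change rate is calculated, so we assume here that each collection is a mapping from page path to page content and that the rate is the share of paths that were added, removed, or modified. The function names and sample pages are hypothetical.

```python
import hashlib


def snapshot(pages: dict) -> dict:
    """Map each page path to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(body).hexdigest() for path, body in pages.items()}


def change_rate(previous: dict, current: dict) -> float:
    """Fraction of paths (from either snapshot) that were added, removed, or modified."""
    paths = set(previous) | set(current)
    if not paths:
        return 0.0
    changed = sum(1 for path in paths if previous.get(path) != current.get(path))
    return changed / len(paths)


# Invented sample data: one page modified, one removed, one added, one unchanged.
first = snapshot({
    "/index.html": b"home v1",
    "/about.html": b"about",
    "/news.html": b"news",
})
second = snapshot({
    "/index.html": b"home v2",
    "/about.html": b"about",
    "/events.html": b"events",
})

rate = change_rate(first, second)
print(f"change rate: {rate:.0%}")  # prints "change rate: 75%"
```

In this sketch the number is only reported; whether a new preservation copy is made remains the manager's decision, as in the workflow above.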
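The duplicate check performed by the robot could, for example, be based on a digest of each file's content. The sketch below assumes exact byte-level duplicates and a hypothetical `DuplicateChecker` class; this article does not state which method OASIS actually uses.

```python
import hashlib


class DuplicateChecker:
    """Hypothetical helper that detects byte-identical resources by SHA-256 digest."""

    def __init__(self):
        self._seen = {}  # digest -> URL of the first resource with that content

    def check(self, url: str, content: bytes):
        """Return the URL of an earlier identical resource, or None if this one is new."""
        digest = hashlib.sha256(content).hexdigest()
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = url
        return earlier


# Invented sample URLs and contents.
checker = DuplicateChecker()
results = [
    checker.check("http://example.org/a.pdf", b"annual report"),
    checker.check("http://example.org/copy-of-a.pdf", b"annual report"),
    checker.check("http://example.org/b.pdf", b"white paper"),
]
print(results)  # prints [None, 'http://example.org/a.pdf', None]
```

A digest comparison only catches identical files; near-duplicates, such as the same report saved in another format, would need a different technique.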
4. Future Development Direction
As knowledge and information resources migrate from paper to digital formats, there is a growing need to collect and preserve digital knowledge resources at the national level. Recognizing that digital resources are short-lived, NLK runs the OASIS system at the national level to collect and preserve valuable digital resources so that the current generation can pass them on to the next as digital cultural heritage.
To accomplish the mission, the OASIS system provides national standard models for submission of online digital resources to the authority in the future digital environment and for standardization of collection and preservation systems for online digital resources.
Major technologies are being developed and applied to OASIS at the levels of collection, preservation, management, and public service. For the collection process, they include web robot agents and techniques for using them, automatic classification, and automatic abstracting. For the preservation process, periodic management of recording media and backup technology are required. For public service, search technology for the copyright-cleared resources should be refined.
As a major subsystem of the National Digital Library, which will open in 2008, the OASIS system will be distributed as a standard system through a cooperative framework, led by NLK, with related organizations. The distributed system will assign web resource collection processes to each subject area.
National Library of Korea,
Seoul, Republic of Korea oasis@mail.nl.go.kr
Copyright (C)2007 National Library of Korea
