CDNLAO Newsletter

No. 47, July 2003

==========Special Topic: Digitization============================

(News from the National Library of India)

Digitization of Manuscripts of 
the National Library of India



  • 1. Introduction
  • 2. Digitization a timely initiative
  • 3. Digitization and the National Library
  • 4. Digitization of Manuscripts
  • 5. A Sample Project
  • 6. About the Manuscript
  • 7. Project Set-up
  • 8. Project Process
  • 9. Project Output
  • 10. Project Experience

    1. Introduction

    The National Library of India, located in Kolkata (Calcutta), is the largest library in the country. It is an institution of national importance under the Department of Culture, Ministry of Tourism and Culture, Government of India. The National Library came into being in 1948 with the passing of the Imperial Library (Change of Name) Act, 1948. In the same year it was moved to the Belvedere Estate, its present location, which was formerly the viceregal palace. It is now housed in three separate buildings, with a separate preservation laboratory. It was formally opened to the public on 1st February 1953 by the late Maulana Abul Kalam Azad, then Union Education Minister. The origin of the library, however, can be traced back to March 1836, when the Calcutta Public Library was opened to the public at 30 Esplanade Row, Calcutta. The Calcutta Public Library was subsequently merged with the Imperial Library in January 1903.

    The Library's responsibility is to collect, disseminate and preserve the printed heritage of the country. It is also one of the country's oldest institutions devoted to the conservation and maintenance of bibliographic documents in various fields of knowledge. The National Library of India is celebrating its centenary year with some new initiatives and challenges. Digitization of manuscripts is one such initiative, taken up in response to the growing use and adaptability of information technology in library-related activities.

               The National Library of India (The Belvedere Building)
     

    2. Digitization a timely initiative

    The issue has become important in recent times owing to the advancement of information technology and its application in all phases of life. Libraries, both public and research, need to adapt to the emerging scenario and take full advantage of this technology. One of the major activities listed for the library sector in the country's Tenth Five-Year Plan is the automation, modernisation and networking of libraries. Digitization is seen as one means of achieving this target. Barring some isolated, small-scale attempts by individual institutions, organisations and libraries, no major initiative has been taken so far in this direction.

    Digital libraries have three principal advantages over conventional ones: they are easier to access remotely, they offer more powerful searching and browsing facilities, and they serve as a foundation for new value-added services. In contexts where the collections are rare and unique, digitization also serves as a preservation tool. The case of the National Library falls mainly under this last category.
     

    3. Digitization and the National Library

    The National Library undertook a pilot project entitled "Down Memory Lane" to digitize its rare and brittle books in the late 1990s. English books published prior to 1900 and Indian books published before 1920 were taken into consideration. A local private agency was given the responsibility of scanning and cleaning the documents. The library professionals were given the task of checking the scanned data and preparing citation cards for indexing, in order to meet retrieval and reference needs. From February 1999 to June 2001, a total of 6601 books containing more than 2.5 million pages were scanned and archived on 548 CD-ROMs (in duplicate).
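
    The pilot figures above allow a rough throughput estimate. The following Python sketch is only back-of-envelope arithmetic on the numbers quoted in the text. It treats "more than 2.5 million pages" as exactly 2.5 million (a lower bound), counts February 1999 to June 2001 inclusively as 29 months, and assumes that 548 is the number of distinct discs rather than a count that includes the duplicates.

```python
# Figures quoted in the article.
books = 6601
pages = 2_500_000   # "more than 2.5 million", used here as a lower bound
discs = 548         # assumed to exclude the duplicate set

# Assumption: February 1999 through June 2001, counted inclusively.
months = 11 + 12 + 6

pages_per_book = pages / books
pages_per_disc = pages / discs
books_per_month = books / months
pages_per_month = pages / months

print(f"average pages per book : {pages_per_book:.0f}")
print(f"average pages per disc : {pages_per_disc:.0f}")
print(f"books scanned per month: {books_per_month:.0f}")
print(f"pages scanned per month: {pages_per_month:.0f}")
```

    Under these assumptions the pilot averaged somewhat under 400 pages per book and well over 80,000 pages per month, which gives a sense of scale for planning the manuscript work.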


    4. Digitization of Manuscripts

    In a way this is an extension of the earlier project, the content being the only difference. The National Library has in its possession a small holding of manuscripts representing some basic and important branches of knowledge. These manuscripts mostly form part of collections that belonged to eminent personalities of India and were donated by their heirs. The details of the holdings are as follows:
      1. Paper Manuscripts: 3000 volumes approximately.
      2. Correspondence and diaries: 250 volumes approximately.
      3. Palm Leaf Manuscripts: 334 volumes approximately.

    The following is the language-wise break-up of the manuscripts:
      a. Arabic: 681
      b. Persian: 955
      c. Urdu: 21
      d. Bengali: 162
      e. English: 255
      f. Hindi: 5
      g. Tamil: 370
      h. Sanskrit: 790
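
    As a quick cross-check, the two lists above can be tallied. This Python sketch simply sums the figures as printed. The holdings figures are stated as approximate, so the two totals are not expected to match exactly, and the article does not say how the categories overlap.

```python
# Language-wise counts as listed in the article.
by_language = {
    "Arabic": 681, "Persian": 955, "Urdu": 21, "Bengali": 162,
    "English": 255, "Hindi": 5, "Tamil": 370, "Sanskrit": 790,
}

# Holdings by type (all stated as approximate in the article).
by_type = {
    "Paper manuscripts": 3000,
    "Correspondence and diaries": 250,
    "Palm leaf manuscripts": 334,
}

language_total = sum(by_language.values())
type_total = sum(by_type.values())

# Print each language with its share of the language-wise total, largest first.
for language, count in sorted(by_language.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language:<8} {count:>4}  {100 * count / language_total:5.1f}%")

print("language-wise total:", language_total)
print("holdings total     :", type_total)
```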

    While the Tamil manuscripts on palm leaves are unique in character, the Arabic and Persian manuscripts bear beautiful illustrations, fine calligraphy and elegant bindings. Loose letters, diaries and some magnificent dossiers of correspondence represent interesting and authentic records of important personalities. The library also has about 100 volumes of xylographs comprising more than 800 items, presented to the Library by the Hon'ble Dalai Lama after his visit. These are block prints made from bark of rare Nepali trees.

    Although the storage environment is satisfactory, the manuscripts are subject to natural decay (yellowing, brittleness, and wear and tear).

    Manuscript from the Vaiyapuri Pillai Collection pt. 1

    Manuscript from the Vaiyapuri Pillai Collection pt. 2

    Palm leaf manuscripts from the Vaiyapuri Pillai Collection



     

    5. A Sample Project

    A sample project was undertaken by the National Library to gain a better understanding of the issues involved in digitizing manuscripts. The main areas of concern were as follows:

    1. Technology-related issues: The process, output and storage of the digital images of the manuscript. The images need to be as close to the original as possible, with worm marks and stain marks removed. The images need to be clear, and the details of illustrated pages have to be captured to the best possible extent.
    2. Project economics: A cost-benefit analysis of the project, with a long-term view of its scope on a large scale.
    3. Project time frame: Estimating the time needed to digitize the entire manuscript collection was imperative, so the sample project was used to obtain a completion time that could be extrapolated to the full scope of the project.

    An excellent Persian manuscript, the Tutinamah, was chosen for the sample project. The project was jointly envisaged and executed by the National Library and Trinetrix Technologies, a Calcutta-based information technology organisation.
     

    6. About the Manuscript

    Tutinamah: A fine and elegant copy of the older and larger version of the well-known tales of a parrot, by Diya-I-Nakhshabi (d. AH 751 / AD 1350), who composed it by AD 1330.

    This beautiful copy, consisting of 52 stories, is written in clear Indian Taliq within gold and colour ruled borders and contains a beautifully illustrated headpiece. There are about 36 exquisite coloured illustrations made with vegetable and organic dyes, some of which are interesting. The entire manuscript is written on handmade paper and is bound.

    This work was later adapted and abridged by Mohd. Qadri, and an Urdu version of it was published in London in 1852.
     

    7. Project Set-up

    The project set-up was divided into two operational areas:

    1. Image Capture Station:
    The image capture station consisted of a digital camera (Nikon D100 with bayonet mount 28-70mm f/2.8 ED-IF AF-S Zoom-Nikkor lens) mounted vertically on a photographic copy stand (Bogen System 800 Repro Copy Stand W/bb 1740), with side illumination from a 40-watt incandescent lamp. The background was chosen to be slightly lighter than the document colour in order to minimize shadows and optimize digital transfer. The digital camera had special colorimetric filters that enabled it to capture a broader spectrum of colours than most digital scanners.

    Lighting was also provided selectively by two 1000-watt Elinchrom strobe lights (daylight balance) at a 45-degree angle to the copy surface, with multiple diffusion filters between the copy surface and the lights to soften shadows and reduce glare on specific pages with illustrations.

    2. Image Processing Station:

    The image processing station was an HP Brio PC with a Pentium IV processor and 128 MB of DDR RAM. The workstation ran image processing software such as Kodak Imaging and Adobe Photoshop 6. An image transfer device connected to the USB port read the images from the digital camera's memory card.
     

    8. Project Process

    The project process consisted of the following steps:

    1. Document Assessment and set-up:
    The condition of the document, the sequence of pages and the original page numbering order were noted at this stage. The lighting environment was adjusted to the specific requirements of the document using a light meter. The book was set up on the bed of the photographic copy stand, opened at an angle of 120 degrees to avoid stress on the binding of the manuscript.

    2. Image Capture:
    At this stage the images were captured from the manuscript at the image capture station. Initially, a few shots were taken at different apertures, focal lengths and shutter speeds. The captured images were transferred to the Image Processing Station for comparative study and standardisation of the image capture specifications.

    The final images were captured using a cable shutter-release trigger with the following settings:
      - Aperture: f/16
      - Focal length: 55 mm
      - Shutter speed: 2.5 seconds
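
    The aperture and shutter speed above can be combined into a single exposure value using the standard photographic relation EV = log2(N^2 / t), where N is the f-number and t the exposure time in seconds. The Python sketch below is illustrative only: the function names are hypothetical, and the article does not state the ISO setting, so the result is relative to whatever sensitivity was used.

```python
import math


def exposure_value(f_number, shutter_seconds):
    """Exposure value from the standard relation EV = log2(N^2 / t)."""
    return math.log2(f_number ** 2 / shutter_seconds)


def equivalent_shutter(f_number, ev):
    """Shutter time (seconds) giving the same EV at another f-number."""
    return f_number ** 2 / 2 ** ev


# Settings reported for the final captures: f/16 at 2.5 seconds.
ev = exposure_value(16, 2.5)
print(f"EV at f/16, 2.5 s: {ev:.2f}")

# The same exposure two stops wider, at f/8, needs a quarter of the time.
print(f"equivalent shutter at f/8: {equivalent_shutter(8, ev):.3f} s")
```

    A small aperture with a long exposure is consistent with the use of a copy stand and cable release: depth of field is maximized and camera shake is avoided.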

    The images were taken first for all right-hand pages and then for all left-hand pages (as this is a Persian manuscript). The images were captured in colour as uncompressed 8-bit-per-channel (24-bit RGB) TIFF files at 300 dpi.
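
    The capture specification above determines the size of each page image. The Python sketch below estimates it under one assumption that is not in the article: that the camera was used at the Nikon D100's largest image size of 3008 x 2000 pixels. The function names are hypothetical, and TIFF header overhead is ignored.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Raw pixel data size of an uncompressed image, ignoring file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


def nominal_size_inches(width_px, height_px, dpi):
    """Physical size implied by the pixel dimensions at a given dpi."""
    return width_px / dpi, height_px / dpi


# Assumption: full-resolution D100 frame (not stated in the article).
WIDTH, HEIGHT = 3008, 2000

size = uncompressed_bytes(WIDTH, HEIGHT)   # 24-bit RGB, as in the article
w_in, h_in = nominal_size_inches(WIDTH, HEIGHT, dpi=300)

print(f"bytes per page image: {size:,} ({size / 2**20:.1f} MiB)")
print(f"nominal size at 300 dpi: {w_in:.2f} x {h_in:.2f} inches")
```

    Under this assumption each uncompressed page image is roughly 17 MiB, which is worth keeping in mind when reading the storage remarks later in the article.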

    The images were then transferred to the Image Processing Station.

    3. Image Processing:
    The image processing consisted of the following steps:

    • Image Identification Tagging: The images, once transferred from the Image Capture Station, were renamed as per the page sequence.
    • Image Quality Check: The images were checked for any deviation in clarity, legibility and colour.
    • Basic Editing: The images were checked for tilt, skew and deviation from normal orientation, and were rectified to within the acceptable 4-degree tilt of the NARA specifications. Images that contained some portion of the opposite page were cropped and resized before normal processing.
    • Final Editing: The tonal levels of each image were checked against the original. Images that had come out brighter were toned down to match the original. Unwanted stain marks and worm marks were removed. The colour channels were checked for conformance to the 8-bit-per-channel specification.
    • Format Conversion: The base files were converted to three basic formats as per the requirements, namely PDF, TIFF and JPEG.
    • E-Book Format Conversion: The individual image PDF files were tagged, and a composite PDF file was prepared following the pagination and sequence of the original document.
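
    The tagging and basic-editing steps above can be sketched in Python. Because all right-hand pages were captured in one pass and all left-hand pages in another, the two passes have to be interleaved before the files are renamed in page order. Everything in this sketch is an illustrative assumption rather than the project's actual tooling: the function names, the file naming pattern, and the choice of which side carries the first page (exposed as a parameter). Only the 4-degree tolerance comes from the article.

```python
from itertools import zip_longest


def page_sequence(right_shots, left_shots, first_side="right"):
    """Interleave the two capture passes into reading order."""
    if first_side == "right":
        first, second = right_shots, left_shots
    else:
        first, second = left_shots, right_shots
    ordered = []
    for a, b in zip_longest(first, second):
        # One pass may be a page longer than the other; skip the gap.
        if a is not None:
            ordered.append(a)
        if b is not None:
            ordered.append(b)
    return ordered


def rename_plan(ordered, prefix="tutinamah", ext="tif"):
    """Map each captured file name to a name that encodes its page number."""
    return {old: f"{prefix}_p{n:04d}.{ext}" for n, old in enumerate(ordered, start=1)}


def within_tilt_tolerance(angle_degrees, tolerance=4.0):
    """True if a measured tilt is within the 4-degree limit cited above."""
    return abs(angle_degrees) <= tolerance


# Hypothetical camera file names from the two passes.
right = ["DSC_0001.TIF", "DSC_0002.TIF"]
left = ["DSC_0101.TIF", "DSC_0102.TIF"]

for old, new in rename_plan(page_sequence(right, left)).items():
    print(old, "->", new)
```

    Returning a rename plan instead of renaming files directly keeps the sketch free of side effects and lets an operator review the mapping before anything is changed on disk.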

    9. Project Output

    The images were obtained in three forms, namely TIFF, PDF and JPEG. The image files of the individual pages were obtained as uncompressed TIFF and as JPEG, for archival purposes. The composite PDF containing the individual pages was in e-book form, for viewing and access. Image files of a page at various intermediate stages of processing were also obtained. The images were stored on CD-ROM and were also kept on the hard disk of the central server.
     

    10. Project Experience

    The aspects of the project that need attention if it is taken up on a larger scale are as follows:

    • The project would need a server-administered, hard-disk-based storage system with fault tolerance and disaster recovery provision, along with CD-ROM-based storage for archival. This is because it is difficult for a single CD-ROM to hold the images of a document or an e-book in its entirety. For random access by viewers, hard-disk-based storage is the more reliable option.
    • For images of pages containing illustrations, minute but undesirable and unavoidable tonal variations were observed. This is because an illustration can contain any shade from a spectrum of millions of colours, whereas the CCD unit of the digital camera captures only a limited band of that spectrum. To circumvent this limitation, white light can be used.
    • The digital restoration of the manuscript images was done using state-of-the-art image editing software, Adobe Photoshop version 6, which is itself very resource-intensive on the processing workstation. Digital restoration also demands considerable expertise, involving cloning, multilayer processing, and hue, saturation and gradient adjustments. It was observed that a completely satisfactory digital restoration of one A4 page of the manuscript took an expert professional at least 4-5 hours with the above-mentioned infrastructure.
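
    The storage and restoration-time observations above can be turned into a rough planning estimate. In the Python sketch below, only the 4-5 hours per page comes from the article. The page count is hypothetical, the per-page size assumes an uncompressed 24-bit image at the Nikon D100's full 3008 x 2000 resolution, and the disc capacity assumes a nominal 650 MiB CD-ROM.

```python
import math

# From the article: expert restoration time per A4 page, in hours.
HOURS_PER_PAGE = (4, 5)

# Assumptions, not from the article.
BYTES_PER_PAGE = 3008 * 2000 * 3   # uncompressed 24-bit frame, headers ignored
DISC_BYTES = 650 * 2**20           # nominal CD-ROM capacity
PAGES = 200                        # hypothetical manuscript length


def pages_per_disc(bytes_per_page, disc_bytes):
    """Whole page images that fit on one disc (files are not split)."""
    return disc_bytes // bytes_per_page


def discs_needed(pages, bytes_per_page, disc_bytes):
    """Discs required to hold every page image of one manuscript."""
    return math.ceil(pages / pages_per_disc(bytes_per_page, disc_bytes))


def restoration_hours(pages, hours_per_page=HOURS_PER_PAGE):
    """Low and high estimate of total expert restoration time."""
    low, high = hours_per_page
    return pages * low, pages * high


print("pages per disc :", pages_per_disc(BYTES_PER_PAGE, DISC_BYTES))
print("discs needed   :", discs_needed(PAGES, BYTES_PER_PAGE, DISC_BYTES))
low, high = restoration_hours(PAGES)
print(f"restoration    : {low}-{high} hours ({low // 8}-{high // 8} eight-hour days)")
```

    Under these assumptions a single disc holds only a few dozen uncompressed page images, and full restoration of a 200-page manuscript would occupy one expert for several months, which illustrates why both points are raised as concerns for a larger project.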
     

     

    All Rights reserved. Copyright (c) the National Library of India, 2003