Building the Digital Library
by Kathy Parker
Legends of buried treasure abound in Appalachia. Confederate gold caches, Indian mines, and bank robbery loot are all rumored to be awaiting discovery. Appalachian libraries hold a different kind of buried treasure. Letters, diaries, photographs, recordings, and artifacts are locked away in special collections of geographically remote institutions. New digitization initiatives have helped bring some of those hidden treasures to light. Among others, the Digital Library of Appalachia (http://www.aca-dla.org) provides online access to archival and historical materials related to the culture of the southern and central Appalachian region. Just as a map is needed to locate a lost mine, however, researchers need mechanisms for resource discovery in online collections. In digital library parlance, those mechanisms are metadata, structured descriptions of digital objects in a searchable database.
Five years ago, member libraries of the Appalachian College Association began planning an online collection of resources to generate interest and foster scholarship in the Appalachian region. That plan came to fruition in the form of the Digital Library of Appalachia (DLA), a bibliographic, full text, and digital media resource providing information on the culture, geography, environment, and history of Appalachia. It now includes thousands of items from the special collections of eleven libraries in Virginia, West Virginia, Kentucky, Tennessee, and North Carolina. While participants in the DLA did develop guidelines for selecting and scanning resources, it soon became clear that the most important collaborative decisions would be related to the tools provided for users to access the digitized resources. It was in creating the metadata, the mechanism for resource discovery, that librarians brought the mission of the DLA to life.
No longer were these
a small group of
The Digital Library of Appalachia has four primary purposes, the first being to improve scholarly access to Appalachian research resources. From the start, then, access was a primary concern. The project needed a metadata scheme that would meet user expectations of unmediated access yet be robust enough to handle substantial academic inquiry. Another purpose of the DLA is to bring together resources that are scattered in multiple collections. Libraries have enjoyed the benefits of cooperative cataloging for nearly 50 years. DLA members wanted a metadata scheme that would also lend itself to cooperative activity, even when participating institutions had different archival practices and staffing levels. Likewise, the metadata scheme would need to facilitate the creation of virtual collections from materials distributed among libraries. The third purpose of the DLA, sharing information with scholars worldwide, meant that it needed a metadata scheme that would be intuitive for users new to Appalachian studies. No longer were these collections serving a small group of specialized scholars with the time, resources, and fortitude to seek out unknown archival collections. The archives now would be serving users across disciplines whose degree of familiarity with Appalachia would vary widely. Finally, the DLA is intended to broaden opportunities for classroom instruction. In addition to adding a previously under-served patron group to the user constituency, this purpose required a metadata scheme that would allow for creation of predefined searches, packages of related information that could be selected specifically to support teaching and learning.
The Digital Library of Appalachia planning committee examined an array of metadata choices. The Appalachian College Association Central Library Council had requested that DLA records be structured so they could be integrated in library catalogs, but traditional Machine Readable Cataloging (MARC) was cumbersome for digital applications. While it was tempting to create a MARClike database variation, in the end librarians recognized that following an established standard would be more useful and cost-effective in the long term. Encoded Archival Description (EAD) has been widely adopted by library special collections, but it did not lend itself to the granularity desired. Other schemes, such as Text Encoding Initiative (TEI) and Canadian Heritage Information Network (CHIN), are particularly well suited to specific material types, but not necessarily accommodating of resource discovery among the broad range of resources included in the DLA. And other schema now employed by digital projects, such as the Metadata Object Description Schema (MODS), were not yet developed as the DLA came online. The Digital Library of Appalachia planning committee settled on Dublin Core (DC) as the metadata scheme most suitable for its project.
Dublin Core Elements Content Instantiation Intellectual Property Title Date Creator Source Format Publisher Subject Identifier Contributor Description Language Rights Relation Coverage Type
Dublin Core is a simple, structured metadata element set first developed in 1995 to meet a need for improved resource discovery on the World Wide Web. While the commercial web industry did not adopt it, DC is employed in many digital library endeavors. Dublin Core prescribes fifteen basic elements for a metadata record, describing an item's content, instantiation or version, and intellectual property characteristics. It generates relatively small records suited to many material types, and readily crosses disciplines. It is the backbone format for the Open Archives Initiative, so records created in the DC standard would be searchable by OAI harvesters, displaying elements familiar to scholars searching other archival projects. DC is flexible, in that it allows for every element to be optional and repeatable. It can accommodate traditional library practice in areas such as controlled vocabulary and material type designations, and also allow for new functions such as electronic grouping and sequencing of items. And now, with a base of ten years' use, librarians have developed and shared crosswalks that make it relatively easy to migrate data from Dublin Core to MARC, MODS, or other metadata schema as needed.
Once the Digital Library of Appalachia committee had decided upon Dublin Core as the metadata standard for the project, it had many more decisions to make regarding how best to implement the standard to meet the needs of its external users and internal operations. Use of Dublin Core, for example, had to work within the constraints of the software used to provide web access to the project - first Extensis Portfolio and now OCLC's ContentDM. DC is chiefly intended to facilitate resource discovery and to improve retrieval of relevant information. However, project archivists were also attentive to the need for administrative metadata and wanted to accommodate a workflow that would encourage best practice in capturing data about digital conversion, acquisitions, rights, preservation, and other technical and administrative concerns as well. Finally, because the participants were located hundreds of miles apart with minimal budgets for travel and training, the DLA needed a set of guidelines that could assure consistent practice.
The Digital Library of Appalachia was not intended as a research project in developing new digital library tools. In choosing Dublin Core for its base metadata scheme, the DLA committee opted to work with a well-established "industry standard," a practice continued whenever possible in creating guidelines for its project. Planners recognized early on that creating good metadata takes time and money, and wanted to maximize benefit and reduce risk in asking member libraries to invest in creating that metadata. Use of standards brings cohesion to the DLA that distinguishes it from a group of loosely linked digital projects. They are essential in a distributed environment to make sure users have consistent, reliable retrieval and display of information. Standards offer some protection against obsolescence, with the expectation of established migration paths as IT systems evolve. And standards allow for automated processing of data with the widest possible utility in meeting future purposes.
The standards employed by the Digital Library of Appalachia are laid out in the Metadata Elements Outline, a portion of which is displayed at right.
The Committee decided which elements would be included in a DLA record, how the fields would be labeled, how they would be displayed, and which authority controls would apply. Some authorities, such as subject terms, were taken from broadly recognized standards; others, such as file name protocols for identifiers, were developed specifically for the project.
DLA Metadata Elements Outline - January 2004 DLA Field Label Maps to DC Searchable Hidden Controlled Vocabulary Authority Comments Title Title Yes No No Mandatory Author Creator Yes No No Mandatory. Personal names:Last, First Description Description Yes No No Mandatory Subject Subject Yes No Yes LC Mandatory Category Subject Yes No Yes DLA Mandatory. Categories from homepage. Identifier Identifier Yes Yes No DLA Mandatory. Identifier from DLA file name protocols. Holding Library Publisher Yes No Yes DLA Mandatory. Name of institution as displayed on DLA website. Alternative Title Title Yes No No Optional Contributor Contributor Yes No No Optional Time Period Coverage Yes No Yes DLA Optional. Formats in metadata guidelines. Place Coverage Yes No Yes TGN Optional. Getty Thesaurus of Geographic Names Date Date No No No Optional. YYYY OR MM/DD/YY OR DD-Month YYY format. Format Format No No Yes Optional Note None No No No Optional Publisher Publisher Yes No No Optional Relation Relation Yes No No Optional Rights Rights No No No Optional. Link to owning library contact recommended. Type Resource Type Yes No Yes DLA Optional. Type elements list. Full Text None Yes No No Optional. Import text to accompany image of original.
Once the elements were defined, the committee prepared a style guide that described in detail how data should be entered in each record. Each element is listed in sequence, with a definition and notes on deriving appropriate content and format. Examples are given reflecting multiple material types. On the next page is a portion of the style guide for the Title element.
A Portion of the Style Guide for the Title Element
Title. [Dublin Core: Title] This field is searchable, and displays to the public.Examples:
How the item is formally or informally known.
In many instances, where a title is not found on the item itself, the cataloger will supply the title. AACR2 and, for manuscripts and realia, AAMP notation is followed. Do not enter initial articles. Capitalize first word and proper nouns. Only one title is to be entered here. Added title entries should be placed in Alternate Title field. Only initial 25 characters display with thumbnail, so cataloger may want to supply brief title here, with full title included as Alternate Title.
Article from PeriodicalINewspaper
- The trail of the lonesome pine I John Fox Jr.
- Under the hill : my life on Cripple Creek I Jeannie Walters Smith.
- Jumping frog of Calaveras County: comparisons between Mississippi and Tennessee folk culture I Judith Brock. In Literature studies for the new century / Lincoln Memorial University, English Dept.
- Dreaded cholera epidemic kills 18!! Manson Wooten. In The Harlan County times-examiner / R. [Robert] Blake, ed.
Photograph / Pictorial Work / Slide / Photographic Negative
- Dear Betsy, Nov. 5, 1862, from Samuel Whitehead to Elizabeth Stuart Whitehead.
- [Treatise on the treatment of the Negro] / Samuel Ball.
- Downtown Middlesboro / Wallace Hubbard.
- [Photograph of Alice Liddell] / [unknown].
- West Virginia : a pictorial and historical map / [published by] West Virginia Secretary of State.
- A map of Lee County, VA : taken the year of our Lord eighteen and twentynine / [signed by] John Roy Lee, esq. ; produced under the direction of Virginia Coal and Coke Organization.
- [Post office furniture] / [created by] Jacob Leigh.
- [Friendship quilt, nine-patch] / [created by] Marion Jackson, et al.
- Sounds of loneliness / Powell Lane.
- How come? ; Tell everyone ; Done this one before / Ronnie Lane.
- [Live recording of Powell Lane playing unidentified song] / Powell Lane.
The expansive use of standards and authority control is intended to increase effectiveness in resource discovery. Not only will standards and authorities foster consistency, but they will also ensure users have enough information to decide if a resource is relevant. However, as noted, the creation of complex metadata records is expensive and time-consuming. Not all participating institutions could devote extensive human resources to creating thorough records. Therefore, the DLA agreed on a minimum of only seven fields, some of which could be added automatically through use of a template. As with many library services, this is an ongoing, evolving effort. No doubt the metadata standards for the DLA will change over time, but our commitment to standards will help provide for a smooth transition.
Alas, we have yet to find a treasure map among the materials scanned for the Digital Library of Appalachia. However, from shapednote singing by Shenandoah Valley Mennonites to oral histories of the Depression to Native American legends, other treasures abound. In creating effective metadata to allow users to discover these library riches, members of the Digital Library of Appalachia project are living up to the implied trust our users place in libraries to provide accurate, credible, organized information. Our reward is not in pieces of eight, but in delighted users who better understand and appreciate Appalachia through our work.
Appalachian College Association Central Library http://alice.acaweb.org
Digital Library of Appalachia http://www.aca-dla.org
Digital Library of Appalachia documentation http://alice.acaweb.org/Committees/ CmteDLA.html
Dublin Core Metadata Initiative http://dublincore.org
Introduction to Metadata http://www.getty.edu/research/conducting_research/ standards/intrometadata/index.html
Kathy Parker is Director of the Pfeiffer Library of West Virginia Wesleyan College and Project Director for the Digital Library of Appalachia.