This draft is made available for my colleagues to read and comment on so that I may benefit from this new form of peer review. This will also give readers an opportunity to learn about anticipated procedures for electronic theses and dissertations at Virginia Polytechnic Institute and State University.
Please send your comments directly to the author.
Theses and dissertations as electronic files transferred from the student author to the Graduate School to the Library may well be the first major source of electronic texts that many catalogers encounter. To prepare for this potential influx of electronic texts, work flow and cataloging guidelines need to be updated. The author suggests expanding current theses cataloging and taking advantage of online information prepared by authors so that the bibliographic records provide OPACs with much more valuable information than do traditional theses cataloging records. This should not require a lot of extra work. In addition, the author covers areas of concern to faculty, students, and UMI.
An experienced cataloger moving on to process electronic texts steps back from those years of training and practice and takes the opportunity to examine all the available MARC tags. She does not just look for the new fields that are available to describe the attributes particular to computer files. She systematically works through the fields and imagines an electronic text (e-text) that could be defined and described by each one.
In the spring of 1994 there were three primary resources for e-text cataloging information: the Bibliographic Standards, the CETH (Center for Electronic Texts in the Humanities) guidelines, and the University of Virginia's online cataloging manual. These were joined by the OCLC Internet cataloging manual. The cataloger acquired each of these tools in a unique way. The Bibliographic Standards are part of her cataloging department's reference collection. The availability of the CETH guidelines by FTP (file transfer protocol) was announced on a listserv (probably on several). UVa's guidelines were discovered by browsing library World Wide Web sites while looking for examples of cataloging department home pages.
For an experienced cataloger, it is not necessary to reexamine AACR2r, but it is reassuring to have the rules cited with the MARC tags in the CETH guidelines. The cataloging rules for computer files are sketchy, and it quickly becomes a matter of applying logic to what the user might want to learn from the bibliographic record and what information a computerized access and retrieval system might need. Later, in the endless search for answers, TEI (Text Encoding Initiative) header guidelines were added to this electronic cataloger's expanding tool kit. All but the latter quickly became the much-thumbed sources for the cataloger's queries and provided reassurance that not every initiate to cataloging electronic texts is starting from ground zero when she's faced with her first electronic text.
But, what would these e-texts needing cataloging actually be? At Virginia Tech and other university libraries, electronic theses and dissertations (e-theses) may become the first body of e-texts regularly requiring cataloging. The library became aware that e-theses were in the offing in the spring of 1993 when the Associate Dean of the Graduate School (ADGS) contacted the library's theses liaison to discuss a partnership for providing access.
The ADGS presented many reasons for providing students with the opportunity to prepare electronic dissertations and over time several more reasons became clear.
There are also advantages to e-theses from the library's point of view.
The director of the Scholarly Communications Project (SCP) (and also a cataloger) met with the ADGS and theses liaison because this unit of Virginia Tech's University Libraries is largely responsible for experimenting with electronic scholarly publishing. The SCP director agreed to bring together staff from all the pertinent areas of the library to discuss possible scenarios for processing e-theses. The staff included the head of cataloging, the theses liaison (also a cataloger), the University Archivist, and the SCP programmer.
Two goals evolved initially:
The ad hoc task force proposed the outline of steps in figure 1 to get the document into the library for cataloging and public access.
In the beginning the task force planned to process ASCII (i.e., text only) files and provide Gopher access. However, the Graduate School adopted the PDF (Portable Document Format) file of Adobe Acrobat as its standard. PDF has several advantages over ASCII: not only can it be viewed from Macs and PCs without any additional work, it also very faithfully retains the look and feel of the original document, whether it was created with Word, WordPerfect, LaTeX, Excel, PowerPoint, PhotoShop, or other applications. The burden of how the document looks online is left entirely up to the author.
The Scholarly Communications Project established its World Wide Web presence in the winter of 1993 and the Adobe Acrobat Reader, a viewer for PDF files, became available at no direct cost to the user shortly thereafter. The SCP also learned about the benefits of PDF in developing the library's electronic reserve system. As a result, some adjustments were made in the original workflow that reduced the number of steps required to process e-theses.
Additional considerations in the processing of e-theses evolved from a regional initiative called the Monticello Electronic Library (MEL) Project. It is sponsored by SURA (13 universities forming the Southeastern Universities Research Association) and SOLINET (SOutheastern LIbrary NETwork). A group focusing on electronic theses, dissertations, and technical reports was created at the first general meeting held in July 1993 in Atlanta. Further discussions took place when this group met a year later at Virginia Tech. Largely due to the leadership role of the Virginia Tech Graduate School and its work with the University Libraries, the director of the Scholarly Communications Project was charged by the MEL Working Group on Electronic Theses, Dissertations, and Technical Reports with coalescing work that was being done by SURA participants to create bibliographic (or bibliographic-like) descriptions for a variety of databases and catalogs. This involved learning about how TEI (Text Encoding Initiative) headers (derived from SGML, Standard Generalized Markup Language) were being used at the University of Virginia to create MARC bibliographic records for publications issued by its Electronic Text Center, and learning about the codes used to create descriptors for WATERS (Wide Area TEchnical Report Service), a WAIS-based distributed technical report repository.
MARC Bibliographic Records
While looking at these new descriptors, Virginia Tech catalogers focused initially on what fields were currently included in the MARC bibliographic record for theses and how these would be the same or different for e-theses. Currently, the MARC record for a dissertation is not very robust and often has a local twist, presenting valuable information in a unique format that can be seen only at the originating institution because it is masked to users of OCLC or other centralized cataloging repositories. The ad hoc task force also studied the fields that could be supplied from the electronic documents using the copy/paste features of word processors. It would have been optimal to work with programmers in library automation so that author, title, abstract, and other information could be programmatically adapted to the appropriate MARC fields. Their priorities, understandably, were not the same as the task force's. The extent of AACR2r compliance was another complicating factor. For example, would programming change upper-case letters to lower case?
While the task force was looking at how to do things differently, we chose this opportunity to press the Graduate School to require authors of theses (in all formats) to provide keywords for use in the bibliographic record. Cataloging had been impressed for years with how labor intensive the task of assigning Library of Congress subject headings was, so having the authors assign uncontrolled subject headings was an appropriate alternative. Since the Graduate School would be rewriting its handbook for student authors, we successfully lobbied for this change to be included in the Graduate School's first Electronic Dissertation Manual. The principal cataloger determined that MARC tag 653 would be appropriate for thesis author-assigned terms and pointed out to us that without LC subject headings, our records would be considered "minimal level," rather than full level, cataloging. This seemed particularly unjust to us since e-theses cataloging would be more robust than previous theses cataloging because we planned to include additional information.
The catalogers on the ad hoc task force suggested including tables of contents (MARC tag 505) and abstracts (MARC tag 520) since the standard copy-and-paste features of today's word processors would make this a relatively easy process. The table of contents for dissertations proved to be quite generic, usually containing only the standard dissertation topics (e.g., literature review, methodology, findings), and, therefore, not an enhancement to the information available about the work in the OPAC. The abstract, however, provides valuable information about the research topic. The 520 is also an indexed field in our online catalog and, therefore, a word-searchable field for OPAC users. Adding the abstract (250-350 words) can add tremendously to the length of the MARC record, though. See figure 2 and figure 3.
While including abstracts and tables of contents is not necessarily a unique feature of e-text cataloging, cataloging conventions have not generally included the name of the thesis author's department as a standard feature of the bibliographic record. Therefore, another consideration was the opportunity to modify our local practice of creating a pseudo-series statement in order to have the name of the department in a searchable MARC field. Since about 1987 VPI theses catalogers had been creating local notes with the name of the student's department. These 590 notes for OCLC were retagged as 440, with indicators 90, once the record was imported from OCLC into our local system, VTLS.
590 changes to 440 90 VPI & SU. Mechanical Engineering. Ph. D. 1995
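The retagging step can be sketched in a few lines of Python. This is only an illustration, not the procedure VTLS actually used: the `Field` class, the function name, the sample 245 title, and the rule of recognizing department notes by the "VPI & SU." prefix are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Field:
    """A simplified MARC field (hypothetical structure for illustration)."""
    tag: str          # three-character MARC tag, e.g. "590"
    indicators: str   # two indicator characters; blanks shown as spaces
    text: str         # field content, kept as one string for simplicity


def retag_department_notes(fields: List[Field], prefix: str = "VPI & SU.") -> List[Field]:
    """Return a copy of the record with department 590 notes retagged as 440 90."""
    result = []
    for f in fields:
        # Assumption: department notes are recognized by their leading text.
        if f.tag == "590" and f.text.startswith(prefix):
            result.append(replace(f, tag="440", indicators="90"))
        else:
            result.append(f)
    return result


record = [
    Field("245", "10", "An example thesis title"),  # placeholder title
    Field("590", "  ", "VPI & SU. Mechanical Engineering. Ph. D. 1995"),
]
retagged = retag_department_notes(record)
```

Only the tag and indicators change; the text of the note is carried over untouched, which is why the same string can serve as a local note in OCLC and as a searchable pseudo-series in the local system.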
Taking advantage of the opportunity to incorporate changes in theses cataloging, we discussed using MARC tag 502, the dissertation note field, to include the degree, institution, and year the degree was granted, expanding institution to include the name of the department. The new note would follow this example:
502 Thesis (Ph. D. in Mechanical Engineering)--Virginia Polytechnic Institute and State University, 1995.
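Because the expanded note follows a fixed pattern, it could be assembled from its four parts. The following is a minimal sketch; the function name and parameter names are hypothetical, and the punctuation simply follows the example above.

```python
def dissertation_note(degree: str, department: str, institution: str, year: int) -> str:
    """Build the text of a 502 note that names the department with the degree."""
    return f"Thesis ({degree} in {department})--{institution}, {year}."


note = dissertation_note(
    degree="Ph. D.",
    department="Educational Research and Evaluation",
    institution="Virginia Polytechnic Institute and State University",
    year=1994,
)
```

The values shown produce the dissertation note that appears in the sample e-thesis record.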
Evaluating the potential value of an e-thesis bibliographic record provided us with the opportunity to propose a substantially enhanced record of real and continuing value to OPAC users. Again AACR2r compliance was an issue. In Cataloging Internet Resources: A Manual and Practical Guide, Nancy Olson states that Internet-accessible documents should be considered published documents when they are cataloged. Therefore, publisher information belongs in field 260 of an e-thesis record. [Coming from a serials background it seems reasonable to add a 710 for this corporate body tracing.] Additional fields required for cataloging computer files include tag 256 for computer file characteristics, tag 538 for notes of system details, and tag 856 for formatted electronic location and access information. These additional fields (505, 260, 710, etc.), however, also increase the length of the record and, therefore, should be carefully considered, as should the usefulness of the information provided in meeting the combined needs of OPAC users and computerized access and retrieval systems.
[OCLC fixed field tags]
Local lvl:1  Analyzed:0  Operator:43  Edit:  CNTL:
Rec stat:n  Entrd:950321  Used:950417
Type:m  Bib lvl:m  Govt pub:  Lang:eng  Source:d  Frequn:
File:d  Enc lvl:  Machine:0  Ctry:vau  Dat tp:s  Regulr:
Desc:a  Mod rec:  Audience:  Dates:1994,

CALL NUMBER: LD5655 V856 1995 TEST
AUTHOR: Seevers, Gary L., Jr.
TITLE: Identification of criteria for delivery of theological education through distance education [computer file] : an international Delphi study / by Gary L. Seevers, Jr.
FILE TYPE: Computer data (1 file : ca. 1353 kilobytes)
IMPRINT: Blacksburg, VA : Scholarly Communications Project, 1994.
NOTE: System requirements: World Wide Web browser and Acrobat Reader.
REMOTE ACC.: http://scholar.lib.vt.edu/theses/seeversgl/DISSERTA.pdf
NOTE: Public access terminals are available in Newman Library Media Center.
NOTE: Title from title page (initial screen display).
NOTE: Vita.
NOTE: Thesis (Ph. D. in Educational Research and Evaluation)--Virginia Polytechnic Institute and State University, 1994.
NOTE: Bibliography: leaves 92-102.
SUMMARY: Distance education is one means of delivering theological education which is being used increasingly. This delivery method is particularly helpful to nontraditional students who desire higher education but who cannot leave family and work commitments for residential study. For some in both developing and developed countries, distance education is the only route open to higher theological education. Criteria for assessing effective delivery of distance education have not been established in the literature. The purpose of this study was to identify such criteria. Data were collected with a three-round Delphi from an international panel of seventy-four members comprised of denominational and non-denominational educational administrators and distance educators, denominational district representatives, accreditation representatives, and adult education representatives. Two pilot studies were conducted to test the questions used for round one. Criteria statements were retained if they were deemed "important" or "very important" by at least 80 percent of the respondents on rounds two and three.
The panel's responses were found to be independent of respondent location--national or international--and the category of the respondent's group membership. The findings of the study led to the identification of a set of thirty-one criteria in eight categories which may be useful for evaluating existing distance education programs or guiding the development of new programs. The eight categories were ethical concerns, commitment, curriculum, evaluation, support, technology, feedback, and faculty. There was a 100 percent consensus in rating these thirty-one criteria as "important" or "very important" by the panel members.
KEY WORDS: Higher theological education; mediated instruction; learning; teaching.
ADDED ENTRY: Virginia Polytechnic Institute and State University. University Libraries. Scholarly Communications Project.
More than eight months after the work of the ad hoc e-theses committee, the Libraries' Task Force on Cataloging Electronic Texts (TFCET) began reviewing the broader topic of bibliographic control of electronic publications. Richard Sapon-White chaired this task force, which included representatives from cataloging, reference, archives, new media, and scholarly communications.
The TFCET focused on adding to current cataloging practices those fields that would enhance the OPAC users' access and conform to AACR2r. Many of the fields describing computer files appear to be redundant: 245 \h, 256, 516, and 538, for example, all tell the OPAC user that the item is a computer file. To stay within the stringent restrictions of full-level cataloging, the members of the task force saw no way to avoid requiring catalogers to use most of the available fields. However, the TFCET concentrated on the MARC tags that would provide information about access. The principal fields include: 256 (computer file characteristics), 506 (restrictions on access note), 516 (type of computer file or data note), 530 (other formats available), 538 (system details note), 556 (accompanying documentation), and 856 (electronic location and access).
Current OPACs, together with the limitations of hardware and workstations, still prevent most users from accessing electronic texts or images directly and smoothly from one menu or even from a single, multi-function workstation. However, workstations are gradually becoming available that permit users to copy the URL from the bibliographic record and paste it into a World Wide Web browser, so that an e-text can be accessed from a single terminal. Knowing this was possible made the TFCET's discussion about including MARC tag 856 a brief one.
What prolonged the discussion was the need to better understand the possible consequences of using subfield u or splitting the URL into multiple subfields. We went for simplicity and decided to format the 856 subfield u so that it could be copied and pasted into a World Wide Web browser. Again, we were not willing to wait for the programming that would be necessary to combine the separate subfields into a clickable URL.
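The trade-off between the two ways of recording the location can be made concrete with a short Python sketch. It assumes the 856 field is represented as a plain mapping of subfield codes to values, that host name, path, and file name are carried in subfields a, d, and f, and that the access scheme has to be supplied separately; the function name is hypothetical.

```python
from typing import Dict


def url_from_856(subfields: Dict[str, str], scheme: str = "http") -> str:
    """Return a URL for an 856 field given as a {subfield code: value} mapping."""
    # Simplest case: subfield u already holds the complete URL.
    if "u" in subfields:
        return subfields["u"]
    # Otherwise the URL has to be assembled from its parts
    # (assumed here: a = host name, d = path, f = file name).
    host = subfields["a"]
    path = subfields.get("d", "").strip("/")
    name = subfields.get("f", "")
    parts = [part for part in (host, path, name) if part]
    return f"{scheme}://" + "/".join(parts)


whole = url_from_856({"u": "http://scholar.lib.vt.edu/theses/seeversgl/DISSERTA.pdf"})
assembled = url_from_856(
    {"a": "scholar.lib.vt.edu", "d": "/theses/seeversgl", "f": "DISSERTA.pdf"}
)
```

Both calls yield the same address, but only the first works without any programming between the catalog record and the user: the subfield u value can be copied and pasted as it stands.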
Determining the fields appropriate to the MARC bibliographic record was only a part of the charge from the Monticello Electronic Library Project e-theses subgroup. The matrix below shows how the MARC tags correlate with the fields being assigned to describe technical reports available in the WATERS (Wide Area TEchnical Report Service) searchable database (developed by the computer science departments at Old Dominion University, Virginia Tech, University of Virginia, and SUNY Buffalo).
Cataloging has greatly benefited from advances in library automation and the cataloging of e-texts is ripe for further automation. It is now possible to derive MARC cataloging from text mark-up languages, subsets of SGML such as TEI (Text Encoding Initiative) headers, and possibly even HTML (hypertext markup language) tags. Therefore, to the matrix of WATERS and MARC fields, I added the TEI headers.
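As a rough illustration of deriving MARC fields from a TEI header, the sketch below reads a small header with Python's standard XML parser and applies a crosswalk. The crosswalk pairs, the sample header (including the placeholder author and title), and the function name are assumptions for this example only; they are not the matrix referred to in the text, and the header is written as well-formed XML without namespaces so that it can be parsed directly.

```python
import xml.etree.ElementTree as ET

# A made-up header using TEI element names; the values are placeholders.
SAMPLE_HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>An example thesis title</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <extent>1 file : ca. 1353 kilobytes</extent>
    <publicationStmt>
      <publisher>Scholarly Communications Project</publisher>
      <pubPlace>Blacksburg, VA</pubPlace>
      <date>1994</date>
    </publicationStmt>
  </fileDesc>
</teiHeader>"""

# Hypothetical crosswalk: (path within the header, MARC tag).
TEI_TO_MARC = [
    ("fileDesc/titleStmt/author", "100"),
    ("fileDesc/titleStmt/title", "245"),
    ("fileDesc/extent", "256"),
    ("fileDesc/publicationStmt/publisher", "260"),
]


def marc_from_tei(header_xml: str):
    """Return (MARC tag, text) pairs for the header elements in the crosswalk."""
    root = ET.fromstring(header_xml)
    fields = []
    for path, tag in TEI_TO_MARC:
        element = root.find(path)
        # Skip elements that are absent or empty rather than emit blank fields.
        if element is not None and element.text:
            fields.append((tag, element.text.strip()))
    return fields


draft_fields = marc_from_tei(SAMPLE_HEADER)
```

A cataloger would still need to review the result, since indicators, subfield coding, and AACR2r punctuation are not addressed here.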
As one way of getting from the e-text to the MARC record, the TFCET discussed the possibility of having authors assign TEI headers so that a basic record could be created that would include author, title, publisher, file size, file type, abstract, formatted contents notes, and the like. However, it will be more timely to use the submission form, begun by the author, added to by the Graduate School, and then the library, as the basis for the cataloging record. The first draft of this form asks the author to supply the following information, to which I have added the MARC tags.
Name: [MARC tag 100]
Title: [MARC tag 245]
Document Type (check one):
Abstract: [MARC tag 520]
Keywords: [MARC tag 653]
1. 4.
2. 5.
3. 6.
Department: [MARC tag 502]
Degree: [MARC tag 502]
Filename(s), size(s): [MARC tag 256]
1. 2. 3. 4.
Files delivered via: Appleshare FTP
* Appleshare server is currently: macintoy in LIBRARY zone.
* FTP server is vatech.lib.vt.edu.
Date submitted to Graduate School: [MARC 008/01-10]
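Using the tags noted on the form, a first pass at the record could be generated from the author's entries. The Python sketch below is only one way this might look: the `SubmissionForm` class and `fields_from_form` function are hypothetical, one 653 field per keyword is an assumption, and the sample values are taken from the e-thesis record shown earlier (with only the first sentence of the abstract).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubmissionForm:
    """Author-supplied values from the draft submission form (hypothetical class)."""
    name: str
    title: str
    abstract: str
    department: str
    degree: str
    keywords: List[str] = field(default_factory=list)
    files: List[Tuple[str, str]] = field(default_factory=list)  # (filename, size)


def fields_from_form(form: SubmissionForm) -> List[Tuple[str, str]]:
    """Return (MARC tag, text) pairs in tag order, using the tags noted on the form."""
    fields = [("100", form.name), ("245", form.title)]
    for filename, size in form.files:
        fields.append(("256", f"{filename} ({size})"))
    # Degree and department both feed the 502 note; the institution and year
    # are left for the cataloger to complete.
    fields.append(("502", f"Thesis ({form.degree} in {form.department})"))
    fields.append(("520", form.abstract))
    # Assumption: one 653 field per author-assigned keyword.
    fields.extend(("653", keyword) for keyword in form.keywords)
    return fields


form = SubmissionForm(
    name="Seevers, Gary L., Jr.",
    title="Identification of criteria for delivery of theological education "
          "through distance education : an international Delphi study",
    abstract="Distance education is one means of delivering theological "
             "education which is being used increasingly.",
    department="Educational Research and Evaluation",
    degree="Ph. D.",
    keywords=["Higher theological education", "mediated instruction",
              "learning", "teaching"],
    files=[("DISSERTA.pdf", "ca. 1353 kilobytes")],
)
draft_record = fields_from_form(form)
```

The output is a draft for the cataloger to review, not a finished record: fixed fields, indicators, and the remaining notes still come from the cataloging work flow.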
In addition to considering the MARC record and the cataloging department work flow, the staff of the Scholarly Communications Project considered procedures for getting the files from the Graduate School, for making an e-thesis available to a cataloger (from the secure and private environment of the server), and for moving an e-thesis into public access (see figure 4). We suggested that the cataloger forward a copy of each e-thesis to a server at UMI as it is processed. If UMI would prefer batch processing, files could be accumulated (i.e., stored in a directory on the e-theses server) for batched file transfer, or perhaps a UMI-access point could be established on the e-theses server from which its staff could retrieve them. This decision has not been made yet.
With input from the University Archivist and addressing a concern of the Graduate School's, long term preservation and access of e-theses was also factored into the procedures. The current plan includes periodically writing e-theses to CD-ROMs for security back-ups and possibly longer term preservation. While this may not be the final answer, an alternative has not been brought forward; how frequently this would be done has also not been determined.
Draft documentation about e-theses processing has been available through the Internet. It includes converting a dissertation to a PDF file, submitting the approved e-thesis to the Graduate School, how the files would be transmitted to the University Libraries, and how readers could access, read, print, and/or download e-theses. This information along with some background information about the e-theses project at Virginia Tech is available at http://scholar.lib.vt.edu/theses/theses.html.
Faculty and Student Concerns
At the early October 1995 meeting of the Commission on Graduate Studies and Policies (CGSP), the ADGS announced that e-theses would replace traditional theses beginning with fall 1996 master's and doctoral candidates. Exemptions, he said, may be requested by the committee chairs, but he anticipates that they will be granted only for special circumstances such as when architectural drawings (or the like) cannot be digitally (re)produced with good online results. Later that month, at the annual retreat of the Graduate School with deans and department heads from throughout the university, attendees saw the sequence an e-thesis might complete, moving through screen displays from submission to the Graduate School through to public access from the OPAC and the World Wide Web.
At this meeting faculty asked about specific points they wanted clarified, some of which have no answers. For example, "How long does it take to download and/or print an electronic thesis?" This is dependent upon the RAM available on the computer, the size of the document, the type of telecommunications lines being used, and even the time of day that an electronic document would be accessed. Faculty also had personal concerns, such as the current portability and readability of a paper document versus being limited to terminals to access works in progress. E-theses will require changes in working habits, on a different scale, perhaps, but similar to the way e-mail has changed where and how mail is sent, received, opened, and read.
Clearly, education about PDF and e-thesis preparation is needed and will be an important factor in faculty and student acceptance of this new research and publication process.
"Exchange" software is needed to prepare a PDF file from an existing online document; it can be purchased today from the campus bookstore for less than $30. It is also available at this time for public use in at least two computer labs on campus and more sites will be available as the demand increases. Faculty may have Exchange installed on their office computers through campus license agreements with Adobe.
Faculty and their students may also not be aware that several campus computer labs have quality scanners available for converting pictures to digital images. Equipment and software are available without charge but there is a cost-recovery fee for printing.
Students who were part of the CGSP asked what will become a common question, "What kind of software should I use to prepare my e-thesis so it will be easily exchanged for a PDF file?" The Graduate School and the Library have experience with many of the popular word processors, spreadsheet packages, and presentation display formats, so we know that Word, WordPerfect, LaTeX, Excel, PowerPoint, and others 'exchange' to PDF files without any problems. The more unusual and less frequently used software will have to be tested or Adobe should be consulted.
In addition, many do not know that one of the benefits of PDF files is that entire documents do not have to be sent back and forth. As long as sender and receiver (author and reviewer, student and faculty/committee member) are working from the same version of the electronic document, only the "margin notes" need to be sent because they are tied to the place in the document from which they came.
Copyright is another issue. E-theses authors will be asked to give permission to the university for the library to provide electronic access since e-theses are considered published documents once they become available on the Internet. The university does not intend to do as most publishers do and require that all copyrights be assigned to it, only that nonexclusive electronic rights be shared with it for the purpose of the library providing electronic access.
Clearly, education of both student authors and faculty committee members is necessary for them to reap the benefits of electronic theses and dissertation preparation. Similarly, library personnel will need training to use the technology to fully improve processing time and OPAC and Internet information access.
In September 1995, UMI was not prepared to participate in the strictly Internet access to e-theses that Virginia Tech proposed. The UMI representatives prefer to receive each e-thesis on a separate diskette containing commonly used word processor file(s). They also want several paper documents to accompany each thesis on diskette: the abstract, the title page, and a signed publishing agreement. Since the library is attempting to speed and streamline processing, and since it will be receiving e-theses through the campus network and not on diskettes, the author may have to assume the added burden of preparing the diskette and the paper documents for UMI.
As of this writing, UMI was planning to print out these dissertations and then microfilm them. In the near term this may work, but when students take advantage of the added creativity that e-theses allow them to demonstrate, microfilming and printing a work that was never intended for paper will, in some cases at least, be impossible.
Old Fashioned and New Fashioned: But It Feels Like Starting from Scratch
Thirteen years as a cataloger should have made the process of cataloging a new format an easier one, especially since I had not only learned to catalog serials but had conquered the MARC holdings format as well. However, looking at how to catalog electronic-only journals and then monographic materials such as those being produced in our own textual studies centers, as well as electronic dissertations, still made me feel as if I needed to learn about cataloging all over again but from a newer and decidedly different perspective.
Theses and dissertations as electronic files transferred from the Graduate School to the Library may well be the first major source of electronic texts that many libraries encounter regularly. We should seize this opportunity to enhance the OPAC users' search results by expanding current theses cataloging and taking advantage of online information prepared by authors. Since authors will probably not be adding TEI or MARC tags to their documents to help cataloging in the near term, catalogers could use the information available in a variety of online sources, including the document itself or the online submission form, to provide the basic descriptive MARC fields. Whether programmatic changes can be made or standard copy-and-paste features of word processors are incorporated, enhancing the e-theses bibliographic record does not require a lot of extra work.