Type of Document: Master's Thesis
Author: Averboch, Guillermo Andres
URN: etd-07212009-040529
Title: A system for document analysis, translation, and automatic hypertext linking
Degree: Master of Science
Department: Computer Science and Applications
Advisory Committee:
- Heath, Lenwood S. (Committee Chair)
- Arthur, James D. (Committee Member)
- Fox, Edward Alan (Committee Member)
Keywords:
- computer language
Date of Defense: 1995-06-05
Availability: unrestricted

Abstract
A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages), and they may have to be translated into the standard document representation scheme used by the digital library.
This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links both between documents and the information stored in that database and between the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established from the document to the information about its author, and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes.
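The author-linking example above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Author:` line convention, the in-memory SQLite table standing in for the permanent database, and the names `extract_author`, `build_database`, and `links_for` are all hypothetical. Documents whose author cannot be found are simply skipped rather than sent for manual review, in the spirit of working without human intervention.

```python
import re
import sqlite3

# Hypothetical convention: a document names its author on a line of the form
# "Author: <name>". This is an assumption for illustration only.
AUTHOR_LINE = re.compile(r"^Author:[ \t]*(.+)$", re.MULTILINE)


def extract_author(text):
    """Return the author named in the document, or None if there is no match."""
    match = AUTHOR_LINE.search(text)
    return match.group(1).strip() if match else None


def build_database(documents):
    """Store (doc_id, author) pairs. An in-memory SQLite database stands in
    for the permanent database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE authorship (doc_id TEXT PRIMARY KEY, author TEXT)")
    for doc_id, text in documents.items():
        author = extract_author(text)
        if author is not None:  # no human review: unmatched documents are skipped
            db.execute("INSERT INTO authorship VALUES (?, ?)", (doc_id, author))
    db.commit()
    return db


def links_for(db, doc_id):
    """Return the link from a document to its author record, plus links from
    that record to other documents by the same author."""
    row = db.execute(
        "SELECT author FROM authorship WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return None, []
    author = row[0]
    others = [
        r[0]
        for r in db.execute(
            "SELECT doc_id FROM authorship WHERE author = ? AND doc_id != ? "
            "ORDER BY doc_id",
            (author, doc_id),
        )
    ]
    return author, others


if __name__ == "__main__":
    docs = {
        "d1": "Title: Parsing\nAuthor: A. Smith\n...",
        "d2": "Title: Linking\nAuthor: B. Jones\n...",
        "d3": "Title: Markup\nAuthor: A. Smith\n...",
    }
    db = build_database(docs)
    print(links_for(db, "d1"))  # ('A. Smith', ['d3'])
```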
To accomplish these goals, we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description of a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information to construct an abstract representation of those documents from which equivalent documents in different formats can be generated.
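The pipeline described above (a format description drives the analysis, which yields an abstract representation from which other formats are generated) can be sketched in Python as follows. DELTO's actual syntax is not given here, so the format description below is a hypothetical stand-in: an ordered list of regular-expression rules, each mapped to an element name. The names `EXAMPLE_FORMAT`, `analyze`, and `to_sgml`, the `TITLE:`/`AUTHOR:`/`== ... ==` conventions, and the `doc` document type are all assumptions for illustration.

```python
import re

# Hypothetical format description: ordered (element, pattern) rules.
# This only mimics the idea of describing a format declaratively;
# it is NOT DELTO syntax.
EXAMPLE_FORMAT = [
    ("title", re.compile(r"^TITLE:\s*(.+)$")),
    ("author", re.compile(r"^AUTHOR:\s*(.+)$")),
    ("heading", re.compile(r"^=+\s*(.+?)\s*=+$")),
]


def analyze(text, rules):
    """Build an abstract representation as a list of (element, content) pairs.
    The first matching rule wins; lines matching no rule become paragraphs."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for element, pattern in rules:
            m = pattern.match(line)
            if m:
                nodes.append((element, m.group(1)))
                break
        else:
            nodes.append(("p", line))
    return nodes


def to_sgml(nodes, doctype="doc"):
    """Generate SGML-style markup from the abstract representation.
    Content is not escaped here; a real converter would have to handle
    markup-significant characters according to the target DTD."""
    body = "\n".join(f"<{tag}>{content}</{tag}>" for tag, content in nodes)
    return f"<{doctype}>\n{body}\n</{doctype}>"


if __name__ == "__main__":
    doc = "TITLE: A Sample\nAUTHOR: A. Smith\n\n== Introduction ==\nSome body text."
    print(to_sgml(analyze(doc, EXAMPLE_FORMAT)))
```

Keeping the abstract representation separate from the output step means another back end (for example, plain ASCII) could reuse `analyze` unchanged; only the rendering function would differ.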
The system that resulted from this work is used to construct the database of Envision, a Virginia Tech digital library research project.
File: LD5655.V855_1995.A992.pdf (7.55 Mb)