| Type of Document | Master's Thesis |
| Author | Pande, Ashwini K |
| Author's Email Address | aspande@vt.edu |
| URN | etd-08282002-151909 |
| Title | Table Understanding for Information Retrieval |
| Degree | Master of Science |
| Department | Computer Science |
| Advisory Committee |
| Advisor Name | Title |
| Ehrich, Roger W. | Committee Chair |
| Fox, Edward Alan | Committee Member |
| North, Christopher L. | Committee Member |
| Keywords |
- Information retrieval
- Statistical crosscorrelation
- Odessa digital library
- detection heuristics
- Table detection
|
| Date of Defense | 2002-08-19 |
| Availability | unrestricted |
Abstract
This thesis proposes a novel approach for finding tables in text files that contain a mixture of unstructured and structured text. Tables may be arbitrarily complex, both because the data in a table may themselves be tables and because the grouping of the data elements displayed in a table may be very complex. Although investigators have proposed competence models to explain the structure of tables, there are no computationally feasible performance models for detecting and parsing general structures in real data. We focus on investigating a new statistical procedure for detecting basic tables in plain-text documents. The main task is to define and test this theory in the context of the Odessa Digital Library.
|
| Files (approximate download times in Hours:Minutes:Seconds) |
| Filename | Size | 28.8 Modem | 56K Modem | ISDN (64 Kb) | ISDN (128 Kb) | Higher-speed Access |
| AshwiniPandeTableIR.pdf | 890.86 Kb | 00:04:07 | 00:02:07 | 00:01:51 | 00:00:55 | 00:00:04 |