| Type of Document | Master's Thesis |
| Author | Tilley, Jason W |
| URN | etd-01052009-103100 |
| Title | A Comparison of Statistical Filtering Methods for Automatic Term Extraction for Domain Analysis |
| Degree | Master of Engineering |
| Department | Computer Science |
| Advisory Committee |
| Advisor Name | Title |
| William Frakes | Committee Chair |
| Gabriella Belli | Committee Member |
| Gregory Kulczycki | Committee Member |
| Keywords | domain analysis, term extraction |
| Date of Defense | 2008-12-22 |
| Availability | unrestricted |
Abstract
Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying the vocabulary of a domain. Fifteen domain engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Six filtering mechanisms were also evaluated to measure their impact on the selection of proper vocabulary terms. The results of the experiment show that stemming and stop word removal improve overlap scores and that term frequency is a valuable contributor to overlap. Variations on term frequency do not always improve overlap significantly.
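
The kind of comparison the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the stop word list, the suffix-stripping `crude_stem` helper (a stand-in for a real stemmer), the `top_terms` selection by raw term frequency, and the `overlap` score, defined here as the fraction of the engineer-produced vocabulary that appears among the selected terms. The thesis may define its metrics, filters, and overlap score differently.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; not the list used in the thesis).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stop_words=True, use_stemming=True):
    """Return the k most frequent terms in text after the optional filters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    counts = Counter(tokens)
    return {term for term, _ in counts.most_common(k)}


def overlap(selected, reference):
    """Fraction of the reference vocabulary found among the selected terms."""
    if not reference:
        return 0.0
    return len(selected & reference) / len(reference)


# Hypothetical document and hypothetical engineer-produced vocabulary.
text = "The parser parses tokens. The parser emits tokens and the lexer reads tokens."
reference = {"parser", "token", "lexer"}

filtered = top_terms(text, k=2)
unfiltered = top_terms(text, k=2, use_stop_words=False, use_stemming=False)
print(overlap(filtered, reference), overlap(unfiltered, reference))
```

On this toy input the filtered selection recovers two of the three reference terms, while the unfiltered selection is dominated by "the" and the unstemmed "tokens" and recovers none. This only illustrates the mechanism; it says nothing about the size of the effect reported in the thesis.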
|
| Files |
| Approximate Download Time (Hours:Minutes:Seconds) |
| Filename | Size | 28.8 Modem | 56K Modem | ISDN (64 Kb) | ISDN (128 Kb) | Higher-speed Access |
| JasonThesis_5_10_09.pdf | 1.25 Mb | 00:05:45 | 00:02:57 | 00:02:35 | 00:01:17 | 00:00:06 |