Scholarly Communications Project


Visualizing Categorical Time Series Data with Applications to Computer and Communications Network Traces

by

Randy L. Ribler

PhD dissertation submitted to the Faculty of Virginia Tech in partial fulfillment of the requirements for the degree of

PhD

in

Computer Science

Approved

Marc Abrams, Chair
Roger Ehrich
Robert Foutz
Ronald Kriz
Calvin Ribbens

April 4, 1997
Blacksburg, Virginia


Abstract

Visualization tools allow scientists to comprehend very large data sets and to discover relationships that are otherwise difficult to detect. Unfortunately, not all types of data can be visualized easily using existing tools. In particular, long sequences of nonnumeric data cannot be visualized adequately. Examples of this type of data include trace files of computer performance information, the nucleotides in a genetic sequence, a record of stocks traded over a period of years, and the sequence of words in this document. The term categorical time series is defined and used to describe this family of data. When visualizations designed for numerical time series are applied to categorical time series, the distortions that result from arbitrarily converting unordered categorical values to totally ordered numerical values can be profound. Examples of this phenomenon are presented and explained. Several new, general-purpose techniques for visualizing categorical time series data have been developed as part of this work and incorporated into the Chitra performance analysis and visualization system. All of these new visualizations can be produced in O(n) time, where n is the length of the time series. They address aspects of categorical data that are commonly of interest, including periodicity, stationarity, cross-correlation, autocorrelation, and the detection of recurring patterns. The effective use of these visualizations is demonstrated in a number of application domains, including performance analysis, World Wide Web traffic analysis, network routing simulations, document comparison, pattern detection, and the analysis of the performance of genetic algorithms.
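The distortion caused by arbitrarily coding categories as numbers can be seen in a small sketch. This is not the Chitra implementation. The trace, the function names numeric_autocorr and match_rate, and the match-rate measure itself are hypothetical illustrations chosen for this example. The sketch computes an ordinary numerical lag-1 autocorrelation under every possible assignment of the codes 0, 1, 2 to three unordered categories. It then compares the result with a simple equality-based measure that needs no coding and runs in a single O(n) pass.

```python
from itertools import permutations

def numeric_autocorr(values, lag):
    """Pearson correlation between values[t] and values[t + lag]."""
    n = len(values) - lag
    x, y = values[:n], values[lag:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def match_rate(seq, lag):
    """Fraction of positions t with seq[t] == seq[t + lag].

    Hypothetical categorical measure: one O(n) pass that uses only
    equality, so it cannot depend on any numeric coding of the symbols.
    """
    n = len(seq) - lag
    return sum(seq[t] == seq[t + lag] for t in range(n)) / n

# Hypothetical event trace with period 4 over three unordered categories.
trace = list("ABAC") * 10
categories = sorted(set(trace))

# Every assignment of the codes 0, 1, 2 to A, B, C is equally arbitrary.
numeric = {}
for codes in permutations(range(len(categories))):
    coding = dict(zip(categories, codes))
    numeric[codes] = numeric_autocorr([coding[c] for c in trace], lag=1)

for codes, r in numeric.items():
    print(dict(zip(categories, codes)), f"lag-1 autocorrelation = {r:+.3f}")

# The equality-based measure needs no coding at all.
for lag in (1, 2, 4):
    print(f"lag {lag}: match rate = {match_rate(trace, lag):.2f}")
```

For this trace, the lag-1 numerical autocorrelation ranges from roughly -0.82 to 0. The spread depends only on which code happens to be assigned to A, even though the underlying sequence is unchanged. By contrast, the match rate reaches 1.00 at lag 4 and reveals the period directly. It is also unaffected by any relabeling of the symbols.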

Full text (PDF) 20,919,384 Bytes


The author grants to Virginia Tech or its agents the right to archive and display their thesis or dissertation in whole or in part in the University Libraries in all forms of media, now or hereafter known. The author retains all proprietary rights, such as patent rights. The author also retains the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
