

Type of Document: Dissertation
Author: Abdulla, Ghaleb M.S.
Author's Email Address: abdulla@vt.edu
URN: etd-33098-142912
Title: Analysis and Modeling of World Wide Web Traffic
Degree: PhD
Department: Computer Science
Advisory Committee (Advisor Name: Title)
- Fox, Edward Alan: Committee Chair
- Abrams, Marc: Committee Member
- Balci, Osman: Committee Member
- Kafura, Dennis G.: Committee Member
- Nayfeh, Ali H.: Committee Member
Keywords
- Time Series
- Modeling
- Scalability
- World Wide Web
- Log analysis
- Caching
- Proxy
Date of Defense: 1998-04-27
Availability: unrestricted
Abstract
This dissertation deals with the monitoring, collection,
analysis, and modeling of World Wide Web (WWW)
traffic and client interactions. The rapid growth of
WWW usage has not been accompanied by an overall
understanding of models of information resources and
their deployment strategies. Consequently, the current
Web architecture often faces performance and reliability
problems. Scalability, latency, bandwidth, and
disconnected operations are some of the important
issues that should be considered when attempting to
adjust for the growth in Web usage. The WWW
Consortium launched an effort to design a new protocol
that will be able to support future demands. Before doing
that, however, we need to characterize current users'
interactions with the WWW and understand how it is
being used.
We focus on proxies since they provide a good medium
for caching, filtering information, payment methods, and
copyright management. We collected proxy data from
our environment over a period of more than two years.
We also collected data from other sources such as
schools, information service providers, and commercial
sites. Sampling times range from days to years. We
analyzed the collected data looking for important
characteristics that can help in designing a better HTTP
protocol. We developed a modeling approach that
considers Web traffic characteristics such as
self-similarity and long-range dependency. We
developed an algorithm to characterize users' sessions.
Finally, we developed a high-level Web traffic model
suitable for sensitivity analysis.
As a result of this work, we develop statistical models of
parameters such as arrival times, file sizes, file types, and
locality of reference. We describe an approach to modeling
long-range dependent Web traffic, and we
characterize activities of users accessing a digital library
courseware server or Web search tools.
Temporal and spatial locality of reference within
examined user communities is high, so caching can be an
effective tool to help reduce network traffic and to help
solve the scalability problem. We recommend utilizing
our findings to promote a smart distribution or push
model to cache documents when repeat accesses are
likely.
Files
Filename: thesis.pdf
Size: 2.63 Mb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:12:09
- 56K Modem: 00:06:15
- ISDN (64 Kb): 00:05:28
- ISDN (128 Kb): 00:02:44
- Higher-speed Access: 00:00:14