

Type of Document: Master's Thesis
Author: Shukla, Maulik
Author's Email Address: mshukla@vt.edu
URN: etd-09092004-152600
Title: GeneSieve: A Probe Selection Strategy for cDNA Microarrays
Degree: Master of Science
Department: Computer Science
Advisory Committee:
Heath, Lenwood S. (Committee Chair)
Grene, Ruth (Committee Member)
Murali, T. M. (Committee Member)
Ramakrishnan, Naren (Committee Member)
Keywords:
- EST annotation
- cDNA microarrays
- probe selection
Date of Defense: 2004-09-08
Availability: unrestricted
Abstract
The DNA microarray is a powerful tool to study expression levels of thousands of genes simultaneously. Often, cDNA libraries representing expressed
genes of an organism are available, along with expressed sequence tags (ESTs).
ESTs are widely used as the probes for microarrays. Designing custom microarrays,
rich in genes relevant to the experimental objectives, requires selection of
probes based on their sequence. We have designed a probe selection method,
called GeneSieve, to select EST probes for custom microarrays. To assign
annotations to the ESTs, we cluster them into contigs using PHRAP. The larger contig
sequences are then used for similarity search against known proteins in model
organisms such as Arabidopsis thaliana. We have designed three different methods to
assign annotations to the contigs: bidirectional hits (BH), bidirectional best
hits (BBH), and unidirectional best hits (UBH). We apply these methods to pine and
potato EST sets. Results show that the UBH method assigns unambiguous annotations
to a large fraction of contigs in an organism. Hence, we use UBH to assign
annotations to ESTs in GeneSieve. To select a single EST from a contig, GeneSieve assigns a
quality score to each EST based on its protein homology (PH), cross
hybridization (CH), and relative length (RL). We use this quality score to rank ESTs
according to seven different measures: length, 3' proximity, 5' proximity, protein
homology, cross hybridization, relative length, and overall quality score. Results for
pine and potato EST sets indicate that EST probes selected by quality score are
relatively long and give better values for protein homology and cross
hybridization. Results of the GeneSieve protocol are stored in a database and linked with
sequence databases and known functional category schemes such as MIPS and GO. The
database is made available via a web interface, through which a biologist can
quickly and easily select a large number of EST probes based on annotations or
functional categories.
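The three annotation methods named in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions inferred from the method names, not the thesis's own: a unidirectional best hit (UBH) is a contig's top-scoring protein; a bidirectional hit (BH) is a contig-protein pair that appears in both search directions; a bidirectional best hit (BBH) is a pair in which each is the other's top-scoring hit. The function names, the tuple layout, and the identifiers and scores in the example are all hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples (assumed layout).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def unidirectional_best_hits(forward):
    """UBH (assumed definition): each contig's top-scoring protein."""
    return best_hits(forward)


def bidirectional_hits(forward, reverse):
    """BH (assumed definition): pairs found in both search directions."""
    # Reverse hits are (protein, contig, score); flip them to (contig, protein).
    reverse_pairs = {(contig, protein) for protein, contig, _ in reverse}
    return sorted({(c, p) for c, p, _ in forward if (c, p) in reverse_pairs})


def bidirectional_best_hits(forward, reverse):
    """BBH (assumed definition): contig and protein are each other's best hit."""
    forward_best = best_hits(forward)
    reverse_best = best_hits(reverse)
    return {c: p for c, p in forward_best.items() if reverse_best.get(p) == c}


# Hypothetical similarity-search results; identifiers and scores are made up.
forward = [  # contig searched against proteins
    ("contig1", "protA", 250.0),
    ("contig1", "protB", 120.0),
    ("contig2", "protA", 200.0),
    ("contig3", "protC", 90.0),
]
reverse = [  # protein searched against contigs
    ("protA", "contig1", 240.0),
    ("protA", "contig2", 210.0),
    ("protB", "contig1", 110.0),
]

ubh = unidirectional_best_hits(forward)
bh = bidirectional_hits(forward, reverse)
bbh = bidirectional_best_hits(forward, reverse)
```

In this toy input, UBH annotates all three contigs, BH leaves contig1 paired with two proteins, and BBH annotates only contig1. That pattern is in line with the abstract's observation that UBH assigns unambiguous annotations to a large fraction of contigs, though the real behavior depends on the thesis's actual definitions and thresholds.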
Files
Filename: GeneSieve.pdf
Size: 1.32 Mb
Approximate Download Time (Hours:Minutes:Seconds):
28.8 Modem: 00:06:06
56K Modem: 00:03:08
ISDN (64 Kb): 00:02:45
ISDN (128 Kb): 00:01:22
Higher-speed Access: 00:00:07