Question: Extract Data And New Genes
Luciano Cosme • 220 wrote:
Hi Everyone,
I am working with *Aedes aegypti* and have obtained around 500 million reads (HiSeq 2000, 50 bp). After running a differential gene expression analysis with the usual packages (TopHat, Cufflinks, DESeq, etc.), I found a set of genes of interest, in addition to some functional groups of genes that I already knew I had to look at.

Now, browsing the 4,758 supercontigs together with my data in IGV from the Broad Institute (loading the genome and the SAM files from TopHat), I see a lot of potential new genes (hundreds or thousands of reads aligning to regions where there is no gene annotation). I also see new exons for some genes, and exons with sizes that differ from the annotated ones.

I was thinking of doing a *de novo* assembly to find new transcripts and genes, but I was wondering whether there is something else I could do. For example, maybe I could just extract the regions where thousands of reads align (putative new genes). I know that we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned? Maybe I could pool all the data (from a couple of sequencing lanes), align it back to the genome, and then proceed to gene annotation.

Another problem is that I am not sure how reliable an annotation based only on HiSeq 2000 data would be. I would appreciate any ideas or suggestions on how to tackle this problem. Maybe *de novo* assembly is the way to go.
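
To make the "extract regions based only on read count" idea concrete, here is a rough Python sketch of what I have in mind. It assumes the alignments and the existing annotation have already been parsed into plain `(contig, start, end)` tuples (0-based, half-open); the function name `find_unannotated_hotspots`, the window size, and the read threshold are hypothetical choices for illustration and do not come from any existing package.

```python
from collections import defaultdict


def find_unannotated_hotspots(reads, genes, window=100, min_reads=1000):
    """Return merged regions with many reads and no overlapping annotation.

    reads, genes: iterables of (contig, start, end), 0-based half-open.
    Returns a list of (contig, start, end, n_reads) tuples.
    """
    # Count reads per fixed-size window, assigning each read by its start.
    counts = defaultdict(int)
    for contig, start, _end in reads:
        counts[(contig, start // window)] += 1

    # Group the annotated intervals by contig for the overlap check.
    annotated = defaultdict(list)
    for contig, start, end in genes:
        annotated[contig].append((start, end))

    def overlaps_gene(contig, w_start, w_end):
        return any(s < w_end and w_start < e for s, e in annotated[contig])

    # Keep windows that pass the threshold and touch no annotated gene.
    hot = sorted(
        (contig, idx, n)
        for (contig, idx), n in counts.items()
        if n >= min_reads
        and not overlaps_gene(contig, idx * window, (idx + 1) * window)
    )

    # Merge directly adjacent windows on the same contig into one region.
    regions = []
    for contig, idx, n in hot:
        start, end = idx * window, (idx + 1) * window
        if regions and regions[-1][0] == contig and regions[-1][2] == start:
            prev = regions[-1]
            regions[-1] = (contig, prev[1], end, prev[3] + n)
        else:
            regions.append((contig, start, end, n))
    return regions
```

The resulting coordinates could then be used to pull the matching reads or genomic sequence out of the alignment files. The linear overlap check is only meant for a sketch; with thousands of supercontigs and a full annotation, an interval index would be the more sensible choice.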
Thank you.
Luciano
--
*Luciano Cosme*
PhD Candidate
Texas A&M Entomology
Vector Biology Research Group
www.lcosme.com
979 845 1885
cosme@tamu.edu
modified 6.5 years ago by Jeremy Goecks • 2.2k • written 6.5 years ago by Luciano Cosme • 220