Question: Extract Data And New Genes
Luciano Cosme wrote (6.5 years ago):
Hi everyone,

I am working with *Aedes aegypti* and have obtained around 500 million reads (HiSeq 2000, 50 bp). After running the differential gene expression analysis with well-known packages (TopHat, Cufflinks, DESeq, etc.), I was able to find a set of genes of interest, in addition to some functional groups of genes that I already knew I had to look at.

Now, just browsing the 4,758 supercontigs and my data in IGV from the Broad Institute (loading the genome and the SAM files from TopHat), I find a lot of potential new genes: hundreds or thousands of reads aligning to regions where there is no gene annotation. I also find new exons for some genes, or exons with different sizes.

I was thinking of doing a *de novo* assembly to find new transcripts and genes, but I was wondering if there is something else I could do. For example, maybe I could just extract those regions where thousands of reads align (a potential new gene). I know that we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned? Maybe I could pool all the data (from a couple of sequencing lanes), align it back to the genome, and then proceed to gene annotation.

Another problem is that I am not sure how reliable an annotation based only on the HiSeq 2000 data would be.

I would appreciate any ideas or suggestions on how to tackle this problem. Maybe *de novo* assembly is the way to go.

Thank you.

Luciano

--
*Luciano Cosme*
PhD Candidate
Texas A&M Entomology
Vector Biology Research Group
979 845 1885
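To make the idea of "regions with many reads but no annotation" concrete, here is a minimal Python sketch. It assumes the read start positions and the annotated gene intervals have already been parsed from the TopHat alignments and the annotation file; the function name `unannotated_hotspots`, the window size, and the read threshold are all hypothetical choices for illustration, not part of any of the tools mentioned.

```python
from collections import defaultdict


def unannotated_hotspots(reads, genes, window=1000, min_reads=100):
    """Return (contig, window_start, window_end, read_count) for fixed-size
    windows that hold at least `min_reads` read starts and touch no gene.

    reads: iterable of (contig, start) with 0-based alignment start positions.
    genes: iterable of (contig, start, end) half-open annotated intervals.
    """
    # Count read starts per (contig, window index).
    counts = defaultdict(int)
    for contig, start in reads:
        counts[(contig, start // window)] += 1

    # Mark every window that is touched by an annotated gene.
    annotated = set()
    for contig, start, end in genes:
        for w in range(start // window, (end - 1) // window + 1):
            annotated.add((contig, w))

    # Keep well-covered windows that carry no annotation.
    hits = []
    for (contig, w), n in sorted(counts.items()):
        if n >= min_reads and (contig, w) not in annotated:
            hits.append((contig, w * window, (w + 1) * window, n))
    return hits


if __name__ == "__main__":
    # Toy data: supercontig names and coordinates are made up.
    reads = [("sc1", 50)] * 150 + [("sc1", 2100)] * 200 + [("sc2", 10)] * 5
    genes = [("sc1", 0, 900)]
    for hit in unannotated_hotspots(reads, genes):
        print(hit)
```

Fixed windows are a crude stand-in for real transcript boundaries, and a raw read-count threshold ignores strand, splicing, and multi-mapping reads, so the output is at best a list of candidate regions to inspect further.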
rna-seq cufflinks • 1.2k views
Jeremy Goecks wrote (6.5 years ago):
This shouldn't be completely unexpected. High-coverage RNA-seq data constantly reveals new exons, splicing events, and transcripts, even in well-annotated genomes. My suggestion: do a reference-guided assembly with Cufflinks; this will yield both existing and new transcripts. You could then subtract the known genes from the Cufflinks assembly to keep only the novel transcripts. Best, J.
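The "subtract known genes" step can be sketched as a simple interval comparison. The following Python example assumes the assembled transcripts and the known genes have already been reduced to coordinate tuples (for instance, parsed from GTF files); the function name `novel_transcripts` and the tuple layout are hypothetical, and this is an illustration of the idea rather than how Cufflinks itself does the comparison.

```python
def novel_transcripts(assembled, known):
    """Return IDs of assembled transcripts that overlap no known gene
    on the same contig.

    assembled: list of (transcript_id, contig, start, end), half-open coordinates.
    known: list of (contig, start, end) for annotated genes.
    """
    # Group the known gene intervals by contig for quick lookup.
    by_contig = {}
    for contig, start, end in known:
        by_contig.setdefault(contig, []).append((start, end))

    novel = []
    for tid, contig, start, end in assembled:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < g_end and g_start < end
            for g_start, g_end in by_contig.get(contig, [])
        )
        if not overlaps:
            novel.append(tid)
    return novel


if __name__ == "__main__":
    # Toy data: transcript IDs and coordinates are made up.
    assembled = [
        ("T1", "sc1", 100, 500),
        ("T2", "sc1", 5000, 6000),
        ("T3", "sc2", 0, 300),
    ]
    known = [("sc1", 400, 900)]
    print(novel_transcripts(assembled, known))
```

Note that this keeps only transcripts lying entirely outside annotated genes. New exons or isoforms of known genes overlap the annotation and would be discarded, so they need a finer, exon-level comparison. The Cufflinks suite includes Cuffcompare for comparing assembled transcripts against a reference annotation, which is likely the more practical route on real data.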
Thanks Jeremy, I will do that before trying the *de novo* assembly. Luciano
Reply written 6.5 years ago by Luciano Cosme