Question: Extract Data And New Genes
6.5 years ago
Luciano Cosme
Hi everyone,

I am working with *Aedes aegypti* and I obtained around 500 million reads (HiSeq 2000, 50 bp). After running the differential gene expression analysis with the usual packages (TopHat, Cufflinks, DESeq, etc.), I found a set of genes of interest, in addition to some functional groups of genes that I already knew I had to look at.

Now, just browsing the 4,758 supercontigs and my data in IGV from the Broad Institute (loading the genome and the SAM files from TopHat), I find a lot of potential new genes (hundreds or thousands of reads aligning to regions where there is no gene annotation). I also find new exons for some genes, or exons with different sizes.

I was thinking of doing a *de novo* assembly to find new transcripts and genes, but I was wondering whether there is something else I could do. For example, maybe I could just extract those regions where thousands of reads align (a potential new gene). I know that we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned? Maybe I could pool all the data (from a couple of sequencing lanes), align it back to the genome, and then proceed to gene annotation.

Another problem is that I am not sure how reliable an annotation based only on HiSeq 2000 data would be.

I would appreciate it if anyone has an idea or suggestion on how to tackle this problem. Maybe *de novo* assembly is the way to go.

Thank you.

Luciano

-- *Luciano Cosme* PhD Candidate Texas A&M Entomology Vector Biology Research Group 979 845 1885
6.5 years ago
Jeremy Goecks
This shouldn't be completely unexpected. High-coverage RNA-seq data is constantly revealing new exons, splice variants, and transcripts, even in well-annotated genomes. My suggestion: do a reference-guided assembly with Cufflinks; this will yield both existing and new transcripts. You could then subtract the known genes from the Cufflinks assembly to get only the novel transcripts. Best, J.
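The subtraction step could be sketched roughly as below. This is only an illustration, not the method Cufflinks itself uses: it assumes both the assembly and the reference annotation are GTF files with `exon` records carrying a `transcript_id` attribute, it treats a transcript as "known" if its overall span overlaps any annotated transcript span on the same contig, and it ignores strand. The function names, the contig names, and the transcript IDs in the example are all made up for the demonstration.

```python
import re
from collections import defaultdict


def transcript_spans(gtf_lines):
    """Return {transcript_id: (chrom, start, end)} built from GTF exon records."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define the transcript extent here
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        chrom = fields[0]
        start, end = int(fields[3]), int(fields[4])
        if tid in spans:
            # Grow the span to cover every exon of this transcript.
            _, old_start, old_end = spans[tid]
            start, end = min(start, old_start), max(end, old_end)
        spans[tid] = (chrom, start, end)
    return spans


def novel_transcripts(assembled, known):
    """Return sorted IDs of assembled transcripts overlapping no known span."""
    known_by_chrom = defaultdict(list)
    for chrom, start, end in known.values():
        known_by_chrom[chrom].append((start, end))
    novel = []
    for tid, (chrom, start, end) in assembled.items():
        # GTF coordinates are 1-based and inclusive on both ends.
        overlaps = any(
            start <= k_end and k_start <= end
            for k_start, k_end in known_by_chrom[chrom]
        )
        if not overlaps:
            novel.append(tid)
    return sorted(novel)


# Hypothetical records, for illustration only.
KNOWN = [
    'contig1\tref\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "G1-RA";',
    'contig2\tref\texon\t100\t400\t.\t-\t.\tgene_id "G2"; transcript_id "G2-RA";',
]
ASSEMBLED = [
    'contig1\tCufflinks\texon\t120\t300\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'contig1\tCufflinks\texon\t400\t480\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'contig1\tCufflinks\texon\t2000\t2600\t.\t.\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    'contig2\tCufflinks\texon\t50\t150\t.\t.\t.\tgene_id "CUFF.3"; transcript_id "CUFF.3.1";',
]

if __name__ == "__main__":
    print(novel_transcripts(transcript_spans(ASSEMBLED), transcript_spans(KNOWN)))
```

With real data, the two lists would be replaced by the lines of the Cufflinks output GTF and the reference annotation GTF. A span-overlap test like this only finds intergenic candidates; new exons or alternative exon boundaries inside annotated genes, as described in the question, would need an exon-level comparison instead.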
Thanks Jeremy, I will do that before trying the *de novo* assembly. Luciano
