Extract Data And New Genes

Question: Extract Data And New Genes

Luciano Cosme • 220 wrote:

Hi Everyone,

I am working with *Aedes aegypti* and I obtained around 500 million reads (HiSeq 2000, 50 bp). After doing the differential gene expression analysis with known packages (TopHat, Cufflinks, DESeq, etc.), I was able to find a set of genes of interest, besides some functional groups of genes that I already knew I had to look at.

Now, just looking over the 4,758 supercontigs and my data using IGV from the Broad Institute (loading the genome and the SAM files from TopHat), I find a lot of potential new genes (hundreds or thousands of reads aligning to regions where there is no gene annotation). I also find new exons for some genes, or exons with different sizes.

I was thinking of doing a *de novo* assembly to find new transcripts and genes, but I was wondering if there is something else I could do. For example, maybe I could just extract those regions where thousands of reads align (new genes). I know that we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned? Maybe I could pool all the data (from a couple of sequencing lanes), align it back to the genome, and then proceed to gene annotation. Another problem is that I am not sure how reliable the annotation would be if it were based only on the data from the HiSeq 2000.

I would appreciate it if anyone has an idea or suggestion on how to tackle this problem. Maybe *de novo* assembly is the way to go.

Thank you.

Luciano

--
*Luciano Cosme*
PhD Candidate
Texas A&M Entomology
Vector Biology Research Group
www.lcosme.com
979 845 1885
cosme@tamu.edu
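Editor's note: the idea of pulling out regions based only on the number of aligned reads can be illustrated with a small sketch. It is not a Galaxy or TopHat feature. The function name `unannotated_hotspots`, the fixed window size, and the `min_reads` threshold are all assumptions made for illustration, and the read positions and gene intervals are assumed to have already been parsed out of the SAM and annotation files.

```python
from collections import Counter


def unannotated_hotspots(reads, genes, window=1000, min_reads=100):
    """Report windows with many read starts and no annotated gene.

    reads: iterable of (contig, start) read start positions, 0-based
    genes: iterable of (contig, start, end) annotated intervals, half-open
    Returns a list of (contig, window_start, window_end, read_count).
    """
    # Count read starts per fixed-size window on each contig.
    counts = Counter((contig, start // window) for contig, start in reads)

    # Mark every window touched by an annotated gene.
    annotated = set()
    for contig, g_start, g_end in genes:
        for w in range(g_start // window, (g_end - 1) // window + 1):
            annotated.add((contig, w))

    # Keep well-covered windows that carry no annotation.
    hits = []
    for (contig, w), n in sorted(counts.items()):
        if n >= min_reads and (contig, w) not in annotated:
            hits.append((contig, w * window, (w + 1) * window, n))
    return hits
```

Fixed windows are a crude choice: a real gene can straddle a window boundary, and a threshold on raw counts ignores sequencing depth, so the output is only a list of candidate regions to inspect in IGV.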

rna-seq cufflinks • 1.2k views


modified 6.5 years ago by Jeremy Goecks • 2.2k • written 6.5 years ago by Luciano Cosme • 220

Jeremy Goecks • 2.2k wrote:

This shouldn't be completely unexpected. High-coverage RNA-seq data is constantly revealing new exons, splice variants, and transcripts, even in well-annotated genomes. My suggestion: do a reference-guided assembly with Cufflinks; this will yield both existing and new transcripts. You could then subtract known genes from the Cufflinks assembly to keep only the novel transcripts.

Best, J.
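Editor's note: the "subtract known genes" step can be sketched as a plain interval comparison. This is a minimal illustration, not the way Cufflinks or Galaxy does it. The function name `novel_transcripts` and the tuple layouts are hypothetical, and the coordinates are assumed to have already been read from the assembled and reference annotation files.

```python
from collections import defaultdict


def novel_transcripts(assembled, known):
    """Keep assembled transcripts that overlap no known gene on the same contig.

    assembled: list of (transcript_id, contig, start, end), half-open
    known:     list of (contig, start, end), half-open
    Returns the transcript ids with no overlap, in input order.
    """
    # Group known gene intervals by contig for quick lookup.
    known_by_contig = defaultdict(list)
    for contig, start, end in known:
        known_by_contig[contig].append((start, end))

    novel = []
    for tid, contig, start, end in assembled:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < k_end and k_start < end
            for k_start, k_end in known_by_contig.get(contig, ())
        )
        if not overlaps:
            novel.append(tid)
    return novel
```

The sketch compares whole transcript spans and ignores strand, so it only finds transcripts in unannotated regions. New exons or altered exon boundaries within known genes would need an exon-level comparison instead.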


Thanks Jeremy, I will do that before trying the *de novo* assembly.

Luciano

written 6.5 years ago by Luciano Cosme • 220
