Question: Finding exogenous transcripts in RNA-Seq bam files
vasioukhin wrote, 16 months ago:

Hello,

What is the easiest way to find and count reads for exogenous transcripts (Cre, GFP, etc.) in RNA-Seq bam files using Galaxy? These transcripts are not in normal genomic databases, but I can make Fasta files for them. Thank you,

Valeri

Yes, the reads are from RNA.

written 16 months ago by vasioukhin
Jennifer Hillman Jackson (United States) wrote, 16 months ago:

Hello,

The tool htseq_count could be used.

This would require two inputs: a reference annotation file for these exogenous transcripts in GTF format, and your reads mapped to the exact same reference genome that the GTF file is based on.

There are other methods. Please let us know if you do not have a GTF file and we can go from there - please note the target genome and if the reads are RNA or DNA (assuming RNA, but please confirm).

Thanks, Jen, Galaxy team

vasioukhin wrote, 16 months ago:

Thanks Jen. I think what we want to do is something that will be useful for almost all researchers working with RNA-Seq. Our reads are from mouse cells and mouse tissues. We mapped them against the mm10 genome and generated GTF and BAM files. What would be the way to generate a GTF file for Cre and GFP using their corresponding cDNA sequences?

Blast can be used to map the longer sequences to the genome. One of the output format options is tabular. This tabular data can be simply rearranged to create a GFF/GTF file for all fields but the last one (attributes) which will take more formatting.

Attributes are important, especially the gene_id and transcript_id values. Both can be the same value for certain datatypes. Create and format these from the cDNA sequence name itself.

GFF/GTF specifications are available in a few places on the web; this is one with links: https://wiki.galaxyproject.org/Learn/Datatypes#GFF

Blast+ is available in the Tool Shed for use with a local or cloud Galaxy.
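
The rearrangement of BLAST tabular output into GTF described above can be sketched in Python. This is only an illustration, assuming the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `blast6_to_gtf`, the `source` label, and the choice to write every hit as one `exon` feature are assumptions made for the example, not part of any Galaxy tool. The query (cDNA) name is reused for both gene_id and transcript_id.

```python
def blast6_to_gtf(line, source="blast"):
    """Convert one line of 12-column BLAST tabular output to one GTF line.

    Hypothetical helper for illustration; each hit becomes a single exon.
    """
    fields = line.rstrip("\n").split("\t")
    qseqid, sseqid = fields[0], fields[1]      # query (cDNA) and subject (chromosome)
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11]

    # BLAST reports minus-strand hits with sstart > send;
    # GTF always wants start <= end plus an explicit strand.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)

    # Attributes built from the cDNA sequence name itself.
    attributes = f'gene_id "{qseqid}"; transcript_id "{qseqid}";'

    return "\t".join(
        [sseqid, source, "exon", str(start), str(end),
         bitscore, strand, ".", attributes]
    )
```

The `exon` feature type and the `gene_id` attribute are used here because those are what htseq-count looks for by default.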

written 16 months ago by Jennifer Hillman Jackson

Dear Jennifer, Thank you for taking the time to answer my questions. I think I am missing something important. Are you explaining how to make a GTF file from a given sequence? GFP and Cre sequences are not in any genome. What do I blast them against? Wow, it seems that this is such an easy and useful task, to find out whether a given exogenous non-genomic sequence is present in RNA-Seq reads. I am surprised that there is no easy way of doing that. Thanks, Valeri

written 16 months ago by vasioukhin

If the sequences do not have coordinates on a reference genome, then you could map the reads directly to the GFP and Cre sequences instead. Put the target sequences into a single fasta file and use it as a Custom reference genome. Any fasta file can be used as a "reference genome"; the term is used broadly. Whole or partial transcriptomes, groups of mRNA or DNA sequences - all are acceptable as long as the data is in fasta format. See https://wiki.galaxyproject.org/Support#Custom_reference_genome
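
Once reads are mapped against such a custom reference, counting them per target sequence is a simple tally. A minimal Python sketch, assuming the BAM has first been converted to SAM text (for example with `samtools view -h`) and that only primary, mapped alignments should be counted. The function name `count_reads_per_reference` is hypothetical, and paired-end mates are counted separately here.

```python
from collections import Counter

# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_reference(sam_lines):
    """Count primary mapped alignments per reference name (RNAME).

    Hypothetical helper for illustration; expects SAM text lines.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        # Ignore unmapped reads and non-primary alignments.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if rname == "*":
            continue
        counts[rname] += 1
    return counts
```

With a reference fasta holding only GFP and Cre, the returned counter would have at most those two keys, one count per transgene.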

written 16 months ago by Jennifer Hillman Jackson