Question: Rna Seq Analysis
6.7 years ago by
puvan001@umn.edu80 wrote:
Hi,

I have a couple of questions regarding RNA-seq analysis.

1. I need to use a viral genome (very small, ~2 kb) as a reference genome, and it is not available in Galaxy. I guess I can use it from my history. I have a FASTA file, but I am not sure whether I have to do some kind of indexing.

2. In Tophat, the default for "maximum number of alignments to be allowed" is 40. My understanding is that a single read can be aligned to at most 40 different places. Why is the default 40? Is there a specific reason? If I need unique mapping, I would have to use 1 instead of 40. Am I correct?

Thanks,
SP
galaxy • 966 views
ADD COMMENTlink modified 6.7 years ago by David Matthews630 • written 6.7 years ago by puvan001@umn.edu80
6.7 years ago by
United Kingdom
David Matthews630 wrote:
Hi,

I have done exactly the same kind of thing for adenovirus, so I can help with it.

In answer to question 1: you do not need to index the FASTA file; that will be done for you when Tophat is called.

Secondly, you should leave the multihits setting at 40 and filter out the multihits after the analysis. This will allow you to determine whether you have a multihit problem and, if so, how big it is and where on the genome it occurs. I have a workflow on Galaxy that you can use, called "Bristol workflow to get sorted unique proper pair mapped reads". If you plug in your SAM file, it should give you files listing the unique hits and the reads that map more than once. The workflow assumes paired-end data, but it can be modified to work with single-end reads as well.

Hope this helps.

Best wishes,
David

Dr David A. Matthews, Senior Lecturer in Virology, Department of Cellular and Molecular Medicine, School of Medical Sciences, University of Bristol
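For readers who want to see what a post-alignment multihit filter can look like outside Galaxy, here is a minimal Python sketch. It is not the Bristol workflow itself (that workflow also sorts and keeps proper pairs); it only separates uniquely mapped reads from multi-mapped ones. It assumes the aligner wrote the standard SAM optional tag `NH:i:<n>` (the number of reported alignments for the read) on every mapped record, which is worth checking in your own output. The function name `split_by_hit_count` is made up for illustration.

```python
def split_by_hit_count(sam_lines):
    """Split SAM records into uniquely mapped and multi-mapped groups.

    Assumption: every mapped record carries an NH:i:<n> optional tag.
    Returns (header_lines, unique_records, multi_records).
    """
    header, unique, multi = [], [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines (@HD, @SQ, ...) are kept separately.
            header.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            # FLAG bit 0x4 marks an unmapped read; skip it.
            continue
        nh = None
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            raise ValueError("mapped record without NH tag: " + fields[0])
        if nh == 1:
            unique.append(line)
        else:
            multi.append(line)
    return header, unique, multi
```

Run on a SAM file exported from the history (for example `split_by_hit_count(open("accepted_hits.sam"))`, where the file name is only a placeholder), the size of the multi-mapped list gives a quick idea of whether multihits are a real problem, and the positions in those records show where on the genome they fall.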
ADD COMMENTlink written 6.7 years ago by David Matthews630
Hi David,

Thanks! When I tried to run Tophat, it did not recognise my FASTA file and said "History does not include a dataset of the required format / build". Do you have any thoughts about this?

The "multihits" setting makes more sense now. Thanks for sharing your workflow.

With regards,
Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80
Hi, You need to run fastq groomer on your rna-seq data. Your reference is fine as a fasta. Austin
ADD REPLYlink written 6.7 years ago by Austin Paul140
Hi Austin,

I did all of this (grooming and trimming) on the RNA-seq data, and I don't have a problem with the built-in genomes. I'll try again!

Thanks,
Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80
Hello,

I was able to run the RNA-seq data against a custom-build genome. How can I visualize the results? I tried Trackster, but unfortunately it did not work. Can you help me?

Thanks,
Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80
There are many ways. I typically use IGV. It needs a sam file, so I first convert the bam to sam in galaxy, then download the sam file. In IGV, I upload the reference and the sam file, then use IGVtools to index the sam file, then I can visualize the data. Austin
ADD REPLYlink written 6.7 years ago by Austin Paul140
IGV reads BAM files just fine; no need to convert to SAM. Sean
ADD REPLYlink written 6.7 years ago by Sean Davis220
Oops. Good to know. Thanks. Austin
ADD REPLYlink written 6.7 years ago by Austin Paul140
One problem is that IGV doesn't have an option for creating an index file, so one has to create the index file in Galaxy first in order to view the data in IGV.

Jim, I have been using the IGV 2 beta version. It is great work, but how hard would it be to include index functionality within IGV? I know we can also use samtools, but it would be convenient if it is not that much work.

Vasu
ADD REPLYlink written 6.7 years ago by vasu punj360
Hi,

Thanks! I am a little bit familiar with IGV. I'll try it then.

Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80
Hi Vasu, I'm going to add the function to index BAM files soon, using Picard. In the beginning.... there was no java BAM reader, only SAM, and I added the index then. Indexed BAMs came along later, but that's probably more than you want to know... I think most people will still use Galaxy to index as it can take a long time, but I agree with you on the convenience factor. Jim
ADD REPLYlink written 6.7 years ago by Jim Robinson150
Thanks Jim,

Vasu
ADD REPLYlink written 6.7 years ago by vasu punj360
I generally take the GTF file to the UCSC Genome Browser. If you are visualizing a BAM file after alignment, I found IGV convenient, though you may be able to visualize it in Galaxy.

Vasu
ADD REPLYlink written 6.7 years ago by vasu punj360
Sumathy, What kind of problems are you having with Trackster? J.
ADD REPLYlink written 6.7 years ago by Jeremy Goecks2.2k
Hi,

I may be doing this the wrong way. I clicked Trackster and added the custom-build genome. Since it is a very small genome (~2 kb), I treated it as a single contig. Then I clicked "add tracks" and added my data file, but I got the message "no data for this contig". Whenever I used built-in genomes I did not have any problem. I guess I am doing something wrong here.

Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80
Sumathy,

It sounds like you're on the right track. To visualize data for a custom build in Trackster, you need to create a custom build and use that in Trackster:

(1) Using the top tabs in Galaxy, go to User --> Custom Builds.
(2) Add a new build with the length info as follows:
    <contig_name> <length>
    Important note: you'll need to make sure that your contig name matches the one used in your fasta file. This is my best guess about what's causing problems for you.
(3) Create a Trackster visualization using the custom build and add your dataset.

Let us know if you have more questions/problems.

Thanks,
J.
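Since the contig name and length have to agree with the FASTA file, it can help to generate the length info from the FASTA rather than typing it by hand. The sketch below is one way to do that in Python. It rests on two assumptions that are not stated in the thread: that the contig name is the first whitespace-delimited token after `>` on the header line (the convention many aligners follow), and that the name and length are separated by a tab. The function names `fasta_lengths` and `build_length_lines` are hypothetical.

```python
def fasta_lengths(fasta_lines):
    """Return {contig_name: sequence_length} for the records in a FASTA file.

    Assumption: the contig name is the first token after '>' on the header.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum them all.
            lengths[name] += len(line)
    return lengths


def build_length_lines(lengths, sep="\t"):
    """Format one '<contig_name><sep><length>' line per contig."""
    return [f"{name}{sep}{length}" for name, length in lengths.items()]
```

For a single-contig viral genome this yields one line, which can be pasted into the custom build form. If the name printed here differs from the reference name in the alignment file, that mismatch is a likely reason for a "no data for this contig" message.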
ADD REPLYlink written 6.7 years ago by Jeremy Goecks2.2k
Hi,

Thank you! Yes, your guess is correct. Now it works.

Sumathy
ADD REPLYlink written 6.7 years ago by puvan001@umn.edu80