RNA Seq Analysis

7.6 years ago by

United Kingdom

David Matthews • 630 wrote:

Hi,

I have done exactly the same kind of thing for adenovirus, so I can help with it. In answer to question 1, you do not need to index the reference; it will be done for you when TopHat is called. Secondly, you should leave the maximum of 40 multihits as it is and filter out the multihits after the analysis. This will allow you to determine whether you have a multihit problem and, if so, how big a problem it is and where it is on the genome.

I have a workflow on Galaxy which you can use, called "Bristol workflow to get sorted unique proper pair mapped reads". If you plug in your SAM file, it should give you files listing the unique hits and, separately, the reads which map more than once. This workflow assumes you have paired-end data, but it can be modified to work with single-end reads as well.

Hope this helps.

Best Wishes, David.

Dr David A. Matthews
Senior Lecturer in Virology
Room E49, Department of Cellular and Molecular Medicine, School of Medical Sciences
University Walk, University of Bristol, Bristol. BS8 1TD U.K.
Tel. +44 117 3312058 Fax. +44 117 3312091
D.A.Matthews@bristol.ac.uk

ADD COMMENT • link written 7.6 years ago by David Matthews • 630
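
The post-alignment filtering David describes can be sketched in a few lines of Python. This is not the Bristol workflow itself, only a minimal illustration of the idea. It assumes each alignment line carries the optional `NH:i:<n>` tag (the number of reported alignments for that read), which should be checked against your own TopHat output. The function name `split_by_hit_count` is hypothetical.

```python
def split_by_hit_count(sam_lines):
    """Split SAM lines into header, unique, multi-mapped and unclassified.

    Assumption: the aligner writes an NH:i:<n> optional tag on each
    alignment. Lines without that tag are returned as unclassified.
    """
    header, unique, multi, unknown = [], [], [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines (@HD, @SQ, ...) are kept separately.
            header.append(line)
            continue
        fields = line.split("\t")
        hits = None
        # The first 11 SAM columns are mandatory; optional tags follow.
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                hits = int(tag[5:])
                break
        if hits is None:
            unknown.append(line)
        elif hits == 1:
            unique.append(line)
        else:
            multi.append(line)
    return header, unique, multi, unknown
```

Counting the entries in the `multi` list, and looking at the positions they report, gives a quick sense of whether multihits are a problem and where on the genome they fall.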

Hi David,

Thanks! When I tried to run TopHat, it did not recognise my FASTA file, and it says "History does not include a dataset of the required format / build". Do you have any thoughts about this? The point about "multihits" makes more sense now. Thanks for sharing your workflow.

With regards, Sumathy

I have a couple of questions regarding RNA seq analysis. My questions are:

1. I need to use a viral genome (very small, ~2 kb) as a reference genome, and it is not available in Galaxy. I guess I can use this data from my history. I have a FASTA file, but I am not sure whether I have to do some kind of indexing or not.

2. In TopHat, the default for "maximum number of alignments to be allowed" is 40. My understanding is that a single read can be aligned to a maximum of 40 different places. I am wondering why this is 40. Is there any specific reason? If I need unique mapping, I have to use 1 instead of 40. Am I correct?

-- Sumathy Puvanendiran Graduate student

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80

Hi, You need to run FASTQ Groomer on your RNA-seq data. Your reference is fine as a FASTA file. Austin

ADD REPLY • link written 7.6 years ago by Austin Paul • 140
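
For readers wondering what a grooming step does, one part of it is re-encoding quality scores into the Sanger (Phred+33) convention. The sketch below shows that single conversion, assuming the input uses the older Illumina Phred+64 encoding. It is an illustration only, not the FASTQ Groomer implementation, and it is not claimed to resolve the "required format / build" message. The function names are hypothetical.

```python
def illumina64_to_sanger(quality):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger).

    Assumption: the input really is Phred+64; a character below '@'
    (ASCII 64) cannot occur in that encoding and raises an error.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % ch)
        converted.append(chr(score + 33))
    return "".join(converted)


def groom_record(record):
    """Convert one four-line FASTQ record given as a list of four strings."""
    header, sequence, _plus, quality = record
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # The separator line is normalised to a bare '+'.
    return [header, sequence, "+", illumina64_to_sanger(quality)]
```

A record whose qualities are already Phred+33 will usually trigger the error above, which is a cheap way to notice that no conversion is needed.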

Hi Austin, I did all of these (grooming and trimming) on the RNA-seq data, and I don't have a problem with the built-in genome. I'll try again! Thanks, Sumathy

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80

Hello, I was able to run my RNA-seq data against a custom-build genome. How can I visualize the results? I tried via Trackster and unfortunately I couldn't. Can you help me? Thanks, Sumathy

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80

There are many ways. I typically use IGV. It needs a SAM file, so I first convert the BAM to SAM in Galaxy, then download the SAM file. In IGV, I upload the reference and the SAM file, then use IGVtools to index the SAM file, and then I can visualize the data. Austin

ADD REPLY • link written 7.6 years ago by Austin Paul • 140

IGV reads BAM files just fine; no need to convert to SAM. Sean

ADD REPLY • link written 7.6 years ago by Sean Davis • 220

Oops. Good to know. Thanks. Austin

ADD REPLY • link written 7.6 years ago by Austin Paul • 140

One of the problems is that IGV doesn't have an option for creating an index file, so one has to create the index file in Galaxy first in order to view the data in IGV.

Jim, I have been using the IGV 2 beta version and it is great work, but how hard would it be to include index functionality within IGV? I know we can use SAMtools as well, but this would be just for convenience, if it is not that much work.

Vasu

Subject: Re: [galaxy-user] RNA seq analysis
Date: Friday, May 6, 2011, 8:02 PM

___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/

ADD REPLY • link written 7.6 years ago by vasu punj • 360
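
For anyone taking the SAMtools route Vasu mentions instead of indexing in Galaxy, the steps can be wrapped in a short Python helper. This is a sketch that assumes a samtools 1.x executable is on the PATH. The sort step is a precaution in case the BAM file is not already coordinate-sorted, and the helper names are hypothetical.

```python
import shutil
import subprocess


def index_commands(bam_path, sorted_path):
    """Return the samtools commands that sort a BAM file and index it."""
    return [
        ["samtools", "sort", "-o", sorted_path, bam_path],
        ["samtools", "index", sorted_path],
    ]


def sort_and_index(bam_path, sorted_path):
    """Run the commands and return the expected index file name.

    Assumption: samtools writes the index next to the BAM file as
    <sorted_path>.bai, which is where IGV is expected to look for it.
    """
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for command in index_commands(bam_path, sorted_path):
        subprocess.run(command, check=True)
    return sorted_path + ".bai"
```

Keeping the `.bam` and `.bam.bai` files in the same directory lets IGV pick up the index when the BAM file is loaded.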

Hi, Thanks! I am a little bit familiar with IGV. I'll try it then. Sumathy

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80

Hi Vasu, I'm going to add the function to index BAM files soon, using Picard. In the beginning.... there was no Java BAM reader, only SAM, and I added the index then. Indexed BAMs came along later, but that's probably more than you want to know... I think most people will still use Galaxy to index, as it can take a long time, but I agree with you on the convenience factor. Jim

ADD REPLY • link written 7.6 years ago by Jim Robinson • 150

Thanks, Jim. Vasu

ADD REPLY • link written 7.6 years ago by vasu punj • 360

I generally take the GTF file to the UCSC Genome Browser. If you are visualizing a BAM file after alignment, I found IGV convenient, though you may be able to visualize it in Galaxy. Vasu

ADD REPLY • link written 7.6 years ago by vasu punj • 360

Sumathy, What kind of problems are you having with Trackster? J.

ADD REPLY • link written 7.6 years ago by Jeremy Goecks • 2.2k

Hi, I may be doing this the wrong way. I clicked Trackster and added the custom build genome. Since it is a very small genome (~2 kb), I treated it as a single contig. Then I clicked "add tracks" and added my data file, but I got the message "no data for this contig". Whenever I used built-in genomes I did not have any problem. I guess I am doing something wrong here. Sumathy

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80

Sumathy,

It sounds like you're on the right track. To visualize data for a custom build in Trackster, you need to create a custom build and use that in Trackster:

(1) Using the top tabs in Galaxy, go to User --> Custom Builds.

(2) Add a new build with the length info as follows: <contig_name> <length>
Important note: you'll need to make sure that your contig name matches the one used in your fasta file. This is my best guess about what's causing problems for you.

(3) Create a Trackster visualization using the custom build and add your dataset.

Let us know if you have more questions/problems.

Thanks, J.

ADD REPLY • link written 7.6 years ago by Jeremy Goecks • 2.2k
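
The length info for step (2) can be derived directly from the reference FASTA file, which also avoids a mismatch between the contig name in the custom build and the one in the FASTA header. The sketch below makes two assumptions: the contig name is taken as the header text up to the first whitespace, and name and length are separated by a tab. Both should be checked against the FASTA file and the Custom Builds form. The function names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return [(contig_name, length), ...] in the order found in the FASTA."""
    lengths = []
    name, total = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                lengths.append((name, total))
            # Assumption: the name is the header up to the first whitespace.
            parts = line[1:].split()
            name, total = (parts[0] if parts else ""), 0
        elif name is not None:
            total += len(line)
    if name is not None:
        lengths.append((name, total))
    return lengths


def custom_build_text(fasta_lines, sep="\t"):
    """Format the lengths as one '<contig_name><sep><length>' line per contig."""
    return "\n".join(
        "%s%s%d" % (name, sep, length)
        for name, length in contig_lengths(fasta_lines)
    )
```

For a file on disk, something like `with open("reference.fasta") as handle: print(custom_build_text(handle))` prints the text to paste into the form, where `reference.fasta` is a placeholder file name.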

Hi, Thank you! Yes, your guess is correct. Now it works. Sumathy

ADD REPLY • link written 7.6 years ago by puvan001@umn.edu • 80
