Question: Collapsing and Mapping Reads
4.5 years ago by
United States
fat90 wrote:


I'm starting with FASTQ files and want to collapse the sequences and then map them to the sacCer3 (yeast) genome using Bowtie.

The problem is that if I collapse the reads first, they end up in FASTA format, and they need to be in FASTQ format in order to use Bowtie to map them. Can someone help me with this? Thank you very much for your time and your help!


bowtie galaxy • 2.0k views
modified 4.5 years ago • written 4.5 years ago by fat90

Thank you for responding to my question, much appreciated. I'm not actually doing RNA-seq but HITS-CLIP. There was no tag for that, so I thought RNA-seq might be sufficient. Do you still think that Tophat/Tophat2 would be a better fit for mapping these reads?

(Since we must use PCR to generate our cDNA libraries for HITS-CLIP, we collapse identical reads.)

All Best,


written 4.5 years ago by fat90


I am not specifically familiar with all of the nuances of this protocol, but since this is RNA data (any data from a cDNA library would fall into this category), choosing a mapper that can accommodate reads spanning splice sites seems important. A post from a bit over a year ago at the primary Biostar web site mentions Tophat/Tophat2 and BLAT, plus some others, as good potential choices in the top-ranked answers. These both seem like good places to start.

Keep in mind that BLAT requires longer sequences, but it accepts plain FASTA data and is an excellent tool. The wrapper is available in the Tool Shed for use in a local/cloud Galaxy. The binary needs proper licensing, but that is free for academic/research use. You may want to use the cloud/Tool Shed anyway if other tools of interest are there. Be sure to check out Amazon's AWS grants for educational/research work if choosing the cloud.

Feel free to create new tags - this site is just getting started!


written 4.5 years ago by Jennifer Hillman Jackson (25k)
4.5 years ago by
United States
Jennifer Hillman Jackson (25k) wrote:


Keep in mind that when you collapse sequences, the original quality scores are going to be lost. There isn't a way to merge the quality scores, and they probably differ between the reads even if the sequence is the same. Still, if you want to do this: convert to FASTA, collapse, then convert back to FASTQ with default quality scores assigned, using the tool "NGS: QC and manipulation -> Combine FASTA and QUAL".
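For anyone working outside Galaxy, the same idea can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools do internally. It assumes simple 4-line FASTQ records, assigns the placeholder quality character "I" (Phred 40 in Sanger/Phred+33 encoding) to every base, and names each collapsed read "rank-count", which is a naming convention made up for this sketch. The function names are hypothetical as well.

```python
from collections import Counter
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def collapse_to_fastq(in_handle, out_handle, default_qual="I"):
    """Collapse identical reads and write them back as FASTQ.

    The original per-base qualities are discarded; every base gets
    `default_qual` as a placeholder score.
    """
    counts = Counter(read_fastq_sequences(in_handle))
    # Most abundant sequence first; header is "@<rank>-<count>".
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        out_handle.write(f"@{rank}-{n}\n{seq}\n+\n{default_qual * len(seq)}\n")
    return counts


# Tiny in-memory demo: two identical reads with different qualities, one unique read.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\n!!!!\n"
    "@r3\nGGCC\n+\nIIII\n"
)
out = io.StringIO()
collapsed = collapse_to_fastq(demo, out)
print(out.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by opened input and output file handles. Keeping the read count in the header makes it possible to recover the original abundance of each collapsed sequence after mapping.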

Your question does not seem to fit the tags assigned - you can change them or I can. Bowtie/Bowtie2 would not be a good choice for mapping RNA-seq reads to yeast because of splicing (Tophat/Tophat2 is better). Also, removing redundancy in reads will almost certainly bias the expression profile of the data, which is also not a good idea for RNA-seq. If you really are doing RNA-seq, I wonder what tutorial you are following. Here are some others to explore:

Best, Jen, Galaxy team

written 4.5 years ago by Jennifer Hillman Jackson (25k)