Question: How to unzip the fastq files to be used in Tophat?
22 months ago by
sangramsahu15 wrote:

I exported RNA-seq data from the EBI SRA database. It came in xxx.fastq.gz format, which Tophat does not accept. I also tried FASTQ Groomer, but that failed too. How can I solve this?

rna-seq tophat galaxy • 1.3k views
22 months ago by
United States
Jennifer Hillman Jackson wrote:

Hello,

The datatype "fastqsanger.gz" can also be assigned. This keeps the data compressed, which reduces the history/dataset size that counts toward your account quota. Be sure not to simply assign "fastqsanger": tools would then not know that the data is compressed.

This is brand-new functionality tied to the upcoming Galaxy 17.01 release. Usage documentation will be updated once all behavior tuning is complete, around the time of the official release (estimate: a few weeks).

Fastqsanger format is required for tools that interpret quality scores in their calculations. Confirming that the data is in fastqsanger (Sanger Phred +33) format is important, or low mapping rates will result (and sometimes errors, depending on the tool). How to check the format and current quality score scaling of uploaded datasets with FastQC, and how to rescale (if needed) with the Fastq Groomer tool, is explained here: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
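
For a quick look outside Galaxy, the offset can sometimes be guessed from the quality characters alone. This is only a rough sketch and not what FastQC or Fastq Groomer actually do; the function name and the thresholds (ASCII 59 and 74) are assumptions chosen for illustration, and the input is assumed to be a non-empty list of quality lines.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in
    +64-based encodings, so their presence implies Phred+33.
    Returns None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        return 33
    if highest > 74:  # above 'J' (Q41 under +33), unusual for Sanger-scaled data
        return 64
    return None
```

A `None` result means the sample is ambiguous, so checking more reads (or using FastQC as described above) is the safer route.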

Thanks!! Jen


Yes, now the problem is solved with Fastq Groomer. Thank you.

But meanwhile another problem has come up: I started a Tophat job two days ago on roughly 500 MB of data, and it is still in the queue. Is there a problem with the server?

Please help me out with this issue. It's urgent.

Thank You, Sangram

Reply written 22 months ago by sangramsahu15

Expanded help for compressed fastq.gz data (some duplicated with this post, but wanted to link for reference for others): https://biostar.usegalaxy.org/p/21485/#21500

Reports of queued job delays at http://usegalaxy.org are being combined into this post, where updates and feedback will be posted as soon as the core issue has been investigated and a remedy determined: https://biostar.usegalaxy.org/p/21484

Reply modified 22 months ago, written 22 months ago by Jennifer Hillman Jackson
22 months ago by
United States
jling wrote:

Make sure your 'Datatype' is fastq.gz

Then go to 'Convert Format' and convert it from fastq.gz to fastqsanger


Yes, my file is in fastq.gz format. But which option under "Convert Format" does this? I looked through all of them, but there is no fastq.gz to fastqsanger option.

Reply written 22 months ago by sangramsahu15

Click 'Edit Attributes', then click 'Convert Format' at the top and select the fastq.gz to fastq conversion.

Tophat only takes fastqsanger for some reason, so afterwards you have to change the datatype from fastq to fastqsanger.
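
If you would rather uncompress the file on your own machine before uploading, something like the following should work. It is a minimal sketch using only the Python standard library; the function name and the file names in the usage line are hypothetical.

```python
import gzip
import shutil


def gunzip_fastq(src_path, dest_path):
    """Stream-decompress a gzip-compressed file to dest_path."""
    # Copy in chunks so large FASTQ files are never held fully in memory.
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```

For example, `gunzip_fastq("reads.fastq.gz", "reads.fastq")` would write the uncompressed reads next to the original, leaving the compressed copy untouched.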

Reply written 22 months ago by jling

Good clarification! Uncompress first (fastq.gz > fastq), then assign fastqsanger. Or change the datatype to "fastqsanger.gz" directly for fastq.gz datasets without uncompressing.

Be careful to not assign a compressed datatype to an uncompressed dataset, and the reverse. Also make sure data really is in fastqsanger/fastqsanger.gz format before assigning that to uploaded fastq data. Both are very important to assign correctly.
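
If you are unsure whether a local copy of the file is actually compressed, the first two bytes tell you: gzip files start with 0x1f 0x8b. A small sketch for checking this before upload (the function name is just an illustration, and this is not how Galaxy itself sniffs datatypes):

```python
def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

A `True` result points to a ".gz" datatype such as fastqsanger.gz; `False` points to the uncompressed one.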

Fastqsanger indicates Sanger Phred +33 quality score scaling. This is interpreted by tools. If the data is a different type (fastqillumina 1.3-1.7), and assigned as fastqsanger, tools may not fail, but the scientific content will be problematic (example: mapping rates will be very low). How to check the fastq type/scaling is covered by the support wiki (link in the other post) for FASTQ data.
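
To see why the assigned type matters, here is the arithmetic for one quality string decoded under both offsets. The read is a made-up example and the helper name is hypothetical; real tools do more than this, but the offset subtraction is the core of it.

```python
def decode_qualities(quality_string, offset):
    """Convert FASTQ quality characters to integer scores for a given offset."""
    return [ord(ch) - offset for ch in quality_string]


illumina_line = "hhhh"  # made-up quality line; 'h' is ASCII 104
as_illumina = decode_qualities(illumina_line, 64)  # [40, 40, 40, 40]
as_sanger = decode_qualities(illumina_line, 33)    # [71, 71, 71, 71]
```

The same characters come out 31 points higher when Phred +64 data is read as fastqsanger, so every downstream calculation starts from the wrong scores.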

If there is a mismatch between the actual format and the assigned datatype, tool problems will result, either as an error or as poor-quality scientific results.

Reply modified 22 months ago, written 22 months ago by Jennifer Hillman Jackson