Question: Low data conversion rate for BAM-to-SAM. Fix Database, Datatype, Sorting
cwalker912 wrote:

I have WXS FASTQ files from an Illumina HiSeq 4000 paired-end run. I uploaded them through FTP as fastqillumina. They are each about 24 GB. The reads look fine using FastQ Summary Statistics. I aligned to hg19 using BWA for Illumina and got a SAM file that is 62 GB. Then I took the SAM file and tried to run SAMTOOLS SAM to BAM. This ran for a few hours, and the output BAM file is 1.8 KB (kilobytes, as in tiny). Please let me know where I went wrong with this workflow. Any help would be greatly appreciated. Thank you very much.

written 14 months ago by cwalker912 • modified 14 months ago by Jennifer Hillman Jackson
Jennifer Hillman Jackson wrote:

Hello,

Two areas to correct/adjust:

1. Database and Sorting

Does the input dataset have the correct reference genome assigned as the "database"? Samtools requires this as well as sorted input.

Fix: Assign the correct "database". If you used a Custom Reference genome for alignment, create a Custom Build from it and assign that. Sort the input dataset (here, the SAM output from BWA).

How to change datatype: https://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset

How to create a Custom Build (other Custom Genome formatting rules are on the same wiki page): https://wiki.galaxyproject.org/Learn/CustomGenomes

Sorting tips: https://github.com/jennaj/support-prior-qa/wiki/Sort-your-inputs
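
If you have the SAM file locally, a quick look at its header can help confirm both points above. The sketch below is only an illustration, not a Galaxy tool: the function name `summarize_sam_header` is made up for this example, and it assumes a plain-text, tab-delimited SAM whose header lines start with "@". It reports the sort order declared in the @HD line and the reference names listed in the @SQ lines. It does not read or change the "database" attribute stored in Galaxy, and a declared sort order is only what the header claims.

```python
def summarize_sam_header(lines):
    """Return the declared sort order and reference sequences of a SAM header.

    `lines` is any iterable of text lines (for example an open file handle).
    Hypothetical helper for illustration; it only reads the header and stops
    at the first alignment record.
    """
    sort_order = "unknown"   # what applies when @HD or its SO tag is absent
    references = []          # (name, length) pairs taken from @SQ lines

    for line in lines:
        if not line.startswith("@"):
            break  # first alignment line: the header is over
        fields = line.rstrip("\r\n").split("\t")
        # Header fields after the record type look like TAG:VALUE
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO", "unknown")
        elif fields[0] == "@SQ":
            references.append((tags.get("SN"), int(tags.get("LN", 0))))

    return {"sort_order": sort_order, "references": references}
```

For example, calling it with `open("aligned.sam")` (a placeholder file name) and seeing hg19-style names such as chr1 in `references` suggests the alignment matches the genome you intend to assign. A `sort_order` other than "coordinate" suggests the dataset still needs sorting.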

2. Datatype: Fastqsanger

Tools require .fastqsanger formatted sequence/quality scores. I suspect your data is already in this format and the assignment of .fastqillumina is causing problems. Prior Q&A and bug reports with this type of result (low hits) are often due to the wrong sequence datatype as input - in content or by datatype assignment.

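Fix: Double-check the actual format of the quality scores first. Then either run FASTQ Groomer (if the data needs converting) or assign the correct datatype (if the data is already in fastqsanger format). Don't just change the assigned datatype without checking the content, or more unexpected results can occur. This is how: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA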
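
As a rough way to double-check the format outside Galaxy, the quality lines can be scanned for their lowest and highest characters. This is a heuristic sketch, not how FASTQ Groomer works internally. The function name `guess_fastq_encoding` and the cutoffs are assumptions for this example: any character below ASCII 59 can only occur with offset 33 (fastqsanger), while characters above ASCII 74 ("J") are treated as a sign of offset 64. It also assumes unwrapped four-line FASTQ records.

```python
from itertools import islice


def guess_fastq_encoding(lines, max_records=10000):
    """Guess the quality encoding from the first `max_records` reads.

    Hypothetical helper for illustration. Returns "fastqsanger",
    "fastqillumina", "ambiguous", or "unknown" (no quality data seen).
    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lowest, highest = 255, 0
    for i, line in enumerate(islice(lines, 4 * max_records)):
        if i % 4 != 3:
            continue  # only every fourth line holds quality characters
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if codes:
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))

    if highest == 0:
        return "unknown"
    if lowest < 59:
        # Characters this low cannot occur with an offset of 64.
        return "fastqsanger"
    if highest > 74:
        # Above "J": treated here as offset 64 (could also be older Solexa).
        return "fastqillumina"
    # Only mid-range characters seen; either offset is possible.
    return "ambiguous"
```

A result of "fastqsanger" on data that was uploaded as fastqillumina would fit the suspicion above that the assigned datatype, not the data, is the problem. An "ambiguous" result means more reads should be sampled before deciding.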

Thanks, Jen, Galaxy team

written 14 months ago by Jennifer Hillman Jackson • modified 14 months ago