Question: BWA and FASTQ Joiner Issues
4.9 years ago, Hilde Stawski wrote:
Hello,

I hope someone might be able to help me with these issues, as I'm relatively new to bioinformatics.

When analyzing data on the main Galaxy website (all samples from the same Illumina MiSeq run), some sets fail in the BWA alignment. I have tried rerunning my workflow and re-uploading the FASTQ files (in case they were corrupted), but BWA fails every single time. I've pasted the error log below.

Q1: Why are most of my datasets aligned by BWA without a problem, but some consistently fail?

Some people at SEQanswers suggested I install a local Galaxy client on my computer, and I'm now trying to rework my workflow so I can use BFAST as an aligner. BWA was giving us some trouble due to indels in our mtDNA sequence, so we are trying to find another alignment tool that handles these indels better anyway.

However, now I'm running into a second problem. I have PE data, and after grooming both FASTQ files, the FASTQ Joiner generates an empty file. I did try the workaround mentioned here (http://lists.bx.psu.edu/pipermail/galaxy-user/2012-April/004519.html), but I'm not sure I understand the following instructions:

"NGS: QC and manipulation -> Tabular to FASTQ" run twice
Recreate both FASTQ files from the same tabular file.

I ran FASTQ to Tabular with two columns, and joined the files on the first column. Why/how should I recreate both FASTQ files if I wanted to create just one single file to use as input for alignment? FASTQ Groomer doesn't seem to be able to groom the data either:

"Based upon quality and sequence, the input data is valid for: None
Input ASCII range: '0'(48) - 'N'(78)
Input decimal range: 15 - 45"

Q2: How do I create a single file as input for the aligner?

Thanks in advance for any help,
Hilde

The alignment failed.
Error generating alignments.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (2402, 5234, 8910)
[infer_isize] low and high boundaries: 151 and 21926 for estimating avg and std
[infer_isize] inferred external isize from 251269 pairs: 5940.068 +/- 4124.607
[infer_isize] skewness: 0.511; kurtosis: -0.744; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 23222 (4.19 sigma)
[bwa_sai2sam_pe_core] time elapses: 1.37 sec
[bwa_sai2sam_pe_core] changing coordinates of 0 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 3297 out of 10352 Q17 singletons are mated.
[bwa_paired_sw] 0 out of 188377 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 2241.09 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 1.37 sec
[bwa_sai2sam_pe_core] print alignments... 1.76 sec
[bwa_sai2sam_pe_core] 262144 sequences have been processed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (2595, 7186, 11297)
[infer_isize] low and high boundaries: 151 and 28701 for estimating avg and std
[infer_isize] inferred external isize from 84557 pairs: 7236.733 +/- 4785.357
[infer_isize] skewness: 0.066; kurtosis: -1.262; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 27144 (4.16 sigma)
[bwa_sai2sam_pe_core] time elapses: 0.38 sec
[bwa_sai2sam_pe_core] changing coordinates of 0 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 482 out of 3212 Q17 singletons are mated.
[bwa_paired_sw] 0 out of 40389 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 532.50 sec
[bwa_sai2sam_pe_core] refine gapped alignments...
/bin/sh: line 1: 28068 Segmentation fault bwa sampe /tmp/3030216.cyberstar.psu.edu/tmprNNXwj/tmpXybcjA /tmp/3030216.cyberstar.psu.edu/tmpq3YCcl/tmpKvAb8e /tmp/3030216.cyberstar.psu.edu/tmpq3YCcl/tmpvUAE3E /galaxy/main_pool/pool3/files/005/540/dataset_5540834.dat /galaxy/main_pool/pool3/files/005/540/dataset_5540837.dat >> /galaxy/main_pool/pool2/tmp/job_working_directory/004/860/4860532/galaxy_dataset_5540842.dat
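
The two ranges in the FASTQ Groomer message quoted above can be checked by hand: subtracting a quality offset of 33 from the ASCII range 48 - 78 gives exactly the reported decimal range 15 - 45. The following is only a minimal sketch of that arithmetic; `decode_range` is a hypothetical helper name, not part of Galaxy or the Groomer, and the sketch does not explain why the Groomer reported "valid for: None".

```python
def decode_range(ascii_min, ascii_max, offset):
    """Return the (min, max) quality scores implied by an ASCII range and an offset."""
    return ascii_min - offset, ascii_max - offset


# ASCII range taken from the Groomer message: '0' (48) to 'N' (78).
ascii_min, ascii_max = ord("0"), ord("N")

# Try the two common FASTQ quality offsets (33 and 64).
for name, offset in (("offset 33", 33), ("offset 64", 64)):
    low, high = decode_range(ascii_min, ascii_max, offset)
    note = "" if low >= 0 else "  (negative scores, so this offset does not fit)"
    print(f"{name}: {low} to {high}{note}")
```

With an offset of 64 the lower bound would be negative, so the characters in this file only make sense as offset-33 qualities.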

4.9 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Hilde,

Glad you wrote - we can try to help.

To confirm, you were first using the public Main Galaxy instance at https://main.g2.bx.psu.edu (usegalaxy.org)?

Normally I would suggest sending in a bug report from an error dataset so that we can provide some feedback. But the bit of the error you sent and your own analysis suggest that data content is the root issue. Next time, though, this is how to report an error: http://wiki.galaxyproject.org/Support#Reporting_tool_errors

The workaround instructions you quoted are for creating two inputs - one FASTQ dataset of forward reads and one FASTQ dataset of reverse reads - as required by a tool such as BWA. The overall method was for ensuring that the same data was QC'd and that the reads matched between these two inputs. This is no longer our recommended method for the target analysis - it works, but is not needed (for others reading this post: just map the data and filter for pairs afterwards if desired, though this too can often be skipped).

For BFAST, you want the reads interleaved and in the same FASTQ dataset. It sounds like you are having trouble with the 'FASTQ Joiner' tool. There have been known issues with certain sequence ID formats in the past, so verifying the format of the inputs would be the first step. If you continue to get no output, you can also send this in as a bug report (if there is an error), or, if the output is empty with no error, share a link to your history with me and I can provide feedback. I know that there are a few command line tools to join data, and that may end up being the recommendation - it just depends on your data - but let's first check whether there is another solution.

To share a history: use Option (gear icon) -> Share or Publish -> generate "share" link, then copy and paste the link into a return email and note the dataset #s that are a concern. Please leave the datasets undeleted so that I can check the run parameters. You should *not* cc the mailing list when sharing a history link, to keep your data private.
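
For readers who end up doing the join outside Galaxy, here is a minimal sketch of interleaving forward and reverse reads into one FASTQ stream while checking that the sequence IDs of each pair agree. It is not the FASTQ Joiner's implementation. The function names are hypothetical, it assumes four-line FASTQ records with both files in the same read order, and it assumes mates differ only by a trailing /1 or /2 or by a comment after whitespace. Whether BFAST needs a particular read-name convention in the interleaved file is not covered here.

```python
import io
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def base_id(header):
    """Drop the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def interleave(forward, reverse, out):
    """Write forward and reverse records alternately; return the number of pairs."""
    pairs = 0
    for fwd, rev in zip_longest(read_fastq(forward), read_fastq(reverse)):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse inputs have different read counts")
        if base_id(fwd[0]) != base_id(rev[0]):
            raise ValueError(f"read IDs differ at pair {pairs + 1}: {fwd[0]} vs {rev[0]}")
        out.write("\n".join(fwd + rev) + "\n")
        pairs += 1
    return pairs


# Tiny in-memory demo with made-up reads (not taken from the data in this thread).
forward = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n")
reverse = io.StringIO("@read1/2\nTTAA\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n")
merged = io.StringIO()
n_pairs = interleave(forward, reverse, merged)
print(n_pairs, "pairs written")
```

A mismatch in IDs or read counts raises an error instead of producing an empty or silently shifted output, which makes an ID-format problem visible early.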

Hopefully this helps or will lead to a solution!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org