Question: MPileup calling every base as <X>
kjb322 wrote, 2.8 years ago:

Having a few teething problems on my first use of Galaxy (surprise!)

 

Workflow as follows:

1. Upload fastq files (forward and reverse)
2. Fastq groomer
3. Trimmomatic
4. BWA
5. SAM to BAM
6. MPileup
7. ANNOVAR

 

ANNOVAR gives an empty VCF file and, on closer inspection, MPileup gives 55,000,000 lines from 3,600,000 in the SAM.

MPileup appears to have called every base, each line looking like this:

 

chr10 1812199 . T <X> 0 . DP=1;I16=1,0,0,0,27,729,0,0,0,0,0,0,0,0,0,0;QS=1,0;MQ0F=1 PL 0,3,4

 

The alt base is <X> for every base. I assumed it was a reference genome mismatch but I have done my best to use several reference genomes and I get the same (or similar) problem.
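For reference, a minimal sketch of how a record like the one above can be read. It assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...) and that `<X>` is the symbolic placeholder samtools mpileup writes for "any allele not observed" in its genotype-likelihood output, so a line whose only ALT is `<X>` is a per-site likelihood record and not a called variant. The function names are illustrative and not part of any tool.

```python
def parse_vcf_record(line):
    """Split one whitespace-separated VCF data line into named fields."""
    fields = line.split()
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        # INFO entries are KEY=VALUE; flag entries without "=" get an empty value
        key, _, value = item.partition("=")
        info_dict[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "info": info_dict,
    }


def has_real_alt(record):
    """True if at least one ALT allele is a base sequence, not '.' or a symbolic <...> allele."""
    return any(a != "." and not a.startswith("<") for a in record["alt"])


# The line quoted in the question
example = ("chr10 1812199 . T <X> 0 . "
           "DP=1;I16=1,0,0,0,27,729,0,0,0,0,0,0,0,0,0,0;QS=1,0;MQ0F=1 PL 0,3,4")
record = parse_vcf_record(example)
print(record["alt"], record["info"]["DP"], has_real_alt(record))
```

Read this way, the quoted line describes a site covered by a single read (DP=1) with no observed non-reference allele. Under the assumption above, 55,000,000 such lines would reflect the number of covered positions, not the number of variants.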

 

Any offers?

                   
Tags: help, bwa, alignment, galaxy, mpileup

If you can reproduce this at http://usegalaxy.org, please send in a bug report (from any error dataset; it does not have to be from this particular analysis, just a tool in the same history). Or create a shared history link and email that to galaxy-bugs@lists.galaxyproject.org. Please note the dataset numbers involved.

Make sure that all datasets in the analysis are undeleted for at least one complete pass from start to end.

Include a link to this Biostars post so we can cross-reference the two (in the email or bug report comments).

So that you know, ANNOVAR is only pre-cached with one genome at Galaxy Main (hg19). MPileup advanced settings could be a factor. Sharing is the best way to see everything at once and get to the root of the problem.

Thanks, Jen, Galaxy team

Reply written 2.8 years ago by Jennifer Hillman Jackson

The history has got pretty messy, as I have been trying everything, firstly to identify the point where it goes wrong and then to try and solve it.

https://usegalaxy.org/u/kelly-hunter/h/galaxy-presentation


Thank you kindly!

Reply written 2.8 years ago by kjb322

I'll take a look and get back to you by early next week. Thanks!

P.S. Sharing this way means that everyone has public access to your data. If you don't want that, unshare the history and share it directly with me. Send an email to the galaxy-bugs list to arrange that.

Jen

Reply written 2.8 years ago by Jennifer Hillman Jackson

It's fine, it's public data anyway!

Any help at all by Monday would be greatly appreciated if possible and thanks again either way.

 

P.S. In order not to interfere with that history, is it possible for me to use the data uploaded in this history from another one?

Reply written 2.8 years ago by kjb322

Yes, use the function "copy datasets" to create a clone in another history.

I did notice that the inputs do not quite match up (the inputs are data 1 and data 2, but all other analysis is based on non-existent data 3 and data 4). That means that the first jobs are "data 3 acting on data 3" and "data 4 acting on data 4". Not sure how this could occur if everything was executed in the same history, but it may not matter.

Reply written 2.8 years ago by Jennifer Hillman Jackson
kjb322 wrote, 2.8 years ago:

I have rerun the workflow to correct the odd inputs you mentioned but found the same issue. Actually, by filtering the output VCF from MPileup I found that the variants are there, and so it appears that it is ANNOVAR that I am having trouble with. Even with a filtered VCF containing only variants, ANNOVAR returns an empty file.
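The kind of filtering described here can be sketched as follows. This is only an illustration, not the filter actually used in the history: it assumes a tab-separated VCF, treats any ALT allele written in angle brackets (such as `<X>`) as symbolic, and uses made-up sample records with simplified INFO fields. The function name is hypothetical.

```python
import io


def keep_variant_records(lines):
    """Yield header lines plus data lines that have at least one non-symbolic ALT allele."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        alts = fields[4].split(",")
        if any(a != "." and not a.startswith("<") for a in alts):
            yield line


# Hypothetical input: one reference-only site and one site with an observed alternate base
sample = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "chr10\t1812199\t.\tT\t<X>\t0\t.\tDP=1\tPL\t0,3,4\n"
    "chr10\t1812200\t.\tG\tA,<X>\t0\t.\tDP=12\tPL\t40,0,60,50,70,90\n"
)
kept = list(keep_variant_records(sample))
print(len(kept))
```

Records kept by this filter can still list `<X>` as an additional allele and still carry QUAL 0, because they are likelihood records and not calls. A dedicated calling step (for example bcftools call) normally sits between mpileup and annotation, so it may be worth checking whether the file handed to ANNOVAR has been through one.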
