Having a few teething problems on my first use of Galaxy (surprise!)
Workflow as follows:
Upload fastq files (forward and reverse)
Fastq groomer
Trimmomatic
BWA
SAM to BAM
MPileup
ANNOVAR
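For reference, the workflow above corresponds roughly to the following command-line sketch (outside Galaxy). This is only a sketch under assumptions: the file names (`sample_R1.fastq`, `hg19.fa`, etc.) are placeholders, and modern `bcftools mpileup`/`call` stands in for the older Galaxy MPileup tool. The script exits early if the inputs or tools are not present, since it is illustrative rather than a tested recipe:

```shell
#!/bin/sh
# Sketch only: placeholder file names; exits early if inputs/tools are absent.
[ -f sample_R1.fastq ] || { echo "inputs not present; nothing to do"; exit 0; }
for tool in trimmomatic bwa samtools bcftools; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool not installed"; exit 0; }
done

# 1. Quality/adapter trimming (Trimmomatic, paired-end mode)
trimmomatic PE sample_R1.fastq sample_R2.fastq \
    trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \
    SLIDINGWINDOW:4:20 MINLEN:36

# 2. Map with BWA-MEM, then sort and index the BAM
bwa mem hg19.fa trimmed_R1.fastq trimmed_R2.fastq | samtools sort -o sample.bam -
samtools index sample.bam

# 3. Pile up and call variants; "-v" keeps variant sites only, so the VCF
#    is not one record per covered base
bcftools mpileup -f hg19.fa sample.bam | bcftools call -mv -Ov -o sample.vcf
```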
ANNOVAR gives an empty VCF file and, on closer inspection, MPileup gives 55,000,000 lines from 3,600,000 lines in the SAM.
MPileup appears to have called every base, with each line looking like this:
chr10  1812199  .  T  <X>  0  .  DP=1;I16=1,0,0,0,27,729,0,0,0,0,0,0,0,0,0,0;QS=1,0;MQ0F=1  PL  0,3,4
The ALT base is <X> on every line. I assumed it was a reference genome mismatch, but I have tried several reference genomes and get the same (or a similar) problem.
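For what it's worth, `<X>` (written `<*>` in newer samtools/bcftools versions) is mpileup's placeholder for an unobserved alternate allele, so a record per covered base is expected unless a variant-calling step filters the output down to real variants. A minimal sketch of that filtering in Python, with the VCF column layout assumed from the record above and the sample lines invented for illustration:

```python
# Sketch: keep only VCF records whose ALT is a real allele, dropping the
# per-base <X>/<*> placeholder records that mpileup emits for every position.

def real_variants(vcf_lines):
    """Yield header lines and records whose ALT column is a real allele."""
    for line in vcf_lines:
        if line.startswith("#"):           # keep header lines untouched
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]                    # VCF column 5 is ALT
        if alt not in (".", "<X>", "<*>"):
            yield line

# Invented sample data mimicking the record quoted above (tab-separated VCF)
sample = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr10\t1812199\t.\tT\t<X>\t0\t.\tDP=1\tPL\t0,3,4",
    "chr10\t1812300\t.\tG\tA\t60\t.\tDP=20\tPL\t120,0,90",
]
kept = list(real_variants(sample))
print(len(kept))  # → 2 (the header plus the one real variant)
```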
Any offers?
If you can reproduce this at http://usegalaxy.org, please send in a bug report (from any error dataset; it doesn't have to be from this particular analysis, just a tool in the same history). Or create a shared history link and email that to galaxy-bugs@lists.galaxyproject.org. Please note the dataset numbers involved.
Make sure that all datasets in the analysis are undeleted for at least one complete pass from start to end.
Include a link to this Biostars post so we can cross-reference the two (in the email or bug report comments).
So that you know, ANNOVAR is only pre-cached with one genome at Galaxy Main (hg19). MPileup's advanced settings could also be a factor. Sharing is the best way to see everything at once and get to the root of the problem.
Thanks, Jen, Galaxy team
The history has become pretty messy, as I have been trying everything, first to identify the point where it goes wrong and then to try to solve it.
https://usegalaxy.org/u/kelly-hunter/h/galaxy-presentation
Thank you kindly!
I'll take a look and get back to you by early next week. Thanks!
P.S. Sharing this way means that everyone has public access to your data. If you don't want that, unshare it and share directly with me instead. Send an email to the galaxy-bugs list to arrange that.
Jen
It's fine, it's public data anyway!
Any help at all by Monday would be greatly appreciated if possible and thanks again either way.
P.S. In order not to interfere with that history, is it possible for me to use the data uploaded to this history from another one?
Yes, use the "copy datasets" function to create a clone in another history.
I did notice that the inputs do not quite match up (the inputs are data 1 and data 2, but all other analysis is based on non-existent data 3 and data 4). That means the first jobs are "data 3 acting on data 3" and "data 4 acting on data 4". I'm not sure how this could occur if everything was executed in the same history, but it may not matter.