How to annotate 500 VCF files using SnpEff or something else and put each separate result in a folder named after the SRA identifier?

msprindzhuk • 50 wrote, 19 months ago:

How can I annotate 500 VCF files using SnpEff or something else and put each separate result in a folder named after the SRA identifier? Can this be done using Galaxy tools?

snpeff batch annotation • 745 views

modified 19 months ago by Guy Reeves • 1.0k • written 19 months ago by msprindzhuk • 50

If the datasets (files) are named with the SRA identifier then there is a way which I can explain.

written 19 months ago by Guy Reeves • 1.0k

OK, please explain.

written 19 months ago by msprindzhuk • 50

Guy Reeves • 1.0k (Germany) wrote, 19 months ago:

OK, so this assumes you have the 500 VCF files in one history AND that dataset renaming works for this tool (it does for about 90% of tools).

1 Make a short workflow with the required input datasets connected to the SnpEff tool. The point of using a workflow is that it lets you use the dataset renaming capability. In the tool's panel in the workflow editor, scroll down and click on 'Configure Output: 'snpeff_output''. This should open some options.

2 Go to 'Rename dataset' and enter #{input} in the box. This uses the input name given at the top of the tool info, 'Data input 'input' (vcf, tabular, pileup or bed)'. Hopefully your output datasets will then be named the same as the input VCF datasets. If you want more info on naming, use the 'Click here for more information.' link near the 'Rename dataset' box; it explains how to add a suffix to the dataset name.

3 (Optional) Personally I also add something to the 'Tags' box just below, as this allows you to easily collect all the output files into a single history using this trick: trick to collect tagged datasets

4 Click outside of the last box you edited and save the workflow (don't forget).

5 Go to the history with all the datasets you want to work with and run the workflow you saved. Where you select the VCF files, click on the little icon which looks like a pile of papers; this will allow you to select multiple VCFs at a time. For testing, just select a few; don't go for 500 the first time.

6 Select the other files required by the tool, then scroll down to the bottom. Check the 'Send results to a new history' check box, then click 'Run workflow'.

7 If you are doing just a few files then you should see a list of named datasets appear. This will tell you if the renaming has worked as you wanted. (When you do this for 500 files you will almost certainly get a 'refresh error', which you can ignore.) A new history will be generated for each VCF.

8 You can go to users>saved histories to monitor progress across all the histories. As you refresh this you will see one history created for each workflow run, and the datasets should be named after the original file.
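If you ever need the same "one result per input, named after the SRA identifier" layout outside Galaxy, here is a minimal Python sketch of one way to do it on the command line. It is not part of the Galaxy steps above, and it rests on assumptions: the VCF file names start with an SRA run accession (e.g. SRR1234567.vcf), SnpEff is installed locally as snpEff.jar, and the genome database name you pass (GRCh37.75 is only an example) is one you have downloaded. The function names and the output file names are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import re
import subprocess
from pathlib import Path

# Assumption: each file name starts with an SRA run accession (SRR/ERR/DRR + digits).
SRA_RUN = re.compile(r"^([SED]RR\d+)")


def sra_id(vcf: Path) -> str:
    """Extract the SRA run accession from the start of the file name."""
    match = SRA_RUN.match(vcf.name)
    if match is None:
        raise ValueError(f"no SRA run accession at start of {vcf.name!r}")
    return match.group(1)


def plan_job(vcf: Path, out_root: Path, jar: str, genome: str):
    """Return (output VCF path, SnpEff command) for one input VCF."""
    sample = sra_id(vcf)
    out_dir = out_root / sample            # one folder per SRA identifier
    out_vcf = out_dir / f"{sample}.ann.vcf"
    cmd = [
        "java", "-jar", jar,
        "-stats", str(out_dir / "snpEff_summary.html"),  # HTML summary per sample
        genome,                            # hypothetical choice, e.g. "GRCh37.75"
        str(vcf),
    ]
    return out_vcf, cmd


def annotate_all(vcf_dir, out_root, jar, genome, dry_run=True):
    """Annotate every *.vcf in vcf_dir; with dry_run=True only print the commands."""
    outputs = []
    for vcf in sorted(Path(vcf_dir).glob("*.vcf")):
        out_vcf, cmd = plan_job(vcf, Path(out_root), jar, genome)
        outputs.append(out_vcf)
        if dry_run:
            print(" ".join(cmd), ">", out_vcf)
            continue
        out_vcf.parent.mkdir(parents=True, exist_ok=True)
        # SnpEff writes the annotated VCF to standard output.
        with open(out_vcf, "w") as handle:
            subprocess.run(cmd, stdout=handle, check=True)
    return outputs
```

As with the Galaxy workflow, try it on a few files first, e.g. annotate_all("vcfs", "annotated", "snpEff.jar", "GRCh37.75"), and only pass dry_run=False once the printed commands look right.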

Is this what you wanted? Did it work?

cheers

Guy

modified 19 months ago • written 19 months ago by Guy Reeves • 1.0k
