Question: Problem with new tool definition xml file
luciaaheitor wrote (4 months ago):

Hello,

I've cloned Galaxy and set up a local instance. I followed the suggested initial steps (https://galaxyproject.org/admin/get-galaxy/) and have already installed tools from the Tool Shed, added references, created their indexes, etc. Everything went well up to this point. However, I wanted to add a tool that I can't find in the Tool Shed, so I followed the steps here: https://galaxyproject.org/admin/tools/add-tool-tutorial/. Initially, I wrote the tool definition XML file by hand, but after I added it to tool_conf.xml it never showed up in Galaxy. I figured the XML was invalid, so I generated it with Planemo instead. The XML that Planemo produced works perfectly in Galaxy, but I can't find in Planemo's documentation how to specify the characteristics of the input parameters, so I edited them by hand. After these changes the tool stops showing up in Galaxy; when I revert the file to its original state, it works again.

This is the XML file generated by Planemo:

<tool id="vardict" name="VarDict" version="21.07.2018">
<description>Variant caller</description>
<requirements>
</requirements>
<command detect_errors="exit_code"><![CDATA[
    vardict -G "$input1" -f "$input3" -b "$input2" -F 0 -c 1 -S 2 -E 3 -g 4 "$input4" | teststrandbias.R | var2vcf_valid.pl -f "$input3" -A > "$output1"
]]></command>
<inputs>
    <param type="data" name="input1" format="fasta" />
    <param type="data" name="input2" format="bam" />
    <param type="data" name="input3" format="" />
    <param type="data" name="input4" format="bed" />
</inputs>
<outputs>
    <data name="output1" format="vcf" />
</outputs>
<help><![CDATA[
        /home/lucia/galaxy/tools/vardict/vardict [-n name_reg] [-b bam] [-c chr] [-S start] [-E end] [-s seg_starts] [-e seg_ends] [-x #_nu] [-g gene] [-f freq] [-r #_reads] [-B #_reads] region_info

VarDict is a variant calling program for SNV, MNV, indels (<120 bp default, but can be set using -I option), and complex variants.  It accepts any BAM format, either
from DNA-seq or RNA-seq.  There are several distinct features over other variant callers.  First, it can perform local
realignment over indels on the fly for more accurate allele frequencies of indels.  Second, it rescues softly clipped reads
to identify indels not present in the alignments or support existing indels.  Third, when given the PCR amplicon information,
it will perform amplicon-based variant calling and filter out variants that show amplicon bias, a common false positive in PCR
based targeted deep sequencing.  Fourth, it has very efficient memory management and memory usage is linear to the region of
interest, not the depth.  Fifth, it can handle ultra-deep sequencing and the performance is only linear to the depth.  It has
been tested on depth over 2M reads.  Finally, it has a built-in capability to perform paired sample analysis, intended for
somatic mutation identification, comparing DNA-seq and RNA-seq, or resistant vs sensitive in cancer research.  By default,
the region_info is an entry of refGene.txt from IGV, but can be any region or bed files.

(...)

]]></help>
<citations>
    <citation type="doi">10.1093/nar/gkw227</citation>
    <citation type="bibtex">
@misc{githubVarDict,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {VarDict},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/AstraZeneca-NGS/VarDict},
}</citation>
</citations>
</tool>

I'd like to change the inputs to this:

    <param type="data_meta" name="Reference" format="fasta" />
    <param type="data" name="BAM file" format="bam" />
    <param type="integer" value="0.005" name="Frequency" />
    <param type="data" name="BED file" format="bed" />

Am I doing something wrong? Please let me know!

Hotz, Hans-Rudolf wrote (4 months ago):

Hi

The whitespace characters in the name attributes are most likely the culprit. The name is used as a variable in the command line, so it must not contain spaces. Also, type="data_meta" only works as an attribute for 'filter', as far as I know (for full details see: https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-inputs-param ).

Try:

<param type="data" name="Reference" format="fasta" />
<param type="data" name="BAMfile" format="bam" />
<param type="integer" value="0.005" name="Frequency" />
<param type="data" name="BEDfile" format="bed" />
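
The whitespace point can be checked mechanically. The following is only a minimal sketch, not part of Galaxy or Planemo: it parses an inputs block with Python's standard library and lists the param names that contain whitespace. The snippet content and the helper name names_with_whitespace are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs block, modelled on the names proposed in the question.
INPUTS_XML = """
<inputs>
    <param type="data" name="Reference" format="fasta" />
    <param type="data" name="BAM file" format="bam" />
    <param type="data" name="BED file" format="bed" />
</inputs>
"""


def names_with_whitespace(inputs_xml):
    """Return the param names that contain at least one whitespace character."""
    root = ET.fromstring(inputs_xml)
    return [
        param.get("name")
        for param in root.iter("param")
        if any(ch.isspace() for ch in param.get("name", ""))
    ]


bad_names = names_with_whitespace(INPUTS_XML)
print(bad_names)
```

Any name this reports would need to be renamed (for example to BAMfile) before it can be referenced as a variable in the command.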

Hope this helps

Regards, Hans-Rudolf

written 4 months ago by Hotz, Hans-Rudolf

Thank you for replying!

I'm sorry for the "data_meta" part, it was an oversight on my part. However, I did as you suggested and it still doesn't work... I've run the linting tool from planemo with the changed xml and this was the output:

Applying linter tests... WARNING
.. WARNING: No tests found, most tools should define test cases.
.. WARNING: No valid test(s) found.
Applying linter output... CHECK
.. INFO: 1 outputs found.
Applying linter inputs... CHECK
.. INFO: Found 4 input parameters.
Applying linter help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter general... CHECK
.. CHECK: Tool defines a version [21.07.2018].
.. CHECK: Tool defines a name [VarDict].
.. CHECK: Tool defines an id [vardict].
.. CHECK: Tool targets 16.01 Galaxy profile.
Applying linter command... CHECK
.. INFO: Tool contains a command.
Applying linter citations... CHECK
.. CHECK: Found 2 likely valid citations.
Applying linter tool_xsd... CHECK
.. INFO: File validates against XML schema.
Failed linting

The xml indeed doesn't include a test case but I don't think this is the issue. Does this give more insight on the issue?

The xml as it is now:

<tool id="vardict" name="VarDict" version="21.07.2018">
<description>Variant caller</description>
<requirements>
</requirements>
<command detect_errors="exit_code"><![CDATA[
    vardict -G "$input1" -f "$input3" -b "$input2" -F 0 -c 1 -S 2 -E 3 -g 4 "$input4" | teststrandbias.R | var2vcf_valid.pl -f "$input3" -A > "$output1"
]]></command>
<inputs>
<param type="data" name="Reference" format="fasta" />
<param type="data" name="BAMfile" format="bam" />
<param type="integer" value="0.005" name="Frequency" />
<param type="data" name="BEDfile" format="bed" />
</inputs>
<outputs>
    <data name="output1" format="vcf" />
</outputs>
<help><![CDATA[
        /home/lucia/galaxy/tools/vardict/vardict [-n name_reg] [-b bam] [-c chr] [-S start] [-E end] [-s seg_starts] [-e seg_ends] [-x #_nu] [-g gene] [-f freq] [-r #_reads] [-B #_reads] region_info

VarDict is a variant calling program for SNV, MNV, indels (<120 bp default, but can be set using -I option), and complex variants.  It accepts any BAM format, either
from DNA-seq or RNA-seq.  There are several distinct features over other variant callers.  First, it can perform local
realignment over indels on the fly for more accurate allele frequencies of indels.  Second, it rescues softly clipped reads
to identify indels not present in the alignments or support existing indels.  Third, when given the PCR amplicon information,
it will perform amplicon-based variant calling and filter out variants that show amplicon bias, a common false positive in PCR
based targeted deep sequencing.  Fourth, it has very efficient memory management and memory usage is linear to the region of
interest, not the depth.  Fifth, it can handle ultra-deep sequencing and the performance is only linear to the depth.  It has
been tested on depth over 2M reads.  Finally, it has a built-in capability to perform paired sample analysis, intended for
somatic mutation identification, comparing DNA-seq and RNA-seq, or resistant vs sensitive in cancer research.  By default,
the region_info is an entry of refGene.txt from IGV, but can be any region or bed files.

(...)

]]></help>
<citations>
    <citation type="doi">10.1093/nar/gkw227</citation>
    <citation type="bibtex">
@misc{githubVarDict,
  author = {LastTODO, FirstTODO},
  year = {TODO},
  title = {VarDict},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/AstraZeneca-NGS/VarDict},
}</citation>
</citations>
</tool>

modified 4 months ago • written 4 months ago by luciaaheitor

You need to adjust the command line to use the new variable names $Reference, $BAMfile, $Frequency and $BEDfile instead of $input1, etc.
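
A mismatch between the command and the declared names can be spotted with a rough check like the one below. This is a sketch under stated assumptions, not how Galaxy itself resolves variables: the tool XML is an abbreviated, hypothetical version of the one in this thread, the helper undefined_variables is invented for illustration, and the simple regular expression ignores Cheetah directives and dotted attribute access.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical tool: the inputs were renamed, the command was not.
TOOL_XML = """
<tool id="vardict" name="VarDict" version="21.07.2018">
<command><![CDATA[
    vardict -G "$input1" -b "$input2" "$input4" > "$output1"
]]></command>
<inputs>
    <param type="data" name="Reference" format="fasta" />
    <param type="data" name="BAMfile" format="bam" />
    <param type="data" name="BEDfile" format="bed" />
</inputs>
<outputs>
    <data name="output1" format="vcf" />
</outputs>
</tool>
"""


def undefined_variables(tool_xml):
    """Return $variables used in <command> that no param or output declares."""
    root = ET.fromstring(tool_xml)
    defined = {param.get("name") for param in root.iter("param")}
    defined |= {data.get("name") for data in root.iter("data")}
    # Matches both $name and ${name}; anything fancier is out of scope here.
    used = set(re.findall(r"\$\{?(\w+)", root.findtext("command") or ""))
    return sorted(used - defined)


missing = undefined_variables(TOOL_XML)
print(missing)
```

Here the command still refers to the old input names, so each of them is reported; after replacing them with $Reference, $BAMfile and $BEDfile the list would be empty.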

written 4 months ago by Hotz, Hans-Rudolf

As suggested, I did just that and it still doesn't appear in Galaxy. Running the linting tool again fails in the same way as before. I also tried going back to Planemo's original XML and changing only the integer parameter (leaving the names as input1, etc.), and it still gives the same error.

    <command detect_errors="exit_code"><![CDATA[
    vardict -G "$input1" -f "$input3" -b "$input2" -F 0 -c 1 -S 2 -E 3 -g 4 "$input4" | teststrandbias.R | var2vcf_valid.pl -f "$input3" -A > "$output1"
]]></command>
<inputs>
    <param type="data" name="input1" format="fasta" />
    <param type="data" name="input2" format="bam" />
    <param type="integer" value="0.005" name="input3" />
    <param type="data" name="input4" format="bed" />
</inputs>

I'm really sorry for the trouble but I really can't figure the problem out...

written 4 months ago by luciaaheitor

What is the error you get when you restart Galaxy and it tries to load the tool?

written 4 months ago by Hotz, Hans-Rudolf

The tool just doesn't show on Galaxy:

What I get with the planemo xml altered https://ibb.co/gfKMKT

What I get without altering it: https://ibb.co/kXY968

modified 4 months ago • written 4 months ago by luciaaheitor

I believe you that the tool doesn't show up. My suggestion was to restart Galaxy and look for the error you get when it tries to load the tool.

If you do that, you will get the following error: "ValueError: An integer is required"

This is a typical oversight: 0.005 is not an integer. Change the line to:

<param type="float" value="0.005" name="input3"/>

and it should work
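
The same kind of mismatch can be caught before restarting Galaxy. The sketch below does not reproduce Galaxy's own validation; it only mimics the idea by trying to convert each numeric param's default value with Python's int or float. The snippet content, the param names frequency and gene_column, and the helper invalid_numeric_params are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs block: the first param repeats the mistake from this thread.
INPUTS_XML = """
<inputs>
    <param type="integer" value="0.005" name="input3" />
    <param type="float" value="0.005" name="frequency" />
    <param type="integer" value="4" name="gene_column" />
</inputs>
"""


def invalid_numeric_params(inputs_xml):
    """Return (name, type, value) for numeric params whose value does not parse."""
    converters = {"integer": int, "float": float}
    problems = []
    for param in ET.fromstring(inputs_xml).iter("param"):
        convert = converters.get(param.get("type"))
        value = param.get("value")
        if convert is None or value is None:
            continue  # not a numeric param, or no default value to check
        try:
            convert(value)
        except ValueError:
            problems.append((param.get("name"), param.get("type"), value))
    return problems


problems = invalid_numeric_params(INPUTS_XML)
print(problems)
```

Only the integer param with the value 0.005 is reported; declaring it as type="float" makes the default value valid.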

written 4 months ago by Hotz, Hans-Rudolf

You were right, I got that error. I have now changed the xml as you suggested and everything is now working properly. Thank you so much for your time!

written 4 months ago by luciaaheitor