Galaxy sourcing reference data from location not in data table


2.6 years ago by

Belgium

Hi all,

I'm having an issue with the Picard tools downloaded from the toolshed. When I try to use the CollectInsertSizeMetrics tool in the Picard suite, I'm asked to provide a reference genome as input. This should be no problem, since the tool should read the available references from the all_fasta.loc file.

However, I noticed that the tool apparently sources its references from some other location too. This results in two entries for the hg19 genome, which translate into a comma-separated list of the input paths being passed as the argument to the tool. The tool then enters an error state, as this isn't a legal argument.
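To make the symptom concrete, here is a minimal sketch of how two table rows for the same genome could end up as one comma-separated argument. It is a simplified model, not Galaxy's actual code: it assumes the framework joins the `path` field of every data table row whose value matches the selected genome. The function name `resolve_field` and the merged table are hypothetical; the two hg19 paths are the ones mentioned in this post.

```python
# Simplified model, NOT Galaxy's actual implementation: assume the requested
# field is collected from every row whose "value" column matches the selection
# and the results are joined with a separator.
def resolve_field(rows, selected_value, field, separator=","):
    """Join `field` from every row whose 'value' equals `selected_value`."""
    matches = [row[field] for row in rows if row["value"] == selected_value]
    return separator.join(matches)


# Hypothetical merged table: hg19 appears twice, with two different paths.
rows = [
    {"value": "hg19", "path": "/Shared/references/hg19/seq/hg19.fa"},
    {"value": "hg19", "path": "~/galaxy-dist/tool-data/hg19/seq/hg19.fa"},
    {"value": "mm10", "path": "/Shared/references/mm10/seq/mm10.fa"},
]

# With two hg19 rows, the result is a single string holding both paths,
# which is not a valid REFERENCE_SEQUENCE argument.
print(resolve_field(rows, "hg19", "path"))
```

Under this assumption, a genome with a single row (mm10 here) resolves to one clean path, while hg19 resolves to both paths joined by a comma.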

Does anyone have any idea where Galaxy reads its reference files from, apart from the loc files in tool-data? It seems to look for data in the folder defined in the data table, but it also seems to include the ~/galaxy-dist/tool-data/hg19/seq/hg19.fa path for some reason.
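One way to narrow this down might be to look for other copies of all_fasta.loc below the Galaxy directory, on the assumption (not confirmed here) that a second copy, for example one shipped with an installed tool, is also being loaded into the data table. A minimal sketch; `find_loc_files` is a hypothetical helper and `~/galaxy-dist` is the install path mentioned above:

```python
import os


def find_loc_files(root, filename="all_fasta.loc"):
    """Return the sorted paths of every file named `filename` below `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return sorted(found)


if __name__ == "__main__":
    # Assumed install location, taken from the path quoted in this post.
    galaxy_root = os.path.expanduser("~/galaxy-dist")
    for path in find_loc_files(galaxy_root):
        print(path)
```

If more than one file is listed, comparing their hg19 lines would show whether the second path comes from one of them.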

This also occurs in all other Picard-related tools that require a reference FASTA input.

Thanks! M

Extra info:

tracking Galaxy release branch 16.01
picard tools revision 11:efc56ee1ade4 (https://toolshed.g2.bx.psu.edu/repository?repository_id=c45d6c51a4fcfc6c)
Ubuntu 14.04 / Python 2.7

~/galaxy-dist/tool-data/all_fasta.loc:

mm10    mm10    Mouse (Mus Musculus): mm10      /Shared/references/mm10/seq/mm10.fa
danRer7 danRer7 Zebrafish (Danio rerio): danRer7        /Shared/references/danRer7/seq/danRer7.fa
hg19    hg19    Human (Homo sapiens) (b37): hg19        /Shared/references/hg19/seq/hg19.fa
hg_g1k_v37      hg_g1k_v37      Human (Homo sapiens) (b37): hg_g1k_v37  /Shared/references/hg_g1k_v37/seq/hg_g1k_v37.fa
hg38    hg38    Human (Homo sapiens) (b38): hg38        /Shared/references/hg38/seq/hg38.fa
equCab2 equCab2 Horse (Equus caballus): equCab2 /Shared/references/equCab2/seq/equCab2.fa
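For checking a loc file like the one above for repeated entries, a small parser can help. This is a sketch under assumptions: it expects tab-separated columns in the order value, dbkey, name, path (the whitespace in the listing above may have been altered by the forum), and the names `parse_loc` and `duplicated_values` are hypothetical.

```python
from collections import defaultdict

# Assumed column order for all_fasta.loc: value, dbkey, display name, path.
COLUMNS = ("value", "dbkey", "name", "path")


def parse_loc(text):
    """Parse tab-separated loc content, skipping blank and comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # ignore lines that do not have exactly four columns
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def duplicated_values(rows):
    """Map each value that occurs in more than one row to its list of paths."""
    paths = defaultdict(list)
    for row in rows:
        paths[row["value"]].append(row["path"])
    return {value: p for value, p in paths.items() if len(p) > 1}


# Illustrative content only: the second hg19 line is added to show a duplicate.
example = (
    "#value\tdbkey\tname\tpath\n"
    "hg19\thg19\tHuman (Homo sapiens) (b37): hg19\t/Shared/references/hg19/seq/hg19.fa\n"
    "hg19\thg19\tHuman (Homo sapiens) (b37): hg19\t~/galaxy-dist/tool-data/hg19/seq/hg19.fa\n"
    "mm10\tmm10\tMouse (Mus Musculus): mm10\t/Shared/references/mm10/seq/mm10.fa\n"
)
print(duplicated_values(parse_loc(example)))
```

The file shown above has one line per genome, so on its own it would yield no duplicates; the check is more useful when run over the combined content of every loc file that feeds the same data table.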

excerpt from the offending tool xml:

<command>
    @java_options@
    ##set up input files

    #set $reference_fasta_filename = "localref.fa"

    #if str( $reference_source.reference_source_selector ) == "history":
        ln -s "${reference_source.ref_file}" "${reference_fasta_filename}" &amp;&amp;
    #else:
        #set $reference_fasta_filename = str( $reference_source.ref_file.fields.path )
    #end if

    java -jar \$JAVA_JAR_PATH/picard.jar
    CollectInsertSizeMetrics
    INPUT="${inputFile}"
    OUTPUT="${outFile}"
    HISTOGRAM_FILE="${histFile}"
    DEVIATIONS="${deviations}"

    #if str( $hist_width ):
      HISTOGRAM_WIDTH="${hist_width}"
    #end if

    MINIMUM_PCT="${min_pct}"
    REFERENCE_SEQUENCE="${reference_fasta_filename}"
    ASSUME_SORTED="${assume_sorted}"
    METRIC_ACCUMULATION_LEVEL="${metric_accumulation_level}"

    VALIDATION_STRINGENCY="${validation_stringency}"
    QUIET=true
    VERBOSITY=ERROR

  </command>
  <inputs>
    <param format="sam,bam" name="inputFile" type="data" label="Select SAM/BAM dataset or dataset collection" help="If empty, upload or import a SAM/BAM dataset."/>
    <conditional name="reference_source">
      <param name="reference_source_selector" type="select" label="Load reference genome from">
        <option value="cached">Local cache</option>
        <option value="history">History</option>
      </param>
      <when value="cached">
        <param name="ref_file" type="select" label="Using reference genome" help="REFERENCE_SEQUENCE">
          <options from_data_table="all_fasta">
          </options>
          <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
        </param>
      </when>
      <when value="history">
        <param name="ref_file" type="data" format="fasta" label="Use the folloing dataset as the reference sequence" help="REFERENCE_SEQUENCE; You can upload a FASTA sequence to the history and use it as reference" />
      </when>
    </conditional>

data_tables picard • 878 views


modified 2.6 years ago • written 2.6 years ago by matthias.desmet • 150
