Question: Error when running Galaxy on fasta sequences
cerrirc wrote, 3.1 years ago:

Hi all,

I am running Repeat Explorer on a fasta file with about 8,000 sequences.

I am getting the following error when executing the Clustering tool for repeat discovery:


 

Traceback (most recent call last):

  File "/mnt/raid_galaxy/home/galaxy/galaxy-dist/tools/umbr_programs/seqclust2/programs/all2all_comparison.py", line 240, in <module>
    int(options.min_overlap))
  File "/mnt/raid_galaxy/home/galaxy/galaxy-dist/tools/umbr_programs/seqclust2/programs/all2all_comparison.py", line 207, in all2all_comparison
    clean_hitsort(mgblast_output_files, hitsort_file, ids)
  File "/mnt/raid_galaxy/home/galaxy/galaxy-dist/tools/umbr_programs/seqclust2/programs/all2all_comparison.py", line 145, in clean_hitsort
    ids_destination[line_items[0]].write(line)
KeyError: 'ARSiTERT00200003,'

Does anyone know what could be causing this?

Regards,

Ricardo

Jennifer Hillman Jackson wrote, 3.1 years ago:

Hello,

This is a Python error. Given what I think this tool is doing and the input types involved, I suggest double-checking the format of the fasta identifiers and making sure there are no duplicates. You might need to remove extra content from the description lines, leaving only the identifiers on the ">" fasta lines.
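
For anyone who wants to run this check locally before re-uploading, here is a minimal Python sketch. It is not part of Repeat Explorer or Galaxy; the function name `check_fasta_ids` and the set of "safe" identifier characters (letters, digits, `_`, `.`, `-`) are assumptions made for illustration. Note that the key in the traceback, 'ARSiTERT00200003,', ends with a comma, so an identifier like that would be flagged here.

```python
import re
from collections import Counter

# Assumption: identifiers made only of these characters are considered "clean".
SAFE_ID = re.compile(r"[A-Za-z0-9_.\-]+")

def check_fasta_ids(lines):
    """Return (duplicate_ids, suspicious_headers) for an iterable of FASTA lines."""
    ids = []
    suspicious = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        header = line[1:].rstrip("\n")
        parts = header.split(None, 1)  # split at the first whitespace
        seq_id = parts[0] if parts else ""
        ids.append(seq_id)
        # Flag headers that carry a description after the ID, or whose ID
        # contains punctuation such as a trailing comma.
        if len(parts) > 1 or not SAFE_ID.fullmatch(seq_id):
            suspicious.append(header)
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return duplicates, suspicious
```

To use it on a file, pass an open file handle, for example `check_fasta_ids(open("reads.fasta"))`, where `reads.fasta` is a placeholder name.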

Just as a guess: extra content in the description lines (everything that follows the first word and the first whitespace after the ">") may not be parsed out completely by this tool. That would mean the identifier megablast parsed out does not match the identifier the tool indexed. To test this, cycle the fasta file through Fasta-to-Tabular (breaking the description line into two fields), then Tabular-to-Fasta (selecting the first field as the "identifier" and the third field as the "sequence", leaving the second field of "extra description" content behind). Wrapping the fasta lines may also help.
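
The same cleanup can be sketched outside Galaxy in a few lines of Python. This is only an illustration of the idea, not what the Galaxy tools do internally: `clean_fasta`, the default line width of 60, and the choice to strip a trailing comma or semicolon from the identifier are all assumptions (the last one prompted by the comma in the KeyError above).

```python
def clean_fasta(lines, width=60, strip_chars=",;"):
    """Yield FASTA lines with ID-only headers and sequences wrapped to `width`."""
    seq_parts = []

    def flush():
        # Emit the sequence collected so far as fixed-width lines.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            words = line[1:].split()
            # Keep only the first word; drop trailing punctuation (assumption).
            seq_id = words[0].rstrip(strip_chars) if words else ""
            yield ">" + seq_id
        else:
            seq_parts.append(line)
    yield from flush()  # last record
```

For example, a record with the header `>ARSiTERT00200003, extra text` comes out with the header `>ARSiTERT00200003`. If stripping punctuation makes two identifiers identical, the duplicates still need to be resolved by hand.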

I am not sure where you are working, but if this format change does not work, or has already been done, contacting the instance owner or the tool author is the next step.

Thanks, Jen, Galaxy team
