Question: Having a problem running RepeatMasker from the Galaxy Toolshed
scambier wrote:

Hello-

I'm trying to run RepeatMasker from the Galaxy Tool Shed (I'm using CloudMan). I'm getting a number of errors, such as the following:

tail: cannot open `/mnt/galaxy/tmp/tmpngVXxB/dataset_212.dat.out' for reading: No such file or directory

 

I have zero command-line/Linux experience, so this error message is beyond me. My question is: is this a problem with my machine or data, or is it a problem with the RepeatMasker tool? Is anyone currently using the RepeatMasker tool successfully? (My files are too big to run through the public RepeatMasker server at ISB, so I was hoping Galaxy would be my answer.) Thanks so much,

written 4.2 years ago by scambier
Dannon Baker wrote:

What's happening here is that the RepeatMasker repository in the Tool Shed only installs the 'wrapper' for the tool, that is, the Galaxy component. RepeatMasker itself still needs to be installed separately.

The first error, which was thought to be a data access problem, actually arises because RepeatMasker (when functioning correctly) creates a temporary file that the wrapper expects to find. Since the RepeatMasker executable isn't available, that file is never created, and things blow up when the wrapper gets to that step.
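A minimal sketch of how one might check for this situation, assuming the executable is named `RepeatMasker` and that the check runs as the same user and on the same machine as the Galaxy job (the `PATH` seen by a cluster job can differ from the one in an interactive shell). The function names here are illustrative and not part of Galaxy or the wrapper.

```python
import shutil


def find_executable(name="RepeatMasker"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def describe(name="RepeatMasker"):
    """Build a short human-readable message about whether `name` is installed."""
    path = find_executable(name)
    if path is None:
        return f"{name} not found on PATH; a wrapper that calls it will fail"
    return f"{name} found at {path}"


if __name__ == "__main__":
    print(describe())
```

If this reports that the executable is missing, the wrapper has nothing to call, which is consistent with an output file never being created.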

Here's the readme file associated with that repository (I assume this is the one you're using), where it talks a little about the installation procedure:  https://toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/file/5673e72241aa/readme.rst.  If you'd like to give it a shot, let me know, and I can talk you through actually doing this on your cloud instance.

-Dannon

written 4.2 years ago by Dannon Baker

Oh wow (embarrassed facepalm). This is what I get for not reading the 'read me' file.

I actually tried installing RepeatMasker on a Linux machine in Amazon's cloud, but couldn't figure it out with my essentially zero computer skills. I'd love to give it a try in Galaxy, but I am away from my work computer (and thus decent internet) until tomorrow at 9 am Pacific time. If you could help me sometime after that, I'd really appreciate it. Thanks so much,

Stephanie Cambier Stetson Lab

written 4.2 years ago by scambier

Sorry for the trouble. I should add proper tool dependencies and a data manager, but this will take time.

written 4.2 years ago by Bjoern Gruening
Jennifer Hillman Jackson wrote:

Hello,

This is a data access problem. Have you run other tools successfully on your cloud instance before?

Just so you know, for large files and RepeatMasker: breaking the job up into smaller chunks and then merging the output is not expected to change the results.
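As a rough illustration of the chunking idea, here is a minimal sketch that splits a FASTA file into smaller files with a fixed number of sequences each. The function names, the output naming scheme, and the chunk size are all assumptions for the example; merging the per-chunk RepeatMasker output afterwards is not shown.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists holding at most `chunk_size` records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def split_fasta(in_path, out_prefix, chunk_size):
    """Write numbered chunk files (hypothetical naming) and return their paths."""
    paths = []
    with open(in_path) as handle:
        chunks = chunk_records(read_fasta(handle), chunk_size)
        for index, chunk in enumerate(chunks, start=1):
            path = f"{out_prefix}.{index:04d}.fasta"
            with open(path, "w") as out:
                for header, seq in chunk:
                    out.write(f">{header}\n{seq}\n")
            paths.append(path)
    return paths
```

For example, `split_fasta("reads.fasta", "reads_chunk", 100000)` would write files of up to 100,000 sequences each; both file names and the chunk size are placeholders to adjust to the server's limits.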

Thanks, Jen, Galaxy team

written 4.2 years ago by Jennifer Hillman Jackson

Thanks so much for the quick reply!

Yes, everything else in the (now terminated) CloudMan instance was working great: I uploaded my reads, trimmed, groomed, mapped, and converted SAM to BAM and BAM to SAM, all just fine. But when I converted my groomed FASTQ to FASTA to run through RepeatMasker, I got the above error.

And if I can get this working, what size file would you suggest running through RepeatMasker? I've got 6 MiSeq runs of 2.5 million reads each that I'd like to run through (which might be overly ambitious?).

written 4.2 years ago by scambier

How to chunk up the data depends on that server's resources. The suggestion was meant for the other public site; they might post file size limits.

For Galaxy, the error is odd if everything else was working. I would try it again allocating more memory. Perhaps the dataset was corrupted in some way from the prior manipulation or the server ran out of disk space. 

written 4.2 years ago by Jennifer Hillman Jackson

Hi there. I tried again in a new CloudMan instance and got the same error message as before. I uploaded my FASTQ via FTP (Illumina 1.8+), groomed it, converted it to FASTA (I did not rename the sequences when I did so), then tried to run the FASTA through RepeatMasker. Here's a second error message I'm also getting, in addition to the first one listed above:

/opt/sge/default/execd/spool/ip-10-191-57-33/job_scripts/4: line 34: RepeatMasker: command not found

Does this second error message also indicate a problem with data access? And thanks so much for your help.

written 4.2 years ago by scambier

This looks like a tool install problem now (which could have led to the other). I would double check how you did this - the repository owned by Björn Grüning has very detailed setup instructions in the README to follow. He also frequents this board so may offer advice. Or you can contact any repository owner through the Tool Shed if there is a problem.

As for input format, once this install is finished, just make sure the FASTA files are in proper format. I can't recall if the tool has an issue with spaces in the identifier names (many NGS reads have them), but you can always try renaming a few to test whether you get an error. If these are longer sequences, you probably need to wrap them after converting from FASTQ to FASTA, but I doubt that would apply in your case.
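A small sketch of what such a cleanup could look like, assuming that replacing whitespace in identifiers with underscores is acceptable and that a 60-character line width is wanted. Both choices, and the function names, are assumptions for illustration; whether RepeatMasker actually needs either change is not confirmed here.

```python
def clean_header(header):
    """Replace runs of whitespace in a FASTA identifier with single underscores."""
    return "_".join(header.split())


def wrap_sequence(seq, width=60):
    """Break a sequence into a list of lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def format_record(header, seq, width=60):
    """Return one FASTA record as text, with a cleaned header and wrapped sequence."""
    lines = [">" + clean_header(header)] + wrap_sequence(seq, width)
    return "\n".join(lines) + "\n"
```

Replacing spaces rather than cutting the identifier at the first space keeps names distinct when the part after the space is what tells two reads apart.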

Good luck! Jen, Galaxy team

written 4.2 years ago by Jennifer Hillman Jackson

Oops, I should have read the other replies and not just my own messages. Looks like Dannon has you sorted out with similar advice. Great! Jen

written 4.2 years ago by Jennifer Hillman Jackson