Question: Cuffdiff Errors: "Loading reference annotation" and "Inspecting maps and determining fragment length distributions".
agrant30 wrote:

Hey,

I have 30 individuals with two samples each. I've had no problem running Cuffdiff between the individuals' samples, but I keep getting the "Loading reference annotation" and "Inspecting maps and determining fragment length distributions" errors when running all 60 samples (30 samples for each condition) through Cuffdiff on my Galaxy cloud instance.

Because I was running out of RAM on the public Galaxy server, I transferred all of my accepted hits files, along with my merged transcript file, onto a cloud instance, but the above error keeps popping up.

I've seen some posts describing a similar problem, but I couldn't find a direct solution. I've also double-checked that all the files match up with the Cuffmerge dataset and that the same reference annotation was used during Cufflinks and Cuffmerge. I hope I don't have to rerun TopHat and Cufflinks on my Galaxy cloud instance, but I'm starting to get desperate.

Update

To identify which files were causing the errors, I split my 60 samples (30 individuals) into two groups and re-ran Cuffmerge on each group. When I re-ran Cuffdiff on the new merged transcripts, the first group ran without any problems, while the second group stopped prematurely with the same error described above. To investigate further, I split the last 15 individuals into 3 groups, but after running Cuffmerge and Cuffdiff on these 3 groups, they all completed without any errors. Now I'm even more confused about what is happening...
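
The splitting strategy described in this update can be sketched in Python. This is only an illustration under stated assumptions: `run_ok` is a hypothetical callback standing in for "run Cuffmerge and Cuffdiff on this group and report whether it finished", and none of the names below come from Galaxy or Cufflinks. The sketch keeps halving any group that fails; if a group fails but both of its halves pass, it reports the whole group, which points toward a size or resource limit rather than a single bad file.

```python
from typing import Callable, List, Sequence, Tuple


def split_in_half(samples: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split a group of sample names into two halves."""
    mid = len(samples) // 2
    return list(samples[:mid]), list(samples[mid:])


def find_failing_groups(
    samples: Sequence[str],
    run_ok: Callable[[Sequence[str]], bool],
    min_size: int = 1,
) -> List[List[str]]:
    """Return the smallest groups that still fail.

    run_ok is a hypothetical callback: it should run the analysis on the
    given samples and return True if the run completed without errors.
    """
    if run_ok(samples):
        return []  # this group is fine, nothing to report
    if len(samples) <= min_size:
        return [list(samples)]  # cannot split further: a failing sample
    left, right = split_in_half(samples)
    failing = (
        find_failing_groups(left, run_ok, min_size)
        + find_failing_groups(right, run_ok, min_size)
    )
    # If both halves pass on their own, the failure only appears when the
    # samples are combined, so report the combined group itself.
    return failing or [list(samples)]
```

With a stand-in `run_ok` that fails whenever one particular sample is present, the function narrows down to that sample. With a stand-in that fails only above a certain group size, it returns whole groups instead, which matches the pattern seen here where the subgroups all completed.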

rna-seq
modified 18 months ago • written 18 months ago by agrant30
Mo Heydarian790 wrote:

Hello, thanks for reporting this issue and including updates on your troubleshooting.

It seems like scaling down the number of samples you input to Cuffdiff eliminates the errors you see. What size worker node are you using to run these jobs? Have you tried re-running your analysis with larger worker nodes?

Cheers, Mo Heydarian

written 18 months ago by Mo Heydarian790

Thank you for reaching out to me!

I was using the c3.8xlarge EC2 instance, and I turned autoscaling on with a minimum of 1 node and a maximum of 16 nodes. Would you suggest something else? Thanks again for getting back to me.

written 18 months ago by agrant30

Hello,

You may want to try an r3.8xlarge worker node, which has roughly fourfold more memory than the c3.8xlarge node. I'm not sure autoscaling will help here: a single job only runs on the cores of a single worker node. Autoscaling adds extra worker nodes only if/when additional jobs are submitted that require more cores than are available on the first worker node.
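
As a rough illustration of why adding nodes does not raise the memory ceiling for one job, here is a small Python sketch. The memory and vCPU figures are the published AWS specifications for these two instance types; the function names are made up for this example, and the sketch assumes the job can use all of the memory on its node, which may not hold if the scheduler imposes its own limits.

```python
# Published AWS specifications for the two instance types discussed above.
INSTANCE_SPECS = {
    "c3.8xlarge": {"memory_gib": 60, "vcpus": 32},
    "r3.8xlarge": {"memory_gib": 244, "vcpus": 32},
}


def memory_available_to_one_job(instance_type: str, worker_nodes: int) -> int:
    """Memory (GiB) a single job can reach on a cluster of identical nodes.

    worker_nodes is deliberately ignored: a single job runs on one worker
    node, so extra nodes do not add memory to that job.
    """
    if worker_nodes < 1:
        raise ValueError("need at least one worker node")
    return INSTANCE_SPECS[instance_type]["memory_gib"]


def memory_ratio(bigger: str, smaller: str) -> float:
    """How many times more memory one instance type has than another."""
    return (
        INSTANCE_SPECS[bigger]["memory_gib"]
        / INSTANCE_SPECS[smaller]["memory_gib"]
    )


if __name__ == "__main__":
    for nodes in (1, 16):
        print(nodes, "node(s):", memory_available_to_one_job("c3.8xlarge", nodes), "GiB per job")
    print("r3 vs c3 memory ratio:", round(memory_ratio("r3.8xlarge", "c3.8xlarge"), 1))
```

Under these assumptions, the per-job figure is the same with 1 node or 16 nodes; only switching to the larger-memory node type changes it.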

A bit more information could be helpful. How large are your aligned read (BAM) files, on average? When your jobs are failing, how long do they run for before returning an error? Could you provide the full error report you're getting when these jobs are failing?
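
If the BAM files are available in a regular directory, a short Python sketch like the one below could report their sizes. It assumes the files end in `.bam` and sit in one folder; Galaxy may store datasets under different names internally, so the `pattern` argument and the function name are assumptions for illustration, not part of any Galaxy tool.

```python
from pathlib import Path
from typing import Dict, Optional


def bam_size_summary(directory: str, pattern: str = "*.bam") -> Optional[Dict[str, float]]:
    """Summarize on-disk sizes of files matching pattern in directory.

    Returns None when no matching files are found. Sizes are in bytes.
    """
    sizes = [
        p.stat().st_size
        for p in Path(directory).glob(pattern)
        if p.is_file()
    ]
    if not sizes:
        return None
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "mean_bytes": sum(sizes) / len(sizes),
        "largest_bytes": max(sizes),
    }
```

Calling `bam_size_summary("/path/to/bams")` (a placeholder path) returns the count, total, mean, and largest size. Note that on-disk BAM size is only a rough guide, since BAM files are compressed and memory use depends on how the tool processes them.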

Cheers, Mo Heydarian

written 18 months ago by Mo Heydarian790

This is valuable information, thank you Mo.

The total size of all 60 BAM files is 85 GB. Cuffdiff runs for a good 6-8 hours before it suddenly stops. Here is a link to the error report I receive: https://docs.google.com/spreadsheets/d/111N7J2aLZ264GlqnfJ6O2EpUYB4FuorIYeSMmNegBUo/edit#gid=0

I just thought of this: does Cuffdiff load all of the accepted hits files into memory? Obviously that would be a problem if I only had 60 GB of RAM on the c3.8xlarge node.

Thank you again for your time Mo.

written 18 months ago by agrant30