Cuffdiff Errors: "Loading reference annotation" and "Inspecting maps and determining fragment length distributions".

agrant wrote (2.3 years ago):

Hey,

I have 30 individuals with two samples each. I've had no problem running Cuffdiff between each individual's samples, but I keep getting the "Loading reference annotation" and "Inspecting maps and determining fragment length distributions" errors when running all 60 samples (30 samples per condition) through Cuffdiff on my Galaxy Cloud instance.

Because I was running out of RAM on the public Galaxy server, I transferred all of my accepted_hits files onto a cloud instance along with my merged transcripts file, but the error above keeps appearing.

I've seen some posts describing a similar problem, but I couldn't find a direct solution. I've also double-checked that all the files match the Cuffmerge dataset and that the same reference annotation was used for both Cufflinks and Cuffmerge. I hope I don't have to rerun TopHat and Cufflinks on my Galaxy Cloud instance, but I'm starting to get desperate.

Update

To identify which files were causing the errors, I split my 60 samples (30 individuals) into two groups and re-ran Cuffmerge on each group. After re-running Cuffdiff on the new merged transcripts, the first group ran without any problems, while the second group stopped prematurely with the same error described above. To investigate further, I split the last 15 individuals into 3 groups. But after running Cuffmerge and Cuffdiff on these 3 groups, they all completed without any errors. Now I'm even more confused about what is happening...
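The splitting strategy described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the original workflow (which was run through Galaxy): the `split_individuals` helper and the placeholder BAM names are hypothetical. It keeps both samples of each individual in the same group, so every group still has paired samples for both conditions, and it prints comma-separated BAM lists, which is how command-line cuffdiff accepts the replicates of one condition. Each group would still need its own Cuffmerge run, as described in the update.

```python
from typing import List, Tuple

# Hypothetical record: (individual ID, BAM for condition 1, BAM for condition 2)
Individual = Tuple[str, str, str]


def split_individuals(individuals: List[Individual], n_groups: int) -> List[List[Individual]]:
    """Split individuals into contiguous, near-equal groups (hypothetical helper).

    Both samples of an individual stay in the same group, so each group
    keeps a paired sample for every individual in both conditions.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    size, extra = divmod(len(individuals), n_groups)
    groups: List[List[Individual]] = []
    start = 0
    for g in range(n_groups):
        # The first `extra` groups each take one additional individual.
        end = start + size + (1 if g < extra else 0)
        groups.append(individuals[start:end])
        start = end
    return groups


if __name__ == "__main__":
    # 30 hypothetical individuals with placeholder BAM file names.
    individuals = [
        (f"ind{i:02d}", f"ind{i:02d}_cond1.bam", f"ind{i:02d}_cond2.bam")
        for i in range(1, 31)
    ]
    halves = split_individuals(individuals, 2)               # two groups of 15
    subgroups = split_individuals(halves[1], 3)              # second half -> three groups of 5
    for n, group in enumerate(subgroups, start=1):
        print(f"group {n}: {len(group)} individuals")
        print("  condition 1:", ",".join(ind[1] for ind in group))
        print("  condition 2:", ",".join(ind[2] for ind in group))
```

Contiguous chunks mirror the "first half / second half / last 15" split described in the post; a round-robin split would work equally well if the sample order carried no meaning.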

Mo Heydarian wrote (2.3 years ago):

Hello, thanks for reporting this issue and for including updates on your troubleshooting.

It seems like scaling down the number of samples you input to Cuffdiff eliminates the errors you see. What size worker node are you using to run these jobs? Have you tried re-running your analysis with larger worker nodes?

Cheers, Mo Heydarian

Thank you for reaching out to me!

I was using the c3.8xlarge EC2 instance type, with autoscaling turned on (a minimum of 1 node and a maximum of 16 nodes). Would you suggest something else? Thanks again for getting back to me.

(Reply by agrant, 2.3 years ago)

Hello,

You may want to try an r3.8xlarge worker node, which has roughly four times the memory of the c3.8xlarge node. I'm not sure autoscaling will help here. A single job runs only on the cores of a single worker node. Autoscaling adds extra worker nodes only if and when additional jobs are submitted that require more cores than are available on the first worker node.

A bit more information could be helpful. How large are your aligned read (BAM) files, on average? When your jobs are failing, how long do they run for before returning an error? Could you provide the full error report you're getting when these jobs are failing?
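To answer the question about average BAM size, a short Python sketch like the one below could summarize the files. It assumes the accepted_hits BAMs have been copied into a single local directory and end in `.bam`; the helper name `bam_size_summary` is hypothetical. It only reports file sizes and says nothing about how much memory Cuffdiff actually needs to process them.

```python
import sys
from pathlib import Path
from typing import Tuple


def bam_size_summary(directory: str) -> Tuple[int, float, float]:
    """Return (file count, total GB, mean GB) for the *.bam files in `directory`.

    Sizes are on-disk file sizes in decimal gigabytes (1 GB = 10**9 bytes).
    This is a rough summary only, not an estimate of Cuffdiff's memory use.
    """
    sizes = [p.stat().st_size for p in Path(directory).glob("*.bam") if p.is_file()]
    if not sizes:
        return 0, 0.0, 0.0
    total_gb = sum(sizes) / 1e9
    return len(sizes), total_gb, total_gb / len(sizes)


if __name__ == "__main__":
    # Directory to scan: first command-line argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n, total, mean = bam_size_summary(target)
    print(f"{n} BAM files, {total:.1f} GB total, {mean:.2f} GB on average")
```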

Cheers, Mo Heydarian

(Reply by Mo Heydarian, 2.3 years ago)

This is valuable information, thank you Mo.

The total size of all 60 BAM files is 85 GB. Cuffdiff runs for a good 6-8 hours before it suddenly stops. Here is a link to the error report I receive: https://docs.google.com/spreadsheets/d/111N7J2aLZ264GlqnfJ6O2EpUYB4FuorIYeSMmNegBUo/edit#gid=0

I just thought of this: does Cuffdiff load all of the accepted_hits files into memory? Obviously that would be a problem, since I was only using 60 GB of RAM with the c3.8xlarge node.

Thank you again for your time Mo.

(Reply by agrant, 2.3 years ago)

Hi agrant, I have the same problem with Cuffdiff ("Loading reference annotation" and "Inspecting maps and determining fragment length distributions") that you had. My data has 3 conditions with 4 replicates each, and the total size of all the accepted_hits files is around 19 GB. How did you solve this problem? Regards, Fat

(Reply by Fat.eldefrawy, 7 months ago)
