Hello,
Technically, the only data you need to retain are datasets that are currently being used as inputs, or that you plan to use as inputs, in downstream steps. Also consider loading fastq data in a compressed format; compressed data counts less against your quota. All of the newer tool wrappers accept compressed fastqsanger.gz inputs, but some of the older wrappers do not. For example: use HISAT2 and avoid TopHat.
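If your reads are still uncompressed on your local machine, a minimal sketch like this (plain Python, with a hypothetical file name) gzips them before upload so they can be loaded directly as fastqsanger.gz:

```python
import gzip
import shutil
from pathlib import Path

def compress_fastq(path: str) -> Path:
    """Gzip a local fastq file before uploading, so it can be
    loaded into Galaxy as fastqsanger.gz and use less quota."""
    src = Path(path)
    dest = src.with_suffix(src.suffix + ".gz")  # e.g. reads.fastq -> reads.fastq.gz
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest

compress_fastq("reads.fastq")  # hypothetical file name
```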
In the context of an analysis, deciding to remove data means confirming that a step you think is complete is not only green (a "successful" job) but actually produced the content you need. Sometimes it takes a few runs to tune parameters optimally, and decisions about upstream parameters often require scientific review of downstream summary/data-reduction results.
One strategy is to download all intermediate datasets and save them locally (in case you need them again), then permanently delete them in Galaxy to recover space. Deleting alone is not enough; the data must be permanently deleted, aka "purged". Make certain downloads are complete before purging -- curl or wget are a good choice for larger data.
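If you prefer to script that download-then-purge step, here is a minimal sketch using BioBlend (the Python library for the Galaxy API). The server URL, API key, and IDs below are placeholders, and the exact calls should be checked against the BioBlend documentation for your version:

```python
from bioblend.galaxy import GalaxyInstance

# Placeholders: supply your own server URL and API key.
gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

HISTORY_ID = "your_history_id"   # hypothetical IDs for illustration
DATASET_ID = "your_dataset_id"

# 1. Download the dataset locally first, keeping Galaxy's filename.
#    The target directory must already exist.
gi.datasets.download_dataset(DATASET_ID, file_path="backups/",
                             use_default_filename=True)

# 2. Only after confirming the download is complete (e.g. compare
#    file sizes), delete AND purge the dataset.
gi.histories.delete_dataset(HISTORY_ID, DATASET_ID, purge=True)
```

The key detail is `purge=True`: a plain delete leaves the data on disk, and it still counts against your quota.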
You can also set up a workflow so that intermediate datasets are purged while it runs. Just be aware that if the final results are not what you want, you won't have the intermediate datasets for review and will need to rerun the entire workflow after making adjustments. For that reason, this option is usually a better choice when running established workflows on batches of data (often bundled into dataset collections).
It is also important to know that larger data might exceed Galaxy's processing resources, in particular when using a public server. If that happens, you'll need to move to your own Galaxy instance; CloudMan is a common choice.
Details for all of the above can be found in these resources:
Thanks! Jen, Galaxy team