I have a problem uploading my data to usegalaxy.org. I was trying to transfer my data (two files, each about 90 GB) by FTP and upload them to Galaxy, but after about 100 GB had been uploaded, I received a "file already exists" error and the transfer stopped. I would like to work with these data further in Galaxy, so it is important to upload them. Can anyone help? My username in Galaxy is "aida.shahraki@gmail.com". Regards,
Hello,
There is a 50 GB per-file size limit for uploaded datasets at Galaxy Main https://usegalaxy.org. When uploading BAM files, the maximum size is a bit smaller because of how these data are indexed on the server: about 25-35 GB per BAM file works best.
Please consider using your own Galaxy server when analyzing very large datasets:
Finally, there is a workaround. It is not necessarily recommended, because such large datasets may be too large to process once in Galaxy, or you may exceed your account quota (250 GB), but it is generally possible for certain limited use cases. You can try it if you want, but you will need to troubleshoot any problems yourself if it doesn't work.
- Split the data into smaller chunks, upload them individually, then merge them back together.
- This works best for uncompressed plain text files: GTF/GFF3, BED, FASTQ, INTERVAL
- In the Upload tool, directly assign the datatype "tabular" during upload to prevent the chunked data from being detected as a known datatype. The goal is to preserve all original formatting. Do NOT use the option to convert "whitespace to tabs" (under the gear icon).
- Once all of the chunked data is in your history, merge it back together with Concatenate.
- At the end, it is important to double-check the formatting (compare character/word/line counts between the original and the uploaded/merged file), view the file to confirm basic formatting, assign the final datatype (bed, fastqsanger, etc.), then view the file again to make sure any metadata assigned is correct (important for some datatypes).
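The split and count-check steps above could be done locally before and after upload with something like the following. This is only a rough sketch, not a Galaxy tool: the function names `split_by_lines` and `counts`, the chunk file naming, and the choice of Python are my own assumptions. It splits on line boundaries so no record line is cut in half, and reads in binary mode so no characters are changed.

```python
import os
import tempfile


def split_by_lines(path, out_dir, lines_per_chunk):
    """Split `path` into consecutive chunk files of at most `lines_per_chunk` lines.

    Returns the chunk paths in their original order (hypothetical naming scheme).
    """
    chunk_paths = []
    out = None
    with open(path, "rb") as src:
        for i, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                name = os.path.join(out_dir, f"chunk_{len(chunk_paths):05d}.txt")
                out = open(name, "wb")
                chunk_paths.append(name)
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def counts(paths):
    """Return (lines, words, bytes) summed over all files, similar to `wc`."""
    n_lines = n_words = n_bytes = 0
    for p in paths:
        with open(p, "rb") as fh:
            for line in fh:
                n_lines += line.count(b"\n")
                n_words += len(line.split())
                n_bytes += len(line)
    return n_lines, n_words, n_bytes
```

To use it, call `split_by_lines("reads.fastq", "chunks", lines_per_chunk)` and check that `counts(chunk_paths) == counts(["reads.fastq"])` before uploading; the same totals can later be compared against the merged dataset in Galaxy. For FASTQ, picking a `lines_per_chunk` that is a multiple of 4 keeps each four-line record inside a single chunk.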
For others: Any public Galaxy server may have different limits set. I don't personally know of any that accept uploads larger than 50 GB, but a public server's help/documentation might note its limits, or the admins could be contacted directly to find out.
Thanks! Jen, Galaxy team