I'm trying to upload .gz-packed FASTQ files to usegalaxy.org with an FTP client. After the upload finished, I loaded the files into my current history with 'auto detect' on. Unfortunately, after a short processing time the files are shown to contain only ~3,000 sequences instead of 20 M or more. I've tried uploading different files, but the error persists. I've worked with Galaxy servers before and never experienced errors like this. Is this a known problem? I thought it might be temporary, but it has persisted for more than two weeks.
Hello -
There are no known issues with the upload tool on the public Main Galaxy instance at http://usegalaxy.org.
A few suggestions/questions to troubleshoot:
1. Double-check that you are working on the Main instance above, for both the FTP and UI loading. This may seem obvious, but if you use a few different Galaxy instances, it is easy to mix them up. If you are using a different instance, you will need to contact that instance's administrator for FTP help (their server could be having a problem).
2. Does the file end with a .gz extension (only)? Is the name alphanumeric plus underscores "_" and/or dashes "-" only, with no extra dots "."?
3. What happens when you uncompress the file locally? Is it intact? (See the sketch after this list for one way to check.)
4. Confirm that the FTP client you used indicates that the load was "successful". This will be in the logs in the client interface. If the transfer was interrupted for some reason, most clients have a "resume" function.
5. The file is <= 50 GB (the upload size limit) and there is room in your account for it. Click on "User -> Preferences" to see how much of your quota is used/available. Permanently delete data, if needed, to make room for new analysis.
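For point 3, a minimal sketch of a local decompression test, assuming Python is available (the filename is a placeholder):

```python
# Minimal sketch: verify that a .fastq.gz decompresses fully and count the reads.
# "sample.fastq.gz" is a hypothetical filename -- substitute your own.
import gzip

path = "sample.fastq.gz"

lines = 0
try:
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            lines += 1
except (OSError, EOFError) as err:
    print(f"Decompression failed after {lines} lines: {err}")
else:
    print(f"{lines} lines = {lines // 4} FASTQ records")
    if lines % 4:
        print("Warning: line count is not a multiple of 4 (truncated record?)")
```

If the record count here is already far below what you expect, the local copy itself is short; if it matches, the problem happened during or after the transfer.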
Hopefully one of these helps to diagnose the problem; either way, please let us know. Jen, Galaxy team
Thank you for the reply! The files are intact when opened locally, and I have checked the other points as well. I have tried to run FASTQ Groomer on an incomplete dataset to obtain more information. Here is the error message:
The error in your file occurs between lines '13713' and '13715', which corresponds to byte-offsets '899942' and '900001', and contains the text (59 of 59 bytes shown): @HWI-D00259:193:C5KJVANXX:6:1101:6759:2608 2:N:0:GTTTCG TC
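For reference, the text around the reported byte offsets can be pulled out of a local copy with a few lines of Python (a minimal sketch; the filename is a placeholder):

```python
# Minimal sketch: print the bytes around the offsets named in the error message.
# "sample.fastq" is a hypothetical filename -- substitute the uncompressed dataset.
start, end = 899942, 900001  # byte offsets from the FASTQ Groomer error

with open("sample.fastq", "rb") as handle:
    handle.seek(start)
    chunk = handle.read(end - start)
print(chunk.decode("ascii", errors="replace"))
```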
It seems that the file was simply cut off at this point during or after the upload, which is strange, because I have checked the original .fastq file at the same position and it looks like this:
```
@HWI-D00259:193:C5KJVANXX:6:1101:6759:2608 2:N:0:GTTTCG
TCAACATATGCGCAGGTCGTTAGGTGTCTGGTATACCCGAAGGCCTAGTGAAAGCAAGAGATGCCATTCGGTGGTATCGTTTTGGCATGATCCTGGCACTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
```
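For reference, each complete FASTQ record is exactly four lines (a header starting with "@", the sequence, a "+" line, and a quality string the same length as the sequence), so a short script can locate the first incomplete record. A minimal sketch, with a placeholder filename:

```python
# Minimal sketch: scan a plain-text FASTQ file and report the first
# truncated or malformed 4-line record.
# "sample.fastq" is a hypothetical filename -- substitute your own.
from itertools import zip_longest

path = "sample.fastq"

n = 0
with open(path) as handle:
    # Group the file into 4-line records, padding a short final record with None.
    for n, record in enumerate(zip_longest(*[iter(handle)] * 4), start=1):
        header, seq, plus, qual = (x.rstrip("\n") if x else None for x in record)
        if None in (header, seq, plus, qual):
            print(f"Record {n} is truncated (fewer than 4 lines)")
            break
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            print(f"Record {n} is malformed: {header}")
            break
    else:
        print(f"All {n} records look complete")
```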
I have also tried uploading datasets directly to usegalaxy.org through the browser instead of an FTP client (hence the smaller datasets of only ~600 MB), and I got the same error. Even older datasets, which were uploaded properly in the past and already processed successfully, were reduced to ~3,000 sequences after I re-uploaded them to the server today.
Please make sure you have one file per archive, as Galaxy cannot work with multiple files in one archive. A minimal way to check locally is sketched below.
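For example (a minimal sketch; the filename is a placeholder, and the second check for concatenated gzip members is an extra precaution, since some readers stop after the first member):

```python
# Minimal sketch: check whether a .gz file wraps a multi-file tar archive,
# and whether the gzip stream contains multiple concatenated members.
# "sample.fastq.gz" is a hypothetical filename -- substitute your own.
import tarfile
import zlib

path = "sample.fastq.gz"

# Check 1: is this really a .tar.gz with several files inside?
try:
    with tarfile.open(path, "r:gz") as tar:
        print("Tar archive containing:", tar.getnames())
except tarfile.ReadError:
    print("Not a tar archive (a single compressed stream, as expected)")

# Check 2: are several gzip members concatenated together? Some readers
# stop after the first member, which can look like truncation.
decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip wrapper
with open(path, "rb") as handle:
    decomp.decompress(handle.read())  # reads whole file; fine for a quick check
if decomp.unused_data:
    print("Extra data found after the first gzip member (concatenated members?)")
else:
    print("Single gzip member")
```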
Thank you for the quick reply! It was just a single file and not an archive with multiple files.