4.4 years ago by
If this is Illumina HiSeq data, then you can probably skip grooming next time and just assign the datatype ".fastqsanger".
Since the data is from an external source, I recommend always confirming the actual datatype against the reported type, but you only need to check the first 100 or so sequences per dataset (no need to process the whole thing). Use "Text manipulation -> Select first lines" to get the first 400 or so lines (some multiple of 4, since each fastq record is four lines), run FastQC on that subset, and confirm the type. Then either assign the datatype directly or groom. Once the datatype is assigned or the data is groomed, run FastQC on the entire dataset for QC purposes. More help here: http://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
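If you would rather do the same sanity check outside Galaxy, here is a minimal Python sketch of the idea: take the first 100 records (400 lines) and look at the lowest quality character to guess the Phred offset. The function names `first_records` and `guess_offset` are made up for illustration, and the cutoffs (below ASCII 59 means offset 33, 64 or above suggests offset 64) are a common rule of thumb, not what FastQC or the Groomer does internally. Treat the result as a hint and confirm it with FastQC.

```python
from itertools import islice


def first_records(lines, n_records=100):
    """Return the lines of the first n_records FASTQ records (4 lines each)."""
    return [line.rstrip("\n") for line in islice(lines, 4 * n_records)]


def guess_offset(fastq_lines):
    """Guess the Phred offset from the quality lines of a FASTQ sample.

    Returns 33 (Sanger / Illumina 1.8+), 64 (likely older Illumina),
    or None when the sample is ambiguous. This is a heuristic only.
    """
    # Quality strings are the 4th line of every record.
    quality_lines = fastq_lines[3::4]
    codes = [ord(ch) for qual in quality_lines for ch in qual]
    if not codes:
        raise ValueError("no quality lines found")
    lowest = min(codes)
    if lowest < 59:
        # Characters this low cannot occur with an offset of 64.
        return 33
    if lowest >= 64:
        # Consistent with offset 64, though a small sample of very
        # high-quality offset-33 reads could also look like this.
        return 64
    # 59..63: possibly Solexa-scaled scores; do not guess.
    return None


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_fastq.py reads.fastq
    with open(sys.argv[1]) as handle:
        sample = first_records(handle, n_records=100)
    print("guessed Phred offset:", guess_offset(sample))
```

An offset of 33 corresponds to what Galaxy calls fastqsanger, so in that case assigning the datatype directly should be enough. Anything else is a sign the data probably needs grooming.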
Sorry that the job took so long to queue and run. The public Main Galaxy instance is busy, but hopefully this will help speed things up with future data prep.
Jen, Galaxy team
It took more than 4 hours, but my 9Gb of data are now "groomed".