The format of the originally uploaded fastq data (datasets 3 and 20) appears to be OK. However, I can see that the Trinity assembly is failing at an early step.
I would suggest loading the data directly from the source into Galaxy and manipulating it within Galaxy (trim, other QA). Use the tool Download and Extract Reads in FASTA/Q format from NCBI SRA, then try a rerun. Sometimes data from this source needs standardized reformatting; see the help here for details: https://galaxyproject.org/support/ncbi-sra-fastq/
The datatype attribute will be automatically assigned. Avoid assigning a database attribute. If you want to annotate datasets by the source species (or any other info), consider using Tags: https://galaxyproject.org/tutorials/histories/#tagging-datasets
If the job fails again with the cleaned-up inputs, it is probably too large to run at Galaxy Main https://usegalaxy.org. The tool itself has no known issues, and I see that you have had other successful runs using different inputs. Choices:
- You could try running a sample/subset of the data through as a test (or for the final result, as there are many duplicated reads, see the FastQC report for details). To sub-sample randomly, convert with Fastq-to-Tabular, run the tool Select random lines from a file, then convert back with Tabular-to-Fastq.
- Consider setting up your own Galaxy server and allocating sufficient memory. Cloudman is a good choice for many. https://galaxyproject.github.io/ and https://galaxyproject.org/choices
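For anyone who wants to try the same sub-sampling idea outside of Galaxy, here is a minimal Python sketch that mirrors the three steps from the first option: turn each 4-line fastq record into one row, pick rows at random, then write the rows back out as fastq. The function names (`fastq_to_rows`, `subsample_rows`, `rows_to_fastq`) and the `seed` parameter are made up for this illustration; this is not how the Galaxy tools are implemented.

```python
import random


def fastq_to_rows(lines):
    """Group FASTQ lines into (header, sequence, quality) rows, one per read."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    rows = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        rows.append((header, seq, qual))
    return rows


def subsample_rows(rows, n, seed=0):
    """Pick n rows at random without replacement, keeping the original order.

    `seed` is a hypothetical convenience so that the selection is repeatable.
    """
    if n >= len(rows):
        return list(rows)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in keep]


def rows_to_fastq(rows):
    """Write rows back out as 4-line FASTQ records (the '+' line is left bare)."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in rows)
```

To use it, read the fastq file's lines, pass them through `fastq_to_rows`, call `subsample_rows` with the number of reads to keep, and write the string from `rows_to_fastq` to a new file. For paired-end data with the same number of reads in both files, calling `subsample_rows` with the same `n` and `seed` on each file should select the same read positions in both, so the pairs stay matched.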
Note: The database you have been assigning is not hg38, but another human database. Human hg38 is the genome you had successful mapping against. Because the database assigned to the fastq data is different, the BAM and other results end up with the wrong database assignment (inherited from the fastq input) for tools like Tophat and Cufflinks. This is a known bug we are working to resolve (https://github.com/galaxyproject/usegalaxy-playbook/issues/104). For now, do not assign a database for fastq inputs. This will not be a factor for Trinity assembly, but I would still avoid the database assignment for fastq/fasta inputs when using most tools. If you must assign it, make sure it is correct (the same database used in the rest of the analysis). How to remove/adjust metadata assignments: https://galaxyproject.org/support/metadata/
Thanks! Jen, Galaxy team