I have got TPM data from running Salmon on RNA-seq data. However, this is TPM for each individual transcript (often multiple different ones per gene). I want to collapse multiple transcripts to single genes before running DESeq2. Is there a way to do this in Galaxy? Preferably, a simple way for someone new to this.
Salmon can output both transcript-level and gene-level TPM values. To get the gene-level output, you need to provide a transcript-to-gene mapping file. This is the last option on the Salmon tool form; its label starts with "File containing a mapping of transcripts to genes."
The transcript-to-gene mapping is a tabular dataset or a GTF dataset (with gene_id and transcript_id populated), and it can also be used with DESeq2. DESeq2 requires it when the input is TPM values rather than counts from featureCounts or htseq-count.
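To make the role of the mapping concrete, here is a minimal Python sketch of what a two-column transcript-to-gene table does: it groups transcripts by gene so their TPM values can be summed. This is only an illustration, not what Salmon or DESeq2 run internally. The function names (`load_tx2gene`, `sum_tpm_by_gene`) and the IDs and values are made up for the example, and it assumes a tab-separated file with the transcript ID in column 1 and the gene in column 2.

```python
import io
from collections import defaultdict


def load_tx2gene(handle):
    """Read a tab-separated table: transcript ID in column 1, gene in column 2."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed row: no gene column
        mapping[fields[0]] = fields[1]
    return mapping


def sum_tpm_by_gene(tpm_by_transcript, tx2gene):
    """Add up transcript-level TPM values for each gene."""
    gene_tpm = defaultdict(float)
    for transcript, tpm in tpm_by_transcript.items():
        gene = tx2gene.get(transcript)
        if gene is None:
            continue  # transcript missing from the mapping is ignored here
        gene_tpm[gene] += tpm
    return dict(gene_tpm)


# Hypothetical data, for illustration only.
example_map = load_tx2gene(io.StringIO("tx1\tGeneA\ntx2\tGeneA\ntx3\tGeneB\n"))
example_tpm = {"tx1": 10.0, "tx2": 5.0, "tx3": 2.5}
gene_level = sum_tpm_by_gene(example_tpm, example_map)
print(gene_level)
```

In Galaxy you do not need to run anything like this yourself; supplying the mapping file on the tool form is enough.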
Hope that helps! Jen, Galaxy team
It seems I had a couple of problems there. First, I was using a gene-to-transcript mapping file with too much information (too many columns). I trimmed it down to two columns, the transcript ID (#mm10.knownGene.name) and the official gene symbol (mm10.kgXref.geneSymbol), and that partly worked. The additional problem was that a handful of official gene symbols were spread across two or more columns (spaces, commas, etc.?), which put some of the rows out of register. Cutting only columns 1 and 2 to a new file worked, and it seems to be fine now. Maybe I didn't start with the best-formatted gene-to-transcript mapping file.
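For anyone who wants to do the same trimming outside Galaxy, here is a rough Python sketch of keeping only columns 1 and 2 of a tab-separated mapping file. It assumes the file is tab-delimited and that header lines start with "#" (as the #mm10.knownGene.name header does); the function name `trim_mapping` and the sample rows are made up for illustration.

```python
import io


def trim_mapping(src, dst, keep=(0, 1)):
    """Copy only the chosen columns (0-based) of a tab-separated file.

    Returns the number of rows skipped because a kept column was
    missing or empty, so they can be inspected by hand.
    """
    skipped = 0
    for line in src:
        line = line.rstrip("\r\n")
        # Drop blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= max(keep):
            skipped += 1
            continue
        picked = [fields[i].strip() for i in keep]
        if not all(picked):
            skipped += 1
            continue
        dst.write("\t".join(picked) + "\n")
    return skipped


# Hypothetical input: a header, two rows with extra columns, one broken row.
raw = io.StringIO(
    "#transcript\tgene\textra\n"
    "tx0001\tGeneA\tfoo\n"
    "tx0002\tGeneB\tbar\tbaz\n"
    "broken_line\n"
)
trimmed = io.StringIO()
n_skipped = trim_mapping(raw, trimmed)
print(trimmed.getvalue(), end="")
print("rows skipped:", n_skipped)
```

Anything after the second column is dropped, so extra pieces that spill into later columns cannot shift the rows out of register. Rows that are counted as skipped are worth checking manually.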