Hello, I have reads from three paired-end libraries and they have been merged, so for each library I have 3 files: merged reads, unmergedR1, unmergedR2. I would like to use these as inputs into Salmon. On the Galaxy menu, it asks if my library is mate-pair and I choose paired-end. Then it gives the option of putting in 2 files. If they hadn't been merged, I would just concatenate all R1 in one file, all R2 in one file (hopefully this preserves order) and use those. However, now I have the merged files. I don't think I can put all the merged reads into the R1 file because the association between the unmergedR1 and unmergedR2 would be lost. Right? How do I go about combining these files and preserving the paired-end, stranded information?
The tool accepts either paired-end or single-end input, not a mix of the two. For paired-end data, the expected input is two datasets: one with the forward reads and one with the reverse reads. Your data is from a paired-end protocol, so the strand is still valid for all reads, including the unmatched (orphan) reads. Review the tool help for the Salmon setting Type of index to help decide whether you really want to include the unmatched reads (both options let you choose to use them or not).
If the data was "merged" with the Fastq Interlacer, that step isn't needed for this tool, and interlaced fastq is not (currently) an accepted input for Salmon. If the data was "merged" with the Fastq Joiner, that input is not accepted either; you might be able to separate the reads again with the Fastq Splitter, but only if the reads are all the same length.
Then do what you describe: concatenate all forward reads into one dataset and all reverse reads into another, and include the unmatched reads if you still decide to use them. The overall read order will not matter with this tool, as long as the forward and reverse datasets are built from the libraries in the same order. You might consider running the tool a few times (with and without unmatched reads) and comparing to see which produces better results.
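If you prefer to do the concatenation outside of Galaxy, here is a minimal Python sketch of that step. It assumes uncompressed fastq files; the function name `concatenate_fastq` and the file names in the usage note are hypothetical, not part of Galaxy or Salmon.

```python
def concatenate_fastq(inputs, output):
    """Append each input fastq file to `output`, in the order given.

    Assumption: inputs are plain (uncompressed) fastq files.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                for line in src:
                    # Guard against a file whose last line lacks a newline,
                    # which would otherwise glue two records together.
                    out.write(line if line.endswith(b"\n") else line + b"\n")
    return output
```

For example, `concatenate_fastq(["lib1_R1.fastq", "lib2_R1.fastq", "lib3_R1.fastq"], "all_R1.fastq")` followed by the same call on the R2 files, listing the libraries in the same order both times.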
Note: The FAQ below is focused on a different overall data source, but its "Solution 2" method can split up any interleaved fastq dataset given the right regular expressions. Joined data that are not all the same length will require you to find/upload the original datasets.
Support FAQs: https://galaxyproject.org/support/
- Manipulate interlaced fastq >> https://galaxyproject.org/support/ncbi-sra-fastq/#interlaced-forward-and-reverse-reads
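For anyone working outside of Galaxy, an interlaced fastq file can also be split with a short script. This is only a sketch and not the regular-expression method from the FAQ: it assumes every record is exactly 4 lines and that records strictly alternate forward, reverse, forward, reverse. The function name `split_interlaced_fastq` is hypothetical.

```python
from itertools import islice


def split_interlaced_fastq(interlaced, forward_out, reverse_out):
    """Split an interlaced fastq file into forward and reverse files.

    Assumptions: 4 lines per record, and records alternate
    forward/reverse with no orphans. Returns the number of pairs written.
    """
    pairs = 0
    with open(interlaced) as src, \
            open(forward_out, "w") as fwd, \
            open(reverse_out, "w") as rev:
        while True:
            # One pair = 8 lines: 4 for the forward read, 4 for the reverse.
            block = list(islice(src, 8))
            if not block:
                break
            if len(block) != 8:
                raise ValueError("incomplete read pair at end of file")
            fwd.writelines(block[:4])
            rev.writelines(block[4:])
            pairs += 1
    return pairs
```

The function raises an error rather than guessing if the file does not hold a whole number of pairs, since a silent mismatch would put the two output files out of sync.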
Thanks! Jen, Galaxy team