After small RNA sequencing I performed adapter, quality, and length trimming on the fastq files. I then used the miRDeep2 mapper to collapse reads, followed by the miRDeep2 quantifier to map the collapsed reads to the miRBase fasta file of mature sequences. My output is a tabular file with 6 columns: miRNA name, read count, precursor name, total, seq, seq(norm). I would now like to use this data to do differential expression analysis and create some images for it. Many of the downstream programs in Galaxy seem to require SAM/BAM files and GFF or GTF files, plus more genomic information such as chromosome location (which is not in my tabular files). Is there a good program that can take the normalized read counts for each miRNA in different samples and do differential expression on them, or what is the best method for getting to that point? I even tried to create my own digital expression matrix for input into edgeR, but empty files were returned when I ran the program. If I could convert my tabular files mapped to the miRNAs into SAM/BAM files, that would also be helpful.
Since your description is rather general, I will make the following assumptions about your case.
You have the tables (miRDeep2 outputs) for several samples under different conditions, and you want to find the differentially expressed (DE) miRNAs between the conditions.
To calculate DE, I recommend DESeq2, which takes the raw read counts of each miRNA.
To illustrate the analysis steps, let's make a toy example with two conditions, treated vs. untreated, and a few samples under each condition.
For the example, you could try the following steps:
1. Generate the miRDeep2 table for each sample.
2. Use the tool "Cut" to cut each table so that it retains only the columns miRNA name and read count. (We need the raw counts here because DESeq2 does the normalization itself.)
3. Use the buttons on the history panel to group the tables by condition. First click "Operations on multiple datasets", then select the cut tables belonging to the same condition, then from "For all selected..." click "Build dataset list" and name each dataset list either treated or untreated. These dataset lists are the inputs for DESeq2.
4. For the toy example, the inputs for DESeq2 are:
   - Factor name: treatment
   - 1: Factor level: treated
   - 2: Factor level: untreated
   - For "Counts file(s)", remember to select "Dataset Collections".
5. Finally, execute and cross your fingers.
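If you want to check what step 2 should produce, here is a minimal Python sketch of the same "cut" done outside Galaxy. It assumes each miRDeep2 quantifier table is tab-separated, has a header line starting with `#`, and uses the column order given in the question (miRNA name first, raw read count second). The function name `cut_name_and_count` and the toy rows are made up for illustration.

```python
import csv
import io


def cut_name_and_count(table_text: str) -> str:
    """Keep columns 1 and 2 (miRNA name, raw read count), like Galaxy Cut on c1,c2.

    Assumption: tab-separated miRDeep2 quantifier table whose header line
    starts with '#', with the column order described in the question.
    """
    out = []
    for fields in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header row
        # Column 2 is the raw count; seq(norm) is dropped because DESeq2 normalizes itself
        out.append(f"{fields[0]}\t{fields[1]}\n")
    return "".join(out)


# Made-up toy rows, for illustration only
example = (
    "#miRNA\tread_count\tprecursor\ttotal\tseq\tseq(norm)\n"
    "toy-miR-1\t1520\ttoy-mir-1\t1520\t1520\t10234.5\n"
    "toy-miR-2\t873\ttoy-mir-2\t873\t873\t5878.1\n"
)
print(cut_name_and_count(example), end="")
```

The result is one identifier column and one count column with no header, which is the shape step 3 groups into dataset lists.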
Thank you for the very clear directions. When building the dataset list it would not let me rename these (so I just used them as is), and unfortunately I then got the following error when trying to run DESeq2 with the dataset lists: "failure preparing job". It seems that this program, too, really wants a table from htseq-count or featureCounts, and those tools will not take the tabular output files generated by miRDeep2. Any other thoughts?
The table generated by htseq-count is one column of ids and another column of read counts, which is what you would have after step 2. I tested with a toy set and it worked.
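A quick way to confirm that a cut table really has this two-column shape is a small check like the Python sketch below. The checks are assumptions about what a count table for DESeq2 should look like, not an official specification: exactly two tab-separated columns per line, a non-negative integer in column 2, and no repeated identifiers. The function name `check_htseq_like` is hypothetical, and a header row, if present, will be reported as a problem too.

```python
def check_htseq_like(counts_text: str) -> list[str]:
    """Return a list of problems found in a two-column 'id<TAB>count' table.

    Assumed rules (not an official spec): two tab-separated columns,
    a non-negative integer count, and unique identifiers.
    """
    problems = []
    seen = set()
    for lineno, line in enumerate(counts_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected 2 columns, got {len(fields)}")
            continue
        name, count = fields
        if not count.isdigit():
            problems.append(f"line {lineno}: count {count!r} is not a non-negative integer")
        if name in seen:
            problems.append(f"line {lineno}: duplicate identifier {name!r}")
        seen.add(name)
    return problems


print(check_htseq_like("toy-miR-1\t10.5\ntoy-miR-1\t3\n"))
```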
There can be many reasons for "failure preparing job".
Without a more detailed description, I can only suggest the following:
- Regarding "building the dataset list it would not let me rename these": there is no "rename" step, and providing a "Name" is a required step for building a dataset list. You may not have succeeded in building the lists, so try to fix that first.
- Convert any non-integer counts to integers. I am not sure whether this is possible within Galaxy, but it can be done with Excel or LibreOffice.
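As an alternative to Excel or LibreOffice, the rounding can be sketched in a few lines of Python. This assumes each cut table is tab-separated `name<TAB>count` with no header line; the function name `round_counts` is hypothetical. Note that Python's `round()` rounds halves to the nearest even integer, so 2.5 becomes 2 and 3.5 becomes 4.

```python
def round_counts(counts_text: str) -> str:
    """Round the count column of a 'name<TAB>count' table to integers.

    Assumption: two tab-separated columns and no header line.
    round() uses round-half-to-even (2.5 -> 2, 3.5 -> 4).
    """
    out = []
    for line in counts_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, count = line.split("\t")
        out.append(f"{name}\t{round(float(count))}\n")
    return "".join(out)


print(round_counts("toy-miR-1\t12.7\ntoy-miR-2\t3\n"), end="")
```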