After small RNA sequencing I performed adapter, quality, and length trimming on the fastq files. I then used the miRDeep2 Mapper to collapse reads, followed by the miRDeep2 Quantifier to map the collapsed reads to the miRBase FASTA file of mature sequences. My output is a tabular file with 6 columns: miRNA name, read count, precursor name, total, seq, seq(norm). I would now like to use this data for differential expression analysis and to create some images. Many of the downstream programs in Galaxy seem to require SAM/BAM files plus GFF or GTF files, and more genomic information such as chromosome locations (which my tabular files do not contain). Is there a good program that can take the normalized read counts for each miRNA in different samples and do differential expression on them, or what is the best way to get to that point? I even tried to create my own digital expression matrix as input for edgeR, but running the program returned empty files. If I could convert my tabular files of reads mapped to the miRNAs into SAM/BAM files, that would also help.
Since your description is rather general, I'll make the following assumptions about your case.
You have the tables (miRDeep2 outputs) for several samples from different conditions, and you want to find the miRNAs that are differentially expressed (DE) between the conditions.
For calculating DE, I recommend DESeq2, which takes the raw read count of each miRNA.
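The reason DESeq2 wants raw counts rather than the seq(norm) column is that, by default, it estimates one size factor per sample from the raw counts with the median-of-ratios method and normalizes internally. The Python sketch below reimplements only that idea with NumPy for illustration; it is a simplification, not DESeq2's actual (R) code, and the function name is hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Illustrative median-of-ratios size factors.

    `counts` is a miRNAs x samples array of RAW read counts.
    Hypothetical helper, not part of DESeq2.
    """
    counts = np.asarray(counts, dtype=float)
    # Only miRNAs with a nonzero count in every sample are used,
    # because a zero makes the geometric mean (and its log) undefined.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each miRNA across samples: the pseudo-reference.
    log_geo_means = log_counts.mean(axis=1)
    # A sample's size factor is its median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))
```

Dividing each sample's column of raw counts by its size factor puts the samples on a common scale. Feeding counts that were already normalized by another method would apply a second, inconsistent normalization, which is why the raw read count column is the one to keep.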
To illustrate the analysis steps, let's use a toy example with two conditions, treated vs. untreated, and a few samples under each condition.
For the example, you could try the following steps:
- Generate the miRDeep2 table for each sample.
- Use the tool Cut to cut each table down to the miRNA name and read count columns (c1 and c2). (Here we need raw counts, since DESeq2 does the normalization itself.)
- Use the buttons on the history to group the tables by condition. First click "Operations on multiple datasets", then select the cut tables of one condition, then from "For all selected..." click "Build dataset list". Name each dataset list either treated or untreated. These dataset lists are the inputs for DESeq2.
- For the toy example, the DESeq2 inputs are:
  - 1: Factor level: treated
  - 2: Factor level: untreated
  - For Counts file(s), remember to select the dataset list of that factor level.
- Finally, execute and cross your fingers.
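The Cut and dataset-list steps give Galaxy's DESeq2 one two-column file per sample. If you instead want a single count matrix (for example, for edgeR, which you tried), a minimal Python sketch of the same idea is below. It assumes each miRDeep2 quantifier table is tab-separated, that header lines start with "#", and that column 1 is the miRNA name and column 2 the raw read count, as in your description. The function names are hypothetical. Collapsing a mature miRNA listed under several precursors to its largest count is also an assumption, made only so that row names are unique; adjust it to your needs.

```python
import csv
import io

def read_quantifier_counts(lines):
    """Return {miRNA name: raw read count} for one quantifier table.

    Assumes tab-separated rows: column 1 = miRNA name, column 2 = raw
    read count; lines starting with '#' are headers (hypothetical helper).
    """
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the '#miRNA ...' header
        name, raw = row[0], int(float(row[1]))  # counts may look like '12.00'
        # Assumption: a mature miRNA listed under several precursors is
        # collapsed to its largest count so that row names stay unique.
        counts[name] = max(counts.get(name, 0), raw)
    return counts

def build_count_matrix(samples):
    """samples: {sample name: iterable of table lines}.

    Returns TSV text with one row per miRNA and one raw-count column per
    sample; a miRNA missing from a sample gets 0 (hypothetical helper).
    """
    per_sample = {s: read_quantifier_counts(lines) for s, lines in samples.items()}
    names = sorted(set().union(*per_sample.values()))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["miRNA", *per_sample])
    for name in names:
        writer.writerow([name, *(per_sample[s].get(name, 0) for s in per_sample)])
    return out.getvalue()
```

For real files, pass something like `open(path).read().splitlines()` for each sample and write the returned text to a file. Checking that the output has one header row and one column per sample is a quick way to catch the "empty file" problem before it reaches edgeR.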