I would be forever grateful if someone could take the time to help me. I am attempting to analyze microRNA-seq data. I downloaded the microRNA sequences from miRBase and mapped my samples (FASTQ files) to them using Bowtie. After this, I ran flagstat to check the alignments. The alignment rates looked great, at 75% and above. I then ran Cufflinks to assemble transcripts and estimate abundances in each sample. The output also looked great. Here is a sample of it:
tracking_id | gene_id | tss_id | locus | length | coverage | FPKM | FPKM_conf_lo | FPKM_conf_hi | FPKM_status |
CUFF.1 | CUFF.1 | - | hsa-let-7a-1:4-80 | - | - | 1.04E+07 | 1.03E+07 | 1.05E+07 | OK |
CUFF.2 | CUFF.2 | - | hsa-let-7a-2:3-70 | - | - | 1.50E+07 | 1.48E+07 | 1.52E+07 | OK |
CUFF.3 | CUFF.3 | - | hsa-let-7a-3:2-74 | - | - | 1.22E+07 | 1.21E+07 | 1.24E+07 | OK |
Next, I ran Cuffdiff. This tool requires a GFF or GTF annotation file, so I used the GRCh38 GFF file available from miRBase. This seemed like it should work perfectly, but when I run the tool the values are all zero and the miRNA names are lost. I get the following output:
gene_id | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant |
XLOC_000001 | chr1:30365-30503 | Control | Treatment | NOTEST | 0 | 0 | 0 | 0 | 1 | 1 | no |
XLOC_000002 | chr1:1167103-1167198 | Control | Treatment | NOTEST | 0.00E+00 | 0.00E+00 | 0 | 0 | 1 | 1 | no |
XLOC_000003 | chr1:1167862-1167952 | Control | Treatment | NOTEST | 0.00E+00 | 0.00E+00 | 0 | 0 | 1 | 1 | no |
XLOC_000004 | chr1:1169004-1169087 | Control | Treatment | NOTEST | 0.00E+00 | 0.00E+00 | 0 | 0 | 1 | 1 | no |
I realize this is probably some sort of compatibility issue between the FASTA file containing the miRNAs and the GFF file. Does anyone know how I can solve it? Here is what the FASTA file looks like:
>hsa-let-7a-1 MI0000060
TGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCCACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTA
>hsa-let-7a-2 MI0000061
AGGTTGAGGTAGTAGGTTGTATAGTTTAGAATTACATCAAGGGAGATAACTGTACAGCCTCCTAGCTTTCCT
>hsa-let-7a-3 MI0000062
GGGTGAGGTAGTAGGTTGTATAGTTTGGGGCTCTGCCCTGCTATGGGATAACTATACAATCTACTGTCTTTCCT
Here is what the GFF file looks like:
chr1 | miRNA_primary_transcript | 17369 | 17436 | ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1 |
chr1 | miRNA | 17409 | 17431 | ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705 |
chr1 | miRNA | 17369 | 1.74E+04 | ID=MIMAT0027619;Alias=MIMAT0027619;Name=hsa-miR-6859-3p;Derives_from=MI0022705 |
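
To check the suspected mismatch directly, the sequence names in the FASTA headers can be compared with the sequence names in the first column of the GFF file. Below is a minimal Python sketch of that check. The function names are hypothetical, it assumes the GFF file on disk is tab-delimited with the sequence name in the first column (the pipes shown above appear to come from how the file was displayed), and the columns not shown above are filled with "." placeholders in the example string.

```python
def fasta_names(fasta_text):
    """Return the set of sequence names (first word after '>') in FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gff_seqids(gff_text):
    """Return the set of names in the first column of tab-delimited GFF text."""
    names = set()
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t")[0].strip())
    return names


def shared_names(fasta_text, gff_text):
    """Return the names that appear in both the FASTA and the GFF."""
    return fasta_names(fasta_text) & gff_seqids(gff_text)


# Small example built from the entries shown above (sequences shortened,
# unknown GFF columns replaced with "." placeholders).
fasta_example = (
    ">hsa-let-7a-1 MI0000060\nTGGGATGAGG\n"
    ">hsa-let-7a-2 MI0000061\nAGGTTGAGGT\n"
)
gff_example = (
    "chr1\t.\tmiRNA_primary_transcript\t17369\t17436\t.\t.\t.\t"
    "ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1\n"
)

print("FASTA names:", sorted(fasta_names(fasta_example)))
print("GFF names:", sorted(gff_seqids(gff_example)))
print("Shared:", sorted(shared_names(fasta_example, gff_example)))
```

For the example entries the shared set is empty: the reads were aligned to sequences named like hsa-let-7a-1, while the GFF describes features on chr1. If the same holds for the full files, that would be consistent with the all-zero Cuffdiff values.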