Hi all,
I created tabular count files using FeatureCounts with a GTF file (iGenomes, UCSC hg38) and BAM files from TopHat. The files look correct, but when I run EdgeR on them I get the following error:
Fatal error: Exit code 1 ()
Warning message:
In data.frame(sampleID = samplenames, factors) :
row names were found from a short variable and have been discarded
Error in `row.names<-.data.frame`(`*tmp*`, value = c("X10", "X1652", "X0", :
duplicate 'row.names' are not allowed
Calls: row.names<- -> row.names<-.data.frame
Warning message:
non-unique values when setting 'row.names': ‘X0’, ‘X1652’
I have used EdgeR successfully on count tables produced by Htseq-count, so I think the problem likely lies in the FeatureCounts output. As far as I can tell, there are no duplicate row names (gene IDs). I used the following parameters to generate the files:
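For anyone who wants to verify the "no duplicate gene IDs" claim rather than eyeball it, here is a minimal Python sketch. It assumes the count file is tab-separated with the gene ID in the first column; the function name `find_duplicate_ids` and the example rows are hypothetical and not taken from my data.

```python
from collections import Counter


def find_duplicate_ids(lines, id_column=0, sep="\t"):
    """Return {gene_id: occurrences} for IDs that appear more than once."""
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with "#")
        if not line or line.startswith("#"):
            continue
        ids.append(line.split(sep)[id_column])
    return {gene_id: n for gene_id, n in Counter(ids).items() if n > 1}


if __name__ == "__main__":
    # Hypothetical rows in the layout Gene-ID, read-count, gene-length
    example = [
        "GENE_A\t10\t1500\n",
        "GENE_B\t0\t900\n",
        "GENE_A\t1652\t1500\n",
    ]
    print(find_duplicate_ids(example))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `find_duplicate_ids(open("counts.tabular"))`, where the file name is a placeholder. An empty result means the ID column has no duplicates.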
Alignment file: 28: TopHat on data 7 and data 8: accepted_hits
Gene annotation file: history
Gene annotation file: 71: genes.gtf
Output format: Gene-ID "\t" read-count "\t" gene-length
Create gene-length file: False
pe_parameters
Count fragments instead of reads
Only allow fragments with both reads aligned: True
Exclude chimeric fragments: True
extended_parameters
GFF feature type filter: exon
GFF gene identifier: gene_id
On feature level: False
Allow read to contribute to multiple features: False
Strand specificity of the protocol: Unstranded
Count multi-mapping reads/fragments
Minimum mapping quality per read: 12
Exon-exon junctions
Long reads: False
Count reads by read group: False
Largest overlap: False
Minimum bases of overlap: 1
Minimum fraction (of read) overlapping a feature: 0
Minimum fraction (of feature) overlapping a read: 0
Read 5' extension: 0
Read 3' extension: 0
Reduce read to single position: Leave the read as it is
Only count primary alignments: False
Ignore reads marked as duplicate: False
Ignore unspliced alignments: False
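One more thing that may be worth ruling out: the row names in the error ("X10", "X1652", "X0") look like numbers with an "X" prefix, which is what R's `make.names` produces for strings that start with a digit. That would be consistent with values from a count column, rather than gene IDs, being used as row names, possibly because the output format above has three columns. This is only a guess, not a confirmed cause. The sketch below reports, for each column of a tab-separated file, whether its values are all unique; the function name `column_report` and the example rows are hypothetical.

```python
def column_report(lines, sep="\t"):
    """Map each column index to True if all values in that column are unique."""
    columns = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with "#")
        if not line or line.startswith("#"):
            continue
        for index, value in enumerate(line.split(sep)):
            columns.setdefault(index, []).append(value)
    return {
        index: len(set(values)) == len(values)
        for index, values in columns.items()
    }


if __name__ == "__main__":
    # Hypothetical rows in the layout Gene-ID, read-count, gene-length
    example = [
        "GENE_A\t10\t1500\n",
        "GENE_B\t0\t900\n",
        "GENE_C\t0\t700\n",
    ]
    print(column_report(example))
```

If the first column is unique but the count column is not (which is normal, since many genes have zero reads), a "duplicate row.names" error naming numeric-looking values would point at how the columns are read rather than at the gene IDs themselves.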
Any suggestions on what is going wrong and how to fix it?
Update: DESeq2 ran fine on the same count files, so I am not sure why EdgeR fails on them when it runs without problems on other count files.
Is there a reason why this was set to False?
Interesting - I had not noticed that. I'll try re-running with "On feature level" set to True. I should also note that we are hosting our own internal (cloud-based) instance of Galaxy, so it is possible that something is wrong with the installation.
Unfortunately that did not solve the problem.