Duplicate row names error with EdgeR on FeatureCounts files

5 months ago by

mconerly • 0 wrote:

Hi all,

I created tabular count files using FeatureCounts with a GTF file (iGenomes, UCSC hg38) and BAM files from TopHat. The files look correct, but I am getting the following error:

Fatal error: Exit code 1 ()
Warning message:
In data.frame(sampleID = samplenames, factors) :
  row names were found from a short variable and have been discarded
Error in `row.names<-.data.frame`(`*tmp*`, value = c("X10", "X1652", "X0",  : 
  duplicate 'row.names' are not allowed
Calls: row.names<- -> row.names<-.data.frame
Warning message:
non-unique values when setting 'row.names': ‘X0’, ‘X1652’

I have used EdgeR without problems on count tables produced by Htseq-count, so I think the problem likely lies in the FeatureCounts output. As far as I can tell, there are no duplicate row names (gene IDs). I used the following parameters to generate the files:

Alignment file 28: TopHat on data 7 and data 8: accepted_hits
Gene annotation file history
Gene annotation file 71: genes.gtf
Output format Gene-ID "\t" read-count "\t" gene-length
Create gene-length file False
pe_parameters
Count fragments instead of reads
Only allow fragments with both reads aligned True
Exclude chimeric fragments True
extended_parameters GFF feature type filter exon
GFF gene identifier gene_id
On feature level False
Allow read to contribute to multiple features False
Strand specificity of the protocol Unstranded
Count multi-mapping reads/fragments
Minimum mapping quality per read 12
Exon-exon junctions
Long reads False
Count reads by read group False
Largest overlap False
Minimum bases of overlap 1
Minimum fraction (of read) overlapping a feature 0
Minimum fraction (of feature) overlapping a read 0
Read 5' extension 0
Read 3' extension 0
Reduce read to single position Leave the read as it is
Only count primary alignments False
Ignore reads marked as duplicate False
Ignore unspliced alignments False

Any suggestions on what is going wrong and how to fix it?

Update: I ran DESeq2 on the count files and it ran fine, so I am not sure why I am having this problem with EdgeR, even though it can run on other count files.
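The claim that there are no duplicate gene IDs can be checked directly on each tabular count file. The snippet below is only a minimal sketch: it assumes a tab-separated file with the gene ID in the first column, and the function name `duplicated_ids` and the example rows are made up for illustration, not taken from the files in question.

```python
import io
from collections import Counter


def duplicated_ids(handle):
    """Return the sorted first-column values that occur more than once.

    Assumes tab-separated text; blank lines and lines starting with '#'
    are skipped.
    """
    counts = Counter()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        gene_id = line.rstrip("\n").split("\t")[0]
        counts[gene_id] += 1
    return sorted(gene for gene, n in counts.items() if n > 1)


# Made-up example data (hypothetical, for illustration only).
example = "A1BG\t10\nA1CF\t0\nA1BG\t3\n"
print(duplicated_ids(io.StringIO(example)))
```

For a real file, the same function can be called on `open(path)` in place of the `io.StringIO` object. An empty list means the first column has no repeated IDs.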

rna-seq featurecounts htseq • 566 views

modified 5 months ago by Jennifer Hillman Jackson ♦ 25k • written 5 months ago by mconerly • 0

Is there a reason why this was set to False?

GFF gene identifier gene_id On feature level False

written 5 months ago by Jennifer Hillman Jackson ♦ 25k

Interesting - I had not noticed that. I'll try re-running with "GFF gene identifier gene_id On feature level" set to TRUE. I should also note that we are hosting our own internal (cloud-based) instance of Galaxy, so it is possible something is wrong in the installation.

modified 5 months ago • written 5 months ago by mconerly • 0

Unfortunately that did not solve the problem.

written 5 months ago by mconerly • 0
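A further observation on the error text: the names it reports ("X10", "X1652", "X0") look like numbers carrying the "X" prefix that R adds by default when a value starting with a digit is turned into a column name. One possible reading, which nothing in this thread confirms, is that the first line of each count file was treated as a header, so the first gene's count became the sample name and two samples with the same count collided. The sketch below checks whether a file starts with a data row or a header row. It assumes tab-separated input with the count in the second column, and `first_row_is_data` is a hypothetical helper name.

```python
import io


def first_row_is_data(handle):
    """Return True if the first non-blank, non-comment line has an integer
    in its second tab-separated field, i.e. it looks like a count row
    rather than a header row.
    """
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        return len(fields) > 1 and fields[1].strip().isdigit()
    return False  # empty file: nothing to classify


# Made-up example data (hypothetical, for illustration only).
headerless = "A1BG\t10\nA1CF\t0\n"
with_header = "Geneid\tsample1\nA1BG\t10\nA1CF\t0\n"
print(first_row_is_data(io.StringIO(headerless)))
print(first_row_is_data(io.StringIO(with_header)))
```

If the check reports a data row for the FeatureCounts files but a header row for the Htseq-count files that worked (or the reverse), that difference would be worth ruling out before suspecting the installation.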