Question: Duplicate row names error with edgeR on featureCounts files
4 months ago, mconerly wrote:

Hi all,

I created tabular count files using featureCounts with a GTF file (iGenomes, UCSC hg38) and BAM files from TopHat. The files look correct, but edgeR fails on them with the following error:

Fatal error: Exit code 1 ()
Warning message:
In data.frame(sampleID = samplenames, factors) :
  row names were found from a short variable and have been discarded
Error in `row.names<-.data.frame`(`*tmp*`, value = c("X10", "X1652", "X0",  : 
  duplicate 'row.names' are not allowed
Calls: row.names<- -> row.names<-.data.frame
Warning message:
non-unique values when setting 'row.names': ‘X0’, ‘X1652’

I have run edgeR successfully on count tables produced by htseq-count, so I think the problem likely lies in the featureCounts output. As far as I can tell, there are no duplicate row names (gene IDs). I used the following parameters to generate the files:

Alignment file: 28: TopHat on data 7 and data 8: accepted_hits
Gene annotation file: history
Gene annotation file: 71: genes.gtf
Output format: Gene-ID "\t" read-count "\t" gene-length
Create gene-length file: False
pe_parameters
Count fragments instead of reads
Only allow fragments with both reads aligned: True
Exclude chimeric fragments: True
extended_parameters
GFF feature type filter: exon
GFF gene identifier: gene_id
On feature level: False
Allow read to contribute to multiple features: False
Strand specificity of the protocol: Unstranded
Count multi-mapping reads/fragments
Minimum mapping quality per read: 12
Exon-exon junctions
Long reads: False
Count reads by read group: False
Largest overlap: False
Minimum bases of overlap: 1
Minimum fraction (of read) overlapping a feature: 0
Minimum fraction (of feature) overlapping a read: 0
Read 5' extension: 0
Read 3' extension: 0
Reduce read to single position: Leave the read as it is
Only count primary alignments: False
Ignore reads marked as duplicate: False
Ignore unspliced alignments: False
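
For reference, the "no duplicate gene IDs" check mentioned above can be scripted instead of done by eye. The following is only a minimal Python sketch, assuming a tab-separated count file with the gene ID in the first column and a single header line. The function name and the example rows are made up for illustration and do not come from the actual files.

```python
from collections import Counter


def duplicate_ids(lines, has_header=True):
    """Return first-column IDs that occur more than once, in first-seen order."""
    # Split every non-empty line on tabs; the first field is assumed to be the gene ID.
    rows = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    if has_header:
        rows = rows[1:]  # skip the header row so it is not counted as a gene
    counts = Counter(row[0] for row in rows)
    return [gene_id for gene_id, n in counts.items() if n > 1]


# Hypothetical rows for illustration only.
example = [
    "Geneid\tcount\n",
    "GENE_A\t0\n",
    "GENE_B\t1652\n",
    "GENE_A\t10\n",
]
print(duplicate_ids(example))
```

To run it on a real file, pass an open file handle instead of the list, for example `duplicate_ids(open("counts.tabular"))`, where `counts.tabular` is a placeholder name. An empty result means the first column really is unique.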

Any suggestions on what is going wrong and how to fix it?

Update: I ran DESeq2 on the same count files and it ran fine, so I am not sure why edgeR fails on them when it runs without problems on other count files.

Tags: rna-seq, featurecounts, htseq

Is there a reason why this was set to False?

GFF gene identifier gene_id On feature level False
written 4 months ago by Jennifer Hillman Jackson

Interesting - I had not noticed that. I'll try re-running with "GFF gene identifier gene_id On feature level" set to TRUE. I should also note that we are hosting our own internal (cloud-based) instance of Galaxy, so it is possible something is wrong in the installation.

written 4 months ago by mconerly

Unfortunately that did not solve the problem.

written 4 months ago by mconerly
4 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

I agree. The problem could be with the inputs, or something could be wrong on the server.

For data content/format issues, please see the FAQs here: https://galaxyproject.org/support/#troubleshooting
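
As a quick first check on the inputs, it may help to look at the layout of the count file itself. The names in the error ("X10", "X1652", "X0") look like numbers with an "X" in front, which is what R produces when it turns a value starting with a digit into a syntactically valid name. One possible reading, and this is only an assumption, is that a row of counts is being treated as a header. The Python sketch below tests for that and also reports how many columns the rows have. The function names and the sample rows are hypothetical.

```python
def first_row_looks_like_data(lines):
    """True if every field after the ID in the first non-empty row is an integer.

    A header row normally holds sample names, so an all-numeric first row
    suggests the header is missing.
    """
    for ln in lines:
        if ln.strip():
            fields = ln.rstrip("\n").split("\t")
            return len(fields) > 1 and all(f.strip().isdigit() for f in fields[1:])
    return False


def column_counts(lines):
    """Return the set of distinct column counts found across non-empty rows."""
    return {len(ln.rstrip("\n").split("\t")) for ln in lines if ln.strip()}


# Hypothetical rows for illustration only.
with_header = ["Geneid\tcount\tlength\n", "GENE_A\t10\t1652\n"]
without_header = ["GENE_A\t10\t1652\n", "GENE_B\t0\t1652\n"]

print(first_row_looks_like_data(with_header))
print(first_row_looks_like_data(without_header))
print(column_counts(with_header))
```

If the first function returns True for a real file, or the second returns more than one column count, the file layout is worth comparing against a count table that is known to work with the tool.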

For help with server configuration or the tool itself, I would suggest asking the developers directly on Gitter.

Thanks! Jen, Galaxy team
