Fatal error with DESeq2 when processing TPM-based Salmon files and using a tabular transcript-gene mapping file

Dennis • 10 wrote (13 months ago):

Hi there,

I have used Salmon to map RNAseq reads to a transcriptome. I then proceeded to analyze Salmon output with DESeq2:

- choice of input data: TPM values (e.g., from salmon)
- transcript-ID and gene-ID mapping file (tabular file with transcript-gene mapping)

I used a tabular text file that contains two columns - one with SeqName and one with Description. Sample below:

SeqName Description
TNI017526-RC PREDICTED: uncharacterized protein LOC106135801
TNI017526-RD PREDICTED: uncharacterized protein LOC106135801
TNI017526-RE PREDICTED: uncharacterized protein LOC106135801
TNI017526-RB PREDICTED: uncharacterized protein LOC106135801
TNI017526-RA PREDICTED: uncharacterized protein LOC106135801

However, I keep getting a fatal error message:

Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 5 elements
Calls: read.table -> scan

What does that mean? Is there a specific input format requirement for the transcript-gene mapping file?

I found two similar issues reported here: https://biostar.usegalaxy.org/p/23985/. However, I'm already using a tabular text file that has transcript and gene names only.

Thank you for your help!

Best, Dennis

rna-seq software error • 731 views


Jennifer Hillman Jackson ♦ 25k (United States) wrote (13 months ago):

Hello,

I would try this first: Remove the header line from the tabular file. That is the reason for the current failure.

If that doesn't work, or some other error comes up afterwards, try to create the file so that neither ID contains any spaces. Use underscores instead or, even better, get rid of the extra content (including the colon :).

A format like this would be ideal (with a tab between the two values):

TNI017526-RC LOC106135801
TNI017526-RD LOC106135801
TNI017526-RE LOC106135801
TNI017526-RB LOC106135801
TNI017526-RA LOC106135801
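
A minimal Python sketch of that cleanup, in case it helps to do it offline before uploading. It assumes the gene identifier is the trailing LOC... token in the Description column, as in the sample above; `clean_mapping` and `LOC_PATTERN` are hypothetical names made up for this example, not part of Galaxy or DESeq2.

```python
import re

# Assumption: the gene ID is a trailing "LOC<digits>" token in the description.
LOC_PATTERN = re.compile(r"(LOC\d+)\s*$")


def clean_mapping(lines, header_names=("SeqName",)):
    """Return 'transcript<TAB>gene' rows with the header and extra text removed."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace: transcript ID, then the rest.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        transcript, description = parts
        if transcript in header_names:
            continue  # drop the header row
        match = LOC_PATTERN.search(description)
        if match:
            gene = match.group(1)
        else:
            # Fallback: collapse anything non-alphanumeric into underscores.
            gene = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
        cleaned.append(f"{transcript}\t{gene}")
    return cleaned


sample = [
    "SeqName\tDescription",
    "TNI017526-RC\tPREDICTED: uncharacterized protein LOC106135801",
    "TNI017526-RD\tPREDICTED: uncharacterized protein LOC106135801",
]

for row in clean_mapping(sample):
    print(row)
```

Each printed row has exactly two tab-separated values and no spaces, which is the layout suggested above. Descriptions without a LOC token fall back to an underscore-joined name, so check those rows by hand.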

If that still doesn't work, where are you working? If you are at Galaxy Main (https://usegalaxy.org) or can reproduce the problem there, you can send in a bug report. Leave all inputs/outputs undeleted (including the test with this format of input) and include a link to this Biostars post. There can be other problems with inputs, and you can double-check yours against the FAQs here first if you want (it is quicker): https://galaxyproject.org/support/#troubleshooting

If not working at Galaxy Main, let us know where and we can follow up from there.

Thanks! Jen, Galaxy team


Dennis • 10 wrote (13 months ago):

Hi Jen,

That was my first suspicion, so I removed all the spaces and merged all the columns into one, producing one long name with no spaces or other non-alphanumeric characters for the gene ID.

Now the tool runs, but comes back empty :-/

All three files: normalized counts, plots and files on data are empty...

I'm working on Galaxy Main. I can submit a bug report if it will help resolve the issue.
