Question: salmon gene quant to DESeq2
16 months ago, matthew.johnson wrote:

Hi again - I successfully ran salmon on my fastq files, including gene-level summary via a simple two-column map file of transcript-to-gene-ID. This gives me an output like this for each fastq:

Name    Length  EffectiveLength TPM NumReads
ENSMUSG00000114165  2016    1815.57 0.0732938   3.29409

I tried simply passing these outputs on as input to DESeq2 for differential expression, selecting under input "TPM values (e.g. from sailfish or salmon)", then for Gene mapping format selecting "Transcript-ID and Gene-ID mapping file" and specifying the same two-column table used for the salmon runs (haha).

I got this vague error:

Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 2 did not have 6 elements
Calls: read.table -> scan

So, not sure where the problem's arising, but for starters, the salmon output contains both TPM and NumReads (i.e., I presume, a read count estimate). Do I need to extract one or the other of these columns to pass on to DESeq2? And also, is the transcript-to-gene map even necessary for DESeq2 since the gene-level summary has already been done by salmon?

Thanks so much for your help!

Tags: rna-seq, galaxy

Hi Matthew,

Have you resolved this issue? I've run into similar problems with my use of DESeq2 on Salmon files and TPM input coupled with a transcript-ID gene-ID mapping file.

(reply written 13 months ago by Dennis)

16 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello - I haven't seen this before, but I have mostly worked with count input (not TPM). The DESeq2 tutorials also use count inputs: https://galaxyproject.org/tutorials/nt_rnaseq/#analysis-of-the-differential-gene-expression & http://galaxyproject.github.io/training-material/topics/transcriptomics/

Are you working at Galaxy Main (http://usegalaxy.org)? We would like to look at the exact usage and parameters to test if this option has a bug - or to help clarify usage - and sharing a history link or sending in a bug report is the most direct way to do that. How to: https://galaxyproject.org/issues/#usage-problem-reporting Please be sure to leave the datasets undeleted and include a link to this post so we can associate the two.

Others are still welcome to reply if they understand your use case from the given information.

Jen, Galaxy team


I didn't see a bug report sent in yet, but you can still do that if usage problems are still present.

(reply written 16 months ago by Jennifer Hillman Jackson)

15 months ago, devbt15 wrote:

Dear all, I used the same pipeline and passed the Salmon output to DESeq2 (there is an option for TPM input) along with a .gtf annotation file. It eventually gave me this error:

Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
Warning messages:
1: multiple methods tables found for 'arbind'
2: multiple methods tables found for 'acbind'
3: replacing previous import 'IRanges::arbind' by 'SummarizedExperiment::arbind' when loading 'DESeq2'
4: replacing previous import 'IRanges::acbind' by 'SummarizedExperiment::acbind' when loading 'DESeq2'
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .normarg_makeTxDb_genes(genes, transcripts_tx_id) :
  'genes$gene_id' must be a character vector (or factor) with no NAs
Calls: makeTxDbFromGFF ... makeTxDbFromGRanges -> makeTxDb -> .normarg_makeTxDb_genes
Warning messages:
1: replacing previous import 'IRanges::arbind' by 'SummarizedExperiment::arbind' when loading 'GenomicAlignments'
2: replacing previous import 'IRanges::acbind' by 'SummarizedExperiment::acbind' when loading 'GenomicAlignments'
3: In makeTxDbFromGRanges(gr, metadata = metadata) :
  The following transcripts were dropped because their exon ranks could not be inferred (either because the exons are not on the same chromosome/strand or because they are not separated by introns): Lj0g3v0075389.1, Lj0g3v0124429.1, Lj1g3v4579120.1, Lj3g3v1775190.1

Could you please clarify where the error is occurring? Is it due to an incompatibility between the Salmon output and the .gtf file? Thank you in advance. Regards, Das.


This was solved over email, but I am posting the solution for others who may run into problems with this tool.

Most of the above error message consists of warnings (warnings do not cause a tool failure). The initial portion of the error message is where the tool failed ("Fatal error: An undefined error occurred..."). This generally indicates a format or content problem with one or more inputs.

This problem, and other related problems, came from the input GTF file not containing lines with type=transcript (the 3rd column of the file, which the tool requires). Switching to a transcript-to-gene text input instead allowed the tool to process the data correctly.
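A quick way to check a GTF for this before running the tool is to tally the feature types in its 3rd column. The following is only a minimal sketch: the function names and the toy GTF lines (coordinates, source name, and the gene ID derived from a transcript ID in the log above) are made up for illustration and are not part of the Galaxy tool.

```python
from collections import Counter


def gtf_feature_counts(lines):
    """Count the values found in column 3 (feature type) of GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def has_transcript_features(lines):
    """True if at least one line has 'transcript' in column 3."""
    return gtf_feature_counts(lines)["transcript"] > 0


# Toy GTF (hypothetical values) with exon and CDS lines but no transcript lines.
TOY_GTF_NO_TRANSCRIPTS = (
    "#!toy annotation\n"
    'chr0\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "Lj0g3v0075389"; transcript_id "Lj0g3v0075389.1";\n'
    'chr0\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "Lj0g3v0075389"; transcript_id "Lj0g3v0075389.1";\n'
    'chr0\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "Lj0g3v0075389"; transcript_id "Lj0g3v0075389.1";\n'
)

if __name__ == "__main__":
    lines = TOY_GTF_NO_TRANSCRIPTS.splitlines()
    print(dict(gtf_feature_counts(lines)))
    print("has transcript lines:", has_transcript_features(lines))
```

For a real file, the same functions can be fed an open file handle, e.g. `gtf_feature_counts(open("annotation.gtf"))`, where the file name is a placeholder.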

(reply written 15 months ago by Jennifer Hillman Jackson)

7 months ago, jevanveen wrote:

i had the same issue and resolved it as follows:

inputs:

salmon quantification (not gene quant) files generated in Galaxy

salmon transcript to gene map - tab delimited text file from ensembl

when i run as is, i get the "line 2 did not have 6 elements" error. the fix was simple - i removed the header line from the tab delimited transcript to gene map and it worked well. note that you must NOT remove the header line from the salmon quantification files or the tool will output 0-line files. also make certain that your transcript to gene map uses the same accession number format as the reference you used to generate the salmon files. cheers!
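The two checks above (drop the header from the map, and confirm the IDs match) can be sketched outside Galaxy as follows. This is an illustration under assumptions: the function names are hypothetical, the toy rows use made-up numbers, and a version suffix such as ".4" is only one example of how accession formats can differ.

```python
import csv


def read_tx2gene(lines, has_header=True):
    """Parse a tab-delimited transcript-to-gene map into (transcript, gene) pairs.

    If has_header is True, the first non-empty row is treated as a header and dropped.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    if has_header and rows:
        rows = rows[1:]
    for row in rows:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
    return [(tx, gene) for tx, gene in rows]


def to_headerless_tsv(pairs):
    """Serialize pairs as a two-column tab-delimited table with no header line."""
    return "".join(f"{tx}\t{gene}\n" for tx, gene in pairs)


def quant_transcript_ids(lines):
    """Collect the Name column of a Salmon quantification table (header left in place)."""
    return {row["Name"] for row in csv.DictReader(lines, delimiter="\t")}


def missing_from_map(quant_ids, pairs):
    """Return quantified transcript IDs that have no entry in the map."""
    mapped = {tx for tx, _ in pairs}
    return sorted(quant_ids - mapped)


# Toy inputs (values are made up): the map has a header and unversioned IDs,
# while the quantification table uses a versioned ID.
TOY_MAP = "Transcript stable ID\tGene stable ID\nENSMUST00000000001\tENSMUSG00000000001\n"
TOY_QUANT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENSMUST00000000001.4\t3262\t3012.0\t1.5\t40.0\n"
)

if __name__ == "__main__":
    pairs = read_tx2gene(TOY_MAP.splitlines())
    print(to_headerless_tsv(pairs), end="")
    print("unmapped:", missing_from_map(quant_transcript_ids(TOY_QUANT.splitlines()), pairs))
```

An empty "unmapped" list would suggest the two files use the same ID format; any entries listed there are worth inspecting before running DESeq2.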

3 days ago, shzad wrote:

I am facing the same problem. I ran Salmon on Linux to generate .sf files, then uploaded these files to Galaxy. DESeq2 now asks for a "Tabular file with Transcript - Gene mapping" and shows an error if I pass one of the .sf files. I think the problem could be resolved by having DESeq2 take the first column of any .sf file as the gene ID/transcript ID.

Please help. Thank you

Shahzad


For the DESeq2 "Tabular file with Transcript - Gene mapping" input, use a two-column tabular dataset:

  • column1 == transcript
  • column2 == gene
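If no such table is at hand, one possible way to produce it is from the `transcript_id` and `gene_id` attributes of a GTF annotation. The sketch below assumes the GTF carries both attributes in its 9th column; the function names and the toy IDs (`G1`, `T1`, `T2`) are hypothetical, and the output deliberately has no header line.

```python
import re

# Matches GTF attributes of the form: key "value"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf(lines):
    """Return sorted (transcript, gene) pairs found in GTF attribute columns."""
    pairs = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(_ATTRIBUTE.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            pairs[tx] = gene  # repeated records for one transcript collapse to one row
    return sorted(pairs.items())


def pairs_to_tsv(pairs):
    """Two columns, tab-delimited, no header: column1 transcript, column2 gene."""
    return "".join(f"{tx}\t{gene}\n" for tx, gene in pairs)


# Toy GTF with hypothetical IDs; the gene line has no transcript_id and is skipped.
TOY_GTF_WITH_IDS = (
    '1\ttoy\tgene\t1\t900\t.\t+\t.\tgene_id "G1";\n'
    '1\ttoy\ttranscript\t1\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    '1\ttoy\texon\t1\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    '1\ttoy\ttranscript\t500\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n'
)

if __name__ == "__main__":
    print(pairs_to_tsv(tx2gene_from_gtf(TOY_GTF_WITH_IDS.splitlines())), end="")
```

The transcript IDs in the resulting table still need to match the names in the Salmon output exactly.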

Thanks! Jen, Galaxy team

(reply written 3 days ago by Jennifer Hillman Jackson)