I am slightly confused about the input requirements for the differential expression tool

Question: (Closed) I am slightly confused about the input requirements for the differential expression tool - last step of Trinity

13 months ago, Dennis • 10 wrote:

Hello all,

I have a pressing question...

To start with, I read the Trinity methods paper and Googled TMM normalization and the different unit formats before asking this question here.

I have a paired-end RNA-seq data set with 3 conditions and 3 replicates each.

I assembled a transcriptome de novo with Trinity using one sample from each group (if I try to use all 9 samples, i.e. 18 files, the DRM shuts down my job). Following that, I ran RSEM on each sample and built a matrix table of expected counts. The end goal of the project is to quantify differentially expressed genes in two treatment groups relative to control.
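
The matrix-building step described above can be sketched in Python as follows. This is a minimal illustration, not the Galaxy or Trinity tool itself: it assumes each RSEM result file is tab-separated with a header containing `gene_id` and `expected_count` columns, and the function names and sample paths are hypothetical.

```python
import csv

def read_expected_counts(path, id_column="gene_id"):
    """Return {feature_id: expected_count} from one RSEM results file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column]: float(row["expected_count"]) for row in reader}

def build_count_matrix(sample_paths, id_column="gene_id"):
    """sample_paths maps sample name -> RSEM results path (hypothetical paths).

    Returns (sample_names, {feature_id: [count per sample]}).
    Features missing from a sample are filled with 0.
    """
    names = list(sample_paths)
    per_sample = [read_expected_counts(sample_paths[n], id_column) for n in names]
    features = sorted(set().union(*per_sample))
    matrix = {f: [s.get(f, 0.0) for s in per_sample] for f in features}
    return names, matrix

def write_matrix(names, matrix, out_path):
    """Write the matrix as a tab-separated table, one row per feature."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow([""] + names)
        for feature, counts in matrix.items():
            writer.writerow([feature] + counts)
```

A call such as `build_count_matrix({"Ct1": "Ct1.genes.results", "T1_1": "T1_1.genes.results"})` would return one row of expected counts per gene, in the order the samples were given.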

I then ran edgeR on the counts table, and it gave me pairwise comparisons between all samples. I don't see the relevance of Ct1 being different from Ct3, but maybe it becomes useful in later steps of the analysis.

After this step, the last tool in the RNA-seq Trinity protocol is Analyze_Differential_Expression. It asks for two inputs: (i) the edgeR tar.gz file (got that!) and (ii) a TMM-normalized FPKM matrix.

It is the second item that I'm confused about:

1) I know it is possible to TMM-normalize in R, or with a local Trinity installation on your own machine. Is there a way to TMM-normalize that avoids both a local installation and R (somewhere on the Galaxy main instance, or on the Trinity instance)? I'm not lazy, but I don't really code and I'm not that comfortable with R, so by using Galaxy I was hoping to avoid both. Could I do it in Excel or something similar and then create a tabular file to use as the matrix in Analyze_Differential_Expression?

2) The RSEM file for every sample contains expected counts (which are used in edgeR), as well as TPM and FPKM. I know one shouldn't use FPKM for differential expression analysis (I've read that abundantly so far), but can I use TPM values instead of TMM-normalized FPKM?
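
As background to question 2, the relationship between the two RSEM columns can be sketched as below. Within a single sample, TPM is the FPKM value rescaled so that all features sum to one million, so it is a within-sample unit and does not by itself apply a between-sample factor such as TMM. The function name is hypothetical and the snippet is only an illustration of that arithmetic.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to 1e6 (TPM)."""
    total = sum(fpkm_values)
    if total == 0:
        return [0.0 for _ in fpkm_values]
    return [value / total * 1e6 for value in fpkm_values]
```

Because the rescaling is done per sample, two samples can have identical TPM totals and still differ in composition, which is the situation TMM is designed to address.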

The confusion on this point stems from the tutorial posted here: https://github.com/trinityrnaseq/GalaxyTrinityProtocol/wiki

There it simply lists "abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix" as the input for normalized FPKM counts. Does this mean I can just use the raw counts I used in edgeR (i.e., the same file) here as well? In the tutorial it looks like the same file, labelled 400, is being used.

Thank you for all the help and answers in advance!

Dennis

rna-seq galaxy • 871 views

modified 13 months ago by Jennifer Hillman Jackson ♦ 25k • written 13 months ago by Dennis • 10

Hello Dennis!

We believe that this post does not fit the main topic of this site.

Reposted at another forum

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 13 months ago by Dennis • 10

13 months ago, Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hi,

The tools needed for that tutorial are not all installed at https://usegalaxy.org. So yes, to use it exactly as published, a local/Docker/cloud Galaxy would be needed to install their tutorial tools.

The Galaxy Training Network hosts tutorials for assembly and differential expression workflows. For most of these, either the tutorial tools are installed on Main or an associated customized Docker image is available (often both). Please see: https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

modified 13 months ago • written 13 months ago by Jennifer Hillman Jackson ♦ 25k

Thanks Jen!

Is there a way to TMM-normalize using R or Excel?

Cheers, Dennis

written 13 months ago by Dennis • 10
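
For readers wondering what TMM normalization involves, here is a rough Python sketch of the arithmetic. It is a simplified version of the trimmed mean of M-values idea and is not edgeR's `calcNormFactors` or the Trinity script: the reference sample is chosen by the caller, the trimming is a plain slice of sorted values, and all function and parameter names are hypothetical. The 30% and 5% trim fractions follow the edgeR defaults.

```python
import math

def tmm_factor(sample, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`.

    Both arguments are lists of raw counts in the same gene order.
    """
    n_s, n_r = sum(sample), sum(ref)
    rows = []  # (M, A, weight) for genes with counts in both samples
    for y_s, y_r in zip(sample, ref):
        if y_s <= 0 or y_r <= 0:
            continue
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)        # log fold change (M-value)
        a = 0.5 * math.log2(p_s * p_r)  # average log abundance (A-value)
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        if var > 0:
            rows.append((m, a, 1.0 / var))  # inverse-variance weight
    n = len(rows)
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    cut_m, cut_a = int(n * m_trim), int(n * a_trim)
    # Keep genes that survive trimming on both M and A.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        return 1.0
    total_w = sum(rows[i][2] for i in keep)
    mean_m = sum(rows[i][0] * rows[i][2] for i in keep) / total_w
    return 2.0 ** mean_m

def tmm_factors(counts_by_sample, ref_name):
    """Factors for every sample, rescaled so that they multiply to 1."""
    ref = counts_by_sample[ref_name]
    raw = {name: tmm_factor(c, ref) for name, c in counts_by_sample.items()}
    log_mean = sum(math.log(f) for f in raw.values()) / len(raw)
    return {name: f / math.exp(log_mean) for name, f in raw.items()}

def tmm_fpkm(count, length_bp, lib_size, factor):
    """FPKM computed against the effective library size (lib_size * factor)."""
    effective_lib = lib_size * factor
    return count * 1e9 / (length_bp * effective_lib)
```

The factor multiplies the raw library size to give an effective library size, and a TMM-normalized FPKM is then the usual FPKM formula with that effective size in the denominator. Whether `length_bp` should be the transcript length or RSEM's effective length is left open here as an assumption.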

It would be best to ask this question at https://www.biostars.org/ where you will get many more eyes on it. I expect you will get a few different replies/alternatives to choose from. Even just searching by your question brings up a lot of prior Q&A on the subject (too many look interesting to link back here individually).

In short, you want to work from read counts for the best results. If working at https://usegalaxy.org, output from Salmon or Sailfish can be used as TPM input to DESeq2. htseq-count and featureCounts can also create count input for DESeq2. See the tutorials I linked for those sorts of workflows.

modified 13 months ago • written 13 months ago by Jennifer Hillman Jackson ♦ 25k

Thank you Jen,

I'll repost this where suggested, but if you remember, DESeq2 wasn't working for me with TPM input; there is a bug. I'm still trying to work out what it needs as input, as I tried "feeding" it all sorts of TPM tables, but fortunately I found someone who may help me run DESeq2 in R... I sort of didn't want to resort to the same approach with Trinity.

I'll need to delete/close this thread, since I see people constantly getting annoyed when someone posts the same query in more than one place.

P.S. I suppose I should be able to take the TPM output from RSEM and plug it into DESeq2, if it were to work with TPM.

written 13 months ago by Dennis • 10
