Question: Finding Percent Totals in RNA-Seq
19 months ago by
annie.e.collier20 wrote:

Does anyone know how to sum and divide RNA-Seq datasets? I have two datasets (FPKM values) and want to find the percent total of all genes in each dataset (they are two fractions of a whole sample), so essentially I need to divide dataset 1 by the sum of dataset 1 and 2. I can't seem to find a straightforward way to do this, preferably in Galaxy or R because I am new to this stuff. Seems like simple math and I can do it for individual genes but want to plot all genes into a nice figure to see trends. Thanks!

modified 18 months ago by Jennifer Hillman Jackson25k • written 19 months ago by annie.e.collier20
18 months ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

If the reference annotation dataset used as an input to Cuffdiff includes the attribute `gene_name`, then that common gene identifier can be used to link the two datasets' content and perform summary calculations.

• Group or Datamash can be used to count and add up the number of occurrences of each gene identifier
• Line/Word/Character count can be used on that output to count the total number of unique genes
• The number of lines equals the number of genes
• Join two files can be used to merge datasets together by a common gene identifier
• Compute can be used to add, divide, and multiply values in tabular data per line (plus other functions)
• Example syntax for add: `c5+c6` would mean column 5 plus column 6. The result can be rounded or not depending on the desired output value
• Example syntax for divide: `c5/c6` would mean column 5 divided by column 6. Do not round the result if you want to keep it as a fraction
• Example syntax for multiply: `cX*100` where cX is the column holding the fraction and the result is a percentage value. This can be rounded or not, although rounding will make graphing easier
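
For anyone who wants to check the arithmetic outside of Galaxy, the group, join, and compute steps above can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools run internally. It assumes a two-column, tab-separated input (gene identifier, then FPKM) with no header line; the function names and the sample values are made up for the example. The per-gene result is 100 * fraction1 / (fraction1 + fraction2), which is the percent total asked about in the question.

```python
import csv
import io
from collections import defaultdict


def sum_fpkm_by_gene(tabular_text, gene_col=0, fpkm_col=1):
    """Add up FPKM values per gene identifier (the Group/Datamash step)."""
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        totals[row[gene_col]] += float(row[fpkm_col])
    return dict(totals)


def percent_in_first(fraction1, fraction2):
    """Join on gene identifier, then compute 100 * f1 / (f1 + f2) per gene."""
    result = {}
    # Keep only genes present in both datasets (the Join two files step).
    for gene in sorted(fraction1.keys() & fraction2.keys()):
        total = fraction1[gene] + fraction2[gene]
        # A gene with zero FPKM in both fractions has no defined percentage.
        result[gene] = None if total == 0 else 100.0 * fraction1[gene] / total
    return result


# Hypothetical input: gene_name <TAB> FPKM, one line per record, no header.
fraction1_text = "geneA\t30.0\ngeneB\t5.0\ngeneA\t10.0\ngeneC\t0.0\n"
fraction2_text = "geneA\t60.0\ngeneB\t15.0\ngeneC\t0.0\n"

percents = percent_in_first(
    sum_fpkm_by_gene(fraction1_text),
    sum_fpkm_by_gene(fraction2_text),
)
for gene, pct in percents.items():
    print(gene, pct)
```

Genes with zero FPKM in both fractions come back as `None` rather than raising a division error. It is worth deciding how to handle that case before graphing, since the same division by zero can come up with `c5/c6` in Compute.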

Each step is performed by an individual tool, and many of these tools are similar to command-line functions. Once you work out a protocol, extract a workflow from the history, edit the workflow so it contains just these operations, and reuse it. The workflow then becomes, in essence, a single-click "custom tool".

Related manual help for Cuffdiff inputs, output file formats, and where to obtain reference annotation files that contain all the attribute values used by tools in this suite (p_id, tss_id, gene_name):

Hopefully this helps! Jen, Galaxy team