Phenotype data for Ballgown tool


Hello, I am doing RNA-seq analysis on a local Galaxy server, following the new Tuxedo protocol (HISAT > StringTie > Ballgown). I have a question about the Ballgown input. Apart from the five StringTie output files, Ballgown requires a phenotype data file as input. Is there a reference phenotype data file, or do I have to write the file myself in a text editor?

I have two experimental groups, Tumor and Normal, with 30 matched samples in each group. Could anyone give me an idea of how to arrange the phenotype data file?

rna-seq galaxy ballgown • 2.3k views


modified 11 months ago by rpink • 10 • written 18 months ago by tamrin.chowdhury • 40

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

This input file is a two-column file created by the analyst (you) that contains the sample name and the phenotype. The samples in this file (pData) must be listed in exactly the same order as the samples in the other inputs, or the tool will fail with an error.
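
As a rough illustration of the two-column layout described above, here is a minimal Python sketch that builds a phenotype table for a Tumor/Normal design with 30 samples per group. The sample names (`Tumor_01`, `Normal_01`, and so on), the `group` column name, and the helper functions are all made up for illustration. The quoted CSV layout with a header row is also an assumption, so check what the wrapper actually expects. The key point is that the rows are written in the same order as the list of samples.

```python
import csv
import io


def build_pdata_rows(sample_ids, phenotype_of):
    """Return a header row plus one (id, phenotype) row per sample, in input order."""
    rows = [("ids", "group")]  # column names are assumptions, not required by Ballgown
    for sample_id in sample_ids:
        rows.append((sample_id, phenotype_of(sample_id)))
    return rows


def write_pdata(rows, handle):
    """Write the rows as fully quoted CSV to an open text handle."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)


def group_from_name(sample_id):
    """Hypothetical rule: the phenotype is the part of the name before the underscore."""
    return sample_id.split("_")[0]


# Hypothetical sample names; list them in the same order as the other Ballgown inputs.
sample_ids = [f"Tumor_{i:02d}" for i in range(1, 31)]
sample_ids += [f"Normal_{i:02d}" for i in range(1, 31)]

rows = build_pdata_rows(sample_ids, group_from_name)

# Written to an in-memory buffer here; use open("phenodata.csv", "w", newline="") for a real file.
buffer = io.StringIO()
write_pdata(rows, buffer)
pdata_text = buffer.getvalue()
print("\n".join(pdata_text.splitlines()[:3]))
```

Replace `sample_ids` with your real sample names, listed in the order in which the StringTie outputs are supplied to Ballgown.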

The Ballgown wrapper was last updated some time ago, so I am not sure how well it works with the latest versions of Galaxy. If you have problems, contact the tool wrapper author (the repository owner) through the Tool Shed to get help or to report issues: http://usegalaxy.org/toolshed > log in > search for the tool > click into the tool and find the contact option in the top-right menu named "Repository Actions".

The Ballgown tool (and the existing wrapper) may be reviewed again before or during the GCC Hackathon in late June.

Hope this helps! Others who have used the tool recently with the latest Galaxy release are encouraged to add more.

Thanks, Jen, Galaxy team

written 18 months ago by Jennifer Hillman Jackson ♦ 25k

Thank you for the kind advice and suggestion.

written 18 months ago by tamrin.chowdhury • 40

Hi, may I ask a simple related question? What is the phenotype? I have two samples with two replicates each, and I am not sure what to enter as the phenotype when generating pData.

written 15 months ago by sophialovechan • 10

The "phenotype" is a description of the sample. To view an example, download the test data included with the primary publication here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5032908/

Specifically, the sample pData input chrX_data/geuvadis_phenodata.csv looks like this:

"ids","sex","population"
"ERR188044","male","YRI"
"ERR188104","male","YRI"
"ERR188234","female","YRI"
"ERR188245","female","GBR"
"ERR188257","male","GBR"
"ERR188273","female","YRI"
"ERR188337","female","GBR"
"ERR188383","male","GBR"
"ERR188401","male","GBR"
"ERR188428","female","GBR"
"ERR188454","male","YRI"
"ERR204916","female","YRI"

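Since sample order matters, a quick sanity check before running the tool can be sketched in Python as follows. It reads the `ids` column of a phenotype CSV (only the first four rows of the example above are used here) and reports the first position where it disagrees with the order of the other inputs. The function names and the `input_order` list are hypothetical; how you obtain the actual input order depends on your own history.

```python
import csv
import io

# First four rows of the example phenotype file shown above.
PHENO_CSV = '''"ids","sex","population"
"ERR188044","male","YRI"
"ERR188104","male","YRI"
"ERR188234","female","YRI"
"ERR188245","female","GBR"
'''


def read_pdata_ids(text):
    """Return the values of the 'ids' column, in file order."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["ids"] for row in reader]


def first_order_mismatch(pdata_ids, input_ids):
    """Return the index of the first disagreement, or None if both lists are identical."""
    for index, (pdata_id, input_id) in enumerate(zip(pdata_ids, input_ids)):
        if pdata_id != input_id:
            return index
    if len(pdata_ids) != len(input_ids):
        # One list is a prefix of the other; the mismatch starts where the shorter one ends.
        return min(len(pdata_ids), len(input_ids))
    return None


pdata_ids = read_pdata_ids(PHENO_CSV)

# Hypothetical order in which the samples were supplied to the tool (two samples swapped).
input_order = ["ERR188044", "ERR188234", "ERR188104", "ERR188245"]

mismatch = first_order_mismatch(pdata_ids, input_order)
if mismatch is None:
    print("Sample order matches.")
else:
    print(f"Order differs starting at row {mismatch + 1} of the phenotype file.")
```

If a mismatch is reported, reorder either the phenotype rows or the inputs so that both lists agree before running Ballgown.
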
modified 15 months ago • written 15 months ago by Jennifer Hillman Jackson ♦ 25k

I have a very basic question: where do the ID names in the Ballgown phenotype CSV table come from?

For example, my dataset names in the right-side history panel are "193: Stringtie on data 2 and data 11:exon to transcript mapping", "data 11", or "Stringtie on data 2 and data 11".

written 14 months ago by rjames • 0
