Question: Phenotype data for Ballgown tool
18 months ago by
tamrin.chowdhury40 wrote:

Hello, I am doing RNA-seq analysis on a local Galaxy server. I'm following the new Tuxedo protocol: HISAT > StringTie > Ballgown. I have a question regarding the Ballgown input. Apart from the 5 StringTie output files, Ballgown requires a phenotype data file as input. Is there a reference phenotype data file, or do I have to write the file myself using a text editor?

I have two experiment groups of Tumor and Normal with 30 matching samples in each group. Can anyone kindly give me an idea about how to arrange the phenotype data file?

rna-seq galaxy ballgown • 2.3k views
modified 11 months ago by rpink • written 18 months ago by tamrin.chowdhury
18 months ago by
United States
Jennifer Hillman Jackson25k wrote:


This input file is a two-column file created by the analyst (you) that contains the sample name and phenotype. The order of the samples in this file (pData) must exactly match the order of the samples given in the other inputs, or the tool will fail with an error.
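As a rough illustration of the layout described above, here is a minimal Python sketch that builds a two-column phenotype table for a Tumor/Normal design with 30 samples per group. The sample names (`Tumor_01`, `Normal_01`, ...), the header names (`ids`, `group`), and the output file name `phenodata.csv` are all hypothetical placeholders, and whether the Galaxy wrapper expects a header row is not confirmed in this thread. The only point taken from the answer is that each row holds a sample name and its phenotype, in the same order as the samples in the other inputs.

```python
import csv
import io


def build_phenotype_rows(sample_ids, group_of):
    """Return (sample_id, phenotype) pairs in exactly the order of sample_ids."""
    return [(sample_id, group_of[sample_id]) for sample_id in sample_ids]


def to_csv_text(rows, header=("ids", "group")):
    """Serialize the rows as CSV text; the header names are an assumption."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample names: replace them with the names used by your own
# inputs, listed in the same order as those inputs are given to the tool.
sample_ids = [f"Tumor_{i:02d}" for i in range(1, 31)]
sample_ids += [f"Normal_{i:02d}" for i in range(1, 31)]

# Derive the phenotype from the name prefix (only valid for these placeholders).
group_of = {sample_id: sample_id.split("_")[0] for sample_id in sample_ids}

rows = build_phenotype_rows(sample_ids, group_of)
csv_text = to_csv_text(rows)

# Sanity check: the phenotype rows follow the input sample order exactly.
assert [sample_id for sample_id, _ in rows] == sample_ids

if __name__ == "__main__":
    # Write the table to a file that can be uploaded to the Galaxy history.
    with open("phenodata.csv", "w", newline="") as handle:
        handle.write(csv_text)
```

The same file can of course be typed by hand in a text editor; the script only helps keep 60 rows in a consistent order.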

The Ballgown wrapper was last updated some time ago, so I am not sure how well it works with the latest versions of Galaxy. If you have problems, contact the tool wrapper author (aka the repository owner) through the Tool Shed to get help or to report problems: log in > search for the tool > click into the tool > find the contact option in the top-right menu named "Repository Actions".

The Ballgown tool (and the existing wrapper) may be reviewed again before or during the GCC Hackathon in late June.

Hope this helps! Others who have used the tool recently in the latest Galaxy release are encouraged to add more.

Thanks, Jen, Galaxy team


Thank you for the kind advice and suggestion.

written 18 months ago by tamrin.chowdhury

Hi, may I ask a simple question related to this? What is the phenotype? I have two samples and two replicates for each sample. I am not sure what the phenotype should be when I try to generate the pData file.

written 15 months ago by sophialovechan

The "phenotype" is a description of the sample. To view an example, download the test data included with the primary publication here:

Specifically, the sample pData input chrX_data/geuvadis_phenodata.csv looks like this:

modified 15 months ago • written 15 months ago by Jennifer Hillman Jackson

I have a most basic question: Where do the id names for the Ballgown phenotype csv table come from?

For example my dataset names in the right-side history panel are: "193: Stringtie on data 2 and data 11:exon to transcript mapping" or "data 11" or "Stringtie on data 2 and data 11"

written 14 months ago by rjames
11 months ago by
rpink10 wrote:

Hello @rjames, did you get to the bottom of this? Did you use the "193: Stringtie on data 2 and data 11:exon to transcript mapping" names or the data storage names in folders, e.g. "dataset_2010.dat"? I'm lost on this as well. The pheno data looks straightforward, but the sample naming file is a bit tricky. Ryan


