Question: Galaxy: Use header row of tabular data for column names?
2.4 years ago by
jcoliver10
jcoliver10 wrote:

Is there a way to use metadata in a header row of a tab-delimited format for column names? Alternatively, is there a way to manually edit column names to something other than "1", "2", "3", etc.?

Background: I grabbed RefGene data for the entire human genome from the UCSC Genome Browser's Table Browser by sending the data to Galaxy. The output format of the query was "all fields from selected query". The Tabular data comes through just fine, but the first line is the header row:

"#bin name chrom strand ... exonFrames"

I would like to (1) use this information for the actual column names in the data, replacing the default enumeration. Ideally, it would look something like the column naming protocol for BED files "1. chrom 2. chromStart 3. chromEnd ..." and (2) after this information is included as column names, remove that header row. The latter is easy (I think), but I cannot find a means of accomplishing the former. There is an old discussion thread related to this (http://dev.list.galaxyproject.org/Tabular-file-metadata-columns-names-td4138590.html), but it does not appear to resolve the current issue. Any suggestions?

refgene galaxy tabular • 1.2k views
modified 2.4 years ago by Jennifer Hillman Jackson25k • written 2.4 years ago by jcoliver10
2.4 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

The simplest solution is to assign a known datatype, then assign the available defined metadata column assignments to your dataset. "Interval" seems like it would work for your case. The follow-up would be to remove the header line and potentially the "bin" column (this is just an indexing number, probably not needed, and can be ignored in many cases).
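
If you end up doing that cleanup outside of Galaxy, here is a rough sketch in plain Python. It assumes the file is tab-delimited, that the header is the first line and starts with "#", and that "bin" is the first column. The function name and the sample row are made up for illustration; they are not from your data or from Galaxy.

```python
import csv
import io


def strip_header_and_bin(handle):
    """Return data rows without the '#' header line and without column 1.

    Assumes the first column is the 'bin' index column (hypothetical helper).
    """
    reader = csv.reader(handle, delimiter="\t")
    rows = []
    for line_number, row in enumerate(reader):
        # Skip the header only if it is the first line and starts with '#'.
        if line_number == 0 and row and row[0].startswith("#"):
            continue
        # Drop the first column (assumed to be 'bin').
        rows.append(row[1:])
    return rows


# Made-up sample in the same layout as the header quoted in the question.
sample = "#bin\tname\tchrom\tstrand\n585\tNM_EXAMPLE\tchr1\t+\n"
rows = strip_header_and_bin(io.StringIO(sample))
```

For a real file, pass an open file handle (opened with newline="") instead of the io.StringIO sample.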

A broader solution is to create a new datatype specific to your data files and the tools that consume this as input. Contributions can be made to the Tool Shed. This is how:

Enhancing Galaxy itself to permit user-defined column (and potentially other) metadata, as a way of annotating, is a bigger project and use-cases/utility would have to be reviewed. User defined column metadata that is not utilized by tools directly is a form of annotation. You can always add in this type of annotation using the "Annotation" field associated with each dataset (free-form text).

More discussion is encouraged. You might consider posting to the galaxy-dev list (referencing the prior thread again) to see what other feedback the development community has now.

Thanks, Jen, Galaxy team

written 2.4 years ago by Jennifer Hillman Jackson25k

All makes sense. I mostly wanted to make sure I wasn't missing anything obvious. I'm not sure if this warrants a custom datatype, but I'll look into it.

written 2.4 years ago by jcoliver10

I'd also like to see Galaxy able to read header names when they are present in the tabular file. This is a trivially usable capability in high level languages like R and Python, so Galaxy (which should be even higher level) should not be missing this feature. One good reason to have named columns is to allow subsetting files to just a few needed columns without having to fiddle with column indices.
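
To illustrate what I mean by subsetting on names rather than indices, here is a minimal sketch using only the Python standard library. It assumes a tab-delimited file whose first line is the header and whose first header field may carry a leading "#" (as in the UCSC output above). The function name and the sample rows are hypothetical.

```python
import csv
import io


def select_columns(handle, wanted):
    """Return only the columns named in `wanted`, in that order."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # The UCSC-style header starts with '#bin'; strip the '#' so the
    # first column can be addressed by its plain name.
    header[0] = header[0].lstrip("#")
    # Translate names to positions once; raises ValueError on unknown names.
    positions = [header.index(name) for name in wanted]
    return [[row[i] for i in positions] for row in reader]


# Made-up sample rows for illustration only.
sample = (
    "#bin\tname\tchrom\tstrand\n"
    "1\tgeneA\tchr1\t+\n"
    "2\tgeneB\tchr2\t-\n"
)
subset = select_columns(io.StringIO(sample), ["name", "strand"])
```

The point is that the caller never has to know that "strand" happens to be column 4.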

written 18 months ago by matt.chambers4250

New datatypes can be created and added to any Galaxy server, but these currently need to be pre-defined. However, if you would like to see some type of ad-hoc datatype creation/metadata assignment feature in Galaxy, it could be proposed as an enhancement request ticket against the core GitHub repository, or community code changes/contributions to enable this functionality could be submitted for review. I don't think that a new standalone datatype (added to the Tool Shed, which is open for contributions without review) will quite achieve what you want by itself. https://github.com/galaxyproject/galaxy

If you want to discuss this with the other community developers for feedback before opening a ticket, perhaps propose the idea at the galaxy-dev@lists.galaxyproject.org mailing list first?

written 18 months ago by Jennifer Hillman Jackson25k