Galaxy: Genomic Intervals to strict BED Conversion

6 months ago by

rydmsound • 0 wrote:

Hello:

I'm new to the forum and Galaxy/bioinformatics.

In the last two months I've been following the documentation/training materials provided and have had success.

Unfortunately, I ran into problems converting from the genomic interval datatype to the BED format (in the datatype "tab" menu), as described in this material:

http://galaxyproject.github.io/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#file-preparation

After uploading the raw Peak Region dataset (format: interval, database: mm9, columns defined per the Hpeak manual), converting the data in column 1 as instructed, and verifying the changes, we are instructed to convert this file from the interval format to BED using the "Convert" menu and choosing "Convert to New Format: Convert Genomic Intervals to BED".

This yields a traceback error (I've sourced this from usegalaxy.org):

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/converters/interval_to_bed_converter.py", line 65, in <module>
    __main__()
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/converter

However, I continued on and processed the promoter regions for the UCSC reference file with the "Get Flanks" tool. The output is converted from BED to the interval format while still referencing the mm9 assembly.

I found this a little strange as the material instructs and indicates:

Compare the rows of the resulting BED file with the input to find out how the start and end positions changed compare the regions.

After reviewing the processed UCSC file, I could see that the column layout of the output dataset still appeared similar to the BED format, but with the start and end positions changed (expected and normal).

I thought this would be a good chance to experiment and see whether I could convert this output dataset from genomic interval back to BED format, as the "arrangement" of the dataset was close. Using the same menu, I received the same error as when trying to convert the peak regions file.

Again, pushing ahead in the training material brought me to the "Extract workflow" section. Step 3 instructs us to uncheck the parts of the workflow that are not necessary for the next analysis, and refers to the conversion in the pipeline as:

Convert Genomic Intervals to strict BED

This was a little confusing, but I went back and tried converting my peak regions file using this option in the Convert menu, and got the same error again.

After reading that the minimum BED format requires only the first three columns (Chromosome, Start, End), I decided to experiment again by removing all other columns, leaving only the first three columns of the peak regions interval file, which hold those values. Again, I got the same error when trying to convert to BED (not strict BED this time).

When I look over interval_to_bed_converter.py, it appears to want to write a 6-column BED format:

 out.write("%s\t%i\t%i\t%s\t%i\t%s\n" % (region.chrom, region.start, region.end, name, 0, region.strand))

In the original peaks file I do not see columns with data representing "name", "score" or "strand" information.

Hpeaks uses a "summit" number of hypothetical fragments counted at the site (not sure if this translates to BED format score).

My Questions surrounding this:

Why isn't this working?

Is there a way I can manually rebuild the bed format I need? If so how/where could I get the data to complete the required fields?

Do I necessarily need to convert the interval files to BED format if some Galaxy tools (e.g. Get Flanks) apparently treat BED files as interval-type formats? (Let me know if I've misunderstood this.)

I'm fairly certain I'm naively missing some prerequisite knowledge or step, but I am at a loss for insight into the issue.

I have my history of doing this saved on my usegalaxy.org account. I can also provide more details if required.

I apologize if this was already posted and answered (I couldn't find it myself).

Also, I wanted to thank the entire community for making Galaxy and providing a truly amazing resource.

Rob

convert interval bed galaxy chip-seq • 334 views

modified 6 months ago by Jennifer Hillman Jackson ♦ 25k • written 6 months ago by rydmsound • 0

6 months ago by

Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

Starting from an interval formatted dataset: use the tool Text Manipulation > Cut to reduce the data to just chrom, start, end (three columns), then assign the BED datatype.
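For anyone who wants to see what the Cut step amounts to outside Galaxy, here is a minimal Python sketch. It is not Galaxy's own code: the function name `interval_to_bed3`, the default column positions, and the sample row (including its extra columns) are all made up for illustration, so adjust the column indexes to match your own interval file.

```python
def interval_to_bed3(lines, chrom_col=0, start_col=1, end_col=2):
    """Keep only chrom, start, end from tab-separated interval lines.

    Hypothetical helper: column positions are 0-based assumptions.
    """
    bed_lines = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom = fields[chrom_col]
        start = int(fields[start_col])  # fails loudly on non-numeric starts
        end = int(fields[end_col])
        bed_lines.append(f"{chrom}\t{start}\t{end}")
    return bed_lines


# Made-up example row; the last two columns stand in for extra peak-caller fields.
example = [
    "#chrom\tstart\tend\tlength\tsummit",
    "chr1\t3660676\t3661050\t375\t210",
]
print("\n".join(interval_to_bed3(example)))
```

The result is a plain three-column table, which is the minimum the BED datatype needs before you assign it.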

If you need or want to include strand, then the content for a BED6 (six-column) dataset needs to be in your input Interval dataset. If you don't have a score or name that fits the specification, placeholder values can be added using the tool Text Manipulation > Compute.
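As a rough illustration of the placeholder idea, the sketch below pads three-column rows out to six columns using the same field order as the `out.write` line quoted in the question. The helper name `bed3_to_bed6`, the `region_` name prefix, the score of 0, and the "." strand (the usual BED convention for "no strand") are assumptions chosen for the example, not values Galaxy requires.

```python
def bed3_to_bed6(rows, name_prefix="region_", score=0, strand="."):
    """Pad (chrom, start, end) rows to BED6 with placeholder values.

    Hypothetical helper: name, score and strand are placeholders only.
    """
    bed6_lines = []
    for index, (chrom, start, end) in enumerate(rows, start=1):
        name = f"{name_prefix}{index}"  # e.g. region_1, region_2, ...
        bed6_lines.append(
            "%s\t%i\t%i\t%s\t%i\t%s"
            % (chrom, int(start), int(end), name, score, strand)
        )
    return bed6_lines


# Made-up coordinates for illustration.
print("\n".join(bed3_to_bed6([("chr1", 3660676, 3661050)])))
```

If your data does carry real strand information, pass it through instead of the placeholder so downstream strand-aware tools behave as expected.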

FAQs: https://galaxyproject.org/support/

Format help for Tabular/BED/Interval Datasets
Common datatypes explained
How do I find, adjust, and/or correct metadata?

If you are getting an error with a Convert tool, it could be that the input does not really match the currently assigned datatype. It is hard to tell if that is the case, so please double check first using the FAQs above. If the tool errors with correct inputs, please send this in as a bug report (green "bug" inside the red error dataset) and we can help review/troubleshoot more. Please include a link to your Biostars post in the comments along with a link to the specific tutorial you are following.
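One way to double check whether a file really looks like interval data is to scan it for the usual problems before converting. The sketch below is only an illustration of that kind of check and is not how Galaxy validates datatypes; `find_interval_problems` and its default column positions are hypothetical.

```python
def find_interval_problems(lines, chrom_col=0, start_col=1, end_col=2):
    """Return (line_number, message) pairs for rows that do not look like intervals."""
    problems = []
    needed = max(chrom_col, start_col, end_col) + 1
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Blank lines and '#' comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < needed:
            problems.append((number, "too few tab-separated columns"))
            continue
        try:
            start = int(fields[start_col])
            end = int(fields[end_col])
        except ValueError:
            problems.append((number, "start or end is not an integer"))
            continue
        if start < 0 or end < start:
            problems.append((number, "start/end negative or out of order"))
    return problems


# Made-up rows: one good, one space-separated, one non-numeric, one reversed.
sample = ["chr1\t100\t200", "chr1 100 200", "chr2\tabc\t300", "chr3\t500\t400"]
for number, message in find_interval_problems(sample):
    print(f"line {number}: {message}")
```

Rows separated by spaces instead of tabs, header lines without a leading "#", and non-numeric coordinates are the kinds of things this would flag.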

Thanks! Jen, Galaxy team

modified 6 months ago • written 6 months ago by Jennifer Hillman Jackson ♦ 25k

I'm sorry for not replying sooner but I wanted to let you know that the advice was helpful and I was able to manually manipulate and convert my data types as mentioned. Thank You!

I haven't had much time to come back to this, as I am constantly moving in an apparently forward direction, but I will reexamine the datatypes and reference the FAQ. If I still get the same error, then I'll report the issue. I'll provide as much information and assistance as I can in resolving the problem.

Thanks again,

Rob

modified 4 months ago • written 4 months ago by rydmsound • 0
