Hello:
I'm new to the forum and Galaxy/bioinformatics.
In the last two months I've been following the documentation/training materials provided and have had success.
Unfortunately, I ran into problems converting from the genomic interval datatype to BED format (in the datatype "tab" menu), as described in this material here:
After uploading the raw Peak Region dataset (format: interval; database: mm9; columns defined as in the Hpeak manual), converting the data in column 1 as instructed, and verifying the changes, we are told to convert this file from interval format to BED using the "Convert" menu and selecting "Convert to New Format: Convert Genomic Intervals to BED".
This yields a traceback error (I've sourced this from usegalaxy.org):
Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/converters/interval_to_bed_converter.py", line 65, in <module>
    __main__()
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/converter
However, I continued on and processed the promoter regions for the UCSC reference file with the "Get Flanks" tool. Its output is in the interval format rather than BED, although it still references the mm9 assembly.
I found this a little strange, because the material says:
Compare the rows of the resulting BED file with the input to find out how the start and end positions changed compare the regions.
After reviewing the processed UCSC file, I could see that the column layout of the output dataset was still similar to the BED format, but with the start and end positions changed (expected and normal).
I thought this would be an interesting chance to experiment and see whether I could convert this output dataset from genomic interval back to BED format, since the "arrangement" of the dataset was close. Using the same menu, I received the same error as when trying to convert the Peak Regions file.
Again, pushing ahead in the training material brought me to the "Extract workflow" section, and at step 3 it instructs to uncheck parts of the workflow not necessary for the next analysis and refers to the conversion in the pipeline as:
Convert Genomic Intervals to strict BED
This was a little confusing, but I went back and tried converting my peak regions file using this option in the Convert menu, and got the same error again.
After reading that the minimum BED format is to have the first three columns defined (Chromosome, Start, End) I decided to try and experiment again by removing all other columns leaving the first three of the peak regions intervals file which have those values. Again, I had the same error upon trying to convert to BED (not strict BED this time).
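To rule out a problem with the rows themselves, I could run a check like the one below over the three-column file before converting. This is only a sketch of my own, not anything from Galaxy: the function name `find_bad_rows` is hypothetical, and it assumes tab-separated lines with the chromosome, start and end in the first three columns.

```python
def find_bad_rows(lines):
    """Return (line_number, reason) for rows that do not look like minimal BED.

    Hypothetical helper: assumes tab-separated text with
    chromosome, start, end in the first three columns.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            bad.append((number, "fewer than 3 columns"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            # Typically a header row or a non-numeric coordinate.
            bad.append((number, "start/end not integers"))
            continue
        if start < 0 or end < start:
            bad.append((number, "start/end out of order"))
    return bad


if __name__ == "__main__":
    sample = ["chr1\t100\t200\n", "chrom\tstart\tend\n"]
    for number, reason in find_bad_rows(sample):
        print(number, reason)
```

If this reports nothing for my file, then the rows at least have the shape I expect, and the cause of the error is somewhere else.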
When I look over the interval_to_bed_converter.py it appears to want to write a 6 column BED format:
out.write("%s\t%i\t%i\t%s\t%i\t%s\n" % (region.chrom, region.start, region.end, name, 0, region.strand))
In the original peaks file I do not see any columns holding "name", "score" or "strand" information.
Hpeak reports a "summit" value, the number of hypothetical fragments counted at the site (I'm not sure whether this translates to a BED score).
My questions about this:
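To show what I mean by rebuilding the file by hand, here is a rough sketch that reuses the same format string as the converter line above. Everything else is my own assumption: the function name `interval_row_to_bed6`, the made-up region names, the score of 0, and the "+" strand are placeholders, not values taken from Hpeak or from Galaxy.

```python
def interval_row_to_bed6(row, index, name_prefix="region", strand="+"):
    """Build one 6-column BED line from a tab-separated interval row.

    Hypothetical helper: assumes chromosome, start, end are the first
    three columns. Name, score and strand are placeholders because the
    peaks file has no such columns.
    """
    fields = row.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])
    end = int(fields[2])
    name = "%s_%d" % (name_prefix, index)  # invented label, not real data
    score = 0  # placeholder score
    # Same format string as in interval_to_bed_converter.py.
    return "%s\t%i\t%i\t%s\t%i\t%s\n" % (chrom, start, end, name, score, strand)


if __name__ == "__main__":
    rows = ["chr1\t100\t200\t35\n", "chr2\t500\t900\t12\n"]
    for i, row in enumerate(rows):
        print(interval_row_to_bed6(row, i), end="")
```

I don't know whether filling in placeholder values like this is acceptable for the downstream tools, which is part of what I'm asking.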
Why isn't this working?
Is there a way I can manually rebuild the BED format I need? If so, how and where could I get the data to complete the required fields?
Do I necessarily need to convert the interval files to BED format if some Galaxy tools (e.g. Get Flanks) apparently treat BED files as interval-type formats? (Let me know if I've misunderstood this.)
I'm fairly certain I'm naively missing some prerequisite knowledge or step, but I'm at a loss for insight into the issue.
I have my history of doing this saved on my usegalaxy.org account. I can also provide more details if required.
I apologize if this was already posted and answered (I couldn't find it myself).
Also, I wanted to thank the entire community for making Galaxy and providing a truly amazing resource.
Rob