Question: Extract Genomic DNA: Strand Information Is Not Recognized
6.7 years ago, Sarah wrote:
Hello,

I am trying to extract sequences from a FASTA file containing genomic information. The coordinates are in a tab-delimited file that Galaxy recognizes as BED format (the 6th column is correctly interpreted as "6. Strand"). However, when I run "Fetch Sequences" > "Extract Genomic DNA", only the plus-strand sequences are included in the output FASTA file, and I receive the following error message:

1,431 sequences format: fasta, database: ? Info: 1476 warnings, 1st is: Invalid interval, start '1616' > end '1177'. Skipped 1476 invalid lines, 1st is #2, "scaffold00001 1616 1177 Fom - 1"

Is this a bug? How can I adjust my input data files to get the minus-strand sequences as well? I saw a similar problem in an earlier posting, where it was suggested to manually adjust the strand information in column 5, but this did not work for me either.

Many thanks for all your help!

Sarah
6.7 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Sarah,

One of the specifications of BED format is that the coordinates are given with respect to the forward strand. BED format originated at UCSC, and this is their full specification:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1

Galaxy's summary (also shown on tool forms that accept BED format) is here:
http://galaxyproject.org/wiki/Learn/Datatypes#Bed

The rules for transforming data in other coordinate formats to BED are explained in detail in this UCSC wiki document:
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

There are no Galaxy-wrapped automated tools to do this transformation, but perhaps someone on the mailing list has a workflow to offer. If not, the tools in Galaxy under "Text Manipulation" and "Filter and Sort", together with a file containing the length of each chromosome, can very likely be used in combination to perform the calculations (in several steps). If you create a process to do this, be sure to consider publishing the workflow for others to use.

Hopefully this helps.

Best,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
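For readers who prefer to fix the file outside Galaxy, the transformation described above could be sketched in Python roughly as follows. This is only an illustration, not a Galaxy tool: the function names (`normalize_bed_line`, `reverse_to_forward`, `normalize_bed`) are hypothetical. The first function assumes the coordinates already refer to the forward strand and that only their order is reversed on minus-strand lines, as in the sample line from the error message. The second covers the other case, where 0-based, half-open coordinates were counted along the reverse strand and a chromosome length is needed. Neither function converts between 1-based and 0-based numbering.

```python
def normalize_bed_line(fields):
    """Return a copy of one BED row with start <= end.

    Assumption: coordinates are already on the forward strand and only
    their order is reversed; the strand column is left untouched.
    """
    row = list(fields)
    start, end = int(row[1]), int(row[2])
    if start > end:
        start, end = end, start
    row[1], row[2] = str(start), str(end)
    return row


def reverse_to_forward(start, end, chrom_length):
    """Map 0-based, half-open coordinates counted along the reverse
    strand onto the forward strand, given the chromosome length."""
    return chrom_length - end, chrom_length - start


def normalize_bed(text):
    """Apply normalize_bed_line to every data line of tab-delimited text."""
    lines = []
    for line in text.splitlines():
        # Pass blank lines, comments, and track/browser headers through.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            lines.append(line)
            continue
        lines.append("\t".join(normalize_bed_line(line.split("\t"))))
    return "\n".join(lines) + "\n"
```

Applied to the line quoted in the error message, `normalize_bed_line` would swap 1616 and 1177 and keep the "-" strand value, so the interval passes the start/end check while the strand still marks it as a minus-strand feature. Which of the two functions is appropriate depends on how the original coordinates were produced, so it is worth checking a few known features by hand first.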