Question: Question About Fetching Sequence From Genome
6.5 years ago, Qianli Shen wrote:
Hi,

I want to fetch sequences from the soybean genome according to a GFF file. My GFF3 file and genome file are attached to the email, because the format is not easy to recognize if I paste it into the email. The job keeps reporting this error:

An error occurred running this job:
Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/extract/extract_genomic_dna.py", line 288, in <module>
    if __name__ == "__main__": __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/extract/extract_genomic_dna.py"

Could you please tell me where the problem is?

Best,
Qianli
6.5 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Qianli,

This appears to be the same data that was submitted in a recent bug report. Converting the query coordinates to BED format is still the recommendation. This should be a good solution for most, if not all, of your prior Extract tool failures, and it is a good method overall.

First, run "Convert Formats -> GFF-to-BED". Then click the pencil icon and assign the last three columns on the "Edit Attributes" form so that c4 = name, c5 = score, and c6 = strand; getting the strand assigned is the most important part. The datatype will be bed. Finally, extract using your custom genome; each sequence will be titled by its region coordinates.

Best,
Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
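For readers who want to see what the GFF-to-BED step amounts to, here is a minimal Python sketch. It is not the code behind Galaxy's converter or the Extract tool; it only illustrates the coordinate shift (GFF is 1-based and end-inclusive, BED is 0-based and end-exclusive) and the six-column layout with name, score, and strand in c4 to c6. The function names, the choice of the feature type as the BED name, and the example values (Gm01, phytozome, the short test sequence) are assumptions made for illustration.

```python
# Illustrative sketch only; this is not Galaxy's converter or Extract tool code.

def gff_to_bed6(gff_line):
    """Convert one GFF/GFF3 feature line to a 6-column BED line.

    Returns None for comments, blank lines, and lines with fewer than 9 columns.
    """
    line = gff_line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        return None
    seqid, _source, feature, start, end, score, strand = fields[:7]
    # GFF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start coordinate shifts by one.
    bed_start = int(start) - 1
    bed_end = int(end)
    # Assumption: use the feature type as the BED name (c4).
    name = feature
    # GFF uses "." for a missing score; this sketch substitutes 0 (c5).
    bed_score = "0" if score == "." else score
    # Keep "+" or "-"; any other value is written as "." (c6).
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join(
        [seqid, str(bed_start), str(bed_end), name, bed_score, bed_strand]
    )


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(sequence, bed_start, bed_end, strand):
    """Slice a sequence with BED coordinates; reverse-complement on '-'."""
    region = sequence[bed_start:bed_end]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region


# Hypothetical example line and sequence, not taken from the poster's files.
example = "Gm01\tphytozome\tgene\t3\t6\t.\t-\t.\tID=example"
print(gff_to_bed6(example))
print(extract_region("AACCGTTA", 2, 6, "-"))
```

The second function shows why the strand column matters: without it, a minus-strand feature would be returned as the forward-strand slice rather than its reverse complement.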
5.2 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello,

Yes, MEME is not on the Main server, but it can be used in local, cloud, or slipstream Galaxy installs.

For throughput, there are a few MEME-related repositories in the Tool Shed to choose from. How many sequences each can process will likely vary and depends on the hardware the Galaxy instance runs on. Contacting the tool authors is one path, or you can test with your actual data. Data composition could be an important factor, not just the number of sequences.

For the Sequence Logo Generator, I do not know of a hard limit in the tool itself. However, the recommended/supported input is ClustalW output, so ClustalW will most likely set the practical upper limit when using the public Main Galaxy instance. Testing will be the best way to learn the limits for your particular data (whether nucleotide or protein), but the success range will be capped in the thousands, not millions, and possibly lower as sequence length increases. If there is a memory problem or the run time exceeds the limits on the public server, the job will end with an error. Moving to a scaled-up server, such as a cloud Galaxy, will give you more control over these types of variables.

Some benchmarks are in the Clustal W publication: http://bioinformatics.oxfordjournals.org/content/23/21/2947.long

If others would like to post benchmarks from their own experience, that would be welcome!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org