Question: Substring Sequence on Coordinates in Columns
Michal Stuglik (European Union) wrote, 7.5 years ago:
Hi all, I am wondering whether Galaxy has a tool to substring/extract a sequence/text from another sequence/text based on coordinates in columns (a start column and an end column), or how to do this with Text Manipulation/Compute? All the best, Michal
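Outside Galaxy, the per-row operation being asked about can be sketched in Python. This assumes a tab-separated table whose first column holds the sequence and whose second and third columns hold 1-based, inclusive start and end positions; that column layout, the coordinate convention, and the function name `substring_by_columns` are assumptions for illustration, not anything Galaxy defines.

```python
import csv
import io


def substring_by_columns(lines, seq_col=0, start_col=1, end_col=2):
    """Append seq[start..end] (1-based, inclusive) to each tab-separated row.

    The default column indices follow a hypothetical layout:
    sequence, start, end.
    """
    out = []
    for row in csv.reader(lines, delimiter="\t"):
        seq = row[seq_col]
        start = int(row[start_col])
        end = int(row[end_col])
        # Convert 1-based inclusive coordinates to a Python slice
        # (0-based start, exclusive end).
        row.append(seq[start - 1:end])
        out.append("\t".join(row))
    return out


table = io.StringIO("ACGTACGT\t2\t5\nTTTTGGGG\t4\t6\n")
for line in substring_by_columns(table):
    print(line)
```

If your start column is already 0-based (as in BED files), drop the `- 1` in the slice.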
Jennifer Hillman Jackson (United States) wrote, 7.5 years ago:
Hi Michal,

The tool "Fetch Sequences -> Extract Genomic DNA" can be used to extract FASTA sequences. The coordinates can be in BED, GTF, or similar formats, and the "genome" does not have to be an actual genome: a FASTA file in your history works too.

To subset a data string, the tool "Text Manipulation -> Trim" might be helpful. It only works if you want to apply the same rules to an entire file (or if you split the file up and run the tool on each subfile with different rules). That is practical for some cases, but not all.

The final option, for coordinate data, is the tools in "Operate on Genomic Intervals". Once you have the final coordinate set, you can go back to the "Fetch Sequences" tool to capture the associated FASTA sequences, from a native genome or from a FASTA file in your history, as described above.

Hopefully this gives you an option that will work for your project.

Best,
Jen
Galaxy team
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org
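The idea behind extracting sequences by coordinates from a FASTA file in your history can be sketched in plain Python. This is not Galaxy's implementation; it assumes a small FASTA file that fits in memory and BED-style intervals (tab-separated chrom, start, end, with a 0-based start and an exclusive end), and the helper names `read_fasta` and `extract_intervals` are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse a FASTA file into a dict of {sequence name: sequence}."""
    seqs = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_intervals(seqs, bed_handle):
    """Yield (label, subsequence) for each BED line (0-based, end-exclusive)."""
    for line in bed_handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank, comment, and track/browser header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED coordinates map directly onto Python slice semantics.
        yield f"{chrom}:{start}-{end}", seqs[chrom][start:end]


fasta = io.StringIO(">contig1 demo\nACGTAC\nGTTT\n>contig2\nGGGCCC\n")
bed = io.StringIO("contig1\t2\t7\ncontig2\t0\t3\n")
genome = read_fasta(fasta)
for label, seq in extract_intervals(genome, bed):
    print(f">{label}\n{seq}")
```

BED was chosen here because its half-open coordinates need no conversion; GTF uses 1-based inclusive coordinates and would need `start - 1`.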
Hi Jen, It works, thanks! I am wondering why, when I use the Text Manipulation/Compute function, Galaxy changes the bracket '[' to '__ob__' and ']' to '__cb__', so that str(c1)[1:2] becomes str(c1)__ob__1:2__cb__. Thanks a lot, Michal
Reply written 7.5 years ago by Michal Stuglik
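The substitution Michal describes appears to come from Galaxy sanitizing tool parameter text before it reaches the tool, so the Compute expression no longer parses as Python. The sketch below mimics only the two mappings reported in this thread ('[' to '__ob__' and ']' to '__cb__'); the names `BRACKET_MAP`, `escape_brackets`, and `unescape_brackets` are hypothetical and this is not Galaxy's sanitizer. It also shows what the intended expression `str(c1)[1:2]` computes: Python slices are 0-based and end-exclusive, so it returns only the second character.

```python
# Only the two mappings reported in this thread; Galaxy's real
# sanitizer may handle other characters as well.
BRACKET_MAP = {"[": "__ob__", "]": "__cb__"}


def escape_brackets(expr):
    """Mimic the reported substitution of square brackets."""
    for char, token in BRACKET_MAP.items():
        expr = expr.replace(char, token)
    return expr


def unescape_brackets(expr):
    """Reverse the substitution to recover the original expression."""
    for char, token in BRACKET_MAP.items():
        expr = expr.replace(token, char)
    return expr


original = "str(c1)[1:2]"
escaped = escape_brackets(original)
print(escaped)                     # str(c1)__ob__1:2__cb__
print(unescape_brackets(escaped))  # str(c1)[1:2]

# What the intended expression computes for a sample value of c1:
c1 = "ACGT"
print(str(c1)[1:2])                # C
```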
Dear Galaxy users, Does anyone know what the differences are between hg19 and hg19patch2, and can anyone tell me whether the latest Ensembl GTF file (v62) is definitely compatible with both hg19 and hg19patch2? Best wishes, David. Dr David A. Matthews, Senior Lecturer in Virology, Room E49, Department of Cellular and Molecular Medicine, School of Medical Sciences, University Walk, University of Bristol, Bristol BS8 1TD, U.K. Tel. +44 117 3312058, Fax. +44 117 3312091, D.A.Matthews@bristol.ac.uk
Reply written 7.5 years ago by David Matthews
Not sure about the compatibility of EnsEMBL GTF files, but the differences between the various patches are described here: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml The human genome reference is now at Patch 4 (which was news to me until I went to this page, thanks!). f. -- B.F. Francis Ouellette http://oicr.on.ca/research/ouellette/
Reply written 7.5 years ago by Francis Ouellette
Hi David, You can find information about the assemblies here: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml The patches so far have only added extra sequences representing alternative haplotype regions (e.g. MHC). Ensembl 62 was released on patch 3: http://www.ensembl.org/Homo_sapiens/Info/Index If your data use only the reference chromosomes, you should have no issues using hg19 or any of the patches released so far. Cheers, Will McLaren, Ensembl Variation
Reply written 7.5 years ago by Will McLaren
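Will's condition ("if your data use only the reference chromosomes") can be checked before choosing between hg19 and a patch release. A minimal sketch, assuming a GTF file whose first column holds the sequence name, and assuming the primary chromosomes are named 1-22, X, Y and M/MT, with or without a UCSC-style "chr" prefix; anything outside that set (patch or alternate-haplotype scaffolds) is reported. The helper names and the toy scaffold name `alt_scaffold_1` are hypothetical.

```python
import io

# Assumed primary reference names: 1-22, X, Y and the mitochondrion,
# with or without a "chr" prefix. Adjust for your own files.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def is_primary(name):
    """Return True if the sequence name looks like a primary chromosome."""
    if name.startswith("chr"):
        name = name[3:]
    return name in PRIMARY


def non_primary_names(gtf_handle):
    """Collect sequence names (GTF column 1) outside the primary set."""
    found = set()
    for line in gtf_handle:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        seqname = line.split("\t", 1)[0]
        if not is_primary(seqname):
            found.add(seqname)
    return sorted(found)


toy = io.StringIO(
    '1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "a";\n'
    'alt_scaffold_1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "b";\n'
)
print(non_primary_names(toy))  # ['alt_scaffold_1']
```

An empty list suggests the annotation sticks to the reference chromosomes, which is the case Will describes as unaffected by the patches.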
The patches represent alternate paths (not all are truly haplotypic). Some of them are corrections to the underlying chromosome assembly, that is, regions where the chromosome tiling path is wrong. We release these fixes ahead of the next build to make them accessible to folks. Deanna
Reply written 7.5 years ago by Church, Deanna (NIH/NLM/NCBI) [E]