Question: Nucleotide Analysis - GC Percentage
7.6 years ago by Peter Cock, European Union
Peter Cock wrote:
Hi all,

Are there any built-in Galaxy tools that I have missed to do with GC percentage (or indeed, AT percentage)? I'm thinking of a tool to calculate the GC percentage (and perhaps related statistics like counts/percentages of A, C, G, T), and perhaps a related tool to filter on GC. Possible use cases include filtering NGS reads to remove high/low GC reads from a contaminant.

Slightly more complicated: right now I want to calculate the GC (or in fact AT) percentage of the first and last ~20 (configurable) bases. In this case I am looking for (and filtering on) AT-rich ends of contigs, which may be indicative of viral sequences. A very similar task would be looking for (and filtering on) poly-A tails of mRNA or, if sequenced from the reverse strand, a poly-T start.

Peter
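For readers who want to try the calculation described above outside Galaxy, here is a minimal Python sketch. It is not an existing Galaxy tool; the function names (read_fasta, base_percent, end_at_percent, at_rich_ends) and the 80% default threshold are assumptions chosen for illustration. Percentages are taken over unambiguous A/C/G/T bases only, so N and U are ignored.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def base_percent(seq: str, bases: str) -> float:
    """Percentage of the A/C/G/T bases in seq that belong to `bases`."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * sum(seq.count(b) for b in set(bases.upper())) / counted


def end_at_percent(seq: str, n: int = 20) -> Tuple[float, float]:
    """AT percentage of the first and last n bases (n must be positive)."""
    return base_percent(seq[:n], "AT"), base_percent(seq[-n:], "AT")


def at_rich_ends(records, n: int = 20, min_at: float = 80.0):
    """Keep records where either end is at least min_at percent AT."""
    for name, seq in records:
        start, end = end_at_percent(seq, n)
        if start >= min_at or end >= min_at:
            yield name, seq
```

Calling base_percent(seq, "GC") gives the whole-sequence GC percentage, and a poly-A tail check could reuse the same helper with bases="A" on seq[-n:].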
galaxy • 3.1k views
modified 7.6 years ago by Guru Ananda • written 7.6 years ago by Peter Cock
7.6 years ago by Guru Ananda
Guru Ananda wrote:
Hi Peter,

There isn't a built-in Galaxy tool to compute GC% yet. You could perhaps use UCSC's hgGcPercent binary, which lets you compute GC% for BED intervals. You can find it here: http://genome.ucsc.edu/FAQ/FAQdownloads#download27

Thanks,
Guru

--
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
505 Wartik lab
University Park PA 16802
guru@psu.edu
written 7.6 years ago by Guru Ananda
Thanks Guru. I'll be working with simple sequence files (FASTA, or even FASTQ, SFF, etc.) rather than BED files, but I'll keep that in mind.

Peter
written 7.6 years ago by Peter Cock
Peter and Guru;

[Computing GC]

EMBOSS has some utilities that do this: infoseq and geecee. There are also programs for exploring CpG islands: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html

Brad
written 7.6 years ago by Brad Chapman
Thanks for pointing this out, Brad. Both geecee and infoseq are in fact available on Galaxy under the EMBOSS section.

Guru
written 7.6 years ago by Guru Ananda
please remove me from mailing list - thanks
written 7.6 years ago by Douglas Allan
Hi Brad,

These tools are also in Galaxy under the EMBOSS section. "geecee" will tell you the percentage of GC in FASTA sequences. It basically outputs the sequence name and then the GC content, as below:

    #Sequence   GC content
    Sequence1   0.44

Hope this helps!
Tychele
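If the goal is to filter on GC afterwards, a report in the shape shown above can be parsed with a few lines of Python. This is only a sketch under the assumption that each data line is a sequence name followed by whitespace and a fractional GC value, with "#" marking header lines; parse_geecee and filter_by_gc are hypothetical helper names, not part of EMBOSS or Galaxy.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_geecee(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (sequence name, GC fraction) from a geecee-style report."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#Sequence GC content" header
        # Assumed layout: name, whitespace, fraction as the last column.
        name, value = line.rsplit(None, 1)
        yield name, float(value)


def filter_by_gc(records: Iterable[Tuple[str, float]],
                 low: float = 0.0, high: float = 1.0) -> List[str]:
    """Return names whose GC fraction lies within [low, high]."""
    return [name for name, gc in records if low <= gc <= high]
```

For example, filter_by_gc(parse_geecee(open("report.geecee")), 0.3, 0.6) would list the sequences with 30% to 60% GC, where report.geecee is a placeholder file name.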
written 7.6 years ago by Tychele
The kent program hgGcPercent will measure what you want to measure from your sequences.

--Hiram

    hgGcPercent - Calculate GC Percentage in 20kb windows
    usage:
       hgGcPercent [options] database nibDir
         nibDir can be a .2bit file, a directory that contains a
         database.2bit file, or a directory that contains *.nib files.
         Loads gcPercent table with counts from sequence.
    options:
       -win=<size> - change windows size (default 20000)
       -noLoad - do not load mysql table - create bed file
       -file=<filename> - output to <filename> (stdout OK) (implies -noLoad)
       -chr=<chrn> - process only chrN from the nibDir
       -noRandom - ignore randome chromosomes from the nibDir
       -noDots - do not display ... progress during processing
       -doGaps - process gaps correctly (default: gaps are not counted as GC)
       -wigOut - output wiggle ascii data ready to pipe to wigEncode
       -overlap=N - overlap windows by N bases (default 0)
       -verbose=N - display details to stderr during processing
       -bedRegionIn=input.bed   Read in a bed file for GC content in specific regions and write to bedRegionsOut
       -bedRegionOut=output.bed Write a bed file of GC content in specific regions from bedRegionIn
    example:
      calculate GC percent in 5 base windows using a 2bit assembly (dp2):
        hgGcPercent -wigOut -doGaps -win=5 -file=stdout -verbose=0 \
          dp2 /cluster/data/dp2 \
        | wigEncode stdin gc5Base.wig gc5Base.wib
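To illustrate the windowing idea behind the -win and -overlap options, here is a small Python sketch of a windowed GC calculation on a plain sequence string. It is not a reimplementation of hgGcPercent: windowed_gc is a hypothetical function, and dividing by the full window length (so that N bases lower the percentage rather than being skipped) is an assumption, not the tool's documented gap handling.

```python
from typing import Iterator, Tuple


def windowed_gc(seq: str, win: int = 20000,
                overlap: int = 0) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, gc_percent) per window; end is exclusive."""
    if win <= 0 or not 0 <= overlap < win:
        raise ValueError("need win > 0 and 0 <= overlap < win")
    seq = seq.upper()
    step = win - overlap  # consecutive windows share `overlap` bases
    for start in range(0, len(seq), step):
        window = seq[start:start + win]  # the last window may be shorter
        gc = window.count("G") + window.count("C")
        # Assumption: every position, including N, counts in the denominator.
        yield start, start + len(window), 100.0 * gc / len(window)
```

With win=5 this mirrors the 5-base windows in the example command above, although the output format differs from the wiggle data that hgGcPercent writes.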
written 7.6 years ago by Hiram Clawson
Good idea, Brad :)

Now why does a tool search on the public Galaxy instance for GC not suggest this tool?

    Name: geecee
    Description: Calculates fractional GC content of nucleic acid sequences

Does this mean the description isn't searched? It would seem sensible to me to include it. Searching for "geecee" works, but no one unfamiliar with this EMBOSS tool will think of that.

Peter
written 7.6 years ago by Peter Cock
Peter,

The tool search doesn't start until you type in three characters, so typing 'GC' does not initiate a search. Typing 'gc<space>' or 'gc content' works. Perhaps a tooltip or help text is needed.

J.
written 7.6 years ago by Jeremy Goecks
I see that now, and yes, perhaps a caption on the search box would help... Also typing C, C, enter doesn't work - that does surprise me.

There is still something amiss with the search, which apparently does not use the tool description line: for instance, neither "acid" nor "nucleic" nor "fractional" shows the EMBOSS geecee tool. If the search is indexing the tool's main help text, then for the EMBOSS tools it would help to have an executive summary with key words in it, rather than just a link to the EMBOSS webpage for each tool.

Peter
written 7.6 years ago by Peter Cock
Dear All,

I have combined the H3K4me3 pattern in a specific region (Info: UCSC Main on Human: wgEncodeBroadHistoneGm12878CtcfStdPk (genome)) with the RefSeq genes in that region (UCSC Main on Human: refGene (genome)) and got this PDF histogram. I was wondering if someone could help me interpret it.

Best regards,
Moein M. Farshchian
Ph.D. Candidate of Cell & Molecular Biology,
Department of Biology, Faculty of Sciences,
Ferdowsi University of Mashhad, Mashhad, Iran.
P.O. Box: 9177948974
written 7.6 years ago by Moein Farshchian