Question: Trimming Small RNA
7.2 years ago, Thiago Mafra wrote:
Hello everybody,

I have small RNA sequences from 18 to 35 nt, and I need to trim the adapter sequences before aligning the reads to the reference genome. Are there any tools available for this in Galaxy?

Thanks.

--
Thiago Mafra Batista
Molecular Biologist, PhD candidate in Bioinformatics - UFMG
LGB - ICB/Bloco K4 sala 245
Tel Lab: (31) 3409-2628
CV: http://lattes.cnpq.br/9414909432933240
galaxy • 1.1k views
Modified 7.2 years ago by Richard Mark White; written 7.2 years ago by Thiago Mafra
7.2 years ago, Richard Mark White wrote:
Hi,

This may seem like a newbie question, but I can't get a clear answer. My understanding from the Galaxy wiki page http://wiki.g2.bx.psu.edu/Learn/FAQ#Learn.2BAC8-FAQ.Interval_and_BED_format is that all intervals in Galaxy are 0-based, start-inclusive and end-exclusive. But when I use generate pileup/filter pileup and convert to intervals, I get something like this:

chr10 1056309 1056310 G C +

When I look up the SNP (G-->C), it is pretty clearly at 1056310, which would make the "interval" end-inclusive. This is key because when I annotate SNPs against dbSNP, I need to have the right coordinates.

Can anyone provide some guidance?

Thanks!
Rich
Hello Richard,

The coordinates have a zero-based start. Add 1 to the start, leave the end unchanged, and the bases included will match any visualization tool where the first base is labeled "1".

The data:
chr10 1056309 1056310 G C +

start = 1056309 + 1 = 1056310
end = 1056310
The SNP is a single-base change at position 1056310.

There are other details, but this is the key fact you will likely need for most applications, especially those that are not stranded or that are converted to be on the (+) strand. For the full details, including how to transform (-) stranded coordinates using this system, the description from UCSC is very handy: http://genomewiki.cse.ucsc.edu/index.php/Coordinate_Transforms

Hopefully this helps,

Best,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
Written 7.2 years ago by Jennifer Hillman Jackson
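For anyone who wants to script the rule in the reply above, here is a minimal Python sketch. The function names `parse_interval_line` and `bed_to_one_based` are made up for illustration and are not part of Galaxy or any library. It assumes a whitespace-delimited row with chrom, start, and end in the first three columns and features on the (+) strand.

```python
def parse_interval_line(line):
    """Split an interval row into chrom, start, end, and any extra columns."""
    chrom, start, end, *extra = line.split()
    return chrom, int(start), int(end), extra


def bed_to_one_based(start, end):
    """Convert a 0-based, end-exclusive interval to 1-based, end-inclusive.

    Only the start changes: add 1 to it and leave the end alone.
    """
    return start + 1, end


# The row from the question (hypothetical helper names, real data).
chrom, start, end, extra = parse_interval_line("chr10 1056309 1056310 G C +")
first, last = bed_to_one_based(start, end)
print(chrom, first, last)  # chr10 1056310 1056310
```

Start and end come out equal, which is what you would expect for a single-base SNP in 1-based coordinates.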
Hi Rich,

That is consistent with the BED format: in BED's 0-based coordinate system the SNP is at 1056309, and in the more common 1-based system this translates to 1056310. If the end were inclusive, the SNP would span 1056309-1056310 in "BED" world, that is, it would take up 2 positions. The first base of a genome in BED coordinates is represented as 0-1.

My quick rule of thumb for converting between coordinate systems is to add 1 to (or subtract 1 from) the start base and leave the end base alone.

Jim
Written 7.2 years ago by Jim Robinson
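The "would take 2 positions" argument is easy to check by counting bases under each convention. This is only a sketch with hypothetical helper names (`half_open_length`, `closed_length`, `one_based_to_bed`), again assuming (+) strand coordinates:

```python
def half_open_length(start, end):
    """Bases covered when the end is exclusive (the BED convention)."""
    return end - start


def closed_length(start, end):
    """Bases covered if the same numbers were read as end-inclusive."""
    return end - start + 1


def one_based_to_bed(first, last):
    """Convert 1-based, end-inclusive to 0-based, end-exclusive: subtract 1 from the start only."""
    return first - 1, last


print(half_open_length(1056309, 1056310))  # 1 base: a single-position SNP
print(closed_length(1056309, 1056310))     # 2 bases: too many for a SNP
print(one_based_to_bed(1, 1))              # (0, 1): first base of a genome
```

Only the end-exclusive reading gives a one-base feature, so the interval in the question is consistent with BED.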