Question Regarding Quality Filtering Of 454 Amplicons

Hi, I have a question for you guys regarding quality filtering. I have a data set of double MID-tagged 454 amplicons, from which I wish to select high-quality sequences above Q20. The 454 quality filtering system seems to work differently from the one provided for Illumina sequencing, i.e. 454 filtering extracts high-quality segments, while Illumina (FASTQ) filtering can select high-quality full reads based on certain parameters.

OK, so I know that the total length of my amplicon, including primers and barcodes, is around 260bp. If I then set the 454 quality filtering tool to extract contiguous high-quality sequence of >260, it gives me back around 45% of my raw data as meeting this criterion, i.e. all 260bp are above Q20. I don't necessarily need this high stringency, as most bases may not be informative. But if I convert my 454 data to FASTQ format and then run the Illumina filtering system, which also allows me to set the number of bases allowed to deviate from the Q20 criterion, I get back over 90% of my data (allowing 10bp to deviate from Q20). I then need to go ahead and convert back to 454 format.

Can you tell me if this is OK? Will I lose or confuse information somewhere along these conversions? It seems that if I do this, my barcodes are removed, as amplicons do not sort properly when I parse them through my barcode filtering program.

Does anyone know of a program to filter 454 data based on average sequence quality score that doesn't involve Linux and the Roche off-instrument program? (I have no experience in Linux!)

Thanks!

-- Jack Lighten, Ph.D. Candidate, Bentzen Lab, Room 6078, Department of Biology, Dalhousie University, Halifax, NS, B3H 4J1 Canada Office:(902) 494-1398 Email: Jackie.Lighten@Dal.Ca Profile: www.marinebiodiversity.ca/CHONe/Members/lightenj/profile/bio
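The whole-read filter described in the question (keep a read if no more than a set number of bases fall below Q20, optionally also requiring a minimum average quality) can be sketched in plain Python, which runs on Windows and Mac as well as Linux. This is only a minimal sketch under stated assumptions, not the Galaxy or Roche implementation: it assumes the quality scores are available as 454-style `.qual` text (FASTA-like headers followed by space-separated integer scores), and the function names `parse_qual`, `passes` and `filter_ids` are hypothetical names introduced here for illustration.

```python
from statistics import mean


def parse_qual(text):
    """Parse 454-style .qual text into {read_id: [integer quality scores]}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The read ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].extend(int(tok) for tok in line.split())
    return records


def passes(scores, min_q=20, max_low=10, min_mean=None):
    """Return True if at most max_low bases score below min_q.

    If min_mean is given, the read's average quality must also reach it.
    Empty reads never pass.
    """
    if not scores:
        return False
    low = sum(1 for q in scores if q < min_q)
    if low > max_low:
        return False
    if min_mean is not None and mean(scores) < min_mean:
        return False
    return True


def filter_ids(qual_text, **kwargs):
    """Return the IDs of reads that pass, in the order they appear."""
    return [rid for rid, scores in parse_qual(qual_text).items()
            if passes(scores, **kwargs)]


# Tiny made-up example: two reads of five bases each.
example = """>read1
30 30 30 12 30
>read2
10 10 10 30 30
"""
print(filter_ids(example, max_low=1))
```

Because this approach only decides whether to keep or discard each read and never trims bases, the sequences of the reads that pass are left untouched, barcodes included. The returned IDs could then be used to pull the matching records out of the original FASTA and `.qual` files.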

modified 7.4 years ago by Jennifer Hillman Jackson ♦ 25k • written 7.7 years ago by Jackie Lighten • 20

7.6 years ago by

Jeremy Goecks • 2.2k wrote:

Jagat,

First, a couple of housekeeping issues: (a) the questions you're asking are better suited to the galaxy-user list (questions about using Galaxy and performing analyses) rather than galaxy-dev (questions about installing Galaxy locally and tool development), so I've moved this thread to galaxy-user; (b) please start new threads when appropriate rather than replying to older threads, as this makes threads shorter and more focused.

Onto your questions: GTF files have multiple lines per feature, so your output is reasonable. As Vasu noted, this is an ongoing area of research. For some experiments, it may be reasonable to group alternatively-spliced isoforms of the same gene and jointly estimate FPKM, and for others it may not. Fortunately, if you do want to group transcripts to get gene FPKM values, Cuffdiff does this for you: see its gene FPKM expression file.

Best, J.

7.4 years ago by

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hi Jackie,

The screencasts under "Metagenomic Analyses with Galaxy" specifically use 454 data and would likely be helpful, maybe even if you have already resolved your prior issue. http://main.g2.bx.psu.edu/screencast

Apologies for the delay in reply, we were a bit backed up with questions in March and a few slipped through.

Take care, Jen Galaxy team

-- Jennifer Jackson http://usegalaxy.org/ http://galaxyproject.org/
