Question: 3' Adapter Trimming Using Fastx-Toolkit Clipper
Hoang, Thanh wrote (5.2 years ago):
Hi all,

I am analyzing miRNA sequencing data: 51 bp, single-end, ~5 M reads. I want to remove the adapter sequences from the reads before mapping to the genome/known miRNA database. My 3' adapter sequence is 5'-AGATCGGAAGAGCACACGTCT-3'. I found that many reads contain only part of the 3' adapter sequence. I am using the FASTX-Toolkit to clip it off. How many bases should I enter under "Enter custom clipping sequence"? I ask because I end up with more reads in the output file when I enter the whole 3' adapter sequence than when I enter only the first 8 nt.

Also, miRNAs are about 17-25 nt long, so I guess that the rest of each read (51 - 21 = 30 bp) must contain part or all of the 5' adapter sequence, or by-products of mRNA/tRNA degradation. So I think that I have to trim the 5' adapter as well.

Any suggestions will be highly appreciated.

Thanh
Jennifer Hillman Jackson wrote (5.2 years ago):
Hi Thanh, Just enter the whole adapter sequence. The tool matches whatever portion of it is found in the input sequence and clips that. The help graphic on the Clip form itself illustrates this: only one adapter is (and can be) entered, but a variable length is clipped from the input to produce the output. Thanks for posting this new question to the mailing list. This greatly helps us to track and provide the speediest replies. Best, Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org
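
To make the "enter the whole adapter, a variable length gets clipped" idea concrete, here is a minimal Python sketch. It is not the FASTX-Toolkit algorithm: it uses exact matching only (no mismatch tolerance), and the function name `clip_3prime_adapter`, the `min_overlap` parameter and the example reads are assumptions made for illustration.

```python
def clip_3prime_adapter(read, adapter, min_overlap=3):
    """Return the read with a full or partial 3' adapter removed.

    Illustrative exact-match logic only; NOT the FASTX-Toolkit algorithm.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so only a prefix
    # of the adapter sits at the 3' end. Try the longest prefix first.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read  # no adapter detected


adapter = "AGATCGGAAGAGCACACGTCT"      # 3' adapter from the question (21 nt)
mirna = "TGAGGTAGTAGGTTGTATAGTT"       # hypothetical 22 nt insert
long_insert = "ACGT" * 10 + "TGA"      # hypothetical 43 nt insert

full = mirna + adapter + "AAAAAAAA"    # 51 nt read containing the full adapter
partial = long_insert + adapter[:8]    # 51 nt read with only 8 adapter bases

print(clip_3prime_adapter(full, adapter))
print(clip_3prime_adapter(partial, adapter))
```

The same full-length adapter handles both reads: the first is clipped at the complete adapter, the second at the 8-base prefix left at the read end. A small `min_overlap` also clips chance matches of a few bases, so treat the value 3 as a placeholder.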
Jennifer Hillman Jackson wrote (5.2 years ago):
Thanh, To hopefully be clearer: the part matched is clipped, whether whole or partial, and there is even some tolerance for low-frequency mismatches. I would suggest taking a few sequences out and running the tool on them to try it out. You could test both the length and the mismatch constraints this way, perhaps even using constructed sequences modified to have specific adapter lengths and/or mismatch counts. This is a great way to get a feel for new tools in general. If you need more details about exactly how the algorithm works, you can read the original documentation, and if you still need help, try contacting the tool author (links at bottom of tool form). But this is a very popular, commonly used tool, and what I have shared is how it behaves to my knowledge and experience. There may not be much more to it. Best, Jen Galaxy Team
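
One way to build such constructed sequences is sketched below in Python. Everything here is an assumption for illustration: the helper names `make_test_read` and `to_fastq`, the random inserts, the poly-A padding after a full adapter, and the fixed quality string. The only value taken from the thread is the adapter sequence and the 51 nt read length.

```python
import random

ADAPTER = "AGATCGGAAGAGCACACGTCT"  # 3' adapter from the question


def make_test_read(insert_len, n_mismatches=0, read_len=51,
                   adapter=ADAPTER, seed=0):
    """Return (insert, read) where read = insert + as much adapter as fits."""
    rng = random.Random(seed)  # seeded so the test set is reproducible
    insert = "".join(rng.choice("ACGT") for _ in range(insert_len))
    # Number of adapter bases that fit in the read after the insert.
    n_adapter = max(0, min(len(adapter), read_len - insert_len))
    piece = list(adapter[:n_adapter])
    # Substitute bases at distinct random positions to create mismatches.
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    for idx in rng.sample(range(n_adapter), min(n_mismatches, n_adapter)):
        piece[idx] = swap[piece[idx]]
    read = insert + "".join(piece)
    read += "A" * (read_len - len(read))  # arbitrary padding past a full adapter
    return insert, read[:read_len]


def to_fastq(name, seq):
    """Format one FASTQ record with a constant high quality (Phred+33 'I')."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


records = []
for insert_len in (22, 35, 43, 48):   # leaves 21, 16, 8 and 3 adapter bases
    for mm in (0, 1, 2):
        insert, read = make_test_read(insert_len, mm)
        records.append(to_fastq(f"ins{insert_len}_mm{mm}", read))

fastq_text = "".join(records)
print(fastq_text)
```

Saving `fastq_text` to a file and running the Clip tool on it shows which combinations of adapter length and mismatch count actually get clipped, since the read names record what was planted in each read.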
Jennifer Hillman Jackson wrote (5.2 years ago):
Hi Thanh, Questions are fine; that is what this mailing list is for. But please do try to cc the mailing list and start a new thread for new topics when possible. To generate a length distribution (among other stats), the tool "NGS: QC and manipulation -> FastQC" is a quick method. Take care, Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org
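
For anyone working outside Galaxy, a read-length distribution can also be tallied with a few lines of Python. This is only a sketch and not what FastQC does internally; it assumes plain 4-line FASTQ records, and the function name `length_distribution` and the three example records are made up for illustration.

```python
from collections import Counter


def length_distribution(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Hypothetical clipped reads, inlined so the example is self-contained.
example = """@r1
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r2
TGAGGTAGTAGGTTGTATAGT
+
IIIIIIIIIIIIIIIIIIIII
@r3
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
""".splitlines()

for length, n in sorted(length_distribution(example).items()):
    print(length, n)
```

For a real file, pass an open file handle instead of `example`. After clipping, a peak in the 17-25 nt range is what one would expect if most inserts are miRNAs.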