Question: Trouble with FASTQ joiner
nday (United States) wrote, 18 months ago:

Hi Biostars community, I am attempting to run some initial QC and manipulation tools on some RNA-seq data (Illumina 1.8). I have my trimmed data from Trim Galore!, but I am unable to pair the trimmed data with FASTQ joiner. Whenever I take the two paired outputs (in fastqsanger format) from Trim Galore! and use them in FASTQ joiner, the program runs but returns a message stating that 0% of reads were paired, regardless of whether the "old" or "new" header option is selected.

"There were 22817078 known sequence reads not utilized. Joined 0 of 22817078 read pairs (0.00%)."

My Trim Galore data is in the following format:

@D00472:148:CB1Y1ANXX:2:2201:1059:2134 1:N:0:ATCACG
CTCAGGCATAGGTCACCAGCTTTCGGGTCGTTTGCCAACTGCTCAACCTCTGCACAGTCACAAGTGACACGCACAGGGCCGTGGTGCGCTGCACTCCG
+
BBBBBBFBF/</bbffff>

Am I missing something?

I have used this workflow successfully in the past, but now it does not seem to work. I noticed that this error came up about 5 years ago; however, the workaround relies on a tool that is no longer available. Any help or suggestions would be greatly appreciated.

Thanks.

Nick

rna-seq galaxy • 491 views
Jennifer Hillman Jackson (United States) wrote, 18 months ago:

Hello,

Are you certain that the two fastq inputs are a pair? The headers must match between the two, either a "1 + 2" or "1 + 3" pairing. The sort order must also be the same between the two inputs for the tool to join them.
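One way to test both conditions (matching headers and identical sort order) outside of Galaxy is sketched below. This is not how FASTQ joiner itself is implemented; it is a minimal check that assumes plain 4-line FASTQ records, and it assumes that "old" style headers end in /1, /2 or /3 while "new" style headers carry the read number after a space. The function names and the tiny example records are made up for illustration.

```python
import io
import re
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of every record, assuming strict 4-line FASTQ."""
    for line_number, line in enumerate(handle):
        # Use line position rather than a leading '@', because a quality
        # line may also start with '@'.
        if line_number % 4 == 0 and line.strip():
            # Drop '@' and anything after the first whitespace (new-style comment).
            name = line.strip()[1:].split()[0]
            # Assumption: old-style headers mark the mate with /1, /2 or /3.
            yield re.sub(r"/[123]$", "", name)


def count_pairs(forward, reverse):
    """Compare records position by position; return (matching, total)."""
    matching = total = 0
    for fwd_id, rev_id in zip_longest(read_ids(forward), read_ids(reverse)):
        total += 1
        if fwd_id is not None and fwd_id == rev_id:
            matching += 1
    return matching, total


# Hypothetical two-record inputs: the first records pair, the second do not.
forward_text = "@r1 1:N:0:ACGT\nACGT\n+\nFFFF\n@r2 1:N:0:ACGT\nACGT\n+\nFFFF\n"
reverse_text = "@r1 2:N:0:ACGT\nTTTT\n+\nFFFF\n@r3 2:N:0:ACGT\nTTTT\n+\nFFFF\n"

print(count_pairs(io.StringIO(forward_text), io.StringIO(reverse_text)))  # (1, 2)
```

For real data, pass two open file handles instead of the `io.StringIO` objects. A result of 0 matching records on correctly named files usually points to mismatched read names or a different sort order rather than to the joiner's header option.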

If you are not sure, you can post the first 20 lines of each dataset (or any multiple of 4 lines, so that at least 5 full sequences are shown) and we can help with troubleshooting. Try pasting the sequence data into a gist or another online site that will preserve format and share the link - or - you can paste back directly in a comment here and use the "code sample" text formatting option.

Thanks! Jen, Galaxy team

nday (United States) wrote, 18 months ago:

Hi Jen, below is a sample of the two FASTQ inputs (from Trim Galore!) that I am using for FASTQ joiner. (They were run through FASTQ Groomer prior to Trim Galore!.) Taking a closer look at the sequence identifiers of my forward and reverse files, I notice that the flowcell lane and the tile number within the lane differ between the files. Could this affect the program's ability to pair reads, if the identifiers do not match in the coordinates section? Old data from a different experiment has matching coordinates but different flags for forward and reverse reads; here it seems to be the opposite. I am also certain that these files are pairs, based on the file names assigned to them by the sequencing facility.

Forward:

@D00472:148:CB1Y1ANXX:1:1102:1172:2173 1:N:0:CGATGT
GTGGGTTGGTAGCTCCGAGCATGGAACGTCCTTGCTTGACGACGTCAAGTCCTTGCCAGACCATGGCGACGACTGGTCCAGAGCGCATGTACTCGATG
+
BB<B/FFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFF</BBFFF<FFFFFFFBBBFFFFFFFFFFFFFFFFFFFFFFFFFF/FBFFFFFFFFFFF
@D00472:148:CB1Y1ANXX:1:1102:1195:2206 1:N:0:CGATGT
CTCAGCGCGCAAATGATCGATACAATCATACACGTCATTTTCATCTTTTCCTGCAACGACAACCAAGTTTGGATCTTCTGCTCCTTGACGTGGAAGACG
+
BBBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@D00472:148:CB1Y1ANXX:1:1102:1068:2216 1:N:0:CGATGT
CTGGGACGGCCTCTGGAAGAGATTCGTGATGCATCTCAACGGACTTGACTTCAGTGGTGACGTTTTGTGGAGCGAAGGTAACGACCATTCCTGGCTTGAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@D00472:148:CB1Y1ANXX:1:1102:1217:2223 1:N:0:CGATGT
CACAGTGCAAAAAGTAGTTCACCCAGTGAGGATCGTCGTAGTGAGCCTCGACTTGTGCAATTTCATTATCGTGATGCGGGAACTTCAGATCGAAACCGCC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@D00472:148:CB1Y1ANXX:1:1102:1480:2144 1:N:0:CGATGT
CACAGTTTGAAGCGCGGGGGCGGGGTCCTTCAAGAACATGATGGTATTGATGAAATAGCTTCGCAAGGGGTAATTAGCGGGGGATAGGAAAGGGGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Reverse:

@D00472:148:CB1Y1ANXX:2:2201:1190:2118 1:N:0:CGATGT
CTCTAGCATTTGAGTTTAGTTCCAAAGGAAGCCGTACAAAATCAAAAATCACCGGGGAATTCTACTACTCGATGGTAGTATTATTCACGGGTCGAACGTT
+
/<<BBF/<FFFBBFFFFF/BB<FF/<FFFFBFB<<FFFFF<<<FFFBFFFFFBFBF///<FBFFFFBFFFFFFFBFFFFBFFFFFBFFFBFB/B</FBFB
@D00472:148:CB1Y1ANXX:2:2201:1128:2160 1:N:0:CGATGT
CCTCTTTCTCCCCCACCGAGTTTTCTCCACGAGATGCTTATATCCTCGTCAACGATGCCACAGTTTTAGTCTCATCCACTAGAAAACATCATTTTTTCG
+
BBBBBFFFFF<//</FFF/<FFFFFFFBFFFFFFFFFFFFFFFFFFFFF/<<FFFF/7BFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF/7
@D00472:148:CB1Y1ANXX:2:2201:1342:2132 1:N:0:CGATGT
GGAAAAATCGCAATTTTGGGTAAACGATATACATTGAAATCATTTTTAAATATTGGCAATATTATTCAGAAGCTTATCGCCAGTTATCAATGCACTTGT
+
BBBBBBFFFFF/<FFFFF/<<F//B<BFFFFFFFFFFFFFFFFFFFFFFFFFFFF</<FFFFFBFFBB<FFFFFFFFFF/FFFFFFFFFFFFFFFFFBF
@D00472:148:CB1Y1ANXX:2:2201:1327:2143 1:N:0:CGATGT
GTTTGAAGATCAGACACCGCAGCCTACGTTCTTTGTTTTTATGAAGAATTTAGACGATGGAGAGGACTTCTCCCCCCACACACACAAAAAAAAACAAA
+
BBBBBFFFFFB/B<FBFF/FFFFFFF<FFFFFFFFFFFFFFFBFFFFFFFFFFBFB/<<FFFBFFFFFFFFFFFFFFFFFFFBB<BFBFFFF<FBFFF
@D00472:148:CB1Y1ANXX:2:2201:1420:2191 1:N:0:CGATGT
CTGGTTGATGAAAGTCAAGAGTCTGGCGATGTTCTTGCGGACAACGCGAATCTTGGGGAGCTTAGAGGCAGCTCCTCCAGTGACCTTCGAGACACGG
+
BBBBBFFFFFF/BFFFFF/BFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/<BFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFF

I agree that the problem is with the sequence labels. Before going forward, asking for a clarification from the sequence provider would be a good idea. Samples can get mixed up.
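To make the label mismatch concrete, the first header from each posted sample can be split into its fields. The sketch below assumes the Illumina 1.8 (Casava 1.8+) layout, `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`; the `IlluminaHeader` and `parse_header` names are illustrative and not part of any Galaxy tool.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int


def parse_header(line):
    """Split a Casava 1.8+ style header line into named fields."""
    name, comment = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read = comment.split(":")[0]  # first comment field is the read number
    return IlluminaHeader(instrument, run, flowcell,
                          int(lane), int(tile), int(x), int(y), int(read))


# First header of each sample posted in this thread.
forward = parse_header("@D00472:148:CB1Y1ANXX:1:1102:1172:2173 1:N:0:CGATGT")
reverse = parse_header("@D00472:148:CB1Y1ANXX:2:2201:1190:2118 1:N:0:CGATGT")

# Mates of one pair come from the same cluster, so the first seven fields
# should be identical and only the read number should differ.
same_cluster = forward[:7] == reverse[:7]
print(same_cluster)                # False
print(forward.lane, reverse.lane)  # 1 2
print(forward.read, reverse.read)  # 1 1
```

Both files carry read number 1 and come from different lanes and tiles, which is consistent with two sets of forward reads rather than a forward/reverse pair.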

Reply written 18 months ago by Jennifer Hillman Jackson