Question: Preprocessing gDNA Illumina Paired-End Data for Mapping/SNP Calling
Timothy Brennan • 10 wrote:
This question concerns pre-processing whole-genome
resequencing data before mapping it to a reference yeast strain.
I'm having trouble joining paired-end data. I have two files per
sample (read1 and read2).
I've successfully uploaded my fastq.gz files into Galaxy using FTP. I
have two FASTQ files per strain, one for each read direction, labelled
for example:
(for the left hand dir)
130104_7001240_0133_AH0854ADXX.lane_2.CCGTCC.1.fastq
(for the right hand dir)
130104_7001240_0133_AH0854ADXX.lane_2.CCGTCC.2.fastq
After grooming each file with FASTQ Groomer, I try to join them into
a single file, but 0% of the reads are joined. So I think the
header or direction label is not in the correct format. E.g., the
groomed reads for the left hand and right hand look like:
(for the left hand dir)
130104_7001240_0133_AH0854ADXX.lane_2.CCGTCC.1.fastq
@HWI-ST1240:133:H0854ADXX:2:1101:2716:1998 1:N:0:CCGTCC
NGTATGGAAGACGTAGAGTGGATGAAAATTTTGTGAAAAAAAAAAGCTTATAGGAACAAAAACATCCTTA
CATCTTCGGGTATTTCTTCTAGGGTTGAAGT
+
!!!%%%%%)))))**(*(!$()(((***(***')(**********)'))%!!%&%$$$$$####$$$$$$
"$!!""!!##!!!!$$%$""!!"#$#!!!!!
@HWI-ST1240:133:H0854ADXX:2:1101:5045:1994 1:N:0:CCGTCC
NCCAGACACAGTTAACGCAACCTGACATGCAACAGTTATCGGGTTCTTGTGGTTTTGCAGGCACTTGGAC
ACCTGCTATTTTCTTCGTTCCGCCGCTAAGC
(for the right hand dir)
130104_7001240_0133_AH0854ADXX.lane_2.CCGTCC.2.fastq
@HWI-ST1240:133:H0854ADXX:2:1101:2716:1998 2:N:0:CCGTCC
GCATAGTTACTTTTTGATCACTAACAACGATATATTATCGTTGAACAATTTACTACGCAAAACAGTTCAC
GTGATGTACGTCAGATAATTCACTGAAGGTA
+
$$$''''')))))++++++(+++++++++++++++++*+*++*++++++++++*++++*+++++*)))))
)''''&'&'%&%%%%$%%%'&%%%%%$%%$!
@HWI-ST1240:133:H0854ADXX:2:1101:5045:1994 2:N:0:CCGTCC
ATGTATTATAAGCCCGAATCAGATACTCAAATTTGAAAAAAGATATCTTTCTCCTCCGACATGGCCGAAC
TCATTTACATAAATAGCATAAATTAAACAGA
According to the wiki, I think the FASTQ headers should look something
like this, with /1 and /2 corresponding to each paired file:
@61CC3AAXX100125:7:118:2538:5577/1
GACACCTTTAATGTCTGAAAAGAGACATTCACCATCTATTCTCTTGGAGGGCTACCACCTAAGAGCCTTC
ATCCCC
+
?>CADFEEEBEDIEHHIDGGGEEEEHFFGIGIIFFIIEFHIIIIHIIFFIIIDEIIGIIIEHFFFIIEHI
FA@?==
@61CC3AAXX100125:7:1:17320:13701/1
CTCAGAAGACCCTGAGAACATGTGCCCAAGGTGGTCACAGTGCATCTTAGTTTTGTACATTTTAGGGAGA
TATGAG
+
?BCAAADBBGGHGIDDDGHFEIFIIIIFGEIFIIFIGIGEFIIGGIIHEFFHHHIHEIFGHHIEFIIEEC
E?>@89
Any suggestions on how to get the headers into the correct format
so that the files can be joined?
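To make the conversion I have in mind concrete, here is a minimal Python sketch, not a Galaxy tool. It assumes the joiner really does need the older /1 and /2 suffixes (that is only my reading of the wiki), and that each record in the actual files occupies exactly four lines (the wrapping shown above is assumed to come from the mail client). The function names to_slash_header and convert_fastq are made up for illustration.

```python
import re

# Matches a Casava 1.8+ style header:
#   "@<read id> <read number>:<filter flag>:<control bits>:<index>"
CASAVA_HEADER = re.compile(r"^@(\S+) ([12]):[YN]:\d+:\S*$")


def to_slash_header(header_line):
    """Return the header in the older '@<read id>/<read number>' style.

    Lines that do not look like Casava 1.8+ headers are returned
    unchanged (minus any trailing newline).
    """
    stripped = header_line.rstrip("\n")
    match = CASAVA_HEADER.match(stripped)
    if match is None:
        return stripped
    read_id, read_number = match.groups()
    return "@{}/{}".format(read_id, read_number)


def convert_fastq(in_path, out_path):
    """Rewrite the header of every record in a four-line-per-record FASTQ.

    Headers are located by position rather than by a leading '@',
    because a quality line may also start with '@'.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for line_number, line in enumerate(src):
            if line_number % 4 == 0:
                dst.write(to_slash_header(line) + "\n")
            else:
                dst.write(line)
```

For example, calling convert_fastq on 130104_7001240_0133_AH0854ADXX.lane_2.CCGTCC.1.fastq with an output name of your choice would turn "@HWI-ST1240:133:H0854ADXX:2:1101:2716:1998 1:N:0:CCGTCC" into "@HWI-ST1240:133:H0854ADXX:2:1101:2716:1998/1". Note that the index sequence (CCGTCC) is dropped from the header in this sketch.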
Last question: what is the tool for trimming reads based on quality?
Thanks very much gentle people
Tim
modified 5.6 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.6 years ago by Timothy Brennan • 10