Question: Data from NextGen sequencing uploaded, concatenated, but not in appropriate FASTA format for next step?
linherk wrote 2.7 years ago:

Hello, I received my raw NextGen sequencing files and am following a lab mate's protocol based on Galaxy. I created my account, then under "Get Data" I uploaded all FASTQ files, specifying the type as "fastqsanger" and the genome as hg38. For each individual sample, I merged the multiple files representing one lane into a single FASTQ file using "Text Manipulation" > "Concatenate datasets tail-to-head". My data now looks like the example below. The next step in the protocol removes any new or empty lines created between files, using "Filter and Sort" with "that: NOT Matching" and "the pattern: ^$". When I tried to run it, I couldn't see my merged files in the drop-down box, and I am now not sure how to proceed. Any help would be greatly appreciated! Best, Katja

@HWI-1KL153:117:HJL5WBCXX:2:1101:1494:1944 1:N:0:TAGCTT
CTGGTACAGTCCGCTGATGATGGGGTTACACACCTGCTCCAGCTCCTTCCTCTTGTGCTCAAACTCGTCC
+
DA@@DHFFHGHIIICDF1DCE?EEECEHIIIHHIFHHHHH@1GHEHCFHH@FHG?F1DFFE11C<CEFHH
@HWI-1KL153:117:HJL5WBCXX:2:1101:1554:1976 1:N:0:TAGCTT
CGTAGGTTTGGTCTAGGGTGTAGCCTGAGAATAGGGGAAATCAGTGAATGAAGCCTCCTATGATGGCAAA
+
DDDDDIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIHIIIIIIIIIIIIII
@HWI-1KL153:117:HJL5WBCXX:2:1101:1825:1907 1:N:0:TAGCTT
TTCCTTCAGCTCAGCAAACTTGCATGCAATGTGAGCCGTGTGGCAATCCAATACAGGGGCATAGCCGGCG
+
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH
@HWI-1KL153:117:HJL5WBCXX:2:1101:2017:1954 1:N:0:TAGCTT
CCCGCCCATTAATGACACTCCAAGAAGTGTCATGATATATGCGATTCACTTTCAAGTCTTCAGCAAACTA
+
DDDDDIIIIIIIIIIIIIHIIIHIIIIIIIIIIHIIIIIIIIIHIIIIIIIIIIIIIIIIIIIHIIIIII
@HWI-1KL153:117:HJL5WBCXX:2:1101:2737:1946 1:N:0:TAGCTT
GTCAAATGTGAGTCGCCTAGTCTAACAGTAGAGGTAAGTTCAAAGATGAAATGTGATTTGTTCAAGGCTG
+
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-1KL153:117:HJL5WBCXX:2:1101:2594:1994 1:N:0:TAGCTT
GCCAGCAAGCCAGCCCTCATAACCAAAGACAGCTGAACCATGGATTGCAGGTCCAGCAAAATCAAAGTCA
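
For reference, the two steps described in the question (concatenating the per-lane files, then dropping lines that match `^$`) can be sketched outside Galaxy in a few lines of Python. This is only an illustration under assumptions: the function name `concatenate_without_blank_lines` is hypothetical, the input is taken to be the already-decompressed text of each file, and it is not how the Galaxy tools are implemented.

```python
import re

# Same pattern the protocol uses in "Filter and Sort": a completely empty line.
EMPTY_LINE = re.compile(r"^$")


def concatenate_without_blank_lines(chunks):
    """Join the text of several FASTQ files in order and drop empty lines.

    `chunks` is a list of strings, one per input file (hypothetical helper,
    not part of Galaxy).
    """
    kept = []
    for chunk in chunks:
        for line in chunk.splitlines():
            # Keep only lines that do NOT match the empty-line pattern.
            if not EMPTY_LINE.match(line):
                kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


if __name__ == "__main__":
    lane_part_1 = "@r1\nACGT\n+\nIIII\n\n"   # trailing blank line
    lane_part_2 = "\n@r2\nTTTT\n+\nHHHH\n"   # leading blank line
    print(concatenate_without_blank_lines([lane_part_1, lane_part_2]), end="")
```

Reading real files would replace the two literal strings with the contents of each FASTQ file, in lane order.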
Tags: rna-seq
Jennifer Hillman Jackson (United States) wrote 2.7 years ago:

Hello,

I am not sure what happened, but the data is not in fastq format and probably wasn't when first uploaded to Galaxy (Concatenate wouldn't create this type of output).

I suggest checking the data locally (how it is before upload) against the FASTQ data specification. A link plus short examples are here: https://wiki.galaxyproject.org/Learn/Datatypes#Fastq
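
A quick local check along these lines might look like the sketch below. It assumes the common four-line FASTQ layout (header starting with `@`, sequence, a `+` separator line, and a quality string the same length as the sequence) and does not handle wrapped, multi-line records. The function name `check_fastq` is hypothetical and is not part of Galaxy.

```python
def check_fastq(text):
    """Return a list of problems found in four-line FASTQ text.

    An empty list means no problems were detected by these basic checks.
    """
    lines = text.splitlines()
    problems = []

    # Each record should occupy exactly four lines.
    if len(lines) % 4 != 0:
        problems.append(f"line count {len(lines)} is not a multiple of 4")

    # Check only the complete records; a trailing partial record is
    # already reported by the line-count test above.
    complete = len(lines) - len(lines) % 4
    for start in range(0, complete, 4):
        header, seq, plus, qual = lines[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {record}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {record}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record}: sequence and quality lengths differ")
    return problems


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n"  # second record is cut off
    for problem in check_fastq(sample):
        print(problem)
```

Stray blank lines between concatenated files would show up here as a line count that is not a multiple of four, or as headers that no longer start with `@`.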

Thanks, Jen, Galaxy team

rgarcia wrote 2.7 years ago:

That looks like a FASTQ file without the newline characters at the end of each line.


Dave Clements replied 2.7 years ago:

@rgarcia: Fixed the initial post to show that the newlines are there. That was an artifact of the editor.