Question: Get Data > EBI SRA (ENA) incorrectly assigning fastqsanger instead of fastqsanger.gz datatype, OPEN, read for workaround
10 weeks ago, a.morris wrote:

Dear Advisor

You were very helpful with overcoming my Galaxy problems at the end of last year, and I'm hoping you are still in a position to help again. I am testing two downloaded files (paired-end) for SRX1725724 via ENA.

(Error 1) When I try FastQC, the following error msg is returned:

Fatal error: Exit code 1 () Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/020/014/20014685/_job_tmp -Xmx7g -Xms256m
Failed to process EBI SRA_ SRR3438598 File_ ftp___ftp_sra_ebi_ac_uk_vol1_fastq_SRR343_008_SRR3438598_SRR3438598_1_fastq_gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
      at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
      at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:89)
      at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:87)
      at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
      at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:152)
      at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
      at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

(Error 2) When I tried FastQJoiner, I got another error message:

joiner", line 6, in <module>
    sys.exit(galaxy_utils.sequence.scripts.fastq_paired_end_joiner.main())
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/scripts/fastq_paired_end_joiner.py", line 146, in main
    for i, fastq_read in enumerate(fq.fastqReader(path=input1_filename, format=input1_type)):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 616, in __iter__
    yield next(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/six.py", line 564, in next
    return type(self).__next__(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 592, in __next__
    assert fastq_header.startswith('@'), 'Invalid fastq header: %s' % fastq_header
AssertionError: Invalid fastq header: ­�

The problems that an inexpert eye like mine can see are that for (1) the id lines don’t start with @ and (2) an invalid header and/or the ‘@’ problem.

Here are the first few lines of one of the data files. All ID lines here appear to start with @, so I don't understand the errors.

@SRR3438598.1 1/1
GTAGTGGTAGGTATCCTGGGGNNCCCGGGTGAAGAACCTGTCCTGGAAGA
+
;;:@><<>9:;##2<89@?391?<=???>?>????<99=;
@SRR3438598.2 2/1
CAGACGGGGTGTGGGTGGGCGNTGCCGCAAGAAGAGGGGAGGGTGGAGCT
+
B@@DFFFFHFDDFGIEHIJJJ#08DFGEIIIEDHCHEBDDDDD,=@BBDB
@SRR3438598.3 3/1
GTGTCCTTGTCGCTGTTGNACNNGTAGGCATAGGGGCGCCGCACCACCAC
+
<<<@@?@@@@@?@@?@@?#3@##2<@@???????????????<====<###4##21@=??????????????????????<
@SRR3438598.9 9/1
GTCCCTATGGTATGGAACNCANNCTGCCCTCCCCGGCCCTCCTCACTCCT
+
<<<@@?@@@@@@@@@?@@#3=##2<>@???????????????????????
@SRR3438598.10 10/1
GGAGGGTNNNATAGAGANNNANNANAATTGTAATAAGCAGTGCTTGAATT
+

Your help would be much appreciated

Thanks

Alex


Thank you for the quick workaround. Yes, these metafiles should have been formatted as compressed files with a .gz suffix.

Reply written 8 weeks ago by a.morris
9 weeks ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

There is a known problem with the metadata assignment when importing data from this source (and a few others).

The details and workaround are in this ticket, which will be closed once the problem is resolved: https://github.com/galaxyproject/galaxy/issues/6334

In short, reassign the fastq datatype to be fastqsanger.gz and these issues will resolve.
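For anyone who wants to confirm the cause on a local copy of the file: a gzip stream always begins with the two bytes 0x1f 0x8b, so a tool that reads compressed data as plain text will not find '@' at the start of the first line. The sketch below only checks those two bytes. The function name `suggest_datatype` is made up for illustration and is not part of Galaxy, and the sketch assumes the content is Sanger-encoded FASTQ.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def suggest_datatype(path):
    """Suggest a Galaxy datatype name from the first two bytes of a file.

    Hypothetical helper: it only tells compressed from uncompressed data
    and assumes the content itself is Sanger-encoded FASTQ.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    return "fastqsanger.gz" if magic == GZIP_MAGIC else "fastqsanger"
```

If this reports fastqsanger.gz for a dataset that Galaxy labelled fastqsanger, that matches the bug described in the ticket.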

FAQ: https://galaxyproject.org/support/

Alternatives:

  • Use this tool instead: Download and Extract Reads in FASTA/Q format from NCBI SRA. Uncompressed or gz-compressed fastq data works well with most tools at the public usegalaxy.* servers (but not all: for Tophat, for example, use HISAT2 instead).
  • Copy the Galaxy-FTP link at ENA and paste that into the Upload tool. If you wish to preserve the .gz compression, assign the datatype fastqsanger.gz during upload. By default, when using "autodetect" for the datatype, fastqsanger will be applied and the data uncompressed.
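Whichever route you use, it can help to check a local copy of the file before uploading it. Below is a minimal sketch that reads the first few records from either a plain or a gzip-compressed file and applies the same kind of checks the tools above complain about. The function names `open_fastq` and `check_first_records` are illustrative only, and the sketch assumes simple four-line FASTQ records with no wrapped sequence lines.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (illustrative helper)."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def check_first_records(path, n_records=5):
    """Check up to n_records four-line records; return how many were checked.

    Raises ValueError on the first malformed record.
    """
    checked = 0
    with open_fastq(path) as handle:
        while checked < n_records:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break  # end of file
            if len(record) < 4:
                raise ValueError("record %d is truncated" % (checked + 1))
            header, seq, plus, qual = record
            if not header.startswith("@"):
                raise ValueError("record %d: ID line does not start with '@'" % (checked + 1))
            if not plus.startswith("+"):
                raise ValueError("record %d: separator line does not start with '+'" % (checked + 1))
            if len(seq) != len(qual):
                raise ValueError("record %d: sequence and quality lengths differ" % (checked + 1))
            checked += 1
    return checked
```

A file that passes here but still fails in Galaxy points to the assigned datatype rather than to the data itself.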

Thanks for reporting the problem! Jen, Galaxy team


Follow-up: For others having problems with tools not interpreting fastq data (job errors or unexpected results, unrelated to the bug above), please see the FAQs here for common data formatting issues and solutions:

Reply written 9 weeks ago by Jennifer Hillman Jackson (modified 8 days ago)