Fastq Groomer And Compute Quality Statistics



I noticed that for our new Illumina data (which is generated in Sanger format), the FastQ Groomer output is identical to the Illumina FastQ input file. I was hoping to use the raw FastQ files directly as input (saving disk space) for computing quality statistics to look at box plots, but the "Compute Quality Statistics" tool appears to require that the data be run through FastQ Groomer first. Is there a way to get around this, and is this a bug? I assume this is some sort of safety measure built into the tool? -John



Written 7.5 years ago by John David Osborne; modified 7.5 years ago by Tilahun Abebe.
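Whether grooming changes anything depends on whether the quality strings already use the Sanger (Phred+33) offset, which is why the groomer output here is identical to the input. As a rough, hypothetical check outside Galaxy (the function name guess_quality_offset and its thresholds are assumptions, not part of any Galaxy tool), the range of quality characters can be inspected: Sanger qualities start at '!' (ASCII 33), while Solexa and Illumina 1.3+ qualities never go below ';' (ASCII 59).

```python
# Heuristic sketch (hypothetical helper, not a Galaxy tool): guess the FASTQ
# quality encoding from the range of quality characters in the first records.
from typing import Iterable


def guess_quality_offset(lines: Iterable[str], max_records: int = 1000) -> str:
    """Return 'sanger' (Phred+33), 'illumina' (Phred+64) or 'unknown'.

    Assumes plain four-line FASTQ records (no line-wrapped sequences).
    """
    lowest = None
    highest = None
    for i, line in enumerate(lines):
        if i // 4 >= max_records:
            break
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        quals = line.rstrip("\r\n")
        if not quals:
            continue
        lo, hi = min(map(ord, quals)), max(map(ord, quals))
        lowest = lo if lowest is None else min(lowest, lo)
        highest = hi if highest is None else max(highest, hi)
    if lowest is None:
        return "unknown"
    if lowest < 59:  # below ';' never occurs in Solexa/Illumina 1.3+ encodings
        return "sanger"
    if highest > 74:  # above 'J' (Q41 in Phred+33) suggests a Phred+64 offset
        return "illumina"
    return "unknown"  # range fits both offsets; cannot tell from these reads
```

This only guesses the encoding from the data; Galaxy still goes by the datatype assigned to the dataset, so the type would still need to be set (for example to fastqsanger) before tools that expect groomed input will accept it.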

fubar (Australia) wrote:

You can avoid the space and time overhead of grooming and get comprehensive QC reports by using the new wrapper for FastQC (under NGS: QC). It takes FastQ of any flavour (and BAM), groomed or not, and produces a superset of the Compute Quality Statistics output without the need for an intermediate step. Highly recommended.
-- Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Director of Bioinformatics, Channing Lab; Tel: +1 617 505 4850; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;

Written 7.5 years ago by fubar.

Thanks Ross, I don't see it under my local install. Are there any pre-written scripts to integrate it with a local Galaxy instance? I assume you are talking about this tool: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ -John

Written 7.5 years ago by John David Osborne.

Hi John, it's on main and test, i.e. the FastQC wrapper is distributed with the current stable and central branches, so your local tool_conf.xml may be out of date, since it's not automagically refreshed from the distributed .sample file. If you diff your local tool_conf.xml against the current distributed sample, you should see the lines you need to add, which point to rgenetics/rgFastQC.xml:

    grep -i fastqc tool_conf.xml
    <label text="FastQC: fastq/sam/bam" id="fastqcsambam"/>
    <tool file="rgenetics/rgFastQC.xml"/>

Like everything else, you'll want to install the jar locally so it can be found by the cluster. The default location is tool-data/shared/jars/FastQC, so the tool can find the fastqc perl script (yes, I know... but it's worth it!):

    <command interpreter="python">
    rgFastQC.py -i $input_file -d $html_file.files_path -o $html_file -n "$out_prefix" -f $input_file.ext -e ${GALAXY_DATA_INDEX_DIR}/shared/jars/FastQC/fastqc

I hope this helps?

Written 7.5 years ago by fubar.
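The diff suggested above can also be scripted. This is a hypothetical helper (tool_files and missing_tools are illustrative names, not Galaxy functions) that compares the <tool file="..."/> entries of a local tool_conf.xml with those of the distributed sample and lists the ones missing locally, such as rgenetics/rgFastQC.xml. It assumes the sample file is named tool_conf.xml.sample.

```python
# Sketch: report <tool file="..."/> entries present in the distributed sample
# tool config but absent from the local one (hypothetical helper names).
import xml.etree.ElementTree as ET
from typing import List, Set


def tool_files(xml_text: str) -> Set[str]:
    """Collect the 'file' attribute of every <tool> element, at any depth."""
    root = ET.fromstring(xml_text)
    return {t.get("file") for t in root.iter("tool") if t.get("file")}


def missing_tools(local_xml: str, sample_xml: str) -> List[str]:
    """Tool files declared in the sample config that the local config lacks."""
    return sorted(tool_files(sample_xml) - tool_files(local_xml))
```

For example, read tool_conf.xml and tool_conf.xml.sample from the Galaxy root and pass their contents to missing_tools. It only reports missing <tool> entries, so the matching <label> lines and the section they belong in still have to be copied by hand.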

Peter Cock (European Union) wrote:

If you know your data is already in Sanger FASTQ format, you can say so when uploading the data into Galaxy. Or use the "pencil" icon to edit the attributes and change the file type (to fastqsanger). This doesn't change the file itself on disk. Peter

Written 7.5 years ago by Peter Cock.

Tilahun Abebe wrote:

Hi guys, we are trying to load Illumina data into our local Galaxy instance. The files are between 700 MB and 2.2 GB. Files below 2 GB load in less than 5 minutes; files larger than 2 GB don't upload at all. We installed Galaxy locally because we thought loading files would be faster than on the public server. Any suggestions to solve this problem are highly appreciated. Tilahun

Written 7.5 years ago by Tilahun Abebe.

