Question: awk ... fatal: print to "standard output" failed
7 weeks ago by
18Kbeyond20
New York
18Kbeyond20 wrote:

Hi. I uploaded the fastq.gz file below to Galaxy. I noticed that Galaxy converts it to fastq, so I didn't include gunzip before awk. (The code works on the command line.)

(I tried adding gunzip -c before awk, as I do on the command line, but Galaxy then reports "gzip: ...dat: not in gzip format".)

wget https://www.encodeproject.org/files/ENCFF172GUS/@@download/ENCFF172GUS.fastq.gz

I wrote the script below, but it generated this error:

awk: (FILENAME=/data/galaxy_r17_05/database/files/000/dataset_193.dat FNR=5201) fatal: print to "standard output" failed (Broken pipe)


<tool id="getLength" name="Get the length of the FASTQ files">
    <description></description>
    <command>
        awk '{if(NR%4==1) {print \$0}}' $input | head -n 1 | awk -F ":" '{print \$1}' | cut -c 2- | awk '{print length;}' | sort | uniq -c > $output
    </command>
    <inputs>
        <param name="input" type="data" format="fastq" label="Input FASTQ file"/>
    </inputs>
    <outputs>
        <data name="output" format="tabular" label="Get the file length on ${on_string}"/>
    </outputs>
</tool>

tool awk galaxy syntax • 113 views
modified 7 weeks ago by Jennifer Hillman Jackson • written 7 weeks ago by 18Kbeyond20
7 weeks ago by
United States
Jennifer Hillman Jackson wrote:

Hello,

Try changing \$0 to just $0, and likewise \$1 to $1. The escaping looked odd to me, so I tested it to be sure. It triggers an error on Mac OS X (command line) and in the Galaxy Awk tool in the Text Manipulation group. Leaving the backslashes out allows portions of your command string to work.

Other than that, the complete command string as-is (but without the escaping) does not seem to produce what I think you want (the name and length of each fastq sequence). For example, the first awk term only pulls out the sequence identifier lines ("@" lines), and I don't see where else in the command string the sequence is isolated and passed to the length function.

There are many examples online of awk commands that summarize fastq sequence length, and they might help with the syntax. A Google search for "fastq read length awk" will bring these up. This one looks good to me, but you can review other options: http://onetipperday.sterding.com/2012/05/simple-way-to-get-reads-length.html

Hope that helps! Jen, Galaxy team

written 7 weeks ago by Jennifer Hillman Jackson
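As an illustration of the point above about isolating the sequence line, here is a minimal command-line sketch (not a Galaxy wrapper) that counts read lengths. It assumes a plain four-line-per-record FASTQ file, where the sequence is line 2 of each record. The function name `fastq_length_counts` and the demo file are made up for this example and do not come from the thread.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length in a
# four-line-per-record FASTQ file (the sequence is line 2 of each record).
fastq_length_counts() {
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) printf "%d\t%d\n", len, counts[len] }' "$1" |
        sort -n
}

# Tiny made-up FASTQ file, for demonstration only.
demo_fastq=$(mktemp)
printf '%s\n' '@r1' 'ACGTA' '+' 'IIIII' \
              '@r2' 'ACG'   '+' 'III' \
              '@r3' 'TTGCA' '+' 'IIIII' > "$demo_fastq"

fastq_length_counts "$demo_fastq"
```

Because the columns are joined with a tab, the result is already two-column tabular text, unlike the space-padded output of `uniq -c`.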

Thank you very much for your helpful reply as always!

written 7 weeks ago by 18Kbeyond20

Hi Jen,

So I replaced gunzip -c with cat in the Galaxy script, since Galaxy seems to convert fastq.gz to fastq.

However, after I removed the \ in front of $0 and $1, I still got this error message:

awk: (FILENAME=- FNR=613) fatal: print to "standard output" failed (Broken pipe)

cat: write error: Broken pipe

written 7 weeks ago by 18Kbeyond20
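The "Broken pipe" message is what awk (and cat) print when a downstream command, here `head -n 1`, exits after reading the first line while they are still writing. One possible way to avoid it, sketched below for the command line, is to let awk stop on its own after the first header line instead of relying on `head`. The function name `first_header` and the generated demo file are hypothetical. Whether Galaxy marks a job as failed because of such a message depends on the tool's error-handling settings, which the thread does not show.

```shell
# Hypothetical helper: print the first header line of a FASTQ file and stop.
# awk exits by itself after the first match, so no downstream command closes
# the pipe while awk is still writing.
first_header() {
    awk 'NR % 4 == 1 { print; exit }' "$1"
}

# Made-up FASTQ records, for demonstration only.
demo_fastq=$(mktemp)
i=0
while [ "$i" -lt 5000 ]; do
    printf '@HWI-1:%d:x\nACGT\n+\nIIII\n' "$i"
    i=$((i + 1))
done > "$demo_fastq"

first_header "$demo_fastq"
```

The remaining steps of the original pipeline (`awk -F ":"`, `cut -c 2-`, and so on) can follow `first_header` unchanged, since they only ever receive one line.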

The output does not contain any tabs. It is just a single row of data, with leading spaces and a space between the two values.

Commands I used:

$ awk '{if(NR%4==1) {print $0}}' test.fastqsanger | head -n 1 | awk -F ":" '{print $1}' | cut -c 2- | awk '{print length;}' | sort | uniq -c > out_no_escaping

$ more out_no_escaping 

   1 5

^^ The line starts with spaces, has a single space between the 1 and the 5, and ends right after the 5.

This is not tabular format, so it fails. I still think your command will need some tuning.

modified 7 weeks ago • written 7 weeks ago by Jennifer Hillman Jackson
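One way to get tab-separated columns out of `uniq -c`, shown here as a command-line sketch, is to pass its output through a final awk step that re-joins the fields with a tab. The function name `counts_to_tsv` is made up for this example, and the sketch assumes the counted values contain no whitespace (true for the numeric lengths in this thread).

```shell
# Hypothetical helper: turn "uniq -c" output (padded count, space, value)
# into two tab-separated columns: count, then value.
# Assumes the counted values contain no whitespace.
counts_to_tsv() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2 }'
}

# Demonstration with made-up lengths: two reads of length 5, one of length 7.
printf '5\n5\n7\n' | sort | uniq -c | counts_to_tsv
```

awk's default field splitting discards the leading padding, so each output line is the count, a tab, and the value.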