Question: Fastq Collapse?
Johnson, Kory (NIH/NINDS) [C] • 50 wrote:
Hello Galaxy users,
Just to follow up on the question I sent to the list-serv a moment
ago.
I put forth the question about FASTQ collapse, as the FASTX-toolkit by
Assaf Gordon describes the supported collapse tool as follows:
"FASTQ/A Collapser, Collapsing identical sequences in a FASTQ/A file
into a single sequence (while maintaining reads counts)"
Yet the collapse tool in Galaxy appears to support FASTA only?
Why am I asking?
I would like to remove duplicate reads in a FASTQ file by sequence,
leaving one representative read per unique sequence: the one with the
best quality line among the duplicates it was collapsed from.
Can certainly convert FASTQ to FASTA, then collapse, but if you do not
have the qual file, you cannot reconstitute a FASTQ file with actual
qual scores.
Any argument for or against? Or can Galaxy already do this, and I am
just missing the right tool?
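The collapse described above could be sketched roughly as follows. This is not the FASTX-toolkit or Galaxy implementation; it is a minimal illustration that assumes four-line FASTQ records, Phred+33 quality encoding, and that "best quality line" means the highest mean Phred score. The function names (`read_fastq`, `mean_quality`, `collapse_fastq`) are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a strict 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def collapse_fastq(records, offset=33):
    """Keep one read per unique sequence: the one with the highest mean quality.

    Yields (header, sequence, quality, count), where count is the number of
    reads that shared the sequence. Ties keep the first read seen.
    """
    best = {}  # sequence -> [header, quality, count]
    for header, seq, qual in records:
        entry = best.get(seq)
        if entry is None:
            best[seq] = [header, qual, 1]
            continue
        entry[2] += 1
        if mean_quality(qual, offset) > mean_quality(entry[1], offset):
            entry[0], entry[1] = header, qual
    for seq, (header, qual, count) in best.items():
        yield header, seq, qual, count


if __name__ == "__main__":
    # Made-up reads for illustration only.
    demo = "@r1\nACGT\n+\n!!!!\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\n####\n"
    for header, seq, qual, count in collapse_fastq(read_fastq(io.StringIO(demo))):
        print(f"{header} count={count}\n{seq}\n+\n{qual}")
```

Because every unique sequence is held in memory, this only suits files whose set of distinct reads fits in RAM. Other definitions of "best" (for example, the highest minimum score) would only change `mean_quality`.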
Thanks ... best,
Kory
Kory R. Johnson, MS, PhD
Sr. Bioinformatics Scientist
www.kellygovernmentsolutions.com
Providing Contract Services For:
Bioinformatics Section,
Information Technology & Bioinformatics Program,
Division of Intramural Research (DIR),
National Institute of Neurological Disorders & Stroke (NINDS),
National Institutes of Health (NIH),
Bethesda, Maryland
Mailing Address:
NINDS/NIH
Clinical Center (Building 10)
Office 5S223
9000 Rockville Pike
Bethesda, MD 20892
Contact Information:
Phone: 301-402-1956
Fax: 301-480-3563
email: johnsonko@ninds.nih.gov
Green Message:
Please consider the environment before printing this e-mail. Thank
you.
Important Message:
This electronic message transmission contains information intended for
the recipient only. Such that, the information contained herein may
be confidential, privileged, or proprietary. If you are not the
intended recipient, be aware that any disclosure, copying,
distribution, or use of this information is strictly prohibited. If
you have received this electronic information in error, please notify
the sender immediately by telephone. Thank you.
To: galaxy-user@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 56, Issue 4
Today's Topics:
1. CuffDiff gene fpkm tracking file. (Samuele Gherardi)
2. CuffDiff gene fpkm tracking file- Sorry! I sent only a part
of my email (Samuele Gherardi)
3. Re: listing attributes of data input (Peter)
4. Re: CuffDiff gene fpkm tracking file. (Jeremy Goecks)
5. Re: Downloadable Galaxy Virtual Machine in VMware
(Haarst, Jan van)
6. Re: Downloadable Galaxy Virtual Machine in VMware (Nate Coraor)
7. FASTQ collapse? (Johnson, Kory (NIH/NINDS) [C])
Message: 1
Date: Thu, 3 Feb 2011 09:53:44 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] CuffDiff gene fpkm tracking file.
Message-ID:
<025DB19130DE0B43BBD868BDB9244A82C134@E10-MBX3-DR.personale.dir.unibo.it>
Content-Type: text/plain; charset="iso-8859-1"
this is an example of my CuffDiff gene fpkm tracking file.
tracking_id class_code nearest_ref_id gene_short_name tss_id locus q1_FPKM q1_conf_lo q1_conf_hi q2_FPKM q2_conf_lo q2_conf_hi
XLOC_000001 - - MT-ND5 - chrM:0-16571 12484.2 12260.8 12707.7 11447 11233.1 11661
XLOC_000002 - - USP14 TSS1,TSS2,TSS3 chr18:148586-236453 16.7235 9.41244 24.0346 19.437 11.7368 27.1371
XLOC_000003 - - SMCHD1 TSS10,TSS11,TSS12,TSS4,TSS5,TSS6,TSS7,TSS8,TSS9 chr18:2719322-2728540 28.2493 17.5093 38.9892 27.2263 16.6263 37.8262
XLOC_000004 - - EMILIN2 TSS13,TSS14 chr18:2880607-2882469 3.98118 0 7.99721 4.62875 0.278519 8.97899
If this is normal, how can I find the class code of the transcripts
listed in the CuffDiff gene expression file?
thank you in advance
Samuele.
Message: 2
Date: Thu, 3 Feb 2011 10:58:47 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] CuffDiff gene fpkm tracking file- Sorry! I sent
only a part of my email
Message-ID:
<025DB19130DE0B43BBD868BDB9244A82CB4C@E10-MBX3-DR.personale.dir.unibo.it>
Content-Type: text/plain; charset="iso-8859-1"
Hello everybody,
I'm quite new to the NGS world and I'm trying to analyze some RNA-seq
data. I followed the workflow through TopHat, Cufflinks, Cuffcompare
and Cuffdiff. I suppose everything works fine, but in the Cuffdiff gene
FPKM file the class_code column is empty and I don't know why.
this is an example of my CuffDiff gene fpkm tracking file.
tracking_id class_code nearest_ref_id gene_short_name tss_id locus q1_FPKM q1_conf_lo q1_conf_hi q2_FPKM q2_conf_lo q2_conf_hi
XLOC_000001 - - MT-ND5 - chrM:0-16571 12484.2 12260.8 12707.7 11447 11233.1 11661
XLOC_000002 - - USP14 TSS1,TSS2,TSS3 chr18:148586-236453 16.7235 9.41244 24.0346 19.437 11.7368 27.1371
XLOC_000003 - - SMCHD1 TSS10,TSS11,TSS12,TSS4,TSS5,TSS6,TSS7,TSS8,TSS9 chr18:2719322-2728540 28.2493 17.5093 38.9892 27.2263 16.6263 37.8262
XLOC_000004 - - EMILIN2 TSS13,TSS14 chr18:2880607-2882469 3.98118 0 7.99721 4.62875 0.278519 8.97899
If this is normal, how can I find the class code of the transcripts
listed in the CuffDiff gene expression file?
thank you in advance
Samuele.
Message: 3
Date: Thu, 3 Feb 2011 11:05:07 +0000
To: Freddy de Bree <freddy.debree@wur.nl>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] listing attributes of data input
Message-ID:
<aanlktimkxwr_9mfudu7ws+qaphtfrqdj+rrukrltstfx@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Some are given in examples on the main tool XML doc page,
https://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax
Others I've noticed by looking at the provided XML wrappers,
and/or email list questions. For example, .ext or .extension gives
the Galaxy file type (e.g. fasta).
Other than that, I guess you can always read the code - but I
agree that a document describing this would be nice to have.
Peter
Message: 4
Date: Thu, 3 Feb 2011 09:14:19 -0500
To: Samuele Gherardi <samuele.gherardi@unibo.it>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] CuffDiff gene fpkm tracking file.
Message-ID: <b60dbe14-31bd-432d-92bb-fb7c8363a86f@emory.edu>
Content-Type: text/plain; charset=us-ascii
Hi Samuele,
Without seeing your history, it's difficult to say for certain what
your problem is. However, I'd guess that the GTF file that you're
providing to Cuffdiff does not have the p_id attribute. You can
produce a GTF file with both tss_id and p_id attributes by running
Cuffcompare and using sequence data.
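One way to check this guess before rerunning anything is to count how many GTF records lack the attributes in question. The sketch below is only an illustration: it assumes a standard nine-column, tab-delimited GTF whose ninth column holds `key "value";` pairs, and the function names are hypothetical.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gtf_attributes(line):
    """Return the attribute column of one GTF line as a dict (empty if malformed)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return {}
    return dict(ATTR_RE.findall(fields[8]))


def missing_attribute_counts(lines, required=("tss_id", "p_id")):
    """Count, per required attribute, how many records do not carry it."""
    counts = {name: 0 for name in required}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        attrs = gtf_attributes(line)
        for name in required:
            if name not in attrs:
                counts[name] += 1
    return counts
```

Calling `missing_attribute_counts(open("combined.gtf"))` (the file name is a placeholder) and seeing a large count for `p_id` would be consistent with the explanation above, though it would not prove it.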
Thanks,
J.
Message: 5
Date: Thu, 3 Feb 2011 16:54:14 +0100
To: "'Leon Mei'" <hailiang.mei@nbic.nl>,
"'galaxy-user@lists.bx.psu.edu'" <galaxy-user@lists.bx.psu.edu>
Cc: 'David van Enckevort' <david.van.enckevort@nbic.nl>, 'Rob Hooft'
<rob.hooft@nbic.nl>
Subject: Re: [galaxy-user] Downloadable Galaxy Virtual Machine in
VMware
Message-ID:
<48B2C6E110F6CC4387AFCD2EBCCA0B3B25D12D432F@scomp0536.wurnet.nl>
Content-Type: text/plain; charset="iso-8859-1"
The download can also be done using bittorrent, torrent is available
at http://www.biotorrents.net/details.php?id=136 .
This might be faster, as one of the peers is in Canada.
With kind regards,
Jan
Message: 6
Date: Thu, 3 Feb 2011 11:37:01 -0500
To: "Haarst, Jan van" <jan.vanhaarst@wur.nl>
Cc: "'galaxy-user@lists.bx.psu.edu'" <galaxy-user@lists.bx.psu.edu>,
'Leon Mei' <hailiang.mei@nbic.nl>, 'David van Enckevort'
<david.van.enckevort@nbic.nl>, 'Rob Hooft'
<rob.hooft@nbic.nl>
Subject: Re: [galaxy-user] Downloadable Galaxy Virtual Machine in
VMware
Message-ID: <20110203163701.GE15147@bx.psu.edu>
Content-Type: text/plain; charset=iso-8859-1
This is great! I haven't checked the image out, but I'm fetching the
torrent now and will leave it seeding here from PSU to help out.
Thanks,
--nate
Message: 7
Date: Thu, 3 Feb 2011 12:51:34 -0500
To: "'galaxy-user@bx.psu.edu'" <galaxy-user@bx.psu.edu>
Subject: [galaxy-user] FASTQ collapse?
Message-ID:
<f142c51c02c33c418e931103600d1e670648e93762@nihmlbxbb03.nih.gov>
Content-Type: text/plain; charset="us-ascii"
Hello,
Is there an option to collapse duplicate sequences in FASTQ format?
I see collapse for FASTA, but where is it for FASTQ?
Thank you,
Kory
Name: image001.jpg
Type: image/jpeg
Size: 2396 bytes
Desc: image001.jpg
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20110203/17864960/attachment.jpg>
End of galaxy-user Digest, Vol 56, Issue 4
******************************************
modified 7.8 years ago by Ben Bimber • 20 • written 7.8 years ago by Johnson, Kory (NIH/NINDS) [C] • 50