Thanks Ariel.
Bony
To: galaxy-user@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 74, Issue 15
Send galaxy-user mailing list submissions to
galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.bx.psu.edu/listinfo/galaxy-user
or, via email, send a message with subject or body 'help' to
galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at
galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of galaxy-user digest..."
HEY! This is important! If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol
..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the
thread you are responding to.
Why?
1. This will keep the subject meaningful. People will have some idea
from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match
search queries, but that aren't actually informative.
Today's Topics:
1. Re: Lift Over bam files (Jennifer Jackson)
2. Linking to Compressed Data (Branden Timm)
3. Re: How to decide "Mean Inner Distance between Mate Pairs"?
(Jennifer Jackson)
4. Can I convert paired-end datasets into single end ones?
(Du, Jianguang)
5. Re: Can I convert paired-end datasets into single end ones?
(Jennifer Jackson)
6. Re: Galaxy toolshed-vcftools (Jennifer Jackson)
7. Do I need to allow indel search? (Du, Jianguang)
8. Use Own Junctions or not (Du, Jianguang)
9. Re: copy number variation detcetion in Glaxay (Jennifer Jackson)
10. Cuffdiff errors (Yan He)
11. Re: Cuffdiff errors (Jennifer Jackson)
12. Re: copy number variation detcetion in Glaxay (Mathew Bunj)
13. Re: Do I need to allow indel search? (Jennifer Jackson)
Message: 1
Date: Wed, 15 Aug 2012 09:05:41 -0700
To: Geert Vandeweyer <geertvandeweyer@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Lift Over bam files
Message-ID: <502BC8D5.8020804@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello Geert,
For the best results, especially for SNPs, you will want to map
directly to the target genome. The genome Galaxy is using is the same
primary human genome the GATK team also uses - the 1000 genomes build
37 -> "hg_g1k_v37". Click on the GATK links from one of the tools to
see the details. GATK provides liftOver files between the genomes,
and you could install and use these with the liftOver tool, but not
for BAM datasets. The tool accepts BED, Interval, and GFF inputs, so a
BAM dataset would first need to be converted (BAM -> SAM -> Interval).
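The BAM -> SAM -> Interval route comes down to turning each alignment into a 0-based, half-open interval. Below is a minimal Python sketch. It assumes plain-text SAM input, and it was written independently of Galaxy's own converters. The function names are hypothetical.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)

def sam_line_to_bed(line: str):
    """Convert one SAM alignment line into a BED6 tuple, or None if unmapped."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, _mapq, cigar = fields[:6]
    flag, pos = int(flag), int(pos)
    if flag & 0x4 or rname == "*" or cigar == "*":
        return None  # unmapped reads have no interval to lift over
    start = pos - 1  # SAM POS is 1-based; BED start is 0-based
    end = start + reference_span(cigar)  # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"
    return (rname, start, end, qname, 0, strand)

def sam_to_bed(lines):
    """Yield BED6 tuples for every mapped alignment, skipping header lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        record = sam_line_to_bed(line)
        if record is not None:
            yield record
```

The output keeps the read name and strand, so lifted intervals can be matched back to their reads. Base-level detail such as mismatches and qualities is lost, which is one reason remapping is preferred for SNP work.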
GATK also provides indexes (lifted) for hg19, but Galaxy does not
provide an hg19 genome that is sorted appropriately for GATK, or at
least not yet. RNA-seq tools and most other tools up until now
required sorting in one way, and now GATK requires sorting in another,
but keeping the database dbkey the same is important for visualization
and other functions. It can get complicated when moving between tools
in a history. We will likely have some 'best practice' solutions soon,
but for now, use the 1000 genomes build to keep it all simple:
Human (Homo sapiens) (b37): hg_g1k_v37
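The message does not name the two sorting conventions. A common pair is plain lexicographic order and karyotypic order, where b37 lists 1-22, X, Y, MT and then the unplaced contigs. Assuming that pair, the difference can be sketched in Python as follows. The helper names are hypothetical.

```python
import re

# Sort ranks for the named contigs that follow the autosomes (b37 style).
NAMED_RANKS = {"X": 23, "Y": 24, "MT": 25, "M": 25}

def lexicographic_order(contigs):
    """Plain string sort, e.g. '1', '10', '2', ..., 'MT', 'X', 'Y'."""
    return sorted(contigs)

def karyotypic_order(contigs):
    """Autosomes numerically (1..22), then X, Y, MT, then any other contigs by name."""
    def key(name):
        bare = re.sub(r"^chr", "", name)  # treat 'chr1' and '1' alike
        if bare.isdigit():
            return (0, int(bare), name)
        if bare in NAMED_RANKS:
            return (0, NAMED_RANKS[bare], name)
        return (1, 0, name)  # unplaced contigs such as GL000207.1 go last
    return sorted(contigs, key=key)
```

Other references order their contigs differently; the GATK hg19 bundle, for example, places chrM first. The sequence dictionary of the reference actually in use is the real authority.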
The good news is that installing this genome has been greatly
simplified. The genome and indexes are now available on an rsync
server.
You can simply download and add the genome directory and all the
contents. You will still need to create the .loc file entries but the
rest is done.
http://wiki.g2.bx.psu.edu/Admin/Data%20Integration
The "dbkey" is "hg_g1k_v37"
Hopefully one of the options works out for you!
Jen
Galaxy team
PS: Your post ended up threaded behind another post. I am not sure,
but was this because you started with a reply and then changed the
subject line?
This is not enough to start a new thread. Instead, please create a
brand new message in your email client, then copy over the mailing
list email address, add a subject line, and this will start a new
thread that will get tracked and not missed. Thanks!
--
Jennifer Jackson
http://galaxyproject.org
Message: 2
Date: Wed, 15 Aug 2012 11:09:37 -0500
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Linking to Compressed Data
Message-ID: <502BC9C1.8090903@wisc.edu>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed
Hi All,
Is it possible to link to compressed files in a Galaxy data
library?
We receive all of our NGS data in bz2 or gzip format for obvious
reasons, just wondering if I have to decompress it on the filesystem
before I link to it or not. Thanks!
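This digest does not include an answer to this question. If decompressing on the filesystem first does turn out to be necessary, the Python sketch below uses only the standard library. It detects gzip or bz2 from each file's leading magic bytes and stream-decompresses, so whole files are never loaded into memory. The function and directory names are hypothetical.

```python
import bz2
import gzip
import shutil
from pathlib import Path

# Leading "magic" bytes that identify each compression format.
MAGIC = {b"\x1f\x8b": gzip.open, b"BZh": bz2.open}

def detect_opener(path: Path):
    """Return gzip.open or bz2.open if the file is compressed, else None."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    for magic, opener in MAGIC.items():
        if head.startswith(magic):
            return opener
    return None

def decompress_to(src: Path, dest_dir: Path) -> Path:
    """Stream-decompress src into dest_dir; uncompressed files are copied as-is."""
    opener = detect_opener(src)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # 'reads.fastq.gz' -> 'reads.fastq'; uncompressed files keep their name
    dest = dest_dir / (src.stem if opener else src.name)
    if opener is None:
        shutil.copyfile(src, dest)
        return dest
    with opener(str(src), "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in chunks
    return dest
```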
--
Branden Timm
btimm@glbrc.wisc.edu
Message: 3
Date: Wed, 15 Aug 2012 09:27:36 -0700
To: Sean Davis <sdavis2@mail.nih.gov>, "Du, Jianguang"
<jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] How to decide "Mean Inner Distance between
Mate Pairs"?
Message-ID: <502BCDF8.4030005@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Great advice Sean!
Jianguang, this is the correct analysis: mapping the data to test the
actual insert size of the library as sequenced. The experimental notes
at SRA are just a starting place; the data is the truth. Running a sample
through TopHat itself might produce more precise results. I suspect the
coverage on your top BLASTN HSP is not complete, breaking off where it
hits a splice, and that you have some bias toward sequences/hits that
cross junctions near the ends. Overall, though, none of this is likely
to make much of a difference to the analysis as a whole.
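Sean's message is not included in this digest. The approach Jen endorses can still be approximated from a SAM file of mapped pairs. For equal-length mates, TopHat's mean inner distance is roughly the fragment length minus twice the read length. The Python sketch below assumes a fixed read length and treats the SAM TLEN column as the fragment length. The function names are hypothetical.

```python
import statistics

def inner_distances(sam_lines, read_length: int):
    """Yield |TLEN| - 2 * read_length for each properly paired, first-in-pair read."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 = properly paired, 0x40 = first mate (counts each pair once)
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            yield abs(tlen) - 2 * read_length

def summarize(distances):
    """Basic summary statistics for a collection of inner distances."""
    values = list(distances)
    if not values:
        raise ValueError("no properly paired reads found")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

Spliced alignments against the genome include intron lengths in TLEN. When the reads were mapped with a spliced aligner, the median is therefore usually a safer summary than the mean.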
Good luck!
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
Message: 4
Date: Wed, 15 Aug 2012 16:59:27 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Can I convert paired-end datasets into single
end ones?
Message-ID:
<2B3C356FD95D6A41B0CCFF77102E5EDF12867830@IU-MSSG-
MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"
Dear All,
I have some paired-end datasets to be analyzed, but I am not sure
about their Mean Inner Distance between Mate Pairs.
Can I convert these paired-end datasets into single-end ones and use
them as single-end dataset as follows?
1) Use the tool "Manipulate FASTQ" to convert the sequence of each reverse
read into its reverse-complement counterpart, so that all of the
reverse reads effectively become forward reads.
2) Run Tophat on the manipulated datasets as single-end ones.
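For reference, step 1 amounts to two operations per reverse read. The sequence is reverse-complemented, and the quality string is reversed so each score stays aligned with its base. The Python sketch below is not the Manipulate FASTQ tool's implementation, and it handles only A/C/G/T/N. The function names are hypothetical.

```python
# Complement table for A/C/G/T/N in both cases; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_record(header, seq, plus, qual):
    """Reverse-complement one FASTQ record; qualities are reversed to stay aligned with bases."""
    return header, seq.translate(COMPLEMENT)[::-1], plus, qual[::-1]

def reverse_complement_fastq(lines):
    """Yield reverse-complemented records from a 4-line-per-record FASTQ stream."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over one shared iterator groups consecutive lines into records of four
    for header, seq, plus, qual in zip(*[stripped] * 4):
        yield reverse_complement_record(header, seq, plus, qual)
```

Jen's reply to this message explains why discarding the pairing this way is not recommended.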
Thanks.
Jianguang
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/afa039f2/attachment-0001.html>
Message: 5
Date: Wed, 15 Aug 2012 10:15:16 -0700
To: "Du, Jianguang" <jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] Can I convert paired-end datasets into
single end ones?
Message-ID: <502BD924.7070705@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Jianguang,
This is not recommended. The value of the paired relationships would be
lost. Using an estimated Mean Inner Distance is a much better solution,
keeping in mind that testing different values may be necessary to
obtain the optimal results for any dataset.
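On the public server, testing different values means re-running the TopHat tool form with different Mean Inner Distance settings. Command-line TopHat users could instead generate one run per candidate value with a Python sketch like the one below. It relies on TopHat's `-r/--mate-inner-dist` and `--mate-std-dev` options. The index, read file, and output directory names are hypothetical placeholders. Check the options against the installed TopHat version.

```python
import shlex

def tophat_commands(index, reads1, reads2, inner_distances, std_dev=20):
    """Build (but do not run) one TopHat command line per candidate mean inner distance."""
    commands = []
    for r in inner_distances:
        commands.append([
            "tophat",
            "-o", f"tophat_r{r}",           # hypothetical per-run output directory
            "-r", str(r),                   # --mate-inner-dist
            "--mate-std-dev", str(std_dev),
            index, reads1, reads2,
        ])
    return commands

if __name__ == "__main__":
    for cmd in tophat_commands("genome_index", "reads_1.fq", "reads_2.fq", [50, 100, 150]):
        print(shlex.join(cmd))
```

One way to pick a value is to compare mapping rates and alignment summaries across the resulting runs.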
Your situation regarding the reported vs. actual sizing is not unique
and does not mean that the data is poor (when considered as a single
factor). Searching an online NGS forum such as seqanswers.com for the
topic will bring up several threads where this is discussed. Should
you have outstanding concerns about this particular parameter, please
consider contacting the tool authors at tophat.cufflinks@gmail.com for
advice.
Best,
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
Message: 6
Date: Wed, 15 Aug 2012 11:28:37 -0700
To: Mahtab Mirmomeni <m.mirmomeni@student.unimelb.edu.au>
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] Galaxy toolshed-vcftools
Message-ID: <502BEA55.3090908@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello Mahtab,
Some vcftools are in development status on our test server under "VCF
Tools" at:
http://test.g2.bx.psu.edu/
Adding these and others (including the ones you mention) to the Tool
Shed is under consideration by the Galaxy team, but a firm timeline has
not been set. If you are interested in adding these to the Tool Shed
yourself, they would be welcomed. Another option is to check in with the
galaxy-dev@bx.psu.edu development mailing list to see if any other
developer/group has something they could submit to the Tool Shed. Please
do not cross-post this thread, though; instead, create a brand new
message/thread with the relevant details and request (e.g., not a reply
with a new subject line or an added email address).
Our primary investment at this time has been in the GATK pipeline. There
are differences in the way vcftools and GATK work, so it's not an exact
1:1 mapping of functionality, but you may be able to obtain the results
you need by using the NGS: GATK Tools (beta) variant utilities
(CombineVariants, VariantEval, etc.).
The GATK tool set is under active development, and wrappers for these
can be found in both galaxy-central and galaxy-dist at Bitbucket, in
various stages of stability. Where to pull from depends on the tool (the
version will be the same for some in both locations, but this can change
over time) and on your desire/tolerance to work with tools that are
undergoing change.
http://bitbucket.org/galaxy/galaxy-dist
http://bitbucket.org/galaxy/galaxy-central
Hopefully this provides some useful choices,
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
Message: 7
Date: Wed, 15 Aug 2012 20:21:21 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Do I need to allow indel search?
Message-ID:
<2B3C356FD95D6A41B0CCFF77102E5EDF12867873@IU-MSSG-
MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"
Dear All,
I want to compare the pre-mRNA alternative splicing events between
RNA-seq datasets. Do I need to allow indel search when I run Tophat?
What is the indel search for? I could not find detailed information
about "indel search" in the Tophat documentation.
Thanks.
Jianguang Du
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/ed2d0eda/attachment-0001.html>
Message: 8
Date: Wed, 15 Aug 2012 20:24:40 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Use Own Junctions or not
Message-ID:
<2B3C356FD95D6A41B0CCFF77102E5EDF1286787E@IU-MSSG-
MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"
Dear All,
I want to compare the pre-mRNA alternative splicing events between
RNA-seq datasets. Should I use my own junctions when I run Tophat? What
does "Own Junctions" mean?
Thanks.
Jianguang DU
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/7ca0dc9f/attachment-0001.html>
Message: 9
Date: Thu, 16 Aug 2012 00:48:28 -0700
To: shamsher jagat <kanwarjag@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] copy number variation detcetion in Glaxay
Message-ID: <502CA5CC.8020607@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello,
The tool "FreeBayes" may be of interest. Please see the tool form for
links to the primary tool documentation to see if the functionality
will meet your needs.
Best,
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
Message: 10
Date: Thu, 16 Aug 2012 21:18:01 +0800
To: <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Cuffdiff errors
Message-ID: <blu0-smtp34493ac6e595ea755c7e677bfb50@phx.gbl>
Content-Type: text/plain; charset="us-ascii"
Hello,
I am having a problem running Cuffdiff on some RNA-seq data. I want to
compare 2 samples (A and B). I ran Cufflinks and Cuffmerge before
running Cuffdiff. I ran Cuffdiff with the following options: Cuffmerge +
Bowtie A, B (sorted, as required by Cufflinks, after mapping with
Bowtie). But I got the following error message:
An error occurred running this job: cuffdiff v1.3.0 (3022)
cuffdiff --no-update-check -q -p 8 -c 10 --FDR 0.050000
/galaxy/main_pool/pool4/files/004/800/dataset_4800173.dat
/galaxy/main_pool/pool3/files/004/799/dataset_4799827.dat
/galaxy/main_pool/pool4/files/004/799/dataset_4799831.dat
What did I do wrong? Thanks very much for your help!
Yan
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120816/f08c20fb/attachment-0001.html>
Message: 11
Date: Thu, 16 Aug 2012 08:09:15 -0700
To: Yan He <yanhe83@hotmail.com>
Cc: galaxy-user@lists.bx.psu.edu,
"closeticket@galaxyproject.org"
<closeticket@galaxyproject.org>
Subject: Re: [galaxy-user] Cuffdiff errors
Message-ID: <502D0D1B.60308@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Yan,
Would you please submit this as a bug report? It helps if you leave all
inputs undeleted in your history. Instructions:
http://wiki.g2.bx.psu.edu/Support#Reporting_tool_errors
Thanks!
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
Message: 12
Date: Thu, 16 Aug 2012 08:09:15 -0700 (PDT)
To: Jennifer Jackson <jen@bx.psu.edu>, shamsher jagat
<kanwarjag@gmail.com>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] copy number variation detcetion in Glaxay
Message-ID:
<1345129755.17010.YahooMailNeo@web120906.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"
Thanks Jen,
I am also interested in this. Has anyone used FreeBayes, in Galaxy or
outside Galaxy, to detect CNVs from Illumina sequencing data? Is there
a tutorial for running this tool?
Thanks.
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120816/52d7ec2c/attachment-0001.html>
Message: 13
Date: Thu, 16 Aug 2012 08:48:40 -0700
To: "Du, Jianguang" <jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] Do I need to allow indel search?
Message-ID: <502D1658.1030801@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello Jianguang,
In simple terms, "No" produces a strict alignment and "Yes" produces a
more permissive alignment.
The option 'Allow indel search:' is a way of allowing variability in
your data (presumably biologically valid) not to be interpreted as
mismatches or gaps. Mismatches/gaps in an alignment lower the overall
score and can lead to alignment failures. In Galaxy, the default for
this parameter is "Yes", with a value of 3 for the insertion/deletion
length (allowing for simple nucleotide polymorphism variability up to a
single codon, per position, in either the query or target). All values
can be modified.
If this interferes with your data mapping accurately, then it could be
disabled by setting the parameter to "No". A test comparing the two
alternatives on a sample would be a good way to see how this single
change affects your particular sample. Good questions to ask: What reads
do not map when the stricter alignment rules are applied? Do any reads
map with a change in specificity? Do you agree with the results?
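The first of these questions can be answered by running TopHat twice on the same sample, once with indel search on and once with it off. The two outputs can then be compared to see which reads map in each run. The Python sketch below assumes both outputs have already been converted to SAM text, for example with a BAM-to-SAM step. The helper names are hypothetical.

```python
def mapped_read_names(sam_lines):
    """Collect names of reads with at least one mapped alignment (flag 0x4 unset)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if not int(fields[1]) & 0x4:
            names.add(fields[0])
    return names

def compare_runs(strict_lines, permissive_lines):
    """Partition read names by which run ("No" vs "Yes" for indel search) mapped them."""
    strict = mapped_read_names(strict_lines)
    permissive = mapped_read_names(permissive_lines)
    return {
        "only_permissive": permissive - strict,  # mapped only when indel search was allowed
        "only_strict": strict - permissive,
        "both": strict & permissive,
    }
```

TopHat's accepted hits normally list only mapped reads, so the unmapped-flag check is a safeguard rather than a requirement. For paired data both mates share a name, so a name is counted if either mate mapped.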
Hopefully this helps!
Jen
Galaxy team
--
Jennifer Jackson
http://galaxyproject.org
_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
End of galaxy-user Digest, Vol 74, Issue 15
*******************************************