Question: input file types to DESeq on the cloud
anna.gordon wrote (4.4 years ago):

Hi there,

I've had a go at using the Ratsch lab DESeq tool, but it seems to require BAM files as input, whereas I already have a counts table of my mapped reads. It also asks for an annotated genome GFF file; I'm working with a non-model organism and just don't have this information. (I was planning to get further annotation on a vastly reduced dataset once I've run the analysis.)

So I'm thinking of getting some Amazon credits and having a go at the more fully functional version of Galaxy on the Cloud. Before I do so, I'd like to know what the input file formats are: is it sufficient to just have a counts table? I've dipped my toe into using DESeq and DESeq2 with R and am trying to find the best way to proceed...

Any advice gratefully received!

Where can I get information about the added extras available on the cloud vs. Galaxy Main?

Many thanks,

Anna

 

rna-seq • 1.5k views
Jennifer Hillman Jackson (United States) wrote (4.4 years ago):

Hi Anna,

When you move to using CloudMan, all of the tools in the Tool Shed become available to you. You will also be able to keep up to 1 TB of data in an S3 bucket (if you wish to keep one around permanently or for some length of time) and, perhaps most importantly, scale up resources (memory, cluster nodes to run jobs) for your own use without competition. Administration is fairly straightforward for basic usage. For more advanced configuration, help is available here or, often better because it reaches more Galaxy developers and admins, on the galaxy-dev@bx.psu.edu mailing list.

Amazon does offer an educational/research grant program to help with costs. The application process is simple and usually quite quick. I personally think this is a great option for supplemental funding to move a project along, and one not to be missed. There are no guarantees (AWS decides, and each request is different), but if it suits your project, applying is certainly worth the time.

To get started, here are the links into our wiki:
http://usegalaxy.org/cloud
http://usegalaxy.org/toolshed
Once you have an Amazon account, launching a new CloudMan instance is simplified by using the "CloudLaunch" option in the top menu at the public Main Galaxy instance http://usegalaxy.org

Browsing the Tool Shed itself can reveal a great deal about the repositories and tools.
http://toolshed.g2.bx.psu.edu/
Create an account for the full menu of choices. Locate the DESeq packages and click into the tools. The one owned by "iuc" was put together by the community admins working with the Galaxy team, but explore any of them. Sometimes a README is available that explains more about usage. Other times a closer look is required: use the upper right corner menu to view the tip files, which open in a hierarchy. The top-level "<tool>.xml" files define the forms that are displayed in the User Interface and almost always provide instructions about expected inputs and formats. Some repositories will display the actual form, others just the code. If the code is foreign to you, install the tool to view the forms intact. Many tool authors choose to include all standard options. And if you program, the wrappers can of course be modified to customize them (everything here is open source). Usage attribution/credit in publications is a standard courtesy and will also complete your methods disclosure in a truly reproducible manner (to be appreciated by publishers, tool contributors, and your readers).
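
If you would rather skim a wrapper for its expected inputs than read the whole file, something along these lines may help. This is only a minimal Python sketch: the XML below is a hypothetical, trimmed-down wrapper written for this example (the tool id, parameter names, and formats are assumptions, not copied from any repository), and real wrappers in the Tool Shed will be longer and may be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed tool wrapper used only for illustration.
# Real wrappers in the Tool Shed will differ in ids, params, and formats.
SAMPLE_TOOL_XML = """\
<tool id="example_deseq" name="Example DESeq wrapper" version="0.0.1">
  <inputs>
    <param name="counts" type="data" format="tabular" label="Counts table"/>
    <param name="annotation" type="data" format="gff" label="Annotation" optional="true"/>
    <param name="fdr" type="float" value="0.05" label="FDR cutoff"/>
  </inputs>
</tool>
"""


def list_data_inputs(tool_xml):
    """Return (name, format, label) for every dataset-type <param> in a wrapper."""
    root = ET.fromstring(tool_xml)
    found = []
    # iter() walks the whole tree, so params nested in other blocks are seen too
    for param in root.iter("param"):
        if param.get("type") == "data":
            found.append(
                (
                    param.get("name"),
                    param.get("format", "unspecified"),
                    param.get("label", ""),
                )
            )
    return found


for name, fmt, label in list_data_inputs(SAMPLE_TOOL_XML):
    print(f"{name}: expects {fmt} ({label})")
```

To try it on a real wrapper, read the "<tool>.xml" file from the repository into a string and pass that in place of SAMPLE_TOOL_XML. The help text inside the wrapper is still the better source for what each input should contain.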

Tools/packages owned by "galaxyteam" were developed by the core team, or in close collaboration with it, before tools were released under the owner "iuc". You will also find tool contributions from authors who frequent this forum, the dev mailing list, and other public informatics forums, plus many more from our ever-growing community, including some public sources. All are considered equally "good" without preference, but a few considerations can serve as guidance: a "valid" tool is a good choice (it means that the minimum set of contents was included, such as automated tests), and the number of downloads indicates how well vetted a package is (assuming it is not a brand-new addition). Further ranking/feedback/vetting of tools and packages by the community and potentially the IUC, including UI display of these metrics, is underway. Ratings are accepted but not in full use yet, and more is planned near-term as the Tool Shed matures, to aid with selection as the number of repositories grows rapidly.

Back to the DESeq repositories and tools available: there are three. One accepts a counts file (a tool to transform SAM to counts is included). One accepts a GFF (a service-type tool, likely the same one you were using at the public Galaxy instance, though I didn't double check). The third is the "iuc" version that Bjoern Gruening maintains; I am not 100% sure about it, but it seems to take counts as input, as the original binary does. Install it to confirm (or perhaps he will see this post and reply). Its dependencies point here: http://www.bioconductor.org/packages/2.12/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
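
Before uploading a counts table to any of these, it can be worth a quick sanity check, since DESeq and DESeq2 work on raw, un-normalized integer counts. The Python sketch below assumes a tab-delimited file with a header row, gene IDs in the first column, and one column per sample. That layout is an assumption made for this example, not something confirmed for any particular wrapper, so check the tool form for what it actually expects. The function name and the example data are made up.

```python
import csv
import io


def check_counts_table(handle):
    """Check a tab-delimited counts table: header row, then gene ID + counts.

    Returns (sample_names, n_genes, problems), where problems is a list of
    human-readable strings describing rows that look wrong.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None or len(header) < 2:
        return [], 0, ["missing header or no sample columns"]
    samples = header[1:]
    problems = []
    n_genes = 0
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        n_genes += 1
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        for sample, value in zip(samples, row[1:]):
            # raw counts should be non-negative integers, not normalized values
            if not (value.isascii() and value.isdigit()):
                problems.append(
                    f"line {line_no}: {sample} has non-integer count {value!r}"
                )
    return samples, n_genes, problems


# Made-up example data, for illustration only.
example = (
    "gene\tctrl_1\tctrl_2\ttreat_1\n"
    "geneA\t10\t12\t30\n"
    "geneB\t0\t3\t2.5\n"
)
samples, n_genes, problems = check_counts_table(io.StringIO(example))
print(samples, n_genes, problems)
```

For a real file, open it with open(path, newline="") and pass the handle in. A fractional value such as the 2.5 above usually means the table was normalized somewhere upstream.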

Hopefully this gets you started and helps others with a similar interest in exploring Galaxy's expanded cloud options.

Jen, Galaxy team

 
