Question: Cufflinks options to get similar results
Asked 3.1 years ago by ctoscano (United States):

Hello everyone,

I'm trying to reproduce the results of an RNA-seq dataset I found on GEO. After processing everything with the standard options, my FPKM values are a little higher than theirs. I noticed that my Cufflinks settings differ from theirs; their output file records this command line:

cufflinks --library-type fr-firststrand --no-effective-length-correction --min-isoform-fraction 0 --pre-mrna-fraction 0.05 --junc-alpha 0.05 --max-bundle-length 5500000 -b 

I'm not sure how to translate that into cufflinks in Galaxy.

Thanks a lot.

cufflinks galaxy • 907 views
modified 3.1 years ago by Jennifer Hillman Jackson • written 3.1 years ago by ctoscano

Answered 3.1 years ago by Jennifer Hillman Jackson (United States):

Hello,

Every configurable command-line argument appears on the tool form. If an option is hard-set (non-configurable), you can see its value by examining the tool wrapper in the Tool Shed. Any hard-set values, or the defaults of configurable options, can be changed in a local or cloud Galaxy if you wish to adjust the wrapper and install a custom version.

To match command-line options with Galaxy, compare the arguments in the command string (defined in the Cufflinks manual) with the options on the tool form. This holds not just for Cufflinks but for most third-party tools.
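As a rough aid for that comparison, here is a minimal Python sketch that splits the command line from the question into option/value pairs, so each one can be ticked off against the tool form. The function name `parse_options` is made up for illustration. It assumes that any token starting with `-` is an option and that no option value starts with `-`. The trailing `-b` is taken exactly as posted, with no value after it.

```python
import shlex


def parse_options(command):
    """Split a command line into {option: value}; options without a value map to None."""
    tokens = shlex.split(command)[1:]  # drop the program name
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            # Assumption: the next token is this option's value unless it is an option itself.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            options[token] = tokens[i + 1] if has_value else None
            i += 2 if has_value else 1
        else:
            i += 1  # positional argument (for example an input file), ignored here
    return options


# Command line copied from the question above.
command = (
    "cufflinks --library-type fr-firststrand --no-effective-length-correction "
    "--min-isoform-fraction 0 --pre-mrna-fraction 0.05 --junc-alpha 0.05 "
    "--max-bundle-length 5500000 -b"
)

for option, value in parse_options(command).items():
    print(option, value)
```

Each printed option can then be looked up in the Cufflinks manual and matched to the corresponding field on the Galaxy tool form.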

Is there a specific option you have trouble locating?

Thanks, Jen, Galaxy team

written 3.1 years ago by Jennifer Hillman Jackson

Thanks for your response Jen.

Yes, I'm struggling with max-bundle-length, and I'm not sure which Max Intron Length I should use.

written 3.1 years ago by ctoscano

The max-bundle-length parameter is under Advanced Settings as "Maximum genomic length of a given bundle".

For Max Intron Length, this depends on the genome. Mammals, insects, plants (and even bacteria), or subgroups within those, will each have a reasonable upper value. The further the setting exceeds the largest known intron length for that genome, the more resources are used unnecessarily. But if it is set too low, data that should be kept will be missed or discarded.

The tool is set up by default for mammalian genomes, and 300k is pretty conservative (inclusive, resource intensive). I believe that setting this parameter a bit large for the genome (a bit over the largest observed intron) is a safe choice. If resources become an issue, go smaller or move to a server with more resources. (I am not sure whether this is a parameter that can lead to "infinite" jobs that never finish on any server, but watch out for that.)
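One way to get a number for the "largest observed intron" is to measure it from a reference annotation for the genome. The sketch below is only an illustration: it assumes a GTF file with `exon` rows and a `transcript_id` attribute, and treats the gap between consecutive exons of the same transcript as an intron. The function name `max_intron_length` and the sample rows are made up.

```python
import re
from collections import defaultdict


def max_intron_length(gtf_lines):
    """Return the longest gap between consecutive exons of any single transcript."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows are needed
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    longest = 0
    for spans in exons.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # GTF coordinates are 1-based and inclusive, hence the -1.
            longest = max(longest, next_start - prev_end - 1)
    return longest


# Made-up example rows; for real use, pass an open GTF file handle instead.
sample = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t1001\t1100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
print(max_intron_length(sample))
```

A value a bit above the result for the real annotation would then be a candidate for Max Intron Length, in line with the reasoning above.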

Thanks! Jen

written 3.1 years ago by Jennifer Hillman Jackson