Cufflinks options to get similar results

3.1 years ago by

United States

ctoscano • 0 wrote:

Hello everyone,

I'm trying to reproduce the results of an RNA-seq dataset I found on GEO. After processing everything with the standard options, my FPKM values are a little higher than theirs. I noticed that my Cufflinks settings differ from theirs; their output file records this command line:

cufflinks --library-type fr-firststrand --no-effective-length-correction --min-isoform-fraction 0 --pre-mrna-fraction 0.05 --junc-alpha 0.05 --max-bundle-length 5500000 -b

I'm not sure how to translate that into the Cufflinks tool in Galaxy.

Thanks a lot.

cufflinks galaxy • 907 views

modified 3.1 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.1 years ago by ctoscano • 0

3.1 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Every command-line argument that is configurable in Galaxy appears on the tool form. To see which options are hard-set (non-configurable), examine the tool wrapper through the Tool Shed. Hard-set values, and the defaults of configurable options, can be changed for use in a local or cloud Galaxy: adjust the wrapper, then install the custom version.

To match up command-line options with Galaxy, compare each argument in the string (all are defined in the Cufflinks manual) with the options on the tool form. This applies not just to Cufflinks but to most third-party tools.
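As a rough aid for that comparison, the posted command line can be split into option/value pairs and ticked off one by one against the tool form. This is only a sketch: `parse_options` is a hypothetical helper, not part of Galaxy or Cufflinks, and it assumes that any token starting with `-` is an option name (so it would misread a negative number used as a value).

```python
import shlex

# Command line exactly as posted in the question (it ends at "-b").
POSTED_COMMAND = (
    "cufflinks --library-type fr-firststrand --no-effective-length-correction "
    "--min-isoform-fraction 0 --pre-mrna-fraction 0.05 --junc-alpha 0.05 "
    "--max-bundle-length 5500000 -b"
)


def parse_options(command_line):
    """Return {option: value}; value is None when no value follows the option."""
    tokens = shlex.split(command_line)
    options = {}
    i = 1  # skip the program name
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            i += 1  # stray positional argument, ignored in this sketch
            continue
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        options[token] = tokens[i + 1] if has_value else None
        i += 2 if has_value else 1
    return options


# Print a checklist to compare against the Galaxy tool form.
for name, value in parse_options(POSTED_COMMAND).items():
    print(f"{name:35s} {value if value is not None else '(no value given)'}")
```

Options that print with no value are either plain switches or, as with the trailing `-b` here, had their value cut off in the posted string.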

Is there a specific option you have trouble locating?

Thanks, Jen, Galaxy team

written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k

Thanks for your response Jen.

Yes, I'm struggling with max-bundle-length, and I'm not sure which Max Intron Length I should use.

written 3.1 years ago by ctoscano • 0

The max-bundle-length parameter is under Advanced Settings, labeled "Maximum genomic length of a given bundle".

For Max Intron Length, the right value depends on the genome. Mammals, insects, plants, and even bacteria (or subgroups within those) each have a reasonable upper value. The further the setting exceeds the largest known intron for that genome, the more resources are used unnecessarily. But if it is set too low, data will be missed or discarded that shouldn't be. By default the tool is set up for mammalian genomes, and 300k is fairly conservative (inclusive, but resource intensive). I believe a value a bit over the largest observed intron for the genome is a safe choice. If resources become an issue, go smaller or move to a server with more resources. (I am not sure whether this is a parameter that can lead to "infinite" jobs that never finish on any server, but watch out for that.)
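If a reference annotation is available for the genome, one rough way to find the "largest observed" intron is to measure the gaps between consecutive exons of each annotated transcript. The sketch below assumes a GTF file with `exon` features and a `transcript_id` attribute; `largest_intron` and the tiny `EXAMPLE_GTF` records are made up for illustration and are not part of Cufflinks or Galaxy.

```python
import re
from collections import defaultdict

# Hypothetical, hand-made GTF records used only to demonstrate the function.
EXAMPLE_GTF = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t1401\t1500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t50\t80\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    'chr1\tdemo\texon\t131\t150\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]


def largest_intron(gtf_lines):
    """Return the longest gap (in bases) between adjacent exons of any transcript."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        # Key on chromosome as well, in case a transcript ID is reused.
        exons[(fields[0], match.group(1))].append((int(fields[3]), int(fields[4])))

    longest = 0
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            # GTF coordinates are 1-based and inclusive.
            longest = max(longest, next_start - prev_end - 1)
    return longest


print(largest_intron(EXAMPLE_GTF))  # prints 1000
```

For a real annotation, pass an open file handle instead of the example list, then pick a Max Intron Length a bit above the reported value.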

Thanks! Jen

written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k
