Re: Galaxy-User Digest, Vol 26, Issue 6

Question: Re: Galaxy-User Digest, Vol 26, Issue 6

10.3 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

Michael, I don't have any experience with Condor, but we're finding that the Galaxy framework scales very well, mostly because it doesn't do any of the computationally intense work itself - it hands that off through the job runner. Our internal Galaxy works fine with very large datasets (6k subjects on Affy 6.0 SNP chips, 9.6k subjects on Affy 5.0 SNP chips...). Tools take a while to run (!) but Galaxy itself is more or less indifferent to the size of files because it only stores references (e.g. paths) to the disk files in the database - not the actual gigagobs of data. A collection of 100gb files takes about the same space in the Galaxy database tables as a collection of 1k ones as far as I can tell.

A user's experience of Galaxy tool operation will obviously be affected by physically shuffling large datafiles around for the cluster backend when a tool is run, so the cluster architecture, and the way datasets are made available to cluster nodes for processing, is a key issue for very large datasets I suspect.

On backends, I believe the party line is that both PostgreSQL and MySQL are fully supported. We've used MySQL as our backend for nearly 2 years without any problems with released Galaxy versions - all 3 database backends are now auto-tested before release AFAIK. Arguably, PostgreSQL might be a better choice technically, and operationally it's what runs the primary Galaxy site, so it is likely to work! My group remains familiar and comfortable with MySQL and doesn't have the energy to swap over. If you were going to swap, do it before you build a large userbase, unless you have a bored DBA available to unload and reload a set of Galaxy history and user tables mid-stream.

-- python -c "foo = map(None,'moc.liamg@surazal.ssor'); foo.reverse(); print ''.join(foo)"
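The point about storing references rather than data can be illustrated with a minimal sketch. This is not Galaxy's actual schema: the `dataset` table, its columns, and the helper names are hypothetical, and SQLite is used only because it ships with Python. The sketch registers a small and a much larger file and shows that each costs one short row.

```python
import os
import sqlite3
import tempfile


def make_file(directory, name, n_bytes):
    """Create a file of n_bytes zero bytes and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(b"\0" * n_bytes)
    return path


def register_dataset(conn, path):
    """Store only the path and size of a file, never its contents."""
    conn.execute(
        "INSERT INTO dataset (file_path, file_size) VALUES (?, ?)",
        (path, os.path.getsize(path)),
    )


# Hypothetical table: one row per dataset, holding a reference to the file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, file_path TEXT, file_size INTEGER)"
)

with tempfile.TemporaryDirectory() as tmp:
    small = make_file(tmp, "small.dat", 1024)              # 1 KB
    large = make_file(tmp, "large.dat", 10 * 1024 * 1024)  # 10 MB
    register_dataset(conn, small)
    register_dataset(conn, large)

# Each row holds a path string and an integer, whatever the file size.
rows = conn.execute(
    "SELECT file_path, file_size FROM dataset ORDER BY id"
).fetchall()
for file_path, file_size in rows:
    print(len(file_path), "characters of path for", file_size, "bytes of data")
```

The database row for the 10 MB file is no larger than the row for the 1 KB file; the cost of big files shows up on disk and on the network between cluster nodes, not in the database.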

galaxy • 2.5k views

modified 4.8 years ago by Gang Wang • 40 • written 10.3 years ago by fubar ♦ 1.1k

10.0 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

AFAIK, both $userId (the user numeric id) and $userEmail are added to the parameters in jobs/__init__.py, so should work... -- Ross Lazarus, Associate Professor, DACP, Harvard Medical School. Director of Bioinformatics, Channing Laboratory, BWH 181 Longwood Ave., Boston MA 02115, USA.
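As a rough illustration of what "added to the parameters" means for a tool command line, here is a small sketch. Galaxy fills command templates with Cheetah; Python's standard `string.Template` is swapped in here only as a stand-in, and the tool name, flags, and values are hypothetical.

```python
from string import Template

# Hypothetical command template; only $userId and $userEmail come from the post.
command_template = Template(
    "my_tool.py --input $input --user-id $userId --email $userEmail"
)

# Values the framework would supply when the job is built (made up here).
params = {
    "input": "dataset_1.dat",
    "userId": 42,
    "userEmail": "user@example.org",
}

command_line = command_template.substitute(params)
print(command_line)
```

If a name is missing from the parameters, `substitute` raises `KeyError`, which is a quick way to check whether a variable is really being passed through.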

ADD COMMENT • link written 10.0 years ago by fubar ♦ 1.1k

9.7 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

Apache LDAP pass-through authentication works fine for us as is with the security branch. I haven't tried it with the current trunk. The Galaxy roles/groups interface is being managed by our IRB admin - a very easy and attractive option - but of course it is independent of, and out of sync with, our LDAP groups. Mind you, in some ways the separation is convenient... in other ways it's inconsistent and redundant - but not a showstopper for us, as we currently only have about 30 users (out of 1000+ in the LDAP tree!) who have IRB approvals to manage.

The problem with using LDAP to manage groups and having Galaxy use those is that AFAIK there are no decent tools a non-tech-savvy administrator can use to administer an LDAP tree. We've written our own, based on an LDAP adapter I wrote for Zope a long time ago, but LDAP really is a bit of a pain for non-technical administrators, and we don't want our system administrator wasting time on it if we can avoid that! So I'm not even sure I'd swap away from the currently redundant but very convenient-to-manage situation, even if/when Galaxy can retrieve groups from an LDAP server.

-- Ross Lazarus, Associate Professor, DACP, Harvard Medical School. Director of Bioinformatics, Channing Laboratory, BWH 181 Longwood Ave., Boston MA 02115, USA.

ADD COMMENT • link written 9.7 years ago by fubar ♦ 1.1k

8.6 years ago by

Peter Andrews • 30

Peter Andrews • 30 wrote:

I also find that uploading a file rarely works -- it just displays the loading up blue arrow forever. -- Peter Andrews Programmer Computational Genetics Lab Dartmouth Hitchcock Medical Center (603) 653-9963

ADD COMMENT • link written 8.6 years ago by Peter Andrews • 30

I find the only reliable way to upload files >1Gb is via a URL. Ian

ADD REPLY • link written 8.6 years ago by Ian Donaldson • 120

I second the finding that >1Gb files work best when uploading from a URL, and also best if gzip'd beforehand. Best, Dan -- Dan Webster Ph.D. Student - Cancer Biology Laboratory of Paul Khavari CCSR BLDG, Rm 2150 269 Campus Drive Stanford, CA 94305 DanWebster@stanford.edu
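For anyone who wants to script the "gzip it beforehand" step, a minimal sketch using only the Python standard library follows. The function name and the chunk size are assumptions made for illustration; the command-line `gzip` tool does the same job.

```python
import gzip
import shutil


def gzip_file(src_path, dest_path=None, chunk_size=1024 * 1024):
    """Compress src_path to dest_path (default: src_path + '.gz').

    The file is streamed in chunks, so a multi-gigabyte input is never
    held in memory all at once.
    """
    if dest_path is None:
        dest_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)
    return dest_path
```

Calling `gzip_file("reads.fastq")` would write `reads.fastq.gz` next to the original, which can then be placed somewhere reachable by URL for the upload.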

ADD REPLY • link written 8.6 years ago by Dan Webster • 30

8.6 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

Jelle, thanks for pointing out this dead-end link - now fixed - for the record, bitbucket marks any link preceded by 'wiki:' to be an unsafe URL... For the most up-to-date information on the WGA/SNP tools http://rgenetics.org is a good place to look.

ADD COMMENT • link written 8.6 years ago by fubar ♦ 1.1k

8.2 years ago by

Inma Barrasa • 10

Inma Barrasa • 10 wrote:

UNSUBSCRIBE

To: galaxy-user@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 52, Issue 4

Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu

To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu

You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."

Today's Topics:
1. October 5, 2010 Galaxy Development News Brief (Jennifer Jackson)
2. Reannotate microarray data (NONELL MAZELON, LARA)
3. Re: how to cancel a upload to data library? (Jennifer Jackson)
4. Re: Reannotate microarray data (Jennifer Jackson)

Message: 1
Date: Tue, 05 Oct 2010 20:52:21 -0700
To: "galaxy-user@bx.psu.edu" <galaxy-user@bx.psu.edu>, "galaxy-dev@bx.psu.edu" <galaxy-dev@bx.psu.edu>
Subject: [galaxy-user] October 5, 2010 Galaxy Development News Brief
Message-ID: <4CABF275.8040801@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05

Here are the highlights of what you will get if you perform the following upgrade: hg pull -u -r d681ef7538ed

- Enhanced Data Library Features
1. Add the ability to make any library item public (and its contents, if any)
* There are now options on library item pop-up menus to make a library dataset public, make all contents of a folder public, or make an entire data library public. These new menu items are displayed only if the current item (dataset) is not public or if the current item (data library, folder) contains items that are not public.
* The following image shows a data library that contains a restricted dataset and a folder that contains a restricted dataset.
Since the data library contains items that are not public (restricted), its Library Actions menu includes a "Make public" option. Selecting this option will make all of the contents of the entire data library public. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_library_popup.png
* Similarly, the folder popup menu includes a "Make public" option since it contains a restricted dataset. Selecting this option at the folder level will only make the contents of the particular folder public. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_folder_popup.png
* The popup menu associated with any restricted library dataset will also include a "Make public" option. Selecting this option will make only that particular dataset public. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_dataset_popup.png
2. Upload Option changes
* Move the upload options (i.e., file, directory, filesystem paths, import from history), which used to be in the data library upload form's title bar popup menu, into a select list on the upload form.
* Selecting a different upload option now performs a refresh on the upload form so that any form contents entered before selecting the option are now retained. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_upload_options.png

- Extended Workflow and related History/Dataset Features
1. You can now flag a workflow step as an output within the editor.
* To do this, hover over any output, click the asterisk to flag/unflag an output. If a workflow has outputs flagged, only those particular results will show in the history and all other outputs will be hidden. If no outputs are flagged, everything is shown (same as previous behavior). Note that workflow output flagging overrides HideDatasetActions; if a HideDatasetAction exists on a dataset flagged as a workflow output, it is removed.
* This feature makes it much easier to take an existing workflow, and hide all of the outputs except for a small number of desired results.
* It is worth noting that non-output results are only hidden, not deleted, and that they can be viewed in the history at any time by going to "View Hidden Datasets" in the history panel.
2. More options for Exporting, Searching, and Annotating
* Both workflows and histories can now be exported to files. Importing workflows and histories is coming soon.
* Annotations are preserved when importing histories and workflows.
* Tool search is now available in workflow editor so that you can search for tools when creating workflows.
* Improvements for annotating histories and workflows to make it easier to add annotations.
3. Workflow Tool Form changes
* Output extensions are now properly separated by ', ' (including space) instead of just slammed together.
* Width is calculated better, taking into account the length of input and output rows, with an upper bound of 250px, and lower bound of 150px.
4. Post Job Actions
* Separated into immediate and delayed actions. Immediate actions run when the job is created, as opposed to when it is finished.
* Set Datatype is now an immediate action. This has no impact on the execution of the job, but it allows following jobs to queue with the correct subsequent datatypes.
* RenameDatasetAction also happens immediately, rather than waiting for a job to complete. This makes the history less confusing to watch, as things don't randomly change names.
5. Updated UI - screenshots
* Hovering for the tooltip on the asterisk. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_workflow_tooltip.png
* A larger segment of a workflow, showing two Group steps flagged as outputs, one intermediary join not flagged as an output, and then a final cut that is flagged.
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_workflow_flagged.png
* And, for larger workflows, note that the overview in the editor panel colors outputs, so you can find them at a glance. http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2010_10_05_workflow_overview.png

- Application Programming Interface (API)
* API now returns data template information associated with a library dataset.
* Added library dataset file_path to API captured information.

- Data Upload
* Allow for bz2 compressed uploads. Datasets can now be gzipped, bzipped, or zipped (only one file per zip, however).

- Data Libraries
* Clicking on a folder name now expands the folder instead of displaying information/description.
* Moved folder icon to be next to the folder name and text (removing extra whitespace).
* Rephrase naming for available compression schemes.
* When viewing library dataset information under the Admin interface, a field is now present that displays the real path to the file on disk.

- Mutation Visualization Tool
* Requirement to enter the default columns indicating reference base, position, and start of sample.
* New interactive zoom option: image can be zoomed in or out using the mouse wheel.

- Trackster Visualization Tool
* Better drawing of features at the bottom of viewing window.
* Added track name dropdown for setting overview preferences.
* Improved packing when zoomed (done on a w_scale basis instead of rounding to levels).

- Documentation & Screencasts
* Added more screencasts and topics in sample tracking documentation http://main.g2.bx.psu.edu/u/rkchak/p/sts

- Community
* Community ratings for all published items plus ratings are visible in published items lists.

- Reproducibility
* Tool parameters captured fresh when rerunning a process within an imported or copied history.
- Functional Test Framework
* Significant improvements to the functional test framework, especially those tests related to Galaxy forms and data libraries.
* Allow the defining of a cluster job runner when running functional tests.

- General
* Changes to grids to make sorting order clear and reduce clutter from tags and annotations.
* Adjusted BWA, Bowtie, and PerM wrappers so they now use verified use-case parameters.
* Modified the 'file_path' field type in 'sample_dataset' table to 'TEXT' to support large file paths exceeding 255 characters.
* Make the grouping, join, sort, and any tool which uses r_wrapper.sh compatible with non-bash shells like bourne and dash (the default under modern Debian systems).

- Bug Fixes!
* Corrected bug that was causing Page item selection grids to be initialized twice and hence causing grid paging to fail.
* Corrected bug in full parameter setting specification for Lastz.
* Corrected typos in Tool Executed template and in Compute Motifs Frequency tool information.
* Corrected import errors that would cause indel_analysis to fail under Python 2.6.

Galaxy Project Team
http://usegalaxy.org
http://bitbucket.org/galaxy/galaxy-central

This project is supported in part by NSF, NHGRI, and the Huck Institutes of the Life Sciences, and The Institute for CyberScience at Penn State.

Message: 2
Date: Wed, 6 Oct 2010 09:13:09 +0200
To: "galaxy-user@bx.psu.edu" <galaxy-user@bx.psu.edu>
Subject: [galaxy-user] Reannotate microarray data
Message-ID: <bb3b87345736ee45843cc0adf9eeeb0d2ae8cd85ce@hermes2.imim.es>
Content-Type: text/plain; charset="us-ascii"

Dear Galaxy team, I just discovered your tool and it is fantastic! I want to reannotate the porcine Affymetrix microarray, so I wanted to perform the following steps:
1. Run blat/blast to the porcine genome SGSC sscrofa9.2/susScr2 with the sequences of all probes on the array (which I have in a csv file)
2. Get genes
3. Do the same with human genome

I can load my file and the entire pig genome but I do not know how to perform the "alignment". Is it possible to perform blat/blast (or any other similar alignment tool) through Galaxy? Thanks very much in advance,

Lara Nonell Microarray Analysis Service Scientific and Technical Services IMIM-Hospital del Mar Barcelona Biomedical Research Park (office 166) Doctor Aiguader, 88 | 08003 Barcelona Tel.+34 933 160 577 | Fax +34 933 160 410 lnonell@imim.es www.imim.es

URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20101006/ac5b6fde/attachment-0001.html>

Message: 3
Date: Wed, 06 Oct 2010 07:27:39 -0700
To: Kevin Lam <aboulia@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] how to cancel a upload to data library?
Message-ID: <4CAC875B.6070206@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Kevin, Eventually the process will die if you leave the browser open and running. But have you tried to just close the upload process/dataset in your history? Do this by clicking the "X" on the top right of the dataset box in the history pane. If that seems to be stuck in your browser, next you could try to delete the entire history by using the history pane's pull down Options menu and selecting the last item "Delete". Please let us know if one of these does not resolve the issue, Best, Jen Galaxy team -- Jennifer Jackson http://usegalaxy.org

Message: 4
Date: Wed, 06 Oct 2010 08:05:24 -0700
To: "NONELL MAZELON, LARA" <lnonell@imim.es>
Cc: "galaxy-user@bx.psu.edu" <galaxy-user@bx.psu.edu>
Subject: Re: [galaxy-user] Reannotate microarray data
Message-ID: <4CAC9034.60302@bx.psu.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hello Lara, Did you want to map the probes or the consensus sequences against the native genomes? Other (cross-species) genomes?
If you really want to map the probes, using a tool such as NGS: Mapping -> lastz would be one choice. However, grouping probe results may be tricky if your goal is to annotate the array with genes/transcripts. If you want to map the consensus sequences, using a tool such as BLAT is preferred. BLAT would be best run on the command line and the results uploaded into Galaxy for further analysis. See http://www.kentinformatics.com/products.html. UCSC will also provide help for the tool at genome@soe.ucsc.edu.

For analysis, the tools in Galaxy can perform many analysis tasks, but you would need to design the workflow. The basic path would be to upload the reference genome mapping results (if needed), pull the gene annotation from UCSC or your favorite source with "Get data", and perform an interval comparison based on overlap to the reference genome. Tuning the tool parameters to get the best gene/transcript annotation per probe/consensus would be part of the scientific analysis process and may take some experimentation. Hopefully this helps to get you started! Best, Jen Galaxy team -- Jennifer Jackson http://usegalaxy.org

_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user

End of galaxy-user Digest, Vol 52, Issue 4
******************************************

-- ******************************************** Inmaculada Barrasa Ph.D. Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Room 225-4 Nine Cambridge Center Cambridge, MA 02142 617 324-1673
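The "interval comparison based on overlap" suggested in the quoted reply to Lara can be sketched in a few lines. This is only an illustration of the idea, not what Galaxy's interval tools do internally: the coordinates, probe names, and gene names are made up, intervals are assumed to be half-open (start included, end excluded), and the nested loop would be far too slow for a whole array.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def annotate(hits, genes):
    """Map each probe name to the genes its alignment overlaps.

    hits:  iterable of (probe, chrom, start, end)
    genes: iterable of (chrom, start, end, gene_name)
    """
    annotations = {}
    for probe, chrom, start, end in hits:
        annotations[probe] = [
            name
            for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
    return annotations


# Hypothetical alignment results and gene annotation.
hits = [
    ("probe_1", "chr1", 100, 125),
    ("probe_2", "chr1", 500, 525),
    ("probe_3", "chr2", 100, 125),
]
genes = [
    ("chr1", 90, 200, "GENE_A"),
    ("chr1", 520, 900, "GENE_B"),
    ("chr2", 300, 400, "GENE_C"),
]

result = annotate(hits, genes)
print(result)
```

Probes that overlap no gene end up with an empty list, which is where the parameter tuning mentioned in the reply (for example, requiring a minimum overlap) would come in.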

ADD COMMENT • link written 8.2 years ago by Inma Barrasa • 10

8.1 years ago by

Bree, Freddy de • 90

Bree, Freddy de • 90 wrote:

Hi Anthony, You can avoid the wrapper script, so that you don't need three files (your R script, a Perl or shell wrapper, and the necessary XML file) just to run the one thing you actually want to run. I've struggled with this in previous digests, which are not easy to search. Simply put this in your tool.xml file:

<command>R --slave --vanilla --file=$GALAXY_ROOT_DIR/tools/<your rscript.r> --args $inputFile $<any nr of input args> $outputFile</command>

This way you can run any R script in Galaxy. Unfortunately (this was also in one of the previous digests), warnings in R cause an error status, so once in a while you will need to add 2>&- to the end of the command line above to ignore these warnings (but this also discards any error message). My advice is to build and test in the shell and in Galaxy without it, and add it later to get your output file(s). So far this has worked for me. Freddy de Bree Bioinformatics CVI, Lelystad The Netherlands
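The drawback of 2>&- is that real error messages disappear along with the warnings. One possible alternative, sketched below, is a small wrapper that captures stderr and decides success from the exit code instead. This is an assumption-laden illustration, not part of Galaxy or of Freddy's setup: the function name is hypothetical, and a Python child process stands in for R so the example runs anywhere.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run argv, capturing stderr instead of passing it on to the caller.

    Returns (exit_code, captured_stderr) so the caller can judge success
    by the exit code and still inspect or log any messages.
    """
    result = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr


# Stand-in for an R script: writes a warning to stderr but exits with status 0.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Warning message: NAs introduced\\n')",
]

code, warnings = run_quietly(child)
if code != 0:
    # A genuine failure: surface the captured messages and propagate the code.
    sys.stderr.write(warnings)
    sys.exit(code)
print("finished with", len(warnings.splitlines()), "warning line(s) suppressed")
```

With this approach a warning-only run reports nothing on stderr, while a run that really fails still shows its messages.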

ADD COMMENT • link written 8.1 years ago by Bree, Freddy de • 90

7.7 years ago by

Brian Lam • 10

Brian Lam • 10 wrote:

ADD COMMENT • link written 7.7 years ago by Brian Lam • 10

closed -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org

ADD REPLY • link written 7.7 years ago by Jennifer Hillman Jackson ♦ 25k

7.7 years ago by

youngor Cheung • 10

youngor Cheung • 10 wrote:

<< Graduate Student: Yanfeng Zhang Comparative Genomics Group. Kunming Institute of Zoology,Chinese Academy of Sciences.

ADD COMMENT • link written 7.7 years ago by youngor Cheung • 10

Hello, Did you have a question we can help with? This digest posted from your email address to the galaxy-user mailing list. It would be great if you could post a new question directly without the complete digest next time, with a new subject line, not as a reply to a prior question/post/digest. Please let us know how we can help, Best, Jen Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org

ADD REPLY • link written 7.7 years ago by Jennifer Hillman Jackson ♦ 25k

6.3 years ago by

De Kumar, Bony • 20

De Kumar, Bony • 20 wrote:

Thanks Ariel. Bony

To: galaxy-user@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 74, Issue 15

Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu

To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu

You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."

HEY! This is important! If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol ..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the thread you are responding to.

Why?
1. This will keep the subject meaningful. People will have some idea from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match search queries, but that aren't actually informative.

Today's Topics:
1. Re: Lift Over bam files (Jennifer Jackson)
2. Linking to Compressed Data (Branden Timm)
3. Re: How to decide "Mean Inner Distance between Mate Pairs"? (Jennifer Jackson)
4. Can I convert paired-end datasets into single end ones? (Du, Jianguang)
5. Re: Can I convert paired-end datasets into single end ones? (Jennifer Jackson)
6. Re: Galaxy toolshed-vcftools (Jennifer Jackson)
7. Do I need to allow indel search? (Du, Jianguang)
8. Use Own Junctions or not (Du, Jianguang)
9. Re: copy number variation detcetion in Glaxay (Jennifer Jackson)
10. Cuffdiff errors (Yan He)
11. Re: Cuffdiff errors (Jennifer Jackson)
12. Re: copy number variation detcetion in Glaxay (Mathew Bunj)
13. Re: Do I need to allow indel search? (Jennifer Jackson)

Message: 1
Date: Wed, 15 Aug 2012 09:05:41 -0700
To: Geert Vandeweyer <geertvandeweyer@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Lift Over bam files
Message-ID: <502BC8D5.8020804@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello Geert, For the best results, especially for SNPs, you will want to map directly to the target genome. The genome Galaxy is using is the same primary human genome the GATK team also uses - the 1000 genomes build 37 -> "hg_g1k_v37". Click on the GATK links from one of the tools to see the details.

GATK provides liftOver files between the genomes, and you could install and use these with the liftOver tool, but not for BAM datasets. Inputs are BED, Interval, GFF. (BAM -> SAM -> interval). GATK also provides indexes (lifted) for hg19, but Galaxy does not provide an hg19 genome that is sorted appropriately for GATK, or at least not yet. RNA-seq tools and most other tools up until now required sorting in one way, and now GATK requires sorting in another, but keeping the database dbkey the same is important for visualization and other functions. It can get complicated when moving between tools in a history. We will likely have some 'best practice' solutions soon, but for now, use the 1000 genomes build to keep it all simple:

Human (Homo sapiens) (b37): hg_g1k_v37

The good news is that installing this genome has been greatly simplified. The genome and indexes are now available on an rsync server. You can simply download and add the genome directory and all the contents. You will still need to create the .loc file entries but the rest is done. http://wiki.g2.bx.psu.edu/Admin/Data%20Integration The "dbkey" is "hg_g1k_v37". Hopefully one of the options works out for you! Jen Galaxy team

ps: Your post ended up threading behind another post. I am not sure if this was because you started with a reply, but changed the subject line? This is not enough to start a new thread.
Instead, please create a brand new message in your email client, then copy over the mailing list email address, add a subject line, and this will start a new thread that will get tracked and not missed. Thanks! -- Jennifer Jackson http://galaxyproject.org

Message: 2
Date: Wed, 15 Aug 2012 11:09:37 -0500
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Linking to Compressed Data
Message-ID: <502BC9C1.8090903@wisc.edu>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed

Hi All, Is it possible to link to compressed files in a Galaxy data library? We receive all of our NGS data in bz2 or gzip format for obvious reasons, just wondering if I have to decompress it on the filesystem before I link to it or not. Thanks! -- Branden Timm btimm@glbrc.wisc.edu

Message: 3
Date: Wed, 15 Aug 2012 09:27:36 -0700
To: Sean Davis <sdavis2@mail.nih.gov>, "Du, Jianguang" <jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] How to decide "Mean Inner Distance between Mate Pairs"?
Message-ID: <502BCDF8.4030005@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Great advice Sean! Jianguang, this is the correct analysis - mapping the data to test the actual insert size of the library as sequenced. The experimental notes at SRA are just a starting place, the data is truth. A sample through TopHat itself might produce more precise results. I suspect the coverage on your top Blastn HSP is not complete, breaking off where it hits a splice. And that you have some bias for sequences/hits that cross junctions near ends. But overall, none of this would likely make that much of a difference in the analysis as a whole. Good luck! Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

Message: 4
Date: Wed, 15 Aug 2012 16:59:27 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Can I convert paired-end datasets into single end ones?
Message-ID: <2B3C356FD95D6A41B0CCFF77102E5EDF12867830@IU-MSSG-MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"

Dear All, I have some paired-end datasets to be analyzed, but I am not sure about their Mean Inner Distance between Mate Pairs. Can I convert these paired-end datasets into single-end ones and use them as single-end datasets as follows?
1) Use the tool "Manipulate FASTQ" to convert the sequence of reverse reads into its reverse-complement counterpart, so that all of the reverse reads actually become forward reads.
2) Run Tophat on the manipulated datasets as single-end ones.
Thanks. Jianguang

URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/afa039f2/attachment-0001.html>

Message: 5
Date: Wed, 15 Aug 2012 10:15:16 -0700
To: "Du, Jianguang" <jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] Can I convert paired-end datasets into single end ones?
Message-ID: <502BD924.7070705@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Jianguang, This is not recommended. The value of the paired relationships would be lost. Using an estimated Mean Inner Distance is a much better solution. This is keeping in mind that testing different values may be necessary to obtain the optimal results for any dataset. Your situation about the reported vs actual sizing is not unique and does not mean that the data is poor (when considered as a single factor). Searching an online NGS website such as seqanswers.com about the topic will bring up several threads where this is discussed. Should you have outstanding concerns about this particular parameter, please consider contacting the tool authors at tophat.cufflinks@gmail.com for advice.
Best, Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

Message: 6
Date: Wed, 15 Aug 2012 11:28:37 -0700
To: Mahtab Mirmomeni <m.mirmomeni@student.unimelb.edu.au>
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] Galaxy toolshed-vcftools
Message-ID: <502BEA55.3090908@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello Mahtab, Some vcftools are in development status on our test server under "VCF Tools" at: http://test.g2.bx.psu.edu/ Adding these and others (including the ones you mention) to the Tool Shed is under consideration by the Galaxy team, but a firm timeline is not set at this time. If your interest is in adding these to the Tool Shed, they would be welcomed. Another option is to check in with the galaxy-dev@bx.psu.edu development mailing list to see if any other developer/group has something they could submit to the Tool Shed. Please do not cross-post this thread, though, but instead create a brand new message/thread with the relevant details and request (e.g. not a reply with a new subject line or an added email).

Our primary investment at this time has been in the GATK pipeline. There are differences in the way vcftools vs GATK works, so it's not an exact 1:1 mapping of functionality, but you may be able to obtain the results you need by using the NGS: GATK Tools (beta) variant utilities (CombineVariants, VariantEval, etc). The GATK tool set is under active development and wrappers for these can be found in both galaxy-central and galaxy-dist at bitbucket in various stages of stability. Where to pull from depends on the tool (the version will be the same for some in both locations, but this can change over time) and your desire/tolerance to work with tools that are undergoing change.
http://bitbucket.org/galaxy/galaxy-dist
http://bitbucket.org/galaxy/galaxy-central

Hopefully this provides some useful choices, Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

Message: 7
Date: Wed, 15 Aug 2012 20:21:21 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Do I need to allow indel search?
Message-ID: <2B3C356FD95D6A41B0CCFF77102E5EDF12867873@IU-MSSG-MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"

Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Do I need to allow indel search when I run Tophat? What is the indel search for? I could not find detailed information about "indel search" in the Tophat documentation. Thanks. Jianguang Du

URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/ed2d0eda/attachment-0001.html>

Message: 8
Date: Wed, 15 Aug 2012 20:24:40 +0000
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Use Own Junctions or not
Message-ID: <2B3C356FD95D6A41B0CCFF77102E5EDF1286787E@IU-MSSG-MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"

Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Should I use own junctions when I run Tophat? What does "Own Junctions" mean? Thanks. Jianguang DU

URL: <http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120815/7ca0dc9f/attachment-0001.html>

Message: 9
Date: Thu, 16 Aug 2012 00:48:28 -0700
To: shamsher jagat <kanwarjag@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] copy number variation detcetion in Glaxay
Message-ID: <502CA5CC.8020607@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello, The tool "FreeBayes" may be of interest. Please see the tool form for links to the primary tool documentation to see if the functionality will meet your needs.
Best, Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

Message: 10 Date: Thu, 16 Aug 2012 21:18:01 +0800 To: <galaxy-user@lists.bx.psu.edu> Subject: [galaxy-user] Cuffdiff errors Message-ID: <blu0-smtp34493ac6e595ea755c7e677bfb50@phx.gbl> Content-Type: text/plain; charset="us-ascii"

Hello, I am having a problem running Cuffdiff on some RNA-seq data. I want to compare 2 samples (A and B). I ran Cufflinks and Cuffmerge before running Cuffdiff. I ran Cuffdiff with the following inputs: the Cuffmerge output plus the Bowtie alignments for A and B (sorted, as required by Cufflinks, after mapping with Bowtie). But I got the following error message: An error occurred running this job: cuffdiff v1.3.0 (3022) cuffdiff --no-update-check -q -p 8 -c 10 --FDR 0.050000 /galaxy/main_pool/pool4/files/004/800/dataset_4800173.dat /galaxy/main_pool/pool3/files/004/799/dataset_4799827.dat /galaxy/main_pool/pool4/files/004/799/dataset_4799831.dat What did I do wrong? Thanks very much for your help! Yan URL: http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120816/f08c20fb/attachment-0001.html

Message: 11 Date: Thu, 16 Aug 2012 08:09:15 -0700 To: Yan He <yanhe83@hotmail.com> Cc: galaxy-user@lists.bx.psu.edu, "closeticket@galaxyproject.org" <closeticket@galaxyproject.org> Subject: Re: [galaxy-user] Cuffdiff errors Message-ID: <502D0D1B.60308@bx.psu.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Yan, Would you please submit this as a bug report? It helps if you leave all inputs undeleted in your history. Instructions: http://wiki.g2.bx.psu.edu/Support#Reporting_tool_errors Thanks!
Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

Message: 12 Date: Thu, 16 Aug 2012 08:09:15 -0700 (PDT) To: Jennifer Jackson <jen@bx.psu.edu>, shamsher jagat <kanwarjag@gmail.com> Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu> Subject: Re: [galaxy-user] copy number variation detcetion in Glaxay Message-ID: <1345129755.17010.YahooMailNeo@web120906.mail.ne1.yahoo.com> Content-Type: text/plain; charset="iso-8859-1"

Thanks Jen, I am also interested in this. Has anyone used FreeBayes, in Galaxy or outside Galaxy, to detect CNV from Illumina sequencing data? Is there a tutorial for running this tool? Thanks.

___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ URL: http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20120816/52d7ec2c/attachment-0001.html

Message: 13 Date: Thu, 16 Aug 2012 08:48:40 -0700 To: "Du, Jianguang" <jiandu@iupui.edu> Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu> Subject: Re: [galaxy-user] Do I need to allow indel search?
Message-ID: <502D1658.1030801@bx.psu.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello Jianguang, In simple terms, "No" produces a strict alignment and "Yes" produces a more permissive alignment. The option 'Allow indel search:' is a way to keep variability in your data (presumably biologically valid) from being interpreted as mismatches or gaps. Mismatches/gaps in an alignment lower the overall score and can lead to alignment failures. The default for this parameter in Galaxy is "Yes", with a value of 3 for insert/deletion length (allowing for simple nucleotide polymorphism variability up to a single codon, per position, in either the query or target). All values can be modified. If this interferes with your data mapping accurately, then it can be disabled by setting the parameter to "No".

A test comparing the two alternatives on a sample would be a good way to see how this single change affects your particular data. Good questions to ask: Which reads do not map when the stricter alignment rules are applied? Do any reads map with a change in specificity? Do you agree with the results? Hopefully this helps! Jen Galaxy team -- Jennifer Jackson http://galaxyproject.org

_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user

End of galaxy-user Digest, Vol 74, Issue 15
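The comparison suggested in the last message (which reads map only under one of the two settings) could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it assumes each Tophat run's alignments have been exported as SAM text, relies on the standard SAM convention that flag bit 0x4 marks an unmapped read, and uses hypothetical function names and toy records that do not come from Galaxy or Tophat.

```python
def mapped_read_names(sam_lines):
    """Return the names of reads that have at least one mapped alignment.

    Header lines (starting with '@') and blank lines are skipped. A record
    counts as mapped when flag bit 0x4 (read unmapped) is not set.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(name)
    return names


def compare_runs(strict_sam, permissive_sam):
    """Compare mapped reads from an indel-search 'No' run and a 'Yes' run."""
    strict = mapped_read_names(strict_sam)
    permissive = mapped_read_names(permissive_sam)
    return {
        "mapped_in_both": strict & permissive,
        "only_with_indel_search": permissive - strict,
        "only_without_indel_search": strict - permissive,
    }


# Toy records (hypothetical): r2 maps only when indels are allowed.
strict_run = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t36M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
permissive_run = [
    "r1\t0\tchr1\t100\t50\t36M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t50\t20M2I14M\t*\t0\t0\tACGT\tIIII",
]

report = compare_runs(strict_run, permissive_run)
for label in sorted(report):
    print(label, sorted(report[label]))
```

Reads listed under "only_with_indel_search" are the ones worth inspecting by eye to decide whether the more permissive alignments are believable for your sample.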

written 6.3 years ago by De Kumar, Bony • 20

5.4 years ago by

Batool Akhtar-Zaidi • 10

Batool Akhtar-Zaidi • 10 wrote:

Please unsubscribe.

written 5.4 years ago by Batool Akhtar-Zaidi • 10

5.3 years ago by

Jun Fan • 50

Jun Fan • 50 wrote:

Hi Jen, Maybe I did not explain my question clearly, which caused the confusion: I did not mean the name of the dataset. I was asking how to change the name displayed for the tool in the graphic view of the workflow editor from the default value (I guess it is the name attribute in the tool element of the wrapper) to something else. My ultimate purpose is to share my workflow with someone else. If they see three steps with the same displayed name and do not have the related knowledge, they will get lost. To help explain, here is an illustration:

| mzidLib:PostProcessing |     | mzidLib:PostProcessing |     | mzidLib:PostProcessing |
|------------------------|     |------------------------|     |------------------------|
| input file             |  -->| input file             |  -->| input file             |
|------------------------|  |  |------------------------|  |  |------------------------|
| output (mzid)          |---  | output (mzid)          |---  | output (mzid)          |
|------------------------|     |------------------------|     |------------------------|

To

| mzidLib:PostProcessing FDR         |     | mzidLib:PostProcessing Threshold   |     | mzidLib:PostProcessing ProteoGroup |
|------------------------------------|     |------------------------------------|     |------------------------------------|
| input file                         |  -->| input file                         |  -->| input file                         |
|------------------------------------|  |  |------------------------------------|  |  |------------------------------------|
| output (mzid)                      |---  | output (mzid)                      |---  | output (mzid)                      |
|------------------------------------|     |------------------------------------|     |------------------------------------|

Best regards!
Jun

Date: Tue, 20 Aug 2013 11:34:47 -0700 To: Jun Fan <j.fan@qmul.ac.uk> Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] customize tool display in the workflow Message-ID: <5213B6C7.5030800@bx.psu.edu> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi Jun, You are very close - just click on the "create" button within the "Edit Step Actions" box and it will expand, where you can then enter the new custom name. It will look something like this: (image attachment fdjfidfg.png, image/png, 41713 bytes). The other post you are referring to is a method to name the output dataset based on the input dataset's name. You can certainly try this out and see if it is useful. Hope this helps, Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org URL: http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20130820/3406f040/attachment-0001.html URL: http://lists.bx.psu.edu/pipermail/galaxy-user/attachments/20130820/3406f040/attachment-0001.png

written 5.3 years ago by Jun Fan • 50

Hi Jun, The name of the tool itself cannot be modified, but there are workflow annotations that can help to communicate usage instructions to those you share it with. Specifically, 1) where each specific input should be selected and 2) what is happening at each step.

For 1: where each specific input should be selected. Unique to input datasets, the "name" field can be used. This will be displayed when a workflow is run, underneath the tool name, but above the pull-down menu of datasets (for that step). Try a custom name here, save, then click run (all from the workflow menu) to see how the display works. In the workflow editor, this is where the input "name" attribute is located: (screenshot not preserved in this archive).

For 2: what is happening at each step. There is an annotation/notes box for all tools (including the input tool) where more help or comments can be entered. This is not displayed at run time, but when a workflow's structure is "viewed". You can find examples of workflows with this sort of annotation in the "Shared Data -> Published Workflows" area, or test it out and see if you find it helpful. This is where it is located in the workflow editor: (screenshot not preserved in this archive).

How to "view" a workflow: Click on a workflow's button on just about any Galaxy web page that it is displayed on. For example, under "Shared Data -> Published Workflows", click on the workflow at the top of the list. Right now it is "metagenomic analysis" and this one has annotation (the next two do as well). When at "Your workflows" or "Workflows shared with you by others", click on the down arrow to the right side of the workflow name within the button, and "View" will be one of the options.

I realize this is not exactly what you wanted originally, but hopefully these options will help you to communicate what you need to, Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org

written 5.3 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jun, Thanks for the clarification. Workflow step annotations would best suit your needs, I think. When you're in the editor, click on any step and on the right side you'll see a field "Annotation / Notes" which will be displayed when the workflow is viewed. This can be used to clear up any confusion about what the individual step does. Good luck! Dannon

written 5.3 years ago by Dannon Baker • 270

5.1 years ago by

Cory Dunn • 40

Turkey

Cory Dunn • 40 wrote:

Dear Graham, Thanks for the info. However, my problem is that the "Tool Version" field is completely empty in my history items (e.g. Tophat2, Cuffdiff). I suppose I can check the dependencies list you described, but it would be important to know precisely which version was run on any given query. Best regards, Cory

written 5.1 years ago by Cory Dunn • 40

If you ran Cuffdiff in the last couple of months, you used version 2.1.1; before that it was 1.3.x. The version information was added in the last couple of weeks, which is why you don't see it. Any runs going forward should include the version. J.

written 5.1 years ago by Jeremy Goecks • 2.2k

4.8 years ago by

Gang Wang • 40

Gang Wang • 40 wrote:

Hi, I just noticed that the NGS tool FASTQ Trimmer by column (https://usegalaxy.org/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2frepos%2fdevteam%2ffastq_trimmer%2ffastq_trimmer%2f1.0.0) can't detect the fastq file I loaded. Does anyone know why? Thanks a lot. -- Gang Wang Ph.D. student Veterinary Integrative Bioscience College of Veterinary Medicine & Biomedical Sciences Texas A&M University College Station, TX 77843-4458

written 4.8 years ago by Gang Wang • 40

Hi Wang, please check whether your fastq file is assigned the correct fastq datatype, probably fastqsanger. Cheers, Bjoern

written 4.8 years ago by Bjoern Gruening ♦ 5.1k

The fastq format is correct: when I use the FASTQ to FASTA converter (https://usegalaxy.org/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2frepos%2fdevteam%2ffastqtofasta%2ffastq_to_fasta_python%2f1.0.0), the fastq file can be detected. -- Gang Wang

written 4.8 years ago by Gang Wang • 40

Hi Wang, The assigned format is the issue, as Björn replied. The two tools you are referring to have different criteria for the input fastq data - one is more stringent than the other with regard to content. Both accept fastq format, but the one that requires the more specific fastqsanger datatype assignment also depends on the scaling of the quality scores.

There are two histories in your account, and these will work as good examples. The larger one has a partial RNA-seq workflow, including trimming. Note that the "format" datatype assignment is "fastqsanger" for the input to these jobs. This is the model you want to follow for most analysis in Galaxy. In the other history are just a few datasets, one with format "fastq". This is what you need to change so that it becomes "fastqsanger".

In your case, the dataset's quality scores are already scaled to Sanger Phred with an ASCII offset of 33 (what is labeled in Galaxy as "fastqsanger"), so the datatype can be assigned directly. Click on the pencil icon for the dataset, then the 'datatype' tab, choose 'fastqsanger', then remember to save. The dataset will now appear as input to the NGS: QC and manipulation tools that were previously blocked.

More about datasets and datatypes is here, including how to assess the original quality score scaling and modify it if needed. Direct assignment of datatype is not always appropriate, and there are important tools (like 'FastQC') that are of great utility when deciding how to prep data. https://wiki.galaxyproject.org/Support#Dataset_special_cases https://wiki.galaxyproject.org/Learn/Managing%20Datasets#Dataset_Icons_.26_Text

Hopefully this helps to clear up any confusion! Thanks, Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org
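As a rough illustration of what "assessing the original quality score scaling" can mean, here is a minimal Python sketch that guesses the ASCII offset from the quality lines of a FASTQ file. It is an assumption-laden heuristic, not how Galaxy or FastQC decide: it assumes plain four-line records (no wrapped sequence lines, no blank lines), treats an offset of 64 (used by some older Illumina pipelines, and not mentioned in the thread) as the only alternative to 33, and the function names are hypothetical.

```python
def quality_range(fastq_lines):
    """Return (lowest, highest) ASCII code seen on the quality lines.

    Assumes strict four-line FASTQ records, so every fourth line
    (index 3, 7, 11, ...) is a quality string.
    """
    lowest, highest = None, None
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:
            continue
        for char in line.rstrip("\n"):
            code = ord(char)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)
    return lowest, highest


def guess_offset(fastq_lines):
    """Guess the quality offset: 33, 64, or None when it cannot be told.

    Characters below ASCII 59 cannot occur with an offset of 64, so they
    point to offset 33 (Galaxy's "fastqsanger"). A minimum of 64 or more
    suggests offset 64, although very high-quality offset-33 data could
    look the same, so treat that answer as a hint only.
    """
    lowest, _ = quality_range(fastq_lines)
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


# Toy records (hypothetical data).
sanger_like = ["@read1", "ACGT", "+", "!I5#"]
older_illumina_like = ["@read1", "ACGT", "+", "hhgf"]

print(guess_offset(sanger_like))
print(guess_offset(older_illumina_like))
```

If the guess is 33, assigning "fastqsanger" directly, as described above, is plausible; in any other case, converting the scores with a dedicated tool is the safer route.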

written 4.8 years ago by Jennifer Hillman Jackson ♦ 25k

By the way, can you detect the fastq file using the Trimmer by column tool on your PC? Thanks a lot. Gang

written 4.8 years ago by Gang Wang • 40

Hi Gang, I can't comment about a PC (I use a Mac), but this shouldn't matter. If you are using a local Galaxy (http://getgalaxy.org http://usegalaxy.org/toolshed), most tools will follow the same usage rules as on the public Main Galaxy instance at http://usegalaxy.org, the Galaxy CloudMan AMI, and the bulk of other public Galaxy instances, at least for common tools. Exceptions would be if the tool(s) were specifically modified to function differently. Galaxy is open source and variations exist. The tool form of any alternate version (of a tool with the same name) will likely note what modifications have been made. Plus you can always contact the hosts of the other public instance if you have usage problems. Best! Jen Galaxy team -- Jennifer Hillman-Jackson http://galaxyproject.org

written 4.8 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jennifer, I reloaded my data with the format type defined as fastqsanger, and it works now. Thank you so much for your help. Gang

written 4.8 years ago by Gang Wang • 40
