Question: Re: [Galaxy-Bugs] Galaxy Tool Error Report From Azovoil@Gwdg.De
8.4 years ago, Jeremy Goecks wrote:
[cc'ing galaxy-user as this may be useful information to other users.]

Hi Zovoilis,

I was mistaken in writing that Galaxy cannot sort SAM files. Galaxy can sort SAM files; here is an example history that illustrates the sorting of a SAM file:

http://test.g2.bx.psu.edu/u/jeremy.goecks/h/sorting-sam-file-for-cufflinks

There are two steps in sorting a SAM file:

(1) removing SAM headers via the 'remove lines from beginning of a file' tool, and
(2) using the 'sort data in ascending or descending order' tool to sort by chromosome name and then by position.

Samtools provides a single-step process for sorting SAM files, and we expect to add this capability to Galaxy in the future so that the above 2-step process can be replaced by this single step.

Best,
J.
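For readers working outside Galaxy, the two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the Galaxy tools run internally: the function name `sort_sam_lines` is made up for this example, header lines are recognised by their leading `@` (rather than by counting lines from the top of the file), and chromosome names are compared as plain text.

```python
def sort_sam_lines(lines):
    """Drop SAM header lines, then sort alignments by reference name and position.

    Hypothetical helper for illustration; it mirrors the two-step process
    described above rather than any Galaxy or samtools implementation.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Step 1: skip blank lines and header lines (SAM headers start with '@').
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # SAM column 3 (index 2) is the reference name, column 4 (index 3)
        # is the 1-based leftmost mapping position.
        records.append((fields[2], int(fields[3]), line))
    # Step 2: sort by chromosome name (as text), then by numeric position.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return [rec[2] for rec in records]


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "read1\t0\tchr2\t50\t255",
        "read2\t0\tchr1\t200\t255",
        "read3\t0\tchr1\t10\t255",
    ]
    for row in sort_sam_lines(example):
        print(row)
```

Because names are compared as text, `chr10` sorts before `chr2`; that is a property of this sketch and may or may not match the ordering a downstream tool expects.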
Tags: rna-seq, cufflinks, samtools, bam
8.1 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Nripesh,

Use "NGS: SAM Tools -> SAM-to-BAM". This will create a new BAM data history item.

Hope this helps!

Jen
Galaxy team

P.S. For new data/usage questions, it would be great for us if you could send them to the mailing list galaxy-user@bx.psu.edu. We like to publish answers there for all others to learn from.

--
Jennifer Jackson
http://usegalaxy.org
8.1 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Nripesh,

The reference file is the source genome or any other fasta file that your data is derived from (could be custom). It is the "reference" sequence that you will be using for mapping.

There are a few things to try:

If your data will be mapped to a genome already in Galaxy, then use the pencil icon (for the SAM history item) and alter the attributes to assign a genome. Next, use the same genome as the reference when running SAM->BAM. Please note that not all genomes are indexed for use by SAM tools. If your genome is not here, we are open to requests to add more, if the data is in our main genome list or publicly available from a stable source. Please be specific in requests: the exact genome name as we use it, a link to NCBI, or a link to another public data source is preferred.

If your data is custom, the database can remain undefined (it will display as a "?"). Load your custom fasta genome/sequence into your history, if it is not already there. Then when running SAM->BAM, use the option "locally cached" and set the reference to be that loaded custom fasta file.

Hopefully this helps to resolve the issue. But if you continue to have problems, please feel free to share your history and we can take a closer look. To do this, at the top of the history pane (right): Options -> Share or Publish -> Make History Accessible via Link, and email the link to me.

Thanks!

Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
7.8 years ago, Ross Lazarus wrote:
Hi, Taka,

I noticed that the full Manhattan plot looks odd in the history I have shared with you, and I think it's because the offsets for some of your SNPs are wrong. For example, the very last marker on chr1 in your data is rs11488669. In your data, the offset is 2147483647, which is way beyond the end of chr1 - the entire genome is only about 3B base pairs - so the Manhattan plot looks clumpy instead of uniform. According to genome.ucsc.edu it is at chr1:153517269-153517769.

I'm going to guess that your data (e.g. the map file) has at some stage been changed using spreadsheet software such as Excel, which can easily do strange things to numeric columns. If all your processing is done inside Galaxy, these kinds of errors can be prevented.

I can see you have tried unsuccessfully to upload some plink lped files in the history you shared - here's some information that might help you, from a previous enquiry on galaxy-user a few weeks ago:

==============================================

Hi, Sylvian,

The plink/rgenetics lped and pbed (compressed) formats are special 'composite' Galaxy datatypes because the map and pedigree/genotype files need to be kept together correctly inside Galaxy. As a result, the upload tool requires that the file type be specified so all of the components can be properly uploaded and stored together.

For example, to upload pbed data from your local desktop, choose 'Upload file' from the Get Data tools. When the upload form appears, the trick is that, as the very first step, you *must* change the default 'Autodetect' in the first (filetype) select box to the specific rgenetics datatype - either 'pbed' for compressed plink data or 'lped' for uncompressed plink genotype data. Type the first few letters into the first box, and select the right one from the list that appears.

Once this is done, you will see that the upload tool form changes to show three separate file upload inputs - one each for the plink xxx.bim, xxx.bed and xxx.fam files, where xxx is the name you set when you ran plink to create the files - or, for uncompressed linkage format, two separate file upload inputs for the plink .ped and .map files. Now you can browse for the corresponding file for each input box from your local machine - be careful not to mix them up, as the upload tool is unfortunately unable to tell them apart.

At the bottom of the form, I suggest you then change the genome build to the appropriate one (e.g. hg18 or hg19). Finally, I'd recommend that you change the 'metadata value for basename' (which will be the new dataset name) to something that will remind you what the data are - something more meaningful than the default 'rgenetics'. Click 'execute' to upload the data and create the new dataset in your history.

Compressed (pbed) format is preferred because the upload is quicker. Note that some tools will autoconvert between lped and pbed, so there is a delay the first time some tools are run on a new dataset. There are also built-in converters (use the pencil icon) if you need them.

I hope this helps - thanks for using Galaxy and Rgenetics - please let us know how you go and feel free to contact me if you have other questions.

On Fri, Feb 18, 2011 at 9:26 AM, Ross Lazarus

--
Ross Lazarus MBBS MPH
Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory; 181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850
Head, Medical Bioinformatics, BakerIDI; PO Box 6492, St Kilda Rd Central; Melbourne, VIC 8008, Australia; Tel: +61 385321444
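The out-of-range offset described in the reply to Taka can be caught before plotting with a quick scan of the plink .map file. The sketch below is an illustration only: the function name `find_suspect_positions` and the `chrom_lengths` dictionary are hypothetical, and the chromosome length used in the example is a rough upper bound chosen for the demonstration, not an exact value for any genome build. It relies on two facts: a plink .map line has four whitespace-separated columns (chromosome, marker ID, genetic distance, base-pair position), and 2147483647 equals 2**31 - 1, the largest signed 32-bit integer, which suggests a numeric overflow or conversion problem rather than a real coordinate.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def find_suspect_positions(map_lines, chrom_lengths):
    """Return (marker_id, chromosome, position) for positions that look wrong.

    Hypothetical helper for illustration. `chrom_lengths` maps chromosome
    codes (as written in the .map file) to an upper bound on their length.
    """
    suspects = []
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, marker_id, position = fields[0], fields[1], int(fields[3])
        limit = chrom_lengths.get(chrom)
        # Flag the 32-bit sentinel value, or any position past the known bound.
        if position == INT32_MAX or (limit is not None and position > limit):
            suspects.append((marker_id, chrom, position))
    return suspects


if __name__ == "__main__":
    # Illustrative bound only; substitute real lengths for your genome build.
    rough_lengths = {"1": 250_000_000}
    example_map = [
        "1 rs_example 0 153517269",   # made-up marker with a plausible position
        "1 rs11488669 0 2147483647",  # the offset reported in the post
    ]
    for marker_id, chrom, position in find_suspect_positions(example_map, rough_lengths):
        print(marker_id, chrom, position)
```

Any marker this reports is worth checking against a genome browser before the data are uploaded again.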