Question: MACS Input Format
Steve Taylor wrote (8.6 years ago):
Hi, What format does MACS take as input within Galaxy? Thanks, Steve
Daniel Blankenberg (United States) wrote (8.6 years ago):
Hi Steve, The latest version in the repository and on the test server accepts bed, sam, bam, eland and elandmulti for single-end data, and elandmulti only for paired-end data. The version currently on the main public server accepts bed, sam, and bam for single-end data; paired-end is not functioning there. The newest version of MACS will be available on the main public server the next time it is updated. Thanks for using Galaxy, Dan
Strange. On our local Galaxy instance I got:
ERROR:root:Format "SAM" cannot be recognized!
The SAM file was generated by Bowtie from within Galaxy. Any ideas why? Steve
written 8.6 years ago by Steve Taylor
Hi Steve, What version of MACS are you using? We have:
$ macs --version
macs 1.3.7.1 (Oktoberfest, bug fixed #1)
I tried a small SAM file (test-data/1.sam) on our test server and did not receive the error you listed. (No peaks were called due to the size of the file, but it seemed to recognize the format OK.) Thanks, Dan
written 8.6 years ago by Daniel Blankenberg
Hi Daniel, Thanks for the tip. In response to this mail we upgraded to this version. It looks like the job completes, but on closer inspection we get:

Messages from MACS:
INFO @ Tue, 04 May 2010 09:23:37:
# ARGUMENTS LIST:
# name = MACS_in_Galaxy
# format = SAM
# ChIP-seq file = /wwwdata/galaxy-dist/database/files/000/dataset_229.dat
# control file = None
# effective genome size = 2.70e+09
# tag size = 25
# band width = 300
# model fold = 32
# pvalue cutoff = 1.00e-05
# Ranges for calculating regional lambda are : peak_region,1000,5000,10000
INFO @ Tue, 04 May 2010 09:23:37: #1 read tag files...
INFO @ Tue, 04 May 2010 09:23:37: #1 read treatment tags...
Traceback (most recent call last):
  File "/usr/local/pkgbin/macs", line 282, in ?
    main()
  File "/usr/local/pkgbin/macs", line 66, in main
    (treat, control) = load_tag_files_options (options)
  File "/usr/local/pkgbin/macs", line 261, in load_tag_files_options
    treat = options.build(open2(options.tfile, gzip_flag=options.gzip_flag))
  File "/package/macs/1.3.7.1/lib/python2.5/site-packages/MACS/IO/__init__.py", line 1480, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/package/macs/1.3.7.1/lib/python2.5/site-packages/MACS/IO/__init__.py", line 1500, in __fw_parse_line
    bwflag = int(thisfields[1])
ValueError: invalid literal for int(): :8:1:316:468

It turns out the SAM file has entries like 'SRR015129.6 :6:1:909:23 length=36' in the first column, which MACS doesn't like. The SAM files come from FASTQ files from NCBI's SRA that were processed in Galaxy using FASTQ Groomer and then Bowtie.
For example, the SAM output:

SRR015129.2 :6:1:236:897 length=36 4 * 0 0 * * 0 0 GTTGAGTATAGCCTTTTGTAGAAGGATGTGATGTTG IIIIIIIIIDI.+IIIIIIEI1+I+2I%I1+I.&5$ XM:i:1
SRR015129.1 :6:1:894:108 length=36 4 * 0 0 * * 0 0 GCTGCCGATCGCACAGATAAAGAAGCCTCAATTGGC I3II1I%II1&+>1+&(7III$I%%'6I0'&*992/ XM:i:0
SRR015129.6 :6:1:909:23 length=36 4 * 0 0 * * 0 0 GCTGCTTCTCTNNTTAGAATGNNNNNNNNNNNNNNN IIII;II1I=I!!III=IAIA!!!!!!!!!!!!!!! XM:i:0
SRR015129.4 0 chr16 8180060 255 36M * 0 0 GGTGTGTTTTTATGCCTCAACCTGAGGCAAAGGTTT IIIIIIIIIIIII>IIIIIIIII;?41D<>3;+III XA:i:0 MD:Z:36 NM:i:0
SRR015129.3 0 chr13 70318444 255 36M * 0 0 GAGATTGGTAGAGAGCATGTGGTTTTCATTATAAAT IIIIIIIIIIII.I;IIIIII:IIIII/(I2II:?I XA:i:0 MD:Z:28G7 NM:i:1
SRR015129.5 0 chr3 22775604 255 36M * 0 0 GGGCATGAAGTTATTTTCAGAGAGCTTTTACTGAAG IIIIBIIIIIIIIIIIIIII:I;AFIIII:5I154+ XA:i:0 MD:Z:36 NM:i:0
SRR015129.7 16 chr17 48330835 255 36M * 0 0 TAAATTGGGTGTGTGTCACAATAAAGTGTGTGTAAC -@I0//7@6);5I8II,I*IIIIIIII<IIIIIII9 XA:i:0 MD:Z:36 NM:i:0
SRR015129.9 16 chr14 24322769 255 36M * 0 0 AGGGCAACTTCTCAACTCTCACCTTGAGGTAAATCC IDI9-IE1I::4GIII:IIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:36 NM:i:0
SRR015129.10 0 chr8 55779424 255 36M * 0 0 GGATCATCCATTGGAACCTGGTGGGATCAACAGTGG IIIIIIIIIIII@CIIIII8IH,.09I63;03*F'' XA:i:0 MD:Z:36 NM:i:0
SRR015129.8 16 chr18 78388563 255 36M * 0 0 ATTTGACCTCTTTCCTTCCCCCTCTTTCTTTTGCAC IIII=9IIIIIIDIIIIIIIIIIIIGIIIIIIIIII XA:i:0 MD:Z:36 NM:i:0
SRR015129.11 16 chr12 89953903 255 36M * 0 0 GTAAATGTATATATCCATGCGCGTACATAATCAAGC [quality string garbled in source] XA:i:0 MD:Z:36 NM:i:0
SRR015129.15 :6:1:877:106 length=36 4 * 0 0 * * 0 0 GGTTGGCTAGGTTTCCAGTACCAGGTATAATTTCCC IIIIIIIIIIIIIIIII?IIIIII4IIIIIIII=<. XM:i:1

Maybe FASTQ Groomer should remove all spaces in a header to avoid this? It's tricky to say which tool is actually to 'blame' in this case (but not Galaxy!) :-). It's simple to fix on the command line, but does Galaxy have a search-and-replace function for users who encounter such problems? Steve
written 8.6 years ago by Steve Taylor
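For readers hitting the same traceback, here is a minimal Python sketch of the failure and of the kind of command-line fix Steve mentions. It is not MACS code: `parse_flag_whitespace` only imitates a parser that splits on any whitespace (which is what the traceback suggests), and all function names are hypothetical. It assumes the SAM file is tab-delimited and that the unwanted spaces occur only in the first column (the read name).

```python
def parse_flag_whitespace(line):
    """Imitate a parser that splits on any whitespace and reads field 2 as the flag."""
    return int(line.split()[1])


def parse_flag_tabs(line):
    """Read the flag from the second tab-separated field, as SAM defines it."""
    return int(line.rstrip("\n").split("\t")[1])


def sanitize_sam_line(line, replacement="_"):
    """Replace spaces in the read name (first tab-separated field) of one SAM line.

    Header lines (starting with '@') are returned unchanged.
    """
    if line.startswith("@"):
        return line
    qname, sep, rest = line.partition("\t")
    return qname.replace(" ", replacement) + sep + rest


# Example record modelled on the thread (sequence and qualities shortened).
record = "SRR015129.6 :6:1:909:23 length=36\t4\t*\t0\t0\t*\t*\t0\t0\tGCTG\tIIII\tXM:i:0\n"

try:
    parse_flag_whitespace(record)
except ValueError as error:
    print("whitespace split fails:", error)

print("tab split reads flag:", parse_flag_tabs(record))
print("sanitized line:", sanitize_sam_line(record), end="")
```

Applying `sanitize_sam_line` to every line of the SAM file before handing it to MACS should let a whitespace-splitting parser find the flag in the expected position.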
Hi Steve, It does look like MACS is splitting on both spaces and tabs. Spaces are fine for use in FASTQ headers. To solve this problem in Galaxy in the meantime, you can use the FASTQ manipulation tool, found under the Generic FASTQ tools, to translate spaces in the header to underscores. Select the FASTQ file with spaces in the header as the input, click 'Add new manipulate Reads', set 'Manipulate Reads on' to 'Name/Identifier' and 'Identifier Manipulation Type' to 'String Translate', put a space in 'From' and an underscore in 'To', and click Execute. Due to the nature of this tool, it is usually recommended that the output be re-Groomed. Thanks for using Galaxy, Dan
written 8.6 years ago by Daniel Blankenberg
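Outside Galaxy, the same space-to-underscore translation of FASTQ identifiers can be sketched in a few lines of Python. This is only an illustration of the effect Dan describes, not the Galaxy tool's implementation; the function name is hypothetical, and the sketch assumes strict four-line FASTQ records (no wrapped sequence or quality lines).

```python
def translate_fastq_identifiers(lines, old=" ", new="_"):
    """Yield FASTQ lines with `old` replaced by `new` in identifier lines only.

    Assumes each record is exactly four lines: '@' identifier, sequence,
    '+' line (which may repeat the identifier), and quality string.
    """
    for index, line in enumerate(lines):
        # Lines 0 and 2 of each record carry the identifier; leave the
        # sequence (1) and quality (3) lines untouched.
        if index % 4 in (0, 2):
            yield line.replace(old, new)
        else:
            yield line


# Example record modelled on the read names in the thread (shortened).
record = [
    "@SRR015129.6 :6:1:909:23 length=36\n",
    "GCTG\n",
    "+SRR015129.6 :6:1:909:23 length=36\n",
    "IIII\n",
]
for out_line in translate_fastq_identifiers(record):
    print(out_line, end="")
```

Because only the identifier lines are rewritten, sequences and quality strings pass through unchanged, so the reads map exactly as before and the resulting SAM read names contain no spaces.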