Question: Bed To Bam Conversion In Galaxy
Asked 7.2 years ago by shamsher jagat (United States):
Is it possible to use a tool in Galaxy to convert a BED file to a BAM/SAM file? In other words, does Galaxy have BEDTools or another option for this? Thanks

Answer by Jennifer Hillman Jackson (United States), 7.2 years ago:
Hello,

It is possible to go from SAM/BAM to BED, but not the reverse. SAM/BAM files contain the actual sequence of each original aligned read. BED files only record where the alignment falls on the reference genome; there is no read sequence. You can extract genomic sequence based on BED coordinates, but that is the reference sequence, which is not necessarily the same as the sequence of the original aligned read (any variation in the read would be lost).

BED is very similar to Interval format, so Interval tools also work with BED files. A BED file is basically a tab-delimited file with 3 to 12 columns, so tools that work with Tabular data are also appropriate. Note that you may need to change the datatype to Interval or Tabular for certain tools to recognize a BED file as input.

Hopefully this helps,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
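To make the "not the reverse" point concrete, here is a minimal Python sketch that maps one BED6 line onto the 11 mandatory SAM fields. It is not a Galaxy or BEDTools function; `bed_to_sam_record` is a hypothetical helper, and it assumes an ungapped alignment (so the CIGAR is just the span length). Coordinates and strand carry over, but SEQ and QUAL can only be filled with "*", the SAM placeholder for "not available".

```python
def bed_to_sam_record(bed_line: str) -> str:
    """Build one SAM line from a BED6 line (chrom, start, end, name, score, strand)."""
    chrom, start, end, name, _score, strand = bed_line.rstrip("\n").split("\t")[:6]
    start, end = int(start), int(end)
    flag = 16 if strand == "-" else 0   # 0x10 = read reported on the reverse strand
    pos = start + 1                      # BED start is 0-based, SAM POS is 1-based
    cigar = f"{end - start}M"            # assumption: ungapped match over the whole span
    fields = [
        name, str(flag), chrom, str(pos),
        "255",            # MAPQ 255 = mapping quality not available
        cigar,
        "*", "0", "0",    # RNEXT, PNEXT, TLEN: no mate information in BED
        "*", "*",         # SEQ and QUAL: the read sequence is simply not in the BED file
    ]
    return "\t".join(fields)


print(bed_to_sam_record("chr1\t12137\t12336\t61R33AAXX100706:1:79:7707:9270\t0\t-"))
```

The record is syntactically valid, but anything that needs the read bases (variant calling, base-quality checks) has nothing to work with, which is why the conversion is only meaningful in the SAM/BAM to BED direction.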
Thanks Jen. My problem is that I have ChIP-seq data: one BED file with coordinates like

chr1 724027 724226 61PDWAAXX100706:4:19:6952:18071 -

and then a WIG file. Is it possible to analyze this data in Galaxy/Cistrome? I tried Cistrome, but it gave me an error message. Thanks
Reply written 7.2 years ago by shamsher jagat
Hello,

The format of the BED file may be the problem. To be in BED format, an additional field is required for the "score" attribute. This would be column 5, moving the strand to column 6. To do this:

1 - Use "Text Manipulation -> Add column" with the value "0". Note: "0" is often used to represent a NULL or undefined score in BED files. This field cannot be left empty (two adjacent tabs); a placeholder value must be present.
2 - Then use "Text Manipulation -> Cut" to cut out the columns in the proper BED order, in this case "c1,c2,c3,c4,c6,c5", which swaps the last two columns.
3 - Change the datatype to BED using the pencil icon (Edit Attributes form).

In Galaxy, many of the tools in "NGS: Peak Calling" will work with ChIP-seq data in BED format. Having a control would be helpful, but is not required by all tools.

Good luck with your project,
Jen
Galaxy team
Reply written 7.2 years ago by Jennifer Hillman Jackson
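Outside Galaxy, Jen's steps 1 and 2 (add a constant score of "0", then cut c1,c2,c3,c4,c6,c5) amount to one line-level transformation. A minimal Python sketch, where `to_bed6` is a hypothetical helper that assumes the input rows have exactly five whitespace-separated columns (chrom, start, end, name, strand); step 3, setting the Galaxy datatype, has no equivalent here.

```python
def to_bed6(line: str) -> str:
    """Turn 'chrom start end name strand' into tab-delimited 'chrom start end name score strand'."""
    cols = line.split()  # split on any run of spaces or tabs
    if len(cols) != 5:
        raise ValueError(f"expected 5 columns, got {len(cols)}: {line!r}")
    cols.append("0")  # step 1: add a constant score of 0 (it lands in c6)
    c1, c2, c3, c4, c5, c6 = cols
    return "\t".join([c1, c2, c3, c4, c6, c5])  # step 2: cut c1,c2,c3,c4,c6,c5


print(to_bed6("chr1 724027 724226 61PDWAAXX100706:4:19:6952:18071 -"))
```

Using the same constant "0" on every row corresponds to leaving "Iterate?" at "no" in the Add column tool; raising an error on rows that do not have five columns surfaces malformed lines instead of silently shifting their fields.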
Jen,

I ran the workflow as you suggested but got the following error message. Do you have any suggestions? I added the 0 column and swapped the columns. Here are the first few lines of the input file:

chr1 12137 12336 61R33AAXX100706:1:79:7707:9270 0 -
chr1 31542 31741 61R33AAXX100706:1:37:11341:10600 1 -
chr1 39921 40120 61R33AAXX100706:1:2:16103:17629 2 -
chr1 93213 93412 61R33AAXX100706:1:113:14396:2056 3 -
chr1 109395 109594 61R33AAXX100706:1:13:8451:9619 4 -
chr1 146854 147053 61R33AAXX100706:1:53:15558:13513 5 -

The error message is as follows:

INFO @ Fri, 30 Sep 2011 17:59:54:
# ARGUMENTS LIST:
# name = macs_output
# format = BED
# ChIP-seq file = /data/CistromeAP/galaxy_database/files/000/198/dataset_198187.dat
# control file = None
# effective genome size = 2.79e+09
# band width = 300
# model fold = 10,30
# pvalue cutoff = 1.00e-05
# Small dataset will be scaled towards larger dataset.
# Range for calculating regional lambda is: 10000 bps

INFO @ Fri, 30 Sep 2011 17:59:54: #1 read tag files...
INFO @ Fri, 30 Sep 2011 17:59:54: #1 read treatment tags...
INFO @ Fri, 30 Sep 2011 18:00:02: 1000000
INFO @ Fri, 30 Sep 2011 18:00:11: 2000000
INFO @ Fri, 30 Sep 2011 18:00:21: 3000000
INFO @ Fri, 30 Sep 2011 18:00:30: 4000000
INFO @ Fri, 30 Sep 2011 18:00:39: 5000000
Traceback (most recent call last):
  File "/usr/local/bin/macs14", line 358, in <module>
    main()
  File "/usr/local/bin/macs14", line 60, in main
    (treat, control) = load_tag_files_options (options)
  File "/usr/local/bin/macs14", line 330, in load_tag_files_options
    treat = tp.build_fwtrack()
  File "/usr/lib/python2.6/dist-packages/MACS14/IO/Parser.py", line 150, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/usr/lib/python2.6/dist-packages/MACS14/IO/Parser.py", line 187, in __fw_parse_line
    raise StrandFormatError(thisline,thisfields[5])
MACS14.IO.Parser.StrandFormatError: 'Strand information can not be recognized in this line: "chr2\t121859840\t121860039\t61R33AAX\t.\t5837743","5837743"'

Thanks,
Vasu
Reply written 7.2 years ago by vasu punj
Hello Vasu,

The score value should be "0" for every row. When adding the new column, leave "Iterate?:" at the default "no".

It also looks like there may be some inconsistencies in the original file. Are you certain it has exactly 5 columns in every row, including the row reported in the error? Some detective work to get the file into the right format is probably necessary, and tabs are a good place to start:

1 - Change the datatype to Tabular.
2 - Run the file through the "Convert delimiters to TAB" tool using "Convert all: Whitespace".
3 - Run "Condense consecutive characters" to clean up any trailing tabs.
4 - Change the datatype back to BED and assign the score column on the "Edit Attributes" form (pencil icon).

Hopefully this helps,
Jen
Galaxy team
Reply written 7.2 years ago by Jennifer Hillman Jackson
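The tab clean-up and the "exactly 5 columns in every row" check can also be done in one pass outside Galaxy. A minimal Python sketch that approximates the combined effect of "Convert delimiters to TAB" and "Condense consecutive characters" (it is not how those Galaxy tools are implemented); `normalize_and_check` is a hypothetical helper, and it assumes no field legitimately contains spaces.

```python
import re
from typing import Iterable, List, Tuple


def normalize_and_check(
    lines: Iterable[str], expected_cols: int
) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Collapse whitespace runs into single tabs and flag rows with the wrong column count."""
    cleaned: List[str] = []
    bad: List[Tuple[int, int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        # Drop leading/trailing whitespace (including trailing tabs),
        # then turn every remaining run of spaces/tabs into exactly one tab.
        row = re.sub(r"\s+", "\t", raw.strip())
        if not row:
            continue  # ignore blank lines
        ncols = row.count("\t") + 1
        if ncols != expected_cols:
            bad.append((lineno, ncols, row))
        cleaned.append(row)
    return cleaned, bad


sample = [
    "chr1 724027 724226 61PDWAAXX100706:4:19:6952:18071 -\t\t\n",  # trailing tabs
    "chr1  31542\t31741 -\n",  # made-up malformed row: the name column is missing
]
cleaned, bad = normalize_and_check(sample, expected_cols=5)
for lineno, ncols, row in bad:
    print(f"line {lineno}: {ncols} columns instead of 5: {row}")
```

Reporting line numbers matters here because MACS stops at the first bad row it meets, so on a file with millions of tags this lists every offending row at once.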
These are the steps I followed:

1. Uploaded the BED file (dataset 60), then Text Manipulation -> Add column with the value 0, Iterate: no -> dataset 73
2. 73 -> Text Manipulation -> Cut: c1,c2,c3,c4,c6,c5, delimited by Tab -> dataset 74
3. 74 -> pencil icon -> changed the datatype to Tabular (dataset 74)
4. Text Manipulation -> Convert all whitespace to TAB -> dataset 75
5. Condense consecutive characters: I can't find this option. I am using the development Galaxy server; is this option available there?
6. Changed the datatype of dataset 75 to BED
7. Pencil icon -> Edit Attributes: set column 5 as the score (dataset 75)
8. Ran MACS from NGS: Peak Calling

I have shared my history with you (http://test.g2.bx.psu.edu/root).

Also, how can we annotate the genes corresponding to the peaks? Thanks
Reply written 7.2 years ago by shamsher jagat
Now, when I run the same files on the main Galaxy server, I get the following error. Do you have any idea why the same files work on the development server but not on the main server, using the same steps?

INFO @ Tue, 04 Oct 2011 14:56:21:
# ARGUMENTS LIST:
# name = MACS_in_Galaxy
# format = BED
# ChIP-seq file = /galaxy/main_database/files/003/068/dataset_3068865.dat
# control file = /galaxy/main_database/files/003/068/dataset_3068668.dat
# effective genome size = 2.70e+09
# tag size = 25
# band width = 300
# model fold = 30
# pvalue cutoff = 5.00e-02
# Ranges for calculating regional lambda are : peak_region,1000,5000,10000

INFO @ Tue, 04 Oct 2011 14:56:21: #1 read tag files...
INFO @ Tue, 04 Oct 2011 14:56:21: #1 read treatment tags...
INFO @ Tue, 04 Oct 2011 14:56:32: 1000000
INFO @ Tue, 04 Oct 2011 14:56:44: 2000000
INFO @ Tue, 04 Oct 2011 14:56:55: 3000000
INFO @ Tue, 04 Oct 2011 14:57:06: 4000000
INFO @ Tue, 04 Oct 2011 14:57:19: 5000000
INFO @ Tue, 04 Oct 2011 14:57:30: 6000000
INFO @ Tue, 04 Oct 2011 14:57:41: 7000000
INFO @ Tue, 04 Oct 2011 14:57:52: 8000000
INFO @ Tue, 04 Oct 2011 14:58:03: 9000000
INFO @ Tue, 04 Oct 2011 14:58:15: 10000000
INFO @ Tue, 04 Oct 2011 14:58:26: 11000000
INFO @ Tue, 04 Oct 2011 14:58:37: 12000000
INFO @ Tue, 04 Oct 2011 14:58:49: #1.2 read input tags...
Traceback (most recent call last):
  File "/home/g2main/linux2.6-x86_64/bin/macs", line 273, in <module>
    main()
  File "/home/g2main/linux2.6-x86_64/bin/macs", line 57, in main
    (treat, control) = load_tag_files_options (options)
  File "/home/g2main/linux2.6-x86_64/bin/macs", line 256, in load_tag_files_options
    control = options.build(open2(options.cfile, gzip_flag=options.gzip_flag))
  File "/home/g2main/linux2.6-x86_64/lib/python2.6/MACS/IO/__init__.py", line 1063, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/home/g2main/linux2.6-x86_64/lib/python2.6/MACS/IO/__init__.py", line 1102, in __fw_parse_line
    raise self.StrandFormatError(thisline,thisfields[5])
MACS.IO.StrandFormatError: 'Strand information can not be recognized in this line: "chr1\t10093\t10093\t10292\t61PDWAAXX100706:4:82:5766:21319

I can share this history if needed. Thanks.
Reply written 7.2 years ago by shamsher jagat
Hello,

The last line of the report you sent suggests that the file has format problems. I read through your other email and noticed that a few steps were inserted, perhaps because they were needed. However, the line noted here is not in BED format:

c1 chrom = chr1
c2 start = 10093
c3 end = 10093
c4 name = 10292
c5 score = 61PDWAAXX100706:4:82:5766:21319
c6 strand = no data

A description of BED format can be found at http://usegalaxy.org -> "Get Data -> Upload File" (scroll down to BED) or on most tool forms that use a BED file, such as "Convert Formats -> BED-to-GFF". A few guidelines (subject to amendment by UCSC readers!):

1 - Start is 0-based.
2 - Start is always a smaller number than end, as coordinates are reported with respect to the forward strand. Start and end are never the same value.
3 - Score is a value between 0 and 1000, where 0 means undefined.
4 - Strand can be "+", "-", or ".", where "." means undefined.
5 - BED files must have at least 3 columns, but can have up to 15. Any column used must have all preceding columns defined. Columns 7-12 are usually considered interdependent by the tools that use that data, as are columns 13-15 (newer spec, for microarray data).
6 - BED files are often BED3-6, BED12 or BED15.
7 - Some older tools will not recognize columns 13-15 as "strict BED" format.

Once the data is sorted out, if you continue to have problems, please send in a bug report from an error dataset and note in the comments that the bug is from you, if the account email address is different. Please be sure to leave all input datasets and the error dataset in the history until we can examine them and provide feedback.

Thanks,
Jen
Galaxy team
Reply written 7.2 years ago by Jennifer Hillman Jackson
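Jen's guidelines 1 to 5 can be turned into a quick per-line checker, which is a faster way to do the "detective work" than waiting for MACS to stop on the first bad row. A minimal Python sketch: `bed_problems` is a hypothetical helper, it checks only the rules listed in the reply (not the full UCSC specification, and not columns 7-15), and it accepts any number from 0 to 1000 as a score, including decimals, as a lenient choice.

```python
from typing import List

VALID_STRANDS = {"+", "-", "."}  # "." means strand undefined


def bed_problems(line: str) -> List[str]:
    """List which of the BED guidelines one tab-delimited line breaks (empty list = looks OK)."""
    cols = line.rstrip("\n").split("\t")
    if not 3 <= len(cols) <= 15:
        return [f"{len(cols)} columns; BED uses 3 to 15"]
    problems = []
    try:
        start, end = int(cols[1]), int(cols[2])
    except ValueError:
        problems.append("start (c2) and end (c3) must be integers")
    else:
        if start < 0:
            problems.append(f"start ({start}) is negative; starts are 0-based")
        if start >= end:
            problems.append(f"start ({start}) must be smaller than end ({end})")
    if len(cols) >= 5:
        try:
            score_ok = 0 <= float(cols[4]) <= 1000
        except ValueError:
            score_ok = False  # e.g. a read name that slid into the score column
        if not score_ok:
            problems.append(f"score (c5) {cols[4]!r} is not a number from 0 to 1000")
    if len(cols) >= 6 and cols[5] not in VALID_STRANDS:
        problems.append(f"strand (c6) {cols[5]!r} is not '+', '-' or '.'")
    return problems


examples = [
    "chr1\t12137\t12336\t61R33AAXX100706:1:79:7707:9270\t0\t-",
    "chr2\t121859840\t121860039\t61R33AAX\t.\t5837743",
    "chr1\t10093\t10093\t10292\t61PDWAAXX100706:4:82:5766:21319",
]
for ex in examples:
    print(ex.split("\t")[:3], bed_problems(ex) or "OK")
```

The last two examples are the lines quoted in the two MACS errors above. For the first, the checker points at the same field MACS rejected (the sixth column); for the second it flags start equal to end and a read name sitting in the score column, matching the column-by-column breakdown in the reply.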