Question: Merging Intervals Within Genes?
10.1 years ago, Denis BAURAIN wrote:
Hello,

I use the remote version of Galaxy to extract aligned blocks of 3'-UTRs. For performance reasons, I would like to merge overlapping 3'-UTR exons before fetching the MAF blocks. The 'merge overlapping intervals' option does nearly what I need, except that it discards any ID information attached to the exons (see below). I therefore wonder whether it would be difficult to implement a 'gene-aware' version of the merge operation. In particular, it should not merge overlapping exons that belong to different overlapping genes (e.g., one on each strand). I have my own implementation of this in Perl, but it is a bit tedious to export and re-import interval files just to perform this 'compression'.

# input:
chr10 100133312 100133544 - ENSG00000119943
chr10 100165944 100167310 - ENSG00000107521
chr10 100165945 100167310 - ENSG00000107521
chr10 100166796 100167310 - ENSG00000107521
chr10 100208864 100209320 - ENSG00000172987
chr10 100208866 100209320 - ENSG00000172987
chr10 100209320 100209486 - ENSG00000172987
chr10 100211496 100211532 - ENSG00000172987
...
chr10 12235787 12235825 + ENSG00000065665
chr10 12237333 12237409 + ENSG00000065665
chr10 12246459 12246832 + ENSG00000065665
chr10 12246459 12247368 + ENSG00000065665
chr10 12251332 12251962 + ENSG00000065665
chr10 12248507 12248728 - ENSG00000165609
chr10 12249580 12249706 - ENSG00000165609
chr10 12249581 12249706 - ENSG00000165609
chr10 12251726 12252150 - ENSG00000165609
chr10 12252236 12252857 - ENSG00000165609
...

# current output:
chr10 100133312 100133544
chr10 100165944 100167310
chr10 100208864 100209486
chr10 100211496 100211532
...
chr10 12235787 12235825
chr10 12237333 12237409
chr10 12246459 12247368
chr10 12248507 12248728
chr10 12249580 12249706
chr10 12251332 12252150
chr10 12252236 12252857
...

# desired output:
chr10 100133312 100133544 - ENSG00000119943
chr10 100165944 100167310 - ENSG00000107521
chr10 100208864 100209486 - ENSG00000172987
chr10 100211496 100211532 - ENSG00000172987
...
chr10 12235787 12235825 + ENSG00000065665
chr10 12237333 12237409 + ENSG00000065665
chr10 12246459 12247368 + ENSG00000065665
chr10 12251332 12251962 + ENSG00000065665
chr10 12248507 12248728 - ENSG00000165609
chr10 12249580 12249706 - ENSG00000165609
chr10 12251726 12252150 - ENSG00000165609
chr10 12252236 12252857 - ENSG00000165609
...

Best regards,
Denis BAURAIN
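A minimal sketch of the 'gene-aware' merge described above, written in Python rather than Perl. It assumes the five whitespace-separated columns shown in the example (chrom, start, end, strand, gene ID) and BED-style half-open coordinates. It also merges book-ended intervals, as both the current and the desired outputs do for 100208864-100209320 and 100209320-100209486. The function names (`parse_line`, `merge_within_genes`, `merge_bed_lines`) are hypothetical and not part of Galaxy.

```python
"""Gene-aware merging of overlapping intervals (hypothetical helpers, not a Galaxy tool)."""
from typing import Iterable, List, Tuple

# (chrom, start, end, strand, gene_id); coordinates assumed 0-based half-open, as in BED
Record = Tuple[str, int, int, str, str]


def parse_line(line: str) -> Record:
    """Parse one whitespace-separated 'chrom start end strand gene_id' line."""
    chrom, start, end, strand, gene = line.split()[:5]
    return chrom, int(start), int(end), strand, gene


def merge_within_genes(records: Iterable[Record]) -> List[Record]:
    """Merge overlapping or book-ended intervals only within the same
    (chrom, strand, gene_id) group, keeping strand and gene ID.

    Groups are emitted in order of first appearance; intervals within a
    group are emitted sorted by start.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for chrom, start, end, strand, gene in records:
        groups.setdefault((chrom, strand, gene), []).append((start, end))

    merged: List[Record] = []
    for (chrom, strand, gene), spans in groups.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps (or abuts) the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current run and start a new one.
                merged.append((chrom, cur_start, cur_end, strand, gene))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand, gene))
    return merged


def merge_bed_lines(lines: Iterable[str]) -> List[str]:
    """Skip blank and '#' lines, merge the rest, and return tab-separated lines."""
    records = [parse_line(l) for l in lines if l.strip() and not l.startswith("#")]
    return ["\t".join(map(str, r)) for r in merge_within_genes(records)]
```

Because intervals are grouped by (chrom, strand, gene ID) before merging, the + exon chr10 12251332 12251962 of ENSG00000065665 and the - exon chr10 12251726 12252150 of ENSG00000165609 stay separate, whereas the strand-unaware merge fused them into 12251332 12252150. Outside Galaxy it could be called as `print("\n".join(merge_bed_lines(open("utr_exons.txt"))))`, where the file name is a placeholder.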
Powered by Biostar version 16.09