Question: Merge Tool And Strand Orientation
Eckart Bindewald • 30 wrote:
Hello,
Let me start by saying that I am very impressed by the service the
Galaxy web server provides to the community; it has proven very useful
for my work.
Today I came across a situation that puzzles me. I am trying to merge
exons corresponding to the same gene (but possibly from different
splice variants).
At the bottom of this email I list, as an example, the 153 exons
that belong to the different splice variants of FlyBase gene
CG32491. I obtained them by applying the tool "Select lines that
match an expression" with the pattern .+CG32491-. to the data
set of FlyBaseGene exons (110,472 exons, genome assembly dm3). I am
using BED format and the general Galaxy web server.
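For readers who want to reproduce the selection step outside Galaxy, here is a minimal sketch. It assumes the tool behaves like a regular-expression search over each whole line, which is my reading of it rather than something I have checked in the tool's source. The function name select_matching and the chr2L record are made up for illustration; the two CG32491 records are taken from the listing at the bottom of this email.

```python
import re

# The pattern from the post: at least one character, then "CG32491-",
# then one more character (the first letter of the transcript suffix).
PATTERN = re.compile(r".+CG32491-.")

def select_matching(lines, pattern=PATTERN):
    """Keep the lines in which the pattern is found anywhere (hypothetical helper)."""
    return [line for line in lines if pattern.search(line)]

bed_lines = [
    "chr3R\t17177330\t17177608\tCG32491-RR_exon_0_0_chr3R_17177331_r\t0\t-",
    "chr3R\t17184021\t17184318\tCG32491-RL_exon_0_0_chr3R_17184022_r\t0\t-",
    # Made-up record for a different gene, only here to show a non-match.
    "chr2L\t100\t200\tCG99999-RA_exon_0_0_chr2L_101_f\t0\t+",
]

selected = select_matching(bed_lines)
for line in selected:
    print(line)
```

Only the two CG32491 lines are kept; the made-up chr2L line is filtered out.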
If I now apply the "Merge" tool to these intervals, I obtain 26
intervals (listed further below). Subtracting this merge result from
the original 153 exons with the "Subtract" tool then leaves 8
"leftover" regions that I did not expect. Somehow they seem to be
missing from the merge result.
I then deactivated the strand information in the interval set of 153
exons. Applying the merge tool now results in 34 intervals (again
listed below). Checking the result via the subtract tool (subtracting
the merge result from the original data set of 153 exons) results, as
expected, in zero intervals.
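To make the expectation concrete, here is a small sketch of the merge-then-check procedure described above. It is not Galaxy's implementation: merge_intervals and uncovered are hypothetical helpers, and the grouping rule (one group per chromosome, or per chromosome and strand value) is an assumption about what "using strand" could mean. The six records are exons from the listing at the bottom of this email, with the strand column exactly as it appears there.

```python
from collections import defaultdict

def merge_intervals(records, use_strand):
    """Merge overlapping (chrom, start, end, strand) records.

    With use_strand=True, records are merged only within the same
    (chrom, strand) group; otherwise strand is ignored and set to None.
    """
    groups = defaultdict(list)
    for chrom, start, end, strand in records:
        key = (chrom, strand) if use_strand else (chrom, None)
        groups[key].append((start, end))

    merged = []
    for (chrom, strand), spans in groups.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start < cur_end:  # strict overlap of half-open intervals
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end, strand))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand))
    return sorted(merged, key=lambda m: (m[0], m[1], m[2]))

def uncovered(records, merged):
    """Return the records that overlap none of the merged intervals."""
    return [
        (chrom, start, end, strand)
        for chrom, start, end, strand in records
        if not any(c == chrom and s < end and start < e for c, s, e, _ in merged)
    ]

exons = [
    ("chr3R", 17177760, 17178358, "-"),  # CG32491-RA exon 0
    ("chr3R", 17178092, 17178959, "-"),  # CG32491-RF exon 0
    ("chr3R", 17184011, 17184791, "-"),  # CG32491-RK exon 0
    ("chr3R", 17184021, 17184318, "-"),  # CG32491-RL exon 0
    ("chr3R", 17195183, 17195967, "."),  # CG32491-RU exon 0
    ("chr3R", 17195821, 17196364, "-"),  # CG32491-RS exon 0
]

no_strand = merge_intervals(exons, use_strand=False)
by_strand = merge_intervals(exons, use_strand=True)

print(len(no_strand), len(by_strand))
print(uncovered(exons, no_strand), uncovered(exons, by_strand))
```

In this sketch the strand-agnostic merge yields three intervals and the per-strand merge yields four, because the RU and RS exons overlap but carry different values in the strand column. In both cases every input exon is still covered by some merged interval, which is the behaviour I expected from the Merge tool.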
So my questions are:
- Is this the intended functionality of the tools? If so, maybe
statements regarding these issues could be added to the tool
documentation.
- Why does the outcome of the merge operation depend on whether the
"strand" column is set or not? The original set of intervals all had
the same negative strand orientation, so it appears to me that the
merge operation should give the same result in both cases.
- Subtracting the merged intervals (that do not have strand
information) from the set of 153 intervals results in 8 intervals that
now have positive strand orientation (they originally had negative
strand orientation). Why does subtracting a set of intervals without
strand information from a set of intervals with strand information
change the strand orientation of the first set?
Any comments are highly appreciated!
Thanks,
Eckart
Dr. Eckart Bindewald (Contractor)
SAIC-Frederick, Inc.
Center for Cancer Research Nanobiology Program
National Cancer Institute
P.O. Box B
Frederick, MD 21702 USA
Phone: 301-846-5538
Fax: 301-846-5598
E-mail: eckart@mail.nih.gov
Here is the result (34 regions) of the merge operation (not using
strand orientation) applied to the 153 exon regions listed further
below:
chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17186111 17186276
chr3R 17186349 17187009
chr3R 17187119 17187332
chr3R 17187391 17187860
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17191725 17192060
chr3R 17192171 17192466
chr3R 17193631 17193960
chr3R 17194101 17194784
chr3R 17195183 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here is the result (26 regions) of the merge operation (using strand
orientation) applied to the 153 exon regions listed further below:
chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17195821 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here are the 8 "leftover" regions from the original 153 exons that do
not intersect any of the 26 merged regions (output of the subtract
tool, keeping the exons that do not overlap the 26 merged intervals;
note the changed strand orientation):
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 +
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 +
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 +
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 +
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 +
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 +
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 +
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 +
Here are the 153 exons related to FlyBase gene CG32491, obtained by
applying the tool "Select lines that match an expression" with the
pattern .+CG32491-. to the data set of FlyBaseGene exons
(110,472 exons). Each record is wrapped over two lines:
chr3R 17177330 17177608
CG32491-RR_exon_0_0_chr3R_17177331_r 0 -
chr3R 17200781 17201634
CG32491-RR_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RR_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RR_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RR_exon_4_0_chr3R_17203010_r 0 -
chr3R 17177760 17178358
CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17200781 17201634
CG32491-RA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17178092 17178959
CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17200781 17201634
CG32491-RF_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RF_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RF_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RF_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179070 17179456
CG32491-RD_exon_0_0_chr3R_17179071_r 0 -
chr3R 17200781 17201634
CG32491-RD_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RD_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RD_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RD_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179617 17180053
CG32491-RAC_exon_0_0_chr3R_17179618_r 0 -
chr3R 17200781 17201634
CG32491-RAC_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RAC_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RAC_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RAC_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416
CG32491-RG_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17180811
CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634
CG32491-RG_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RG_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RG_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RG_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416
CG32491-RH_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17181279
CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634
CG32491-RH_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RH_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RH_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RH_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416
CG32491-RQ_exon_0_0_chr3R_17180160_r 0 -
chr3R 17200781 17201634
CG32491-RQ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RQ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RQ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RQ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180941 17181279
CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
chr3R 17181479 17181973
CG32491-RB_exon_1_0_chr3R_17181480_r 0 -
chr3R 17200781 17201634
CG32491-RB_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RB_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RB_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RB_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182071 17182426
CG32491-RI_exon_0_0_chr3R_17182072_r 0 -
chr3R 17182532 17182690
CG32491-RI_exon_1_0_chr3R_17182533_r 0 -
chr3R 17200781 17201634
CG32491-RI_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RI_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RI_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RI_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182776 17183086
CG32491-RJ_exon_0_0_chr3R_17182777_r 0 -
chr3R 17200781 17201634
CG32491-RJ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RJ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RJ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RJ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17183242 17183480
CG32491-RP_exon_0_0_chr3R_17183243_r 0 -
chr3R 17183726 17183926
CG32491-RP_exon_1_0_chr3R_17183727_r 0 -
chr3R 17200781 17201634
CG32491-RP_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RP_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RP_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RP_exon_5_0_chr3R_17203010_r 0 -
chr3R 17184011 17184791
CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17200781 17201634
CG32491-RK_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RK_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RK_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RK_exon_4_0_chr3R_17203010_r 0 -
chr3R 17184021 17184318
CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17200781 17201634
CG32491-RL_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RL_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RL_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RL_exon_4_0_chr3R_17203010_r 0 -
chr3R 17186111 17186276
CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009
CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17200781 17201634
CG32491-RT_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RT_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RT_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RT_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187119 17187332
CG32491-RZ_exon_0_0_chr3R_17187120_f 0 .
chr3R 17187391 17187860
CG32491-RZ_exon_1_0_chr3R_17187392_f 0 .
chr3R 17200781 17201634
CG32491-RZ_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RZ_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RZ_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RZ_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187909 17188590
CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17200781 17201634
CG32491-RM_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RM_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RM_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RM_exon_4_0_chr3R_17203010_r 0 -
chr3R 17188688 17189606
CG32491-RE_exon_0_0_chr3R_17188689_r 0 -
chr3R 17200781 17201634
CG32491-RE_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RE_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RE_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RE_exon_4_0_chr3R_17203010_r 0 -
chr3R 17189739 17190097
CG32491-RAB_exon_0_0_chr3R_17189740_r 0 -
chr3R 17200781 17201634
CG32491-RAB_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RAB_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RAB_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RAB_exon_4_0_chr3R_17203010_r 0 -
chr3R 17190173 17190367
CG32491-RC_exon_0_0_chr3R_17190174_r 0 -
chr3R 17190435 17190714
CG32491-RC_exon_1_0_chr3R_17190436_r 0 -
chr3R 17200781 17201634
CG32491-RC_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RC_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RC_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RC_exon_5_0_chr3R_17203010_r 0 -
chr3R 17191725 17192060
CG32491-RY_exon_0_0_chr3R_17191726_f 0 .
chr3R 17200781 17201634
CG32491-RY_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RY_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RY_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RY_exon_4_0_chr3R_17203010_f 0 .
chr3R 17192171 17192466
CG32491-RX_exon_0_0_chr3R_17192172_f 0 .
chr3R 17200781 17201634
CG32491-RX_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RX_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RX_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RX_exon_4_0_chr3R_17203010_f 0 .
chr3R 17193631 17193960
CG32491-RW_exon_0_0_chr3R_17193632_f 0 .
chr3R 17200781 17201634
CG32491-RW_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RW_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RW_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RW_exon_4_0_chr3R_17203010_f 0 .
chr3R 17194101 17194784
CG32491-RV_exon_0_0_chr3R_17194102_f 0 .
chr3R 17200781 17201634
CG32491-RV_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RV_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RV_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RV_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195183 17195967
CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17200781 17201634
CG32491-RU_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463
CG32491-RU_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798
CG32491-RU_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121
CG32491-RU_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195821 17196364
CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
chr3R 17200781 17201634
CG32491-RS_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RS_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RS_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RS_exon_4_0_chr3R_17203010_r 0 -
chr3R 17196654 17196949
CG32491-RAA_exon_0_0_chr3R_17196655_r 0 -
chr3R 17200781 17201634
CG32491-RAA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RAA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RAA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RAA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197044 17197789
CG32491-RO_exon_0_0_chr3R_17197045_r 0 -
chr3R 17200781 17201634
CG32491-RO_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RO_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RO_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RO_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197884 17198802
CG32491-RN_exon_0_0_chr3R_17197885_r 0 -
chr3R 17200781 17201634
CG32491-RN_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463
CG32491-RN_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798
CG32491-RN_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121
CG32491-RN_exon_4_0_chr3R_17203010_r 0 -
Written 8.0 years ago by Eckart Bindewald; modified 8.0 years ago by Jennifer Hillman Jackson.