Question: Join On Genomic Interval
7.3 years ago by
Seth Kasowitz wrote:
Hello,

I am hoping for some clarification on how Join on Genomic Intervals functions. I have two lists of intervals: mapped reads and a list of exons. If I join the two (INNER JOIN), I expect multiple reads to join with the same exon, and I do see this in the output. What confuses me is that some outputs contain more joined intervals than there were input reads.

For example: I join 17,000,000 mapped reads with a list of 300,000 exons and retrieve 21,000,000 joined intervals.

I must be misunderstanding what the function does, and am hoping someone can explain how the output can have more lines than the reads submitted.

Thank you,
Seth

--
Seth Kasowitz
University of Connecticut
Department of Molecular and Cellular Biology
seth.kasowitz@uconn.edu
Beach Hall Room 335 (6-3580)
7.3 years ago by
United States
Jennifer Hillman Jackson wrote:
Hello Seth,

It sounds like some reads are also joining with more than one exon, so there is a many-to-many relationship in the output. This is not uncommon (especially if there are multiple reads per gene cluster), and it results in an input read being reported more than once in the output.

Depending on the data, separating the join into two by strand and/or increasing the minimum overlap may be appropriate.

FAQ/Screencast: http://wiki.g2.bx.psu.edu/Learn/Interval%20Operations

Hopefully this helps,

Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
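To make the many-to-many behaviour concrete, here is a minimal Python sketch of an inner join on overlapping intervals. It is a brute-force illustration only, not Galaxy's implementation: the function names (`overlap_len`, `inner_join`), the `min_overlap` parameter, the half-open coordinates, and the toy reads and exons are all assumptions made up for this example.

```python
from typing import List, Tuple

# (chrom, start, end, name); coordinates assumed half-open: [start, end)
Interval = Tuple[str, int, int, str]


def overlap_len(a: Interval, b: Interval) -> int:
    """Bases shared by a and b; 0 if disjoint or on different chromosomes."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def inner_join(reads: List[Interval], exons: List[Interval],
               min_overlap: int = 1) -> List[Tuple[Interval, Interval]]:
    """Emit one output row per (read, exon) pair that overlaps enough."""
    return [(r, e) for r in reads for e in exons
            if overlap_len(r, e) >= min_overlap]


# Hypothetical toy data: exonA and exonB overlap each other,
# as exons from two transcripts of the same gene might.
reads = [
    ("chr1", 100, 150, "read1"),  # overlaps exonA and exonB
    ("chr1", 120, 170, "read2"),  # overlaps exonA and exonB
    ("chr1", 400, 450, "read3"),  # overlaps exonC only
    ("chr1", 900, 950, "read4"),  # overlaps no exon, dropped by inner join
]
exons = [
    ("chr1", 90, 200, "exonA"),
    ("chr1", 140, 260, "exonB"),
    ("chr1", 380, 500, "exonC"),
]

joined = inner_join(reads, exons)
print(len(reads), "reads ->", len(joined), "joined rows")

# Requiring a larger overlap removes the weakest pairing (read1/exonB, 10 bp).
stricter = inner_join(reads, exons, min_overlap=20)
print(len(reads), "reads ->", len(stricter), "joined rows at min_overlap=20")
```

In this toy case four reads yield five joined rows, because read1 and read2 each pair with two exons. The same effect at scale would account for more joined intervals than input reads, and raising the minimum overlap trims the marginal pairings.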