I have 17 genomic datasets. I reduced each dataset to a single column that combines the chromosome number and starting number. Then I used "Compare two datasets" to compare one dataset against another, looking for commonalities. My goal is to create one file with a single column containing all of the genetic information that is found in all 17 datasets.
After each comparison, I took the newly created dataset, compared it to the next genetic file, and continued this process. For the first few comparisons, the newly created file got smaller, which makes sense, since entries not found in both datasets are removed.
However, the count drops below 1 million commonalities after about 4 comparisons and then starts going back up. How is it possible to compare two files for similarities and have the number get bigger?
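
To make clear what I expect, here is a minimal Python sketch of the result I am after, written outside the tool. It assumes each dataset is just a list of combined keys such as "chr1:100" (that key format and the function names are my own, made up for illustration), and it treats each dataset as a set, so duplicate keys are counted once. Under that assumption the running count can only stay the same or shrink.

```python
def common_keys(datasets):
    """Return the keys present in every dataset.

    Each dataset is an iterable of combined keys (hypothetical
    "chromosome:start" strings). Duplicates are collapsed by set().
    """
    key_sets = [set(d) for d in datasets]
    if not key_sets:
        return set()
    return set.intersection(*key_sets)


def running_sizes(datasets):
    """Return the number of shared keys after each successive comparison."""
    sizes = []
    current = None
    for d in datasets:
        keys = set(d)
        # First dataset seeds the result; later ones can only narrow it.
        current = keys if current is None else current & keys
        sizes.append(len(current))
    return sizes


# Toy data standing in for three of the datasets (not real results).
toy = [
    ["chr1:100", "chr1:200", "chr2:50", "chr1:100"],
    ["chr1:100", "chr2:50", "chr3:7"],
    ["chr2:50", "chr1:100", "chr1:100"],
]

print(sorted(common_keys(toy)))
print(running_sizes(toy))
```

With this kind of set-based comparison the sizes never increase from one step to the next, which is why the growing numbers surprise me. I do not know whether "Compare two datasets" collapses duplicate keys the way set() does here.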