Question: Remove sequences with duplicate chromosome start position
lem0 (USA/Chicago) wrote, 4.1 years ago:

I used Galaxy to extract the 1000 bp upstream of all UCSC genes (i.e. the promoters). I sorted the data by chromosome and then by start position (i.e. by c1, then by c2). For any gene with multiple isoforms that share the same start site, there will be duplicate chromosome start coordinates, and I want to remove these.


Essentially, column 2 contains the start coordinate. I want to remove all lines with a duplicate start coordinate (for a given chromosome). 

 

Thank you in advance for your wonderful help to a student who is still learning the computational basics.

remove duplicate column • 1.0k views
Bjoern Gruening (Germany) wrote, 4.1 years ago:

Hi Lem,

You can use the tool "Unique occurrences of each record" and apply it to c1 and c2 only. This setting is under the advanced options.
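If the option is hard to find in Galaxy, the same idea can be sketched outside Galaxy in a few lines of Python. This is only a minimal sketch, assuming a tab-separated file with the chromosome in column 1 and the start coordinate in column 2, and assuming you want to keep the first line for each chromosome/start pair and drop the later repeats. The function name `unique_by_chrom_start` and the example rows are made up for illustration.

```python
def unique_by_chrom_start(lines):
    """Yield the first line seen for each (chromosome, start) pair.

    Assumes tab-separated lines with the chromosome in column 1 and
    the start coordinate in column 2 (hypothetical layout).
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Blank or malformed line: pass it through unchanged.
            yield line
            continue
        key = (fields[0], fields[1])  # (chromosome, start)
        if key not in seen:
            seen.add(key)
            yield line


# Made-up example rows: two isoforms of one gene share chr1:1000.
rows = [
    "chr1\t1000\t2000\tgeneA.1\n",
    "chr1\t1000\t2000\tgeneA.2\n",
    "chr1\t5000\t6000\tgeneB\n",
    "chr2\t1000\t2000\tgeneC\n",
]
deduped = list(unique_by_chrom_start(rows))
```

Because the seen pairs are kept in a set, the input does not need to be sorted first. To run it on a real file, pass an open file handle instead of `rows` and write the yielded lines to a new file.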

Cheers,

Bjoern


Thanks, Bjoern. I do not see this option, though, nor can I find an 'advanced options' tab.

Reply written 4.1 years ago by lem0