Question: Any tool to expand and extract genome sequence from a given start coordinate?
4.1 years ago by
zhhxu90 wrote:

Hello everyone,

I have some genome coordinates with only the start site. I want to extract 50 bp of genome sequence both upstream and downstream of each start coordinate. Does anyone know a tool that can do this job, either on Galaxy or not? Thanks a lot.


extract genome sequence • 1.8k views
4.1 years ago by
Bjoern Gruening wrote:


Please browse through the text manipulation tools. I would try "Compute an expression on every row" to add 50 to your coordinate column and, in a second step, subtract 50 from it. You can then use the tool "Cut columns from a table" to create your own BED file. Think of Galaxy as a huge box of Lego bricks. As soon as you know what every brick looks like, you can build beautiful things with it :)
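For anyone who wants to do the same arithmetic outside Galaxy, here is a minimal Python sketch of the add-50/subtract-50 step. It assumes the start coordinate is already a 0-based BED start and that the window is clamped at 0; the function names and the second example site are made up for illustration.

```python
def flank_window(chrom, start, flank=50):
    """Return a BED-style (chrom, start, end) interval around `start`.

    Assumption: `start` is a 0-based BED start, so the window
    [start - flank, start + flank) is 2 * flank bp long unless it is
    clamped at the beginning of the chromosome.
    """
    left = max(0, start - flank)   # never go below position 0
    right = start + flank          # BED end is exclusive
    return chrom, left, right


def to_bed_lines(sites, flank=50):
    """Format (chrom, start) pairs as tab-separated BED lines."""
    return ["{}\t{}\t{}".format(*flank_window(c, s, flank)) for c, s in sites]


# Hypothetical input: one site from this thread plus one near a chromosome start.
sites = [("chr1", 68029345), ("chr2", 30)]
for line in to_bed_lines(sites):
    print(line)
```

The sketch does not clamp at the chromosome end, because that needs the chromosome lengths of the genome build in use.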



But I do not have any sequence yet. All I have are start coordinates, for example Chr1: 68029345, and I have thousands of them. What I want is to locate these coordinates on a specific genome and extract 100 bp of genome sequence around each start coordinate (50 bp upstream, 50 bp downstream). Any suggestions?
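If the coordinates really are stored as text like "Chr1: 68029345", they first have to be split into a chromosome name and a number. A small sketch, assuming one site per string in exactly that "name: position" layout (the function name is hypothetical):

```python
def parse_site(text):
    """Split a string such as 'Chr1: 68029345' into ('Chr1', 68029345).

    Assumption: chromosome name and position are separated by a single
    colon; thousands separators in the position are tolerated.
    """
    chrom, pos = text.split(":")
    return chrom.strip(), int(pos.strip().replace(",", ""))


print(parse_site("Chr1: 68029345"))
```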



I think I have found a way to solve this problem. Thanks for your advice.


Do you mind sharing your answer to help other people? My answer did not need any sequence. But I guess I have not understood the question properly.

I am analysing ChIP-Seq data and want to find the consensus motif bound by the transcription factor around each peak summit. However, the summits I have are only start coordinates in .BED format. Here is what I did:

First, use Excel (the idea comes from Bjoern) to compute the window around each coordinate, and save the file in .BED format.

Second, open ChIPseek (a web tool for annotating ChIP peaks), upload the BED file, and select the corresponding species (only a limited number of species are supported). The output page has a function called "Get peak sequence", which exports the sequences in .fasta format.

Maybe this is a clumsy way, but it works for me. If anyone knows a better way or tool, suggestions are welcome.
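As one possible alternative for species that a web tool does not cover, the sequence can be sliced directly from a genome FASTA file. The sketch below is a toy version of that idea, not what ChIPseek does internally: it assumes the genome fits in memory, that BED intervals are 0-based and half-open, and it uses an invented two-line "chr1" so that it runs on its own.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}.

    The sequence name is taken as the first word of each '>' header.
    """
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def extract(genome, chrom, start, end):
    """Return the sequence of the BED interval [start, end), clamped to the chromosome."""
    seq = genome[chrom]
    return seq[max(0, start):min(len(seq), end)]


# Toy genome for illustration only; a real run would pass an open FASTA file.
toy = [">chr1 toy", "ACGTACGTAC", "GGGTTTAAAC"]
genome = read_fasta(toy)
print(extract(genome, "chr1", 8, 13))
```

For whole genomes, a command-line tool such as bedtools getfasta does the same job from a BED file without loading everything into memory.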




