Question: Any tool to expand and extract genome sequence around a given start coordinate?
4.1 years ago by
zhhxu90
Canada
zhhxu90 wrote:

Hello everyone,

I have some genome coordinates with only the start site. I want to extract 50 bp of genome sequence both upstream and downstream of each start coordinate. Does anyone know a tool that can do this job, either on Galaxy or not? Thanks a lot.

Zhenhua

extract genome sequence • 1.8k views
modified 4.1 years ago by Bjoern Gruening5.1k • written 4.1 years ago by zhhxu90
4.1 years ago by
Bjoern Gruening5.1k
Germany
Bjoern Gruening5.1k wrote:

Hi,

Please browse through the text manipulation tools. I would try "Compute an expression on every row" to add 50 to your coordinate column and, in a second step, subtract 50. You can then use the tool 'Cut columns from a table' to create your own BED file. Think of Galaxy as a huge box of Lego bricks: once you know what every brick looks like, you can build beautiful things with it :)
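For readers working outside Galaxy, the same add-50/subtract-50 step can be sketched in a few lines of Python. This is only an illustration: the function names `summit_to_bed_window` and `write_bed` are made up for this example, and it assumes the window should run from position minus 50 to position plus 50 in BED-style (0-based, half-open) coordinates, which gives 100 bp per site.

```python
def summit_to_bed_window(chrom, pos, flank=50):
    """Return a (chrom, start, end) interval spanning `flank` bp on each side of pos.

    Assumption: the result is used as a BED interval (0-based, half-open),
    so end - start equals 2 * flank unless the window is clamped at 0.
    """
    start = max(0, pos - flank)  # clamp so sites near the chromosome start stay valid
    end = pos + flank
    return chrom, start, end


def write_bed(rows, out_path, flank=50):
    """Write one BED line per (chrom, pos) pair in `rows`."""
    with open(out_path, "w") as out:
        for chrom, pos in rows:
            fields = summit_to_bed_window(chrom, int(pos), flank)
            out.write("\t".join(str(f) for f in fields) + "\n")
```

For example, `summit_to_bed_window("chr1", 68029345)` gives the interval chr1, 68029295, 68029395. The window end is not checked against the chromosome length here, so sites very close to a chromosome end would need an extra check.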

Cheers,

Bjoern

written 4.1 years ago by Bjoern Gruening5.1k

Hi,

But I do not have any sequence yet. All I have are start coordinates, for example Chr1: 68029345, and I have thousands of them. What I want is to locate these coordinates on a specific genome and extract the 100 bp of genome sequence around each start coordinate (50 bp upstream, 50 bp downstream). Any suggestions?

Thanks.

Zhenhua

written 4.1 years ago by zhhxu90

Hi,

I think I have found a way to solve this problem. Thanks for your advice.

Zhenhua

written 4.1 years ago by zhhxu90

Do you mind sharing your answer to help other people? My answer did not need any sequence. But I guess I have not understood the question properly.

written 4.1 years ago by Bjoern Gruening5.1k

I am analysing ChIP-Seq data and want to find the consensus motif bound by a transcription factor around the peak summit. However, the summits I have are only start coordinates in BED format. Here is what I did:

First, use Excel (the idea comes from Bjoern) to compute the window around each coordinate, and save the file in BED format.

Second, open http://chipseek.cgu.edu.tw/index_show.py (ChIPseek, a web tool for annotating ChIP peaks), upload the BED file and select the corresponding species (only a limited number of species are supported). The output page has a function called "Get peak sequence", which exports the sequences in FASTA format.

Maybe this is a clumsy way, but it works for me. If anyone knows a better way or tool, please suggest it.
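If your species is not supported by the web tool and you have the genome as a FASTA file, the sequence-extraction step could also be done locally. Below is a minimal Python sketch, not a tested pipeline: `read_fasta` and `extract_windows` are hypothetical helper names, the BED coordinates are assumed to be 0-based and half-open, and the chromosome names in the BED file are assumed to match the FASTA headers exactly.

```python
def read_fasta(path):
    """Load a FASTA file into a dict mapping sequence name -> sequence string.

    The name is the first whitespace-delimited word after '>'.
    The whole file is held in memory, so this suits small genomes only.
    """
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_windows(genome, bed_lines):
    """Yield (header, sequence) for each BED line (0-based, half-open intervals)."""
    for line in bed_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        yield f"{chrom}:{start}-{end}", genome[chrom][start:end]
```

Each yielded pair can be written out as `>header` followed by the sequence to get a FASTA file for motif searching. For large genomes, an indexed tool is likely a better fit; for instance, `bedtools getfasta -fi genome.fa -bed windows.bed -fo windows.fa` does this step directly (the file names here are placeholders).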

 

Thanks.

Zhenhua

written 4.1 years ago by zhhxu90