Question: Flanking regions from SNP list
3.2 years ago by
d.angra50
United Kingdom
d.angra50 wrote:

Hello

I am using RNA-seq data and have performed a de novo assembly for a non-model species. I have used many tools in Galaxy and have reached the point where I have a list of about 80,000 SNPs. To validate the SNPs I now need a powerful tool. Is there a method in Galaxy to fetch the sequences around SNPs? I saw a "flanking sequence" tool in the Genome Diversity suite, but unfortunately I am not able to create the gd_snp file that seems to be required to use it. Any help on how to get sequences from a SNP list?

I have nucleotide sequences of validated markers. But how can I compare them with the sequences from the SNP list if I don't have my SNP sequences at all?

Thank you in advance.

Viva

rna-seq • 1.9k views
modified 3.2 years ago by Jennifer Hillman Jackson25k • written 3.2 years ago by d.angra50
3.2 years ago by
Guy Reeves1.0k
Germany
Guy Reeves1.0k wrote:

Hi Viva

Have you tried the following tool on usegalaxy.org?

Genome Diversity>DESIGN GENOTYPING STUDIES>Convert : CSV, FSTAT, Genepop or VCF to either gd_snp or gd_genotype (Galaxy Tool Version 1.0.0)

This should convert your VCF to gd_snp, which would allow you to use the "flanking sequence" tool you mentioned above.

Thanks, Guy

written 3.2 years ago by Guy Reeves1.0k

Definitely use this if the input file is appropriate! It wasn't clear whether this was where the problem originally was. If it is problematic, use the other advice I gave. Almost any file can be created with Galaxy's data tools. Jen

written 3.2 years ago by Jennifer Hillman Jackson25k

Hello Guy

Thank you for your suggestion. I tried what you suggested, but "flanking sequence" did not work: it throws the error "reference species" missing. I have added a genome build to my dataset, in case that could be the problem.

My possible explanation is this: since I am doing a de novo assembly, I have no reference as such, but I do have one background genome that I have aligned to. I need to find SNPs between the assembly and this background genome in order to get around the reference species problem. Do you think this is correct?

Thanks

Viva

written 3.2 years ago by d.angra50
3.2 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

This tool accepts a tabular input file, or just coordinates and genome build information. For the genome build, enter the "dbkey" (the short label) of the reference genome. If you are using a Custom reference genome, go one step further: create a Custom Build with it, assign it to the dataset, and use the dbkey you created as this value.

Let us know if you run into problems. There is another tool to get flanks in the group Operate on Genomic Intervals, but it was not created with the add-on functionality to swap in given SNPs. A simple tool that will swap in SNPs for any defined genomic region is in the works. It is not on the public server yet (possibly in the future), but it will be in the Tool Shed for local/cloud use soon.

That said, this manipulation is likely possible per-SNP with a combination of several existing tools, but I do not have a workflow to share. If anyone else reading does, please publish at http://usegalaxy.org and share the link here.
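
As a rough illustration of what such a per-SNP manipulation does, here is a minimal Python sketch that runs outside Galaxy. It is not the Galaxy tool's implementation. The function names `read_fasta` and `flanking_sequence`, the `flank` parameter and the `left[REF/ALT]right` output notation are all assumptions made for this example.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {contig name: sequence}."""
    contigs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the ID before any whitespace
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


def flanking_sequence(contigs, contig, pos, ref, alt, flank=50):
    """Return 'left[REF/ALT]right' for a SNP at a 1-based position.

    The bracket notation is a hypothetical output format chosen for
    this sketch; flanks are truncated at the ends of the contig.
    """
    seq = contigs[contig]
    idx = pos - 1  # VCF positions are 1-based, Python strings are 0-based
    if idx < 0 or idx >= len(seq):
        raise ValueError(f"position {pos} is outside contig {contig}")
    if seq[idx].upper() != ref.upper():
        raise ValueError(f"REF {ref} does not match {seq[idx]} at {contig}:{pos}")
    left = seq[max(0, idx - flank):idx]
    right = seq[idx + 1:idx + 1 + flank]
    return f"{left}[{ref}/{alt}]{right}"


contigs = read_fasta([">contig_1 example", "ACGTACGTAC"])
print(flanking_sequence(contigs, "contig_1", 5, "A", "G", flank=3))
# prints CGT[A/G]CGT
```

The check against the REF allele is there because a mismatch usually means the coordinates and the FASTA do not belong to the same assembly.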

Best, Jen, Galaxy team

written 3.2 years ago by Jennifer Hillman Jackson25k
3.2 years ago by
d.angra50
United Kingdom
d.angra50 wrote:

Hi Jen

Thanks for the reply. I have a custom reference and have added it as a genome build. I would also like to know how to get coordinates from the VCF file, as the VCF file only gives me the coordinate on the assembly (that is, the contig) and the position of the SNP. To get a tabular form I need the start and end of an interval on the contig (meaning the start and end of the region in which the SNP is located). How do I go about doing this? I have intersected the VCF with a BED file, but the result is just another VCF file with no change at all. Is this normal, or have I done something incorrectly?

Regards

Viva

written 3.2 years ago by d.angra50
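
For readers with the same question: a VCF record carries only a contig name and a 1-based position, so an interval has to be computed from it. The Python sketch below shows one way to do that, assuming a fixed window around each SNP and zero-based, half-open (BED-style) output. The function name `vcf_to_intervals`, the `flank` size and the optional `contig_lengths` lookup are hypothetical and not part of any Galaxy tool.

```python
def vcf_to_intervals(vcf_lines, flank=50, contig_lengths=None):
    """Turn VCF records into (contig, start, end, name) tuples.

    start/end are zero-based, half-open coordinates covering the SNP
    plus `flank` bases on each side, clamped to the contig boundaries
    when a length is known.
    """
    intervals = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        contig, pos = fields[0], int(fields[1])  # CHROM and 1-based POS
        start = max(0, pos - 1 - flank)
        end = pos + flank
        if contig_lengths is not None and contig in contig_lengths:
            end = min(end, contig_lengths[contig])
        intervals.append((contig, start, end, f"{contig}:{pos}"))
    return intervals


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "contig_1\t120\t.\tA\tG",
    "contig_2\t10\t.\tC\tT",
]
for row in vcf_to_intervals(example, flank=50, contig_lengths={"contig_1": 150}):
    print(*row, sep="\t")
# contig_1	69	150	contig_1:120
# contig_2	0	60	contig_2:10
```

Clamping matters for de novo contigs, which are often short enough that a SNP sits closer to an end than the flank size.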
3.2 years ago by
d.angra50
United Kingdom
d.angra50 wrote:

Hi Jen

I tried what you suggested in your reply, but it throws an error that the reference species is missing. Could you please advise me on what is going wrong?

Viva

written 3.2 years ago by d.angra50

Hello,

Was it the tool Convert : CSV, FSTAT, Genepop or VCF to either gd_snp or gd_genotype that is producing a problem? There are probably missing FORMAT fields (as described on the tool form). Or you are not specifying both the focus and reference genome (these can be the same value).

Another option is to use the tool VCFtoTab-delimited and create the file yourself using the text manipulation tools in Galaxy. This takes a few steps. Once you have it working, save it as a workflow for re-use.

Specifically, once in tabular format, do the following (find tools by searching):

  1. Remove beginning of a file to get rid of the header
  2. Add column with the reference genome name to all lines
  3. Compute a zero-based start (using the existing position) with the expression c2-1
  4. Cut the columns out of the result to create the final input dataset
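
The four steps above can also be sketched in plain Python to show what each one does to the data. This is only an illustration under stated assumptions: the input is taken to be tab-delimited with the contig in column 1 and the 1-based position in column 2, and the final column order (contig, start, end, genome name) is a guess at a convenient layout, not a documented requirement of any tool. The function name `tabular_to_intervals` and the label `myAssembly` are hypothetical.

```python
def tabular_to_intervals(text, genome_name, header_lines=1):
    """Apply the four steps to tab-delimited text and return rows of strings."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    result = []
    for row in rows[header_lines:]:       # step 1: remove the header
        contig, pos = row[0], int(row[1])
        start = pos - 1                   # step 3: zero-based start, i.e. c2-1
        # steps 2 and 4: add the genome name and keep only the needed columns
        result.append([contig, str(start), str(pos), genome_name])
    return result


table = "CHROM\tPOS\tREF\tALT\ncontig_1\t120\tA\tG\ncontig_2\t10\tC\tT\n"
for row in tabular_to_intervals(table, "myAssembly"):
    print("\t".join(row))
# contig_1	119	120	myAssembly
# contig_2	9	10	myAssembly
```

Using the original position as the end coordinate gives a one-base, half-open interval that covers exactly the SNP.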

Hope this helps! Jen

written 3.2 years ago by Jennifer Hillman Jackson25k

Hi Jen

I am able to create a gd_snp file by following the instructions, but when I try to use it with "flanking sequence" it shows the error "reference species missing", which makes me think something may have gone wrong while creating the gd_snp file. I am not sure.

I am going to try the steps you recommended and see what happens.

Thank you

Deepti

written 3.2 years ago by d.angra50
3.2 years ago by
d.angra50
United Kingdom
d.angra50 wrote:

Hello Jen

I am trying to write down every detail of my efforts.

1) I have not been able to use the "Convert" tool, as it throws the error "unable to finish job". I am also confused about what to use as the reference species and what as the focus species, since I am doing a de novo assembly and have referenced my datasets to this assembly.

2) However, if I leave both the reference species and the focus species unspecified, it gets one step further and then reports "non integer scaffold".

3) When I try to use the "flanking sequences" tool, I use the VCF file as input and specify the columns to use, but then I get the error "no species selected" (even if I try to select the assembly, I get no option, because that input would not work). So this gets me no results either.

4) If I use the text manipulation tools, there is no difference between the scaffold and reference chromosome selections, as they are both the same.

I desperately need help.

Thank you in advance

Viva

written 3.2 years ago by d.angra50

If you would like to share a history with your work in it, we can take a look. First share it on the server, then email the link plus a link to this Biostars post to galaxy-bugs@lists.galaxyproject.org. Make certain that all inputs are present and not deleted, including the fasta for the custom reference genome. If the history is very large (over a hundred or so datasets), please copy the datasets involved in these manipulations to another history that contains all of the work associated with this task (omitting other non-related work).

modified 3.2 years ago • written 3.2 years ago by Jennifer Hillman Jackson25k
3.2 years ago by
United States
Jennifer Hillman Jackson25k wrote:

The Genome Diversity tool set has been reported as problematic. Please follow this Trello ticket for updates and a resolution: https://trello.com/c/ZBYLglM3

written 3.2 years ago by Jennifer Hillman Jackson25k