Question: SNPs genomic DNA
22 months ago, mona.pujni2016 wrote:

Hi friends

In my research project I have mined about 200,000 SNPs. I extracted the DNA sequences upstream and downstream of them, but I am unable to align them to the right places. Here are the steps I followed. After extracting the SNPs into a VCF file, I used Get Flanks to obtain the flanks of each SNP position; this step skipped several lines in spite of the input being in the correct format. Then I used Extract Genomic DNA on each flank, both upstream and downstream. The output is in interval format with the genomic sequences for both sides, but some lines were still skipped. Now, from whatever results I have, I want to align the sequences so that they go with the correct SNPs. Can anybody please suggest how this can be done in Galaxy?

A very big thank you in advance. D

22 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

The skipped lines are curious, but it might be that the length of the requested flanks extends beyond the chromosome ends (possible, but should be rare).

Or the chromosomes do not exist in the target genome, which would indicate a genome mismatch problem. Double check to eliminate that as a cause: https://wiki.galaxyproject.org/Support#Reference_genomes
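Both possible causes can be checked outside Galaxy before rerunning anything. The following is only a rough sketch in Python, not a Galaxy tool: the function name diagnose_skipped, the flank length of 100, and the chrom_sizes dictionary (chromosome name mapped to length for the target genome) are all assumptions made for illustration. It flags SNPs whose chromosome name is absent from the genome and SNPs whose flanks would run past a chromosome end.

```python
def diagnose_skipped(vcf_lines, chrom_sizes, flank=100):
    """Return (missing_chrom, off_end): lists of (chrom, pos) for SNPs likely to be skipped.

    chrom_sizes is an assumed dict of chromosome name -> length for the target genome.
    """
    missing_chrom, off_end = [], []
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based
        if chrom not in chrom_sizes:
            # Name not found in the genome, e.g. "1" versus "chr1".
            missing_chrom.append((chrom, pos))
        elif pos - flank < 1 or pos + flank > chrom_sizes[chrom]:
            # A flank would extend past the start or end of the chromosome.
            off_end.append((chrom, pos))
    return missing_chrom, off_end


# Made-up example records, for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t50\trs1\tA\tG",
    "chr1\t500\trs2\tC\tT",
    "1\t500\trs3\tC\tT",
]
sizes = {"chr1": 1000}
missing, off_end = diagnose_skipped(vcf, sizes, flank=100)
```

If most skipped SNPs land in the first list, a naming or genome build mismatch is the more likely cause; if they land in the second, shortening the flanks may be enough.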

The position of the SNP is in the VCF file. If you want to, the coordinates can be used to create an interval file of start/stop coordinates (use the tools in Text Manipulation, with the SNP name/position in the "name" field), and the regions can then be extracted directly without cycling through other tools. Extract the sequences to fasta, name these by the SNP/position, then align to the genome. (The Get Flanks tool was not designed with this specific use case in mind, as the "name" of the original query is lost during the processing.) A short read mapper is not the best choice to align these longer reads; try Lastz or BLAST+ or BLAT instead (most for use on a local or cloud Galaxy): https://wiki.galaxyproject.org/BigPicture/Choices
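For anyone who prefers to build the interval file outside Galaxy, here is a minimal sketch of the same idea in Python. It is not part of any Galaxy tool: the function name vcf_to_flank_bed, the flank length, and the optional chrom_sizes dictionary are assumptions for illustration. It writes one 4-column, tab-separated interval per SNP (chromosome, 0-based start, end, name), covering the SNP plus the flank on each side, so the SNP ID travels with the region in the "name" column.

```python
def vcf_to_flank_bed(vcf_lines, flank=100, chrom_sizes=None):
    """Yield one interval line per SNP: chrom, start, end, name (tab-separated).

    chrom_sizes is an optional, assumed dict of chromosome name -> length,
    used only to clip intervals at the chromosome end.
    """
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, snp_id = fields[0], int(fields[1]), fields[2]
        # Fall back to chrom_pos when the VCF has no ID (".").
        name = snp_id if snp_id != "." else f"{chrom}_{pos}"
        # VCF POS is 1-based; the interval start is 0-based and the end is exclusive.
        start = max(0, pos - 1 - flank)
        end = pos + flank
        if chrom_sizes:
            end = min(end, chrom_sizes.get(chrom, end))
        yield f"{chrom}\t{start}\t{end}\t{name}"


# Made-up example records, for illustration only.
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t500\trs2\tC\tT",
    "chr1\t50\t.\tA\tG",
]
intervals = list(vcf_to_flank_bed(vcf, flank=100, chrom_sizes={"chr1": 1000}))
```

The resulting file can be uploaded as an interval dataset and used for sequence extraction; because the name column carries the SNP ID, each extracted sequence can be matched back to its SNP.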

UCSC aligns SNP flanks to confirm dbSNP positions and annotates SNPs with any discovered discrepancies. Review a dbSNP track's methods at http://genome.ucsc.edu for the protocol they use. I do not personally know of a converted Galaxy workflow to match these exact methods, but this could almost certainly be done, either exactly or as an acceptable variation. Others can also reply/comment if they have one they are willing to share.

This is not an exact solution, but hopefully helps anyway, Jen, Galaxy team
