Question: Obtaining SNPs/SNVs from VCF or BAM files with Galaxy?
kdevitofranceschi (Austria) wrote, 3.3 years ago:

Hi there!

I am working on a project identifying SNPs/SNVs in a few sequences and was wondering whether there is a good workflow for conducting such an analysis with Galaxy. I have VCF and BAM files (already indexed) available and would ultimately like to align them to a reference sequence and see what SNPs I can find. One small caveat: these files contain whole chromosomes' worth of data.

While I can find SNPs manually in IGV, it is tedious, and I am sure there is a bioinformatic way to do this. Ideally, I would like to use the VCF files, because they include genotypes.

Any suggestions? Is it even possible?

Thorough, step-by-step answers would be appreciated (I'm a rookie, can you tell?).

Thanks for your time in advance!

galaxy snps vcf snvs bam • 2.3k views
modified 3.3 years ago by Jennifer Hillman Jackson • written 3.3 years ago by kdevitofranceschi

 

Hi,

I would be happy to help, but I need a better understanding of what you mean by 'finding' SNPs. Also, what are your .vcf and .bam files derived from? Your question is written very clearly, and with this extra info I am fairly sure you will get help.

Thanks, Guy

 

 

written 3.3 years ago by Guy Reeves

kdevitofranceschi (Austria) replied:

Of course! Thank you for your response.

The VCF/BAM files were derived from Illumina HiSeq 2000 reads, which were then mapped to the human genome (GRCh37).

As for what I mean by finding SNPs: the libraries have already been created, and I would like to find any SNPs present in the sample compared with the reference genome. For example, if the reference nucleotide at position x is A, I want to know whether my sample has anything other than A there. Does that make sense?
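In VCF terms, "is the base at position x anything other than A in my sample?" is answered by the REF and ALT columns together with the sample's GT (genotype) field, where 0 means the REF allele and 1, 2, ... index the comma-separated ALT alleles. Here is a minimal Python sketch of that lookup, assuming the three strings have already been pulled out of one VCF line; `decode_genotype` and `differs_from_reference` are hypothetical helper names, not part of Galaxy or any library.

```python
from typing import List, Optional


def decode_genotype(ref: str, alt: str, gt: str) -> List[Optional[str]]:
    """Turn a VCF GT value such as "0/1" or "1|1" into allele sequences.

    Index 0 is the REF allele; 1, 2, ... index the comma-separated ALT list.
    A missing allele (".") becomes None.
    """
    alleles = [ref] + alt.split(",")
    # Phased ("|") and unphased ("/") separators are treated alike here.
    return [None if i == "." else alleles[int(i)]
            for i in gt.replace("|", "/").split("/")]


def differs_from_reference(ref: str, alt: str, gt: str) -> bool:
    """True if any called allele in this sample is not the REF allele."""
    return any(a is not None and a != ref for a in decode_genotype(ref, alt, gt))


# Reference base is A at this position; the sample is heterozygous A/G.
print(decode_genotype("A", "G", "0/1"))         # ['A', 'G']
print(differs_from_reference("A", "G", "0/1"))  # True
print(differs_from_reference("A", "G", "0/0"))  # False
```

Phasing is ignored on purpose in this sketch, since only the identity of the alleles matters for the question of whether the sample differs from the reference.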

Thanks for your help. I greatly appreciate it!

written 3.3 years ago by kdevitofranceschi

Hi,

So the .vcf file was generated from the .bam file? If so, are they basically different formats of the same data?
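To make the BAM-to-VCF relationship concrete, here is a toy Python sketch of the core idea behind variant calling: look at the bases that mapped reads show at one reference position and report a non-reference base if enough reads carry it. This is an illustration only, under stated assumptions: the function `naive_call` and its thresholds `min_depth` and `min_alt_fraction` are hypothetical, and real callers such as GATK use statistical models of base and mapping quality rather than a simple fraction.

```python
from collections import Counter
from typing import Iterable, Optional


def naive_call(ref_base: str, read_bases: Iterable[str],
               min_depth: int = 10, min_alt_fraction: float = 0.2) -> Optional[str]:
    """Toy caller: return the commonest non-reference base at one position,
    or None if coverage or the alternative-base fraction is too low.

    min_depth and min_alt_fraction are arbitrary illustrative thresholds.
    """
    counts = Counter(base.upper() for base in read_bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too few reads cover this position
    # Ignore the reference base and ambiguous 'N' calls when looking for a variant.
    alt_counts = {b: n for b, n in counts.items() if b not in (ref_base.upper(), "N")}
    if not alt_counts:
        return None
    best_alt = max(alt_counts, key=lambda b: alt_counts[b])
    return best_alt if alt_counts[best_alt] / depth >= min_alt_fraction else None


# Bases seen in 12 reads at one position where the reference is A.
print(naive_call("A", "AAAAAAA" + "GGGGG"))  # G     (5/12 reads show G)
print(naive_call("A", "A" * 11 + "G"))       # None  (1/12 is below 0.2)
```

The point of the sketch is only that a VCF line summarises many reads at one site; it is not a substitute for a real caller.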

If you want to know all the sites where the mapped reads differed from the reference you mapped them to, that is exactly what the VCF file is: past the header, there is one line for each potentially variable site. To see this, press the eye icon on the dataset, scroll down past the header, and then look across the column headings. Each sample has its own column, and there can be more than one sample in a file.
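The layout described above (meta-information lines starting with `##`, a `#CHROM` column-heading line, then one tab-separated line per site with one column per sample after FORMAT) can be read with a few lines of plain Python. This is a minimal sketch assuming an uncompressed VCF; `read_vcf` is a hypothetical helper, the two-sample record in the example is invented, and for real work an established parser or Galaxy's own tools are the safer choice.

```python
import io
from typing import Dict, Iterator, List, TextIO


def read_vcf(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per data line of an uncompressed VCF.

    '##' lines are meta-information and are skipped; the '#CHROM' line gives
    the column headings, and every column after FORMAT holds one sample.
    """
    samples: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            samples = line.split("\t")[9:]  # sample names follow FORMAT
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt, qual, filt, info = fields[:8]
        # Map each sample's colon-separated values onto the FORMAT keys,
        # e.g. FORMAT "GT:DP" with value "0/1:12" -> {"GT": "0/1", "DP": "12"}.
        sample_data: Dict[str, Dict[str, str]] = {}
        if len(fields) > 9:
            keys = fields[8].split(":")
            for name, value in zip(samples, fields[9:]):
                sample_data[name] = dict(zip(keys, value.split(":")))
        yield {"CHROM": chrom, "POS": int(pos), "REF": ref, "ALT": alt,
               "QUAL": qual, "FILTER": filt, "INFO": info,
               "samples": sample_data}


# A tiny, invented two-sample VCF for illustration.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\n"
    "1\t1000\t.\tA\tG\t50\tPASS\tDP=20\tGT:DP\t0/1:12\t0/0:8\n"
)
for rec in read_vcf(example):
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["samples"])
# 1 1000 A G {'sample1': {'GT': '0/1', 'DP': '12'}, 'sample2': {'GT': '0/0', 'DP': '8'}}
```

For a bgzipped `.vcf.gz`, the same function can be given `gzip.open(path, "rt")` instead of a plain file handle.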

Does reading about the VCF file format at the following link help?

https://samtools.github.io/hts-specs/VCFv4.2.pdf

But be careful: the list in the VCF contains all potential variants and may well include many sites that are low-confidence errors (from sequencing and mapping). Information about the quality of each variant is also in the .vcf file. You then need to use a program to filter out the low-confidence calls, giving you a list of probable SNPs or indels. One example of a program that can do this is GATK's 'UnifiedGenotyper'. Does any of this help? Guy
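The filtering step described above can be sketched as follows: keep only single-base substitutions whose QUAL value meets a cutoff and whose FILTER column is `PASS` (or `.`, meaning no filters were applied). This is an illustration, not a recommended pipeline: the function name `confident_snps`, the tuple layout, and the `min_qual=30` cutoff are all assumptions, the example calls are invented, and appropriate thresholds depend on the caller and the data. Indels are dropped because the question is about SNPs.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (CHROM, POS, REF, ALT, QUAL, FILTER) from a VCF line.
Record = Tuple[str, int, str, str, str, str]


def confident_snps(records: Iterable[Record], min_qual: float = 30.0) -> Iterator[Record]:
    """Keep single-base substitutions with QUAL >= min_qual and FILTER PASS or '.'.

    min_qual=30 is an illustrative cutoff, not a recommendation.
    """
    for chrom, pos, ref, alt, qual, filt in records:
        if qual == ".":  # QUAL missing: confidence cannot be judged
            continue
        if float(qual) < min_qual:
            continue
        if filt not in ("PASS", "."):  # site already flagged by a filter
            continue
        alts = alt.split(",")
        # SNP: one reference base, and every ALT allele is a single base.
        if len(ref) == 1 and all(len(a) == 1 and a != "." for a in alts):
            yield (chrom, pos, ref, alt, qual, filt)


calls = [
    ("1", 1000, "A", "G", "50", "PASS"),     # confident SNP
    ("1", 2000, "C", "T", "12", "PASS"),     # low quality
    ("1", 3000, "AT", "A", "80", "PASS"),    # deletion, not a SNP
    ("1", 4000, "G", "C", "99", "LowQual"),  # flagged by a filter
]
for rec in confident_snps(calls):
    print(rec)
# ('1', 1000, 'A', 'G', '50', 'PASS')
```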

 

 

written 3.3 years ago by Guy Reeves
Jennifer Hillman Jackson (United States) wrote, 3.3 years ago:

Hello,

The Galaxy NGS 101 suite of tutorials is a good place to get started. Section 6 covers Variant Detection:
https://wiki.galaxyproject.org/Learn/GalaxyNGS101

Best, Jen, Galaxy team

written 3.3 years ago by Jennifer Hillman Jackson
Powered by Biostar version 16.09