Question: mpileup result uncertainties: what does "<*>" mean?
9 months ago, beckybedford96 wrote:

I am trying to understand the results from the mpileup tool, but I am unsure what the "<*>" in the ALT column is. It also appears alongside another base. What does it mean?

modified 8 months ago by seyadshefrin • written 9 months ago by beckybedford96

Hi there,

I was using Galaxy online to align my miRNA-seq data. I have a question: is it appropriate to use the hg19 reference genome for miRNA-seq data in the Bowtie step? I am getting a VCF file of 40 to 60 MB in which most of the alterations are the <*> symbol. Could anyone kindly explain what it means? I am getting it at almost all positions except a few.

written 8 months ago by seyadshefrin

It is not clear what your steps are. Is the output a VCF dataset or a BAM dataset? Both formats are described here: https://galaxyproject.org/learn/datatypes/

If the data is RNA-seq, and human, then mapping against the hg19 human genome can be a valid use case. Tool choices and options matter too, as does data QA upstream from mapping. HISAT2 can map spliced reads (RNA). Bowtie is an unspliced mapper, only (DNA).

Please see the Galaxy tutorials here for examples of proper tool usage grouped by analysis goals: https://galaxyproject.org/learn/

Related: https://biostar.usegalaxy.org/p/27105/

modified 8 months ago • written 8 months ago by Jennifer Hillman Jackson

Thank you for your reply. I have got a VCF file with <*> symbols as the alternate sequence. Why is that? Is it because I used Bowtie as my mapping tool? My steps for miRNA sequence analysis are:

FASTQ Groomer > FastQC > Trimmomatic > FastQC > Bowtie2 > sort > MarkDuplicates > rmdup > mpileup

modified 8 months ago • written 8 months ago by seyadshefrin
9 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

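The bare * allele and the angle-bracketed <*> are two different things. The * allele means the allele is missing because of an upstream deletion. The <*> that mpileup writes is a symbolic allele: a placeholder for any alternate allele that was not observed at that position. mpileup emits it at most sites so that a downstream variant caller can compute likelihoods, which is why it appears on its own at non-variant positions and next to a real base (for example T,<*>) at candidate variant positions. It is not a variant call by itself; a calling step (for example bcftools call) is still needed.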

From the VCF Specification: https://samtools.github.io/hts-specs/VCFv4.2.pdf

ALT - alternate base(s): Comma separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or an angle-bracketed ID String (“<id>”) or a breakend replacement string as described in the section on breakends. The * allele is reserved to indicate that the allele is missing due to a upstream deletion. If there are no alternative alleles, then the missing value should be used. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
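As a rough illustration of the ALT categories in the quoted definition, here is a minimal Python sketch that splits an ALT field on commas and labels each entry. The function names (classify_alt, split_alt, observed_alts) and the category labels are made up for this example and do not come from any library. Treating <*> as a placeholder rather than a real alternate base reflects how mpileup uses the symbol; the sketch is not a full VCF parser.

```python
import re

# Plain base strings: A, C, G, T, N in any case.
BASES = re.compile(r"^[ACGTN]+$", re.IGNORECASE)


def classify_alt(allele):
    """Return a coarse, illustrative category for one ALT allele string."""
    if allele == ".":
        return "missing"            # no alternate allele at this site
    if allele == "*":
        return "upstream_deletion"  # allele missing due to an upstream deletion
    if allele.startswith("<") and allele.endswith(">"):
        return "symbolic"           # angle-bracketed ID string, e.g. <*>
    if "[" in allele or "]" in allele:
        return "breakend"           # breakend replacement string
    if BASES.match(allele):
        return "bases"              # ordinary alternate base(s)
    return "unknown"


def split_alt(alt_field):
    """Split a comma-separated ALT field into (allele, category) pairs."""
    return [(a, classify_alt(a)) for a in alt_field.split(",")]


def observed_alts(alt_field):
    """Keep only alleles that are actual base strings."""
    return [a for a, kind in split_alt(alt_field) if kind == "bases"]
```

With this sketch, an ALT value of T,<*> keeps only T, and an ALT value of just <*> leaves no observed alternate allele, which matches a position where the reads agree with the reference.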

Thanks! Jen, Galaxy team

modified 8 months ago • written 9 months ago by Jennifer Hillman Jackson