Question: Trinity results mismatch during alignment
3.5 years ago by
United Kingdom
d.angra50 wrote:

Hello Jen

Thank you for your help. I think I have understood why there is a mismatch. I ran a de novo assembly (on the Vicia faba data available in SRA) using Trinity on the Indiana instance of Galaxy and obtained a transcriptome assembly of 361,316 contigs, all with identifiers beginning with TR1, TR2, etc. I then uploaded this file to usegalaxy for SNP discovery, for which I have developed a workflow.

The first step of the workflow is to align my dataset against the Trinity assembly (uploaded after the de novo assembly) as a reference, which I do with BWA-MEM. I get output in BAM format, but when I load it in IGV I see a mismatch (as I asked you about earlier). However, when I align the same dataset to my other reference (the Medicago genome), I again get BAM output, and this one displays clearly in IGV. I did FASTA manipulations to match the identifiers, but this made no difference to the visualization in IGV.

This makes me think that when I align my dataset to the Trinity reference, IGV cannot read the result because it treats identifiers such as TR1 as unidentifiable. When I align to the reference genome (Medicago), on the other hand, the alignments get identifiers like IMAG etc., which IGV does recognise. So I think the problem is that Galaxy and IGV are not recognising the identifiers.

I am not sure my reasoning is entirely correct, but I think it explains most of what I am seeing.

Could you please help me with this and suggest other tools I could use to align my datasets against the Trinity assembly as a reference?


Looking forward to your reply.



bwa alignment • 1.1k views
3.4 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello Deepti,

Start by making sure that the format is as simple as possible. Double-check that your custom reference genome has unique identifiers, then strip off any description content from the header lines (everything after the first whitespace), if present. Also double-check that the dataset is in strict FASTA format with the sequence lines wrapped. Mapping tools will often not care about these details, but many downstream tools do. See the Galaxy "Custom Reference Genome" troubleshooting help for more.
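As a rough illustration of those checks, here is a minimal Python sketch that keeps only the first whitespace-delimited token of each header line, refuses duplicate identifiers, and rewraps the sequence lines. The function name `normalize_fasta`, the 60-column width, and the sample header are assumptions made for this example, not anything Galaxy or Trinity prescribes. It also holds the whole file in memory, which should be acceptable for a transcriptome assembly but is not tuned for very large genomes.

```python
def normalize_fasta(lines, width=60):
    """Return cleaned FASTA lines: bare identifiers, wrapped sequence.

    `lines` is any iterable of text lines (an open file works).
    Raises ValueError on a missing or repeated identifier.
    """
    records = []   # list of (identifier, [sequence pieces])
    seen = set()   # identifiers already encountered

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an identifier")
            name = fields[0]  # drop everything after the first whitespace
            if name in seen:
                raise ValueError("duplicate identifier: " + name)
            seen.add(name)
            records.append((name, []))
        elif not records:
            raise ValueError("sequence data before the first header line")
        else:
            records[-1][1].append(line)

    out = []
    for name, pieces in records:
        out.append(">" + name)
        seq = "".join(pieces)
        # Rewrap the sequence at a fixed width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out
```

To apply it to a file, something like `open("assembly.fasta")` (a hypothetical filename) can be passed as `lines`, and the returned list written back out one entry per line before re-uploading the reference and repeating the mapping step.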

Another possibility is that the custom genome has too many "chromosomes" (here, contigs) for IGV to handle. You might want to ask IGV support whether there is some sort of cap. I did not search the web to see whether this has already been asked and answered online, but that is also something you could do.
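Before contacting IGV support, it may help to have the exact numbers to hand. The sketch below simply counts the sequences and total bases in a FASTA file. The function name `count_fasta_records` is made up for this example, and no particular IGV limit is assumed, since whether such a cap exists is the open question.

```python
def count_fasta_records(lines):
    """Return (number of sequences, total bases) for FASTA text lines."""
    n_records = 0
    n_bases = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            n_records += 1          # each header line starts one sequence
        elif line:
            n_bases += len(line)    # non-empty, non-header lines are sequence
    return n_records, n_bases
```

Called on an open file handle for the Trinity output, the first value should match the contig count reported for the assembly.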

Best, Jen, Galaxy team
