Trinity results mismatch during alignment


3.5 years ago by

United Kingdom

d.angra • 50 wrote:

Hello Jen

Thank you for your help. I think I now understand why there is a mismatch. I ran a de novo assembly (on publicly available Vicia faba SRA data) using Trinity on the Indiana Galaxy instance and obtained a transcriptome assembly of 361,316 contigs, all with identifiers beginning TR1, TR2, etc. I then uploaded this file to usegalaxy for SNP discovery.

I have developed a workflow for SNP discovery. Its first step is to align my dataset against this Trinity assembly (uploaded after the de novo assembly) as the reference, which I do with BWA-MEM. I get output in BAM format, but when I load it into IGV I see the mismatch I asked you about earlier. However, when I align the same data to my own reference (the Medicago genome), I again get BAM output, and I can see it clearly in IGV. I used FASTA manipulation tools to make the identifiers match, but this made no difference to the visualization in IGV.

This makes me think that when I align my dataset to the Trinity reference, IGV cannot read the result because it treats the TR1, etc. identifiers as unidentifiable, whereas alignment to the Medicago reference genome produces identifiers such as IMAG, which IGV does recognise. So I think the problem is that Galaxy and IGV do not recognise the Trinity identifiers.

I am not sure my reasoning is entirely correct, but I think it goes a long way toward explaining the problem.
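One way to test the identifier hypothesis directly is to compare the reference names stored in the BAM header with the IDs in the FASTA loaded into IGV, since IGV matches alignments to the loaded genome by sequence name. The header can be dumped as text with `samtools view -H` on the BAM file. The sketch below is a minimal, stdlib-only helper under that assumption; the function names and the example identifiers are hypothetical, made up for illustration.

```python
from typing import Iterable, List, Set, Tuple


def sq_names_from_sam_header(header_text: str) -> List[str]:
    """Extract reference names (SN: tags) from the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def compare_ids(bam_names: Iterable[str], fasta_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names only in the BAM header, IDs only in the FASTA)."""
    bam_set = set(bam_names)
    fasta_set = set(fasta_ids)
    return bam_set - fasta_set, fasta_set - bam_set
```

If both returned sets are empty, the names agree and the display problem probably lies elsewhere.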

Could you please help me with this and suggest other tools I can use to align my datasets against a Trinity assembly as the reference?

Looking forward to your reply.

Viva

bwa alignment


modified 3.4 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.5 years ago by d.angra • 50

3.4 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello Deepti,

Start by making sure that the format is as simple as possible. Double check that your custom reference genome has unique identifiers, and strip any description content from the header lines if present. Also double check that the dataset is in strict FASTA format, with the sequence lines wrapped. Mapping tools will often not care about this detail, but many downstream tools do. See the Custom Ref Genome Troubleshooting help.
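The clean-up described above (unique identifiers, no description content on header lines, wrapped sequence lines) can be sketched as a small stdlib-only Python script. This is a minimal illustration, not a Galaxy tool: the function names are hypothetical, the 60-column wrap width is an assumed default, and the identifier is assumed to be the first whitespace-delimited token of each header line.

```python
from typing import Iterator, List, Optional, Set, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def simplify_fasta(handle: TextIO, out: TextIO, width: int = 60) -> int:
    """Write records with ID-only headers and wrapped sequence lines.

    Raises ValueError on an empty header or a duplicate ID.
    Returns the number of records written.
    """
    seen: Set[str] = set()
    count = 0
    for header, seq in read_fasta(handle):
        tokens = header.split()
        if not tokens:
            raise ValueError("empty FASTA header")
        # Keep only the identifier; drop any description content after it.
        seq_id = tokens[0]
        if seq_id in seen:
            raise ValueError(f"duplicate identifier: {seq_id}")
        seen.add(seq_id)
        out.write(f">{seq_id}\n")
        # Wrap the sequence to a fixed line width.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
        count += 1
    return count
```

For example, a header such as `>TR1|c0_g1_i1 len=8 path=[0:0-7]` (made up for illustration) would be written as `>TR1|c0_g1_i1`. Failing loudly on duplicates, rather than silently renaming them, keeps the output identifiers consistent with any BAM files already mapped against the original assembly.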

Another possibility is that the custom genome has too many "chromosomes" for IGV to handle. You might want to ask the IGV support team about this possibility (in case there is some sort of cap). I haven't searched the web myself to see whether this has already been asked and answered online, but that is also something you could do.
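Before asking about a possible cap, it helps to know exactly how many "chromosomes" the custom reference contains (the assembly above reportedly has 361,316 contigs). A minimal stdlib-only count is sketched below; the function name is hypothetical, and the sketch does not encode any IGV limit, since none is stated here.

```python
from typing import TextIO, Tuple


def count_fasta_records(handle: TextIO) -> Tuple[int, int]:
    """Return (number of sequences, total bases) in a FASTA stream."""
    n_seqs = 0
    n_bases = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Each header line starts a new sequence ("chromosome").
            n_seqs += 1
        elif line:
            n_bases += len(line)
    return n_seqs, n_bases
```

Usage would look like `with open("assembly.fasta") as fh: print(count_fasta_records(fh))`, where the file name is a placeholder.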

Best, Jen, Galaxy team

written 3.4 years ago by Jennifer Hillman Jackson ♦ 25k
