Question: Extract assembled transcripts from a Cufflinks GTF without using the reference genome
Hi all,

I know there have been similar questions, but unfortunately I couldn't find a proper answer to my problem. I am doing a de novo transcriptome assembly of an unknown fish. I used Trinity for that purpose and I got acceptable assemblies. I also did a genome-guided assembly with STAR->Cufflinks using a closely related organism, O. niloticus. My idea is to augment the Trinity assemblies with the results from Cufflinks (guided by this article) and annotate them afterwards.

The problem is extracting the transcripts generated by Cufflinks into FASTA format. I have read about several options, but all of them use the reference genome. I cannot use it in my case, since it comes from a closely related species rather than my own and my reads show a lot of SNPs against it.

Is there any method to extract a FASTA file from the alignments only (from the BAM files), or from the reference but with all mismatching nucleotides replaced by the ones seen in my reads?

Thanks in advance, Marija

If I understand the protocol you are using correctly, extracting the fasta sequences from the reference genome used with Cufflinks will produce the transcripts you want (even though they are based on the closely related genome). Those transcripts are then mapped back to the Trinity assembly to determine overlapping regions. The overlapping regions in the Trinity assembly then represent the transcripts with the target species variation.
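To make the extraction step concrete, here is a minimal Python sketch of what "extracting the fasta sequences from the reference genome" amounts to: for each transcript_id in the GTF, the exon intervals are cut out of the reference, joined in coordinate order, and reverse-complemented for minus-strand transcripts. The function names (parse_fasta, transcripts_from_gtf) are made up for illustration and the code assumes a standard 9-column GTF with 1-based inclusive coordinates and a genome small enough to hold in memory; it is not the protocol's own code. In practice the gffread utility from the Cufflinks suite does this job (gffread -w transcripts.fa -g genome.fa transcripts.gtf).

```python
import re
from collections import defaultdict

# Translation table used to complement DNA bases (case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text):
    """Parse FASTA text into {sequence_name: sequence}.

    The name is the first whitespace-delimited word after '>'.
    """
    seqs = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def transcripts_from_gtf(gtf_text, genome):
    """Build spliced transcript sequences from GTF exon records.

    gtf_text: contents of a GTF file (assumed: tab-separated, 9 columns,
              1-based inclusive coordinates, transcript_id attribute).
    genome:   {chromosome_name: sequence}, e.g. from parse_fasta().
    """
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip 'transcript' rows and malformed lines
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )

    transcripts = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        chrom, strand = parts[0][0], parts[0][3]
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[chrom][start - 1:end] for _, start, end, _ in parts)
        # Unstranded records ('.') are left in reference orientation.
        transcripts[tid] = reverse_complement(seq) if strand == "-" else seq
    return transcripts
```

Reading the two files into strings and calling transcripts_from_gtf(gtf_text, parse_fasta(fasta_text)) gives a dictionary of transcript sequences that can be written out as FASTA and then aligned against the Trinity contigs.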

Hope this helps! Jen, Galaxy team

