Question: StringTie - annotation not matching naming convention for genome sequences
sandee wrote, 5 months ago:

Hi,

I am trying to use StringTie with a reference annotation from my history but get a warning "no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences".

I downloaded both the reference annotation "Bdistachyon_314v3.1.gene.gff3" and the reference I used for the HISAT2 alignments, "Bdistachyon_314v3.1.cds.fa", from the same genome version in Phytozome. I read on another post to check the chromosome names using BAM to SAM and get:

QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT

@HD VN:1.0 SO:coordinate
@SQ SN:Bradi0180s00100.1 LN:1047
@SQ SN:Bradi2g20400.1 LN:1440

While my gff3 file looks like:

Seqid Source Type Start End Score Strand Phase Attributes

gff-version 3 annot-version v3.1 species Brachypodium distachyon

Bd1 phytozomev10 gene 10581 11638 . + . ID=Bradi1g00200.v3.1;Name=Bradi1g00200;...

Bd1 phytozomev10 mRNA 10581 11638 . + . ID=Bradi1g00200.1.v3.1;Name=Bradi1g00200.1;...

Bd1 phytozomev10 CDS 10581 10850 . + 0 ID=Bradi1g00200.1.v3.1.CDS.1;Parent=Bradi1g00200.1;....

I'm not sure what I should be doing to make these compatible. Any ideas?

Best wishes from a newbie, s

Jennifer Hillman Jackson (United States) wrote, 5 months ago:

Hello,

It looks like you are mapping against a cDNA fasta, but the reference annotation is based on the full genome. You'll need to map against the full genome to use this annotation, or obtain annotation that is based on the exome. The first is more commonly available. You can always filter the analysis results later to focus on the exons/exome.

However you decide to go forward, the identifiers in the two inputs must match (custom genome FASTA + reference annotation GTF/GFF3). Below is a bit more help about how to do that.
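One way to check this before running StringTie is to compare the sequence names in the BAM header with the first column of the annotation. The following is a minimal Python sketch, assuming the header has been exported as SAM text and the annotation is tab-delimited GFF3/GTF. The function names are illustrative and are not part of any Galaxy tool.

```python
def sam_sequence_names(sam_header_lines):
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.split()[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def annotation_sequence_names(annotation_lines):
    """Collect column 1 (the sequence ID) from GFF3/GTF feature lines."""
    names = set()
    for line in annotation_lines:
        line = line.strip()
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence data
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t")[0])
    return names


def shared_sequence_names(sam_header_lines, annotation_lines):
    """Names present in both inputs; an empty set means nothing can match."""
    return sam_sequence_names(sam_header_lines) & annotation_sequence_names(
        annotation_lines
    )


# Lines taken from the question above (GFF3 attributes shortened).
header = [
    "@HD VN:1.0 SO:coordinate",
    "@SQ SN:Bradi0180s00100.1 LN:1047",
    "@SQ SN:Bradi2g20400.1 LN:1440",
]
gff = [
    "##gff-version 3",
    "Bd1\tphytozomev10\tgene\t10581\t11638\t.\t+\t.\tID=Bradi1g00200.v3.1",
]
print(shared_sequence_names(header, gff))  # prints set(): no names in common
```

With the data from the question, the BAM header holds transcript names (Bradi...) while the annotation uses chromosome names (Bd1), so the overlap is empty.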

Note: For many Galaxy tools, GTF annotation is preferred or required. GFF3 can be converted to GTF format with the tool gffread, yet that is not ideal. Some GFF3 files are problematic to convert (duplicate IDs, the presence of fasta data), so conversion should be the last choice. Check if a GTF version is available first.
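If conversion is the only option, the GFF3 can be screened for those two issues first. This is a rough Python sketch, assuming a tab-delimited GFF3 with `ID=` attributes in column 9 and the standard `##FASTA` directive marking embedded sequence. The function name is made up for illustration. Note that GFF3 allows several rows to share one ID when a feature is discontinuous (CDS rows often do), so a reported duplicate is something to inspect, not necessarily an error.

```python
from collections import Counter


def gff3_conversion_warnings(gff3_lines):
    """Report embedded FASTA data and IDs that appear on more than one row."""
    id_counts = Counter()
    has_fasta = False
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            has_fasta = True
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete feature row
        for attribute in columns[8].split(";"):
            if attribute.startswith("ID="):
                id_counts[attribute[3:]] += 1
    duplicates = sorted(name for name, n in id_counts.items() if n > 1)
    return {"embedded_fasta": has_fasta, "duplicate_ids": duplicates}


# Hypothetical input, for illustration only.
example = [
    "##gff-version 3",
    "Bd1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "Bd1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1",
    "Bd1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1",
    "##FASTA",
    ">Bd1",
    "ACGT",
]
print(gff3_conversion_warnings(example))
```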

The title lines in the custom reference genome fasta will need to be modified so that the chromosome identifiers exactly match those in the reference annotation. The coordinates of features must also be based on the same reference fasta.

Once you have a properly paired custom genome/annotation source, check the chosen fasta title lines and compare them to the annotation. Even when the data is a match (that is, both are based on the exact same genome/exome version/build), you may need to manipulate the data so that tools can interpret it as a match. You'll need to inspect these data to find out how to parse/adjust them to create identifiers that match. Then reformat the fasta title lines to match the identifiers in the annotation.
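As an illustration of that reformatting step, here is a short Python sketch that rewrites fasta title lines and leaves the sequence lines untouched. It assumes the wanted identifier is the first whitespace-delimited word of each title line, which is a common layout but not guaranteed for any particular download. Both function names and the sample title line are hypothetical.

```python
def first_token(title):
    """Keep only the first whitespace-delimited word of a title line."""
    parts = title.split()
    return parts[0] if parts else title


def rename_fasta_titles(fasta_lines, rename=first_token):
    """Yield fasta lines with each '>' title rewritten by `rename`."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield ">" + rename(line[1:].rstrip("\n")) + "\n"
        else:
            yield line


# Hypothetical title line; real title lines must be inspected first.
fasta = [">Bd1 some extra description\n", "ACGTACGT\n"]
print(list(rename_fasta_titles(fasta)))
# prints ['>Bd1\n', 'ACGTACGT\n']
```

A different `rename` function (for example, one that looks identifiers up in a dictionary) can be passed in when trimming the title is not enough.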

This reformatting can be done before uploading the fasta to Galaxy, or you can try a combination of data manipulation tools in Galaxy to do the same. Should you have trouble reformatting (if it is even needed), please post back the fasta title lines and the identifiers in the reference annotation. We can probably help to create a reformatting method.

Galaxy FAQs: https://galaxyproject.org/support/#getting-inputs-right

Thanks, Jen, Galaxy team

sandee wrote, 5 months ago:

Hi Jen,

Thank you so much for the speedy and comprehensive reply. I had tried an exome only annotation that gave me the same error, but using the full genome fasta seems to be doing the job :)

Thanks again! s
