Question: Error using stringtie - AttributeError: 'NoneType' object has no attribute"
0
gravatar for stefanie.grunwald
12 months ago by
stefanie.grunwald10 wrote:

Hi, I have RNA-seq data and I am interested in whole gene expression results but also transcript expression, exon usage and long non coding RNA. The RNA-seq protocol is not optimized for circular and fusion RNA but I also want to look at that just in case I would find something giving me some ideas for future experiments. Below I am giving all my workflow for trimming and mapping in detail.

My problem is that I get errors trying to assemble mapped reads using StringTie. I use the same gencode annotation file that I included for mapping with STAR.

I used the following options in StringTie:

Use GFF file to guide assembly yes Reference annotation to use for guiding the assembly process 11: gencode.v27.primary_assembly.annotation.gff3
Perform abundance estimation only of input transcripts False
Output additional files for use in... deseq2
Average read length 99
Whether to cluster genes that overlap with different gene IDs False
Options advanced
Disable trimming of predicted transcripts based on coverage False
Increase sensitivity False
Name prefix for output transcripts STRG
Minimum isoform fraction 0.15
Minimum assembled transcript length 35
Minimum anchor length for junctions 10
Minimum junction coverage 5
Minimum bundle reads per bp coverage to consider for assembly 2
Gap between read mappings triggering a new bundle 50
Fraction of bundle allowed to be covered by multi-hit reads 0.95
Do not assemble any transcripts on these reference sequence(s) Empty.
Additional gene abundance estimation output file True
Disable multi-mapping correction False
Job Resource Parameters no

However, I keep getting the follwoing error after certain minutes: Fatal error: Exit code 1 () Traceback (most recent call last): File "/galaxy/main/deps/_conda/envs/__stringtie@1.3.3/bin/prepDE.py", line 85, in <module> t_id=RE_TRANSCRIPT_ID.search(v[len(v)-1]).group(1) AttributeError: 'NoneType' object has no attribute 'group'

Can you tell me, what the problem might be or any suggestions what I may have to change?

**Thank you very much!!!

Best regards Stefanie**

More detail on trimming and mapping:

Trimming was performed by removing 5' and 3' ends, if quality was lower than 10 and also reads smaller than 20bp have been removed.

I mapped my RNA-seq data with STAR using gencode primary assembly reference sequence and annotation with the following options:

Single-end or paired-end reads single
RNA-Seq FASTQ/FASTA file 7: Trimmomatic on CON2.fastq
Custom or built-in reference genome history Select a reference genome 8: GRCh38.primary_assembly.genome.fa
Gene model (gff3,gtf) file for splice junctions 9: gencode.v27.primary_assembly.annotation.gff3 Length of the genomic sequence around annotated junctions 99
Count number of reads per gene True
Would you like to set output parameters (formatting and filtering)? yes Extra SAM attributes to include All Include strand field flag XS Yes -- and reads with inconsistent and/or non-canonical introns are filtered out
Filter alignments containing non-canonical junctions Remove alignments with unannotated non-canonical junctions
Would you like to set additional output parameters (formatting and filtering)? yes Would you like unmapped reads included in the SAM? False
Would you like all alignments with the best score labeled primary? False
MAPQ value for unique mappers 255 Would you like to keep only reads that contain junctions that passed filtering? False
Score range below the maximum score for multimapping alignments 1
Maximum number of alignments to output a read's alignment results, plus 1 10
Maximum number of mismatches to output an alignment, plus 1 10
Maximum ratio of mismatches to mapped length 0.2 Maximum ratio of mismatches to read length 1.0 Minimum alignment score 0
Minimum alignment score, normalized to read length 0.7 Minimum number of matched bases 18
Minimum number of matched bases, normalized to read length 0.7 Other parameters (seed, alignment, and chimeric alignment) star_fusion Job Resource Parameters no

All mapped reads were in sorted bam files (sorted by coordinates). The mapping statistics gives me a mapping of almost 100%??? Are my mapping options not stringent enough? Here are the mapping results as detailed in star log file:

                      Number of input reads |   60296901
                  Average input read length |   100
                                UNIQUE READS:
               Uniquely mapped reads number |   53290337
                    Uniquely mapped reads % |   88.38%
                      Average mapped length |   100.37
                   Number of splices: Total |   15721797
        Number of splices: Annotated (sjdb) |   15589504
                   Number of splices: GT/AG |   15590195
                   Number of splices: GC/AG |   111954
                   Number of splices: AT/AC |   12448
           Number of splices: Non-canonical |   7200
                  Mismatch rate per base, % |   0.25%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.83
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.34
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   5825990
         % of reads mapped to multiple loci |   9.66%
    Number of reads mapped to too many loci |   22883
         % of reads mapped to too many loci |   0.04%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   1.89%
                 % of reads unmapped: other |   0.03%
                              CHIMERIC READS:
                   Number of chimeric reads |   533492
                        % of chimeric reads |   0.88%
assembly star rna-seq stringtie • 633 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by stefanie.grunwald10
1
gravatar for stefanie.grunwald
12 months ago by
stefanie.grunwald10 wrote:

Hi,

I could solve the problem for my data. I use now gtf version of the reference annotation file (instead of gff3) and set Minimum assembled transcript length to 50.

What I also tried but it did not help:

  • filter out all lines in gff3 annotation file that misses "transcript_id"
  • grooming using string tie merge transcripts
  • in StringTie - disabeling trimming of predicted transcripts based on coverage set on true
  • in StringTie -- perform abundance estimation only if input transcript set on true

Maybe that helps others.

Regards

Stefanie

ADD COMMENTlink written 12 months ago by stefanie.grunwald10

Yes, the problem seemed to be due to the format of the reference annotation. Glad you worked this out and thanks for posting back your solution!

ADD REPLYlink modified 12 months ago • written 12 months ago by Jennifer Hillman Jackson25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 176 users visited in the last hour