Question: Htseq-count input data
gryndler wrote, 2.6 years ago:

Dear colleagues, I am a new Galaxy user and would like to use its tools to compare two sets of transcriptome sequencing data from the extremophilic fungus Acidothrix acidophila. The genome and transcriptomes of the fungus were sequenced by JGI within the framework of the 100 Fungal genomes project. As a result, I received FASTA files of the genome and of the genes (gene catalog of transcripts and gene catalog of proteins, both for best models and all models), as well as the corresponding GFF and GFF3 files. In Galaxy, after grooming, I mapped both transcriptomes against the gene catalog FASTA file using BWA-MEM, received BAM files, and tried to use them to get transcript counts with Htseq-count, with the GFF as well as the GFF3 file. At this point, however, I received this error message:

"Fatal error: Unknown error occured Error occured when processing GFF file (line 5 of file /galaxy-repl/main/files/015/198/dataset_15198690.dat): Feature exon_1_1 does not contain a 'gene_id' attribute [Exception type: ValueError, raised in count.py:53]"

The "Autodetect" option did not solve the problem. It seems to me that the input data do not meet the requirements of the Galaxy tools. Perhaps I made some basic error that I do not understand, and the GFF/GFF3 file lacks important data.
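
The error says that an exon feature has no 'gene_id' attribute, so a first step is to see which attribute names the annotation file actually carries for each feature type. The following is a minimal sketch, not part of the original post: the function names `attribute_keys` and `summarize_attributes` and the example lines are invented for illustration, and the parser assumes no semicolons inside quoted values. htseq-count on the command line has an `--idattr` option (default `gene_id`) and a `--type` option (default `exon`); whether a suitable attribute exists in the JGI files is something only the files themselves can tell.

```python
from collections import Counter, defaultdict


def attribute_keys(attr_field):
    """Return the attribute names found in column 9 of a GFF/GTF line."""
    keys = []
    for part in attr_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key_eq = part.split("=", 1)[0].strip()
        if "=" in part and " " not in key_eq:
            keys.append(key_eq)                   # GFF3 style: key=value
        else:
            keys.append(part.split(None, 1)[0])   # GTF/GFF2 style: key "value"
    return keys


def summarize_attributes(lines):
    """Map each feature type (column 3) to a Counter of attribute names."""
    summary = defaultdict(Counter)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        summary[cols[2]].update(attribute_keys(cols[8]))
    return summary


# Invented example lines (not taken from the JGI files).
example = [
    "##gff-version 3\n",
    "scaffold_1\tJGI\tgene\t100\t900\t.\t+\t.\tID=gene_1\n",
    "scaffold_1\tJGI\tmRNA\t100\t900\t.\t+\t.\tID=mRNA_1;Parent=gene_1\n",
    "scaffold_1\tJGI\texon\t100\t400\t.\t+\t.\tID=exon_1_1;Parent=mRNA_1\n",
    "scaffold_1\tJGI\texon\t600\t900\t.\t+\t.\tID=exon_1_2;Parent=mRNA_1\n",
]

summary = summarize_attributes(example)
for feature_type, counts in summary.items():
    print(feature_type, dict(counts))
```

To run it on a real file, pass an open file handle instead of the list, for example `summarize_attributes(open("annotation.gff3"))`, where the file name is a placeholder. If the exon lines carry only `ID` and `Parent`, as in the invented example, then no `gene_id` is present and the ID attribute setting would have to point at an attribute that does exist.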

I would be grateful for any advice on how to proceed with a simple comparison of the two transcriptomes. My user name is gryndler@biomed.cas.cz and I have just one (unnamed) history.

Thank you in advance for any response

Milan Gryndler

gryndler@biomed.cas.cz

Tags: rna-seq software error bam

Bjoern Gruening (Germany) wrote, 2.6 years ago:

Htseq-count takes GTF files as input, and these files need to use the same chromosome identifiers as your BAM file. Can you check if this is true?
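
One way to run this check is to compare the reference names in the alignment header with the sequence identifiers in column 1 of the annotation. The sketch below is only an illustration under some assumptions: it works on the text SAM header (for a BAM file this can be printed with `samtools view -H`), the function names are made up here, and the example data are invented.

```python
def sam_reference_names(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gff_seqids(gff_lines):
    """Collect the sequence identifiers from column 1 of a GFF/GTF file."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_identifiers(header_lines, gff_lines):
    """Report which identifiers are shared and which occur on one side only."""
    refs = sam_reference_names(header_lines)
    seqids = gff_seqids(gff_lines)
    return {
        "shared": refs & seqids,
        "only_in_alignment": refs - seqids,
        "only_in_annotation": seqids - refs,
    }


# Invented example: reads aligned to transcript sequences,
# annotation given in genome (scaffold) coordinates.
header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:transcript_1\tLN:1500\n",
    "@SQ\tSN:transcript_2\tLN:900\n",
]
annotation = [
    "##gff-version 3\n",
    "scaffold_1\tJGI\texon\t100\t400\t.\t+\t.\tID=exon_1_1;Parent=mRNA_1\n",
]

result = compare_identifiers(header, annotation)
print(result)
```

If "shared" comes back empty, as in the invented example, the annotation and the alignments refer to different sequences. That would be the case, for instance, if the reads were mapped against a transcript FASTA while the annotation describes genome scaffolds.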

gryndler wrote, 2.6 years ago:

It seems that the Galaxy form for Htseq-count requires GFF or GFF3 files as input (Overview: "This tool takes an alignment file in SAM or BAM format and feature file in GFF format and calculates the number of reads mapping to each feature."). Moreover, I do not have a GTF file from JGI. However, the reads are mapped to genes, which gives me hope that a simple comparison might be possible.
