Question: Difference between mRNA, CDS, transcript and gene in annotation file
I am new to bioinformatics and have a very basic question. In the annotation files (GFF) available on the NCBI FTP site, there are different features such as gene, CDS, transcript, mRNA, etc. I understand that CDS represents the coding sequence, i.e. starting from the ATG. But I am confused about the definitions of gene, transcript and mRNA. If I want to extract 500 bp upstream and downstream of transcription start sites, should I be using the transcript, mRNA or gene features?

Thanks, and sorry about the naivety of the question.

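Transcript can be used: the first base of the transcript is the start of transcription (the TSS). Note that "first base" depends on strand. For a feature on the plus strand the TSS is the start coordinate, and for a feature on the minus strand it is the end coordinate. An mRNA is a kind of transcript, so the first base of an mRNA feature marks the TSS in the same way.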
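To make the strand handling concrete, here is a minimal Python sketch of how the 500 bp window around the TSS could be computed from a single GFF3 line. The function name `tss_window`, the default feature types, and the two sample lines are made up for illustration and do not come from any NCBI file or library. It assumes the standard nine tab-separated GFF3 columns with 1-based, inclusive coordinates.

```python
def tss_window(gff_line, flank=500, feature_types=("transcript", "mRNA")):
    """Return (seqid, start, end, strand) for a window around the TSS.

    Returns None for comments, blank lines, other feature types,
    or features without a defined strand.
    """
    if gff_line.startswith("#") or not gff_line.strip():
        return None
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in feature_types:
        return None

    seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]

    # GFF3 always has start <= end, so the TSS depends on the strand.
    if strand == "+":
        tss = start
    elif strand == "-":
        tss = end
    else:
        return None

    # Clip at position 1; clipping at the sequence end would need the
    # sequence length, which is not available from this line alone.
    return seqid, max(1, tss - flank), tss + flank, strand


# Hypothetical example lines (not real annotation records).
plus = "chr1\tRefSeq\tmRNA\t1000\t5000\t.\t+\t.\tID=rna0;Parent=gene0"
minus = "chr1\tRefSeq\tmRNA\t1000\t5000\t.\t-\t.\tID=rna1;Parent=gene1"

print(tss_window(plus))   # ('chr1', 500, 1500, '+')
print(tss_window(minus))  # ('chr1', 4500, 5500, '-')
```

The window covers 500 bp on each side plus the TSS base itself, so it is 1001 bp long unless it is clipped at the start of the sequence. For minus-strand features the extracted sequence would still need to be reverse-complemented to read in the direction of transcription.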

The GFF3 specification describes how the gene, transcript/mRNA, exon and CDS features relate to one another.

Thanks, Jen, Galaxy team

