I am new to bioinformatics and have a very basic question. In the annotation files (GFF) available on the NCBI FTP site, there are different feature types such as gene, CDS, transcript, mRNA, etc. I understand that CDS represents the coding sequence, i.e. starting from ATG. But I am confused about the definitions of gene, transcript and mRNA. If I want to extract 500 bp upstream and downstream of transcription start sites, should I be using the transcript, mRNA or gene features?
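To make the goal concrete, the window calculation could be sketched roughly as below. This is only an illustration under stated assumptions: it assumes standard 9-column, tab-separated GFF3 with 1-based inclusive coordinates, it treats the 5' end of a feature (start on the + strand, end on the - strand) as the transcription start site, and it leaves the feature type as a parameter because that choice is exactly the open question. The function name `tss_windows` and the example records are made up for illustration and do not come from any NCBI file.

```python
def tss_windows(gff_lines, feature_type="mRNA", flank=500):
    """Yield (seqid, start, end, strand, attributes) for a window of
    `flank` bp on each side of the 5' end of every matching feature."""
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep only well-formed rows of the requested feature type (column 3).
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Assumption: the TSS is the feature's 5' end, which depends on strand.
        if strand == "+":
            tss = start
        elif strand == "-":
            tss = end
        else:
            continue  # unstranded features have no defined 5' end
        # Clip at position 1; the upper bound is not clipped because the
        # sequence length is not known from the feature row alone.
        yield seqid, max(1, tss - flank), tss + flank, strand, cols[8]


# Made-up records, purely for illustration.
example = [
    "##gff-version 3",
    "chr1\tExample\tgene\t1000\t5000\t.\t+\t.\tID=gene1",
    "chr1\tExample\tmRNA\t1200\t5000\t.\t+\t.\tID=rna1;Parent=gene1",
    "chr1\tExample\tmRNA\t7000\t9000\t.\t-\t.\tID=rna2",
]
windows = list(tss_windows(example))
for window in windows:
    print(window)
```

In the made-up records, the gene and its mRNA start at different positions, so passing `feature_type="gene"` instead of `"mRNA"` would shift the window, which is why the choice of feature type matters here.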
Thanks, and sorry about the naivety of the question.