Hello,
I am new to RNA-seq data. I am using the built-in index and using the reference genome without a built-in gene model (hg38) to align in STAR. How do I find or construct the gene model file for splice junctions?
Thanks
Hello,
That annotation is a good match for hg38. Be sure to use the GTF version of the annotation. There are three choices, and which to use depends on what you are doing. For many, the CHR version is the easiest to work with (simpler). You could run the analysis with the different versions and compare the results to make the decision for yourself.
The data will load with the datatype gff assigned due to the presence of header lines. Some tools can use the data that way, others will require that you remove the header lines and change the datatype to be gtf.
Remove header lines with the tool Select using the options "NOT Matching" and the regular expression ^#.
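If you would rather check outside of Galaxy what that filter does, here is a minimal Python sketch of the same idea: keep every line that does not match ^#. The function name and the sample lines are made up for illustration and are not taken from the real annotation file.

```python
import re

# Same pattern as given to the Select tool: a line that starts with "#".
HEADER = re.compile(r"^#")

def strip_header_lines(lines):
    """Return only the lines that do NOT match ^# (the "NOT Matching" option)."""
    return [line for line in lines if not HEADER.search(line)]

# Hypothetical sample: two header lines followed by one GTF-style record.
sample = [
    "##description: made-up header line\n",
    "##format: gtf\n",
    'chr1\tSOURCE\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_EXAMPLE";\n',
]

cleaned = strip_header_lines(sample)
print(len(cleaned))  # prints 1
```

For a real file you could pass an open file handle instead of the list, since the function only iterates over lines.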
Thanks! Jen, Galaxy team
Hello
Thank you, that helps out a lot. I used the GTF version that was provided in the link to the right of all the files. Does this resolve those issues that you are talking about? This is what occurs at the top of the file:
Should I remove this manually, or where can I find the tool that you are referring to? Are these issues the reason the RNA STAR job has yet to run on Galaxy? (I have no other jobs currently running.)
Thanks
Update: I used the "ALL" option in this database: https://www.gencodegenes.org/releases/current.html
Would this be appropriate for whole blood RNA-seq?