Hello,
The input BAM must have a database assigned when using this tool, and the input reference annotation must have the same database assigned. If the genome was already indexed on the server (a genome you selected directly in Galaxy, not a custom genome), mapping tools normally assign the database to the BAM result. However, STAR is not currently assigning the database attribute as it should (we are working to fix this). Meanwhile, if you are not using a custom genome, you can assign the database directly.
Note: Featurecounts might have a small bug when using a custom genome. Is that your case as well? See this other post for details: https://biostar.usegalaxy.org/p/27973/. If it is the same issue, we can merge the two posts for updates. I am testing whether a Custom Build is a potential workaround (one that could be used before the tool is updated).
And yes, GTF datatype detection by the Upload tool can be problematic with data from certain sources that include header/comment lines. That is another fix in progress, but at a lower priority. Meanwhile, just assign the datatype during Upload, or afterwards by editing the dataset attributes. Many tools will work with the header intact, but some will not. Alternatively, you could remove the comment lines before Upload (GTF will then be autodetected correctly), or make the correction within Galaxy. Removing the comment lines at the top avoids confusing errors/problems. I don't think that Featurecounts is impacted, but if the database assignments are the same between your BAM and GTF, both have the correct datatype assigned, and the tool still refuses to recognize the GTF, you could try the reformatting to see if it helps. Use the tool "Remove beginning of a file" if you choose to do this within Galaxy. I'll be including GTFs with and without headers in my Featurecounts tests.
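If you prefer to strip the header before Upload, here is a minimal Python sketch. It assumes the comment lines start with "#" and sit in one block at the top of the file. The function names are made up for illustration and are not part of Galaxy or any tool. The count it reports is also the number of lines you would give to "Remove beginning of a file" if you do the fix inside Galaxy instead.

```python
from typing import Iterable, Iterator


def count_leading_comments(lines: Iterable[str], prefix: str = "#") -> int:
    """Return how many consecutive comment lines open the file."""
    count = 0
    for line in lines:
        if not line.startswith(prefix):
            break
        count += 1
    return count


def strip_leading_comments(lines: Iterable[str], prefix: str = "#") -> Iterator[str]:
    """Yield every line except the comment block at the top of the file."""
    in_header = True
    for line in lines:
        if in_header and line.startswith(prefix):
            continue  # still inside the header block: drop the line
        in_header = False
        yield line


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python strip_header.py in.gtf out.gtf
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            dst.writelines(strip_leading_comments(src))
```

Only the leading block is removed, so the data lines are written back unchanged.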
UCSC is one choice for an alternative GTF that wouldn't have a header included. However, the problem with GTFs from that source is that the gene_id and transcript_id values are the same when extracted from the Table Browser: both values are the transcript name. This effectively means that all counts will be made by "transcript" and not grouped by "gene". This creates scientific content problems with the output from many tools (not just this one). iGenomes GTFs are well formatted, as are those from Gencode, and avoid all of these format/usage issues.
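A quick way to see whether a GTF has this gene_id/transcript_id problem is to compare the two attributes line by line. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the usual GTF layout of nine tab-separated columns with attributes written as key "value"; pairs in the last column.

```python
import re
from typing import Dict, Iterable

# Matches attribute pairs such as: gene_id "ABC";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(_ATTR.findall(field))


def fraction_gene_equals_transcript(lines: Iterable[str]) -> float:
    """Fraction of annotated lines where gene_id equals transcript_id."""
    same = 0
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id is None or transcript_id is None:
            continue  # e.g. gene-level lines that carry no transcript_id
        total += 1
        if gene_id == transcript_id:
            same += 1
    return same / total if total else 0.0
```

A result at or near 1.0 suggests the file will give per-transcript counts even when you ask for counts grouped by gene.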
Thanks for reporting problems! Jen, Galaxy team