RNA-star mapped.bam problems

8 months ago by

a.walne • 40 wrote:

I'm trying to use RNA STAR but keep getting the following error message: "An error occurred setting the metadata for this dataset. Set it manually or retry auto-detection." It appears on the dataset "RNA STAR on data 11, data 10, and others: mapped.bam".

In this example, data 10 is ftp://ftp.ensembl.org/pub/release91/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Data 11 is ftp://ftp.ensembl.org/pub/release-91/gff3/homo_sapiens/Homo_sapiens.GRCh38.91.gff3.gz

"Others", I assume, refers to the paired-end fastq files.

I have tried auto-detection but this doesn't help. Any suggestions? Thanks


modified 8 months ago by Jennifer Hillman Jackson ♦ 25k • written 8 months ago by a.walne • 40

Hello - The public Galaxy server https://usegalaxy.org was updated this morning. I am running some tests to see if this is a usage error or a server-side error.

This includes a direct rerun and a few reruns with the Custom genome fasta and annotation GFF3 inputs cleaned up. Both inputs are slightly out of specification, which might be a factor (some tools are pickier about formats than others). I also noticed that you assigned a database metadata attribute to your datasets even though they are not based on the built-in genome and indexes that use that same database name. Using a Custom Build is a better choice. Even if these are not the root problem with this run, the datasets should be cleaned up so that they work properly with all tools (including tools downstream of RNA STAR, once it is working and returning proper alignment results).

Support FAQs: https://galaxyproject.org/support/

modified 8 months ago • written 8 months ago by Jennifer Hillman Jackson ♦ 25k

8 months ago by

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The custom genome/annotation data is too large to process with the RNA STAR mapper at Galaxy Main https://usegalaxy.org even with the input corrections. This results in an empty BAM output that cannot be indexed, triggering the metadata problems.
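For anyone who downloads the mapped.bam and wants to confirm that it really is empty, here is a minimal sketch using only the Python standard library. It relies on BAM being a series of gzip members (BGZF) and on the header layout from the SAM/BAM specification. The function name `count_bam_records` is made up for illustration, and this is not how Galaxy itself performs the check.

```python
import gzip
import struct


def count_bam_records(path, limit=1):
    """Count alignment records in a BAM file, stopping once `limit` is reached.

    Returns 0 for a zero-byte file or a BAM that holds only a header.
    """
    with gzip.open(path, "rb") as bam:  # BGZF is a series of gzip members
        magic = bam.read(4)
        if not magic:
            return 0  # zero-byte file
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", bam.read(4))
        bam.read(l_text)  # skip the SAM header text
        (n_ref,) = struct.unpack("<i", bam.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", bam.read(4))
            bam.read(l_name + 4)  # reference name plus its int32 length
        records = 0
        while records < limit:
            size = bam.read(4)
            if len(size) < 4:
                break  # end of data: no further alignment records
            (block_size,) = struct.unpack("<i", size)
            bam.read(block_size)  # skip over the record body
            records += 1
        return records
```

A result of 0 means there is nothing to index, which matches the metadata failure described above.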

This means that you'll need to do one of these:

1. Use HISAT2 instead.
2. Run the RNA STAR job using an indexed built-in genome at Galaxy Main, along with a reference annotation dataset that is a match (e.g. same genome build and common chromosome identifiers).
3. Consider starting up your own Galaxy server and providing it with enough memory to run the job.
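The "common chromosome identifiers" requirement in the second option can be checked before submitting a job. The following is a rough sketch, assuming a FASTA genome and a tab-delimited GTF/GFF3 annotation, either plain or gzipped. The helper names are hypothetical and it only compares identifiers; it does not validate either format.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_ids(path):
    """Collect sequence identifiers (first word after '>') from a FASTA file."""
    with open_text(path) as handle:
        return {
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()
        }


def annotation_ids(path):
    """Collect column-1 identifiers from a GTF or GFF3 file."""
    ids = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section in GFF3, not annotation
            if line.startswith("#") or not line.strip():
                continue
            ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_ids(fasta_path, annotation_path):
    """Return annotation identifiers that are absent from the genome."""
    return sorted(annotation_ids(annotation_path) - fasta_ids(fasta_path))
```

An empty list from `unmatched_ids` suggests that the two files use the same naming scheme (for example "1" versus "chr1").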

For items 2 & 3, please be aware that it is possible that the job may still remain too large to execute, even when using a built-in genome or given more resources at your own Galaxy server. RNA STAR uses much more memory during job execution than the other mapping tools - whether used in Galaxy or not.
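To get a feel for the numbers before setting up your own server, here is a back-of-the-envelope sketch. It assumes the rule of thumb from the STAR manual of at least 10 bytes of RAM per genome base; treat the result as a rough lower bound rather than a measured requirement. The function name and the `bytes_per_base` parameter are illustrative only.

```python
def estimate_star_ram_gb(fasta_lines, bytes_per_base=10):
    """Estimate STAR memory in GB from an iterable of FASTA lines.

    Assumes roughly `bytes_per_base` bytes of RAM per genome base.
    """
    bases = sum(
        len(line.strip()) for line in fasta_lines if not line.startswith(">")
    )
    return bases * bytes_per_base / 1e9


# Typical use with a gzipped genome (path is a placeholder):
#   import gzip
#   with gzip.open("genome.fa.gz", "rt") as handle:
#       print(estimate_star_ram_gb(handle))
```

Under that assumption, a genome of about 3.1 billion bases works out to about 31 GB.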

I tested HISAT2 with your data and the job completed successfully. What I did:

corrected the custom genome format
removed the database assignment from the fastq inputs
changed the datatype for the fastq inputs to fastqsanger.gz
no reference annotation was used (HISAT2 accepts gtf formatted annotation, not gff3)
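The exact corrections applied to the custom genome are not spelled out here. As a sketch of the usual cleanup (an assumption on my part): reduce each title line to the bare identifier, drop blank lines, and wrap sequence lines at a fixed width. The function name and the default width of 60 are illustrative.

```python
def normalize_fasta(lines, width=60):
    """Yield cleaned FASTA lines: bare identifiers, fixed-width sequence, no blanks."""
    seq = []

    def flush():
        # Emit the buffered sequence in chunks of `width` characters.
        joined = "".join(seq)
        seq.clear()
        for start in range(0, len(joined), width):
            yield joined[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            yield from flush()
            fields = line[1:].split()
            # Keep only the identifier, discarding any description text.
            yield ">" + (fields[0] if fields else "")
        else:
            seq.append(line)
    yield from flush()
```

Each sequence is buffered in memory before being rewrapped, so a human chromosome needs a few hundred MB while it is processed.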

How to do the above, and where to obtain reference annotation in GTF format, are both covered in the FAQs I linked in my original comment.

Galaxy tutorials for RNA-seq with workflow/tool example usage: https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

modified 8 months ago • written 8 months ago by Jennifer Hillman Jackson ♦ 25k
