Empty data result from Cufflinks

Question: Empty data result from Cufflinks

8 months ago by

a.walne • 40 wrote:

Hi, I am trying to run Cufflinks on some BAM files that I generated with HISAT2, using an uploaded reference genome and annotation file. I don't appear to be getting any data in the skipped transcripts output, as shown below. There was also an error message about the wrong annotation file type, so I followed the suggestion to try detecting the file type again from the attributes tab.

Skipped transcripts
47.14 GB
0 lines
format: gtf, database: hg38
cufflinks v2.2.1
cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 10 -G /jetstream/scratch0/main/jobs/18896476/inputs/dataset_24336832.dat -u -b ref.fa /jetstream/scratch0/main/jobs/18896476/inputs/dataset_24338216.dat

The annotation file URL I uploaded is: ftp://ftp.ensembl.org/pub/release-91/gff3/homo_sapiens/Homo_sapiens.GRCh38.91.gff3.gz
The genome file URL is: ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Is there anything I can do to correct this continuing error, or does it not really matter?

Thanks for your help

genome gtf cufflinks custom hisat2 • 222 views


modified 8 months ago by Jennifer Hillman Jackson ♦ 25k • written 8 months ago by a.walne • 40

8 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

There are a few items to check:

Tool settings: Did you set the HISAT2 spliced alignment options to report results compatible with Cufflinks?

Galaxy tutorials: https://galaxyproject.org/learn/
- RNA-seq: Discovering and quantifying new transcripts - an in-depth transcriptome analysis example: https://galaxyproject.org/tutorials/nt_rnaseq
- Specifically, the option is set here. The tutorial uses Stringtie downstream, but Cufflinks can also be used: https://galaxyproject.org/tutorials/nt_rnaseq/#spliced-mapping-with-hisat

Data: Your inputs match in terms of chromosome identifiers, but files from this source also include format lines, comment lines, and sometimes embedded sequence (fasta) lines that will cause problems with the tool.

Support FAQs: https://galaxyproject.org/support
- The reference genome fasta should have NormalizeFasta run on it to strip the description content from each ">" title line, retaining just the chromosome identifiers: https://galaxyproject.org/learn/custom-genomes/
- For the reference annotation, choose the GTF annotation version instead of the GFF3, to avoid the formatting issues from this source: https://galaxyproject.org/learn/datatypes/#gff3
- General troubleshooting help: https://galaxyproject.org/support/#unexpected-results
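
To make the title-line point concrete, here is a minimal Python sketch of what trimming a fasta title line down to its identifier looks like. It is not the NormalizeFasta tool itself, only an illustration of the intended result; the function name and the example records are made up for this sketch (the header text imitates the Ensembl style).

```python
def normalize_fasta_headers(lines):
    """Yield fasta lines with each '>' title line reduced to its first word.

    Illustrative only: this mimics the effect described above (keep the
    chromosome identifier, drop the description). Blank lines are dropped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the identifier, e.g. ">1 dna:chromosome ..." -> ">1"
            yield ">" + (fields[0] if fields else "")
        elif line:
            yield line


# Made-up example records in the style of an Ensembl primary assembly file
example = [
    ">1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF\n",
    "NNNNACGT\n",
    ">MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF\n",
    "GATCACAG\n",
]
cleaned = list(normalize_fasta_headers(example))
print("\n".join(cleaned))
```

Inside Galaxy, running the tool on the uploaded dataset is the simpler route; a script like this is only useful for checking a file locally before upload.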

Alternate method for Data above: Since you are mapping against hg38, which is a large genome, you might run into memory problems when tools use it as a custom genome. Instead, you could use the built-in hg38 genome index (if available on the server you are working on) along with a GTF annotation dataset that is based on the same genome build and has matching chromosome identifiers.
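
One way to check for matching chromosome identifiers before running a job is to compare the names in the fasta title lines against column 1 of the annotation. The sketch below assumes a tab-delimited GTF/GFF-style annotation; the function names and the tiny example data are hypothetical and only show the idea.

```python
def fasta_ids(lines):
    """Collect the first word of every '>' title line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].split()
    }


def annotation_ids(lines):
    """Collect column 1 (sequence name) from tab-delimited annotation lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; not annotation rows
        if line.startswith("#") or not line.strip():
            continue  # skip format/comment lines and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


# Made-up data: the genome uses UCSC-style names, one annotation row does not
fasta = [">chr1\n", "ACGT\n", ">chrM\n", "GATC\n"]
gtf = [
    "#!genome-build GRCh38\n",
    'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'MT\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]
missing = sorted(annotation_ids(gtf) - fasta_ids(fasta))
print("In annotation but not in genome:", missing)
```

Any identifier reported as missing means rows for that sequence cannot be matched to the genome, which is one common cause of empty or partial results.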

iGenomes is known to work well with this tool suite. Choose the GTF for UCSC/hg38. Download the tar archive, uncompress it locally, then upload just the genes.gtf dataset to Galaxy. https://support.illumina.com/sequencing/sequencing_software/igenome.html
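
If you prefer to pull out just the GTF with a script rather than unpacking the whole archive by hand, something like the following Python sketch could work. The directory layout used here is a hypothetical stand-in built in memory for the demo, so check the member names in the archive you actually download; the helper names are also made up.

```python
import io
import tarfile


def find_genes_gtf(archive):
    """Return names of regular-file members whose file name is genes.gtf."""
    return [
        member.name
        for member in archive.getmembers()
        if member.isfile() and member.name.rsplit("/", 1)[-1] == "genes.gtf"
    ]


def build_demo_archive():
    """Build a tiny in-memory tar.gz with a hypothetical layout for the demo."""
    files = [
        ("Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf", b"chr1\tdemo\n"),
        ("Homo_sapiens/UCSC/hg38/README.txt", b"demo\n"),
    ]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# With a real download, open the archive by path instead of the demo buffer
with tarfile.open(fileobj=build_demo_archive(), mode="r:gz") as tar:
    matches = find_genes_gtf(tar)
    content = tar.extractfile(matches[0]).read()

print(matches)
```

The extracted genes.gtf content can then be saved to a file and uploaded to Galaxy on its own.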

Thanks! Jen, Galaxy team

modified 8 months ago • written 8 months ago by Jennifer Hillman Jackson ♦ 25k
