Question: Why are all my feature counts "0"?
Learner wrote (9 months ago):

I am trying to analyse a dataset using Galaxy. As an example, I use this data (https://www.ebi.ac.uk/ena/data/view/PRJNA338610). I loaded the first two read files as forward and reverse, then ran HISAT2 against human hg19. Afterwards, I ran featureCounts with gencode.v27.primary_assembly.annotation.gtf

All my counts are 0. Why? What did I do wrong?

Jennifer Hillman Jackson (United States) wrote, 9 months ago:

Hello,

It looks like you are using the GTF version based on hg38. The version of the annotation that is based on hg19 is here: https://www.gencodegenes.org/releases/27lift37.html

Two choices (both assume that you obtained valid hits with HISAT2):

  • Map against hg38 with HISAT2 instead
  • Use the hg19 version of the GTF

Please try one of these and let us know if you still have problems. If working at https://usegalaxy.org, an email with a history share link to your working history can be sent to galaxy-bugs@lists.galaxyproject.org for closer review. Please leave all inputs and outputs undeleted.
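One way to check which build a mapping actually used is to look at the sequence lengths in the BAM/SAM header: chr1 is 249,250,621 bases long in hg19 and 248,956,422 bases long in hg38. The sketch below is only an illustration, not a Galaxy tool. The function names and the example header are made up, and it assumes UCSC-style names (chr1) and header text like that printed by `samtools view -H`.

```python
# chr1 lengths for the two human builds discussed in this thread.
CHR1_LENGTHS = {
    249250621: "hg19 (GRCh37)",
    248956422: "hg38 (GRCh38)",
}


def parse_sq_lines(header_text):
    """Return {sequence_name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def guess_human_build(header_text):
    """Guess the build from the chr1 length; None if chr1 is absent or unrecognized."""
    chr1_length = parse_sq_lines(header_text).get("chr1")
    return CHR1_LENGTHS.get(chr1_length)


# Hypothetical two-line header used only for illustration.
example_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
print(guess_human_build(example_header))
```

A result of None means the header has no chr1 of a known human length, which is itself a hint that the reads were mapped against something other than hg19 or hg38.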

The server at https://usegalaxy.org is undergoing updates right now and several tools are undergoing adjustments. I am not aware of this tool being problematic and would expect a different failure if your results were related to the other known issues, but we can check (after the mismatch issue with the annotation versus genome version is corrected).

FAQs: https://galaxyproject.org/support/

Thanks! Jen, Galaxy team


@Jennifer Hillman Jackson Hi, should I get the Comprehensive gene annotation, or another one?

(written 9 months ago by Learner)

Comprehensive brings in annotation from other sources, which can be useful. Basic contains a subset of comprehensive (one source).

I'm not entirely sure which is best for your purposes. There are quite a few notes about the mapping not working well for the comprehensive. You could try both, review the differences (pick a region or two where there is a known gene of interest), and choose.

(written 9 months ago by Jennifer Hillman Jackson)

@Jennifer Hillman Jackson Unfortunately, the counts are zero again. I sent an email to the address you mentioned.

(written 9 months ago by Learner)

Sorry to hear that. I see the email and will take a closer look at your use-case. Feedback soon.

(written 9 months ago by Jennifer Hillman Jackson)

@Jennifer Hillman Jackson Thanks

(written 9 months ago by Learner)

Hi - I see the problem. The reads were mapped with HISAT2 against vicPac1 by mistake instead of hg19. Remap and the data will match up and should produce the proper statistics.
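A mismatch like this can be spotted before counting by comparing the sequence names in the BAM/SAM header with those in column 1 of the GTF: a read can only be assigned to a feature on a sequence with the same name. The following is a minimal sketch under stated assumptions, not part of Galaxy or featureCounts. The function names, the scaffold name, and the GTF line are hypothetical placeholders.

```python
def gtf_sequence_names(gtf_lines):
    """Collect the names in column 1 of a GTF, skipping comments and blank lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_sequence_names(header_lines):
    """Collect the SN values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def shared_names(header_lines, gtf_lines):
    """Names present in both inputs; an empty set means every count will be 0."""
    return sam_sequence_names(header_lines) & gtf_sequence_names(gtf_lines)


# Hypothetical inputs: a header from a non-human mapping and a human-style GTF line.
header = ["@SQ\tSN:scaffold_1\tLN:5000\n"]
gtf = ["##description: example\n", 'chr1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n']
print(shared_names(header, gtf))
```

Shared names are necessary but not sufficient: hg19 and hg38 both use names such as chr1, so a non-empty result does not prove that the build matches.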

A proposed change, currently under consideration, would require direct selection of the target genome in mapping tools, which should help avoid this issue. Please follow if interested: https://github.com/galaxyproject/galaxy/issues/4499

(written 9 months ago by Jennifer Hillman Jackson)

@Jennifer Hillman Jackson Are you sure? I was pretty sure that I selected hg19! My bad. I am sorry if I caused you any trouble. One question: there are several hg19 variants. I selected plain hg19 and not the others, such as Canonical. And what about the GTF file for htseq-count? Should I use the one you mentioned above?

(written 9 months ago by Learner)

You can map with any of these variants and use the GTF you have now that is based on hg19:

  • hg19 - this is the full build
  • hg19 Canonical - contains primary autosomes (chr1-chr22) plus chrX, chrY, chrM
  • hg19 Female - contains primary autosomes (chr1-chr22) plus chrX, chrM -- no chrY

If you are not interested in haplotypes and unmapped genome data, Canonical or Female can be used. Some choose Female to avoid multi-mapping between the pseudoautosomal regions (PARs) shared by chrX and chrY, which are exactly duplicated in hg19.
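The three variants differ only in which sequences they contain, which can be written down as sets of names. The snippet below is a small sketch for illustration: the helper name is made up, and the last two example names merely follow the naming style of hg19's haplotype and unplaced sequences.

```python
# Sequence names in each reduced hg19 variant described above.
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # chr1-chr22
CANONICAL = AUTOSOMES | {"chrX", "chrY", "chrM"}
FEMALE = CANONICAL - {"chrY"}


def outside_variant(sequence_names, variant):
    """Return, sorted, the names that the chosen variant does not contain."""
    return sorted(set(sequence_names) - variant)


# Illustrative names as they might appear in column 1 of an annotation.
annotation_names = ["chr1", "chrX", "chrY", "chr6_apd_hap1", "chrUn_gl000211"]

print(outside_variant(annotation_names, CANONICAL))
print(outside_variant(annotation_names, FEMALE))
```

Features on sequences outside the chosen variant cannot receive any reads, so they would be expected to show a count of 0 while the rest of the annotation is counted normally.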

Details of where the PARs are located, and of the haplotype and unmapped sequences included in the build, can be reviewed at UCSC (the hg19 source): http://genome.ucsc.edu/cgi-bin/hgGateway

(written 9 months ago by Jennifer Hillman Jackson)

@Jennifer Hillman Jackson Thank you so much for your detailed explanation. Some people say that UCSC is outdated and has many problems in its genome data; they always refer to Ensembl, for example. I don't know if this is true or not :-) but in this field everyone says something new, and I always learn new things :-)

(written 9 months ago by Learner)

@Jennifer Hillman Jackson I can obtain counts with featureCounts but not with htseq-count. It gives an error. Can you please check it out?

(written 9 months ago by Learner)

Please see my email reply and we can troubleshoot from there.

(written 9 months ago by Jennifer Hillman Jackson)

@Jennifer Hillman Jackson I tested it with two accounts and both failed. Please check your email.

(written 9 months ago by Learner)

Hi - I reviewed all submitted emails to galaxy-bugs.

Set the tool form option "Force sorting of SAM/BAM file by NAME" to "Yes" when entering paired-end inputs. This is explained in the tool form but is sometimes missed. In short, it reduces the memory used and can help avoid this type of error. Some jobs may still be too large even with this option, but please try sorting as a first-pass solution.

The same solution applies to all htseq_count jobs that end with an error like this one:

.. other lines of error ..
64100000 SAM alignment record pairs processed.
64200000 SAM alignment record pairs processed.
64300000 SAM alignment record pairs processed.
Error occured when processing SAM input (record #131802701 in file /galaxy-repl/main/files/XXX/XXX/dataset_XX1.dat):
  Maximum alignment buffer size exceeded while pairing SAM alignments.
  [Exception type: ValueError, raised in __init__.py:671]
.. end of error message ..

Not all tools have a sort option on the form, but sorting can really help avoid errors across tools and can be done as an intermediate step (or steps) in any analysis. How to sort common datatypes is explained here: https://galaxyproject.org/support/#troubleshooting
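Why name-sorting helps with paired-end data can be seen in a toy model: a tool that pairs mates while streaming through the file has to hold each read until its mate appears. This is a simplified sketch, not htseq-count's actual implementation. The function name and read names are hypothetical.

```python
def max_pairing_buffer(read_names):
    """Return the peak number of reads held while waiting for their mates.

    `read_names` is the order in which alignments are read from the file;
    each name is expected to appear exactly twice (once per mate).
    """
    waiting = set()
    peak = 0
    for name in read_names:
        if name in waiting:
            waiting.remove(name)  # mate found, the pair is complete
        else:
            waiting.add(name)  # hold this read until its mate shows up
            peak = max(peak, len(waiting))
    return peak


# Hypothetical order in a coordinate-sorted file: mates can be far apart.
coordinate_order = ["r1", "r2", "r3", "r4", "r1", "r2", "r3", "r4"]
# The same reads sorted by name: mates are adjacent.
name_order = sorted(coordinate_order)

print(max_pairing_buffer(coordinate_order))  # 4
print(max_pairing_buffer(name_order))  # 1
```

In the name-sorted order the buffer never holds more than one read, whereas in the other order it grows with the number of pairs whose mates are far apart, which is the situation the "Maximum alignment buffer size exceeded" message points to.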

A secondary admin issue is that there is a one-account-per-user quota at https://usegalaxy.org. I sent you instructions on how to clear this up and we can discuss your accounts privately via email. For others reading, please be aware that duplicated accounts are picked up by our admin tools periodically and deleted (and the data is often lost). Use just one account. If you need to have duplicates removed to avoid admin action, send an email to galaxy-bugs@lists.galaxyproject.org and we can guide you in consolidating down to one account.

Thanks!

(written 9 months ago by Jennifer Hillman Jackson)