Why are all my feature counts "0"?

9 months ago by Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

It looks like you are using the GTF version based on hg38. The version of the annotation that is based on hg19 is here: https://www.gencodegenes.org/releases/27lift37.html

Two choices (both assume that you obtained valid hits with HISAT2):

Map against hg38 with HISAT2 instead
Use the hg19 version of the GTF
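Before choosing between the two options, it can help to confirm which build a GTF was made for. A minimal sketch, assuming the GTF starts with comment lines (prefixed with "#") that describe the release, as GENCODE files typically do. The function name gtf_header_lines and the example header text are hypothetical and only for illustration:

```python
import io

def gtf_header_lines(handle):
    """Return the leading comment lines (starting with '#') of a GTF file."""
    header = []
    for line in handle:
        if not line.startswith("#"):
            break  # first feature line reached, so the header is over
        header.append(line.rstrip("\n"))
    return header

# Hypothetical file content, for illustration only (not a real GENCODE header).
example = io.StringIO(
    "##description: example annotation (hypothetical build label)\n"
    "##provider: EXAMPLE\n"
    'chr1\tEXAMPLE\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n'
)
header = gtf_header_lines(example)
for line in header:
    print(line)
```

With a real file, replace the StringIO object with open("your_annotation.gtf") and read the printed lines to see which genome build is named.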

Please try one of these and let us know if you still have problems. If working at https://usegalaxy.org, an email with a history share link to your working history can be sent to galaxy-bugs@lists.galaxyproject.org for closer review. Please leave all inputs and outputs undeleted.

The server at https://usegalaxy.org is undergoing updates right now and several tools are undergoing adjustments. I am not aware of this tool being problematic and would expect a different failure if your results were related to the other known issues, but we can check (after the mismatch issue with the annotation versus genome version is corrected).

FAQs: https://galaxyproject.org/support/

Thanks! Jen, Galaxy team

modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson Hi, should I get the Comprehensive gene annotation, or another one?

written 9 months ago by Learner • 10

Comprehensive brings in annotation from other sources, which can be useful. Basic contains a subset of comprehensive (one source).

I'm not entirely sure which is best for your purposes. There are quite a few notes about the mapping not working well for the comprehensive set. You could try both, review the differences (pick a region or two where there is a known gene of interest), and choose.

written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson Unfortunately, it is zero again. I sent an email to the address you mentioned.

written 9 months ago by Learner • 10

Sorry to hear that. I see the email and will take a closer look at your use-case. Feedback soon.

written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson Thanks

modified 9 months ago • written 9 months ago by Learner • 10

Hi - I see the problem. The reads were mapped with HISAT2 against vicPac1 by mistake instead of hg19. Remap and the data will match up and should produce the proper statistics.
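A quick way to spot this kind of mismatch yourself is to compare the sequence names in the alignment header with those in the GTF. A minimal sketch working on plain SAM and GTF text lines; the function names and the toy data (including the scaffold_1 and scaffold_2 names) are hypothetical and only for illustration:

```python
def sam_reference_names(sam_lines):
    """Collect sequence names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header is over once alignment records begin
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def gtf_sequence_names(gtf_lines):
    """Collect the first column (sequence name) of every GTF feature line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names

# Toy data, for illustration only.
sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:scaffold_1\tLN:5000\n",
    "@SQ\tSN:scaffold_2\tLN:4000\n",
    "read1\t0\tscaffold_1\t100\t60\t50M\t*\t0\t0\t*\t*\n",
]
gtf = [
    "##comment line\n",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    'chr2\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g2";\n',
]
shared = sam_reference_names(sam) & gtf_sequence_names(gtf)
print(sorted(shared))  # an empty list means no feature can ever be counted
```

Note that a name check alone cannot tell apart two builds that share chromosome names (hg19 and hg38 both use chr1, chr2, and so on), so it is a first-pass test only.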

A proposed change, currently under consideration, would require direct selection of the target genome for mapping tools, which would help avoid this issue. Please follow it if interested: https://github.com/galaxyproject/galaxy/issues/4499

modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson Are you sure? I was pretty sure that I selected hg19! My bad. I am sorry if I caused you any problems. One question: there are several hg19 variants. I selected plain hg19 and not the others, like Canonical. What about the GTF file for htseq-count? Should I use the one you mentioned above?

written 9 months ago by Learner • 10

You can map with any of these variants and use the GTF you have now that is based on hg19:

hg19 - this is the full build
hg19 Canonical - contains primary autosomes (chr1-chr22) plus chrX, chrY, chrM
hg19 Female - contains primary autosomes (chr1-chr22) plus chrX, chrM -- no chrY

If you are not interested in haplotypes and unmapped genome data, Canonical or Female can be used. Some choose Female to avoid multi-mapping issues between the pseudoautosomal regions (PARs) shared by chrX and chrY (these are exactly duplicated in hg19).
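The difference between the variants can be written out as sets of sequence names. A minimal sketch based on the descriptions above; the helper keep_sequences is hypothetical, and the two extra names are just examples of the haplotype and unplaced sequences found in the full build:

```python
# Sequence sets as described above: Canonical = chr1-chr22 plus chrX, chrY, chrM.
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]
CANONICAL = set(AUTOSOMES + ["chrX", "chrY", "chrM"])
FEMALE = CANONICAL - {"chrY"}  # same as Canonical, but without chrY

def keep_sequences(names, allowed):
    """Return the names that belong to the chosen variant, in input order."""
    return [name for name in names if name in allowed]

# A few names as they might appear in the full hg19 build (examples only).
names = ["chr1", "chrX", "chrY", "chrM", "chr6_apd_hap1", "chrUn_gl000211"]
canonical_kept = keep_sequences(names, CANONICAL)
female_kept = keep_sequences(names, FEMALE)
print(canonical_kept)  # ['chr1', 'chrX', 'chrY', 'chrM']
print(female_kept)     # ['chr1', 'chrX', 'chrM']
```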

The locations of the PARs and details about the haplotype/unmapped sequences included in the build can be reviewed at UCSC (the hg19 source): http://genome.ucsc.edu/cgi-bin/hgGateway

modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson Thank you so much for your detailed explanation. Some people say that UCSC is outdated and has many problems in its genome data. They always refer to Ensembl, for example. I don't know if this is true or not :-) but in this field everyone says something new and I always learn new things :-)

modified 9 months ago • written 9 months ago by Learner • 10

@Jennifer Hillman Jackson I can obtain counts with featureCounts but not with htseq-count. It gives an error. Can you please check it out?

written 9 months ago by Learner • 10

Please see my email reply and we can troubleshoot from there.

written 9 months ago by Jennifer Hillman Jackson ♦ 25k

@Jennifer Hillman Jackson I tested it with two accounts and both failed. Please check your email.

written 9 months ago by Learner • 10

Hi - I reviewed all submitted emails to galaxy-bugs.

Set the tool form option "Force sorting of SAM/BAM file by NAME" to "Yes" when entering paired-end inputs. This is explained in the tool form but is sometimes missed. In short, it reduces the memory used and can help avoid this type of error. Some jobs may still be too large even with this option, but please try sorting as a first-pass solution.

The same solution applies to all htseq_count jobs that end with an error like this one:

.. other lines of error ..
64100000 SAM alignment record pairs processed.
64200000 SAM alignment record pairs processed.
64300000 SAM alignment record pairs processed.
Error occured when processing SAM input (record #131802701 in file /galaxy-repl/main/files/XXX/XXX/dataset_XX1.dat):
  Maximum alignment buffer size exceeded while pairing SAM alignments.
  [Exception type: ValueError, raised in __init__.py:671]
.. end of error message ..
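To see why name sorting helps with this error, here is a toy model of mate pairing. It is a simplified sketch and not the actual htseq-count implementation: each read waits in a buffer until its mate shows up, and the function (a hypothetical name) reports how large that buffer gets.

```python
def max_pairing_buffer(read_names):
    """Peak number of reads waiting for a mate, given the order reads appear."""
    waiting = set()
    peak = 0
    for name in read_names:
        if name in waiting:
            waiting.remove(name)  # mate found, so the pair leaves the buffer
        else:
            waiting.add(name)     # first mate seen, hold it until the second arrives
            peak = max(peak, len(waiting))
    return peak

# Toy coordinate-sorted order: the two mates of each pair are far apart.
coordinate_sorted = ["r1", "r2", "r3", "r4", "r1", "r2", "r3", "r4"]
# Name-sorted order: the two mates of each pair are adjacent.
name_sorted = sorted(coordinate_sorted)

print(max_pairing_buffer(coordinate_sorted))  # 4
print(max_pairing_buffer(name_sorted))        # 1
```

In this model the buffer grows with the number of pairs whose mates are far apart in the file, while name-sorted input never holds more than one read at a time.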

Not all tools have a sort option on the form, but sorting can help avoid errors across tools and can be done as an intermediate step (or steps) in any analysis. How to sort common datatypes is explained in the FAQ "Tool error? Try Sorting Your Inputs": https://galaxyproject.org/support/#troubleshooting

A secondary admin issue is that there is a one-account-per-user policy at https://usegalaxy.org. I sent you instructions on how to clear this up and we can discuss your accounts privately via email. For others reading, please be aware that duplicated accounts are picked up periodically by our admin tools and deleted (and the data is often lost). Use just one account. If you need to have duplicates removed to avoid admin action, send an email to galaxy-bugs@lists.galaxyproject.org and we can guide you in consolidating down to one account.

Terms of use (from the Help menu in the GUI masthead): https://usegalaxy.org/static/terms.html
About Galaxy Main (from the Galaxy Hub): https://galaxyproject.org/main/#user-data-and-job-quotas

Thanks!

modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson ♦ 25k
