HTSeq-Count on Command Line

3.8 years ago by

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

To obtain the correct command line for most tools, execute the tool through the UI first. The command line will then be listed on the "Info" page (click on the small "i" icon in the expanded dataset). A few tools still do not include this. For those, run the tool in a way that produces an error, then send yourself the bug report, which will contain the command line.

Hopefully this helps, Jen, Galaxy team

written 3.8 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jennifer

Thanks for the reply.

Referring to your answer to "How to run HTseq on the public galaxy platform?":

If the public Galaxy server does not have an HTSeq-Count tool, how can I execute it through the UI?

Are there other ways to do it?

Regards,

Julius

written 3.8 years ago by jchen015 • 80

I think I am pretty close. I just need to know what a GFF file is and what its purpose is. How do I generate one?

This is what I did:

julius@julius-Aspire-4755:~/Desktop$ htseq-count -f bam cut_L7_1_5.bam gff.file
Error occured when processing GFF file (line 1 of file gff.file):
[Errno 2] No such file or directory: 'gff.file'
[Exception type: IOError, raised in __init__.py:51]
julius@julius-Aspire-4755:~/Desktop$
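
The error above says that no file named gff.file exists in the current directory, so htseq-count stopped before reading any alignments. As a minimal sketch, a check like the following could catch that before the tool runs. The helper name build_htseq_command and the file name annotation.gff are hypothetical and only for illustration; the argument order mirrors the command shown above.

```python
import os


def build_htseq_command(bam_path, gff_path):
    """Return the htseq-count argument list, or raise if an input file is missing."""
    for path in (bam_path, gff_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("Input file not found: " + path)
    # Same layout as the command above: htseq-count -f bam <alignments> <annotation>
    return ["htseq-count", "-f", "bam", bam_path, gff_path]


if __name__ == "__main__":
    try:
        # "annotation.gff" is a placeholder for a real annotation file.
        command = build_htseq_command("cut_L7_1_5.bam", "annotation.gff")
        print(" ".join(command))
    except FileNotFoundError as err:
        print(err)
```

The returned list could be passed to subprocess.run once both files are in place.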

written 3.8 years ago by jchen015 • 80

The GFF file should be a dataset loaded into one of your working histories. GFF files are reference annotation files. For this tool, the file should represent the genomic regions of interest to be summarized. The description for the tool in the Tool Shed (link I shared) has a brief summary of the content usually used. Good sources for these files are UCSC, BioMart, and Ensembl (see the tools on http://usegalaxy.org for other examples). You are not limited to these data providers. Where to obtain the annotation data depends on which regions you are interested in, the target reference genome, and who curates that data and makes it available publicly (or privately, if you are working with others internally and they have the data). As long as the file follows the specification and the base reference genome is an exact match between all inputs (same chromosome/contig identifiers and genome version), the tool should run fine. Take care, Jen
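
To illustrate the point about matching identifiers, here is a rough sketch that reads the nine tab-separated GFF columns and reports sequence names that do not appear among the BAM reference names. The function names and the example records are made up for illustration. The set of BAM reference names is assumed to be supplied by the caller (it could be read from the BAM header, for example with samtools view -H).

```python
from collections import namedtuple

# The nine tab-separated columns of a GFF line.
GffRecord = namedtuple(
    "GffRecord",
    "seqname source feature start end score strand frame attribute",
)


def parse_gff_line(line):
    """Parse one GFF line; return None for blank lines and comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("Expected 9 tab-separated columns, got %d" % len(fields))
    fields[3] = int(fields[3])  # start coordinate
    fields[4] = int(fields[4])  # end coordinate
    return GffRecord(*fields)


def unmatched_seqnames(gff_lines, bam_reference_names):
    """Return GFF sequence names that are absent from the BAM reference names."""
    seqnames = set()
    for line in gff_lines:
        record = parse_gff_line(line)
        if record is not None:
            seqnames.add(record.seqname)
    return seqnames - set(bam_reference_names)


# Made-up records, for illustration only.
example_lines = [
    "##gff-version 2\n",
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n',
    'chrM\texample\texon\t10\t50\t.\t-\t.\tgene_id "geneB";\n',
]
print(unmatched_seqnames(example_lines, {"chr1", "chr2"}))  # {'chrM'}
```

An empty result suggests the identifiers line up. Any name that is reported (such as "chrM" versus "MT", or "chr1" versus "1") points to a naming or genome-version mismatch between the inputs.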

written 3.8 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jennifer

Thanks for the explanation.

But I am still unable to figure out how to generate a GFF file so that I can run HTSeq on my BAM file.

To my understanding, a GFF file has 9 fields. How am I supposed to come up with all 9 of them and save them in a GFF file?

Regards,

Julius

written 3.8 years ago by jchen015 • 80

Hello, You mentioned that you had set up a local Galaxy. I assumed that you were using the wrapped tool there (https://toolshed.g2.bx.psu.edu/view/lparsons/htseq_count). If you haven't installed it yet, you can install it and start the server, which will have a local URL, and then execute the tool that way. Your command lines will then contain paths, etc. that match the environment you are working in. Best, Jen, Galaxy team

written 3.8 years ago by Jennifer Hillman Jackson ♦ 25k
