Question: HTSeq-Count on Command Line
jchen015 (Singapore) wrote, 3.8 years ago:

Dear all

I have a BAM file (cut_L7_1_5.bam), and I have already installed HTSeq-Count on my local Galaxy server.

Now I wish to run it on my BAM file with this command:

"python -m HTSeq.scripts.count [options] <alignment_file> <gff_file>"

As I understand it, <alignment_file> is the BAM file itself. But what is <gff_file>?

How can I run it?

Regards,

Julius

commandline #htseq-count sam • 3.3k views
Jennifer Hillman Jackson (United States) wrote, 3.8 years ago:

Hello,

To obtain the correct command line for most tools, run the tool through the UI first; the command line will then be listed on the "Info" page (click the small "i" icon in the expanded dataset). A few tools still do not include this. For those, run the tool in a way that produces an error and send yourself the bug report, which will contain the command line.

Hopefully this helps, Jen, Galaxy team


Hi Jennifer

Thanks for the reply.

Referring to your answer in "How to run HTseq on the public galaxy platform?":

If the public Galaxy server does not have a function for HTSeq-Count, then how can I execute it through the UI?

Are there other ways to do it?

Regards,

Julius

(Reply written 3.8 years ago by jchen015)

I think I am pretty close. I just need to know what a GFF file is and what its purpose is. How do I even generate one?

This is what I did:

julius@julius-Aspire-4755:~/Desktop$ htseq-count -f bam cut_L7_1_5.bam gff.file
Error occured when processing GFF file (line 1 of file gff.file):
  [Errno 2] No such file or directory: 'gff.file'
  [Exception type: IOError, raised in __init__.py:51]
julius@julius-Aspire-4755:~/Desktop$

 

(Reply written 3.8 years ago by jchen015)
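
The error above only says that no file named gff.file exists in the working directory: the second argument has to be the path to a real annotation file. For illustration, here is a minimal Python sketch that checks both inputs exist before building the same command line. It assumes htseq-count is on the PATH; the function names and the file names in the usage note (genes.gtf, counts.txt) are hypothetical.

```python
import os
import subprocess


def build_htseq_command(bam_path, gff_path):
    """Return the htseq-count argument list, failing early if an input is missing."""
    for path in (bam_path, gff_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Input file not found: {path}")
    # "-f bam" tells htseq-count that the alignment file is BAM,
    # exactly as in the command typed in the post above.
    return ["htseq-count", "-f", "bam", bam_path, gff_path]


def run_htseq_count(bam_path, gff_path, out_path):
    """Run htseq-count and save its standard output (the counts) to out_path."""
    cmd = build_htseq_command(bam_path, gff_path)
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if htseq-count exits with an error.
        subprocess.run(cmd, stdout=out, check=True)
```

A call might look like run_htseq_count("cut_L7_1_5.bam", "genes.gtf", "counts.txt"), where genes.gtf stands in for whatever annotation file was downloaded for the reference genome.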

The GFF file should be a dataset loaded into one of your working histories. GFF files are reference annotation files; for this tool, the file should represent the genomic regions of interest to be summarized. The description of the tool in the Tool Shed (the link I shared) has a brief summary of the content usually used.

Good sources for these files are UCSC, BioMart, and Ensembl (see the tools on http://usegalaxy.org for other examples). You are not limited to these data providers. Where to obtain the annotation data depends on which regions you are interested in, the target reference genome, and who curates that data and makes it available publicly (or privately, if you are working with others internally and they have the data).

As long as the file follows the specification and the base reference genome is an exact match between all inputs (the chromosome/contig identifiers and the genome version), the tool should run fine.

Take care, Jen

(Reply written 3.8 years ago by Jennifer Hillman Jackson)
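
The requirement that chromosome/contig identifiers match between inputs can be checked before running the tool. The following is a rough Python sketch, not part of Galaxy or HTSeq: it collects the sequence names from column 1 of a GFF/GTF file and reports any that are absent from a list of reference names. That list is assumed to be obtained separately, for example from the @SQ lines of the BAM header printed by samtools view -H. The function names are hypothetical.

```python
def gff_sequence_names(gff_path):
    """Collect the distinct values of column 1 (sequence name) from a GFF/GTF file."""
    names = set()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment/header lines and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_alignment(gff_path, alignment_names):
    """Return GFF sequence names that are not among the alignment's reference names."""
    return sorted(gff_sequence_names(gff_path) - set(alignment_names))
```

A non-empty result (for example "chr1" in the annotation versus "1" in the alignment) suggests the two files were built against differently named references.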

Hi Jennifer

Thanks for the explanation.

But I am still unable to figure out how to generate a GFF file so that I can run HTSeq on my BAM file.

As I understand it, a GFF file has 9 fields. How am I supposed to come up with all 9 of them and save them as a GFF file?

Regards,

Julius

(Reply written 3.8 years ago by jchen015)
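
For reference, the 9 fields are the tab-separated columns of each GFF line, and they are normally downloaded from an annotation provider rather than typed by hand. The Python sketch below names the columns and parses one line. The example line is made up for illustration (GTF-style attributes) and is not taken from any real annotation; parse_gff_line is a hypothetical helper, not an HTSeq function.

```python
from collections import namedtuple

# The nine tab-separated columns of a GFF line, in order.
GffRecord = namedtuple(
    "GffRecord",
    ["seqname", "source", "feature", "start", "end",
     "score", "strand", "frame", "attribute"],
)


def parse_gff_line(line):
    """Split one GFF line into its nine columns, converting start/end to int."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"Expected 9 tab-separated columns, found {len(fields)}")
    record = GffRecord(*fields)
    return record._replace(start=int(record.start), end=int(record.end))


# Made-up example line; the values are placeholders only.
example = ('chr1\texample_source\texon\t1000\t1500\t.\t+\t.\t'
           'gene_id "geneA"; transcript_id "geneA.1";')
record = parse_gff_line(example)
print(record.feature, record.start, record.end)  # prints: exon 1000 1500
```

As far as I know, htseq-count by default counts features whose third column is "exon" and groups them by the gene_id attribute, so the third and ninth columns are the ones that matter most for this tool.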

Hello, you mentioned that you had set up a local Galaxy. I assumed that you were using the wrapped tool there (https://toolshed.g2.bx.psu.edu/view/lparsons/htseq_count). If you haven't installed it yet, you can always do so and then start the server, which will have a local URL. Then execute the tool that way. Your command lines will have paths, etc. that match the environment you are working in. Best, Jen, Galaxy team

(Reply written 3.8 years ago by Jennifer Hillman Jackson)