Question: Bowtie-build : Finding information from bowtie2 indexed genome file
3.0 years ago, syrez.m wrote:

Hi all!

I did a bowtie2 index on a genome file using bowtie2-build. How do I find the following information?

Number of sequences in the genome? 

Name(s) of the sequence in the genome file? 


3.0 years ago, Jennifer Hillman Jackson (United States) wrote:


Was the fasta database indexed on the command line or with a Data Manager? If on the command line, simple unix commands can provide this information (other methods exist as well). Getting this information from within Galaxy after using a Data Manager works differently.

Data manager options:

Run a job against the database (map using BWA, etc), then use tools from the groups SAMTools, Picard, Text Manipulation and even Select, Filter, Group to generate summaries/lists of content from the header in a SAM/BAM dataset.
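
As a rough illustration of what those header summaries contain, here is a minimal shell sketch of the same logic outside Galaxy. It assumes a small made-up SAM header saved as example_header.sam (a hypothetical file name with invented sequence names and lengths). Each @SQ line in a SAM header describes one reference sequence, with the name in the SN: field and the length in the LN: field.

```shell
#!/bin/sh
# Build a tiny example SAM header (made-up data, for illustration only).
# Fields in SAM header lines are separated by tabs.
printf '@HD\tVN:1.6\tSO:unsorted\n'      >  example_header.sam
printf '@SQ\tSN:chr1\tLN:1000\n'         >> example_header.sam
printf '@SQ\tSN:chr2\tLN:2500\n'         >> example_header.sam
printf '@PG\tID:bowtie2\tPN:bowtie2\n'   >> example_header.sam

# Count the reference sequences: one @SQ line per sequence.
ref_count=$(grep -c '^@SQ' example_header.sam)
echo "reference sequences: $ref_count"

# List name and length for each reference sequence.
ref_table=$(awk -F '\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)   # text after "SN:"
        if ($i ~ /^LN:/) len  = substr($i, 4)   # text after "LN:"
    }
    print name "\t" len
}' example_header.sam)
echo "$ref_table"
```

With a real BAM file on the command line, the header can be printed with samtools view -H and piped into the same awk step.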

Command-line options:

This will extract all of the identifier lines in a fasta file (prompt% stands for the shell prompt and is not typed; replace genome.fasta with your own file name):

prompt% grep "^>" genome.fasta | sed 's/^>//' | cut -f1 > identifier_list

And this will return the number of identifiers. The output can be redirected to a file if you want:

prompt% wc -l identifier_list
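
For anyone who wants to try this end to end, here is a self-contained sketch of the same approach. It assumes a tiny made-up file named example_genome.fasta (hypothetical name and content). It counts header lines directly with grep -c and keeps only the text up to the first whitespace as the sequence name, which is a common convention but not a guarantee for every fasta file.

```shell
#!/bin/sh
# Create a tiny example fasta file (made-up data, for illustration only).
cat > example_genome.fasta <<'EOF'
>chr1 first example sequence
ACGTACGT
>chr2 second example sequence
GGCCTTAA
>chrM
ACGT
EOF

# Number of sequences: every record starts with one ">" header line.
seq_count=$(grep -c '^>' example_genome.fasta)
echo "sequences: $seq_count"

# Names of the sequences: drop the leading ">" and keep the first
# whitespace-delimited word of each header line.
seq_names=$(grep '^>' example_genome.fasta | sed 's/^>//' | awk '{print $1}')
echo "$seq_names"
```

If the original fasta file is no longer available, the bowtie2-inspect program that ships with bowtie2 can read the names back from the index itself: bowtie2-inspect -n index_basename prints one reference name per line, and bowtie2-inspect -s index_basename prints a summary. Here index_basename is a placeholder for the basename given to bowtie2-build.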

Best, Jen, Galaxy team


Thanks Jen! I am using command-line tools, and prompt% was not found, so I edited the command to:

cat genome.fas | grep ">" | sed 's/^>//' | cut -f1 | more

I have a feeling that it worked because it displayed the following output:




Reply written 3.0 years ago by syrez.m