Question: Bowtie-build: Finding information from a bowtie2-indexed genome file
3.0 years ago by
syrez.m0 wrote:

Hi all!

I did a bowtie2 index on a genome file using bowtie2-build. How do I find the following information?

Number of sequences in the genome? 

Name(s) of the sequence in the genome file? 


3.0 years ago by
Jennifer Hillman Jackson (United States) wrote:


Was the FASTA database indexed on the command line or with a Data Manager? If on the command line, a few simple Unix commands can provide this information (other methods exist as well). Getting the same information from within Galaxy after using a Data Manager works differently.

Data manager options:

Run a job against the database (for example, map reads with BWA), then use tools from the SAMTools, Picard, and Text Manipulation groups, or even Select, Filter, and Group, to generate summaries or lists of content from the header of a SAM/BAM dataset.
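For anyone doing the equivalent outside Galaxy, here is a minimal command-line sketch of the header-based summary. It assumes the SAM header has been saved to a text file (for a BAM file, something like samtools view -H mapped.bam > header.sam would produce one); the function name sam_ref_names and the file names are made up for illustration.

```shell
# Print "name<TAB>length" for every reference sequence (@SQ line)
# found in a SAM header file given as the first argument.
sam_ref_names() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        # @SQ fields are TAG:VALUE pairs; SN is the name, LN the length.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }' "$1"
}
```

Calling sam_ref_names header.sam lists one reference sequence per line, and piping that into wc -l gives the number of sequences.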

Line-command options:

This will extract all of the identifier lines in a FASTA file (prompt% stands for the shell prompt and is not part of the command):

prompt% grep ">" genome.fasta | sed 's/^>//' | cut -f1 > identifier_list

And this will return the number of identifiers. The output could be redirected to a file if you wanted:

prompt% wc -l identifier_list
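The same two steps can also be wrapped in small shell functions. This is only a sketch: the function names fasta_names and fasta_count are hypothetical, and it assumes each header line starts with ">" immediately followed by the identifier, with any description separated by whitespace.

```shell
# List sequence identifiers: take each header line (starting with ">"),
# keep the first whitespace-delimited token, and strip the leading ">".
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Count sequences by counting header lines.
fasta_count() {
    grep -c '^>' "$1"
}
```

Usage would look like fasta_names genome.fasta and fasta_count genome.fasta. If only the index files are at hand, bowtie2-inspect can report on the index directly: the -n option prints the reference sequence names and -s prints a summary, given the index basename.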

Best, Jen, Galaxy team

Thanks, Jen! I am using command-line tools and prompt% was not found, so I edited the command to:

cat genome.fas | grep ">" | sed 's/^>//' | cut -f1 | more

I have a feeling that it worked because it displayed the following output:




