Hi all!
I did a bowtie2 index on a genome file using bowtie2-build. How do I find the following information?
Number of sequences in the genome?
Name(s) of the sequence in the genome file?
Thanks!
Hello,
Was the FASTA file indexed on the command line or with a Data Manager? If on the command line, a few simple Unix commands can provide this information (there are, of course, other command-line methods as well). Getting this information from within Galaxy after using a Data Manager works differently.
Data Manager options:
Run a job against the database (map using BWA, etc.), then use tools from the SAMTools, Picard, and Text Manipulation groups, or even Select, Filter, and Group, to generate summaries or lists of content from the header of the resulting SAM/BAM dataset.
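For reference, here is a rough sketch of what those header summaries boil down to, written as plain shell. It assumes the header has already been saved as text (outside Galaxy this would typically come from `samtools view -H` on the mapped BAM). The file names `header.sam` and `sq_names.tsv` and the header contents are made up so the example runs on its own.

```shell
# Hypothetical SAM header, standing in for the output of: samtools view -H mapped.bam
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n@PG\tID:bowtie2\n' > header.sam

# Each @SQ line describes one reference sequence: SN: is its name, LN: its length.
# Scan the tab-separated tags rather than assuming a fixed column order.
awk -F'\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len  = substr($i, 4)
    }
    print name "\t" len
}' header.sam > sq_names.tsv

# Number of reference sequences = number of @SQ lines
sq_count=$(wc -l < sq_names.tsv | tr -d ' ')
echo "$sq_count"
```

Inside Galaxy, the same idea applies: select the lines starting with @SQ from the header, then cut out the SN field.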
Command-line options:
This will extract the identifier from every header line in a FASTA file (the text between ">" and the first whitespace):
prompt% grep "^>" genome.fasta | sed 's/^>//' | awk '{print $1}' > identifier_list
And this will return the number of identifiers. The output can be redirected to a file if you want to keep it:
prompt% wc -l identifier_list
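As a quick way to try this out, here is a small self-contained sketch of the same two steps. The FASTA file `sample_genome.fasta`, its contents, and the output name `sample_identifier_list` are invented for illustration; substitute your own genome file.

```shell
# Hypothetical three-sequence FASTA, used only so the sketch runs on its own
printf '>chr1 first test sequence\nACGT\n>chr2\nGGCC\nTTAA\n>chrM mitochondrion\nACGTACGT\n' > sample_genome.fasta

# Names: keep header lines only, drop the leading ">", keep the text up to the first whitespace
grep '^>' sample_genome.fasta | sed 's/^>//' | awk '{ print $1 }' > sample_identifier_list
cat sample_identifier_list

# Count: grep -c counts the header lines directly, with no intermediate file needed
seq_count=$(grep -c '^>' sample_genome.fasta)
echo "$seq_count"
```

If the original FASTA file is no longer around, `bowtie2-inspect -n` followed by the index basename should print the sequence names stored in the index itself (and `-s` a summary with lengths), assuming Bowtie 2 is on your PATH.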
Best, Jen, Galaxy team