I am trying to convert my SAM file to a BAM file and I want to use a built-in genome, but no reference genome is available to select. I only have a SAM file from Saccharomyces cerevisiae. How do I load the reference genome?
Can you post the first 5-10 lines of the SAM file? That'll help determine where to download the genome from.
Here are the first few lines of the SAM file:
@HD VN:1.0 SO:coordinate
@SQ SN:chrI LN:230208 UR:file:/nfs/srpipe_references/references/Saccharomyces_cerevisiae/S288c/all/fasta/S_cerevisiae_S288c.fa AS:S288c M5:d05f97502be63502b1dc28cba2e4d8f8 SP:Saccharomyces cerevisiae
@SQ SN:chrII LN:813178 UR:file:/nfs/srpipe_references/references/Saccharomyces_cerevisiae/S288c/all/fasta/S_cerevisiae_S288c.fa AS:S288c M5:672ccac60edc61550caa540019ada6fa SP:Saccharomyces cerevisiae
So is there no built-in genome, or do I have to upload one myself?
Given the header, it's the built in sacCer2 genome.
Thanks for the tip Devon.
So there isn't any built in genome in Galaxy for the SAM to BAM tool?
Umm, I literally included the phrase "built in" in my reply.
Sorry, I should have been more specific with my question. I know the SAM file was built using the sacCer2 genome, but how do I select sacCer2 from Galaxy's built-in genomes? Please take a look at the picture below. I selected the built-in genome option, but no reference genome is available. Thanks
The tool filters the available genomes by the "database" metadata assigned to the SAM input. Set the SAM dataset's database to sacCer2 (pencil icon > Edit attributes > first tab). Afterwards, the matching genome will show up in the tool's selection list if you are working at https://usegalaxy.org.
Other servers may have different built-in genomes.
How do you know that it is sacCer2, not S288c? Thanks. I am confused about how to tell what the database should be.
S288C is the reference strain. The version of that genome you're using is sacCer2. One can determine this by looking at the chromosome sizes (the LN values in the @SQ header lines), which match those of sacCer2. Note that the only thing the genome is actually used for here is adding the chromosome names and sizes, so even if it weren't the correct genome, the results would still be correct as long as these matched.
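For anyone who wants to do the chromosome-size comparison in a script rather than by eye, here is a rough Python sketch. It is only an illustration, not anything Galaxy runs. The function names and the KNOWN_BUILDS table are made up for this example, and the table holds just the two sacCer2 lengths that appear in the header posted above. A real check would load every chromosome from the build's chrom.sizes file.

```python
# Sketch: guess a genome build from the @SQ lines of a SAM header.

# Hypothetical, partial lookup table. The two lengths are the ones from the
# header posted in this thread, which the answer identifies as sacCer2.
KNOWN_BUILDS = {
    "sacCer2": {"chrI": 230208, "chrII": 813178},
}


def parse_sq_lengths(header_lines):
    """Return {sequence name: length} from SAM @SQ header lines."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Fields are TAG:VALUE pairs. Split on the first colon only, because
        # values such as UR:file:/... contain colons themselves. Splitting on
        # any whitespace handles both real (tab-separated) headers and the
        # space-separated text pasted into a forum post.
        fields = dict(f.split(":", 1) for f in line.split()[1:] if ":" in f)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def matching_builds(lengths, known=KNOWN_BUILDS):
    """List builds whose known chromosome sizes all agree with the header."""
    return [
        build
        for build, sizes in known.items()
        if all(lengths.get(name) == size for name, size in sizes.items())
    ]


header = [
    "@HD VN:1.0 SO:coordinate",
    "@SQ SN:chrI LN:230208 AS:S288c",
    "@SQ SN:chrII LN:813178 AS:S288c",
]
print(matching_builds(parse_sq_lengths(header)))  # ['sacCer2']
```

If no build matches, the list comes back empty, which is a hint that the file was aligned against a different assembly or uses different chromosome names.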