Question: How do I add a genome to Database/Build?
21 days ago by
amelidou0 wrote:

Hi! After generating a pileup file and converting the pileup to interval format, we could potentially extract the consensus using the Extract Genomic DNA option. But it asks you to first tag the sequence using one of the existing Database/Build options, which are quite limited. How can you add more? Is there another way to extract the consensus from a pileup file? Thank you! Angeliki

20 days ago by
United States
Jennifer Hillman Jackson25k wrote:


The 10 column pileup format includes the consensus in the 9th field. You could use a tool like "Tabular-to-Fasta" to extract it directly.

It is also possible to take the positional coordinates and pull the matching sequence out of the custom reference genome you originally mapped against, but be aware that this sequence won't include any of the variation from your read data. If that is your goal (maybe you want to have both), the Extract Genomic DNA tool can be used with the option to get the reference genome FASTA data from the history. There is no need to assign the database with that usage.

  • Tool: Extract Genomic DNA
  • Option: Choose the source for the reference genome
    • Setting: From the history
  • Option: Using reference genome
    • Setting: Pick your custom genome fasta dataset
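
For anyone who wants to see what this kind of extraction amounts to outside Galaxy, here is a minimal Python sketch. It is not the Extract Genomic DNA tool's implementation. The function names `read_fasta` and `extract_intervals` are made up for illustration, and the sketch assumes interval coordinates are 0-based with an exclusive end (BED-style), so check that against your own interval dataset before relying on it.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {sequence name: sequence}."""
    sequences: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            words = line[1:].split()
            name = words[0] if words else ""
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    return {key: "".join(parts) for key, parts in sequences.items()}


def extract_intervals(
    genome: Dict[str, str], intervals: Iterable[Tuple[str, int, int]]
) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs for (chrom, start, end) intervals.

    Assumption: start is 0-based and end is exclusive.
    """
    records = []
    for chrom, start, end in intervals:
        records.append((f"{chrom}:{start}-{end}", genome[chrom][start:end]))
    return records


# Tiny made-up reference, purely for illustration.
reference = [">chrCustom my genome", "ACGTACGTAC", "GGTTAACC"]
genome = read_fasta(reference)
for header, seq in extract_intervals(genome, [("chrCustom", 2, 6)]):
    print(f">{header}\n{seq}")
```

As noted above, the extracted sequence is reference sequence only, so none of the variation from the reads shows up in it.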

If you ever do need a custom genome's database assigned to a dataset (some tools do work better that way, though not Extract), a custom genome can be promoted to a custom build. This creates a "database" associated specifically with your account that can be assigned the same way as any other database (Pencil icon > Edit Attributes > first tab, genome/database selection > Save).


Thanks! Jen, Galaxy team

20 days ago by
amelidou0 wrote:

Hi Jen, thank you for your answer! My column 9 contains various characters (like .,g, $), and if you try Tabular-to-FASTA it only creates different FASTA files (not one consensus) with all these characters under each heading. Column 4 looks more like the sequence, but it still generates various FASTA files with different headings (not just one consensus). Any ideas? Best regards, Angeliki


If you do not want the encoded consensus, try NGS: SAMtools > Pileup-to-Interval.

The consensus sequence from a pileup dataset is not the same as an assembly. These are short regions where variation was detected and reported.
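
If a single sequence per chromosome is still wanted, one rough option is to join the per-position consensus calls yourself. The Python sketch below does that under several assumptions: the dataset uses the 10-column SAMtools pileup layout as I understand it (chrom, 1-based position, reference base, consensus base, consensus quality, SNP quality, mapping quality, coverage, read bases, base qualities), rows are sorted by position, and indel rows carry "*" in the reference-base column and can be skipped. The function name `consensus_from_pileup` is hypothetical. Positions with no pileup row are padded with N, and the result starts at the first covered position, so it is not an assembly.

```python
from typing import Dict, Iterable, List, Optional


def consensus_from_pileup(lines: Iterable[str], gap: str = "N") -> Dict[str, str]:
    """Join column-4 consensus bases per chromosome, padding uncovered gaps.

    Assumes tab-separated 10-column pileup rows sorted by position.
    """
    pieces: Dict[str, List[str]] = {}
    last_pos: Dict[str, Optional[int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip blank rows and rows without 10 columns
        if fields[2] == "*":
            continue  # assumed indel row: no single consensus base
        chrom, pos, base = fields[0], int(fields[1]), fields[3]
        if chrom not in pieces:
            pieces[chrom] = []
            last_pos[chrom] = None
        prev = last_pos[chrom]
        if prev is not None and pos > prev + 1:
            # Pad positions that have no pileup row.
            pieces[chrom].append(gap * (pos - prev - 1))
        pieces[chrom].append(base)
        last_pos[chrom] = pos
    return {chrom: "".join(parts) for chrom, parts in pieces.items()}


# Made-up rows, purely for illustration.
rows = [
    "chr1\t5\tA\tA\t30\t0\t60\t3\t.,.\tIII",
    "chr1\t6\tC\tG\t30\t30\t60\t3\tg.g\tIII",
    "chr1\t9\tT\tT\t30\t0\t60\t2\t.,\tII",
]
for chrom, seq in consensus_from_pileup(rows).items():
    print(f">{chrom}\n{seq}")
```

The consensus column can contain IUPAC ambiguity codes at heterozygous positions, so the joined sequence may include letters other than A, C, G, T, and N.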

written 10 days ago by Jennifer Hillman Jackson