Question: Aligning more than 2 sequences
19 months ago by
ppurkayastha201030 wrote:

I have paired-end data, fast1.fq and fast2.fq (in FASTQ format). I need to align the paired-end reads to two reference sequences.

I need to align the three inputs (the paired-end reads, reference 1, and reference 2) in such a way that, in a BAM file, I can see all three aligned and mapped to each other.

I know Bowtie2 can be used to map two sequences. But how should I proceed with mapping three sequences to each other?

19 months ago by
United States
Jennifer Hillman Jackson25k wrote:


Not in one step for the different input types, as far as I am aware.

Some options:

  1. The mapping tools for NGS data are designed to map NGS reads against a single reference fasta file (typically a genome, transcriptome, or exome). Each reference fasta file, aka dataset, could be used as a Custom genome if it is not already indexed on the Galaxy server in use.

  2. This isn't exactly what you described as the goal, but two reference sequences could be combined into a single "meta" reference dataset and the NGS reads aligned against that as a custom reference genome.

Note that neither 1 nor 2 would provide homology information between the two reference sequences themselves, just the NGS reads versus each, distinctly. But those two sequences could be compared to each other using different tools. Which tool to use depends on what kind of data they represent.
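As a rough illustration of option 2, here is a minimal Python sketch of building a combined "meta" reference from several fasta files. The function name `combine_references` and the choice to prefix each sequence name with its source file name are assumptions made for this example, not something Galaxy or any mapping tool requires; the prefix only makes it easier to tell in the BAM file which reference a read mapped to.

```python
from pathlib import Path


def combine_references(fasta_paths, out_path):
    """Concatenate fasta files into one file.

    Each sequence name is prefixed with the stem of the file it came
    from (hypothetical naming scheme), so names stay unique and the
    source reference is visible in downstream alignments.
    Returns the list of sequence names written.
    """
    written = []
    seen = set()
    with open(out_path, "w") as out:
        for path in fasta_paths:
            prefix = Path(path).stem
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue  # skip blank lines
                    if line.startswith(">"):
                        header = f"{prefix}_{line[1:].strip()}"
                        # Mapping tools generally use the first token as the name.
                        name = header.split()[0]
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        written.append(name)
                        out.write(f">{header}\n")
                    else:
                        out.write(line + "\n")
    return written
```

Calling `combine_references(["reference1.fa", "reference2.fa"], "combined.fa")` (file names are placeholders) would produce one fasta file that could then be uploaded and used as a custom reference genome for the paired-end reads.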

  3. The NGS reads could be assembled to produce a consensus fasta dataset (in effect a "third" reference dataset) and then all three compared to produce a MAF result (Multiple Alignment Format). If you want to do that, the content of the reference sequences should be considered when selecting a tool. If you identify a tool that meets your needs and matches the data input types (from a publication, web search, etc.), check to see if it is wrapped for Galaxy in the Tool Shed. Considerations when selecting a tool (there can be others):
    • How successful was the assembly of the NGS reads?
    • Is each "reference sequence" really just a single sequence?
    • Or are there multiple sequences per distinct "reference dataset"?
    • What do they represent (transcript(s), chromosome(s), exon(s), other)?
    • How long are the reference sequences?
    • Two individual transcripts are relatively easy to compare.
    • Two genomes, even smaller ones, or many versus many transcripts are both operations that are more complicated to do.
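Several of the questions above (how many sequences per reference dataset, how long they are) can be answered by inspecting the fasta files directly. The following is a small sketch in plain Python; the function name `summarize_fasta` is made up for this example, and it assumes an ordinary, uncompressed fasta file.

```python
def summarize_fasta(path):
    """Return a list of (name, length) tuples, one per sequence in a fasta file."""
    records = []
    name = None
    length = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if name is not None:
                    records.append((name, length))
                # Use the first whitespace-delimited token as the sequence name.
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                length = 0
            elif name is not None:
                length += len(line)
    if name is not None:
        records.append((name, length))
    return records
```

A result with a single entry means the reference dataset is really one sequence; many entries or very long sequences point toward the more complicated comparison cases.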

Custom genome help:

I hope that this helps! Jen, Galaxy team
