Aligning more than 2 sequences

19 months ago by

ppurkayastha2010 • 30 wrote:

I have paired-end data, fast1.fq and fast2.fq (in FASTQ format). I need to align the paired-end reads against two reference sequences.

I need to align the three inputs (the paired-end reads, reference 1, and reference 2) in such a way that, in a single BAM file, I can see all three aligned and mapped to each other.

I know Bowtie2 can be used to map reads against one reference. But how should I proceed to map all three against each other?

ngs alignment variant analysis • 462 views

modified 19 months ago by Jennifer Hillman Jackson ♦ 25k • written 19 months ago by ppurkayastha2010 • 30

19 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

This cannot be done in one step for the different input types, as far as I am aware.

Some options:

1. The mapping tools for NGS data are designed to map NGS reads against a single reference FASTA file (typically a genome, transcriptome, or exome). Each reference FASTA file (i.e., dataset) can be used as a Custom genome if it is not already indexed on the Galaxy server in use.
2. This isn't exactly what you described as the goal, but the two reference sequences could be combined into a single "meta" reference dataset and the NGS reads aligned against that as a custom reference genome.

Note that neither 1 nor 2 would provide homology information between the two reference sequences themselves; they only show the NGS reads versus each reference, separately. The two reference sequences could still be compared to each other using different tools. Which tools are appropriate depends on what kind of data the sequences represent.
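For anyone working outside the Galaxy interface, option 2 can be sketched roughly as follows. This is a minimal illustration, not the way Galaxy does it internally: the function names (`combine_references`, `mapping_commands`), the `combined_index` and `aligned` output names, and the choice to prefix each FASTA header with its file name are all assumptions made for the example. The commands are only built as lists here, not run.

```python
from pathlib import Path


def combine_references(fasta_paths, out_path):
    """Concatenate FASTA files into one "meta" reference.

    Each header is prefixed with the source file's stem so that sequence
    names stay unique and reads can be traced back to their reference.
    Returns the list of new sequence names, in order.
    """
    names = []
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip()
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        name = f"{path.stem}|{line[1:].strip()}"
                        if name in names:
                            raise ValueError(f"duplicate sequence name: {name}")
                        names.append(name)
                        out.write(f">{name}\n")
                    else:
                        out.write(line + "\n")
    return names


def mapping_commands(combined_fasta, reads_1, reads_2,
                     index_base="combined_index", out_prefix="aligned"):
    """Build (but do not run) the command lines for a paired-end mapping."""
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    return [
        ["bowtie2-build", combined_fasta, index_base],       # index the combined reference
        ["bowtie2", "-x", index_base, "-1", reads_1, "-2", reads_2, "-S", sam],
        ["samtools", "sort", "-o", bam, sam],                 # coordinate-sorted BAM
        ["samtools", "index", bam],
    ]
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once Bowtie2 and samtools are installed. In the resulting BAM, every read is placed on whichever reference sequence it matched best, so the two references appear side by side but are not aligned to each other.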

The NGS reads could be assembled to produce a consensus FASTA dataset (in effect a "third" reference dataset), and then all three could be compared to produce a MAF (Multiple Alignment Format) result. If you want to do that, the content of the reference sequences should be considered when selecting a tool. If you identify a tool that meets your needs and matches the data input types (from a publication, web search, etc.), check whether it is wrapped for Galaxy in the Tool Shed (http://usegalaxy.org/toolshed).

Considerations when selecting a tool (there can be others):
- How successful was the assembly of the NGS reads?
- Is each "reference sequence" really just a single sequence?
- Or are there multiple sequences per distinct "reference dataset"?
- What do they represent (transcript(s), chromosome(s), exon(s), other)?
- How long are the reference sequences?
  - Two individual transcripts are relatively easy to compare.
  - Two genomes, even smaller ones, or many-versus-many transcripts are both more complicated to compare.
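Several of these questions (how many sequences each reference dataset holds, and how long they are) can be answered with a quick look at the FASTA files. The snippet below is only a sketch under simple assumptions: plain-text FASTA input, sequence names taken as the first word of each header, and the hypothetical helper names `fasta_lengths` and `describe_reference`.

```python
def fasta_lengths(path):
    """Return {sequence_name: length} for a plain-text FASTA file.

    If two records share a name, the later one overwrites the earlier.
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                parts = line[1:].split()
                # Use the first word of the header as the sequence name.
                name = parts[0] if parts else f"unnamed_{len(lengths) + 1}"
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def describe_reference(path):
    """Summarize a reference dataset: sequence count and sizes."""
    lengths = fasta_lengths(path)
    return {
        "n_sequences": len(lengths),
        "total_bases": sum(lengths.values()),
        "longest": max(lengths.values(), default=0),
    }
```

A reference that reports a single short sequence points toward a simple pairwise comparison, while many sequences or genome-scale lengths suggest a whole-genome or many-versus-many alignment tool.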

Custom genome help:

I hope that this helps! Jen, Galaxy team

modified 19 months ago • written 19 months ago by Jennifer Hillman Jackson ♦ 25k