How To Combine Two Reference Genome (Files) In Galaxy?

Heads up! This is a static archive of our support site. Please go to help.galaxyproject.org if you want to reach the Galaxy community. If you want to search this archive, visit the Galaxy Hub search.


Binbin You wrote (7.1 years ago):

Hi all, I have two reference (genome) files. Let's say EAB_FB_MG.fa (37,972 sequences/contigs in total) and EAB_FB.fa (21,272 sequences/contigs). I know there are some contigs common to both. How could I combine/merge them into a new reference file that contains all unique contigs (without duplicates)? Many thanks for any ideas!


(Modified 7.1 years ago by Jennifer Hillman Jackson; written 7.1 years ago by Binbin You.)

Jennifer Hillman Jackson (United States) wrote (7.1 years ago):

Hello,

There is a tool from the FASTX-Toolkit that removes duplicated sequences, "Collapse sequences", but it is designed to work on short reads.

If the common IDs and sequences are identical between the two files, you can compare the files to identify the common and unique entries. The general path is to first convert each FASTA file to tabular format using "Convert Formats -> FASTA-to-Tabular", then compare the IDs using "Join, Subtract and Group -> Compare two Datasets". Three comparisons are needed:
1 - rows unique to file1
2 - rows unique to file2
3 - rows in common

Then merge the three results using "Text Manipulation -> Concatenate datasets" and convert back to FASTA using "Convert Formats -> Tabular-to-FASTA". Because the common rows are kept only once, the merged file contains each contig a single time.

If the IDs are not the same and the sequences are slightly different, you will probably need to consider a tool designed for genome sequence assembly.

Hopefully this helps,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
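For anyone who wants to check the same logic outside Galaxy, here is a minimal Python sketch of the steps above. It is not a Galaxy tool, and the function names (`fasta_to_tabular`, `merge_unique`, `tabular_to_fasta`) are hypothetical. It assumes the ID is the first whitespace-delimited word of each header line, that IDs are not repeated within a single file, and that each common contig is identical in both files, so the copy from file1 is kept.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, str]  # (ID, sequence), like one row of FASTA-to-Tabular output


def fasta_to_tabular(lines: Iterable[str]) -> List[Row]:
    """Collapse each FASTA record into one (ID, sequence) row.

    Assumption: the ID is the first whitespace-delimited word of the header.
    """
    rows: List[Row] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                rows.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        rows.append((seq_id, "".join(chunks)))
    return rows


def merge_unique(rows1: List[Row], rows2: List[Row]) -> List[Row]:
    """Three comparisons on the ID column, then concatenate the results."""
    ids1 = {seq_id for seq_id, _ in rows1}
    ids2 = {seq_id for seq_id, _ in rows2}
    only1 = [row for row in rows1 if row[0] not in ids2]   # 1 - unique to file1
    only2 = [row for row in rows2 if row[0] not in ids1]   # 2 - unique to file2
    common = [row for row in rows1 if row[0] in ids2]      # 3 - in common (file1 copy)
    return only1 + only2 + common


def tabular_to_fasta(rows: List[Row], width: int = 60) -> str:
    """Write rows back out as FASTA, wrapping sequences at `width` characters."""
    out: List[str] = []
    for seq_id, seq in rows:
        out.append(">" + seq_id)
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width])
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    file1 = ">c1\nACGT\n>c2\nGGCC\n"
    file2 = ">c2\nGGCC\n>c3\nTTAA\n"
    merged = merge_unique(fasta_to_tabular(file1.splitlines()),
                          fasta_to_tabular(file2.splitlines()))
    print(tabular_to_fasta(merged), end="")  # prints c1, c3, then c2
```

Since `fasta_to_tabular` accepts any iterable of lines, an open file handle such as `open("EAB_FB_MG.fa")` can be passed directly. Like the Galaxy steps, this matches on IDs only, so two contigs with identical sequences but different names are both kept; that situation falls under the assembly-based approach mentioned in the answer.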
