Question: merging two unrelated reference genomes and annotation files
22 months ago, Widmer, Giovanni (US, Tufts University) wrote:

Hi, I work with an intracellular pathogen. I would like to run an RNA-Seq analysis in Galaxy using a combined host-parasite genome as the reference, together with a combined annotation (GTF) file. I expect about 2% of my reads to be of pathogen origin and the remaining 98% of host origin. How do I create combined host (pig) and pathogen (Cryptosporidium) reference files, with both species merged into one genome file and one annotation file?

Thanks!

Giovanni Widmer, Tufts University

22 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Merge the two genome fasta files with the tool Concatenate, then use the merged file as a Custom reference genome. Merging the reference annotation will probably work if the files are in GTF format: merge the headers, removing redundant lines, merge the data lines, then combine everything into one dataset. Tools in the groups Text Manipulation and Filter and Sort can select specific lines from tabular data (GTF is tabular) to create separate intermediate datasets for the header lines and the data lines.

GFF3 format is more complicated, and "ID" attribute conflicts are likely to occur if the files are merged.
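For anyone who wants to check for such conflicts outside Galaxy before merging, here is a minimal Python sketch. It assumes standard tab-separated GFF3 with the attributes in the ninth column and comment lines starting with "#"; the function names and the toy records are hypothetical and not taken from any real annotation.

```python
import re

# Matches the ID attribute at the start of the attribute column or after a ";".
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def gff3_ids(text):
    """Collect the ID attribute values found in GFF3 text."""
    ids = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete feature line
        match = ID_PATTERN.search(columns[8])
        if match:
            ids.add(match.group(1))
    return ids


def conflicting_ids(gff3_a, gff3_b):
    """Return the IDs that appear in both GFF3 texts."""
    return gff3_ids(gff3_a) & gff3_ids(gff3_b)


# Toy records (hypothetical) in which both files use the ID "gene1".
host_gff3 = "##gff-version 3\nchr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1;Name=a\n"
pathogen_gff3 = (
    "##gff-version 3\n"
    "Cp_chr1\tsrc\tgene\t5\t50\t.\t-\t.\tID=gene1;Name=b\n"
    "Cp_chr1\tsrc\tgene\t60\t90\t.\t+\t.\tID=gene2;Name=c\n"
)
conflicts = conflicting_ids(host_gff3, pathogen_gff3)
print(sorted(conflicts))
```

An empty result would suggest the IDs do not collide, although it says nothing about other GFF3 complications.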

https://wiki.galaxyproject.org/Support#Custom_reference_genome

Best, Jen, Galaxy team

22 months ago, Widmer, Giovanni (US, Tufts University) wrote:

Thanks for your help, Jen. Which Concatenate tool do I use to merge the two genome fasta files? Concatenate Fasta Alignment by Species (under Fasta Manipulation) seems to be the only tool that requires a FASTA-formatted input file. For merging the two GTF annotation files, do I use Concatenate datasets tail-to-head (under Text Manipulation)?

Giovanni


Do this the other way around.

For the fasta datasets, use Concatenate datasets tail-to-head. Make certain there are no extra blank lines between the two files after merging. Use the Select tool with the regular expression ^$ to find any blank lines.
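Outside Galaxy, the same fasta merge can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Galaxy tools do internally; the function name and the toy sequences are hypothetical.

```python
def concatenate_fasta(fasta_texts):
    """Join FASTA texts tail-to-head, dropping blank lines (those matching ^$)."""
    merged_lines = []
    for text in fasta_texts:
        for line in text.splitlines():
            line = line.rstrip()
            if not line:
                continue  # drop blank lines, including any at the file boundary
            merged_lines.append(line)
    return "\n".join(merged_lines) + "\n"


# Toy inputs (hypothetical) with stray blank lines at the end and the start.
host = ">chr1\nACGT\nGGCC\n\n"
pathogen = "\n>Cp_chr1\nTTAA\n"
merged = concatenate_fasta([host, pathogen])
print(merged)
```

The blank-line filter plays the role of the ^$ check: nothing empty is left between the last host sequence line and the first pathogen header.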

For the GTF files, extract the headers into new datasets and merge them so that there are no duplicated lines. Tools in Text Manipulation can select lines in many ways by line position (the Select tool can too, but based on content, as described above). It is your choice which method to use. Then extract the data lines into new datasets as well. Concatenate datasets tail-to-head can be used to assemble everything into one dataset at the end.
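The GTF steps above can also be sketched in Python for anyone working outside Galaxy. This is a rough sketch under one assumption: that header lines are the ones starting with "#". The function names and the toy records are hypothetical.

```python
def split_gtf(text):
    """Split GTF text into header lines (starting with '#') and data lines."""
    headers, data = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            headers.append(line)
        else:
            data.append(line)
    return headers, data


def merge_gtf(gtf_texts):
    """Merge headers without duplicated lines, then append all data lines."""
    merged_headers, merged_data = [], []
    seen = set()
    for text in gtf_texts:
        headers, data = split_gtf(text)
        for header in headers:
            if header not in seen:
                seen.add(header)
                merged_headers.append(header)
        merged_data.extend(data)
    return "\n".join(merged_headers + merged_data) + "\n"


# Toy inputs (hypothetical) that share one identical header line.
host_gtf = (
    "#!genome-build hypothetical_host\n"
    "#!shared-header-line\n"
    'chr1\tsource\tgene\t1\t100\t.\t+\t.\tgene_id "host_gene1";\n'
)
pathogen_gtf = (
    "#!genome-build hypothetical_pathogen\n"
    "#!shared-header-line\n"
    'Cp_chr1\tsource\tgene\t5\t50\t.\t-\t.\tgene_id "cp_gene1";\n'
)
merged_gtf = merge_gtf([host_gtf, pathogen_gtf])
print(merged_gtf)
```

The result has all unique header lines first, followed by the host data lines and then the pathogen data lines, in their original order.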

Reply written 21 months ago (modified 21 months ago) by Jennifer Hillman Jackson