Question: Making separate tabular files for GATK custom reference genome sorting - Rn5
3.8 years ago by
United Kingdom
christianwood7311 wrote:

Dear Galaxy Biostar team,

After receiving some wonderful feedback from your team, I am now trying to work out how to order the rat rn5 genome correctly so that it can be used as a custom reference genome for GATK variant analysis on the public Galaxy instance. I have downloaded the rn5 genome from UCSC and uploaded it to Galaxy via FTP. Having converted the file from FASTA to tabular format, I believe I now need to split it into separate files so that the chromosomes can be put in the correct order (chr1, chr2, chr3 ... chrX, chrY, chrM). Which tool would be most appropriate for splitting the file? I already know how to sort the resulting files through Filter and sort > Sort on column 1 in alphabetical or numerical order (depending on the file), and then concatenate the files and convert back to FASTA, ready for use as a custom reference genome.

Would it be possible to indicate the particular steps to complete this process please? Any information or link to further information on this would be greatly appreciated.

Best wishes,

Christian

3.8 years ago by
United States
Jennifer Hillman Jackson wrote:

Hello,

You have a few options to break up the file:

1. There are tools in "Text Manipulation" that will allow you to pull out lines from a tabular dataset: "Select first" and "Select last". These often work well for breaking up smaller files.

2. You could also pull out (or omit) specific chromosomes with the "Filter" or "Select" tool, from the group "Filter and Sort".

The same exact steps won't work for all genomes, so just order the tab file in a way that is convenient for you if using the first option, or use as-is when doing the second (a list of all the chromosome identifiers will help here - "Cut" the first column out of the complete genome in tabular format to isolate them).
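
For anyone who prefers to reorder the file outside Galaxy before uploading it, the same idea can be sketched in a few lines of Python. This is only an illustration, not a Galaxy tool: the function names (`karyotypic_key`, `read_fasta`, `reorder_fasta`) and the demo records, including the `chrUn_example` identifier, are made up for the example. It assumes UCSC-style identifiers (chr1, chr2, ... chrX, chrY, chrM), uses the order given in the question, and places any other contigs last. It also holds the whole genome in memory, which may not suit every machine.

```python
def karyotypic_key(name):
    """Sort key: numbered chromosomes first (numerically), then X, Y, M, then the rest."""
    suffix = name[3:] if name.startswith("chr") else name
    if suffix.isdigit():
        return (0, int(suffix), "")
    special = {"X": 1, "Y": 2, "M": 3}
    if suffix in special:
        return (special[suffix], 0, "")
    # Anything else (e.g. unplaced contigs) goes last, in alphabetical order.
    return (4, 0, name)


def read_fasta(lines):
    """Yield (identifier, list_of_sequence_lines) for each record in FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, seq
            # Keep only the first word of the header as the identifier.
            name, seq = line[1:].split()[0], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, seq


def reorder_fasta(lines):
    """Return FASTA lines with the records sorted by karyotypic_key."""
    records = sorted(read_fasta(lines), key=lambda rec: karyotypic_key(rec[0]))
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq)
    return out


# Tiny made-up input standing in for a real genome file.
demo = [
    ">chrM", "GATC",
    ">chr10", "AAAA",
    ">chrX", "CCCC",
    ">chr2", "GGGG",
    ">chr1", "TTTT",
    ">chrUn_example", "NNNN",
]
ordered = reorder_fasta(demo)
names = [line[1:] for line in ordered if line.startswith(">")]
print(names)
```

For a real file, the lines would come from an open file handle, e.g. `reorder_fasta(open("rn5.fa"))` with a hypothetical file name, and the result would be written back out one line at a time.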

Best, Jen, Galaxy team

 
