Question: BWA for Illumina using custom genome
2.5 years ago, sshaldipurkar1 wrote:

Hi, I have uploaded the fasta files (3 chromosomes) for my genome of interest. I downloaded them from NCBI, http://www.ncbi.nlm.nih.gov/nuccore/810407602?report=fasta (in the top right-hand corner, I clicked Send -> complete record -> FASTA file), and uploaded them using FileZilla FTP. However, when I map with 'BWA for Illumina' or 'Bowtie for Illumina', the job starts but is then aborted. The system recognised the uploaded files as fasta when I set up the mapping. I have mapped the same fastq file against a related built-in reference genome and it worked fine. If anyone knows what might be going on, could you please get in touch and help me figure it out?

I would really appreciate your help.

Kind Regards, Sayali.

2.5 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

It is difficult to know if formatting of the fasta file is the problem or if the recent cluster issues are a factor.

Making sure that the fasta is in good shape cannot hurt and is recommended anyway. Mappers are generally very tolerant of variations in format, while downstream tools are less so. This means a fasta file that maps without complaint can still cause errors in later steps, and fixing those requires modifying the fasta and often remapping.

Please see the instructions for creating a Custom Genome at this wiki, including the Troubleshooting section. The primary goal is to have a fasta file with simple and unique chromosome identifiers, no description content, and wrapped sequence lines. https://wiki.galaxyproject.org/Learn/CustomGenomes

The details about best-practice fasta formatting at the bottom of this tutorial also cover similar information: Fasta Format, Custom Genomes, and GATK Chromosome ordering
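The formatting goals described above (simple and unique identifiers, no description content, wrapped sequence lines) could be sketched in Python roughly as follows. This is only an illustration, not the procedure from the wiki: the function name `clean_fasta`, the 60-character line width, and the short sequence in the usage example are all assumptions made for the sketch.

```python
import io


def clean_fasta(lines, width=60):
    """Yield cleaned FASTA lines (hypothetical helper, not a Galaxy tool).

    - keeps only the identifier (text up to the first whitespace) on ">" lines
    - raises an error on duplicate or empty identifiers
    - re-wraps each sequence to `width` characters per line
    """
    seen = set()
    seq_parts = []
    started = False

    def flush():
        # Join the collected sequence chunks and emit them re-wrapped.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            if started:
                yield from flush()
            started = True
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an identifier")
            ident = fields[0]  # description content is discarded
            if ident in seen:
                raise ValueError(f"duplicate identifier: {ident}")
            seen.add(ident)
            yield ">" + ident
        else:
            if not started:
                raise ValueError("sequence data before the first header")
            seq_parts.append(line)
    if started:
        yield from flush()


# Usage with a made-up sequence; the header text is the one quoted in this thread.
example = (
    ">NZ_HG938372 Burkholderia cenocepacia H111 chromosome 3, complete genome.\n"
    "ACGTACGTAC\n"
    "GT\n"
)
cleaned = list(clean_fasta(io.StringIO(example), width=8))
print("\n".join(cleaned))
```

For a real file, the same generator could be fed an open file handle and its output written to a new fasta, which is then uploaded in place of the original.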

After formatting, rerun the job. This is the most direct solution, and often the only solution, for jobs that fail on the cluster right now.

Thanks, Jen, Galaxy team

2.5 years ago, sshaldipurkar1 wrote:

Dear Jen, Thank you very much for your reply!

The following format worked best for me:

NZ_HG938372 Burkholderia cenocepacia H111 chromosome 3, complete genome.

Even though I generally followed the fasta format rules, what seemed to make the difference was putting a full stop (period) at the end of the identifier/description line.

Hope this helps anyone else who might be struggling to get the right format for mapping against their own reference genome.

Kind Regards, Sayali.
