Question: How to create a FASTA file of the mouse genome from downloaded chromosome files
18 months ago, mohammedtleis wrote:

Dear Biostar members,

My intention is to create a mouse (mm10) reference genome for use with bowtie2. So far, I have downloaded the .fa files; they are listed below, after my questions.

Now I need to combine the files into one .fa file to be used as the reference genome for bowtie2. My questions are:

  1. Which of the files listed below should be combined?

  2. What are the *_random.fa files, and should I include them in the reference genome FASTA file?

  3. What is chrUn, and should I include it in the genome FASTA file?

  4. I searched for the number of chromosomes in mouse and Google returned 20. In this mm10 genome, I can see files corresponding to 19 numbered chromosomes, 1 X and 1 Y. How many chromosomes does the mouse genome have, and why couldn't I find consistent numbers?

chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa
chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa
chrX.fa chrY.fa chrM.fa

chr1_GL456210_random.fa chr1_GL456211_random.fa chr1_GL456212_random.fa chr1_GL456213_random.fa chr1_GL456221_random.fa
chr4_GL456216_random.fa chr4_GL456350_random.fa chr4_JH584292_random.fa chr4_JH584293_random.fa chr4_JH584294_random.fa chr4_JH584295_random.fa
chr5_GL456354_random.fa chr5_JH584296_random.fa chr5_JH584297_random.fa chr5_JH584298_random.fa chr5_JH584299_random.fa
chr7_GL456219_random.fa
chrX_GL456233_random.fa
chrY_JH584300_random.fa chrY_JH584301_random.fa chrY_JH584302_random.fa chrY_JH584303_random.fa

chrUn_GL456239.fa chrUn_GL456359.fa chrUn_GL456360.fa chrUn_GL456366.fa chrUn_GL456367.fa chrUn_GL456368.fa
chrUn_GL456370.fa chrUn_GL456372.fa chrUn_GL456378.fa chrUn_GL456379.fa chrUn_GL456381.fa chrUn_GL456382.fa
chrUn_GL456383.fa chrUn_GL456385.fa chrUn_GL456387.fa chrUn_GL456389.fa chrUn_GL456390.fa chrUn_GL456392.fa
chrUn_GL456393.fa chrUn_GL456394.fa chrUn_GL456396.fa chrUn_JH584304.fa

Best Regards, M. Tleis

Tags: galaxy, chip-seq

18 months ago, Devon Ryan (Germany) wrote:
  1. All of them.
  2. Contigs that have yet to be fully incorporated into chromosomes. With the exception of the chrUn contigs, the chromosome is known, but the exact location on it is not.
  3. Contigs where the chromosome is unknown. Yes, include them.
  4. 19 autosomes plus the sex chromosomes (X/Y, counted as one pair) make 20, so the information you found is entirely consistent.
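
The "combine all of them" advice can be sketched as a small shell function. This is only an illustration: the function name `combine_mm10` and the output file name are hypothetical, and the ordering used here (numbered chromosomes, X, Y, M, then the `_random` and `chrUn` contigs) is one reasonable choice rather than something the answer prescribes.

```shell
# combine_mm10 DIR OUT
# Concatenate the per-chromosome FASTA files found in DIR into OUT.
# Assumed order: chr1..chr19, chrX, chrY, chrM (mitochondrial),
# then the *_random contigs, then the chrUn_* contigs.
combine_mm10() {
    cm_dir="$1"
    cm_out="$2"
    : > "$cm_out"                          # start from an empty output file
    for cm_c in $(seq 1 19) X Y M; do
        cm_f="$cm_dir/chr$cm_c.fa"
        if [ -f "$cm_f" ]; then            # skip files that are not present
            cat "$cm_f" >> "$cm_out"
        fi
    done
    # Contigs go last; each glob expands in sorted order.
    for cm_f in "$cm_dir"/chr*_random.fa "$cm_dir"/chrUn_*.fa; do
        if [ -f "$cm_f" ]; then            # also skips an unmatched glob
            cat "$cm_f" >> "$cm_out"
        fi
    done
}
```

Assuming the files sit in the current directory, `combine_mm10 . mm10.fa` followed by `bowtie2-build mm10.fa mm10` should give an index with the prefix `mm10`. If the order is unimportant, a plain `cat chr*.fa > mm10.fa` does the same job.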

Thank you, Devon, for your answer; I found it very helpful. Now I have three more questions:

  1. Does it matter in what order the files are combined? Would bowtie2 be able to automatically locate the corresponding information throughout the file? If the order is important, where should the unknown contigs go? I am using the Unix cat command as: cat chr1.fa .... chr19.fa chrX.fa chrY.fa

  2. If I exclude the unknown contigs from the reference genome, is it still a valid reference genome to use with bowtie2?

  3. I managed to upload the FASTA file (without the contigs with unknown locations) into a local instance of Galaxy by uploading it as a data library. But when I view the dataset in Galaxy, it shows '>chr1' on the first line, followed by a very long sequence of 'N's. I also uploaded the .fa file for chr1 alone and it shows the same series of N's. Any ideas on how to view a FASTA file correctly?

Comment written 18 months ago by mohammedtleis
  1. The order only matters if you later need to merge with other datasets, since tools occasionally don't properly handle BAM files with different chromosome ordering. You can always reorder things with Picard if this ever becomes an issue.
  2. Sure, it's just slightly incomplete.
  3. That sounds correct; telomeres are hard masked, so the sequence begins with a long run of Ns.
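
To check these points without scrolling through the file in a viewer, a few shell helpers can print the sequence order and show where the run of Ns ends. This is a rough sketch: the function names are hypothetical and not part of bowtie2, Galaxy, or Picard, and the code assumes a plain, uncompressed FASTA file.

```shell
# Print the sequence names in the order they appear in the file.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//'
}

# Print the line number of the first sequence line that contains
# anything other than N, i.e. where unmasked sequence starts.
first_unmasked_line() {
    awk '!/^>/ && /[^Nn]/ { print NR; exit }' "$1"
}

# Report how many bases in the file are N.
n_fraction() {
    awk '!/^>/ { total += length($0); n += gsub(/[Nn]/, "") }
         END   { printf "%d of %d bases are N\n", n, total }' "$1"
}
```

For example, `fasta_names mm10.fa` lists the chromosome order of a combined file (here assumed to be called `mm10.fa`), and `first_unmasked_line chr1.fa` gives a line number that can be passed to `sed -n` to look at real sequence instead of the leading Ns.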
Reply written 18 months ago by Devon Ryan