Question: Reads on the Y chromosome when sequencing a sample from a female donor
4.3 years ago, Delong, Zhou (140) wrote:


Looking closely at my alignment data, I found something interesting: some of my reads align to the Y chromosome, although the sample is an ovarian cancer cell line, i.e. from a female donor.

All of these reads align to repeated regions, and for each Y-chromosome gene with aligned reads I can find a paralogue on the X chromosome.

Although these reads are not numerous, I still fear they may distort the calculated gene / transcript expression levels, since genes on autosomes are not affected by this duplication.

I wonder whether there is a way to turn off the Y chromosome when using tophat2 (I'm aware of the simple method of temporarily removing the Y chromosome), or to merge the read counts before doing downstream analysis.
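To make the second option concrete, here is a minimal Python sketch of what merging read counts could look like. It assumes the per-gene counts are already available as a plain dictionary; the function name `merge_paralogue_counts` and the short `Y_TO_X` table are purely illustrative, and a real analysis would need a curated list of X/Y paralogue pairs.

```python
from collections import defaultdict

# Illustrative Y -> X paralogue pairs (assumption: a real analysis
# would use a curated, complete table rather than this short list).
Y_TO_X = {"RPS4Y1": "RPS4X", "DDX3Y": "DDX3X", "ZFY": "ZFX"}


def merge_paralogue_counts(counts, y_to_x=Y_TO_X):
    """Return a new dict in which counts for Y genes are added to their X paralogue."""
    merged = defaultdict(int)
    for gene, n in counts.items():
        # Genes without an entry in the table keep their own name.
        merged[y_to_x.get(gene, gene)] += n
    return dict(merged)
```

Whether summing is appropriate depends on the downstream tool; this only shows the bookkeeping, not a recommendation for how expression should be normalised afterwards.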


rna-seq paralogues y chomosome
modified 4.3 years ago • written 4.3 years ago by Delong, Zhou (140)


This helps. If you are working on the command line, another option is to obtain our version of the hg19female variant, along with assorted useful indexes (including those for Tophat2: the <dbkey>.*.bt2 files). All are available on our rsync server in the hg19 top-level directory. Link with instructions:

Should you decide to try this, note that the .loc files in the /location directory are formatted so that results are redirected back to the full hg19 assembly. This is very useful for visualization at UCSC, for use with other tools and reference files (later in the Cuff* tools), etc.

Good luck with your project, Jen, Galaxy team

written 4.3 years ago by Jennifer Hillman Jackson (25k)


Thanks for your answer. I'm actually working on my local server via the command line.

Judging by the presence of *random.fa files in the genome I think I'm using the hg19 full version. I think I'll just remove the Y chromosome and other unwanted .fa files next time.



written 4.3 years ago by Delong, Zhou (140)


Thank you very much for the link and your effort!



written 4.3 years ago by Delong, Zhou (140)

Please accept the answer to help others find it. Thanks.

written 4.3 years ago by Martin Čech (4.9k)
4.3 years ago, Jennifer Hillman Jackson (25k), United States, wrote:


Yes, you'll find mappings across repeats, but also across the PAR regions (depending on the genome). Some reference genomes on the public Main Galaxy instance already have a variant that omits the Y chromosome: human hg19 has one, for example, as do certain others. You can also create your own and load it as a custom reference genome for use with TopHat (if on Main). Load it as a full index if on a cloud Galaxy.

hg19 reference genome variants:

  • hg19 full - direct replicate of GRCh37 as released by UCSC
  • hg19 canonical - autosomes, X, Y, plus M (no haplotypes, unmapped, other contigs). Could be considered "hg19 male" in this context.
  • hg19 female - canonical minus Y
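For anyone building such a variant locally, the "canonical minus Y" selection can be sketched in a few lines of Python. This is only an illustration, not how the Galaxy variants were produced: it assumes UCSC-style sequence names (chr1 ... chr22, chrX, chrM) and a multi-FASTA input, and `filter_fasta` and `FEMALE_CONTIGS` are made-up names.

```python
import re

# Assumed UCSC-style names for the "female" set: chr1..chr22, chrX, chrM.
# chrY, haplotypes, *_random and chrUn_* contigs do not match.
FEMALE_CONTIGS = re.compile(r"^chr(?:[1-9]|1[0-9]|2[0-2]|X|M)$")


def filter_fasta(lines, keep=FEMALE_CONTIGS):
    """Yield only the FASTA lines belonging to records whose name matches `keep`."""
    emit = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            emit = bool(keep.match(name))
        if emit:
            yield line
```

The function works on any iterable of lines, so an open file handle can be passed in and the result written out with `writelines`. The aligner indexes would then need to be rebuilt from the filtered FASTA before running TopHat.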

If you have a specific model organism in mind that you would like to see on the Main instance as a variant, this can be requested in Trello:

As for manipulating TopHat results without a reference genome change - I can't help there, but perhaps someone else on the list has some ideas about that.

Take care, Jen, Galaxy team

modified 4.3 years ago • written 4.3 years ago by Jennifer Hillman Jackson (25k)

