Reads on the Y chromosome when sequencing a sample from a female donor

Delong, Zhou • 140 (Canada) wrote, 4.3 years ago:

Hello,

When looking closely at my alignment data, I found something interesting. Some of my reads are aligned to the Y chromosome, even though the sample comes from an ovarian cancer cell line, in short from a female donor.

Indeed, all of these reads are aligned to repetitive regions, and for every Y-chromosome gene with any aligned reads I can find a paralogue on the X chromosome.
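One way to quantify this is to tally the chrY alignments and check how many of them are multi-mapped. The sketch below reads SAM text (for example TopHat's `accepted_hits.bam` printed with `samtools view`) and uses the standard SAM `NH` tag, which TopHat2 writes, to count alignments whose read was reported at more than one location. The function name `summarize_contig` is hypothetical; it counts alignment lines, not unique reads.

```python
from typing import Dict, Iterable


def summarize_contig(sam_lines: Iterable[str], contig: str = "chrY") -> Dict[str, int]:
    """Count mapped alignment lines on `contig` and how many carry NH > 1."""
    total = 0
    multi = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # too few columns to be a SAM alignment line
        flag = int(fields[1])
        if flag & 0x4 or fields[2] != contig:
            continue  # unmapped, or aligned to a different contig
        total += 1
        # Optional tags start at column 12; NH:i:<n> = number of reported alignments.
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                if int(tag[5:]) > 1:
                    multi += 1
                break
    return {"total": total, "multi_mapped": multi}
```

A high `multi_mapped` share relative to `total` would be consistent with the repeat/paralogue explanation above rather than with genuine Y-derived reads.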

Although these reads are not very frequent, I still fear that they may bias the gene/transcript expression estimates, since autosomal genes are not affected by this duplication.

I wonder whether there is any way to turn off the Y chromosome when using TopHat2 (I'm aware of the simple method of temporarily removing the Y chromosome), or to merge the read counts before doing downstream analysis.
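For the second option, folding chrY counts back into their X paralogues after counting, a minimal sketch could look like this. It assumes a per-gene count table already exists and that the Y-to-X paralogue mapping is curated by hand; the pairs used in the example (DDX3Y/DDX3X, KDM5D/KDM5C) are known X-Y paralogue pairs chosen for illustration, and the count values are made up. The function name `merge_y_into_x` is hypothetical.

```python
from typing import Dict


def merge_y_into_x(counts: Dict[str, int], y_to_x: Dict[str, str]) -> Dict[str, int]:
    """Return a new count table where each Y gene's reads are added to its X paralogue."""
    merged = dict(counts)  # leave the caller's table untouched
    for y_gene, x_gene in y_to_x.items():
        y_count = merged.pop(y_gene, 0)  # remove the Y gene from the table
        merged[x_gene] = merged.get(x_gene, 0) + y_count
    return merged


# Illustrative values only.
counts = {"DDX3X": 120, "DDX3Y": 4, "KDM5C": 80, "KDM5D": 2, "GAPDH": 900}
pairs = {"DDX3Y": "DDX3X", "KDM5D": "KDM5C"}
print(merge_y_into_x(counts, pairs))
```

Summing assumes the chrY-assigned reads really originate from the X copy, which fits a female sample; for male samples the two genes would need to be kept apart.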

Thanks,

modified 4.3 years ago • written 4.3 years ago by Delong, Zhou • 140

Hi,

This may help: if you are working on the command line, another option is to obtain our version of the hg19female variant, along with assorted useful indexes (including TopHat2 ... the <dbkey>.*.bt2 files). These are all available on our rsync server in the hg19 top-level directory. Link with instructions: http://wiki.galaxyproject.org/Admin/UseGalaxyRsync

If you decide to try this, note that the .loc files in the /location directory are formatted so that results are redirected back to the full hg19 assembly. This is very useful for visualization at UCSC and for use with other tools and reference files (for example, later with the Cuff* tools).

Good luck with your project, Jen, Galaxy team

written 4.3 years ago by Jennifer Hillman Jackson ♦ 25k

Hi,

Thanks for your answer. I'm actually working on my local server via the command line.

Judging by the presence of *random.fa files in the genome, I think I'm using the full hg19 version. Next time I'll just remove the Y chromosome and the other unwanted .fa files.
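With the UCSC layout of one `chr*.fa` file per sequence in a single directory, the removal can be scripted by concatenating only the wanted files into one reference before indexing. A sketch under that assumption: it keeps `chr1` to `chr22`, `chrX` and `chrM`, and drops `chrY.fa` along with the `*random.fa`, `chrUn_*` and haplotype files. The function names and paths are hypothetical, and it assumes each input file ends with a newline so that headers are not glued to the previous sequence.

```python
import re
import shutil
from pathlib import Path
from typing import Iterable, List

# Canonical per-chromosome files: chr1..chr22, chrX, chrY, chrM (no _random, chrUn_ or _hap files).
CANONICAL_FILE = re.compile(r"^chr(\d+|X|Y|M)\.fa$")


def select_female_fastas(fasta_dir: Path) -> List[Path]:
    """Return canonical per-chromosome FASTA paths in lexical order, excluding chrY."""
    return sorted(
        p for p in fasta_dir.glob("*.fa")
        if CANONICAL_FILE.match(p.name) and p.name != "chrY.fa"
    )


def write_reference(fasta_paths: Iterable[Path], out_path: Path) -> None:
    """Concatenate the selected FASTA files into a single multi-FASTA reference."""
    with out_path.open("wb") as out:
        for path in fasta_paths:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)  # streams, so whole chromosomes are not held in memory
```

The resulting file could then be indexed with `bowtie2-build hg19_female.fa hg19_female` for TopHat2; write the output outside the chromosome directory so it is not picked up on a later run.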

Best,

written 4.3 years ago by Delong, Zhou • 140

Hi,

Thank you very much for the link and your effort!

Best,

written 4.3 years ago by Delong, Zhou • 140

Please accept the answer to help others find it. Thanks.

written 4.3 years ago by Martin Čech ♦♦ 4.9k

Jennifer Hillman Jackson ♦ 25k (United States) wrote, 4.3 years ago:

Hello,

Yes, you'll find mappings not only across repeats but also across the pseudoautosomal regions (PARs), depending on the genome. Some reference genomes on the public Main Galaxy instance already have a variant that omits the Y chromosome; human hg19 has one, for example, as do certain others. You can also create your own and load it as a custom reference genome for use with TopHat (if on Main), or load it as a full index if on a cloud Galaxy.
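To separate PAR hits from the repeat-driven ones, chrY alignment positions can be checked against the pseudoautosomal intervals. A minimal sketch; the coordinates are the GRCh37/hg19 chrY PAR1 and PAR2 boundaries as published by the GRC (1-based, inclusive) and should be verified against the assembly actually in use. The names `HG19_CHRY_PAR` and `in_chrY_par` are hypothetical.

```python
from typing import Tuple

# GRCh37/hg19 chrY pseudoautosomal regions, 1-based inclusive (verify for your assembly).
HG19_CHRY_PAR: Tuple[Tuple[int, int], ...] = (
    (10_001, 2_649_520),       # PAR1
    (59_034_050, 59_363_566),  # PAR2
)


def in_chrY_par(pos: int) -> bool:
    """Return True if a 1-based chrY position falls inside PAR1 or PAR2."""
    return any(start <= pos <= end for start, end in HG19_CHRY_PAR)
```

SAM's POS column is also 1-based, so `in_chrY_par(int(fields[3]))` can be applied directly to chrY alignment lines.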

hg19 reference genome variants:

hg19 full - direct replicate of GRCh37 as released by UCSC
hg19 canonical - autosomes, X, Y, plus M (no haplotypes, unmapped or other contigs). Could be considered "hg19 male" in this context.
hg19 female - canonical minus Y
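The three variants differ only in which sequences they keep, so the definitions above can be expressed as a name filter over a single combined FASTA. A sketch assuming UCSC hg19 naming (`chr1` to `chr22`, `chrX`, `chrY`, `chrM`, plus extra contigs such as `chr1_gl000191_random`, `chrUn_gl000211` or `chr6_apd_hap1`); the function names are hypothetical, and this is not necessarily how Galaxy builds its own variants.

```python
import re
from typing import Iterable, Iterator

# Primary chromosomes in UCSC hg19 naming; anything else is an extra contig.
_CANONICAL = re.compile(r"^chr(\d+|X|Y|M)$")


def keep_sequence(name: str, variant: str) -> bool:
    """Decide whether a sequence belongs in the 'full', 'canonical' or 'female' variant."""
    if variant == "full":
        return True  # every sequence, including haplotypes and unplaced contigs
    canonical = bool(_CANONICAL.match(name))
    if variant == "canonical":
        return canonical  # autosomes, X, Y and M only
    if variant == "female":
        return canonical and name != "chrY"  # canonical minus Y
    raise ValueError(f"unknown variant: {variant}")


def filter_fasta(lines: Iterable[str], variant: str) -> Iterator[str]:
    """Yield only the FASTA records whose header name passes keep_sequence."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]  # first word of the header
            keep = keep_sequence(name, variant)
        if keep:
            yield line
```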

If there is a specific model organism you would like to see as a variant on the Main instance, you can request it on Trello: http://trello.com/c/mJWnAuuQ

As for manipulating TopHat results without changing the reference genome, I can't help there, but perhaps someone else on the list has some ideas.

Take care, Jen, Galaxy team

modified 4.3 years ago • written 4.3 years ago by Jennifer Hillman Jackson ♦ 25k
