Extracting Sequences For Transcripts From Reference Genome


Dear Galaxy community,

I'm new to Galaxy and would like to ask the following. I have trimmed and QC'ed my paired-end and single-end data from an Illumina HiScan SQ, mapped it with TopHat, and run Cufflinks, Cuffmerge, and Cuffdiff. I would now like to analyze the gene_exp.diff file by extracting the significant transcripts. I used grep "yes" to keep only the significant rows, which gives me the locus start and end coordinates of each transcript, for example:

XLOC_000544 XLOC_000544 - chr1:12763969-12765675 C0 C4 OK 3.16487 1628.25 9.00696 -4.57022 4.8722e-06 0.00905256 yes

How can I extract the corresponding sequence from the reference genome?

Kind regards,
Lizex

rna-seq cuffmerge cufflinks • 1.3k views


modified 5.6 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.6 years ago by Lizex Husselmann • 80

5.6 years ago, Jennifer Hillman Jackson ♦ 25k wrote:

Hi Lizex,

It sounds like you are working on the command line and now want to import data into Galaxy to work with it? If so, be careful about the reference genome when moving into Galaxy: http://wiki.galaxyproject.org/Support#Rsync_data_and_moving_between_instances

To get the data into Galaxy, use FTP: http://wiki.galaxyproject.org/FTPUpload

The XLOC IDs in the gene expression file are the same as those in the attribute field (9th field) of the GTF file used as input to Cuffdiff. To get the transcript sequence, you basically want to match up those identifiers, then extract the sequence from the reference genome. (Note that this will not include any base-level variation from your sequence data: this method creates transcripts from the genomic sequence, based on coordinates. This tool package does not assemble new consensus sequences.)

The general path is:

1 - upload the "gene differential expression testing" file, the GTF file, and the reference genome if needed
2 - cut out the "XLOC" field from the "gene differential expression testing" file using the tool "Text Manipulation -> Cut"
3 - use the tool "Filter and Sort -> Filter GTF data by attribute values_list" to obtain only the records related to your XLOC list
4 - obtain FASTA sequence with the tool "Fetch Sequences -> Extract Genomic DNA", using the result from step 3 as the query and your uploaded reference genome as a "Custom reference genome" if needed

More about custom reference genomes and RNA-seq tools is in these links:
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
http://wiki.galaxyproject.org/Support#Custom_reference_genome

Hopefully this helps,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
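For anyone who wants to try the same idea outside Galaxy, here is a rough Python sketch of the "match significant rows, then pull sequence by coordinates" logic. It is not what the Galaxy tools do internally, and it rests on several assumptions: the column positions are taken from the example row in the question (ID in column 1, locus in column 4, "yes"/"no" in the last column), the function names are made up for illustration, strand is ignored, and the whole locus span is returned (introns included) because the GTF filtering step is skipped. Whether the locus start is 0-based or 1-based should be checked against your GTF file; the `start_is_one_based` switch is only there to make that choice explicit.

```python
def parse_fasta(lines):
    """Return {sequence_name: sequence} from FASTA-formatted lines."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def significant_loci(diff_lines):
    """Yield (id, chrom, start, end) for rows whose last column is 'yes'.

    Assumes whitespace-separated columns laid out like the example row
    in the question; the header row is skipped because it does not end
    in 'yes'.
    """
    for line in diff_lines:
        fields = line.split()
        if len(fields) < 4 or fields[-1] != "yes":
            continue
        chrom, span = fields[3].rsplit(":", 1)
        start, end = (int(x) for x in span.split("-"))
        yield fields[0], chrom, start, end


def extract_locus(genome, chrom, start, end, start_is_one_based=True):
    """Slice the locus out of the genome; 'end' is treated as inclusive."""
    offset = 1 if start_is_one_based else 0
    return genome[chrom][start - offset:end]
```

To use it, open the reference FASTA and gene_exp.diff as text files, pass the file objects to `parse_fasta` and `significant_loci`, and write one `>ID` header plus the `extract_locus` result per significant row. Loading the whole genome into memory like this is fine for small references; for large genomes an indexed FASTA reader would be a better fit.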

