Question: EBI SRA Data Import
3.8 years ago, prahkingsworth wrote:

Hi guys

I have just imported the following datasets, which cover several chromosomes, into Galaxy in FASTQ format:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4620

I want to convert these imported FASTQ files into BAM files and focus on just one chromosome, for example chromosome 5, using Galaxy for RNA-Seq analysis. The main reason for focusing on a single chromosome is to reduce the size of the imported FASTQ files, making it easier to concentrate on a single study.

How can this be done using Galaxy?

Thanks in advance

Rex

3.8 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

You can convert a FASTQ file to BAM format directly, but the result will not contain any chromosome mapping information.

To obtain that, map the full dataset, then filter the results for hits to the target chromosome. Next, go back and extract just the FASTQ sequences for those hits, creating the final .fastq dataset to use as the input to your analysis. This creates slightly skewed input: only sequences that map will be retained, so the fraction of unmapped sequences that a dataset would normally contain will be missing. This may or may not matter to you, and you could always seed back in some unmapped sequences at the same fraction found in the original dataset.
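Outside of Galaxy, the filter-then-extract step described above could be sketched roughly as follows. This is only an illustration in plain Python, not what the Galaxy tools do internally. The function names `reads_on_chromosome` and `filter_fastq` are made up for this example, the chromosome label "chr5" is an assumption (your reference may call it "5"), and the FASTQ is assumed to use the simple four-lines-per-record layout.

```python
from typing import Iterable, Iterator, Set, Tuple


def reads_on_chromosome(sam_lines: Iterable[str], chrom: str) -> Set[str]:
    """Collect names of reads with a mapped alignment on `chrom` (hypothetical helper)."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:               # not a valid alignment line
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:                    # 0x4 = read is unmapped
            continue
        if rname == chrom:
            names.add(qname)
    return names


def filter_fastq(fastq_lines: Iterable[str], keep: Set[str]) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records whose read name is in `keep` (hypothetical helper)."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The read name is the first word after the leading '@'.
            name = record[0][1:].split()[0]
            if name in keep:
                yield tuple(record)
            record = []


# Tiny made-up example: r1 maps to chr5, r2 maps to chr1, r3 is unmapped.
sam = [
    "@SQ\tSN:chr5\tLN:100",
    "r1\t0\tchr5\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tTTTT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]
fastq = [
    "@r1 extra", "ACGT", "+", "IIII",
    "@r2", "TTTT", "+", "IIII",
    "@r3", "GGGG", "+", "IIII",
]
kept_names = reads_on_chromosome(sam, "chr5")
kept_records = list(filter_fastq(fastq, kept_names))
```

If the read names in the FASTQ carry a suffix such as /1 or /2 that the mapper strips, the names would need to be normalized before the comparison.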

Alternatively, you could create a custom reference genome containing just the target chromosome and map against that. The job will run faster. However, you will almost certainly get slightly different results. Perhaps try both approaches on one dataset, see which works best for you, then use that method with the others.
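For the single-chromosome reference, one way to prepare the FASTA before uploading it as a custom genome is to pull that one record out of the full genome file. The sketch below is plain Python under a few assumptions: `extract_chromosome` is a hypothetical helper name, the sequence is assumed to be labeled "chr5" in the header, and the first word of each header line is taken as the sequence name.

```python
from typing import Iterable, Iterator


def extract_chromosome(fasta_lines: Iterable[str], chrom: str) -> Iterator[str]:
    """Yield the header and sequence lines of the record named `chrom` (hypothetical helper)."""
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Compare the first word of the header so "chr5" does not match "chr50".
            parts = line[1:].split()
            keep = bool(parts) and parts[0] == chrom
        if keep:
            yield line


# Tiny made-up genome with three records.
genome = [
    ">chr1 some description", "AAAA", "CCCC",
    ">chr5", "GGGG", "TTTT",
    ">chr10", "ACGT",
]
chr5_only = list(extract_chromosome(genome, "chr5"))
```

The resulting lines can be written to a file and uploaded to the history as a FASTA dataset to use as the reference when mapping.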

Best, Jen, Galaxy team
