Question: EBI SRA Data Import
3.8 years ago
prahkingsworth wrote:

Hi guys

I have just imported these datasets into Galaxy in fastq format.

They cover several chromosomes.

I want to convert these imported fastq files into BAM files and focus on just one chromosome, for example chromosome 5, using Galaxy for RNA-Seq analysis. The main reason for focusing on a single chromosome is that it will reduce the size of the imported fastq files, making it easier to work on a single study.

How can this be done using Galaxy?

Thanks in advance


fastq datasets filtering
3.8 years ago
Jennifer Hillman Jackson wrote:


You can convert a fastq file to BAM format, but it will not have chromosome mapping information.

To obtain that, map the full dataset, then filter the results for hits to the target chromosome. Then go back and extract just the fastq sequences for those hits, creating the final .fastq dataset to use as the input to your analysis. This creates slightly skewed input: only sequences that map will be retained, so the unmapped fraction that would normally be present is missing. This may or may not matter to you, and you could always seed back in some unmapped sequences at the same fraction found in the original dataset.
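If it helps to see the logic of the map-then-filter approach outside Galaxy, here is a minimal sketch in plain Python. It assumes the mapping results are available as SAM text and that read names in the SAM match the first token of each fastq header; the function names `read_names_on_chromosome` and `filter_fastq` are made up for illustration and are not Galaxy tools.

```python
from typing import Iterable, List, Set


def read_names_on_chromosome(sam_lines: Iterable[str], chrom: str) -> Set[str]:
    """Collect names of reads that mapped to `chrom` in SAM-formatted text."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines start with '@'
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:  # flag bit 0x4 marks an unmapped read
            continue
        if rname == chrom:
            names.add(qname)
    return names


def filter_fastq(fastq_lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Return the 4-line fastq records whose read name is in `keep`.

    Assumption: each record is exactly 4 lines and the read name is the
    first whitespace-delimited token after the leading '@'.
    """
    lines = [l.rstrip("\n") for l in fastq_lines]
    out: List[str] = []
    for i in range(0, len(lines) - 3, 4):
        header = lines[i]
        tokens = header[1:].split()
        if header.startswith("@") and tokens and tokens[0] in keep:
            out.extend(lines[i:i + 4])
    return out
```

In use, you would pass the open SAM file and the chromosome name (for example `"chr5"`, spelled however your reference names it) to the first function, then pass the original fastq and the resulting set of names to the second. Paired-end data with `/1` and `/2` name suffixes would need extra handling that this sketch leaves out.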

Alternatively, you could create a custom reference genome with just the target chromosome and use that when you map. The job will run more quickly. However, you will almost certainly get slightly different results, because reads that originate from other chromosomes can then only align to the target chromosome or not at all. Perhaps try both on one dataset and see which works best for you, then use that method with the others.
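For the custom reference route, the single-chromosome FASTA can be cut out of a full genome FASTA before uploading it to Galaxy. A minimal sketch, assuming a multi-sequence FASTA whose header lines begin with the chromosome name; `extract_chromosome` is a hypothetical helper, not a Galaxy tool:

```python
from typing import Iterable, List


def extract_chromosome(fasta_lines: Iterable[str], chrom: str) -> List[str]:
    """Return the header and sequence lines of the record named `chrom`.

    Assumption: the record name is the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = False
    out: List[str] = []
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            # Start keeping lines at the target header, stop at the next one
            keep = bool(tokens) and tokens[0] == chrom
        if keep and line:
            out.append(line)
    return out
```

Writing the returned lines to a new file gives a FASTA containing only that chromosome, which could then be uploaded and selected as the reference when mapping.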

Best, Jen, Galaxy team

