Hello,
I started using Galaxy about two weeks ago. I am not a bioinformatician by training, but a self-taught molecular biologist working with HTS. My data come from semiconductor sequencing (Ion Torrent PGM and Proton).
I have managed to get my data into Galaxy, complete the alignment, and get some QC and summary statistics.
My current problem is slicing an aligned human genome (low coverage) AND getting summary statistics for EACH slice (coverage, GC content, number of aligned reads). I managed to get a sliced BAM with BEDTools, but getting the statistics for EACH slice is going poorly.
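To be concrete about the three numbers I am after for each slice, here is a toy Python sketch. Everything in it is made up for illustration (the `window_stats` function, the reference sequence, the read coordinates); it is not what Galaxy or BEDTools does internally, just a statement of the calculation I would like to reproduce with Galaxy tools on the real BAM.

```python
def window_stats(window, reads, reference):
    """Return read count, mean coverage and GC fraction for one window.

    Coordinates are BED-style: 0-based start, exclusive end.
    A read that spans two windows is counted in both of them.
    """
    chrom, w_start, w_end = window
    seq = reference[chrom][w_start:w_end].upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

    n_reads = 0
    covered_bases = 0
    for r_chrom, r_start, r_end in reads:
        if r_chrom != chrom:
            continue
        # Number of bases this read shares with the window
        overlap = min(r_end, w_end) - max(r_start, w_start)
        if overlap > 0:
            n_reads += 1
            covered_bases += overlap

    mean_coverage = covered_bases / (w_end - w_start)
    return {"reads": n_reads, "mean_coverage": mean_coverage, "gc": gc}


# Hypothetical toy data: a 20-base "chromosome" and three aligned reads
reference = {"chrA": "GGCCGGCCGG" + "ATATATATAT"}
reads = [("chrA", 0, 5), ("chrA", 3, 8), ("chrA", 8, 12)]
windows = [("chrA", 0, 10), ("chrA", 10, 20)]

stats = [window_stats(w, reads, reference) for w in windows]
for w, s in zip(windows, stats):
    print(w, s)
```

Mean coverage here is simply the aligned bases inside the window divided by the window length, which is the definition I have in mind; please correct me if a different one is more usual.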
I also found this advice from Jennifer Hillman Jackson:
"Hello Els, Have you seen the tool "BEDTools -> Create a BedGraph of genome coverage"? This would give you the coverage numbers, then you could perform statistics on those numbers. You could also "Convert from BAM to BED" (there is an option to split for spliced alignments) and if you had a bed file of transcripts, use tools in this group or tools in "Operate on Genomic Intervals" to generate statistics. You could also create your own statistics using "Text Manipulation -> Compute" or "Join, Subtract and Group -> Group". Hopefully one of these options works out for you. Jen Galaxy team"
But I could not follow it; I probably need a more detailed explanation :(
Could anyone please explain in more detail how to get statistics for many small parts of the genome? Step by step preferred :)
Currently I have a working BED file with 1 Mb slices, and I would like to make slices on the order of 20-100 kb. I also have a working sliced BAM (at least I think it is working).
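In case it helps to show what I mean by slices, this is a minimal Python sketch of fixed-size windows written out in BED format. The `make_windows` function and the chromosome lengths are hypothetical, chosen only to keep the example small; the real input would be the human chromosome sizes.

```python
def make_windows(chrom_sizes, window_size):
    """Yield (chrom, start, end) windows, 0-based start and exclusive end.

    The last window on each chromosome is shortened to fit.
    """
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, window_size):
            yield chrom, start, min(start + window_size, length)


# Hypothetical chromosome lengths, for illustration only
toy_sizes = {"chrA": 250_000, "chrB": 120_000}

windows = list(make_windows(toy_sizes, 100_000))
for chrom, start, end in windows:
    print(f"{chrom}\t{start}\t{end}")
```

Changing the second argument to anything between 20,000 and 100,000 would give the slice sizes I am aiming for.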
Thanks,