Hello,
Ok, that went quicker than expected. The skipped BED transcripts are not represented in the bigWig file. This might be Ok depending on your analysis goals.
An example input region that was skipped because it lies on an unplaced scaffold (chrUn) that is not included in the BAM/bigWig:
chrUn_GL456372 6883 13335 uc009skg.1 0 - 6883 6883 0 3 680,151,109, 0,933,6343,
You have two choices: use the data as is (leaving out the transcripts that didn't pass through the tool), or recreate the BAM used to generate the bigWig using the full mm10 genome build.
I only see a small number of skipped transcripts/regions (for this reason) when I review the stderr for the job. Click on the job details icon ("i" icon) to review this content. You can also send in a bug report (I'll know it is yours, so it is fine to do that without expecting more feedback), and you'll get a copy so you can review the full details yourself.
Overall, 63,814 regions were input and 63,572 matrix lines were generated. The difference of 242 regions most likely corresponds to the transcripts that were not in the bigWig at all (skipped).
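The comparison of input regions against matrix lines can be sketched in Python as follows. This is only a rough sketch under some assumptions: that the uncompressed computeMatrix output starts with a header line beginning with "@" followed by tab-separated rows whose first six columns are BED-like (name in column 4), and that region names are carried over unchanged from the input BED. The function names are hypothetical, not part of deepTools or Galaxy.

```python
def bed_names(lines):
    """Collect the name column (4th field) from BED lines."""
    names = set()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        fields = stripped.split()
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def matrix_names(lines):
    """Collect region names from matrix rows.

    Assumption: header lines start with '@' and each data row begins
    with chrom, start, end, name, score, strand (tab-separated).
    """
    names = set()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def missing_regions(bed_lines, matrix_lines):
    """Return the sorted names present in the BED but absent from the matrix."""
    return sorted(bed_names(bed_lines) - matrix_names(matrix_lines))


if __name__ == "__main__":
    # Tiny in-memory example; the chr1 record is made up for illustration.
    bed = [
        "chr1\t100\t200\tgeneA\t0\t+",
        "chrUn_GL456372\t6883\t13335\tuc009skg.1\t0\t-",
    ]
    matrix = [
        "@{}",
        "chr1\t100\t200\tgeneA\t0\t+\t1.0\t2.0",
    ]
    print(missing_regions(bed, matrix))
```

With real data, the lines would come from the input BED dataset and from the downloaded matrix after decompressing it (for example with Python's gzip module).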
The current results appear to be Ok to plot. If you have a failed plot (I couldn't find one active in your history), a bug report for that can also be sent in and reviewed for more details about why it is failing. It could be that some setting for computeMatrix needs to be tuned. For example, you might want to set the parameter "Skip zeros" to "Yes" (what this does exactly is explained on the tool form). In short, when no overlap is found between a transcript region and any of the bigWig's regions, those data will be removed from the heatmap plot output.
Hope that helps! Jen, Galaxy team
Hi Matt, Could you explain in more detail what steps are leading up to this problem? I'm not sure what "genes being left out" means or how you are preparing the data for this tool.
It would be helpful to include exact tool names (including version) and the copied contents of the Job Details tool run settings (don't post back the job API links, or your data will not remain private). Or you can post back a shared history link (would be public) or send in a bug report from the error dataset (private). If you choose to send in the bug report, be sure to leave all input/output datasets undeleted and include a link to this post so we can link the two.
How-to and troubleshooting FAQs: https://galaxyproject.org/support/
Thanks, Jen, Galaxy team
I'm using the computeMatrix tool in NGS: DeepTools, version 2.5.0.0. By "genes being left out", I mean that the tool is deciding, for one reason or another, to skip some of the genes and leave them out of the computeMatrix output. The settings are as shown:
computeMatrix has two main output options reference-point
The reference point for the plotting beginning of region (e.g. TSS)
Discard any values after the region end False
Distance upstream of the start site of the regions defined in the region file 1000
Distance downstream of the end site of the given regions 1000
Show advanced output settings yes
Save the matrix of values underlying the heatmap True
Save the regions after skipping zeros or min/max threshold values False
Show advanced options yes
Length, in bases, of non-overlapping bins used for averaging the score over the regions length 50
Sort regions maintain the same ordering as the input files
Method used for sorting mean
Define the type of statistic that should be displayed. mean
Convert missing values to 0? True
Skip zeros False
Minimum threshold Not available.
Maximum threshold Not available.
Scaling factor Not available.
Use a metagene model False
trascript designator transcript
exon designator exon
transcriptID key designator transcript_id
Blacklisted regions in BED/GTF format
Job Resource Parameters no
The full message in the info box is as shown:
Skipping uc009skg.1, due to being absent in the computeMatrix output. Skipping uc009skh.1, due to being absent in the computeMatrix output. Skipping uc029xhh.1, due to being absent in the computeMatrix output. Skipping uc029xhi.1, due to being absent in
Either fixing the skipping issue or even just getting a complete list of the skipped ones would effectively fix the problem I'm having. I'm fairly certain the problem isn't with how the data is prepared, as I have run this through with different sets and with smaller sets and it was fine before. Thanks for looking into this, and feel free to ask if you need any more clarification. When I get time I'll see if I can't figure it out myself.
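For getting a list of the skipped transcripts, one option is to copy the info/stderr text and pull the IDs out with a regular expression. The sketch below assumes every message follows the "Skipping <ID>, due to being absent" wording quoted above; the function name is hypothetical, and the text in the info box may be truncated, so the full stderr from the job details is the better input.

```python
import re

# Matches the ID between "Skipping " and ", due to being absent".
SKIP_PATTERN = re.compile(r"Skipping ([^\s,]+), due to being absent")


def skipped_ids(text):
    """Return skipped IDs in order of first appearance, without duplicates."""
    ids = SKIP_PATTERN.findall(text)
    return list(dict.fromkeys(ids))


if __name__ == "__main__":
    message = (
        "Skipping uc009skg.1, due to being absent in the computeMatrix output. "
        "Skipping uc009skh.1, due to being absent in the computeMatrix output. "
        "Skipping uc029xhh.1, due to being absent in the computeMatrix output."
    )
    for transcript_id in skipped_ids(message):
        print(transcript_id)
```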
My initial guess is that the skipped transcripts (genes) map to places not represented in the bigWig score data. The bigWig was created from a BAM that was mapped to the mm10 primary autosomes + chrX, chrY, and chrM (an uploaded BAM, but the BAM headers are a match for mm10, so I don't think there is a genome mismatch problem).
The mm10 UCSC genes track includes transcripts that map to haplotype and unplaced/unmapped sequences (it covers the full genome build). These are probably what is being skipped.
I'll be checking for that (there are not that many skipped, so this makes sense), but please also continue to troubleshoot on your side with this information in mind.
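A quick way to test this guess is to split the input BED by chromosome name and look at what falls outside the primary chromosomes. The sketch below assumes UCSC-style names (chr1 to chr19, chrX, chrY, chrM for mm10); the function and constant names are hypothetical, and the chr1 record in the example is made up for illustration.

```python
# Primary mm10 chromosomes in UCSC naming: 19 autosomes plus X, Y, and M.
PRIMARY_CHROMS = {f"chr{i}" for i in range(1, 20)} | {"chrX", "chrY", "chrM"}


def split_bed(lines, allowed=PRIMARY_CHROMS):
    """Split BED lines into (kept, skipped) by chromosome name."""
    kept, skipped = [], []
    for line in lines:
        stripped = line.strip()
        # Ignore blank lines and BED header/comment lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        chrom = stripped.split()[0]
        if chrom in allowed:
            kept.append(stripped)
        else:
            skipped.append(stripped)
    return kept, skipped


if __name__ == "__main__":
    bed = [
        "chr1 100 200 geneA 0 +",
        "chrUn_GL456372 6883 13335 uc009skg.1 0 - 6883 6883 0 3 680,151,109, 0,933,6343,",
    ]
    kept, skipped = split_bed(bed)
    print(len(kept), "kept;", len(skipped), "skipped")
    for record in skipped:
        # Print chromosome and transcript name of each skipped record.
        fields = record.split()
        print(fields[0], fields[3])
```

If the names in the skipped list match the IDs reported in the job's stderr, that supports the explanation; the kept list could then be used as a filtered region file.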