Question: BAM file problems: sorting, filtering and running Naive Variant Caller
10 months ago, keeley.brookes wrote:

Hi All

I have aligned my RNA-seq data using BWA for Illumina, as I would like to use Naive Variant Caller downstream.

I converted my BWA SAM files to BAM files, and merged the four technical replicates I had.

I then sorted the file using SAMtools, and although the dataset was green it had this error attached:

Ignoring SAM validation error: ERROR: Record 492151, Read name NS500557:56:H5W5MBGXY:1:11109:18381:11447, bin field of BAM record does not equal

When I filtered my dataset using Picard for Is Mapped, Is Proper Pair, MAPQ >= 20 and NM > 1, I got a vastly smaller file, which I am not sure is correct. I ran Naive Variant Caller on it and after almost 24 hours it is still running.
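As a rough illustration of what those four filter criteria do to each read, the sketch below applies them to SAM-format text lines in plain Python. It is not how Picard or Galaxy implement the filter; the function names `get_nm` and `passes_filters` and the sample records are made up for illustration. It assumes the NM condition really was "greater than 1". Since NM is the edit distance to the reference, that condition alone discards every read with zero or one difference, which may account for much of the drop in file size.

```python
# Hypothetical sketch: apply the four filters from the post to SAM text lines.
FLAG_PROPER_PAIR = 0x2   # SAM flag bit: read mapped in a proper pair
FLAG_UNMAPPED = 0x4      # SAM flag bit: the read itself is unmapped


def get_nm(fields):
    """Return the NM (edit distance) tag as an int, or None if absent."""
    for tag in fields[11:]:          # optional tags follow the 11 mandatory columns
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def passes_filters(sam_line, min_mapq=20, nm_greater_than=1):
    """True if the read is mapped, properly paired, MAPQ >= 20 and NM > 1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    nm = get_nm(fields)
    return (
        not flag & FLAG_UNMAPPED
        and bool(flag & FLAG_PROPER_PAIR)
        and mapq >= min_mapq
        and nm is not None
        and nm > nm_greater_than
    )


# Made-up example records (tab-separated SAM columns).
records = [
    "r1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNM:i:0",  # perfect match
    "r2\t99\tchr1\t150\t60\t5M\t=\t250\t105\tACGTA\tIIIII\tNM:i:2",  # two edits
    "r3\t99\tchr1\t180\t10\t5M\t=\t280\t105\tACGTA\tIIIII\tNM:i:3",  # low MAPQ
    "r4\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",                     # unmapped
]

kept = [line.split("\t")[0] for line in records if passes_filters(line)]
print(kept)
```

In this toy input only `r2` survives: the perfectly matching read `r1` is removed by the NM condition, not by any quality problem.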

I ran PICARD validate SAM file on my data set and had this summary:

HISTOGRAM java.lang.String

Error Type                              Count
ERROR:INVALID_INDEXING_BIN              83
ERROR:INVALID_MAPPING_QUALITY           270
ERROR:INVALID_TAG_NM                    38
ERROR:MATE_NOT_FOUND                    3645430
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND     758661
ERROR:MISMATCH_FLAG_MATE_UNMAPPED       270
ERROR:MISSING_READ_GROUP                1

I am not a bioinformatician and don't really understand computer language, so if anyone can help explain this and help me out I would be very grateful. Thanks!
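For readers who want to see which error types dominate, the validation summary above can be tallied with a few lines of Python. This is only a sketch: the counts are copied from the post, and `parse_summary` is a hypothetical helper, not part of Picard.

```python
import re

# Summary text copied from the ValidateSamFile output quoted in the post.
summary = (
    "ERROR:INVALID_INDEXING_BIN 83 ERROR:INVALID_MAPPING_QUALITY 270 "
    "ERROR:INVALID_TAG_NM 38 ERROR:MATE_NOT_FOUND 3645430 "
    "ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 758661 "
    "ERROR:MISMATCH_FLAG_MATE_UNMAPPED 270 ERROR:MISSING_READ_GROUP 1"
)


def parse_summary(text):
    """Return {error_type: count} from whitespace-separated 'ERROR:NAME count' pairs."""
    pairs = re.findall(r"ERROR:(\w+)\s+(\d+)", text)
    return {name: int(count) for name, count in pairs}


counts = parse_summary(summary)

# Print the error types from most to least frequent.
for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s}{count:>10,d}")
```

Ranked this way, the two mate-related errors (MATE_NOT_FOUND and MISMATCH_FLAG_MATE_NEG_STRAND) account for nearly all of the reported problems, while the remaining types are comparatively rare.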

9 months ago, Jennifer Hillman Jackson (United States) wrote:


The Naive Variant Caller (NVC) tool cannot be used with RNA-seq data. However, Freebayes can. Map the reads with HISAT2 first.

You do not need to sort the HISAT2 BAM output before using Freebayes. The tutorial here is for DNA, but the workflow is about the same for RNA:

If you do need to sort a BAM dataset by queryname right now, use the Picard Sort Bam/Sam tool. At this time, a queryname-sorted output is either in SAM format (Picard does this correctly) or in a BAM with an error state (no index; this is a current problem with Samtools Sort, to be fixed soon).

There are some upcoming changes for BAM datatypes in the next release. Specifically, any dataset with a BAM datatype will be required to be coordinate-sorted. If a queryname-sorted BAM is needed, tools will have an option to do that sorting for you (Htseq-count is already written this way), or a queryname-sorted SAM can be used. In the near future, the BamNative datatype will be another option (a new datatype for which no sort order is assumed).
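The difference between the two sort orders mentioned above can be illustrated with plain Python tuples. This is a toy model with made-up records and an assumed reference order; real tools read the reference order from the file header and may compare read names differently from a simple string sort.

```python
# Toy records: (read_name, reference_name, 1-based position). All values are made up.
reads = [
    ("readB", "chr2", 500),
    ("readA", "chr1", 900),
    ("readB", "chr1", 120),
    ("readA", "chr2", 40),
]

# Reference order as it might appear in the file header (assumed here).
reference_order = {"chr1": 0, "chr2": 1}

# Coordinate sort: by reference sequence, then by position on that reference.
coordinate_sorted = sorted(reads, key=lambda r: (reference_order[r[1]], r[2]))

# Queryname sort: by read name, so records sharing a name end up adjacent.
queryname_sorted = sorted(reads, key=lambda r: r[0])

print("coordinate:", coordinate_sorted)
print("queryname: ", queryname_sorted)
```

Coordinate order is what position-based indexing relies on, while queryname order keeps both reads of a pair next to each other, which is why some tools ask for one and some for the other.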

When the Galaxy 18.01 release goes out (in the next few weeks), we'll add details about the new datatype in the release notes, create a new support FAQ explaining how to work with the BamNative, BAM and SAM datatypes, and update our existing help resources for working with BAM/SAM data (sorting instructions, recommended manipulation tools, etc.).

Thanks! Jen, Galaxy team
