Question: Pileup error: line length exceeds XXX in sequence YYY
morgane.moreau.info (Australia) wrote, 4.2 years ago:

Hi,

I mapped my reads, filtered out unpaired reads, converted to BAM, and now I want to generate a pileup using my custom-build genome (it lacks its .gff file for now; I'm trying to fix that, but Galaxy won't upload my files at the moment).

When I run 'Generate pileup' I get this error message: [fai_build_core] line length exceeds 65535 in sequence 'NC_000XX'.

Can you tell me what this error means, please?

Thanks,

Morgane

Daniel Blankenberg (United States) wrote, 4.2 years ago:

You can run the FASTA Width formatter tool (https://usegalaxy.org/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fdevteam%2Ffasta_formatter%2Fcshl_fasta_formatter%2F1.0.0) on your FASTA history item, with the "New width for nucleotides strings:" parameter set to 50 or so. The pileup step will then work with the reformatted FASTA genome.
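
For anyone working outside Galaxy, the same kind of re-wrapping can be sketched in a few lines of Python. This is only an illustration and not the FASTA Width formatter's actual code; the function name `rewrap_fasta` and the default width of 50 are assumptions chosen to match the suggestion above.

```python
def rewrap_fasta(lines, width=50):
    """Yield FASTA lines with every sequence re-wrapped to `width` characters.

    Hypothetical helper for illustration only. Header lines (starting
    with '>') are passed through unchanged; blank lines are dropped.
    """
    chunks = []  # pieces of the sequence currently being collected

    def flush():
        # Join what was collected so far and emit it in fixed-width slices.
        seq = "".join(chunks)
        chunks.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for line in lines:
        line = line.strip()          # also removes Windows '\r' line endings
        if line.startswith(">"):
            yield from flush()       # finish the previous record first
            yield line
        elif line:
            chunks.append(line)
    yield from flush()               # emit the last record
```

To rewrite a file, pass an open file handle as `lines` and write each yielded line, followed by a newline, to a new file. Each sequence is held in memory while it is re-wrapped, which should be fine for a genome of a few megabases.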

fubar (Australia) wrote, 4.2 years ago:

Hello,

The error message looks like it comes from code trying to index your FASTA file, and it probably means precisely what it says: your FASTA file is malformed. FASTA files typically have line breaks every 50 to 80 characters or so, but if you count the lines (wc -l filename) or open a local copy of your file in a Linux text editor, I think you'll see one short identifier line followed by one long line of sequence.

Perhaps the line breaks got lost in a Windows/Linux shuffle, but I'm betting you have one sequence identifier (>NC_00XX) followed by one really, really long line of sequence. Fixing it will likely require using a local editor and uploading the reformatted FASTA. I hope this helps.
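
If counting lines by hand is awkward, a rough check along these lines can report the longest sequence line under each identifier. It is a sketch only: the helper names are made up for illustration, and the 65535 limit is simply copied from the error message in the question.

```python
LINE_LIMIT = 65535  # assumption: the limit quoted in the error message above


def longest_line_per_sequence(lines):
    """Map each sequence identifier to the length of its longest sequence line."""
    longest = {}
    name = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Use the first word after '>' as the identifier.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            longest.setdefault(name, 0)
        elif name is not None:
            longest[name] = max(longest[name], len(line))
    return longest


def sequences_over_limit(lines, limit=LINE_LIMIT):
    """Return the identifiers whose longest line exceeds `limit`, sorted."""
    lengths = longest_line_per_sequence(lines)
    return sorted(name for name, length in lengths.items() if length > limit)
```

Calling `sequences_over_limit(open("genome.fasta"))` (the filename is a placeholder) on a file with one multi-megabase line per sequence would list every identifier in it.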

morgane.moreau.info (Australia) wrote, 4.2 years ago:

Thanks Fubar, 

I changed the data type of my BAM file to BAM (it was set to Tabular) and it seems to be running. It is still in progress, so I can't tell you yet whether that was the problem. I will get back to you.


I noticed your bug report earlier, took a look at https://usegalaxy.org/history/view?id=257fff0a3bbe8944, and downloaded your TB FASTA: it had 2 lines, one of them 4.4 MB long. Not sure how you managed to get a BAM file marked as tabular; that won't help either :)

Reply by fubar, written 4.2 years ago.

No, it didn't work; the job stayed queued (gray forever).

Reply by morgane.moreau.info, written 4.2 years ago.
morgane.moreau.info (Australia) wrote, 4.2 years ago:

Thanks for your insights.

It is my reference genome that was on one line (it's a whole genome sequence, so yes, it is on one line. How could I have known that this could be an error? Isn't that the whole purpose of a FASTA file?)

But now that I have used FASTA Width on it, my .gff file can't be assigned to this dataset: I can choose it, but the choice isn't saved.

Should I redo all the mapping of my samples?


a) AFAIK, there's no formal specification for FASTA, but most text-processing tools get unhappy with lines millions of characters long because of the way text files are buffered. So I think it's good practice to always generate them with some short, fixed line length in the sequence, and to ignore any line breaks when processing them.

b) I've no idea why fixing the line length on one history file should change the datatype of another one. Or are you reassigning the datatype of the FASTA file to GFF? That is not a very good idea, since GFF format is not FASTA format. Galaxy is smart, but not that smart: if there's a converter, you can use it through the pencil icon, but otherwise changing a datatype to a completely incompatible format (e.g. FASTA -> GFF) will usually break things in unexpected ways.

c) You may well want to remap if the mapping was against a bogus FASTA, but I'm confused about what you are really trying to do and where the GFF gene model file fits into your plans. Maybe if you clarify your goals, someone can offer advice?
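
As a small illustration of point (a), here is a minimal sketch of a reader that ignores line breaks when loading FASTA records. The function name `read_fasta` is hypothetical, and it keeps only the first word of each header as the identifier, which is an assumption rather than a rule of the format.

```python
def read_fasta(lines):
    """Return {identifier: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()          # drops '\n' and Windows '\r\n' endings
        if not line:
            continue                 # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Line breaks are not part of the sequence, so the pieces are simply joined.
    return {n: "".join(parts) for n, parts in records.items()}
```

Read this way, a one-line file and a wrapped copy of it give the same sequence string, because only the layout of the text differs.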

 

Reply by fubar, written 4.2 years ago.
morgane.moreau.info (Australia) wrote, 4.2 years ago:

Thanks for your feedback.

As you guessed, I'm a newbie. I've been playing with Galaxy for 2 weeks and, frankly, I'm starting to lose it.

The end goal is to get a list of SNPs I can be confident in (I'm far from getting there; I'm just starting to understand how to map and filter my reads, but I'm so confused by pileup, mpileup, GATK)...

Regarding the previous post, I'm not trying to turn my FASTA file into a .gff file; sorry if I was not clear. I was talking about assigning the annotation file to the FASTA file. When I created my reference genome using custom build, I uploaded my FASTA file (nucleotides) and a .gff file which contains all the annotations (so that later I have information on genes, coding regions, etc.). When I upload my .gff file, I link it to my reference genome FASTA file.

Now that I have modified my FASTA file, I would like my .gff file to be linked to the new one instead of my initial FASTA file (which had 2 lines), just to be sure that I'm not losing this database further down the track. But maybe I can specify it later, when I am at the SNP analysis stage, and don't have to worry about it for now...

Being a newbie isn't a problem in terms of getting help, but context and details matter a lot, so please fill us in a little more. Start with the most basic question of all: are you using Galaxy main or your own local copy? If it's main, submitting a bug report when you get a failed job is by far the best way of getting help, because it allows us to see the data.

Reply by fubar, written 4.2 years ago.