Pileup error: line length exceeds XXX in sequence YYY

Question: Pileup error: line length exceeds XXX in sequence YYY

0

morgane.moreau.info • 30 (Australia) wrote 4.2 years ago:

Hi,

I mapped my reads, filtered unpaired reads, converted to BAM, and now I want to generate a pileup using my custom build genome (it lacks its .gff file for now; I'm trying to fix that, but Galaxy won't upload my files at the moment).

When I run 'generate pileup' I get this error message: [fai_build_core] line length exceeds 65535 in sequence 'NC_000XX'.

Can you tell me what this error means please?

Thanks,

Morgane

pileup error line length • 2.5k views

modified 4.2 years ago • written 4.2 years ago by morgane.moreau.info • 30

1

Daniel Blankenberg ♦♦ 1.7k (United States) wrote 4.2 years ago:

You can use the FASTA Width formatter tool (https://usegalaxy.org/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fdevteam%2Ffasta_formatter%2Fcshl_fasta_formatter%2F1.0.0) with the "New width for nucleotides strings:" parameter set to 50 or so on your FASTA history item and then the pileup step will work with the reformatted FASTA genome.
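For anyone working on a local copy instead of in Galaxy, the same kind of re-wrapping can be sketched in a few lines of Python. This is only an illustration, not the code of the FASTA Width formatter tool: the function name `rewrap_fasta` and its `width` parameter are made up for this example, and the sketch holds each sequence in memory, which should be fine for a genome of a few megabases.

```python
def rewrap_fasta(lines, width=50):
    """Yield FASTA lines with each sequence re-wrapped to `width` characters.

    `lines` is any iterable of text lines (an open file works).
    Hypothetical helper for illustration only.
    """
    buffer = []  # sequence fragments of the record currently being read

    def flush():
        # Join what was collected and emit it in fixed-width slices.
        sequence = "".join(buffer)
        buffer.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record, if any
            yield line          # header lines are passed through unchanged
        else:
            buffer.append(line)
    yield from flush()          # finish the last record


# Small in-memory demonstration with a width of 4.
for out_line in rewrap_fasta([">seq1", "ACGTACGTAC"], width=4):
    print(out_line)
```

To use it on a real file, pass an open file handle as `lines` and write the yielded lines, each followed by a newline, to a new file, then upload that file.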

written 4.2 years ago by Daniel Blankenberg ♦♦ 1.7k

0

fubar ♦ 1.1k (Australia) wrote 4.2 years ago:

Hello,

The error message looks like it comes from code trying to index your FASTA file, and it probably means precisely what it says: your FASTA file is malformed. FASTA files typically have line breaks every 50 or 80 characters or so, but if you try counting lines (wc -l filename) or opening a local copy of your file in a Linux text editor, I think you'll see one short identifier line followed by one long line of sequence.

Perhaps the line breaks got lost in a Windows/Linux shuffle, but I'm betting you have one sequence identifier (>NC_00XX) followed by one really, really long line of sequence. Fixing it will likely require using a local editor and uploading the re-formatted FASTA. I hope this helps.
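The check described above can also be scripted. The following is a minimal sketch, assuming a plain-text FASTA file: it reports the longest sequence line per record and flags records that exceed 65535, the number taken from the error message in the question. The function names `longest_sequence_lines` and `too_long` are invented for this example.

```python
LIMIT = 65535  # value quoted in the [fai_build_core] error message above


def longest_sequence_lines(lines):
    """Return {sequence name: length of its longest sequence line}."""
    longest = {}
    name = None
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate Windows and Unix line endings
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            longest.setdefault(name, 0)
        elif name is not None:
            longest[name] = max(longest[name], len(line))
    return longest


def too_long(lines, limit=LIMIT):
    """Return the sorted names of sequences with a line longer than `limit`."""
    lengths = longest_sequence_lines(lines)
    return sorted(name for name, length in lengths.items() if length > limit)


# In-memory demonstration: one record on a single 70000-character line.
demo = [">NC_test example\n", "A" * 70000 + "\n", ">ok\n", "ACGT\n", "ACGTAC\n"]
print(longest_sequence_lines(demo))
print(too_long(demo))
```

Passing an open file handle instead of `demo` runs the same check on a real file.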

modified 4.2 years ago • written 4.2 years ago by fubar ♦ 1.1k

0

morgane.moreau.info • 30 (Australia) wrote 4.2 years ago:

Thanks Fubar,

I did change the data type of my BAM file to BAM (it was set to Tabular) and it seems to be running. It is still in progress, so I can't tell you yet if that was the problem. Will get back to you.

written 4.2 years ago by morgane.moreau.info • 30

I noticed your bug report earlier, took a look at https://usegalaxy.org/history/view?id=257fff0a3bbe8944 and downloaded your TB FASTA: it had 2 lines, one of them 4.4 MB long. I'm not sure how you managed to get a BAM file marked as tabular, but that won't help either :)

modified 4.2 years ago • written 4.2 years ago by fubar ♦ 1.1k

No, it didn't work; the job stayed queued (gray forever).

written 4.2 years ago by morgane.moreau.info • 30

0

morgane.moreau.info • 30 (Australia) wrote 4.2 years ago:

Thanks for your insights.

It is my reference genome that was in one line (it's a whole genome sequence, so yes, it is in one line; how could I have known that this could be an error? Isn't that the whole purpose of a FASTA file?)

But now that I have used FASTA Width on it, my .gff file can't be assigned to this dataset: I can choose it, but the choice isn't saved.

Should I redo all the mapping of my samples?

modified 4.2 years ago • written 4.2 years ago by morgane.moreau.info • 30

a) AFAIK, there's no specification for FASTA, but most text processing tools get unhappy with lines of millions of characters each because of the way text files are buffered. So I think it's good practice to always generate FASTA files with some short, fixed line length for the sequence, and to ignore any line breaks when processing them.

b) I've no idea why fixing the line length on one history file should change the datatype of another one. Or are you reassigning the datatype of the FASTA file to GFF? That is not a very good idea, since GFF format is not FASTA format. Galaxy is smart, but not that smart: if there's a converter, you can use it through the pencil icon, but otherwise changing a datatype to a completely incompatible format (e.g. FASTA -> GFF) will usually break things in unexpected ways.

c) You may well want to remap if the mapping was against a bogus FASTA, but I'm confused about what you are really trying to do and where the GFF gene model file fits into your plans. Maybe if you clarify your goals, someone can offer advice?
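As an illustration of point a), here is a minimal sketch of a FASTA reader that ignores line breaks inside a sequence. The function name `read_fasta` is made up for this example and is not part of Galaxy or samtools. A reader written this way sees the same sequence whether the file is wrapped at a fixed width or stored on one long line.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines.

    `lines` is any iterable of text lines; blank lines are skipped.
    Hypothetical helper for illustration only.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # the line break is dropped here
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# The wrapped and the single-line version give the same record.
wrapped = [">chr1\n", "ACGT\n", "ACGT\n", "AC\n"]
single = [">chr1\n", "ACGTACGTAC\n"]
print(list(read_fasta(wrapped)) == list(read_fasta(single)))
```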

written 4.2 years ago by fubar ♦ 1.1k

0

morgane.moreau.info • 30 (Australia) wrote 4.2 years ago:

Thanks for your feedback.

As you guessed, I'm a newbie. I've been playing with Galaxy for 2 weeks and, frankly, I'm starting to lose it.

The end goal is to get a list of SNPs I can be confident in (I'm far from getting there; I'm just starting to understand how to map and filter my reads, and I'm so confused by pileup, mpileup and GATK).

Regarding the previous post, I'm not trying to turn my FASTA file into a .gff file, sorry if I was not clear. I was talking about assigning the annotation file to the FASTA file. When I created my reference genome using custom build, I uploaded my FASTA file (nucleotides) and a .gff file which contains all the annotations (so that later I have information on genes, coding regions, etc.). When I upload my .gff file, I link it to my reference genome FASTA file.

Now that I have modified my FASTA file, I would like my .gff file to be linked to that one instead of my initial FASTA file (which had 2 lines), just to be sure that I'm not losing this database further down the track. But maybe I can specify it later, when I am at the SNP analysis stage, and don't have to worry about it for now...

written 4.2 years ago by morgane.moreau.info • 30

Being a newbie isn't a problem in terms of getting help, but context and details matter a lot, so please fill us in a little more. Start with the most basic question of all: are you using Galaxy main or your own local copy? If it's main, submitting a bug report when you get a failed job is by far the best way of getting help, because it allows us to see the data.

written 4.2 years ago by fubar ♦ 1.1k
