Extracting the read counts from a collapsed fasta file?

Question: Extracting the read counts from a collapsed fasta file?

12 months ago, c.l.frankling • 10 wrote:

I have collapsed my fastq file, so I now have the output fasta file, which contains all the unique sequences and their read count information.

For example;

>1-106
CTATAGAAGGGTAATACTACGTA
>2-88
CTATAGAAGGGTAATACTAACA
>3-83
CTATAGAAGGGTGACTATTGG

How can I create a simple text file that contains all these sequences and their corresponding read counts? Which tools are most appropriate for this from the text manipulation group?

Thanks, Charlotte.

fasta read counts collapse galaxy • 441 views


12 months ago, Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

Try the tool FASTA-to-Tabular. This creates a two-column text file with the ID-Count identifier in the first column (the leading > is removed) and the sequence itself in the second column. If you want to split ID and Count into separate columns afterwards, use the tool Convert delimiters to TAB with the option set to convert dashes (-) to tabs.

Should you want to rearrange the column order after, use the tool Cut.

Datamash could also be used afterwards to sum the numerical count values (along with other operations).
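For anyone working outside Galaxy, the same steps (FASTA to tabular, split the header on the dash, reorder the columns, sum the counts) can be sketched in plain Python. This is only an illustration, not how the Galaxy tools are implemented. It assumes every header has the form >ID-Count as in the example above, and the function names are made up for this sketch.

```python
def _make_row(header, seq_parts):
    # Assumption: the header looks like "1-106" (ID, dash, read count).
    seq_id, count = header.split("-", 1)
    return int(seq_id), int(count), "".join(seq_parts)


def collapsed_fasta_to_rows(fasta_text):
    """Parse collapsed FASTA text into (id, count, sequence) tuples."""
    rows = []
    header = None
    seq_parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                rows.append(_make_row(header, seq_parts))
            header = line[1:]  # drop the leading ">"
            seq_parts = []
        else:
            seq_parts.append(line)  # sequences may span several lines
    if header is not None:
        rows.append(_make_row(header, seq_parts))
    return rows


example = """>1-106
CTATAGAAGGGTAATACTACGTA
>2-88
CTATAGAAGGGTAATACTAACA
>3-83
CTATAGAAGGGTGACTATTGG
"""

rows = collapsed_fasta_to_rows(example)

# Write sequence and read count as tab-separated columns.
for seq_id, count, seq in rows:
    print(f"{seq}\t{count}")

# Sum of all read counts, comparable to a Datamash "sum" on the count column.
total = sum(count for _, count, _ in rows)
```

Redirecting the printed lines to a file gives the simple two-column text file asked for in the question.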

Hope that helps! Jen, Galaxy team


12 months ago, c.l.frankling • 10 wrote:

Thank you, that really helped!

The collapse tool is really useful for nucleotide sequences. Do you know of a way to do the same for amino acid sequences in a fasta file?

Thanks again, Charlotte.
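For readers with the same question: the thread does not name a Galaxy tool for this, but outside Galaxy, collapsing identical sequences is alphabet-agnostic and can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a Galaxy tool: sequences are compared as exact strings after upper-casing, the output headers imitate the >ID-Count style shown above, and the function names and the tiny protein input are hypothetical.

```python
from collections import Counter


def read_fasta_sequences(fasta_text):
    """Yield each sequence in the FASTA text as one upper-case string."""
    parts = []
    started = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if started:
                yield "".join(parts).upper()
            parts = []
            started = True
        elif started:
            parts.append(line)
    if started:
        yield "".join(parts).upper()


def collapse_sequences(fasta_text):
    """Collapse identical sequences into >rank-count records."""
    counts = Counter(read_fasta_sequences(fasta_text))
    lines = []
    # most_common() orders the unique sequences by descending count.
    for rank, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Hypothetical input: three short peptide records, two of them identical.
protein_example = """>a
MKV
>b
MKT
>c
MKV
"""

collapsed = collapse_sequences(protein_example)
print(collapsed, end="")
```

Because nothing here depends on the nucleotide alphabet, the same sketch works for DNA, RNA, or protein records.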
