Question: Extracting the read counts from a collapsed fasta file?
c.l.frankling wrote (12 months ago):

I have collapsed my FASTQ file, so I now have an output FASTA file that contains all the unique sequences and their read counts.

For example:

>1-106
CTATAGAAGGGTAATACTACGTA
>2-88
CTATAGAAGGGTAATACTAACA
>3-83
CTATAGAAGGGTGACTATTGG

How can I create a simple text file that contains all these sequences and their corresponding read counts? Which tools from the Text Manipulation group are most appropriate for this?

Thanks, Charlotte.

Jennifer Hillman Jackson (United States) wrote (12 months ago):

Hello,

Try the tool Fasta-to-Tabular. It creates a two-column text file with the ID-Count in the first column (the leading > is removed) and the sequence itself in the second column. If you then want to split the ID and the count into separate columns, use the tool Convert delimiters to TAB with the option set to convert "dashes" (-) to tabs.
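For anyone working outside Galaxy, the Fasta-to-Tabular and Convert delimiters steps can be approximated in plain Python. This is a minimal sketch, not Galaxy's implementation; it assumes each header has the form `<id>-<count>` as in the example above, and the function names `fasta_to_tabular` and `split_id_count` are hypothetical.

```python
def fasta_to_tabular(lines):
    """Yield (header, sequence) pairs with the leading '>' dropped.

    Multi-line sequences are joined; blank lines are skipped.
    """
    header, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def split_id_count(header):
    """Split a hypothetical '<id>-<count>' header such as '1-106' into ('1', 106)."""
    seq_id, count = header.split("-", 1)
    return seq_id, int(count)


collapsed = """>1-106
CTATAGAAGGGTAATACTACGTA
>2-88
CTATAGAAGGGTAATACTAACA
>3-83
CTATAGAAGGGTGACTATTGG
"""

rows = []
for header, seq in fasta_to_tabular(collapsed.splitlines()):
    seq_id, count = split_id_count(header)
    rows.append((seq_id, count, seq))

# Print a three-column, tab-separated table: ID, count, sequence
for seq_id, count, seq in rows:
    print(f"{seq_id}\t{count}\t{seq}")
```

Splitting only on the first dash keeps the sketch safe if an ID ever contains further dashes; Galaxy's "convert dashes" option would split on every dash instead.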

Should you want to rearrange the column order afterwards, use the tool Cut.
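The Cut step selects and reorders columns by number. A hypothetical Python equivalent, assuming a tab-separated table with the columns ID, count and sequence (the layout after Fasta-to-Tabular and Convert delimiters), might look like this; `cut_columns` is an illustrative name, not an existing tool.

```python
def cut_columns(tsv_text, order):
    """Return lines with columns picked in `order` (1-based numbers, like c3,c2)."""
    out = []
    for line in tsv_text.splitlines():
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        out.append("\t".join(fields[i - 1] for i in order))
    return out


table = (
    "1\t106\tCTATAGAAGGGTAATACTACGTA\n"
    "2\t88\tCTATAGAAGGGTAATACTAACA\n"
    "3\t83\tCTATAGAAGGGTGACTATTGG\n"
)

# Put the sequence first and the read count second
reordered = cut_columns(table, [3, 2])
for line in reordered:
    print(line)
```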

Datamash could also be used afterwards to sum up the numerical count values (along with other operations).
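Summing the count column is a simple reduction over one column. As a sketch, assuming the same three-column tab-separated table (ID, count, sequence), a Python version is shown below; on the command line, GNU datamash's `datamash sum 2` performs the same operation on tab-separated input. The helper `sum_column` is hypothetical.

```python
def sum_column(tsv_text, column):
    """Sum the integer values in a 1-based column of tab-separated text."""
    total = 0
    for line in tsv_text.splitlines():
        if line.strip():
            total += int(line.split("\t")[column - 1])
    return total


table = (
    "1\t106\tCTATAGAAGGGTAATACTACGTA\n"
    "2\t88\tCTATAGAAGGGTAATACTAACA\n"
    "3\t83\tCTATAGAAGGGTGACTATTGG\n"
)

total = sum_column(table, 2)  # 106 + 88 + 83
print(total)  # 277
```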

Hope that helps! Jen, Galaxy team

c.l.frankling wrote (12 months ago):

Thank you, that really helped!

The Collapse tool is really useful for nucleotide sequences. Do you know of a way to do the same for amino acid sequences in a FASTA file?
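One possible approach, sketched here under assumptions rather than as an existing Galaxy tool, is to count identical amino acid sequences with Python's `collections.Counter` and write them out in the same `>rank-count` header style as the collapsed nucleotide example above. The function names are hypothetical, and sequences are compared exactly as written (case-sensitive, with no trimming of `*` stop symbols).

```python
from collections import Counter


def read_fasta_sequences(lines):
    """Yield each record's sequence, joining multi-line records."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if parts:
                yield "".join(parts)
            parts = []
        else:
            parts.append(line)
    if parts:
        yield "".join(parts)


def collapse(sequences):
    """Return collapsed FASTA lines with hypothetical '>rank-count' headers."""
    counts = Counter(sequences)
    out = []
    # most_common() sorts by count, highest first; ties keep first-seen order
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        out.append(f">{rank}-{n}")
        out.append(seq)
    return out


proteins = """>a
MKV
>b
ACD
>c
MK
V
>d
ACD
>e
MKV
>f
WWW
"""

collapsed = collapse(read_fasta_sequences(proteins.splitlines()))
print("\n".join(collapsed))
```

Record `c` is split over two lines but still counts as `MKV`, so the output lists `MKV` with 3 reads, `ACD` with 2 and `WWW` with 1.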

Thanks again, Charlotte.

Powered by Biostar version 16.09