Question: Extracting sequences from a FASTA file
d.gerrard80 (United States) wrote, 3.5 years ago:


I have been using on a Mac to extract sequences from a FASTA file. I have a file called 'Trinity.fasta' containing sequences whose identifiers have the form 'comp#_c#_seq#', for instance 'comp1_c0_seq1'. I also have a text file listing the specific contig identifiers I would like sequences for, but those identifiers are written as 'comp55698_c0'. As you can see, the '_seq#' part is missing.


Is there another program I could use that would let me match these identifiers even though the '_seq#' suffix is missing?




Jennifer Hillman Jackson (United States) wrote, 3.5 years ago:


This is not really a Galaxy question... but I can't help sharing a simple and super useful command-line option that will work here.

  1. Make a backup copy of the file you intend to modify (the one with the "compNNNNN_c0_seq1" identifiers). Put the backup somewhere else entirely, such as a directory named "YYYYMMDD_originals_experiment-name" or something equally obvious. (Sub-directories are not the best place for backups, IMHO: it is too easy to "rm -rf" the parent directory and lose it all.)
  2. With the working copy of the file (let's call it "contigs.fasta"), run the following at the prompt ($ == prompt):

$ sed '/^>/s/_seq[0-9][0-9]*//' contigs.fasta > contigs_clean.fasta

There are literally dozens of ways to do this sort of manipulation; sed is just my favorite command-line tool. Short and sweet.
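A minimal sketch of the two steps above, run on made-up sample data so you can see the effect before touching real files. The file names and sequences are hypothetical, and it assumes a POSIX shell with `sed` and `mktemp` (the BSD versions on a Mac and the GNU versions on Linux both accept what is used here). The pattern removes "_seq" plus any run of digits, and only on header lines starting with ">", so that "comp1_c0_seq10" becomes "comp1_c0" rather than "comp1_c00".

```shell
#!/bin/sh
# Demo on hypothetical data: back up a FASTA file, then strip "_seq#" from headers.
set -eu

workdir=$(mktemp -d)      # scratch working directory
backup_dir=$(mktemp -d)   # separate location for the untouched original
cd "$workdir"

# Made-up Trinity-style records (IDs plus a trailing "len=" description).
printf '>comp1_c0_seq1 len=12\nACGTACGTACGT\n>comp1_c0_seq10 len=8\nGGGGCCCC\n>comp55698_c0_seq2 len=4\nTTTT\n' > contigs.fasta

# Step 1: keep a pristine copy outside the working directory.
cp contigs.fasta "$backup_dir/"

# Step 2: on header lines only, delete "_seq" followed by one or more digits.
sed '/^>/s/_seq[0-9][0-9]*//' contigs.fasta > contigs_clean.fasta

cat contigs_clean.fasta
```

Note that once the suffix is gone, different isoforms of the same component (here comp1_c0_seq1 and comp1_c0_seq10) share one header, so tools that expect unique IDs may complain.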

Alternatively, the identifiers could be modified to add on the "_seq" part. I personally would use "vi" (or whatever your favorite text editor is) for that.

  1. Back up first! Call the working file something like "identifiers.txt".
  2. Assumption: the file is a single-column list of identifiers and nothing else.
$ vi identifiers.txt

Within vi, in command mode (press the Esc key if needed), type:

:%s/$/_seq1/ (hit return)
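If you would rather not open an editor, the same substitution can be scripted with sed: `$` matches the end of each line, so "_seq1" is appended to every identifier. A small sketch on hypothetical sample IDs (file names are illustrative):

```shell
#!/bin/sh
# Non-interactive equivalent of the vi command :%s/$/_seq1/
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-column identifier list, as in the question.
printf 'comp55698_c0\ncomp7_c1\n' > identifiers.txt

# Append "_seq1" at the end of every line; the original file is left untouched.
sed 's/$/_seq1/' identifiers.txt > identifiers_seq1.txt

cat identifiers_seq1.txt
```

Keep in mind that appending "_seq1" only matches the first isoform of each component; records such as "comp55698_c0_seq2" will still be missed.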

Either approach can also be done within Galaxy itself using its text manipulation tools. More about these commands and their options is easy to find online.
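For completeness, here is a sketch of a different technique (awk rather than sed, and not part of the answer above) that extracts the wanted records directly without rewriting either file: it strips "_seq#" from each header's ID on the fly and prints only the records whose trimmed ID is in the list. It assumes a POSIX awk (the one shipped with macOS should work), a plain one-column identifier list, and that the ID is the first whitespace-separated word of each header; file names and records are hypothetical.

```shell
#!/bin/sh
# Sketch: keep every FASTA record whose ID, minus a trailing _seq#, is listed.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: a tiny Trinity-style FASTA and a one-column ID list.
printf '>comp1_c0_seq1 len=12\nACGTAC\nGTACGT\n>comp55698_c0_seq1 len=4\nTTTT\n>comp55698_c0_seq2 len=4\nAAAA\n>comp7_c1_seq1 len=4\nCCCC\n' > Trinity.fasta
printf 'comp55698_c0\ncomp7_c1\n' > identifiers.txt

awk '
  # First file (NR == FNR): remember each wanted identifier.
  NR == FNR { wanted[$1] = 1; next }
  # Header line: take the ID (first word, minus the leading ">"),
  # drop a trailing _seq<digits>, and decide whether to keep this record.
  /^>/ {
    id = substr($1, 2)
    sub(/_seq[0-9]+$/, "", id)
    keep = (id in wanted)
  }
  # Print header and sequence lines while inside a wanted record.
  keep
' identifiers.txt Trinity.fasta > selected.fasta

cat selected.fasta
```

Because the original headers are kept, every isoform of a listed component is returned (here both comp55698_c0 records), and multi-line sequences come along intact. One caveat of the `NR == FNR` idiom: if the identifier file is empty, awk treats the FASTA itself as the list and prints nothing.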

Best, Jen, Galaxy team

Powered by Biostar version 16.09