Renaming sequencing files


mrudhulaks wrote (4 weeks ago):

Hi,

I am working with gut microbiome data in QIIME2. The data I receive are paired-end sequences, and the file name for each sample contains the sample ID followed by nucleotide sequences. For example, one file is named 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz. I have 2 files for each sample and 200 samples in total. Could you please let me know how to rename these files so that it will be easier for me to create a metadata file?

Thank you.

Regards, Mrudhula

Jennifer Hillman Jackson (United States) wrote (4 weeks ago):

Hello,

The goal is to rename the files before uploading to Galaxy, correct? If so, online resources such as these two explain how to do that on the command line or in a bash script (Linux or OSX terminal). You can search for Windows methods, and for even more Linux/OSX options if interested, as there are several ways to rename files.

I tend to like one-liners for quick commands. This is a modification of one of the answers in the second link that will work for your data. The simple syntax assumes that you are within the directory containing the files, but you could modify that part of the command. You could also modify the regular expression. I stripped out the index sequences and the lane number (TACGCTGC-CCTAGAGT_L001), keeping the sample name and the _R1 or _R2 read label, which are the parts of the file name you'll probably want to retain (at a minimum).

$ rename 's/_.+_.+_/_/' *

This will change 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz to 189V1-16S-APBJN_R1.fastq.gz. It should also work as a batch on the rest of the files in the same directory/folder, provided the file name format is consistent.
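Once the names follow a SAMPLE_R1/SAMPLE_R2 pattern, the sample IDs for a metadata file can be taken from the text before the first underscore. Below is a minimal sketch, assuming the sample ID itself contains no underscore and each sample has exactly one R1 file. The function name write_sample_ids and the output name sample-metadata.tsv are hypothetical. The header sample-id is one of the first-column names QIIME2 metadata accepts, but the IDs still need to match the sample IDs your import step assigns to the reads.

```shell
#!/usr/bin/env bash
# Sketch: start a tab-separated metadata file with one row per sample.
# Assumes the sample ID is the text before the first "_" (true for both
# 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz and
# 189V1-16S-APBJN_R1.fastq.gz) and that each sample has one R1 file.
# write_sample_ids and sample-metadata.tsv are hypothetical names.

write_sample_ids() {
    local dir="${1:-.}" out="${2:-sample-metadata.tsv}" f base
    printf 'sample-id\n' > "$out"            # header row for the ID column
    for f in "$dir"/*_R1.fastq.gz; do
        [ -e "$f" ] || continue              # glob matched nothing
        base="${f##*/}"                      # drop the directory part
        printf '%s\n' "${base%%_*}"          # strip from the first "_" on
    done | sort -u >> "$out"
}
```

Because it only looks at the text before the first underscore, this gives the same IDs whether it runs before or after renaming. Further tab-separated columns can then be added in a spreadsheet or text editor.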

You may need to install the rename utility, and that may mean first installing a package manager if you don't already use one. Homebrew (https://brew.sh/) is one choice, or use your package manager of choice. The command above uses the Perl-based rename (the version Homebrew installs); the util-linux rename shipped by some Linux distributions has a different syntax and will not accept this expression. For testing out regular expressions, this is my go-to website when I get stuck: https://www.regextester.com/, and this one has nice tutorials if this is new to you: https://www.regular-expressions.info/.
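If installing rename is a hurdle, the same substitution can be done with sed and mv, which come with Linux and OSX. This is one possible sketch, not the only way. It assumes every file follows the SAMPLE_INDEX_LANE_R1/R2.fastq.gz pattern with no underscore inside the sample ID. The function names short_name and rename_fastqs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the same substitution as rename 's/_.+_.+_/_/', using sed and mv.
# Assumes names like 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz
# with no underscore inside the sample ID. Function names are hypothetical.

# Print the shortened name: everything from the first underscore through
# the last one (here the index sequences and lane) collapses to one "_".
short_name() {
    printf '%s\n' "$1" | sed -E 's/_.+_.+_/_/'
}

# Rename every *_R1/_R2 .fastq.gz file in directory $1 (default: current).
rename_fastqs() {
    local dir="${1:-.}" f base new
    for f in "$dir"/*_R[12].fastq.gz; do
        [ -e "$f" ] || continue              # glob matched nothing
        base="${f##*/}"                      # drop the directory part
        new="$(short_name "$base")"
        [ "$new" = "$base" ] && continue     # already short; skip
        mv -n -- "$f" "$dir/$new"            # -n: never overwrite
    done
}
```

Usage, after sourcing the script: rename_fastqs /path/to/fastq_folder. Because of mv -n, a file that already has the target name is left alone rather than overwritten, and running it a second time does nothing.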

Whatever you decide, be sure to test before running on the actual data files. You don't even need to copy the data for testing: just create a few empty files with the same names as the originals in a separate test directory and see what you get. Then tune the command (or script, if you go that route instead) until it works the way you want it to. For example:

 $ mkdir test_rename
 $ cd test_rename
 $ touch 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R2.fastq.gz
 $ rename 's/_.+_.+_/_/' *
 $ ls

... and review the new names.
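A similar check can be done without touching any files by printing what each name would become. The dry-run sketch below also warns if two files would end up with the same new name, which would happen if, for example, a sample had reads from more than one lane (L001 and L002). preview_renames is a hypothetical name, and its sed expression mirrors the rename expression above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print "old -> new" for each R1/R2 file in a directory
# without renaming anything, then warn about duplicate new names.
# preview_renames is a hypothetical name; the sed expression mirrors the
# rename one-liner 's/_.+_.+_/_/'.

preview_renames() {
    local dir="${1:-.}" f base new dups
    local -a new_names
    new_names=()
    for f in "$dir"/*_R[12].fastq.gz; do
        [ -e "$f" ] || continue              # glob matched nothing
        base="${f##*/}"                      # drop the directory part
        new="$(printf '%s\n' "$base" | sed -E 's/_.+_.+_/_/')"
        printf '%s -> %s\n' "$base" "$new"
        new_names+=("$new")
    done
    # Any name reported by uniq -d would be produced by two or more files.
    dups="$(printf '%s\n' "${new_names[@]}" | sort | uniq -d)"
    if [ -n "$dups" ]; then
        printf 'WARNING: these new names would collide:\n%s\n' "$dups" >&2
        return 1
    fi
}
```

It returns a non-zero status when a collision is found, so it can gate the real rename, e.g. preview_renames . && rename 's/_.+_.+_/_/' *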

Hope that helps! Jen, Galaxy team
