Question: Renaming sequencing files
mrudhulaks wrote (20 days ago):

Hi,

I am working with gut microbiome data in QIIME2. The data I receive are paired-end sequences, and the file name for each sample contains the sample ID along with nucleotide sequences. For example, one file is named 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz. I have 2 files for each sample and 200 samples in total. Could you please let me know how to rename these files, so that it will be easier for me to create a metadata file?

Thank you.

Regards, Mrudhula

Jennifer Hillman Jackson wrote (19 days ago):

Hello,

The goal is to rename the files before uploading them to Galaxy, correct? If so, online command-line resources such as these two explain how to do that on the command line or in a bash script (Linux or OSX terminal). You can search for Windows methods, and for more Linux/OSX options if interested, as there are several ways to rename files.

I tend to like one-liners for quick commands. This is a modification of one of the answers in the second link that will work for your data. The simple syntax assumes that you are within the directory containing the files, but you could modify that part of the command. You could also modify the regular expression. I stripped out everything except the sample name, the _R1 or _R2 read tag, and the .fastq.gz extension, which are the parts of the file name you'll probably want to retain (at a minimum).

$ rename 's/_.+_.+_/_/' *

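This will change 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz to 189V1-16S-APBJN_R1.fastq.gz. It should work on the rest of the files in the same directory/folder, as a batch, if the file names are formatted consistently. The expression replaces everything from the first underscore through the last underscore with a single underscore, so it relies on the sample name itself containing no underscores and on _R1 or _R2 coming right after the last underscore.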
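If the Perl-based rename is not available, a plain bash loop can do the same renaming. This is a minimal sketch, assuming bash and file names that follow the example pattern; it swaps in bash's own regular-expression matching plus mv instead of rename, only looks at *.fastq.gz files rather than everything in the directory, and skips any file whose new name already exists. The function name rename_fastq_in and the directory test_rename_demo are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a bash-only stand-in for  rename 's/_.+_.+_/_/' *
# Assumption: names look like 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz,
# i.e. the sample name has no "_" and R1/R2 comes after the last "_".
set -euo pipefail

# rename_fastq_in DIR (hypothetical helper): keep the text before the first "_"
# and after the last "_", joined by a single "_"
rename_fastq_in() {
  local dir=$1 re='^([^_]*)_.+_.+_([^_]*)$' f base new
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue                  # glob matched nothing
    base=${f##*/}                            # file name without the directory part
    [[ $base =~ $re ]] || continue           # not in the expected format: leave it alone
    new="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}"
    if [ -e "$dir/$new" ]; then              # never overwrite an existing file
      echo "skipping $base: $new already exists" >&2
      continue
    fi
    mv -- "$f" "$dir/$new"
    echo "$base -> $new"
  done
}

# Try it on empty placeholder files with the same names before touching real data.
rm -rf test_rename_demo && mkdir test_rename_demo   # start from an empty test directory
touch test_rename_demo/189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz \
      test_rename_demo/189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R2.fastq.gz
rename_fastq_in test_rename_demo
ls test_rename_demo
```

Once the placeholder results look right, the same function could be called with the path of the folder that holds the real FASTQ files.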

You may need to install the rename utility, and that may mean first installing a package manager if you don't already have/use one. Homebrew (https://brew.sh/) is one choice, or use your package manager of choice. Note that the command above uses the Perl-based rename, which takes a Perl substitution expression; some Linux distributions ship a different rename (from util-linux) that uses another syntax. For testing out regular expressions, this is my go-to website when I get stuck: https://www.regextester.com/ and this one has nice tutorials if this is new to you: https://www.regular-expressions.info/.

Whatever you decide, be sure to test before running on the actual data files. You don't even need to copy the data for testing: just create a few empty files that have the same names as the originals in a separate test directory and see what you get. Then tune the command (or script, if you go that route instead) until it works the way you want it to. For example, if this is new to you:

 $ mkdir test_rename
 $ cd test_rename
 $ touch 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R2.fastq.gz
 $ rename 's/_.+_.+_/_/' *
 $ ls

... and review the new names.
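Before renaming the real folder, it can also help to preview every change and check that no two files would end up with the same new name; for example, a sample sequenced on more than one lane (L001, L002) would collapse to a single name with this pattern. A minimal dry-run sketch, assuming bash and the same naming pattern; preview_renames, find_collisions and the test_preview directory are hypothetical names, and the L002 file is an invented placeholder, not one of your files. Nothing is renamed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: list "old<TAB>new" for each file without renaming anything,
# then report new names that more than one original file would map to.
# Assumption: names follow the 189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz pattern.
set -euo pipefail

# preview_renames DIR (hypothetical): print each original name and the name it would get
preview_renames() {
  local dir=$1 re='^([^_]*)_.+_.+_([^_]*)$' f base new
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue                  # glob matched nothing
    base=${f##*/}
    if [[ $base =~ $re ]]; then
      new="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}"
    else
      new=$base                              # pattern does not match: name unchanged
    fi
    printf '%s\t%s\n' "$base" "$new"
  done
}

# find_collisions DIR (hypothetical): print any new name that would be used more than once
find_collisions() {
  preview_renames "$1" | cut -f2 | sort | uniq -d
}

# Placeholder files; the L002 name is invented to show what a collision looks like.
rm -rf test_preview && mkdir test_preview    # start from an empty test directory
touch test_preview/189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R1.fastq.gz \
      test_preview/189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L001_R2.fastq.gz \
      test_preview/189V1-16S-APBJN_TACGCTGC-CCTAGAGT_L002_R1.fastq.gz
preview_renames test_preview
echo "Names used more than once:"
find_collisions test_preview
```

If find_collisions prints nothing for the real folder, every new name is unique and the batch rename will not try to give two files the same name.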

Hope that helps! Jen, Galaxy team
