Question: Demultiplex Miseq Data With Separate Index File.
6.8 years ago by
Philip Dean10 wrote:
I am using the Galaxy main site to analyse MiSeq data from pooled samples. The run produces three FASTQ files: the R1 and R2 read files and a separate index file. They are in the format below.

R1:
@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:0
Sequence data

R2:
@M00132:6:000000000-A0JG4:1:1:18014:1842 2:N:0:0
Sequence data

Index:
@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:0
CTCGGT
+
<@@DFD

I would like to use Galaxy to demultiplex the samples and then analyse them individually. I have found Barcode Splitter (version 1.0.0) on Galaxy; however, this tool requires the index to be found at the beginning of the sequence. Therefore I am attempting to add the index sequence onto the end of the sequence read data. FASTQ joiner (version 1.0.0) joins FASTQ files; however, the files to be joined must be distinguished by a /1 or /2 at the end of the sequence identifiers.

Does anyone have any advice or experience of demultiplexing data in this format?

Thanks,
Phil
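For readers working outside Galaxy, the three-file layout described above can also be split directly, without first attaching the index to the read. The following is only a minimal sketch, not a Galaxy tool: it assumes the read file and the index file list the same clusters in the same order, and the function names (`parse_fastq`, `read_id`, `demultiplex_by_index`) are made up for illustration.

```python
from collections import defaultdict
from itertools import islice


def parse_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record: %r" % (record,))
        yield tuple(record)


def read_id(header):
    """Return the part of the header before the first space (the cluster ID)."""
    return header.split(" ", 1)[0]


def demultiplex_by_index(reads, index_reads):
    """Group read records by the sequence of the matching index record.

    Assumes both inputs are in the same order; raises if the IDs disagree.
    Stops at the end of the shorter input.
    """
    bins = defaultdict(list)
    for read, index in zip(reads, index_reads):
        if read_id(read[0]) != read_id(index[0]):
            raise ValueError("read and index files are out of sync at %s" % read[0])
        bins[index[1]].append(read)
    return dict(bins)
```

With real files, open file objects can be passed straight to `parse_fastq`, and the same index file can be paired with R1 and then with R2. Index reads containing sequencing errors end up in their own bins here; tolerating mismatches would need extra logic.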
galaxy • 4.3k views
6.8 years ago by
Bossers, Alex240 wrote:
Phil,

We also have a MiSeq and ran into the same problem of how to demultiplex. We have our own local Galaxy instance and wrote some scripts to demultiplex the samples efficiently.

However, you FIRST need to "convert" the primary FASTQ on the MiSeq into what is, in their view, a multiplex-identified FASTQ. In that step the final :0 in all your headers is replaced by the multiplex or sample ID you gave in the sample sheet. I see yours are all zeros, which is not very helpful.

After the sample sheet conversion we simply concatenate all the FASTQ files; you can then easily group the reads on the final multiplex ID and demultiplex them into separate files. In addition, you can split the forward and reverse reads by the " 1" and " 2" identifiers (a space followed by 1 or 2) in the header.

Many tools no longer require the conversion to /1 and /2, but it is easily done locally, for instance with sed on Unix. We converted it like this:

@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:2
@M00132:6:000000000-A0JG4:1:1:18014:1842 2:N:0:2

into

@M00132:6:000000000-A0JG4:1:1:18014:1842/1 1:N:0:2
@M00132:6:000000000-A0JG4:1:1:18014:1842/2 2:N:0:2

since many tools only read the header up to the first space.

I might put the scripts in the Tool Shed soon, but that may not be of much help. Otherwise PM me and I will send you the script (Perl).

Alex
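The header conversion and the grouping on the final ID described above can be sketched in a few lines. This is not the Perl script mentioned in the answer; it is an illustrative Python version that assumes headers of the form `<cluster id> <read>:<filtered>:<control>:<sample>`, and the names `HEADER_RE`, `add_pair_suffix` and `sample_id` are hypothetical.

```python
import re

# Assumed header layout: "<cluster id> <read>:<filtered>:<control>:<sample>"
HEADER_RE = re.compile(r"^(@\S+) ([12]):([YN]):(\d+):(\S+)$")


def add_pair_suffix(header):
    """Append /1 or /2 to the cluster ID, taken from the read number field."""
    m = HEADER_RE.match(header)
    if m is None:
        return header  # leave unrecognised headers untouched
    cluster_id, read_no = m.group(1), m.group(2)
    if cluster_id.endswith("/" + read_no):
        return header  # already converted
    return "%s/%s %s" % (cluster_id, read_no, header.split(" ", 1)[1])


def sample_id(header):
    """Return the final field of the header (the sample ID), or None."""
    m = HEADER_RE.match(header)
    return m.group(5) if m else None
```

Applying `add_pair_suffix` to every header line gives identifiers that tools reading only up to the first space can still tell apart, and `sample_id` provides the key for writing each record to a per-sample file.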