Question: Copy/paste single read sequences into Excel
13 months ago, a.klausegger wrote:

Dear all,

I need to extract all single read sequences from a SAM file (converted from a BAM file) into an Excel file. There can be 100,000 read sequences or even more. Select all (Ctrl+A) followed by copy/paste only captures the sequences down to the point I have scrolled to; everything below that is missed. Scrolling to the bottom can take up to 30 minutes, which is really tedious. Is there a way to extract all single read sequences at once and copy them into an Excel file?

Thanks for your help, Alfred

galaxy samtools • 404 views
13 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Try this:

  • Extract the fastq sequences from the SAM file using NGS: Picard > SamToFastq
  • Convert the fastq to a tabular dataset using Convert Formats > FASTQ to Tabular converter
  • Filter out just the fields you want to retain (sequence identifier plus sequence?) using Text Manipulation > Cut
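If you have the SAM file on your own computer, the same result (read name plus sequence, tab-separated) can be sketched outside Galaxy in a few lines of Python. This is only a minimal sketch, not what the Galaxy tools do internally: it relies on the SAM format being tab-separated with the read name in column 1 and the sequence in column 10, and the function names and file paths are made up for illustration.

```python
import csv


def sam_to_rows(sam_lines):
    """Yield (read_name, sequence) for each alignment line of a SAM file."""
    for line in sam_lines:
        # Header lines in SAM start with '@' and carry no read data.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment line has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        # Column 1 is QNAME (read name), column 10 is SEQ (sequence).
        yield fields[0], fields[9]


def write_tabular(sam_path, out_path):
    """Write read name and sequence as a tab-separated text file."""
    # sam_path and out_path are hypothetical paths chosen by the caller.
    with open(sam_path) as sam, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerows(sam_to_rows(sam))
```

For example, `write_tabular("reads.sam", "reads.txt")` would produce a file Excel can open. Note that a SAM file stores reads aligned to the reverse strand as their reverse complement, so the sequences taken straight from column 10 may differ in orientation from the original fastq reads.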

**Optional additional steps to remove any duplicates:

  • Convert the tabular data to fasta using Convert Formats > Tabular-to-FASTA
  • Collapse duplicate reads using NGS: QC and manipulation > Collapse sequences
  • Convert fasta back to tabular using Convert Formats > FASTA-to-Tabular

** There are other tools that will find "unique lines" in tabular datasets, but I'm not sure they will work well on such a large dataset with long values in the fields (the sequences). You could try, though. An error would not be a bug; it would mean the data is too large or complex to process this way, in which case use the method above instead.
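As an illustration of what collapsing duplicates means here, the following is a small Python sketch that counts identical sequences in (read name, sequence) rows. It is an assumption-laden stand-in for the idea, not the Galaxy "Collapse sequences" tool itself, and the function name `collapse_sequences` is hypothetical.

```python
from collections import Counter


def collapse_sequences(rows):
    """Return (sequence, count) pairs, most frequent sequence first.

    rows is an iterable of (read_name, sequence) tuples; read names are
    dropped because identical sequences are merged into one entry.
    """
    counts = Counter(seq for _name, seq in rows)
    return counts.most_common()


rows = [("r1", "ACGT"), ("r2", "TTTT"), ("r3", "ACGT")]
print(collapse_sequences(rows))  # [('ACGT', 2), ('TTTT', 1)]
```

Counting in memory like this is usually fine for a few hundred thousand short reads, but very large datasets may need a different approach.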

Any plain text file that has tabs separating columns can be imported into Excel. The limitation is the maximum number of rows Excel accepts: 1,048,576 per worksheet in Excel 2007 and later, and 65,536 in the older .xls format. Give the file the extension .txt when you download it from Galaxy, or rename it afterwards, so that Excel will recognize the file.
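If a dataset has more rows than one worksheet can hold, one option is to split it into several files before importing. The sketch below assumes the Excel 2007+ limit of 1,048,576 rows per worksheet; the helper name `chunk_rows` is hypothetical, and each chunk it yields would be written to its own .txt file.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in Excel 2007 and later


def chunk_rows(rows, max_rows=EXCEL_MAX_ROWS):
    """Yield lists of at most max_rows rows, preserving the input order."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    # Emit whatever is left over after the last full chunk.
    if chunk:
        yield chunk
```

With 100,000 reads a single file is enough, so this only matters for much larger datasets or for older Excel versions (pass `max_rows=65536` in that case).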

Hope that helps! Jen, Galaxy team
