Copy/paste single read sequences in excel

Question: Copy/paste single read sequences in excel

13 months ago, a.klausegger • 0 wrote:

Dear all,

I need to extract all single-read sequences from a SAM file (converted from a BAM file) into an Excel file. There can be 100,000 reads or more. Select all (Ctrl+A) followed by copy/paste only captures the part of the page that I have already scrolled through; all sequences below that point are missing. Scrolling to the bottom can take up to 30 minutes, which is very tedious. Is there a way to extract all single-read sequences at once and copy them into an Excel file?

Thanks for your help, Alfred

galaxy samtools • 404 views


13 months ago, Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

Try this:

1. Extract the fastq sequences from the SAM file using NGS: Picard > SamToFastq
2. Convert the fastq to a tabular dataset using Convert Formats > FASTQ to Tabular converter
3. Keep just the fields you want to retain (sequence identifier plus sequence?) using Text Manipulation > Cut
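
For anyone working with a downloaded SAM file outside Galaxy, the same result (one row per read, identifier plus sequence, tab-separated) can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for the Galaxy tools above: the function names and the file names mentioned below are made up for illustration, and it assumes a standard SAM layout in which column 1 is the read name and column 10 is the sequence. Unlike SamToFastq, it copies the sequence exactly as stored, so reads aligned to the reverse strand appear reverse-complemented.

```python
import csv
import io


def sam_to_rows(sam_lines):
    """Yield (read_name, sequence) for each alignment line of a SAM file."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines start with "@" and carry no reads
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip lines without the 11 mandatory SAM columns
        yield fields[0], fields[9]  # column 1 = QNAME, column 10 = SEQ


def write_tsv(rows, out_handle):
    """Write rows as tab-separated text and return how many were written."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def sam_to_tsv(sam_path, tsv_path):
    """Convert a SAM file on disk to a two-column tab-separated file."""
    with open(sam_path) as sam, open(tsv_path, "w", newline="") as out:
        return write_tsv(sam_to_rows(sam), out)


# Small in-memory demonstration with two made-up reads.
demo_sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t16\tchr1\t5\t60\t3M\t*\t0\t0\tTTG\tIII\n",
]
demo_out = io.StringIO()
write_tsv(sam_to_rows(demo_sam), demo_out)
print(demo_out.getvalue(), end="")
```

Calling `sam_to_tsv("reads.sam", "reads.txt")` (both file names are placeholders) would then produce a tab-separated text file that Excel can open directly.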

** Optional additional steps to remove any duplicates:

1. Convert the tabular data to fasta using Convert Formats > Tabular-to-FASTA
2. Collapse duplicate reads using NGS: QC and manipulation > Collapse sequences
3. Convert fasta back to tabular using Convert Formats > FASTA-to-Tabular

** There are other tools that will find "unique lines" in tabular datasets, but I'm not sure whether they will work well on such a large dataset with long values in the fields (the sequences). You could try them, though. If one fails, that is not a bug; it means the data is too large or complex to process that way, and you should use the method above instead.
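
Outside Galaxy, the idea behind the optional duplicate-removal steps can be illustrated with a short Python sketch. It is only comparable in spirit to the Collapse sequences tool, not a reimplementation of it: it assumes the reads are already available as (identifier, sequence) pairs, the function name is hypothetical, and it keeps every distinct sequence in memory, which should be workable for a few hundred thousand short reads but is an assumption to check for larger data.

```python
from collections import Counter


def collapse_sequences(rows):
    """Return (sequence, count) pairs, most frequent sequence first.

    rows is any iterable of (read_name, sequence) pairs. Read names are
    dropped because identical sequences are merged into a single entry.
    """
    counts = Counter(sequence for _name, sequence in rows)
    return counts.most_common()


# Made-up example: two of the three reads share the same sequence.
example_rows = [("r1", "ACGT"), ("r2", "TTG"), ("r3", "ACGT")]
for sequence, count in collapse_sequences(example_rows):
    print(f"{sequence}\t{count}")
```

The output has one line per distinct sequence together with the number of reads that carried it, which is usually the more useful table to load into a spreadsheet.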

Any plain text file that uses tabs to separate columns can be imported into Excel. The limitation is the maximum number of rows Excel accepts per worksheet: 1,048,576 in the current .xlsx format (Excel 2007 and later), or 65,536 in the older .xls format. Give the file the extension .txt when you download it from Galaxy, or rename it afterwards, so that Excel will recognize the file.
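
If a dataset might exceed the worksheet row limit, one option is to split it into several files before opening it in Excel. The sketch below shows only the splitting logic, under the assumption that the documented limits are 1,048,576 rows for .xlsx worksheets and 65,536 rows for the legacy .xls format; the constant and function names are hypothetical.

```python
EXCEL_XLSX_MAX_ROWS = 1_048_576  # rows per worksheet, Excel 2007 and later
EXCEL_XLS_MAX_ROWS = 65_536      # rows per worksheet, legacy .xls format


def chunk_rows(rows, max_rows=EXCEL_XLSX_MAX_ROWS):
    """Yield lists of at most max_rows items taken from rows, in order."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


# Tiny demonstration with an artificially small limit of 2 rows per chunk.
for number, chunk in enumerate(chunk_rows(["a", "b", "c", "d", "e"], max_rows=2), start=1):
    print(number, chunk)
```

Each chunk can then be written to its own tab-separated .txt file. With roughly 100,000 reads, a single file stays well below the .xlsx limit, so splitting only matters for much larger datasets or for the old .xls format.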

Hope that helps! Jen, Galaxy team
