I've been trying to analyze reads from a short transcript. The data were generated on a MiSeq machine and are paired-end (two separate files). I am new to RNA-seq analysis, so I was advised to do the following:
- trim off the primers and any adaptor sequence
- assemble the two overlapping reads to get a consensus sequence for each fragment
- discard any low quality data that remains
- align the consensus sequences to your reference sequence
I performed the following steps on the public Galaxy server:
1. Removed the adapters and primers using the Clip tool.
2. Ran the FASTQ joiner tool to combine both files into one.
3. Filtered the reads by quality (FASTQ filter by quality tool).
4. Converted FASTQ to FASTA using the FASTQ to FASTA tool.
5. Attempted to run Clustal 2.1 to perform a multiple sequence alignment.
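To make clear what I understand step 4 to be doing, here is a minimal sketch of a FASTQ to FASTA conversion. It assumes plain four-line FASTQ records (header, sequence, "+", qualities); the function name `fastq_to_fasta` is my own, and this is only an illustration, not the Galaxy tool's actual implementation.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records.

    Assumes each record is exactly: @header, sequence, '+', qualities.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq = record[0], record[1]
            if not header.startswith("@"):
                raise ValueError(f"unexpected FASTQ header: {header!r}")
            # Keep the read name and sequence; drop the '+' line and qualities.
            yield ">" + header[1:]
            yield seq
            record = []


# Hypothetical two-read input, just for illustration.
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "TTGCA", "+", "IIIII",
]
fasta = list(fastq_to_fasta(example))
print("\n".join(fasta))
```

Note that the quality scores are discarded at this point, so any quality filtering has to happen before this step.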
After step 5, the output was empty, and the log file ended with the following error message:
CLUSTAL 2.1 Multiple Sequence Alignments
Sequence type explicitly set to DNA
Sequence format is Pearson
Sequence 1: 1 38 bp
Sequence 2: 2 38 bp
Sequence 3: 3 96 bp
Sequence 4: 4 69 bp
Sequence 126812: 126812 180 bp
Sequence 126813: 126813 180 bp
Sequence 126814: 126814 180 bp
Sequence 126815: 126815 153 bp
Sequence 126816: 126816 180 bp
Sequence 126817: 126817 69 bp
terminate called after throwing an instance of 'std::bad_alloc'
Start of Pairwise alignments
Could not allocate a distance matrix for 126817 seqs. Need to terminate program.
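For scale, here is my rough back-of-the-envelope estimate of what a pairwise distance matrix for that many sequences would involve. It assumes a full square matrix with 8 bytes per entry; I don't know how Clustal actually stores it, so both the layout and the entry size are assumptions.

```python
def distance_matrix_bytes(n_seqs: int, bytes_per_entry: int = 8) -> int:
    """Bytes for a full n x n matrix (assumed layout, 8-byte entries by default)."""
    return n_seqs * n_seqs * bytes_per_entry


n = 126817  # number of sequences reported in the Clustal log

# Number of distinct sequence pairs that would need a pairwise alignment.
pairs = n * (n - 1) // 2
total_bytes = distance_matrix_bytes(n)

print(f"sequence pairs: {pairs:,}")
print(f"full matrix:    {total_bytes / 1e9:.1f} GB")
```

Under those assumptions this comes to roughly 8 billion sequence pairs and well over 100 GB for the matrix alone, which would be consistent with the 'std::bad_alloc' message.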
Could anybody please explain to me what the problem with my workflow is?
Thank you very much!