Question: Primer Contamination, Miranalyzer
Rosie Griffiths • 10 wrote:
Hi Galaxy,
I've got three problems for you:
1) I've got microRNA Illumina NGS data that I want to analyse. I put it
through FastQC on Galaxy, and it showed that 71% of the reads consist of
one overrepresented sequence:
Sequence                                             Count     Percentage          Possible Source
GAATTCCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAG  16896622  71.06413061961005   RNA PCR Primer, Index 1 (100% over 29bp)
CCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCTTGTAATCTC  525614    2.2106372475809497  RNA PCR Primer, Index 12 (100% over 44bp)
CCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACC  416041    1.7497930632000402  RNA PCR Primer, Index 2 (100% over 34bp)
What would be the best way to remove this contamination? Also, is it
still OK to use the data despite such high contamination? I've
been trying to remove the sequence with the clip adapter
tool, using the following options:
Library to clip: 2: FASTQ Groomer on data H1
Minimum sequence length (after clipping, sequences shorter than this length will be discarded): 15
Enter custom clipping sequence: GAATTCCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAG
Enter non-zero value to keep the adapter sequence and x bases that follow it: 0
Discard sequences with unknown (N) bases: No
Output options: Output only non-clipped sequences (i.e. sequences which did not contained the adapter)
Clipped reads - discarded.
Input: 23776583 reads.
Output: 3091831 reads.
discarded 1287140 too-short reads.
discarded 18984774 adapter-only reads.
discarded 412838 clipp
But then I'm left with only 13% of the reads.
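As a quick sanity check on the clipper log, the reported categories should add up to the input count, and the surviving fraction should come to roughly 13%. A minimal sketch using only the figures quoted in the log (the variable names are just for illustration, and the last label is kept truncated exactly as it appears):

```python
# Figures copied from the clipper log above.
total_input = 23776583
kept = 3091831
discarded = {
    "too-short": 1287140,
    "adapter-only": 18984774,
    "clipp": 412838,  # label is truncated in the log; kept as-is
}

# Every input read should be either kept or discarded for one reason.
accounted = kept + sum(discarded.values())
kept_pct = 100 * kept / total_input

print("all reads accounted for:", accounted == total_input)
print("kept: %.1f%%" % kept_pct)
```

Almost 80% of the input falls into the adapter-only category, so the low yield follows from the clipping settings and is not caused by the later quality filter.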
2) After I've filtered the reads and clipped the adapter, I want to analyse
the frequency of each miRNA. I've been using miRanalyzer to do this, with
the following workflow:
data => groomer => clip adapter => filter FASTQ (min quality 20) => FASTQ to FASTA => collapse
The collapse file looks like this:
GAATTCCACCACGTTCCCGTGG
CCACCACGTTCCCGTGG
TATTGCACTTGTCCCGGCCTGT
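For context, the collapse step can be sketched roughly as follows. This is not the Galaxy tool itself, just a minimal illustration of the idea: identical reads are merged and each unique sequence gets a header of the form "rank-count", which is how I understand the FASTX-Toolkit collapser names its records. The function name `collapse` and the toy counts are made up for the example.

```python
from collections import Counter

def collapse(sequences):
    """Collapse identical reads into (header, sequence) FASTA records.

    Headers follow the "rank-count" pattern, most abundant sequence first.
    """
    counts = Counter(sequences)
    records = []
    for rank, (seq, count) in enumerate(counts.most_common(), start=1):
        records.append((f"{rank}-{count}", seq))
    return records

# Toy input with made-up counts, not real data.
reads = (
    ["TATTGCACTTGTCCCGGCCTGT"]
    + ["GAATTCCACCACGTTCCCGTGG"] * 3
    + ["CCACCACGTTCCCGTGG"] * 2
)
for header, seq in collapse(reads):
    print(f">{header}")
    print(seq)
```

The number after the '-' in each header is the read count that the downstream tool needs to pick up.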
I then upload the collapse file to miRanalyzer. However, the total number
of reads in the miRanalyzer output is the same as the total number of
sequences in the collapse file, so it doesn't seem to recognise the read
counts. The miRanalyzer documentation says the following:
2.1 Input formats
miRanalyzer requires a single file containing the unique reads
and their counts. The application accepts two different input formats:
2.1.1 A tab or space separated file as in the following
example (read-count format):
GAGGTAGTAGGTTGTA 49862
ACCCGTAGAACCGACC 15490
... ...
GGAGCATCTCTCGGTC 13762
2.1.2 A multifasta file:
GAGGTAGTAGGTTGTA
ACCCGTAGAACCGACC
....
GGAGCATCTCTCGGTC
The description field must hold the read count. If not set, it
is supposed to be 1. The file must have extension fa, fasta or
mfa.
Do you know how I could change my format so that it recognises the read
count, e.g. by changing the '-' to a space?
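One way this conversion might look is sketched below. It assumes that each header in the collapse file has the form ">rank-count" and that each sequence sits on a single line, and it writes the tab-separated read-count format from section 2.1.1 of the miRanalyzer documentation quoted above. The function name is hypothetical and the example counts are invented.

```python
def collapsed_fasta_to_read_count(lines):
    """Turn collapsed FASTA lines into "SEQUENCE<TAB>COUNT" lines.

    Assumes headers look like ">rank-count" and sequences are not wrapped.
    """
    out = []
    count = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Take the text after the last '-' as the read count.
            count = int(line[1:].rsplit("-", 1)[1])
        else:
            if count is None:
                raise ValueError("sequence line found before any header")
            out.append(f"{line}\t{count}")
            count = None
    return out

# Invented example records, not real data.
example = [">1-3", "GAATTCCACCACGTTCCCGTGG", ">2-2", "CCACCACGTTCCCGTGG"]
print("\n".join(collapsed_fasta_to_read_count(example)))
```

To run it on a real file, an open file handle can be passed in place of the example list, since iterating over a file yields its lines.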
3) I've recently set up a local install of Galaxy, but I encounter the
following error when I try to add a file to my data library:
Error attempting to display contents of library (New data library):
(OperationalError) no such column: True u'SELECT
dataset_permissions.id AS dataset_permissions_id,
dataset_permissions.create_time AS dataset_permissions_create_time,
dataset_permissions.update_time AS dataset_permissions_update_time,
dataset_permissions.action AS dataset_permissions_action,
dataset_permissions.dataset_id AS dataset_permissions_dataset_id,
dataset_permissions.role_id AS dataset_permissions_role_id \nFROM
dataset_permissions \nWHERE True AND dataset_permissions.action = ?'
['access'].
I've got the latest version of Galaxy and am using Chrome on OS X
Mountain Lion:
changeset: 7986:12fcd068b12e
tag: tip
user: Daniel Blankenberg <dan@bx.psu.edu>
date: Thu Oct 18 11:22:12 2012 -0400
summary: Do not hide failed datasets with HideDatasetAction post
job action.
Any help will be greatly appreciated.
Thank you,
Rosie Griffiths
modified 6.0 years ago by Jennifer Hillman Jackson ♦ 25k • written 6.0 years ago by Rosie Griffiths • 10