Question: Fasta dataset contains non-ATCGN characters and fails with FastQC
9 months ago by
jay20 wrote:


I am getting this error on job submission. This is just a subset of the full error. Not all files give me this error. How can I resolve this?

File "/opt/galaxy-user/galaxy-17.09-github/.venv/lib/python2.7/site-packages/MySQLdb/", line 208, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2018' in position 117: ordinal not in range(256)

Are you using quotation marks in those file names?
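
For context, u'\u2018' in the traceback is a left single quotation mark (a "curly" quote), which falls outside the latin-1 range (code points 0 to 255). A minimal sketch, assuming Python 3, of how one might scan a string such as a dataset name or annotation for characters that latin-1 cannot encode. The function name and the example string are hypothetical and not part of Galaxy:

```python
def non_latin1_chars(text):
    """Return (position, character, code point) for each character latin-1 cannot encode."""
    hits = []
    for pos, ch in enumerate(text):
        # latin-1 only covers code points 0..255
        if ord(ch) > 0xFF:
            hits.append((pos, ch, "U+%04X" % ord(ch)))
    return hits


# Hypothetical dataset name containing curly quotes, for illustration only.
example = u"sample \u2018A\u2019 reads.fasta"
for pos, ch, code in non_latin1_chars(example):
    print(pos, code)
```

Replacing any reported characters with plain ASCII equivalents (for example a straight apostrophe) is one way to rule this out.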

Hi, the filename doesn't have quotation marks.

8 months ago by
United States
Jennifer Hillman Jackson wrote:


This looks like an input formatting problem.

  1. Make sure the input is nucleotide sequence and not protein (amino acids).
  2. Check the FASTA sequences for characters other than A, T, C, G, and N. IUPAC ambiguity codes (for example R, Y, or M) will cause problems.
  3. Ensure the assigned datatype is correct. Note that fastqsanger, fastq, fasta, fastqsanger.gz, fastq.gz, fasta.gz are all distinct datatypes, and the assigned datatype must match the input's compression (if used).
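
The content check in step 2 can be sketched outside of Galaxy as follows. This is a minimal example, assuming Python 3 and a local copy of the file; the function names are hypothetical. It also assumes lowercase bases (soft-masked sequence) should count as valid, so remove the `upper()` call if they should not:

```python
import gzip
from collections import Counter

ALLOWED = set("ACGTN")


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def find_invalid_bases(lines, allowed=ALLOWED):
    """Count characters outside `allowed` in the sequence lines of FASTA text.

    Returns a Counter of offending characters and a dict mapping each one
    to the (line, column) where it first appears, both 1-based.
    """
    bad = Counter()
    first_hit = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines.
        if not line or line.startswith(">"):
            continue
        for col, ch in enumerate(line.upper(), start=1):
            if ch not in allowed:
                bad[ch] += 1
                first_hit.setdefault(ch, (lineno, col))
    return bad, first_hit


# Small in-memory example; the second record contains IUPAC codes R and Y.
example = [">seq1", "ACGTNNACGT", ">seq2", "ACGRYACGT"]
bad, first_hit = find_invalid_bases(example)
print(dict(bad), first_hit)
```

For a real file, pass the open handle instead of the list, for example `with open_maybe_gzip("input.fasta") as handle: find_invalid_bases(handle)`, where `input.fasta` is a placeholder path.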

Thanks! Jen, Galaxy team

Thanks Jennifer, the file contains invalid characters.

