Question: Comparing experimental reads to reference reads in csv
I have a file with experimental reads that I have cleaned (currently in fastqsanger format). I would like to match them to sequences I have stored in a csv file. I would also like to count the number of duplicates per matched sequence. Can you suggest a workflow to get this done?

Thank you. I am very new to this and all help is much appreciated!

Heather Watkins

Convert your reference sequences from csv format to tabular format, then to fasta format. The result can then be used as a custom reference genome with tools, including mapping tools.
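If it helps to see what that conversion does, here is a minimal sketch in Python, run outside of Galaxy. It assumes the csv has the sequence name in the first column and the sequence in the second, with no header row; that layout is a guess, so adjust the column numbers to match your file. The function name csv_to_fasta and its parameters are made up for this example and are not part of any Galaxy tool.

```python
import csv
import io


def csv_to_fasta(csv_text, name_col=0, seq_col=1, has_header=False):
    """Turn CSV rows of (name, sequence) into FASTA-formatted text.

    Column positions are assumptions about the input layout.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row if there is one
    records = []
    for row in reader:
        # Skip blank or short rows that lack the expected columns.
        if len(row) <= max(name_col, seq_col):
            continue
        # FASTA identifiers should not contain spaces.
        name = row[name_col].strip().replace(" ", "_")
        seq = row[seq_col].strip().upper()
        if not name or not seq:
            continue
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


example = "ref1,ACGTACGT\nref2,ttgacc\n"
print(csv_to_fasta(example), end="")
```

Each row becomes a ">name" title line followed by its sequence, which is the layout that mapping tools expect for a custom reference.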


Galaxy tutorials:

Thanks! Jen, Galaxy team

Thank you Jennifer. Your reply was very helpful ... so I'm back for more advice.

I was able to upload my reference genome in tabular format, convert to fasta, and then per the FAQ link you gave me, I used NGS Picard: Normalize fasta on it.

My reads, after running Trimmomatic on them, were in fastqsanger format. I mapped them to the custom genome successfully using NGS Mapping: Map with BWA-MEM. However, when I try to use NGS Picard: Mark Duplicates, I get this fatal error:

Fatal error: Exit code 1 () Picked up _JAVA_OPTIONS: -Xmx7g -Xms256m [Thu Apr 26 18:47:14 CDT 2018] picar

Do you have any idea why I am running into this error?

Thank you, Heather

Hi - Could you please send in a bug report from the error dataset? This is how: Thanks!

