Hello,
The two column tabular dataset should contain:
transcript_id <tab> gene_id
Where <tab> is a single tab character. There should be no headers, no extra spaces, no extra tabs, and no trailing empty lines. The transcript_id and gene_id should not be the same term/value.
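If it helps, here is a minimal Python sketch for checking a mapping file against the rules above before uploading it. It is not part of Galaxy or any tool; the function name `check_tx2gene` is made up for illustration. It cannot tell a header line from a data line, so a header still needs to be checked by eye.

```python
def check_tx2gene(lines):
    """Return (line_number, problem) tuples for a transcript-to-gene mapping.

    Hypothetical helper, not part of Galaxy: it only applies the formatting
    rules described in this post.
    """
    problems = []
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line == "":
            problems.append((n, "empty line"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((n, f"expected 2 tab-separated fields, found {len(fields)}"))
            continue
        transcript_id, gene_id = fields
        if transcript_id == "" or gene_id == "":
            problems.append((n, "empty field"))
        elif any(ch.isspace() for ch in transcript_id + gene_id):
            problems.append((n, "extra whitespace inside a field"))
        elif transcript_id == gene_id:
            problems.append((n, "transcript_id and gene_id are identical"))
    return problems
```

To use it on a downloaded copy of the dataset, something like `check_tx2gene(open("tx2gene.tab"))` (the filename is just an example) returns an empty list when no problems are found.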
The transcript_id should be an exact match for the transcript fasta identifiers in your reference transcriptome. That fasta should have no description content on the title line (">" line); it should only have the sequence identifier for the transcript. Often the tool NormalizeFasta is enough to clean up a fasta dataset. Sometimes more text manipulation is needed to reformat the identifier, depending on where the transcriptome was sourced.
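As a rough illustration of what "identifier only" means, the Python sketch below cuts each title line at the first whitespace and then compares the resulting identifiers with the first column of the mapping. This is not how NormalizeFasta is implemented, and the function names are hypothetical; it only assumes the identifier is the first whitespace-delimited word after ">".

```python
def strip_fasta_descriptions(lines):
    """Yield fasta lines with each title line cut at the first whitespace."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Keep only the first word after ">" (assumed to be the identifier).
            parts = line[1:].split(None, 1)
            yield ">" + (parts[0] if parts else "")
        else:
            yield line


def fasta_ids(lines):
    """Collect the identifiers from the title lines."""
    return {l[1:] for l in strip_fasta_descriptions(lines) if l.startswith(">")}


def unmatched_transcripts(fasta_lines, mapping_lines):
    """Return (ids only in the fasta, ids only in the mapping)."""
    ids = fasta_ids(fasta_lines)
    mapped = {l.split("\t")[0] for l in mapping_lines if l.strip()}
    return ids - mapped, mapped - ids
```

If both returned sets are empty, every transcript in the fasta has a row in the mapping and the other way around. Anything left over points at identifiers that still need reformatting.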
The formatting rules for Custom Genomes are the same as for Custom Transcriptomes. See the FAQs: https://galaxyproject.org/support/
For the outputs, when all is set up correctly, the Quantification data will have the transcript names in the first column and Gene Quantification data will have the gene names in the first column.
Please check your inputs against the above and let us know if you need more help with that part.
I'm not sure what you mean by this part of your question: "moreover, I have tried to send a specific table with these two columns directly on ensemble but couldn't send any query to a galaxy". Could you explain more about what steps you are doing and what is going wrong?
Thanks, Jen, Galaxy team