I have two files: one containing my original transcriptome reads and another containing BLAST+ blastn annotations for the transcriptome. I want to add the subject description and subject ID to the title of each original transcriptome sequence, so that each title contains the query ID, the subject description, and the subject ID from the NCBI database.
For example, my transcriptome sequences look like this:
>Bta00064
ATGGCGTTCCAACTACTACTTCTCAGCGTCGGTGTCGCTG....
When I ran blastn on the sequences, BLAST+ created a tab-delimited file that looks like this (note: the fields are separated by tabs):
Bta00064 Bemisia tabaci strain NJ-Imi cytochrome P450 (CYP6DV5) mRNA, complete cds gi|339896252|gb|JN165250.1| JN165250 96.87 0.0 1340 2242
I want to make a new file where the transcriptome sequences have FASTA titles that contain the query ID, subject description, and subject ID, like this:
>Bta00064 Bemisia tabaci strain NJ-Imi cytochrome P450 (CYP6DV5) mRNA, complete cds gi|339896252|gb|JN165250.1
ATGGCGTTCCAACTACTACTTCTCAGCGTCGGTGTCGCTG....
I'm having trouble doing this. My first file is a FASTA file with my sequences; my second file is the BLAST+ output, which I formatted as (query ID, subject description, subject ID, percent match, e-value, length, bit score). I want to make a FASTA file with the original sequences but with the titles changed to >query ID, subject description, subject ID.
Galaxy recognizes these columns, but I can't seem to combine the two files. Basically, how do I join two files on a single shared column (here, the query ID) and then write the result out as FASTA using specific columns?
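To make the goal concrete, here is a rough Python sketch of the join I have in mind, outside Galaxy. It is only a sketch under assumptions: the BLAST file is tab-separated with the query ID, subject description, and subject ID in the first three columns, only the first hit per query is kept, and sequences without a hit keep their original title. The function names and the file names in the usage note are hypothetical.

```python
def load_annotations(blast_lines):
    """Map query ID -> (subject description, subject ID).

    Assumes tab-separated lines whose first three fields are
    query ID, subject description, subject ID. Only the first
    hit seen for each query ID is kept.
    """
    annotations = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query_id, description, subject_id = fields[0], fields[1], fields[2]
        annotations.setdefault(query_id, (description, subject_id))
    return annotations


def annotate_fasta(fasta_lines, annotations):
    """Yield FASTA lines, rewriting titles that have a BLAST hit."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            query_id = parts[0] if parts else ""
            if query_id in annotations:
                description, subject_id = annotations[query_id]
                line = f">{query_id} {description} {subject_id}"
        yield line


def main(fasta_path, blast_path, out_path):
    """Read both files and write the re-titled FASTA to out_path."""
    with open(blast_path) as blast:
        annotations = load_annotations(blast)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in annotate_fasta(fasta, annotations):
            out.write(line + "\n")
```

Something like `main("transcriptome.fasta", "blast_hits.tsv", "annotated.fasta")` would then produce the new file (those file names are placeholders). Building a dictionary keyed on the query ID means the two files do not need to be in the same order.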
Thank you for the help!