Hi Tan,
I'll go through your questions one at a time.
RNA-seq with the Tuxedo pipeline (TopHat, Cufflinks, etc.)
1. Q: Is it possible to input data aligned to reference genome "A" with reference annotation "B"?
No. Both must be based on exactly the same reference genome build: the same chromosomes and the same coordinate system (nucleotide content is also assumed at some steps). A mismatch will trigger technical errors in the tools, and so will even unintentional differences in chromosome/scaffold naming when the underlying genome is in fact the same (one of the most common early errors users hit with this tool set). But the underlying reasons are biological. Rearrangements occur at some rate between any two distinct species, and the purpose of RNA-seq analysis is to identify novel variants or to characterize differential expression between known and/or novel variants. The differences between transcript variants can be subtle, and in a genome-wide cross-species analysis they will almost certainly be lost in the noise of other rearrangements.
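Since naming mismatches are such a common early stumble, a quick check before running anything can save time. This is a minimal sketch, assuming uncompressed FASTA and GTF files; it only compares sequence names (for example "1" vs "chr1") and cannot tell you whether two files come from the same genome build.

```python
"""Sketch: confirm that a genome FASTA and a GTF annotation share sequence names.

Assumes uncompressed files; any file names used with it are placeholders.
"""


def fasta_seq_names(lines):
    """Sequence names from FASTA headers: the text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seq_names(lines):
    """Seqnames (column 1) used in a GTF, skipping comment and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_path, gtf_path):
    """Annotation seqnames that never appear as a FASTA header in the genome."""
    with open(fasta_path) as fa, open(gtf_path) as gtf:
        return sorted(gtf_seq_names(gtf) - fasta_seq_names(fa))
```

For example, `missing_from_genome("genome.fa", "genes.gtf")` returning an empty list means every seqname in the annotation exists in the genome FASTA; anything it returns is a name the tools will not be able to match.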
But you can test this yourself to get a sense of how divergent the species are. Align the sequences from the reference annotation set (the knowns) to the cross-species genome. Do they all match perfectly? If not, at what rate do they match? Keep in mind that you are only examining known genes (more precisely, transcripts), and that homology to other knowns is one of the most common methods for discovering, identifying, and proposing a characterization (function, etc.) for newly sequenced data. That creates a bias toward conserved knowns: if this was the method used to create the annotation, novel genes and variants will be underrepresented, especially in new data. For a rough estimate, you can also align same-species reads to the native genome and related-species reads to the cross-species genome, then compare the mapping rates.
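If you use BLAT for the "align the knowns to the cross-species genome" test, a small summary of its output answers the "perfect match for all, and if not at what rate?" question directly. This is a sketch under a few assumptions: the output is PSL with the 21 standard columns (for example from `blat cross_genome.fa known_transcripts.fa known_vs_cross.psl -noHead`, where the file names are placeholders), you know how many transcripts went in (unaligned ones never appear in the PSL), and the 0.95 cutoff is purely illustrative.

```python
"""Sketch: how well do known transcripts align to a cross-species genome?

Assumes BLAT output in PSL format with the 21 standard columns.
The min_fraction default is an illustrative cutoff, not a recommendation.
"""


def summarize_psl(lines, total_transcripts, min_fraction=0.95):
    """Fraction of known transcripts that align at all, align perfectly,
    and have a best hit matching at least `min_fraction` of their bases."""
    best = {}        # transcript name -> best matched fraction over all its hits
    perfect = set()  # transcripts with a full-length, mismatch-free, query-gap-free hit
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip PSL header lines and anything malformed
        matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
        q_num_insert, q_name, q_size = int(f[4]), f[9], int(f[10])
        matched = matches + rep_matches
        best[q_name] = max(best.get(q_name, 0.0), matched / q_size)
        # Introns appear as target-side inserts, so only query-side gaps are disallowed
        if matched == q_size and mismatches == 0 and q_num_insert == 0:
            perfect.add(q_name)
    return {
        "aligned": len(best) / total_transcripts,
        "perfect": len(perfect) / total_transcripts,
        "good": sum(1 for v in best.values() if v >= min_fraction) / total_transcripts,
    }
```

Running the same summary on the knowns aligned to their own native genome gives you a baseline to compare against.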
Please understand: cross-species information is very valuable, but it is probably not the best fit for this tool set. Depending on how far you want to take this, you could align the cross-species annotation transcripts to the genome you have (BLAT would be a good aligner for this), curate the results, create a GTF from them, and use that with the pipeline. The genome and annotation would then technically share the same genomic backbone and pass through the tools. I'd be very cautious with this approach, though, and keep these factors in mind when interpreting the results. Much depends on how similar the genomes and transcriptomes of the species involved really are, and curating gene/transcript data is tedious.
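As a starting point for the "create a GTF from the aligned transcripts" step, here is a rough sketch that turns BLAT's PSL rows into GTF exon lines with the `gene_id` and `transcript_id` attributes Cufflinks expects. Several assumptions are baked in and should be checked: the alignments are nucleotide (not translated), so the PSL strand column is a single "+" or "-"; target gaps of 10 bases or less are treated as alignment indels and merged, larger gaps as introns (the `max_indel` value is a hypothetical choice); and the `.hitN` suffix on transcript IDs is just one way to keep multiple hits apart. The output is a draft for curation, not a finished annotation.

```python
"""Sketch: convert BLAT (PSL) alignments of cross-species transcripts into GTF exon lines.

Assumes untranslated alignments with the 21 standard PSL columns. `max_indel` and
the ".hitN" transcript_id suffix are illustrative choices, not conventions.
"""


def psl_to_gtf(lines, source="blat", max_indel=10):
    gtf = []
    hits_seen = {}  # query name -> hits so far, to keep transcript_ids unique
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip PSL header lines and anything malformed
        strand, q_name, t_name = f[8], f[9], f[13]
        sizes = [int(x) for x in f[18].rstrip(",").split(",")]
        t_starts = [int(x) for x in f[20].rstrip(",").split(",")]
        # For untranslated alignments, tStarts are 0-based, half-open, on the + strand
        exons = []
        for start, size in zip(t_starts, sizes):
            if exons and start - exons[-1][1] <= max_indel:
                exons[-1][1] = start + size  # small gap: extend the previous exon
            else:
                exons.append([start, start + size])
        hits_seen[q_name] = hits_seen.get(q_name, 0) + 1
        transcript_id = f"{q_name}.hit{hits_seen[q_name]}"
        attrs = f'gene_id "{q_name}"; transcript_id "{transcript_id}";'
        for start, end in exons:
            # GTF coordinates are 1-based and inclusive
            gtf.append("\t".join([t_name, source, "exon", str(start + 1), str(end),
                                  ".", strand, ".", attrs]))
    return gtf
```

Filtering the PSL first (for example, keeping only each transcript's best hit above some coverage) is where most of the curation effort would go.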
2. Q: Replicates merged before or after mapping?
Run replicates through the pipeline individually. Keeping them separate lets downstream tools such as Cuffdiff use the variability between replicates when testing for differential expression. Also remember that TopHat can align to a transcriptome as well as a genome. The tool's manual covers this, and there is discussion at the tophat.cufflinks@gmail.com Google group. If the annotation is reasonably complete, I'd recommend this as a first pass.
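A rough sketch of what "individually" looks like in practice: one TopHat run per replicate, then a single Cuffdiff call that lists each condition's replicates as a comma-separated group. The code only builds argument lists, which you could run with `subprocess.run(cmd, check=True)` or paste into a shell. Sample names, read files, the Bowtie index prefix, GTF names, and output directories are all placeholders; `-o` and `-G` are TopHat options (supplying a GTF with `-G` is what lets TopHat map to the transcriptome first), `accepted_hits.bam` is TopHat's default output name, and `-o`/`-L` are Cuffdiff's output-directory and label options.

```python
"""Sketch: build per-replicate TopHat commands and a Cuffdiff command that keeps
replicates separate. All names and paths are placeholders."""


def tophat_commands(replicates, index, gtf):
    """replicates: replicate name -> (reads_1, reads_2) paired FASTQ paths."""
    cmds = {}
    for name, (r1, r2) in replicates.items():
        # One run (and one output directory) per replicate; no merging of reads
        cmds[name] = ["tophat", "-o", f"tophat_{name}", "-G", gtf, index, r1, r2]
    return cmds


def cuffdiff_command(conditions, gtf, out_dir="cuffdiff_out"):
    """conditions: condition label -> list of replicate names (insertion order kept)."""
    labels = ",".join(conditions)
    # Replicates of one condition are comma-separated; conditions are separate arguments
    groups = [",".join(f"tophat_{rep}/accepted_hits.bam" for rep in reps)
              for reps in conditions.values()]
    return ["cuffdiff", "-o", out_dir, "-L", labels, gtf] + groups
```

Merging reads before mapping would collapse the replicates into one sample, and that between-replicate variability would no longer be available to Cuffdiff.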
3. Q: Map cross-species and omit native reference annotation?
You could certainly test this out. Exploring is good. As I suggested above, do the mapping portion and compare it to the mapping rate of native RNA-seq data to better understand the results. If the data doesn't map well, it is probably not worth taking any further into the downstream tools. I'd start by mapping the reference annotation cross-species: if that doesn't map well, you have your answer without doing the rest (and it will be easier to interpret).
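A minimal sketch for putting numbers on "maps well": compute the fraction of primary SAM records that are mapped, for both the native and the cross-species run, and compare them. It assumes SAM text containing both mapped and unmapped records; TopHat writes unmapped reads to a separate file, so you would combine the `samtools view` output of both. Secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once, and each mate of a pair counts as its own record. There is no standard cutoff for "worth pursuing", so treat the ratio as a rough guide.

```python
"""Sketch: compare mapping rates of native vs cross-species alignments from SAM text."""


def mapping_rate(sam_lines):
    """Fraction of primary SAM records that are mapped (FLAG bit 0x4 unset)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        flag = int(line.split("\t", 2)[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: the read is already counted once
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


def relative_rate(cross_rate, native_rate):
    """Cross-species mapping rate as a fraction of the native rate."""
    return cross_rate / native_rate if native_rate else 0.0
```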
4. Q: Best assembler for insect?
I can't help much here; my de novo assembly experience is primarily with mammal and plant genomes. Let's see if someone else on the board can answer. I'd also recommend researching what was used for the genome you have, examining recent literature in your field, and visiting the tools' home pages, since they often note how the parameters were tuned and the intended scope of use. (I realize this may sound pedantic; it isn't meant that way, and it is a good double check even if/when you get advice.) Then you have a truth test of sorts: the reference annotation. Try a few assembly methods and align the annotation to each assembly. Depending on the quality of the annotation (expect some variation, which is always true on a first pass and sometimes in surprising ways), you should be able to make an informed decision.
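One way to turn "align the annotation to the assemblies" into a comparison is to score each assembly by the fraction of annotated transcripts recovered by a single best hit. This is a sketch, assuming you have PSL output (21 standard columns, e.g. from BLAT) of the annotation transcripts against each candidate assembly; the assembly names and the 0.9 recovery cutoff are hypothetical. Requiring a single hit deliberately penalizes fragmented assemblies, where one transcript is split across several contigs.

```python
"""Sketch: rank candidate assemblies by how many annotated transcripts they recover.

Assumes PSL alignments of the annotation (query) against each assembly (target).
The min_fraction cutoff is illustrative, not a recommendation.
"""


def recovery(psl_lines, total_transcripts, min_fraction=0.9):
    """Fraction of annotated transcripts whose best single hit matches >= min_fraction of bases."""
    best = {}  # transcript name -> best matched fraction over all hits
    for line in psl_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip PSL header lines and anything malformed
        q_name, q_size = f[9], int(f[10])
        matched = int(f[0]) + int(f[2])  # matches + repMatches
        best[q_name] = max(best.get(q_name, 0.0), matched / q_size)
    recovered = sum(1 for v in best.values() if v >= min_fraction)
    return recovered / total_transcripts


def rank_assemblies(psl_by_assembly, total_transcripts, min_fraction=0.9):
    """Return (assembly name, recovery) pairs, best first."""
    scores = {name: recovery(lines, total_transcripts, min_fraction)
              for name, lines in psl_by_assembly.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the annotation itself is imperfect, close scores are not very meaningful; large gaps between assemblies are the more useful signal.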
Good luck with your work!
Jen, Galaxy team