Hi Tiago,
questions like this are really hard to answer: you can run such a service with just 100 cores and people wait a little longer, or you can add a few hundred more to reduce the wait time. I would say a BWA or BWA-MEM job should finish in a few hours on 4 cores. 4 cores x 20 users = 80 cores, so this should be enough. In my experience people are happy to wait if needed, and they schedule their jobs overnight or over the weekend. So keep the queue busy and plan with that in mind.
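To make the back-of-the-envelope numbers easy to replay with your own figures, here is a minimal Python sketch. The function names are made up for illustration, and it assumes every mapping job uses exactly 4 cores and that jobs run in "waves" of roughly equal length, which a real scheduler will not do exactly.

```python
import math


def cores_needed(users, cores_per_job=4):
    """Cores required if every user runs one mapping job at the same time."""
    return users * cores_per_job


def waves(total_jobs, total_cores, cores_per_job=4):
    """Rough number of back-to-back rounds needed to drain the queue.

    Assumption: all jobs take about the same time and each one
    occupies cores_per_job cores for its whole runtime.
    """
    slots = total_cores // cores_per_job  # jobs that fit on the cluster at once
    return math.ceil(total_jobs / slots)


if __name__ == "__main__":
    print(cores_needed(20))   # 20 users x 4 cores
    print(waves(20, 100))     # 20 jobs on 100 cores fit in a single round
    print(waves(60, 100))     # a busier day: jobs queue up over several rounds
```

If each round takes a few hours, the number of rounds gives a rough idea of whether the queue drains overnight or needs a weekend.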
Keep in mind that mapping needs some memory, so the mapping nodes should each have more than 24 GB of RAM. A few smaller ones are also good for text-processing jobs or stacks.
Galaxy itself, on the other hand, does not need many resources, so this is negotiable.
Storage is another beast. Do you want to keep data for years? Are you fine with your users deleting histories after a project is finished? What is your archive strategy? If you communicate this well, people can deal with a 250 GB quota and delete intermediate results, as they can reproduce them with Galaxy at any time. So for 200 users, 20-30 TB should be enough, but maybe buy it in a way that lets you easily extend it.
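The storage estimate can be sanity-checked the same way. This is only a sketch: the `fill_fraction` parameter (how much of the quota an average user actually occupies) is my assumption and not something measured, and it uses 1 TB = 1000 GB.

```python
def storage_tb(users, quota_gb, fill_fraction):
    """Estimated storage in TB for a given average quota usage.

    fill_fraction is a hypothetical planning parameter: 1.0 means every
    user fills the quota completely, 0.5 means half-full on average.
    """
    return users * quota_gb * fill_fraction / 1000


if __name__ == "__main__":
    # Worst case: all 200 users fill their 250 GB quota.
    print(storage_tb(200, 250, 1.0))
    # 20-30 TB corresponds to users filling roughly 40-60% on average.
    print(storage_tb(200, 250, 0.4))
    print(storage_tb(200, 250, 0.6))
```

So 20-30 TB works as long as people really do clean up their histories; if everybody fills the quota you end up at 50 TB, which is why being able to extend the storage later matters.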
Not sure this helps; it's a complicated topic :(
Bjoern