Latency isn't real-time yet, but we're working on getting it to near real time. As for controlling the voice, we've added a few parameters, such as rate, voice guidance, and temperature, but for now the emotion in the output mostly depends on the input text.