Sending and receiving real-time audio will cost developers twice the rate charged for text-only large language models.