The new Azure OpenAI gpt-4o-realtime-preview model opens the door to even more natural application user interfaces with its speech-to-speech capability. This new voice-based interface also brings an interesting new challenge with it: how do you implement retrieval-augmented generation (RAG), the prevailing pattern for combining language models with your own data, in a system that uses audio for i