MultiModal AI Assistant, My GPT-4o Alternative

OpenAI isn’t available in Hong Kong and I really, really wanted to use the GPT-4o voice model that they were showing off everywhere online to practice speaking Thai, Mandarin, Cantonese, and a bit of Spanish. I got so fed up that I ended up just pulling a bunch of APIs to make my own “GPT-4o” model that I could communicate with using text or voice input that would respond with lifelike audio outputs, simulating natural conversations.

I used Llama 3.1 for communication, Azure for the audio outputs, and Gemini Flash 1.5 for screen and camera computer vision as a bonus. The system could communicate in 100+ languages with 400+ realistic voice options. While this wasn’t really anything innovative, it was fun to be able to solve an annoying problem of mine.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Smart Debt Settler
  • AI Algorithmic Trading Bot
  • HKUST Corporate Consulting Project Sponsored by JPMorgan
  • HKUST Corporate Prototyping Project
  • Project Melo Summit