OpenAI Introduces Combined Text-Audio-Vision Chatbot Model, GPT-4o

21 days ago

According to OpenAI's blog post, GPT-4o (the "o" stands for "omni") accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.
