ChatGPT has proven useful in numerous scenarios for people across various fields, and now OpenAI, the creator of the AI chatbot, is taking it to the next level. With its latest announcement, OpenAI has rolled out the ability for ChatGPT to see, hear and speak. Yes, you heard that right. Here’s how it works.
“We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about”, said OpenAI.
How can ChatGPT Speak?
OpenAI is employing a new voice technology for ChatGPT’s speaking ability. It is powered by a new text-to-speech model that can generate human-like audio from just text and a few seconds of sample speech. OpenAI says it collaborated with professional voice actors to create each of the voices. “We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text”, said the company.
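Conceptually, the voice feature described above is a three-stage pipeline: speech is transcribed to text (Whisper), the chat model generates a reply, and the text-to-speech model renders it as audio. A minimal sketch of that loop, with placeholder functions standing in for the actual models (the names `transcribe`, `chat` and `synthesize` are illustrative assumptions, not OpenAI’s implementation):

```python
def voice_turn(audio, transcribe, chat, synthesize):
    """One back-and-forth voice turn: audio in, spoken reply out."""
    user_text = transcribe(audio)    # speech-to-text (e.g. Whisper)
    reply_text = chat(user_text)     # ChatGPT generates a text reply
    return synthesize(reply_text)    # text-to-speech renders the reply


if __name__ == "__main__":
    # Toy stand-ins to show the data flow through the pipeline.
    out = voice_turn(
        b"fake-audio-bytes",
        transcribe=lambda audio: "hello",
        chat=lambda text: "You said: " + text,
        synthesize=lambda text: ("audio", text),
    )
    print(out)  # ('audio', 'You said: hello')
```

The stand-in lambdas make the structure explicit: each stage consumes the previous stage’s output, which is why a back-and-forth conversation is possible turn by turn.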
Using this new feature, you can now use your voice to engage in a back-and-forth conversation with ChatGPT. Furthermore, OpenAI says it is also partnering with other companies to lend them this new voice technology. For instance, Spotify is using it for the pilot of its Voice Translation feature, which will help podcasters expand their reach by automatically translating entire podcasts into additional languages beyond the one they were originally recorded in.
To get started with voice, head to Settings, then New Features in the mobile app, and opt into voice conversations. Then, tap the headphone button in the top-right corner of the home screen and choose from five different voices. The voice feature will only be available on ChatGPT’s Android and iOS apps, and only to those on the Plus membership or the Enterprise program.
How can ChatGPT See?
Apart from its ability to hear and speak, ChatGPT can now also see: OpenAI has integrated new technology into the chatbot that lets it analyse images. You can show ChatGPT one or more images, and it will analyse them and respond to your query about that particular image. You could, for example, explore the contents of your fridge to plan a meal, or analyse a complex graph of work-related data.
Moreover, to focus on a specific part of an image, you can use the drawing tool in ChatGPT’s mobile app to encircle the part of the image you have a question about. “Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images”, noted OpenAI.
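For developers curious what a text-plus-image query looks like under the hood, here is a hedged sketch of how such a request might be structured, loosely following the content-parts convention of OpenAI’s public Chat Completions API (the exact field names and model identifier are assumptions for illustration, not a confirmed spec of what the ChatGPT app sends):

```python
def image_query(question: str, image_urls: list[str]) -> dict:
    """Build a chat request mixing a text question with one or more images."""
    # The user message is a list of typed parts: one text part,
    # followed by one image part per attached image.
    content = [{"type": "text", "text": question}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": "gpt-4",  # illustrative; a vision-capable model is assumed
        "messages": [{"role": "user", "content": content}],
    }


payload = image_query(
    "What can I cook with these ingredients?",
    ["https://example.com/fridge.jpg"],
)
```

Because each image is a separate part of the message, attaching multiple images to a single query is just a matter of appending more parts, which matches the multi-image behaviour the app exposes.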
To get started, tap the photo button to capture or choose an image. If you’re on iOS or Android, you will have to tap the plus button first. You can also include multiple images in your query. The feature will be available to all ChatGPT Plus and Enterprise users over the next two weeks on all platforms.
In comparison, Google recently integrated Google Lens into Bard, its own AI chatbot. It works in a similar fashion: Bard can analyse a photo and generate responses related to it.