Google announced a slate of new AI models and features at its Google I/O 2024 event, many of which aim to make everyday life easier by bringing the best of artificial intelligence to its devices and services.
Google hosted its annual I/O developer conference on 14 May 2024, where it unveiled at least half a dozen new AI products, ranging from a revamped Search and an enhanced Gemini to new AI tools. The event laid out Google’s vision for the future, built around multimodal artificial intelligence, and its plans to stay ahead of its competitors in the AI race, especially OpenAI.
Here are 10 major tools and models announced at Google I/O, what they mean, and how they can help you in real life. Make sure you read to the end to catch every detail of the new AI features that will help you.
Products announced at Google I/O 2024
Project Astra – Multimodal AI assistant
Google aims to bring Iron Man’s Jarvis to all its users, and that’s what ‘Project Astra’ is about. Project Astra is a multimodal AI assistant being built by Google’s DeepMind unit. It can see, analyze and interpret its surroundings, so users can ask Astra about anything around them.
In a demo video shown at Google I/O, Google showed the AI answering people’s questions about their surroundings through video and audio. For example, it helped a user remember where they had left their glasses, and it named a specific part of a speaker when the speaker was shown to it.
Astra is still a prototype, but it offers a glimpse of what it could become once released. Sundar Pichai said he expects Project Astra to launch in Gemini later this year.
AI Teammate – AI for Workspace
Google’s Gemini AI will power the company’s Workspace apps, such as Gmail, Slides, Sheets and Docs. The AI aims to simplify your tasks, making it easier to schedule, organize and search through large amounts of information.
For instance, when you receive an email that mentions a meeting time, the AI will offer to add it directly to your calendar with a single click. Similarly, you can ask Gemini questions about your emails, sheets or docs and get answers instantly.
Google Veo – Text-to-video generation
Google announced its own AI video-generation tool, “Veo”, which can create high-definition 1080p videos longer than 60 seconds. Using generative AI, Veo can create videos in a wide range of cinematic and visual styles from text prompts. With an advanced understanding of natural language and visual semantics, it generates video that closely matches a user’s creative vision, accurately capturing a prompt’s tone and rendering the details of longer prompts.
For now, Veo is available only to select creators for experimentation, ahead of a public release.
Ask Photos powered by Gemini
Another notable feature Google unveiled is “Ask Photos”. Powered by Gemini, it understands the contents of users’ photos, helping them easily find specific pictures and pull details from them with a simple prompt.
In a demo, when asked “What’s my license plate number?”, the AI identified the user’s car among their photos and read out the plate number precisely. It’s awesome.
Another demo showed a person asking for the best photo of themselves from each hiking trip; after a few seconds of loading, the photos appeared right on the screen. It’s that easy. The feature helps most when you remember a photo but can’t find it in a cluttered gallery.
Detecting Phone-call Scams
Google will soon roll out a feature that detects scam calls in seconds. Expected in the second half of 2024, the feature will use AI to spot suspicious language during a call and display a pop-up warning if the call is likely a scam. It will be powered by Google’s Gemini Nano with Multimodality.
Earlier, Google Pay announced an update that detects potentially fraudulent payments before you initiate them.
Enhanced new Search experience
Google plans to introduce assistant-like planning capabilities directly within Search. This means users can ask about a multi-step task, and AI-powered Search will lay out the steps in sequence, from start to finish.
Google explained that users will be able to search for something like “Create a 3-day meal plan for a group that’s easy to prepare,” and get a starting point with a wide range of recipes from across the web.
Expanded AI Overviews
AI Overviews is now available in Google Search in the U.S., making it easier to get precise answers to complex search questions in one go. AI Overviews provide a quick summary of answers to search queries. For example, if a user searches for the best way to clean leather boots, the results page may display an “AI Overview” at the top with a multi-step cleaning process synthesized from information across the web.
AI Overviews is also being expanded with multimodal capabilities, so it can interpret different forms of media and provide answers from them.
LearnLM – GenAI model for Learning
The tech giant also came up with another useful AI tool for curious learners. “LearnLM” is a new family of Gemini-based models fine-tuned for learning. Grounded in educational research and tailored to how people learn, LearnLM represents an effort across Google DeepMind and Google Research to make learning experiences more engaging, personal and useful.
It can be accessed in Google Search, Android’s Circle to Search, Gems (custom versions of Gemini) and even on YouTube (through the ‘raise your hand’ feature).
What’s new in Google Lens
As Google pushes further into “multimodality,” integrating more images and video into its generative AI tools, the company said it will begin testing the ability for users to ask questions through video, such as filming a problem with a product they own, uploading it and asking the search engine to figure out what’s wrong. In one example, Google showed someone filming a broken record player while asking why it wasn’t working. Google Search identified the record player’s model and suggested that it could be malfunctioning because it wasn’t properly balanced.
Imagen 3 & Audio Overviews
Imagen 3 is Google’s highest-quality text-to-image model, capable of producing photorealistic, lifelike images with an incredible level of detail. It better understands natural language and the intent behind your prompt, and incorporates small details from longer prompts. This advanced understanding helps the model master a range of styles.
The firm also introduced “Audio Overviews,” the ability to generate spoken discussions based on text input. For instance, if a user uploads a lesson plan, the chatbot can speak a summary of it. Or, if you ask for a real-life example of a science problem, it can explain one through interactive audio.
With this range of AI-powered features, Google is pushing forward to improve the services it offers people and deliver more value. When it comes to thinking about the future, the search giant is well ahead in the race. In fact, according to a report citing a 2019 Microsoft email, Microsoft was very worried about Google’s AI progress, after which it began investing heavily in OpenAI.
Google concentrates not only on artificial intelligence but also on other emerging technologies, including quantum computing, to deliver its services effectively and enhance the user experience.
(For more such interesting technology and innovation stories, keep reading The Inner Detail.)
Kindly add ‘The Inner Detail’ to your Google News Feed by following us!