From www.zdnet.com

Google IO 2024
Kerry Wan/ZDNET

Google has had an eventful year already, rebranding its AI chatbot from Bard to Gemini and releasing several new AI models. At this year’s Google I/O developer conference, the company made several more announcements regarding AI and how it’ll be embedded across the company’s various apps and services.

As expected, AI took center stage at the event, with the technology being infused across nearly all of Google products, from Search, which has remained mostly the same for decades, to Android 15 to, of course, Gemini. Here’s a roundup of every major announcement made at the event so far. And stay tuned for the latest updates.

It wouldn’t be a Google developer event if the company didn’t unveil at least one new large language model (LLM), and this year, the new model is Gemini 1.5 Flash. This model’s appeal is that it is the fastest Gemini model served in the API and a more cost-efficient alternative than Gemini 1.5 Pro while still highly capable. Gemini 1.5 Flash is available in public preview in Google’s AI studio and Vertex AI starting today.

flash-utility.png

Google

Even though Gemini 1.5 Pro was just launched in February, it has been upgraded to provide better-quality responses in many different areas, including translation, reasoning, coding, and more. Google shares that the latest version has achieved strong improvements on several benchmarks, including MMMUMathVistaChartQADocVQAInfographicVQA, and more.

Also: Google I/O 2024: 5 Gemini features that would pull me away from Copilot

Furthermore, Gemini 1.5 Pro, with its 1 million context window, will be available for consumers in Gemini Advanced. This is significant because it will allow consumers to get AI assistance on large bodies of work, such as PDFs that are 1,500 pages long.

Google

As if that context window wasn’t already large enough, Google is previewing a two million context window in Gemini 1.5 Pro and Gemini 1.5 Flash to developers through a waitlist in Google AI Studio.

Also: The best AI chatbots: ChatGPT and alternatives

Gemini Nano, Google’s model designed to run on smartphones, has been expanded to include images in addition to text. Google shares that starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand sight, sound, and spoken language.

Gemma 2

Google

The Gemini sister family of models, Gemma, is also getting a major upgrade with the launch of Gemma 2 in June. The next generation of Gemma has been optimized for TPUs and GPUs and is launching at 27B parameters.

Lastly, PaliGemma, Google’s first vision-language model, is also being added to the Gemma family of models.

If you have opted into the Search Generative Experience (SGE) via Search Labs, you are familiar with the AI overview feature, which populates AI insights at the top of your search results to give users conversational, abridged answers to their search queries.

Now, using that feature will no longer be limited to Search Labs, as it is being made available to everyone in the U.S. starting today. The feature is made possible by a new Gemini model, customized for Google Search.

ai-overviews-break-it-down-still.png

Google

According to Google, since AI overviews were made available through Search Labs, the feature has been used billions of times, and it has caused people to use Search more and be more satisfied with their results. The implementation into Google Search is meant to provide a positive experience for users, and only appear when it can add to Search results.

Also: The 4 biggest Google Search features announced at Google I/O 2024

Another significant change coming to Search is an AI-organized results page that uses AI to create unique headlines to better suit the user’s search needs. AI-organized search will begin to roll out to English-language searches in the U.S. related to inspiration, starting with dining and recipes, then movies, music, books, hotels, shopping, and more, according to Google.

ai-organized-results-page-still.png

AI organized results page

Google

Google is also rolling out new Search features that will first be launched in Search Labs. For example, in Search Labs, users will soon be able to adjust their AI overview to best suit their preferences, with options to break down information further or simplify the language, according to Google.

Users will also be able to use video to search, taking visual searches to the next level. This feature will be available soon in Search Labs in English. Lastly, Search can plan meals and trips with you starting today in Search Labs, in English, in the U.S.

ai-overviews-meal-planning-still.png

Google

In addition to the technology having its own “What’s new in Google AI” keynote, we see generative AI sprinkled throughout other keynote descriptions as well, including a mention under the “What’s new in Android” keynote and another titled “What’s new in Firebase for building gen AI features.”

Also: Google teases an AI camera feature ahead of I/O that looks better than Rabbit R1’s

There are also over 10 confirmed developer “technical sessions” relating to generative AI, focused on topics including the latest Gemma advancements and learning how to use multimodal retrieval-augmented generation (RAG) with Gemini.

Google isn’t new to text-to-video AI models, having just shared a research paper on its Lumiere model in January. Now, the company is unveiling its most capable model to date, Veo, which can generate high-quality 1080p resolution video lengths beyond a minute.

The model can better understand natural language to generate video that more closely represents the user’s vision, according to Google. It also understands cinematic terms like “timelapse” to generate video in various styles and give users more control over the final output.

Also: Meet Veo, Google’s most advanced text-to-video generator, unveiled at Google I/O 2024

Google shares that it does build on years of generative video work, including Lumiere and other prevalent models such as Imagen-Video, VideoPoet, and more. The model is not yet available for users; however, it is available for select creators as a private preview inside VideoFX, and the public is invited to join a waitlist.

This video generator seems to be Google’s answer to Open AI’s text-to-image model, Sora, which is also not yet widely available and in private preview to red teamers and a select number of creatives.

Google also unveiled its next-generation text-to-image generator, Imagen 3. According to Google, this model produces the highest quality images yet, with more details and fewer artifacts in images to help create more realistic images.

Like Veo, Imagen 3 has improved natural language capabilities to better understand user prompts and the intention behind them. This model can tackle one of the biggest challenges for AI image generators, text, with Google saying Imagen 3 is the best for rendering it.

Also: The best AI image generators: Tested and reviewed

Imagen 3 is not widely available just yet, available in private preview inside Image FX for select creators. The model will be available soon in Vertex AI, and the public can sign up to join a waitlist.

In the era of generative AI we are in now, we are seeing companies focus on the multimodality of AI models. To make its AI-labeling tools fit accordingly, Google is now expanding its SynthID, Google’s technology that watermarks AI images, to two new modalities –text and video. Furthermore, Google’s new text-to-video model, Veo, will include SynthID watermarks on all videos generated by the platform.

If you have ever spent what felt like hours scrolling through your feed to find the picture you are searching for, Google unveiled an AI solution to your problem. Using Gemini, users can use conversational prompts in Google Photos to find the image they are looking for.

Ask Photos

Screenshot by Sabrina Ortiz/ZDNET

Also: Google’s new ‘Ask Photos’ AI solves a problem I have every day

In the example, Google gave, a user wants to see their daughter’s progress as a swimmer over time, so they ask Google Photos that question, and it automatically packages the highlights for them. This feature is called Ask Photos, and Google shares that it will roll it out later this summer with more capabilities to come.

In February, Google launched a premium subscription tier to its chatbot, Gemini Advanced, which granted users access to bonus perks such as access to Google’s latest AI models and longer conversations. Now, Google is upgrading its subscribers’ offerings even further with unique experiences.

Also: What is Gemini Live? A first look at Google’s new real-time voice AI bot

The first, as mentioned above, is access to Gemini 1.5 Pro, which grants users access to a much larger context window of one million tokens, which Google says is the largest of any widely available consumer chatbot on the market. That larger window can be leveraged to upload larger materials, such as documents of up to 1,500 pages or 100 emails. Soon, it will be able to process an hour of video and codebases with up to 30,000 lines.

Next, one of the most impressive features of the entire launch is Google’s Gemini Live, a new mobile experience in which users can have full conversations with Gemini, choosing from a variety of natural-sounding voices and interrupting it mid-conversation.

AI Agents - Project Astra Google IO

Kerry Wan/ZDNET

Later this year, users will also be able to use their camera with Live, giving Gemini context of the world around them for those conversations. Gemini uses video understanding capabilities from Project Astra, a project from Google DeepMind meant to reshape the future of AI assistants. For example, the Astra demo showed a user pointing out the window and asking Gemini what neighborhood they were likely in from what they saw.

Gemini Live is essentially Google’s take on OpenAI’s new Voice Mode in ChatGPT, which the company announced at its Spring Updates event yesterday, through which users can also carry out full-blown conversations with ChatGPT, interrupting mid-sentence, changing the chatbot’s tone, and using the user’s camera as context.

Taking another page from OpenAI’s book, Google is introducing Gems for Gemini, which accomplishes the same goal as ChatGPT’s GPTs. With Gems, users can create custom versions of Gemini to suit different purposes. All a user needs to do is share the instructions of what task it wants the chatbot to accomplish, and Gemini will create a Gem that suits that purpose.

Also: How to use ChatGPT (and what you can use it for)

In the upcoming months, Gemini Advanced will also include a new planning experience that can help users get detailed plans that take into account their own preferences, going beyond just generating an itinerary.

For example, with this experience, Google says Gemini Advanced could create an itinerary that fits the multi-stepped prompt, “My family and I are going to Miami for Labor Day. My son loves art, and my husband really wants fresh seafood. Can you pull my flight and hotel info from Gmail and help me plan the weekend?”

Lastly, users will soon be able to connect more Extensions into Gemini, including Google Calendar, Tasks, and Keep, allowing Gemini to do tasks within each one of those applications, such as taking a photo of a recipe you took and adding it your Keep as a shopping list, according to Google.

FAQs

When is Google I/O?

Google’s annual developer conference is here, taking place on May 14 and 15 at the Shoreline Amphitheatre in Mountain View, California. The opening day keynote, when Google leaders take the stage to unveil the company’s latest hardware and software, will begin at 10 AM PT / 1 PM ET.

How to watch Google I/O

Google will livestream the event on its main website and YouTube for members of the public and the press. You can register for the event on the Google I/O landing page for free to take advantage of perks such as receiving email updates and watching on-demand sessions. There will be an in-person element to I/O too, as has been the case for the past two years, with media and developers invited to attend. ZDNET will be among the crowd in Mountain View.

Google

[ For more curated Samsung news, check out the main news page here]

The post Everything announced at Google I/O 2024: Gemini, Search, Project Astra, and more first appeared on www.zdnet.com

New reasons to get excited everyday.



Get the latest tech news delivered right in your mailbox

You may also like

Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments