Cohere claims its new Aya Vision AI model is best-in-class

Cohere For AI, AI startup Cohere's nonprofit research lab, this week released Aya Vision, a multimodal "open" model that the lab claims is best-in-class.

Aya Vision can perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere, which is also making Aya Vision available for free through WhatsApp, called it "a significant step towards making technical breakthroughs accessible to researchers around the world."

"While AI has made significant progress, there is still a big gap in how well models perform across different languages, one that becomes even more noticeable in multimodal tasks that involve both text and images," Cohere wrote in a blog post. "Aya Vision aims to explicitly help close that gap."

Aya Vision comes in two flavors: Aya Vision 32B and Aya Vision 8B. The more sophisticated of the two, Aya Vision 32B, sets a "new frontier," Cohere said, outperforming models 2x its size, including Meta's Llama-3.2 90B Vision, on certain visual understanding benchmarks. Meanwhile, Aya Vision 8B scores better on some evaluations than models 10x its size, according to Cohere.

Both models are available from the AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere's acceptable use addendum. They can't be used for commercial applications.
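For researchers who want to try the weights themselves, the sketch below shows one way a non-commercial experiment might load Aya Vision through Hugging Face's transformers library. The repository name ("CohereForAI/aya-vision-8b"), the "image-text-to-text" pipeline task, and the exact calling convention are assumptions based on how comparable open vision-language models are typically published, not details confirmed in Cohere's announcement.

```python
# Hypothetical sketch: loading Aya Vision 8B from Hugging Face and asking it
# about an image. The model ID and pipeline task are assumptions; check the
# actual model card before relying on them.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",               # multimodal chat-style task (assumed)
    model="CohereForAI/aya-vision-8b",  # assumed repository name
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/market.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this photo in Hindi."},
        ],
    }
]

# Generate a short multilingual answer about the image.
print(pipe(text=messages, max_new_tokens=128))
```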

Cohere said that Aya Vision was trained using a "diverse pool" of English datasets, which the lab translated and used to create synthetic annotations. Annotations, also known as tags or labels, help models understand and interpret data during the training process. For example, annotations to train an image recognition model might take the form of markings around objects, or captions referring to each person, place, or object depicted in an image.
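To make the idea concrete, here is a purely illustrative example of what such an annotation record could look like: an AI-generated English caption, machine translations of it, and bounding-box labels for objects in the image. Every field name is invented for illustration and is not Cohere's actual training schema.

```python
# Hypothetical annotation record for one training image. All fields here are
# illustrative; this is not Cohere's actual data format.
synthetic_annotation = {
    "image_id": "0001",
    "caption_en": "A street vendor selling fruit at a crowded market.",  # synthetic caption
    "caption_translations": {                                            # translated captions
        "es": "Un vendedor ambulante vende fruta en un mercado concurrido.",
        "tr": "Kalabalık bir pazarda meyve satan bir seyyar satıcı.",
    },
    "objects": [
        {"label": "person", "bbox": [120, 45, 190, 355]},       # [x, y, width, height]
        {"label": "fruit stand", "bbox": [60, 200, 420, 180]},
    ],
}
```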

Cohere's Aya Vision model can perform a range of visual understanding tasks. Image Credits: Cohere

Cohere's use of synthetic annotations (that is, annotations generated by AI) is on trend. Despite its potential downsides, rivals including OpenAI are increasingly using synthetic data to train models as the well of real-world data dries up. Research firm Gartner estimates that 60% of the data used for AI and analytics projects last year was synthetically created.

According to Cohere, training Aya Vision on synthetic annotations allowed the lab to use fewer resources while achieving competitive performance.

"This showcases our critical focus on efficiency and (doing) more with less compute," Cohere wrote in its blog. "This also enables greater support for the research community, who often have more limited access to compute resources."

Together with Aya Vision, Cohere also released a new benchmark suite, AyaVisionBench, designed to probe a model's skills in "vision-language" tasks, such as identifying differences between two images and converting screenshots to code.
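Cohere says it is making the evaluation set available to the research community; assuming it is hosted on Hugging Face, an evaluation run might start with something like the sketch below. The dataset repository name and split are assumptions, not confirmed identifiers.

```python
# Minimal sketch of pulling the AyaVisionBench evaluation set with the
# Hugging Face datasets library. The repo name and split are assumed.
from datasets import load_dataset

bench = load_dataset("CohereForAI/AyaVisionBench", split="test")  # assumed repo/split
print(bench[0])  # inspect one multilingual vision-language test item
```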

The AI industry is in the midst of what some have called an "evaluation crisis," a consequence of the popularization of benchmarks that give aggregate scores that correlate poorly with proficiency on the tasks most AI users care about. Cohere asserts that AyaVisionBench is a step toward rectifying this, providing a "broad and challenging" framework for assessing a model's cross-lingual and multimodal understanding.

With any luck, this is indeed the case.

"(T)he dataset serves as a robust benchmark for evaluating vision-language models in multilingual and real-world settings," Cohere researchers wrote in a post on Hugging Face. "We make this evaluation set available to the research community to push forward multilingual multimodal evaluations."