OpenAI releases new version of GPT-4o via Azure

ChatGPT-maker OpenAI has released a new version of its GPT-4o large language model (LLM), designed to simplify the process of generating “well-defined” and “structured” outputs from AI models.

“Developers often face challenges validating and formatting AI outputs into well-defined structures like JSON schemas,” the company wrote in a blog post, adding that an early release version of the LLM has been made available on Microsoft’s Azure OpenAI Service.

The new version of the LLM, christened GPT-4o-2024-08-06, produces these structured outputs by letting developers specify the exact output format they want directly from the model.

JSON Schema, according to OpenAI and Microsoft, is used by developers to maintain consistency across platforms, drive model-driven UI constraints, and automatically generate user interfaces.

“They are also essential for defining the structure and constraints of JSON documents, ensuring they follow specific formats with mandatory properties and value types. It enhances data understandability through semantic annotation and serves as a domain-specific language for optimized application requirements,” the companies explained.

Additionally, the companies said that these schemas also support automated testing, schema inferencing, and machine-readable web profiles, improving data interoperability.
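
To make the validation step concrete, the sketch below (not from the announcement) checks a model’s JSON output against a hypothetical calendar-event schema using the open-source jsonschema Python library; the schema, field names, and sample output are illustrative assumptions only.

```python
# Minimal sketch: validating a model's JSON output against a user-defined
# JSON Schema with the open-source `jsonschema` library (pip install jsonschema).
# The schema and sample output below are hypothetical, for illustration only.
import json

import jsonschema

# Hypothetical schema for a calendar-event extraction task.
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

# Pretend this string came back from the model.
model_output = '{"name": "Launch review", "date": "2024-08-06", "participants": ["Ana", "Bo"]}'

# Raises jsonschema.ValidationError if the output does not match the schema.
jsonschema.validate(instance=json.loads(model_output), schema=event_schema)
print("Output matches the schema.")
```

The idea behind structured outputs is that the model is constrained to emit JSON matching such a schema in the first place, reducing the need for retry-and-repair loops around this kind of check.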

The new LLM supports two kinds of structured output: user-defined JSON Schema, and strict mode for more accurate tool output.

While the user-defined outputs allow developers to specify the exact JSON Schema they want the AI to follow, the strict mode output, which is a limited version, lets developers define specific function signatures for tool use, the companies said.
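
As a rough illustration of the user-defined mode, the following sketch passes a JSON Schema through the Chat Completions response_format parameter using the openai Python SDK’s AzureOpenAI client; the endpoint, key, API version, and deployment name are placeholders, and the exact parameter shape should be checked against current Azure OpenAI documentation.

```python
# Minimal sketch of user-defined structured outputs via response_format.
# The endpoint, key, API version, and deployment name are placeholders/assumptions.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-08-01-preview",  # assumption: a version that exposes structured outputs
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # your deployment name for the new model
    messages=[
        {"role": "system", "content": "Extract the event details from the user's message."},
        {"role": "user", "content": "Team offsite with Ana and Bo on 2024-08-06."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "date", "participants"],
                "additionalProperties": False,
            },
        },
    },
)

# Barring a refusal, the message content should parse as JSON matching the schema above.
print(response.choices[0].message.content)
```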

The user-defined output is supported by GPT-4o-2024-08-06 and GPT-4o-mini-2024-07-18. The more limited strict mode, meanwhile, is supported by all models that support function calling, including GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and GPT-4o models.
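
For the strict tool-calling mode, a sketch along the same lines (again with placeholder credentials and a hypothetical get_weather function) marks the function definition as strict so the model’s arguments must match the declared signature:

```python
# Minimal sketch of strict mode for tool (function) calling.
# The get_weather tool, credentials, and deployment name are hypothetical.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-08-01-preview",  # assumption, as above
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "strict": True,  # ask the model to match this signature exactly
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder deployment of any function-calling model
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments should follow the declared schema.
print(response.choices[0].message.tool_calls)
```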

GPT-4o was first announced in May 2024 as OpenAI’s new multimodal model, followed by GPT-4o mini in July 2024.

Microsoft has yet to make the pricing of the new model available on its pricing portal.

Critical time for the company and competition

The new model comes at a time when the company is undergoing critical changes in its leadership and is facing stiff competition from rivals such as Anthropic, Meta, Mistral, IBM, Google, and AWS.

This week, OpenAI co-founder John Schulman joined the list of departing OpenAI executives making the move to Anthropic — a rival LLM maker and provider that was founded by a group of executives leaving OpenAI.

Schulman follows Jan Leike, who made the move from OpenAI to Anthropic back in May.

Ilya Sutskever also left a senior role with OpenAI, but he is launching his own AI effort.

Another troubling indicator at OpenAI was the mysterious reassignment of safety executive Aleksander Madry. This also follows OpenAI co-founder Andrej Karpathy’s departure back in February.

At the same time, OpenAI is also facing stiff competition for its LLMs. One such example is Meta’s newly unveiled Llama 3.1 family of large language models, which includes a 405 billion parameter model as well as 70 billion parameter and 8 billion parameter variants.

Analysts and experts say that the openness and accuracy of the Llama 3.1 family of models pose an existential threat to providers of closed proprietary LLMs.

Meta in a blog post said that the larger 405B Llama 3.1 model outperformed models such as Nemotron-4 340B Instruct, GPT-4, and Claude 3.5 Sonnet in benchmark tests such as MMLU, MATH, GSM8K, and ARC Challenge.

Its performance was close to that of GPT-4o in these tests as well. For context, GPT-4o scored 88.7 in the MMLU benchmark and Llama 3.1 405B scored 88.6.

MMLU, MATH, GSM8K, and ARC Challenge are benchmarks that test LLMs in the areas of general intelligence, mathematics, and reasoning.

The smaller 8B and 70B Llama 3.1 models, which have been updated with larger context windows and support for multiple languages, also performed better than, or close to, proprietary LLMs in the same benchmark tests.

Another example is the release of Claude 3.5 Sonnet in June, which, according to the LLM maker, set new scores across industry benchmarks such as graduate-level reasoning (GPQA), MMLU, and HumanEval, a test for coding proficiency.

While analysts and industry experts have been waiting for OpenAI to release GPT-5, company CEO Sam Altman’s post on X about summer gardens and strawberries has sparked speculation that the company is working on a next-generation AI model that can crawl the web to perform research.
