Happy holidays! This is James, and I hope you are enjoying time with friends and family over the holiday break. As we close 2024, I have summarized key features and use cases for new capabilities released during the “12 Days of OpenAI.”
The best way to learn these new ChatGPT features is to get hands-on. I encourage you to watch these videos and think about how you can change your daily workflow to boost your productivity and creativity.
AI innovation is going to rapidly impact businesses and careers in 2025. With the o3 family of models planned for release starting at the end of January 2025, it’s clear that we are rapidly approaching AGI. You will have world-class PhDs in your pocket 24x7.
How will you use ChatGPT to compete in 2025?
Day 1: OpenAI o1 and o1 Pro Mode
The main focus of Day 1 of the 12 Days of OpenAI was the launch of OpenAI o1, a new AI model touted as the "smartest model in the world." This model is designed to "think before it responds," resulting in more detailed and accurate outputs.
Features
Here are some key features and benefits of the o1 model:
Faster and smarter than its preview version: o1 is significantly faster than the o1-preview model released in September, addressing user concerns about slow response times. It also makes fewer mistakes and provides a better overall user experience.
Improved coding performance: o1 demonstrates a substantial leap in coding capabilities, making it particularly valuable for developers and engineers.
Multimodal input and reasoning: o1 can process both text and images, enabling it to understand complex problems and generate insightful responses.
Use Cases
Here are some examples of how o1 can be used for personal and business purposes:
Personal: Take a picture of a handwritten to-do list and ask o1 to organize it into a digital format, categorize tasks, or even set reminders. This could streamline personal task management and improve overall organization.
Business: o1's advanced reasoning and coding abilities make it a valuable tool for businesses in various fields. For instance, engineers could use o1 to solve complex technical problems, while data scientists could leverage its capabilities to analyze data and generate visualizations. The multimodal input feature allows businesses to use images for problem-solving and decision-making. For example, a manufacturing company could use o1 to analyze images of product defects and identify potential causes.
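In the ChatGPT app, this is as simple as attaching a photo. For developers with API access, the equivalent call looks roughly like the sketch below; the model name and file path are illustrative assumptions, so substitute a vision-capable model available to your account:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a (hypothetical) photo of a product defect as base64.
with open("defect_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o1",  # assumed; use any vision-capable model you have access to
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe any defects in this product image and suggest likely causes."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```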
In addition to o1, OpenAI also launched ChatGPT Pro, a new $200/month subscription tier offering unlimited access to o1 and other models, as well as features like o1 Pro Mode for even more advanced problem-solving capabilities.
My Perspective
I have been using o1 for tasks where I want to ideate on alternatives and plans. Even though o1 is the most powerful model, it’s more expensive to run, and it’s not suitable for every task. o1 also produces more tokens per turn, so each subscription tier is capped at a preset usage level. I recommend using GPT-4o for everyday tasks that call for quick iteration, like writing an article in Canvas, and o1 where you need the power of thinking, planning, and scenario optimization.
Day 2: Reinforcement Fine-Tuning
Day 2 of the 12 Days of OpenAI unveiled Reinforcement Fine-Tuning (RFT) for the o1 series of models. This new feature allows users to customize o1 using their own datasets and the power of reinforcement learning algorithms, enabling the creation of expert AI models for specific domains and tasks.
Features
Here’s why RFT is a game-changer:
Trains models to reason in new ways: RFT goes beyond the capabilities of Supervised Fine-Tuning, which focuses on replicating features found in the input data. Instead, RFT teaches the model to think through problems and arrive at solutions by reinforcing successful lines of thinking and discouraging unsuccessful ones.
Requires minimal data: RFT can achieve significant performance improvements with as little as a few dozen training examples, a remarkably small number in the world of large language models.
Opens up possibilities for diverse applications: RFT has the potential to revolutionize fields like law, finance, engineering, and scientific research by enabling the creation of AI assistants that can reason over custom domains and assist with complex, specialized tasks.
Example Use Case: Scientific Research
A collaboration with researchers at Berkeley Lab showcased the potential of RFT in scientific research, specifically in the study of rare genetic diseases. The researchers used RFT to train o1-mini to predict potential causative genes based on a patient’s symptoms.
The project highlighted the following key points:
Data Set: Researchers used a dataset of case reports from scientific publications, each containing patient symptoms, absent symptoms, and the known causative gene.
Grader: A custom grader evaluated the model’s performance by comparing its predicted gene ranking to the correct answer, providing a score between 0 and 1 (a hypothetical sketch of such a grader follows this list).
Results: RFT significantly improved o1-mini’s accuracy in predicting causative genes, exceeding the performance of the base o1 model on the same task. This success showcased the potential of RFT to empower researchers with AI assistants capable of reasoning over complex scientific data and assisting in crucial research endeavors.
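OpenAI didn’t publish the grader’s implementation, but conceptually it is a rank-based scorer. Here is a hypothetical Python sketch, assuming the model returns a ranked list of candidate genes and the grader awards higher scores the closer the correct gene sits to the top (reciprocal rank is one common choice):

```python
def grade_gene_ranking(predicted_genes: list[str], correct_gene: str) -> float:
    """Hypothetical RFT grader: return a score between 0 and 1 based on
    where the known causative gene appears in the model's ranked list."""
    try:
        rank = predicted_genes.index(correct_gene) + 1  # 1-based rank
    except ValueError:
        return 0.0  # the correct gene was not predicted at all
    return 1.0 / rank  # reciprocal rank: 1.0 for top-1, 0.5 for second, ...

# Example: the correct gene is ranked second, so the score is 0.5.
print(grade_gene_ranking(["TP53", "FBN1", "BRCA1"], "FBN1"))
```

A graded signal like this is what lets reinforcement learning strengthen lines of reasoning that rank the right gene higher.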
OpenAI is expanding access to RFT through a research program, allowing select organizations to leverage this technology for their specific needs. A public launch is planned for early next year.
My Perspective
The introduction of reinforcement fine-tuning feels like a game-changer for developers and researchers who want to customize OpenAI's models. It empowers users to leverage OpenAI's expertise in reinforcement learning and model training without needing to delve into those complexities themselves.
Day 3: Sora - Video Generation
Day 3 of OpenAI's 12 Days of OpenAI marked the highly anticipated launch of Sora, their groundbreaking AI-powered video generation tool. Sora represents a significant leap in AI capabilities, allowing users to create realistic and imaginative videos using text prompts, images, or a combination.
Features
Here are the key features of Sora:
Text-to-video generation: You can create videos simply by describing what you want to see in natural language. Sora interprets the text and generates a corresponding video, bringing your imagination to life. For instance, a prompt like "a majestic lion walking through a lush jungle" can be transformed into a captivating video.
Image-to-video generation: Sora can animate still images, breathing life into them and creating dynamic scenes. You can upload an image and provide instructions on how you want it to move or change over time. Imagine animating a painting of a sunset by adding moving clouds and birds flying across the horizon.
Storyboard feature: This feature allows users to create videos scene-by-scene, providing granular control over the narrative and visual elements. You can describe each scene using text or images, sequence them on a timeline, and let Sora seamlessly connect them into a cohesive video.
Editing capabilities: Sora offers a range of editing tools to refine and enhance videos. You can trim and extend scenes, remix videos by applying different styles and aesthetics, and adjust lighting, camera angles, and other visual aspects.
Multimodality: Sora goes beyond text and images, allowing users to combine various modalities to create richer and more engaging videos. For example, a user could upload a music track and ask Sora to create a video that visually complements the music.
High-quality output: Sora can generate videos in various aspect ratios and resolutions, including up to 1080p. The videos are designed to be realistic and visually appealing, capturing details and nuances based on user instructions.
Sora is available to ChatGPT Plus and Pro users as part of their existing subscriptions at https://sora.com.
Use Cases
Here are some of the potential use cases for Sora:
Filmmaking and animation: Sora can revolutionize the film and animation industries by enabling filmmakers and animators to quickly prototype ideas, create animatics, and generate visuals for their projects, saving time and resources.
Advertising and marketing: Businesses can use Sora to create engaging video ads and marketing content tailored to specific audiences. The ability to generate variations allows for A/B testing and optimization.
Education: Educators can leverage Sora to create immersive educational videos, bringing historical events, scientific concepts, and literary works to life.
Personal storytelling: Anyone can use Sora to create personal video stories, sharing their experiences, memories, and creative visions. The accessibility of Sora empowers individuals to become storytellers.
OpenAI emphasizes that Sora is still under development and will continue to improve. While Sora showcases remarkable capabilities, it's crucial to remember that it's a tool designed to augment human creativity, not replace it. The user's vision and direction remain paramount in shaping the final video output.
My Perspective
I tested the video creation process and found it basic right now; it also consumed credits quickly. I expect Sora to improve rapidly, as other OpenAI products have. Video is a captivating medium, and I can see many opportunities for people to use it to strengthen their personal brands.
Day 4: Canvas
Day 4 of the 12 Days of OpenAI introduced Canvas, a collaborative workspace that enhances the writing and coding experience within ChatGPT. This tool allows users to work side-by-side with ChatGPT, iterating and refining content more visually and interactively.
Features
Here are the key features and benefits of Canvas:
Side-by-side view: Canvas provides a split-screen view, with the familiar chat interface on the left and the Canvas workspace on the right. This allows users to simultaneously see their conversation history and the evolving document or code.
Collaborative editing: Both the user and ChatGPT can edit the content within Canvas, fostering a true collaborative experience. Users can add text, bold or format text, and make other changes, just like in other document editors.
Real-time feedback and suggestions: Canvas offers convenient shortcuts to request edits, feedback, and adjustments from ChatGPT. Users can ask for suggestions, adjust the length or reading level of the text, add a final polish, or even inject some fun with emojis.
Code execution: Canvas supports the execution of Python code, allowing users to see the output directly within the workspace. This feature includes generating graphics, making Canvas a powerful tool for data visualization and exploration (see the example after this list).
Integration with Custom GPTs: Users can now add Canvas to their custom GPTs, enabling them to leverage this collaborative workspace within their personalized AI assistants.
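To make the code execution feature concrete, here is the kind of short Python snippet you might draft in Canvas and run in place, with the resulting chart rendered directly in the workspace. The revenue figures are invented for the example:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures to visualize
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12.4, 13.1, 15.8, 14.9, 17.2, 19.5]  # in $K

plt.plot(months, revenue, marker="o")
plt.title("Monthly Revenue")
plt.xlabel("Month")
plt.ylabel("Revenue ($K)")
plt.show()
```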
Canvas is available to all ChatGPT users, regardless of their subscription plan.
Use Cases
Here are some potential use cases for Canvas, showcasing its versatility for both personal and professional tasks:
Storytelling: Users can collaborate with ChatGPT to write engaging stories for children, adding emojis and adjusting the reading level to suit the audience.
Writing and editing: Canvas makes writing and editing essays, articles, and other written content easier. Users can receive feedback and suggestions from ChatGPT, refine their writing style, and ensure grammatical accuracy.
Learning to code: Canvas offers an interactive environment for learning and practicing coding. Users can write and execute Python code, experiment with different libraries, and visualize data with the help of ChatGPT.
Business reporting and data analysis: Professionals can use Canvas to collaborate on reports, analyze data, and create compelling visualizations. The ability to execute code and generate graphics within the workspace streamlines the workflow and enhances data-driven decision-making.
My Perspective
I love this feature! I find it useful to iterate on my content in the Canvas window and have ChatGPT refine it. I like how you can select a section of text and then state how you want it improved, or have ChatGPT explain it. Canvas makes content iteration faster and more seamless.
Day 5: Apple Intelligence
Day 5 of OpenAI's 12 Days of OpenAI marked a significant step towards integrating ChatGPT into everyday life through strategic partnerships with Apple. These integrations aim to provide seamless access to ChatGPT's capabilities across the Apple ecosystem, making AI assistance more readily available and intuitive.
Features
Effortless Access Through Siri: Users can now interact with ChatGPT directly through Siri voice commands. Siri intelligently determines when ChatGPT’s assistance would be beneficial and seamlessly hands off the task. This integration enables natural, conversational interactions with ChatGPT, making it feel like an extension of the user's daily workflow.
Elevated Writing with ChatGPT: ChatGPT's capabilities are now woven into Apple Intelligence’s Writing Tools, expanding beyond existing features like refining and summarizing. Users can now harness ChatGPT's power to compose entire documents from scratch, streamlining writing tasks and boosting productivity.
Visual Intelligence at a Glance: On iPhone 16, users can access a new “visual intelligence” mode, powered by ChatGPT. By activating this feature through a long press on the Camera Control button, users can point their camera at objects or scenes, and ChatGPT will analyze and provide information about what it sees. This feature transforms the iPhone into a powerful tool for exploration and learning, bringing a new dimension of real-world interaction to ChatGPT.
Seamless Continuity Across Apple Devices: The integrations span across iPhones, iPads, and Macs, ensuring a consistent and frictionless ChatGPT experience regardless of the device in use. Users can invoke ChatGPT from various apps and contexts within the Apple ecosystem, experiencing AI assistance that adapts to their needs and preferences.
OpenAI's collaboration with Apple signals a clear commitment to democratizing AI and making it an integral part of everyday life. These integrations empower users to interact with ChatGPT in more natural and intuitive ways, paving the way for broader adoption of AI and unlocking its potential across a wide range of applications.
My Perspective
As an iPhone user, I like configuring it to use my ChatGPT account so it can tap into memory. The Visual Intelligence feature uses the dedicated Camera Control button on the lower-right side of the device. I love ChatGPT’s image reasoning capability and use it often.
📕 OpenAI FAQ on Apple Intelligence and Visual Intelligence
Day 6: Advanced Voice Mode
Day 6 of OpenAI's 12 Days of OpenAI brought the highly anticipated launch of live video and screen-sharing capabilities within ChatGPT's Advanced Voice mode. These new features elevate the conversational experience within ChatGPT, making interactions more engaging, versatile, and personalized.
Features
Face-to-Face Interaction: Users can now have live video calls with ChatGPT, enabling more personal and expressive conversations. Nonverbal cues, facial expressions, and visual context add a new layer of depth and understanding to the interaction.
Real-Time Collaboration: Screen sharing capabilities allow users to share their screen content with ChatGPT in real-time. This feature unlocks various collaborative possibilities, from troubleshooting technical issues to brainstorming creative projects.
Use Cases
Personal Use Cases:
Remote Learning and Tutoring: Students can receive personalized tutoring from ChatGPT, sharing their screen to work through problems together.
Creative Brainstorming: Friends or family members can brainstorm ideas for projects, using video and screen sharing to visualize concepts and share inspiration.
Business Use Cases:
Technical Support: Customers can receive real-time technical assistance from ChatGPT, sharing their screen to demonstrate issues and receive guidance.
Sales and Product Demos: Sales professionals can use video and screen sharing to provide interactive product demonstrations to potential customers.
Video and screen sharing in Advanced Voice mode is gradually rolling out to Team, Plus, and Pro users.
These new additions to ChatGPT demonstrate OpenAI's commitment to pushing the boundaries of conversational AI. By incorporating visual and personalized elements, OpenAI is transforming ChatGPT into a more versatile and engaging platform for both personal and professional interactions.
My Perspective
I love voice conversation and find it an easy and friendly way to collaborate with ChatGPT. Sharing real-time video from your iPhone opens up even more ways to collaborate as you navigate the world.
📕 OpenAI instructions to set up Voice mode.
Day 7: Projects
Day 7 of OpenAI's 12 Days of OpenAI introduced a significant organizational feature to ChatGPT: Projects. This new functionality empowers users to manage their conversations, files, and custom instructions within ChatGPT, making it a more robust and personalized tool for various tasks.
Projects: Your Personalized ChatGPT Workspace
Conversation Organization: Projects act as folders for organizing ChatGPT conversations, allowing users to group related chats and easily access past interactions.
File Uploads and Management: Users can upload files directly to their projects, providing context and reference materials for ChatGPT. This feature makes working with specific documents, data sets, or code easier within a dedicated project space.
Custom Instructions for Tailored Responses: Users can set project-specific instructions for ChatGPT, influencing its tone, style, and behavior within that project. This customization feature ensures ChatGPT's responses align with each project's specific goals and requirements.
Use Cases
Personal Use Cases:
Home Management: Create a "Home Maintenance" project to track repairs, store appliance manuals, and ask ChatGPT for guidance on specific tasks, like replacing a water filter.
Event Planning: Organize a "Secret Santa" project to manage participant lists, track gift ideas, and use ChatGPT to generate emails, assign gift givers, and suggest ideas based on web searches.
Creative Writing: Establish a "Novel Writing" project to store character profiles and plot outlines and use ChatGPT for brainstorming, generating dialogue, or even composing entire scenes based on uploaded reference materials.
Business Use Cases:
Software Development: Set up a "Website Development" project to store code files and design mockups, and use ChatGPT to generate code snippets, debug issues, or even write documentation.
Marketing Campaigns: Create a "Product Launch" project to store marketing materials and target audience data, and use ChatGPT to generate social media posts, write ad copy, or analyze market trends based on web searches.
Research Projects: Organize a "Scientific Research" project to upload relevant papers and data sets, and use ChatGPT to summarize findings, generate research questions, or explore related concepts through web searches.
Availability and Rollout:
Projects are being rolled out to Plus, Pro, and Team users, with availability for free users and Enterprise/Edu plans in the near future.
The introduction of Projects marks a significant step in transforming ChatGPT from a conversational AI into a versatile and personalized workspace. By combining the power of organization, file management, and tailored instructions, Projects empower users to tackle more complex and multifaceted tasks, further blurring the lines between a simple chatbot and a powerful AI assistant.
My Perspective
This feature is similar to Projects in Anthropic’s Claude, and it’s a must-have for organizing the personal information assets you want ChatGPT to access.
Day 8: Search
Day 8 of OpenAI's 12 Days of OpenAI celebrated the expansion of ChatGPT Search, a feature that empowers ChatGPT to access and process real-time information from the web, making it a more informed and comprehensive AI assistant. The key announcements focused on improved functionality, expanded access, and seamless integration with other ChatGPT features.
ChatGPT Search: A More Intelligent Way to Explore the Web
Enhanced Performance and Mobile Experience: OpenAI introduced several improvements to ChatGPT Search based on user feedback. These enhancements include:
Faster search results for a more efficient user experience.
Optimized mobile experience, making search more intuitive and user-friendly on iPhone and Android devices.
Enhanced Maps integration, providing richer visual results and seamless navigation within the ChatGPT interface, especially useful for finding local businesses and exploring locations.
Seamless Integration with Advanced Voice Mode: ChatGPT Search is now integrated with Advanced Voice mode, enabling users to initiate web searches using voice commands. This integration further streamlines the interaction with ChatGPT, making it a more natural and hands-free experience.
Expanded Access for All Logged-in Users: ChatGPT Search, previously available only to paid users, is now accessible to all users who have created a free account and are logged in. This expansion significantly broadens the reach of this powerful feature, making it available to a wider audience.
Use Cases
Finding Local Events and Activities: Users can ask ChatGPT to find events in their city, like concerts, festivals, or even holiday markets, receiving up-to-date information, visual results, and direct links to relevant websites.
Discovering Restaurants and Businesses: Users can search for restaurants or businesses based on specific criteria, like cuisine type, outdoor seating, or even the availability of heaters, receiving comprehensive information, images, and map integrations directly within ChatGPT.
Planning Trips and Vacations: Users can ask ChatGPT to research travel destinations, find attractions, and get real-time information like weather forecasts and event schedules, streamlining the planning process and accessing everything they need in one place.
OpenAI’s expansion of ChatGPT Search underscores its commitment to making AI a more accessible and integrated tool for daily life. By combining the power of conversational AI with real-time web information, OpenAI is transforming ChatGPT into a more versatile and indispensable tool for learning, exploration, and accomplishing tasks.
My Perspective
Over the last few days, I have used the search tool to help me evaluate products I was looking to buy on Amazon. It felt more intuitive to conduct this research in a collaborative conversation. That said, I don’t expect it to be as powerful as native Google Search within the Google Gemini assistant.
Day 9: Dev Day
Day 9 of OpenAI's 12 Days of OpenAI, themed as a "Mini Dev Day," focused on empowering developers and startups building applications with OpenAI's API. Several new models, features, and developer tools were announced, enhancing the capabilities and accessibility of OpenAI's technology for a wider range of use cases.
OpenAI o1 Out of Preview and Enhanced with New Features
General Availability: OpenAI o1, which replaces o1-preview, is now fully available in the API, bringing its advanced reasoning capabilities to production applications.
Function Calling: This feature allows developers to describe functions to the model, and have it intelligently choose to call those functions, returning structured data. This enables the creation of more interactive and dynamic applications.
Structured Outputs: Developers can now specify the format of the model's output, ensuring consistency and making it easier to integrate with other systems. This feature is particularly useful for applications requiring data in a specific format, like JSON.
Developer Messages: Building on the instruction hierarchy concept, developer messages allow developers to provide high-level instructions to steer the model's behavior. These messages are distinct from user messages and provide a clear way to guide the model's responses.
Reasoning Effort: A new parameter that controls how much time the model spends "thinking" before responding. This allows developers to optimize for cost and speed, using less compute for simpler tasks and more for complex ones.
Vision Inputs: o1 can now process images as input, opening up new possibilities for applications in fields like manufacturing, science, and education.
Potential Use Cases for these features:
Building AI-Powered Chatbots: Function calling and structured outputs can create more interactive and data-driven chatbot experiences. For example, a chatbot for a restaurant could use function calling to handle reservations or check menu availability, and structured outputs to return data in a format easily used by other systems (see the sketch after this list).
Automating Data Entry and Analysis: Vision inputs, combined with structured outputs, can extract information from images, like forms or documents, and convert it into structured data.
Creating Educational Tools: Vision inputs can create interactive learning experiences, allowing students to ask questions about images, receive explanations, or even generate different artistic interpretations.
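As a rough sketch of the restaurant chatbot example above, here is how function calling and the new reasoning-effort parameter might look with the OpenAI Python SDK. The model name, tool definition, and messages are illustrative assumptions, not code from the announcement:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hypothetical function the model can choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "check_reservation_availability",
        "description": "Check open reservation slots for the restaurant.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
                "party_size": {"type": "integer"},
            },
            "required": ["date", "party_size"],
        },
    },
}]

response = client.chat.completions.create(
    model="o1",  # assumed model name; check the models available to your account
    reasoning_effort="low",  # a simple request doesn't need extended "thinking"
    messages=[
        {"role": "developer", "content": "You are a booking assistant. Use tools when needed."},
        {"role": "user", "content": "Can you get me a table for 4 on 2025-01-03?"},
    ],
    tools=tools,
)

# If the model chose to call the function, its arguments come back as structured JSON.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```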
Real-Time API Enhancements
WebRTC Support: This new feature simplifies building real-time voice experiences with OpenAI, enabling low-latency audio streaming, echo cancellation, and other benefits of WebRTC. Developers can build applications like custom voice assistants or real-time voice translation tools with significantly less code and complexity.
Reduced Costs: GPT-4o audio tokens are now 60% cheaper, and GPT-4o mini audio tokens are 10x cheaper, making real-time voice applications more accessible.
Python SDK: A new Python SDK streamlines the integration of real-time voice capabilities into applications (a minimal sketch follows this list).
Improved Function Calling and Guardrails: API changes make it easier to use function calling and guardrails within real-time voice applications.
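Here is a minimal sketch of a text-only realtime session with the beta Python SDK, closely following the SDK's documented usage. Note that the Python SDK connects over WebSocket, while the new WebRTC support targets browser and client-side apps; the model name is an assumption:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def main() -> None:
    # The Realtime API also streams audio; text keeps this sketch simple.
    async with client.beta.realtime.connect(
        model="gpt-4o-realtime-preview"  # assumed model name
    ) as connection:
        await connection.session.update(session={"modalities": ["text"]})
        await connection.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello in one sentence."}],
            }
        )
        await connection.response.create()
        async for event in connection:
            if event.type == "response.text.delta":
                print(event.delta, end="", flush=True)
            elif event.type == "response.done":
                break

asyncio.run(main())
```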
Examples of potential applications:
Creating voice-controlled smart devices: Developers can integrate ChatGPT's real-time voice capabilities into smart devices, enabling natural and intuitive voice interactions.
Building immersive gaming experiences: Real-time voice interactions can enhance games with AI-powered characters or provide real-time feedback and guidance to players.
Preference Fine-Tuning for Enhanced Model Customization
Direct Preference Optimization: This new fine-tuning technique allows developers to train models to better align with user preferences. Instead of providing exact input-output pairs, developers provide pairs of responses, indicating which one is preferred. This allows for fine-tuning on more abstract qualities like helpfulness, conciseness, or creativity.
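To illustrate, here is what a single preference training record might look like, written out with Python. The field names follow OpenAI's documented preference format; the content is invented for the example:

```python
import json

# One preference record: a prompt plus a preferred and a non-preferred
# completion. The model learns to favor the qualities of the preferred one.
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our return policy."}
        ]
    },
    "preferred_output": [
        {"role": "assistant",
         "content": "Returns are accepted within 30 days with a receipt."}
    ],
    "non_preferred_output": [
        {"role": "assistant",
         "content": "Our return policy, which was established many years ago, reflects our deep and abiding commitment to customer satisfaction..."}
    ],
}

# Preference data is uploaded as JSONL, one record per line.
with open("preference_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```

Here the preferred answer is rewarded for being concise, matching the kind of abstract quality, like conciseness or helpfulness, that this technique tunes for.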
Potential Use Cases:
Tailoring Customer Support Chatbots: Train chatbots to provide concise and relevant responses, aligning with a company's preferred tone and style.
Personalizing Content Creation: Fine-tune models to generate content in a specific voice or style, for example, for marketing copy or creative writing.
Improving Content Moderation: Train models to better identify and flag content that violates specific guidelines.
Additional Developer Resources
New SDKs for Go and Java: Official SDKs for Go and Java are now available, expanding the language support for OpenAI's API.
Simplified Login and API Key Access: A streamlined process makes it easier for developers to sign up, get an API key, and start building.
Dev Day Talks on YouTube: Talks from previous Dev Day events are now available on YouTube, providing valuable insights and inspiration for developers.
Developer AMA: An Ask Me Anything session on the OpenAI Developer Forum allows developers to engage directly with OpenAI's team and get answers to their questions.
Day 9's announcements represent a significant investment in OpenAI's developer ecosystem, providing new tools and capabilities to unlock the potential of AI across a diverse range of applications. By making its technology more accessible and customizable, OpenAI is fostering a growing community of developers pushing the boundaries of what's possible with AI.
My Perspective
These features are targeted at the developer audience, and I can see real value in the real-time API being integrated into products and services. It will compete with Google Gemini’s Multimodal Live API.
Day 10: 1-800-CHAT-GPT
OpenAI's Day 10 announcement was about making ChatGPT as accessible as possible. Building on the existing web, iOS, Android, Mac, and Windows platforms, OpenAI expanded ChatGPT's reach to even more users, particularly those who may not have regular internet access.
Features
ChatGPT via Phone Call: OpenAI launched a dedicated phone number, 1-800-CHAT-GPT (1-800-242-8478), allowing users in the US to interact with ChatGPT through voice calls.
ChatGPT on WhatsApp: Expanding its global reach, OpenAI also launched ChatGPT on WhatsApp, enabling users worldwide to message ChatGPT.
Use Cases
The primary use case emphasized the ability to use ChatGPT without a reliable data connection. This allows people with limited internet access or those in areas with poor connectivity to still benefit from ChatGPT's capabilities.
My Perspective
While the previous days focused on cutting-edge advancements, Day 10 took a step back to address a fundamental issue: accessibility. Making ChatGPT available through phone calls and WhatsApp is a brilliant move to tap into the 300M+ global audience.
Day 11: Work with Apps
Day 11 of OpenAI's "12 Days of OpenAI" showcased the evolving role of ChatGPT, moving beyond simple chat interactions towards a more agentic and action-oriented assistant. The focus was on OpenAI's desktop apps (Mac and Windows) and their ability to interact directly with other applications on the user's computer. This integration enables ChatGPT to understand the context of the user's work and automate tasks within those applications.
Features
"Work With Apps" Feature: This feature allows ChatGPT to directly access and interact with other applications on the user's computer with their explicit permission. Instead of manual copy-pasting, ChatGPT can now pull in context directly from the active application, making the workflow much smoother.
Enhanced Desktop App Experience: OpenAI highlighted the benefits of its native desktop apps, emphasizing their lightweight performance, dedicated window space, and keyboard shortcut (option+space) for quick access, making ChatGPT a more seamless part of the user's workflow.
Warp Integration: Responding to user demand, OpenAI announced integration with the Warp terminal, demonstrating ChatGPT's ability to understand code and generate commands directly within the terminal environment.
Integration with Other ChatGPT Features: The "Work With Apps" feature seamlessly integrates with other ChatGPT capabilities, like Advanced Data Analysis. This allows users to leverage these features directly within the context of their applications. For example, a user can ask ChatGPT to generate a chart from data in a terminal window.
Support for Xcode and IDEs: Demonstrating broader application support, OpenAI showed how ChatGPT can interact with Xcode, a popular IDE, pulling in code context and generating code modifications.
Working with Notion: Moving beyond programming, OpenAI showcased ChatGPT's ability to work with Notion, a note-taking and project management application. ChatGPT was able to access content in a Notion document, conduct web research, and generate text tailored to the style of the document.
Advanced Voice Mode Integration: Taking interaction a step further, OpenAI integrated its Advanced Voice mode with the "Work With Apps" feature. This allows users to have voice conversations with ChatGPT to get insights and perform actions within their documents and applications. They even demonstrated this with a "Santa Mode" voice interaction.
My Perspective
The "Work With Apps" feature feels like a significant leap forward. It transforms ChatGPT from a helpful tool into a more integrated and active participant in my workflow. I tested this out with the VS Code app and ChatGPT could evaluate my Python code as I asked questions. I could see how this could accelerate coding tasks or make non-programmers more capable coding prototypes.
I can imagine a future where ChatGPT proactively suggests code improvements, drafts sections of my documents, or even automates repetitive tasks within my applications. The integration with Advanced Voice Mode adds another layer of intrigue—it feels almost like conversing with an AI colleague while I work!
Day 12: OpenAI o3 and o3-mini
The grand finale of OpenAI's 12 Days event brought us a glimpse into the future of reasoning models with the announcement of OpenAI o3 and OpenAI o3-mini. While not publicly available, these models showcase significant advancements in reasoning capabilities, particularly in coding.
Features
OpenAI o3: A Leap Forward in Reasoning: OpenAI o3 boasts exceptional performance on challenging technical benchmarks, particularly in coding. It excels in tasks like code generation, understanding complex instructions, and reasoning through multi-step problems.
OpenAI o3-mini: Cost-Efficient Reasoning: Alongside o3, OpenAI also introduced o3-mini, a smaller and more efficient model designed to make advanced reasoning capabilities more accessible. It supports features like function calling, structured outputs, and developer messages, offering developers a cost-effective alternative for integrating reasoning into their applications.
Benchmark Performance: OpenAI highlighted o3's remarkable performance on various benchmarks, including:
Coding: o3 scores 71.7% on SWE-bench Verified, well above o1, and reaches a Codeforces rating of 2727, showcasing its advanced code generation and problem-solving abilities.
Math: o3 scores 96.7% on AIME 2024 and solves 25.2% of problems on EpochAI’s FrontierMath benchmark, far beyond prior models.
Reasoning: o3 excels at reasoning tasks, scoring 87.7% on GPQA Diamond and 87.5% on the ARC-AGI semi-private evaluation, a dramatic jump over o1.
Multimodal Capabilities: Similar to its predecessors, o3 is multimodal, capable of understanding and processing text and images. This enables it to tackle problems that require visual reasoning, such as interpreting diagrams or analyzing images.
Adaptive Thinking Time: o3-mini supports adaptive thinking time, a feature allowing users to control how much time the model spends reasoning. This allows for fine-tuning performance and cost based on the complexity of the task.
Public Safety Testing: Unlike previous model releases, OpenAI is taking a new approach to safety. While o3 and o3-mini aren't publicly available yet, OpenAI is opening access to these models for public safety testing by researchers and security experts. This collaborative approach identifies and mitigates potential risks before a wider release.
Deliberative Alignment for Safety: OpenAI introduced a new safety technique called "Deliberative Alignment." This technique leverages the model's reasoning capabilities to enhance its ability to identify and reject unsafe prompts or those with harmful intent.
OpenAI’s Roadmap
You can apply for early access to conduct safety testing until January 10, 2025.
o3-mini is planned for launch at the end of January 2025, with o3 to follow shortly after.
My Perspective
The unveiling of o3 and o3-mini marks another significant milestone in OpenAI’s pursuit of advanced reasoning models.
Critical takeaway: we need to shift our mindsets and career strategies to accept that we will have world-class PhD experts in our pockets, available 24x7, 365 days a year. I can also see these PhDs serving as worker agents in multiagent workflows that will dramatically accelerate innovation and work processes.
What will be your unique competitive advantage once o3 is released next year?