Artificial Intelligence video generators are close to a tipping point
Generative AI could destroy civilization: Sam Altman, the GPT-4 frenzy, and how it is all coming together
The pace of change is insane. ChatGPT was released to the public just four months ago. Within two months, it reached 100 million users; TikTok, the internet’s previous instant sensation, took nine. Google, scrambling to keep up, has rolled out Bard, its own AI chatbot, and there are already various ChatGPT clones as well as new plug-ins that make the bot work with popular websites like Expedia and OpenTable. GPT-4, the new version of OpenAI’s model released last month, is both more accurate and “multimodal,” able to handle images as well as text. And the latest release of Midjourney, the source of the deepfake sensations of Donald Trump’s “arrest” and the Pope looking fly in a silver coat, made it clear that you will soon have to treat every single image you see with suspicion.
In the midst of this frenzy, I’ve now twice seen the birth of generative AI compared to the creation of the atom bomb. Strikingly, the comparison comes from people with opposing views about what it means.
One of them is the closest person the generative AI revolution has to a chief architect: Sam Altman, the CEO of OpenAI, who in a recent interview with The New York Times called the Manhattan Project “the level of ambition we aspire to.” The others are Aza Raskin and Tristan Harris, both of whom are famous for warning about the perils of social media. They are now going around warning that generative AI could destroy nothing less than civilization itself, by putting tools of awesome and unpredictable power in the hands of just about anyone.
Altman, to be clear, doesn’t disagree with Harris and Raskin that AI could destroy civilization. He just claims that he’s better-intentioned than other people, so he can try to ensure the tools are developed with guardrails—and besides, he has no choice but to push ahead because the technology is unstoppable anyway. It’s a strange mix of faith and fatalism.
For the record, I agree that the tech is unstoppable. But I think the guardrails being put in place so far are weak. It would be fairly trivial, for example, for companies like OpenAI or Midjourney to embed hard-to-remove digital watermarks in all their AI-generated images, making deepfakes like the Pope pictures easier to detect. A coalition called the Content Authenticity Initiative is doing a limited form of this; its protocol lets artists voluntarily attach provenance metadata to AI-generated pictures. But I doubt any of the big generative AI companies will join this kind of effort.
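To make the provenance idea concrete, here is a toy sketch of attaching metadata to a generated image. The Content Authenticity Initiative’s actual C2PA protocol embeds cryptographically signed manifests that are far harder to strip; the plain PNG text chunks below (and the file name and tool name in them) are hypothetical and trivially removable, so this only shows the shape of the idea.

```python
# Toy illustration of "attach provenance metadata to an AI image".
# NOT the CAI/C2PA protocol: real manifests are cryptographically signed.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

image = Image.open("generated.png")  # assumed: an AI-generated image on disk

provenance = PngInfo()
provenance.add_text("generator", "ExampleDiffusionModel v1")  # hypothetical tool name
provenance.add_text("ai_generated", "true")
provenance.add_text("prompt", "the pope in a silver puffer coat")
image.save("generated_tagged.png", pnginfo=provenance)

# Anyone inspecting the tagged file can read the claim back:
print(Image.open("generated_tagged.png").text)
```

The obvious weakness, and the reason watermarking would need to be baked into the generators themselves, is that metadata like this disappears with a simple re-save or screenshot.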
Every time you post a photo, respond on social media, make a website, or send an email, your data is collected and used to train generative AI that can create text, audio, video, and images with just a few words. This has real consequences: OpenAI researchers studying the labor market impact of their language models estimated that approximately 80 percent of the US workforce could have at least 10 percent of their work tasks affected by the introduction of large language models (LLMs) like ChatGPT, while around 19 percent of workers may see at least half of their tasks impacted. The labor market shift is already underway with image generation: the data you created may be putting you out of a job.
When a company builds its technology on data drawn from the open internet, there is a case for making that technology available to everyone. But critics have noted that GPT-4 lacked any clear information or specifications that would enable anyone outside the organization to replicate, test, or verify any aspect of the model. Some of these companies have also received vast sums of funding from other major corporations to build commercial products. For some in the AI community, this is a dangerous sign that these companies are going to seek profits above public benefit.
Rumman Chowdhury is currently a researcher at Harvard University and at the Minderoo Centre for Technology and Democracy. Previously, she was the director of Machine Learning Ethics, Transparency, and Accountability at Twitter.
Code transparency alone seems unlikely to ensure that generative AI models serve the public good. There is little obvious benefit to a journalist, policy analyst, or accountant if the data underpinning an LLM is made available. Laws like the Digital Services Act would require some of these companies to open their code and data to expert auditors. And open source code can sometimes enable malicious actors, allowing hackers to subvert the safety precautions that companies are building in. Transparency is a laudable objective, but that alone won’t ensure that generative AI is used to better society.
In the nuclear proliferation era after World War II, for example, there was a credible and significant fear of nuclear technologies gone rogue. Much like many of the discussions we are having today, that fear rested on the belief that society had to act collectively to avoid global disaster. In response, countries around the world, led by the US and under the guidance of the United Nations, convened to form the International Atomic Energy Agency (IAEA), an independent body free of government and corporate affiliation that would provide solutions to the far-reaching ramifications and seemingly infinite capabilities of nuclear technologies. It operates in three main areas: nuclear energy, nuclear safety and security, and safeguards. It has provided critical resources, education, testing, and impact reports, and helped to ensure ongoing nuclear safety in the aftermath of the Fukushima disaster. However, the agency is limited: It relies on member states to voluntarily comply with its standards and guidelines, and on their cooperation and assistance to carry out its mission.
Facebook’s Oversight Board is another attempt to balance transparency with accountability. Its members are an interdisciplinary global group, and its judgments, such as overturning Facebook’s removal of a post that depicted sexual harassment in India, are binding. This model isn’t perfect either; there are accusations of corporate capture, as the board is funded solely by Meta, can only hear cases that Facebook itself refers, and is limited to content takedowns rather than addressing more systemic issues such as algorithms or moderation policies.
But you only need to look at how advanced images from Midjourney and Dream Studio are now to sense where AI video is heading—and how difficult it may become to distinguish real clips from fake ones. Of course, people can already manipulate videos with existing technology, but it’s still relatively expensive and difficult to pull off.
Craiyon began as a free, open source version of OpenAI’s DALL-E 2 image generator, access to which OpenAI had restricted. The tool gave many people their first glimpse of what a computer could do with a text prompt. Since then, DALL-E has been opened to everyone, and programs like Midjourney and Dream Studio have developed and honed similar tools, making it relatively trivial to craft complex and realistic images with a few taps on a keyboard.
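A minimal sketch of just how low the barrier has become: a few lines of Python and an open source model turn a sentence into a realistic-looking picture. This uses Stable Diffusion via the diffusers library as a stand-in, since Midjourney, DALL-E, and Dream Studio are reached through their own apps and APIs; the model ID and prompt are illustrative assumptions, and a GPU is assumed.

```python
# Text-to-image with an open source diffusion model (illustrative, not Midjourney/DALL-E).
import torch
from diffusers import StableDiffusionPipeline

# Download and load a pretrained text-to-image model onto the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One sentence in, one plausible-looking photograph out.
image = pipe("a photorealistic photo of an ornate golden chest at a suburban yard sale").images[0]
image.save("yard_sale.png")
```

The only real costs are a consumer GPU and a minute of compute, which is precisely why these images are suddenly everywhere.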
As engineers have tweaked the algorithmic knobs and levers behind these image generators, added more training data, and paid for more GPU chips to run everything, these image-making tools have become incredibly good at faking reality. To take a few examples from a subreddit dedicated to strange AI images, check out Alex Jones at a gay pride parade or the Ark of the Covenant at a yard sale.
Widespread access to this technology, and its sophistication, forces us to rethink how we view online imagery, as was highlighted after AI-made images purporting to show Donald Trump’s arrest went viral last month. The incident led Midjourney to announce that it would no longer offer a free trial of its service—a fix that might deter some cheapskate bad actors but leaves the broader problem untouched.
Source: https://www.wired.com/story/ai-video-generators-are-nearing-a-crucial-tipping-point/
Video: Leona walking through a cloudscape, an existing clip restyled with Runway ML’s new video tool.
Runway ML, a startup that’s developing AI tools for professional image and video creation and editing, this week launched a new, more efficient technique for applying stylistic changes to videos. I used it to turn an existing video into this dreamlike footage of my cat, Leona, walking through a “cloudscape,” in just a few minutes.
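For a rough sense of what “applying a text-described style to existing footage” involves, here is a crude sketch that runs an image-to-image diffusion model over every frame of a clip. This is not Runway’s actual technique (its tool is a dedicated video model built for temporal consistency); the folder names, prompt, and the use of the open source diffusers library are assumptions for illustration.

```python
# Naive per-frame video stylization with an image-to-image diffusion model.
# Assumes frames were extracted to ./frames/ (e.g., with ffmpeg) and a GPU is available.
import glob
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat walking through a dreamlike cloudscape, soft pastel light"

for i, path in enumerate(sorted(glob.glob("frames/*.png"))):
    frame = Image.open(path).convert("RGB").resize((512, 512))
    # strength controls how far the output is allowed to drift from the original frame
    styled = pipe(prompt=prompt, image=frame, strength=0.4, guidance_scale=7.5).images[0]
    styled.save(f"styled/{i:05d}.png")

# Reassemble the frames into a video afterwards, e.g. with:
#   ffmpeg -r 24 -i styled/%05d.png styled.mp4
```

Stylizing frames independently like this produces visible flicker from one frame to the next, which is exactly the problem purpose-built video models such as Runway’s are designed to solve.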
Different machine learning techniques can open new possibilities. A company called Luma AI, for instance, is using a technique known as neural radiance fields to turn 2D photographs into detailed 3D scenes. Feed a few snapshots into the company’s app, and you’ll have a fully interactive 3D scene to play with.
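The core idea behind neural radiance fields is compact enough to sketch: a small neural network maps a 3D position and viewing direction to a color and a density, and the colors sampled along each camera ray are blended into a pixel. The PyTorch toy below is only a conceptual illustration of that data flow, not Luma AI’s system, which adds positional encodings, hierarchical sampling, camera-pose estimation, and training against real photos.

```python
# Conceptual sketch of a neural radiance field (NeRF): MLP + volume rendering along one ray.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # Input: 3D position + 3D view direction -> RGB color + volume density.
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, positions, directions):
        out = self.mlp(torch.cat([positions, directions], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # colors in [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, near=0.0, far=4.0, n_samples=64):
    """Blend colors sampled along one camera ray (classic volume rendering)."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction        # (n_samples, 3) sample positions
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = t[1] - t[0]                             # spacing between samples
    alpha = 1.0 - torch.exp(-sigma * delta)         # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                         # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)      # final pixel color

model = TinyNeRF()
pixel = render_ray(model, origin=torch.zeros(3), direction=torch.tensor([0.0, 0.0, 1.0]))
print(pixel)  # an (untrained) RGB prediction for a single ray
```

Train a network like this against a handful of photos of a real object and you get the kind of fully explorable 3D scene the app produces.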
For now, the instinct to trust video clips is mostly reliable, but it might not be long before the footage we see is less solid and truthful than it once was.