Text-to-Image AI Systems
A text-to-image system uses artificial intelligence to produce an image based solely on a text prompt supplied by a human user. Programs will yield different stylistic interpretations of the same text, depending on the system and the depth and breadth of the data it uses. One of the most powerful examples to date is Google's Imagen, known for its remarkable photorealism as well as its impressive degree of linguistic interpretation. Its research team published a paper in May 2022 detailing key discoveries made to improve the quality of the system's photorealistic image generation, accompanied by striking examples of images created from text prompts.
Another well-known text-to-image system is OpenAI's DALL·E, introduced in January 2021, whose original capabilities, according to its website, included "creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images."
Systems such as Imagen and DALL·E are backed by well-funded companies, and their abilities are increasing rapidly with ongoing research and development. For instance, DALL·E's successor, DALL·E 2, was announced the following year in April 2022 and can already generate images of significantly greater resolution and accuracy, four times as detailed as the original system.
Public Use and Open-Sourcing
Google has thus far released neither code nor a public demo for Imagen, citing concerns about potential misuse. One area of its ongoing research is the creation of a responsible-use framework that the team claims will eventually allow an unrestricted open-sourcing of the program.
OpenAI has voiced similar caution regarding risks, but DALL·E does have a waitlist where anyone can sign up for a chance to try the program. As of July 20th, 2022, OpenAI has announced that one million users will be invited to create images with DALL·E for free using a set allotment of credits, with the opportunity to purchase additional credits.
The open-source policies of other existing text-to-image AI systems, such as Midjourney and Stable Diffusion, stand in stark contrast to those of Imagen and DALL·E; the latter of the two, Stable Diffusion, was just released to the public on August 22nd, 2022. For these companies, their text-to-image systems are completely open for the public at large to use freely. Both companies have stated that their purposes centre around developing tools to empower as many people as possible to be creators and to access this new technology.
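To make concrete just how open this access is, the following is a minimal sketch of generating an image from a text prompt with the publicly released Stable Diffusion weights, accessed here through the Hugging Face diffusers library; the model identifier, prompt, and parameter values are illustrative rather than prescriptive.

```python
# Minimal text-to-image sketch using the publicly released Stable Diffusion
# weights via the Hugging Face diffusers library (illustrative values only).
import torch
from diffusers import StableDiffusionPipeline

# Download the public checkpoint and move it to a GPU for practical speed.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photorealistic painting of a lighthouse at sunset"
# More inference steps and a higher guidance scale trade speed for image
# fidelity and closer adherence to the prompt.
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("lighthouse.png")
```

Anyone with a consumer GPU can run a script like this locally, with no waitlist, account review, or server-side moderation standing between the prompt and the output.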
Pledges of Responsibility and Current (Mis)Use
In support of their decision not to publicly release Imagen yet, Google researchers have conducted audits of Imagen's training datasets and note that they "tend to reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups".
DALL·E, for its part, claims to have removed explicit content from its training data so as to minimize the system's exposure to harmful, violent, or hateful concepts. OpenAI has also worked to prevent photorealistic generations of real people's faces, particularly citing concerns around public figures and celebrities. For the release of DALL·E 2, researchers from OpenAI also published a document summarizing the initial findings of a risk analysis, notably citing concerns with the generation of harmful content as well as biased training data.
OpenAI further requires the select few users of DALL·E to adhere to its content policy, which prohibits non-G-rated or harmful image creation and sharing. The policy also mandates honest disclosure about AI involvement and respect for the rights of others, covering both consensual image use and usage rights. The website further states that it will respond to violations of these rules, up to and including terminating the transgressor's account.
The open-access programs, however, have thus far been less proactive and more reactive regarding such concerns.
The team behind Stable Diffusion, like Midjourney, has stated that it intends for the program to be used ethically, but both have instituted only limited barriers preventing users from maliciously exploiting the vast capabilities of these programs. According to reports, upon the public release of Stable Diffusion, multiple threads of AI-generated deepfake nude celebrity photos appeared on the image-based forum 4chan, while Reddit has already had to ban numerous online communities dedicated to posting AI-created "Not Safe For Work" imagery. The CEO of Stability AI, Emad Mostaque, called the 4chan incident "unfortunate" and claimed the company was working on improved safety mechanisms.
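One example of the kind of safeguard already shipped with the open-source release is the post-generation safety checker bundled with the diffusers implementation of Stable Diffusion, which flags outputs it classifies as NSFW. The sketch below shows how those flags surface to the caller; note that, because the filter runs on the user's own machine, it can simply be disabled, which is part of why such barriers remain limited.

```python
# Sketch of the post-generation safety checker bundled with the diffusers
# implementation of Stable Diffusion; the prompt here is a placeholder.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

result = pipe("a placeholder prompt")
# Flagged outputs are replaced with blank images and reported via a parallel
# list of booleans returned alongside the generated images.
for image, flagged in zip(result.images, result.nsfw_content_detected):
    if flagged:
        print("Output was flagged by the safety checker and blacked out.")
    else:
        image.save("output.png")
```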
Paths to Policy
The current regulatory approach trends towards an enabling framework that leaves companies to tackle questions of both access and risk management. The lack of restriction may prove positive with respect to companies such as Google and OpenAI, both of whom model a more sustainable approach to accountability within a framework of experimentation and innovation. Their comprehensive research efforts, alongside their conservative distribution policies, facilitate close scrutiny of their systems' capabilities.
The weakness of the laissez-faire policy approach, however, becomes clearer when considering the consequences of unfettered access to these open-sourced systems. In the absence of intervention from law- and policymakers, open systems with unrestricted training data have implications for privacy violations through non-consensual image use and could easily exacerbate existing systemic biases if left unchecked. Unfiltered training data also raises questions of copyright: Stable Diffusion currently does not filter copyrighted works from its training data, which has already resulted in the imitation of distinct artistic styles, and artists have begun to speak out about such copying.
Though the technology (and public access to it) is still in its early days, its increasing capabilities and rapid open-sourcing ultimately demand that these questions be addressed sooner rather than later by law- and policymakers.
Artists of the Future
While the sudden onslaught of these powerful systems does demand close regulatory scrutiny, one cannot deny that using AI to enhance the creative arts opens plenty of doors as well. Greater accessibility to the arts is one such benefit: it could facilitate a world where anyone from any background can be a creator, especially given that AI has already been engaged in a range of other art forms.
The further one travels down this path, however, the more the term "AI-created art" becomes a non sequitur. As creation is delegated to the work of a tool, "art" increasingly becomes a trivial commodity, void of the human individuality that ultimately gives it its expressive and meaningful quality.
Indeed, instances of the "AI artist" have already been making waves. On August 29th, 2022, artist Jason Allen's entry, created in large part using Midjourney, took the $300 first prize in the fine arts category at the Colorado State Fair. The winning submission, a portrait of three long-robed figures staring into a heavenly landscape, was so detailed that its digital origins passed completely undetected by the judges. One of the judges, even after learning of the work's origins, claimed she would not have changed her ranking and stated that Allen "had a concept and a vision he brought to reality, and it's really a beautiful piece."
As we continue to engage with these highly proficient digital creators, so too must we be prepared to answer the questions that accountability demands and to confront the consequences born of artistic commodification.