This might be a dumb question, but is it possible to create images through the API with a prompt plus example images? When I go to the API reference and open Images, it says I can give a prompt and/or an input image and the model will generate a new image. But when I ask ChatGPT whether I can do this, it says: Here’s what OpenAI’s API actually supports:
Text-to-Image (prompt only):
You send just a prompt → get a generated image.
Image Inpainting (prompt + image + mask):
You send an input image, a mask (to show what part to change), and a prompt.
The model generates new content only for the masked area.
This is image editing, not full image-to-image transformation.
Image Variations (image only):
You send an image → it returns multiple variations (same general content/style).
For image input along with a prompt, the image model that ChatGPT itself uses is called gpt-image-1, and it is available on the edits API endpoint, where it accepts input images that you can refer to in the prompt. Then it is just up to the language model to understand your request and not deny it under its safety policies.
While you can use DALL-E 2 to infill or outfill transparent areas (what ChatGPT is talking about) right after funding an API account, the gpt-4o-based gpt-image-1 model requires the ID verification that OpenAI has put in front of several features. A third-party company forces you to show both sides of a government ID to a webcam and then take video selfies of yourself, all of which can then be used for the company’s own undisclosed interests in perpetuity.
An output image with a few input images can range from $0.05 to $0.30 per image - even if the result is denied to you by another round of vision inspection.
So if you are already a ChatGPT subscriber and can’t figure out a way to profit off that as a product, I would avoid the API model. Just talk to ChatGPT with a few uploaded images and get the same personal results.
Try here: Whisk - any Google account, multiple input images that you can categorize by style or subject; better results. Your competition and OpenAI’s.
But I need the API model because I want to use it in software. If I use the edits API endpoint and add a mask where the whole image is transparent, would that work the same as in ChatGPT to create a new image?
With gpt-image-1, you don’t need to use the mask form field, and gpt-image-1 doesn’t respect it or consume that input image anyway. The documentation that maintains that a mask works in any useful manner is a big fib.
You just send multiple input images and discuss what you want done. Or simply send one image and the ancillary purpose, as minor as, “I want this artist’s style” or “use this color palette”.
The image edits endpoint is the only one accepting image input plus prompt.
The image generate endpoint is purely text prompt without image in the prompting.
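As a rough sketch of what "prompt plus input images, no mask" looks like from software, here is a standard-library-only Python example that builds the multipart request for the edits endpoint. The helper names build_edit_request and save_first_image are made up for illustration. The repeated form field name image[] and the data[0].b64_json response field are my assumptions from the public API reference, so check them against the current docs before relying on this.

```python
import base64
import json
import mimetypes
import uuid
import urllib.request
from pathlib import Path

EDITS_URL = "https://api.openai.com/v1/images/edits"


def build_edit_request(prompt, image_paths, api_key, model="gpt-image-1"):
    """Build a multipart/form-data POST for the edits endpoint (no mask field)."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields: the model name and the instruction text.
    for name, value in (("model", model), ("prompt", prompt)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    # One "image[]" part per reference image (field name is an assumption).
    for path in map(Path, image_paths):
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="image[]"; '
            f'filename="{path.name}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode()
        parts.append(header + path.read_bytes() + b"\r\n")

    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        EDITS_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def save_first_image(response_body, out_path):
    """Decode data[0].b64_json (assumed response shape) and write it to disk."""
    payload = json.loads(response_body)
    Path(out_path).write_bytes(base64.b64decode(payload["data"][0]["b64_json"]))


# Hypothetical usage (costs money, needs a verified account and a real key):
# req = build_edit_request("Draw the subject in this artist's style",
#                          ["subject.png", "style.png"], "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     save_first_image(resp.read(), "result.png")
```

Nothing here sends a mask. The prompt simply talks about the attached images, which is the same thing you would do when uploading images in ChatGPT.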
You can also use the Responses “chat” endpoint with an image creation tool, but this is an extra wasteful layer between you and the image creation that is a massive cost amplifier, still requiring ID verification to do what ChatGPT will do for any free account.