Two approaches to crafting prompts that produce great images from text-to-image models.
I've spent the last few days playing around with running Stable Diffusion (SD) on my M1 Mac. I used some of the fixes from this GitHub thread to run it and leverage the M1 GPU via MPS (Metal Performance Shaders) in PyTorch; a sketch of the setup follows the list below. I've generally found two good strategies for writing prompts:
- Look at the training data.
- Look at the input/output pairs.
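Before getting to those, here's roughly the setup. This is a minimal sketch using Hugging Face's diffusers library, assuming a PyTorch build with MPS support; the model ID and the warm-up workaround reflect common advice at the time and may differ for you:

```python
import torch
from diffusers import StableDiffusionPipeline

# Use the M1 GPU via the MPS backend when available, else fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to(device)

# Attention slicing trades a little speed for lower memory pressure,
# which helps on Apple Silicon.
pipe.enable_attention_slicing()

# A one-step warm-up pass was a commonly recommended workaround for a
# first-inference bug on the MPS backend.
_ = pipe("warmup", num_inference_steps=1)

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```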
The training data. SD was trained on image-text datasets collected by LAION (the Large-scale Artificial Intelligence Open Network, a non-profit), with most of the data coming from Common Crawl. Two useful ways to explore it, both queryable from code as sketched after this list:
- Search engine over the dataset's image-text pairs - https://rom1504.github.io/
- Sortable, filterable view of the LAION-Aesthetic subset (a Datasette instance) - https://laion-aesthetic.datasette.io/laion-aesthetic-6pls
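The search UI is backed by the clip-retrieval project, which also ships a small Python client, so you can query the index programmatically. A sketch; the hosted endpoint and index name here are assumptions and may have changed:

```python
# pip install clip-retrieval
from clip_retrieval.clip_client import ClipClient

# The service URL and index name are assumptions; check the
# clip-retrieval README for the currently hosted values.
client = ClipClient(
    url="https://knn.laion.ai/knn-service",
    indice_name="laion5B-L-14",
    num_images=10,
)

# Text query against the LAION index; each result is a dict with
# keys like "caption", "url", and "similarity".
for result in client.query(text="portrait in the style of Alphonse Mucha"):
    print(f'{result["similarity"]:.3f}  {result["caption"]}')
```

Searching for an artist or style this way is a quick check on how well it's actually represented in the training data.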
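The Datasette instance exposes the same tables as JSON, which makes the dataset scriptable. A sketch using Datasette's standard table API; the images table and text column are assumptions about this instance's schema:

```python
import requests

BASE = "https://laion-aesthetic.datasette.io/laion-aesthetic-6pls"

# Datasette table endpoints accept filters like <column>__contains;
# the "images" table and "text" column are assumed schema names here.
resp = requests.get(
    f"{BASE}/images.json",
    params={"text__contains": "greg rutkowski", "_size": 5, "_shape": "array"},
    timeout=30,
)
resp.raise_for_status()

for row in resp.json():
    print(row.get("text"), "->", row.get("url"))
```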
The input/output pairs. Another service, Lexica, is a search engine seeded with over 5 million SD prompt-image pairs collected from the Stable Diffusion Discord. It's useful for figuring out which artists and concepts the model understands; a sample query against its API is sketched below.
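Lexica also has a simple public search endpoint, which is handy for pulling prompts in bulk. A sketch; the endpoint and response shape are assumptions from Lexica's docs at the time and may have changed or gained authentication since:

```python
import requests

# The /api/v1/search endpoint and the "images"/"prompt" response fields
# are assumptions based on Lexica's public docs at the time.
resp = requests.get(
    "https://lexica.art/api/v1/search",
    params={"q": "isometric cutaway spaceship"},
    timeout=30,
)
resp.raise_for_status()

# Print the prompts behind the top few matching images.
for image in resp.json().get("images", [])[:5]:
    print(image.get("prompt"))
```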