Decoding AI Art Prompts: Why "Score_9" and Similar Tags Won't Get You a Better Image



⛔️ DO NOT USE Score_9, Score_8_Up, Score_7_Up, etc.

AI-powered image generation has surged in popularity, with models like FLUX, DALL-E 2, Stable Diffusion, and Midjourney producing highly realistic and imaginative images from simple text prompts. These tools let anyone create visual art with just a few words. However, understanding how these models work can help you write better prompts and, ultimately, generate better images.

🟥 What Are "Score_9, Score_8_Up" and Similar Terms?

You may have seen terms like “Score_9” or “Score_8_Up” in discussions about AI-generated images. These terms refer to internal scoring mechanisms used during the training of AI models, where the system assesses images based on various quality levels. For example:

  • "Score_9": Indicates the highest quality images during training.

  • "Score_5_Up": Refers to images of moderate quality, not as refined as those with a "Score_9."

The system uses these scores during training to fine-tune the model and help it differentiate between images of varying quality. Over time, this process leads to better, more accurate output when the model is fully trained.
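To make the idea concrete, here is a minimal conceptual sketch of how such quality tags are typically attached to training captions: each image gets an aesthetic rating, and cumulative "score" tags are prepended based on which thresholds the rating clears. This is an illustrative scheme, not the actual training code of any specific model.

```python
def score_tags(rating: float) -> list[str]:
    """Map a 0-10 aesthetic rating onto cumulative quality tags.

    Illustrative only: an image rated 8.4 clears the 8, 7, 6, 5, and 4
    thresholds, so it receives score_8_up through score_4_up.
    """
    tags = []
    if rating >= 9:
        tags.append("score_9")
    for threshold in range(8, 3, -1):  # score_8_up .. score_4_up
        if rating >= threshold:
            tags.append(f"score_{threshold}_up")
    return tags

def tag_caption(caption: str, rating: float) -> str:
    """Prepend the quality tags to a training caption."""
    return ", ".join(score_tags(rating) + [caption])
```

During training, the model learns to associate these tags with the quality bands of the images they were attached to; the tags only carry meaning for a model trained on data labeled this way.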

🟨 Why Including These Scores in Prompts Is Ineffective

While these scoring mechanisms are crucial during model training, they serve no purpose when included in user prompts. Here’s why:

  • 🚷 Scores Are Internal: These scores are part of the model’s training process and are not accessible or relevant to the end-user prompt system. When you include terms like "Score_9" or "Score_8_Up" in your prompt, the model does not understand them as it would a descriptive term. Instead, it may interpret them as arbitrary text, which could confuse the output and lead to unexpected or undesirable results.

  • ⚠️ Prompts Should Be Descriptive, Not Coded: The AI models work best when given clear, descriptive language. Including internal scoring jargon could dilute the clarity of your prompt, resulting in less relevant or lower-quality images.
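Since these tags add noise rather than meaning for models that were not trained on them, one practical option is to strip them out before submitting a prompt. The helper below is a hypothetical sketch using a simple regular expression; the tag pattern it matches ("score_9", "score_8_up", and so on) follows the forms discussed above.

```python
import re

# Hypothetical helper: remove "score_*" training tags from a prompt so
# only descriptive language remains. Matches forms like "score_9" and
# "score_8_up", case-insensitively, along with a trailing comma.
SCORE_TAG = re.compile(r"\s*\bscore_\d+(_up)?\b\s*,?", re.IGNORECASE)

def clean_prompt(prompt: str) -> str:
    cleaned = SCORE_TAG.sub(" ", prompt)
    # Tidy leftover whitespace and stray commas.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
    return cleaned
```

For example, `clean_prompt("score_9, score_8_up, a portrait in soft lighting")` reduces the prompt to just the descriptive part, `"a portrait in soft lighting"`.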

🟩 How to Write Better AI Image Prompts

To create high-quality images, focus on providing the AI with precise, vivid descriptions. Here are some tips for improving your prompts:

  1. Use Clear, Concise Language: Be specific about what you want. Instead of relying on scoring terms, describe the image you envision. For example, instead of "Score_9", say "highly detailed portrait in soft lighting."

  2. Incorporate Key Details: Include information about the image’s colors, style, lighting, composition, and subject. The more detail you provide, the more likely the model will produce an image that aligns with your vision.

  3. Provide Style References: Mention well-known artistic styles, mediums (such as watercolor or oil painting), or even specific artists (if relevant). Alternatively, if you have a particular style in mind, including links to reference images can help guide the AI’s output.

  4. Experiment and Refine: AI image generation is still an evolving field. Don’t hesitate to tweak your prompts, try different combinations of words, or run multiple iterations to explore the model’s full capabilities. Experimenting is key to achieving better results.
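The tips above can be sketched as a small helper that assembles a prompt from descriptive components instead of score tags. The field names here (subject, style, lighting, composition, details) are this example's own convention, not a structure required by any model.

```python
# Illustrative sketch: build a descriptive prompt from the elements the
# tips above recommend, skipping anything left empty.

def build_prompt(subject: str, style: str = "", lighting: str = "",
                 composition: str = "", details: str = "") -> str:
    parts = [subject, details, style, lighting, composition]
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    subject="portrait of an elderly fisherman",
    style="oil painting",
    lighting="soft golden-hour light",
    composition="close-up, shallow depth of field",
    details="weathered face, wool sweater",
)
```

Iterating on a prompt then becomes a matter of swapping one component at a time, which makes it easier to see which detail changed the output.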

🟦 Conclusion

While it may be tempting to use internal training terms like “Score_9” in your prompts, doing so won’t improve the quality of your AI-generated images. These scores are meaningful only during the model’s training phase and have no value when generating images for users. Instead, focus on crafting well-thought-out prompts using descriptive language, key details, and style references. With clear and specific instructions, you’ll be able to harness the full power of AI art generators and create visuals that align with your creative vision.

