Automatic1111’s Stable Diffusion WebUI now works with Intel GPU hardware, thanks to the integration of Intel’s OpenVINO toolkit, which optimizes AI models to run on Intel hardware. We’ve retested the latest release of Stable Diffusion to see how much faster Intel’s GPUs are compared to our previous results, and the gains range from 40 to 55 percent.
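To give a sense of what that OpenVINO integration is doing under the hood, here’s a minimal sketch using Hugging Face’s optimum-intel package, which wraps Stable Diffusion in an OpenVINO pipeline and targets Intel’s GPU plugin. The WebUI fork wires this up through its own script, so treat the model ID, prompt, device string, and step count below as illustrative assumptions rather than the exact path A1111 takes.

```python
# Minimal sketch: running Stable Diffusion through OpenVINO on an Intel GPU.
# Assumes the optimum-intel and openvino packages are installed; the model ID,
# prompt, and step count are illustrative, not the WebUI fork's exact setup.
from optimum.intel import OVStableDiffusionPipeline

# Export the PyTorch weights to OpenVINO IR on the fly (export=True) and defer
# compilation until we've picked a resolution and device.
pipe = OVStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True, compile=False
)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)  # static shapes compile faster
pipe.to("GPU")   # Intel Arc (or iGPU); use "CPU" if no supported GPU is present
pipe.compile()

image = pipe("a photo of a lighthouse at sunset", num_inference_steps=50).images[0]
image.save("output.png")
```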
Stable Diffusion (our existing benchmarks still reflect our previous testing, though we’re working on updating the results) is a deep-learning AI model used to generate images from text descriptions. What makes Stable Diffusion special is its ability to run on local consumer hardware. The AI community has built plenty of projects around it, with Stable Diffusion WebUI being the most popular; it provides a browser interface that’s easy to use and experiment with.
After months of work in the background (we’ve been hearing rumblings of this for a while now), the latest updates are now available for Intel Arc owners and provide a substantial boost to performance.
"Check out the Stable Diffusion A1111 webui for Intel Silicon. Works with my A770 or can run on your CPU or iGPU. It’s powered by OpenVINO, so its optimized. 😃 Example of image on the right, pure prompting. Left is same image with increase detail on the eyes using InPainting…" pic.twitter.com/zpbQOMvJF3 (August 17, 2023)
Here are the results of our previous and updated testing of Stable Diffusion. We used a slightly tweaked Stable Diffusion OpenVINO build for our previous testing, and have retested with the OpenVINO fork of the Automatic1111 WebUI. We also retested several of AMD’s GPUs with a newer build of Nod.ai’s Shark-based Stable Diffusion. The Nvidia results haven’t been updated, though we’ll look at retesting with the latest version in the near future (and update the main Stable Diffusion benchmarks article when we’re finished).
We should note that we also changed our prompt, which makes the new results generally more demanding. (The new prompt is “messy room,” which tends to have a lot of tiny details in the images that require more effort for the AI to generate.) There’s variation between runs, and there are caveats that apply specifically to Arc right now, but here are the before/after results.
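For context on how a prompt change feeds into numbers like the ones below, here’s a rough sketch of timing generation throughput with the pipeline from the earlier snippet. The step count, image count, and warm-up handling are assumptions for illustration, not the exact settings behind our charts.

```python
# Rough throughput sketch (images per minute) for a compiled OpenVINO pipeline.
# The step count, image count, and warm-up pass are illustrative assumptions,
# not the exact settings used for the benchmark results in this article.
import time

def images_per_minute(pipe, prompt="messy room", n_images=8, steps=50):
    pipe(prompt, num_inference_steps=steps)  # warm-up: first run absorbs compile/caching overhead
    start = time.perf_counter()
    for _ in range(n_images):
        pipe(prompt, num_inference_steps=steps)
    elapsed = time.perf_counter() - start
    return n_images / (elapsed / 60.0)

# print(f"{images_per_minute(pipe):.1f} images/minute")
```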
The Intel Arc and AMD GPUs all show improved performance, with most delivering significant gains. The Arc A770 16GB (Limited Edition, now discontinued) improved by 54%, while the A750 improved by 40%.
Nod.ai hasn’t been sitting still either. AMD’s RX 6800, RX 6750 XT, and RX 6700 10GB are all faster with the newer Shark build, with the RX 6800 gaining 34% and the RX 6700 10GB an even larger 76%. For whatever reason, the RX 6750 XT only managed a measly 9% increase, even though all three AMD GPUs share the same RDNA2 architecture. (We’ll be retesting other GPUs, including AMD’s newer RX 7000-series parts, in the near future.)
Again, we did not retest the three Nvidia RTX 40-series GPUs, which is why their results are identical in the two charts. Even so, with the new OpenVINO optimizations, Intel’s Arc A750 and A770 now outperform the RTX 4060, and the A770 16GB sits close behind the RTX 4060 Ti.
There’s still plenty of ongoing work, including making the installation more straightforward and fixing support for other image resolutions and Stable Diffusion models. We had to rely on the default “v1-5-pruned-emaonly.safetensors” model, as the newer “v2-1_512-ema-pruned.safetensors” and “v2-1_768-ema-pruned.safetensors” checkpoints failed to generate meaningful output. Also, 768×768 generation currently fails on Arc GPUs: we could go up to 720×720, but 744×744 ended up switching to CPU-based generation. We’re told a fix for 768×768 support should arrive relatively soon, so Arc users should keep an eye out for that update.
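Until that fix lands, a workaround along the following lines, reshaping the pipeline for the target resolution and falling back to the CPU device if GPU compilation fails, mirrors what we saw the WebUI do implicitly at 744×744. The fallback logic here is our own assumption for illustration, not the fork’s actual code path.

```python
# Hypothetical workaround: try to compile the pipeline for a given resolution on
# the Intel GPU, and fall back to the CPU plugin if that fails (similar to what we
# observed the WebUI doing at 744x744). This is an illustrative assumption, not
# the OpenVINO fork's actual handling.
def compile_for_resolution(pipe, width, height):
    pipe.reshape(batch_size=1, height=height, width=width, num_images_per_prompt=1)
    try:
        pipe.to("GPU")
        pipe.compile()
        return "GPU"
    except Exception:
        # GPU compilation failed at this resolution; re-target the CPU plugin instead.
        pipe.to("CPU")
        pipe.compile()
        return "CPU"

# device_used = compile_for_resolution(pipe, 768, 768)
```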