This article was validated on a 13th Gen Intel Core i5-13490F CPU. With the quantized model, you can experience the generation process on a laptop with only 16 GB of RAM (32 GB is recommended for the best experience).
SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. SDXL-Turbo is trained with a novel method called Adversarial Diffusion Distillation (ADD, see the technical report for details), which allows sampling large-scale foundation image diffusion models in 1 to 4 steps while maintaining high image quality. With the powerful inference capability of the latest OpenVINO toolkit (2023.2) and the efficient neural network compression of NNCF, we can generate high-quality SDXL-Turbo images at high speed, in under two seconds.
01
Environment Setup
Before we start, install all the dependencies:
%pip install --extra-index-url https://download.pytorch.org/whl/cpu torch transformers diffusers nncf optimum-intel gradio openvino==2023.2.0 onnx "git+https://github.com/huggingface/optimum-intel.git"
02
Download and Convert the Model
First, we convert the original model downloaded from Hugging Face into OpenVINO IR so that the NNCF toolchain can quantize it later. After conversion you will get the corresponding text_encoder, unet, and vae models.
from pathlib import Path

model_dir = Path("./sdxl_vino_model")
sdxl_model_id = "stabilityai/sdxl-turbo"
skip_convert_model = model_dir.exists()
import os

if not skip_convert_model:
    # Point downloads at a mirror endpoint to speed them up
    os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
    os.system(f'optimum-cli export openvino --model {sdxl_model_id} --task stable-diffusion-xl {model_dir} --fp16')

os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
tae_id = "madebyollin/taesdxl"
save_path = './taesdxl'
os.system(f'huggingface-cli download --resume-download {tae_id} --local-dir {save_path}')
import torch
import openvino as ov
from diffusers import AutoencoderTiny
import gc


# Thin wrappers so each half of the Tiny VAE can be traced and exported separately
class VAEEncoder(torch.nn.Module):
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, sample):
        return self.vae.encode(sample)


class VAEDecoder(torch.nn.Module):
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, latent_sample):
        return self.vae.decode(latent_sample)


def convert_tiny_vae(save_path, output_path):
    tiny_vae = AutoencoderTiny.from_pretrained(save_path)
    tiny_vae.eval()

    vae_encoder = VAEEncoder(tiny_vae)
    ov_model = ov.convert_model(vae_encoder, example_input=torch.zeros((1, 3, 512, 512)))
    ov.save_model(ov_model, output_path / "vae_encoder/openvino_model.xml")
    tiny_vae.save_config(output_path / "vae_encoder")

    vae_decoder = VAEDecoder(tiny_vae)
    ov_model = ov.convert_model(vae_decoder, example_input=torch.zeros((1, 4, 64, 64)))
    ov.save_model(ov_model, output_path / "vae_decoder/openvino_model.xml")
    tiny_vae.save_config(output_path / "vae_decoder")


convert_tiny_vae(save_path, model_dir)
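As an optional sanity check (a minimal sketch, assuming the export above succeeded and model_dir still points at the IR folder), you can load the converted VAE IR files back and print their input shapes:

import openvino as ov

core = ov.Core()
for sub in ("vae_encoder", "vae_decoder"):
    # read_model only parses the IR; nothing is compiled yet
    ir = core.read_model(str(model_dir / sub / "openvino_model.xml"))
    print(sub, [inp.get_partial_shape() for inp in ir.inputs])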
03
Text-to-Image Generation
Now we can run text-to-image generation. We use the optimized OpenVINO pipeline to load the converted model files and run inference; just provide a text prompt to generate the image we want.
from optimum.intel.openvino import OVStableDiffusionXLPipeline

device = 'AUTO'  # AUTO lets OpenVINO pick the device; you can also set this to 'CPU'
model_dir = "./sdxl_vino_model"
text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device)
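If you are curious what AUTO can choose from on your machine, OpenVINO can enumerate the available devices (a one-line check, not required for the pipeline to work):

import openvino as ov

# Lists the inference devices OpenVINO detects, e.g. ['CPU'] or ['CPU', 'GPU']
print(ov.Core().available_devices)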
import numpy as np

prompt = "cute cat"
image = text2image_pipe(prompt, num_inference_steps=1, height=512, width=512,
                        guidance_scale=0.0, generator=np.random.RandomState(987)).images[0]
image.save("cat.png")
image
# Free the resources held by the pipeline
import gc

del text2image_pipe
gc.collect()
04
Image-to-Image Generation
We can also run image-to-image diffusion: simply feed the image we just generated from text back in for a second round of generation.
from optimum.intel import OVStableDiffusionXLImg2ImgPipeline

model_dir = "./sdxl_vino_model"
device = 'AUTO'  # or 'CPU'
image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir, device=device)
Compiling the vae_decoder to AUTO ...
Compiling the unet to AUTO ...
Compiling the vae_encoder to AUTO ...
Compiling the text_encoder_2 to AUTO ...
Compiling the text_encoder to AUTO ...
photo_prompt = "a cute cat with bow tie"
photo_image = image2image_pipe(photo_prompt, image=image, num_inference_steps=2,
                               generator=np.random.RandomState(511), guidance_scale=0.0,
                               strength=0.5).images[0]
photo_image.save("cat_tie.png")
photo_image
05
Quantization
NNCF (Neural Network Compression Framework) is a neural network compression framework that compresses and quantizes models in OpenVINO IR format to improve inference performance when deploying on Intel devices.
[NNCF]:
https://github.com/openvinotoolkit/nncf/
[NNCF] implements post-training quantization by inserting quantization layers into the model graph and using a subset of the training dataset to calibrate the parameters of these additional layers. The quantized weights are stored in INT8 rather than FP32/FP16, which speeds up model inference.
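For intuition, here is a toy NumPy sketch of symmetric per-tensor INT8 quantization; it only illustrates the basic idea, not NNCF's actual (far more sophisticated) scheme:

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)   # pretend FP32 weights
scale = np.abs(w).max() / 127.0                # map the value range onto [-127, 127]
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale  # what inference effectively sees
print("max abs error:", np.abs(w - w_dequant).max())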
In the SDXL-Turbo architecture, the UNet accounts for the bulk of the overall pipeline execution time. We will now show how to use [NNCF] to optimize the UNet, reducing compute cost and speeding up the pipeline. The remaining components are left unquantized: quantizing them would not noticeably improve inference performance but could significantly degrade accuracy.
The quantization process consists of the following steps:
- Create a calibration dataset for quantization.
- Run nncf.quantize() to obtain the quantized model.
- Save the INT8 model with openvino.save_model().
Note: quantization requires substantial hardware resources (more than 64 GB of RAM), so the quantized model is provided further below for direct download.
from pathlib import Path
import openvino as ov
from optimum.intel.openvino import OVStableDiffusionXLPipeline
import os

core = ov.Core()
model_dir = Path("./sdxl_vino_model")
UNET_INT8_OV_PATH = model_dir / "optimized_unet" / "openvino_model.xml"

import datasets
import numpy as np
from tqdm import tqdm
from transformers import set_seed
from typing import Any, Dict, List

set_seed(1)


# Wraps the compiled UNet so every input fed to it is cached for calibration
class CompiledModelDecorator(ov.CompiledModel):
    def __init__(self, compiled_model: ov.CompiledModel, data_cache: List[Any] = None):
        super().__init__(compiled_model)
        self.data_cache = data_cache if data_cache else []

    def __call__(self, *args, **kwargs):
        self.data_cache.append(*args)
        return super().__call__(*args, **kwargs)


def collect_calibration_data(pipe, subset_size: int) -> List[Dict]:
    original_unet = pipe.unet.request
    pipe.unet.request = CompiledModelDecorator(original_unet)
    dataset = datasets.load_dataset("conceptual_captions", split="train").shuffle(seed=42)

    # Run inference for data collection
    pbar = tqdm(total=subset_size)
    diff = 0
    for batch in dataset:
        prompt = batch["caption"]
        if len(prompt) > pipe.tokenizer.model_max_length:
            continue
        _ = pipe(
            prompt,
            num_inference_steps=1,
            height=512,
            width=512,
            guidance_scale=0.0,
            generator=np.random.RandomState(987)
        )
        collected_subset_size = len(pipe.unet.request.data_cache)
        if collected_subset_size >= subset_size:
            pbar.update(subset_size - pbar.n)
            break
        pbar.update(collected_subset_size - diff)
        diff = collected_subset_size

    calibration_dataset = pipe.unet.request.data_cache
    pipe.unet.request = original_unet
    return calibration_dataset
if not UNET_INT8_OV_PATH.exists():
    text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir)
    unet_calibration_data = collect_calibration_data(text2image_pipe, subset_size=200)
import nncf
from nncf.scopes import IgnoredScope

UNET_OV_PATH = model_dir / "unet" / "openvino_model.xml"

if not UNET_INT8_OV_PATH.exists():
    unet = core.read_model(UNET_OV_PATH)
    quantized_unet = nncf.quantize(
        model=unet,
        model_type=nncf.ModelType.TRANSFORMER,
        calibration_dataset=nncf.Dataset(unet_calibration_data),
        ignored_scope=IgnoredScope(
            names=[
                "__module.model.conv_in/aten::_convolution/Convolution",
                "__module.model.up_blocks.2.resnets.2.conv_shortcut/aten::_convolution/Convolution",
                "__module.model.conv_out/aten::_convolution/Convolution"
            ],
        ),
    )
    ov.save_model(quantized_unet, UNET_INT8_OV_PATH)
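To verify that quantization took effect, a small sketch (assuming core and UNET_INT8_OV_PATH from the code above) counts the FakeQuantize nodes NNCF inserted into the IR graph:

quantized = core.read_model(UNET_INT8_OV_PATH)
n_fq = sum(1 for op in quantized.get_ops() if op.get_type_name() == "FakeQuantize")
print(f"FakeQuantize nodes inserted: {n_fq}")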
06
Running the Quantized Model
Since quantizing the UNet can require a lot of memory and take a long time, I have exported the quantized UNet model in advance; it can be downloaded here:
Link: https://pan.baidu.com/s/1WMAsgFFkKKp-EAS6M1wK1g
Extraction code: psta
After downloading, extract it into the target folder `sdxl_vino_model` to run the quantized INT8 UNet model.
Text-to-Image Generation
from pathlib import Path
import openvino as ov
from optimum.intel.openvino import OVStableDiffusionXLPipeline
import numpy as np

core = ov.Core()
model_dir = Path("./sdxl_vino_model")
UNET_INT8_OV_PATH = model_dir / "optimized_unet" / "openvino_model.xml"

int8_text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, compile=False)
int8_text2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)
int8_text2image_pipe.unet.request = None

prompt = "cute cat"
image = int8_text2image_pipe(prompt, num_inference_steps=1, height=512, width=512,
                             guidance_scale=0.0, generator=np.random.RandomState(987)).images[0]
display(image)
Compiling the text_encoder to CPU ...
Compiling the text_encoder_2 to CPU ...
0%|          | 0/1 [00:00<?, ?it/s]
Compiling the unet to CPU ...
Compiling the vae_decoder to CPU ...
import gc

del int8_text2image_pipe
gc.collect()
Image-to-Image Generation
from optimum.intel import OVStableDiffusionXLImg2ImgPipeline

int8_image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir, compile=False)
int8_image2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)
int8_image2image_pipe.unet.request = None

photo_prompt = "a cute cat with bow tie"
photo_image = int8_image2image_pipe(photo_prompt, image=image, num_inference_steps=2,
                                    generator=np.random.RandomState(511), guidance_scale=0.0,
                                    strength=0.5).images[0]
display(photo_image)
Compiling the text_encoder to CPU ...
Compiling the text_encoder_2 to CPU ...
Compiling the vae_encoder to CPU ...
0%|          | 0/1 [00:00<?, ?it/s]
Compiling the unet to CPU ...
We can compare the UNet model size before and after quantization; as the numbers below show, quantization compresses the model size substantially:
from pathlib import Path

model_dir = Path("./sdxl_vino_model")
UNET_OV_PATH = model_dir / "unet" / "openvino_model.xml"
UNET_INT8_OV_PATH = model_dir / "optimized_unet" / "openvino_model.xml"

fp16_ir_model_size = UNET_OV_PATH.with_suffix(".bin").stat().st_size / 1024
quantized_model_size = UNET_INT8_OV_PATH.with_suffix(".bin").stat().st_size / 1024

print(f"FP16 model size: {fp16_ir_model_size:.2f} KB")
print(f"INT8 model size: {quantized_model_size:.2f} KB")
print(f"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}")
FP16 model size: 5014578.27 KB
INT8 model size: 2513501.39 KB
Model compression rate: 1.995
Running the code below gives a simple comparison of the inference speed before and after quantization. The speed nearly doubles: with NNCF, generating one image on the CPU takes under two seconds:

import time

def calculate_inference_time(pipe):
    inference_time = []
    for prompt in ['cat'] * 10:
        start = time.perf_counter()
        _ = pipe(
            prompt,
            num_inference_steps=1,
            guidance_scale=0.0,
            generator=np.random.RandomState(23)
        ).images[0]
        end = time.perf_counter()
        inference_time.append(end - start)
    return np.median(inference_time)

# Re-create the INT8 pipeline, since it was deleted above to free memory
int8_text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, compile=False)
int8_text2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)
int8_text2image_pipe.unet.request = None
int8_latency = calculate_inference_time(int8_text2image_pipe)

text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir)
fp_latency = calculate_inference_time(text2image_pipe)

print(f"FP16 pipeline latency: {fp_latency:.3f}")
print(f"INT8 pipeline latency: {int8_latency:.3f}")
print(f"Text-to-Image generation speed up: {fp_latency / int8_latency:.3f}")

FP16 pipeline latency: 3.148
INT8 pipeline latency: 1.558
Text-to-Image generation speed up: 2.020
07
Interactive Front-End Demo
Finally, for convenient inference, here is a Gradio front-end demo. You can use it to easily generate the images you want and try different combinations.
import gradio as gr
from pathlib import Path
import openvino as ov
import numpy as np

core = ov.Core()
model_dir = Path("./sdxl_vino_model")

# If you only have the unquantized model, use this path instead and
# comment out the optimized_unet path below:
# UNET_PATH = model_dir / "unet" / "openvino_model.xml"
UNET_PATH = model_dir / "optimized_unet" / "openvino_model.xml"

from optimum.intel.openvino import OVStableDiffusionXLPipeline

text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir)
text2image_pipe.unet.model = core.read_model(UNET_PATH)
text2image_pipe.unet.request = core.compile_model(text2image_pipe.unet.model)


def generate_from_text(text, seed, num_steps, height, width):
    result = text2image_pipe(text, num_inference_steps=num_steps, guidance_scale=0.0,
                             generator=np.random.RandomState(seed), height=height,
                             width=width).images[0]
    return result


with gr.Blocks() as demo:
    with gr.Column():
        positive_input = gr.Textbox(label="Text prompt")
        with gr.Row():
            seed_input = gr.Number(precision=0, label="Seed", value=42, minimum=0)
            steps_input = gr.Slider(label="Steps", value=1, minimum=1, maximum=4, step=1)
            height_input = gr.Slider(label="Height", value=512, minimum=256, maximum=1024, step=32)
            width_input = gr.Slider(label="Width", value=512, minimum=256, maximum=1024, step=32)
            btn = gr.Button()
        out = gr.Image(label="Result (Quantized)", type="pil", width=512)
        btn.click(generate_from_text, [positive_input, seed_input, steps_input, height_input, width_input], out)
        gr.Examples([
            ["cute cat", 999],
            ["underwater world coral reef, colorful jellyfish, 35mm, cinematic lighting, shallow depth of field, ultra quality, masterpiece, realistic", 89],
            ["a photo realistic happy white poodle dog playing in the grass, extremely detailed, high res, 8k, masterpiece, dynamic angle", 1569],
            ["Astronaut on Mars watching sunset, best quality, cinematic effects,", 65245],
            ["Black and white street photography of a rainy night in New York, reflections on wet pavement", 48199]
        ], [positive_input, seed_input])

try:
    demo.launch(debug=True)
except Exception:
    demo.launch(share=True, debug=True)
08
Summary
With the latest OpenVINO optimizations, we can easily run image-generation AI efficiently on consumer hardware, accelerating the real-world adoption of generative AI. We welcome you to experience the power of OpenVINO and NNCF for generative AI together with us.
Reviewing editor: Liu Qing
Original title: Running the SDXL-Turbo Text-to-Image Generation Model on Intel 13th Gen CPUs with OpenVINO™ | Developer Hands-On
Source: Intel IoT (WeChat official account: 英特爾物聯(lián)網(wǎng)). Welcome to follow! Please credit the source when reposting.