
This review was contributed by "jf_99374259", an outstanding reviewer on EEWorld (电子工程世界).

This article describes how to invoke and test the G2D image-processing hardware on MYIR's MYD-YT113i development board.

MYC-YT113i core board and development board

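A truly China-made core board with 100% domestic-component certification

Domestic T113-i processor with 2× Cortex-A7 @ 1.2 GHz plus RISC-V

External DDR3 interface, video codec support, HiFi4 DSP

Rich interfaces: video capture, display, USB 2.0, CAN, Gigabit Ethernet

Industrial grade: -40 ℃ to +85 ℃; size 37 mm × 39 mm

Stamp holes + LGA, 140 + 50 pins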

2D graphics acceleration hardware support on the Allwinner T113-i

Supports layer size up to 2048 x 2048 pixels

Supports pre-multiply alpha image data

Supports color key

Supports two pipes Porter-Duff alpha blending

Supports multiple video formats 4:2:0, 4:2:2, 4:1:1 and multiple pixel formats (8/16/24/32 bits graphics layer)

Supports memory scan order option

Supports any format convert function

Supports 1/16× to 32× resize ratio

Supports 32-phase 8-tap horizontal anti-alias filter and 32-phase 4-tap vertical anti-alias filter

Supports window clip

Supports FillRectangle, BitBlit, StretchBlit and MaskBlit

Supports horizontal and vertical flip, clockwise 0/90/180/270 degree rotate for normal buffer

Supports horizontal flip, clockwise 0/90/270 degree rotate for LBC buffer

As the list shows, the G2D hardware supports a wide range of 2D image operations, including color space conversion, resolution scaling, layer blending and rotation.
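Two of the limits in the feature list can be turned into a quick pre-flight check before a job is submitted: the layer size (up to 2048 x 2048) and the resize ratio (1/16× to 32× per axis). The sketch below is hypothetical. `g2d_scale_supported` and its helpers are not part of the Allwinner driver, and no other hardware constraints are modeled.

```cpp
#include <cassert>

// Hypothetical pre-flight check, NOT an Allwinner API. It encodes only two
// limits from the feature list: layer <= 2048 x 2048, resize 1/16x..32x.
struct G2dSize {
    int w;
    int h;
};

constexpr int kG2dMaxLayerDim = 2048;

bool g2d_layer_ok(G2dSize s) {
    return s.w > 0 && s.h > 0 && s.w <= kG2dMaxLayerDim && s.h <= kG2dMaxLayerDim;
}

// The ratio dst/src must lie in [1/16, 32]. It is checked with integers
// (dst * 16 >= src and dst <= src * 32) to avoid floating-point rounding.
bool g2d_ratio_ok(int src, int dst) {
    assert(src > 0 && dst > 0);
    return dst * 16 >= src && dst <= src * 32;
}

bool g2d_scale_supported(G2dSize src, G2dSize dst) {
    return g2d_layer_ok(src) && g2d_layer_ok(dst) &&
           g2d_ratio_ok(src.w, dst.w) && g2d_ratio_ok(src.h, dst.h);
}
```

For example, `g2d_scale_supported({1920, 1080}, {640, 360})` is true. Shrinking a 1920-pixel-wide layer to 100 pixels is rejected because it exceeds the 1/16× limit.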

Development environment setup

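For the basic development environment, see my post from three articles back: https://bbs.elecfans.com/jishu_2408808_1_1.html

Besides the toolchain, we use opencv-mobile to load the input image and save the result, so we can check whether the color conversion is correct.

The G2D hardware is driven directly through standard Linux ioctl calls. Only the relevant struct definitions need to be included; no shared library has to be linked:

https://github.com/MYIR-ALLWINNER/framework/blob/develop-yt113-framework/auto/sdk_lib/include/g2d_driver.h

In addition, G2D input and output data must live in DMA-ION buffers, so the DmaIon.h header is also needed to allocate and free them:

https://github.com/MYIR-ALLWINNER/framework/blob/develop-yt113-framework/auto/sdk_lib/include/DmaIon.h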

YUV-to-RGB conversion in plain C

This reuses the conversion function from my earlier T113-i JPG decoding test.

#include <algorithm> // std::min / std::max, used by SATURATE_CAST_UCHAR

// NV21 (YUV420sp with interleaved VU plane) to packed RGB888; w and h must be even
void yuv420sp2rgb(const unsigned char* yuv420sp, int w, int h, unsigned char* rgb)
{
    const unsigned char* yptr = yuv420sp;
    const unsigned char* vuptr = yuv420sp + w * h;

    for (int y = 0; y < h; y += 2)
    {
        const unsigned char* yptr0 = yptr;
        const unsigned char* yptr1 = yptr + w;
        unsigned char* rgb0 = rgb;
        unsigned char* rgb1 = rgb + w * 3;
int remain = w;
#define SATURATE_CAST_UCHAR(X) (unsigned char)::std::min(::std::max((int)(X), 0), 255);
        for (; remain > 0; remain -= 2)
        {
            // R = 1.164 * yy + 1.596 * vv // G = 1.164 * yy - 0.813 * vv - 0.391 * uu // B = 1.164 * yy + 2.018 * uu
// R = Y + (1.370705 * (V-128)) // G = Y - (0.698001 * (V-128)) - (0.337633 * (U-128)) // B = Y + (1.732446 * (U-128))
// R = ((Y << 6) + 87.72512 * (V-128)) >> 6 // G = ((Y << 6) - 44.672064 * (V-128) - 21.608512 * (U-128)) >> 6 // B = ((Y << 6) + 110.876544 * (U-128)) >> 6
// R = ((Y << 6) + 90 * (V-128)) >> 6 // G = ((Y << 6) - 46 * (V-128) - 22 * (U-128)) >> 6 // B = ((Y << 6) + 113 * (U-128)) >> 6
// R = (yy + 90 * vv) >> 6 // G = (yy - 46 * vv - 22 * uu) >> 6 // B = (yy + 113 * uu) >> 6
int v = vuptr[0] - 128; int u = vuptr[1] - 128;
int ruv = 90 * v; int guv = -46 * v + -22 * u; int buv = 113 * u;
int y00 = yptr0[0] << 6; rgb0[0] = SATURATE_CAST_UCHAR((y00 + ruv) >> 6); rgb0[1] = SATURATE_CAST_UCHAR((y00 + guv) >> 6); rgb0[2] = SATURATE_CAST_UCHAR((y00 + buv) >> 6);
int y01 = yptr0[1] << 6; rgb0[3] = SATURATE_CAST_UCHAR((y01 + ruv) >> 6); rgb0[4] = SATURATE_CAST_UCHAR((y01 + guv) >> 6); rgb0[5] = SATURATE_CAST_UCHAR((y01 + buv) >> 6);
int y10 = yptr1[0] << 6; rgb1[0] = SATURATE_CAST_UCHAR((y10 + ruv) >> 6); rgb1[1] = SATURATE_CAST_UCHAR((y10 + guv) >> 6); rgb1[2] = SATURATE_CAST_UCHAR((y10 + buv) >> 6);
int y11 = yptr1[1] << 6; rgb1[3] = SATURATE_CAST_UCHAR((y11 + ruv) >> 6); rgb1[4] = SATURATE_CAST_UCHAR((y11 + guv) >> 6); rgb1[5] = SATURATE_CAST_UCHAR((y11 + buv) >> 6);
            yptr0 += 2; yptr1 += 2; vuptr += 2; rgb0 += 6; rgb1 += 6;
        }
#undef SATURATE_CAST_UCHAR

        yptr += 2 * w;
        rgb += 2 * 3 * w;
    }
}
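Pulling the fixed-point arithmetic of the loop into a per-pixel helper makes it easier to check by hand. This is an illustrative sketch, and the name `yuv_to_rgb_q6` is hypothetical. It uses the same integer coefficients (90, 46, 22, 113) and the same 6-bit shift as the function above, so each coefficient is the real-valued factor scaled by 64.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb {
    uint8_t r, g, b;
};

// Clamp to [0, 255], mirroring SATURATE_CAST_UCHAR above.
static uint8_t clamp_u8(int x) {
    return static_cast<uint8_t>(std::min(std::max(x, 0), 255));
}

// One pixel in 6-bit fixed point. Coefficients are scaled by 64:
// 90/64 ~ 1.41, 46/64 ~ 0.72, 22/64 ~ 0.34, 113/64 ~ 1.77.
Rgb yuv_to_rgb_q6(int Y, int U, int V) {
    assert(Y >= 0 && Y <= 255 && U >= 0 && U <= 255 && V >= 0 && V <= 255);
    int yy = Y << 6;  // Y * 64
    int u = U - 128;
    int v = V - 128;
    return Rgb{clamp_u8((yy + 90 * v) >> 6),
               clamp_u8((yy - 46 * v - 22 * u) >> 6),
               clamp_u8((yy + 113 * u) >> 6)};
}
```

For example, Y = 100, U = 128, V = 200 gives (201, 48, 100). With U at its midpoint, only red and green move away from Y.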

YUV-to-RGB optimized with the ARM NEON instruction set

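The armv7 compiler does a poor job of auto-vectorizing this code with NEON. So the YUV2RGB kernel is hand-written in ARM NEON inline assembly to get the best possible performance out of the CPU. The aarch64 branch uses NEON intrinsics instead.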

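#include <algorithm> // std::min / std::max, used by SATURATE_CAST_UCHAR
#if __ARM_NEON
#include <arm_neon.h>
#endif

void yuv420sp2rgb_neon(const unsigned char* yuv420sp, int w, int h, unsigned char* rgb)
{
    const unsigned char* yptr = yuv420sp;
    const unsigned char* vuptr = yuv420sp + w * h;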
#if __ARM_NEON
    uint8x8_t _v128 = vdup_n_u8(128);
    int8x8_t _v90 = vdup_n_s8(90);
    int8x8_t _v46 = vdup_n_s8(46);
    int8x8_t _v22 = vdup_n_s8(22);
    int8x8_t _v113 = vdup_n_s8(113);
#endif // __ARM_NEON

    for (int y = 0; y < h; y += 2)
    {
        const unsigned char* yptr0 = yptr;
        const unsigned char* yptr1 = yptr + w;
        unsigned char* rgb0 = rgb;
        unsigned char* rgb1 = rgb + w * 3;

#if __ARM_NEON
        int nn = w >> 3;            // number of 8-pixel NEON iterations
        int remain = w - (nn << 3); // leftover pixels for the scalar tail
#else
        int remain = w;
#endif // __ARM_NEON

#if __ARM_NEON
#if __aarch64__
        for (; nn > 0; nn--)
        {
            int16x8_t _yy0 = vreinterpretq_s16_u16(vshll_n_u8(vld1_u8(yptr0), 6));
            int16x8_t _yy1 = vreinterpretq_s16_u16(vshll_n_u8(vld1_u8(yptr1), 6));
int8x8_t _vvuu = vreinterpret_s8_u8(vsub_u8(vld1_u8(vuptr), _v128)); int8x8x2_t _vvvvuuuu = vtrn_s8(_vvuu, _vvuu); int8x8_t _vv = _vvvvuuuu.val[0]; int8x8_t _uu = _vvvvuuuu.val[1];
int16x8_t _r0 = vmlal_s8(_yy0, _vv, _v90); int16x8_t _g0 = vmlsl_s8(_yy0, _vv, _v46); _g0 = vmlsl_s8(_g0, _uu, _v22); int16x8_t _b0 = vmlal_s8(_yy0, _uu, _v113);
int16x8_t _r1 = vmlal_s8(_yy1, _vv, _v90); int16x8_t _g1 = vmlsl_s8(_yy1, _vv, _v46); _g1 = vmlsl_s8(_g1, _uu, _v22); int16x8_t _b1 = vmlal_s8(_yy1, _uu, _v113);
uint8x8x3_t _rgb0; _rgb0.val[0] = vqshrun_n_s16(_r0, 6); _rgb0.val[1] = vqshrun_n_s16(_g0, 6); _rgb0.val[2] = vqshrun_n_s16(_b0, 6);
uint8x8x3_t _rgb1; _rgb1.val[0] = vqshrun_n_s16(_r1, 6); _rgb1.val[1] = vqshrun_n_s16(_g1, 6); _rgb1.val[2] = vqshrun_n_s16(_b1, 6);
vst3_u8(rgb0, _rgb0); vst3_u8(rgb1, _rgb1);
            yptr0 += 8; yptr1 += 8; vuptr += 8; rgb0 += 24; rgb1 += 24;
        }
#else
        if (nn > 0)
        {
            asm volatile(
                "0:                          \n"
                "pld        [%3, #128]       \n"
                "vld1.u8    {d2}, [%3]!      \n"
                "vsub.s8    d2, d2, %12      \n"
                "pld        [%1, #128]       \n"
                "vld1.u8    {d0}, [%1]!      \n"
                "pld        [%2, #128]       \n"
                "vld1.u8    {d1}, [%2]!      \n"
                "vshll.u8   q2, d0, #6       \n"
                "vorr       d3, d2, d2       \n"
                "vshll.u8   q3, d1, #6       \n"
                "vorr       q9, q2, q2       \n"
                "vtrn.s8    d2, d3           \n"
                "vorr       q11, q3, q3      \n"
                "vmlsl.s8   q9, d2, %14      \n"
                "vorr       q8, q2, q2       \n"
                "vmlsl.s8   q11, d2, %14     \n"
                "vorr       q10, q3, q3      \n"
                "vmlal.s8   q8, d2, %13      \n"
                "vmlal.s8   q2, d3, %16      \n"
                "vmlal.s8   q10, d2, %13     \n"
                "vmlsl.s8   q9, d3, %15      \n"
                "vmlal.s8   q3, d3, %16      \n"
                "vmlsl.s8   q11, d3, %15     \n"
                "vqshrun.s16 d24, q8, #6     \n"
                "vqshrun.s16 d26, q2, #6     \n"
                "vqshrun.s16 d4, q10, #6     \n"
                "vqshrun.s16 d25, q9, #6     \n"
                "vqshrun.s16 d6, q3, #6      \n"
                "vqshrun.s16 d5, q11, #6     \n"
                "subs       %0, #1           \n"
                "vst3.u8    {d24-d26}, [%4]! \n"
                "vst3.u8    {d4-d6}, [%5]!   \n"
                "bne        0b               \n"
                : "=r"(nn),    // %0
                  "=r"(yptr0), // %1
                  "=r"(yptr1), // %2
                  "=r"(vuptr), // %3
                  "=r"(rgb0),  // %4
                  "=r"(rgb1)   // %5
                : "0"(nn),
                  "1"(yptr0),
                  "2"(yptr1),
                  "3"(vuptr),
                  "4"(rgb0),
                  "5"(rgb1),
                  "w"(_v128), // %12
                  "w"(_v90),  // %13
                  "w"(_v46),  // %14
                  "w"(_v22),  // %15
                  "w"(_v113)  // %16
                : "cc", "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "d26");
        }
#endif // __aarch64__
#endif // __ARM_NEON
        // scalar tail for the pixels not covered by the NEON loop
#define SATURATE_CAST_UCHAR(X) (unsigned char)::std::min(::std::max((int)(X), 0), 255);
        for (; remain > 0; remain -= 2)
        {
            // R = 1.164 * yy + 1.596 * vv // G = 1.164 * yy - 0.813 * vv - 0.391 * uu // B = 1.164 * yy + 2.018 * uu
// R = Y + (1.370705 * (V-128)) // G = Y - (0.698001 * (V-128)) - (0.337633 * (U-128)) // B = Y + (1.732446 * (U-128))
// R = ((Y << 6) + 87.72512 * (V-128)) >> 6 // G = ((Y << 6) - 44.672064 * (V-128) - 21.608512 * (U-128)) >> 6 // B = ((Y << 6) + 110.876544 * (U-128)) >> 6
// R = ((Y << 6) + 90 * (V-128)) >> 6 // G = ((Y << 6) - 46 * (V-128) - 22 * (U-128)) >> 6 // B = ((Y << 6) + 113 * (U-128)) >> 6
// R = (yy + 90 * vv) >> 6 // G = (yy - 46 * vv - 22 * uu) >> 6 // B = (yy + 113 * uu) >> 6
int v = vuptr[0] - 128; int u = vuptr[1] - 128;
int ruv = 90 * v; int guv = -46 * v + -22 * u; int buv = 113 * u;
int y00 = yptr0[0] << 6; rgb0[0] = SATURATE_CAST_UCHAR((y00 + ruv) >> 6); rgb0[1] = SATURATE_CAST_UCHAR((y00 + guv) >> 6); rgb0[2] = SATURATE_CAST_UCHAR((y00 + buv) >> 6);
int y01 = yptr0[1] << 6; rgb0[3] = SATURATE_CAST_UCHAR((y01 + ruv) >> 6); rgb0[4] = SATURATE_CAST_UCHAR((y01 + guv) >> 6); rgb0[5] = SATURATE_CAST_UCHAR((y01 + buv) >> 6);
int y10 = yptr1[0] << 6; rgb1[0] = SATURATE_CAST_UCHAR((y10 + ruv) >> 6); rgb1[1] = SATURATE_CAST_UCHAR((y10 + guv) >> 6); rgb1[2] = SATURATE_CAST_UCHAR((y10 + buv) >> 6);
int y11 = yptr1[1] << 6; rgb1[3] = SATURATE_CAST_UCHAR((y11 + ruv) >> 6); rgb1[4] = SATURATE_CAST_UCHAR((y11 + guv) >> 6); rgb1[5] = SATURATE_CAST_UCHAR((y11 + buv) >> 6);
            yptr0 += 2; yptr1 += 2; vuptr += 2; rgb0 += 6; rgb1 += 6;
        }
#undef SATURATE_CAST_UCHAR

        yptr += 2 * w;
        rgb += 2 * 3 * w;
    }
}
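The sketch below emulates, in plain C++, what one 8-pixel iteration of the aarch64 intrinsic path computes. Two intrinsics matter here:

- `vtrn_s8` applied to two copies of the VU vector yields one V vector and one U vector. Each value is duplicated for the two horizontally adjacent pixels that share it.
- `vqshrun_n_s16(x, 6)` is a saturating narrowing right shift, i.e. clamp(x >> 6, 0, 255).

The function names are hypothetical; this models the intrinsics for illustration only. The model uses `int`, but the real `int16` intermediates never overflow: the largest value is 16320 + 113 × 127 = 30671.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using Lane8 = std::array<int, 8>;

// Scalar model of vtrn_s8(a, a), returned as {val[0], val[1]}:
// val[0] = {a0,a0,a2,a2,a4,a4,a6,a6}, val[1] = {a1,a1,a3,a3,a5,a5,a7,a7}.
std::array<Lane8, 2> trn_self(const Lane8& a) {
    std::array<Lane8, 2> out{};
    for (int i = 0; i < 8; i += 2) {
        out[0][i] = out[0][i + 1] = a[i];      // V shared by the pixel pair
        out[1][i] = out[1][i + 1] = a[i + 1];  // U shared by the pixel pair
    }
    return out;
}

// Scalar model of vqshrun_n_s16(x, 6): shift right by 6, saturate to uint8.
uint8_t qshrun6(int x) {
    return static_cast<uint8_t>(std::min(std::max(x >> 6, 0), 255));
}

// One 8-pixel step for one row: y = 8 luma bytes, vu = 8 interleaved bytes
// (V0,U0,V1,U1,...), rgb = 24 output bytes in R,G,B order (as vst3_u8 stores).
void neon_row_step_model(const uint8_t* y, const uint8_t* vu, uint8_t* rgb) {
    assert(y != nullptr && vu != nullptr && rgb != nullptr);
    Lane8 vvuu{};
    for (int i = 0; i < 8; ++i) vvuu[i] = vu[i] - 128;  // vsub_u8 + reinterpret as s8
    std::array<Lane8, 2> t = trn_self(vvuu);
    const Lane8& vv = t[0];
    const Lane8& uu = t[1];
    for (int i = 0; i < 8; ++i) {
        int yy = y[i] << 6;                                      // vshll_n_u8(y, 6)
        rgb[3 * i + 0] = qshrun6(yy + 90 * vv[i]);               // vmlal_s8
        rgb[3 * i + 1] = qshrun6(yy - 46 * vv[i] - 22 * uu[i]);  // two vmlsl_s8
        rgb[3 * i + 2] = qshrun6(yy + 113 * uu[i]);              // vmlal_s8
    }
}
```

Because the model reproduces the scalar formula exactly, this also shows why the C and NEON versions produce identical output.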

YUV-to-RGB with the G2D graphics hardware

First, we implement a DMA-ION buffer manager, based on:

https://github.com/MYIR-ALLWINNER/framework/blob/develop-yt113-framework/auto/sdk_lib/sdk_memory/DmaIon.cpp

The code below omits error handling. One pitfall is that linux-4.9 and linux-5.4 use this interface differently. MYIR's T113-i system runs linux-5.4, so the ioctl conventions of the 4.9 kernel do not work on it.

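#include <stddef.h>        // size_t
#include <fcntl.h>         // ::open, O_RDWR, O_RDONLY
#include <unistd.h>        // ::close
#include <sys/ioctl.h>     // ioctl
#include <sys/mman.h>      // mmap, munmap
#include <linux/dma-buf.h> // struct dma_buf_sync, DMA_BUF_IOCTL_SYNC
// aw_ion_new_alloc_data, AW_ION_*, aw_user_iommu_param and IOCTL_* are assumed
// to come from the Allwinner/MYIR SDK headers (see DmaIon.cpp linked above)

struct ion_memory
{
    size_t size;
    int fd;                // dma-buf fd of the buffer
    void* virt_addr;       // CPU mapping
    unsigned int phy_addr; // IOMMU address as seen by the hardware
};

class ion_allocator
{
public:
    ion_allocator();
    ~ion_allocator();

    int open();
    void close();

    int alloc(size_t size, struct ion_memory* mem);
    int free(struct ion_memory* mem);

    int flush(struct ion_memory* mem);

public:
    int ion_fd;
    int cedar_fd;
};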
ion_allocator::ion_allocator()
{
    ion_fd = -1;
    cedar_fd = -1;
}

ion_allocator::~ion_allocator()
{
    close();
}

int ion_allocator::open()
{
    close();

    ion_fd = ::open("/dev/ion", O_RDWR);
    cedar_fd = ::open("/dev/cedar_dev", O_RDONLY);

    // request the cedar engine (released again in close())
    ioctl(cedar_fd, IOCTL_ENGINE_REQ, 0);

    return 0;
}

void ion_allocator::close()
{
    if (cedar_fd != -1)
    {
        ioctl(cedar_fd, IOCTL_ENGINE_REL, 0);
        ::close(cedar_fd);
        cedar_fd = -1;
    }

    if (ion_fd != -1)
    {
        ::close(ion_fd);
        ion_fd = -1;
    }
}

int ion_allocator::alloc(size_t size, struct ion_memory* mem)
{
    // allocate a cached buffer from the ION system heap; the fd is returned in alloc_data.fd
    struct aw_ion_new_alloc_data alloc_data;
    alloc_data.len = size;
    alloc_data.heap_id_mask = AW_ION_SYSTEM_HEAP_MASK;
    alloc_data.flags = AW_ION_CACHED_FLAG | AW_ION_CACHED_NEEDS_SYNC_FLAG;
    alloc_data.fd = 0;
    alloc_data.unused = 0;
    ioctl(ion_fd, AW_ION_IOC_NEW_ALLOC, &alloc_data);

    // map it so the CPU can read and write it
    void* virt_addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, alloc_data.fd, 0);

    // ask the cedar driver for the buffer's IOMMU address
    struct aw_user_iommu_param iommu_param;
    iommu_param.fd = alloc_data.fd;
    iommu_param.iommu_addr = 0;
    ioctl(cedar_fd, IOCTL_GET_IOMMU_ADDR, &iommu_param);

    mem->size = size;
    mem->fd = alloc_data.fd;
    mem->virt_addr = virt_addr;
    mem->phy_addr = iommu_param.iommu_addr;

    return 0;
}

int ion_allocator::free(struct ion_memory* mem)
{
    if (mem->fd == -1)
        return 0;

    struct aw_user_iommu_param iommu_param;
    iommu_param.fd = mem->fd;
    ioctl(cedar_fd, IOCTL_FREE_IOMMU_ADDR, &iommu_param);

    munmap(mem->virt_addr, mem->size);

    ::close(mem->fd);

    mem->size = 0;
    mem->fd = -1;
    mem->virt_addr = 0;
    mem->phy_addr = 0;

    return 0;
}

int ion_allocator::flush(struct ion_memory* mem)
{
    struct dma_buf_sync sync;
    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
    ioctl(mem->fd, DMA_BUF_IOCTL_SYNC, &sync);

    return 0;
}
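The `flush()` above issues a single `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_END`. The Linux DMA-BUF documentation describes CPU access differently: it is bracketed by `DMA_BUF_SYNC_START` before the CPU touches the mapping and `DMA_BUF_SYNC_END` afterwards. The helpers below sketch that bracket using only the standard `<linux/dma-buf.h>` UAPI. The names `dmabuf_begin_cpu_access` and `dmabuf_end_cpu_access` are hypothetical. Whether the ION heap on this board actually needs the START call was not tested in this article.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

// Issue DMA_BUF_IOCTL_SYNC, retrying when the kernel asks for it
// (EINTR / EAGAIN). Returns 0 on success, -1 on failure (errno set).
static int dmabuf_sync(int fd, uint64_t flags) {
    struct dma_buf_sync sync = {};
    sync.flags = flags;
    int ret;
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret == -1 ? -1 : 0;
}

static uint64_t dmabuf_dir(bool read, bool write) {
    assert(read || write);
    return (read ? DMA_BUF_SYNC_READ : 0) | (write ? DMA_BUF_SYNC_WRITE : 0);
}

// Call before the CPU reads or writes the mmap'ed buffer.
int dmabuf_begin_cpu_access(int fd, bool read, bool write) {
    return dmabuf_sync(fd, DMA_BUF_SYNC_START | dmabuf_dir(read, write));
}

// Call after the CPU is done, before the hardware uses the buffer.
int dmabuf_end_cpu_access(int fd, bool read, bool write) {
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | dmabuf_dir(read, write));
}
```

Under this scheme, step 2 would wrap the `memcpy` into `yuv_ion` with begin/end (write). Step 4 would wrap the copy out of `rgb_ion` with begin/end (read).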

Next, we implement the G2D hardware YUV-to-RGB converter:

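Step 1: allocate the YUV and RGB DMA-ION buffers in advance.

Step 2: copy the YUV data into its DMA-ION buffer and flush the cache to synchronize.

Step 3: configure the conversion parameters and call ioctl with G2D_CMD_BITBLT_H to perform the conversion.

Step 4: flush the cache to synchronize, then copy the RGB data out of the DMA-ION buffer.

Step 5: free the DMA-ION buffers.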
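Step 1 needs the sizes of the two buffers (`yuv_size` and `rgb_size` in the code). For the formats used here, they follow directly from the memory layout. NV21-style YUV420sp has a full-resolution Y plane followed by a half-resolution interleaved VU plane, and packed RGB888 uses 3 bytes per pixel. The helper names below are hypothetical, and tightly packed rows with no padding are assumed.

```cpp
#include <cassert>
#include <cstddef>

// YUV420sp (NV21 layout): Y plane of w*h bytes, then an interleaved VU plane
// of (w/2)*(h/2) pairs = w*h/2 bytes. Width and height must be even.
std::size_t yuv420sp_size(int w, int h) {
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    return static_cast<std::size_t>(w) * h * 3 / 2;
}

// Byte offset of the VU plane inside the YUV420sp buffer
// (the same offset the CPU code uses for vuptr).
std::size_t yuv420sp_vu_offset(int w, int h) {
    return static_cast<std::size_t>(w) * h;
}

// Packed RGB888: 3 bytes per pixel, no row padding.
std::size_t rgb888_size(int w, int h) {
    assert(w > 0 && h > 0);
    return static_cast<std::size_t>(w) * h * 3;
}
```

For a 640 x 480 image, this gives 460800 bytes for the YUV buffer and 921600 bytes for the RGB buffer.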

// Assumes width and height (even), yuv420sp (NV21 input) and rgb (output) come from the caller;
// memcpy/memset need <cstring> in addition to the headers above
size_t yuv_size = (size_t)width * height * 3 / 2; // YUV420sp: Y plane + half-size VU plane
size_t rgb_size = (size_t)width * height * 3;     // packed RGB888

// Step 1: allocate the DMA-ION buffers and open the G2D device
ion_allocator ion;
ion.open();

struct ion_memory yuv_ion;
ion.alloc(yuv_size, &yuv_ion);

struct ion_memory rgb_ion;
ion.alloc(rgb_size, &rgb_ion);

int g2d_fd = ::open("/dev/g2d", O_RDWR);

// Step 2: copy the YUV input in, then flush the cache
memcpy((unsigned char*)yuv_ion.virt_addr, yuv420sp, yuv_size);
ion.flush(&yuv_ion);

// Step 3: describe source and destination, then run the blit
g2d_blt_h blit;
memset(&blit, 0, sizeof(blit));

blit.flag_h = G2D_BLT_NONE_H; // no rotation or flip

// source: YUV420 semi-planar input in yuv_ion
blit.src_image_h.format = G2D_FORMAT_YUV420UVC_V1U1V0U0;
blit.src_image_h.width = width;
blit.src_image_h.height = height;
blit.src_image_h.align[0] = 0;
blit.src_image_h.align[1] = 0;
blit.src_image_h.clip_rect.x = 0;
blit.src_image_h.clip_rect.y = 0;
blit.src_image_h.clip_rect.w = width;
blit.src_image_h.clip_rect.h = height;
blit.src_image_h.gamut = G2D_BT601;
blit.src_image_h.bpremul = 0;
blit.src_image_h.mode = G2D_PIXEL_ALPHA;
blit.src_image_h.use_phy_addr = 0; // pass the buffer by dma-buf fd
blit.src_image_h.fd = yuv_ion.fd;

// destination: packed RGB888 output in rgb_ion
blit.dst_image_h.format = G2D_FORMAT_RGB888;
blit.dst_image_h.width = width;
blit.dst_image_h.height = height;
blit.dst_image_h.align[0] = 0;
blit.dst_image_h.clip_rect.x = 0;
blit.dst_image_h.clip_rect.y = 0;
blit.dst_image_h.clip_rect.w = width;
blit.dst_image_h.clip_rect.h = height;
blit.dst_image_h.gamut = G2D_BT601;
blit.dst_image_h.bpremul = 0;
blit.dst_image_h.mode = G2D_PIXEL_ALPHA;
blit.dst_image_h.use_phy_addr = 0;
blit.dst_image_h.fd = rgb_ion.fd;

ioctl(g2d_fd, G2D_CMD_BITBLT_H, &blit);

// Step 4: sync the output buffer, then copy the RGB result out
ion.flush(&rgb_ion);
memcpy(rgb, (const unsigned char*)rgb_ion.virt_addr, rgb_size);

// Step 5: release everything
ion.free(&rgb_ion);
ion.free(&yuv_ion);
ion.close();
::close(g2d_fd);
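As noted earlier, the code above leaves out error handling and ignores every ioctl return value. A common way to add checks without cluttering each call is a small wrapper that retries on `EINTR` and reports failures. The name `xioctl` follows the convention of the V4L2 capture example. The sketch below is an illustration, not code from the MYIR SDK.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/ioctl.h>

// ioctl() wrapper: retries when interrupted by a signal and prints a
// diagnostic naming the failing request. Returns ioctl's result.
int xioctl(int fd, unsigned long request, void* arg, const char* what) {
    assert(what != nullptr);
    int ret;
    do {
        ret = ioctl(fd, request, arg);
    } while (ret == -1 && errno == EINTR);
    if (ret == -1) {
        int err = errno;
        std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(err));
        errno = err;  // keep errno intact for the caller
    }
    return ret;
}
```

A call such as `if (xioctl(g2d_fd, G2D_CMD_BITBLT_H, &blit, "G2D_CMD_BITBLT_H") < 0)` can then branch to the step 5 cleanup instead of continuing with invalid output.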

G2D hardware YUV-to-RGB test

Allocating and freeing DMA-ION buffers is relatively slow, so we do both once up front. We then call the step 3 G2D conversion in a loop, record the time of each call, and check CPU usage with top.
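This measurement (allocate once, then time repeated conversions) can be reproduced with a small timing helper. The sketch uses `std::chrono::steady_clock`; `time_runs_ms` is a hypothetical name, and the callable stands for whichever conversion is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Run `fn` `runs` times and return the wall time of each run in milliseconds.
// Buffers should already be allocated so that only the conversion is timed.
std::vector<double> time_runs_ms(const std::function<void()>& fn, int runs) {
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}
```

For example, `time_runs_ms([&] { ioctl(g2d_fd, G2D_CMD_BITBLT_H, &blit); }, 10)` returns ten timings. The first C and NEON runs in the log (46.61 ms and 10.57 ms) are noticeably slower than the rest, so per-run numbers are more informative than a single average.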

sh-4.4# LD_LIBRARY_PATH=. ./g2dtest
INFO : cedarc : register mjpeg decoder success!
this device is not whitelisted for jpeg decoder cvi
this device is not whitelisted for jpeg decoder cvi
this device is not whitelisted for jpeg decoder cvi
this device is not whitelisted for jpeg encoder rkmpp
INFO : cedarc : Set log level to 5 from /vendor/etc/cedarc.conf
ERROR : cedarc : now cedarc log level:5
ERROR : cedarc : now cedarc log level:5
yuv420sp2rgb 46.61
yuv420sp2rgb 42.04
yuv420sp2rgb 41.32
yuv420sp2rgb 42.06
yuv420sp2rgb 41.69
yuv420sp2rgb 42.05
yuv420sp2rgb 41.29
yuv420sp2rgb 41.30
yuv420sp2rgb 42.14
yuv420sp2rgb 41.33
yuv420sp2rgb_neon 10.57
yuv420sp2rgb_neon 7.21
yuv420sp2rgb_neon 6.77
yuv420sp2rgb_neon 8.31
yuv420sp2rgb_neon 7.60
yuv420sp2rgb_neon 6.80
yuv420sp2rgb_neon 6.77
yuv420sp2rgb_neon 7.01
yuv420sp2rgb_neon 7.11
yuv420sp2rgb_neon 7.06
yuv420sp2rgb_g2d 4.32
yuv420sp2rgb_g2d 4.69
yuv420sp2rgb_g2d 4.56
yuv420sp2rgb_g2d 4.57
yuv420sp2rgb_g2d 4.52
yuv420sp2rgb_g2d 4.54
yuv420sp2rgb_g2d 4.52
yuv420sp2rgb_g2d 4.58
yuv420sp2rgb_g2d 4.60
yuv420sp2rgb_g2d 4.67

The ARM NEON optimization is clearly effective, at about 6× faster than plain C. The G2D hardware is faster still and also cuts CPU usage substantially.

Implementation	Time (ms)	CPU usage (%)
C	41.30	50
neon	6.77	50
g2d	4.32	12
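The ratios between the times and CPU usage figures in the table work out as follows (rounded):

```latex
\frac{t_{\mathrm{C}}}{t_{\mathrm{neon}}} = \frac{41.30}{6.77} \approx 6.1, \qquad
\frac{t_{\mathrm{neon}}}{t_{\mathrm{g2d}}} = \frac{6.77}{4.32} \approx 1.57, \qquad
\frac{t_{\mathrm{C}}}{t_{\mathrm{g2d}}} = \frac{41.30}{4.32} \approx 9.6, \qquad
\frac{\mathrm{CPU}_{\mathrm{neon}}}{\mathrm{CPU}_{\mathrm{g2d}}} = \frac{50}{12} \approx 4.2
```

In wall time, G2D is about 1.6× faster than NEON. The larger gain is in CPU usage, which drops to roughly a quarter of the NEON figure.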

Comparison and analysis of the conversion results

The C and NEON outputs are exactly identical, but the image converted by G2D shows a visible color shift.

The G2D hardware supports only three sets of YUV coefficients: G2D_BT601, G2D_BT709 and G2D_BT2020. JPG, however, uses a modified (full-range) BT.601, and this mismatch causes the color shift.
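The sketch below illustrates how two "BT.601" variants diverge by converting the same YUV triple with each:

- **Full-range BT.601**, used by JPEG/JFIF: R = Y + 1.402(V−128), G = Y − 0.344136(U−128) − 0.714136(V−128), B = Y + 1.772(U−128).
- **Limited-range ("studio swing") BT.601**, from the first comment line of the C code: R = 1.164(Y−16) + 1.596(V−128), and so on.

The integer coefficients in the C/NEON code (90/64, 46/64, 22/64, 113/64 ≈ 1.406, 0.719, 0.344, 1.766) track the full-range values. Which exact matrix the G2D applies for `G2D_BT601` is an assumption here. The point is only that both variants are valid BT.601 matrices yet produce different RGB for the same input. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct RgbI {
    int r, g, b;
};

static int clamp255(double x) {
    return static_cast<int>(std::lround(std::min(std::max(x, 0.0), 255.0)));
}

// Full-range BT.601 (JPEG/JFIF): Y, U (Cb) and V (Cr) all span 0..255.
RgbI yuv_to_rgb_full_range(int Y, int U, int V) {
    assert(Y >= 0 && Y <= 255);
    double u = U - 128.0, v = V - 128.0;
    return {clamp255(Y + 1.402 * v),
            clamp255(Y - 0.344136 * u - 0.714136 * v),
            clamp255(Y + 1.772 * u)};
}

// Limited-range BT.601: nominal Y in 16..235, U/V in 16..240.
RgbI yuv_to_rgb_limited_range(int Y, int U, int V) {
    assert(Y >= 0 && Y <= 255);
    double y = 1.164 * (Y - 16), u = U - 128.0, v = V - 128.0;
    return {clamp255(y + 1.596 * v),
            clamp255(y - 0.813 * v - 0.391 * u),
            clamp255(y + 2.018 * u)};
}
```

For mid-gray input (128, 128, 128), the two variants give 128 and 130 per channel. Near black (Y = 16), they give 16 and 0. Decoding full-range JPEG data with a limited-range matrix therefore stretches contrast and shifts colors.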

https://github.com/MYIR-ALLWINNER/myir-t1-kernel/blob/develop-yt113-L5.4.61/drivers/char/sunxi_g2d/g2d_bsp_v2.c

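The g2d kernel driver also shows that there is currently no way to give G2D custom YUV coefficients. G2D is therefore unsuitable for JPG encoding and decoding. It remains a good fit for color space conversion in camera and video codec pipelines.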


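Disclaimer: The content and images of this article were written by a contributing author or reprinted with permission from a partner website. The views expressed are solely the author's and do not represent the position of Elecfans (电子发烧友网). The article and its images are provided for engineers' learning only; for infringement or other issues, please contact the site.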