91av中文,无码免费观看视屏在线

隨著AI、大模型的快速發(fā)展，傳統(tǒng)的集中式計算已無法應(yīng)對激增的數(shù)據(jù)處理需求，而分布式計算是指將一個計算任務(wù)分解成多個子任務(wù)，由多個計算節(jié)點并行地進(jìn)行計算，并將結(jié)果匯總得到最終結(jié)果的計算方式，能夠更高效、更穩(wěn)定、更靈活地處理大規(guī)模數(shù)據(jù)和復(fù)雜計算任務(wù)，在各行各業(yè)中得到了廣泛的應(yīng)用。

那如何從零到一搭建分布式計算的環(huán)境呢？本文將從硬件選型，到服務(wù)器側(cè)的基礎(chǔ)配置、GPU驅(qū)動安裝和集合通訊庫配置，以及無損以太網(wǎng)的啟用，直至大模型導(dǎo)入和訓(xùn)練測試，帶您跑通搭建分布式計算環(huán)境的全流程。

硬件準(zhǔn)備

GPU服務(wù)器選型

GPU擁有大量的計算核心，可以同時處理多個數(shù)據(jù)任務(wù)，是構(gòu)成智算中心的關(guān)鍵硬件。

從智算中心方案的整體設(shè)計層面來看：GPU服務(wù)器集群和存儲服務(wù)器集群分別通過計算網(wǎng)絡(luò)（Scale-out網(wǎng)絡(luò)）和存儲網(wǎng)絡(luò)連接。另外兩張管理網(wǎng)中，業(yè)務(wù)管理網(wǎng)用于GPU服務(wù)器互聯(lián)，進(jìn)行AIOS管理面通信，帶外管理則連接整個智算中心的所有設(shè)備，用于運維接入管理。

圖1：智算中心方案的概要設(shè)計拓?fù)?/i>
明確了智算中心的整體設(shè)計后，我們將對比通用計算服務(wù)器與GPU服務(wù)器的內(nèi)部硬件連接拓?fù)鋱D，來具體了解GPU服務(wù)器的選型邏輯：
圖2：通用計算服務(wù)器內(nèi)部的硬件連接拓?fù)?img src="https://file1.elecfans.com/web1/M00/F5/1B/wKgZoWc2v7KATB5kAARyuXoP2h8166.png" alt="wKgZoWc2v7KATB5kAARyuXoP2h8166.png" />圖3：GPU服務(wù)器內(nèi)部的硬件連接拓?fù)?/i>

圖2是一臺通用計算服務(wù)器內(nèi)部的硬件連接拓?fù)?，這臺服務(wù)器的核心是兩塊AMD的EPYC CPU，根據(jù)IO Chiplet擴(kuò)展出了若干接口，輔助CPU充分釋放通用計算能力。

圖3是一臺GPU服務(wù)器內(nèi)部的硬件連接拓?fù)?，這臺服務(wù)器配備了8塊A100 GPU，8張用于計算通信的RDMA網(wǎng)卡，以及2張用于存儲通信的RDMA網(wǎng)卡，所有的IO組件設(shè)計，都是為了讓這8塊GPU充分釋放算力。

通過上面兩張硬件連接拓?fù)鋱D可以看到，通用服務(wù)器和GPU服務(wù)器從基本的硬件構(gòu)造上就有著非常大的差異，一個是圍繞通用CPU來構(gòu)建，另一個是圍繞著GPU來構(gòu)建的。因此，在硬件選型階段，就需要注意差別，通常來講通用服務(wù)器是沒有辦法復(fù)用改造成一臺高性能的GPU服務(wù)器，PCIe接口數(shù)量、服務(wù)器空間、散熱設(shè)計、電源等方面都不能滿足要求。

當(dāng)通過計算任務(wù)確定算力需求，進(jìn)而確定了所需要的GPU型號和數(shù)量之后，我們也就可以再繼續(xù)規(guī)劃整個GPU集群的組網(wǎng)了。

由于資源限制，本次實驗驗證中，使用三臺通用服務(wù)器稍加改造進(jìn)行后續(xù)的并行訓(xùn)練和推理測試。

計算節(jié)點的硬件配置如下：

CPU：Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz * 2

GPU：NVIDIA GeForce RTX 4060 Ti 16G * 1

內(nèi)存：128G

存儲：10T HDD * 2

網(wǎng)卡：MGMT、CX5

其他部分：

散熱：GPU為全高尺寸，但服務(wù)器只有2U，所以只能拆掉上蓋板;

電源：通用服務(wù)器通常沒有預(yù)留足夠的供電接口，因此需要使用外置電源對GPU進(jìn)行額外供電；

電源選擇的是Great Wall 額定650W X6，功率上可以同時滿足3塊GPU（RTX4060Ti需要外接150W的供電）的供電要求，并且支持3個8pin接口，用來分別連接三塊GPU。
圖4：電源選型示意圖圖5：GPU和RDMA網(wǎng)卡上機(jī)安裝后的實拍圖
高性能計算網(wǎng)選型

智算中心的管理網(wǎng)相較于傳統(tǒng)的通用計算數(shù)據(jù)中心來說，沒有太大差異。比較特殊的就是Scale-out計算網(wǎng)絡(luò)和存儲網(wǎng)絡(luò)，這兩張網(wǎng)絡(luò)承載的業(yè)務(wù)流量決定了交換機(jī)設(shè)備的選型需求：支持RDMA、低時延、高吞吐。

如下圖所示，在組網(wǎng)連接方面也有所不同，這里會通過將GPU分組（圖中#L0～7一組，#L8～15一組），組成只有一跳的高帶寬互聯(lián)域（HB域），并通過針對智算場景優(yōu)化的Rail交換機(jī)連接，實現(xiàn)了高效的數(shù)據(jù)傳輸和計算協(xié)同。
圖6：組網(wǎng)連接示意

這次實驗驗證中，計算網(wǎng)的交換機(jī)選用星融元Asterfusion?? CX-N系列超低時延交換機(jī)，具體型號為CX308P-48Y-N。

型號

業(yè)務(wù)接口

交換容量

CX864E-N

64 x 800GE OSFP，2 x 10GE SFP+

102.4Tbps

CX732Q-N

32 x 400GE QSFP-DD, 2 x 10GE SFP+

25.6Tbps

CX664D-N

64 x 200GE QSFP56, 2 x 10GE SFP+

25.6Tbps

CX564P-N

64 x 100GE QSFP28, 2 x 10GE SFP+

12.8Tbps

CX532P-N

32 x 100GE QSFP28, 2 x 10GE SFP+

6.4Tbps

CX308P-48Y-N

48 x 25GE SFP28, 8 x 100GE QSFP28

4.0Tbps

表1：具體型號規(guī)格示意

提升大模型訓(xùn)練效率

CX-N數(shù)據(jù)中心交換機(jī)的單機(jī)轉(zhuǎn)發(fā)時延（400ns）低至業(yè)界平均水平的1/4~1/5，將網(wǎng)絡(luò)時延在AI/ML應(yīng)用端到端時延中的占比降至最低，同時多維度的高可靠設(shè)計確保網(wǎng)絡(luò)在任何時候都不中斷，幫助大模型的訓(xùn)練大幅度降低訓(xùn)練時間、提升整體效率。

全系列標(biāo)配RoCEv2能力

區(qū)別于傳統(tǒng)廠家多等級License權(quán)限管理方式，CX-N數(shù)據(jù)中心交換機(jī)所有應(yīng)用場景License權(quán)限一致，全系列標(biāo)配RoCEv2能力，提供PFC、ECN、Easy RoCE等一系列面向生產(chǎn)環(huán)境的增強(qiáng)網(wǎng)絡(luò)特性，用戶無須為此類高級特性額外付出網(wǎng)絡(luò)建設(shè)成本，幫助用戶獲得更高的ROI。

開放、中立的AI/ML網(wǎng)絡(luò)

星融元AI/ML網(wǎng)絡(luò)解決方案的開放性確保用戶能夠重用已有的系統(tǒng)（K8s、Prometheus等）對網(wǎng)絡(luò)進(jìn)行管理，無需重復(fù)投入；星融元以“中立的網(wǎng)絡(luò)供應(yīng)商參與AI生態(tài)”的理念為用戶提供專業(yè)的網(wǎng)絡(luò)方案，幫助用戶規(guī)避“全棧方案鎖定”的風(fēng)險。

最終，實驗環(huán)節(jié)的組網(wǎng)拓?fù)浜突A(chǔ)配置如下所示。
圖7：實驗拓?fù)浜突A(chǔ)配置示意

軟件準(zhǔn)備

以上，我們已經(jīng)完成了硬件選型，接下來我們將進(jìn)行軟件層面的配置：部署 RoCEv2 交換機(jī)、配置GPU 服務(wù)器、安裝 GPU 驅(qū)動和集合通訊庫。

RoCEv2交換機(jī)
圖8：CX308P-48Y-N設(shè)備圖

本次并行訓(xùn)練的環(huán)境中設(shè)備數(shù)量較少，組網(wǎng)相對簡單：

1. 將CX5網(wǎng)卡的25GE業(yè)務(wù)接口連接到CX308P；

2. 在交換機(jī)上一鍵啟用全局RoCE的無損配置；

3. 將三個25G業(yè)務(wù)口劃分到一個VLAN下組成一個二層網(wǎng)絡(luò)；

如前文提到，CX-N數(shù)據(jù)中心交換機(jī)全系列標(biāo)配RoCEv2能力，配合星融元AsterNOS網(wǎng)絡(luò)操作系統(tǒng)，只需要兩行命令行便可配置所有必要的QoS規(guī)則和參數(shù)，具體命令行如下：

noone@MacBook-Air ~ % ssh admin@10.230.1.17 Linux AsterNOS 5.10.0-8-2-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64 _ _ _ _ ___ ____ / ___ | |_ ___ _ __ | | | / _ / ___| / _ / __|| __| / _ | '__|| | || | | |___ / ___ __ | |_ | __/| | | | || |_| | ___) | /_/ _|___/ __| ___||_| |_| _| ___/ |____/ ------- Asterfusion Network Operating System ------- Help: http://www.asterfusion.com/ Last login: Sun Sep 29 17:10:46 2024 from 172.16.20.241 AsterNOS# configure terminal AsterNOS(config)# qos roce lossless AsterNOS(config)# qos service-policy roce_lossless AsterNOS(config)# end AsterNOS# show qos roce operational description ------------------ ------------- --------------------------------------------------- status bind qos roce binding status mode lossless Roce Mode cable-length 40m Cable Length(in meters) for Roce Lossless Config congestion-control - congestion-mode ECN congestion-control - enabled-tc 3,4 Congestion config enabled Traffic Class - max-threshold 750000 Congestion config max-threshold - min-threshold 15360 Congestion config max-threshold pfc - pfc-priority 3,4 switch-prio on which PFC is enabled - rx-enabled enable PFC Rx Enabled status - tx-enabled enable PFC Tx Enabled status trust - trust-mode dscp Trust Setting on the port for packet classification RoCE DSCP->SP mapping configurations ========================================== dscp switch-prio ----------------------- ------------- 0,1,2,3,4,5,6,7 0 10,11,12,13,14,15,8,9 1 16,17,18,19,20,21,22,23 2 24,25,26,27,28,29,30,31 3 32,33,34,35,36,37,38,39 4 40,41,42,43,44,45,46,47 5 48,49,50,51,52,53,54,55 6 56,57,58,59,60,61,62,63 7 RoCE SP->TC mapping and ETS configurations ================================================ switch-prio mode weight ------------- ------ -------- 6 SP 7 SP RoCE pool config ====================== name switch-prio ----------------------- ------------- egress_lossy_profile 0 1 2 5 6 ingress_lossy_profile 0 1 2 5 6 egress_lossless_profile 3 4 roce_lossless_profile 3 4

GPU服務(wù)器基礎(chǔ)配置

以下所有操作，在三臺服務(wù)器上都需要執(zhí)行，本文檔中的配置步驟以server3為例。

關(guān)閉防火墻和SELinux

[root@server3 ~]# systemctl stop firewalld [root@server3 ~]# systemctl disable firewalld [root@server3 ~]# setenforce 0 [root@server3 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

配置服務(wù)器間免密登陸

[root@server3 ~]# ssh-keygen [root@server3 ~]# ssh-copy-id root@server1 [root@server3 ~]# ssh-copy-id root@server2

配置服務(wù)器軟件源

[root@server3 ~]# ll /etc/yum.repos.d/ 總用量 80 -rw-r--r-- 1 root root 2278 9月 19 08:00 CentOS-Base.repo -rw-r--r-- 1 root root 232 9月 19 08:00 cuda-rhel7.repo -rw-r--r-- 1 root root 210 9月 19 08:00 cudnn-local-rhel7-8.9.7.29.repo drwxr-xr-x 2 root root 4096 9月 19 07:58 disable.d -rw-r--r-- 1 root root 664 9月 19 08:00 epel.repo -rw-r--r-- 1 root root 381 9月 19 08:00 hashicorp.repo -rw-r--r-- 1 root root 218 9月 19 08:00 kubernetes.repo -rw-r--r-- 1 root root 152 9月 19 08:00 MariaDB.repo -rw-r--r-- 1 root root 855 9月 19 08:00 remi-modular.repo -rw-r--r-- 1 root root 456 9月 19 08:00 remi-php54.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php70.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php71.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php72.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php73.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php74.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php80.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php81.repo -rw-r--r-- 1 root root 1314 9月 19 08:00 remi-php82.repo -rw-r--r-- 1 root root 2605 9月 19 08:00 remi.repo -rw-r--r-- 1 root root 750 9月 19 08:00 remi-safe.repo [root@server3 ~]# more /etc/yum.repos.d/*.repo :::::::::::::: /etc/yum.repos.d/CentOS-Base.repo :::::::::::::: # CentOS-Base.repo # # The mirror system uses the connecting IP address of the client and the # update status of each mirror to pick mirrors that are updated to and # geographically close to the client. You should use this for CentOS updates # unless you are manually picking other mirrors. # # If the mirrorlist= does not work for you, as a fall back you can try the # remarked out baseurl= line instead. # # [base] name=CentOS-7 - Base - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/7/os/x86_64/ http://mirrors.aliyuncs.com/centos/7/os/x86_64/ http://mirrors.cloud.aliyuncs.com/centos/7/os/x86_64/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #released updates [updates] name=CentOS-7 - Updates - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/7/updates/x86_64/ http://mirrors.aliyuncs.com/centos/7/updates/x86_64/ http://mirrors.cloud.aliyuncs.com/centos/7/updates/x86_64/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #additional packages that may be useful [extras] name=CentOS-7 - Extras - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/7/extras/x86_64/ http://mirrors.aliyuncs.com/centos/7/extras/x86_64/ http://mirrors.cloud.aliyuncs.com/centos/7/extras/x86_64/ gpgcheck=1 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #additional packages that extend functionality of existing packages [centosplus] name=CentOS-7 - Plus - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/7/centosplus/x86_64/ http://mirrors.aliyuncs.com/centos/7/centosplus/x86_64/ http://mirrors.cloud.aliyuncs.com/centos/7/centosplus/x86_64/ gpgcheck=1 enabled=0 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 #contrib - packages by Centos Users [contrib] name=CentOS-7 - Contrib - mirrors.aliyun.com failovermethod=priority baseurl=http://mirrors.aliyun.com/centos/7/contrib/x86_64/ http://mirrors.aliyuncs.com/centos/7/contrib/x86_64/ http://mirrors.cloud.aliyuncs.com/centos/7/contrib/x86_64/ gpgcheck=1 enabled=0 gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7 :::::::::::::: /etc/yum.repos.d/cuda-rhel7.repo :::::::::::::: [cuda-rhel7-x86_64] name=cuda-rhel7-x86_64 baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64 enabled=1 gpgcheck=1 gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub :::::::::::::: /etc/yum.repos.d/cudnn-local-rhel7-8.9.7.29.repo :::::::::::::: [cudnn-local-rhel7-8.9.7.29] name=cudnn-local-rhel7-8.9.7.29 baseurl=file:///var/cudnn-local-repo-rhel7-8.9.7.29 enabled=1 gpgcheck=1 gpgkey=file:///var/cudnn-local-repo-rhel7-8.9.7.29/90F10142.pub obsoletes=0 :::::::::::::: /etc/yum.repos.d/epel.repo :::::::::::::: [epel] name=Extra Packages for Enterprise Linux 7 - $basearch baseurl=http://mirrors.aliyun.com/epel/7/$basearch failovermethod=priority enabled=1 gpgcheck=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 [epel-debuginfo] name=Extra Packages for Enterprise Linux 7 - $basearch - Debug baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug failovermethod=priority enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 gpgcheck=0 [epel-source] name=Extra Packages for Enterprise Linux 7 - $basearch - Source baseurl=http://mirrors.aliyun.com/epel/7/SRPMS failovermethod=priority enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 gpgcheck=0 :::::::::::::: /etc/yum.repos.d/hashicorp.repo :::::::::::::: [hashicorp] name=Hashicorp Stable - $basearch baseurl=https://rpm.releases.hashicorp.com/RHEL/$releasever/$basearch/stable enabled=0 gpgcheck=1 gpgkey=https://rpm.releases.hashicorp.com/gpg [hashicorp-test] name=Hashicorp Test - $basearch baseurl=https://rpm.releases.hashicorp.com/RHEL/$releasever/$basearch/test enabled=0 gpgcheck=1 gpgkey=https://rpm.releases.hashicorp.com/gpg :::::::::::::: /etc/yum.repos.d/kubernetes.repo :::::::::::::: [kubernetes] name=Kubernetes baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/ enabled=1 gpgcheck=1 gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/repodata/repomd.xml.key :::::::::::::: /etc/yum.repos.d/MariaDB.repo :::::::::::::: [mariadb] name = MariaDB baseurl = https://mirror.mariadb.org/yum/11.2/centos74-amd64 gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck = 0 :::::::::::::: /etc/yum.repos.d/remi-modular.repo :::::::::::::: # Repository: https://rpms.remirepo.net/ # Blog: https://blog.remirepo.net/ # Forum: https://forum.remirepo.net/ [remi-modular] name=Remi's Modular repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/modular/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/modular/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/modular/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-modular-test] name=Remi's Modular testing repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/modular-test/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/modular-test/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/modular-test/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php54.repo :::::::::::::: # This repository only provides PHP 5.4 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php54] name=Remi's PHP 5.4 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php54/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php54/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php54/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php70.repo :::::::::::::: # This repository only provides PHP 7.0 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php70] name=Remi's PHP 7.0 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php70/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php70/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php70/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php70-debuginfo] name=Remi's PHP 7.0 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php70/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php70-test] name=Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test70/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test70/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test70/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php70-test-debuginfo] name=Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test70/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php71.repo :::::::::::::: # This repository only provides PHP 7.1 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php71] name=Remi's PHP 7.1 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php71/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php71/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php71/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php71-debuginfo] name=Remi's PHP 7.1 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php71/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php71-test] name=Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test71/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test71/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test71/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php71-test-debuginfo] name=Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test71/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php72.repo :::::::::::::: # This repository only provides PHP 7.2 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php72] name=Remi's PHP 7.2 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php72/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php72/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php72/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php72-debuginfo] name=Remi's PHP 7.2 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php72/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php72-test] name=Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test72/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test72/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test72/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php72-test-debuginfo] name=Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test72/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php73.repo :::::::::::::: # This repository only provides PHP 7.3 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php73] name=Remi's PHP 7.3 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php73/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php73/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php73/mirror enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php73-debuginfo] name=Remi's PHP 7.3 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php73/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php73-test] name=Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test73/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test73/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test73/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php73-test-debuginfo] name=Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test73/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php74.repo :::::::::::::: # This repository only provides PHP 7.4 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php74] name=Remi's PHP 7.4 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php74/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php74/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php74/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php74-debuginfo] name=Remi's PHP 7.4 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php74/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php74-test] name=Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test74/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test74/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test74/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php74-test-debuginfo] name=Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test74/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php80.repo :::::::::::::: # This repository only provides PHP 8.0 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php80] name=Remi's PHP 8.0 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php80/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php80/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php80/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php80-debuginfo] name=Remi's PHP 8.0 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php80/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php80-test] name=Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test80/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test80/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test80/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php80-test-debuginfo] name=Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test80/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php81.repo :::::::::::::: # This repository only provides PHP 8.1 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php81] name=Remi's PHP 8.1 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php81/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php81/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php81/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php81-debuginfo] name=Remi's PHP 8.1 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php81/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php81-test] name=Remi's PHP 8.1 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test81/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test81/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test81/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php81-test-debuginfo] name=Remi's PHP 8.1 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test81/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-php82.repo :::::::::::::: # This repository only provides PHP 8.2 and its extensions # NOTICE: common dependencies are in "remi-safe" [remi-php82] name=Remi's PHP 8.2 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php82/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php82/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php82/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php82-debuginfo] name=Remi's PHP 8.2 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php82/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php82-test] name=Remi's PHP 8.2 test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test82/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test82/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test82/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php82-test-debuginfo] name=Remi's PHP 8.2 test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test82/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi.repo :::::::::::::: # Repository: http://rpms.remirepo.net/ # Blog: http://blog.remirepo.net/ # Forum: http://forum.remirepo.net/ [remi] name=Remi's RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/remi/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/remi/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/remi/mirror enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php55] name=Remi's PHP 5.5 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php55/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php55/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php55/mirror # NOTICE: common dependencies are in "remi-safe" enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php56] name=Remi's PHP 5.6 RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/php56/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/php56/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/php56/mirror # NOTICE: common dependencies are in "remi-safe" enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-test] name=Remi's test RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/test/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/test/mirror mirrorlist=http://cdn.remirepo.net/enterprise/7/test/mirror # WARNING: If you enable this repository, you must also enable "remi" enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-debuginfo] name=Remi's RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-remi/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php55-debuginfo] name=Remi's PHP 5.5 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php55/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-php56-debuginfo] name=Remi's PHP 5.6 RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-php56/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-test-debuginfo] name=Remi's test RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-test/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi :::::::::::::: /etc/yum.repos.d/remi-safe.repo :::::::::::::: # This repository is safe to use with RHEL/CentOS base repository # it only provides additional packages for the PHP stack # all dependencies are in base repository or in EPEL [remi-safe] name=Safe Remi's RPM repository for Enterprise Linux 7 - $basearch #baseurl=http://rpms.remirepo.net/enterprise/7/safe/$basearch/ #mirrorlist=https://rpms.remirepo.net/enterprise/7/safe/httpsmirror mirrorlist=http://cdn.remirepo.net/enterprise/7/safe/mirror enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [remi-safe-debuginfo] name=Remi's RPM repository for Enterprise Linux 7 - $basearch - debuginfo baseurl=http://rpms.remirepo.net/enterprise/7/debug-remi/$basearch/ enabled=0 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-remi [root@server3 ~]#

安裝Python3

準(zhǔn)備工作目錄 [root@server3 lichao]# mkdir AIGC [root@server3 lichao]# cd AIGC/ 安裝Python3 安裝編譯環(huán)境和依賴包 [root@server3 AIGC]# yum install wget gcc openssl-devel bzip2-devel libffi-devel [root@server3 AIGC]# yum install openssl11 openssl11-devel openssl-devel 解壓源碼包 [root@server3 AIGC]# tar xvf Python-3.11.9.tar.xz [root@server3 AIGC]# cd Python-3.11.9 [root@server3 Python-3.11.9]# 設(shè)置環(huán)境變量 [root@server3 Python-3.11.9]# export CFLAGS=$(pkg-config --cflags openssl11) [root@server3 Python-3.11.9]# export LDFLAGS=$(pkg-config --libs openssl11) 進(jìn)行編譯安裝 [root@server3 Python-3.11.9]# mkdir -p /home/lichao/opt/python3.11.9 [root@server3 Python-3.11.9]# ./configure --prefix=/home/lichao/opt/python3.11.9 [root@server3 Python-3.11.9]# make && make install 創(chuàng)建軟鏈接，用于全局訪問 [root@server3 Python-3.11.9]# cd /home/lichao/opt/python3.11.9/ [root@server3 python3.11.9]# ln -s /home/lichao/opt/python3.11.9/bin/python3 /usr/bin/python3 [root@server3 python3.11.9]# ln -s /home/lichao/opt/python3.11.9/bin/pip3 /usr/bin/pip3 [root@server3 python3.11.9]# ll /usr/bin/python3 lrwxrwxrwx 1 root root 41 5月 16 08:32 /usr/bin/python3 -> /home/lichao/opt/python3.11.9/bin/python3 [root@server3 python3.11.9]# ll /usr/bin/pip3 lrwxrwxrwx 1 root root 38 5月 16 08:32 /usr/bin/pip3 -> /home/lichao/opt/python3.11.9/bin/pip3 驗證測試 [root@server3 python3.11.9]# python3 Python 3.11.9 (main, May 16 2024, 08:23:00) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> exit() [root@server3 python3.11.9]#

安裝MLNX網(wǎng)卡驅(qū)動

下文以CentOS7為例，詳細(xì)介紹了Mellanox網(wǎng)卡MLNX_OFED的驅(qū)動安裝和固件升級方法。

本次下載的驅(qū)動版本為：MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64.tgz。

[root@server3 ~]# tar –zxvf MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64.tgz [root@server3 ~]# cd MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64 查看當(dāng)前系統(tǒng)的內(nèi)核版本 [root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# uname -r 3.10.0-957.el7.x86_64 查看當(dāng)前驅(qū)動所支持的內(nèi)核版本 [root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# cat .supported_kernels 3.10.0-957.el7.x86_64 注：由以上可知下載的默認(rèn)驅(qū)動支持當(dāng)前的內(nèi)核版本如果當(dāng)前內(nèi)核與支持內(nèi)核不匹配，手動編譯適合內(nèi)核的驅(qū)動，在編譯之前首先安裝gcc編譯環(huán)境和kernel開發(fā)包 [root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]#yum install gcc gcc-c++ libstdc++-devel kernel-default-devel 添加針對當(dāng)前內(nèi)核版本的驅(qū)動 [root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]#./mlnx_add_kernel_support.sh -m /root/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64 -v 注：完成后生成的驅(qū)動文件在/tmp目錄下 [root@server3 MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64]# ls -l /tmp/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz -rw-r--r-- 1 root root 282193833 Dec 23 09:49 /tmp/MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz 安裝驅(qū)動 [root@server3 tmp]# tar xzvf MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext.tgz [root@server3 tmp]# cd MLNX_OFED_LINUX-4.7-3.2.9.0-rhel7.6-x86_64-ext [root@server3 tmp]# ./mlnxofedinstall 最后啟動openibd服務(wù) [root@server3 ~]#/etc/init.d/openibd start [root@server3 ~]#chkconfig openibd on

安裝GPU驅(qū)動和集合通訊庫安裝配置

安裝配置

安裝GPU驅(qū)動和CUDA、CUDNN

安裝開始前，請根據(jù)自己的GPU型號、操作系統(tǒng)版本去英偉達(dá)官網(wǎng)下載相對應(yīng)的軟件包。

[root@server3 AIGC]# ll 總用量 1733448 -rw-r--r-- 1 root root 1430373861 5月 16 08:55 cudnn-local-repo-rhel7-8.9.7.29-1.0-1.x86_64.rpm drwxr-xr-x 7 root root 141 5月 17 13:45 nccl-tests -rwxr-xr-x 1 root root 306736632 5月 16 08:43 NVIDIA-Linux-x86_64-550.67.run drwxrwxr-x 10 1000 1000 4096 5月 17 13:21 openmpi-4.1.6 -rw-r--r-- 1 root root 17751702 9月 30 2023 openmpi-4.1.6.tar.gz drwxr-xr-x 17 root root 4096 5月 16 08:23 Python-3.11.9 -rw-r--r-- 1 root root 20175816 4月 2 13:11 Python-3.11.9.tar.xz [root@server3 AIGC]# ./NVIDIA-Linux-x86_64-550.67.run Verifying archive integrity... OK Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.67...................

[root@server3 AIGC]# yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo 已加載插件：fastestmirror, nvidia adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo grabbing file https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo to /etc/yum.repos.d/cuda-rhel7.repo repo saved to /etc/yum.repos.d/cuda-rhel7.repo [root@server3 AIGC]# yum install libnccl-2.21.5-1+cuda12.4 libnccl-devel-2.21.5-1+cuda12.4 libnccl-static-2.21.5-1+cuda12.4 [root@server3 AIGC]# yum install cudnn-local-repo-rhel7-8.9.7.29-1.0-1.x86_64.rpm

安裝完成后，可以通過nvidia-smi查看驅(qū)動和CUDA版本。如果版本不匹配，則執(zhí)行此命令行會報錯。

[root@server3 AIGC]# nvidia-smi Mon Jun 3 11:59:36 2024 +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 | |-----------------------------------------+------------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+========================+======================| | 0 NVIDIA GeForce RTX 4060 Ti Off | 00000000:02:00.0 Off | N/A | | 0% 34C P0 27W / 165W | 1MiB / 16380MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | No running processes found | +-----------------------------------------------------------------------------------------+ [root@server3 AIGC]#

編譯安裝OpenMPI

[root@server3 AIGC]# tar xvf openmpi-4.1.6.tar.gz [root@server3 openmpi-4.1.6]# [root@server3 openmpi-4.1.6]# mkdir -p /home/lichao/lib/openmpi [root@server3 openmpi-4.1.6]# ./configure --prefix=/home/lichao/lib/openmpi -with-cuda=/usr/local/cuda-12.4 -with-nccl=/usr/lib64 Open MPI configuration: ----------------------- Version: 4.1.6 Build MPI C bindings: yes Build MPI C++ bindings (deprecated): no Build MPI Fortran bindings: mpif.h, use mpi MPI Build Java bindings (experimental): no Build Open SHMEM support: yes Debug build: no Platform file: (none) Miscellaneous ----------------------- CUDA support: yes HWLOC support: internal Libevent support: internal Open UCC: no PMIx support: Internal Transports ----------------------- Cisco usNIC: no Cray uGNI (Gemini/Aries): no Intel Omnipath (PSM2): no Intel TrueScale (PSM): no Mellanox MXM: no Open UCX: yes OpenFabrics OFI Libfabric: no OpenFabrics Verbs: yes Portals4: no Shared memory/copy in+copy out: yes Shared memory/Linux CMA: yes Shared memory/Linux KNEM: no Shared memory/XPMEM: no TCP: yes Resource Managers ----------------------- Cray Alps: no Grid Engine: no LSF: no Moab: no Slurm: yes ssh/rsh: yes Torque: no OMPIO File Systems ----------------------- DDN Infinite Memory Engine: no Generic Unix FS: yes IBM Spectrum Scale/GPFS: no Lustre: no PVFS2/OrangeFS: no [root@server3 openmpi-4.1.6]#

編譯安裝NCCL-Test

[root@server3 lichao]# cd AIGC/ [root@server3 AIGC]# git clone https://github.com/NVIDIA/nccl-tests.git [root@server3 AIGC]# cd nccl-tests/ [root@server3 nccl-tests]# make clean [root@server3 nccl-tests]# make MPI=1 MPI_HOME=/home/lichao/opt/openmpi/ CUDA_HOME=/usr/local/cuda-12.4/ NCCL_HOME=/usr/lib64/

集合通信性能測試方法（all_reduce）

[root@server1 lichao]# cat run_nccl-test.sh /home/lichao/opt/openmpi/bin/mpirun --allow-run-as-root -np 3 -host "server1,server2,server3" -mca btl ^openib -x NCCL_DEBUG=INFO -x NCCL_ALGO=ring -x NCCL_IB_DISABLE=0 -x NCCL_IB_GID_INDEX=3 -x NCCL_SOCKET_IFNAME=ens11f1 -x NCCL_IB_HCA=mlx5_1:1 /home/lichao/AIGC/nccl-tests/build/all_reduce_perf -b 128 -e 8G -f 2 -g 1 [root@server1 lichao]# ./run_nccl-test.sh # nThread 1 nGpus 1 minBytes 128 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0 # # Using devices # Rank 0 Group 0 Pid 18697 on server1 device 0 [0x02] NVIDIA GeForce RTX 4060 Ti # Rank 1 Group 0 Pid 20893 on server2 device 0 [0x02] NVIDIA GeForce RTX 4060 Ti # Rank 2 Group 0 Pid 2458 on server3 device 0 [0x02] NVIDIA GeForce RTX 4060 Ti # # Reducing maxBytes to 5261099008 due to memory limitation server1:18697:18697 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server1:18697:18697 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.11 server1:18697:18697 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) server1:18697:18697 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so server1:18697:18697 [0] NCCL INFO NET/Plugin: Using internal network plugin. server2:20893:20893 [0] NCCL INFO cudaDriverVersion 12040 server2:20893:20893 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server2:20893:20893 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.12 server2:20893:20893 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) server2:20893:20893 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so server2:20893:20893 [0] NCCL INFO NET/Plugin: Using internal network plugin. server1:18697:18697 [0] NCCL INFO cudaDriverVersion 12040 NCCL version 2.21.5+cuda12.4 server3:2458:2458 [0] NCCL INFO cudaDriverVersion 12040 server3:2458:2458 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server3:2458:2458 [0] NCCL INFO Bootstrap : Using ens11f1:172.16.0.13 server3:2458:2458 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) server3:2458:2458 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so server3:2458:2458 [0] NCCL INFO NET/Plugin: Using internal network plugin. server2:20893:20907 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. server2:20893:20907 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server2:20893:20907 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1 server2:20893:20907 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.12 server2:20893:20907 [0] NCCL INFO Using non-device net plugin version 0 server2:20893:20907 [0] NCCL INFO Using network IB server3:2458:2473 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. server3:2458:2473 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server3:2458:2473 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1 server1:18697:18712 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. server1:18697:18712 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens11f1 server3:2458:2473 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.13 server1:18697:18712 [0] NCCL INFO NCCL_IB_HCA set to mlx5_1:1 server3:2458:2473 [0] NCCL INFO Using non-device net plugin version 0 server3:2458:2473 [0] NCCL INFO Using network IB server1:18697:18712 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [RO]; OOB ens11f1:172.16.0.11 server1:18697:18712 [0] NCCL INFO Using non-device net plugin version 0 server1:18697:18712 [0] NCCL INFO Using network IB server1:18697:18712 [0] NCCL INFO ncclCommInitRank comm 0x23622c0 rank 0 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START server3:2458:2473 [0] NCCL INFO ncclCommInitRank comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START server2:20893:20907 [0] NCCL INFO ncclCommInitRank comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init START server3:2458:2473 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff server2:20893:20907 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff server1:18697:18712 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff server1:18697:18712 [0] NCCL INFO comm 0x23622c0 rank 0 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0 server1:18697:18712 [0] NCCL INFO Channel 00/02 : 0 1 2 server1:18697:18712 [0] NCCL INFO Channel 01/02 : 0 1 2 server1:18697:18712 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] 2/-1/-1->0->1 server1:18697:18712 [0] NCCL INFO P2P Chunksize set to 131072 server3:2458:2473 [0] NCCL INFO comm 0x346ffc0 rank 2 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0 server2:20893:20907 [0] NCCL INFO comm 0x2a1af20 rank 1 nRanks 3 nNodes 3 localRanks 1 localRank 0 MNNVL 0 server3:2458:2473 [0] NCCL INFO Trees [0] 1/-1/-1->2->0 [1] -1/-1/-1->2->0 server3:2458:2473 [0] NCCL INFO P2P Chunksize set to 131072 server2:20893:20907 [0] NCCL INFO Trees [0] -1/-1/-1->1->2 [1] 0/-1/-1->1->-1 server2:20893:20907 [0] NCCL INFO P2P Chunksize set to 131072 server3:2458:2473 [0] NCCL INFO Channel 00/0 : 1[0] -> 2[0] [receive] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 01/0 : 1[0] -> 2[0] [receive] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 00/0 : 2[0] -> 0[0] [send] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 01/0 : 2[0] -> 0[0] [send] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[0] [receive] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[0] [receive] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 00/0 : 1[0] -> 2[0] [send] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 01/0 : 1[0] -> 2[0] [send] via NET/IB/0 server1:18697:18712 [0] NCCL INFO Channel 00/0 : 2[0] -> 0[0] [receive] via NET/IB/0 server1:18697:18712 [0] NCCL INFO Channel 01/0 : 2[0] -> 0[0] [receive] via NET/IB/0 server1:18697:18712 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[0] [send] via NET/IB/0 server1:18697:18712 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[0] [send] via NET/IB/0 server3:2458:2475 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. server1:18697:18714 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. server2:20893:20909 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. server1:18697:18712 [0] NCCL INFO Connected all rings server1:18697:18712 [0] NCCL INFO Channel 01/0 : 1[0] -> 0[0] [receive] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Connected all rings server2:20893:20907 [0] NCCL INFO Connected all rings server1:18697:18712 [0] NCCL INFO Channel 00/0 : 0[0] -> 2[0] [send] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 00/0 : 2[0] -> 1[0] [receive] via NET/IB/0 server1:18697:18712 [0] NCCL INFO Channel 01/0 : 0[0] -> 2[0] [send] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 00/0 : 0[0] -> 2[0] [receive] via NET/IB/0 server2:20893:20907 [0] NCCL INFO Channel 01/0 : 1[0] -> 0[0] [send] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 01/0 : 0[0] -> 2[0] [receive] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Channel 00/0 : 2[0] -> 1[0] [send] via NET/IB/0 server3:2458:2473 [0] NCCL INFO Connected all trees server1:18697:18712 [0] NCCL INFO Connected all trees server1:18697:18712 [0] NCCL INFO NCCL_ALGO set by environment to ring server3:2458:2473 [0] NCCL INFO NCCL_ALGO set by environment to ring server3:2458:2473 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512 server3:2458:2473 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer server2:20893:20907 [0] NCCL INFO Connected all trees server2:20893:20907 [0] NCCL INFO NCCL_ALGO set by environment to ring server2:20893:20907 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512 server2:20893:20907 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer server1:18697:18712 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512 server1:18697:18712 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer server2:20893:20907 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so server2:20893:20907 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. server2:20893:20907 [0] NCCL INFO ncclCommInitRank comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE server3:2458:2473 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so server3:2458:2473 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. server3:2458:2473 [0] NCCL INFO ncclCommInitRank comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE server1:18697:18712 [0] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so server1:18697:18712 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. server1:18697:18712 [0] NCCL INFO ncclCommInitRank comm 0x23622c0 rank 0 nranks 3 cudaDev 0 nvmlDev 0 busId 2000 commId 0x35491327c8228dd0 - Init COMPLETE # # out-of-place in-place # size count type redop root time algbw busbw #wrong time algbw busbw #wrong # (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) 128 32 float sum -1 28.39 0.00 0.01 0 27.35 0.00 0.01 0 256 64 float sum -1 29.44 0.01 0.01 0 28.54 0.01 0.01 0 512 128 float sum -1 29.99 0.02 0.02 0 29.66 0.02 0.02 0 1024 256 float sum -1 32.89 0.03 0.04 0 30.64 0.03 0.04 0 2048 512 float sum -1 34.81 0.06 0.08 0 31.87 0.06 0.09 0 4096 1024 float sum -1 37.32 0.11 0.15 0 36.09 0.11 0.15 0 8192 2048 float sum -1 45.11 0.18 0.24 0 43.12 0.19 0.25 0 16384 4096 float sum -1 57.92 0.28 0.38 0 56.98 0.29 0.38 0 32768 8192 float sum -1 72.68 0.45 0.60 0 70.79 0.46 0.62 0 65536 16384 float sum -1 95.77 0.68 0.91 0 93.73 0.70 0.93 0 131072 32768 float sum -1 162.7 0.81 1.07 0 161.5 0.81 1.08 0 262144 65536 float sum -1 177.3 1.48 1.97 0 177.4 1.48 1.97 0 524288 131072 float sum -1 301.4 1.74 2.32 0 302.0 1.74 2.31 0 1048576 262144 float sum -1 557.9 1.88 2.51 0 559.2 1.88 2.50 0 2097152 524288 float sum -1 1089.8 1.92 2.57 0 1092.2 1.92 2.56 0 4194304 1048576 float sum -1 2165.7 1.94 2.58 0 2166.6 1.94 2.58 0 8388608 2097152 float sum -1 4315.7 1.94 2.59 0 4316.1 1.94 2.59 0 16777216 4194304 float sum -1 8528.8 1.97 2.62 0 8529.3 1.97 2.62 0 33554432 8388608 float sum -1 16622 2.02 2.69 0 16610 2.02 2.69 0 67108864 16777216 float sum -1 32602 2.06 2.74 0 32542 2.06 2.75 0 134217728 33554432 float sum -1 63946 2.10 2.80 0 63831 2.10 2.80 0 268435456 67108864 float sum -1 126529 2.12 2.83 0 126412 2.12 2.83 0 536870912 134217728 float sum -1 251599 2.13 2.85 0 251327 2.14 2.85 0 1073741824 268435456 float sum -1 500664 2.14 2.86 0 501911 2.14 2.85 0 2147483648 536870912 float sum -1 1001415 2.14 2.86 0 1000178 2.15 2.86 0 4294967296 1073741824 float sum -1 1999361 2.15 2.86 0 1997380 2.15 2.87 0 server1:18697:18697 [0] NCCL INFO comm 0x23622c0 rank 0 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE server2:20893:20893 [0] NCCL INFO comm 0x2a1af20 rank 1 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE server3:2458:2458 [0] NCCL INFO comm 0x346ffc0 rank 2 nranks 3 cudaDev 0 busId 2000 - Destroy COMPLETE # Out of bounds values : 0 OK # Avg bus bandwidth : 1.66163 # [root@server1 lichao]#

結(jié)果詳解

- size (B)：操作處理的數(shù)據(jù)的大小，以字節(jié)為單位；

- count (elements)：操作處理的元素的數(shù)量；

- type：元素的數(shù)據(jù)類型；

- redop：使用的歸約操作；

- root：對于某些操作（如 reduce 和 broadcast），這列指定了根節(jié)點的編號，值是 -1 表示這個操作沒有根節(jié)點（all-reduce 操作涉及到所有的節(jié)點）；

- time (us)：操作的執(zhí)行時間，以微秒為單位；

- algbw (GB/s)：算法帶寬，以每秒吉字節(jié)（GB/s）為單位；

- busbw (GB/s)：總線帶寬，以每秒吉字節(jié)（GB/s）為單位；

- wrong：錯誤的數(shù)量，如果這個值不是 0，那可能表示有一些錯誤發(fā)生。

在這個例子中，你可以看到，當(dāng)處理的數(shù)據(jù)量增大時，算法帶寬和總線帶寬都有所提高，這可能表示 NCCL 能夠有效地利用大量的數(shù)據(jù)。

查看結(jié)果時，需要關(guān)注如下幾點：

1. 數(shù)據(jù)量增加時，帶寬是否會下降（下降明顯不符合預(yù)期）；

2. 更關(guān)注帶寬的峰值，每次算到的帶寬峰值，可以只關(guān)注 in 或者 out；

3. 平均值，在數(shù)據(jù)量遞增的情況下，可能無法體現(xiàn)最終的結(jié)果；

4. 請確保數(shù)據(jù)量足夠大，可以壓到帶寬上限（通過調(diào)整 b、e 或者 n 選項）。

常用參數(shù)及解釋

- GPU 數(shù)量

- -t,--nthreads 每個進(jìn)程的線程數(shù)量配置，默認(rèn) 1；

- -g,--ngpus 每個線程的 GPU 數(shù)量，默認(rèn) 1；

- 數(shù)據(jù)大小配置

- -b,--minbytes 開始的最小數(shù)據(jù)量，默認(rèn) 32M；

- -e,--maxbytes 結(jié)束的最大數(shù)據(jù)量，默認(rèn) 32M；

- 數(shù)據(jù)步長設(shè)置

- -i,--stepbytes 每次增加的數(shù)據(jù)量，默認(rèn): 1M；

- -f,--stepfactor 每次增加的倍數(shù)，默認(rèn)禁用；

- NCCL 操作相關(guān)配置

- -o,--op 指定那種操作為reduce，僅適用于Allreduce、Reduce或ReduceScatter等縮減操作。默認(rèn)值為：求和（Sum）；

- -d,--datatype 指定使用哪種數(shù)據(jù)類型，默認(rèn) : Float；

- 性能相關(guān)配置

- -n,--iters 每次操作（一次發(fā)送）循環(huán)多少次，默認(rèn) : 20；

- -w,--warmup_iters 預(yù)熱迭代次數(shù)（不計時），默認(rèn)：5；

- -m,--agg_iters 每次迭代中要聚合在一起的操作數(shù)，默認(rèn)：1；

- -a,--average <0/1/2/3> 在所有 ranks 計算均值作為最終結(jié)果 (MPI=1 only). <0=Rank0,1=Avg,2=Min,3=Max>，默認(rèn)：1；

- 測試相關(guān)配置

- -p,--parallel_init <0/1> 使用線程并行初始化 NCCL，默認(rèn): 0；

- -c,--check <0/1> 檢查結(jié)果的正確性。在大量GPU上可能會非常慢，默認(rèn)：1；

- -z,--blocking <0/1> 使NCCL集合阻塞，即在每個集合之后讓CPU等待和同步，默認(rèn)：0；

- -G,--cudagraph 將迭代作為CUDA圖形捕獲，然后重復(fù)指定的次數(shù)，默認(rèn)：0；

實驗測試

完成硬件、軟件的選型和配置后，下一步將進(jìn)行實踐測試。

獲取LLaMA-Factory源碼包

因為網(wǎng)絡(luò)問題很難直接通過git clone命令行拉取，建議通過打包下載后自己上傳的方式進(jìn)行：

noone@MacBook-Air Downloads % scp LLaMA-Factory-0.8.3.zip root@10.230.1.13:/tmp [root@server3 AIGC]# pwd /home/lichao/AIGC [root@server3 AIGC]# cp /tmp/LLaMA-Factory-0.8.3.zip ./ [root@server3 AIGC]# unzip LLaMA-Factory-0.8.3.zip [root@server3 AIGC]# cd LLaMA-Factory-0.8.3 [root@server3 LLaMA-Factory-0.8.3]# ll 總用量 128 drwxr-xr-x 2 root root 83 9月 13 05:04 assets drwxr-xr-x 2 root root 122 9月 6 08:26 cache -rw-r--r-- 1 root root 1378 7月 18 19:36 CITATION.cff drwxr-xr-x 6 root root 4096 9月 13 05:03 data drwxr-xr-x 4 root root 43 7月 18 19:36 docker drwxr-xr-x 5 root root 44 7月 18 19:36 evaluation drwxr-xr-x 10 root root 182 7月 18 19:36 examples -rw-r--r-- 1 root root 11324 7月 18 19:36 LICENSE -rw-r--r-- 1 root root 242 7月 18 19:36 Makefile -rw-r--r-- 1 root root 33 7月 18 19:36 MANIFEST.in -rw-r--r-- 1 root root 645 7月 18 19:36 pyproject.toml -rw-r--r-- 1 root root 44424 7月 18 19:36 README.md -rw-r--r-- 1 root root 44093 7月 18 19:36 README_zh.md -rw-r--r-- 1 root root 245 7月 18 19:36 requirements.txt drwxr-xr-x 3 root root 16 9月 6 18:48 saves drwxr-xr-x 2 root root 219 7月 18 19:36 scripts -rw-r--r-- 1 root root 3361 7月 18 19:36 setup.py drwxr-xr-x 4 root root 101 9月 6 08:22 src drwxr-xr-x 5 root root 43 7月 18 19:36 tests [root@server3 LLaMA-Factory-0.8.3]#

安裝LLaMA-Factory，并進(jìn)行驗證

[root@server3 LLaMA-Factory-0.8.3]# pip install -e ".[torch,metrics]" [root@server3 LLaMA-Factory-0.8.3]# llamafactory-cli version [2024-09-23 08:51:28,722] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) ---------------------------------------------------------- | Welcome to LLaMA Factory, version 0.8.3 | | | | Project page: https://github.com/hiyouga/LLaMA-Factory | ---------------------------------------------------------- [root@server3 LLaMA-Factory-0.8.3]#

下載訓(xùn)練時所需的預(yù)訓(xùn)練模型和數(shù)據(jù)集

根據(jù)當(dāng)前GPU服務(wù)器所配置的GPU硬件規(guī)格，選擇適合的訓(xùn)練方法、模型和數(shù)據(jù)集。

GPU型號：NVIDIA GeForce RTX 4060 Ti 16GB

預(yù)訓(xùn)練模型：Qwen/Qwen1.5-0.5B-Chat

數(shù)據(jù)集：identity、alpaca_zh_demo

# Make sure you have git-lfs installed (https://git-lfs.com) git lfs install git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat # If you want to clone without large files - just their pointers GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat

因為網(wǎng)絡(luò)問題通過命令行很難直接下載，這里使用huggingface的國內(nèi)鏡像站拉取預(yù)訓(xùn)練模型數(shù)據(jù)，并使用“GIT_LFS_SKIP_SMUDGE=1”變量跳過大文件，隨后手工下載后再上傳。

如果覺得麻煩，也可以安裝使用huggingface的命令行工具，下載預(yù)訓(xùn)練模型和數(shù)據(jù)集。同樣地，安裝完成后，需要配置一些環(huán)境變量（使用鏡像站hf-mirror.com）來解決網(wǎng)絡(luò)問題。

1. 安裝依賴 [root@server3 LLaMA-Factory-0.8.3]# pip3 install -U huggingface_hub 2. 設(shè)置環(huán)境變量 [root@server3 LLaMA-Factory-0.8.3]# export HF_ENDPOINT=https://hf-mirror.com 可以寫入 ~/.bashrc 永久生效。 3. 確認(rèn)環(huán)境變量生效 [root@server3 LLaMA-Factory-0.8.3]# huggingface-cli env Copy-and-paste the text below in your GitHub issue. - huggingface_hub version: 0.24.5 - Platform: Linux-3.10.0-1160.118.1.el7.x86_64-x86_64-with-glibc2.17 - Python version: 3.11.9 - Running in iPython ?: No - Running in notebook ?: No - Running in Google Colab ?: No - Token path ?: /root/.cache/huggingface/token - Has saved token ?: True - Who am I ?: richard-open-source - Configured git credential helpers: - FastAI: N/A - Tensorflow: N/A - Torch: 2.4.0 - Jinja2: 3.1.4 - Graphviz: N/A - keras: N/A - Pydot: N/A - Pillow: 10.4.0 - hf_transfer: N/A - gradio: 4.43.0 - tensorboard: N/A - numpy: 1.26.4 - pydantic: 2.9.0 - aiohttp: 3.10.3 - ENDPOINT: https://hf-mirror.com - HF_HUB_CACHE: /root/.cache/huggingface/hub - HF_ASSETS_CACHE: /root/.cache/huggingface/assets - HF_TOKEN_PATH: /root/.cache/huggingface/token - HF_HUB_OFFLINE: False - HF_HUB_DISABLE_TELEMETRY: False - HF_HUB_DISABLE_PROGRESS_BARS: None - HF_HUB_DISABLE_SYMLINKS_WARNING: False - HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False - HF_HUB_DISABLE_IMPLICIT_TOKEN: False - HF_HUB_ENABLE_HF_TRANSFER: False - HF_HUB_ETAG_TIMEOUT: 10 - HF_HUB_DOWNLOAD_TIMEOUT: 10 [root@server3 LLaMA-Factory-0.8.3]# 4.1 下載模型 [root@server3 LLaMA-Factory-0.8.3]# huggingface-cli download --resume-download Qwen/Qwen1.5-0.5B-Chat --local-dir ./models/Qwen1.5-0.5B-Chat 4.2 下載數(shù)據(jù)集 [root@server3 LLaMA-Factory-0.8.3]# huggingface-cli download --repo-type dataset --resume-download alpaca_zh_demo --local-dir ./datasets/alpaca_zh_demo

下載預(yù)訓(xùn)練模型

[root@server3 AIGC]# mkdir models [root@server3 AIGC]# cd models/ [root@server3 models]# GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/Qwen/Qwen1.5-0.5B-Chat [root@server3 models]# tree -h Qwen1.5-0.5B-Chat/ Qwen1.5-0.5B-Chat/ ├── [ 656] config.json ├── [ 661] config.json.raw ├── [ 206] generation_config.json ├── [7.1K] LICENSE ├── [1.6M] merges.txt ├── [1.2G] model.safetensors ├── [4.2K] README.md ├── [1.3K] tokenizer_config.json ├── [6.7M] tokenizer.json └── [2.6M] vocab.json 0 directories, 10 files [root@server3 models]#

下載數(shù)據(jù)集

默認(rèn)情況下，LLaMA-Factory項目文件下的data目錄，自帶了一些本地數(shù)據(jù)集可直接使用。

[root@server3 LLaMA-Factory-0.8.3]# tree -h data/ data/ ├── [841K] alpaca_en_demo.json ├── [621K] alpaca_zh_demo.json ├── [ 32] belle_multiturn │ └── [2.7K] belle_multiturn.py ├── [733K] c4_demo.json ├── [ 13K] dataset_info.json ├── [1.5M] dpo_en_demo.json ├── [833K] dpo_zh_demo.json ├── [722K] glaive_toolcall_en_demo.json ├── [665K] glaive_toolcall_zh_demo.json ├── [ 27] hh_rlhf_en │ └── [3.3K] hh_rlhf_en.py ├── [ 20K] identity.json ├── [892K] kto_en_demo.json ├── [ 45] mllm_demo_data │ ├── [ 12K] 1.jpg │ ├── [ 22K] 2.jpg │ └── [ 16K] 3.jpg ├── [3.1K] mllm_demo.json ├── [9.8K] README.md ├── [9.2K] README_zh.md ├── [ 27] ultra_chat │ └── [2.3K] ultra_chat.py └── [1004K] wiki_demo.txt 4 directories, 20 files [root@server3 LLaMA-Factory-0.8.3]#

使用準(zhǔn)備好的模型與數(shù)據(jù)集，在單機(jī)上進(jìn)行訓(xùn)練測試

LLaMA-Factory支持通過WebUI微調(diào)大語言模型。在完成安裝后，我們可以使用WebUI進(jìn)行快速調(diào)測驗證，沒問題后可使用命令行工具進(jìn)行多機(jī)分布式訓(xùn)練。

[root@server3 LLaMA-Factory-0.8.3]# llamafactory-cli webui [2024-09-23 17:54:45,786] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) Running on local URL: http://0.0.0.0:7861 To create a public link, set `share=True` in `launch()`.

使用命令行運行多機(jī)分布式訓(xùn)練任務(wù)

1. 準(zhǔn)備目錄 [root@server3 LLaMA-Factory-0.8.3]# mkdir asterun [root@server3 LLaMA-Factory-0.8.3]# mkdir -p asterun/saves/qwen/full/sft 2. 根據(jù)集群環(huán)境和訓(xùn)練任務(wù)，準(zhǔn)備分布式訓(xùn)練的配置文件 [root@server3 LLaMA-Factory-0.8.3]# cat asterun/qwen_full_sft_ds2.yaml ### model model_name_or_path: /home/lichao/AIGC/models/Qwen1.5-0.5B-Chat ### method stage: sft do_train: true finetuning_type: full deepspeed: examples/deepspeed/ds_z2_config.json ### dataset dataset: identity,alpaca_zh_demo template: llama3 cutoff_len: 1024 max_samples: 1000 overwrite_cache: true preprocessing_num_workers: 16 ### output output_dir: asterun/saves/qwen/full/sft logging_steps: 10 save_steps: 500 plot_loss: true overwrite_output_dir: true report_to: tensorboard logging_dir: asterun/saves/qwen/full/sft/runs ### train per_device_train_batch_size: 1 gradient_accumulation_steps: 2 learning_rate: 1.0e-4 num_train_epochs: 3.0 lr_scheduler_type: cosine warmup_ratio: 0.1 bf16: true ddp_timeout: 180000000 ### eval val_size: 0.1 per_device_eval_batch_size: 1 eval_strategy: steps eval_steps: 500 [root@server3 LLaMA-Factory-0.8.3]# 3. 用同樣的方式，在Server1和Server2上準(zhǔn)備運行環(huán)境步驟略。 4. 依次在集群中的3個GPU節(jié)點上啟動分布式訓(xùn)練任務(wù) 主節(jié)點rank0： [root@server3 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=0 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml 從節(jié)點rank1： [root@server2 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=1 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml 從節(jié)點rank2： [root@server1 LLaMA-Factory-0.8.3]# FORCE_TORCHRUN=1 NNODES=3 RANK=2 MASTER_ADDR=172.16.0.13 MASTER_PORT=29500 llamafactory-cli train asterun/qwen_full_sft_ds2.yaml

推理測試

安裝GGUF庫

下載llama.cpp源碼包到服務(wù)器，解壓到工作目錄 [root@server3 AIGC]# unzip llama.cpp-master.zip [root@server3 AIGC]# cd llama.cpp-master [root@server3 llama.cpp-master]# ll 總用量 576 -rw-r--r-- 1 root root 33717 9月 26 11:38 AUTHORS drwxr-xr-x 2 root root 37 9月 26 11:38 ci drwxr-xr-x 2 root root 164 9月 26 11:38 cmake -rw-r--r-- 1 root root 6591 9月 26 11:38 CMakeLists.txt -rw-r--r-- 1 root root 3164 9月 26 11:38 CMakePresets.json drwxr-xr-x 3 root root 4096 9月 26 11:38 common -rw-r--r-- 1 root root 2256 9月 26 11:38 CONTRIBUTING.md -rwxr-xr-x 1 root root 199470 9月 26 11:38 convert_hf_to_gguf.py -rwxr-xr-x 1 root root 15993 9月 26 11:38 convert_hf_to_gguf_update.py -rwxr-xr-x 1 root root 19106 9月 26 11:38 convert_llama_ggml_to_gguf.py -rwxr-xr-x 1 root root 14901 9月 26 11:38 convert_lora_to_gguf.py drwxr-xr-x 4 root root 109 9月 26 11:38 docs drwxr-xr-x 43 root root 4096 9月 26 11:38 examples -rw-r--r-- 1 root root 1556 9月 26 11:38 flake.lock -rw-r--r-- 1 root root 7469 9月 26 11:38 flake.nix drwxr-xr-x 5 root root 85 9月 26 11:38 ggml drwxr-xr-x 6 root root 116 9月 26 11:38 gguf-py drwxr-xr-x 2 root root 154 9月 26 11:38 grammars drwxr-xr-x 2 root root 21 9月 26 11:38 include -rw-r--r-- 1 root root 1078 9月 26 11:38 LICENSE -rw-r--r-- 1 root root 50865 9月 26 11:38 Makefile drwxr-xr-x 2 root root 163 9月 26 11:38 media drwxr-xr-x 2 root root 4096 9月 26 11:38 models -rw-r--r-- 1 root root 163 9月 26 11:38 mypy.ini -rw-r--r-- 1 root root 2044 9月 26 11:38 Package.swift drwxr-xr-x 3 root root 40 9月 26 11:38 pocs -rw-r--r-- 1 root root 124786 9月 26 11:38 poetry.lock drwxr-xr-x 2 root root 4096 9月 26 11:38 prompts -rw-r--r-- 1 root root 1280 9月 26 11:38 pyproject.toml -rw-r--r-- 1 root root 528 9月 26 11:38 pyrightconfig.json -rw-r--r-- 1 root root 28481 9月 26 11:38 README.md drwxr-xr-x 2 root root 4096 9月 26 11:38 requirements -rw-r--r-- 1 root root 505 9月 26 11:38 requirements.txt drwxr-xr-x 2 root root 4096 9月 26 11:38 scripts -rw-r--r-- 1 root root 5090 9月 26 11:38 SECURITY.md drwxr-xr-x 2 root root 97 9月 26 11:38 spm-headers drwxr-xr-x 2 root root 289 9月 26 11:38 src drwxr-xr-x 2 root root 4096 9月 26 11:38 tests [root@server3 llama.cpp-master]# 進(jìn)入gguf-py子目錄，安裝GGUF庫 [root@server3 llama.cpp-master]# cd gguf-py [root@server3 gguf-py]# ll 總用量 12 drwxr-xr-x 2 root root 40 9月 26 11:38 examples drwxr-xr-x 2 root root 230 9月 26 11:38 gguf -rw-r--r-- 1 root root 1072 9月 26 11:38 LICENSE -rw-r--r-- 1 root root 1049 9月 26 11:38 pyproject.toml -rw-r--r-- 1 root root 2719 9月 26 11:38 README.md drwxr-xr-x 2 root root 151 9月 26 11:38 scripts drwxr-xr-x 2 root root 71 9月 26 11:38 tests [root@server3 gguf-py]# pip install --editable . Looking in indexes: https://mirrors.aliyun.com/pypi/simple/ Obtaining file:///home/lichao/AIGC/llama.cpp-master/gguf-py Installing build dependencies ... done Checking if build backend supports build_editable ... done Getting requirements to build editable ... done Preparing editable metadata (pyproject.toml) ... done Requirement already satisfied: numpy>=1.17 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (1.26.4) Requirement already satisfied: pyyaml>=5.1 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (6.0.2) Requirement already satisfied: sentencepiece<=0.2.0,?>=0.1.98 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (0.2.0) Requirement already satisfied: tqdm>=4.27 in /home/lichao/opt/python3.11.9/lib/python3.11/site-packages (from gguf==0.10.0) (4.66.5) Building wheels for collected packages: gguf Building editable for gguf (pyproject.toml) ... done Created wheel for gguf: filename=gguf-0.10.0-py3-none-any.whl size=3403 sha256=4a0851426e263076c64c9854be9dfe95493844062484d001fddb08c1be5fa2ca Stored in directory: /tmp/pip-ephem-wheel-cache-iiq8ofh3/wheels/80/80/9b/c6c23d750f4bd20fc0c2c75e51253d89c61a2369247fb694db Successfully built gguf Installing collected packages: gguf Successfully installed gguf-0.10.0 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. [root@server3 gguf-py]#

模型格式轉(zhuǎn)換

將之前微調(diào)訓(xùn)練生成的safetensors格式的模型，轉(zhuǎn)換為gguf格式 [root@server3 gguf-py]# cd .. [root@server3 llama.cpp-master]# python3 convert_hf_to_gguf.py /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft INFO:hf-to-gguf:Loading model: sft INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only INFO:hf-to-gguf:Exporting model... INFO:hf-to-gguf:gguf: loading model part 'model.safetensors' INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {1024, 151936} INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {1024, 151936} INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.0.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.0.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.0.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.1.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.1.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.1.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.10.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.10.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.10.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.11.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.11.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.11.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.12.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.12.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.12.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.13.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.13.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.13.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.14.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.14.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.14.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.15.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.15.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.15.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.16.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.16.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.16.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.17.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.17.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.17.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.18.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.18.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.18.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.19.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.19.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.19.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.2.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.2.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.2.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.20.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.20.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.20.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.21.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.21.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.21.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.22.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.22.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.22.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.23.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.23.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.23.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.3.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.3.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.3.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.4.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.4.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.4.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.5.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.5.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.5.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.6.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.6.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.6.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.7.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.7.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.7.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.8.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.8.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.8.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {2816, 1024} INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {1024, 2816} INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.9.attn_k.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.9.attn_q.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:blk.9.attn_v.bias, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {1024, 1024} INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {1024} INFO:hf-to-gguf:Set meta model INFO:hf-to-gguf:Set model parameters INFO:hf-to-gguf:gguf: context length = 32768 INFO:hf-to-gguf:gguf: embedding length = 1024 INFO:hf-to-gguf:gguf: feed forward length = 2816 INFO:hf-to-gguf:gguf: head count = 16 INFO:hf-to-gguf:gguf: key-value head count = 16 INFO:hf-to-gguf:gguf: rope theta = 1000000.0 INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06 INFO:hf-to-gguf:gguf: file type = 1 INFO:hf-to-gguf:Set model tokenizer INFO:gguf.vocab:Adding 151387 merge(s). INFO:gguf.vocab:Setting special token type eos to 151646 INFO:gguf.vocab:Setting special token type pad to 151643 INFO:gguf.vocab:Setting special token type bos to 151643 INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|start_header_id|?>system<|end_header_id|?> ' + system_message + '<|eot_id|?>' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|start_header_id|?>user<|end_header_id|?> ' + content + '<|eot_id|?><|start_header_id|?>assistant<|end_header_id|?> ' }}{% elif message['role'] == 'assistant' %}{{ content + '<|eot_id|?>' }}{% endif %}{% endfor %} INFO:hf-to-gguf:Set model quantization version INFO:gguf.gguf_writer:Writing the following files: INFO:gguf.gguf_writer:/home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/Sft-620M-F16.gguf: n_tensors = 291, total_size = 1.2G Writing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.24G/1.24G [00:03
安裝Ollama OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2024-09-26T12:04:20.753+02:00 level=INFO source=images.go:753 msg="total blobs: 0" time=2024-09-26T12:04:20.754+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0" time=2024-09-26T12:04:20.754+02:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.3.12)" time=2024-09-26T12:04:20.755+02:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama316805737/runners time=2024-09-26T12:04:39.145+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]" time=2024-09-26T12:04:39.145+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs" time=2024-09-26T12:04:39.283+02:00 level=INFO source=types.go:107 msg="inference compute" id=GPU-2d337ad0-020d-0464-2d00-715b0d00c7ba library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4060 Ti" total="15.7 GiB" available="15.6 GiB" 注冊模型打開一個新的terminal [root@server3 AIGC]# cd LLaMA-Factory-0.8.3/asterun/ [root@server3 asterun]# ll 總用量 4 -rw-r--r-- 1 root root 817 9月 19 09:33 qwen_full_sft_ds2.yaml drwxr-xr-x 3 root root 18 9月 13 10:28 saves 創(chuàng)建模型的Modelfile文件 [root@server3 asterun]# touch qwen_full_sft_ds2.ollama.Modelfile [root@server3 asterun]# vim qwen_full_sft_ds2.ollama.Modelfile [root@server3 asterun]# cat qwen_full_sft_ds2.ollama.Modelfile FROM /home/lichao/AIGC/LLaMA-Factory-0.8.3/asterun/saves/qwen/full/sft/qwen-sft-620M-F16.gguf [root@server3 asterun]# cd ../.. 使用Modelfile注冊模型 [root@server3 AIGC]# ollama create qwen-full-sft -f ./LLaMA-Factory-0.8.3/asterun/qwen_full_sft_ds2.ollama.Modelfile transferring model data 100% using existing layer sha256:19d794be57081c1a5aa7e03c4045a0fdc5b8a40f080f0c550ab38033cf0d5d58 creating new layer sha256:c33681b055686143e7d6e0bb0f1054c9910c05c3f4ab16932fbc567a8961929a writing manifest success [root@server3 AIGC]# 使用注冊好的模型運行推理服務(wù) [root@server3 AIGC]# ollama run qwen-full-sft >>> who are you? <|im_end|?> 我是 {{name}}，一個由 {{author}} 開發(fā)的人工智能助手，我可以幫助用戶查詢信息、安排日程、提供建議等。 >>> can you speak english? I am an AI assistant developed by {{author}}. >>> 好吧，用中文交流吧。沒問題。 >>> 你喜歡中國哪個城市？每個城市都有其獨特的魅力，各具特色，比如：成都：美食之都，生活悠閑。北京：歷史悠久，文化豐富。杭州：風(fēng)景優(yōu)美，以西湖聞名。上海：現(xiàn)代化大都市，經(jīng)濟(jì)繁榮。 >>> 感謝，再見好的，我是個人工智能助手，很高興見到您。 >>> exit [root@server3 AIGC]# 至此，已完成分布式計算環(huán)境的搭建與測試審核編輯黃宇?

聲明：本文內(nèi)容及配圖由入駐作者撰寫或者入駐合作網(wǎng)站授權(quán)轉(zhuǎn)載。文章觀點僅代表作者本人，不代表電子發(fā)燒友網(wǎng)立場。文章及其配圖僅供工程師學(xué)習(xí)之用，如有內(nèi)容侵權(quán)或者其他違規(guī)問題，請聯(lián)系本站處理。舉報投訴 gpu gpu +關(guān)注關(guān)注 28 文章 4739 瀏覽量 128941 AI AI +關(guān)注關(guān)注 87 文章 30887 瀏覽量 269060 分布式計算分布式計算 +關(guān)注關(guān)注 0 文章 28 瀏覽量 4473 大模型大模型 +關(guān)注關(guān)注 2 文章 2448 瀏覽量 2699

收藏人收藏掃一掃，分享給好友復(fù)制鏈接分享評論發(fā)布評論請先登錄相關(guān)推薦分布式軟件系統(tǒng) 。分布式程序設(shè)計語言用于編寫運行于分布式計算機(jī)系統(tǒng)上的分布式程序。一個分布式程序由若干個可以獨立執(zhí)行的程序模塊組成,它們分布于一個發(fā)表于 07-22 14:53 基于分布式調(diào)用鏈監(jiān)控技術(shù)的全息排查功能作為鷹眼的商業(yè)化產(chǎn)品，用于全鏈路APM監(jiān)控的阿里云業(yè)務(wù)實時監(jiān)控服務(wù) (ARMS) ，基于鷹眼的全息排查沉淀，近日推出了基于分布式調(diào)用鏈監(jiān)控技術(shù)的全息排查功能，將該功能提供給廣大用戶。至此，ARMS 發(fā)表于 08-07 17:02 分布式系統(tǒng)的優(yōu)勢是什么？當(dāng)討論分布式系統(tǒng)時，我們面臨許多以下這些形容詞所描述的同類型：分布式的、刪絡(luò)的、并行的、并發(fā)的和分散的。分布式處理是一個相對較新的領(lǐng)域，所以還沒有‘致的定義。與順序計算相比、并行的發(fā)表于 03-31 09:01 HarmonyOS應(yīng)用開發(fā)-分布式任務(wù)調(diào)度什么如何創(chuàng)建一個HarmonyOSDemo Project 如何構(gòu)建一個HAP并且將其部署到智慧屏真機(jī) 通過此示例應(yīng)用體驗如何使用分布式任務(wù)調(diào)度2. 您需要什么硬件要求操作系統(tǒng)：Windows1064位發(fā)表于 09-18 09:21 HarmonyOS 分布式親子教育——操作演示《HarmonyOS 分布式親子教育》操作演示發(fā)表于 06-06 15:32 各種分布式電源的電氣特性 PS：滲透率的概念：從字面上理解，“滲透”就是由分布式電源發(fā)出的功率進(jìn)入（滲入）到配電系統(tǒng)，所謂的“率”就是由分布式電源發(fā)出的電和整個系統(tǒng)所消耗的電（或者說總發(fā)電量）的一個比值。各種發(fā)表于 07-12 07:54 HDC2021技術(shù)分論壇：跨端分布式計算技術(shù)初探設(shè)備協(xié)同計算和資源分擔(dān)以及實時的任務(wù)調(diào)度。如圖1所示，跨端分布式計算的目標(biāo)是：能隨時方便的發(fā)現(xiàn)和啟用周邊閑置的設(shè)備將周邊的設(shè)備組建成算力和差異化功能的資源池為用戶的高體驗應(yīng)用提供隨需算發(fā)表于 11-15 14:54 OpenHarmony分布式軟總線流程分析 OpenHarmony分布式軟總線流程分析，大神總結(jié)，大家可以下載去學(xué)習(xí)了~.~ 發(fā)表于 11-19 15:56 HDC2021技術(shù)分論壇：跨端分布式計算技術(shù)初探的網(wǎng)絡(luò)環(huán)境下，為實現(xiàn)靈活、高效和穩(wěn)定的跨端分布式計算能力，HarmonyOS為開發(fā)者提供了“融合計算、極簡協(xié)議及秩序化組網(wǎng)”的分布式發(fā)表于 11-23 17:06 如何高效完成HarmonyOS分布式應(yīng)用測試？ Testing從測試標(biāo)準(zhǔn)、測試服務(wù)及云測服務(wù)三個方面提供分布式應(yīng)用測試的解決方案。下面，我們將逐一介紹。1. 測試標(biāo)準(zhǔn)測試標(biāo)準(zhǔn)定義APP的入門級測試要求，重點覆蓋消費者用戶最關(guān)心的HarmonyOS特征發(fā)表于 12-13 18:07 基于潤和DAYU200開發(fā)套件的OpenHarmony分布式音樂播放器：參考DevEco Studio（OpenHarmony）使用指南搭建OpenHarmony應(yīng)用開發(fā)環(huán)境、并導(dǎo)入本工程進(jìn)行編譯、運行。運行結(jié)果截圖：【分布式流轉(zhuǎn)體驗】硬件準(zhǔn)備：準(zhǔn)備兩臺潤和DAYU200開發(fā)板發(fā)表于 03-14 09:07 滿滿干貨！手把手教你實現(xiàn)基于eTS的分布式計算器最近收到很多小伙伴反饋，想基于擴(kuò)展的TS語言（eTS）進(jìn)行HarmonyOS應(yīng)用開發(fā)，但是不知道代碼該從何處寫起，從0到1的過程讓新手們抓狂。本期我們將帶來“ 發(fā)表于 05-23 18:34 基于分布式電源接入對電網(wǎng)運行的影響不同的影響結(jié)果；其次，分析分布式電源接入電網(wǎng)的方式，構(gòu)建配電網(wǎng)典型模型，以作為分布式電源接入影響性分析的基礎(chǔ)；最后通過接入后模型的理論計算，研究分布發(fā)表于 12-18 15:06 ?10次下載如何借助分布式GPU環(huán)境來提升神經(jīng)網(wǎng)絡(luò)訓(xùn)練系統(tǒng)的浮點計算能力雖然近年來 GPU 硬件算力和訓(xùn)練方法上均取得了重大進(jìn)步，但在單一機(jī)器上，網(wǎng)絡(luò)訓(xùn)練所需要的時間仍然長得不切實際，因此需要借助分布式GPU環(huán)境來提升神經(jīng)網(wǎng)絡(luò)訓(xùn)練系統(tǒng)的浮點發(fā)表于 05-28 11:11 ?5167次閱讀 openEuler Summit2021之構(gòu)建歐拉openEuler的分布式能力在openEuler Summit2021分布式&多樣性計算分論壇上，介紹了構(gòu)建歐拉openEuler的分布式能力。發(fā)表于 11-10 15:33 ?1522次閱讀

搜索歷史

全流程演示：如何從0到1構(gòu)建分布式GPU計算環(huán)境

硬件準(zhǔn)備

GPU服務(wù)器選型

高性能計算網(wǎng)選型

軟件準(zhǔn)備

RoCEv2交換機(jī)

GPU服務(wù)器基礎(chǔ)配置

安裝GPU驅(qū)動和集合通訊庫安裝配置

安裝配置

集合通信性能測試方法（all_reduce）

結(jié)果詳解

常用參數(shù)及解釋

實驗測試

獲取LLaMA-Factory源碼包

安裝LLaMA-Factory，并進(jìn)行驗證

下載訓(xùn)練時所需的預(yù)訓(xùn)練模型和數(shù)據(jù)集

使用準(zhǔn)備好的模型與數(shù)據(jù)集，在單機(jī)上進(jìn)行訓(xùn)練測試

推理測試

評論

分布式軟件系統(tǒng)

基于分布式調(diào)用鏈監(jiān)控技術(shù)的全息排查功能

分布式系統(tǒng)的優(yōu)勢是什么？

HarmonyOS應(yīng)用開發(fā)-分布式任務(wù)調(diào)度

HarmonyOS 分布式親子教育——操作演示

各種分布式電源的電氣特性

HDC2021技術(shù)分論壇：跨端分布式計算技術(shù)初探

OpenHarmony分布式軟總線流程分析

HDC2021技術(shù)分論壇：跨端分布式計算技術(shù)初探

如何高效完成HarmonyOS分布式應(yīng)用測試？

基于潤和DAYU200開發(fā)套件的OpenHarmony分布式音樂播放器

滿滿干貨！手把手教你實現(xiàn)基于eTS的分布式計算器

基于分布式電源接入對電網(wǎng)運行的影響

如何借助分布式GPU環(huán)境來提升神經(jīng)網(wǎng)絡(luò)訓(xùn)練系統(tǒng)的浮點計算能力

openEuler Summit2021之構(gòu)建歐拉openEuler的分布式能力

型號	業(yè)務(wù)接口	交換容量
CX864E-N	64 x 800GE OSFP，2 x 10GE SFP+	102.4Tbps
CX732Q-N	32 x 400GE QSFP-DD, 2 x 10GE SFP+	25.6Tbps
CX664D-N	64 x 200GE QSFP56, 2 x 10GE SFP+	25.6Tbps
CX564P-N	64 x 100GE QSFP28, 2 x 10GE SFP+	12.8Tbps
CX532P-N	32 x 100GE QSFP28, 2 x 10GE SFP+	6.4Tbps
CX308P-48Y-N	48 x 25GE SFP28, 8 x 100GE QSFP28	4.0Tbps

搜索歷史

全流程演示：如何從0到1構(gòu)建分布式GPU計算環(huán)境

硬件準(zhǔn)備

GPU服務(wù)器選型

高性能計算網(wǎng)選型

軟件準(zhǔn)備

RoCEv2交換機(jī)

GPU服務(wù)器基礎(chǔ)配置

安裝GPU驅(qū)動和集合通訊庫安裝配置

安裝配置

集合通信性能測試方法（all_reduce）

結(jié)果詳解

常用參數(shù)及解釋

實驗測試

獲取LLaMA-Factory源碼包

安裝LLaMA-Factory，并進(jìn)行驗證

下載訓(xùn)練時所需的預(yù)訓(xùn)練模型和數(shù)據(jù)集

使用準(zhǔn)備好的模型與數(shù)據(jù)集，在單機(jī)上進(jìn)行訓(xùn)練測試

推理測試

評論

安裝LLaMA-Factory，并進(jìn)行驗證