'\" t
.\" Title: ROCM Support
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\" Date: 05/14/2022
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "ROCM SUPPORT" "1" "05/14/2022" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * (re)Define some macros
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" toupper - uppercase a string (locale-aware)
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de toupper
.tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
\\$*
.tr aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz
..
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" SH-xref - format a cross-reference to an SH section
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de SH-xref
.ie n \{\
.\}
.toupper \\$*
.el \{\
\\$*
.\}
..
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" SH - level-one heading that works better for non-TTY output
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de1 SH
.\" put an extra blank line of space above the head in non-TTY output
.if t \{\
.sp 1
.\}
.sp \\n[PD]u
.nr an-level 1
.set-an-margin
.nr an-prevailing-indent \\n[IN]
.fi
.in \\n[an-margin]u
.ti 0
.HTML-TAG ".NH \\n[an-level]"
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
\." make the size of the head bigger
.ps +3
.ft B
.ne (2v + 1u)
.ie n \{\
.\" if n (TTY output), use uppercase
.toupper \\$*
.\}
.el \{\
.nr an-break-flag 0
.\" if not n (not TTY), use normal case (not uppercase)
\\$1
.in \\n[an-margin]u
.ti 0
.\" if not n (not TTY), put a border/line under subheading
.sp -.6
\l'\n(.lu'
.\}
..
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" SS - level-two heading that works better for non-TTY output
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de1 SS
.sp \\n[PD]u
.nr an-level 1
.set-an-margin
.nr an-prevailing-indent \\n[IN]
.fi
.in \\n[IN]u
.ti \\n[SN]u
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.ps \\n[PS-SS]u
\." make the size of the head bigger
.ps +2
.ft B
.ne (2v + 1u)
.if \\n[.$] \&\\$*
..
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" BB/EB - put background/screen (filled box) around block of text
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de BB
.if t \{\
.sp -.5
.br
.in +2n
.ll -2n
.gcolor red
.di BX
.\}
..
.de EB
.if t \{\
.if "\\$2"adjust-for-leading-newline" \{\
.sp -1
.\}
.br
.di
.in
.ll
.gcolor
.nr BW \\n(.lu-\\n(.i
.nr BH \\n(dn+.5v
.ne \\n(BHu+.5v
.ie "\\$2"adjust-for-leading-newline" \{\
\M[\\$1]\h'1n'\v'+.5v'\D'P \\n(BWu 0 0 \\n(BHu -\\n(BWu 0 0 -\\n(BHu'\M[]
.\}
.el \{\
\M[\\$1]\h'1n'\v'-.5v'\D'P \\n(BWu 0 0 \\n(BHu -\\n(BWu 0 0 -\\n(BHu'\M[]
.\}
.in 0
.sp -.5v
.nf
.BX
.in
.sp .5v
.fi
.\}
..
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" BM/EM - put colored marker in margin next to block of text
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.de BM
.if t \{\
.br
.ll -2n
.gcolor red
.di BX
.\}
..
.de EM
.if t \{\
.br
.di
.ll
.gcolor
.nr BH \\n(dn
.ne \\n(BHu
\M[\\$1]\D'P -.75n 0 0 \\n(BHu -(\\n[.i]u - \\n(INu - .75n) 0 0 -\\n(BHu'\M[]
.in 0
.nf
.BX
.in
.fi
.\}
..
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
amdgpu_plugin \- A CRIU plugin that adds checkpoint/restore support for AMD GPUs\&.
.SH "CURRENT SUPPORT"
.sp
Single and multi GPU systems (Gfx9)
.br
Checkpoint / restore on a different system
.br
Checkpoint / restore inside a Docker container
.br
PyTorch
.br
TensorFlow
.br
Using CRIU Image Streamer
.SH "DESCRIPTION"
.sp
Although \fBcriu\fR is a great tool for checkpointing and restoring running applications, it has certain limitations; for example, it cannot handle applications that have device files open\&. To support \fBROCm\fR\-based workloads with \fBcriu\fR, criu\(cqs core functionality must be augmented through a plugin\-based extension mechanism\&. \fBamdgpu_plugin\fR provides the support criu needs to checkpoint and restore ROCm workloads\&.
.SS "Dependencies"
.PP
\fBamdkfd support\fR
.RS 4
Snapshotting the
\fBVRAM\fR
and other
\fBGPU\fR
device state requires an updated version of the amdkfd (amdgpu) driver\&. The kernel patches are currently under review\&.
.RE
.PP
\fBcriu 3\&.16\fR
.RS 4
This work is rebased on criu 3\&.16, the latest criu release available at the time of writing\&.
.RE
.SH "OPTIONS"
.sp
Optional parameters can be passed as environment variables set before executing the criu command\&.
.PP
\fBKFD_FW_VER_CHECK\fR
.RS 4
Enable or disable the firmware version check\&. If enabled, the firmware version on the restored GPU must be greater than or equal to the firmware version on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_FW_VER_CHECK=0
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_SDMA_FW_VER_CHECK\fR
.RS 4
Enable or disable the SDMA firmware version check\&. If enabled, the SDMA firmware version on the restored GPU must be greater than or equal to the SDMA firmware version on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_SDMA_FW_VER_CHECK=0
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_CACHES_COUNT_CHECK\fR
.RS 4
Enable or disable the caches count check\&. If enabled, the caches count on the restored GPU must be greater than or equal to the caches count on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_CACHES_COUNT_CHECK=0
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_NUM_GWS_CHECK\fR
.RS 4
Enable or disable the num_gws check\&. If enabled, num_gws on the restored GPU must be greater than or equal to num_gws on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_NUM_GWS_CHECK=0
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_VRAM_SIZE_CHECK\fR
.RS 4
Enable or disable the VRAM size check\&. If enabled, the VRAM size on the restored GPU must be greater than or equal to the VRAM size on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_VRAM_SIZE_CHECK=0
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_NUMA_CHECK\fR
.RS 4
Enable or disable the NUMA CPU region check\&. If enabled, the plugin restores GPUs that belong to one CPU NUMA region to the same CPU NUMA region\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_NUMA_CHECK=1
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.PP
\fBKFD_CAPABILITY_CHECK\fR
.RS 4
Enable or disable the capability check\&. If enabled, the capability on the restored GPU must be equal to the capability on the checkpointed GPU\&. Default: Enabled
.sp
.if n \{\
.RS 4
.\}
.fam C
.ps -1
.nf
.BB lightgray
E\&.g\&.:
KFD_CAPABILITY_CHECK=1
.EB lightgray
.fi
.fam
.ps +1
.if n \{\
.RE
.\}
.RE
.SH "AUTHOR"
.sp
The AMDKFD team\&.
.SH "COPYRIGHT"
.sp
Copyright (C) 2020\-2021, Advanced Micro Devices, Inc\&. (AMD)