summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-26drm/amdgpu: fix uninitialized scalar variable warningTim Huang1-1/+2
Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20drm/amdgpu: correct the KGQ fallback messagePrike Liang1-1/+1
Fix the KGQ fallback function name, as this will help differentiate the failure in the KCQ enablement. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-08Merge tag 'amd-drm-next-6.9-2024-03-01' of ↵Dave Airlie1-3/+3
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-01: amdgpu: - GC 11.5.1 updates - Misc display cleanups - NBIO 7.9 updates - Backlight fixes - DMUB fixes - MPO fixes - atomfirmware table updates - SR-IOV fixes - VCN 4.x updates - use RMW accessors for pci config registers - PSR fixes - Suspend/resume fixes - RAS fixes - ABM fixes - Misc code cleanups - SI DPM fix - Revert freesync video amdkfd: - Misc cleanups - Error handling fixes radeon: - use RMW accessors for pci config registers From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-02-22drm/amdgpu: Drop redundant parameter in amdgpu_gfx_kiq_init_ringMa Jun1-3/+3
Drop redundant parameters in function amdgpu_gfx_kiq_init_ring to simplify the code Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22Merge tag 'amd-drm-next-6.9-2024-02-19' of ↵Dave Airlie1-1/+8
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-19: amdgpu: - ATHUB 4.1 support - EEPROM support updates - RAS updates - LSDMA 7.0 support - JPEG DPG support - IH 7.0 support - HDP 7.0 support - VCN 5.0 support - Misc display fixes - Retimer fixes - DCN 3.5 fixes - VCN 4.x fixes - PSR fixes - PSP 14.0 support - VA_RESERVED cleanup - SMU 13.0.6 updates - NBIO 7.11 updates - SDMA 6.1 updates - MMHUB 3.3 updates - Suspend/resume fixes - DMUB updates amdkfd: - Trap handler enhancements - Fix cache size reporting - Relocate the trap handler radeon: - fix typo in print statement Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com
2024-02-13Revert "drm/amd: flush any delayed gfxoff on suspend entry"Mario Limonciello1-1/+8
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") caused GFXOFF control to be used more heavily and the codepath that was removed from commit 0dee72639533 ("drm/amd: flush any delayed gfxoff on suspend entry") now can be exercised at suspend again. Users report that by using GNOME to suspend the lockscreen trigger will cause SDMA traffic and the system can deadlock. This reverts commit 0dee726395333fea833eaaf838bc80962df886c8. Acked-by: Alex Deucher <alexander.deucher@amd.com> Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-13Merge tag 'amd-drm-next-6.9-2024-02-09' of ↵Dave Airlie1-2/+2
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-09: amdgpu: - Validate DMABuf imports in compute VMs - Add RAS ACA framework - PSP 13 fixes - Misc code cleanups - Replay fixes - Atom interpretor PS, WS bounds checking - DML2 fixes - Audio fixes - DCN 3.5 Z state fixes - Remove deprecated ida_simple usage - UBSAN fixes - RAS fixes - Enable seq64 infrastructure - DC color block enablement - Documentation updates - DC documentation updates - DMCUB updates - S3 fixes - VCN 4.0.5 fixes - DP MST fixes - SR-IOV fixes amdkfd: - Validate DMABuf imports in compute VMs - SVM fixes - Trap handler updates radeon: - Atom interpretor PS, WS bounds checking - Misc code cleanups UAPI: - Bump KFD version so UMDs know that the fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present - Add INFO query for input power. This matches the existing INFO query for average power. Used in gaming HUDs, etc. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com
2024-01-31drm/amdgpu: prefer snprintf over sprintfJani Nikula1-1/+2
This will trade the W=1 warning -Wformat-overflow to -Wformat-truncation. This lets us enable -Wformat-overflow subsystem wide. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan, Xinhui <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
2024-01-22drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'Srinivasan Shanmugam1-2/+2
Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting Cc: Le Ma <Le.Ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)Victor Lu1-4/+4
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access via kiq or RLC is sufficient for now. v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call v4: avoid using amdgpu_sriov_w/rreg v3: use W/RREG32_XCC to handle non-kiq case v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters of amdgpu_device_wreg/rreg Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64Alex Deucher1-0/+2
Issues were reported with commit 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere Altra Developer Platform (AVA developer platform). Various ARM systems seem to have problems related to PCIe and MMIO access. In this case, I'm not sure if this is specific to the ADLINK platform or ARM in general. Seems to be some coherency issue with VRAM. For now, just don't put MQDs in VRAM on ARM. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: alexey.klimov@linaro.org
2023-10-19drm/amdgpu: Workaround to skip kiq ring test during ras gpu recoveryStanley.Yang1-0/+21
This is workaround, kiq ring test failed in suspend stage when do ras recovery. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar1-2/+2
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amd: flush any delayed gfxoff on suspend entryMario Limonciello1-8/+1
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity is happening during entry. This is because GFXOFF was scheduled as delayed but RLC gets disabled in s2idle entry sequence which will hang GFX IP if not already in GFXOFF. To help this problem, flush any delayed work for GFXOFF early in s2idle entry sequence to ensure that it's off when RLC is changed. commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during suspend without delay") modified power gating flow so that if called in s0ix that it ensured that GFXOFF wasn't put in work queue but instead processed immediately. This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly called as part of the suspend entry code. Remove that dead code. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Add -ENOMEM error handling when there is no memorySrinivasan Shanmugam1-7/+10
Return -ENOMEM, when there is no sufficient dynamically allocated memory Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Return -ENOMEM when there is no memory in 'amdgpu_gfx_mqd_sw_init'Srinivasan Shanmugam1-3/+8
Return -ENOMEM, when there is no sufficient dynamically allocated memory to create MQD backup for ring Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix unused variable in amdgpu_gfx.cSrinivasan Shanmugam1-3/+3
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 497 | int j; | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 528 | int j; | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 630 | int r, i, j; | Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix the memory override in kiq ring structShiwu Zhang1-2/+2
This is introduced by the code merge and will let the adev->gfx.kiq[0].ring struct being overrided Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix S3 issue if MQD in VRAMJack Xiao1-0/+4
1. Need flush HDP for MQD putting in vram 2. Zero out mes MQD Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add GFX RAS common functionTao Zhou1-0/+19
The common function can help reduce redundant code. v2: remove xcp operation, only need to do RAS operations for all instances. v3: remove check for GFX RAS support, will be checked in higher level. add amdgpu prefix for the function name. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add compute mode descriptor functionLijo Lazar1-23/+1
Keep a helper function to get description of compute partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Move memory partition query to gmcLijo Lazar1-29/+1
GMC block handles memory related information, it makes more sense to keep memory partition functions in gmc block. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add flags for partition mode queryLijo Lazar1-1/+2
It's not required to take lock on all cases while querying partition mode. Querying partition mode during KFD init process doesn't need to take a lock. Init process after a switch will already be happening under lock. Control the behaviour by adding flags to xcp_query_partition_mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Use unique doorbell range per xccLijo Lazar1-1/+4
Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER. Keeping the same range causes CPF in other XCCs also to be busy when an IB packet is submitted to KCQ. Only the XCC which processes the packet comes back to idle afterwards and this causes other CPs not be idle. This in turn affects clockgating behavior as RLC doesn't get idle interrupt. LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning different ranges doesn't seem to have any side effect as user queue ranges are outside of this range. User queue tests - PM4 through KFD and AQL through rocr - have the same results after this change. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Switch to SOC partition funcsLijo Lazar1-27/+4
For GFXv9.4.3, use SOC level partition switch implementation rather than keeping them at GFX IP level. Change the exisiting implementation in GFX IP for keeping partition mode and restrict it to only GFX related switch. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: remove partition attributes sys file for gfx_v9_4_3Shiwu Zhang1-0/+7
For driver de-init like rmmod operations those partition specific attributes need to be removed accordingly. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix kcq mqd_backup buffer double free for multi-XCDShiwu Zhang1-1/+0
For gfx_v9_4_3 and beyond, struct kiq has its own mqd_backup pointer rather than using the last pointer from mec struct. Then the kfree operation on the pointer from the mec struct should be removed otherwise it will cause double free on the first kcq's mqd_backup buffer on XCD1. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: detect current GPU memory partition modeRajneesh Bhardwaj1-0/+25
- Add helpers to detect the current GPU memory partition. - Add current memory partition mode sysfs node. Tested-by: Ori Messinger <Ori.Messinger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix failure when switching to DPX modeMukul Joshi1-1/+5
Fix the if condition which causes dynamic repartitioning to fail when trying to switch to DPX mode. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Add device repartition supportMukul Joshi1-1/+21
GFX9.4.3 will support dynamic repartitioning of the GPU through sysfs. Add device repartitioning support in KFD to repartition GPU from one mode to other. v2: squash in fix ("drm/amdkfd: Fix warning kgd2kfd_unlock_kfd defined but not used") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Change num_xcd to xcc_maskLijo Lazar1-10/+11
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs available on the ASIC. XCC configuration could differ based on different ASIC configs. v2: Rename num_xcd to num_xcc (Hawking) Use smaller xcc_mask size, changed to u16 (Le) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: introduce new doorbell assignment table for GC 9.4.3Le Ma1-5/+1
Four basic reasons as below to do the change: 1. number of ring expand a lot on GC 9.4.3, and adjustment on old assignment cannot make each ring in a continuous doorbell space. 2. the SDMA doorbell index should not exceed 0x1FF on SDMA 4.2.2 due to regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width. 3. re-design the doorbell assignment and unify the calculation as "start + ring/inst id" will make the code much concise. 4. only defining the START/END makes the table look simple v2: (Lijo) 1. replace name 2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: convert the doorbell_index to 2 dwords offset for kiqLe Ma1-4/+3
KIQ doorbell_index is non-zero from XCC1, thus need to left-shift it like other rings. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: enable the ring and IB test for slave kcqShiwu Zhang1-32/+27
With the mec FW update to utilize the mqd base set by driver for kcq mapping, slave kcq ring test and IB test can be re-enabled. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add sysfs node for compute partition modeLe Ma1-0/+132
Add current/available compute partitin mode sysfs node. v2: make the sysfs node as IP independent one in amdgpu_gfx.c Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: assign different AMDGPU_GFXHUB for rings on each xccLe Ma1-1/+1
Pass the xcc_id to AMDGPU_GFXHUB(x) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: introduce vmhub definition for multi-partition cases (v3)Hawking Zhang1-1/+1
v1: Each partition has its own gfxhub or mmhub. adjust the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le) v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le) v3: apply the gfxhub/mmhub layout to new IPs (Hawking) v4: fix up gmc11 (Alex) v5: rebase (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()Dan Carpenter1-1/+1
This function was releasing the incorrect lock on the error path. Reported-by: kernel test robot <lkp@intel.com> Fixes: 1156e1a60f02 ("drm/amdgpu: add [en/dis]able_kgq() functions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcsHoratio Zhang1-3/+5
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: unlock the correct lock in amdgpu_gfx_enable_kcq()Dan Carpenter1-1/+1
We changed which lock we are supposed to take but this error path was accidentally over looked so it still drops the old lock. Fixes: def799c6596d ("drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: put MQDs in VRAMAlex Deucher1-2/+7
Reduces preemption latency. Only enable this for gfx10 and 11 for now to avoid changing behavior on gfx 8 and 9. v2: move MES MQDs into VRAM as well (YuBiao) v3: enable on gfx10, 11 only (Alex) v4: minor style changes, document why gfx10/11 only (Alex) Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add [en/dis]able_kgq() functionsAlex Deucher1-0/+68
To replace the IP specific variants which are largely duplicate. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: check correct allocated mqd_backup object after allocGuchun Chen1-4/+5
Instead of the default one, check the right mqd_backup object. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.cGuchun Chen1-2/+2
This should be a typo when intruducing multi-xx support. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-24drm/amdgpu: track MQD size for gfx and computeAlex Deucher1-0/+2
It varies by generation and we need to know the size to expose this via debugfs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-21drm/amdgpu: allocate doorbell index for multi-die caseLe Ma1-0/+5
Allocate different doorbell index for kiq/kcq rings on each die Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-20drm/amdgpu: add master/slave check in init phaseLe Ma1-23/+36
Skip KCQ setup on slave xcc as there's no use case. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)Le Ma1-33/+42
v1: Modify kiq_init/fini, mqd_sw_init/fini and enable/disable_kcq to adapt to multi-die case. Pass 0 as default to all asics with single xcc (Le) v2: squash commits to avoid breaking the build (Le) v3: unify naming style (Le) v4: apply the changes to gc v11_0 (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: move queue_bitmap to an independent structure (v3)Le Ma1-16/+25
To allocate independent queue_bitmap for each XCD, then the old bitmap policy can be continued to use with a clear logic. Use mec_bitmap[0] as default for all non-GC 9.4.3 IPs. v2: squash commits to avoid breaking the build v3: unify naming style Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: convert gfx.kiq to array type (v3)Le Ma1-17/+17
v1: more kiq instances are a available in SOC (Le) v2: squash commits to avoid breaking the build (Le) v3: make the conversion for gfx/mec v11_0 (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>