summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
AgeCommit message (Collapse)AuthorFilesLines
2020-04-27drm/amdgpu: decouple EccErrCnt query and clear operationGuchun Chen1-4/+79
Due to hardware bug that when RSMU UMC index is disabled, clear EccErrCnt at the first UMC instance will clean up all other EccErrCnt registes from other instances at the same time. This will break the correctable error count log in EccErrCnt register once querying it. So decouple both to make error count query workable. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27drm/amdgpu: switch to SMN interface to operate RSMU index modeGuchun Chen1-5/+24
This makes consistent with other regsiters' access in this module. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06drm/amdgpu: update page retirement sequenceJohn Clements1-2/+5
check UMC status and exit prior to making and erroneus register access this resolved unexpected behaviour with UMC indexing mode broadcasting writes Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06drm/amdgpu: toggle DF-Cstate when accessing UMC ras error related registersGuchun Chen1-0/+16
On arcturus, DF-Cstate needs to be toggled off/on before and after accessing UMC error counter and error address registers, otherwise, clearing such registers may fail. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-14drm/amdgpu: preserve RSMU UMC index mode stateJohn Clements1-2/+41
between UMC RAS err register access restore previous RSMU UMC index mode state Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-14drm/amdgpu: calculate MCUMC_ADDRT0 per asic's UMC offsetGuchun Chen1-4/+6
Hardcoded offset is not friendly. And another benifit of this patch is to keep read and write access to this register be consistent with other similar UMC regsiters in this file. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdgpu: updated UMC error address record with correct channel indexJohn Clements1-33/+32
defined macros for repetitive for loops Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdgpu: resolved bug in UMC RAS CE queryJohn Clements1-12/+20
switch CE counter register access' to use SMN disable UMC indexing mode Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdgpu: resolve bug in UMC 6 error counter queryJohn Clements1-55/+64
iterate over all error counter registers in SMN space removed support error counter access via MMIO Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdgpu: update UMC 6.1 RAS error counter register access pathJohn Clements1-5/+5
use proper method for SMN register access Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18drm/amdgpu: move umc offset to one new header file for ArcturusGuchun Chen1-16/+1
Code refactor and no functional change. Fixes: 4cf781c24c3b ("drm/amdgpu: Added RAS UMC error query support for Arcturus") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11drm/amdgpu: Added RAS UMC error query support for ArcturusJohn Clements1-14/+64
Updated UMC 6.1 function set to support UMC 6.1.1 and 6.1.2 devices Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-03drm/amdgpu: use GPU PAGE SHIFT for umc retired pageTao Zhou1-1/+1
umc retired page belongs to vram and it should be aligned to gpu page size Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16drm/amdgpu: rename umc ras_init to err_cnt_initTao Zhou1-4/+4
this interface is related to specific version of umc, distinguish it from ras_late_init Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16drm/amdgpu: move umc late init from gmc to umc blockTao Zhou1-0/+1
umc late init is umc specific, it's more suitable to be put in umc block Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13drm/amdgpu: save umc error recordsTao Zhou1-7/+32
save umc error records to ras bad page array v2: add bad pages before gpu reset v3: add NULL check for adev->umc.funcs Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-09drm/amdgpu: implement UMC 64 bits REG operationsTao Zhou1-5/+5
implement 64 bits operations via 32 bits interface v2: make use of lower_32_bits() and upper_32_bits() macros Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: update the calc algorithm of umc ecc error countTao Zhou1-4/+6
the initial value of ecc error count can be adjusted Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: implement umc ras init functionTao Zhou1-0/+32
enable umc ce interrupt and initialize ecc error count Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: apply umc_for_each_channel macro to umc_6_1Tao Zhou1-56/+28
use umc_for_each_channel to make code simpler Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: initialize new parameters and functions for amdgpu_umc structureTao Zhou1-1/+9
add initialization for new members of amdgpu_umc structure Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: remove the clear of MCA_ADDRTao Zhou1-2/+0
clearing MCA_STATUS is enough to reset the whole MCA, writing zero to MCA_ADDR is unnecessary Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: query umc ras error addressTao Zhou1-0/+80
query umc ras error address, translate it to gpu 4k page view and save it. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add structures for umc error address translationTao Zhou1-0/+10
add related registers, callback function and channel index table Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: update algorithm of umc uncorrectable error countingTao Zhou1-6/+6
remove the check of ErrorCodeExt v2: refine the if condition for ue counting Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: use 64bit operation macros for umcTao Zhou1-17/+8
replace some 32bit macros with 64bit operations to simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add umc v6_1 query error count supportHawking Zhang1-0/+162
Implement umc query_ras_error_count function to support querry both correctable and uncorrectable error Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>