path: root/include/uapi/drm
diff options
authorTvrtko Ursulin <>2017-11-21 18:18:45 +0000
committerTvrtko Ursulin <>2017-11-22 11:24:57 +0000
commitb46a33e271ed81bd765c632b972c49d5b44729c7 (patch)
treec809e32922d8e9d79b020f7fdbaedeae5468d4e1 /include/uapi/drm
parentc84b27054643db73e56a5e933191bda435b6c46e (diff)
drm/i915/pmu: Expose a PMU interface for perf queries
From: Chris Wilson <> From: Tvrtko Ursulin <> From: Dmitry Rogozhkin <> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <> Signed-off-by: Tvrtko Ursulin <> Signed-off-by: Dmitry Rogozhkin <> Cc: Tvrtko Ursulin <> Cc: Chris Wilson <> Cc: Dmitry Rogozhkin <> Cc: Peter Zijlstra <> Reviewed-by: Chris Wilson <> Signed-off-by: Tvrtko Ursulin <> Link:
Diffstat (limited to 'include/uapi/drm')
1 files changed, 39 insertions, 0 deletions
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b57985929553..40e7b438bdaa 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -102,6 +102,45 @@ enum drm_i915_gem_engine_class {
+ * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
+ *
+ */
+enum drm_i915_pmu_engine_sample {
+ I915_SAMPLE_BUSY = 0,
+ I915_SAMPLE_WAIT = 1,
+ I915_SAMPLE_SEMA = 2,
+ I915_ENGINE_SAMPLE_MAX /* non-ABI */
+#define I915_PMU_SAMPLE_BITS (4)
+#define I915_PMU_SAMPLE_MASK (0xf)
+#define I915_PMU_CLASS_SHIFT \
+#define __I915_PMU_ENGINE(class, instance, sample) \
+ ((class) << I915_PMU_CLASS_SHIFT | \
+ (instance) << I915_PMU_SAMPLE_BITS | \
+ (sample))
+#define I915_PMU_ENGINE_BUSY(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
+#define I915_PMU_ENGINE_WAIT(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
+#define I915_PMU_ENGINE_SEMA(class, instance) \
+ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
+#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
/* Each region is a minimum of 16k, and there are at most 255 of them.
#define I915_NR_TEX_REGIONS 255 /* table size 2k - maximum due to use