nir: fix lower_memcpy

memcpy is divided into chunks that are vec4 sized max.

The problem here happens with a structure of 24 bytes:

   struct { float3 a; float3 b; }

If you memcpy that struct, the lowering will emit two load/store pairs, one of size 8, the next one of size 16. But both end up located at offset 0, so we effectively drop 2 floats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a3177cca996145 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
(cherry picked from commit 768930a73a43e48172df00b6c934de582bd9422b)
author: Lionel Landwerlin <lionel.g.landwerlin@intel.com> 2022-02-16 23:14:15 +0200
committer: Dylan Baker <dylan.c.baker@intel.com> 2022-02-24 14:56:51 -0800
commit: 5998d19a9605f50f22a498eab1e1c4895c433dcc (patch)
tree: 676bb385555a34c8e113bab33b8b748cd7fd7276
parent: 4aa73d53108d1c2ce4e993cd515c461def2270c5 (diff)
2 files changed, 8 insertions, 5 deletions
diff --git a/.pick_status.json b/.pick_status.json
index abac5691959..09dfa98c269 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -1876,7 +1876,7 @@
         "description": "nir: fix lower_memcpy",
         "nominated": true,
         "nomination_type": 1,
-        "resolution": 0,
+        "resolution": 1,
         "main_sha": null,
         "because_sha": "a3177cca9961452b436b12fd0790c6ffaa8f0eee"
     },
diff --git a/src/compiler/nir/nir_lower_memcpy.c b/src/compiler/nir/nir_lower_memcpy.c
index b7a3f1752cb..768537a3478 100644
--- a/src/compiler/nir/nir_lower_memcpy.c
+++ b/src/compiler/nir/nir_lower_memcpy.c
@@ -111,11 +111,14 @@ lower_memcpy_impl(nir_function_impl *impl)
             uint64_t size = nir_src_as_uint(cpy->src[2]);
             uint64_t offset = 0;
             while (offset < size) {
-               uint64_t remaining = offset - size;
-               /* For our chunk size, we choose the largest power-of-two that
-                * divides size with a maximum of 16B (a vec4).
+               uint64_t remaining = size - offset;
+               /* Find the largest chunk size power-of-two (MSB in remaining)
+                * and limit our chunk to 16B (a vec4). It's important to do as
+                * many 16B chunks as possible first so that the index
+                * computation is correct for
+                * memcpy_(load|store)_deref_elem_imm.
                 */
-               unsigned copy_size = 1u << MIN2(ffsll(remaining) - 1, 4);
+               unsigned copy_size = 1u << MIN2(util_last_bit64(remaining) - 1, 4);
                const struct glsl_type *copy_type =
                   copy_type_for_byte_size(copy_size);
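
The effect of the chunk-size change can be reproduced outside of NIR. The following is a minimal standalone sketch, not the Mesa code: `last_bit64` and `first_bit64` are hypothetical stand-ins for `util_last_bit64` and `ffsll`, `copied_bytes_mask` is an illustrative name, and the element index is assumed to be `offset / copy_size`, which is inferred from the commit message's statement that both chunks land at offset 0. The old code also computed `remaining` as `offset - size`; since negation in two's complement leaves the lowest set bit unchanged, the sketch uses `size - offset` for both variants.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Mesa's util_last_bit64(): 1-based position of the most
 * significant set bit, or 0 when v is 0. */
static unsigned last_bit64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      n++;
      v >>= 1;
   }
   return n;
}

/* Stand-in for ffsll(): 1-based position of the least significant set bit,
 * or 0 when v is 0. */
static unsigned first_bit64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      n++;
      if (v & 1)
         return n;
      v >>= 1;
   }
   return 0;
}

/* Returns a bitmask with one bit per byte that the chunked copy touches.
 * use_msb = 0 models the old chunk choice (lowest set bit of remaining),
 * use_msb = 1 models the fixed one (highest set bit of remaining). */
static uint64_t copied_bytes_mask(uint64_t size, int use_msb)
{
   assert(size <= 64); /* one mask bit per byte */
   uint64_t mask = 0;
   uint64_t offset = 0;
   while (offset < size) {
      uint64_t remaining = size - offset;
      unsigned bit = use_msb ? last_bit64(remaining) : first_bit64(remaining);
      unsigned shift = bit - 1 < 4 ? bit - 1 : 4; /* cap at 16B (a vec4) */
      unsigned copy_size = 1u << shift;
      /* Assumption: the access is indexed in units of copy_size, so the
       * byte it lands on is offset rounded down to a multiple of it. */
      uint64_t index = offset / copy_size;
      uint64_t start = index * copy_size;
      mask |= ((UINT64_C(1) << copy_size) - 1) << start;
      offset += copy_size;
   }
   return mask;
}
```

For the 24-byte struct, `copied_bytes_mask(24, 0)` covers only bytes 0 to 15 (an 8B chunk, then a 16B chunk whose index rounds down to 0), while `copied_bytes_mask(24, 1)` covers all 24 bytes (a 16B chunk at offset 0, then an 8B chunk at offset 16).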