17 files changed, 547 insertions, 292 deletions
diff --git a/docs/_static/specs/EGL_MESA_x11_native_visual_id.txt b/docs/_static/specs/EGL_MESA_x11_native_visual_id.txt
new file mode 100644
index 00000000000..de30c399ef0
--- /dev/null
+++ b/docs/_static/specs/EGL_MESA_x11_native_visual_id.txt
@@ -0,0 +1,80 @@
+Name
+
+    MESA_x11_native_visual_id
+
+Name Strings
+
+    EGL_MESA_x11_native_visual_id
+
+Contact
+
+    Eric Engestrom <eric@engestrom.ch>
+
+Status
+
+    Complete, shipping.
+
+Version
+
+    Version 2, May 10, 2024
+
+Number
+
+    EGL Extension #TBD
+
+Extension Type
+
+    EGL display extension
+
+Dependencies
+
+    None.  This extension is written against the
+    wording of the EGL 1.5 specification.
+
+Overview
+
+    This extension allows EGL_NATIVE_VISUAL_ID to be used in
+    eglChooseConfig() for a display of type EGL_PLATFORM_X11_EXT.
+
+IP Status
+
+    Open-source; freely implementable.
+
+New Types
+
+    None
+
+New Procedures and Functions
+
+    None
+
+New Tokens
+
+    None
+
+In section 3.4.1.1 "Selection of EGLConfigs" of the EGL 1.5
+Specification, replace:
+
+    If EGL_MAX_PBUFFER_WIDTH, EGL_MAX_PBUFFER_HEIGHT,
+    EGL_MAX_PBUFFER_PIXELS, or EGL_NATIVE_VISUAL_ID are specified in
+    attrib_list, then they are ignored [...]
+
+with:
+
+    If EGL_MAX_PBUFFER_WIDTH, EGL_MAX_PBUFFER_HEIGHT,
+    or EGL_MAX_PBUFFER_PIXELS are specified in attrib_list, then they
+    are ignored [...]. EGL_NATIVE_VISUAL_ID is ignored except on
+    a display of type EGL_PLATFORM_X11_EXT when EGL_ALPHA_SIZE is
+    greater than zero.
+
+Issues
+
+    None.
+
+Revision History
+
+    Version 1, March 27, 2024 (Eric Engestrom)
+        Initial draft
+    Version 2, May 10, 2024 (David Heidelberg)
+        add EGL_ALPHA_SIZE condition
+        add Extension type and set it to display extension
diff --git a/docs/_static/specs/OLD/WL_create_wayland_buffer_from_image.spec b/docs/_static/specs/WL_create_wayland_buffer_from_image.spec
index aa5eb4d24d9..aa5eb4d24d9 100644
--- a/docs/_static/specs/OLD/WL_create_wayland_buffer_from_image.spec
+++ b/docs/_static/specs/WL_create_wayland_buffer_from_image.spec
diff --git a/docs/android.rst b/docs/android.rst
index 3ac75171e57..0034706bb75 100644
--- a/docs/android.rst
+++ b/docs/android.rst
@@ -34,8 +34,8 @@ Then, create your Meson cross file to use it, something like this
 
     [host_machine]
     system = 'android'
-    cpu_family = 'arm'
-    cpu = 'aarch64'
+    cpu_family = 'aarch64'
+    cpu = 'armv8'
     endian = 'little'
 
 Now, use that cross file for your Android build directory (as in this
diff --git a/docs/drivers/panfrost.rst b/docs/drivers/panfrost.rst
index 7fc1a32e9f0..a8b63da2441 100644
--- a/docs/drivers/panfrost.rst
+++ b/docs/drivers/panfrost.rst
@@ -3,33 +3,31 @@ Panfrost
 
 The Panfrost driver stack includes an OpenGL ES implementation for Arm Mali
 GPUs based on the Midgard and Bifrost microarchitectures. It is **conformant**
-on Mali-G52 and Mali-G57 but **non-conformant** on other GPUs. The following
-hardware is currently supported:
-
-=========  ============= ============ =======
-Product    Architecture  OpenGL ES    OpenGL
-=========  ============= ============ =======
-Mali T600  Midgard (v4)  2.0          2.1
-Mali T620  Midgard (v4)  2.0          2.1
-Mali T720  Midgard (v4)  2.0          2.1
-Mali T760  Midgard (v5)  3.1          3.1
-Mali T820  Midgard (v5)  3.1          3.1
-Mali T830  Midgard (v5)  3.1          3.1
-Mali T860  Midgard (v5)  3.1          3.1
-Mali T880  Midgard (v5)  3.1          3.1
-Mali G72   Bifrost (v6)  3.1          3.1
-Mali G31   Bifrost (v7)  3.1          3.1
-Mali G51   Bifrost (v7)  3.1          3.1
-Mali G52   Bifrost (v7)  3.1          3.1
-Mali G76   Bifrost (v7)  3.1          3.1
-Mali G57   Valhall (v9)  3.1          3.1
-Mali G310  Valhall (v10) 3.1          3.1
-Mali G610  Valhall (v10) 3.1          3.1
-=========  ============= ============ =======
+on `Mali-G52 <https://www.khronos.org/conformance/adopters/conformant-products/opengles#submission_949>`_
+and `Mali-G57 <https://www.khronos.org/conformance/adopters/conformant-products/opengles#submission_980>`_
+but **non-conformant** on other GPUs. The following hardware is currently
+supported:
+
++--------------------+---------------+-----------+--------+
+| Models             | Architecture  | OpenGL ES | OpenGL |
++====================+===============+===========+========+
+| T600, T620, T720   | Midgard (v4)  | 2.0       | 2.1    |
++--------------------+---------------+-----------+--------+
+| T760, T820, T830   | Midgard (v5)  | 3.1       | 3.1    |
+| T860, T880         |               |           |        |
++--------------------+---------------+-----------+--------+
+| G72                | Bifrost (v6)  | 3.1       | 3.1    |
++--------------------+---------------+-----------+--------+
+| G31, G51, G52, G76 | Bifrost (v7)  | 3.1       | 3.1    |
++--------------------+---------------+-----------+--------+
+| G57                | Valhall (v9)  | 3.1       | 3.1    |
++--------------------+---------------+-----------+--------+
+| G310, G610         | Valhall (v10) | 3.1       | 3.1    |
++--------------------+---------------+-----------+--------+
 
 Other Midgard and Bifrost chips (e.g. G71) are not yet supported.
 
-Older Mali chips based on the Utgard architecture (Mali 400, Mali 450) are
+Older Mali chips based on the Utgard architecture (Mali-400, Mali-450) are
 supported in the :doc:`Lima <lima>` driver, not Panfrost. Lima is also
 available in Mesa.
 
@@ -61,255 +59,12 @@ Panfrost developers and users hang out on IRC at ``#panfrost`` on OFTC. Note
 that registering and authenticating with ``NickServ`` is required to prevent
 spam. `Join the chat. <https://webchat.oftc.net/?channels=panfrost>`_
 
-Compressed texture support
---------------------------
-
-In the driver, Panfrost supports ASTC, ETC, and all BCn formats (e.g. RGTC,
-S3TC, etc.) However, Panfrost depends on the hardware to support these formats
-efficiently.  All supported Mali architectures support these formats, but not
-every system-on-chip with a Mali GPU support all these formats. Many lower-end
-systems lack support for some BCn formats, which can cause problems when playing
-desktop games with Panfrost. To check whether this issue applies to your
-system-on-chip, Panfrost includes a ``panfrost_texfeatures`` tool to query
-supported formats.
-
-To use this tool, include the option ``-Dtools=panfrost`` when configuring Mesa.
-Then inside your Mesa build directory, the tool is located at
-``src/panfrost/tools/panfrost_texfeatures``. Copy it to your target device,
-set as executable as necessary, and run on the target device. A table of
-supported formats will be printed to standard output.
-
-drm-shim
---------
-
-Panfrost implements ``drm-shim``, stubbing out the Panfrost kernel interface.
-Use cases for this functionality include:
-
-- Future hardware bring up
-- Running shader-db on non-Mali workstations
-- Reproducing compiler (and some driver) bugs without Mali hardware
-
-Although Mali hardware is usually paired with an Arm CPU, Panfrost is portable C
-code and should work on any Linux machine. In particular, you can test the
-compiler on shader-db on an Intel desktop.
-
-To build Mesa with Panfrost drm-shim, configure Meson with
-``-Dgallium-drivers=panfrost`` and ``-Dtools=drm-shim``. See the above
-building section for a full invocation. The drm-shim binary will be built to
-``build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so``.
-
-To use, set the ``LD_PRELOAD`` environment variable to the drm-shim binary.  It
-may also be necessary to set ``LIBGL_DRIVERS_PATH`` to the location where Mesa
-was installed.
-
-By default, drm-shim mocks a Mali-G52 system. To select a specific Mali GPU,
-set the ``PAN_GPU_ID`` environment variable to the desired GPU ID:
-
-=========  ============= =======
-Product    Architecture  GPU ID
-=========  ============= =======
-Mali-T720  Midgard (v4)  720
-Mali-T860  Midgard (v5)  860
-Mali-G72   Bifrost (v6)  6221
-Mali-G52   Bifrost (v7)  7212
-Mali-G57   Valhall (v9)  9093
-Mali-G610  Valhall (v10) a867
-=========  ============= =======
-
-Additional GPU IDs are enumerated in the ``panfrost_model_list`` list in
-``src/panfrost/lib/pan_props.c``.
-
-As an example: assuming Mesa is installed to a local path ``~/lib`` and Mesa's
-build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as:
-
-.. code-block:: sh
-
-   ~/shader-db$ BIFROST_MESA_DEBUG=shaders \
-   LIBGL_DRIVERS_PATH=~/lib/dri/ \
-   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
-   PAN_GPU_ID=7212 \
-   ./run shaders/glmark/1-1.shader_test
-
-The same shader can be compiled for Mali-T720 as:
-
-.. code-block:: sh
-
-   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders \
-   LIBGL_DRIVERS_PATH=~/lib/dri/ \
-   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
-   PAN_GPU_ID=720 \
-   ./run shaders/glmark/1-1.shader_test
-
-These examples set the compilers' ``shaders`` debug flags to dump the optimized
-NIR, backend IR after instruction selection, backend IR after register
-allocation and scheduling, and a disassembly of the final compiled binary.
-
-As another example, this invocation runs a single dEQP test "on" Mali-G52,
-pretty-printing GPU data structures and disassembling all shaders
-(``PAN_MESA_DEBUG=trace``) as well as dumping raw GPU memory
-(``PAN_MESA_DEBUG=dump``). The ``EGL_PLATFORM=surfaceless`` environment variable
-and various flags to dEQP mimic the surfaceless environment that our
-continuous integration (CI) uses. This eliminates window system dependencies,
-although it requires a specially built CTS:
-
-.. code-block:: sh
-
-   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump \
-   LIBGL_DRIVERS_PATH=~/lib/dri/ \
-   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
-   PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless \
-   ./glcts --deqp-surface-type=pbuffer \
-   --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 \
-   --deqp-surface-height=256 -n \
-   dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
-
-U-interleaved tiling
----------------------
-
-Panfrost supports u-interleaved tiling. U-interleaved tiling is
-indicated by the ``DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED`` modifier.
-
-The tiling reorders whole pixels (blocks). It does not compress or modify the
-pixels themselves, so it can be used for any image format. Internally, images
-are divided into tiles. Tiles occur in source order, but pixels (blocks) within
-each tile are reordered according to a space-filling curve.
-
-For regular formats, 16x16 tiles are used. This harmonizes with the default tile
-size for binning and CRCs (transaction elimination). It also means a single line
-(16 pixels) at 4 bytes per pixel equals a single 64-byte cache line.
-
-For formats that are already block compressed (S3TC, RGTC, etc), 4x4 tiles are
-used, where entire blocks are reorder. Most of these formats compress 4x4
-blocks, so this gives an effective 16x16 tiling. This justifies the tile size
-intuitively, though it's not a rule: ASTC may uses larger blocks.
-
-Within a tile, the X and Y bits are interleaved (like Morton order), but with a
-twist: adjacent bit pairs are XORed. The reason to add XORs is not obvious.
-Visually, addresses take the form::
-
-   | y3 | (x3 ^ y3) | y2 | (y2 ^ x2) | y1 | (y1 ^ x1) | y0 | (y0 ^ x0) |
-
-Reference routines to encode/decode u-interleaved images are available in
-``src/panfrost/shared/test/test-tiling.cpp``, which documents the space-filling
-curve. This reference implementation is used to unit test the optimized
-implementation used in production. The optimized implementation is available in
-``src/panfrost/shared/pan_tiling.c``.
-
-Although these routines are part of Panfrost, they are also used by Lima, as Arm
-introduced the format with Utgard. It is the only tiling supported on Utgard. On
-Mali-T760 and newer, Arm Framebuffer Compression (AFBC) is more efficient and
-should be used instead where possible. However, not all formats are
-compressible, so u-interleaved tiling remains an important fallback on Panfrost.
-
-Instancing
-----------
-
-The attribute descriptor lets the attribute unit compute the address of an
-attribute given the vertex and instance ID. Unfortunately, the way this works is
-rather complicated when instancing is enabled.
-
-To explain this, first we need to explain how compute and vertex threads are
-dispatched.  When a quad is dispatched, it receives a single, linear index.
-However, we need to translate that index into a (vertex id, instance id) pair.
-One option would be to do:
-
-.. math::
-   \text{vertex id} = \text{linear id} \% \text{num vertices}
-
-   \text{instance id} = \text{linear id} / \text{num vertices}
-
-but this involves a costly division and modulus by an arbitrary number.
-Instead, we could pad num_vertices. We dispatch padded_num_vertices *
-num_instances threads instead of num_vertices * num_instances, which results
-in some "extra" threads with vertex_id >= num_vertices, which we have to
-discard.  The more we pad num_vertices, the more "wasted" threads we
-dispatch, but the division is potentially easier.
-
-One straightforward choice is to pad num_vertices to the next power of two,
-which means that the division and modulus are just simple bit shifts and
-masking. But the actual algorithm is a bit more complicated. The thread
-dispatcher has special support for dividing by 3, 5, 7, and 9, in addition
-to dividing by a power of two. As a result, padded_num_vertices can be
-1, 3, 5, 7, or 9 times a power of two. This results in less wasted threads,
-since we need less padding.
-
-padded_num_vertices is picked by the hardware. The driver just specifies the
-actual number of vertices. Note that padded_num_vertices is a multiple of four
-(presumably because threads are dispatched in groups of 4). Also,
-padded_num_vertices is always at least one more than num_vertices, which seems
-like a quirk of the hardware. For larger num_vertices, the hardware uses the
-following algorithm: using the binary representation of num_vertices, we look at
-the most significant set bit as well as the following 3 bits. Let n be the
-number of bits after those 4 bits. Then we set padded_num_vertices according to
-the following table:
-
-==========  =======================
-high bits   padded_num_vertices
-==========  =======================
-1000		   :math:`9 \cdot 2^n`
-1001		   :math:`5 \cdot 2^{n+1}`
-101x		   :math:`3 \cdot 2^{n+2}`
-110x		   :math:`7 \cdot 2^{n+1}`
-111x		   :math:`2^{n+4}`
-==========  =======================
-
-For example, if num_vertices = 70 is passed to glDraw(), its binary
-representation is 1000110, so n = 3 and the high bits are 1000, and
-therefore padded_num_vertices = :math:`9 \cdot 2^3` = 72.
-
-The attribute unit works in terms of the original linear_id. if
-num_instances = 1, then they are the same, and everything is simple.
-However, with instancing things get more complicated. There are four
-possible modes, two of them we can group together:
-
-1. Use the linear_id directly. Only used when there is no instancing.
-
-2. Use the linear_id modulo a constant. This is used for per-vertex
-attributes with instancing enabled by making the constant equal
-padded_num_vertices. Because the modulus is always padded_num_vertices, this
-mode only supports a modulus that is a power of 2 times 1, 3, 5, 7, or 9.
-The shift field specifies the power of two, while the extra_flags field
-specifies the odd number. If shift = n and extra_flags = m, then the modulus
-is :math:`(2m + 1) \cdot 2^n`. As an example, if num_vertices = 70, then as
-computed above, padded_num_vertices = :math:`9 \cdot 2^3`, so we should set
-extra_flags = 4 and shift = 3. Note that we must exactly follow the hardware
-algorithm used to get padded_num_vertices in order to correctly implement
-per-vertex attributes.
-
-3. Divide the linear_id by a constant. In order to correctly implement
-instance divisors, we have to divide linear_id by padded_num_vertices times
-to user-specified divisor. So first we compute padded_num_vertices, again
-following the exact same algorithm that the hardware uses, then multiply it
-by the GL-level divisor to get the hardware-level divisor. This case is
-further divided into two more cases. If the hardware-level divisor is a
-power of two, then we just need to shift. The shift amount is specified by
-the shift field, so that the hardware-level divisor is just
-:math:`2^\text{shift}`.
+Technical details
+-----------------
 
-If it isn't a power of two, then we have to divide by an arbitrary integer.
-For that, we use the well-known technique of multiplying by an approximation
-of the inverse. The driver must compute the magic multiplier and shift
-amount, and then the hardware does the multiplication and shift. The
-hardware and driver also use the "round-down" optimization as described in
-https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
-The hardware further assumes the multiplier is between :math:`2^{31}` and
-:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
-to 0 by the driver -- presumably this simplifies the hardware multiplier a
-little. The hardware first multiplies linear_id by the multiplier and
-takes the high 32 bits, then applies the round-down correction if
-extra_flags = 1, then finally shifts right by the shift field.
+You can read more technical details about Panfrost here:
 
-There are some differences between ridiculousfish's algorithm and the Mali
-hardware algorithm, which means that the reference code from ridiculousfish
-doesn't always produce the right constants. Mali does not use the pre-shift
-optimization, since that would make a hardware implementation slower (it
-would have to always do the pre-shift, multiply, and post-shift operations).
-It also forces the multiplier to be at least :math:`2^{31}`, which means
-that the exponent is entirely fixed, so there is no trial-and-error.
-Altogether, given the divisor d, the algorithm the driver must follow is:
+.. toctree::
+   :glob:
 
-1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
-2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} % d`.
-3. If :math:`e <= 2^{shift}`, then we need to use the round-down algorithm. Set
-   magic_divisor = m - 1 and extra_flags = 1.  4. Otherwise, set magic_divisor =
-   m and extra_flags = 0.
+   panfrost/*
diff --git a/docs/drivers/panfrost/drm-shim.rst b/docs/drivers/panfrost/drm-shim.rst
new file mode 100644
index 00000000000..874ac37c2f9
--- /dev/null
+++ b/docs/drivers/panfrost/drm-shim.rst
@@ -0,0 +1,84 @@
+
+drm-shim
+========
+
+Panfrost implements ``drm-shim``, stubbing out the Panfrost kernel interface.
+Use cases for this functionality include:
+
+- Future hardware bring-up
+- Running shader-db on non-Mali workstations
+- Reproducing compiler (and some driver) bugs without Mali hardware
+
+Although Mali hardware is usually paired with an Arm CPU, Panfrost is portable C
+code and should work on any Linux machine. In particular, you can test the
+compiler on shader-db on an Intel desktop.
+
+To build Mesa with Panfrost drm-shim, configure Meson with
+``-Dgallium-drivers=panfrost`` and ``-Dtools=drm-shim``. See the building section
+of :doc:`Panfrost <../panfrost>` for a full invocation. The drm-shim binary will
+be built to ``build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so``.
+
+To use it, set the ``LD_PRELOAD`` environment variable to the drm-shim binary.  It
+may also be necessary to set ``LIBGL_DRIVERS_PATH`` to the location where Mesa
+was installed.
+
+By default, drm-shim mocks a Mali-G52 system. To select a specific Mali GPU,
+set the ``PAN_GPU_ID`` environment variable to the desired GPU ID:
+
+=========  ============= =======
+Product    Architecture  GPU ID
+=========  ============= =======
+Mali-T720  Midgard (v4)  720
+Mali-T860  Midgard (v5)  860
+Mali-G72   Bifrost (v6)  6221
+Mali-G52   Bifrost (v7)  7212
+Mali-G57   Valhall (v9)  9093
+Mali-G610  Valhall (v10) a867
+=========  ============= =======
+
+Additional GPU IDs are enumerated in the ``panfrost_model_list`` list in
+``src/panfrost/lib/pan_props.c``.
+
+As an example: assuming Mesa is installed to a local path ``~/lib`` and Mesa's
+build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as:
+
+.. code-block:: sh
+
+   ~/shader-db$ BIFROST_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 \
+   ./run shaders/glmark/1-1.shader_test
+
+The same shader can be compiled for Mali-T720 as:
+
+.. code-block:: sh
+
+   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=720 \
+   ./run shaders/glmark/1-1.shader_test
+
+These examples set the compilers' ``shaders`` debug flags to dump the optimized
+NIR, backend IR after instruction selection, backend IR after register
+allocation and scheduling, and a disassembly of the final compiled binary.
+
+As another example, this invocation runs a single dEQP test "on" Mali-G52,
+pretty-printing GPU data structures and disassembling all shaders
+(``PAN_MESA_DEBUG=trace``) as well as dumping raw GPU memory
+(``PAN_MESA_DEBUG=dump``). The ``EGL_PLATFORM=surfaceless`` environment variable
+and various flags to dEQP mimic the surfaceless environment that our
+continuous integration (CI) uses. This eliminates window system dependencies,
+although it requires a specially built CTS:
+
+.. code-block:: sh
+
+   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless \
+   ./glcts --deqp-surface-type=pbuffer \
+   --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 \
+   --deqp-surface-height=256 -n \
+   dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
diff --git a/docs/drivers/panfrost/instancing.rst b/docs/drivers/panfrost/instancing.rst
new file mode 100644
index 00000000000..d4565af3155
--- /dev/null
+++ b/docs/drivers/panfrost/instancing.rst
@@ -0,0 +1,112 @@
+Instancing
+==========
+
+The attribute descriptor lets the attribute unit compute the address of an
+attribute given the vertex and instance ID. Unfortunately, the way this works is
+rather complicated when instancing is enabled.
+
+To explain this, first we need to explain how compute and vertex threads are
+dispatched.  When a quad is dispatched, it receives a single, linear index.
+However, we need to translate that index into a (vertex id, instance id) pair.
+One option would be to do:
+
+.. math::
+   \text{vertex id} = \text{linear id} \% \text{num vertices}
+
+   \text{instance id} = \text{linear id} / \text{num vertices}
+
+but this involves a costly division and modulus by an arbitrary number.
+Instead, we could pad num_vertices. We dispatch padded_num_vertices *
+num_instances threads instead of num_vertices * num_instances, which results
+in some "extra" threads with vertex_id >= num_vertices, which we have to
+discard.  The more we pad num_vertices, the more "wasted" threads we
+dispatch, but the division is potentially easier.
+
+One straightforward choice is to pad num_vertices to the next power of two,
+which means that the division and modulus are just simple bit shifts and
+masking. But the actual algorithm is a bit more complicated. The thread
+dispatcher has special support for dividing by 3, 5, 7, and 9, in addition
+to dividing by a power of two. As a result, padded_num_vertices can be
+1, 3, 5, 7, or 9 times a power of two. This results in fewer wasted threads,
+since we need less padding.
+
+padded_num_vertices is picked by the hardware. The driver just specifies the
+actual number of vertices. Note that padded_num_vertices is a multiple of four
+(presumably because threads are dispatched in groups of 4). Also,
+padded_num_vertices is always at least one more than num_vertices, which seems
+like a quirk of the hardware. For larger num_vertices, the hardware uses the
+following algorithm: using the binary representation of num_vertices, we look at
+the most significant set bit as well as the following 3 bits. Let n be the
+number of bits after those 4 bits. Then we set padded_num_vertices according to
+the following table:
+
+==========  =======================
+high bits   padded_num_vertices
+==========  =======================
+1000        :math:`9 \cdot 2^n`
+1001        :math:`5 \cdot 2^{n+1}`
+101x        :math:`3 \cdot 2^{n+2}`
+110x        :math:`7 \cdot 2^{n+1}`
+111x        :math:`2^{n+4}`
+==========  =======================
+
+For example, if num_vertices = 70 is passed to a draw call, its binary
+representation is 1000110, so n = 3 and the high bits are 1000, and
+therefore padded_num_vertices = :math:`9 \cdot 2^3` = 72.
+
+The attribute unit works in terms of the original linear_id. If
+num_instances = 1, then linear_id and the vertex ID are the same, and
+everything is simple. However, with instancing things get more complicated.
+There are four possible modes, two of which we can group together:
+
+1. Use the linear_id directly. Only used when there is no instancing.
+
+2. Use the linear_id modulo a constant. This is used for per-vertex
+   attributes with instancing enabled by making the constant equal to
+   padded_num_vertices. Because the modulus is always padded_num_vertices, this
+   mode only supports a modulus that is a power of 2 times 1, 3, 5, 7, or 9.
+   The shift field specifies the power of two, while the extra_flags field
+   specifies the odd number. If shift = n and extra_flags = m, then the modulus
+   is :math:`(2m + 1) \cdot 2^n`. As an example, if num_vertices = 70, then as
+   computed above, padded_num_vertices = :math:`9 \cdot 2^3`, so we should set
+   extra_flags = 4 and shift = 3. Note that we must exactly follow the hardware
+   algorithm used to get padded_num_vertices in order to correctly implement
+   per-vertex attributes.
+
+3. Divide the linear_id by a constant. In order to correctly implement
+   instance divisors, we have to divide linear_id by padded_num_vertices times
+   the user-specified divisor. So first we compute padded_num_vertices, again
+   following the exact same algorithm that the hardware uses, then multiply it
+   by the GL-level divisor to get the hardware-level divisor. This case is
+   further divided into two more cases. If the hardware-level divisor is a
+   power of two, then we just need to shift. The shift amount is specified by
+   the shift field, so that the hardware-level divisor is just
+   :math:`2^\text{shift}`.
+
+If it isn't a power of two, then we have to divide by an arbitrary integer.
+For that, we use the well-known technique of multiplying by an approximation
+of the inverse. The driver must compute the magic multiplier and shift
+amount, and then the hardware does the multiplication and shift. The
+hardware and driver also use the "round-down" optimization as described in
+https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
+The hardware further assumes the multiplier is between :math:`2^{31}` and
+:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
+to 0 by the driver -- presumably this simplifies the hardware multiplier a
+little. The hardware first multiplies linear_id by the multiplier and
+takes the high 32 bits, then applies the round-down correction if
+extra_flags = 1, then finally shifts right by the shift field.
+
+There are some differences between ridiculousfish's algorithm and the Mali
+hardware algorithm, which means that the reference code from ridiculousfish
+doesn't always produce the right constants. Mali does not use the pre-shift
+optimization, since that would make a hardware implementation slower (it
+would have to always do the pre-shift, multiply, and post-shift operations).
+It also forces the multiplier to be at least :math:`2^{31}`, which means
+that the exponent is entirely fixed, so there is no trial-and-error.
+Altogether, given the divisor d, the algorithm the driver must follow is:
+
+1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
+2. Compute :math:`m = \lceil 2^{\text{shift} + 32} / d \rceil` and :math:`e = 2^{\text{shift} + 32} \% d`.
+3. If :math:`e \le 2^{\text{shift}}`, then we need to use the round-down algorithm. Set
+   magic_divisor = m - 1 and extra_flags = 1.
+4. Otherwise, set magic_divisor = m and extra_flags = 0.
diff --git a/docs/drivers/panfrost/texcomp.rst b/docs/drivers/panfrost/texcomp.rst
new file mode 100644
index 00000000000..2cb6c9d59a0
--- /dev/null
+++ b/docs/drivers/panfrost/texcomp.rst
@@ -0,0 +1,17 @@
+Compressed texture support
+==========================
+
+In the driver, Panfrost supports ASTC, ETC, and all BCn formats (e.g. RGTC and
+S3TC). However, Panfrost depends on the hardware to support these formats
+efficiently.  All supported Mali architectures support these formats, but not
+every system-on-chip with a Mali GPU supports all of them. Many lower-end
+systems lack support for some BCn formats, which can cause problems when playing
+desktop games with Panfrost. To check whether this issue applies to your
+system-on-chip, Panfrost includes a ``panfrost_texfeatures`` tool to query
+supported formats.
+
+To use this tool, include the option ``-Dtools=panfrost`` when configuring Mesa.
+Then inside your Mesa build directory, the tool is located at
+``src/panfrost/tools/panfrost_texfeatures``. Copy it to your target device,
+mark it executable if necessary, and run it on the target device. A table of
+supported formats will be printed to standard output.
diff --git a/docs/drivers/panfrost/tiling.rst b/docs/drivers/panfrost/tiling.rst
new file mode 100644
index 00000000000..08c311bd55a
--- /dev/null
+++ b/docs/drivers/panfrost/tiling.rst
@@ -0,0 +1,38 @@
+
+U-interleaved tiling
+====================
+
+Panfrost supports u-interleaved tiling. U-interleaved tiling is
+indicated by the ``DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED`` modifier.
+
+The tiling reorders whole pixels (blocks). It does not compress or modify the
+pixels themselves, so it can be used for any image format. Internally, images
+are divided into tiles. Tiles occur in source order, but pixels (blocks) within
+each tile are reordered according to a space-filling curve.
+
+For regular formats, 16x16 tiles are used. This harmonizes with the default tile
+size for binning and CRCs (transaction elimination). It also means a single line
+(16 pixels) at 4 bytes per pixel equals a single 64-byte cache line.
+
+For formats that are already block compressed (S3TC, RGTC, etc.), 4x4 tiles are
+used, where entire blocks are reordered. Most of these formats compress 4x4
+blocks, so this gives an effective 16x16 tiling. This justifies the tile size
+intuitively, though it's not a rule: ASTC may use larger blocks.
+
+Within a tile, the X and Y bits are interleaved (like Morton order), but with a
+twist: adjacent bit pairs are XORed. The reason to add XORs is not obvious.
+Visually, addresses take the form::
+
+   | y3 | (y3 ^ x3) | y2 | (y2 ^ x2) | y1 | (y1 ^ x1) | y0 | (y0 ^ x0) |
+
+Reference routines to encode/decode u-interleaved images are available in
+``src/panfrost/shared/test/test-tiling.cpp``, which documents the space-filling
+curve. This reference implementation is used to unit test the optimized
+implementation used in production. The optimized implementation is available in
+``src/panfrost/shared/pan_tiling.c``.
+
+Although these routines are part of Panfrost, they are also used by Lima, as Arm
+introduced the format with Utgard. It is the only tiling supported on Utgard. On
+Mali-T760 and newer, Arm Framebuffer Compression (AFBC) is more efficient and
+should be used instead where possible. However, not all formats are
+compressible, so u-interleaved tiling remains an important fallback on Panfrost.
diff --git a/docs/envvars.rst b/docs/envvars.rst
index a68483ae352..59df3964722 100644
--- a/docs/envvars.rst
+++ b/docs/envvars.rst
@@ -594,6 +594,9 @@ Intel driver environment variables
    ``sf``
       emit messages about the strips & fans unit (for old gens, includes
       the SF program)
+   ``shader-print``
+      allow developer print traces added by ``brw_nir_printf`` to be
+      printed out on the console
    ``soft64``
       enable implementation of software 64bit floating point support
    ``sparse``
@@ -1085,6 +1088,11 @@ Rusticl environment variables
    - ``sync`` waits on the GPU to complete after every event
    - ``validate`` validates any internally generated SPIR-Vs, e.g. through compiling OpenCL C code
 
+.. envvar:: RUSTICL_MAX_WORK_GROUPS
+
+   Limits the number of threads per dimension in a work-group. Useful for splitting up long-running
+   tasks to increase responsiveness or to simulate the lowering of huge global sizes for testing.
+
 .. _clc-env-var:
 
 clc environment variables
diff --git a/docs/features.txt b/docs/features.txt
index a314ae05469..b481aeb5f91 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -485,8 +485,8 @@ Vulkan 1.3 -- all DONE: anv, lvp, nvk, radv, tu, vn
   VK_KHR_synchronization2                               DONE (anv, dzn, hasvk, lvp, nvk, panvk, radv, tu, v3dv, vn)
   VK_KHR_zero_initialize_workgroup_memory               DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_4444_formats                                   DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
-  VK_EXT_extended_dynamic_state                         DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
-  VK_EXT_extended_dynamic_state2                        DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
+  VK_EXT_extended_dynamic_state                         DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
+  VK_EXT_extended_dynamic_state2                        DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_inline_uniform_block                           DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_pipeline_creation_cache_control                DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_pipeline_creation_feedback                     DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
@@ -508,7 +508,7 @@ Khronos extensions that are not part of any Vulkan version:
   VK_KHR_deferred_host_operations                       DONE (anv, hasvk, lvp, radv)
   VK_KHR_display                                        DONE (anv, nvk, pvr, radv, tu, v3dv)
   VK_KHR_display_swapchain                              not started
-  VK_KHR_dynamic_rendering_local_read                   DONE (lvp)
+  VK_KHR_dynamic_rendering_local_read                   DONE (lvp, radv)
   VK_KHR_external_fence_fd                              DONE (anv, hasvk, nvk, pvr, radv, tu, v3dv, vn)
   VK_KHR_external_fence_win32                           not started
   VK_KHR_external_memory_fd                             DONE (anv, dzn, hasvk, lvp, nvk, pvr, radv, tu, v3dv, vn)
@@ -540,7 +540,7 @@ Khronos extensions that are not part of any Vulkan version:
   VK_KHR_shader_maximal_reconvergence                   DONE (anv, lvp, nvk, radv)
   VK_KHR_shader_subgroup_rotate                         DONE (anv, nvk, radv)
   VK_KHR_shader_subgroup_uniform_control_flow           DONE (anv, hasvk, nvk, radv)
-  VK_KHR_shader_quad_control                            DONE (radv)
+  VK_KHR_shader_quad_control                            DONE (anv, radv)
   VK_KHR_shared_presentable_image                       not started
   VK_KHR_surface                                        DONE (anv, dzn, hasvk, lvp, nvk, panvk, pvr, radv, tu, v3dv, vn)
   VK_KHR_surface_protected_capabilities                 DONE (anv, lvp, nvk, radv, tu, v3dv, vn)
@@ -561,14 +561,14 @@ Khronos extensions that are not part of any Vulkan version:
   VK_EXT_calibrated_timestamps                          DONE (anv, hasvk, nvk, lvp, radv, vn)
   VK_EXT_color_write_enable                             DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_conditional_rendering                          DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
-  VK_EXT_conservative_rasterization                     DONE (anv, radv, vn)
+  VK_EXT_conservative_rasterization                     DONE (anv, nvk, radv, vn)
   VK_EXT_custom_border_color                            DONE (anv, hasvk, lvp, nvk, panvk, radv, tu, v3dv, vn)
   VK_EXT_debug_marker                                   DONE (radv)
   VK_EXT_debug_report                                   DONE (anv, dzn, lvp, nvk, pvr, radv, tu, v3dv)
   VK_EXT_depth_bias_control                             DONE (anv, nvk, radv)
   VK_EXT_depth_clip_control                             DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_depth_clip_enable                              DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
-  VK_EXT_depth_range_unrestricted                       DONE (anv/gen20+, radv, lvp)
+  VK_EXT_depth_range_unrestricted                       DONE (anv/gen20+, nvk, radv, lvp)
   VK_EXT_descriptor_buffer                              DONE (anv, lvp, radv, tu)
   VK_EXT_device_address_binding_report                  DONE (radv)
   VK_EXT_device_fault                                   DONE (radv)
@@ -590,10 +590,11 @@ Khronos extensions that are not part of any Vulkan version:
   VK_EXT_headless_surface                               DONE (anv, dzn, hasvk, lvp, nvk, panvk, pvr, radv, tu, v3dv, vn)
   VK_EXT_image_2d_view_of_3d                            DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
   VK_EXT_image_compression_control                      DONE (radv)
-  VK_EXT_image_drm_format_modifier                      DONE (anv, hasvk, radv/gfx9+, tu, v3dv, vn)
+  VK_EXT_image_drm_format_modifier                      DONE (anv, hasvk, nvk, radv/gfx9+, tu, v3dv, vn)
   VK_EXT_image_sliced_view_of_3d                        DONE (anv, nvk, radv/gfx10+)
   VK_EXT_image_view_min_lod                             DONE (anv, hasvk, nvk, radv, tu, vn)
   VK_EXT_index_type_uint8                               DONE (anv, hasvk, nvk, lvp, panvk, pvr, radv/gfx8+, tu, v3dv, vn)
+  VK_EXT_legacy_vertex_attributes                       DONE (anv, lvp, radv, tu)
   VK_EXT_line_rasterization                             DONE (anv, hasvk, nvk, lvp, radv, tu, v3dv, vn)
   VK_EXT_load_store_op_none                             DONE (anv, nvk, radv, tu, v3dv, vn)
   VK_EXT_memory_budget                                  DONE (anv, hasvk, lvp, nvk, pvr, radv, tu, v3dv, vn)
@@ -607,12 +608,12 @@ Khronos extensions that are not part of any Vulkan version:
   VK_EXT_pci_bus_info                                   DONE (anv, hasvk, nvk, radv, vn)
   VK_EXT_physical_device_drm                            DONE (anv, hasvk, nvk, radv, tu, v3dv, vn)
   VK_EXT_pipeline_library_group_handles                 DONE (anv, radv)
-  VK_EXT_pipeline_robustness                            DONE (anv, radv, v3dv)
+  VK_EXT_pipeline_robustness                            DONE (anv, nvk, radv, v3dv)
   VK_EXT_post_depth_coverage                            DONE (anv/gfx11+, lvp, radv/gfx10+, tu)
   VK_EXT_primitive_topology_list_restart                DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
   VK_EXT_primitives_generated_query                     DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
   VK_EXT_provoking_vertex                               DONE (anv, hasvk, lvp, nvk, radv, tu, v3dv, vn)
-  VK_EXT_queue_family_foreign                           DONE (anv, hasvk, lvp, radv, tu, vn)
+  VK_EXT_queue_family_foreign                           DONE (anv, hasvk, nvk, lvp, radv, tu, vn)
   VK_EXT_rasterization_order_attachment_access          DONE (lvp, tu, vn)
   VK_EXT_robustness2                                    DONE (anv, hasvk, lvp, nvk, radv, tu, vn)
   VK_EXT_sample_locations                               DONE (anv, hasvk, nvk, radv/gfx9-, tu/a650+)
@@ -663,7 +664,9 @@ Khronos extensions that are not part of any Vulkan version:
   VK_EXT_depth_clamp_zero_one                           DONE (anv, radv)
   VK_INTEL_shader_integer_functions2                    DONE (anv, hasvk, radv)
   VK_KHR_map_memory2                                    DONE (anv, nvk, radv, tu)
-
+  VK_EXT_map_memory_placed                              DONE (anv, nvk, radv, tu)
+  VK_MESA_image_alignment_control                       DONE (radv)
+  VK_EXT_legacy_dithering                               DONE (anv)
 
 
 Clover OpenCL 1.0 -- all DONE:
diff --git a/docs/header-stubs/compiler/spirv/spirv_info.h b/docs/header-stubs/compiler/spirv/spirv_info.h
new file mode 100644
index 00000000000..d8db07f5f1b
--- /dev/null
+++ b/docs/header-stubs/compiler/spirv/spirv_info.h
@@ -0,0 +1 @@
+struct spirv_capabilities {};
diff --git a/docs/header-stubs/vk_enum_to_str.h b/docs/header-stubs/vk_enum_to_str.h
new file mode 100644
index 00000000000..e69de29bb2d
--- /dev/null
+++ b/docs/header-stubs/vk_enum_to_str.h
diff --git a/docs/release-calendar.csv b/docs/release-calendar.csv
index ed0ddca652c..8c3af5c6648 100644
--- a/docs/release-calendar.csv
+++ b/docs/release-calendar.csv
@@ -1,5 +1,2 @@
-24.0,2024-05-08,24.0.7,Eric Engestrom
-,2024-05-22,24.0.8,Eric Engestrom
-24.1,2024-05-01,24.1.0-rc2,Eric Engestrom
-,2024-05-08,24.1.0-rc3,Eric Engestrom
-,2024-05-15,24.1.0-rc4,Eric Engestrom,or 24.1.0 final
+24.0,2024-05-22,24.0.8,Eric Engestrom
+24.1,2024-05-22,24.1.0-rc5,Eric Engestrom,or 24.1.0 final
diff --git a/docs/relnotes.rst b/docs/relnotes.rst
index bf788eaae16..3d273e35112 100644
--- a/docs/relnotes.rst
+++ b/docs/relnotes.rst
@@ -3,6 +3,7 @@ Release Notes
 
 The release notes summarize what's new or changed in each Mesa release.
 
+-  :doc:`24.0.7 release notes <relnotes/24.0.7>`
 -  :doc:`24.0.6 release notes <relnotes/24.0.6>`
 -  :doc:`24.0.5 release notes <relnotes/24.0.5>`
 -  :doc:`24.0.4 release notes <relnotes/24.0.4>`
@@ -417,6 +418,7 @@ The release notes summarize what's new or changed in each Mesa release.
    :maxdepth: 1
    :hidden:
 
+   24.0.7 <relnotes/24.0.7>
    24.0.6 <relnotes/24.0.6>
    24.0.5 <relnotes/24.0.5>
    24.0.4 <relnotes/24.0.4>
diff --git a/docs/relnotes/24.0.7.rst b/docs/relnotes/24.0.7.rst
new file mode 100644
index 00000000000..0eaecdec76f
--- /dev/null
+++ b/docs/relnotes/24.0.7.rst
@@ -0,0 +1,155 @@
+Mesa 24.0.7 Release Notes / 2024-05-08
+======================================
+
+Mesa 24.0.7 is a bug fix release which fixes bugs found since the 24.0.6 release.
+
+Mesa 24.0.7 implements the OpenGL 4.6 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.6. OpenGL
+4.6 is **only** available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+
+Mesa 24.0.7 implements the Vulkan 1.3 API, but the version reported by
+the apiVersion property of the VkPhysicalDeviceProperties struct
+depends on the particular driver being used.
+
+SHA256 checksum
+---------------
+
+::
+
+    7454425f1ed4a6f1b5b107e1672b30c88b22ea0efea000ae2c7d96db93f6c26a  mesa-24.0.7.tar.xz
+
+
+New features
+------------
+
+- None
+
+
+Bug fixes
+---------
+
+- mesa 24 intel A770 KOTOR black shadow smoke scenes
+- Graphical glitches in RPCS3 after updating Vulkan Intel drivers
+- [R600] OpenGL and VDPAU regression in Mesa 23.3.0 - some bitmaps get distorted.
+- VAAPI radeonsi: VBAQ broken with HEVC
+- radv: vkCmdWaitEvents2 is broken
+- Zink: enabled extensions and features may not match
+
+
+Changes
+-------
+
+Boris Brezillon (3):
+
+- panfrost: do not write outside num_wg_sysval
+- panfrost: Add the BO containing fragment program descriptor to the batch
+- pan/kmod: Make default allocator thread-safe
+
+Constantine Shablia (2):
+
+- pan/bi: fix 1D array tex coord lowering
+- panfrost: report correct MAX_VARYINGS
+
+Daniel Schürmann (1):
+
+- aco/ra: fix kill flags after renaming fixed Operands
+
+David Rosca (5):
+
+- radeonsi/vcn: Allocate session buffer in VRAM
+- radeonsi/vcn: Fix 10bit HEVC VPS general_profile_compatibility_flags
+- radeonsi/vcn: Only enable VBAQ with rate control mode
+- frontends/va: Fix AV1 slice_data_offset with multiple slice data buffers
+- Revert "radeonsi/vcn: AV1 skip the redundant bs resize"
+
+Eric Engestrom (6):
+
+- docs: add sha256sum for 24.0.6
+- .pick_status.json: Update to 86281ef15fca378ef48bcb072a762168e537820d
+- .pick_status.json: Mark 0666a715c7210558017ce717f6b0b947c679a68e as denominated
+- .pick_status.json: Update to 603982ea802b3846e91a943b413a7baf430e875d
+- .pick_status.json: Update to 9666756f603f0285d8a93ef93db1c7ec702b671f
+- .pick_status.json: Update to b8e79d2769b4a4aed7e2103cf0405acc5bdadb86
+
+Erik Faye-Lund (2):
+
+- panfrost: correct first-tracking for signature
+- panvk: avoid dereferencing a null-pointer
+
+Georg Lehmann (1):
+
+- radv, radeonsi: don't use D16 for f2f16_rtz
+
+Gert Wollny (1):
+
+- zink/kopper: Wait for last QueuePresentKHR to finish before acquiring for readback
+
+Ian Romanick (1):
+
+- intel/brw: Fix optimize_extract_to_float for i2f of unsigned extract
+
+Iván Briano (2):
+
+- anv: check requirements for VK_IMAGE_USAGE_FRAGMENT_SHADING_RATE
+- anv: fix casting to graphics_pipeline_base
+
+Karol Herbst (2):
+
+- nir: fix nir_shader_get_function_for_name for functions without names.
+- rusticl: use stream uploader for cb0 if prefered
+
+Kenneth Graunke (1):
+
+- isl: Set MOCS to uncached for Gfx12.0 blitter sources/destinations
+
+Konstantin Seurer (1):
+
+- radv: Handle all dependencies of CmdWaitEvents2
+
+Lionel Landwerlin (2):
+
+- anv: disable dual source blending state if not used in shader
+- intel/brw: fixup wm_prog_data_barycentric_modes()
+
+Mike Blumenkrantz (8):
+
+- zink: reconstruct features pnext after determining extension support
+- glthread: check for invalid primitive modes in DrawElementsBaseVertex
+- zink: prune zink_shader::programs under lock
+- zink: fully wait on all program fences during ctx destroy
+- kopper: fix bufferage/swapinterval handling for non-window swapchains
+- zink: slightly better swapinterval failure handling
+- zink: clean up accidental debug print
+- zink: add a tu flake
+
+Patrick Lerda (1):
+
+- gallium/auxiliary/vl: fix typo which negatively impacts the src_stride initialization
+
+Rohan Garg (1):
+
+- anv: formatting fix when printing pipe controls
+
+Samuel Pitoiset (1):
+
+- radv: fix image format properties with fragment shading rate usage
+
+Sviatoslav Peleshko (1):
+
+- anv: Fix descriptor sampler offsets assignment
+
+Tapani Pälli (1):
+
+- iris: change stream uploader default size to 2MB
+
+Yiwei Zhang (2):
+
+- venus: avoid client allocators for ring internals
+- venus: fix to destroy all pipeline handles on early error paths
+
+Yusuf Khan (1):
+
+- nouveau: Fix crash when destination or source screen fences are null
diff --git a/docs/relnotes/new_features.txt b/docs/relnotes/new_features.txt
index e69de29bb2d..eec9619f4ac 100644
--- a/docs/relnotes/new_features.txt
+++ b/docs/relnotes/new_features.txt
@@ -0,0 +1,3 @@
+VK_KHR_dynamic_rendering_local_read on RADV
+VK_EXT_legacy_vertex_attributes on lavapipe, ANV, Turnip and RADV
+VK_MESA_image_alignment_control on RADV
diff --git a/docs/rusticl.rst b/docs/rusticl.rst
index 28cf87dda0e..5b7f56e24f2 100644
--- a/docs/rusticl.rst
+++ b/docs/rusticl.rst
@@ -32,8 +32,8 @@ To build Rusticl you need to satisfy the following build dependencies:
 The minimum versions to build Rusticl are:
 
 -  Rust: 1.66
--  Meson: 1.3.1
--  Bindgen: 0.62.0
+-  Meson: 1.4.0
+-  Bindgen: 0.65.0
 -  LLVM: 15.0.0
 -  Clang: 15.0.0
    Updating clang requires a rebuilt of mesa and rusticl if and only if the value of