Diffstat (limited to 'docs/dispatch.rst')
-rw-r--r-- | docs/dispatch.rst | 255 |
1 files changed, 255 insertions, 0 deletions
diff --git a/docs/dispatch.rst b/docs/dispatch.rst
new file mode 100644
index 00000000000..c7b0af37636
--- /dev/null
+++ b/docs/dispatch.rst
@@ -0,0 +1,255 @@
+GL Dispatch
+===========
+
+Several factors combine to make efficient dispatch of OpenGL functions
+fairly complicated. This document attempts to explain some of the issues
+and introduce the reader to Mesa's implementation. Readers already
+familiar with the issues around GL dispatch can safely skip ahead to the
+`overview of Mesa's implementation <#overview>`__.
+
+1. Complexity of GL Dispatch
+----------------------------
+
+Every GL application has at least one object called a GL *context*. This
+object, which is an implicit parameter to every GL function, stores all
+of the GL-related state for the application. Every texture, every buffer
+object, every enable, and much, much more is stored in the context.
+Since an application can have more than one context, the context to be
+used is selected by a window-system dependent function such as
+``glXMakeContextCurrent``.
+
+In environments that implement OpenGL on the X Window System using GLX,
+every GL function, including the pointers returned by
+``glXGetProcAddress``, is *context independent*. This means that no
+matter what context is currently active, the same ``glVertex3fv``
+function is used.
+
+This creates the first bit of dispatch complexity. An application can
+have two GL contexts. One context is a direct rendering context where
+function calls are routed directly to a driver loaded within the
+application's address space. The other context is an indirect rendering
+context where function calls are converted to GLX protocol and sent to a
+server. The same ``glVertex3fv`` has to do the right thing depending on
+which context is current.
+
+Highly optimized drivers or GLX protocol implementations may want to
+change the behavior of GL functions depending on current state.
+For example, ``glFogCoordf`` may operate differently depending on whether or
+not fog is enabled.
+
+In multithreaded environments, it is possible for each thread to have a
+different GL context current. This means that poor old ``glVertex3fv``
+has to know which GL context is current in the thread where it is being
+called.
+
+.. _overview:
+
+2. Overview of Mesa's Implementation
+------------------------------------
+
+Mesa uses two per-thread pointers. The first pointer stores the address
+of the context current in the thread, and the second pointer stores the
+address of the *dispatch table* associated with that context. The
+dispatch table stores pointers to functions that actually implement
+specific GL functions. Each time a new context is made current in a
+thread, these pointers are updated.
+
+The implementation of functions such as ``glVertex3fv`` becomes
+conceptually simple:
+
+- Fetch the current dispatch table pointer.
+- Fetch the pointer to the real ``glVertex3fv`` function from the
+  table.
+- Call the real function.
+
+This can be implemented in just a few lines of C code. The file
+``src/mesa/glapi/glapitemp.h`` contains code very similar to this.
+
+::
+
+   void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
+   {
+       const struct _glapi_table * const dispatch = GET_DISPATCH();
+
+       (*dispatch->Vertex3f)(x, y, z);
+   }
+
+Sample dispatch function
+
+The problem with this simple implementation is the large amount of
+overhead that it adds to every GL function call.
+
+In a multithreaded environment, a naive implementation of
+``GET_DISPATCH`` involves a call to ``pthread_getspecific`` or a similar
+function. Mesa provides a wrapper function called
+``_glapi_get_dispatch`` that is used by default.
+
+3. Optimizations
+----------------
+
+A number of optimizations have been made over the years to diminish the
+performance hit imposed by GL dispatch. This section describes these
+optimizations.
+The benefits of each optimization and the situations
+where each can or cannot be used are listed.
+
+3.1. Dual dispatch table pointers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The vast majority of OpenGL applications use the API in a
+single-threaded manner. That is, the application has only one thread
+that makes calls into the GL. In these cases, not only do the calls to
+``pthread_getspecific`` hurt performance, but they are completely
+unnecessary! It is possible to detect this common case and avoid these
+calls.
+
+Each time a new dispatch table is set, Mesa examines and records the ID
+of the executing thread. If the same thread ID is always seen, Mesa
+knows that the application is, from OpenGL's point of view, single
+threaded.
+
+As long as an application is single threaded, Mesa stores a pointer to
+the dispatch table in a global variable called ``_glapi_Dispatch``. The
+pointer is also stored in a per-thread location via
+``pthread_setspecific``. When Mesa detects that an application has
+become multithreaded, ``NULL`` is stored in ``_glapi_Dispatch``.
+
+Using this simple mechanism the dispatch functions can detect the
+multithreaded case by comparing ``_glapi_Dispatch`` to ``NULL``. The
+resulting implementation of ``GET_DISPATCH`` is slightly more complex,
+but it avoids the expensive ``pthread_getspecific`` call in the common
+case.
+
+::
+
+   #define GET_DISPATCH() \
+       ((_glapi_Dispatch != NULL) \
+           ? _glapi_Dispatch : pthread_getspecific(_glapi_Dispatch_key))
+
+Improved ``GET_DISPATCH`` Implementation
+
+3.2. ELF TLS
+~~~~~~~~~~~~
+
+Starting with the 2.4.20 Linux kernel, each thread is allocated an area
+of per-thread, global storage. Variables can be put in this area using
+some extensions to GCC. By storing the dispatch table pointer in this
+area, the expensive call to ``pthread_getspecific`` and the test of
+``_glapi_Dispatch`` can be avoided.
+
+The dispatch table pointer is stored in a new variable called
+``_glapi_tls_Dispatch``.
+A new variable name is used so that a single
+libGL can implement both interfaces. This allows the libGL to operate
+with direct rendering drivers that use either interface. Once the
+pointer is properly declared, ``GET_DISPATCH`` becomes a simple variable
+reference.
+
+::
+
+   extern __thread struct _glapi_table *_glapi_tls_Dispatch
+       __attribute__((tls_model("initial-exec")));
+
+   #define GET_DISPATCH() _glapi_tls_Dispatch
+
+TLS ``GET_DISPATCH`` Implementation
+
+Use of this path is controlled by the preprocessor define
+``USE_ELF_TLS``. Any platform capable of using ELF TLS should use this
+as the default dispatch method.
+
+3.3. Assembly Language Dispatch Stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Many platforms have difficulty properly optimizing the tail-call in the
+dispatch stubs. Platforms like x86 that pass parameters on the stack
+seem to have even more difficulty optimizing these routines. All of the
+dispatch routines are very short, and it is trivial to create optimal
+assembly language versions. The amount of optimization provided by using
+assembly stubs varies from platform to platform and application to
+application. However, by using the assembly stubs, many platforms can
+use an additional space optimization (see `below <#fixedsize>`__).
+
+The biggest hurdle to creating assembly stubs is handling the various
+ways that the dispatch table pointer can be accessed. There are four
+different methods that can be used:
+
+#. Using ``_glapi_Dispatch`` directly in builds for non-multithreaded
+   environments.
+#. Using ``_glapi_Dispatch`` and ``_glapi_get_dispatch`` in
+   multithreaded environments.
+#. Using ``_glapi_Dispatch`` and ``pthread_getspecific`` in
+   multithreaded environments.
+#. Using ``_glapi_tls_Dispatch`` directly in TLS-enabled multithreaded
+   environments.
+
+People wishing to implement assembly stubs for new platforms should
+focus on #4 if the new platform supports TLS. Otherwise, implement #2
+followed by #3.
+Environments that do not support multithreading are
+uncommon and not terribly relevant.
+
+Selection of the dispatch table pointer access method is controlled by a
+few preprocessor defines.
+
+- If ``USE_ELF_TLS`` is defined, method #4 is used.
+- If ``HAVE_PTHREAD`` is defined, method #2 is used.
+- If none of the preceding are defined, method #1 is used.
+
+Two different techniques are used to handle the various cases.
+On x86 and SPARC, a macro called ``GL_STUB`` is used. In the preamble of
+the assembly source file different implementations of the macro are
+selected based on the defined preprocessor variables. The assembly code
+then consists of a series of invocations of the macro such as:
+
+::
+
+   GL_STUB(Color3fv, _gloffset_Color3fv)
+
+SPARC Assembly Implementation of ``glColor3fv``
+
+The benefit of this technique is that changes to the calling pattern
+(i.e., addition of a new dispatch table pointer access method) require
+fewer changed lines in the assembly code.
+
+However, this technique can only be used on platforms where the function
+implementation does not change based on the parameters passed to the
+function. For example, since x86 passes all parameters on the stack, no
+additional code is needed to save and restore function parameters around
+a call to ``pthread_getspecific``. Since x86-64 passes parameters in
+registers, varying amounts of code need to be inserted around the call
+to ``pthread_getspecific`` to save and restore the GL function's
+parameters.
+
+The other technique, used by platforms like x86-64 that cannot use the
+first technique, is to insert ``#ifdef`` within the assembly
+implementation of each function. This makes the assembly file
+considerably larger (e.g., 29,332 lines for ``glapi_x86-64.S`` versus
+1,155 lines for ``glapi_x86.S``) and causes simple changes to the
+function implementation to generate many lines of diffs.
+Since the assembly files are typically generated by scripts (see
+`below <#autogen>`__), this isn't a significant problem.
+
+Once a new assembly file is created, it must be inserted in the build
+system. There are two steps to this. The file must first be added to
+``src/mesa/sources``. That gets the file built and linked. The second
+step is to add the correct ``#ifdef`` magic to
+``src/mesa/glapi/glapi_dispatch.c`` to prevent the C version of the
+dispatch functions from being built.
+
+.. _fixedsize:
+
+3.4. Fixed-Length Dispatch Stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To implement ``glXGetProcAddress``, Mesa stores a table that associates
+function names with pointers to those functions. This table is stored in
+``src/mesa/glapi/glprocs.h``. For different reasons on different
+platforms, storing all of those pointers is inefficient. On most
+platforms, including all known platforms that support TLS, we can avoid
+this added overhead.
+
+If the assembly stubs are all the same size, the pointer need not be
+stored for every function. The location of the function can instead be
+calculated by multiplying the size of the dispatch stub by the offset of
+the function in the table. This value is then added to the address of
+the first dispatch stub.
+
+This path is activated by adding the correct ``#ifdef`` magic to
+``src/mesa/glapi/glapi.c`` just before ``glprocs.h`` is included.
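
The address calculation described in section 3.4 can be sketched in C as follows. This is only an illustration under stated assumptions: ``STUB_SIZE``, ``STUB_COUNT``, ``stub_area`` and ``stub_address`` are hypothetical names that do not come from Mesa, the 16-byte stub size is made up, and a plain byte array stands in for the real block of assembly stubs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for illustration only; not taken from Mesa. */
#define STUB_SIZE  16u  /* assumed size in bytes of every dispatch stub */
#define STUB_COUNT 4u   /* assumed number of stubs in the block */

/* Stand-in for the contiguous block of equally sized assembly stubs. */
static unsigned char stub_area[STUB_SIZE * STUB_COUNT];

/*
 * Compute the address of the stub at a given table offset:
 * address = first stub + offset * stub size.
 * No per-function pointer has to be stored for this to work.
 */
static const void *stub_address(const void *first_stub, size_t offset)
{
    assert(offset < STUB_COUNT);
    return (const unsigned char *)first_stub + offset * STUB_SIZE;
}
```

With these assumed values, ``stub_address(stub_area, 3)`` points 48 bytes past the first stub. The saving comes from replacing one stored pointer per function with a single multiply and add.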