author Stephane Marchesin <stephane.marchesin@gmail.com> 2012-03-12 02:02:08 -0700
committer Stephane Marchesin <stephane.marchesin@gmail.com> 2012-03-12 02:02:08 -0700
commit 1c50fd53b0f526806a6e83875090db0079d45a02
tree 6510614b6d725d1480091f3d6b3d859a9277c872
parent f845e596176d828b4d5cac1661cdd8d80b15da5a
More changes...
-rw-r--r-- linuxgraphicsdrivers.lyx 238
1 files changed, 136 insertions, 102 deletions
diff --git a/linuxgraphicsdrivers.lyx b/linuxgraphicsdrivers.lyx
index bbf6129..74faf78 100644
--- a/linuxgraphicsdrivers.lyx
+++ b/linuxgraphicsdrivers.lyx
@@ -11,6 +11,7 @@
\usepackage{tikz}
\usepackage{array}
+
\usetikzlibrary{positioning,shadows,arrows,shapes,patterns}
\usepackage{verbatim}
\tikzset{
@@ -1243,7 +1244,7 @@ Common bus types.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
PCI (Peripheral Component Interconnect)
@@ -1261,7 +1262,7 @@ PCI is the most basic bus allowing connecting graphics peripherals today.
for the memory to be coherent across devices.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
AGP (Accelerated Graphics Port)
@@ -1333,7 +1334,7 @@ Keep in mind that these last two features are known to be unstable on a
on AGP cards.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
PCI-X
@@ -1343,11 +1344,11 @@ PCI-X
\lang english
PCI-X was developed as a faster PCI for server boards, and very few graphics
- peripherals exist in this format.
+ peripherals exist in this format (some Matrox G550 cards).
It is not to be confused with PCI-Express, which sees real widespread usage.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
PCI-Express (PCI-E)
@@ -1740,8 +1741,8 @@ Dire que c'est lineaire en memoire physique et virtu
\begin_layout Standard
\lang english
-Yet another special case of IOMMU is the PCI GART which allows exposing
- a chunk of system memory to the card.
+Yet another special case of IOMMU is the PCI GART present on some GPUs,
+ which allows exposing a chunk of system memory to the card.
In that case the IOMMU table is embedded in the graphics card, and often
the physical memory used does not need to be contiguous.
\end_layout
@@ -2333,7 +2334,7 @@ The layout of a surface.
\begin_layout Subsubsection*
\lang english
-2D engine
+2D Engine
\end_layout
\begin_layout Standard
@@ -2943,7 +2944,7 @@ reference "cha:Gallium-3D"
\begin_layout Subsubsection*
\lang english
-3D engine
+3D Engine
\end_layout
\begin_layout Standard
@@ -2981,7 +2982,11 @@ http://www.x.org/wiki/Development/Documentation/HowVideoCardsWork
\begin_layout Standard
\lang english
-tiled textures
+To attain better cache locality, the textures and surface are often tiled.
+ Tiling means that the texture isn't stored linearly in GPU memory, but
+ instead is stored so as to make pixels which are close in texture space
+ also close in memory space.
+ Examples are the Z-order curve and the Hilbert curve.
\end_layout
\begin_layout Subsubsection*
@@ -3059,7 +3064,7 @@ lspci -v
the PCI resource space stays limited.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
MMIO
@@ -3082,7 +3087,7 @@ MMIO is the most direct access to the card.
of today's drivers.
\end_layout
-\begin_layout Subparagraph*
+\begin_layout Subsubsection*
\lang english
DMA
@@ -3151,10 +3156,20 @@ Graphics Hardware Examples
\begin_layout Subsection
\lang english
-Classical renderers
+Forward renderers
\end_layout
-\begin_layout Paragraph*
+\begin_layout Standard
+
+\lang english
+Forward renderers (i.e.
+ classical renderers) are GPU which render the primitives as they are submitted
+ to the rendering API, and for each of those complete one of these primitives
+ before moving on to the next.
+ This is the most straightforward way of rendering 3D primitives.
+\end_layout
+
+\begin_layout Subsubsection*
\lang english
ATI
@@ -3166,7 +3181,7 @@ ATI
Shader engine 4+1
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Nvidia
@@ -3224,13 +3239,23 @@ Deferred renderers are a different design for GPUs.
\begin_layout Itemize
Much better rendering locality can be achieved by splitting the screen into
- tiles (usually in the 32x32 pixel range).
+ tiles (usually in the
+\begin_inset Formula $16\times16$
+\end_inset
+
+ to
+\begin_inset Formula $32\times32$
+\end_inset
+
+ pixel range).
The GPU can then iterate over these tiles, and for each of those can resolve
per-pixel depth in an internal (mini) zbuffer.
Once the whole tile is rendered it can be written back to video memory,
saving precious bandwidth.
Similarly, since visibility is determined before fetching texture data,
- only the useful texture data is read, again saving bandwidth.
+ only the useful texture data is read (again saving bandwidth) and the fragment
+ shaders are only executed for visible fragments (which saves computation
+ power).
\end_layout
\begin_layout Itemize
@@ -3259,7 +3284,7 @@ All in all, the deferred renderers are particularly useful for embedded
approach don't matter.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
SGX
@@ -4577,16 +4602,19 @@ name "cha:Framebuffer-Drivers"
\lang english
Framebuffer drivers are the simplest form of graphics drivers under Linux.
- Kernel modesetting DRM drivers are still a relevant option if the only
- thing you are after is a basic two-dimensional display.
+ A framebuffer driver is a kernel graphics driver exposing its interface
+ through /dev/fb*.
+ This interface implements limited functionality (basically it allows setting
+ a video mode and drawing to a linear framebuffer), and the framebuffer
+ drivers are therefore extremely easy to create.
+ Despite their simplicity, framebuffer drivers are still a relevant option
+ if the only thing you are after is a basic two-dimensional display.
It is also useful to know how framebuffer drivers work when implementing
- framebuffer acceleration on top of a kernel modesetting DRM driver, as
- the acceleration callbacks are the same.
- A framebuffer driver implements little functionality, and is therefore
- extremely easy to create.
- Such a driver is especially interesting for embedded systems, where memory
- footprint is essential, or when the intended applications do not require
- advanced graphics acceleration.
+ framebuffer acceleration for a kernel modesetting DRM driver, as the accelerati
+on callbacks are the same.
+ In short, framebuffer drivers are especially interesting for embedded systems,
+ where memory footprint is essential, or when the intended applications
+ do not require advanced graphics acceleration.
\end_layout
\begin_layout Standard
@@ -4670,6 +4698,12 @@ The framebuffer operations structure is how non-modesetting framebuffer
By filling struct fb_ops callbacks, one can implement the following functions:
\end_layout
+\begin_layout Subsubsection*
+
+\lang english
+Set color register
+\end_layout
+
\begin_layout Standard
\lang english
@@ -4701,10 +4735,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* set color register */
+Set color registers in batch
\end_layout
\begin_layout Standard
@@ -4737,10 +4771,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* set color registers in batch */
+Blank display
\end_layout
\begin_layout Standard
@@ -4773,10 +4807,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* blank display */
+Pan display
\end_layout
\begin_layout Standard
@@ -4809,10 +4843,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* pan display */
+Draws a solid rectangle
\end_layout
\begin_layout Standard
@@ -4846,10 +4880,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* Draws a rectangle */
+Copy data from area to another
\end_layout
\begin_layout Standard
@@ -4882,10 +4916,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* Copy data from area to another */
+Draws an image to the display
\end_layout
\begin_layout Standard
@@ -4918,10 +4952,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* Draws a image to the display */
+Draws cursor
\end_layout
\begin_layout Standard
@@ -4954,10 +4988,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* Draws cursor */
+Rotates the display
\end_layout
\begin_layout Standard
@@ -4990,10 +5024,10 @@ end{lstlisting}{}
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-/* Rotates the display */
+Wait for blit idle, optional
\end_layout
\begin_layout Standard
@@ -5029,12 +5063,6 @@ end{lstlisting}{}
\begin_layout Standard
\lang english
-/* wait for blit idle, optional */
-\end_layout
-
-\begin_layout Standard
-
-\lang english
Note that common framebuffer functions (cfb) are available if you do not
want to implement everything for your device specifically.
These functions are cfb_fillrect, cfb_copyarea and cfb_imageblit and will
@@ -5135,16 +5163,16 @@ y to user space.
\begin_layout Itemize
\lang english
-More recently, DRM was improve to achieve modesetting.
+More recently, DRM was improved to achieve modesetting.
This simplifies the situation where both the DRM and the framebuffer driver
- access the card by removing the framebuffer driver and implementing in
- the DRM.
+ access the same GPU by removing the framebuffer driver and instead implementing
+ framebuffer support in the DRM.
\end_layout
\begin_layout Itemize
\lang english
-Put critical initialization of the card in the kernel, for example by uploading
+Put critical initialization of the card in the kernel, for example uploading
firmwares or setting up DMA areas.
\end_layout
@@ -5444,8 +5472,8 @@ When the hardware doesn't have memory protection, this can still be achieved
To prevent access to arbitrary GPU memory, the command submission ioctl
can also check that each of these offsets is owned by the calling process,
and reject the batch buffer if it isn't.
- This way it is possible to implement memory protection on hardware which
- doesn't have that functionality otherwise.
+ That way it is possible to implement memory protection when the hardware
+ doesn't provide that functionality.
\end_layout
\begin_layout Standard
@@ -5486,8 +5514,8 @@ However, these days it makes more sense to put it in the kernel once and
DDXes, EGL stacks...).
This extension to modesetting is called kernel modesetting (also known
as KMS).
- A number of concepts are used by the modesetting interface (those are inherited
- from the Randr 1.2 specification).
+ A number of concepts are used by the modesetting interface (those concepts
+ are mainly inherited from the Randr 1.2 specification).
\end_layout
\begin_layout Subsubsection*
@@ -5661,7 +5689,7 @@ Creating a basic driver
Mandatory entry points
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
PreInit
@@ -5703,7 +5731,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
ScreenInit
@@ -5746,7 +5774,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
EnterVT
@@ -5790,7 +5818,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
LeaveVT
@@ -5839,7 +5867,7 @@ end{lstlisting}{}
Optional functions (but very useful)
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
SwitchMode
@@ -5881,7 +5909,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
AdjustFrame
@@ -5924,7 +5952,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
FreeScreen
@@ -6308,7 +6336,7 @@ EXA is implemented in the driver as a series of callbacks; the following
; some of them like Composite() are optional.
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
Solid
@@ -6423,7 +6451,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
Copy
@@ -6530,7 +6558,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
Composite
@@ -6557,7 +6585,7 @@ If the driver doesn't support the required operation, it is free to return
Of course this will be done on the CPU as a fallback.
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
UploadToScreen
@@ -6600,7 +6628,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
DowndloadFromScreen
@@ -6643,7 +6671,7 @@ end{lstlisting}{}
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
PrepareAccess
@@ -6662,7 +6690,7 @@ PrepareAccess makes the pixmap accessible from the CPU.
a linear view, or by doing a copy from GPU to CPU memory.
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
FinishAccess
@@ -6675,7 +6703,7 @@ FinishAccess is called once the pixmap is done being accessed, and must
undo what PrepareAccess did to make the pixmap usable by the GPU again.
\end_layout
-\begin_layout Paragraph
+\begin_layout Subsubsection*
\lang english
A note about EXA performance
@@ -6810,7 +6838,7 @@ Video decoding pipeline
Two typical video pipelines : mpeg2 and h264
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
The H262 decoding pipeline
@@ -6822,7 +6850,7 @@ The H262 decoding pipeline
iDCT -> MC -> CSC -> Final display
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
The H.264 decoding pipeline
@@ -7901,7 +7929,7 @@ using a conversion shader or a conversion texture lookup
Video decoding APIs
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Xv
@@ -7923,7 +7951,7 @@ Xv is simply about CSC ans scaling.
prove to be fine when coupled with a powerful CPU to decode H264 content).
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
XvMC
@@ -7935,7 +7963,7 @@ XvMC
idct + mc +csc
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
VAAPI
@@ -7949,7 +7977,7 @@ VAAPI was initially created for intel's poulsbo video decoding.
at different pipeline stages, which makes it more complex to implement.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
VDPAU
@@ -7961,7 +7989,7 @@ VDPAU
The VDPAU was initiated by nvidia for H264 & VC1 decoding support
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
XvBA
@@ -7973,7 +8001,7 @@ XvBA
All 3 APIs are intended for full
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
OpenMax
@@ -9277,7 +9305,7 @@ not as difficult as it seems, requires organization, being rigorous.
the road (if you hesitate, you have crossed the line already!).
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Mmiotrace
@@ -9304,7 +9332,7 @@ mmio trace is now part of the official Linux kernels.
Therefore, any pre-existing driver can be traced.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Libsegfault
@@ -9320,7 +9348,7 @@ libsegfault is similar to mmio-trace in the way it works: after removing
is a kernel tool.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Valgrind-mmt
@@ -9337,13 +9365,13 @@ Valgrind is a dynamic recompiling and instrumentation framework.
access to the zones we want to see traced is logged.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
-vbetool
+vbetool/vbtracetool
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Virtualization
@@ -9360,7 +9388,7 @@ n.
(which imposes the use of an open source virtualization solution like Qemu).
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Ad-hoc tools
@@ -9446,13 +9474,13 @@ name "cha:Beyond-Development"
Testing for conformance
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Rendercheck
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
OpenGL conformance test suite
@@ -9466,13 +9494,13 @@ The official OpenGL testing suite is not publicly available, and (paying)
Instead, most developers use alternate sources for test programs.
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Piglit
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
glean
@@ -9484,7 +9512,7 @@ glean
glean.sourceforge.net
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
Mesa demos
@@ -9502,7 +9530,7 @@ mesa/progs/*
Debugging
\end_layout
-\begin_layout Paragraph*
+\begin_layout Subsubsection*
\lang english
gdb and X.Org
@@ -9517,43 +9545,49 @@ gdb needs to run on a terminal emulator while the application debug might
and gdb waiting to be able to output text.
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
printk debug
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
-crash (surcouche gdb pour analyser les vmcore)
+crash
\end_layout
\begin_layout Standard
\lang english
+(surcouche gdb pour analyser les vmcore)
+\end_layout
+
+\begin_layout Subsubsection*
+
+\lang english
kgdb
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
serial console
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
diskdump
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
linux-uml
\end_layout
-\begin_layout Standard
+\begin_layout Subsubsection*
\lang english
systemtap