summaryrefslogtreecommitdiff
path: root/AGP.moin
diff options
context:
space:
mode:
Diffstat (limited to 'AGP.moin')
-rw-r--r--AGP.moin192
1 files changed, 0 insertions, 192 deletions
diff --git a/AGP.moin b/AGP.moin
deleted file mode 100644
index cac7940..0000000
--- a/AGP.moin
+++ /dev/null
@@ -1,192 +0,0 @@
-= Accelerated Graphics Port (AGP) =
-
-AGP is a dedicated high-speed bus that allows the graphics controller to fetch
-large amounts of data directly from system memory. It uses a Graphics Address
-Re-Mapping Table (GART) to provide a physically-contiguous view of scattered
-pages in system memory for DMA transfers.
-
-With AGP, main memory is specifically used for advanced three-dimensional
-features, such as textures, alpha buffers, and ZBuffer``s.
-
-There are two primary AGP usage models for 3D rendering that have to do with
-how data are partitioned and accessed, and the resultant interface data
-flow characteristics.
-
- DMA:: In the DMA model, the primary graphics memory is the local memory
- associated with the accelerator, referred to as the local frame buffer. 3D
- structures are stored in system memory, but are not used (or executed)
- directly from this memory; rather they are copied to primary (local) memory
- (the DMA operation) to which the rendering engine's address generator makes its
- references. This implies that the traffic on the AGP tends to be long,
- sequential transfers, serving the purpose of bulk data transport from system
- memory to primary graphics (local) memory. This sort of access model is
- amenable to a linked list of physical addresses provided by software (similar
- to the operation of a disk or network I/O device), and is generally not sensitive
- to a non-contiguous view of the memory space.
-
- execute:: In the execute model, the accelerator uses both the local memory and
- the system memory as primary graphics memory. From the accelerator's
- perspective, the two memory systems are logically equivalent; any data
- structure may be allocated in either memory, with performance optimization as
- the only criterion for selection. In general, structures in system memory space
- are not copied into the local memory prior to use by the accelerator, but are
- executed in place. This implies that the traffic on the AGP tends to be
- short, random accesses, which are not amenable to an access model based on
- software resolved lists of physical addresses. Because the accelerator
- generates direct references into system memory, a contiguous view of that space
- is essential; however, since system memory is dynamically allocated in random
- 4K pages, it is necessary in the execute model to provide an address mapping
- mechanism that maps random 4K pages into a single contiguous, physical address
- space.
-
-'''Note:''' The AGP supports both the DMA and the execute model. However, since
-a primary motivation of the AGP is to reduce growth pressure on local memory,
-the execute model is the design focus.
-
-AGP also allows to issue several access requests in a pipelined fashion while
-waiting for the data transfers to occur. Such pipelining of access requests results
-in having several read and/or write requests outstanding in the corelogic's
-request queue at any point in time.
-
-== Resources ==
-
- * [[http://www.intel.com/technology/agp/index.htm|AGP Technology Home]]
- * [[http://www.agpforum.org/faq_ans.htm|AGP Implementors Forum Q&A]]
- Link changed, [[ http://web.archive.org/web/20021214224953/http://www.agpforum.org/faq_ans.htm | archived version]]
- * [[http://www.intel.com/technology/agp/agp_index.htm|AGP Specification 2.0]]
- Link changed, [[http://esd.cs.ucr.edu/webres/agp20.pdf| try this one]]
- * [[http://www.playtool.com/pages/agpcompat/agp30.pdf | AGP 3.0 v1.0 spec]]
-
-== Frequently Asked Questions ==
-
-=== Why not use the existing XFree86 AGP manipulation calls? ===
-
-You have to understand that the DRI functions have a different purpose than the
-ones in XFree86. The DRM has to know about AGP, so it talks to the AGP kernel
-module itself. It has to be able to protect certain regions of AGP memory from
-the client side 3D drivers, yet it has to export some regions of it as well.
-While most of this functionality (most, not all) can be accomplished with the
-`/dev/agpgart` interface, it makes sense to use the DRM's current
-authentication mechanism. This means that there is less complexity on the
-client side. If we used `/dev/agpgart`, then the client would have to open two
-devices, authenticate to both of them, and make half a dozen calls to agpgart,
-and only then care about the DRM device.
-
-'''Note:''' As a side note, the XFree86 calls were written after the DRM
-functions.
-
-
-Also to answer a previous question about not using XFree86 calls for memory
-mapping, you have to understand that under most OS's (probably Solaris as well),
-XFree86's functions will only work for root privileged processes. The whole
-point of the DRI is to allow processes that can connect to the X server to do
-some form of direct to hardware rendering. If we limited ourselves to using
-XFree86's functionality, we would not be able to do this. We don't want
-everyone to be root.
-
-=== How do I use AGP? ===
-
-You can also use [[http://dri.sourceforge.net/res/testgart.c|this]] test program
-as a bit more documentation as to how agpgart is used.
-
-=== How to allocate AGP memory? ===
-
-Generally programs do the following:
-
- 1. open `/dev/agpgart`
- 1. `ioctl(ACQUIRE)`
- 1. `ioctl(INFO)` to determine amount of memory for AGP
- 1. mmap the device
- 1. `ioctl(SETUP)` to set the AGP mode
- 1. `ioctl(ALLOCATE)` a chunk o memory, specifying offset in aperture
- 1. `ioctl(BIND)` that same chunk o memory
-
-Every time you update the GART, you have to flush the cache and/or TLB's. This
-is expensive. Therefore, you allocate and bind the pages you'll use, and `mmap()`
-just returns the right pages when needed.
-
-Then you need to have a remap of the AGP aperture in the kernel which you can
-access. Use ioremap to do that.
-
-After that you have access to the AGP memory. You probably want to make sure
-that there is a write-combining MTRR over the aperture. There is code in
-`mga_drv.c` in our kernel directory that shows you how to do that.
-
-=== If one has to insert pages in order to check for -EBUSY errors and loop through the entire GART, wouldn't it be better if the driver filled up ''pg_start'' of the ''agp_bind'' structure instead of the user filling it up? ===
-
-All this allocation should be done by only one process. If you need memory in
-the GART you should be asking the X server for it (or whatever your controlling
-process is). Things are implemented this way so that the controlling process
-can know intimate details of how memory is laid out. This is very important for
-the I810, since you want to set tiled memory on certain regions of the
-aperture. If you made the kernel do the layout, then you would have to create
-device specific code in the kernel to make sure that the backbuffer/dcache are
-aligned for tiled memory. This adds complexity to the kernel that doesn't need
-to be there, and imposes restrictions on what you can do with AGP memory. Also,
-the current X server implementation (4.0) actually locks out other applications
-from adding to the GART. While the X server is active, the X server is the only
-one who can add memory. Only the controlling process may add things to the GART,
-and while a controlling process is active, no other application can be the
-controlling process.
-
-Microsoft's VGART does things like you are describing I believe. I think it's
-bad design. It enforces a policy on whoever uses it, and is not flexible. When
-you are designing low level system routines I think it is very important to
-make sure your design has the minimum of policy. Otherwise when you want to do
-something different you have to change the interface, or create custom drivers
-for each application that needs to do things differently.
-
-=== How does the DMA transfer mechanism works? ===
-
-Here's a proposal for an zero-ioctl (best case) DMA transfer mechanism.
-
-Let's call it 'kernel ringbuffers'. The premise is to replace the calls to the
-'fire-vertex-buffer' ioctl with code to write to a client-private mapping
-shared by the kernel (like the current SAREA, but for each client).
-
-Starting from the beginning:
-
- * Each client has a private piece of AGP memory, into which it will put secure commands (typically vertices and texture data). The client may expand or shrink this region according to load.
-
- * Each client has a shared user/kernel region of cached memory. (Per-context SAREA). This is managed like a ring, with head and tail pointers.
- * The client emits vertices to AGP memory (as it currently does with DMA buffers).
-
- * When a state change, clear, swap, flush, or other event occurs, the client:Grabs the hardware lock.Re-emits any invalidated state to the head of the ring.Emits a command to fire the portion of AGP space as vertices.Updates the head pointer in the ring.Releases the lock.
-
- * The kernel is responsible for processing all of the rings. Several events might cause the kernel to examine active rings for commands to be dispatched:
-
- * A flush ioctl. (Called by impatient clients)
-
- * A periodic timer. (If this is low overhead?)
-
- * An interrupt previously emitted by the kernel. (If timers don't work)
-
-Additionally, for those who've been paying attention, you'll notice that some
-of the assumptions that we currently use to manage hardware state between
-multiple active contexts are broken if client commands to hardware aren't
-executed serially in an order which is knowable to the clients. Otherwise, a
-client that grabs the heavy lock doesn't know what state has been invalidated
-or textures swapped out by other clients.
-
-This could be solved by keeping per-context state in the kernel and
-implementing a proper texture manager. That's something we need to do anyway,
-but it's not a requirement for this mechanism to work.
-
-Instead, force the kernel to fire all outstanding commands on client
-ringbuffers whenever the heavyweight lock changes hands. This provides the same
-serialized semantics as the current mechanism, and also simplifies the kernel's
-task as it knows that only a single context has an active ring buffer (the one
-last to hold the lock).
-
-An additional mechanism is required to allow clients to know which pieces of
-their AGP buffer is pending execution by the hardware, and which pieces of the
-buffer are available to be reused. This is also exactly what
-''NV_vertex_array_range'' requires.
-
-= Kernel patches =
-
-Kernel patch to get AGP 8X working on a VIA KT400. The patch is for a 2.4.21pre5 but patches, compiles and runs perfectly in a 2.4.21. This patch is needed for this chipset because it works '''only''' in 8X when you plug a 8X card.
-
-* [[http://lists.insecure.org/lists/linux-kernel/2003/Mar/3999.html|Linux Kernel Mailing List]]
-----
-CategoryGlossary, CategoryFaq