Age | Commit message | Author | Files | Lines |
|
This is needed for the proof of concept work that will allow mirrored
GPU addressing via the existing userptr interface. Part of the hack
involves passing the context ID to the ioctl in order to get a VM.
v2: This patch now breaks ABI, since userptr was merged.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This HACK allows users to reuse the userptr ioctl in order to
pre-reserve the VMA at a specific location. The vma will follow all the
same paths as other userptr objects - only the drm_mm node is actually
allocated.
Again, this patch is a big HACK to get some other people currently using
userptr enabled.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
|
|
|
|
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
v2: 0 pad the new 8B fields or else intel_error_decode has a hard time.
Note, regardless we need an igt update.
v3: Make reloc_offset 64b also.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
There is no need to preallocate the aliasing PPGTT. The code is properly
plumbed now to treat this address space like any other.
v2: Updated for CHV. Note CHV doesn't support 64b address space.
v3: Rebase on RO pte stuff. Thanks again, BYT.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
After this change, the old GGTT keeps its insert_entries/clear_range
functions as we don't expect those to ever change in terms of page table
levels. The address space now gets map_vma/unmap_vma. This better reflects
the operations we actually want to support for a VMA.
I was too lazy, but the GGTT should really use these new functions as
well.
BISECT WARNING: This commit breaks aliasing PPGTT as is. If you see this
during bisect, please skip. There was no other way I could find to make
these changes remotely readable.
v2: Rebase on RO pte stuff. Thanks again, BYT.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Map is easy, it's the same register as the PDP descriptor 0, but it only
has one entry. Also, the mapping code is now trivial thanks to all of
the prep patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The code for 4lvl works just as one would expect, and nicely it is able
to call into the existing 3lvl page table code to handle all of the
lower levels.
PML4 has no special attributes.
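The delegation described above can be sketched in userspace C. This is a hypothetical model, not the driver's code: a counter stands in for the real 3-level allocation work, and only the PML4E range-clipping logic is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the 4-level path walks PML4 entries and hands
 * each covered slice of the VA range to the existing 3-level code.
 * A counter stands in for the real per-PDP allocation work. */
#define PML4E_SHIFT 39                  /* each PML4E covers 512GiB */

static int pdps_touched;

static void alloc_3lvl(uint64_t start, uint64_t length)
{
	(void)start;
	(void)length;
	pdps_touched++;                 /* existing 3-level code runs here */
}

static void alloc_4lvl(uint64_t start, uint64_t length)
{
	uint64_t end = start + length;

	while (start < end) {
		/* clip the slice to this PML4E's 512GiB window */
		uint64_t pml4e_end = ((start >> PML4E_SHIFT) + 1) << PML4E_SHIFT;
		uint64_t slice_end = end < pml4e_end ? end : pml4e_end;

		alloc_3lvl(start, slice_end - start);
		start = slice_end;
	}
}
```

A 1GiB range inside one PML4E hits the 3-level path once; a range straddling a 512GiB boundary hits it once per PML4E touched.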
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.
This patch addresses some careless assumptions made throughout
development about the root page tables.
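To make the arithmetic behind the two roots concrete, here is a hypothetical userspace model (invented constant names, not the driver's structs; each level decodes 9 bits of the address and pages are 4KiB):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (hypothetical names) of the two roots described
 * above. Each level decodes 9 bits of the address; pages are 4KiB. */
#define GEN8_PML4ES_PER_PML4 512  /* 48b only: pml4 is the root      */
#define GEN8_PDPES_PER_PDP   512  /* a full pdp under one pml4e      */
#define GEN8_LEGACY_PDPES      4  /* legacy 32b: 1 pdp with 4 PDPEs  */
#define I915_PDES            512
#define GEN8_PTES            512

/* Legacy 32b root: 4 PDPEs cover 4 * 512 * 512 * 4KiB = 4GiB. */
static uint64_t gen8_legacy_va_size(void)
{
	return (uint64_t)GEN8_LEGACY_PDPES * I915_PDES * GEN8_PTES * 4096;
}

/* 48b root: the pdp is demoted to one pml4e's payload. */
static uint64_t gen8_full_va_size(void)
{
	return (uint64_t)GEN8_PML4ES_PER_PML4 * GEN8_PDPES_PER_PDP *
	       I915_PDES * GEN8_PTES * 4096;
}
```

Legacy 32b gets 4GiB from one pdp with 4 PDPEs; promoting a full 512-entry pdp to a single pml4e's payload, times 512 pml4es, yields the 48b (256TiB) space.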
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier to swallow. The patch also introduces the PML4, ie. the new top
level structure of the page tables.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This works the same as GEN6.
I was disappointed that I need to pass vm around now, but it's not so
much uglier than the drm_device, and having the vm in trace events is
hugely important.
v2: Consolidate pagetable/pagedirectory events
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
I was getting unexplainable hangs with the last patch, even though I
think it should be correct. As the subject says, should this ever get
merged, it needs to be coordinated with the patch this reverts.
Revert "drm/i915/bdw: Optimize PDP loads"
This reverts commit 64053129b5cbd3a5f87dab27d026c17efbdf0387.
|
|
Don't do them if they're not necessary, which, for the RCS, they
aren't under certain conditions.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This is probably not required since BDW is hopefully a bit more robust
than previous generations. Realize also that scratch will not exist for
every entry within the page table structure. Doing this would waste
an extraordinary amount of space when we move to 4 level page tables.
Therefore, the scratch pages/tables will only be pointed to by page
tables which have less than all of the entries filled.
I wrote the patch while debugging so I figured why not put it in the
series.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This finishes off the dynamic page tables allocations, in the legacy 3
level style that already exists. Almost everything has already been set up
to this point, the patch finishes off the enabling by setting the
appropriate function pointers.
FIXME, don't merge: Error handling on page table allocations is most
definitely broken - see next patches
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
error handling for failed ppgtt init on gen8
squash this after test
Kill the separate map.
squash when tested
|
|
Like with gen6/7, we can enable bitmap tracking with all the
preallocations to make sure things actually don't blow up.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
This patch adds the functionality and calls it at init, which should
have no functional change.
The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in the alloc_pagedirs. This is very useful
for when we move to 48b addressing, and the PDP isn't the root of the
page table structure.
The param is replaced with drm_device, which is an unavoidable wart
throughout the series. (in other words, not extra flagrant).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
These values are never quite useful for dynamic allocations of the page
tables. Getting rid of them will help prevent later confusion.
TODO: this probably needs to be earlier in the series
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
One important part of this patch is we now write a scratch page
directory into any unused PDP descriptors. This matters for 2 reasons,
first, it's not clear we're allowed to just use 0, or an invalid
pointer, and second, we must wipe out any previous contents from the last
context.
The latter point only matters with full PPGTT. The former point would
only affect 32b platforms, or platforms with less than 4GB of memory.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.
comments
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back at the scratch page.
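A minimal userspace sketch of that steady state (hypothetical names; the real code also tracks DMA mappings and usage bitmaps):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the steady state described above: every PDE
 * points at a shared scratch page table until an object is bound in
 * its range; on teardown the PDE is pointed back at scratch. */
#define NUM_PDES 512

struct page_table { int dummy; };

static struct page_table scratch_pt;      /* shared scratch table */
static struct page_table *pd[NUM_PDES];   /* the page directory   */

static void pd_init(void)
{
	/* spec-recommended default: everything references scratch */
	for (int i = 0; i < NUM_PDES; i++)
		pd[i] = &scratch_pt;
}

static void alloc_va_range(int pde)
{
	/* allocate a real page table only on demand */
	if (pd[pde] == &scratch_pt)
		pd[pde] = calloc(1, sizeof(struct page_table));
}

static void clear_va_range(int pde)
{
	/* free the table and point the PDE back at scratch */
	if (pd[pde] != &scratch_pt) {
		free(pd[pde]);
		pd[pde] = &scratch_pt;
	}
}
```

Unused PDEs always reference the shared scratch table, so the hardware never walks through an unmapped entry.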
Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I do.
The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV. Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.
We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag to signal
that even if the context is the same, force a reload. It's unclear
exactly what this does, but I have a hunch it's the right thing to do.
The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.
NOTE: I have no evidence to suggest this is actually needed other than a
few tidbits which lead me to believe there are some corner cases that
will require it. I'm mostly depending on the reload of DCLV to
invalidate the old TLBs. We can try to remove this patch and see what
happens.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
We have some fanciness coming up. This patch just breaks out the logic.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.
Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.
v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Similar to the patch a few back in the series, we can always map and
unmap page directories when we do their allocation and teardown. Page
directory pages only exist on gen8+, so this should only affect behavior
on those platforms.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
With a little bit of macro magic, and the fact that every page
table/dir/etc. we wish to map has a page and a daddr member, we can
greatly simplify and reduce code.
The patch introduces an i915_dma_map/unmap which has the same semantics
as pci_map_page, but is 1 line, and doesn't require newlines, or local
variables to make it fit cleanly.
Notice that even the page allocation shares this same attribute. For
now, I am leaving that code untouched because the macro version would be
a bit on the big side - but it's a nice cleanup as well (IMO)
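A sketch of the trick, assuming invented names and a stand-in for pci_map_page() so it runs in userspace:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the macro trick: every level of the page
 * table structure (pt, pd, pdp, ...) exposes `page` and `daddr`
 * members, so one macro can map any of them. fake_dma_map() stands
 * in for pci_map_page(), which the kernel code would really use. */
struct i915_page_table     { void *page; uint64_t daddr; };
struct i915_page_directory { void *page; uint64_t daddr; };

static uint64_t fake_dma_map(void *page)
{
	return (uint64_t)(uintptr_t)page | 1;  /* pretend bus address */
}

/* `px` may be any struct with page/daddr; this is the whole helper. */
#define i915_dma_map_px(px)   ((px)->daddr = fake_dma_map((px)->page))
#define i915_dma_unmap_px(px) ((px)->daddr = 0)
```

Because every level's struct exposes the same two members, one macro covers pt, pd, and friends alike, with no newlines or local variables at the call site.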
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
There is never a case where we don't want to do it. Since we've broken
up the allocations into nice clean helper functions, it's both easy and
obvious to do the dma mapping at the same time.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Map and unmap are common operations across all generations for
pagetables. With a simple helper, we can get a nice net code reduction
as well as simplified complexity.
There is some room for optimization here, for instance with the multiple
page mapping, that can be done in one pci_map operation. In that case
however, the max value we'll ever see there is 512, and so I believe the
simpler code makes this a worthwhile trade-off. Also, the range mapping
functions are placeholders to help transition the code. Eventually,
mapping will only occur during a page allocation which will always be a
discrete operation.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Having a more general way of doing mappings makes it easy to map and
unmap a specific page table. Specifically in this case, we
pass down the page directory + entry, and the page table to map. This
works similarly to the x86 code.
The same work will need to happen for GEN8. At that point I will try to
combine functionality.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking up
all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Move the remaining members over to the new page table structures.
This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it was easier to review if split.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_gtt.c
|
|
Thus far we've opted to make complex code requiring difficult review. In
the future, the code is only going to become more complex, and as such
we'll take the hit now and start to encapsulate things.
To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.
NOTE: The pun in the subject was intentional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
|
|
These page table helpers make the code much cleaner. There is some
room to use the arch/x86 header files. The reason I've opted not to is
in several cases, the definitions are dictated by the CONFIG_ options
which do not always indicate the restrictions in the GPU. While here,
clean up the defines to have more concise names, and consolidate between
gen6 and gen8 where appropriate.
v2: Use I915_PAGE_SIZE to remove PAGE_SIZE dep in the new code (Jesse)
Fix bugged I915_PTE_MASK define, which was unused (Chris)
BUG_ON bad length/size - taking directly from Chris (Chris)
define NUM_PTE (Chris)
I've made a lot of tiny errors in these helpers. Often I'd correct an
error only to introduce another one. While IGT was capable of catching
them, the tests often took a while to catch, and were hard/slow to
debug in the kernel. As a result, to test this, I compiled
i915_gem_gtt.h in userspace, and ran tests from userspace. What follows
isn't by any means complete, but it was able to catch a lot of bugs. Gen8
is also untested, but since the current code is almost identical, I feel
pretty comfortable with that.
void test_pte(uint32_t base) {
	assert_pte_index((base + 0), 0);
	assert_pte_index((base + 1), 0);
	assert_pte_index((base + 0x1000), 1);
	assert_pte_index((base + (1<<22)), 0);
	assert_pte_index((base + ((1<<22) - 1)), 1023);
	assert_pte_index((base + (1<<21)), 512);

	assert_pte_count(base + 0, 0, 0);
	assert_pte_count(base + 0, 1, 1);
	assert_pte_count(base + 0, 0x1000, 1);
	assert_pte_count(base + 0, 0x1001, 2);
	assert_pte_count(base + 0, 1<<21, 512);
	assert_pte_count(base + 0, 1<<22, 1024);
	assert_pte_count(base + 0, (1<<22) - 1, 1024);
	assert_pte_count(base + (1<<21), 1<<22, 512);
	assert_pte_count(base + (1<<21), (1<<22)+1, 512);
	assert_pte_count(base + (1<<21), 10<<22, 512);
}

void test_pde(uint32_t base) {
	assert(gen6_pde_index(base + 0) == 0);
	assert(gen6_pde_index(base + 1) == 0);
	assert(gen6_pde_index(base + (1<<21)) == 0);
	assert(gen6_pde_index(base + (1<<22)) == 1);
	assert(gen6_pde_index(base + (256<<22)) == 256);
	assert(gen6_pde_index(base + (512<<22)) == 0);
	/* the next one is not actually possible on gen6 */
	assert(gen6_pde_index(base + (513<<22)) == 1);

	assert(gen6_pde_count(base + 0, 0) == 0);
	assert(gen6_pde_count(base + 0, 1) == 1);
	assert(gen6_pde_count(base + 0, 1<<21) == 1);
	assert(gen6_pde_count(base + 0, 1<<22) == 1);
	assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
	assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
	assert(gen6_pde_count(base + 0, 511<<22) == 511);
	assert(gen6_pde_count(base + 0, 512<<22) == 512);
	assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
	assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
}

int main()
{
	test_pde(0);
	while (1)
		test_pte(rand() & ~((1<<22) - 1));
	return 0;
}
v3: Some small rebase conflicts resolved
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. It won't arrive yet, but
this was a nice easy one.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Split out single mappings which will help with upcoming work. Also while
here, rename the function because it is a better description - but this
function is going away soon.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
trivial.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The old code (I'm having trouble finding the commit) had a reason for
doing things when there was an error, and would continue on, thus the
!ret. For the newer code however, this looks completely silly.
Follow the normal idiom of if (ret) return ret.
Also, put the pde wiring in the gen specific init, now that GEN8 exists.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The current code will both potentially print a WARN, and set up part of
the PPGTT structure. Neither of these harms the current code; the change
is simply for clarity, and to perhaps prevent later bugs or weird
debug messages.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Upcoming patches will use the terms map and unmap in references to the
page table entries. Having this distinction will really help with code
clarity at that point.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
The actual correct way to think about this with the new style of page
table data structures is as the actual entry that is being indexed into
the array. "pd", and "pt" aren't representative of what the operation is
doing.
The clarity here will improve the readability of future patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
There often is not enough memory to dump the full contents of the PPGTT.
As a temporary bandage, to continue getting valuable basic PPGTT info,
wrap the dangerous, memory hungry part inside of a new verbose version
of the debugfs file.
Also while here we can split out the PPGTT print function so it's more
reusable.
I'd really like to get PPGTT info into our error state, but I found it too
difficult to make work in the limited time I have. Maybe Mika can find a way.
v2: Get the info for the non-default contexts. Merge a patch from Chris
into this patch (Chris). All credit goes to him.
References: 20140320115742.GA4463@nuc-i3427.alporthouse.com
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.
sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
It's useful to have it not as a macro for some upcoming work. Generally
since we try to avoid macros anyway, I think it doesn't hurt to put this
as its own patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
This patch finishes off actually separating the aliasing and global
binds. Prior to this, all global binds would be aliased. Now if aliasing
binds are required, they must be explicitly asked for. So far, we have
no users of this outside of execbuf - but Mika has already submitted a
patch requiring just this.
A nice benefit of this is we should no longer be able to clobber GTT
only objects from the aliasing PPGTT.
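The semantic change can be modeled with bind flags (hypothetical names; the driver's actual flag names may differ):

```c
#include <assert.h>

/* Hypothetical model of the split described above: before, a global
 * bind implicitly created an aliasing-PPGTT binding too; after, the
 * aliasing bind must be requested explicitly. */
#define GLOBAL_BIND   (1u << 0)
#define ALIASING_BIND (1u << 1)   /* assumed name for the explicit flag */

static unsigned int legacy_bind(unsigned int flags)
{
	/* old behaviour: every global bind was aliased as well */
	if (flags & GLOBAL_BIND)
		flags |= ALIASING_BIND;
	return flags;
}

static unsigned int vma_bind(unsigned int flags)
{
	/* new behaviour: only bind what the caller asked for */
	return flags;
}
```

With the new behaviour, a GGTT-only object never acquires an aliasing-PPGTT binding unless the caller asks for one, which is what prevents the clobbering mentioned above.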
v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf
v3: Rebase resolution with changed size of flags
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|