Age  Commit message  Author  Files/Lines
2014-06-20  drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)  [gpu_mirror]  Ben Widawsky  2 files, -32/+109
This is needed for the proof of concept work that will allow mirrored GPU addressing via the existing userptr interface. Part of the hack involves passing the context ID to the ioctl in order to get a VM. v2: This patch now breaks ABI, since userptr was merged. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Track userptr VMAs  Ben Widawsky  4 files, -4/+28
This HACK allows users to reuse the userptr ioctl in order to pre-reserve the VMA at a specific location. The vma will follow all the same paths as other userptr objects - only the drm_mm node is actually allocated. Again, this patch is a big HACK to get some other people currently using userptr enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  TESTME: Always force invalidate  Ben Widawsky  1 file, -0/+8
2014-06-20  TESTME: GFX_TLB_INVALIDATE_EXPLICIT  Ben Widawsky  1 file, -1/+1
2014-06-20  drm/i915/bdw: Flip the 48b switch  Ben Widawsky  2 files, -4/+1
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Expand error state's address width to 64b  Ben Widawsky  2 files, -9/+10
v2: 0 pad the new 8B fields or else intel_error_decode has a hard time. Note, regardless we need an igt update. v3: Make reloc_offset 64b also. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
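For illustration only: zero-padding the widened fields keeps every dump line fixed-width, which is what intel_error_decode expects. Assuming the err_printf helper used elsewhere in i915_gpu_error.c and illustrative field names, the printing would look roughly like:

    err_printf(m, "  gtt_offset: 0x%016llx\n", (u64)err->gtt_offset);
    err_printf(m, "  reloc_offset: 0x%016llx\n", (u64)err->reloc_offset);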
2014-06-20  drm/i915/bdw: make aliasing PPGTT dynamic  Ben Widawsky  1 file, -130/+162
There is no need to preallocate the aliasing PPGTT. The code is properly plumbed now to treat this address space like any other. v2: Updated for CHV. Note CHV doesn't support 64b address space. v3: Rebase on RO pte stuff. Thanks again, BYT. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Restructure map vs. insert entries  Ben Widawsky  4 files, -124/+131
After this change, the old GGTT keeps its insert_entries/clear_range functions, as we don't expect those to ever change in terms of page table levels. The address space now gets map_vma/unmap_vma, which better reflects the operations we actually want to support for a VMA. I was too lazy, but the GGTT should really use these new functions as well. BISECT WARNING: This commit breaks aliasing PPGTT as is. If you see this during bisect, please skip. There was no other way I could find to make these changes remotely readable. v2: Rebase on RO pte stuff. Thanks again, BYT. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: 4 level page tables  Ben Widawsky  3 files, -6/+52
Map is easy, it's the same register as the PDP descriptor 0, but it only has one entry. Also, the mapping code is now trivial thanks to all of the prep patches. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: implement alloc/teardown for 4lvl  Ben Widawsky  2 files, -19/+163
The code for 4lvl works just as one would expect, and nicely it is able to call into the existing 3lvl page table code to handle all of the lower levels. PML4 has no special attributes. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Abstract PDP usage  Ben Widawsky  1 file, -44/+85
Up until now, ppgtt->pdp has always been the root of our page tables. Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs. In preparation for 4 level page tables, we need to stop using ppgtt->pdp directly unless we know it's what we want. The future structure will use ppgtt->pml4 for the top level, and the pdp is just one of the entries being pointed to by a pml4e. This patch addresses some carelessness done throughout development wrt assumptions made of the root page tables. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Make pdp allocation more dynamic  Ben Widawsky  4 files, -32/+151
This transitional patch doesn't do much for the existing code. However, it should make the upcoming patches that use the full 48b address space a bit easier to swallow. The patch also introduces the PML4, i.e. the new top level structure of the page tables. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
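A rough sketch of the hierarchy this builds toward, with illustrative (not the driver's exact) struct names: the PML4 is the new top level for 48b addressing, while legacy 32b keeps a single PDP with 4 PDPEs.

    struct i915_pagetab {           /* maps 2MB: 512 PTEs of 4KB pages */
            struct page *page;
            dma_addr_t daddr;
    };

    struct i915_pagedir {           /* maps 1GB: 512 PDEs */
            struct page *page;
            dma_addr_t daddr;
            struct i915_pagetab *page_tables[512];
    };

    struct i915_pagedirpo {         /* maps 512GB: 512 PDPEs (legacy 32b uses only 4) */
            struct page *page;
            dma_addr_t daddr;
            struct i915_pagedir *pagedirs[512];
    };

    struct i915_pml4 {              /* 48b VA: 512 PML4Es, each pointing at a PDP */
            struct page *page;
            dma_addr_t daddr;
            struct i915_pagedirpo *pdps[512];
    };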
2014-06-20  drm/i915/bdw: Add dynamic page trace events  Ben Widawsky  2 files, -10/+47
This works the same as GEN6. I was disappointed that I need to pass vm around now, but it's not so much uglier than the drm_device, and having the vm in trace events is hugely important. v2: Consolidate pagetable/pagedirectory events Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  TESTME: Either drop the last patch or fix it.  Ben Widawsky  2 files, -21/+0
I was getting unexplainable hangs with the last patch, even though I think it should be correct. As the subject says, should this ever get merged, it needs to be coordinated with the patch this reverts. Revert "drm/i915/bdw: Optimize PDP loads" This reverts commit 64053129b5cbd3a5f87dab27d026c17efbdf0387.
2014-06-20  drm/i915/bdw: Optimize PDP loads  Ben Widawsky  2 files, -0/+21
Don't do them if they're not necessary, which they're not, for the RCS, in certain conditions. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Add ppgtt info for dynamic pages  Ben Widawsky  3 files, -12/+83
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Scratch unused pages  Ben Widawsky  1 file, -2/+23
This is probably not required since BDW is hopefully a bit more robust than previous generations. Realize also that scratch will not exist for every entry within the page table structure. Doing this would waste an extraordinary amount of space when we move to 4 level page tables. Therefore, the scratch pages/tables will only be pointed to by page tables which have less than all of the entries filled. I wrote the patch while debugging, so I figured why not put it in the series. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
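A hedged sketch of the idea, reusing the illustrative structures above and a hypothetical helper name: any PDE slot in a partially filled page directory that has no real page table behind it gets pointed at the shared scratch page table rather than being left as zero or stale data.

    /* hypothetical helper: back every unused PDE with the scratch PT */
    static void gen8_scratch_unused_pdes(struct i915_pagedir *pd,
                                         struct i915_pagetab *scratch_pt)
    {
            uint64_t *pde = kmap_atomic(pd->page);
            int i;

            for (i = 0; i < 512; i++) {
                    if (pd->page_tables[i])
                            continue;       /* real page table already present */
                    pde[i] = scratch_pt->daddr | _PAGE_PRESENT | _PAGE_RW;
            }

            kunmap_atomic(pde);
    }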
2014-06-20  drm/i915/bdw: Dynamic page table allocations  Ben Widawsky  1 file, -45/+216
This finishes off the dynamic page table allocations, in the legacy 3 level style that already exists. Most everything has already been set up to this point; the patch finishes off the enabling by setting the appropriate function pointers. FIXME, don't merge: Error handling on page table allocations is most definitely broken - see next patches. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [WIP notes: error handling for failed ppgtt init on gen8 - squash this after test; kill the separate map - squash when tested]
2014-06-20  drm/i915/bdw: begin bitmap tracking  Ben Widawsky  2 files, -14/+99
Like with gen6/7, we can enable bitmap tracking with all the preallocations to make sure things actually don't blow up. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Split out mappings  Ben Widawsky  1 file, -42/+52
When we do dynamic page table allocations for gen8, we'll need to have more control over how and when we map page tables, similar to gen6. This patch adds the functionality and calls it at init, which should have no functional change. The PDPEs are still a special case for now. We'll need a function for that in the future as well. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Extract PPGTT param from pagedir alloc  Ben Widawsky  1 file, -8/+7
Now that we don't need to trace num_pd_pages, we may as well kill all need for the PPGTT structure in the alloc_pagedirs. This is very useful for when we move to 48b addressing, and the PDP isn't the root of the page table structure. The param is replaced with drm_device, which is an unavoidable wart throughout the series. (in other words, not extra flagrant). Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: num_pd_pages/num_pd_entries isn't useful  Ben Widawsky  3 files, -42/+21
These values are never quite useful for dynamic allocations of the page tables. Getting rid of them will help prevent later confusion. TODO: this probably needs to be earlier in the series Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Make the pdp switch a bit less hacky  Ben Widawsky  2 files, -13/+24
One important part of this patch is that we now write a scratch page directory into any unused PDP descriptors. This matters for 2 reasons: first, it's not clear we're allowed to just use 0, or an invalid pointer, and second, we must wipe out any previous contents from the last context. The latter point only matters with full PPGTT. The former point would only affect 32b platforms, or platforms with less than 4GB memory. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
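Roughly, the PDP load then becomes the loop below. GEN8_RING_PDP_* and MI_LOAD_REGISTER_IMM are the i915_reg.h definitions; the scratch_pd fallback is the new part; ring-begin/advance and error handling are elided, and the pdp.pagedirs/scratch_pd field names are illustrative (pdp.pagedirs follows the naming used later in this series).

    /* sketch: load all 4 PDP descriptors, falling back to the scratch PD */
    for (i = 3; i >= 0; i--) {
            struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i] ?: ppgtt->scratch_pd;
            dma_addr_t addr = pd->daddr;

            intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
            intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, i));
            intel_ring_emit(ring, upper_32_bits(addr));
            intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
            intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, i));
            intel_ring_emit(ring, lower_32_bits(addr));
    }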
2014-06-20  drm/i915/bdw: pagetable allocation rework  Ben Widawsky  2 files, -25/+39
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: pagedirs rework allocation  Ben Widawsky  1 file, -12/+31
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915/bdw: Use dynamic allocation idioms on free  Ben Widawsky  2 files, -16/+55
The page directory freer is left here for now as it's still useful given that GEN8 still preallocates. Once the allocation functions are broken up into more discrete chunks, we'll follow suit and destroy this leftover piece. comments Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Finish gen6/7 dynamic page table allocation  Ben Widawsky  5 files, -11/+238
This patch continues on the idea from the previous patch. From here on, in the steady state, PDEs are all pointing to the scratch page table (as recommended in the spec). When an object is allocated in the VA range, the code will determine if we need to allocate a page for the page table. Similarly, when the object is destroyed, we will remove and free the page table, pointing the PDE back to the scratch page.

Following patches will work to unify the code a bit as we bring in GEN8 support. GEN6 and GEN8 are different enough that I had a hard time getting to this point with as much common code as I do.

The aliasing PPGTT must pre-allocate all of the page tables. There are a few reasons for this. Two trivial ones: aliasing PPGTT goes through the GGTT paths, so it's hard to maintain; and we currently do not restore the default context (assuming the previous force reload is indeed necessary). Most importantly though, the only way (it seems from empirical evidence) to invalidate the CS TLBs on a non-render ring is to either use ring sync (which requires actually stopping the rings in order to synchronize when the sync completes vs. where you are in execution), or to reload DCLV. Since without full PPGTT we do not ever reload the DCLV register, there is no good way to achieve this. The simplest solution is just to not support dynamic page table creation/destruction in the aliasing PPGTT. We could always reload DCLV, but this seems like quite a bit of excess overhead only to save at most 2MB-4k of memory for the aliasing PPGTT page tables.

v2: Make the page table bitmap declared inside the function (Chris) Simplify the way the scratch address space works. Move the alloc/teardown tracepoints up a level in the call stack so that all implementations get the trace. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
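In outline, the steady state described above behaves like this sketch (helper names are hypothetical; DMA mapping, the PTE bitmaps and the aliasing special case are elided):

    /* allocate a real page table only when a VA range first needs one */
    for (pde = first_pde; pde <= last_pde; pde++) {
            if (ppgtt->pd.page_tables[pde] != ppgtt->scratch_pt)
                    continue;                       /* already backed by a real PT */

            pt = alloc_pt_single(dev);              /* hypothetical helper */
            if (IS_ERR(pt))
                    goto unwind_out;

            ppgtt->pd.page_tables[pde] = pt;
            gen6_map_single(&ppgtt->pd, pde, pt);   /* rewrite this PDE; hypothetical */
    }

    /* teardown is the inverse: free the PT and point the PDE back at
     * scratch_pt, so the hardware never walks a stale entry */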
2014-06-20  drm/i915: Force pd restore when PDEs change, gen6-7  Ben Widawsky  4 files, -4/+35
The docs say you cannot change the PDEs of a currently running context; if you are changing the PDEs of the currently running context, a reload must be forced. We never map new PDEs of a running context, and expect them to be present - so I think this is okay. (We can unmap, but this should also be okay since we only unmap unreferenced objects that the GPU shouldn't be trying to va->pa xlate.)

The MI_SET_CONTEXT command does have a flag to signal that even if the context is the same, force a reload. It's unclear exactly what this does, but I have a hunch it's the right thing to do. The logic assumes that we always emit a context switch after mapping new PDEs, and before we submit a batch. This is the case today, and has been the case since the inception of hardware contexts. A note in the comment lets the user know.

NOTE: I have no evidence to suggest this is actually needed other than a few tidbits which lead me to believe there are some corner cases that will require it. I'm mostly depending on the reload of DCLV to invalidate the old TLBs. We can try to remove this patch and see what happens. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
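The hunch above amounts to something like the following when emitting the switch (the MI_* flag bits are the real i915_reg.h definitions; the dirty-PD predicate and ctx_offset are stand-ins):

    u32 flags = MI_MM_SPACE_GTT | MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN;

    /* same context, but its PDEs changed since it last ran on this ring */
    if (ppgtt_pds_dirty(ring, to))                  /* hypothetical predicate */
            flags |= MI_FORCE_RESTORE;

    intel_ring_emit(ring, MI_SET_CONTEXT);
    intel_ring_emit(ring, ctx_offset | flags);      /* ctx_offset: GGTT address of the context image */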
2014-06-20  drm/i915: Extract context switch skip logic  Ben Widawsky  1 file, -1/+11
We have some fanciness coming up. This patch just breaks out the logic. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Track GEN6 page table usage  Ben Widawsky  2 files, -89/+230
Instead of implementing the full tracking + dynamic allocation, this patch does a bit less than half of the work, by tracking and warning on unexpected conditions. The tracking itself follows which PTEs within a page table are currently being used for objects. The next patch will modify this to actually allocate the page tables only when necessary. With the current patch there isn't much in the way of making a gen agnostic range allocation function. However, in the next patch we'll add more specificity which makes having separate functions a bit easier to manage. Notice that aliasing PPGTT is not managed here. The patch which actually begins dynamic allocation/teardown explains the reasoning for this. v2: s/pdp.pagedir/pdp.pagedirs Make a scratch page allocation helper Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
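A condensed sketch of the tracking (struct and function names illustrative; 1024 PTEs per gen6 page table; warning-only here, with real allocation following in the next patch):

    /* added to the (illustrative) page table struct from the earlier sketches */
    struct i915_pagetab {
            struct page *page;
            dma_addr_t daddr;
            unsigned long *used_ptes;       /* bitmap, one bit per PTE */
    };

    /* mark [first, first + count) in use; warn if any bit was already set */
    static void mark_ptes_used(struct i915_pagetab *pt, u32 first, u32 count)
    {
            DECLARE_BITMAP(tmp, 1024);

            bitmap_zero(tmp, 1024);
            bitmap_set(tmp, first, count);
            WARN_ON(bitmap_intersects(tmp, pt->used_ptes, 1024));
            bitmap_or(pt->used_ptes, pt->used_ptes, tmp, 1024);
    }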
2014-06-20  drm/i915: Always dma map page directory allocations  Ben Widawsky  1 file, -60/+19
Similar to the patch a few back in the series, we can always map and unmap page directories when we do their allocation and teardown. Page directory pages only exist on gen8+, so this should only affect behavior on those platforms. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Consolidate dma mappings  Ben Widawsky  1 file, -38/+18
With a little bit of macro magic, and the fact that every page table/dir/etc. we wish to map will have a page and a daddr member, we can greatly simplify and reduce code. The patch introduces an i915_dma_map/unmap which has the same semantics as pci_map_page, but is 1 line and doesn't require newlines or local variables to make it fit cleanly. Notice that even the page allocation shares this same attribute. For now, I am leaving that code untouched because the macro version would be a bit on the big side - but it's a nice cleanup as well (IMO). Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
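A sketch of the macro trick (macro names illustrative, not the exact definition): because every level of the structure has a page and a daddr member, one macro covers page tables, page directories and the top-level pages alike, with pci_map_page() semantics.

    #define i915_dma_map_single(px, dev) \
            ((px)->daddr = pci_map_page((dev)->pdev, (px)->page, 0, PAGE_SIZE, \
                                        PCI_DMA_BIDIRECTIONAL))

    #define i915_dma_unmap_single(px, dev) \
            pci_unmap_page((dev)->pdev, (px)->daddr, PAGE_SIZE, \
                           PCI_DMA_BIDIRECTIONAL)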
2014-06-20  drm/i915: Always dma map page table allocations  Ben Widawsky  1 file, -61/+17
There is never a case where we don't want to do it. Since we've broken up the allocations into nice clean helper functions, it's both easy and obvious to do the dma mapping at the same time. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Clean up pagetable DMA map & unmap  Ben Widawsky  1 file, -62/+85
Map and unmap are common operations across all generations for pagetables. With a simple helper, we can get a nice net code reduction as well as simplified complexity. There is some room for optimization here; for instance, the multiple page mapping could be done in one pci_map operation. In that case however, the max value we'll ever see there is 512, and so I believe the simpler code makes this a worthwhile trade-off. Also, the range mapping functions are placeholders to help transition the code. Eventually, mapping will only occur during a page allocation which will always be a discrete operation. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Generalize GEN6 mapping  Ben Widawsky  2 files, -29/+34
Having a more general way of doing mappings will allow the ability to easily map and unmap a specific page table. Specifically in this case, we pass down the page directory + entry, and the page table to map. This works similarly to the x86 code. The same work will need to happen for GEN8. At that point I will try to combine functionality. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Create page table allocators  Ben Widawsky  2 files, -83/+147
As we move toward dynamic page table allocation, it becomes much easier to manage our data structures if we do things less coarsely, breaking up all of our actions into individual tasks. This makes the code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch statically allocates the page table structures without a page directory. This remains the same for all platforms. The patch itself should not have much functional difference. The primary noticeable difference is the fact that page tables are no longer allocated, but rather statically declared as part of the page directory. This has non-zero overhead, but things gain non-trivial complexity as a result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8 code. Page tables have no difference based on GEN8. As we'll see in a future patch when we add the DMA mappings to the allocations, it requires only one small change to make work, and error handling should just fall into place.
2. Unless we always want to allocate all page tables under a given PDE, we'll have to eventually break this up into an array of pointers (or pointer to pointer).
3. Having the discrete functions is easier to review, and understand. All allocations and frees now take place in just a couple of locations. Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
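The shape of the split, roughly (struct name reused from the earlier sketches, helper names hypothetical; note how the GFP flags end up confined to the allocator, as point 4 says):

    /* one small allocator per structure level; the free routines mirror them */
    static struct i915_pagetab *alloc_pt_single(void)
    {
            struct i915_pagetab *pt;

            pt = kzalloc(sizeof(*pt), GFP_KERNEL);
            if (!pt)
                    return ERR_PTR(-ENOMEM);

            pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
            if (!pt->page) {
                    kfree(pt);
                    return ERR_PTR(-ENOMEM);
            }

            return pt;
    }

    static void free_pt_single(struct i915_pagetab *pt)
    {
            if (!pt)
                    return;

            __free_page(pt->page);
            kfree(pt);
    }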
2014-06-20  drm/i915: Complete page table structures  Ben Widawsky  3 files, -64/+37
Move the remaining members over to the new page table structures. This can be squashed with the previous commit if desired. The reasoning is the same as that patch. I simply felt it is easier to review if split. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_gtt.c
2014-06-20  drm/i915: construct page table abstractions  Ben Widawsky  2 files, -93/+104
Thus far we've opted to make complex code requiring difficult review. In the future, the code is only going to become more complex, and as such we'll take the hit now and start to encapsulate things. To help transition the code nicely there is some wasted space in gen6/7. This will be ameliorated shortly. NOTE: The pun in the subject was intentional. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Conflicts: drivers/gpu/drm/i915/i915_drv.h
2014-06-20  drm/i915: Page table helpers, and define renames  Ben Widawsky  2 files, -56/+156
These page table helpers make the code much cleaner. There is some room to use the arch/x86 header files. The reason I've opted not to is that in several cases, the definitions are dictated by the CONFIG_ options, which do not always indicate the restrictions in the GPU. While here, clean up the defines to have more concise names, and consolidate between gen6 and gen8 where appropriate.

v2: Use I915_PAGE_SIZE to remove PAGE_SIZE dep in the new code (Jesse) Fix bugged I915_PTE_MASK define, which was unused (Chris) BUG_ON bad length/size - taken directly from Chris (Chris) define NUM_PTE (Chris)

I've made a lot of tiny errors in these helpers. Often I'd correct an error only to introduce another one. While IGT was capable of catching them, the tests often took a while to catch, and were hard/slow to debug in the kernel. As a result, to test this, I compiled i915_gem_gtt.h in userspace, and ran tests from userspace. What follows isn't by any means complete, but it was able to catch a lot of bugs. Gen8 is also untested, but since the current code is almost identical, I feel pretty comfortable with that.

    void test_pte(uint32_t base)
    {
            assert_pte_index((base + 0), 0);
            assert_pte_index((base + 1), 0);
            assert_pte_index((base + 0x1000), 1);
            assert_pte_index((base + (1<<22)), 0);
            assert_pte_index((base + ((1<<22) - 1)), 1023);
            assert_pte_index((base + (1<<21)), 512);

            assert_pte_count(base + 0, 0, 0);
            assert_pte_count(base + 0, 1, 1);
            assert_pte_count(base + 0, 0x1000, 1);
            assert_pte_count(base + 0, 0x1001, 2);
            assert_pte_count(base + 0, 1<<21, 512);
            assert_pte_count(base + 0, 1<<22, 1024);
            assert_pte_count(base + 0, (1<<22) - 1, 1024);
            assert_pte_count(base + (1<<21), 1<<22, 512);
            assert_pte_count(base + (1<<21), (1<<22)+1, 512);
            assert_pte_count(base + (1<<21), 10<<22, 512);
    }

    void test_pde(uint32_t base)
    {
            assert(gen6_pde_index(base + 0) == 0);
            assert(gen6_pde_index(base + 1) == 0);
            assert(gen6_pde_index(base + (1<<21)) == 0);
            assert(gen6_pde_index(base + (1<<22)) == 1);
            assert(gen6_pde_index(base + ((256<<22))) == 256);
            assert(gen6_pde_index(base + ((512<<22))) == 0);
            assert(gen6_pde_index(base + ((513<<22))) == 1);
            /* This is actually not possible on gen6 */

            assert(gen6_pde_count(base + 0, 0) == 0);
            assert(gen6_pde_count(base + 0, 1) == 1);
            assert(gen6_pde_count(base + 0, 1<<21) == 1);
            assert(gen6_pde_count(base + 0, 1<<22) == 1);
            assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
            assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
            assert(gen6_pde_count(base + 0, 511<<22) == 511);
            assert(gen6_pde_count(base + 0, 512<<22) == 512);
            assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
            assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
    }

    int main()
    {
            test_pde(0);
            while (1)
                    test_pte(rand() & ~((1<<22) - 1));

            return 0;
    }

v3: Some small rebase conflicts resolved Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
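For reference, index/count helpers consistent with the asserts above would look roughly like this; it is a reconstruction from the tests, not the driver's exact definitions (gen6: 4KiB pages, 1024 PTEs per page table, one page table spans 4MiB, 512 PDEs):

    static inline uint32_t gen6_pte_index(uint32_t addr)
    {
            return (addr >> 12) & (1024 - 1);
    }

    static inline uint32_t gen6_pde_index(uint32_t addr)
    {
            return (addr >> 22) & (512 - 1);
    }

    /* number of PTEs touched within the page table that contains addr */
    static inline uint32_t gen6_pte_count(uint32_t addr, uint32_t length)
    {
            uint32_t first = addr >> 12;                    /* absolute PTE number */
            uint32_t last = (addr + length + 4095) >> 12;   /* rounded up */
            uint32_t pt_end = ((addr >> 22) + 1) << 10;     /* first PTE of the next PT */

            if (length == 0)
                    return 0;
            if (last > pt_end)
                    last = pt_end;

            return last - first;
    }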
2014-06-20  drm/i915: Range clearing is PPGTT agnostic  Ben Widawsky  1 file, -4/+1
Therefore we can do it from our general init function. Eventually, I hope to have a lot more commonality like this. It won't arrive yet, but this was a nice easy one. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Make gen6_write_pdes gen6_map_page_tables  Ben Widawsky  1 file, -16/+23
Split out single mappings which will help with upcoming work. Also while here, rename the function because it is a better description - but this function is going away soon. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Un-hardcode number of page directories  Ben Widawsky  1 file, -1/+1
trivial. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: clean up PPGTT init error path  Ben Widawsky  1 file, -13/+9
The old code (I'm having trouble finding the commit) had a reason for doing things when there was an error, and would continue on, thus the !ret. For the newer code however, this looks completely silly. Follow the normal idiom of if (ret) return ret. Also, put the pde wiring in the gen specific init, now that GEN8 exists. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Setup less PPGTT on failed pagedir  Ben Widawsky  1 file, -1/+4
The current code will both potentially print a WARN, and setup part of the PPGTT structure. Neither of these harm the current code, it is simply for clarity, and to perhaps prevent later bugs, or weird debug messages. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: rename map/unmap to dma_map/unmap  Ben Widawsky  1 file, -6/+6
Upcoming patches will use the terms map and unmap in references to the page table entries. Having this distinction will really help with code clarity at that point. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: s/pd/pdpe, s/pt/pde  Ben Widawsky  1 file, -7/+7
The actual correct way to think about this with the new style of page table data structures is as the actual entry that is being indexed into the array. "pd", and "pt" aren't representative of what the operation is doing. The clarity here will improve the readability of future patches. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Split out verbose PPGTT dumping  Ben Widawsky  1 file, -21/+28
There often is not enough memory to dump the full contents of the PPGTT. As a temporary bandage, to continue getting valuable basic PPGTT info, wrap the dangerous, memory hungry part inside of a new verbose version of the debugfs file. Also while here we can split out the PPGTT print function so it's more reusable. I'd really like to get PPGTT info into our error state, but I found it too difficult to make work in the limited time I have. Maybe Mika can find a way. v2: Get the info for the non-default contexts. Merge a patch from Chris into this patch (Chris). All credit goes to him. References: 20140320115742.GA4463@nuc-i3427.alporthouse.com Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Rename to GEN8_LEGACY_PDPES  Ben Widawsky  2 files, -6/+6
In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have one, but it resembles having one). The #define was confusing as is, and using "PDPE" is a much better description. sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch] Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: fix gtt_total_entries()  Ben Widawsky  2 files, -4/+7
It's useful to have it not as a macro for some upcoming work. Generally since we try to avoid macros anyway, I think it doesn't hurt to put this as its own patch. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2014-06-20  drm/i915: Split out aliasing binds  Ben Widawsky  4 files, -5/+11
This patch finishes off actually separating the aliasing and global binds. Prior to this, all global binds would be aliased. Now if aliasing binds are required, they must be explicitly asked for. So far, we have no users of this outside of execbuf - but Mika has already submitted a patch requiring just this. A nice benefit of this is we should no longer be able to clobber GTT-only objects from the aliasing PPGTT. v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf v3: Rebase resolution with changed size of flags Signed-off-by: Ben Widawsky <ben@bwidawsk.net>