Age | Commit message (Collapse) | Author | Files | Lines |
|
v2: Incorporated feedback from Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
It's a special reg and does not require a flush like
the other CONFIG regs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
This makes sure these are enabled even if set to 0 at startup.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This caused a loop in some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This just moves the messy stuff out of the fast path,
and leaves the fast-case in the fast path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
We were always re-emitting lots of unnecessary changes here,
avoid doing that.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This relies on the reference member being first, so document it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
We drop them when we reference the new objects in the next line.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
Not sure why these were included originally.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
Use r300_pci_ids.h instead.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
|
|
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
|
|
This just reduces code size a bit for this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This just splits this function up as pre-cursor to reusing
the internals of it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
also fix a unneeded dirty check and add a dirty check speedup.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
this is just an precursor change for some later patches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
These aren't used anymore.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This just avoids copying stuff if its going to modify the number of dwords
later anyways.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
The original R600 doesn't have these so don't emit them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
If we do this for CB bases then we should do it for DB bases.
noticed while adding cayman support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
this is taken from a patch from Mathias Froehlich, just going to
stage it in a few pieces.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
We only need to do this when the texture and CB are using the
same memory area.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
We ask for Hyper-Z access when clearing a zbuffer.
We release it if no zbuffer clear has been done for 2 seconds.
|
|
should fix https://bugs.freedesktop.org/show_bug.cgi?id=37157
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
this just makes the code a little bit cleaner.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
They need the same hack as rv670.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|