Age | Commit message (Collapse) | Author | Files | Lines |
|
ASID register offset is caclulated by SWGROUP ID so that we can get
rid of old SoC specific MACROs. This ID conversion is needed for the
unified SMMU driver over Tegra SoCs. We use dt-bindings MACRO instead
of SoC dependent MACROs. The formula is:
MC_SMMU_<swgroup name>_ASID_0 = MC_SMMU_AFI_ASID_0 + ID * 4;
Now SWGROUP ID is the global HardWare Accelerator(HWA) identifier
among all Tegra SoC except Tegra2.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
platform_devices are registered as IOMMU'able dynamically via
add_device() and remove_device().
Tegra SMMU can have multiple address spaces(AS). IOMMU'able devices
can belong to one of them. Multiple IOVA maps are created at boot-up,
which can be attached to devices later. We reserve 2 of them for
static assignment, AS[0] for system default, AS[1] for AHB clusters as
protected domain from others, where there are many traditional
pheripheral devices like USB, SD/MMC. They should be isolated from
some smart devices like host1x for system robustness. Even if smart
devices behaves wrongly, the traditional devices(SD/MMC, USB) wouldn't
be affected, and the system could continue most likely. DMA API(ARM)
needs ARM_DMA_USE_IOMMU to be enabled.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
ops->{bound,unbind}_driver() functions are called at
BUS_NOTIFY_{BOUND,UNBIND}_DRIVER respectively.
This is necessary to control the device population order. IOMMU master
devices depend on an IOMMU device instanciation. IOMMU master devices
can be registered to an IOMMU only after it's successfully
populated. This IOMMU registration is done via
ops->bound_driver(). Currently this population can be deferred if
depending IOMMU device hasn't yet been populated in driver core. This
cannot be done via ops->add_device() since after add_device() device's
population/instanciation can be still deferred via probe().
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
IOMMU devices on the bus need to be poplulated first, then iommu
master devices are done later.
With CONFIG_OF_IOMMU, "iommus=" DT binding would be used to identify
whether a device can be an iommu msater or not. If a device can, we'll
defer to populate that device till an iommu device is populated. Then,
those deferred iommu master devices are populated and configured with
help of the already populated iommu device.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
IOMMU devices on the bus need to be poplulated first, then iommu
master devices are done later.
With CONFIG_OF_IOMMU, "iommus=" DT binding would be used to identify
whether a device can be an iommu msater or not. If a device can, we'll
defer to populate that device till an depending iommu device is
populated.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
This enables to find an populated IOMMU device via a device node. This
can be used to see if an dependee IOMMU is populated or not to keep
correct device population order. Client devices need to wait an IOMMU
to be populated.
Suggested by Thierry Reding and copied his example code.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Iterating over a property containing a list of phandles with arguments
is a common operation for device drivers. This patch adds a new
of_property_for_each_phandle_with_args() macro to make the iteration
simpler.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: Grant Likely <grant.likely@linaro.org>
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
|
|
|
|
This patch is not meant to be merged, but rather to try and understand
why this is needed and what a more suitable solution could be.
Allowing BOs to be write-cached results in the following happening when
trying to run any program on Tegra/GK20A:
Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0036010
...
(nouveau_bo_rd32) from [<c0357d00>] (nouveau_fence_update+0x5c/0x80)
(nouveau_fence_update) from [<c0357d40>] (nouveau_fence_done+0x1c/0x38)
(nouveau_fence_done) from [<c02c3d00>] (ttm_bo_wait+0xec/0x168)
(ttm_bo_wait) from [<c035e334>] (nouveau_gem_ioctl_cpu_prep+0x44/0x100)
(nouveau_gem_ioctl_cpu_prep) from [<c02aaa84>] (drm_ioctl+0x1d8/0x4f4)
(drm_ioctl) from [<c0355394>] (nouveau_drm_ioctl+0x54/0x80)
(nouveau_drm_ioctl) from [<c00ee7b0>] (do_vfs_ioctl+0x3dc/0x5a0)
(do_vfs_ioctl) from [<c00ee9a8>] (SyS_ioctl+0x34/0x5c)
(SyS_ioctl) from [<c000e6e0>] (ret_fast_syscall+0x0/0x30
The offending nouveau_bo_rd32 is done over an IO-mapped BO, e.g. a BO
mapped through the BAR.
Any idea about the origin of this behavior? Does ARM forbid cached
mappings over IO regions?
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
This is a gazillion times faster, but requires to invalidate the
GPU L2 (probably at a better place than this!)
|
|
What is it for anyway?
|
|
|
|
|
|
Add a platform driver for Nouveau devices declared using the device tree
or platform data. This driver currently supports GK20A on Tegra
platforms and is only compiled for these platforms if Nouveau is
enabled.
Nouveau will probe the chip type itself using the BOOT0 register, so all
this driver really needs to do is to make sure the module is powered and
its clocks active before calling nouveau_drm_platform_probe().
Heavily based on work done by Thierry Reding.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
BOs in non-coherent memory need to be explicitly flushed to ensure
a memory write took effect. Not doing so could result in
synchronization issues.
This patch introduces a macro that flushes the caches on ARM and
translates to a no-op on other architectures, and uses it when
writing to in-memory BOs. It will also be useful for implementations of
instmem that access shared memory directly instead of going through
PRAMIN.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Use the newly-introduced TTM cache sync functions in Nouveau.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
[acourbot@nvidia.com: make conditional and platform-friendly]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
On architectures for which access to GPU memory is non-coherent,
caches need to be flushed and invalidated explicitly at the
appropriate places. Introduce two small helpers to make things
easy for TTM-based drivers.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
GK20A's RAM driver was using CMA functions in order to allocate VRAM.
This is wrong because these functions are not exported, which causes
compilation to fail when CMA is enabled and Nouveau is built as a
module. On top of that the driver was leaking (or rather bleeding)
memory.
dma_alloc_coherent() will also use CMA when needed but has the
advantage of being properly exported. It creates a permanent kernel
mapping, but experiment revealed that the lowmem mapping is actually
reused, and this mapping can also be taken advantage of to implement
faster instmem. We lose the ability to allocate memory at finer
granularity, but that's what CMA is here for and it also simplifies the
driver.
This driver is to be replaced by an IOMMU-based one in the future ;
until then, its current form will allow it to do its job.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
gk20a_ram_put() can be called with a NULL nouveau_mem in case of error.
Handle that case the way is it done in other RAM drivers.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Set the correct subdev/engine classes when GK20A (0xea) is probed.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Add a GR device for GK20A based on NVE4, with the correct classes
definitions (GK20A's 3D class is 0xa297).
Most of the NVE4 code can be used on GK20A, so make relevant bits of
NVE4 available to other chips as well.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Pad the microcode to a multiple of 0x40 words, otherwise firmware will
fail to run from non-prepadded firmware files.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
|
|
nvc0_graph_ctor() would only let the graphics engine be enabled if its
oclass has a proper microcode linked to it. This prevents GR from being
enabled at all on chips that rely exclusively on external firmware, even
though such a use-case is valid.
Relax the conditions enabling the GR engine to also include the case
where an external firmware has also been loaded.
Also switch to external firmware if the graph class has no microcode
linked to it.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
GK20A's FIFO is compatible with NVE0, but only features 128 channels and
1 runlist.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
|
|
Add a simple FB device for GK20A, as well as a RAM implementation based
on contiguous DMA memory allocations suitable for chips that use system
memory as video RAM.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Add support for initializing the priv ring of GK20A. This is done by the
BIOS on desktop GPUs, but needs to be done by hand on Tegra.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Adapt the NVC0 BAR driver to make it able to support chips that do not
expose a BAR3. When this happens, BAR1 is then used for USERD mapping
and the BAR alloc() functions is disabled, making GPU objects unable
to rely on BAR for data access and falling back to PRAMIN.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Some chips that use system memory exclusively (e.g. GK20A) do not
expose 2 BAR regions. For them only BAR1 exists, and it should be used
for USERD mapping. Do not map BAR3 if its resource does not exist.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
|
|
When a device is shut down, disable the panel to make sure the display
backlight doesn't stay lit.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Hook up the MIPI DSI bus's .shutdown() function to allow drivers to
implement code that should be run when a device is shut down.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
The internal host1x_{,un}register_client() functions can potentially be
confused with public the host1x_client_{,un}register() functions.
Rename them to host1x_{add,del}_client() to remove some of the possible
confusion.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Other output drivers set up debugfs slightly differently. Bring the SOR
driver in line with those for consistency.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Removing only the root directory will fail when there are still files in
it. Instead of manually removing all files, remove the whole directory
recursively.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
This doesn't save as many lines as it could because the hotplug detect
GPIO is always active-high. But converting now will give us support for
active-low hotplug detect GPIO as well.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Doing so allows the hotplug events generated by the connector to be
properly handled by the DRM poll helpers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Calling the drm_helper_hpd_irq_event() helper can sleep, so instead of
invoking it directly from the interrupt handler, schedule a work queue
and run it from there.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Enable hardware cursor support on Tegra124. Earlier generations support
the hardware cursor to some degree as well, but not in a way that can be
generically exposed.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
The DRM core can now cope with drivers that don't have an associated
struct drm_bus, so the host1x implementation is no longer useful.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Describe how devices are registered using the drm_*_init() functions.
Adding this to docbook requires a largish set of changes to the comments
in drm_{pci,usb,platform}.c since they are doxygen-style rather than
proper kernel-doc and therefore mess with the docbook generation.
While at it, mark usage of drm_put_dev() as discouraged in favour of
calling drm_dev_unregister() and drm_dev_unref() directly.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- add note that drm_put_dev() should not be used in new drivers
- use !E to include all exported functions in DocBook
|
|
Add a helper function that allows drivers to statically set the unique
name of the device. This will allow platform and USB drivers to get rid
of their DRM bus implementations and directly use drm_dev_alloc() and
drm_dev_register().
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- address comments by David Herrmann:
- return -EINVAL when dev->unique not set in drm_set_busid()
- group together drm_dev_*() functions
- use drm_dev_ prefix
Changes in v2:
- move drm_set_unique() to drivers/gpu/drm/drm_stub.c
- add kernel-doc
|
|
Explicitly allocate the array with the correct length instead of using a
variable length array. The variable length array is probably safe in
this case, but sparse complains about it.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
|
|
A race condition currently exists on Tegra, where it can happen that a
monitor attached via HDMI isn't detected during the initial FB helper
setup, but the hotplug event happens too early to be processed by the
poll helpers because they haven't been initialized yet. This happens
because on some boards the HDMI driver can control the regulator that
supplies the +5V pin on the HDMI connector. Therefore depending on the
timing between the initialization of the HDMI driver and the rest of
DRM, it's possible that the monitor returns the hotplug signal right
within the window where we would miss it.
Unfortunately, drm_kms_helper_poll_init() will wreak havoc when called
before at least some parts of the FB helpers have been set up.
This commit fixes this by splitting out the minimum of initialization
required to make drm_kms_helper_poll_init() work into a separate
function that can be called early. It is then safe to move all of the
poll helper initialization to an earlier point in time (before the
HDMI output driver has a chance to enable the +5V supply). That way if
the hotplug signal is returned before the initial FB helper setup, the
monitor will be forcefully detected at that point, and if the hotplug
signal is returned after that it will be properly handled by the poll
helpers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
To implement hotplug detection in a race-free manner, drivers must call
drm_kms_helper_poll_init() before hotplug events can be triggered. Such
events can be triggered right after any of the encoders or connectors
are initialized. At the same time, if the drm_fb_helper_hotplug_event()
helper is used by a driver, then the poll helper requires some parts of
the FB helper to be initialized to prevent a crash.
At the same time, drm_fb_helper_init() requires information that is not
necessarily available at such an early stage (number of CRTCs and
connectors), so it cannot be used yet.
Add a new helper, drm_fb_helper_prepare(), that initializes the bare
minimum needed to allow drm_kms_helper_poll_init() to execute and any
subsequent hotplug events to be processed properly.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
There's no need for this to be modifiable. Make it const so that it can
be put into the .rodata section.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Some drivers need to be able to have a perfect race-free fbcon setup.
Current drivers only enable hotplug processing after the call to
drm_fb_helper_initial_config which leaves a tiny but important race.
This race is especially noticable on embedded platforms where the
driver itself enables the voltage for the hdmi output, since only then
will monitors (after a bit of delay, as usual) respond by asserting
the hpd pin.
Most of the infrastructure is already there with the split-out
drm_fb_helper_init. And drm_fb_helper_initial_config already has all
the required locking to handle concurrent hpd events since
commit 53f1904bced78d7c00f5d874c662ec3ac85d0f9f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Mar 20 14:26:35 2014 +0100
drm/fb-helper: improve drm_fb_helper_initial_config locking
The only missing bit is making drm_fb_helper_hotplug_event save
against concurrent calls of drm_fb_helper_initial_config. The only
unprotected bit is the check for fb_helper->fb.
With that drivers can first initialize the fb helper, then enabel
hotplug processing and then set up the initial config all in a
completely race-free manner. Update kerneldoc and convert i915 as a
proof of concept.
Feature requested by Thierry since his tegra driver atm reliably boots
slowly enough to misses the hotplug event for an external hdmi screen,
but also reliably boots to quickly for the hpd pin to be asserted when
the fb helper calls into the hdmi ->detect function.
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[treding@nvidia.com: remove one new enable_hotplug_processing occurrence]
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
The host bridge on Tegra is not visible on the PCI bus, and therefore
this quirk causes the following warning in the boot log:
[ 4.373017] pci 0000:00:01.0: nv_msi_ht_cap_quirk didn't locate host bridge
Since the quirk requires access to the host bridge's PCI configuration
space it cannot work on Tegra, so simply disable it in that case.
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Currently the resource hierarchy generated from the PCIe host bridge is
completely flat:
$ cat /proc/iomem
00000000-00000fff : /pcie-controller@00003000/pci@1,0
00003000-000037ff : pads
00003800-000039ff : afi
10000000-1fffffff : cs
28000000-28003fff : r8169
28004000-28004fff : r8169
...
The host bridge driver doesn't request all the resources that are used.
Windows allocated to each of the root ports aren't tracked, so there is
no way for resources allocated to individual devices to be matched up
with the correct parent resource by the PCI core.
This patch addresses this in two steps. It first takes the union of all
regions associated with the PCIe host bridge (control registers, root
port registers, configuration space, I/O and prefetchable as well as
non-prefetchable memory regions) and uses it as the new root of the
resource hierarchy.
Subsequently, regions are allocated from within this new root resource
so that the resource tree looks much more like what's expected:
# cat /proc/iomem
00000000-3fffffff : /pcie-controller@00003000
00000000-00000fff : /pcie-controller@00003000/pci@1,0
00003000-000037ff : pads
00003800-000039ff : afi
10000000-1fffffff : cs
20000000-27ffffff : non-prefetchable
28000000-3fffffff : prefetchable
28000000-280fffff : PCI Bus 0000:01
28000000-28003fff : 0000:01:00.0
28000000-28003fff : r8169
28004000-28004fff : 0000:01:00.0
28004000-28004fff : r8169
...
Signed-off-by: Thierry Reding <treding@nvidia.com>
|