~nh/llvm - Misc LLVM things, mostly radeonsi (AMDGPU)

Age	Commit message	Author	Files	Lines
2015-12-20	[X86] Use range-based for loop. NFC	Craig Topper	1	-3/+2
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256127 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-17	[X86] Use push-pop for materializing small constants under 'minsize'	Hans Wennborg	1	-0/+48
	Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255936 91177308-0d34-0410-b5e6-96231b3b80d8
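The size arithmetic in this message can be checked with a small sketch. The byte strings below are hand-assembled encodings for the constant 5, written out as an illustration; they are assumptions of this example and not output captured from LLVM, and the names `ENCODINGS` and `sizes` are hypothetical.

```python
# Hand-assembled x86 encodings for materializing the constant 5.
# Illustrative assumptions only, not LLVM output.
ENCODINGS = {
    # push imm8 (6A ib) followed by pop into a legacy register (58)
    "push $5; pop %eax": bytes([0x6A, 0x05, 0x58]),
    # popping into r8 needs a REX prefix (41) in front of the pop
    "push $5; pop %r8": bytes([0x6A, 0x05, 0x41, 0x58]),
    # mov r32, imm32 (B8 id)
    "movl $5, %eax": bytes([0xB8, 0x05, 0x00, 0x00, 0x00]),
    # same mov with a REX prefix to reach r8d
    "movl $5, %r8d": bytes([0x41, 0xB8, 0x05, 0x00, 0x00, 0x00]),
    # mov r/m64, sign-extended imm32 (REX.W C7 /0 id)
    "movq $5, %rax": bytes([0x48, 0xC7, 0xC0, 0x05, 0x00, 0x00, 0x00]),
}


def sizes():
    """Map each assembly snippet to its encoded length in bytes."""
    return {asm: len(code) for asm, code in ENCODINGS.items()}


if __name__ == "__main__":
    for asm, n in sizes().items():
        print(f"{n} bytes  {asm}")
```

The push-pop forms come out at 3 and 4 bytes and the mov forms at 5, 6 and 7 bytes, which is the trade-off the message describes for 'minsize'.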
2015-12-15	Fix "Not having LAHF/SAHF" assert.	Hans Wennborg	1	-1/+2
	It wants to assert that the subtarget is 64-bit, not the register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255703 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-15	[X86] Smaller code for materializing 32-bit 1 and -1 constants	Hans Wennborg	1	-5/+43
	"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255656 91177308-0d34-0410-b5e6-96231b3b80d8
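As a rough sketch of why the shorter sequence is equivalent, the following models a 32-bit register in Python. The helper names are hypothetical, and the byte strings are hand-written encodings assumed for illustration (the one-byte `decl` form shown is the 32-bit-mode encoding).

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def xor_self(reg):
    """xorl %eax, %eax: any value XORed with itself is zero."""
    return reg ^ reg


def dec32(reg):
    """decl %eax with 32-bit wraparound."""
    return (reg - 1) & MASK32


def inc32(reg):
    """incl %eax with 32-bit wraparound."""
    return (reg + 1) & MASK32


def materialize_minus_one(garbage=0x12345678):
    # The starting register contents do not matter.
    return dec32(xor_self(garbage))


def materialize_one(garbage=0x12345678):
    return inc32(xor_self(garbage))


# Assumed encodings, for size comparison only.
MOV_MINUS_ONE = bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF])  # movl $-1, %eax
XOR_DEC = bytes([0x31, 0xC0, 0x48])  # xorl %eax, %eax; decl %eax (32-bit mode)

if __name__ == "__main__":
    print(hex(materialize_minus_one()), len(MOV_MINUS_ONE), len(XOR_DEC))
```

Zeroing and then decrementing wraps around to 0xFFFFFFFF, which is -1 as a signed 32-bit value, so the two sequences leave the same value in the register.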
2015-12-11	CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()	Matthias Braun	1	-15/+10
	computeRegisterLiveness() was broken in that it reported a register as dead even if a subregister was alive. I assume this was because the results of analyzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs() (= struct PhysRegInfo) to be clearly understandable, and also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fixes all (two) users of computeRegisterLiveness() in llvm by re-enabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255362 91177308-0d34-0410-b5e6-96231b3b80d8
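The subregister problem can be illustrated with a toy model. This is a sketch under stated assumptions: the `OVERLAPS` table is a hypothetical, heavily simplified register hierarchy, and neither function is LLVM's actual algorithm.

```python
# Hypothetical, simplified overlap table: each register maps to the set
# of registers sharing bits with it (itself included).
OVERLAPS = {
    "RAX": {"RAX", "EAX", "AX", "AH", "AL"},
    "EAX": {"RAX", "EAX", "AX", "AH", "AL"},
    "AX": {"RAX", "EAX", "AX", "AH", "AL"},
    "AH": {"RAX", "EAX", "AX", "AH"},
    "AL": {"RAX", "EAX", "AX", "AL"},
}


def is_dead(reg, live_regs):
    """A register is dead only if nothing overlapping it is live."""
    return OVERLAPS[reg].isdisjoint(live_regs)


def is_dead_exact_only(reg, live_regs):
    """The kind of mistake described above: only the exact name is checked."""
    return reg not in live_regs


if __name__ == "__main__":
    live = {"AL"}
    print(is_dead_exact_only("AX", live))  # wrongly treats AX as dead
    print(is_dead("AX", live))             # AX overlaps the live AL
```

With only AL live, the exact-name check reports AX as dead even though clobbering AX would destroy AL, while the overlap-aware check does not.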
2015-12-05	[X86][ADX] Added memory folding patterns and stack folding tests	Simon Pilgrim	1	-0/+6
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254844 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-04	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported	Hans Wennborg	1	-4/+25
	These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254793 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-04	X86InstrInfo::copyPhysReg: workaround reg liveness	JF Bastien	1	-3/+13
	Summary: computeRegisterLiveness and analyzePhysReg are currently getting confused about liveness in some cases, breaking copyPhysReg's calculation of whether AX is dead. Work around this issue temporarily by assuming that AX is always live. See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7 And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201. This workaround makes the code correct but slightly inefficient, but it seems to confuse the machine instr verifier which now thinks EAX was undefined in some cases where it's being conservatively saved / restored. Reviewers: majnemer, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15198 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254680 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-01	[X86] Use range-based for loops. NFC	Craig Topper	1	-6/+6
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254387 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-01	[X86] Use array_lengthof instead of calculating manually. Also change index types to size_t to match.	Craig Topper	1	-7/+7
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254386 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-30	Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC build bot.	Craig Topper	1	-4/+7
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254280 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-30	[X86] Use ArrayRef. NFC	Craig Topper	1	-7/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254279 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-26	X86-FMA3: Improved/enabled the memory folding optimization for scalar loads	Vyacheslav Klochkov	1	-0/+12
	generated for _mm_load_s{s,d}() intrinsics and used in scalar FMAs generated for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}(). Reviewer: David Kreitzer Differential Revision: http://reviews.llvm.org/D14762 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254140 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-24	[x86] remove duplicate movq instruction defs (PR25554)	Sanjay Patel	1	-2/+0
	We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE:
	
	  def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
	                          "mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only
	                          [(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))],
	                          IIC_SSE_MOVDQ>;
	
	  def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
	                          "mov{d|q}\t{$src, $dst|$dst, $src}",
	                          [(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))],
	                          IIC_SSE_MOVDQ>, Sched<[WriteMove]>;
	
	As shown in the test case and PR25554: https://llvm.org/bugs/show_bug.cgi?id=25554 This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction. This patch deletes one pair of these defs. Sadly, this won't fix the original test case in the bug report. Something else is still broken. Differential Revision: http://reviews.llvm.org/D14941 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253988 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19	[X86] Use existing MachineInstrBuilder::addDisp to create offseted pointer. NFC.	Simon Pilgrim	1	-8/+1
	Minor code duplication tidyup to D13988 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253606 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19	AVX-512: Fixed COPY_TO_REGCLASS for mask registers	Elena Demikhovsky	1	-13/+49
	Copying one mask register to another under BW should be done with the kmovq instruction, otherwise we can lose some bits. Copying 8 bits under DQ may be done with kmovb. Differential Revision: http://reviews.llvm.org/D14812 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253563 91177308-0d34-0410-b5e6-96231b3b80d8
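A minimal model of the bit loss mentioned here, assuming a copy that transfers only a fixed number of low bits. The function name `copy_mask` is hypothetical; the widths (8 for kmovb, 16 for kmovw, 64 for kmovq) follow the instruction names.

```python
def copy_mask(value, width_bits):
    """Model a mask-register move that transfers only the low `width_bits` bits."""
    return value & ((1 << width_bits) - 1)


if __name__ == "__main__":
    mask64 = 0x8000_0000_0000_0001  # bits 63 and 0 set
    print(hex(copy_mask(mask64, 16)))  # a 16-bit copy drops bit 63
    print(hex(copy_mask(mask64, 64)))  # a 64-bit copy keeps every bit
```

A 64-element mask therefore survives only when the copy is as wide as the mask itself.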
2015-11-13	X86-FMA3: Implemented commute transformations FMA*_Int instructions.	Vyacheslav Klochkov	1	-118/+206
	It made it possible to apply the memory folding optimization for the 2nd operand of FMA*_Int instructions. Reviewer: Quentin Colombet Differential Revision: http://reviews.llvm.org/D14550 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252973 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-12	My first/test commit. Removed a trailing whitespace.	Vyacheslav Klochkov	1	-1/+1
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252940 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-06	Improved the operands commute transformation for X86-FMA3 instructions.	Andrew Kaylor	1	-27/+326
	All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252335 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-04	Warning fix.	Simon Pilgrim	1	-2/+2
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252078 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-04	[X86][SSE] Add general memory folding for (V)INSERTPS instruction	Simon Pilgrim	1	-7/+71
	This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252074 91177308-0d34-0410-b5e6-96231b3b80d8
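The address adjustment described above can be sketched as follows. This is an illustration, not the code from the patch: the function name is hypothetical, and it assumes the INSERTPS immediate layout in which bits 7:6 select the source element and each float occupies 4 bytes.

```python
def fold_insertps_load(imm8, vector_addr):
    """Return (new_imm8, scalar_addr) when folding a vector load into INSERTPS.

    Assumes imm8 bits 7:6 hold the source element index, bits 5:4 the
    destination index and bits 3:0 the zero mask.
    """
    src_idx = (imm8 >> 6) & 0x3
    # Point directly at the float that would have been selected.
    scalar_addr = vector_addr + 4 * src_idx
    # The memory form reads a single float, so the source index becomes zero.
    new_imm8 = imm8 & 0x3F
    return new_imm8, scalar_addr


if __name__ == "__main__":
    # Source element 2, destination element 1, no zeroing.
    imm, addr = fold_insertps_load(0b10_01_0000, 0x1000)
    print(hex(imm), hex(addr))
```

Selecting element 2 of a vector at 0x1000 becomes a scalar load from 0x1008, with the destination index and zero mask left untouched.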
2015-11-04	Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of scalar FMA intrinsics.	Andrew Kaylor	1	-0/+24
	Patch by Slava Klochkov The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result. Reviewers: Quentin Colombet, Elena Demikhovsky Differential Revision: http://reviews.llvm.org/D13710 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252060 91177308-0d34-0410-b5e6-96231b3b80d8
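To see why the 1st operand cannot be commuted, here is a small model of a scalar FMA intrinsic on 4-lane vectors. It is a sketch: `fmadd_ss` is a hypothetical Python stand-in for the intrinsic's lane behaviour, and it uses an ordinary multiply and add rather than a true fused operation.

```python
def fmadd_ss(a, b, c):
    """Lane 0 is a[0]*b[0] + c[0]; lanes 1..3 are copied from the 1st operand."""
    return [a[0] * b[0] + c[0]] + list(a[1:])


if __name__ == "__main__":
    a = [2.0, 10.0, 11.0, 12.0]
    b = [3.0, 20.0, 21.0, 22.0]
    c = [4.0, 30.0, 31.0, 32.0]
    print(fmadd_ss(a, b, c))  # upper lanes come from a
    print(fmadd_ss(b, a, c))  # same lane 0, but upper lanes now come from b
```

Swapping the first two operands leaves the scalar result unchanged, since multiplication commutes, but the upper lanes now come from the wrong vector, which is the invalid result the message refers to.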
2015-10-26	AVX512: Add AVX-512 not materializable instructions.	Igor Breger	1	-1/+29
	Otherwise a value can be reused even though it may have changed, which produces incorrect assembly. https://llvm.org/bugs/show_bug.cgi?id=25270 Differential Revision: http://reviews.llvm.org/D14057 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251275 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-01	[WinEH] Make FuncletLayout more robust against catchret	David Majnemer	1	-3/+5
	Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249052 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-30	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts	Sanjay Patel	1	-0/+3
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248955 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28	Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.	Andrew Kaylor	1	-59/+57
	Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-21	[Machine Combiner] Refactor machine reassociation code to be target-independent.	Chad Rosier	1	-206/+9
	No functional change intended. Patch by Haicheng Wu <haicheng@codeaurora.org>! http://reviews.llvm.org/D12887 PR24522 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248164 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-17	AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1>	Elena Demikhovsky	1	-0/+4
	AVX-512 does not provide an instruction that shuffles a mask register, so I do it the following way: mask-2-simd, shuffle simd, simd-2-mask. Differential Revision: http://reviews.llvm.org/D12727 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247876 91177308-0d34-0410-b5e6-96231b3b80d8
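The three-step approach can be modelled on plain integers and lists. This is only a sketch: all function names are hypothetical, and lanes are represented as 0/1 values for simplicity rather than as real vector elements.

```python
def mask_to_simd(mask, n):
    """Expand the low n bits of a mask into a list of n lanes."""
    return [(mask >> i) & 1 for i in range(n)]


def shuffle(lanes, indices):
    """Pick lanes[indices[i]] for each output position i."""
    return [lanes[i] for i in indices]


def simd_to_mask(lanes):
    """Pack lanes back into a bit mask, lane i going to bit i."""
    return sum(bit << i for i, bit in enumerate(lanes))


def shuffle_mask(mask, indices):
    """mask-2-simd, shuffle simd, simd-2-mask."""
    return simd_to_mask(shuffle(mask_to_simd(mask, len(indices)), indices))


if __name__ == "__main__":
    # Reverse a <4 x i1> vector whose only set element is element 0.
    print(bin(shuffle_mask(0b0001, [3, 2, 1, 0])))
```

Reversing a 4-element mask with only bit 0 set moves that bit to position 3, the same result a direct mask shuffle would give if such an instruction existed.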
2015-09-12	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try)	Sanjay Patel	1	-0/+6
	The changes in: test/CodeGen/X86/machine-cp.ll are just due to scheduling differences after some logic instructions were reassociated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247516 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-12	revert r247506; need to verify changes in existing tests	Sanjay Patel	1	-6/+0
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247507 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-12	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts	Sanjay Patel	1	-0/+6
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247506 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-03	[x86] enable machine combiner reassociations for scalar 'xor' insts	Sanjay Patel	1	-0/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246781 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-01	rename "slow-unaligned-mem-under-32" to "slow-unaligned-mem-16" (NFCI)	Sanjay Patel	1	-3/+3
	This is a follow-on suggested by: http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 ) http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 ) This makes the attribute name match most of the existing lowering logic and regression test expectations. But the current use of this attribute is inconsistent; see the FIXME comment for "allowsMisalignedMemoryAccesses()". That change will result in functional changes and should be coming soon. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246585 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31	[x86] enable machine combiner reassociations for scalar 'or' insts	Sanjay Patel	1	-0/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246481 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-30	[MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags	Hal Finkel	1	-1/+1
	Make the arrays 'static const' instead of just 'static'. Post-commit review comment from Roman Divacky on IRC. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246376 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-28	[x86] enable machine combiner reassociations for scalar 'and' insts	Sanjay Patel	1	-1/+5
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246300 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-26	Expose hasLiveCondCodeDef as a member function of the X86InstrInfo class. NFC	Andrew Kaylor	1	-1/+1
	This takes the existing static function hasLiveCondCodeDef and makes it a member function of the X86InstrInfo class. This is a useful utility function that an upcoming change would like to use. NFC. Patch by: Kevin B. Smith Differential Revision: http://reviews.llvm.org/D12371 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246073 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21	[x86] enable machine combiner reassociations for 256-bit vector min/max	Sanjay Patel	1	-0/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21	[x86] invert logic for attribute 'FeatureFastUAMem'	Sanjay Patel	1	-3/+8
	This is a 'no functional change intended' patch. It removes one FIXME, but adds several more. Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes it clearer to see the logic holes. Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now. Differential Revision: http://reviews.llvm.org/D12154 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245729 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21	[x86] enable machine combiner reassociations for 128-bit vector min/max	Sanjay Patel	1	-0/+8
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245715 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-19	[x86] enable machine combiner reassociations for scalar double-precision min/max	Sanjay Patel	1	-0/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245506 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-19	[x86] enable machine combiner reassociations for scalar single-precision maximums	Sanjay Patel	1	-0/+2
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245504 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-15	[x86] enable machine combiner reassociations for scalar single-precision minimums	Sanjay Patel	1	-0/+6
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245166 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12	fix typo; NFC	Sanjay Patel	1	-1/+1
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244753 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12	[x86] enable machine combiner reassociations for 256-bit vector FP mul/add	Sanjay Patel	1	-0/+4
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244705 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11	PseudoSourceValue: Replace global manager with a manager in a machine function.	Alex Lorenz	1	-2/+2
	This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244693 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11	[x86] enable machine combiner reassociations for 128-bit vector single/double multiplies	Sanjay Patel	1	-2/+6
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244657 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-10	x86: Emit LAHF/SAHF instead of PUSHF/POPF	JF Bastien	1	-26/+51
	NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (privileged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS; this commit now generates LAHF/SAHF instead, for all of x86 (not just NaCl), because it leads to an overall performance gain over PUSHF/POPF. As with the previous patch this code generation is pretty bad because it occurs very late, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire. I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
	
	| Time per call (ms) | Runtime (ms) | Benchmark |
	| 0.000012514 | 6257 | sete.i386 |
	| 0.000012810 | 6405 | sete.i386-fast |
	| 0.000010456 | 5228 | sete.x86-64 |
	| 0.000010496 | 5248 | sete.x86-64-fast |
	| 0.000012906 | 6453 | lahf-sahf.i386 |
	| 0.000013236 | 6618 | lahf-sahf.i386-fast |
	| 0.000010580 | 5290 | lahf-sahf.x86-64 |
	| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
	| 0.000028056 | 14028 | pushf-popf.i386 |
	| 0.000027160 | 13580 | pushf-popf.i386-fast |
	| 0.000023810 | 11905 | pushf-popf.x86-64 |
	| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
	
	Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seem to be worth teaching LLVM about individual flags, at least not for this purpose. Reviewers: rnk, jvoung, t.p.northover Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D6629 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244503 91177308-0d34-0410-b5e6-96231b3b80d8
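The "suboptimal" conclusion can be quantified from the runtimes quoted in this message. The sketch below only divides the reported numbers; the dictionary layout and the function name `slowdown` are choices made for this illustration.

```python
# Runtimes in ms, copied from the benchmark table in the commit message.
RUNTIME_MS = {
    "sete": {"i386": 6257, "i386-fast": 6405, "x86-64": 5228, "x86-64-fast": 5248},
    "lahf-sahf": {"i386": 6453, "i386-fast": 6618, "x86-64": 5290, "x86-64-fast": 5152},
    "pushf-popf": {"i386": 14028, "i386-fast": 13580, "x86-64": 11905, "x86-64-fast": 13234},
}


def slowdown(slow, fast, config):
    """How many times longer `slow` ran than `fast` for one configuration."""
    return RUNTIME_MS[slow][config] / RUNTIME_MS[fast][config]


if __name__ == "__main__":
    for config in RUNTIME_MS["sete"]:
        ratio = slowdown("pushf-popf", "lahf-sahf", config)
        print(f"{config:12s} pushf-popf / lahf-sahf = {ratio:.2f}x")
```

On these figures PUSHF/POPF takes a little over twice as long as LAHF/SAHF in every configuration, while LAHF/SAHF stays within a few percent of SETE.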
2015-08-10	fix minsize detection: minsize attribute implies optimizing for size	Sanjay Patel	1	-5/+2
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244499 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-10	fix minsize detection: minsize attribute implies optimizing for size	Sanjay Patel	1	-3/+1
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244464 91177308-0d34-0410-b5e6-96231b3b80d8