Armv86bisa PDF

Arm A64 Instruction Set Architecture
Armv8, for Armv8-A architecture profile

Beta
Copyright © 2010-2020 Arm Limited (or its affiliates). All rights reserved.
DDI 0596 (ID033020)
Arm A64 Instruction Set Architecture
Armv8, for Armv8-A architecture profile
Release Information
For information on the change history and known issues for this release, see the Release Notes in the A64 ISA XML for
Armv8.6 (2020-03).
Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
This document may include technical inaccuracies or typographical errors.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers
is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document
at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any click through or signed written
agreement covering this document with Arm, then the click through or signed written agreement prevails over and supersedes the
conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree that
if there is any conflict between the English version of this document and any translation, the terms of the English version of the
Agreement shall prevail.
The Arm corporate logo and words marked with ™ or © are registered trademarks or trademarks of Arm Limited (or its subsidiaries)
in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. You must follow the Arm’s trademark usage guidelines
http://www.arm.com/company/policies/trademarks.
Arm Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20349
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.
Product Status
The information in this document is for a Beta product, that is a product under development.
Web Address
http://www.arm.com
ii Copyright © 2010-2020 Arm Limited (or its affiliates). All rights reserved. DDI 0596
Non-Confidential - Beta ID033020
A64 -- Base Instructions (alphabetic order)
ADC: Add with Carry.
ADCS: Add with Carry, setting flags.
ADD (extended register): Add (extended register).
ADD (immediate): Add (immediate).
ADD (shifted register): Add (shifted register).
ADDG: Add with Tag.
ADDS (extended register): Add (extended register), setting flags.
ADDS (immediate): Add (immediate), setting flags.
ADDS (shifted register): Add (shifted register), setting flags.
ADR: Form PC-relative address.
ADRP: Form PC-relative address to 4KB page.
AND (immediate): Bitwise AND (immediate).
AND (shifted register): Bitwise AND (shifted register).
ANDS (immediate): Bitwise AND (immediate), setting flags.
ANDS (shifted register): Bitwise AND (shifted register), setting flags.
ASR (immediate): Arithmetic Shift Right (immediate): an alias of SBFM.
ASR (register): Arithmetic Shift Right (register): an alias of ASRV.
ASRV: Arithmetic Shift Right Variable.
AT: Address Translate: an alias of SYS.
AUTDA, AUTDZA: Authenticate Data address, using key A.
AUTDB, AUTDZB: Authenticate Data address, using key B.
AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA: Authenticate Instruction address, using key A.
AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB: Authenticate Instruction address, using key B.
AXFLAG: Convert floating-point condition flags from Arm to external format.
B: Branch.
B.cond: Branch conditionally.
BFC: Bitfield Clear: an alias of BFM.
BFI: Bitfield Insert: an alias of BFM.
BFM: Bitfield Move.
BFXIL: Bitfield extract and insert at low end: an alias of BFM.
BIC (shifted register): Bitwise Bit Clear (shifted register).
BICS (shifted register): Bitwise Bit Clear (shifted register), setting flags.
BL: Branch with Link.
BLR: Branch with Link to Register.
Page 2
BLRAA, BLRAAZ, BLRAB, BLRABZ: Branch with Link to Register, with pointer authentication.
BR: Branch to Register.
BRAA, BRAAZ, BRAB, BRABZ: Branch to Register, with pointer authentication.
BRK: Breakpoint instruction.
BTI: Branch Target Identification.
CAS, CASA, CASAL, CASL: Compare and Swap word or doubleword in memory.
CASB, CASAB, CASALB, CASLB: Compare and Swap byte in memory.
CASH, CASAH, CASALH, CASLH: Compare and Swap halfword in memory.
CASP, CASPA, CASPAL, CASPL: Compare and Swap Pair of words or doublewords in memory.
CBNZ: Compare and Branch on Nonzero.
CBZ: Compare and Branch on Zero.
CCMN (immediate): Conditional Compare Negative (immediate).
CCMN (register): Conditional Compare Negative (register).
CCMP (immediate): Conditional Compare (immediate).
CCMP (register): Conditional Compare (register).
CFINV: Invert Carry Flag.
CFP: Control Flow Prediction Restriction by Context: an alias of SYS.
CINC: Conditional Increment: an alias of CSINC.
CINV: Conditional Invert: an alias of CSINV.
CLREX: Clear Exclusive.
CLS: Count Leading Sign bits.
CLZ: Count Leading Zeros.
CMN (extended register): Compare Negative (extended register): an alias of ADDS (extended register).
CMN (immediate): Compare Negative (immediate): an alias of ADDS (immediate).
CMN (shifted register): Compare Negative (shifted register): an alias of ADDS (shifted register).
CMP (extended register): Compare (extended register): an alias of SUBS (extended register).
CMP (immediate): Compare (immediate): an alias of SUBS (immediate).
CMP (shifted register): Compare (shifted register): an alias of SUBS (shifted register).
CMPP: Compare with Tag: an alias of SUBPS.
CNEG: Conditional Negate: an alias of CSNEG.
CPP: Cache Prefetch Prediction Restriction by Context: an alias of SYS.
CRC32B, CRC32H, CRC32W, CRC32X: CRC32 checksum.
CRC32CB, CRC32CH, CRC32CW, CRC32CX: CRC32C checksum.
CSDB: Consumption of Speculative Data Barrier.
CSEL: Conditional Select.
CSET: Conditional Set: an alias of CSINC.
Page 3
CSETM: Conditional Set Mask: an alias of CSINV.
CSINC: Conditional Select Increment.
CSINV: Conditional Select Invert.
CSNEG: Conditional Select Negation.
DC: Data Cache operation: an alias of SYS.
DCPS1: Debug Change PE State to EL1..
DCPS2: Debug Change PE State to EL2..
DCPS3: Debug Change PE State to EL3.
DGH: Data Gathering Hint.
DMB: Data Memory Barrier.
DRPS: Debug restore process state.
DSB: Data Synchronization Barrier.
DVP: Data Value Prediction Restriction by Context: an alias of SYS.
EON (shifted register): Bitwise Exclusive OR NOT (shifted register).
EOR (immediate): Bitwise Exclusive OR (immediate).
EOR (shifted register): Bitwise Exclusive OR (shifted register).
ERET: Exception Return.
ERETAA, ERETAB: Exception Return, with pointer authentication.
ESB: Error Synchronization Barrier.
EXTR: Extract register.
GMI: Tag Mask Insert.
HINT: Hint instruction.
HLT: Halt instruction.
HVC: Hypervisor Call.
IC: Instruction Cache operation: an alias of SYS.
IRG: Insert Random Tag.
ISB: Instruction Synchronization Barrier.
LDADD, LDADDA, LDADDAL, LDADDL: Atomic add on word or doubleword in memory.
LDADDB, LDADDAB, LDADDALB, LDADDLB: Atomic add on byte in memory.
LDADDH, LDADDAH, LDADDALH, LDADDLH: Atomic add on halfword in memory.
LDAPR: Load-Acquire RCpc Register.
LDAPRB: Load-Acquire RCpc Register Byte.
LDAPRH: Load-Acquire RCpc Register Halfword.
LDAPUR: Load-Acquire RCpc Register (unscaled).
LDAPURB: Load-Acquire RCpc Register Byte (unscaled).
LDAPURH: Load-Acquire RCpc Register Halfword (unscaled).
Page 4
LDAPURSB: Load-Acquire RCpc Register Signed Byte (unscaled).
LDAPURSH: Load-Acquire RCpc Register Signed Halfword (unscaled).
LDAPURSW: Load-Acquire RCpc Register Signed Word (unscaled).
LDAR: Load-Acquire Register.
LDARB: Load-Acquire Register Byte.
LDARH: Load-Acquire Register Halfword.
LDAXP: Load-Acquire Exclusive Pair of Registers.
LDAXR: Load-Acquire Exclusive Register.
LDAXRB: Load-Acquire Exclusive Register Byte.
LDAXRH: Load-Acquire Exclusive Register Halfword.
LDCLR, LDCLRA, LDCLRAL, LDCLRL: Atomic bit clear on word or doubleword in memory.
LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB: Atomic bit clear on byte in memory.
LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH: Atomic bit clear on halfword in memory.
LDEOR, LDEORA, LDEORAL, LDEORL: Atomic exclusive OR on word or doubleword in memory.
LDEORB, LDEORAB, LDEORALB, LDEORLB: Atomic exclusive OR on byte in memory.
LDEORH, LDEORAH, LDEORALH, LDEORLH: Atomic exclusive OR on halfword in memory.
LDG: Load Allocation Tag.
LDGM: Load Tag Multiple.
LDLAR: Load LOAcquire Register.
LDLARB: Load LOAcquire Register Byte.
LDLARH: Load LOAcquire Register Halfword.
LDNP: Load Pair of Registers, with non-temporal hint.
LDP: Load Pair of Registers.
LDPSW: Load Pair of Registers Signed Word.
LDR (immediate): Load Register (immediate).
LDR (literal): Load Register (literal).
LDR (register): Load Register (register).
LDRAA, LDRAB: Load Register, with pointer authentication.
LDRB (immediate): Load Register Byte (immediate).
LDRB (register): Load Register Byte (register).
LDRH (immediate): Load Register Halfword (immediate).
LDRH (register): Load Register Halfword (register).
LDRSB (immediate): Load Register Signed Byte (immediate).
LDRSB (register): Load Register Signed Byte (register).
LDRSH (immediate): Load Register Signed Halfword (immediate).
LDRSH (register): Load Register Signed Halfword (register).
Page 5
LDRSW (immediate): Load Register Signed Word (immediate).
LDRSW (literal): Load Register Signed Word (literal).
LDRSW (register): Load Register Signed Word (register).
LDSET, LDSETA, LDSETAL, LDSETL: Atomic bit set on word or doubleword in memory.
LDSETB, LDSETAB, LDSETALB, LDSETLB: Atomic bit set on byte in memory.
LDSETH, LDSETAH, LDSETALH, LDSETLH: Atomic bit set on halfword in memory.
LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL: Atomic signed maximum on word or doubleword in memory.
LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB: Atomic signed maximum on byte in memory.
LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH: Atomic signed maximum on halfword in memory.
LDSMIN, LDSMINA, LDSMINAL, LDSMINL: Atomic signed minimum on word or doubleword in memory.
LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB: Atomic signed minimum on byte in memory.
LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH: Atomic signed minimum on halfword in memory.
LDTR: Load Register (unprivileged).
LDTRB: Load Register Byte (unprivileged).
LDTRH: Load Register Halfword (unprivileged).
LDTRSB: Load Register Signed Byte (unprivileged).
LDTRSH: Load Register Signed Halfword (unprivileged).
LDTRSW: Load Register Signed Word (unprivileged).
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL: Atomic unsigned maximum on word or doubleword in memory.
LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB: Atomic unsigned maximum on byte in memory.
LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH: Atomic unsigned maximum on halfword in memory.
LDUMIN, LDUMINA, LDUMINAL, LDUMINL: Atomic unsigned minimum on word or doubleword in memory.
LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB: Atomic unsigned minimum on byte in memory.
LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH: Atomic unsigned minimum on halfword in memory.
LDUR: Load Register (unscaled).
LDURB: Load Register Byte (unscaled).
LDURH: Load Register Halfword (unscaled).
LDURSB: Load Register Signed Byte (unscaled).
LDURSH: Load Register Signed Halfword (unscaled).
LDURSW: Load Register Signed Word (unscaled).
LDXP: Load Exclusive Pair of Registers.
LDXR: Load Exclusive Register.
LDXRB: Load Exclusive Register Byte.
LDXRH: Load Exclusive Register Halfword.
LSL (immediate): Logical Shift Left (immediate): an alias of UBFM.
LSL (register): Logical Shift Left (register): an alias of LSLV.
Page 6
LSLV: Logical Shift Left Variable.
LSR (immediate): Logical Shift Right (immediate): an alias of UBFM.
LSR (register): Logical Shift Right (register): an alias of LSRV.
LSRV: Logical Shift Right Variable.
MADD: Multiply-Add.
MNEG: Multiply-Negate: an alias of MSUB.
MOV (bitmask immediate): Move (bitmask immediate): an alias of ORR (immediate).
MOV (inverted wide immediate): Move (inverted wide immediate): an alias of MOVN.
MOV (register): Move (register): an alias of ORR (shifted register).
MOV (to/from SP): Move between register and stack pointer: an alias of ADD (immediate).
MOV (wide immediate): Move (wide immediate): an alias of MOVZ.
MOVK: Move wide with keep.
MOVN: Move wide with NOT.
MOVZ: Move wide with zero.
MRS: Move System Register.
MSR (immediate): Move immediate value to Special Register.
MSR (register): Move general-purpose register to System Register.
MSUB: Multiply-Subtract.
MUL: Multiply: an alias of MADD.
MVN: Bitwise NOT: an alias of ORN (shifted register).
NEG (shifted register): Negate (shifted register): an alias of SUB (shifted register).
NEGS: Negate, setting flags: an alias of SUBS (shifted register).
NGC: Negate with Carry: an alias of SBC.
NGCS: Negate with Carry, setting flags: an alias of SBCS.
NOP: No Operation.
ORN (shifted register): Bitwise OR NOT (shifted register).
ORR (immediate): Bitwise OR (immediate).
ORR (shifted register): Bitwise OR (shifted register).
PACDA, PACDZA: Pointer Authentication Code for Data address, using key A.
PACDB, PACDZB: Pointer Authentication Code for Data address, using key B.
PACGA: Pointer Authentication Code, using Generic key.
PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA: Pointer Authentication Code for Instruction address, using key A.
PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB: Pointer Authentication Code for Instruction address, using key B.
PRFM (immediate): Prefetch Memory (immediate).
PRFM (literal): Prefetch Memory (literal).
PRFM (register): Prefetch Memory (register).
Page 7
PRFUM: Prefetch Memory (unscaled offset).
PSB CSYNC: Profiling Synchronization Barrier.
PSSBB: Physical Speculative Store Bypass Barrier.
RBIT: Reverse Bits.
RET: Return from subroutine.
RETAA, RETAB: Return from subroutine, with pointer authentication.
REV: Reverse Bytes.
REV16: Reverse bytes in 16-bit halfwords.
REV32: Reverse bytes in 32-bit words.
REV64: Reverse Bytes: an alias of REV.
RMIF: Rotate, Mask Insert Flags.
ROR (immediate): Rotate right (immediate): an alias of EXTR.
ROR (register): Rotate Right (register): an alias of RORV.
RORV: Rotate Right Variable.
SB: Speculation Barrier.
SBC: Subtract with Carry.
SBCS: Subtract with Carry, setting flags.
SBFIZ: Signed Bitfield Insert in Zero: an alias of SBFM.
SBFM: Signed Bitfield Move.
SBFX: Signed Bitfield Extract: an alias of SBFM.
SDIV: Signed Divide.
SETF8, SETF16: Evaluation of 8 or 16 bit flag values.
SEV: Send Event.
SEVL: Send Event Local.
SMADDL: Signed Multiply-Add Long.
SMC: Secure Monitor Call.
SMNEGL: Signed Multiply-Negate Long: an alias of SMSUBL.
SMSUBL: Signed Multiply-Subtract Long.
SMULH: Signed Multiply High.
SMULL: Signed Multiply Long: an alias of SMADDL.
SSBB: Speculative Store Bypass Barrier.
ST2G: Store Allocation Tags.
STADD, STADDL: Atomic add on word or doubleword in memory, without return: an alias of LDADD, LDADDA,
LDADDAL, LDADDL.
STADDB, STADDLB: Atomic add on byte in memory, without return: an alias of LDADDB, LDADDAB, LDADDALB,
LDADDLB.
STADDH, STADDLH: Atomic add on halfword in memory, without return: an alias of LDADDH, LDADDAH, LDADDALH,
LDADDLH.
Page 8
STCLR, STCLRL: Atomic bit clear on word or doubleword in memory, without return: an alias of LDCLR, LDCLRA,
LDCLRAL, LDCLRL.
STCLRB, STCLRLB: Atomic bit clear on byte in memory, without return: an alias of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.
STCLRH, STCLRLH: Atomic bit clear on halfword in memory, without return: an alias of LDCLRH, LDCLRAH,
LDCLRALH, LDCLRLH.
STEOR, STEORL: Atomic exclusive OR on word or doubleword in memory, without return: an alias of LDEOR,
LDEORA, LDEORAL, LDEORL.
STEORB, STEORLB: Atomic exclusive OR on byte in memory, without return: an alias of LDEORB, LDEORAB,
LDEORALB, LDEORLB.
STEORH, STEORLH: Atomic exclusive OR on halfword in memory, without return: an alias of LDEORH, LDEORAH,
LDEORALH, LDEORLH.
STG: Store Allocation Tag.
STGM: Store Tag Multiple.
STGP: Store Allocation Tag and Pair of registers.
STLLR: Store LORelease Register.
STLLRB: Store LORelease Register Byte.
STLLRH: Store LORelease Register Halfword.
STLR: Store-Release Register.
STLRB: Store-Release Register Byte.
STLRH: Store-Release Register Halfword.
STLUR: Store-Release Register (unscaled).
STLURB: Store-Release Register Byte (unscaled).
STLURH: Store-Release Register Halfword (unscaled).
STLXP: Store-Release Exclusive Pair of registers.
STLXR: Store-Release Exclusive Register.
STLXRB: Store-Release Exclusive Register Byte.
STLXRH: Store-Release Exclusive Register Halfword.
STNP: Store Pair of Registers, with non-temporal hint.
STP: Store Pair of Registers.
STR (immediate): Store Register (immediate).
STR (register): Store Register (register).
STRB (immediate): Store Register Byte (immediate).
STRB (register): Store Register Byte (register).
STRH (immediate): Store Register Halfword (immediate).
STRH (register): Store Register Halfword (register).
STSET, STSETL: Atomic bit set on word or doubleword in memory, without return: an alias of LDSET, LDSETA,
LDSETAL, LDSETL.
STSETB, STSETLB: Atomic bit set on byte in memory, without return: an alias of LDSETB, LDSETAB, LDSETALB,
LDSETLB.
Page 9
STSETH, STSETLH: Atomic bit set on halfword in memory, without return: an alias of LDSETH, LDSETAH, LDSETALH,
LDSETLH.
STSMAX, STSMAXL: Atomic signed maximum on word or doubleword in memory, without return: an alias of LDSMAX,
LDSMAXA, LDSMAXAL, LDSMAXL.
STSMAXB, STSMAXLB: Atomic signed maximum on byte in memory, without return: an alias of LDSMAXB,
LDSMAXAB, LDSMAXALB, LDSMAXLB.
STSMAXH, STSMAXLH: Atomic signed maximum on halfword in memory, without return: an alias of LDSMAXH,
LDSMAXAH, LDSMAXALH, LDSMAXLH.
STSMIN, STSMINL: Atomic signed minimum on word or doubleword in memory, without return: an alias of LDSMIN,
LDSMINA, LDSMINAL, LDSMINL.
STSMINB, STSMINLB: Atomic signed minimum on byte in memory, without return: an alias of LDSMINB, LDSMINAB,
LDSMINALB, LDSMINLB.
STSMINH, STSMINLH: Atomic signed minimum on halfword in memory, without return: an alias of LDSMINH,
LDSMINAH, LDSMINALH, LDSMINLH.
STTR: Store Register (unprivileged).
STTRB: Store Register Byte (unprivileged).
STTRH: Store Register Halfword (unprivileged).
STUMAX, STUMAXL: Atomic unsigned maximum on word or doubleword in memory, without return: an alias of
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL.
STUMAXB, STUMAXLB: Atomic unsigned maximum on byte in memory, without return: an alias of LDUMAXB,
LDUMAXAB, LDUMAXALB, LDUMAXLB.
STUMAXH, STUMAXLH: Atomic unsigned maximum on halfword in memory, without return: an alias of LDUMAXH,
LDUMAXAH, LDUMAXALH, LDUMAXLH.
STUMIN, STUMINL: Atomic unsigned minimum on word or doubleword in memory, without return: an alias of
LDUMIN, LDUMINA, LDUMINAL, LDUMINL.
STUMINB, STUMINLB: Atomic unsigned minimum on byte in memory, without return: an alias of LDUMINB,
LDUMINAB, LDUMINALB, LDUMINLB.
STUMINH, STUMINLH: Atomic unsigned minimum on halfword in memory, without return: an alias of LDUMINH,
LDUMINAH, LDUMINALH, LDUMINLH.
STUR: Store Register (unscaled).
STURB: Store Register Byte (unscaled).
STURH: Store Register Halfword (unscaled).
STXP: Store Exclusive Pair of registers.
STXR: Store Exclusive Register.
STXRB: Store Exclusive Register Byte.
STXRH: Store Exclusive Register Halfword.
STZ2G: Store Allocation Tags, Zeroing.
STZG: Store Allocation Tag, Zeroing.
STZGM: Store Tag and Zero Multiple.
SUB (extended register): Subtract (extended register).
SUB (immediate): Subtract (immediate).
SUB (shifted register): Subtract (shifted register).
SUBG: Subtract with Tag.
Page 10
SUBP: Subtract Pointer.
SUBPS: Subtract Pointer, setting Flags.
SUBS (extended register): Subtract (extended register), setting flags.
SUBS (immediate): Subtract (immediate), setting flags.
SUBS (shifted register): Subtract (shifted register), setting flags.
SVC: Supervisor Call.
SWP, SWPA, SWPAL, SWPL: Swap word or doubleword in memory.
SWPB, SWPAB, SWPALB, SWPLB: Swap byte in memory.
SWPH, SWPAH, SWPALH, SWPLH: Swap halfword in memory.
SXTB: Signed Extend Byte: an alias of SBFM.
SXTH: Sign Extend Halfword: an alias of SBFM.
SXTW: Sign Extend Word: an alias of SBFM.
SYS: System instruction.
SYSL: System instruction with result.
TBNZ: Test bit and Branch if Nonzero.
TBZ: Test bit and Branch if Zero.
TLBI: TLB Invalidate operation: an alias of SYS.
TSB CSYNC: Trace Synchronization Barrier.
TST (immediate): Test bits (immediate): an alias of ANDS (immediate).
TST (shifted register): Test (shifted register): an alias of ANDS (shifted register).
UBFIZ: Unsigned Bitfield Insert in Zero: an alias of UBFM.
UBFM: Unsigned Bitfield Move.
UBFX: Unsigned Bitfield Extract: an alias of UBFM.
UDF: Permanently Undefined.
UDIV: Unsigned Divide.
UMADDL: Unsigned Multiply-Add Long.
UMNEGL: Unsigned Multiply-Negate Long: an alias of UMSUBL.
UMSUBL: Unsigned Multiply-Subtract Long.
UMULH: Unsigned Multiply High.
UMULL: Unsigned Multiply Long: an alias of UMADDL.
UXTB: Unsigned Extend Byte: an alias of UBFM.
UXTH: Unsigned Extend Halfword: an alias of UBFM.
WFE: Wait For Event.
WFI: Wait For Interrupt.
XAFLAG: Convert floating-point condition flags from external format to Arm format.
XPACD, XPACI, XPACLRI: Strip Pointer Authentication Code.
Page 11
YIELD: YIELD.
Internal version only: isa v32.03, AdvSIMD v29.02, pseudocode v2020-03, sve v2020-03_rc1 ; Build timestamp: 2020-04-15T14:28
Copyright © 2010-2020 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Page 12
ADC
Add with Carry adds two register values and the Carry flag value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
ADC <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
ADC <Xd>, <Xn>, <Xm>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
(result, -) = AddWithCarry(operand1, operand2, PSTATE.C);
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
ADC Page 13
ADCS
Add with Carry, setting flags, adds two register values and the Carry flag value, and writes the result to the destination
register. It updates the condition flags based on the result.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
ADCS <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
ADCS <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
If PSTATE.DIT is 1:
ADCS Page 14
ADD (extended register)
Add (extended register) adds a register value and a sign or zero-extended register value, followed by an optional left
shift amount, and writes the result to the destination register. The argument that is extended from the <Rm> register
can be a byte, halfword, word, or doubleword.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 1 0 0 1 Rm option imm3 Rn Rd
op S
32-bit (sf == 0)
ADD <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<R> Is a width specifier, encoded in “option”:
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.
ADD (extended register) Page 15

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Operation
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
(result, -) = AddWithCarry(operand1, operand2, '0');
if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
ADD (extended register) Page 16

ADD (immediate)
Add (immediate) adds a register value and an optionally-shifted immediate value, and writes the result to the
destination register.
This instruction is used by the alias MOV (to/from SP).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
ADD <Xd|SP>, <Xn|SP>, #<imm>{, <shift>}
bits(datasize) imm;
case sh of
when '0' imm = ZeroExtend(imm12, datasize);
when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);
Assembler Symbols
field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions
Alias Is preferred when

MOV (to/from sh == '0' && imm12 == '000000000000' && (Rd == '11111' || Rn == '11111')
SP)
ADD (immediate) Page 17

Operation
(result, -) = AddWithCarry(operand1, imm, '0');
if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:

ADD (shifted register)
Add (shifted register) adds a register value and an optionally-shifted register value, and writes the result to the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
op S
32-bit (sf == 0)
ADD <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
ADD <Xd>, <Xn>, <Xm>{, <shift> #<amount>}
if shift == '11' then UNDEFINED;

if sf == '0' && imm6<5> == '1' then UNDEFINED;
ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);
Assembler Symbols
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
X[d] = result;
ADD (shifted register) Page 19

If PSTATE.DIT is 1:
ADD (shifted register) Page 20

ADDG
Add with Tag adds an immediate value scaled by the Tag granule to the address in the source register, modifies the
Logical Address Tag of the address using an immediate value, and writes the result to the destination register. Tags
specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical Address Tag.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 0 1 1 0 uimm6 (0) (0) uimm4 Xn Xd
op3
Integer
ADDG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>
if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
bits(64) offset = LSL(ZeroExtend(uimm6, 64), LOG2_TAG_GRANULE);
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Xn" field.
<uimm6> Is an unsigned immediate, a multiple of 16 in the range 0 to 1008, encoded in the "uimm6" field.
<uimm4> Is an unsigned immediate, in the range 0 to 15, encoded in the "uimm4" field.
Operation
bits(64) operand1 = if n == 31 then SP[] else X[n];

bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;
if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
rtag = '0000';
(result, -) = AddWithCarry(operand1, offset, '0');

result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);
if d == 31 then
SP[] = result;
else
X[d] = result;
ADDG Page 21
ADDS (extended register)
Add (extended register), setting flags, adds a register value and a sign or zero-extended register value, followed by an
optional left shift amount, and writes the result to the destination register. The argument that is extended from the
<Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result.
This instruction is used by the alias CMN (extended register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
ADDS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
ADDS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
Assembler Symbols
field.
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
"Rm" field.
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.
ADDS (extended register) Page 22

option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
Alias Conditions

CMN (extended register) Rd == '11111'
Operation
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, '0');
X[d] = result;
If PSTATE.DIT is 1:
ADDS (extended register) Page 23

ADDS (immediate)
Add (immediate), setting flags, adds a register value and an optionally-shifted immediate value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
ADDS <Wd>, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
ADDS <Xd>, <Xn|SP>, #<imm>{, <shift>}
bits(datasize) imm;
case sh of
Assembler Symbols
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions

CMN (immediate) Rd == '11111'
Operation
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, imm, '0');
X[d] = result;
ADDS (immediate) Page 24

If PSTATE.DIT is 1:
ADDS (immediate) Page 25

ADDS (shifted register)
Add (shifted register), setting flags, adds a register value and an optionally-shifted register value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
ADDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
ADDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}


Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Alias Conditions

CMN (shifted register) Rd == '11111'
ADDS (shifted register) Page 26

Operation
bits(4) nzcv;
X[d] = result;
If PSTATE.DIT is 1:
ADDS (shifted register) Page 27

ADR
Form PC-relative address adds an immediate value to the PC value to form a PC-relative address, and writes the result
to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 immlo 1 0 0 0 0 immhi Rd
op
Literal
ADR <Xd>, <label>
bits(64) imm;
imm = SignExtend(immhi:immlo, 64);
Assembler Symbols
<label> Is the program label whose address is to be calculated. Its offset from the address of this instruction, in
the range +/-1MB, is encoded in "immhi:immlo".
Operation
bits(64) base = PC[];
X[d] = base + imm;
ADR Page 28
ADRP
Form PC-relative address to 4KB page adds an immediate value that is shifted left by 12 bits, to the PC value to form a
PC-relative address, with the bottom 12 bits masked out, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 immlo 1 0 0 0 0 immhi Rd
op
Literal
ADRP <Xd>, <label>
bits(64) imm;
imm = SignExtend(immhi:immlo:Zeros(12), 64);
Assembler Symbols
<label> Is the program label whose 4KB page address is to be calculated. Its offset from the page address of
this instruction, in the range +/-4GB, is encoded as "immhi:immlo" times 4096.
Operation
bits(64) base = PC[];
base<11:0> = Zeros(12);
X[d] = base + imm;
ADRP Page 29
AND (immediate)
Bitwise AND (immediate) performs a bitwise AND of a register value and an immediate value, and writes the result to
the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 0 N immr imms Rn Rd
opc
32-bit (sf == 0 && N == 0)
AND <Wd|WSP>, <Wn>, #<imm>
64-bit (sf == 1)
AND <Xd|SP>, <Xn>, #<imm>
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);
Assembler Symbols
field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
Operation
result = operand1 AND imm;

if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
AND (immediate) Page 30

AND (shifted register)
Bitwise AND (shifted register) performs a bitwise AND of a register value and an optionally-shifted register value, and
writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
AND <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
AND <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation

result = operand1 AND operand2;

X[d] = result;
If PSTATE.DIT is 1:
AND (shifted register) Page 31

AND (shifted register) Page 32

ANDS (immediate)
Bitwise AND (immediate), setting flags, performs a bitwise AND of a register value and an immediate value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
ANDS <Wd>, <Wn>, #<imm>
64-bit (sf == 1)
ANDS <Xd>, <Xn>, #<imm>
bits(datasize) imm;
Assembler Symbols
Alias Conditions

TST (immediate) Rd == '11111'
Operation
result = operand1 AND imm;

PSTATE.<N,Z,C,V> = result<datasize-1>:IsZeroBit(result):'00';
X[d] = result;
If PSTATE.DIT is 1:
ANDS (immediate) Page 33

ANDS (immediate) Page 34

ANDS (shifted register)
Bitwise AND (shifted register), setting flags, performs a bitwise AND of a register value and an optionally-shifted
register value, and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
ANDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
ANDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Alias Conditions

TST (shifted register) Rd == '11111'
ANDS (shifted register) Page 35

Operation


X[d] = result;
If PSTATE.DIT is 1:
ANDS (shifted register) Page 36

ASR (register)
Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.
This is an alias of ASRV. This means:
• The encodings in this description are named to match the encodings of ASRV.
• The description of ASRV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 0 Rn Rd
op2
32-bit (sf == 0)
ASR <Wd>, <Wn>, <Wm>
is equivalent to
ASRV <Wd>, <Wn>, <Wm>
and is always the preferred disassembly.
64-bit (sf == 1)
ASR <Xd>, <Xn>, <Xm>
is equivalent to
ASRV <Xd>, <Xn>, <Xm>
Assembler Symbols
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
Operation
The description of ASRV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
ASR (register) Page 37

ASR (register) Page 38

ASR (immediate)
Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in copies of
the sign bit in the upper bits and zeros in the lower bits, and writes the result to the destination register.
This is an alias of SBFM. This means:
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr x 1 1 1 1 1 Rn Rd
opc imms
32-bit (sf == 0 && N == 0 && imms == 011111)
ASR <Wd>, <Wn>, #<shift>
is equivalent to
SBFM <Wd>, <Wn>, #<shift>, #31
64-bit (sf == 1 && N == 1 && imms == 111111)
ASR <Xd>, <Xn>, #<shift>
is equivalent to
SBFM <Xd>, <Xn>, #<shift>, #63
Assembler Symbols
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, encoded in the "immr" field.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
ASR (immediate) Page 39

ASRV
Arithmetic Shift Right Variable shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias ASR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 0 Rn Rd
op2
32-bit (sf == 0)
ASRV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
ASRV <Xd>, <Xn>, <Xm>
ShiftType shift_type = DecodeShift(op2);
Assembler Symbols
Operation
result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);

X[d] = result;
If PSTATE.DIT is 1:
ASRV Page 40
AT
Address Translate. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.
This is an alias of SYS. This means:
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 1 0 0 x op2 Rt
L CRn CRm
System
AT <at_op>, <Xt>
is equivalent to
SYS #<op1>, C7, <Cm>, #<op2>, <Xt>
and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_AT.
Assembler Symbols
<at_op> Is an AT instruction name, as listed for the AT system instruction group, encoded in
“op1:CRm<0>:op2”:
op1 CRm<0> op2 <at_op> Architectural Feature
000 0 000 S1E1R -
000 0 001 S1E1W -
000 0 010 S1E0R -
000 0 011 S1E0W -
000 1 000 S1E1RP ARMv8.2-ATS1E1
000 1 001 S1E1WP ARMv8.2-ATS1E1
100 0 000 S1E2R -
100 0 001 S1E2W -
100 0 100 S12E1R -
100 0 101 S12E1W -
100 0 110 S12E0R -
100 0 111 S12E0W -
110 0 000 S1E3R -
110 0 001 S1E3W -
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
AT Page 41
AUTDA, AUTDZA
Authenticate Data address, using key A. This instruction authenticates a data address, using a modifier and key A.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDA.
• The value zero, for AUTDZA.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 1 0 Rn Rd
AUTDA (Z == 0)
AUTDA <Xd>, <Xn|SP>
AUTDZA (Z == 1 && Rn == 11111)
AUTDZA <Xd>
boolean source_is_sp = FALSE;

if !HavePACExt() then
UNDEFINED;
if Z == '0' then // AUTDA

if n == 31 then source_is_sp = TRUE;
else // AUTDZA
if n != 31 then UNDEFINED;
Assembler Symbols
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
if source_is_sp then
X[d] = AuthDA(X[d], SP[], FALSE);
else
X[d] = AuthDA(X[d], X[n], FALSE);
AUTDA, AUTDZA Page 42

AUTDB, AUTDZB
Authenticate Data address, using key B. This instruction authenticates a data address, using a modifier and key B.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDB.
• The value zero, for AUTDZB.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 1 1 Rn Rd
AUTDB (Z == 0)
AUTDB <Xd>, <Xn|SP>
AUTDZB (Z == 1 && Rn == 11111)
AUTDZB <Xd>

UNDEFINED;
if Z == '0' then // AUTDB

else // AUTDZB
Assembler Symbols
Operation
X[d] = AuthDB(X[d], SP[], FALSE);
else
X[d] = AuthDB(X[d], X[n], FALSE);
AUTDB, AUTDZB Page 43

AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA
Authenticate Instruction address, using key A. This instruction authenticates an instruction address, using a modifier
and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIA and AUTIZA.
• In X17, for AUTIA1716.
• In X30, for AUTIASP and AUTIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIA.
• The value zero, for AUTIZA and AUTIAZ.
• In X16, for AUTIA1716.
• In SP, for AUTIASP.
It has encodings from 2 classes: Integer and System
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 0 0 Rn Rd
AUTIA (Z == 0)
AUTIA <Xd>, <Xn|SP>
AUTIZA (Z == 1 && Rn == 11111)
AUTIZA <Xd>

UNDEFINED;
if Z == '0' then // AUTIA

else // AUTIZA
System
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 0 x 1 1 1 1 1
CRm op2
AUTIA, AUTIA1716, AUTIASP,

Page 44
AUTIAZ, AUTIZA
AUTIA1716 (CRm == 0001 && op2 == 100)
AUTIA1716
AUTIASP (CRm == 0011 && op2 == 101)
AUTIASP
AUTIAZ (CRm == 0011 && op2 == 100)
AUTIAZ
integer d;
integer n;
case CRm:op2 of
when '0011 100' // AUTIAZ
d = 30;
n = 31;
when '0011 101' // AUTIASP
d = 30;
source_is_sp = TRUE;
when '0001 100' // AUTIA1716
d = 17;
n = 16;
when '0001 000' SEE "PACIA";
when '0001 010' SEE "PACIB";
when '0001 110' SEE "AUTIB";
when '0011 00x' SEE "PACIA";
when '0011 01x' SEE "PACIB";
when '0011 11x' SEE "AUTIB";
when '0000 111' SEE "XPACLRI";
otherwise SEE "HINT";
Assembler Symbols
Operation
X[d] = AuthIA(X[d], SP[], FALSE);
else
X[d] = AuthIA(X[d], X[n], FALSE);
AUTIA, AUTIA1716, AUTIASP,

Page 45
AUTIAZ, AUTIZA
AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB
Authenticate Instruction address, using key B. This instruction authenticates an instruction address, using a modifier
and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIB and AUTIZB.
• In X17, for AUTIB1716.
• In X30, for AUTIBSP and AUTIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIB.
• The value zero, for AUTIZB and AUTIBZ.
• In X16, for AUTIB1716.
• In SP, for AUTIBSP.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 0 1 Rn Rd
AUTIB (Z == 0)
AUTIB <Xd>, <Xn|SP>
AUTIZB (Z == 1 && Rn == 11111)
AUTIZB <Xd>

UNDEFINED;
if Z == '0' then // AUTIB

else // AUTIZB
System
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 1 x 1 1 1 1 1
CRm op2
AUTIB, AUTIB1716, AUTIBSP,

Page 46
AUTIBZ, AUTIZB
AUTIB1716 (CRm == 0001 && op2 == 110)
AUTIB1716
AUTIBSP (CRm == 0011 && op2 == 111)
AUTIBSP
AUTIBZ (CRm == 0011 && op2 == 110)
AUTIBZ
integer d;
integer n;
case CRm:op2 of
when '0011 110' // AUTIBZ
d = 30;
n = 31;
when '0011 111' // AUTIBSP
d = 30;
when '0001 110' // AUTIB1716
d = 17;
n = 16;
when '0001 100' SEE "AUTIA";
when '0011 10x' SEE "AUTIA";
Assembler Symbols
Operation
X[d] = AuthIB(X[d], SP[], FALSE);
else
X[d] = AuthIB(X[d], X[n], FALSE);
AUTIB, AUTIB1716, AUTIBSP,

Page 47
AUTIBZ, AUTIZB
AXFLAG
Convert floating-point condition flags from Arm to external format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from a form representing the result of an Arm floating-point scalar compare instruction to an
alternative representation required by some software.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 1 0 1 1 1 1 1
CRm
System
AXFLAG
if !HaveFlagFormatExt() then UNDEFINED;
Operation
bit Z = PSTATE.Z OR PSTATE.V;

bit C = PSTATE.C AND NOT(PSTATE.V);
PSTATE.N = '0';
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = '0';
AXFLAG Page 48
B.cond
Branch conditionally to a label at a PC-relative offset, with a hint that this is not a subroutine call or return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 0 imm19 0 cond
19-bit signed PC-relative branch offset
B.<cond> <label>
bits(64) offset = SignExtend(imm19:'00', 64);
Assembler Symbols
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.
Operation
if ConditionHolds(cond) then
BranchTo(PC[] + offset, BranchType_DIR);
B.cond Page 49
B
Branch causes an unconditional branch to a label at a PC-relative offset, with a hint that this is not a subroutine call or
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 0 1 imm26
op
B <label>
Assembler Symbols
<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
Operation
B Page 50
BFC
Bitfield Clear sets a bitfield of <width> bits at bit position <lsb> of the destination register to zero, leaving the other
destination bits unchanged.
This is an alias of BFM. This means:
• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.
Leaving other bits unchanged

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms 1 1 1 1 1 Rd
opc Rn
32-bit (sf == 0 && N == 0)
BFC <Wd>, #<lsb>, #<width>
is equivalent to
BFM <Wd>, WZR, #(-<lsb> MOD 32), #(<width>-1)
and is the preferred disassembly when UInt(imms) < UInt(immr).
64-bit (sf == 1 && N == 1)
BFC <Xd>, #<lsb>, #<width>
is equivalent to
BFM <Xd>, XZR, #(-<lsb> MOD 64), #(<width>-1)
Assembler Symbols
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of BFM gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
BFC Page 51
BFC Page 52
BFI
Bitfield Insert copies a bitfield of <width> bits from the least significant bits of the source register to bit position
<lsb> of the destination register, leaving the other destination bits unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms != 11111 Rd
opc Rn
32-bit (sf == 0 && N == 0)
BFI <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
BFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)
64-bit (sf == 1 && N == 1)
BFI <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
BFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)
Assembler Symbols
Operation
If PSTATE.DIT is 1:
BFI Page 53
BFI Page 54
BFM
Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the other bits of the destination register remain unchanged.
This instruction is used by the aliases BFC, BFI, and BFXIL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
BFM <Wd>, <Wn>, #<immr>, #<imms>
64-bit (sf == 1 && N == 1)
BFM <Xd>, <Xn>, #<immr>, #<imms>
integer R;
bits(datasize) wmask;
bits(datasize) tmask;

if sf == '0' && (N != '0' || immr<5> != '0' || imms<5> != '0') then UNDEFINED;
R = UInt(immr);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);
Assembler Symbols
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
Alias Conditions

BFC Rn == '11111' && UInt(imms) < UInt(immr)
BFI Rn != '11111' && UInt(imms) < UInt(immr)
BFXIL UInt(imms) >= UInt(immr)
BFM Page 55
Operation
bits(datasize) dst = X[d];

bits(datasize) src = X[n];
// perform bitfield move on low bits

bits(datasize) bot = (dst AND NOT(wmask)) OR (ROR(src, R) AND wmask);
// combine extension bits and result bits

X[d] = (dst AND NOT(tmask)) OR (bot AND tmask);
If PSTATE.DIT is 1:
BFM Page 56
BFXIL
Bitfield Extract and Insert Low copies a bitfield of <width> bits starting from bit position <lsb> in the source register
to the least significant bits of the destination register, leaving the other destination bits unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
BFXIL <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
BFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
and is the preferred disassembly when UInt(imms) >= UInt(immr).
64-bit (sf == 1 && N == 1)
BFXIL <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
BFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)
and is the preferred disassembly when UInt(imms) >= UInt(immr).
Assembler Symbols
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
Operation
If PSTATE.DIT is 1:
BFXIL Page 57
BFXIL Page 58
BIC (shifted register)
Bitwise Bit Clear (shifted register) performs a bitwise AND of a register value and the complement of an optionally-
shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
BIC <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
BIC <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation

operand2 = NOT(operand2);

X[d] = result;
BIC (shifted register) Page 59

If PSTATE.DIT is 1:
BIC (shifted register) Page 60

BICS (shifted register)
Bitwise Bit Clear (shifted register), setting flags, performs a bitwise AND of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based
on the result.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
BICS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
BICS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation


X[d] = result;
BICS (shifted register) Page 61

If PSTATE.DIT is 1:
BICS (shifted register) Page 62

BL
Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint that this is a
subroutine call.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 1 imm26
op
BL <label>
Assembler Symbols
<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
Operation
X[30] = PC[] + 4;
BranchTo(PC[] + offset, BranchType_DIRCALL);
BL Page 63
BLR
Branch with Link to Register calls a subroutine at an address in a register, setting register X30 to PC+4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
Integer
BLR <Xn>
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
Operation
bits(64) target = X[n];
X[30] = PC[] + 4;
BranchTo(target, BranchType_INDCALL);
BLR Page 64
BLRAA, BLRAAZ, BLRAB, BLRABZ
Branch with Link to Register, with pointer authentication. This instruction authenticates the address in the general-
purpose register that is specified by <Xn>, using a modifier and the specified key, and calls a subroutine at the
authenticated address, setting register X30 to PC+4.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BLRAA and BLRAB.
• The value zero, for BLRAAZ and BLRABZ.
Key A is used for BLRAA and BLRAAZ, and key B is used for BLRAB and BLRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the general-purpose register.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 1 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A
Key A, zero modifier (Z == 0 && M == 0 && Rm == 11111)
BLRAAZ <Xn>
Key A, register modifier (Z == 1 && M == 0)
BLRAA <Xn>, <Xm|SP>
Key B, zero modifier (Z == 0 && M == 1 && Rm == 11111)
BLRABZ <Xn>
Key B, register modifier (Z == 1 && M == 1)
BLRAB <Xn>, <Xm|SP>
boolean use_key_a = (M == '0');
boolean source_is_sp = ((Z == '1') && (m == 31));
UNDEFINED;
if Z == '0' && m != 31 then

UNDEFINED;
Assembler Symbols
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.
BLRAA, BLRAAZ, BLRAB,

Page 65
BLRABZ
Operation
bits(64) modifier = if source_is_sp then SP[] else X[m];
if use_key_a then
target = AuthIA(target, modifier, TRUE);
else
target = AuthIB(target, modifier, TRUE);
X[30] = PC[] + 4;
BranchTo(target, BranchType_INDCALL);
BLRAA, BLRAAZ, BLRAB,

Page 66
BLRABZ
BR
Branch to Register branches unconditionally to an address in a register, with a hint that this is not a subroutine return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
Integer
BR <Xn>
Assembler Symbols
the "Rn" field.
Operation
BranchTo(target, BranchType_INDIR);
BR Page 67
BRAA, BRAAZ, BRAB, BRABZ
Branch to Register, with pointer authentication. This instruction authenticates the address in the general-purpose
register that is specified by <Xn>, using a modifier and the specified key, and branches to the authenticated address.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BRAA and BRAB.
• The value zero, for BRAAZ and BRABZ.
Key A is used for BRAA and BRAAZ, and key B is used for BRAB and BRABZ.
The authenticated address is not written back to the general-purpose register.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 0 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A
Key A, zero modifier (Z == 0 && M == 0 && Rm == 11111)
BRAAZ <Xn>
Key A, register modifier (Z == 1 && M == 0)
BRAA <Xn>, <Xm|SP>
Key B, zero modifier (Z == 0 && M == 1 && Rm == 11111)
BRABZ <Xn>
Key B, register modifier (Z == 1 && M == 1)
BRAB <Xn>, <Xm|SP>
boolean source_is_sp = ((Z == '1') && (m == 31));
UNDEFINED;
if Z == '0' && m != 31 then

UNDEFINED;
Assembler Symbols
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.
BRAA, BRAAZ, BRAB, BRABZ Page 68

Operation
bits(64) modifier = if source_is_sp then SP[] else X[m];
if use_key_a then
else
BranchTo(target, BranchType_INDIR);
BRAA, BRAAZ, BRAB, BRABZ Page 69

BRK
Breakpoint instruction. A BRK instruction generates a Breakpoint Instruction exception. The PE records the exception
in ESR_ELx, using the EC value 0x3c, and captures the value of the immediate argument in ESR_ELx.ISS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 1 imm16 0 0 0 0 0
System
BRK #<imm>
if HaveBTIExt() then
SetBTypeCompatible(TRUE);
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
AArch64.SoftwareBreakpoint(imm16);
BRK Page 70
BTI
Branch Target Identification. A BTI instruction is used to guard against the execution of instructions which are not the
intended target of a branch.
Outside of a guarded memory region, a BTI instruction executes as a NOP. Within a guarded memory region while
PSTATE.BTYPE != 0b00, a BTI instruction compatible with the current value of PSTATE.BTYPE will not generate a
Branch Target Exception and will allow execution of subsequent instructions within the memory region.
The operand <targets> passed to a BTI instruction determines the values of PSTATE.BTYPE which the BTI instruction
is compatible with.
Within a guarded memory region, while PSTATE.BTYPE
!= 0b00, all instructions will generate a Branch Target
Exception, other than BRK, BTI, HLT, PACIASP,
and PACIBSP, which may not. See the individual instructions for details.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 x x 0 1 1 1 1 1
CRm op2
System
BTI {<targets>}
SystemHintOp op;
if CRm:op2 == '0100 xx0' then

op = SystemHintOp_BTI;
// Check branch target compatibility between BTI instruction and PSTATE.BTYPE
SetBTypeCompatible(BTypeCompatible_BTI(op2<2:1>));
else
EndOfInstruction();
Assembler Symbols
<targets> Is the type of indirection, encoded in “op2<2:1>”:

op2<2:1> <targets>
00 (omitted)
01 c
10 j
11 jc
BTI Page 71
Operation
BTI Page 72
case op of
when SystemHintOp_YIELD
Hint_Yield();
when SystemHintOp_DGH
Hint_DGH();
when SystemHintOp_WFE
if IsEventRegisterSet() then
ClearEventRegister();
else
trap = FALSE;
if PSTATE.EL == EL0 then
// Check for traps described by the OS which may be EL1 or EL2.
if HaveTWEDExt() then
- = SCTLR[];
trap = sctlr.nTWE == '0';
target_el = EL1;
else
AArch64.CheckForWFxTrap(EL1, TRUE);
if !trap && PSTATE.EL IN {EL0, EL1} && EL2Enabled() && !IsInHost() then
// Check for traps described by the Hypervisor.
trap = HCR_EL2.TWE == '1';
target_el = EL2;
else
if !trap && HaveEL(EL3) && PSTATE.EL != EL3 then

// Check for traps described by the Secure Monitor.
trap = SCR_EL3.TWE == '1';
target_el = EL3;
else
if HaveTWEDExt() && trap && PSTATE.EL != EL3 then

(delay_enabled, delay) = WFETrapDelay(target_el); // (If trap delay is enabled, Delay
if !AArch64.WaitForEventUntilDelay(delay_enabled, delay) then
// Event did not arrive until delay expired
AArch64.WFxTrap(target_el, TRUE); // Trap WFE
else
WaitForEvent();
when SystemHintOp_WFI
if !InterruptPending() then
AArch64.CheckForWFxTrap(EL1, FALSE);
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() && !IsInHost() then
if HaveEL(EL3) && PSTATE.EL != EL3 then
WaitForInterrupt();
when SystemHintOp_SEV
SendEvent();
when SystemHintOp_SEVL
SendEventLocal();
when SystemHintOp_ESB
SynchronizeErrors();
AArch64.ESBOperation();
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
TakeUnmaskedSErrorInterrupts();
BTI Page 73
when SystemHintOp_PSB
ProfilingSynchronizationBarrier();
when SystemHintOp_TSB
TraceSynchronizationBarrier();
when SystemHintOp_CSDB
ConsumptionOfSpeculativeDataBarrier();
when SystemHintOp_BTI
SetBTypeNext('00');
otherwise // do nothing
BTI Page 74
CAS, CASA, CASAL, CASL
Compare and Swap word or doubleword in memory reads a 32-bit word or 64-bit doubleword from memory, and
compares it against the value held in a first register. If the comparison is equal, the value in a second register is
written to memory. If the write is performed, the read and write occur atomically such that no other modification of
the memory location can take place between the read and write.
• CASA and CASAL load from memory with acquire semantics.
• CASL and CASAL store to memory with release semantics.
• CAS has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, or
<Xs>, is restored to the value held in the register before the instruction was executed.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
CAS, CASA, CASAL, CASL Page 75

32-bit CAS (size == 10 && L == 0 && o0 == 0)
CAS <Ws>, <Wt>, [<Xn|SP>{,#0}]
32-bit CASA (size == 10 && L == 1 && o0 == 0)
CASA <Ws>, <Wt>, [<Xn|SP>{,#0}]
32-bit CASAL (size == 10 && L == 1 && o0 == 1)
CASAL <Ws>, <Wt>, [<Xn|SP>{,#0}]
32-bit CASL (size == 10 && L == 0 && o0 == 1)
CASL <Ws>, <Wt>, [<Xn|SP>{,#0}]
64-bit CAS (size == 11 && L == 0 && o0 == 0)
CAS <Xs>, <Xt>, [<Xn|SP>{,#0}]
64-bit CASA (size == 11 && L == 1 && o0 == 0)
CASA <Xs>, <Xt>, [<Xn|SP>{,#0}]
64-bit CASAL (size == 11 && L == 1 && o0 == 1)
CASAL <Xs>, <Xt>, [<Xn|SP>{,#0}]
64-bit CASL (size == 11 && L == 0 && o0 == 1)
CASL <Xs>, <Xt>, [<Xn|SP>{,#0}]
if !HaveAtomicExt() then UNDEFINED;
integer t = UInt(Rt);
integer s = UInt(Rs);
integer datasize = 8 << UInt(size);

integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if L == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if o0 == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation
bits(64) address;
bits(datasize) comparevalue;
bits(datasize) newvalue;
bits(datasize) data;
if HaveMTEExt() then
SetTagCheckedInstruction(tag_checked);
comparevalue = X[s];
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomicCompareAndSwap(address, comparevalue, newvalue, ldacctype, stacctype);
X[s] = ZeroExtend(data, regsize);

CASB, CASAB, CASALB, CASLB
Compare and Swap byte in memory reads an 8-bit byte from memory, and compares it against the value held in a first
register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the
read and write occur atomically such that no other modification of the memory location can take place between the
read and write.
• CASAB and CASALB load from memory with acquire semantics.
• CASLB and CASALB store to memory with release semantics.
• CASB has neither acquire nor release semantics.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, is
restored to the values held in the register before the instruction was executed.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
CASAB (L == 1 && o0 == 0)
CASAB <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASALB (L == 1 && o0 == 1)
CASALB <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASB (L == 0 && o0 == 0)
CASB <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASLB (L == 0 && o0 == 1)
CASLB <Ws>, <Wt>, [<Xn|SP>{,#0}]

Assembler Symbols
CASB, CASAB, CASALB,

Page 78
CASLB
Operation
bits(64) address;
bits(8) comparevalue;
bits(8) newvalue;
bits(8) data;
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
X[s] = ZeroExtend(data, 32);
CASB, CASAB, CASALB,

Page 79
CASLB
CASH, CASAH, CASALH, CASLH
Compare and Swap halfword in memory reads a 16-bit halfword from memory, and compares it against the value held
in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is
performed, the read and write occur atomically such that no other modification of the memory location can take place
between the read and write.
• CASAH and CASALH load from memory with acquire semantics.
• CASLH and CASALH store to memory with release semantics.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, is
restored to the values held in the register before the instruction was executed.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
CASAH (L == 1 && o0 == 0)
CASAH <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASALH (L == 1 && o0 == 1)
CASALH <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASH (L == 0 && o0 == 0)
CASH <Ws>, <Wt>, [<Xn|SP>{,#0}]
CASLH (L == 0 && o0 == 1)
CASLH <Ws>, <Wt>, [<Xn|SP>{,#0}]

Assembler Symbols
CASH, CASAH, CASALH,

Page 80
CASLH
Operation
bits(64) address;
bits(16) comparevalue;
bits(16) newvalue;
bits(16) data;
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
X[s] = ZeroExtend(data, 32);
CASH, CASAH, CASALH,

Page 81
CASLH
CASP, CASPA, CASPAL, CASPL
Compare and Swap Pair of words or doublewords in memory reads a pair of 32-bit words or 64-bit doublewords from
memory, and compares them against the values held in the first pair of registers. If the comparison is equal, the values
in the second pair of registers are written to memory. If the writes are performed, the reads and writes occur
atomically such that no other modification of the memory location can take place between the reads and writes.
• CASPA and CASPAL load from memory with acquire semantics.
• CASPL and CASPAL store to memory with release semantics.
If the instruction generates a synchronous Data Abort, the registers which are compared and loaded, that is <Ws>
and <W(s+1)>, or <Xs> and <X(s+1)>, are restored to the values held in the registers before the instruction was
executed.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 sz 0 0 1 0 0 0 0 L 1 Rs o0 1 1 1 1 1 Rn Rt
Rt2
CASP, CASPA, CASPAL,

Page 82
CASPL
32-bit CASP (sz == 0 && L == 0 && o0 == 0)
CASP <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]
32-bit CASPA (sz == 0 && L == 1 && o0 == 0)
CASPA <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]
32-bit CASPAL (sz == 0 && L == 1 && o0 == 1)
CASPAL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]
32-bit CASPL (sz == 0 && L == 0 && o0 == 1)
CASPL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]
64-bit CASP (sz == 1 && L == 0 && o0 == 0)
CASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]
64-bit CASPA (sz == 1 && L == 1 && o0 == 0)
CASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]
64-bit CASPAL (sz == 1 && L == 1 && o0 == 1)
CASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]
64-bit CASPL (sz == 1 && L == 0 && o0 == 1)
CASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]

if Rs<0> == '1' then UNDEFINED;
if Rt<0> == '1' then UNDEFINED;
integer datasize = 32 << UInt(sz);

Assembler Symbols
<Ws> Is the 32-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Ws> must be an even-numbered register.
<W(s+1)> Is the 32-bit name of the second general-purpose register to be compared and loaded.
<Wt> Is the 32-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Wt> must be an even-numbered register.
<W(t+1)> Is the 32-bit name of the second general-purpose register to be conditionally stored.
<Xs> Is the 64-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Xs> must be an even-numbered register.
<X(s+1)> Is the 64-bit name of the second general-purpose register to be compared and loaded.
<Xt> Is the 64-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Xt> must be an even-numbered register.

Page 83
CASPL
<X(t+1)> Is the 64-bit name of the second general-purpose register to be conditionally stored.
Operation
bits(64) address;
bits(2*datasize) comparevalue;
bits(2*datasize) newvalue;
bits(2*datasize) data;
bits(datasize) s1 = X[s];
bits(datasize) s2 = X[s+1];
bits(datasize) t1 = X[t];
bits(datasize) t2 = X[t+1];
comparevalue = if BigEndian() then s1:s2 else s2:s1;
newvalue = if BigEndian() then t1:t2 else t2:t1;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if BigEndian() then
X[s] = ZeroExtend(data<2*datasize-1:datasize>, datasize);
X[s+1] = ZeroExtend(data<datasize-1:0>, datasize);
else
X[s] = ZeroExtend(data<datasize-1:0>, datasize);
X[s+1] = ZeroExtend(data<2*datasize-1:datasize>, datasize);

Page 84
CASPL
CBNZ
Compare and Branch on Nonzero compares the value in a register with zero, and conditionally branches to a label at a
PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine call or return. This
instruction does not affect the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 1 imm19 Rt
op
32-bit (sf == 0)
CBNZ <Wt>, <label>
64-bit (sf == 1)
CBNZ <Xt>, <label>
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
Operation
bits(datasize) operand1 = X[t];
if IsZero(operand1) == FALSE then

CBNZ Page 85
CBZ
Compare and Branch on Zero compares the value in a register with zero, and conditionally branches to a label at a PC-
relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction
does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 0 imm19 Rt
op
32-bit (sf == 0)
CBZ <Wt>, <label>
64-bit (sf == 1)
CBZ <Xt>, <label>
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
Operation
bits(datasize) operand1 = X[t];
if IsZero(operand1) == TRUE then

CBZ Page 86
CCMN (immediate)
Conditional Compare Negative (immediate) sets the value of the condition flags to the result of the comparison of a
register value and a negated immediate value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMN <Wn>, #<imm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMN <Xn>, #<imm>, #<nzcv>, <cond>
bits(4) flags = nzcv;
bits(datasize) imm = ZeroExtend(imm5, datasize);
Assembler Symbols
<imm> Is a five bit unsigned (positive) immediate encoded in the "imm5" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
Operation
(-, flags) = AddWithCarry(operand1, imm, '0');
PSTATE.<N,Z,C,V> = flags;
If PSTATE.DIT is 1:
CCMN (immediate) Page 87

CCMN (register)
Conditional Compare Negative (register) sets the value of the condition flags to the result of the comparison of a
register value and the inverse of another register value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMN <Wn>, <Wm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMN <Xn>, <Xm>, #<nzcv>, <cond>
Assembler Symbols
Operation

(-, flags) = AddWithCarry(operand1, operand2, '0');
If PSTATE.DIT is 1:
CCMN (register) Page 88

CCMP (immediate)
Conditional Compare (immediate) sets the value of the condition flags to the result of the comparison of a register
value and an immediate value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMP <Wn>, #<imm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMP <Xn>, #<imm>, #<nzcv>, <cond>
bits(datasize) imm = ZeroExtend(imm5, datasize);
Assembler Symbols
<imm> Is a five bit unsigned (positive) immediate encoded in the "imm5" field.
Operation

bits(datasize) operand2;
operand2 = NOT(imm);
If PSTATE.DIT is 1:
CCMP (immediate) Page 89

CCMP (register)
Conditional Compare (register) sets the value of the condition flags to the result of the comparison of two registers if
the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMP <Wn>, <Wm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMP <Xn>, <Xm>, #<nzcv>, <cond>
Assembler Symbols
Operation

If PSTATE.DIT is 1:
CCMP (register) Page 90

CFINV
Invert Carry Flag. This instruction inverts the value of the PSTATE.C flag.
System
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 0 0 1 1 1 1 1
CRm
System
CFINV
if !HaveFlagManipulateExt() then UNDEFINED;
Operation
PSTATE.C = NOT(PSTATE.C);
If PSTATE.DIT is 1:
CFINV Page 91
CFP
Control Flow Prediction Restriction by Context prevents control flow predictions that predict execution addresses,
based on information gathered from earlier execution within a particular execution context, from allowing later
speculative execution within that context to be observable through side-channels.
For more information, see CFP RCTX, Control Flow Prediction Restriction by Context.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 Rt
L op1 CRn CRm op2
System
CFP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #4, <Xt>
Assembler Symbols
Operation
CFP Page 92
CINC
Conditional Increment returns, in the destination register, the value of the source register incremented by 1 if the
condition is TRUE, and otherwise returns the value of the source register.
This is an alias of CSINC. This means:
• The encodings in this description are named to match the encodings of CSINC.
• The description of CSINC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 1 != 11111 Rd
op Rm cond o2 Rn
32-bit (sf == 0)
CINC <Wd>, <Wn>, <cond>
is equivalent to
CSINC <Wd>, <Wn>, <Wn>, invert(<cond>)
and is the preferred disassembly when Rn == Rm.
64-bit (sf == 1)
CINC <Xd>, <Xn>, <cond>
is equivalent to
CSINC <Xd>, <Xn>, <Xn>, invert(<cond>)
Assembler Symbols
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSINC gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CINC Page 93
CINV
Conditional Invert returns, in the destination register, the bitwise inversion of the value of the source register if the
condition is TRUE, and otherwise returns the value of the source register.
This is an alias of CSINV. This means:
• The encodings in this description are named to match the encodings of CSINV.
• The description of CSINV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 0 != 11111 Rd
op Rm cond o2 Rn
32-bit (sf == 0)
CINV <Wd>, <Wn>, <cond>
is equivalent to
CSINV <Wd>, <Wn>, <Wn>, invert(<cond>)
64-bit (sf == 1)
CINV <Xd>, <Xn>, <cond>
is equivalent to
CSINV <Xd>, <Xn>, <Xn>, invert(<cond>)
Assembler Symbols
Operation
The description of CSINV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CINV Page 94
CLREX
Clear Exclusive clears the local monitor of the executing PE.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 0 1 0 1 1 1 1 1
System
CLREX {#<imm>}
// CRm field is ignored
Assembler Symbols
<imm> Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15 and encoded in the
"CRm" field.
Operation
ClearExclusiveLocal(ProcessorID());
CLREX Page 95
CLS
Count Leading Sign bits counts the number of leading bits of the source register that have the same value as the most
significant bit of the register, and writes the result to the destination register. This count does not include the most
significant bit of the source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 Rn Rd
op
32-bit (sf == 0)
CLS <Wd>, <Wn>
64-bit (sf == 1)
CLS <Xd>, <Xn>
Assembler Symbols
Operation
integer result;
result = CountLeadingSignBits(operand1);
X[d] = result<datasize-1:0>;
If PSTATE.DIT is 1:
CLS Page 96
CLZ
Count Leading Zeros counts the number of binary zero bits before the first binary one bit in the value of the source
register, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 Rn Rd
op
32-bit (sf == 0)
CLZ <Wd>, <Wn>
64-bit (sf == 1)
CLZ <Xd>, <Xn>
Assembler Symbols
Operation
integer result;
result = CountLeadingZeroBits(operand1);
If PSTATE.DIT is 1:
CLZ Page 97
CMN (extended register)
Compare Negative (extended register) adds a register value and a sign or zero-extended register value, followed by an
optional left shift amount. The argument that is extended from the <Rm> register can be a byte, halfword, word, or
doubleword. It updates the condition flags based on the result, and discards the result.
This is an alias of ADDS (extended register). This means:
• The encodings in this description are named to match the encodings of ADDS (extended register).
• The description of ADDS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMN <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
is equivalent to
ADDS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
CMN <Xn|SP>, <R><m>{, <extend> {#<amount>}}
is equivalent to
ADDS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
Assembler Symbols
field.
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
"Rm" field.
CMN (extended register) Page 98

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
Operation
The description of ADDS (extended register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMN (extended register) Page 99

CMN (immediate)
Compare Negative (immediate) adds a register value and an optionally-shifted immediate value. It updates the
condition flags based on the result, and discards the result.
This is an alias of ADDS (immediate). This means:
• The encodings in this description are named to match the encodings of ADDS (immediate).
• The description of ADDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMN <Wn|WSP>, #<imm>{, <shift>}
is equivalent to
ADDS WZR, <Wn|WSP>, #<imm> {, <shift>}
64-bit (sf == 1)
CMN <Xn|SP>, #<imm>{, <shift>}
is equivalent to
ADDS XZR, <Xn|SP>, #<imm> {, <shift>}
Assembler Symbols
sh <shift>
0 LSL #0
1 LSL #12
Operation
The description of ADDS (immediate) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMN (immediate) Page 100

CMN (shifted register)
Compare Negative (shifted register) adds a register value and an optionally-shifted register value. It updates the
condition flags based on the result, and discards the result.
This is an alias of ADDS (shifted register). This means:
• The encodings in this description are named to match the encodings of ADDS (shifted register).
• The description of ADDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMN <Wn>, <Wm>{, <shift> #<amount>}
is equivalent to
ADDS WZR, <Wn>, <Wm> {, <shift> #<amount>}
64-bit (sf == 1)
CMN <Xn>, <Xm>{, <shift> #<amount>}
is equivalent to
ADDS XZR, <Xn>, <Xm> {, <shift> #<amount>}
Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Operation
The description of ADDS (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMN (shifted register) Page 101

CMN (shifted register) Page 102

CMP (extended register)
Compare (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift
amount, from a register value. The argument that is extended from the <Rm> register can be a byte, halfword, word,
or doubleword. It updates the condition flags based on the result, and discards the result.
This is an alias of SUBS (extended register). This means:
• The encodings in this description are named to match the encodings of SUBS (extended register).
• The description of SUBS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMP <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
is equivalent to
SUBS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
CMP <Xn|SP>, <R><m>{, <extend> {#<amount>}}
is equivalent to
SUBS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
Assembler Symbols
field.
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
"Rm" field.
CMP (extended register) Page 103

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
Operation
The description of SUBS (extended register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMP (extended register) Page 104

CMP (immediate)
Compare (immediate) subtracts an optionally-shifted immediate value from a register value. It updates the condition
flags based on the result, and discards the result.
This is an alias of SUBS (immediate). This means:
• The encodings in this description are named to match the encodings of SUBS (immediate).
• The description of SUBS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMP <Wn|WSP>, #<imm>{, <shift>}
is equivalent to
SUBS WZR, <Wn|WSP>, #<imm> {, <shift>}
64-bit (sf == 1)
CMP <Xn|SP>, #<imm>{, <shift>}
is equivalent to
SUBS XZR, <Xn|SP>, #<imm> {, <shift>}
Assembler Symbols
sh <shift>
0 LSL #0
1 LSL #12
Operation
The description of SUBS (immediate) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMP (immediate) Page 105

CMP (shifted register)
Compare (shifted register) subtracts an optionally-shifted register value from a register value. It updates the condition
flags based on the result, and discards the result.
This is an alias of SUBS (shifted register). This means:
• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd
32-bit (sf == 0)
CMP <Wn>, <Wm>{, <shift> #<amount>}
is equivalent to
SUBS WZR, <Wn>, <Wm> {, <shift> #<amount>}
64-bit (sf == 1)
CMP <Xn>, <Xm>{, <shift> #<amount>}
is equivalent to
SUBS XZR, <Xn>, <Xm> {, <shift> #<amount>}
Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Operation
The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CMP (shifted register) Page 106

CMP (shifted register) Page 107

CMPP
Compare with Tag subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, updates the condition flags based on the result of the subtraction, and discards the result.
This is an alias of SUBPS. This means:
• The encodings in this description are named to match the encodings of SUBPS.
• The description of SUBPS gives the operational pseudocode for this instruction.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn 1 1 1 1 1
Xd
Integer
CMPP <Xn|SP>, <Xm|SP>
is equivalent to
SUBPS XZR, <Xn|SP>, <Xm|SP>
Assembler Symbols
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.
Operation
The description of SUBPS gives the operational pseudocode for this instruction.
CMPP Page 108

CNEG
Conditional Negate returns, in the destination register, the negated value of the source register if the condition is
TRUE, and otherwise returns the value of the source register.
This is an alias of CSNEG. This means:
• The encodings in this description are named to match the encodings of CSNEG.
• The description of CSNEG gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm != 111x 0 1 Rn Rd
op cond o2
32-bit (sf == 0)
CNEG <Wd>, <Wn>, <cond>
is equivalent to
CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
64-bit (sf == 1)
CNEG <Xd>, <Xn>, <cond>
is equivalent to
CSNEG <Xd>, <Xn>, <Xn>, invert(<cond>)
Assembler Symbols
Operation
The description of CSNEG gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CNEG Page 109

CPP
Cache Prefetch Prediction Restriction by Context prevents cache allocation predictions, based on information
gathered from earlier execution within a particular execution context, from allowing later speculative execution within
that context to be observable through side-channels.
For more information, see CPP RCTX, Cache Prefetch Prediction Restriction by Context.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 Rt
L op1 CRn CRm op2
System
CPP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #7, <Xt>
Assembler Symbols
Operation
CPP Page 110

CRC32B, CRC32H, CRC32W, CRC32X
CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x04C11DB7 is
used for the CRC calculation.
In Armv8-A, this is an OPTIONAL instruction, and in Armv8.1 it is mandatory for all implementations to implement it.
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 1 0 0 sz Rn Rd
C
CRC32B (sf == 0 && sz == 00)
CRC32B <Wd>, <Wn>, <Wm>
CRC32H (sf == 0 && sz == 01)
CRC32H <Wd>, <Wn>, <Wm>
CRC32W (sf == 0 && sz == 10)
CRC32W <Wd>, <Wn>, <Wm>
CRC32X (sf == 1 && sz == 11)
CRC32X <Wd>, <Wn>, <Xm>
if !HaveCRCExt() then UNDEFINED;

if sf == '1' && sz != '11' then UNDEFINED;
if sf == '0' && sz == '11' then UNDEFINED;
integer size = 8 << UInt(sz);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose accumulator output register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose accumulator input register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose data source register, encoded in the "Rm" field.
<Wm> Is the 32-bit name of the general-purpose data source register, encoded in the "Rm" field.
Operation
bits(32) acc = X[n]; // accumulator

bits(size) val = X[m]; // input value
bits(32) poly = 0x04C11DB7<31:0>;
bits(32+size) tempacc = BitReverse(acc):Zeros(size);

bits(size+32) tempval = BitReverse(val):Zeros(32);
// Poly32Mod2 on a bitstring does a polynomial Modulus over {0,1} operation

X[d] = BitReverse(Poly32Mod2(tempacc EOR tempval, poly));
CRC32B, CRC32H, CRC32W,

Page 111
CRC32X
If PSTATE.DIT is 1:
CRC32B, CRC32H, CRC32W,

Page 112
CRC32X
CRC32CB, CRC32CH, CRC32CW, CRC32CX
CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x1EDC6F41 is
used for the CRC calculation.
In Armv8-A, this is an OPTIONAL instruction, and in Armv8.1 it is mandatory for all implementations to implement it.
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 1 0 1 sz Rn Rd
C
CRC32CB (sf == 0 && sz == 00)
CRC32CB <Wd>, <Wn>, <Wm>
CRC32CH (sf == 0 && sz == 01)
CRC32CH <Wd>, <Wn>, <Wm>
CRC32CW (sf == 0 && sz == 10)
CRC32CW <Wd>, <Wn>, <Wm>
CRC32CX (sf == 1 && sz == 11)
CRC32CX <Wd>, <Wn>, <Xm>
if !HaveCRCExt() then UNDEFINED;

if sf == '1' && sz != '11' then UNDEFINED;
if sf == '0' && sz == '11' then UNDEFINED;
integer size = 8 << UInt(sz);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose accumulator output register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose accumulator input register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose data source register, encoded in the "Rm" field.
<Wm> Is the 32-bit name of the general-purpose data source register, encoded in the "Rm" field.
Operation
bits(32) acc = X[n]; // accumulator

bits(size) val = X[m]; // input value
bits(32) poly = 0x1EDC6F41<31:0>;
bits(32+size) tempacc = BitReverse(acc):Zeros(size);

bits(size+32) tempval = BitReverse(val):Zeros(32);
// Poly32Mod2 on a bitstring does a polynomial Modulus over {0,1} operation

X[d] = BitReverse(Poly32Mod2(tempacc EOR tempval, poly));
CRC32CB, CRC32CH,
Page 113
CRC32CW, CRC32CX
If PSTATE.DIT is 1:
CRC32CB, CRC32CH,
Page 114
CRC32CW, CRC32CX
CSDB
Consumption of Speculative Data Barrier is a memory barrier that controls speculative execution and data value
prediction.
No instruction other than branch instructions appearing in program order after the CSDB can be speculatively
executed using the results of any:
• Data value predictions of any instructions.
• PSTATE.{N,Z,C,V} predictions of any instructions other than conditional branch instructions appearing in
program order before the CSDB that have not been architecturally resolved.
• Predictions of SVE predication state for any SVE instructions.
For purposes of the definition of CSDB, PSTATE.{N,Z,C,V} is not considered a data value. This definition permits:
• Control flow speculation before and after the CSDB.
• Speculative execution of conditional data processing instructions after the CSDB, unless they use the results
of data value or PSTATE.{N,Z,C,V} predictions of instructions appearing in program order before the CSDB
that have not been architecturally resolved.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1
CRm op2
System
CSDB
// Empty.
Operation
CSDB Page 115

CSEL
Conditional Select returns, in the destination register, the value of the first source register if the condition is TRUE,
and otherwise returns the value of the second source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 Rm cond 0 0 Rn Rd
op o2
32-bit (sf == 0)
CSEL <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSEL <Xd>, <Xn>, <Xm>, <cond>
Assembler Symbols
Operation
result = operand1;
else
result = operand2;
X[d] = result;
If PSTATE.DIT is 1:
CSEL Page 116

CSET
Conditional Set sets the destination register to 1 if the condition is TRUE, and otherwise sets it to 0.
This is an alias of CSINC. This means:
• The encodings in this description are named to match the encodings of CSINC.
• The description of CSINC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 != 111x 0 1 1 1 1 1 1 Rd
op Rm cond o2 Rn
32-bit (sf == 0)
CSET <Wd>, <cond>
is equivalent to
CSINC <Wd>, WZR, WZR, invert(<cond>)
64-bit (sf == 1)
CSET <Xd>, <cond>
is equivalent to
CSINC <Xd>, XZR, XZR, invert(<cond>)
Assembler Symbols
Operation
The description of CSINC gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CSET Page 117

CSETM
Conditional Set Mask sets all bits of the destination register to 1 if the condition is TRUE, and otherwise sets all bits to
0.
This is an alias of CSINV. This means:
• The encodings in this description are named to match the encodings of CSINV.
• The description of CSINV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 1 1 1 1 1 != 111x 0 0 1 1 1 1 1 Rd
op Rm cond o2 Rn
32-bit (sf == 0)
CSETM <Wd>, <cond>
is equivalent to
CSINV <Wd>, WZR, WZR, invert(<cond>)
64-bit (sf == 1)
CSETM <Xd>, <cond>
is equivalent to
CSINV <Xd>, XZR, XZR, invert(<cond>)
Assembler Symbols
Operation
The description of CSINV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
CSETM Page 118

CSINC
Conditional Select Increment returns, in the destination register, the value of the first source register if the condition
is TRUE, and otherwise returns the value of the second source register incremented by 1.
This instruction is used by the aliases CINC, and CSET.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 Rm cond 0 1 Rn Rd
op o2
32-bit (sf == 0)
CSINC <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSINC <Xd>, <Xn>, <Xm>, <cond>
Assembler Symbols
Alias Conditions

CINC Rm != '11111' && cond != '111x' && Rn != '11111' && Rn == Rm
CSET Rm == '11111' && cond != '111x' && Rn == '11111'
Operation
result = operand1;
else
result = operand2 + 1;
X[d] = result;
If PSTATE.DIT is 1:
CSINC Page 119

CSINC Page 120

CSINV
Conditional Select Invert returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the bitwise inversion value of the second source register.
This instruction is used by the aliases CINV, and CSETM.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm cond 0 0 Rn Rd
op o2
32-bit (sf == 0)
CSINV <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSINV <Xd>, <Xn>, <Xm>, <cond>
Assembler Symbols
Alias Conditions

CINV Rm != '11111' && cond != '111x' && Rn != '11111' && Rn == Rm
CSETM Rm == '11111' && cond != '111x' && Rn == '11111'
Operation
result = operand1;
else
result = NOT(operand2);
X[d] = result;
If PSTATE.DIT is 1:
CSINV Page 121

CSINV Page 122

CSNEG
Conditional Select Negation returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the negated value of the second source register.
This instruction is used by the alias CNEG.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm cond 0 1 Rn Rd
op o2
32-bit (sf == 0)
CSNEG <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSNEG <Xd>, <Xn>, <Xm>, <cond>
Assembler Symbols
Alias Conditions

CNEG cond != '111x' && Rn == Rm
Operation
result = operand1;
else
result = NOT(operand2);
result = result + 1;
X[d] = result;
If PSTATE.DIT is 1:
CSNEG Page 123

CSNEG Page 124

DC
Data Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 CRm op2 Rt
L CRn
System
DC <dc_op>, <Xt>
is equivalent to
SYS #<op1>, C7, <Cm>, #<op2>, <Xt>
and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_DC.
Assembler Symbols
<dc_op> Is a DC instruction name, as listed for the DC system instruction group, encoded in “op1:CRm:op2”:
op1 CRm op2 <dc_op> Architectural Feature
000 0110 001 IVAC -
000 0110 010 ISW -
000 0110 011 IGVAC ARMv8.5-MemTag
000 0110 100 IGSW ARMv8.5-MemTag
000 0110 101 IGDVAC ARMv8.5-MemTag
000 0110 110 IGDSW ARMv8.5-MemTag
000 1010 010 CSW -
000 1010 100 CGSW ARMv8.5-MemTag
000 1010 110 CGDSW ARMv8.5-MemTag
000 1110 010 CISW -
000 1110 100 CIGSW ARMv8.5-MemTag
000 1110 110 CIGDSW ARMv8.5-MemTag
011 0100 001 ZVA -
011 0100 011 GVA ARMv8.5-MemTag
011 0100 100 GZVA ARMv8.5-MemTag
011 1010 001 CVAC -
011 1010 011 CGVAC ARMv8.5-MemTag
011 1010 101 CGDVAC ARMv8.5-MemTag
011 1011 001 CVAU -
011 1100 001 CVAP ARMv8.2-DCPoP
011 1100 011 CGVAP ARMv8.5-MemTag
011 1100 101 CGDVAP ARMv8.5-MemTag
011 1101 001 CVADP ARMv8.2-DCCVADP
011 1101 011 CGVADP ARMv8.5-MemTag
011 1101 101 CGDVADP ARMv8.5-MemTag
011 1110 001 CIVAC -
011 1110 011 CIGVAC ARMv8.5-MemTag
011 1110 101 CIGDVAC ARMv8.5-MemTag
DC Page 125
Operation
DC Page 126
DCPS1
Debug Change PE State to EL1, when executed in Debug state:

• If executed at EL0 changes the current Exception level and SP to EL1 using SP_EL1.
• Otherwise, if executed at ELx, selects SP_ELx.
The target exception level of a DCPS1 instruction is:
• EL1 if the instruction is executed at EL0.
• Otherwise, the Exception level at which the instruction is executed.
When the target Exception level of a DCPS1 instruction is ELx, on executing this instruction:
• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.
This instruction is UNDEFINED at EL0 in Non-secure state if EL2 is implemented and HCR_EL2.TGE == 1.
This instruction is always UNDEFINED in Non-debug state.
For more information on the operation of the DCPSn instructions, see DCPS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 1 0 1 imm16 0 0 0 0 1
LL
System
DCPS1 {#<imm>}
if !Halted() then UNDEFINED;
Assembler Symbols
"imm16" field.
Operation
DCPSInstruction(LL);
DCPS1 Page 127

DCPS2

• If executed at EL0 or EL1 changes the current Exception level and SP to EL2 using SP_EL2.
• Otherwise, if executed at ELx, selects SP_ELx.
The target exception level of a DCPS2 instruction is:
• EL2 if the instruction is executed at an exception level that is not EL3.
• EL3 if the instruction is executed at EL3.
When the target Exception level of a DCPS2 instruction is ELx, on executing this instruction:
• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.
This instruction is UNDEFINED at the following exception levels:
• All exception levels if EL2 is not implemented.
• At EL0 and EL1 if EL2 is disabled in the current Security state.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 1 0 1 imm16 0 0 0 1 0
LL
System
DCPS2 {#<imm>}
Assembler Symbols
"imm16" field.
Operation
DCPS2 Page 128

DCPS3

• If executed at EL3 selects SP_EL3.
• Otherwise, changes the current Exception level and SP to EL3 using SP_EL3.
The target exception level of a DCPS3 instruction is EL3.
On executing a DCPS3 instruction:
• ELR_EL3 becomes UNKNOWN.
• SPSR_EL3 becomes UNKNOWN.
• ESR_EL3 becomes UNKNOWN.
• The endianness is set according to SCTLR_EL3.EE.
This instruction is UNDEFINED at all exception levels if either:
• EDSCR.SDD == 1.
• EL3 is not implemented.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 1 0 1 imm16 0 0 0 1 1
LL
System
DCPS3 {#<imm>}
Assembler Symbols
"imm16" field.
Operation
DCPS3 Page 129

DGH
DGH is a hint instruction. A DGH instruction is not expected to be performance optimal to merge memory accesses with
Normal Non-cacheable or Device-GRE attributes appearing in program order before the hint instruction with any
memory accesses appearing after the hint instruction into a single memory transaction on an interconnect.
System
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 1 1 1 1 1
CRm op2
System
DGH
if !HaveDGHExt() then EndOfInstruction();
Operation
Hint_DGH();
DGH Page 130

DMB
Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses, see Data
Memory Barrier.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 0 1 1 1 1 1 1
opc
System
DMB <option>|#<imm>
case CRm<3:2> of
when '00' domain = MBReqDomain_OuterShareable;
when '01' domain = MBReqDomain_Nonshareable;
when '10' domain = MBReqDomain_InnerShareable;
when '11' domain = MBReqDomain_FullSystem;
case CRm<1:0> of
when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
when '01' types = MBReqTypes_Reads;
when '10' types = MBReqTypes_Writes;
when '11' types = MBReqTypes_All;
Assembler Symbols
<option> Specifies the limitation on the barrier operation. Values are:

SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system barrier.
Encoded as CRm = 0b1111.
ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.
LD
Full system is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1101.
ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b1011.
ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b1010.
ISHLD
Inner Shareable is the required shareability domain, reads are the required access type before the
NSH
Non-shareable is the required shareability domain, reads and writes are the required access, both
NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
NSHLD
Non-shareable is the required shareability domain, reads are the required access type before the
DMB Page 131

OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
OSHLD
Outer Shareable is the required shareability domain, reads are the required access type before the
All other encodings of CRm that are not listed above are reserved, and can be encoded using the
#<imm> syntax. All unsupported and reserved options must execute as a full system barrier operation,
but software must not rely on this behavior. For more information on whether an access is before or
after a barrier instruction, see Data Memory Barrier (DMB) or see Data Synchronization Barrier (DSB).
<imm> Is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the "CRm" field.
Operation
DataMemoryBarrier(domain, types);
DMB Page 132

DRPS
Debug restore process state.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
System
DRPS
if !Halted() || PSTATE.EL == EL0 then UNDEFINED;
Operation
DRPSInstruction();
DRPS Page 133

DSB
Data Synchronization Barrier is a memory barrier that ensures the completion of memory accesses, see Data
Synchronization Barrier.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 != 0x00 1 0 0 1 1 1 1 1
CRm opc
System
DSB <option>|#<imm>
case CRm<3:2> of
when '00' domain = MBReqDomain_OuterShareable;
when '01' domain = MBReqDomain_Nonshareable;
when '10' domain = MBReqDomain_InnerShareable;
when '11' domain = MBReqDomain_FullSystem;
case CRm<1:0> of
when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
when '01' types = MBReqTypes_Reads;
when '10' types = MBReqTypes_Writes;
when '11' types = MBReqTypes_All;
Assembler Symbols
<option> Specifies the limitation on the barrier operation. Values are:

SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system barrier.
Encoded as CRm = 0b1111.
ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.
LD
Full system is the required shareability domain, reads are the required access type before the
ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
ISHLD
Inner Shareable is the required shareability domain, reads are the required access type before the
NSH
Non-shareable is the required shareability domain, reads and writes are the required access, both
NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
NSHLD
Non-shareable is the required shareability domain, reads are the required access type before the
DSB Page 134

OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
OSHLD
Outer Shareable is the required shareability domain, reads are the required access type before the
All other encodings of CRm, other than the values 0b0000 and 0b0100, that are not listed above are
reserved, and can be encoded using the #<imm> syntax. All unsupported and reserved options must
execute as a full system barrier operation, but software must not rely on this behavior. For more
information on whether an access is before or after a barrier instruction, see Data Memory Barrier
(DMB) or see Data Synchronization Barrier (DSB).
The value 0b0000 is used to encode SSBB and the value 0b0100 is used to encode PSSBB.
Operation
DataSynchronizationBarrier(domain, types);
DSB Page 135

DVP
Data Value Prediction Restriction by Context prevents data value predictions, based on information gathered from
earlier execution within an particular execution context, from allowing later speculative execution within that context
to be observable through side-channels.
For more information, see DVP RCTX, Data Value Prediction Restriction by Context.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 Rt
L op1 CRn CRm op2
System
DVP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #5, <Xt>
Assembler Symbols
Operation
DVP Page 136

EON (shifted register)
Bitwise Exclusive OR NOT (shifted register) performs a bitwise Exclusive OR NOT of a register value and an
optionally-shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
EON <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
EON <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation

result = operand1 EOR operand2;
X[d] = result;
EON (shifted register) Page 137

If PSTATE.DIT is 1:
EON (shifted register) Page 138

EOR (immediate)
Bitwise Exclusive OR (immediate) performs a bitwise Exclusive OR of a register value and an immediate value, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
EOR <Wd|WSP>, <Wn>, #<imm>
64-bit (sf == 1)
EOR <Xd|SP>, <Xn>, #<imm>
bits(datasize) imm;
Assembler Symbols
field.
field.
Operation
result = operand1 EOR imm;
if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
EOR (immediate) Page 139

EOR (shifted register)
Bitwise Exclusive OR (shifted register) performs a bitwise Exclusive OR of a register value and an optionally-shifted
register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
EOR <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
EOR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation

X[d] = result;
If PSTATE.DIT is 1:
EOR (shifted register) Page 140

EOR (shifted register) Page 141

ERET
Exception Return using the ELR and SPSR for the current Exception level. When executed, the PE restores PSTATE
from the SPSR, and branches to the address held in the ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERET is UNDEFINED at EL0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
A M Rn op4
System
ERET
if PSTATE.EL == EL0 then UNDEFINED;
Operation
AArch64.CheckForERetTrap(FALSE, TRUE);
bits(64) target = ELR[];
AArch64.ExceptionReturn(target, SPSR[]);
ERET Page 142

ERETAA, ERETAB
Exception Return, with pointer authentication. This instruction authenticates the address in ELR, using SP as the
modifier and the specified key, the PE restores PSTATE from the SPSR for the current Exception level, and branches to
the authenticated address.
Key A is used for ERETAA, and key B is used for ERETAB.
The authenticated address is not written back to ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERETAA and ERETAB are UNDEFINED at EL0.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
A Rn op4
ERETAA (M == 0)
ERETAA
ERETAB (M == 1)
ERETAB

UNDEFINED;
Operation
AArch64.CheckForERetTrap(TRUE, use_key_a);
bits(64) target;
if use_key_a then
target = AuthIA(ELR[], SP[], TRUE);
else
target = AuthIB(ELR[], SP[], TRUE);
AArch64.ExceptionReturn(target, SPSR[]);
ERETAA, ERETAB Page 143

ESB
Error Synchronization Barrier is an error synchronization event that might also update DISR_EL1 and VDISR_EL2.
This instruction can be used at all Exception levels and in Debug state.
In Debug state, this instruction behaves as if SError interrupts are masked at all Exception levels. See Error
Synchronization Barrier in the Arm(R) Reliability, Availability, and Serviceability (RAS) Specification, Armv8, for
Armv8-A architecture profile.
If the RAS Extension is not implemented, this instruction executes as a NOP.
System
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1
CRm op2
System
ESB
if !HaveRASExt() then EndOfInstruction();
Operation
ESB Page 144

EXTR
Extract register extracts a register from a pair of registers.
This instruction is used by the alias ROR (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 1 N 0 Rm imms Rn Rd
32-bit (sf == 0 && N == 0 && imms == 0xxxxx)
EXTR <Wd>, <Wn>, <Wm>, #<lsb>
64-bit (sf == 1 && N == 1)
EXTR <Xd>, <Xn>, <Xm>, #<lsb>
integer lsb;
if N != sf then UNDEFINED;
if sf == '0' && imms<5> == '1' then UNDEFINED;
lsb = UInt(imms);
Assembler Symbols
<lsb> For the 32-bit variant: is the least significant bit position from which to extract, in the range 0 to 31,
For the 64-bit variant: is the least significant bit position from which to extract, in the range 0 to 63,
Alias Conditions

ROR (immediate) Rn == Rm
Operation
bits(2*datasize) concat = operand1:operand2;
result = concat<lsb+datasize-1:lsb>;
X[d] = result;
EXTR Page 145

If PSTATE.DIT is 1:
EXTR Page 146

GMI
Tag Mask Insert inserts the tag in the first source register into the excluded set specified in the second source
register, writing the new excluded set to the destination register.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 1 Xn Xd
Integer
GMI <Xd>, <Xn|SP>, <Xm>

integer m = UInt(Xm);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field.
Operation
bits(64) address = if n == 31 then SP[] else X[n];

bits(64) mask = X[m];
bits(4) tag = AArch64.AllocationTagFromAddress(address);
mask<UInt(tag)> = '1';
X[d] = mask;
GMI Page 147

HINT
Hint instruction is for the instruction set space that is reserved for architectural hint instructions.
Some encodings described here are not allocated in this revision of the architecture, and behave as NOPs. These
encodings might be allocated to other hint functionality in future revisions of the architecture and therefore must not
be used by software.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 CRm op2 1 1 1 1 1
System
HINT #<imm>
SystemHintOp op;
case CRm:op2 of
when '0000 000' op = SystemHintOp_NOP;
when '0000 001' op = SystemHintOp_YIELD;
when '0000 010' op = SystemHintOp_WFE;
when '0000 011' op = SystemHintOp_WFI;
when '0000 100' op = SystemHintOp_SEV;
when '0000 101' op = SystemHintOp_SEVL;
when '0000 110'
if !HaveDGHExt() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_DGH;
when '0001 xxx'
case op2 of
when '000' SEE "PACIA1716";
when '010' SEE "PACIB1716";
when '100' SEE "AUTIA1716";
when '110' SEE "AUTIB1716";
otherwise EndOfInstruction();
when '0010 000'
if !HaveRASExt() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_ESB;
when '0010 001'
if !HaveStatisticalProfiling() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_PSB;
when '0010 010'
if !HaveSelfHostedTrace() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_TSB;
when '0010 100'
op = SystemHintOp_CSDB;
when '0011 xxx'
case op2 of
when '000' SEE "PACIAZ";
when '001' SEE "PACIASP";
when '010' SEE "PACIBZ";
when '011' SEE "PACIBSP";
when '100' SEE "AUTIAZ";
when '101' SEE "AUTHASP";
when '110' SEE "AUTIBZ";
when '111' SEE "AUTIBSP";
when '0100 xx0'
op = SystemHintOp_BTI;
// Check branch target compatibility between BTI instruction and PSTATE.BTYPE
SetBTypeCompatible(BTypeCompatible_BTI(op2<2:1>));
otherwise EndOfInstruction();
Assembler Symbols
<imm> Is a 7-bit unsigned immediate, in the range 0 to 127 encoded in the "CRm:op2" field.
HINT Page 148

The encodings that are allocated to architectural hint functionality are described in the "Hints" table in
the "Index by Encoding".
For allocated encodings of "CRm:op2":
• A disassembler will disassemble the allocated instruction, rather than the HINT instruction.
• An assembler may support assembly of allocated encodings using HINT with the
corresponding <imm> value, but it is not required to do so.
HINT Page 149

Operation
HINT Page 150

case op of
when SystemHintOp_YIELD
Hint_Yield();
when SystemHintOp_DGH
Hint_DGH();
when SystemHintOp_WFE
else
trap = FALSE;
- = SCTLR[];
target_el = EL1;
else
target_el = EL2;
else

target_el = EL3;
else

(delay_enabled, delay) = WFETrapDelay(target_el); // (If trap delay is enabled, Delay
else
WaitForEvent();
when SystemHintOp_WFI
WaitForInterrupt();
when SystemHintOp_SEV
SendEvent();
when SystemHintOp_SEVL
SendEventLocal();
when SystemHintOp_ESB
HINT Page 151

when SystemHintOp_PSB
when SystemHintOp_TSB
when SystemHintOp_CSDB
when SystemHintOp_BTI
SetBTypeNext('00');
otherwise // do nothing
HINT Page 152

HLT
Halt instruction. A HLTinstruction can generate a Halt Instruction debug event, which causes entry into Debug state.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 1 0 imm16 0 0 0 0 0
System
HLT #<imm>
if EDSCR.HDE == '0' || !HaltingAllowed() then UNDEFINED;

SetBTypeCompatible(TRUE);
Assembler Symbols
Operation
Halt(DebugHalt_HaltInstruction);
HLT Page 153

HVC
Hypervisor Call causes an exception to EL2. Non-secure software executing at EL1 can use this instruction to call the
hypervisor to request a service.
The HVC instruction is UNDEFINED:
• At EL0.
• At EL1 if EL2 is not enabled in the current Security state.
• When SCR_EL3.HCE is set to 0.
On executing an HVC instruction, the PE records the exception as a Hypervisor Call exception in ESR_ELx, using the
EC value 0x16, and the value of the immediate argument.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 1 0
System
HVC #<imm>
// Empty.
Assembler Symbols
Operation
if !HaveEL(EL2) || PSTATE.EL == EL0 || (PSTATE.EL == EL1 && (!IsSecureEL2Enabled() && IsSecure())) then
UNDEFINED;
hvc_enable = if HaveEL(EL3) then SCR_EL3.HCE else NOT(HCR_EL2.HCD);
if hvc_enable == '0' then

UNDEFINED;
else
AArch64.CallHypervisor(imm16);
HVC Page 154

IC
Instruction Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and
address translation instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 CRm op2 Rt
L CRn
System
IC <ic_op>{, <Xt>}
is equivalent to
SYS #<op1>, C7, <Cm>, #<op2>{, <Xt>}
and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_IC.
Assembler Symbols
<ic_op> Is an IC instruction name, as listed for the IC system instruction pages, encoded in “op1:CRm:op2”:
op1 CRm op2 <ic_op>
000 0001 000 IALLUIS
000 0101 000 IALLU
011 0101 001 IVAU
<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.
Operation
IC Page 155
IRG
Insert Random Tag inserts a random Logical Address Tag into the address in the first source register, and writes the
result to the destination register. Any tags specified in the optional second source register or in GCR_EL1.Exclude are
excluded from the selection of the random Logical Address Tag.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 0 Xn Xd
Integer
IRG <Xd|SP>, <Xn|SP>{, <Xm>}

Assembler Symbols
field.
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field. Defaults to
XZR if absent.
Operation
bits(64) operand = if n == 31 then SP[] else X[n];

bits(64) exclude_reg = X[m];
bits(16) exclude = exclude_reg<15:0> OR GCR_EL1.Exclude;
if GCR_EL1.RRND == '1' then
rtag = _ChooseRandomNonExcludedTag(exclude);
else
bits(4) start = RGSR_EL1.TAG;
bits(4) offset = AArch64.RandomTag();
rtag = AArch64.ChooseNonExcludedTag(start, offset, exclude);
RGSR_EL1.TAG = rtag;
else
rtag = '0000';
bits(64) result = AArch64.AddressWithAllocationTag(operand, AccType_NORMAL, rtag);
if d == 31 then
SP[] = result;
else
X[d] = result;
IRG Page 156

ISB
Instruction Synchronization Barrier flushes the pipeline in the PE and is a context synchronization event. For more
information, see Instruction Synchronization Barrier (ISB).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 1 0 1 1 1 1 1
opc
System
ISB {<option>|#<imm>}
// No additional decoding required
Assembler Symbols
<option> Specifies an optional limitation on the barrier operation. Values are:

SY
Full system barrier operation, encoded as CRm = 0b1111. Can be omitted.
All other encodings of CRm are reserved. The corresponding instructions execute as full system barrier
operations, but must not be relied upon by software.
"CRm" field.
Operation
InstructionSynchronizationBarrier();
ISB Page 157

LDADD, LDADDA, LDADDAL, LDADDL
Atomic add on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, adds
the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not one of WZR or XZR, LDADDA and LDADDAL load from memory with acquire
semantics.
• LDADDL and LDADDAL store to memory with release semantics.
• LDADD has neither acquire nor release semantics.
This instruction is used by the alias STADD, STADDL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
LDADD, LDADDA, LDADDAL,

Page 158
LDADDL
32-bit LDADD (size == 10 && A == 0 && R == 0)
LDADD <Ws>, <Wt>, [<Xn|SP>]
32-bit LDADDA (size == 10 && A == 1 && R == 0)
LDADDA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDADDAL (size == 10 && A == 1 && R == 1)
LDADDAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDADDL (size == 10 && A == 0 && R == 1)
LDADDL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDADD (size == 11 && A == 0 && R == 0)
LDADD <Xs>, <Xt>, [<Xn|SP>]
64-bit LDADDA (size == 11 && A == 1 && R == 0)
LDADDA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDADDAL (size == 11 && A == 1 && R == 1)
LDADDAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDADDL (size == 11 && A == 0 && R == 1)
LDADDL <Xs>, <Xt>, [<Xn|SP>]

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
Alias Conditions

STADD, STADDL A == '0' && Rt == '11111'

Page 159
LDADDL
Operation
bits(64) address;
bits(datasize) value;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_ADD, value, ldacctype, stacctype);
if t != 31 then
X[t] = ZeroExtend(data, regsize);
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

Page 160
LDADDL
LDADDB, LDADDAB, LDADDALB, LDADDLB
Atomic add on byte in memory atomically loads an 8-bit byte from memory, adds the value held in a register to it, and
stores the result back to memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDADDAB and LDADDALB load from memory with acquire semantics.
• LDADDLB and LDADDALB store to memory with release semantics.
• LDADDB has neither acquire nor release semantics.
This instruction is used by the alias STADDB, STADDLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
LDADDAB (A == 1 && R == 0)
LDADDAB <Ws>, <Wt>, [<Xn|SP>]
LDADDALB (A == 1 && R == 1)
LDADDALB <Ws>, <Wt>, [<Xn|SP>]
LDADDB (A == 0 && R == 0)
LDADDB <Ws>, <Wt>, [<Xn|SP>]
LDADDLB (A == 0 && R == 1)
LDADDLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STADDB, STADDLB A == '0' && Rt == '11111'
LDADDB, LDADDAB,
Page 161
LDADDALB, LDADDLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, 32);
LDADDB, LDADDAB,
Page 162
LDADDALB, LDADDLB
LDADDH, LDADDAH, LDADDALH, LDADDLH
Atomic add on halfword in memory atomically loads a 16-bit halfword from memory, adds the value held in a register
to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, LDADDAH and LDADDALH load from memory with acquire semantics.
• LDADDLH and LDADDALH store to memory with release semantics.
• LDADDH has neither acquire nor release semantics.
This instruction is used by the alias STADDH, STADDLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
LDADDAH (A == 1 && R == 0)
LDADDAH <Ws>, <Wt>, [<Xn|SP>]
LDADDALH (A == 1 && R == 1)
LDADDALH <Ws>, <Wt>, [<Xn|SP>]
LDADDH (A == 0 && R == 0)
LDADDH <Ws>, <Wt>, [<Xn|SP>]
LDADDLH (A == 0 && R == 1)
LDADDLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STADDH, STADDLH A == '0' && Rt == '11111'
LDADDH, LDADDAH,
Page 163
LDADDALH, LDADDLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDADDH, LDADDAH,
Page 164
LDADDALH, LDADDLH
LDAPR
Load-Acquire RCpc Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword
from the derived address in memory, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
32-bit (size == 10)
LDAPR <Wt>, [<Xn|SP> {,#0}]
64-bit (size == 11)
LDAPR <Xt>, [<Xn|SP> {,#0}]
integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, dbytes, AccType_ORDERED];

LDAPR Page 165

LDAPR Page 166

LDAPRB
Load-Acquire RCpc Register Byte derives an address from a base register value, loads a byte from the derived address
in memory, zero-extends it and writes it to a register.
except that:
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
Integer
LDAPRB <Wt>, [<Xn|SP> {,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, 1, AccType_ORDERED];

LDAPRB Page 167

LDAPRH
Load-Acquire RCpc Register Halfword derives an address from a base register value, loads a halfword from the
derived address in memory, zero-extends it and writes it to a register.
except that:
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
Integer
LDAPRH <Wt>, [<Xn|SP> {,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDAPRH Page 168

LDAPUR
Load-Acquire RCpc Register (unscaled) calculates an address from a base register and an immediate offset, loads a
32-bit word or 64-bit doubleword from memory, zero-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
32-bit (size == 10)
LDAPUR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
LDAPUR <Xt>, [<Xn|SP>{, #<simm>}]
integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer regsize;
regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;
LDAPUR Page 169

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
address = address + offset;
data = Mem[address, datasize DIV 8, AccType_ORDERED];

LDAPUR Page 170

LDAPURB
Load-Acquire RCpc Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads
a byte from memory, zero-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDAPURB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDAPURB Page 171

LDAPURB Page 172

LDAPURH
Load-Acquire RCpc Register Halfword (unscaled) calculates an address from a base register and an immediate offset,
loads a halfword from memory, zero-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDAPURH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDAPURH Page 173

LDAPURH Page 174

LDAPURSB
Load-Acquire RCpc Register Signed Byte (unscaled) calculates an address from a base register and an immediate
offset, loads a signed byte from memory, sign-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc
32-bit (opc == 11)
LDAPURSB <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDAPURSB <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
MemOp memop;
boolean signed;
integer regsize;
if opc<1> == '0' then

// store or zero-extending load
memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
regsize = 32;
signed = FALSE;
else
// sign-extending load
memop = MemOp_LOAD;
regsize = if opc<0> == '1' then 32 else 64;
signed = TRUE;
boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);
LDAPURSB Page 175

Operation
bits(64) address;
bits(8) data;
if n == 31 then
if memop != MemOp_PREFETCH then CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
Mem[address, 1, AccType_ORDERED] = data;
when MemOp_LOAD
if signed then
X[t] = SignExtend(data, regsize);
else
when MemOp_PREFETCH
Prefetch(address, t<4:0>);
LDAPURSB Page 176

LDAPURSH
Load-Acquire RCpc Register Signed Halfword (unscaled) calculates an address from a base register and an immediate
offset, loads a signed halfword from memory, sign-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc
32-bit (opc == 11)
LDAPURSH <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDAPURSH <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
LDAPURSH Page 177

Operation
bits(64) address;
bits(16) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDAPURSH Page 178

LDAPURSW
Load-Acquire RCpc Register Signed Word (unscaled) calculates an address from a base register and an immediate
offset, loads a signed word from memory, sign-extends it, and writes it to a register.
except that:
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 0 1 1 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDAPURSW <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(32) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

X[t] = SignExtend(data, 64);
LDAPURSW Page 179

LDAPURSW Page 180

LDAR
Load-Acquire Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from
memory, and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses, see Load/Store addressing modes.
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the acquire
semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
LDAR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
LDAR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, dbytes, AccType_ORDERED];

LDAR Page 181

LDARB
Load-Acquire Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it
and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire, Store-
Release. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDARB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDARB Page 182

LDARH
Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-
extends it, and writes it to a register. The instruction also has memory ordering semantics as described in Load-
Acquire, Store-Release. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDARH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDARH Page 183

LDAXP
Load-Acquire Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two
64-bit doublewords from memory, and writes them to two registers. A 32-bit pair requires the address to be
doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be
quadword aligned and is single-copy atomic for each doubleword at doubleword granularity. The PE marks the
physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive
instructions. See Synchronization and semaphores. The instruction also has memory ordering semantics as described
in Load-Acquire, Store-Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 1 1 (1) (1) (1) (1) (1) 1 Rt2 Rn Rt
L Rs o0
32-bit (sz == 0)
LDAXP <Wt1>, <Wt2>, [<Xn|SP>{,#0}]
64-bit (sz == 1)
LDAXP <Xt1>, <Xt2>, [<Xn|SP>{,#0}]
integer t2 = UInt(Rt2);
integer elsize = 32 << UInt(sz);

integer datasize = elsize * 2;
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDAXP.
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
LDAXP Page 184

Operation
bits(64) address;
constant integer dbytes = datasize DIV 8;
boolean rt_unknown = FALSE;
if t == t2 then
Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
case c of
when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
when Constraint_UNDEF UNDEFINED;
when Constraint_NOP EndOfInstruction();
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
// Tell the Exclusives monitors to record a sequence of one or more atomic

// memory reads from virtual address range [address, address+dbytes-1].
// The Exclusives monitor will only be set if all the reads are from the
// same dbytes-aligned physical address, to allow for the possibility of
// an atomicity break if the translation is changed between reads.
AArch64.SetExclusiveMonitors(address, dbytes);
if rt_unknown then
// ConstrainedUNPREDICTABLE case
X[t] = bits(datasize) UNKNOWN; // In this case t = t2
elsif elsize == 32 then
// 32-bit load exclusive pair (atomic)
data = Mem[address, dbytes, AccType_ORDEREDATOMIC];
if BigEndian() then
X[t] = data<datasize-1:elsize>;
X[t2] = data<elsize-1:0>;
else
X[t] = data<elsize-1:0>;
X[t2] = data<datasize-1:elsize>;
else // elsize == 64
// 64-bit load exclusive pair (not atomic),
// but must be 128-bit aligned
if address != Align(address, dbytes) then
AArch64.Abort(address, AArch64.AlignmentFault(AccType_ORDEREDATOMIC, FALSE, FALSE));
X[t] = Mem[address, 8, AccType_ORDEREDATOMIC];
X[t2] = Mem[address+8, 8, AccType_ORDEREDATOMIC];
LDAXP Page 185

LDAXR
Load-Acquire Exclusive Register derives an address from a base register value, loads a 32-bit word or 64-bit
doubleword from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
LDAXR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
LDAXR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

data = Mem[address, dbytes, AccType_ORDEREDATOMIC];

LDAXR Page 186

LDAXR Page 187

LDAXRB
Load-Acquire Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-
extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. The instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For
information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDAXRB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

AArch64.SetExclusiveMonitors(address, 1);
data = Mem[address, 1, AccType_ORDEREDATOMIC];

LDAXRB Page 188

LDAXRH
Load-Acquire Exclusive Register Halfword derives an address from a base register value, loads a halfword from
memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDAXRH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

data = Mem[address, 2, AccType_ORDEREDATOMIC];

LDAXRH Page 189

LDCLR, LDCLRA, LDCLRAL, LDCLRL
Atomic bit clear on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to
memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDCLRA and LDCLRAL load from memory with acquire
semantics.
• LDCLRL and LDCLRAL store to memory with release semantics.
• LDCLR has neither acquire nor release semantics.
This instruction is used by the alias STCLR, STCLRL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
LDCLR, LDCLRA, LDCLRAL,

Page 190
LDCLRL
32-bit LDCLR (size == 10 && A == 0 && R == 0)
LDCLR <Ws>, <Wt>, [<Xn|SP>]
32-bit LDCLRA (size == 10 && A == 1 && R == 0)
LDCLRA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDCLRAL (size == 10 && A == 1 && R == 1)
LDCLRAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDCLRL (size == 10 && A == 0 && R == 1)
LDCLRL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDCLR (size == 11 && A == 0 && R == 0)
LDCLR <Xs>, <Xt>, [<Xn|SP>]
64-bit LDCLRA (size == 11 && A == 1 && R == 0)
LDCLRA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDCLRAL (size == 11 && A == 1 && R == 1)
LDCLRAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDCLRL (size == 11 && A == 0 && R == 1)
LDCLRL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STCLR, STCLRL A == '0' && Rt == '11111'

Page 191
LDCLRL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_BIC, value, ldacctype, stacctype);
if t != 31 then

Page 192
LDCLRL
LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB
Atomic bit clear on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise AND with the
complement of the value held in a register on it, and stores the result back to memory. The value initially loaded from
memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAB and LDCLRALB load from memory with acquire semantics.
• LDCLRLB and LDCLRALB store to memory with release semantics.
• LDCLRB has neither acquire nor release semantics.
This instruction is used by the alias STCLRB, STCLRLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
LDCLRAB (A == 1 && R == 0)
LDCLRAB <Ws>, <Wt>, [<Xn|SP>]
LDCLRALB (A == 1 && R == 1)
LDCLRALB <Ws>, <Wt>, [<Xn|SP>]
LDCLRB (A == 0 && R == 0)
LDCLRB <Ws>, <Wt>, [<Xn|SP>]
LDCLRLB (A == 0 && R == 1)
LDCLRLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STCLRB, STCLRLB A == '0' && Rt == '11111'
LDCLRB, LDCLRAB,
Page 193
LDCLRALB, LDCLRLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDCLRB, LDCLRAB,
Page 194
LDCLRALB, LDCLRLB
LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH
Atomic bit clear on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise AND with
the complement of the value held in a register on it, and stores the result back to memory. The value initially loaded
from memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAH and LDCLRALH load from memory with acquire semantics.
• LDCLRLH and LDCLRALH store to memory with release semantics.
• LDCLRH has neither acquire nor release semantics.
This instruction is used by the alias STCLRH, STCLRLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
LDCLRAH (A == 1 && R == 0)
LDCLRAH <Ws>, <Wt>, [<Xn|SP>]
LDCLRALH (A == 1 && R == 1)
LDCLRALH <Ws>, <Wt>, [<Xn|SP>]
LDCLRH (A == 0 && R == 0)
LDCLRH <Ws>, <Wt>, [<Xn|SP>]
LDCLRLH (A == 0 && R == 1)
LDCLRLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STCLRH, STCLRLH A == '0' && Rt == '11111'
LDCLRH, LDCLRAH,
Page 195
LDCLRALH, LDCLRLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDCLRH, LDCLRAH,
Page 196
LDCLRALH, LDCLRLH
LDEOR, LDEORA, LDEORAL, LDEORL
Atomic exclusive OR on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDEORA and LDEORAL load from memory with acquire
semantics.
• LDEORL and LDEORAL store to memory with release semantics.
• LDEOR has neither acquire nor release semantics.
This instruction is used by the alias STEOR, STEORL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
LDEOR, LDEORA, LDEORAL,

Page 197
LDEORL
32-bit LDEOR (size == 10 && A == 0 && R == 0)
LDEOR <Ws>, <Wt>, [<Xn|SP>]
32-bit LDEORA (size == 10 && A == 1 && R == 0)
LDEORA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDEORAL (size == 10 && A == 1 && R == 1)
LDEORAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDEORL (size == 10 && A == 0 && R == 1)
LDEORL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDEOR (size == 11 && A == 0 && R == 0)
LDEOR <Xs>, <Xt>, [<Xn|SP>]
64-bit LDEORA (size == 11 && A == 1 && R == 0)
LDEORA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDEORAL (size == 11 && A == 1 && R == 1)
LDEORAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDEORL (size == 11 && A == 0 && R == 1)
LDEORL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STEOR, STEORL A == '0' && Rt == '11111'

Page 198
LDEORL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_EOR, value, ldacctype, stacctype);
if t != 31 then

Page 199
LDEORL
LDEORB, LDEORAB, LDEORALB, LDEORLB
Atomic exclusive OR on byte in memory atomically loads an 8-bit byte from memory, performs an exclusive OR with
the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is
• If the destination register is not WZR, LDEORAB and LDEORALB load from memory with acquire semantics.
• LDEORLB and LDEORALB store to memory with release semantics.
• LDEORB has neither acquire nor release semantics.
This instruction is used by the alias STEORB, STEORLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
LDEORAB (A == 1 && R == 0)
LDEORAB <Ws>, <Wt>, [<Xn|SP>]
LDEORALB (A == 1 && R == 1)
LDEORALB <Ws>, <Wt>, [<Xn|SP>]
LDEORB (A == 0 && R == 0)
LDEORB <Ws>, <Wt>, [<Xn|SP>]
LDEORLB (A == 0 && R == 1)
LDEORLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STEORB, STEORLB A == '0' && Rt == '11111'
LDEORB, LDEORAB,
Page 200
LDEORALB, LDEORLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDEORB, LDEORAB,
Page 201
LDEORALB, LDEORLB
LDEORH, LDEORAH, LDEORALH, LDEORLH
Atomic exclusive OR on halfword in memory atomically loads a 16-bit halfword from memory, performs an exclusive
OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from
memory is returned in the destination register.
• If the destination register is not WZR, LDEORAH and LDEORALH load from memory with acquire semantics.
• LDEORLH and LDEORALH store to memory with release semantics.
• LDEORH has neither acquire nor release semantics.
This instruction is used by the alias STEORH, STEORLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
LDEORAH (A == 1 && R == 0)
LDEORAH <Ws>, <Wt>, [<Xn|SP>]
LDEORALH (A == 1 && R == 1)
LDEORALH <Ws>, <Wt>, [<Xn|SP>]
LDEORH (A == 0 && R == 0)
LDEORH <Ws>, <Wt>, [<Xn|SP>]
LDEORLH (A == 0 && R == 1)
LDEORLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STEORH, STEORLH A == '0' && Rt == '11111'
LDEORH, LDEORAH,
Page 202
LDEORALH, LDEORLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDEORH, LDEORAH,
Page 203
LDEORALH, LDEORLH
LDG
Load Allocation Tag loads an Allocation Tag from a memory address, generates a Logical Address Tag from the
Allocation Tag and merges it into the destination register. The address used for the load is calculated from the base
register and an immediate signed offset scaled by the Tag granule.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 0 0 Xn Xt
Integer
LDG <Xt>, [<Xn|SP>{, #<simm>}]

integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
Operation
bits(64) address;
bits(4) tag;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

address = Align(address, TAG_GRANULE);
tag = AArch64.MemTag[address, AccType_NORMAL];

X[t] = AArch64.AddressWithAllocationTag(X[t], AccType_NORMAL, tag);
LDG Page 204

LDGM
Load Tag Multiple reads a naturally aligned block of N Allocation Tags, where the size of N is identified in
GMID_EL1.BS, and writes the Allocation Tag read from address A to the destination register at
4*A<7:4>+3:4*A<7:4>. Bits of the destination register not written with an Allocation Tag are set to 0.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.
If ID_AA64PFR1_EL1.MTE != 0b0010, this instruction is UNDEFINED.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Integer
LDGM <Xt>, [<Xn|SP>]

Assembler Symbols
Operation

UNDEFINED;
bits(64) data = Zeros(64);

bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
integer size = 4 * (2 ^ (UInt(GMID_EL1.BS)));

address = Align(address, size);
integer count = size >> LOG2_TAG_GRANULE;
integer index = UInt(address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>);
for i = 0 to count-1
bits(4) tag = AArch64.MemTag[address, AccType_NORMAL];
data<(index*4)+3:index*4> = tag;
address = address + TAG_GRANULE;
index = index + 1;
X[t] = data;
LDGM Page 205

LDLAR
Load LOAcquire Register loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
LDLAR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
LDLAR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, dbytes, AccType_LIMITEDORDERED];

LDLAR Page 206

LDLARB
Load LOAcquire Register Byte loads a byte from memory, zero-extends it and writes it to a register. The instruction
also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDLARB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, 1, AccType_LIMITEDORDERED];

LDLARB Page 207

LDLARH
Load LOAcquire Register Halfword loads a halfword from memory, zero-extends it, and writes it to a register. The
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDLARH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, 2, AccType_LIMITEDORDERED];

LDLARH Page 208

LDNP
Load Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers.
For information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 0 1 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
LDNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 10)
LDNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]
// Empty.
UNPREDICTABLE behaviors, and particularly LDNP.
Assembler Symbols
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
Shared Decode
if opc<0> == '1' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
LDNP Page 209

Operation
bits(64) address;
bits(datasize) data1;
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = Mem[address, dbytes, AccType_STREAM];

data2 = Mem[address+dbytes, dbytes, AccType_STREAM];
if rt_unknown then
data1 = bits(datasize) UNKNOWN;
X[t] = data1;
X[t2] = data2;
LDNP Page 210

LDP
Load Pair of Registers calculates an address from a base register value and an immediate offset, loads two 32-bit
words or two 64-bit doublewords from memory, and writes them to two registers. For information about memory
It has encodings from 3 classes: Post-index , Pre-index and Signed offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
LDP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>
64-bit (opc == 10)
LDP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>
boolean wback = TRUE;

boolean postindex = TRUE;
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
LDP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!
64-bit (opc == 10)
LDP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

boolean postindex = FALSE;
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
LDP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 10)
LDP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]
boolean wback = FALSE;

LDP Page 211

UNPREDICTABLE behaviors, and particularly LDP.
Assembler Symbols
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
Shared Decode
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
boolean signed = (opc<0> != '0');
boolean tag_checked = wback || n != 31;
LDP Page 212

Operation
bits(64) address;
boolean wb_unknown = FALSE;
if wback && (t == n || t2 == n) && n != 31 then

Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
case c of
when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
data1 = Mem[address, dbytes, AccType_NORMAL];

data2 = Mem[address+dbytes, dbytes, AccType_NORMAL];
if rt_unknown then
if signed then
X[t] = SignExtend(data1, 64);
X[t2] = SignExtend(data2, 64);
else
X[t] = data1;
X[t2] = data2;
if wback then
if wb_unknown then
address = bits(64) UNKNOWN;
elsif postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDP Page 213

LDPSW
Load Pair of Registers Signed Word calculates an address from a base register value and an immediate offset, loads
two 32-bit words from memory, sign-extends them, and writes them to two registers. For information about memory
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L
Post-index
LDPSW <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L
Pre-index
LDPSW <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L
Signed offset
LDPSW <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

UNPREDICTABLE behaviors, and particularly LDPSW.
Assembler Symbols
<imm> For the post-index and pre-index variant: is the signed immediate byte offset, a multiple of 4 in the
range -256 to 252, encoded in the "imm7" field as <imm>/4.
LDPSW Page 214

For the signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the range
-256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
Shared Decode
bits(64) offset = LSL(SignExtend(imm7, 64), 2);
Operation
bits(64) address;
bits(32) data1;
bits(32) data2;

Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
case c of
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
data1 = Mem[address, 4, AccType_NORMAL];

data2 = Mem[address+4, 4, AccType_NORMAL];
if rt_unknown then
data1 = bits(32) UNKNOWN;
data2 = bits(32) UNKNOWN;
X[t] = SignExtend(data1, 64);
X[t2] = SignExtend(data2, 64);
if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDPSW Page 215

LDPSW Page 216

LDR (immediate)
Load Register (immediate) loads a word or doubleword from memory and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset. For information about memory accesses,
see Load/Store addressing modes. The Unsigned offset variant scales the immediate offset value by the size of the
value accessed before adding it to the base register value.
It has encodings from 3 classes: Post-index , Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
32-bit (size == 10)
LDR <Wt>, [<Xn|SP>], #<simm>
64-bit (size == 11)
LDR <Xt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
32-bit (size == 10)
LDR <Wt>, [<Xn|SP>, #<simm>]!
64-bit (size == 11)
LDR <Xt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
LDR (immediate) Page 217

32-bit (size == 10)
LDR <Wt>, [<Xn|SP>{, #<pimm>}]
64-bit (size == 11)
LDR <Xt>, [<Xn|SP>{, #<pimm>}]

bits(64) offset = LSL(ZeroExtend(imm12, 64), scale);
UNPREDICTABLE behaviors, and particularly LDR (immediate).
Assembler Symbols
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
Shared Decode
integer regsize;


Operation
bits(64) address;
if wback && n == t && n != 31 then

c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
data = Mem[address, datasize DIV 8, AccType_NORMAL];

if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;

LDR (literal)
Load Register (literal) calculates an address from the PC value and an immediate offset, loads a word from memory,
and writes it to a register. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 x 0 1 1 0 0 0 imm19 Rt
opc
32-bit (opc == 00)
LDR <Wt>, <label>
64-bit (opc == 01)
LDR <Xt>, <label>
MemOp memop = MemOp_LOAD;
boolean signed = FALSE;
integer size;
bits(64) offset;
case opc of
when '00'
size = 4;
when '01'
size = 8;
when '10'
size = 4;
signed = TRUE;
when '11'
memop = MemOp_PREFETCH;
offset = SignExtend(imm19:'00', 64);
Assembler Symbols
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
Operation
bits(64) address = PC[] + offset;

bits(size*8) data;
SetTagCheckedInstruction(FALSE);
case memop of
when MemOp_LOAD
data = Mem[address, size, AccType_NORMAL];
if signed then
else
X[t] = data;
when MemOp_PREFETCH
LDR (literal) Page 220

LDR (literal) Page 221

LDR (register)
Load Register (register) calculates an address from a base register value and an offset register value, loads a word
from memory, and writes it to a register. The offset register value can optionally be shifted and extended. For
information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
32-bit (size == 10)
LDR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
64-bit (size == 11)
LDR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index
integer shift = if S == '1' then scale else 0;
Assembler Symbols
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
S <amount>
0 #0
1 #3
LDR (register) Page 222

Shared Decode
integer regsize;

Operation
bits(64) offset = ExtendReg(m, extend_type, shift);

SetTagCheckedInstruction(TRUE);
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDR (register) Page 223

LDRAA, LDRAB
Load Register, with pointer authentication. This instruction authenticates an address from a base register using a
modifier of zero and the specified key, adds an immediate offset to the authenticated address, and loads a 64-bit
doubleword from memory at this resulting address into a register.
Key A is used for LDRAA, and key B is used for LDRAB.
If the authentication passes, the PE behaves the same as for an LDR instruction. If the authentication fails, a
The authenticated address is not written back to the base register, unless the pre-indexed variant of the instruction is
used. In this case, the address that is written back to the base register does not include the pointer authentication
code.
Unscaled offset
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 M S 1 imm9 W 1 Rn Rt
size
Key A, offset (M == 0 && W == 0)
LDRAA <Xt>, [<Xn|SP>{, #<simm>}]
Key A, pre-indexed (M == 0 && W == 1)
LDRAA <Xt>, [<Xn|SP>{, #<simm>}]!
Key B, offset (M == 1 && W == 0)
LDRAB <Xt>, [<Xn|SP>{, #<simm>}]
Key B, pre-indexed (M == 1 && W == 1)
LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!
if !HavePACExt() then UNDEFINED;

boolean wback = (W == '1');
bits(10) S10 = S:imm9;
bits(64) offset = LSL(SignExtend(S10, 64), 3);
Assembler Symbols
<simm> Is the optional signed immediate byte offset, a multiple of 8 in the range -4096 to 4088, defaulting to 0
and encoded in the "S:imm9" field as <simm>/8.
LDRAA, LDRAB Page 224

Operation
bits(64) address;
bits(64) data;

case c of
if n == 31 then
address = SP[];
else
address = X[n];
if use_key_a then
address = AuthDA(address, X[31], TRUE);
else
address = AuthDB(address, X[31], TRUE);
if n == 31 then
CheckSPAlignment();

data = Mem[address, 8, AccType_NORMAL];
X[t] = data;
if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDRAA, LDRAB Page 225

LDRB (immediate)
Load Register Byte (immediate) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset. For information about
memory accesses, see Load/Store addressing modes.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
Post-index
LDRB <Wt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
Pre-index
LDRB <Wt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
Unsigned offset
LDRB <Wt>, [<Xn|SP>{, #<pimm>}]

bits(64) offset = LSL(ZeroExtend(imm12, 64), 0);
UNPREDICTABLE behaviors, and particularly LDRH (immediate).
Assembler Symbols
LDRB (immediate) Page 226

<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then

if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDRB (immediate) Page 227

LDRB (register)
Load Register Byte (register) calculates an address from a base register value and an offset register value, loads a
byte from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
Extended register (option != 011)
LDRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
Shifted register (option == 011)
LDRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

Assembler Symbols
"Rm" field.
"Rm" field.
<extend> Is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
<amount> Is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if present.
Shared Decode
LDRB (register) Page 228

Operation
bits(64) offset = ExtendReg(m, extend_type, 0);

bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDRB (register) Page 229

LDRH (immediate)
Load Register Halfword (immediate) loads a halfword from memory, zero-extends it, and writes the result to a register.
The address that is used for the load is calculated from a base register and an immediate offset. For information about
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
Post-index
LDRH <Wt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
Pre-index
LDRH <Wt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
Unsigned offset
LDRH <Wt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly LDRH (immediate).
Assembler Symbols
LDRH (immediate) Page 230

<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
Shared Decode
Operation
bits(64) address;
bits(16) data;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then

if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDRH (immediate) Page 231

LDRH (register)
Load Register Halfword (register) calculates an address from a base register value and an offset register value, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
32-bit
LDRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

integer shift = if S == '1' then 1 else 0;
UNPREDICTABLE behaviors.
Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
Shared Decode
LDRH (register) Page 232

Operation

bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDRH (register) Page 233

LDRSB (immediate)
Load Register Signed Byte (immediate) loads a byte from memory, sign-extends it to either 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset. For information about memory accesses, see Load/Store addressing modes.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 0 1 Rn Rt
size opc
32-bit (opc == 11)
LDRSB <Wt>, [<Xn|SP>], #<simm>
64-bit (opc == 10)
LDRSB <Xt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 1 1 Rn Rt
size opc
32-bit (opc == 11)
LDRSB <Wt>, [<Xn|SP>, #<simm>]!
64-bit (opc == 10)
LDRSB <Xt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 1 x imm12 Rn Rt
size opc
LDRSB (immediate) Page 234

32-bit (opc == 11)
LDRSB <Wt>, [<Xn|SP>{, #<pimm>}]
64-bit (opc == 10)
LDRSB <Xt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly LDRSB (immediate).
Assembler Symbols
"imm12" field.
Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);

Operation
bits(64) address;
bits(8) data;

if memop == MemOp_LOAD && wback && n == t && n != 31 then

case c of
if memop == MemOp_STORE && wback && n == t && n != 31 then

c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
case c of
when Constraint_NONE rt_unknown = FALSE; // value stored is original value
when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
if n == 31 then
address = SP[];
else
address = X[n];
if !postindex then
case memop of
when MemOp_STORE
if rt_unknown then
data = bits(8) UNKNOWN;
else
data = X[t];
Mem[address, 1, AccType_NORMAL] = data;
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;


LDRSB (register)
Load Register Signed Byte (register) calculates an address from a base register value and an offset register value,
loads a byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 1 Rm option S 1 0 Rn Rt
size opc
32-bit with extended register offset (opc == 11 && option != 011)
LDRSB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
32-bit with shifted register offset (opc == 11 && option == 011)
LDRSB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]
64-bit with extended register offset (opc == 10 && option != 011)
LDRSB <Xt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
64-bit with shifted register offset (opc == 10 && option == 011)
LDRSB <Xt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
110 SXTW
111 SXTX
LDRSB (register) Page 238

Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
boolean tag_checked = memop != MemOp_PREFETCH;
Operation

bits(64) address;
bits(8) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDRSB (register) Page 239

LDRSH (immediate)
Load Register Signed Halfword (immediate) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
immediate offset. For information about memory accesses, see Load/Store addressing modes.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 0 1 Rn Rt
size opc
32-bit (opc == 11)
LDRSH <Wt>, [<Xn|SP>], #<simm>
64-bit (opc == 10)
LDRSH <Xt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 1 1 Rn Rt
size opc
32-bit (opc == 11)
LDRSH <Wt>, [<Xn|SP>, #<simm>]!
64-bit (opc == 10)
LDRSH <Xt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 1 x imm12 Rn Rt
size opc
LDRSH (immediate) Page 240

32-bit (opc == 11)
LDRSH <Wt>, [<Xn|SP>{, #<pimm>}]
64-bit (opc == 10)
LDRSH <Xt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly LDRSH (immediate).
Assembler Symbols
Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;

Operation
bits(64) address;
bits(16) data;

if memop == MemOp_LOAD && wback && n == t && n != 31 then

case c of
if memop == MemOp_STORE && wback && n == t && n != 31 then

case c of
if n == 31 then
address = SP[];
else
address = X[n];
if !postindex then
case memop of
when MemOp_STORE
if rt_unknown then
else
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;


LDRSH (register)
Load Register Signed Halfword (register) calculates an address from a base register value and an offset register value,
loads a halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 1 Rm option S 1 0 Rn Rt
size opc
32-bit (opc == 11)
LDRSH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
64-bit (opc == 10)
LDRSH <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
S <amount>
0 #0
1 #1
LDRSH (register) Page 244

Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
Operation

bits(64) address;
bits(16) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDRSH (register) Page 245

LDRSW (immediate)
Load Register Signed Word (immediate) loads a word from memory, sign-extends it to 64 bits, and writes the result to
a register. The address that is used for the load is calculated from a base register and an immediate offset. For
information about memory accesses, see Load/Store addressing modes.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 0 1 Rn Rt
size opc
Post-index
LDRSW <Xt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 1 1 Rn Rt
size opc
Pre-index
LDRSW <Xt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 1 1 0 imm12 Rn Rt
size opc
Unsigned offset
LDRSW <Xt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly LDRSW (immediate).
Assembler Symbols
LDRSW (immediate) Page 246

<pimm> Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0
and encoded in the "imm12" field as <pimm>/4.
Shared Decode
Operation
bits(64) address;
bits(32) data;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then

if wback then
if wb_unknown then
if n == 31 then
SP[] = address;
else
X[n] = address;
LDRSW (immediate) Page 247

LDRSW (literal)
Load Register Signed Word (literal) calculates an address from the PC value and an immediate offset, loads a word
from memory, and writes it to a register. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 0 0 imm19 Rt
opc
Literal
LDRSW <Xt>, <label>
bits(64) offset;
Assembler Symbols
Operation

bits(32) data;

LDRSW (literal) Page 248

LDRSW (register)
Load Register Signed Word (register) calculates an address from a base register value and an offset register value,
loads a word from memory, sign-extends it to form a 64-bit value, and writes it to a register. The offset register value
can be shifted left by 0 or 2 bits. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 1 Rm option S 1 0 Rn Rt
size opc
64-bit
LDRSW <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
S <amount>
0 #0
1 #2
Shared Decode
LDRSW (register) Page 249

Operation

bits(64) address;
bits(32) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDRSW (register) Page 250

LDSET, LDSETA, LDSETAL, LDSETL
Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially
loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSETA and LDSETAL load from memory with acquire
semantics.
• LDSETL and LDSETAL store to memory with release semantics.
• LDSET has neither acquire nor release semantics.
This instruction is used by the alias STSET, STSETL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 1 1 0 0 Rn Rt
size opc
LDSET, LDSETA, LDSETAL,

Page 251
LDSETL
32-bit LDSET (size == 10 && A == 0 && R == 0)
LDSET <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSETA (size == 10 && A == 1 && R == 0)
LDSETA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSETAL (size == 10 && A == 1 && R == 1)
LDSETAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSETL (size == 10 && A == 0 && R == 1)
LDSETL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDSET (size == 11 && A == 0 && R == 0)
LDSET <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSETA (size == 11 && A == 1 && R == 0)
LDSETA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSETAL (size == 11 && A == 1 && R == 1)
LDSETAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSETL (size == 11 && A == 0 && R == 1)
LDSETL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSET, STSETL A == '0' && Rt == '11111'

Page 252
LDSETL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_ORR, value, ldacctype, stacctype);
if t != 31 then

Page 253
LDSETL
LDSETB, LDSETAB, LDSETALB, LDSETLB
Atomic bit set on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise OR with the value
held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the
• If the destination register is not WZR, LDSETAB and LDSETALB load from memory with acquire semantics.
• LDSETLB and LDSETALB store to memory with release semantics.
• LDSETB has neither acquire nor release semantics.
This instruction is used by the alias STSETB, STSETLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 1 1 0 0 Rn Rt
size opc
LDSETAB (A == 1 && R == 0)
LDSETAB <Ws>, <Wt>, [<Xn|SP>]
LDSETALB (A == 1 && R == 1)
LDSETALB <Ws>, <Wt>, [<Xn|SP>]
LDSETB (A == 0 && R == 0)
LDSETB <Ws>, <Wt>, [<Xn|SP>]
LDSETLB (A == 0 && R == 1)
LDSETLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSETB, STSETLB A == '0' && Rt == '11111'
LDSETB, LDSETAB,
Page 254
LDSETALB, LDSETLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSETB, LDSETAB,
Page 255
LDSETALB, LDSETLB
LDSETH, LDSETAH, LDSETALH, LDSETLH
Atomic bit set on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise OR with the
value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned
in the destination register.
• If the destination register is not WZR, LDSETAH and LDSETALH load from memory with acquire semantics.
• LDSETLH and LDSETALH store to memory with release semantics.
• LDSETH has neither acquire nor release semantics.
This instruction is used by the alias STSETH, STSETLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 1 1 0 0 Rn Rt
size opc
LDSETAH (A == 1 && R == 0)
LDSETAH <Ws>, <Wt>, [<Xn|SP>]
LDSETALH (A == 1 && R == 1)
LDSETALH <Ws>, <Wt>, [<Xn|SP>]
LDSETH (A == 0 && R == 0)
LDSETH <Ws>, <Wt>, [<Xn|SP>]
LDSETLH (A == 0 && R == 1)
LDSETLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSETH, STSETLH A == '0' && Rt == '11111'
LDSETH, LDSETAH,
Page 256
LDSETALH, LDSETLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSETH, LDSETAH,
Page 257
LDSETALH, LDSETLH
LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL
Atomic signed maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMAXA and LDSMAXAL load from memory with acquire
semantics.
• LDSMAXL and LDSMAXAL store to memory with release semantics.
• LDSMAX has neither acquire nor release semantics.
This instruction is used by the alias STSMAX, STSMAXL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 0 0 0 0 Rn Rt
size opc
LDSMAX, LDSMAXA,
Page 258
LDSMAXAL, LDSMAXL
32-bit LDSMAX (size == 10 && A == 0 && R == 0)
LDSMAX <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMAXA (size == 10 && A == 1 && R == 0)
LDSMAXA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMAXAL (size == 10 && A == 1 && R == 1)
LDSMAXAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMAXL (size == 10 && A == 0 && R == 1)
LDSMAXL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDSMAX (size == 11 && A == 0 && R == 0)
LDSMAX <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMAXA (size == 11 && A == 1 && R == 0)
LDSMAXA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMAXAL (size == 11 && A == 1 && R == 1)
LDSMAXAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMAXL (size == 11 && A == 0 && R == 1)
LDSMAXL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMAX, STSMAXL A == '0' && Rt == '11111'
LDSMAX, LDSMAXA,
Page 259
LDSMAXAL, LDSMAXL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_SMAX, value, ldacctype, stacctype);
if t != 31 then
LDSMAX, LDSMAXA,
Page 260
LDSMAXAL, LDSMAXL
LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB
Atomic signed maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the larger value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMAXAB and LDSMAXALB load from memory with acquire semantics.
• LDSMAXLB and LDSMAXALB store to memory with release semantics.
• LDSMAXB has neither acquire nor release semantics.
This instruction is used by the alias STSMAXB, STSMAXLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 0 0 0 0 Rn Rt
size opc
LDSMAXAB (A == 1 && R == 0)
LDSMAXAB <Ws>, <Wt>, [<Xn|SP>]
LDSMAXALB (A == 1 && R == 1)
LDSMAXALB <Ws>, <Wt>, [<Xn|SP>]
LDSMAXB (A == 0 && R == 0)
LDSMAXB <Ws>, <Wt>, [<Xn|SP>]
LDSMAXLB (A == 0 && R == 1)
LDSMAXLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMAXB, STSMAXLB A == '0' && Rt == '11111'
LDSMAXB, LDSMAXAB,
Page 261
LDSMAXALB, LDSMAXLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSMAXB, LDSMAXAB,
Page 262
LDSMAXALB, LDSMAXLB
LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH
Atomic signed maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the larger value back to memory, treating the values as signed numbers. The
• If the destination register is not WZR, LDSMAXAH and LDSMAXALH load from memory with acquire semantics.
• LDSMAXLH and LDSMAXALH store to memory with release semantics.
• LDSMAXH has neither acquire nor release semantics.
This instruction is used by the alias STSMAXH, STSMAXLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 0 0 0 0 Rn Rt
size opc
LDSMAXAH (A == 1 && R == 0)
LDSMAXAH <Ws>, <Wt>, [<Xn|SP>]
LDSMAXALH (A == 1 && R == 1)
LDSMAXALH <Ws>, <Wt>, [<Xn|SP>]
LDSMAXH (A == 0 && R == 0)
LDSMAXH <Ws>, <Wt>, [<Xn|SP>]
LDSMAXLH (A == 0 && R == 1)
LDSMAXLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMAXH, STSMAXLH A == '0' && Rt == '11111'
LDSMAXH, LDSMAXAH,
Page 263
LDSMAXALH, LDSMAXLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSMAXH, LDSMAXAH,
Page 264
LDSMAXALH, LDSMAXLH
LDSMIN, LDSMINA, LDSMINAL, LDSMINL
Atomic signed minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMINA and LDSMINAL load from memory with acquire
semantics.
• LDSMINL and LDSMINAL store to memory with release semantics.
• LDSMIN has neither acquire nor release semantics.
This instruction is used by the alias STSMIN, STSMINL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 0 1 0 0 Rn Rt
size opc
LDSMIN, LDSMINA,
Page 265
LDSMINAL, LDSMINL
32-bit LDSMIN (size == 10 && A == 0 && R == 0)
LDSMIN <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMINA (size == 10 && A == 1 && R == 0)
LDSMINA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMINAL (size == 10 && A == 1 && R == 1)
LDSMINAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDSMINL (size == 10 && A == 0 && R == 1)
LDSMINL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDSMIN (size == 11 && A == 0 && R == 0)
LDSMIN <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMINA (size == 11 && A == 1 && R == 0)
LDSMINA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMINAL (size == 11 && A == 1 && R == 1)
LDSMINAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDSMINL (size == 11 && A == 0 && R == 1)
LDSMINL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMIN, STSMINL A == '0' && Rt == '11111'
LDSMIN, LDSMINA,
Page 266
LDSMINAL, LDSMINL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_SMIN, value, ldacctype, stacctype);
if t != 31 then
LDSMIN, LDSMINA,
Page 267
LDSMINAL, LDSMINL
LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB
Atomic signed minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMINAB and LDSMINALB load from memory with acquire semantics.
• LDSMINLB and LDSMINALB store to memory with release semantics.
• LDSMINB has neither acquire nor release semantics.
This instruction is used by the alias STSMINB, STSMINLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 0 1 0 0 Rn Rt
size opc
LDSMINAB (A == 1 && R == 0)
LDSMINAB <Ws>, <Wt>, [<Xn|SP>]
LDSMINALB (A == 1 && R == 1)
LDSMINALB <Ws>, <Wt>, [<Xn|SP>]
LDSMINB (A == 0 && R == 0)
LDSMINB <Ws>, <Wt>, [<Xn|SP>]
LDSMINLB (A == 0 && R == 1)
LDSMINLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMINB, STSMINLB A == '0' && Rt == '11111'
LDSMINB, LDSMINAB,
Page 268
LDSMINALB, LDSMINLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSMINB, LDSMINAB,
Page 269
LDSMINALB, LDSMINLB
LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH
Atomic signed minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The
• If the destination register is not WZR, LDSMINAH and LDSMINALH load from memory with acquire semantics.
• LDSMINLH and LDSMINALH store to memory with release semantics.
• LDSMINH has neither acquire nor release semantics.
This instruction is used by the alias STSMINH, STSMINLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 0 1 0 0 Rn Rt
size opc
LDSMINAH (A == 1 && R == 0)
LDSMINAH <Ws>, <Wt>, [<Xn|SP>]
LDSMINALH (A == 1 && R == 1)
LDSMINALH <Ws>, <Wt>, [<Xn|SP>]
LDSMINH (A == 0 && R == 0)
LDSMINH <Ws>, <Wt>, [<Xn|SP>]
LDSMINLH (A == 0 && R == 1)
LDSMINLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STSMINH, STSMINLH A == '0' && Rt == '11111'
LDSMINH, LDSMINAH,
Page 270
LDSMINALH, LDSMINLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDSMINH, LDSMINAH,
Page 271
LDSMINALH, LDSMINLH
LDTR
Load Register (unprivileged) loads a word or doubleword from memory, and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 1 0 Rn Rt
size opc
32-bit (size == 10)
LDTR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
LDTR <Xt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
acctype = AccType_UNPRIV;
else
acctype = AccType_NORMAL;
integer regsize;

LDTR Page 272

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, datasize DIV 8, acctype];

LDTR Page 273

LDTRB
Load Register Byte (unprivileged) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 1 0 Rn Rt
size opc
Unscaled offset
LDTRB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode

else
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = Mem[address, 1, acctype];

LDTRB Page 274

LDTRB Page 275

LDTRH
Load Register Halfword (unprivileged) loads a halfword from memory, zero-extends it, and writes the result to a
register. The address that is used for the load is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 1 0 Rn Rt
size opc
Unscaled offset
LDTRH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode

else
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDTRH Page 276

LDTRH Page 277

LDTRSB
Load Register Signed Byte (unprivileged) loads a byte from memory, sign-extends it to 32 bits or 64 bits, and writes
the result to a register. The address that is used for the load is calculated from a base register and an immediate
offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 1 0 Rn Rt
size opc
32-bit (opc == 11)
LDTRSB <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDTRSB <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
LDTRSB Page 278

Shared Decode

else
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
Operation
bits(64) address;
bits(8) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
Mem[address, 1, acctype] = data;
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDTRSB Page 279

LDTRSH
Load Register Signed Halfword (unprivileged) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 1 0 Rn Rt
size opc
32-bit (opc == 11)
LDTRSH <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDTRSH <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
LDTRSH Page 280

Shared Decode

else
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
Operation
bits(64) address;
bits(16) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDTRSH Page 281

LDTRSW
Load Register Signed Word (unprivileged) loads a word from memory, sign-extends it to 64 bits, and writes the result
to a register. The address that is used for the load is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 1 0 Rn Rt
size opc
Unscaled offset
LDTRSW <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode

else
Operation
bits(64) address;
bits(32) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDTRSW Page 282

LDTRSW Page 283

LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL
Atomic unsigned maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMAXA and LDUMAXAL load from memory with acquire
semantics.
• LDUMAXL and LDUMAXAL store to memory with release semantics.
• LDUMAX has neither acquire nor release semantics.
This instruction is used by the alias STUMAX, STUMAXL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
LDUMAX, LDUMAXA,
Page 284
LDUMAXAL, LDUMAXL
32-bit LDUMAX (size == 10 && A == 0 && R == 0)
LDUMAX <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMAXA (size == 10 && A == 1 && R == 0)
LDUMAXA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMAXAL (size == 10 && A == 1 && R == 1)
LDUMAXAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMAXL (size == 10 && A == 0 && R == 1)
LDUMAXL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDUMAX (size == 11 && A == 0 && R == 0)
LDUMAX <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMAXA (size == 11 && A == 1 && R == 0)
LDUMAXA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMAXAL (size == 11 && A == 1 && R == 1)
LDUMAXAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMAXL (size == 11 && A == 0 && R == 1)
LDUMAXL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMAX, STUMAXL A == '0' && Rt == '11111'
LDUMAX, LDUMAXA,
Page 285
LDUMAXAL, LDUMAXL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_UMAX, value, ldacctype, stacctype);
if t != 31 then
LDUMAX, LDUMAXA,
Page 286
LDUMAXAL, LDUMAXL
LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB
Atomic unsigned maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The
• If the destination register is not WZR, LDUMAXAB and LDUMAXALB load from memory with acquire semantics.
• LDUMAXLB and LDUMAXALB store to memory with release semantics.
• LDUMAXB has neither acquire nor release semantics.
This instruction is used by the alias STUMAXB, STUMAXLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
LDUMAXAB (A == 1 && R == 0)
LDUMAXAB <Ws>, <Wt>, [<Xn|SP>]
LDUMAXALB (A == 1 && R == 1)
LDUMAXALB <Ws>, <Wt>, [<Xn|SP>]
LDUMAXB (A == 0 && R == 0)
LDUMAXB <Ws>, <Wt>, [<Xn|SP>]
LDUMAXLB (A == 0 && R == 1)
LDUMAXLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMAXB, STUMAXLB A == '0' && Rt == '11111'
LDUMAXB, LDUMAXAB,
Page 287
LDUMAXALB, LDUMAXLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDUMAXB, LDUMAXAB,
Page 288
LDUMAXALB, LDUMAXLB
LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH
Atomic unsigned maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMAXAH and LDUMAXALH load from memory with acquire semantics.
• LDUMAXLH and LDUMAXALH store to memory with release semantics.
• LDUMAXH has neither acquire nor release semantics.
This instruction is used by the alias STUMAXH, STUMAXLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
LDUMAXAH (A == 1 && R == 0)
LDUMAXAH <Ws>, <Wt>, [<Xn|SP>]
LDUMAXALH (A == 1 && R == 1)
LDUMAXALH <Ws>, <Wt>, [<Xn|SP>]
LDUMAXH (A == 0 && R == 0)
LDUMAXH <Ws>, <Wt>, [<Xn|SP>]
LDUMAXLH (A == 0 && R == 1)
LDUMAXLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMAXH, STUMAXLH A == '0' && Rt == '11111'
LDUMAXH, LDUMAXAH,
Page 289
LDUMAXALH, LDUMAXLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDUMAXH, LDUMAXAH,
Page 290
LDUMAXALH, LDUMAXLH
LDUMIN, LDUMINA, LDUMINAL, LDUMINL
Atomic unsigned minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating
the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMINA and LDUMINAL load from memory with acquire
semantics.
• LDUMINL and LDUMINAL store to memory with release semantics.
• LDUMIN has neither acquire nor release semantics.
This instruction is used by the alias STUMIN, STUMINL.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
LDUMIN, LDUMINA,
Page 291
LDUMINAL, LDUMINL
32-bit LDUMIN (size == 10 && A == 0 && R == 0)
LDUMIN <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMINA (size == 10 && A == 1 && R == 0)
LDUMINA <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMINAL (size == 10 && A == 1 && R == 1)
LDUMINAL <Ws>, <Wt>, [<Xn|SP>]
32-bit LDUMINL (size == 10 && A == 0 && R == 1)
LDUMINL <Ws>, <Wt>, [<Xn|SP>]
64-bit LDUMIN (size == 11 && A == 0 && R == 0)
LDUMIN <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMINA (size == 11 && A == 1 && R == 0)
LDUMINA <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMINAL (size == 11 && A == 1 && R == 1)
LDUMINAL <Xs>, <Xt>, [<Xn|SP>]
64-bit LDUMINL (size == 11 && A == 0 && R == 1)
LDUMINL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMIN, STUMINL A == '0' && Rt == '11111'
LDUMIN, LDUMINA,
Page 292
LDUMINAL, LDUMINL
Operation
bits(64) address;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = MemAtomic(address, MemAtomicOp_UMIN, value, ldacctype, stacctype);
if t != 31 then
LDUMIN, LDUMINA,
Page 293
LDUMINAL, LDUMINL
LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB
Atomic unsigned minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The
• If the destination register is not WZR, LDUMINAB and LDUMINALB load from memory with acquire semantics.
• LDUMINLB and LDUMINALB store to memory with release semantics.
• LDUMINB has neither acquire nor release semantics.
This instruction is used by the alias STUMINB, STUMINLB.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
LDUMINAB (A == 1 && R == 0)
LDUMINAB <Ws>, <Wt>, [<Xn|SP>]
LDUMINALB (A == 1 && R == 1)
LDUMINALB <Ws>, <Wt>, [<Xn|SP>]
LDUMINB (A == 0 && R == 0)
LDUMINB <Ws>, <Wt>, [<Xn|SP>]
LDUMINLB (A == 0 && R == 1)
LDUMINLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMINB, STUMINLB A == '0' && Rt == '11111'
LDUMINB, LDUMINAB,
Page 294
LDUMINALB, LDUMINLB
Operation
bits(64) address;
bits(8) value;
bits(8) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDUMINB, LDUMINAB,
Page 295
LDUMINALB, LDUMINLB
LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH
Atomic unsigned minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMINAH and LDUMINALH load from memory with acquire semantics.
• LDUMINLH and LDUMINALH store to memory with release semantics.
• LDUMINH has neither acquire nor release semantics.
This instruction is used by the alias STUMINH, STUMINLH.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
LDUMINAH (A == 1 && R == 0)
LDUMINAH <Ws>, <Wt>, [<Xn|SP>]
LDUMINALH (A == 1 && R == 1)
LDUMINALH <Ws>, <Wt>, [<Xn|SP>]
LDUMINH (A == 0 && R == 0)
LDUMINH <Ws>, <Wt>, [<Xn|SP>]
LDUMINLH (A == 0 && R == 1)
LDUMINLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
Alias Conditions

STUMINH, STUMINLH A == '0' && Rt == '11111'
LDUMINH, LDUMINAH,
Page 296
LDUMINALH, LDUMINLH
Operation
bits(64) address;
bits(16) value;
bits(16) data;
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
LDUMINH, LDUMINAH,
Page 297
LDUMINALH, LDUMINLH
LDUR
Load Register (unscaled) calculates an address from a base register and an immediate offset, loads a 32-bit word or
64-bit doubleword from memory, zero-extends it, and writes it to a register. For information about memory accesses,
see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
32-bit (size == 10)
LDUR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
LDUR <Xt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode
integer regsize;

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDUR Page 298

LDUR Page 299

LDURB
Load Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads a byte from
memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDURB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDURB Page 300

LDURH
Load Register Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDURH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDURH Page 301

LDURSB
Load Register Signed Byte (unscaled) calculates an address from a base register and an immediate offset, loads a
signed byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc
32-bit (opc == 11)
LDURSB <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDURSB <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
LDURSB Page 302

Operation
bits(64) address;
bits(8) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDURSB Page 303

LDURSH
Load Register Signed Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
signed halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc
32-bit (opc == 11)
LDURSH <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (opc == 10)
LDURSH <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
MemOp memop;
boolean signed;
integer regsize;

regsize = 32;
signed = FALSE;
else
memop = MemOp_LOAD;
signed = TRUE;
LDURSH Page 304

Operation
bits(64) address;
bits(16) data;
if n == 31 then
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
when MemOp_LOAD
if signed then
else
when MemOp_PREFETCH
LDURSH Page 305

LDURSW
Load Register Signed Word (unscaled) calculates an address from a base register and an immediate offset, loads a
signed word from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
LDURSW <Xt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(32) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

LDURSW Page 306

LDXP
Load Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two 64-bit
doublewords from memory, and writes them to two registers. A 32-bit pair requires the address to be doubleword
aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be quadword aligned
and is single-copy atomic for each doubleword at doubleword granularity. The PE marks the physical address being
accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 1 1 (1) (1) (1) (1) (1) 0 Rt2 Rn Rt
L Rs o0
32-bit (sz == 0)
LDXP <Wt1>, <Wt2>, [<Xn|SP>{,#0}]
64-bit (sz == 1)
LDXP <Xt1>, <Xt2>, [<Xn|SP>{,#0}]

UNPREDICTABLE behaviors, and particularly LDXP.
Assembler Symbols
LDXP Page 307

Operation
bits(64) address;
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

if rt_unknown then
// ConstrainedUNPREDICTABLE case
X[t] = bits(datasize) UNKNOWN; // In this case t = t2
elsif elsize == 32 then
// 32-bit load exclusive pair (atomic)
data = Mem[address, dbytes, AccType_ATOMIC];
if BigEndian() then
X[t] = data<datasize-1:elsize>;
X[t2] = data<elsize-1:0>;
else
X[t] = data<elsize-1:0>;
X[t2] = data<datasize-1:elsize>;
else // elsize == 64
// 64-bit load exclusive pair (not atomic),
// but must be 128-bit aligned
if address != Align(address, dbytes) then
AArch64.Abort(address, AArch64.AlignmentFault(AccType_ATOMIC, FALSE, FALSE));
X[t] = Mem[address, 8, AccType_ATOMIC];
X[t2] = Mem[address+8, 8, AccType_ATOMIC];
LDXP Page 308

LDXR
Load Exclusive Register derives an address from a base register value, loads a 32-bit word or a 64-bit doubleword
from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address being
accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
LDXR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
LDXR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

data = Mem[address, dbytes, AccType_ATOMIC];

LDXR Page 309

LDXR Page 310

LDXRB
Load Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it
and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an
exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDXRB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

data = Mem[address, 1, AccType_ATOMIC];

LDXRB Page 311

LDXRH
Load Exclusive Register Halfword derives an address from a base register value, loads a halfword from memory, zero-
extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
LDXRH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

data = Mem[address, 2, AccType_ATOMIC];

LDXRH Page 312

LSL (register)
Logical Shift Left (register) shifts a register value left by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is left-shifted.
This is an alias of LSLV. This means:
• The encodings in this description are named to match the encodings of LSLV.
• The description of LSLV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 0 Rn Rd
op2
32-bit (sf == 0)
LSL <Wd>, <Wn>, <Wm>
is equivalent to
LSLV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSL <Xd>, <Xn>, <Xm>
is equivalent to
LSLV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation
The description of LSLV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
LSL (register) Page 313

LSL (register) Page 314

LSL (immediate)
Logical Shift Left (immediate) shifts a register value left by an immediate number of bits, shifting in zeros, and writes
the result to the destination register.
This is an alias of UBFM. This means:
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr != x11111 Rn Rd
opc imms
32-bit (sf == 0 && N == 0 && imms != 011111)
LSL <Wd>, <Wn>, #<shift>
is equivalent to
UBFM <Wd>, <Wn>, #(-<shift> MOD 32), #(31-<shift>)
and is the preferred disassembly when imms + 1 == immr.
64-bit (sf == 1 && N == 1 && imms != 111111)
LSL <Xd>, <Xn>, #<shift>
is equivalent to
UBFM <Xd>, <Xn>, #(-<shift> MOD 64), #(63-<shift>)
and is the preferred disassembly when imms + 1 == immr.
Assembler Symbols
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31.
For the 64-bit variant: is the shift amount, in the range 0 to 63.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
LSL (immediate) Page 315

LSLV
Logical Shift Left Variable shifts a register value left by a variable number of bits, shifting in zeros, and writes the
defines the number of bits by which the first source register is left-shifted.
This instruction is used by the alias LSL (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 0 Rn Rd
op2
32-bit (sf == 0)
LSLV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSLV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation

X[d] = result;
If PSTATE.DIT is 1:
LSLV Page 316

LSR (register)
Logical Shift Right (register) shifts a register value right by a variable number of bits, shifting in zeros, and writes the
defines the number of bits by which the first source register is right-shifted.
This is an alias of LSRV. This means:
• The encodings in this description are named to match the encodings of LSRV.
• The description of LSRV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
op2
32-bit (sf == 0)
LSR <Wd>, <Wn>, <Wm>
is equivalent to
LSRV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSR <Xd>, <Xn>, <Xm>
is equivalent to
LSRV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation
The description of LSRV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
LSR (register) Page 317

LSR (register) Page 318

LSR (immediate)
Logical Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in zeros, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr x 1 1 1 1 1 Rn Rd
opc imms
32-bit (sf == 0 && N == 0 && imms == 011111)
LSR <Wd>, <Wn>, #<shift>
is equivalent to
UBFM <Wd>, <Wn>, #<shift>, #31
64-bit (sf == 1 && N == 1 && imms == 111111)
LSR <Xd>, <Xn>, #<shift>
is equivalent to
UBFM <Xd>, <Xn>, #<shift>, #63
Assembler Symbols
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, encoded in the "immr" field.
Operation
If PSTATE.DIT is 1:
LSR (immediate) Page 319

LSRV
Logical Shift Right Variable shifts a register value right by a variable number of bits, shifting in zeros, and writes the
defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias LSR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
op2
32-bit (sf == 0)
LSRV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSRV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation

X[d] = result;
If PSTATE.DIT is 1:
LSRV Page 320

MADD
Multiply-Add multiplies two register values, adds a third register value, and writes the result to the destination
register.
This instruction is used by the alias MUL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 0 Ra Rn Rd
o0
32-bit (sf == 0)
MADD <Wd>, <Wn>, <Wm>, <Wa>
64-bit (sf == 1)
MADD <Xd>, <Xn>, <Xm>, <Xa>
integer a = UInt(Ra);
integer destsize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
Alias Conditions

MUL Ra == '11111'
Operation
bits(destsize) operand1 = X[n];

bits(destsize) operand2 = X[m];
bits(destsize) operand3 = X[a];
integer result;
result = UInt(operand3) + (UInt(operand1) * UInt(operand2));
X[d] = result<destsize-1:0>;
MADD Page 321

If PSTATE.DIT is 1:
MADD Page 322

MNEG
Multiply-Negate multiplies two register values, negates the product, and writes the result to the destination register.
This is an alias of MSUB. This means:
• The encodings in this description are named to match the encodings of MSUB.
• The description of MSUB gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 1 1 1 1 1 1 Rn Rd
o0 Ra
32-bit (sf == 0)
MNEG <Wd>, <Wn>, <Wm>
is equivalent to
MSUB <Wd>, <Wn>, <Wm>, WZR
64-bit (sf == 1)
MNEG <Xd>, <Xn>, <Xm>
is equivalent to
MSUB <Xd>, <Xn>, <Xm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
"Rn" field.
"Rm" field.
Operation
The description of MSUB gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MNEG Page 323

MNEG Page 324

MOV (to/from SP)
Move between register and stack pointer: Rd = Rn.
This is an alias of ADD (immediate). This means:
• The encodings in this description are named to match the encodings of ADD (immediate).
• The description of ADD (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Rn Rd
op S sh imm12
32-bit (sf == 0)
MOV <Wd|WSP>, <Wn|WSP>
is equivalent to
ADD <Wd|WSP>, <Wn|WSP>, #0
and is the preferred disassembly when (Rd == '11111' || Rn == '11111').
64-bit (sf == 1)
MOV <Xd|SP>, <Xn|SP>
is equivalent to
ADD <Xd|SP>, <Xn|SP>, #0
and is the preferred disassembly when (Rd == '11111' || Rn == '11111').
Assembler Symbols
field.
field.
Operation
The description of ADD (immediate) gives the operational pseudocode for this instruction.
MOV (to/from SP) Page 325

MOV (inverted wide immediate)
Move (inverted wide immediate) moves an inverted 16-bit immediate value to a register.
This is an alias of MOVN. This means:
• The encodings in this description are named to match the encodings of MOVN.
• The description of MOVN gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 1 hw imm16 Rd
opc
32-bit (sf == 0 && hw == 0x)
MOV <Wd>, #<imm>
is equivalent to
MOVN <Wd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16).
64-bit (sf == 1)
MOV <Xd>, #<imm>
is equivalent to
MOVN <Xd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').
Assembler Symbols
<imm> For the 32-bit variant: is a 32-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw", but excluding 0xffff0000 and 0x0000ffff
For the 64-bit variant: is a 64-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw".
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Operation
The description of MOVN gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (inverted wide

Page 326
immediate)
MOV (inverted wide

Page 327
immediate)
MOV (wide immediate)
Move (wide immediate) moves a 16-bit immediate value to a register.
This is an alias of MOVZ. This means:
• The encodings in this description are named to match the encodings of MOVZ.
• The description of MOVZ gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 1 hw imm16 Rd
opc
32-bit (sf == 0 && hw == 0x)
MOV <Wd>, #<imm>
is equivalent to
MOVZ <Wd>, #<imm16>, LSL #<shift>
64-bit (sf == 1)
MOV <Xd>, #<imm>
is equivalent to
MOVZ <Xd>, #<imm16>, LSL #<shift>
Assembler Symbols
<imm> For the 32-bit variant: is a 32-bit immediate which can be encoded in "imm16:hw".
For the 64-bit variant: is a 64-bit immediate which can be encoded in "imm16:hw".
Operation
The description of MOVZ gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (wide immediate) Page 328

MOV (bitmask immediate)
Move (bitmask immediate) writes a bitmask immediate value to a register.
This is an alias of ORR (immediate). This means:
• The encodings in this description are named to match the encodings of ORR (immediate).
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 0 0 N immr imms 1 1 1 1 1 Rd
opc Rn
32-bit (sf == 0 && N == 0)
MOV <Wd|WSP>, #<imm>
is equivalent to
ORR <Wd|WSP>, WZR, #<imm>
and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).
64-bit (sf == 1)
MOV <Xd|SP>, #<imm>
is equivalent to
ORR <Xd|SP>, XZR, #<imm>
and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).
Assembler Symbols
field.
field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.
Operation
The description of ORR (immediate) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (bitmask immediate) Page 329

MOV (register)
Move (register) copies the value in a source register to the destination register.
This is an alias of ORR (shifted register). This means:
• The encodings in this description are named to match the encodings of ORR (shifted register).
• The description of ORR (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
opc shift N imm6 Rn
32-bit (sf == 0)
MOV <Wd>, <Wm>
is equivalent to
ORR <Wd>, WZR, <Wm>
64-bit (sf == 1)
MOV <Xd>, <Xm>
is equivalent to
ORR <Xd>, XZR, <Xm>
Assembler Symbols
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
Operation
The description of ORR (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (register) Page 330

MOVK
Move wide with keep moves an optionally-shifted 16-bit immediate value into a register, keeping other bits unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 1 hw imm16 Rd
opc
32-bit (sf == 0 && hw == 0x)
MOVK <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVK <Xd>, #<imm>{, LSL #<shift>}
integer pos;
if sf == '0' && hw<1> == '1' then UNDEFINED;

pos = UInt(hw:'0000');
Assembler Symbols
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
result = X[d];
result<pos+15:pos> = imm16;
X[d] = result;
If PSTATE.DIT is 1:
MOVK Page 331

MOVN
Move wide with NOT moves the inverse of an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (inverted wide immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 1 hw imm16 Rd
opc
32-bit (sf == 0 && hw == 0x)
MOVN <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVN <Xd>, #<imm>{, LSL #<shift>}
integer pos;

Assembler Symbols
Alias Conditions
Of
variant
MOV (inverted wide 64-bit ! (IsZero(imm16) && hw != '00')
immediate)
MOV (inverted wide 32-bit ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16)
immediate)
Operation
result = Zeros();
result = NOT(result);
X[d] = result;
If PSTATE.DIT is 1:
MOVN Page 332

MOVN Page 333

MOVZ
Move wide with zero moves an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (wide immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 1 hw imm16 Rd
opc
32-bit (sf == 0 && hw == 0x)
MOVZ <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVZ <Xd>, #<imm>{, LSL #<shift>}
integer pos;

Assembler Symbols
Alias Conditions

MOV (wide immediate) ! (IsZero(imm16) && hw != '00')
Operation
result = Zeros();
X[d] = result;
If PSTATE.DIT is 1:
MOVZ Page 334

MOVZ Page 335

MRS
Move System Register allows the PE to read an AArch64 System register into a general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 1 1 o0 op1 CRn CRm op2 Rt
L
System
MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)
AArch64.CheckSystemAccess('1':o0, op1, CRn, CRm, op2, Rt, L);
integer sys_op0 = 2 + UInt(o0);

integer sys_op1 = UInt(op1);
integer sys_crn = UInt(CRn);
integer sys_crm = UInt(CRm);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
<systemreg> Is a System register name, encoded in the "o0:op1:CRn:CRm:op2".
The System register names are defined in 'AArch64 System Registers' in the System Register XML.
<op0> Is an unsigned immediate, encoded in “o0”:
o0 <op0>
0 2
1 3
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
Operation
X[t] = AArch64.SysRegRead(sys_op0, sys_op1, sys_crn, sys_crm, sys_op2);
MRS Page 336

MSR (immediate)
Move immediate value to Special Register moves an immediate value to selected bits of the PSTATE. For more
information, see Process state, PSTATE.
The bits that can be written by this instruction are:
• PSTATE.D, PSTATE.A, PSTATE.I, PSTATE.F, and PSTATE.SP.
• If ARMv8.0-SSBS is implemented, PSTATE.SSBS.
• If ARMv8.1-PAN is implemented, PSTATE.PAN.
• If ARMv8.2-UAO is implemented, PSTATE.UAO.
• If ARMv8.4-DIT is implemented, PSTATE.DIT.
• If ARMv8.5-MemTag is implemented, PSTATE.TCO.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 op1 0 1 0 0 CRm op2 1 1 1 1 1
System
MSR <pstatefield>, #<imm>
if op1 == '000' && op2 == '000' then SEE "CFINV";

if op1 == '000' && op2 == '001' then SEE "XAFLAG";
if op1 == '000' && op2 == '010' then SEE "AXFLAG";
AArch64.CheckSystemAccess('00', op1, '0100', CRm, op2, '11111', '0');
PSTATEField field;
case op1:op2 of
when '000 011'
if !HaveUAOExt() then
UNDEFINED;
field = PSTATEField_UAO;
when '000 100'
if !HavePANExt() then
UNDEFINED;
field = PSTATEField_PAN;
when '000 101' field = PSTATEField_SP;
when '011 010'
if !HaveDITExt() then
UNDEFINED;
field = PSTATEField_DIT;
when '011 100'
if !HaveMTEExt() then
UNDEFINED;
field = PSTATEField_TCO;
when '011 110' field = PSTATEField_DAIFSet;
when '011 111' field = PSTATEField_DAIFClr;
when '011 001'
if !HaveSSBSExt() then
UNDEFINED;
field = PSTATEField_SSBS;
otherwise UNDEFINED;
// Check that an AArch64 MSR/MRS access to the DAIF flags is permitted

if PSTATE.EL == EL0 && field IN {PSTATEField_DAIFSet, PSTATEField_DAIFClr} then
if !ELUsingAArch32(EL1) && ((EL2Enabled() && HCR_EL2.<E2H,TGE> == '11') || SCTLR_EL1.UMA == '0') then
if EL2Enabled() && !ELUsingAArch32(EL2) && HCR_EL2.TGE == '1' then
AArch64.SystemAccessTrap(EL2, 0x18);
else
AArch64.SystemAccessTrap(EL1, 0x18);
Assembler Symbols
<pstatefield> Is a PSTATE field name, encoded in “op1:op2”:
MSR (immediate) Page 337

op1 op2 <pstatefield> Architectural Feature
000 00x SEE PSTATE -
000 010 SEE PSTATE -
000 011 UAO ARMv8.2-UAO
000 100 PAN ARMv8.1-PAN
000 101 SPSel -
000 11x RESERVED -
001 xxx RESERVED -
010 xxx RESERVED -
011 000 RESERVED -
011 001 SSBS ARMv8.0-SSBS
011 010 DIT ARMv8.4-DIT
011 011 RESERVED -
011 100 TCO ARMv8.5-MemTag
011 101 RESERVED -
011 110 DAIFSet -
011 111 DAIFClr -
1xx xxx RESERVED -
Operation
case field of
when PSTATEField_SSBS
PSTATE.SSBS = CRm<0>;
when PSTATEField_SP
PSTATE.SP = CRm<0>;
when PSTATEField_DAIFSet
PSTATE.D = PSTATE.D OR CRm<3>;
PSTATE.A = PSTATE.A OR CRm<2>;
PSTATE.I = PSTATE.I OR CRm<1>;
PSTATE.F = PSTATE.F OR CRm<0>;
when PSTATEField_DAIFClr
PSTATE.D = PSTATE.D AND NOT(CRm<3>);
PSTATE.A = PSTATE.A AND NOT(CRm<2>);
PSTATE.I = PSTATE.I AND NOT(CRm<1>);
PSTATE.F = PSTATE.F AND NOT(CRm<0>);
when PSTATEField_PAN
PSTATE.PAN = CRm<0>;
when PSTATEField_UAO
PSTATE.UAO = CRm<0>;
when PSTATEField_DIT
PSTATE.DIT = CRm<0>;
when PSTATEField_TCO
PSTATE.TCO = CRm<0>;
MSR (immediate) Page 338

MSR (register)
Move general-purpose register to System Register allows the PE to write an AArch64 System register from a general-
purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 1 o0 op1 CRn CRm op2 Rt
L
System
MSR (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>
AArch64.CheckSystemAccess('1':o0, op1, CRn, CRm, op2, Rt, L);
integer sys_op0 = 2 + UInt(o0);

Assembler Symbols
<systemreg> Is a System register name, encoded in the "o0:op1:CRn:CRm:op2".

The System register names are defined in 'AArch64 System Registers' in the System Register XML.
<op0> Is an unsigned immediate, encoded in “o0”:
o0 <op0>
0 2
1 3
Operation
AArch64.SysRegWrite(sys_op0, sys_op1, sys_crn, sys_crm, sys_op2, X[t]);
MSR (register) Page 339

MSUB
Multiply-Subtract multiplies two register values, subtracts the product from a third register value, and writes the
result to the destination register.
This instruction is used by the alias MNEG.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 1 Ra Rn Rd
o0
32-bit (sf == 0)
MSUB <Wd>, <Wn>, <Wm>, <Wa>
64-bit (sf == 1)
MSUB <Xd>, <Xn>, <Xm>, <Xa>
integer destsize = if sf == '1' then 64 else 32;
Assembler Symbols
"Rn" field.
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
"Rn" field.
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
Alias Conditions

MNEG Ra == '11111'
Operation
bits(destsize) operand1 = X[n];

bits(destsize) operand2 = X[m];
bits(destsize) operand3 = X[a];
integer result;
result = UInt(operand3) - (UInt(operand1) * UInt(operand2));

X[d] = result<destsize-1:0>;
MSUB Page 340

If PSTATE.DIT is 1:
MSUB Page 341

MUL
Multiply: Rd = Rn * Rm.
This is an alias of MADD. This means:
• The encodings in this description are named to match the encodings of MADD.
• The description of MADD gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 0 1 1 1 1 1 Rn Rd
o0 Ra
32-bit (sf == 0)
MUL <Wd>, <Wn>, <Wm>
is equivalent to
MADD <Wd>, <Wn>, <Wm>, WZR
64-bit (sf == 1)
MUL <Xd>, <Xn>, <Xm>
is equivalent to
MADD <Xd>, <Xn>, <Xm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
"Rn" field.
"Rm" field.
Operation
The description of MADD gives the operational pseudocode for this instruction.
MUL Page 342

MVN
Bitwise NOT writes the bitwise inverse of a register value to the destination register.
This is an alias of ORN (shifted register). This means:
• The encodings in this description are named to match the encodings of ORN (shifted register).
• The description of ORN (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 1 Rm imm6 1 1 1 1 1 Rd
opc N Rn
32-bit (sf == 0)
MVN <Wd>, <Wm>{, <shift> #<amount>}
is equivalent to
ORN <Wd>, WZR, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
MVN <Xd>, <Xm>{, <shift> #<amount>}
is equivalent to
ORN <Xd>, XZR, <Xm>{, <shift> #<amount>}
Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation
The description of ORN (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MVN Page 343

MVN Page 344

NEG (shifted register)
Negate (shifted register) negates an optionally-shifted register value, and writes the result to the destination register.
This is an alias of SUB (shifted register). This means:
• The encodings in this description are named to match the encodings of SUB (shifted register).
• The description of SUB (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 1 shift 0 Rm imm6 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
NEG <Wd>, <Wm>{, <shift> #<amount>}
is equivalent to
SUB <Wd>, WZR, <Wm> {, <shift> #<amount>}
64-bit (sf == 1)
NEG <Xd>, <Xm>{, <shift> #<amount>}
is equivalent to
SUB <Xd>, XZR, <Xm> {, <shift> #<amount>}
Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Operation
The description of SUB (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
NEG (shifted register) Page 345

NEG (shifted register) Page 346

NEGS
Negate, setting flags, negates an optionally-shifted register value, and writes the result to the destination register. It
updates the condition flags based on the result.
This is an alias of SUBS (shifted register). This means:
• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
NEGS <Wd>, <Wm>{, <shift> #<amount>}
is equivalent to
SUBS <Wd>, WZR, <Wm> {, <shift> #<amount>}
64-bit (sf == 1)
NEGS <Xd>, <Xm>{, <shift> #<amount>}
is equivalent to
SUBS <Xd>, XZR, <Xm> {, <shift> #<amount>}
Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Operation
The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
NEGS Page 347

NEGS Page 348

NGC
Negate with Carry negates the sum of a register value and the value of NOT (Carry flag), and writes the result to the
This is an alias of SBC. This means:
• The encodings in this description are named to match the encodings of SBC.
• The description of SBC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
NGC <Wd>, <Wm>
is equivalent to
SBC <Wd>, WZR, <Wm>
64-bit (sf == 1)
NGC <Xd>, <Xm>
is equivalent to
SBC <Xd>, XZR, <Xm>
Assembler Symbols
Operation
The description of SBC gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
NGC Page 349

NGCS
Negate with Carry, setting flags, negates the sum of a register value and the value of NOT (Carry flag), and writes the
result to the destination register. It updates the condition flags based on the result.
This is an alias of SBCS. This means:
• The encodings in this description are named to match the encodings of SBCS.
• The description of SBCS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
NGCS <Wd>, <Wm>
is equivalent to
SBCS <Wd>, WZR, <Wm>
64-bit (sf == 1)
NGCS <Xd>, <Xm>
is equivalent to
SBCS <Xd>, XZR, <Xm>
Assembler Symbols
Operation
The description of SBCS gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
NGCS Page 350

NOP
No Operation does nothing, other than advance the value of the program counter by 4. This instruction can be used for
instruction alignment purposes.
The timing effects of including a NOP instruction in a program are not guaranteed. It can increase execution time, leave
it unchanged, or even reduce it. Therefore, NOP instructions are not suitable for timing loops.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1
CRm op2
System
NOP
// Empty.
Operation
// do nothing
If PSTATE.DIT is 1:
NOP Page 351

ORN (shifted register)
Bitwise OR NOT (shifted register) performs a bitwise (inclusive) OR of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
ORN <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
ORN <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Alias Conditions

MVN Rn == '11111'
ORN (shifted register) Page 352

Operation

result = operand1 OR operand2;

X[d] = result;
If PSTATE.DIT is 1:
ORN (shifted register) Page 353

ORR (immediate)
Bitwise OR (immediate) performs a bitwise (inclusive) OR of a register value and an immediate register value, and
This instruction is used by the alias MOV (bitmask immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
ORR <Wd|WSP>, <Wn>, #<imm>
64-bit (sf == 1)
ORR <Xd|SP>, <Xn>, #<imm>
bits(datasize) imm;
Assembler Symbols
field.
field.
Alias Conditions

MOV (bitmask immediate) Rn == '11111' && ! MoveWidePreferred(sf, N, imms, immr)
Operation
result = operand1 OR imm;

if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
ORR (immediate) Page 354


ORR (shifted register)
Bitwise OR (shifted register) performs a bitwise (inclusive) OR of a register value and an optionally-shifted register
value, and writes the result to the destination register.
This instruction is used by the alias MOV (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc N
32-bit (sf == 0)
ORR <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
ORR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Alias Conditions

MOV (register) shift == '00' && imm6 == '000000' && Rn == '11111'
ORR (shifted register) Page 356

Operation


X[d] = result;
If PSTATE.DIT is 1:
ORR (shifted register) Page 357

PACDA, PACDZA
Pointer Authentication Code for Data address, using key A. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key A.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDA.
• The value zero, for PACDZA.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 0 Rn Rd
PACDA (Z == 0)
PACDA <Xd>, <Xn|SP>
PACDZA (Z == 1 && Rn == 11111)
PACDZA <Xd>

UNDEFINED;
if Z == '0' then // PACDA

else // PACDZA
Assembler Symbols
Operation
X[d] = AddPACDA(X[d], SP[]);
else
X[d] = AddPACDA(X[d], X[n]);
PACDA, PACDZA Page 358

PACDB, PACDZB
Pointer Authentication Code for Data address, using key B. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key B.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDB.
• The value zero, for PACDZB.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 1 Rn Rd
PACDB (Z == 0)
PACDB <Xd>, <Xn|SP>
PACDZB (Z == 1 && Rn == 11111)
PACDZB <Xd>

UNDEFINED;
if Z == '0' then // PACDB

else // PACDZB
Assembler Symbols
Operation
X[d] = AddPACDB(X[d], SP[]);
else
X[d] = AddPACDB(X[d], X[n]);
PACDB, PACDZB Page 359

PACGA
Pointer Authentication Code, using Generic key. This instruction computes the pointer authentication code for an
address in the first source register, using a modifier in the second source register, and the Generic key. The computed
pointer authentication code is returned in the upper 32 bits of the destination register.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 1 0 0 Rn Rd
Integer
PACGA <Xd>, <Xn>, <Xm|SP>

UNDEFINED;
if m == 31 then source_is_sp = TRUE;
Assembler Symbols
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Rm"
field.
Operation
X[d] = AddPACGA(X[n], SP[]);
else
X[d] = AddPACGA(X[n], X[m]);
PACGA Page 360

PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA
Pointer Authentication Code for Instruction address, using key A. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIA and PACIZA.
• In X17, for PACIA1716.
• In X30, for PACIASP and PACIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIA.
• The value zero, for PACIZA and PACIAZ.
• In X16, for PACIA1716.
• In SP, for PACIASP.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 0 Rn Rd
PACIA (Z == 0)
PACIA <Xd>, <Xn|SP>
PACIZA (Z == 1 && Rn == 11111)
PACIZA <Xd>

UNDEFINED;
if Z == '0' then // PACIA

else // PACIZA
System
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 0 x 1 1 1 1 1
CRm op2
PACIA, PACIA1716, PACIASP,

Page 361
PACIAZ, PACIZA
PACIA1716 (CRm == 0001 && op2 == 000)
PACIA1716
PACIASP (CRm == 0011 && op2 == 001)
PACIASP
PACIAZ (CRm == 0011 && op2 == 000)
PACIAZ
integer d;
integer n;
case CRm:op2 of
when '0011 000' // PACIAZ
d = 30;
n = 31;
when '0011 001' // PACIASP
d = 30;
// Check for branch target compatibility between PSTATE.BTYPE
// and implicit branch target of PACIASP instruction.
SetBTypeCompatible(BTypeCompatible_PACIXSP());
when '0001 000' // PACIA1716

d = 17;
n = 16;
Assembler Symbols
Operation
X[d] = AddPACIA(X[d], SP[]);
else
X[d] = AddPACIA(X[d], X[n]);
PACIA, PACIA1716, PACIASP,

Page 362
PACIAZ, PACIZA
PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB
Pointer Authentication Code for Instruction address, using key B. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIB and PACIZB.
• In X17, for PACIB1716.
• In X30, for PACIBSP and PACIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIB.
• The value zero, for PACIZB and PACIBZ.
• In X16, for PACIB1716.
• In SP, for PACIBSP.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 1 Rn Rd
PACIB (Z == 0)
PACIB <Xd>, <Xn|SP>
PACIZB (Z == 1 && Rn == 11111)
PACIZB <Xd>

UNDEFINED;
if Z == '0' then // PACIB

else // PACIZB
System
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 1 x 1 1 1 1 1
CRm op2
PACIB, PACIB1716, PACIBSP,

Page 363
PACIBZ, PACIZB
PACIB1716 (CRm == 0001 && op2 == 010)
PACIB1716
PACIBSP (CRm == 0011 && op2 == 011)
PACIBSP
PACIBZ (CRm == 0011 && op2 == 010)
PACIBZ
integer d;
integer n;
case CRm:op2 of
when '0011 010' // PACIBZ
d = 30;
n = 31;
when '0011 011' // PACIBSP
d = 30;
// Check for branch target compatibility between PSTATE.BTYPE
// and implicit branch target of PACIBSP instruction.
SetBTypeCompatible(BTypeCompatible_PACIXSP());
when '0001 010' // PACIB1716
d = 17;
n = 16;
Assembler Symbols
Operation
X[d] = AddPACIB(X[d], SP[]);
else
X[d] = AddPACIB(X[d], X[n]);
PACIB, PACIB1716, PACIBSP,

Page 364
PACIBZ, PACIZB
PRFM (immediate)
Prefetch Memory (immediate) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of an PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 1 1 0 imm12 Rn Rt
size opc
Unsigned offset
PRFM (<prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]
Assembler Symbols
<prfop> Is the prefetch operation, defined as <type><target><policy>.

<type> is one of:
PLD
Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
<target> is one of:

L1
Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
L2
L3
<policy> is one of:

KEEP
Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>" field as 0.
STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.
For more information on these prefetch operations, see Prefetch memory.

For other encodings of the "Rt" field, use <imm5>.
<imm5> Is the prefetch operation encoding as an immediate, in the range 0 to 31, encoded in the "Rt" field.
This syntax is only for encodings that are not accessible using <prfop>.
<pimm> Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0
and encoded in the "imm12" field as <pimm>/8.
Shared Decode
PRFM (immediate) Page 365

Operation
bits(64) address;
if n == 31 then
address = SP[];
else
address = X[n];
PRFM (immediate) Page 366

PRFM (literal)
Prefetch Memory (literal) signals the memory system that data memory accesses from a specified address are likely to
occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory
accesses when they do occur, such as preloading the cache line containing the specified address into one or more
caches.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 0 imm19 Rt
opc
Literal
PRFM (<prfop>|#<imm5>), <label>
bits(64) offset;
Assembler Symbols

<type> is one of:
PLD
PLI
PST
<target> is one of:

L1
L2
L3
<policy> is one of:

KEEP
STRM
as 1.

PRFM (literal) Page 367

Operation
PRFM (literal) Page 368

PRFM (register)
Prefetch Memory (register) signals the memory system that data memory accesses from a specified address are likely
to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
more caches.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 1 Rm option S 1 0 Rn Rt
size opc
Integer
PRFM (<prfop>|#<imm5>), [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols

<type> is one of:
PLD
PLI
PST
<target> is one of:

L1
L2
L3
<policy> is one of:

KEEP
STRM
as 1.

"Rm" field.
"Rm" field.
PRFM (register) Page 369

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
S <amount>
0 #0
1 #3
Shared Decode
Operation

bits(64) address;
if n == 31 then
address = SP[];
else
address = X[n];
PRFM (register) Page 370

PRFUM
Prefetch Memory (unscaled offset) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
more caches.
The effect of an PRFUM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
PRFUM (<prfop>|#<imm5>), [<Xn|SP>{, #<simm>}]
Assembler Symbols

<type> is one of:
PLD
PLI
PST
<target> is one of:

L1
L2
L3
<policy> is one of:

KEEP
STRM
as 1.

the "imm9" field.
Shared Decode
PRFUM Page 371

Operation
bits(64) address;
if n == 31 then
address = SP[];
else
address = X[n];
PRFUM Page 372

PSB CSYNC
Profiling Synchronization Barrier. This instruction is a barrier that ensures that all existing profiling data for the
current PE has been formatted, and profiling buffer addresses have been translated such that all writes to the profiling
buffer have been initiated. A following DSB instruction completes when the writes to the profiling buffer have
completed.
If the Statistical Profiling Extension is not implemented, this instruction executes as a NOP.
System
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1
CRm op2
System
PSB CSYNC
if !HaveStatisticalProfiling() then EndOfInstruction();
Operation
PSB CSYNC Page 373

PSSBB
Physical Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing
earlier stores to the same physical address.
The semantics of the Physical Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the PSSBB, then the load does not speculatively
read an entry earlier in the coherence order for that location than the entry generated by the latest store
satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store appears in program order before the PSSBB.
• When a load to a location appears in program order before the PSSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store appears in program order after the PSSBB.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
CRm opc
System
PSSBB
Operation
SpeculativeStoreBypassBarrierToPA();
PSSBB Page 374

RBIT
Reverse Bits reverses the bit order in a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 Rn Rd
32-bit (sf == 0)
RBIT <Wd>, <Wn>
64-bit (sf == 1)
RBIT <Xd>, <Xn>
Assembler Symbols
Operation
bits(datasize) operand = X[n];

for i = 0 to datasize-1
result<datasize-1-i> = operand<i>;
X[d] = result;
If PSTATE.DIT is 1:
RBIT Page 375

RET
Return from subroutine branches unconditionally to an address in a register, with a hint that this is a subroutine
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
Integer
RET {<Xn>}
Assembler Symbols
the "Rn" field. Defaults to X30 if absent.
Operation
BranchTo(target, BranchType_RET);
If PSTATE.DIT is 1:
RET Page 376

RETAA, RETAB
Return from subroutine, with pointer authentication. This instruction authenticates the address that is held in LR,
using SP as the modifier and the specified key, branches to the authenticated address, with a hint that this instruction
is a subroutine return.
Key A is used for RETAA, and key B is used for RETAB.
The authenticated address is not written back to LR.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
Z op A Rn Rm
RETAA (M == 0)
RETAA
RETAB (M == 1)
RETAB
UNDEFINED;
Operation
bits(64) target = X[30];
bits(64) modifier = SP[];
if use_key_a then
else
BranchTo(target, BranchType_RET);
RETAA, RETAB Page 377

REV
Reverse Bytes reverses the byte order in a register.
This instruction is used by the pseudo-instruction REV64.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 x Rn Rd
opc
32-bit (sf == 0 && opc == 10)
REV <Wd>, <Wn>
64-bit (sf == 1 && opc == 11)
REV <Xd>, <Xn>
integer container_size;
case opc of
when '00'
Unreachable();
when '01'
container_size = 16;
when '10'
when '11'
if sf == '0' then UNDEFINED;
Assembler Symbols
Operation

integer containers = datasize DIV container_size;

integer elements_per_container = container_size DIV 8;
integer index = 0;
integer rev_index;
for c = 0 to containers-1
rev_index = index + ((elements_per_container - 1) * 8);
for e = 0 to elements_per_container-1
result<rev_index+7:rev_index> = operand<index+7:index>;
index = index + 8;
rev_index = rev_index - 8;
X[d] = result;
REV Page 378

If PSTATE.DIT is 1:
REV Page 379

REV16
Reverse bytes in 16-bit halfwords reverses the byte order in each 16-bit halfword of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 Rn Rd
opc
32-bit (sf == 0)
REV16 <Wd>, <Wn>
64-bit (sf == 1)
REV16 <Xd>, <Xn>
case opc of
when '00'
Unreachable();
when '01'
when '10'
when '11'
Assembler Symbols
Operation


integer index = 0;
integer rev_index;
index = index + 8;
X[d] = result;
If PSTATE.DIT is 1:
REV16 Page 380

REV16 Page 381

REV32
Reverse bytes in 32-bit words reverses the byte order in each 32-bit word of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
sf opc
64-bit
REV32 <Xd>, <Xn>
case opc of
when '00'
Unreachable();
when '01'
when '10'
when '11'
Assembler Symbols
Operation


integer index = 0;
integer rev_index;
index = index + 8;
X[d] = result;
If PSTATE.DIT is 1:
REV32 Page 382

REV32 Page 383

REV64
Reverse Bytes reverses the byte order in a 64-bit general-purpose register.

When assembling for Armv8.2, an assembler must support this pseudo-instruction. It is OPTIONAL whether an
assembler supports this pseudo-instruction when assembling for an architecture earlier than Armv8.2.
This is a pseudo-instruction of REV. This means:
• The encodings in this description are named to match the encodings of REV.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of REV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 Rn Rd
sf opc
64-bit
REV64 <Xd>, <Xn>
is equivalent to
REV <Xd>, <Xn>
Assembler Symbols
Operation
The description of REV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
REV64 Page 384

RMIF
Performs a rotation right of a value held in a general purpose register by an immediate value, and then inserts a
selection of the bottom four bits of the result of the rotation into the PSTATE flags, under the control of a second
immediate mask.
Integer
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 0 0 0 imm6 0 0 0 0 1 Rn 0 mask
sf
Integer
RMIF <Xn>, #<shift>, #<mask>

integer lsb = UInt(imm6);
Assembler Symbols
<shift> Is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the "imm6" field,
<mask> Is the flag bit mask, an immediate in the range 0 to 15, which selects the bits that are inserted into the
NZCV condition flags, encoded in the "mask" field.
Operation
bits(4) tmp;
bits(64) tmpreg = X[n];
tmp = (tmpreg:tmpreg)<lsb+3:lsb>;
if mask<3> == '1' then PSTATE.N = tmp<3>;
if mask<2> == '1' then PSTATE.Z = tmp<2>;
if mask<1> == '1' then PSTATE.C = tmp<1>;
if mask<0> == '1' then PSTATE.V = tmp<0>;
If PSTATE.DIT is 1:
RMIF Page 385

ROR (immediate)
Rotate right (immediate) provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left.
This is an alias of EXTR. This means:
• The encodings in this description are named to match the encodings of EXTR.
• The description of EXTR gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 1 N 0 Rm imms Rn Rd
32-bit (sf == 0 && N == 0 && imms == 0xxxxx)
ROR <Wd>, <Ws>, #<shift>
is equivalent to
EXTR <Wd>, <Ws>, <Ws>, #<shift>
64-bit (sf == 1 && N == 1)
ROR <Xd>, <Xs>, #<shift>
is equivalent to
EXTR <Xd>, <Xs>, <Xs>, #<shift>
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xs> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<shift> For the 32-bit variant: is the amount by which to rotate, in the range 0 to 31, encoded in the "imms"
field.
For the 64-bit variant: is the amount by which to rotate, in the range 0 to 63, encoded in the "imms"
field.
Operation
The description of EXTR gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
ROR (immediate) Page 386

ROR (register)
Rotate Right (register) provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
This is an alias of RORV. This means:
• The encodings in this description are named to match the encodings of RORV.
• The description of RORV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2
32-bit (sf == 0)
ROR <Wd>, <Wn>, <Wm>
is equivalent to
RORV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
ROR <Xd>, <Xn>, <Xm>
is equivalent to
RORV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation
The description of RORV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
ROR (register) Page 387

ROR (register) Page 388

RORV
Rotate Right Variable provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
This instruction is used by the alias ROR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2
32-bit (sf == 0)
RORV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
RORV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation

X[d] = result;
If PSTATE.DIT is 1:
RORV Page 389

SB
Speculation Barrier is a barrier that controls speculation.

The semantics of the Speculation Barrier are that the execution, until the barrier completes, of any instruction that
appears later in the program order than the barrier:
• Cannot be performed speculatively to the extent that such speculation can be observed through side-channels
as a result of control flow speculation or data value speculation.
• Can be speculatively executed as a result of predicting that a potentially exception generating instruction has
not generated an exception.
In particular, any instruction that appears later in the program order than the barrier cannot cause a speculative
allocation into any caching structure where the allocation of that entry could be indicative of any data value present in
memory or in the registers.
The SB instruction:
• Cannot be speculatively executed as a result of control flow speculation or data value speculation.
• Can be speculatively executed as a result of predicting that a potentially exception generating instruction has
not generated an exception. The potentially exception generating instruction can complete once it is known
not to be speculative, and all data values generated by instructions appearing in program order before the SB
instruction have their predicted values confirmed.
When the prediction of the instruction stream is not informed by data taken from the register outputs of the
speculative execution of instructions appearing in program order after an uncompleted SB instruction, the SB
instruction has no effect on the use of prediction resources to predict the instruction stream that is being fetched.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 (0) (0) (0) (0) 1 1 1 1 1 1 1 1
CRm opc
System
SB
if !HaveSBExt() then UNDEFINED;
Operation
SpeculationBarrier();
SB Page 390
SBC
Subtract with Carry subtracts a register value and the value of NOT (Carry flag) from a register value, and writes the
This instruction is used by the alias NGC.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
SBC <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
SBC <Xd>, <Xn>, <Xm>
Assembler Symbols
Alias Conditions

NGC Rn == '11111'
Operation
(result, -) = AddWithCarry(operand1, operand2, PSTATE.C);
X[d] = result;
If PSTATE.DIT is 1:
SBC Page 391

SBC Page 392

SBCS
Subtract with Carry, setting flags, subtracts a register value and the value of NOT (Carry flag) from a register value,
and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias NGCS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
SBCS <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
SBCS <Xd>, <Xn>, <Xm>
Assembler Symbols
Alias Conditions

NGCS Rn == '11111'
Operation
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);
X[d] = result;
If PSTATE.DIT is 1:
SBCS Page 393

SBCS Page 394

SBFIZ
Signed Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register to
bit position <lsb> of the destination register, setting the destination bits below the bitfield to zero, and the bits above
the bitfield to a copy of the most significant bit of the bitfield.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
SBFIZ <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
SBFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)
64-bit (sf == 1 && N == 1)
SBFIZ <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
SBFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)
Assembler Symbols
Operation
If PSTATE.DIT is 1:
SBFIZ Page 395

SBFIZ Page 396

SBFM
Signed Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
or 64 bits.
In both cases the destination bits below the bitfield are set to zero, and the bits above the bitfield are set to a copy of
the most significant bit of the bitfield.
This instruction is used by the aliases ASR (immediate), SBFIZ, SBFX, SXTB, SXTH, and SXTW.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
SBFM <Wd>, <Wn>, #<immr>, #<imms>
64-bit (sf == 1 && N == 1)
SBFM <Xd>, <Xn>, #<immr>, #<imms>
integer R;
integer S;

R = UInt(immr);
S = UInt(imms);
Assembler Symbols
Alias Conditions
Alias Of variant Is preferred when

ASR (immediate) 32-bit imms == '011111'
SBFM Page 397

ASR (immediate) 64-bit imms == '111111'
SBFIZ UInt(imms) < UInt(immr)
SBFX BFXPreferred(sf, opc<1>, imms, immr)
SXTB immr == '000000' && imms == '000111'
SXTH immr == '000000' && imms == '001111'
SXTW immr == '000000' && imms == '011111'
Operation

bits(datasize) bot = ROR(src, R) AND wmask;
// determine extension bits (sign, zero or dest register)

bits(datasize) top = Replicate(src<S>);

X[d] = (top AND NOT(tmask)) OR (bot AND tmask);
If PSTATE.DIT is 1:
SBFM Page 398

SBFX
Signed Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to a copy of the most
significant bit of the bitfield.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
SBFX <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
SBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
and is the preferred disassembly when BFXPreferred(sf, opc<1>, imms, immr).
64-bit (sf == 1 && N == 1)
SBFX <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
SBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)
Assembler Symbols
Operation
If PSTATE.DIT is 1:
SBFX Page 399

SBFX Page 400

SDIV
Signed Divide divides a signed integer register value by another signed integer register value, and writes the result to
the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
o1
32-bit (sf == 0)
SDIV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
SDIV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation

integer result;
if IsZero(operand2) then
result = 0;
else
result = RoundTowardsZero(Real(Int(operand1, FALSE)) / Real(Int(operand2, FALSE)));
SDIV Page 401

SETF8, SETF16
Set the PSTATE.NZV flags based on the value in the specified general-purpose register. SETF8 treats the value as an 8
bit value, and SETF16 treats the value as an 16 bit value.
The PSTATE.C flag is not affected by these instructions.
Integer
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 sz 0 0 1 0 Rn 0 1 1 0 1
sf
SETF8 (sz == 0)
SETF8 <Wn>
SETF16 (sz == 1)
SETF16 <Wn>

integer msb = if sz == '1' then 15 else 7;
Assembler Symbols
Operation
bits(32) tmpreg = X[n];

PSTATE.N = tmpreg<msb>;
PSTATE.Z = if (tmpreg<msb:0> == Zeros(msb + 1)) then '1' else '0';
PSTATE.V = tmpreg<msb+1> EOR tmpreg<msb>;
//PSTATE.C unchanged;
If PSTATE.DIT is 1:
SETF8, SETF16 Page 402

SEV
Send Event is a hint instruction. It causes an event to be signaled to all PEs in the multiprocessor system. For more
information, see Wait for Event mechanism and Send event.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 1
CRm op2
System
SEV
// Empty.
Operation
SendEvent();
SEV Page 403

SEVL
Send Event Local is a hint instruction that causes an event to be signaled locally without requiring the event to be
signaled to other PEs in the multiprocessor system. It can prime a wait-loop which starts with a WFE instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 1 1 1 1 1
CRm op2
System
SEVL
// Empty.
Operation
SendEventLocal();
SEVL Page 404

SMADDL
Signed Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias SMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 Ra Rn Rd
U o0
64-bit
SMADDL <Xd>, <Wn>, <Wm>, <Xa>
Assembler Symbols
"Rn" field.
"Rm" field.
field.
Alias Conditions

SMULL Ra == '11111'
Operation
bits(32) operand1 = X[n];

bits(32) operand2 = X[m];
bits(64) operand3 = X[a];
integer result;
result = Int(operand3, FALSE) + (Int(operand1, FALSE) * Int(operand2, FALSE));
X[d] = result<63:0>;
If PSTATE.DIT is 1:
SMADDL Page 405

SMC
Secure Monitor Call causes an exception to EL3.

SMC is available only for software executing at EL1 or higher. It is UNDEFINED in EL0.
If the values of HCR_EL2.TSC and SCR_EL3.SMD are both 0, execution of an SMC instruction at EL1 or higher
generates a Secure Monitor Call exception, recording it in ESR_ELx, using the EC value 0x17, that is taken to EL3.
If the value of HCR_EL2.TSC is 1 and EL2 is enabled in the current Security state, execution of an SMC instruction at
EL1 generates an exception that is taken to EL2, regardless of the value of SCR_EL3.SMD. For more information, see
Traps to EL2 of Non-secure EL1 execution of SMC instructions.
If the value of HCR_EL2.TSC is 0 and the value of SCR_EL3.SMD is 1, the SMC instruction is UNDEFINED.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 1 1
System
SMC #<imm>
// Empty.
Assembler Symbols
Operation
AArch64.CheckForSMCUndefOrTrap(imm16);
if SCR_EL3.SMD == '1' then

// SMC disabled
UNDEFINED;
else
AArch64.CallSecureMonitor(imm16);
SMC Page 406

SMNEGL
Signed Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.
This is an alias of SMSUBL. This means:
• The encodings in this description are named to match the encodings of SMSUBL.
• The description of SMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra
64-bit
SMNEGL <Xd>, <Wn>, <Wm>
is equivalent to
SMSUBL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
Operation
The description of SMSUBL gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
SMNEGL Page 407

SMSUBL
Signed Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register value,
and writes the result to the 64-bit destination register.
This instruction is used by the alias SMNEGL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 Ra Rn Rd
U o0
64-bit
SMSUBL <Xd>, <Wn>, <Wm>, <Xa>
Assembler Symbols
"Rn" field.
"Rm" field.
"Ra" field.
Alias Conditions

SMNEGL Ra == '11111'
Operation

integer result;
result = Int(operand3, FALSE) - (Int(operand1, FALSE) * Int(operand2, FALSE));

If PSTATE.DIT is 1:
SMSUBL Page 408

SMULH
Signed Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra
64-bit
SMULH <Xd>, <Xn>, <Xm>
Assembler Symbols
"Rn" field.
"Rm" field.
Operation

integer result;
result = Int(operand1, FALSE) * Int(operand2, FALSE);
X[d] = result<127:64>;
If PSTATE.DIT is 1:
SMULH Page 409

SMULL
Signed Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.
This is an alias of SMADDL. This means:
• The encodings in this description are named to match the encodings of SMADDL.
• The description of SMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra
64-bit
SMULL <Xd>, <Wn>, <Wm>
is equivalent to
SMADDL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
Operation
The description of SMADDL gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
SMULL Page 410

SSBB
Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing earlier stores
to the same virtual address under certain conditions.
The semantics of the Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the SSBB, then the load does not speculatively read
an entry earlier in the coherence order for that location than the entry generated by the latest store satisfying
all of the following conditions:
◦ The store uses the same virtual address as the load.
◦ The store appears in program order before the SSBB.
• When a load to a location appears in program order before the SSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store uses the same virtual address as the load.
◦ The store appears in program order after the SSBB.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 1
CRm opc
System
SSBB
Operation
SpeculativeStoreBypassBarrierToVA();
SSBB Page 411

ST2G
Store Allocation Tags stores an Allocation Tag to two Tag granules of memory. The address used for the store is
calculated from the base register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is
calculated from the Logical Address Tag in the source register.
Post-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 0 1 Xn Xt
Post-index
ST2G <Xt|SP>, [<Xn|SP>], #<simm>

boolean writeback = TRUE;
Pre-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 1 Xn Xt
Pre-index
ST2G <Xt|SP>, [<Xn|SP>, #<simm>]!

Signed offset
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 0 Xn Xt
Signed offset
ST2G <Xt|SP>, [<Xn|SP>{, #<simm>}]

boolean writeback = FALSE;
ST2G Page 412

Assembler Symbols
<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
Operation
bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
AArch64.MemTag[address, AccType_NORMAL] = tag;

AArch64.MemTag[address+TAG_GRANULE, AccType_NORMAL] = tag;
if writeback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
ST2G Page 413

STADD, STADDL
Atomic add on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, adds the value held in a register to it, and stores the result back to memory.
• STADD does not have release semantics.
• STADDL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDADD, LDADDA, LDADDAL, LDADDL. This means:
• The encodings in this description are named to match the encodings of LDADD, LDADDA, LDADDAL,
LDADDL.
• The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDADD alias (size == 10 && R == 0)
STADD <Ws>, [<Xn|SP>]
is equivalent to
LDADD <Ws>, WZR, [<Xn|SP>]
32-bit LDADDL alias (size == 10 && R == 1)
STADDL <Ws>, [<Xn|SP>]
is equivalent to
LDADDL <Ws>, WZR, [<Xn|SP>]
64-bit LDADD alias (size == 11 && R == 0)
STADD <Xs>, [<Xn|SP>]
is equivalent to
LDADD <Xs>, XZR, [<Xn|SP>]
64-bit LDADDL alias (size == 11 && R == 1)
STADDL <Xs>, [<Xn|SP>]
is equivalent to
LDADDL <Xs>, XZR, [<Xn|SP>]
STADD, STADDL Page 414

Assembler Symbols
Operation
The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this instruction.
STADD, STADDL Page 415

STADDB, STADDLB
Atomic add on byte in memory, without return, atomically loads an 8-bit byte from memory, adds the value held in a
register to it, and stores the result back to memory.
• STADDB does not have release semantics.
• STADDLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDADDB, LDADDAB, LDADDALB, LDADDLB. This means:
• The encodings in this description are named to match the encodings of LDADDB, LDADDAB, LDADDALB,
LDADDLB.
• The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
No memory ordering (R == 0)
STADDB <Ws>, [<Xn|SP>]
is equivalent to
LDADDB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STADDLB <Ws>, [<Xn|SP>]
is equivalent to
LDADDLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this instruction.
STADDB, STADDLB Page 416

STADDH, STADDLH
Atomic add on halfword in memory, without return, atomically loads a 16-bit halfword from memory, adds the value
held in a register to it, and stores the result back to memory.
• STADDH does not have release semantics.
• STADDLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDADDH, LDADDAH, LDADDALH, LDADDLH. This means:
• The encodings in this description are named to match the encodings of LDADDH, LDADDAH, LDADDALH,
LDADDLH.
• The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STADDH <Ws>, [<Xn|SP>]
is equivalent to
LDADDH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STADDLH <Ws>, [<Xn|SP>]
is equivalent to
LDADDLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this instruction.
STADDH, STADDLH Page 417

STCLR, STCLRL
Atomic bit clear on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs a bitwise AND with the complement of the value held in a register on it, and
stores the result back to memory.
• STCLR does not have release semantics.
• STCLRL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDCLR, LDCLRA, LDCLRAL, LDCLRL. This means:
• The encodings in this description are named to match the encodings of LDCLR, LDCLRA, LDCLRAL, LDCLRL.
• The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDCLR alias (size == 10 && R == 0)
STCLR <Ws>, [<Xn|SP>]
is equivalent to
LDCLR <Ws>, WZR, [<Xn|SP>]
32-bit LDCLRL alias (size == 10 && R == 1)
STCLRL <Ws>, [<Xn|SP>]
is equivalent to
LDCLRL <Ws>, WZR, [<Xn|SP>]
64-bit LDCLR alias (size == 11 && R == 0)
STCLR <Xs>, [<Xn|SP>]
is equivalent to
LDCLR <Xs>, XZR, [<Xn|SP>]
64-bit LDCLRL alias (size == 11 && R == 1)
STCLRL <Xs>, [<Xn|SP>]
is equivalent to
LDCLRL <Xs>, XZR, [<Xn|SP>]
STCLR, STCLRL Page 418

Assembler Symbols
Operation
The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this instruction.
STCLR, STCLRL Page 419

STCLRB, STCLRLB
Atomic bit clear on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise
AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRB does not have release semantics.
• STCLRLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB. This means:
• The encodings in this description are named to match the encodings of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.
• The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STCLRB <Ws>, [<Xn|SP>]
is equivalent to
LDCLRB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STCLRLB <Ws>, [<Xn|SP>]
is equivalent to
LDCLRLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this instruction.
STCLRB, STCLRLB Page 420

STCLRH, STCLRLH
Atomic bit clear on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRH does not have release semantics.
• STCLRLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH. This means:
• The encodings in this description are named to match the encodings of LDCLRH, LDCLRAH, LDCLRALH,
LDCLRLH.
• The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STCLRH <Ws>, [<Xn|SP>]
is equivalent to
LDCLRH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STCLRLH <Ws>, [<Xn|SP>]
is equivalent to
LDCLRLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this instruction.
STCLRH, STCLRLH Page 421

STEOR, STEORL
Atomic exclusive OR on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back
to memory.
• STEOR does not have release semantics.
• STEORL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDEOR, LDEORA, LDEORAL, LDEORL. This means:
• The encodings in this description are named to match the encodings of LDEOR, LDEORA, LDEORAL,
LDEORL.
• The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDEOR alias (size == 10 && R == 0)
STEOR <Ws>, [<Xn|SP>]
is equivalent to
LDEOR <Ws>, WZR, [<Xn|SP>]
32-bit LDEORL alias (size == 10 && R == 1)
STEORL <Ws>, [<Xn|SP>]
is equivalent to
LDEORL <Ws>, WZR, [<Xn|SP>]
64-bit LDEOR alias (size == 11 && R == 0)
STEOR <Xs>, [<Xn|SP>]
is equivalent to
LDEOR <Xs>, XZR, [<Xn|SP>]
64-bit LDEORL alias (size == 11 && R == 1)
STEORL <Xs>, [<Xn|SP>]
is equivalent to
LDEORL <Xs>, XZR, [<Xn|SP>]
STEOR, STEORL Page 422

Assembler Symbols
Operation
The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this instruction.
STEOR, STEORL Page 423

STEORB, STEORLB
Atomic exclusive OR on byte in memory, without return, atomically loads an 8-bit byte from memory, performs an
exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORB does not have release semantics.
• STEORLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDEORB, LDEORAB, LDEORALB, LDEORLB. This means:
• The encodings in this description are named to match the encodings of LDEORB, LDEORAB, LDEORALB,
LDEORLB.
• The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STEORB <Ws>, [<Xn|SP>]
is equivalent to
LDEORB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STEORLB <Ws>, [<Xn|SP>]
is equivalent to
LDEORLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this instruction.
STEORB, STEORLB Page 424

STEORH, STEORLH
Atomic exclusive OR on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs
an exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORH does not have release semantics.
• STEORLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDEORH, LDEORAH, LDEORALH, LDEORLH. This means:
• The encodings in this description are named to match the encodings of LDEORH, LDEORAH, LDEORALH,
LDEORLH.
• The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STEORH <Ws>, [<Xn|SP>]
is equivalent to
LDEORH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STEORLH <Ws>, [<Xn|SP>]
is equivalent to
LDEORLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this instruction.
STEORH, STEORLH Page 425

STG
Store Allocation Tag stores an Allocation Tag to memory. The address used for the store is calculated from the base
register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is calculated from the Logical
Address Tag in the source register.
Post-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 0 1 Xn Xt
Post-index
STG <Xt|SP>, [<Xn|SP>], #<simm>
Pre-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 1 1 Xn Xt
Pre-index
STG <Xt|SP>, [<Xn|SP>, #<simm>]!
Signed offset
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 1 0 Xn Xt
Signed offset
STG <Xt|SP>, [<Xn|SP>{, #<simm>}]

STG Page 426

Assembler Symbols
Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then

if writeback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STG Page 427

STGM
Store Tag Multiple writes a naturally aligned block of N Allocation Tags, where the size of N is identified in
GMID_EL1.BS, and the Allocation Tag written to address A is taken from the source register at
4*A<7:4>+3:4*A<7:4>.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Integer
STGM <Xt>, [<Xn|SP>]

Assembler Symbols
Operation

UNDEFINED;
bits(64) data = X[t];

bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
integer size = 4 * (2 ^ (UInt(GMID_EL1.BS)));

integer index = UInt(address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>);
bits(4) tag = data<(index*4)+3:index*4>;
index = index + 1;
STGM Page 428

STGP
Store Allocation Tag and Pair of registers stores an Allocation Tag and two 64-bit doublewords to memory, from two
registers. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the base register.
Post-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 1 0 simm7 Xt2 Xn Xt
Post-index
STGP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>
integer t2 = UInt(Xt2);
bits(64) offset = LSL(SignExtend(simm7, 64), LOG2_TAG_GRANULE);
Pre-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 1 0 simm7 Xt2 Xn Xt
Pre-index
STGP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!
Signed offset
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 0 0 simm7 Xt2 Xn Xt
Signed offset
STGP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

STGP Page 429

Assembler Symbols
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Xt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Xt2" field.
<imm> For the post-index and pre-index variant: is the signed immediate offset, a multiple of 16 in the range
-1024 to 1008, encoded in the "simm7" field.
For the signed offset variant: is the optional signed immediate offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "simm7" field.
Operation
bits(64) address;
bits(64) data1;
bits(64) data2;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = X[t];
data2 = X[t2];
if !postindex then
Mem[address, 8, AccType_NORMAL] = data1;

Mem[address+8, 8, AccType_NORMAL] = data2;
AArch64.MemTag[address, AccType_NORMAL] = AArch64.AllocationTagFromAddress(address);
if writeback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STGP Page 430

STLLR
Store LORelease Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
STLLR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
STLLR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, dbytes, AccType_LIMITEDORDERED] = data;
STLLR Page 431

STLLRB
Store LORelease Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has
memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
STLLRB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 1, AccType_LIMITEDORDERED] = data;
STLLRB Page 432

STLLRH
Store LORelease Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
No offset
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
STLLRH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 2, AccType_LIMITEDORDERED] = data;
STLLRH Page 433

STLR
Store-Release Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For information about
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
32-bit (size == 10)
STLR <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
STLR <Xt>, [<Xn|SP>{,#0}]

Assembler Symbols
Operation
bits(64) address;
bits(elsize) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, dbytes, AccType_ORDERED] = data;
STLR Page 434

STLRB
Store-Release Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses, see Load/
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
STLRB <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STLRB Page 435

STLRH
Store-Release Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses,
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
No offset
STLRH <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STLRH Page 436

STLUR
Store-Release Register (unscaled) calculates an address from a base register value and an immediate offset, and
stores a 32-bit word or a 64-bit doubleword to the calculated address, from a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
32-bit (size == 10)
STLUR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
STLUR <Xt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, datasize DIV 8, AccType_ORDERED] = data;
STLUR Page 437

STLUR Page 438

STLURB
Store-Release Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and
stores a byte to the calculated address, from a 32-bit register.
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
STLURB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STLURB Page 439

STLURH
Store-Release Register Halfword (unscaled) calculates an address from a base register value and an immediate offset,
and stores a halfword to the calculated address, from a 32-bit register.
Unscaled offset
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
STLURH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STLURH Page 440

STLXP
Store-Release Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords to a memory location if the
PE has exclusive access to the memory address, from two registers, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. A 32-bit pair requires the address
to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be
quadword aligned and, if the Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory
location being updated. The instruction also has memory ordering semantics as described in Load-Acquire, Store-
Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 0 1 Rs 1 Rt2 Rn Rt
L o0
32-bit (sz == 0)
STLXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{,#0}]
64-bit (sz == 1)
STLXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{,#0}]
integer t2 = UInt(Rt2); // ignored by load/store single register
integer s = UInt(Rs); // ignored by all loads and store-release

UNPREDICTABLE behaviors, and particularly STLXP.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort
exception to be generated, subject to the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.
STLXP Page 441

Operation
bits(64) address;
boolean rn_unknown = FALSE;
if s == t || (s == t2) then
Constraint c = ConstrainUnpredictable(Unpredictable_DATAOVERLAP);
assert c IN {Constraint_UNKNOWN, Constraint_NONE, Constraint_UNDEF, Constraint_NOP};
case c of
when Constraint_UNKNOWN rt_unknown = TRUE; // store UNKNOWN value
when Constraint_NONE rt_unknown = FALSE; // store original value
if s == n && n != 31 then
Constraint c = ConstrainUnpredictable(Unpredictable_BASEOVERLAP);
case c of
when Constraint_UNKNOWN rn_unknown = TRUE; // address is UNKNOWN
when Constraint_NONE rn_unknown = FALSE; // address is original base
if n == 31 then
CheckSPAlignment();
address = SP[];
elsif rn_unknown then
else
address = X[n];
if rt_unknown then
data = bits(datasize) UNKNOWN;
else
bits(datasize DIV 2) el1 = X[t];
bits(datasize DIV 2) el2 = X[t2];
data = if BigEndian() then el1:el2 else el2:el1;
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
// This atomic write will be rejected if it does not refer
// to the same physical locations after address translation.
Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
STLXP Page 442

STLXR
Store-Release Exclusive Register stores a 32-bit word or a 64-bit doubleword to memory if the PE has exclusive access
to the memory address, from two registers, and returns a status value of 0 if the store was successful, or of 1 if no
store was performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has
memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
32-bit (size == 10)
STLXR <Ws>, <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
STLXR <Ws>, <Xt>, [<Xn|SP>{,#0}]

UNPREDICTABLE behaviors, and particularly STLXR.
Assembler Symbols
0
1

STLXR Page 443

Operation
bits(64) address;
bits(elsize) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
data = bits(elsize) UNKNOWN;
else
data = X[t];
bit status = '1';

Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
STLXR Page 444

STLXRB
Store-Release Exclusive Register Byte stores a byte from a 32-bit register to memory if the PE has exclusive access to
the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic. The instruction also has memory ordering semantics
as described in Load-Acquire, Store-Release. For information about memory accesses see Load/Store addressing
modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
No offset
STLXRB <Ws>, <Wt>, [<Xn|SP>{,#0}]
UNPREDICTABLE behaviors, and particularly STLXRB.
Assembler Symbols
0
1
Aborts
STLXRB Page 445

Operation
bits(64) address;
bits(8) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
else
data = X[t];
bit status = '1';

if AArch64.ExclusiveMonitorsPass(address, 1) then
Mem[address, 1, AccType_ORDEREDATOMIC] = data;
STLXRB Page 446

STLXRH
Store-Release Exclusive Register Halfword stores a halfword from a 32-bit register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses see Load/
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
No offset
STLXRH <Ws>, <Wt>, [<Xn|SP>{,#0}]
UNPREDICTABLE behaviors, and particularly STLXRH.
Assembler Symbols
0
1

A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
STLXRH Page 447

Operation
bits(64) address;
bits(16) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
else
data = X[t];
bit status = '1';

Mem[address, 2, AccType_ORDEREDATOMIC] = data;
STLXRH Page 448

STNP
Store Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, and stores two 32-bit words or two 64-bit doublewords to the calculated address, from two registers. For
information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 0 0 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
STNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 10)
STNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]
// Empty.
Assembler Symbols
Shared Decode
if opc<0> == '1' then UNDEFINED;
STNP Page 449

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = X[t];
data2 = X[t2];
Mem[address, dbytes, AccType_STREAM] = data1;
Mem[address+dbytes, dbytes, AccType_STREAM] = data2;
STNP Page 450

STP
Store Pair of Registers calculates an address from a base register value and an immediate offset, and stores two 32-bit
words or two 64-bit doublewords to the calculated address, from two registers. For information about memory
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 1 0 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
STP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>
64-bit (opc == 10)
STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 1 0 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
STP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!
64-bit (opc == 10)
STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 0 0 imm7 Rt2 Rn Rt
opc L
32-bit (opc == 00)
STP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 10)
STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

STP Page 451

UNPREDICTABLE behaviors, and particularly STP.
Assembler Symbols
Shared Decode
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
STP Page 452

Operation
bits(64) address;

Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
case c of
when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
if rt_unknown && t == n then

else
data1 = X[t];
if rt_unknown && t2 == n then
else
data2 = X[t2];
Mem[address, dbytes, AccType_NORMAL] = data1;
Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STP Page 453

STR (immediate)
Store Register (immediate) stores a word or a doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset. For information about memory accesses, see Load/
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
32-bit (size == 10)
STR <Wt>, [<Xn|SP>], #<simm>
64-bit (size == 11)
STR <Xt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
32-bit (size == 10)
STR <Wt>, [<Xn|SP>, #<simm>]!
64-bit (size == 11)
STR <Xt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
STR (immediate) Page 454

32-bit (size == 10)
STR <Wt>, [<Xn|SP>{, #<pimm>}]
64-bit (size == 11)
STR <Xt>, [<Xn|SP>{, #<pimm>}]

Assembler Symbols
<pimm> For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
Shared Decode


Operation
bits(64) address;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
if rt_unknown then
else
data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;

STR (register)
Store Register (register) calculates an address from a base register value and an offset register value, and stores a
32-bit word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses,
The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
32-bit (size == 10)
STR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
64-bit (size == 11)
STR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
S <amount>
0 #0
1 #2
S <amount>
0 #0
1 #3
STR (register) Page 457

Shared Decode
Operation

bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STR (register) Page 458

STRB (immediate)
Store Register Byte (immediate) stores the least significant byte of a 32-bit register to memory. The address that is
used for the store is calculated from a base register and an immediate offset. For information about memory accesses,
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
Post-index
STRB <Wt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
Pre-index
STRB <Wt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
Unsigned offset
STRB <Wt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly STRB (immediate).
Assembler Symbols
STRB (immediate) Page 459

"imm12" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
if rt_unknown then
else
data = X[t];
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STRB (immediate) Page 460

STRB (register)
Store Register Byte (register) calculates an address from a base register value and an offset register value, and stores
a byte from a 32-bit register to the calculated address. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
Extended register (option != 011)
STRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
Shifted register (option == 011)
STRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
110 SXTW
111 SXTX
Shared Decode
STRB (register) Page 461

Operation

bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STRB (register) Page 462

STRH (immediate)
Store Register Halfword (immediate) stores the least significant halfword of a 32-bit register to memory. The address
that is used for the store is calculated from a base register and an immediate offset. For information about memory
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
Post-index
STRH <Wt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
Pre-index
STRH <Wt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
Unsigned offset
STRH <Wt>, [<Xn|SP>{, #<pimm>}]

UNPREDICTABLE behaviors, and particularly STRH (immediate).
Assembler Symbols
STRH (immediate) Page 463

Shared Decode
Operation
bits(64) address;
bits(16) data;

case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
if rt_unknown then
else
data = X[t];
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STRH (immediate) Page 464

STRH (register)
Store Register Halfword (register) calculates an address from a base register value and an offset register value, and
stores a halfword from a 32-bit register to the calculated address. For information about memory accesses, see Load/
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
32-bit
STRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
S <amount>
0 #0
1 #1
Shared Decode
STRH (register) Page 465

Operation

bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STRH (register) Page 466

STSET, STSETL
Atomic bit set on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory.
• STSET does not have release semantics.
• STSETL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSET, LDSETA, LDSETAL, LDSETL. This means:
• The encodings in this description are named to match the encodings of LDSET, LDSETA, LDSETAL, LDSETL.
• The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDSET alias (size == 10 && R == 0)
STSET <Ws>, [<Xn|SP>]
is equivalent to
LDSET <Ws>, WZR, [<Xn|SP>]
32-bit LDSETL alias (size == 10 && R == 1)
STSETL <Ws>, [<Xn|SP>]
is equivalent to
LDSETL <Ws>, WZR, [<Xn|SP>]
64-bit LDSET alias (size == 11 && R == 0)
STSET <Xs>, [<Xn|SP>]
is equivalent to
LDSET <Xs>, XZR, [<Xn|SP>]
64-bit LDSETL alias (size == 11 && R == 1)
STSETL <Xs>, [<Xn|SP>]
is equivalent to
LDSETL <Xs>, XZR, [<Xn|SP>]
STSET, STSETL Page 467

Assembler Symbols
Operation
The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.
STSET, STSETL Page 468

STSETB, STSETLB
Atomic bit set on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise OR
with the value held in a register on it, and stores the result back to memory.
• STSETB does not have release semantics.
• STSETLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSETB, LDSETAB, LDSETALB, LDSETLB. This means:
• The encodings in this description are named to match the encodings of LDSETB, LDSETAB, LDSETALB,
LDSETLB.
• The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STSETB <Ws>, [<Xn|SP>]
is equivalent to
LDSETB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSETLB <Ws>, [<Xn|SP>]
is equivalent to
LDSETLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this instruction.
STSETB, STSETLB Page 469

STSETH, STSETLH
Atomic bit set on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise OR with the value held in a register on it, and stores the result back to memory.
• STSETH does not have release semantics.
• STSETLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSETH, LDSETAH, LDSETALH, LDSETLH. This means:
• The encodings in this description are named to match the encodings of LDSETH, LDSETAH, LDSETALH,
LDSETLH.
• The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STSETH <Ws>, [<Xn|SP>]
is equivalent to
LDSETH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSETLH <Ws>, [<Xn|SP>]
is equivalent to
LDSETLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this instruction.
STSETH, STSETLH Page 470

STSMAX, STSMAXL
Atomic signed maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as signed numbers.
• STSMAX does not have release semantics.
• STSMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL. This means:
• The encodings in this description are named to match the encodings of LDSMAX, LDSMAXA, LDSMAXAL,
LDSMAXL.
• The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDSMAX alias (size == 10 && R == 0)
STSMAX <Ws>, [<Xn|SP>]
is equivalent to
LDSMAX <Ws>, WZR, [<Xn|SP>]
32-bit LDSMAXL alias (size == 10 && R == 1)
STSMAXL <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXL <Ws>, WZR, [<Xn|SP>]
64-bit LDSMAX alias (size == 11 && R == 0)
STSMAX <Xs>, [<Xn|SP>]
is equivalent to
LDSMAX <Xs>, XZR, [<Xn|SP>]
64-bit LDSMAXL alias (size == 11 && R == 1)
STSMAXL <Xs>, [<Xn|SP>]
is equivalent to
LDSMAXL <Xs>, XZR, [<Xn|SP>]
STSMAX, STSMAXL Page 471

Assembler Symbols
Operation
The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this instruction.
STSMAX, STSMAXL Page 472

STSMAXB, STSMAXLB
Atomic signed maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as signed
numbers.
• STSMAXB does not have release semantics.
• STSMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB. This means:
• The encodings in this description are named to match the encodings of LDSMAXB, LDSMAXAB, LDSMAXALB,
LDSMAXLB.
• The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STSMAXB <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMAXLB <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for this
instruction.
STSMAXB, STSMAXLB Page 473

STSMAXH, STSMAXLH
Atomic signed maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
signed numbers.
• STSMAXH does not have release semantics.
• STSMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH. This means:
• The encodings in this description are named to match the encodings of LDSMAXH, LDSMAXAH,
LDSMAXALH, LDSMAXLH.
• The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STSMAXH <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMAXLH <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for this
instruction.
STSMAXH, STSMAXLH Page 474

STSMIN, STSMINL
Atomic signed minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as signed numbers.
• STSMIN does not have release semantics.
• STSMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMIN, LDSMINA, LDSMINAL, LDSMINL. This means:
• The encodings in this description are named to match the encodings of LDSMIN, LDSMINA, LDSMINAL,
LDSMINL.
• The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDSMIN alias (size == 10 && R == 0)
STSMIN <Ws>, [<Xn|SP>]
is equivalent to
LDSMIN <Ws>, WZR, [<Xn|SP>]
32-bit LDSMINL alias (size == 10 && R == 1)
STSMINL <Ws>, [<Xn|SP>]
is equivalent to
LDSMINL <Ws>, WZR, [<Xn|SP>]
64-bit LDSMIN alias (size == 11 && R == 0)
STSMIN <Xs>, [<Xn|SP>]
is equivalent to
LDSMIN <Xs>, XZR, [<Xn|SP>]
64-bit LDSMINL alias (size == 11 && R == 1)
STSMINL <Xs>, [<Xn|SP>]
is equivalent to
LDSMINL <Xs>, XZR, [<Xn|SP>]
STSMIN, STSMINL Page 475

Assembler Symbols
Operation
The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this instruction.
STSMIN, STSMINL Page 476

STSMINB, STSMINLB
Atomic signed minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as signed
numbers.
• STSMINB does not have release semantics.
• STSMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB. This means:
• The encodings in this description are named to match the encodings of LDSMINB, LDSMINAB, LDSMINALB,
LDSMINLB.
• The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STSMINB <Ws>, [<Xn|SP>]
is equivalent to
LDSMINB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMINLB <Ws>, [<Xn|SP>]
is equivalent to
LDSMINLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for this
instruction.
STSMINB, STSMINLB Page 477

STSMINH, STSMINLH
Atomic signed minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
signed numbers.
• STSMINH does not have release semantics.
• STSMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH. This means:
• The encodings in this description are named to match the encodings of LDSMINH, LDSMINAH, LDSMINALH,
LDSMINLH.
• The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STSMINH <Ws>, [<Xn|SP>]
is equivalent to
LDSMINH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMINLH <Ws>, [<Xn|SP>]
is equivalent to
LDSMINLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for this
instruction.
STSMINH, STSMINLH Page 478

STTR
Store Register (unprivileged) stores a word or doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
32-bit (size == 10)
STTR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
STTR <Xt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode

else

STTR Page 479

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, datasize DIV 8, acctype] = data;
STTR Page 480

STTRB
Store Register Byte (unprivileged) stores a byte from a 32-bit register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
Unscaled offset
STTRB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode

else
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STTRB Page 481

STTRB Page 482

STTRH
Store Register Halfword (unprivileged) stores a halfword from a 32-bit register to memory. The address that is used
for the store is calculated from a base register and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
Unscaled offset
STTRH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode

else
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STTRH Page 483

STTRH Page 484

STUMAX, STUMAXL
Atomic unsigned maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as unsigned numbers.
• STUMAX does not have release semantics.
• STUMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL. This means:
• The encodings in this description are named to match the encodings of LDUMAX, LDUMAXA, LDUMAXAL,
LDUMAXL.
• The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDUMAX alias (size == 10 && R == 0)
STUMAX <Ws>, [<Xn|SP>]
is equivalent to
LDUMAX <Ws>, WZR, [<Xn|SP>]
32-bit LDUMAXL alias (size == 10 && R == 1)
STUMAXL <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXL <Ws>, WZR, [<Xn|SP>]
64-bit LDUMAX alias (size == 11 && R == 0)
STUMAX <Xs>, [<Xn|SP>]
is equivalent to
LDUMAX <Xs>, XZR, [<Xn|SP>]
64-bit LDUMAXL alias (size == 11 && R == 1)
STUMAXL <Xs>, [<Xn|SP>]
is equivalent to
LDUMAXL <Xs>, XZR, [<Xn|SP>]
STUMAX, STUMAXL Page 485

Assembler Symbols
Operation
The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.
STUMAX, STUMAXL Page 486

STUMAXB, STUMAXLB
Atomic unsigned maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares
it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers.
• STUMAXB does not have release semantics.
• STUMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB. This means:
• The encodings in this description are named to match the encodings of LDUMAXB, LDUMAXAB,
LDUMAXALB, LDUMAXLB.
• The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STUMAXB <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMAXLB <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for this
instruction.
STUMAXB, STUMAXLB Page 487

STUMAXH, STUMAXLH
Atomic unsigned maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
unsigned numbers.
• STUMAXH does not have release semantics.
• STUMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH. This means:
• The encodings in this description are named to match the encodings of LDUMAXH, LDUMAXAH,
LDUMAXALH, LDUMAXLH.
• The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
STUMAXH <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMAXLH <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for this
instruction.
STUMAXH, STUMAXLH Page 488

STUMIN, STUMINL
Atomic unsigned minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as unsigned numbers.
• STUMIN does not have release semantics.
• STUMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMIN, LDUMINA, LDUMINAL, LDUMINL. This means:
• The encodings in this description are named to match the encodings of LDUMIN, LDUMINA, LDUMINAL,
LDUMINL.
• The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this
instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
32-bit LDUMIN alias (size == 10 && R == 0)
STUMIN <Ws>, [<Xn|SP>]
is equivalent to
LDUMIN <Ws>, WZR, [<Xn|SP>]
32-bit LDUMINL alias (size == 10 && R == 1)
STUMINL <Ws>, [<Xn|SP>]
is equivalent to
LDUMINL <Ws>, WZR, [<Xn|SP>]
64-bit LDUMIN alias (size == 11 && R == 0)
STUMIN <Xs>, [<Xn|SP>]
is equivalent to
LDUMIN <Xs>, XZR, [<Xn|SP>]
64-bit LDUMINL alias (size == 11 && R == 1)
STUMINL <Xs>, [<Xn|SP>]
is equivalent to
LDUMINL <Xs>, XZR, [<Xn|SP>]
STUMIN, STUMINL Page 489

Assembler Symbols
Operation
The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this instruction.
STUMIN, STUMINL Page 490

STUMINB, STUMINLB
Atomic unsigned minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers.
• STUMINB does not have release semantics.
• STUMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB. This means:
• The encodings in this description are named to match the encodings of LDUMINB, LDUMINAB, LDUMINALB,
LDUMINLB.
• The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STUMINB <Ws>, [<Xn|SP>]
is equivalent to
LDUMINB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMINLB <Ws>, [<Xn|SP>]
is equivalent to
LDUMINLB <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for this
instruction.
STUMINB, STUMINLB Page 491

STUMINH, STUMINLH
Atomic unsigned minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
unsigned numbers.
• STUMINH does not have release semantics.
• STUMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
This is an alias of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH. This means:
• The encodings in this description are named to match the encodings of LDUMINH, LDUMINAH,
LDUMINALH, LDUMINLH.
• The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for
this instruction.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
STUMINH <Ws>, [<Xn|SP>]
is equivalent to
LDUMINH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMINLH <Ws>, [<Xn|SP>]
is equivalent to
LDUMINLH <Ws>, WZR, [<Xn|SP>]
Assembler Symbols
Operation
The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for this
instruction.
STUMINH, STUMINLH Page 492

STUR
Store Register (unscaled) calculates an address from a base register value and an immediate offset, and stores a 32-bit
word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses, see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
32-bit (size == 10)
STUR <Wt>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11)
STUR <Xt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STUR Page 493

STUR Page 494

STURB
Store Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and stores a
byte to the calculated address, from a 32-bit register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
STURB <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(8) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STURB Page 495

STURH
Store Register Halfword (unscaled) calculates an address from a base register value and an immediate offset, and
stores a halfword to the calculated address, from a 32-bit register. For information about memory accesses, see Load/
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
Unscaled offset
STURH <Wt>, [<Xn|SP>{, #<simm>}]
Assembler Symbols
the "imm9" field.
Shared Decode
Operation
bits(64) address;
bits(16) data;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
STURH Page 496

STXP
Store Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords from two registers to a memory
location if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. A 32-bit pair requires the address
to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be
quadword aligned and, if the Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory
location being updated. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 0 1 Rs 0 Rt2 Rn Rt
L o0
32-bit (sz == 0)
STXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{,#0}]
64-bit (sz == 1)
STXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{,#0}]
integer t2 = UInt(Rt2); // ignored by load/store single register

UNPREDICTABLE behaviors, and particularly STXP.
Assembler Symbols
0
1

STXP Page 497

Operation
bits(64) address;
if s == t || (s == t2) then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
else
bits(datasize DIV 2) el1 = X[t];
bits(datasize DIV 2) el2 = X[t2];
data = if BigEndian() then el1:el2 else el2:el1;
bit status = '1';
Mem[address, dbytes, AccType_ATOMIC] = data;
STXP Page 498

STXR
Store Exclusive Register stores a 32-bit word or a 64-bit doubleword from a register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. For information about memory accesses see Load/Store addressing
modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
32-bit (size == 10)
STXR <Ws>, <Wt>, [<Xn|SP>{,#0}]
64-bit (size == 11)
STXR <Ws>, <Xt>, [<Xn|SP>{,#0}]

UNPREDICTABLE behaviors, and particularly STXR.
Assembler Symbols
0
1

STXR Page 499

Operation
bits(64) address;
bits(elsize) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
data = bits(elsize) UNKNOWN;
else
data = X[t];
bit status = '1';

Mem[address, dbytes, AccType_ATOMIC] = data;
STXR Page 500

STXRB
Store Exclusive Register Byte stores a byte from a register to memory if the PE has exclusive access to the memory
address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
No offset
STXRB <Ws>, <Wt>, [<Xn|SP>{,#0}]
UNPREDICTABLE behaviors, and particularly STXRB.
Assembler Symbols
0
1
Aborts
STXRB Page 501

Operation
bits(64) address;
bits(8) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
else
data = X[t];
bit status = '1';

Mem[address, 1, AccType_ATOMIC] = data;
STXRB Page 502

STXRH
Store Exclusive Register Halfword stores a halfword from a register to memory if the PE has exclusive access to the
memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
No offset
STXRH <Ws>, <Wt>, [<Xn|SP>{,#0}]
Assembler Symbols
0
1

A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
STXRH Page 503

Operation
bits(64) address;
bits(16) data;
if s == t then
case c of
if s == n && n != 31 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
else
data = X[t];
bit status = '1';

Mem[address, 2, AccType_ATOMIC] = data;
STXRH Page 504

STZ2G
Store Allocation Tags, Zeroing stores an Allocation Tag to two Tag granules of memory, zeroing the associated data
locations. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the source register.
Post-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 0 1 Xn Xt
Post-index
STZ2G <Xt|SP>, [<Xn|SP>], #<simm>

Pre-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 1 1 Xn Xt
Pre-index
STZ2G <Xt|SP>, [<Xn|SP>, #<simm>]!

Signed offset
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 1 0 Xn Xt
Signed offset
STZ2G <Xt|SP>, [<Xn|SP>{, #<simm>}]

STZ2G Page 505

Assembler Symbols
Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);

Mem[address+TAG_GRANULE, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);

AArch64.MemTag[address+TAG_GRANULE, AccType_NORMAL] = tag;
if writeback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STZ2G Page 506

STZG
Store Allocation Tag, Zeroing stores an Allocation Tag to memory, zeroing the associated data location. The address
used for the store is calculated from the base register and an immediate signed offset scaled by the Tag granule. The
Allocation Tag is calculated from the Logical Address Tag in the source register.
Post-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 0 1 Xn Xt
Post-index
STZG <Xt|SP>, [<Xn|SP>], #<simm>

Pre-index
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 1 1 Xn Xt
Pre-index
STZG <Xt|SP>, [<Xn|SP>, #<simm>]!

Signed offset
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 1 0 Xn Xt
Signed offset
STZG <Xt|SP>, [<Xn|SP>{, #<simm>}]

STZG Page 507

Assembler Symbols
Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(TAG_GRANULE * 8);

if writeback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;
STZG Page 508

STZGM
Store Tag and Zero Multiple writes a naturally aligned block of N Allocation Tags and stores zero to the associated
data locations, where the size of N is identified in DCZID_EL0.BS, and the Allocation Tag written to address A is taken
from the source register bits<3:0>.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Integer
STZGM <Xt>, [<Xn|SP>]

Assembler Symbols
Operation

UNDEFINED;
bits(64) data = X[t];

bits(4) tag = data<3:0>;
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
integer size = 4 * (2 ^ (UInt(DCZID_EL0.BS)));

Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
STZGM Page 509

SUB (extended register)
Subtract (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift
amount, from a register value, and writes the result to the destination register. The argument that is extended from
the <Rm> register can be a byte, halfword, word, or doubleword.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
SUB <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
Assembler Symbols
field.
field.
field.
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
"Rm" field.
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.
SUB (extended register) Page 510

option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
Operation
if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
SUB (extended register) Page 511

SUB (immediate)
Subtract (immediate) subtracts an optionally-shifted immediate value from a register value, and writes the result to
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
SUB <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
SUB <Xd|SP>, <Xn|SP>, #<imm>{, <shift>}
bits(datasize) imm;
case sh of
Assembler Symbols
field.
field.
sh <shift>
0 LSL #0
1 LSL #12
Operation
if d == 31 then
SP[] = result;
else
X[d] = result;
If PSTATE.DIT is 1:
SUB (immediate) Page 512


SUB (shifted register)
Subtract (shifted register) subtracts an optionally-shifted register value from a register value, and writes the result to
This instruction is used by the alias NEG (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
SUB <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
SUB <Xd>, <Xn>, <Xm>{, <shift> #<amount>}


Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Alias Conditions

NEG (shifted register) Rn == '11111'
SUB (shifted register) Page 514

Operation
X[d] = result;
If PSTATE.DIT is 1:
SUB (shifted register) Page 515

SUBG
Subtract with Tag subtracts an immediate value scaled by the Tag granule from the address in the source register,
modifies the Logical Address Tag of the address using an immediate value, and writes the result to the destination
register. Tags specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical
Address Tag.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 0 0 1 1 0 uimm6 (0) (0) uimm4 Xn Xd
op3
Integer
SUBG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>

bits(64) offset = LSL(ZeroExtend(uimm6, 64), LOG2_TAG_GRANULE);
Assembler Symbols
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Xn" field.
<uimm6> Is an unsigned immediate, a multiple of 16 in the range 0 to 1008, encoded in the "uimm6" field.
<uimm4> Is an unsigned immediate, in the range 0 to 15, encoded in the "uimm4" field.
Operation

bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;
rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
rtag = '0000';
(result, -) = AddWithCarry(operand1, NOT(offset), '1');
result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);
if d == 31 then
SP[] = result;
else
X[d] = result;
SUBG Page 516

SUBP
Subtract Pointer subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, sign-extends the result to 64-bits, and writes the result to the destination register.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn Xd
Integer
SUBP <Xd>, <Xn|SP>, <Xm|SP>
Assembler Symbols
field.
field.
Operation

bits(64) operand2 = if m == 31 then SP[] else X[m];
operand1 = SignExtend(operand1<55:0>, 64);
bits(64) result;
X[d] = result;
SUBP Page 517

SUBPS
Subtract Pointer, setting Flags subtracts the 56-bit address held in the second source register from the 56-bit address
held in the first source register, sign-extends the result to 64-bits, and writes the result to the destination register. It
updates the condition flags based on the result of the subtraction.
This instruction is used by the alias CMPP.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn Xd
Integer
SUBPS <Xd>, <Xn|SP>, <Xm|SP>
Assembler Symbols
field.
field.
Alias Conditions

CMPP S == '1' && Xd == '11111'
Operation

bits(64) operand2 = if m == 31 then SP[] else X[m];
bits(64) result;
bits(4) nzcv;
X[d] = result;
SUBPS Page 518

SUBS (extended register)
Subtract (extended register), setting flags, subtracts a sign or zero-extended register value, followed by an optional
left shift amount, from a register value, and writes the result to the destination register. The argument that is
extended from the <Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based
on the result.
This instruction is used by the alias CMP (extended register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
SUBS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
SUBS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
Assembler Symbols
field.
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
"Rm" field.
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
SUBS (extended register) Page 519

option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
Alias Conditions

CMP (extended register) Rd == '11111'
Operation
bits(4) nzcv;
X[d] = result;
If PSTATE.DIT is 1:
SUBS (extended register) Page 520

SUBS (immediate)
Subtract (immediate), setting flags, subtracts an optionally-shifted immediate value from a register value, and writes
This instruction is used by the alias CMP (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
SUBS <Wd>, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
SUBS <Xd>, <Xn|SP>, #<imm>{, <shift>}
bits(datasize) imm;
case sh of
Assembler Symbols
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions

CMP (immediate) Rd == '11111'
Operation
bits(4) nzcv;
X[d] = result;
SUBS (immediate) Page 521

If PSTATE.DIT is 1:
SUBS (immediate) Page 522

SUBS (shifted register)
Subtract (shifted register), setting flags, subtracts an optionally-shifted register value from a register value, and writes
This instruction is used by the aliases CMP (shifted register), and NEGS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op S
32-bit (sf == 0)
SUBS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
SUBS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}


Assembler Symbols
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
"imm6" field.
"imm6" field.
Alias Conditions

CMP (shifted register) Rd == '11111'
NEGS Rn == '11111'
SUBS (shifted register) Page 523

Operation
bits(4) nzcv;
X[d] = result;
If PSTATE.DIT is 1:
SUBS (shifted register) Page 524

SVC
Supervisor Call causes an exception to be taken to EL1.

On executing an SVC instruction, the PE records the exception as a Supervisor Call exception in ESR_ELx, using the
EC value 0x15, and the value of the immediate argument.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 0 1
System
SVC #<imm>
// Empty.
Assembler Symbols
Operation
AArch64.CheckForSVCTrap(imm16);
AArch64.CallSupervisor(imm16);
SVC Page 525

SWP, SWPA, SWPAL, SWPL
Swap word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from a memory location,
and stores the value held in a register back to the same memory location. The value initially loaded from memory is
• If the destination register is not one of WZR or XZR, SWPA and SWPAL load from memory with acquire
semantics.
• SWPL and SWPAL store to memory with release semantics.
• SWP has neither acquire nor release semantics.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
SWP, SWPA, SWPAL, SWPL Page 526

32-bit SWP (size == 10 && A == 0 && R == 0)
SWP <Ws>, <Wt>, [<Xn|SP>]
32-bit SWPA (size == 10 && A == 1 && R == 0)
SWPA <Ws>, <Wt>, [<Xn|SP>]
32-bit SWPAL (size == 10 && A == 1 && R == 1)
SWPAL <Ws>, <Wt>, [<Xn|SP>]
32-bit SWPL (size == 10 && A == 0 && R == 1)
SWPL <Ws>, <Wt>, [<Xn|SP>]
64-bit SWP (size == 11 && A == 0 && R == 0)
SWP <Xs>, <Xt>, [<Xn|SP>]
64-bit SWPA (size == 11 && A == 1 && R == 0)
SWPA <Xs>, <Xt>, [<Xn|SP>]
64-bit SWPAL (size == 11 && A == 1 && R == 1)
SWPAL <Xs>, <Xt>, [<Xn|SP>]
64-bit SWPL (size == 11 && A == 0 && R == 1)
SWPL <Xs>, <Xt>, [<Xn|SP>]

Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register to be stored, encoded in the "Rs" field.

Operation
bits(64) address;
bits(datasize) store_value;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);

SWPB, SWPAB, SWPALB, SWPLB
Swap byte in memory atomically loads an 8-bit byte from a memory location, and stores the value held in a register
back to the same memory location. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, SWPAB and SWPALB load from memory with acquire semantics.
• SWPLB and SWPALB store to memory with release semantics.
• SWPB has neither acquire nor release semantics.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
SWPAB (A == 1 && R == 0)
SWPAB <Ws>, <Wt>, [<Xn|SP>]
SWPALB (A == 1 && R == 1)
SWPALB <Ws>, <Wt>, [<Xn|SP>]
SWPB (A == 0 && R == 0)
SWPB <Ws>, <Wt>, [<Xn|SP>]
SWPLB (A == 0 && R == 1)
SWPLB <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
SWPB, SWPAB, SWPALB,

Page 529
SWPLB
Operation
bits(64) address;
bits(8) data;
bits(8) store_value;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
SWPB, SWPAB, SWPALB,

Page 530
SWPLB
SWPH, SWPAH, SWPALH, SWPLH
Swap halfword in memory atomically loads a 16-bit halfword from a memory location, and stores the value held in a
register back to the same memory location. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, SWPAH and SWPALH load from memory with acquire semantics.
• SWPLH and SWPALH store to memory with release semantics.
• SWPH has neither acquire nor release semantics.
Integer
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
SWPAH (A == 1 && R == 0)
SWPAH <Ws>, <Wt>, [<Xn|SP>]
SWPALH (A == 1 && R == 1)
SWPALH <Ws>, <Wt>, [<Xn|SP>]
SWPH (A == 0 && R == 0)
SWPH <Ws>, <Wt>, [<Xn|SP>]
SWPLH (A == 0 && R == 1)
SWPLH <Ws>, <Wt>, [<Xn|SP>]

Assembler Symbols
SWPH, SWPAH, SWPALH,

Page 531
SWPLH
Operation
bits(64) address;
bits(16) data;
bits(16) store_value;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
SWPH, SWPAH, SWPALH,

Page 532
SWPLH
SXTB
Signed Extend Byte extracts an 8-bit value from a register, sign-extends it to the size of the register, and writes the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N 0 0 0 0 0 0 0 0 0 1 1 1 Rn Rd
opc immr imms
32-bit (sf == 0 && N == 0)
SXTB <Wd>, <Wn>
is equivalent to
SBFM <Wd>, <Wn>, #0, #7
64-bit (sf == 1 && N == 1)
SXTB <Xd>, <Wn>
is equivalent to
SBFM <Xd>, <Xn>, #0, #7
Assembler Symbols
Operation
If PSTATE.DIT is 1:
SXTB Page 533

SXTH
Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N 0 0 0 0 0 0 0 0 1 1 1 1 Rn Rd
opc immr imms
32-bit (sf == 0 && N == 0)
SXTH <Wd>, <Wn>
is equivalent to
SBFM <Wd>, <Wn>, #0, #15
64-bit (sf == 1 && N == 1)
SXTH <Xd>, <Wn>
is equivalent to
SBFM <Xd>, <Xn>, #0, #15
Assembler Symbols
Operation
If PSTATE.DIT is 1:
SXTH Page 534

SXTW
Sign Extend Word sign-extends a word to the size of the register, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 1 1 1 Rn Rd
sf opc N immr imms
64-bit
SXTW <Xd>, <Wn>
is equivalent to
SBFM <Xd>, <Xn>, #0, #31
Assembler Symbols
Operation
If PSTATE.DIT is 1:
SXTW Page 535

SYS
System instruction. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and address
translation instructions for the encodings of System instructions.
This instruction is used by the aliases AT, CFP, CPP, DC, DVP, IC, and TLBI.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 CRn CRm op2 Rt
L
System
SYS #<op1>, <Cn>, <Cm>, #<op2>{, <Xt>}
AArch64.CheckSystemAccess('01', op1, CRn, CRm, op2, Rt, L);

Assembler Symbols
"Rt" field.
Alias Conditions

AT CRn == '0111' && CRm == '100x' && SysOp(op1,'0111',CRm,op2) == Sys_AT
CFP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '100'
CPP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '111'
DC CRn == '0111' && SysOp(op1,'0111',CRm,op2) == Sys_DC
DVP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '101'
IC CRn == '0111' && SysOp(op1,'0111',CRm,op2) == Sys_IC
TLBI CRn == '1000' && SysOp(op1,'1000',CRm,op2) == Sys_TLBI
Operation
AArch64.SysInstr(1, sys_op1, sys_crn, sys_crm, sys_op2, X[t]);
SYS Page 536

SYSL
System instruction with result. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and
address translation instructions for the encodings of System instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 1 0 1 op1 CRn CRm op2 Rt
L
System
SYSL <Xt>, #<op1>, <Cn>, <Cm>, #<op2>
AArch64.CheckSystemAccess('01', op1, CRn, CRm, op2, Rt, L);

Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
Operation
// No architecturally defined instructions here.

X[t] = AArch64.SysInstrWithResult(1, sys_op1, sys_crn, sys_crm, sys_op2);
SYSL Page 537

TBNZ
Test bit and Branch if Nonzero compares the value of a bit in a general-purpose register with zero, and conditionally
branches to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine
call or return. This instruction does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 1 b40 imm14 Rt
op
TBNZ <R><t>, #<imm>, <label>
integer datasize = if b5 == '1' then 64 else 32;

integer bit_pos = UInt(b5:b40);
Assembler Symbols
<R> Is a width specifier, encoded in “b5”:

b5 <R>
0 W
1 X
In assembler source code an 'X' specifier is always permitted, but a 'W' specifier is only permitted when
the bit number is less than 32.
<t> Is the number [0-30] of the general-purpose register to be tested or the name ZR (31), encoded in the
"Rt" field.
<imm> Is the bit number to be tested, in the range 0 to 63, encoded in "b5:b40".
the range +/-32KB, is encoded as "imm14" times 4.
Operation
bits(datasize) operand = X[t];
if operand<bit_pos> == op then
TBNZ Page 538

TBZ
Test bit and Branch if Zero compares the value of a test bit with zero, and conditionally branches to a label at a PC-
relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction
does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 0 b40 imm14 Rt
op
TBZ <R><t>, #<imm>, <label>
integer datasize = if b5 == '1' then 64 else 32;

integer bit_pos = UInt(b5:b40);
Assembler Symbols
<R> Is a width specifier, encoded in “b5”:

b5 <R>
0 W
1 X
In assembler source code an 'X' specifier is always permitted, but a 'W' specifier is only permitted when
the bit number is less than 32.
<t> Is the number [0-30] of the general-purpose register to be tested or the name ZR (31), encoded in the
"Rt" field.
<imm> Is the bit number to be tested, in the range 0 to 63, encoded in "b5:b40".
the range +/-32KB, is encoded as "imm14" times 4.
Operation
bits(datasize) operand = X[t];
if operand<bit_pos> == op then
TBZ Page 539

TLBI
TLB Invalidate operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 1 0 0 0 CRm op2 Rt
L CRn
System
TLBI <tlbi_op>{, <Xt>}
is equivalent to
SYS #<op1>, C8, <Cm>, #<op2>{, <Xt>}
and is the preferred disassembly when SysOp(op1,'1000',CRm,op2) == Sys_TLBI.
Assembler Symbols
<tlbi_op> Is a TLBI instruction name, as listed for the TLBI system instruction group, encoded in “op1:CRm:op2”:
TLBI Page 540

op1 CRm op2 <tlbi_op> Architectural Feature
000 0001 000 VMALLE1OS ARMv8.4-TLBI
000 0001 001 VAE1OS ARMv8.4-TLBI
000 0001 010 ASIDE1OS ARMv8.4-TLBI
000 0001 011 VAAE1OS ARMv8.4-TLBI
000 0001 101 VALE1OS ARMv8.4-TLBI
000 0001 111 VAALE1OS ARMv8.4-TLBI
000 0010 001 RVAE1IS ARMv8.4-TLBI
000 0010 011 RVAAE1IS ARMv8.4-TLBI
000 0010 101 RVALE1IS ARMv8.4-TLBI
000 0010 111 RVAALE1IS ARMv8.4-TLBI
000 0011 000 VMALLE1IS -
000 0011 001 VAE1IS -
000 0011 010 ASIDE1IS -
000 0011 011 VAAE1IS -
000 0011 101 VALE1IS -
000 0011 111 VAALE1IS -
000 0101 001 RVAE1OS ARMv8.4-TLBI
000 0101 011 RVAAE1OS ARMv8.4-TLBI
000 0101 101 RVALE1OS ARMv8.4-TLBI
000 0101 111 RVAALE1OS ARMv8.4-TLBI
000 0110 001 RVAE1 ARMv8.4-TLBI
000 0110 011 RVAAE1 ARMv8.4-TLBI
000 0110 101 RVALE1 ARMv8.4-TLBI
000 0110 111 RVAALE1 ARMv8.4-TLBI
000 0111 000 VMALLE1 -
000 0111 001 VAE1 -
000 0111 010 ASIDE1 -
000 0111 011 VAAE1 -
000 0111 101 VALE1 -
000 0111 111 VAALE1 -
100 0000 001 IPAS2E1IS -
100 0000 010 RIPAS2E1IS ARMv8.4-TLBI
100 0000 101 IPAS2LE1IS -
100 0000 110 RIPAS2LE1IS ARMv8.4-TLBI
100 0001 000 ALLE2OS ARMv8.4-TLBI
100 0001 110 VMALLS12E1OS ARMv8.4-TLBI
100 0011 000 ALLE2IS -
100 0011 001 VAE2IS -
100 0011 100 ALLE1IS -
100 0011 101 VALE2IS -
100 0011 110 VMALLS12E1IS -
100 0100 000 IPAS2E1OS ARMv8.4-TLBI
100 0100 001 IPAS2E1 -
100 0100 010 RIPAS2E1 ARMv8.4-TLBI
100 0100 011 RIPAS2E1OS ARMv8.4-TLBI
100 0100 100 IPAS2LE1OS ARMv8.4-TLBI
100 0100 101 IPAS2LE1 -
100 0100 110 RIPAS2LE1 ARMv8.4-TLBI
100 0100 111 RIPAS2LE1OS ARMv8.4-TLBI
100 0110 001 RVAE2 ARMv8.4-TLBI
100 0111 000 ALLE2 -
100 0111 001 VAE2 -
100 0111 100 ALLE1 -
100 0111 101 VALE2 -
100 0111 110 VMALLS12E1 -
TLBI Page 541

op1 CRm op2 <tlbi_op> Architectural Feature
110 0011 000 ALLE3IS -
110 0011 001 VAE3IS -
110 0011 101 VALE3IS -
110 0110 001 RVAE3 ARMv8.4-TLBI
110 0111 000 ALLE3 -
110 0111 001 VAE3 -
110 0111 101 VALE3 -
"Rt" field.
Operation
TLBI Page 542

TSB CSYNC
Trace Synchronization Barrier. This instruction is a barrier that synchronizes the trace operations of instructions.
If ARMv8.4-Trace is not implemented, this instruction executes as a NOP.
System
(Armv8.4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 1
CRm op2
System
TSB CSYNC
if !HaveSelfHostedTrace() then EndOfInstruction();
Operation
TSB CSYNC Page 543

TST (immediate)
Test bits (immediate), setting the condition flags and discarding the result: Rn AND imm.
This is an alias of ANDS (immediate). This means:
• The encodings in this description are named to match the encodings of ANDS (immediate).
• The description of ANDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 0 N immr imms Rn 1 1 1 1 1
opc Rd
32-bit (sf == 0 && N == 0)
TST <Wn>, #<imm>
is equivalent to
ANDS WZR, <Wn>, #<imm>
64-bit (sf == 1)
TST <Xn>, #<imm>
is equivalent to
ANDS XZR, <Xn>, #<imm>
Assembler Symbols
Operation
The description of ANDS (immediate) gives the operational pseudocode for this instruction.
TST (immediate) Page 544

TST (shifted register)
Test (shifted register) performs a bitwise AND operation on a register value and an optionally-shifted register value. It
updates the condition flags based on the result, and discards the result.
This is an alias of ANDS (shifted register). This means:
• The encodings in this description are named to match the encodings of ANDS (shifted register).
• The description of ANDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 0 Rm imm6 Rn 1 1 1 1 1
opc N Rd
32-bit (sf == 0)
TST <Wn>, <Wm>{, <shift> #<amount>}
is equivalent to
ANDS WZR, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
TST <Xn>, <Xm>{, <shift> #<amount>}
is equivalent to
ANDS XZR, <Xn>, <Xm>{, <shift> #<amount>}
Assembler Symbols
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
"imm6" field.
"imm6" field,
Operation
The description of ANDS (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
TST (shifted register) Page 545

TST (shifted register) Page 546

UBFIZ
Unsigned Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register
to bit position <lsb> of the destination register, setting the destination bits above and below the bitfield to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
UBFIZ <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
UBFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)
64-bit (sf == 1 && N == 1)
UBFIZ <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
UBFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)
Assembler Symbols
Operation
If PSTATE.DIT is 1:
UBFIZ Page 547

UBFIZ Page 548

UBFM
Unigned Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
or 64 bits.
In both cases the destination bits below and above the bitfield are set to zero.
This instruction is used by the aliases LSL (immediate), LSR (immediate), UBFIZ, UBFX, UXTB, and UXTH.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
UBFM <Wd>, <Wn>, #<immr>, #<imms>
64-bit (sf == 1 && N == 1)
UBFM <Xd>, <Xn>, #<immr>, #<imms>
integer R;

R = UInt(immr);
Assembler Symbols
Alias Conditions

LSL (immediate) 32-bit imms != '011111' && imms + 1 == immr
LSL (immediate) 64-bit imms != '111111' && imms + 1 == immr
LSR (immediate) 32-bit imms == '011111'
UBFM Page 549

LSR (immediate) 64-bit imms == '111111'
UBFIZ UInt(imms) < UInt(immr)
UBFX BFXPreferred(sf, opc<1>, imms, immr)
UXTB immr == '000000' && imms == '000111'
UXTH immr == '000000' && imms == '001111'
Operation

bits(datasize) bot = ROR(src, R) AND wmask;

X[d] = bot AND tmask;
If PSTATE.DIT is 1:
UBFM Page 550

UBFX
Unsigned Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc
32-bit (sf == 0 && N == 0)
UBFX <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
UBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
64-bit (sf == 1 && N == 1)
UBFX <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
UBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)
Assembler Symbols
Operation
If PSTATE.DIT is 1:
UBFX Page 551

UBFX Page 552

UDF
Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). The encodings for
UDF used in this section are defined as permanently UNDEFINED in the Armv8-A architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 imm16
Integer
UDF #<imm>
// The imm16 field is ignored by hardware.

UNDEFINED;
Assembler Symbols
<imm> is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field. The PE ignores
the value of this constant.
Operation
// No operation.
UDF Page 553

UDIV
Unsigned Divide divides an unsigned integer register value by another unsigned integer register value, and writes the
result to the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 0 Rn Rd
o1
32-bit (sf == 0)
UDIV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
UDIV <Xd>, <Xn>, <Xm>
Assembler Symbols
Operation

integer result;
if IsZero(operand2) then
result = 0;
else
result = RoundTowardsZero(Real(Int(operand1, TRUE)) / Real(Int(operand2, TRUE)));
UDIV Page 554

UMADDL
Unsigned Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias UMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 Ra Rn Rd
U o0
64-bit
UMADDL <Xd>, <Wn>, <Wm>, <Xa>
Assembler Symbols
"Rn" field.
"Rm" field.
field.
Alias Conditions

UMULL Ra == '11111'
Operation

integer result;
result = Int(operand3, TRUE) + (Int(operand1, TRUE) * Int(operand2, TRUE));
If PSTATE.DIT is 1:
UMADDL Page 555

UMNEGL
Unsigned Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.
This is an alias of UMSUBL. This means:
• The encodings in this description are named to match the encodings of UMSUBL.
• The description of UMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra
64-bit
UMNEGL <Xd>, <Wn>, <Wm>
is equivalent to
UMSUBL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
Operation
The description of UMSUBL gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
UMNEGL Page 556

UMSUBL
Unsigned Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register
value, and writes the result to the 64-bit destination register.
This instruction is used by the alias UMNEGL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 Ra Rn Rd
U o0
64-bit
UMSUBL <Xd>, <Wn>, <Wm>, <Xa>
Assembler Symbols
"Rn" field.
"Rm" field.
"Ra" field.
Alias Conditions

UMNEGL Ra == '11111'
Operation

integer result;
result = Int(operand3, TRUE) - (Int(operand1, TRUE) * Int(operand2, TRUE));

If PSTATE.DIT is 1:
UMSUBL Page 557

UMULH
Unsigned Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra
64-bit
UMULH <Xd>, <Xn>, <Xm>
Assembler Symbols
"Rn" field.
"Rm" field.
Operation

integer result;
result = Int(operand1, TRUE) * Int(operand2, TRUE);
X[d] = result<127:64>;
If PSTATE.DIT is 1:
UMULH Page 558

UMULL
Unsigned Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.
This is an alias of UMADDL. This means:
• The encodings in this description are named to match the encodings of UMADDL.
• The description of UMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra
64-bit
UMULL <Xd>, <Wn>, <Wm>
is equivalent to
UMADDL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
"Rn" field.
"Rm" field.
Operation
The description of UMADDL gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
UMULL Page 559

UXTB
Unsigned Extend Byte extracts an 8-bit value from a register, zero-extends it to the size of the register, and writes the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 Rn Rd
sf opc N immr imms
32-bit
UXTB <Wd>, <Wn>
is equivalent to
UBFM <Wd>, <Wn>, #0, #7
Assembler Symbols
Operation
If PSTATE.DIT is 1:
UXTB Page 560

UXTH
Unsigned Extend Halfword extracts a 16-bit value from a register, zero-extends it to the size of the register, and writes
the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 Rn Rd
sf opc N immr imms
32-bit
UXTH <Wd>, <Wn>
is equivalent to
UBFM <Wd>, <Wn>, #0, #15
Assembler Symbols
Operation
If PSTATE.DIT is 1:
UXTH Page 561

WFE
Wait For Event is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. Wakeup events include the event signaled as a result of executing the SEV instruction on any PE
in the multiprocessor system. For more information, see Wait For Event mechanism and Send event.
As described in Wait For Event mechanism and Send event, the execution of a WFE instruction that would otherwise
cause entry to a low-power state can be trapped to a higher Exception level. See:
• Traps to EL1 of EL0 execution of WFE and WFI instructions.
• Traps to EL2 of Non-secure EL0 and EL1 execution of WFE and WFI instructions.
• Traps to EL3 of EL2, EL1, and EL0 execution of WFE and WFI instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1
CRm op2
System
WFE
// Empty.
Operation
else
trap = FALSE;
- = SCTLR[];
target_el = EL1;
else
target_el = EL2;
else

target_el = EL3;
else

(delay_enabled, delay) = WFETrapDelay(target_el); // (If trap delay is enabled, Delay amount)
else
WaitForEvent();
WFE Page 562

WFI
Wait For Interrupt is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. For more information, see Wait For Interrupt.
As described in Wait For Interrupt, the execution of a WFI instruction that would otherwise cause entry to a low-power
state can be trapped to a higher Exception level. See:
• Traps to EL1 of EL0 execution of WFE and WFI instructions.
• Traps to EL2 of Non-secure EL0 and EL1 execution of WFE and WFI instructions.
• Traps to EL3 of EL2, EL1, and EL0 execution of WFE and WFI instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1
CRm op2
System
WFI
// Empty.
Operation
WaitForInterrupt();
WFI Page 563

XAFLAG
Convert floating-point condition flags from external format to Arm format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from an alternative representation required by some software to a form representing the
result of an Arm floating-point scalar compare instruction.
System
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 0 1 1 1 1 1 1
CRm
System
XAFLAG
if !HaveFlagFormatExt() then UNDEFINED;
Operation
bit N = NOT(PSTATE.C) AND NOT(PSTATE.Z);

bit Z = PSTATE.Z AND PSTATE.C;
bit C = PSTATE.C OR PSTATE.Z;
bit V = NOT(PSTATE.C) AND PSTATE.Z;
PSTATE.N = N;
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = V;
XAFLAG Page 564

XPACD, XPACI, XPACLRI
Strip Pointer Authentication Code. This instruction removes the pointer authentication code from an address. The
address is in the specified general-purpose register for XPACI and XPACD, and is in LR for XPACLRI.
The XPACD instruction is used for data addresses, and XPACI and XPACLRI are used for instruction addresses.
Integer
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 1 0 0 0 D 1 1 1 1 1 Rd
Rn
XPACD (D == 1)
XPACD <Xd>
XPACI (D == 0)
XPACI <Xd>
boolean data = (D == '1');

UNDEFINED;
System
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1
System
XPACLRI
integer d = 30;
boolean data = FALSE;
Assembler Symbols
Operation
X[d] = Strip(X[d], data);
XPACD, XPACI, XPACLRI Page 565

YIELD
YIELD is a hint instruction. Software with a multithreading capability can use a YIELD instruction to indicate to the PE
that it is performing a task, for example a spin-lock, that could be swapped out to improve overall system performance.
The PE can use this hint to suspend and resume multiple software threads if it supports the capability.
For more information about the recommended use of this instruction, see The YIELD instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1
CRm op2
System
YIELD
// Empty.
Operation
Hint_Yield();
YIELD Page 566

A64 -- SIMD and Floating-point Instructions (alphabetic order)
ABS: Absolute value (vector).
ADD (vector): Add (vector).
ADDHN, ADDHN2: Add returning High Narrow.
ADDP (scalar): Add Pair of elements (scalar).
ADDP (vector): Add Pairwise (vector).
ADDV: Add across Vector.
AESD: AES single round decryption.
AESE: AES single round encryption.
AESIMC: AES inverse mix columns.
AESMC: AES mix columns.
AND (vector): Bitwise AND (vector).
BCAX: Bit Clear and XOR.
BFCVT: Floating-point convert from single-precision to BFloat16 format (scalar).
BFCVTN, BFCVTN2: Floating-point convert from single-precision to BFloat16 format (vector).
BFDOT (by element): BFloat16 floating-point dot product (vector, by element).
BFDOT (vector): BFloat16 floating-point dot product (vector).
BFMLALB, BFMLALT (by element): BFloat16 floating-point widening multiply-add long (by element).
BFMLALB, BFMLALT (vector): BFloat16 floating-point widening multiply-add long (vector).
BFMMLA: BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix.
BIC (vector, immediate): Bitwise bit Clear (vector, immediate).
BIC (vector, register): Bitwise bit Clear (vector, register).
BIF: Bitwise Insert if False.
BIT: Bitwise Insert if True.
BSL: Bitwise Select.
CLS (vector): Count Leading Sign bits (vector).
CLZ (vector): Count Leading Zero bits (vector).
CMEQ (register): Compare bitwise Equal (vector).
CMEQ (zero): Compare bitwise Equal to zero (vector).
CMGE (register): Compare signed Greater than or Equal (vector).
CMGE (zero): Compare signed Greater than or Equal to zero (vector).
CMGT (register): Compare signed Greater than (vector).
CMGT (zero): Compare signed Greater than zero (vector).
CMHI (register): Compare unsigned Higher (vector).
CMHS (register): Compare unsigned Higher or Same (vector).
Page 567
CMLE (zero): Compare signed Less than or Equal to zero (vector).
CMLT (zero): Compare signed Less than zero (vector).
CMTST: Compare bitwise Test bits nonzero (vector).
CNT: Population Count per byte.
DUP (element): Duplicate vector element to vector or scalar.
DUP (general): Duplicate general-purpose register to vector.
EOR (vector): Bitwise Exclusive OR (vector).
EOR3: Three-way Exclusive OR.
EXT: Extract vector from pair of vectors.
FABD: Floating-point Absolute Difference (vector).
FABS (scalar): Floating-point Absolute value (scalar).
FABS (vector): Floating-point Absolute value (vector).
FACGE: Floating-point Absolute Compare Greater than or Equal (vector).
FACGT: Floating-point Absolute Compare Greater than (vector).
FADD (scalar): Floating-point Add (scalar).
FADD (vector): Floating-point Add (vector).
FADDP (scalar): Floating-point Add Pair of elements (scalar).
FADDP (vector): Floating-point Add Pairwise (vector).
FCADD: Floating-point Complex Add.
FCCMP: Floating-point Conditional quiet Compare (scalar).
FCCMPE: Floating-point Conditional signaling Compare (scalar).
FCMEQ (register): Floating-point Compare Equal (vector).
FCMEQ (zero): Floating-point Compare Equal to zero (vector).
FCMGE (register): Floating-point Compare Greater than or Equal (vector).
FCMGE (zero): Floating-point Compare Greater than or Equal to zero (vector).
FCMGT (register): Floating-point Compare Greater than (vector).
FCMGT (zero): Floating-point Compare Greater than zero (vector).
FCMLA: Floating-point Complex Multiply Accumulate.
FCMLA (by element): Floating-point Complex Multiply Accumulate (by element).
FCMLE (zero): Floating-point Compare Less than or Equal to zero (vector).
FCMLT (zero): Floating-point Compare Less than zero (vector).
FCMP: Floating-point quiet Compare (scalar).
FCMPE: Floating-point signaling Compare (scalar).
FCSEL: Floating-point Conditional Select (scalar).
FCVT: Floating-point Convert precision (scalar).
FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).
Page 568
FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).
FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).
FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).
FCVTL, FCVTL2: Floating-point Convert to higher precision Long (vector).
FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).
FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).
FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).
FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).
FCVTN, FCVTN2: Floating-point Convert to lower precision Narrow (vector).
FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).
FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).
FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).
FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).
FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).
FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).
FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).
FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).
FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).
FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).
FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).
FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).
FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).
FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).
FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).
FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).
FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).
FDIV (scalar): Floating-point Divide (scalar).
FDIV (vector): Floating-point Divide (vector).
FJCVTZS: Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.
FMADD: Floating-point fused Multiply-Add (scalar).
FMAX (scalar): Floating-point Maximum (scalar).
FMAX (vector): Floating-point Maximum (vector).
FMAXNM (scalar): Floating-point Maximum Number (scalar).
FMAXNM (vector): Floating-point Maximum Number (vector).
FMAXNMP (scalar): Floating-point Maximum Number of Pair of elements (scalar).
FMAXNMP (vector): Floating-point Maximum Number Pairwise (vector).
Page 569
FMAXNMV: Floating-point Maximum Number across Vector.
FMAXP (scalar): Floating-point Maximum of Pair of elements (scalar).
FMAXP (vector): Floating-point Maximum Pairwise (vector).
FMAXV: Floating-point Maximum across Vector.
FMIN (scalar): Floating-point Minimum (scalar).
FMIN (vector): Floating-point minimum (vector).
FMINNM (scalar): Floating-point Minimum Number (scalar).
FMINNM (vector): Floating-point Minimum Number (vector).
FMINNMP (scalar): Floating-point Minimum Number of Pair of elements (scalar).
FMINNMP (vector): Floating-point Minimum Number Pairwise (vector).
FMINNMV: Floating-point Minimum Number across Vector.
FMINP (scalar): Floating-point Minimum of Pair of elements (scalar).
FMINP (vector): Floating-point Minimum Pairwise (vector).
FMINV: Floating-point Minimum across Vector.
FMLA (by element): Floating-point fused Multiply-Add to accumulator (by element).
FMLA (vector): Floating-point fused Multiply-Add to accumulator (vector).
FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).
FMLAL, FMLAL2 (vector): Floating-point fused Multiply-Add Long to accumulator (vector).
FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).
FMLS (vector): Floating-point fused Multiply-Subtract from accumulator (vector).
FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).
FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).
FMOV (general): Floating-point Move to or from general-purpose register without conversion.
FMOV (register): Floating-point Move register without conversion.
FMOV (scalar, immediate): Floating-point move immediate (scalar).
FMOV (vector, immediate): Floating-point move immediate (vector).
FMSUB: Floating-point Fused Multiply-Subtract (scalar).
FMUL (by element): Floating-point Multiply (by element).
FMUL (scalar): Floating-point Multiply (scalar).
FMUL (vector): Floating-point Multiply (vector).
FMULX: Floating-point Multiply extended.
FMULX (by element): Floating-point Multiply extended (by element).
FNEG (scalar): Floating-point Negate (scalar).
FNEG (vector): Floating-point Negate (vector).
FNMADD: Floating-point Negated fused Multiply-Add (scalar).
FNMSUB: Floating-point Negated fused Multiply-Subtract (scalar).
Page 570
FNMUL (scalar): Floating-point Multiply-Negate (scalar).
FRECPE: Floating-point Reciprocal Estimate.
FRECPS: Floating-point Reciprocal Step.
FRECPX: Floating-point Reciprocal exponent (scalar).
FRINT32X (scalar): Floating-point Round to 32-bit Integer, using current rounding mode (scalar).
FRINT32X (vector): Floating-point Round to 32-bit Integer, using current rounding mode (vector).
FRINT32Z (scalar): Floating-point Round to 32-bit Integer toward Zero (scalar).
FRINT32Z (vector): Floating-point Round to 32-bit Integer toward Zero (vector).
FRINT64X (scalar): Floating-point Round to 64-bit Integer, using current rounding mode (scalar).
FRINT64X (vector): Floating-point Round to 64-bit Integer, using current rounding mode (vector).
FRINT64Z (scalar): Floating-point Round to 64-bit Integer toward Zero (scalar).
FRINT64Z (vector): Floating-point Round to 64-bit Integer toward Zero (vector).
FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).
FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).
FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).
FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).
FRINTM (scalar): Floating-point Round to Integral, toward Minus infinity (scalar).
FRINTM (vector): Floating-point Round to Integral, toward Minus infinity (vector).
FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).
FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).
FRINTP (scalar): Floating-point Round to Integral, toward Plus infinity (scalar).
FRINTP (vector): Floating-point Round to Integral, toward Plus infinity (vector).
FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).
FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).
FRINTZ (scalar): Floating-point Round to Integral, toward Zero (scalar).
FRINTZ (vector): Floating-point Round to Integral, toward Zero (vector).
FRSQRTE: Floating-point Reciprocal Square Root Estimate.
FRSQRTS: Floating-point Reciprocal Square Root Step.
FSQRT (scalar): Floating-point Square Root (scalar).
FSQRT (vector): Floating-point Square Root (vector).
FSUB (scalar): Floating-point Subtract (scalar).
FSUB (vector): Floating-point Subtract (vector).
INS (element): Insert vector element from another vector element.
INS (general): Insert vector element from general-purpose register.
LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.
LD1 (single structure): Load one single-element structure to one lane of one register.
Page 571
LD1R: Load one single-element structure and Replicate to all lanes (of one register).
LD2 (multiple structures): Load multiple 2-element structures to two registers.
LD2 (single structure): Load single 2-element structure to one lane of two registers.
LD2R: Load single 2-element structure and Replicate to all lanes of two registers.
LD3 (multiple structures): Load multiple 3-element structures to three registers.
LD3 (single structure): Load single 3-element structure to one lane of three registers).
LD3R: Load single 3-element structure and Replicate to all lanes of three registers.
LD4 (multiple structures): Load multiple 4-element structures to four registers.
LD4 (single structure): Load single 4-element structure to one lane of four registers.
LD4R: Load single 4-element structure and Replicate to all lanes of four registers.
LDNP (SIMD&FP): Load Pair of SIMD&FP registers, with Non-temporal hint.
LDP (SIMD&FP): Load Pair of SIMD&FP registers.
LDR (immediate, SIMD&FP): Load SIMD&FP Register (immediate offset).
LDR (literal, SIMD&FP): Load SIMD&FP Register (PC-relative literal).
LDR (register, SIMD&FP): Load SIMD&FP Register (register offset).
LDUR (SIMD&FP): Load SIMD&FP Register (unscaled offset).
MLA (by element): Multiply-Add to accumulator (vector, by element).
MLA (vector): Multiply-Add to accumulator (vector).
MLS (by element): Multiply-Subtract from accumulator (vector, by element).
MLS (vector): Multiply-Subtract from accumulator (vector).
MOV (element): Move vector element to another vector element: an alias of INS (element).
MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).
MOV (scalar): Move vector element to scalar: an alias of DUP (element).
MOV (to general): Move vector element to general-purpose register: an alias of UMOV.
MOV (vector): Move vector: an alias of ORR (vector, register).
MOVI: Move Immediate (vector).
MUL (by element): Multiply (vector, by element).
MUL (vector): Multiply (vector).
MVN: Bitwise NOT (vector): an alias of NOT.
MVNI: Move inverted Immediate (vector).
NEG (vector): Negate (vector).
NOT: Bitwise NOT (vector).
ORN (vector): Bitwise inclusive OR NOT (vector).
ORR (vector, immediate): Bitwise inclusive OR (vector, immediate).
ORR (vector, register): Bitwise inclusive OR (vector, register).
PMUL: Polynomial Multiply.
Page 572
PMULL, PMULL2: Polynomial Multiply Long.
RADDHN, RADDHN2: Rounding Add returning High Narrow.
RAX1: Rotate and Exclusive OR.
RBIT (vector): Reverse Bit order (vector).
REV16 (vector): Reverse elements in 16-bit halfwords (vector).
REV32 (vector): Reverse elements in 32-bit words (vector).
REV64: Reverse elements in 64-bit doublewords (vector).
RSHRN, RSHRN2: Rounding Shift Right Narrow (immediate).
RSUBHN, RSUBHN2: Rounding Subtract returning High Narrow.
SABA: Signed Absolute difference and Accumulate.
SABAL, SABAL2: Signed Absolute difference and Accumulate Long.
SABD: Signed Absolute Difference.
SABDL, SABDL2: Signed Absolute Difference Long.
SADALP: Signed Add and Accumulate Long Pairwise.
SADDL, SADDL2: Signed Add Long (vector).
SADDLP: Signed Add Long Pairwise.
SADDLV: Signed Add Long across Vector.
SADDW, SADDW2: Signed Add Wide.
SCVTF (scalar, fixed-point): Signed fixed-point Convert to Floating-point (scalar).
SCVTF (scalar, integer): Signed integer Convert to Floating-point (scalar).
SCVTF (vector, fixed-point): Signed fixed-point Convert to Floating-point (vector).
SCVTF (vector, integer): Signed integer Convert to Floating-point (vector).
SDOT (by element): Dot Product signed arithmetic (vector, by element).
SDOT (vector): Dot Product signed arithmetic (vector).
SHA1C: SHA1 hash update (choose).
SHA1H: SHA1 fixed rotate.
SHA1M: SHA1 hash update (majority).
SHA1P: SHA1 hash update (parity).
SHA1SU0: SHA1 schedule update 0.
SHA256H: SHA256 hash update (part 1).
SHA256H2: SHA256 hash update (part 2).
SHA512H: SHA512 Hash update part 1.
SHA512H2: SHA512 Hash update part 2.
Page 573
SHA512SU0: SHA512 Schedule Update 0.
SHA512SU1: SHA512 Schedule Update 1.
SHADD: Signed Halving Add.
SHL: Shift Left (immediate).
SHLL, SHLL2: Shift Left Long (by element size).
SHRN, SHRN2: Shift Right Narrow (immediate).
SHSUB: Signed Halving Subtract.
SLI: Shift Left and Insert (immediate).
SM3PARTW1: SM3PARTW1.
SM3PARTW2: SM3PARTW2.
SM3SS1: SM3SS1.
SM3TT1A: SM3TT1A.
SM3TT1B: SM3TT1B.
SM3TT2A: SM3TT2A.
SM3TT2B: SM3TT2B.
SM4E: SM4 Encode.
SM4EKEY: SM4 Key.
SMAX: Signed Maximum (vector).
SMAXP: Signed Maximum Pairwise.
SMAXV: Signed Maximum across Vector.
SMIN: Signed Minimum (vector).
SMINP: Signed Minimum Pairwise.
SMINV: Signed Minimum across Vector.
SMLAL, SMLAL2 (by element): Signed Multiply-Add Long (vector, by element).
SMLAL, SMLAL2 (vector): Signed Multiply-Add Long (vector).
SMLSL, SMLSL2 (by element): Signed Multiply-Subtract Long (vector, by element).
SMLSL, SMLSL2 (vector): Signed Multiply-Subtract Long (vector).
SMMLA (vector): Signed 8-bit integer matrix multiply-accumulate (vector).
SMOV: Signed Move vector element to general-purpose register.
SMULL, SMULL2 (by element): Signed Multiply Long (vector, by element).
SMULL, SMULL2 (vector): Signed Multiply Long (vector).
SQABS: Signed saturating Absolute value.
SQADD: Signed saturating Add.
SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).
SQDMLAL, SQDMLAL2 (vector): Signed saturating Doubling Multiply-Add Long.
SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).
Page 574
SQDMLSL, SQDMLSL2 (vector): Signed saturating Doubling Multiply-Subtract Long.
SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).
SQDMULH (vector): Signed saturating Doubling Multiply returning High half.
SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).
SQDMULL, SQDMULL2 (vector): Signed saturating Doubling Multiply Long.
SQNEG: Signed saturating Negate.
SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by
element).
SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).
SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).
SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).
SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).
SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.
SQRSHL: Signed saturating Rounding Shift Left (register).
SQRSHRN, SQRSHRN2: Signed saturating Rounded Shift Right Narrow (immediate).
SQRSHRUN, SQRSHRUN2: Signed saturating Rounded Shift Right Unsigned Narrow (immediate).
SQSHL (immediate): Signed saturating Shift Left (immediate).
SQSHL (register): Signed saturating Shift Left (register).
SQSHLU: Signed saturating Shift Left Unsigned (immediate).
SQSHRN, SQSHRN2: Signed saturating Shift Right Narrow (immediate).
SQSHRUN, SQSHRUN2: Signed saturating Shift Right Unsigned Narrow (immediate).
SQSUB: Signed saturating Subtract.
SQXTN, SQXTN2: Signed saturating extract Narrow.
SQXTUN, SQXTUN2: Signed saturating extract Unsigned Narrow.
SRHADD: Signed Rounding Halving Add.
SRI: Shift Right and Insert (immediate).
SRSHL: Signed Rounding Shift Left (register).
SRSHR: Signed Rounding Shift Right (immediate).
SRSRA: Signed Rounding Shift Right and Accumulate (immediate).
SSHL: Signed Shift Left (register).
SSHLL, SSHLL2: Signed Shift Left Long (immediate).
SSHR: Signed Shift Right (immediate).
SSRA: Signed Shift Right and Accumulate (immediate).
SSUBL, SSUBL2: Signed Subtract Long.
SSUBW, SSUBW2: Signed Subtract Wide.
ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.
ST1 (single structure): Store a single-element structure from one lane of one register.
Page 575
ST2 (multiple structures): Store multiple 2-element structures from two registers.
ST2 (single structure): Store single 2-element structure from one lane of two registers.
ST3 (multiple structures): Store multiple 3-element structures from three registers.
ST3 (single structure): Store single 3-element structure from one lane of three registers.
ST4 (multiple structures): Store multiple 4-element structures from four registers.
ST4 (single structure): Store single 4-element structure from one lane of four registers.
STNP (SIMD&FP): Store Pair of SIMD&FP registers, with Non-temporal hint.
STP (SIMD&FP): Store Pair of SIMD&FP registers.
STR (immediate, SIMD&FP): Store SIMD&FP register (immediate offset).
STR (register, SIMD&FP): Store SIMD&FP register (register offset).
STUR (SIMD&FP): Store SIMD&FP register (unscaled offset).
SUB (vector): Subtract (vector).
SUBHN, SUBHN2: Subtract returning High Narrow.
SUDOT (by element): Dot product with signed and unsigned integers (vector, by element).
SUQADD: Signed saturating Accumulate of Unsigned value.
SXTL, SXTL2: Signed extend Long: an alias of SSHLL, SSHLL2.
TBL: Table vector Lookup.
TBX: Table vector lookup extension.
TRN1: Transpose vectors (primary).
TRN2: Transpose vectors (secondary).
UABA: Unsigned Absolute difference and Accumulate.
UABAL, UABAL2: Unsigned Absolute difference and Accumulate Long.
UABD: Unsigned Absolute Difference (vector).
UABDL, UABDL2: Unsigned Absolute Difference Long.
UADALP: Unsigned Add and Accumulate Long Pairwise.
UADDL, UADDL2: Unsigned Add Long (vector).
UADDLP: Unsigned Add Long Pairwise.
UADDLV: Unsigned sum Long across Vector.
UADDW, UADDW2: Unsigned Add Wide.
UCVTF (scalar, fixed-point): Unsigned fixed-point Convert to Floating-point (scalar).
UCVTF (scalar, integer): Unsigned integer Convert to Floating-point (scalar).
UCVTF (vector, fixed-point): Unsigned fixed-point Convert to Floating-point (vector).
UCVTF (vector, integer): Unsigned integer Convert to Floating-point (vector).
UDOT (by element): Dot Product unsigned arithmetic (vector, by element).
UDOT (vector): Dot Product unsigned arithmetic (vector).
UHADD: Unsigned Halving Add.
Page 576
UHSUB: Unsigned Halving Subtract.
UMAX: Unsigned Maximum (vector).
UMAXP: Unsigned Maximum Pairwise.
UMAXV: Unsigned Maximum across Vector.
UMIN: Unsigned Minimum (vector).
UMINP: Unsigned Minimum Pairwise.
UMINV: Unsigned Minimum across Vector.
UMLAL, UMLAL2 (by element): Unsigned Multiply-Add Long (vector, by element).
UMLAL, UMLAL2 (vector): Unsigned Multiply-Add Long (vector).
UMLSL, UMLSL2 (by element): Unsigned Multiply-Subtract Long (vector, by element).
UMLSL, UMLSL2 (vector): Unsigned Multiply-Subtract Long (vector).
UMMLA (vector): Unsigned 8-bit integer matrix multiply-accumulate (vector).
UMOV: Unsigned Move vector element to general-purpose register.
UMULL, UMULL2 (by element): Unsigned Multiply Long (vector, by element).
UMULL, UMULL2 (vector): Unsigned Multiply long (vector).
UQADD: Unsigned saturating Add.
UQRSHL: Unsigned saturating Rounding Shift Left (register).
UQRSHRN, UQRSHRN2: Unsigned saturating Rounded Shift Right Narrow (immediate).
UQSHL (immediate): Unsigned saturating Shift Left (immediate).
UQSHL (register): Unsigned saturating Shift Left (register).
UQSHRN, UQSHRN2: Unsigned saturating Shift Right Narrow (immediate).
UQSUB: Unsigned saturating Subtract.
UQXTN, UQXTN2: Unsigned saturating extract Narrow.
URECPE: Unsigned Reciprocal Estimate.
URHADD: Unsigned Rounding Halving Add.
URSHL: Unsigned Rounding Shift Left (register).
URSHR: Unsigned Rounding Shift Right (immediate).
URSQRTE: Unsigned Reciprocal Square Root Estimate.
URSRA: Unsigned Rounding Shift Right and Accumulate (immediate).
USDOT (by element): Dot Product with unsigned and signed integers (vector, by element).
USDOT (vector): Dot Product with unsigned and signed integers (vector).
USHL: Unsigned Shift Left (register).
USHLL, USHLL2: Unsigned Shift Left Long (immediate).
USHR: Unsigned Shift Right (immediate).
USMMLA (vector): Unsigned and signed 8-bit integer matrix multiply-accumulate (vector).
USQADD: Unsigned saturating Accumulate of Signed value.
Page 577
USRA: Unsigned Shift Right and Accumulate (immediate).
USUBL, USUBL2: Unsigned Subtract Long.
USUBW, USUBW2: Unsigned Subtract Wide.
UXTL, UXTL2: Unsigned extend Long: an alias of USHLL, USHLL2.
UZP1: Unzip vectors (primary).
UZP2: Unzip vectors (secondary).
XAR: Exclusive OR and Rotate.
XTN, XTN2: Extract Narrow.
ZIP1: Zip vectors (primary).
ZIP2: Zip vectors (secondary).
Page 578
ABS
Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP
register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
Scalar
ABS <V><d>, <V><n>
if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean neg = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
Vector
ABS <Vd>.<T>, <Vn>.<T>
if size:Q == '110' then UNDEFINED;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
ABS Page 579

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer element;
for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
if neg then
element = -element;
else
element = Abs(element);
Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
ABS Page 580

ADD (vector)
Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results
into a vector, and writes the vector to the destination SIMD&FP register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
Scalar
ADD <V><d>, <V><n>, <V><m>
boolean sub_op = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
Vector
ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
ADD (vector) Page 581

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
element1 = Elem[operand1, e, esize];
if sub_op then
Elem[result, e, esize] = element1 - element2;
else
Elem[result, e, esize] = element1 + element2;
V[d] = result;
If PSTATE.DIT is 1:
ADD (vector) Page 582

ADDHN, ADDHN2
Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the
corresponding vector element in the second source SIMD&FP register, places the most significant half of the result
into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are truncated. For rounded results, see RADDHN.
The ADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
ADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1
Three registers, not all the same type
ADDHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>
if size == '11' then UNDEFINED;

integer datasize = 64;
integer part = UInt(Q);
boolean sub_op = (o1 == '1');

boolean round = (U == '1');
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Tb> Is an arrangement specifier, encoded in “size:Q”:
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Ta> Is an arrangement specifier, encoded in “size”:
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
ADDHN, ADDHN2 Page 583

Operation
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) sum;
element1 = Elem[operand1, e, 2*esize];
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
sum = sum + round_const;
Elem[result, e, esize] = sum<2*esize-1:esize>;
Vpart[d, part] = result;
If PSTATE.DIT is 1:
ADDHN, ADDHN2 Page 584

ADDP (scalar)
Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes
the scalar result into the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd
Advanced SIMD
ADDP <V><d>, <Vn>.<T>

integer datasize = esize * 2;
Assembler Symbols
<V> Is the destination width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D
<T> Is the source arrangement specifier, encoded in “size”:
size <T>
0x RESERVED
10 RESERVED
11 2D
Operation
V[d] = Reduce(ReduceOp_ADD, operand, esize);
If PSTATE.DIT is 1:
ADDP (scalar) Page 585

ADDP (vector)
Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and
writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 1 1 Rn Rd
Three registers of the same type
ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
V[d] = result;
If PSTATE.DIT is 1:
ADDP (vector) Page 586

ADDP (vector) Page 587

ADDV
Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the
scalar result to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd
Advanced SIMD
ADDV <V><d>, <Vn>.<T>


Assembler Symbols

size <V>
00 B
01 H
10 S
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
V[d] = Reduce(ReduceOp_ADD, operand, esize);
If PSTATE.DIT is 1:
ADDV Page 588

ADDV Page 589

AESD
AES single round decryption.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 0 Rn Rd
D
Advanced SIMD
AESD <Vd>.16B, <Vn>.16B
if !HaveAESExt() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) result;
result = AESInvSubBytes(AESInvShiftRows(result));
V[d] = result;
If PSTATE.DIT is 1:
AESD Page 590

AESE
AES single round encryption.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd
D
Advanced SIMD
AESE <Vd>.16B, <Vn>.16B
Assembler Symbols
Operation

bits(128) result;
result = AESSubBytes(AESShiftRows(result));
V[d] = result;
If PSTATE.DIT is 1:
AESE Page 591

AESIMC
AES inverse mix columns.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 1 1 0 Rn Rd
D
Advanced SIMD
AESIMC <Vd>.16B, <Vn>.16B
Assembler Symbols
Operation
bits(128) operand = V[n];

bits(128) result;
result = AESInvMixColumns(operand);
V[d] = result;
If PSTATE.DIT is 1:
AESIMC Page 592

AESMC
AES mix columns.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 Rn Rd
D
Advanced SIMD
AESMC <Vd>.16B, <Vn>.16B
Assembler Symbols
Operation

bits(128) result;
result = AESMixColumns(operand);
V[d] = result;
If PSTATE.DIT is 1:
AESMC Page 593

AND (vector)
Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
size
AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
<T> Is an arrangement specifier, encoded in “Q”:
Q <T>
0 8B
1 16B
Operation

V[d] = result;
If PSTATE.DIT is 1:
AND (vector) Page 594

BCAX
Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the
complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting
vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
This instruction is implemented only when ARMv8.2-SHA is implemented.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 1 Rm 0 Ra Rn Rd
Advanced SIMD
BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B
if !HaveSHA3Ext() then UNDEFINED;

Assembler Symbols
<Va> Is the name of the third SIMD&FP source register, encoded in the "Ra" field.
Operation
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
V[d] = Vn EOR (Vm AND NOT(Va));
If PSTATE.DIT is 1:
BCAX Page 595

BFCVT
Floating-point convert from single-precision to BFloat16 format (scalar) converts the single-precision floating-point
value in the 32-bit SIMD&FP source register to BFloat16 format and writes the result in the 16-bit SIMD&FP
Unlike the BFloat16 multiplication instructions, this instruction honors all the control bits in the FPCR that apply to
single-precision arithmetic, including the rounding mode. This instruction can generate a floating-point exception that
causes a cumulative exception bit in the FPSR to be set, or a synchronous exception to be taken, depending on the
enable bits in the FPCR. ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Single-precision to BFloat16
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 Rn Rd
Single-precision to BFloat16
BFCVT <Hd>, <Sn>
if !HaveBF16Ext() then UNDEFINED;

Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
bits(16) result;
result = FPConvertBF(operand, FPCR);

V[d] = result;
BFCVT Page 596

BFCVTN, BFCVTN2
Floating-point convert from single-precision to BFloat16 format (vector) reads each single-precision element in the
SIMD&FP source vector, converts each value to BFloat16 format, and writes the results in the lower or upper half of
the SIMD&FP destination vector. The result elements are half the width of the source elements.
The BFCVTN instruction writes the half-width results to the lower half of the destination vector and clears the upper
half to zero, while the BFCVTN2 instruction writes the results to the upper half of the destination vector without
affecting the other bits in the register.
Unlike the BFloat16 multiplication instructions, this instruction honors all of the control bits in the FPCR that apply to
single-precision arithmetic, including the rounding mode. It can also generate a floating-point exception that causes
cumulative exception bits in the FPSR to be set, or a synchronous exception to be taken, depending on the enable bits
in the FPCR.
Vector single-precision to BFloat16

(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
Vector single-precision to BFloat16
BFCVTN{2} <Vd>.<Ta>, <Vn>.4S

integer elements = 64 DIV 16;
Assembler Symbols
Q 2
0 [absent]
1 [present]
<Ta> Is an arrangement specifier, encoded in “Q”:
Q <Ta>
0 4H
1 8H
Operation
bits(64) result;
Elem[result, e, 16] = FPConvertBF(Elem[operand, e, 32], FPCR);
BFCVTN, BFCVTN2 Page 597

BFDOT (by element)
BFloat16 floating-point dot product (vector, by element). This instruction delimits the source vectors into pairs of
16-bit BF16 elements. Each pair of elements in the first source vector is multiplied by the specified pair of elements in
the second source vector. The resulting single-precision products are then summed and added destructively to the
single-precision element of the destination vector that aligns with the pair of BF16 values in the first source vector.
The instruction ignores the FPCR and does not update the FPSR exception status.
The BF16 pair within the second source vector is specified using an immediate index. The index range is from 0 to 3
inclusive. ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 1 L M Rm 1 1 1 1 H 0 Rn Rd
Vector
BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]

integer m = UInt(M:Rm);
integer i = UInt(H:L);
integer elements = datasize DIV 32;
Assembler Symbols
Q <Ta>
0 2S
1 4S
<Tb> Is an arrangement specifier, encoded in “Q”:
Q <Tb>
0 4H
1 8H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a pair of 16-bit elements in the range 0 to 3, encoded in the "H:L" fields.
BFDOT (by element) Page 598

Operation
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(16) elt1_a = Elem[operand1, 2*e+0, 16];
bits(16) elt1_b = Elem[operand1, 2*e+1, 16];
bits(16) elt2_a = Elem[operand2, 2*i+0, 16];
bits(16) elt2_b = Elem[operand2, 2*i+1, 16];
bits(32) sum = BFAdd(BFMul(elt1_a, elt2_a), BFMul(elt1_b, elt2_b));

Elem[result, e, 32] = BFAdd(Elem[operand3, e, 32], sum);
V[d] = result;
BFDOT (by element) Page 599

BFDOT (vector)
BFloat16 floating-point dot product (vector). This instruction delimits the source vectors into pairs of 16-bit BF16
elements. Within each pair, the elements in the first source vector are multiplied by the corresponding elements in the
second source vector. The resulting single-precision products are then summed and added destructively to the single-
precision element of the destination vector that aligns with the pair of BF16 values in the first source vector. The
instruction ignores the FPCR and does not update the FPSR exception status.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 1 1 1 1 1 1 Rn Rd
Vector
BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 4H
1 8H
Operation

V[d] = result;
BFDOT (vector) Page 600

BFDOT (vector) Page 601

BFMLALB, BFMLALT (by element)
BFloat16 floating-point widening multiply-add long (by element) widens the even-numbered (bottom) or odd-numbered
(top) 16-bit elements in the first source vector, and the indexed element in the second source vector from Bfloat16 to
single-precision format. The instruction then multiplies and adds these values to the overlapping single-precision
elements of the destination vector.
This performs a fused multiply-add without intermediate rounding that honors all of the control bits in the FPCR that
apply to single-precision arithmetic, including the rounding mode. It can also generate a floating-point exception that
causes cumulative exception bits in the FPSR to be set, or a synchronous exception to be taken, depending on the
enable bits in the FPCR. ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 1 L M Rm 1 1 1 1 H 0 Rn Rd
Vector
BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.H[<index>]

integer m = UInt('0':Rm);
integer index = UInt(H:L:M);

integer sel = UInt(Q);
Assembler Symbols
<bt> Is the bottom or top element specifier, encoded in “Q”:

Q <bt>
0 B
1 T
<Vm> Is the name of the second SIMD&FP source register, in the range V0 to V15, encoded in the "Rm" field.
<index> Is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
Operation
bits(128) result;
bits(32) element2 = Elem[operand2, index, 16]:Zeros(16);
bits(32) element1 = Elem[operand1, 2*e+sel, 16]:Zeros(16);
bits(32) addend = Elem[operand3, e, 32];
Elem[result, e, 32] = FPMulAdd(addend, element1, element2, FPCR);
V[d] = result;
BFMLALB, BFMLALT (by

Page 602
element)
BFMLALB, BFMLALT (by

Page 603
element)
BFMLALB, BFMLALT (vector)
BFloat16 floating-point widening multiply-add long (vector) widens the even-numbered (bottom) or odd-numbered
(top) 16-bit elements in the first and second source vectors from Bfloat16 to single-precision format. The instruction
then multiplies and adds these values to the overlapping single-precision elements of the destination vector.
This performs a fused multiply-add without intermediate rounding that honors all of the control bits in the FPCR that
apply to single-precision arithmetic, including the rounding mode. It can also generate a floating-point exception that
causes cumulative exception bits in the FPSR to be set, or a synchronous exception to be taken, depending on the
enable bits in the FPCR. ID_AA64ISAR1_EL1.BF16 indicates whether these instruction is supported.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 1 1 1 1 1 1 Rn Rd
Vector
BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.8H


integer sel = UInt(Q);
Assembler Symbols
<bt> Is the bottom or top element specifier, encoded in “Q”:

Q <bt>
0 B
1 T
Operation
bits(128) result;
bits(32) addend = Elem[operand3, e, 32];
Elem[result, e, 32] = FPMulAdd(addend, element1, element2, FPCR);
V[d] = result;
BFMLALB, BFMLALT (vector) Page 604

BFMMLA
BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix. This instruction multiplies the 2x4 matrix of BF16
values held in the first 128-bit source vector by the 4x2 BF16 matrix in the second 128-bit source vector. The resulting
2x2 single-precision matrix product is then added destructively to the 2x2 single-precision matrix in the 128-bit
destination vector. This is equivalent to performing a 4-way dot product per destination element. The instruction
ignores the FPCR and does not update the FPSR exception status.
Arm expects that the BFMMLA instruction will deliver a peak BF16 multiply throughput that is at least as high as can
be achieved using two BFDOT instructions, with a goal that it should have significantly higher throughput.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 1 0 Rm 1 1 1 0 1 1 Rn Rd
Vector
BFMMLA <Vd>.4S, <Vn>.8H, <Vm>.8H

Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Operation
bits(128) op1 = V[n];
bits(128) op2 = V[m];
bits(128) acc = V[d];
V[d] = BFMatMulAdd(acc, op1, op2);
BFMMLA Page 605

BIC (vector, immediate)
Bitwise bit Clear (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise AND between each result and the complement of an immediate constant, places the result
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode
16-bit (cmode == 10x1)
BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit (cmode == 0xx1)
BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}
integer rd = UInt(Rd);

bits(datasize) imm;
bits(64) imm64;
ImmediateOp operation;
case cmode:op of
when '0xx01' operation = ImmediateOp_MVNI;
when '0xx11' operation = ImmediateOp_BIC;
when '10x01' operation = ImmediateOp_MVNI;
when '10x11' operation = ImmediateOp_BIC;
when '1110x' operation = ImmediateOp_MOVI;
when '11111'
// FMOV Dn,#imm is in main FP instruction set
if Q == '0' then UNDEFINED;
operation = ImmediateOp_MOVI;
imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);

imm = Replicate(imm64, datasize DIV 64);
Assembler Symbols
<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.
<T> For the 16-bit variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the 32-bit variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 2S
1 4S
<imm8> Is an 8-bit immediate encoded in "a:b:c:d:e:f:g:h".

<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.
BIC (vector, immediate) Page 606

For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
Operation
bits(datasize) operand;
case operation of
when ImmediateOp_MOVI
result = imm;
when ImmediateOp_MVNI
result = NOT(imm);
when ImmediateOp_ORR
operand = V[rd];
result = operand OR imm;
when ImmediateOp_BIC
operand = V[rd];
result = operand AND NOT(imm);
V[rd] = result;
If PSTATE.DIT is 1:
BIC (vector, immediate) Page 607

BIC (vector, register)
Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP
register and the complement of the second source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
size
BIC <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation

V[d] = result;
If PSTATE.DIT is 1:
BIC (vector, register) Page 608

BIF
Bitwise Insert if False. This instruction inserts each bit from the first source SIMD&FP register into the destination
SIMD&FP register if the corresponding bit of the second source SIMD&FP register is 0, otherwise leaves the bit in the
destination register unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
BIF <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
operand1 = V[d];
operand3 = NOT(V[m]);
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);
If PSTATE.DIT is 1:
BIF Page 609

BIT
Bitwise Insert if True. This instruction inserts each bit from the first source SIMD&FP register into the SIMD&FP
destination register if the corresponding bit of the second source SIMD&FP register is 1, otherwise leaves the bit in
the destination register unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
BIT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
operand1 = V[d];
operand3 = V[m];
If PSTATE.DIT is 1:
BIT Page 610

BSL
Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the
first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
BSL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
operand1 = V[m];
operand3 = V[d];
If PSTATE.DIT is 1:
BSL Page 611

CLS (vector)
Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant
bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most
significant bit itself.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U
Vector
CLS <Vd>.<T>, <Vn>.<T>

CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer count;
if countop == CountOp_CLS then
count = CountLeadingSignBits(Elem[operand, e, esize]);
else
count = CountLeadingZeroBits(Elem[operand, e, esize]);
Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
CLS (vector) Page 612

CLS (vector) Page 613

CLZ (vector)
Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most
significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the
vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U
Vector
CLZ <Vd>.<T>, <Vn>.<T>

CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer count;
if countop == CountOp_CLS then
count = CountLeadingSignBits(Elem[operand, e, esize]);
else
count = CountLeadingZeroBits(Elem[operand, e, esize]);
V[d] = result;
If PSTATE.DIT is 1:
CLZ (vector) Page 614

CLZ (vector) Page 615

CMEQ (register)
Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP
register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is
equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets
every bit of the corresponding vector element in the destination SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
Scalar
CMEQ <V><d>, <V><n>, <V><m>
boolean and_test = (U == '0');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
Vector
CMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMEQ (register) Page 616

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
boolean test_passed;
if and_test then
test_passed = !IsZero(element1 AND element2);
else
test_passed = (element1 == element2);
Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
If PSTATE.DIT is 1:
CMEQ (register) Page 617

CMEQ (zero)
Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register
and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
Scalar
CMEQ <V><d>, <V><n>, #0

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
Vector
CMEQ <Vd>.<T>, <Vn>.<T>, #0

case op:U of
CMEQ (zero) Page 618

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
case comparison of
when CompareOp_GT test_passed = element > 0;
when CompareOp_GE test_passed = element >= 0;
when CompareOp_EQ test_passed = element == 0;
when CompareOp_LE test_passed = element <= 0;
when CompareOp_LT test_passed = element < 0;
V[d] = result;
If PSTATE.DIT is 1:
CMEQ (zero) Page 619

CMGE (register)
Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding
vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector
element in the destination SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
Scalar
CMGE <V><d>, <V><n>, <V><m>
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
Vector
CMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMGE (register) Page 620

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
element1 = Int(Elem[operand1, e, esize], unsigned);
test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
V[d] = result;
If PSTATE.DIT is 1:
CMGE (register) Page 621

CMGE (zero)
Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
Scalar
CMGE <V><d>, <V><n>, #0

case op:U of
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
Vector
CMGE <Vd>.<T>, <Vn>.<T>, #0

case op:U of
CMGE (zero) Page 622

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
case comparison of
V[d] = result;
If PSTATE.DIT is 1:
CMGE (zero) Page 623

CMGT (register)
Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer
value is greater than the second signed integer value sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
Scalar
CMGT <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
Vector
CMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMGT (register) Page 624

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
V[d] = result;
If PSTATE.DIT is 1:
CMGT (register) Page 625

CMGT (zero)
Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP
register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
Scalar
CMGT <V><d>, <V><n>, #0

case op:U of
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
Vector
CMGT <Vd>.<T>, <Vn>.<T>, #0

case op:U of
CMGT (zero) Page 626

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
case comparison of
V[d] = result;
If PSTATE.DIT is 1:
CMGT (zero) Page 627

CMHI (register)
Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned
integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in
the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the
destination SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
Scalar
CMHI <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
Vector
CMHI <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMHI (register) Page 628

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
V[d] = result;
If PSTATE.DIT is 1:
CMHI (register) Page 629

CMHS (register)
Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the
corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the
corresponding vector element in the destination SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
Scalar
CMHS <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
Vector
CMHS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMHS (register) Page 630

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
V[d] = result;
If PSTATE.DIT is 1:
CMHS (register) Page 631

CMLE (zero)
Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
Scalar
CMLE <V><d>, <V><n>, #0

case op:U of
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
Vector
CMLE <Vd>.<T>, <Vn>.<T>, #0

case op:U of
CMLE (zero) Page 632

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
case comparison of
V[d] = result;
If PSTATE.DIT is 1:
CMLE (zero) Page 633

CMLT (zero)
Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register
and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination
SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP
register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd
Scalar
CMLT <V><d>, <V><n>, #0

CompareOp comparison = CompareOp_LT;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd
Vector
CMLT <Vd>.<T>, <Vn>.<T>, #0

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMLT (zero) Page 634

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
case comparison of
V[d] = result;
If PSTATE.DIT is 1:
CMLT (zero) Page 635

CMTST
Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP
register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the
result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
Scalar
CMTST <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
Vector
CMTST <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
CMTST Page 636

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
if and_test then
test_passed = !IsZero(element1 AND element2);
else
test_passed = (element1 == element2);
V[d] = result;
If PSTATE.DIT is 1:
CMTST Page 637

CNT
Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element
in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
Vector
CNT <Vd>.<T>, <Vn>.<T>

integer esize = 8;
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
Operation
integer count;
count = BitCount(Elem[operand, e, esize]);
V[d] = result;
If PSTATE.DIT is 1:
CNT Page 638

DUP (element)
Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element
index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the
destination SIMD&FP register.
This instruction is used by the alias MOV (scalar).
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd
Scalar
DUP <V><d>, <Vn>.<T>[<index>]
integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;
integer index = UInt(imm5<4:size+1>);

integer idxdsize = if imm5<4> == '1' then 128 else 64;
integer esize = 8 << size;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd
Vector
DUP <Vd>.<T>, <Vn>.<Ts>[<index>]


if size == 3 && Q == '0' then UNDEFINED;

Assembler Symbols
<T> For the scalar variant: is the element width specifier, encoded in “imm5”:
DUP (element) Page 639

imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
For the vector variant: is an arrangement specifier, encoded in “imm5:Q”:
imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D
<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<V> Is the destination width specifier, encoded in “imm5”:

imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index> Is the element index encoded in “imm5”:
imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
Operation
bits(idxdsize) operand = V[n];
bits(esize) element;
element = Elem[operand, index, esize];

Elem[result, e, esize] = element;
V[d] = result;
If PSTATE.DIT is 1:


DUP (general)
Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose
register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 1 1 Rn Rd
Advanced SIMD
DUP <Vd>.<T>, <R><n>

// imm5<4:size+1> is IGNORED
if size == 3 && Q == '0' then UNDEFINED;

Assembler Symbols
<T> Is an arrangement specifier, encoded in “imm5:Q”:
imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D
<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.
<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.
Operation
bits(esize) element = X[n];
V[d] = result;
DUP (general) Page 642

If PSTATE.DIT is 1:
DUP (general) Page 643

EOR3
Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 0 Rm 0 Ra Rn Rd
Advanced SIMD
EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B

Assembler Symbols
Operation
V[d] = Vn EOR Vm EOR Va;
If PSTATE.DIT is 1:
EOR3 Page 644

EOR (vector)
Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source
SIMD&FP registers, and places the result in the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
EOR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
operand1 = V[m];
operand2 = Zeros();
operand3 = Ones();
If PSTATE.DIT is 1:
EOR (vector) Page 645

EXT
Extract vector from pair of vectors. This instruction extracts the lowest vector elements from the second source
SIMD&FP register and the highest vector elements from the first source SIMD&FP register, concatenates the results
into a vector, and writes the vector to the destination SIMD&FP register vector. The index value specifies the lowest
vector element to extract from the first source register, and consecutive elements are extracted from the first, then
second, source registers until the destination vector is filled.
The following figure shows an example of the operation of EXT doubleword operation for Q = 0 and imm4<2:0> = 3.
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
Vm Vn
Vd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 0 Rm 0 imm4 0 Rn Rd
Advanced SIMD
EXT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<index>
if Q == '0' && imm4<3> == '1' then UNDEFINED;

integer position = UInt(imm4) << 3;
Assembler Symbols
Q <T>
0 8B
1 16B
<index> Is the lowest numbered byte element to be extracted, encoded in “Q:imm4”:
Q imm4<3> <index>
0 0 imm4<2:0>
0 1 RESERVED
1 x imm4
Operation
bits(datasize) hi = V[m];
bits(datasize) lo = V[n];
bits(datasize*2) concat = hi:lo;
V[d] = concat<position+datasize-1:position>;
EXT Page 646

If PSTATE.DIT is 1:
EXT Page 647

FABD
Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the
second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source
SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
It has encodings from 4 classes: Scalar half precision , Scalar single-precision and double-precision , Vector half
precision and Vector single-precision and double-precision
Scalar half precision

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
FABD <Hd>, <Hn>, <Hm>
if !HaveFP16Ext() then UNDEFINED;
integer esize = 16;
boolean abs = TRUE;
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
FABD <V><d>, <V><n>, <V><m>
integer esize = 32 << UInt(sz);
boolean abs = TRUE;
Vector half precision

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
FABD Page 648

FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
boolean abs = (U == '1');
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
if sz:Q == '10' then UNDEFINED;
Assembler Symbols
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<V> Is a width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<T> For the vector half precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the vector single-precision and double-precision variant: is an arrangement specifier, encoded in
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FABD Page 649

Operation
bits(esize) diff;
diff = FPSub(element1, element2, FPCR);
Elem[result, e, esize] = if abs then FPAbs(diff) else diff;
V[d] = result;
FABD Page 650

FABS (vector)
Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the
source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U
Half-precision
FABS <Vd>.<T>, <Vn>.<T>
integer esize = 16;

Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U
FABS <Vd>.<T>, <Vn>.<T>

Assembler Symbols
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
FABS (vector) Page 651

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
element = Elem[operand, e, esize];
if neg then
element = FPNeg(element);
else
element = FPAbs(element);
V[d] = result;
FABS (vector) Page 652

FABS (scalar)
Floating-point Absolute value (scalar). This instruction calculates the absolute value in the SIMD&FP source register
and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 1 1 0 0 0 0 Rn Rd
opc
Half-precision (ftype == 11)

(Armv8.2)
FABS <Hd>, <Hn>
Single-precision (ftype == 00)
FABS <Sd>, <Sn>
Double-precision (ftype == 01)
FABS <Dd>, <Dn>
integer datasize;
case ftype of
when '00' datasize = 32;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
datasize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
Operation
result = FPAbs(operand);
V[d] = result;
FABS (scalar) Page 653

FABS (scalar) Page 654

FACGE
Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each
floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point
value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets
every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of
the corresponding vector element in the destination SIMD&FP register to zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
FACGE <Hd>, <Hn>, <Hm>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
FACGE Page 655

FACGE <V><d>, <V><n>, <V><m>
CompareOp cmp;
boolean abs;
case E:U:ac of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
FACGE Page 656

FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CompareOp cmp;
boolean abs;
case E:U:ac of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FACGE Page 657

Operation
if abs then
element1 = FPAbs(element1);
case cmp of
when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, FPCR);
when CompareOp_GE test_passed = FPCompareGE(element1, element2, FPCR);
when CompareOp_GT test_passed = FPCompareGT(element1, element2, FPCR);
V[d] = result;
FACGE Page 658

FACGT
Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector
element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the
second source SIMD&FP register and if the first value is greater than the second value sets every bit of the
corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the
corresponding vector element in the destination SIMD&FP register to zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
FACGT <Hd>, <Hn>, <Hm>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
FACGT Page 659

FACGT <V><d>, <V><n>, <V><m>
CompareOp cmp;
boolean abs;
case E:U:ac of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
FACGT Page 660

FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CompareOp cmp;
boolean abs;
case E:U:ac of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FACGT Page 661

Operation
if abs then
case cmp of
V[d] = result;
FACGT Page 662

FADD (vector)
Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP
registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are floating-point values.
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
Half-precision
FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
boolean pair = (U == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
FADD (vector) Page 663

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
Elem[result, e, esize] = FPAdd(element1, element2, FPCR);
V[d] = result;
FADD (vector) Page 664

FADD (scalar)
Floating-point Add (scalar). This instruction adds the floating-point values of the two source SIMD&FP registers, and
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 1 0 Rn Rd
op

(Armv8.2)
FADD <Hd>, <Hn>, <Hm>
FADD <Sd>, <Sn>, <Sm>
FADD <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
FADD (scalar) Page 665

Operation
result = FPAdd(operand1, operand2, FPCR);
V[d] = result;
FADD (scalar) Page 666

FADDP (scalar)
Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source
SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd
sz
Half-precision
FADDP <V><d>, <Vn>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd
FADDP <V><d>, <Vn>.<T>
integer esize = 32;

Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
FADDP (scalar) Page 667

sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:
sz <T>
0 2S
1 2D
Operation
V[d] = Reduce(ReduceOp_FADD, operand, esize);
FADDP (scalar) Page 668

FADDP (vector)
Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first
source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of
adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a
vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point
values.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
Half-precision
FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
FADDP (vector) Page 669

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
Elem[result, e, esize] = FPAdd(element1, element2, FPCR);
V[d] = result;
FADDP (vector) Page 670

FCADD
Floating-point Complex Add.

This instruction operates on complex numbers that are represented in SIMD&FP registers as pairs of elements, with
the more significant element holding the imaginary part of the number and the less significant element holding the
real part of the number. Each element holds a floating-point value. It performs the following computation on the
corresponding complex number element pairs from the two source registers:
• Considering the complex number from the second source register on an Argand diagram, the number is
rotated counterclockwise by 90 or 270 degrees.
• The rotated complex number is added to the complex number from the first source register.
exception traps.
Vector
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 1 rot 0 1 Rn Rd
Vector
FCADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>
if !HaveFCADDExt() then UNDEFINED;

if Q == '0' && size == '11' then UNDEFINED;
if !HaveFP16Ext() && esize == 16 then UNDEFINED;
Assembler Symbols
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<rotate> Is the rotation, encoded in “rot”:
rot <rotate>
0 90
1 270
FCADD Page 671

Operation
for e = 0 to (elements DIV 2)-1

case rot of
when '0'
element1 = FPNeg(Elem[operand2, e*2+1, esize]);
element3 = Elem[operand2, e*2, esize];
when '1'
element1 = Elem[operand2, e*2+1, esize];
element3 = FPNeg(Elem[operand2, e*2, esize]);
Elem[result, e*2, esize] = FPAdd(Elem[operand1, e*2, esize], element1, FPCR);
Elem[result, e*2+1, esize] = FPAdd(Elem[operand1, e*2+1, esize], element3, FPCR);
V[d] = result;
FCADD Page 672

FCCMP
Floating-point Conditional quiet Compare (scalar). This instruction compares the two SIMD&FP source register values
and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C, V}
flags are set to the flag bit specifier.
It raises an Invalid Operation exception only if either operand is a signaling NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 0 nzcv
op

(Armv8.2)
FCCMP <Hn>, <Hm>, #<nzcv>, <cond>
FCCMP <Sn>, <Sm>, #<nzcv>, <cond>
FCCMP <Dn>, <Dm>, #<nzcv>, <cond>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
NaNs
FCCMP Page 673

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 ==
Operand2) and (Operand1 > Operand2) are false. This case results in the FPSCR flags being set to N=0, Z=0, C=1,
and V=1.
Operation

operand2 = V[m];
flags = FPCompare(operand1, operand2, FALSE, FPCR);
FCCMP Page 674

FCCMPE
Floating-point Conditional signaling Compare (scalar). This instruction compares the two SIMD&FP source register
values and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C,
V} flags are set to the flag bit specifier.
If either operand is any type of NaN, or if either operand is a signaling NaN, the instruction raises an Invalid
Operation exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 1 nzcv
op

(Armv8.2)
FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>
FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>
FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FCCMPE Page 675

NaNs
and V=1.
FCCMPE raises an Invalid Operation exception if either operand is any type of NaN, and is suitable for testing for <, <=,
>, >=, and other predicates that raise an exception when the operands are unordered.
Operation

operand2 = V[m];
flags = FPCompare(operand1, operand2, TRUE, FPCR);
FCCMPE Page 676

FCMEQ (register)
Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source
SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the
comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMEQ <Hd>, <Hn>, <Hm>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
FCMEQ (register) Page 677

FCMEQ <V><d>, <V><n>, <V><m>
CompareOp cmp;
boolean abs;
case E:U:ac of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CompareOp cmp;
boolean abs;
case E:U:ac of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
if abs then
case cmp of
V[d] = result;

FCMEQ (zero)
Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP
register and if the value is equal to zero sets every bit of the corresponding vector element in the destination
SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP
register to zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMEQ <Hd>, <Hn>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMEQ (zero) Page 681

FCMEQ <V><d>, <V><n>, #0.0

case op:U of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0

case op:U of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
bits(esize) zero = FPZero('0');
case comparison of
when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR);
when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR);
when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR);
when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR);
when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR);
V[d] = result;

FCMGE (register)
Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first
source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the
second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMGE <Hd>, <Hn>, <Hm>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
FCMGE (register) Page 685

FCMGE <V><d>, <V><n>, <V><m>
CompareOp cmp;
boolean abs;
case E:U:ac of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CompareOp cmp;
boolean abs;
case E:U:ac of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
if abs then
case cmp of
V[d] = result;

FCMGE (zero)
Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector
element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGE <Hd>, <Hn>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGE (zero) Page 689

FCMGE <V><d>, <V><n>, #0.0

case op:U of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGE <Vd>.<T>, <Vn>.<T>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGE <Vd>.<T>, <Vn>.<T>, #0.0

case op:U of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
case comparison of
V[d] = result;

FCMGT (register)
Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source
SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source
SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMGT <Hd>, <Hn>, <Hm>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
FCMGT (register) Page 693

FCMGT <V><d>, <V><n>, <V><m>
CompareOp cmp;
boolean abs;
case E:U:ac of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
CompareOp cmp;
boolean abs;
case E:U:ac of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CompareOp cmp;
boolean abs;
case E:U:ac of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
if abs then
case cmp of
V[d] = result;

FCMGT (zero)
Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGT <Hd>, <Hn>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGT (zero) Page 697

FCMGT <V><d>, <V><n>, #0.0

case op:U of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGT <Vd>.<T>, <Vn>.<T>, #0.0

case op:U of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
case comparison of
V[d] = result;

FCMLA (by element)
Floating-point Complex Multiply Accumulate (by element).

real part of the number. Each element holds a floating-point value. It performs the following computation on complex
numbers from the first source register and the destination register with the specified complex number from the
second source register:
rotated counterclockwise by 0, 90, 180, or 270 degrees.
• The two elements of the transformed complex number are multiplied by:
◦ The real element of the complex number from the first source register, if the transformation was a
rotation by 0 or 180 degrees.
◦ The imaginary element of the complex number from the first source register, if the transformation
was a rotation by 90 or 270 degrees.
• The complex number resulting from that multiplication is added to the complex number from the destination
register.
The multiplication and addition operations are performed as a fused multiply-add, without any intermediate rounding.
exception traps.
Vector
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 rot 1 H 0 Rn Rd
(size == 01)
FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>
(size == 10)
FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>

if size == '00' || size == '11' then UNDEFINED;
if size == '01' then index = UInt(H:L);
if size == '10' then index = UInt(H);
if size == '10' && (L == '1' || Q == '0') then UNDEFINED;
if size == '01' && H == '1' && Q == '0' then UNDEFINED;
Assembler Symbols
FCMLA (by element) Page 701

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
<Ts> Is an element size specifier, encoded in “size”:
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
<index> Is the element index, encoded in “size:H:L”:

size <index>
00 RESERVED
01 H:L
10 H
11 RESERVED

rot <rotate>
00 0
01 90
10 180
11 270
Operation

case rot of
when '00'
element1 = Elem[operand2, index*2, esize];
element3 = Elem[operand2, index*2+1, esize];
when '01'
element1 = FPNeg(Elem[operand2, index*2+1, esize]);
element3 = Elem[operand2, index*2, esize];
when '10'
element1 = FPNeg(Elem[operand2, index*2, esize]);
element3 = FPNeg(Elem[operand2, index*2+1, esize]);
when '11'
element1 = Elem[operand2, index*2+1, esize];
element3 = FPNeg(Elem[operand2, index*2, esize]);
Elem[result, e*2, esize] = FPMulAdd(Elem[operand3, e*2, esize], element2, element1, FPCR);

Elem[result, e*2+1, esize] = FPMulAdd(Elem[operand3, e*2+1, esize], element4, element3, FPCR);
V[d] = result;


FCMLA
Floating-point Complex Multiply Accumulate.

real part of the number. Each element holds a floating-point value. It performs the following computation on the
corresponding complex number element pairs from the two source registers and the destination register:
rotated counterclockwise by 0, 90, 180, or 270 degrees.
• The two elements of the transformed complex number are multiplied by:
◦ The real element of the complex number from the first source register, if the transformation was a
rotation by 0 or 180 degrees.
◦ The imaginary element of the complex number from the first source register, if the transformation
was a rotation by 90 or 270 degrees.
• The complex number resulting from that multiplication is added to the complex number from the destination
register.
The multiplication and addition operations are performed as a fused multiply-add, without any intermediate rounding.
exception traps.
Vector
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 0 rot 1 Rn Rd
Vector
FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>

if Q == '0' && size == '11' then UNDEFINED;
Assembler Symbols
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
FCMLA Page 704

rot <rotate>
00 0
01 90
10 180
11 270
Operation

case rot of
when '00'
when '01'
when '10'
when '11'
Elem[result, e*2, esize] = FPMulAdd(Elem[operand3, e*2, esize], element2, element1, FPCR);

Elem[result, e*2+1, esize] = FPMulAdd(Elem[operand3, e*2+1, esize], element4, element3, FPCR);
V[d] = result;
FCMLA Page 705

FCMLE (zero)
Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector
element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMLE <Hd>, <Hn>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMLE (zero) Page 706

FCMLE <V><d>, <V><n>, #0.0

case op:U of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0
integer esize = 16;

case op:U of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMLE <Vd>.<T>, <Vn>.<T>, #0.0

case op:U of
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
case comparison of
V[d] = result;

FCMLT (zero)
Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd
FCMLT <Hd>, <Hn>, #0.0
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd
FCMLT <V><d>, <V><n>, #0.0


(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd
FCMLT (zero) Page 710

FCMLT <Vd>.<T>, <Vn>.<T>, #0.0
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd
FCMLT <Vd>.<T>, <Vn>.<T>, #0.0

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
case comparison of
V[d] = result;

FCMP
Floating-point quiet Compare (scalar). This instruction compares the two SIMD&FP source register values, or the first
SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
It raises an Invalid Operation exception only if either operand is a signaling NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 0 x 0 0 0
opc
Half-precision (ftype == 11 && opc == 00)

(Armv8.2)
FCMP <Hn>, <Hm>
Half-precision, zero (ftype == 11 && Rm == (00000) && opc == 01)

(Armv8.2)
FCMP <Hn>, #0.0
Single-precision (ftype == 00 && opc == 00)
FCMP <Sn>, <Sm>
Single-precision, zero (ftype == 00 && Rm == (00000) && opc == 01)
FCMP <Sn>, #0.0
Double-precision (ftype == 01 && opc == 00)
FCMP <Dn>, <Dm>
Double-precision, zero (ftype == 01 && Rm == (00000) && opc == 01)
FCMP <Dn>, #0.0
integer m = UInt(Rm); // ignored when opc<0> == '1'
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
boolean signal_all_nans = (opc<1> == '1');

boolean cmp_with_zero = (opc<0> == '1');
FCMP Page 713

Assembler Symbols
<Dn> For the double-precision variant: is the 64-bit name of the first SIMD&FP source register, encoded in
the "Rn" field.
For the double-precision, zero variant: is the 64-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Hn> For the half-precision variant: is the 16-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the half-precision, zero variant: is the 16-bit name of the SIMD&FP source register, encoded in the
"Rn" field.
<Sn> For the single-precision variant: is the 32-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the single-precision, zero variant: is the 32-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
NaNs
and V=1.
Operation

operand2 = if cmp_with_zero then FPZero('0') else V[m];
PSTATE.<N,Z,C,V> = FPCompare(operand1, operand2, signal_all_nans, FPCR);
FCMP Page 714

FCMPE
Floating-point signaling Compare (scalar). This instruction compares the two SIMD&FP source register values, or the
first SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
If either operand is any type of NaN, or if either operand is a signaling NaN, the instruction raises an Invalid
Operation exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 1 x 0 0 0
opc
Half-precision (ftype == 11 && opc == 10)

(Armv8.2)
FCMPE <Hn>, <Hm>
Half-precision, zero (ftype == 11 && Rm == (00000) && opc == 11)

(Armv8.2)
FCMPE <Hn>, #0.0
Single-precision (ftype == 00 && opc == 10)
FCMPE <Sn>, <Sm>
Single-precision, zero (ftype == 00 && Rm == (00000) && opc == 11)
FCMPE <Sn>, #0.0
Double-precision (ftype == 01 && opc == 10)
FCMPE <Dn>, <Dm>
Double-precision, zero (ftype == 01 && Rm == (00000) && opc == 11)
FCMPE <Dn>, #0.0
integer m = UInt(Rm); // ignored when opc<0> == '1'
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
boolean signal_all_nans = (opc<1> == '1');

boolean cmp_with_zero = (opc<0> == '1');
FCMPE Page 715

Assembler Symbols
<Dn> For the double-precision variant: is the 64-bit name of the first SIMD&FP source register, encoded in
the "Rn" field.
For the double-precision, zero variant: is the 64-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Hn> For the half-precision variant: is the 16-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the half-precision, zero variant: is the 16-bit name of the SIMD&FP source register, encoded in the
"Rn" field.
<Sn> For the single-precision variant: is the 32-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the single-precision, zero variant: is the 32-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
NaNs
and V=1.
FCMPE raises an Invalid Operation exception if either operand is any type of NaN, and is suitable for testing for <, <=,
>, >=, and other predicates that raise an exception when the operands are unordered.
Operation

operand2 = if cmp_with_zero then FPZero('0') else V[m];
PSTATE.<N,Z,C,V> = FPCompare(operand1, operand2, signal_all_nans, FPCR);
FCMPE Page 716

FCSEL
Floating-point Conditional Select (scalar). This instruction allows the SIMD&FP destination register to take the value
from either one or the other of two SIMD&FP source registers. If the condition passes, the first SIMD&FP source
register value is taken, otherwise the second SIMD&FP source register value is taken.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 1 1 Rn Rd

(Armv8.2)
FCSEL <Hd>, <Hn>, <Hm>, <cond>
FCSEL <Sd>, <Sn>, <Sm>, <cond>
FCSEL <Dd>, <Dn>, <Dm>, <cond>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FCSEL Page 717

Operation
result = if ConditionHolds(cond) then V[n] else V[m];
V[d] = result;
If PSTATE.DIT is 1:
FCSEL Page 718

FCVT
Floating-point Convert precision (scalar). This instruction converts the floating-point value in the SIMD&FP source
register to the precision for the destination register data type using the rounding mode that is determined by the
FPCR and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 1 opc 1 0 0 0 0 Rn Rd
Half-precision to single-precision (ftype == 11 && opc == 00)
FCVT <Sd>, <Hn>
Half-precision to double-precision (ftype == 11 && opc == 01)
FCVT <Dd>, <Hn>
Single-precision to half-precision (ftype == 00 && opc == 11)
FCVT <Hd>, <Sn>
Single-precision to double-precision (ftype == 00 && opc == 01)
FCVT <Dd>, <Sn>
Double-precision to half-precision (ftype == 01 && opc == 11)
FCVT <Hd>, <Dn>
Double-precision to single-precision (ftype == 01 && opc == 00)
FCVT <Sd>, <Dn>
integer srcsize;
integer dstsize;
if ftype == opc then UNDEFINED;
case ftype of
when '00' srcsize = 32;
case opc of
when '00' dstsize = 32;
Assembler Symbols
FCVT Page 719

Operation
bits(dstsize) result;
bits(srcsize) operand = V[n];
result = FPConvert(operand, FPCR);

V[d] = result;
FCVT Page 720

FCVTAS (vector)
Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each
element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away
rounding mode and writes the result to the SIMD&FP destination register.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAS <Hd>, <Hn>
integer esize = 16;

FPRounding rounding = FPRounding_TIEAWAY;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAS <V><d>, <V><n>


FCVTAS (vector) Page 721

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAS <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAS <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, FPCR, rounding);
V[d] = result;

FCVTAS (scalar)
Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
Half-precision to 32-bit (sf == 0 && ftype == 11)

(Armv8.2)
FCVTAS <Wd>, <Hn>

(Armv8.2)
FCVTAS <Xd>, <Hn>
Single-precision to 32-bit (sf == 0 && ftype == 00)
FCVTAS <Wd>, <Sn>
FCVTAS <Xd>, <Sn>
Double-precision to 32-bit (sf == 0 && ftype == 01)
FCVTAS <Wd>, <Dn>
FCVTAS <Xd>, <Dn>
integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
FCVTAS (scalar) Page 724

Assembler Symbols
Operation
bits(fltsize) fltval;
bits(intsize) intval;
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, FPCR, FPRounding_TIEAWAY);
X[d] = intval;
FCVTAS (scalar) Page 725

FCVTAU (vector)
Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts
each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties
to Away rounding mode and writes the result to the SIMD&FP destination register.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAU <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAU <V><d>, <V><n>


FCVTAU (vector) Page 726

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAU <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
FCVTAU <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTAU (scalar)
Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode

(Armv8.2)
FCVTAU <Wd>, <Hn>

(Armv8.2)
FCVTAU <Xd>, <Hn>
FCVTAU <Wd>, <Sn>
FCVTAU <Xd>, <Sn>
FCVTAU <Wd>, <Dn>
FCVTAU <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
FCVTAU (scalar) Page 729

Assembler Symbols
Operation
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, FPCR, FPRounding_TIEAWAY);
X[d] = intval;
FCVTAU (scalar) Page 730

FCVTL, FCVTL2
Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the
SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode
that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP
Where the operation lengthens a 64-bit vector to a 128-bit vector, the FCVTL2 variant operates on the elements in the
top 64 bits of the source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 1 1 0 Rn Rd
FCVTL{2} <Vd>.<Ta>, <Vn>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
<Ta> Is an arrangement specifier, encoded in “sz”:
sz <Ta>
0 4S
1 2D
<Tb> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <Tb>
0 0 4H
0 1 8H
1 0 2S
1 1 4S
Operation
bits(datasize) operand = Vpart[n, part];
bits(2*datasize) result;
Elem[result, e, 2*esize] = FPConvert(Elem[operand, e, esize], FPCR);
V[d] = result;
FCVTL, FCVTL2 Page 731

FCVTL, FCVTL2 Page 732

FCVTMS (vector)
Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMS <Hd>, <Hn>
integer esize = 16;

FPRounding rounding = FPDecodeRounding(o1:o2);

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMS <V><d>, <V><n>


FCVTMS (vector) Page 733

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMS <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMS <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTMS (scalar)
Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 0 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTMS (scalar) Page 736

(Armv8.2)
FCVTMS <Wd>, <Hn>

(Armv8.2)
FCVTMS <Xd>, <Hn>
FCVTMS <Wd>, <Sn>
FCVTMS <Xd>, <Sn>
FCVTMS <Wd>, <Dn>
FCVTMS <Xd>, <Dn>

integer fltsize;
FPRounding rounding;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
Assembler Symbols

Operation
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, FPCR, rounding);
X[d] = intval;

FCVTMU (vector)
Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar
or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMU <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMU <V><d>, <V><n>


FCVTMU (vector) Page 739

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMU <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTMU <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTMU (scalar)
Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 0 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTMU (scalar) Page 742

(Armv8.2)
FCVTMU <Wd>, <Hn>

(Armv8.2)
FCVTMU <Xd>, <Hn>
FCVTMU <Wd>, <Sn>
FCVTMU <Xd>, <Sn>
FCVTMU <Wd>, <Dn>
FCVTMU <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, FPCR, rounding);
X[d] = intval;

FCVTN, FCVTN2
Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP
source register, converts each result to half the precision of the source element, writes the final result to a vector, and
writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are
half as long as the source vector elements. The rounding mode is determined by the FPCR.
The FCVTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
FCVTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
FCVTN{2} <Vd>.<Tb>, <Vn>.<Ta>

Assembler Symbols
Q 2
0 [absent]
1 [present]
sz Q <Tb>
0 0 4H
0 1 8H
1 0 2S
1 1 4S
sz <Ta>
0 4S
1 2D
FCVTN, FCVTN2 Page 745

Operation
bits(2*datasize) operand = V[n];
Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], FPCR);
FCVTN, FCVTN2 Page 746

FCVTNS (vector)
Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNS <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNS <V><d>, <V><n>


FCVTNS (vector) Page 747

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNS <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNS <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTNS (scalar)
Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTNS (scalar) Page 750

(Armv8.2)
FCVTNS <Wd>, <Hn>

(Armv8.2)
FCVTNS <Xd>, <Hn>
FCVTNS <Wd>, <Sn>
FCVTNS <Xd>, <Sn>
FCVTNS <Wd>, <Dn>
FCVTNS <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FCVTNU (vector)
Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNU <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNU <V><d>, <V><n>


FCVTNU (vector) Page 753

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNU <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTNU <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTNU (scalar)
Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTNU (scalar) Page 756

(Armv8.2)
FCVTNU <Wd>, <Hn>

(Armv8.2)
FCVTNU <Xd>, <Hn>
FCVTNU <Wd>, <Sn>
FCVTNU <Xd>, <Sn>
FCVTNU <Wd>, <Dn>
FCVTNU <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FCVTPS (vector)
Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPS <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPS <V><d>, <V><n>


FCVTPS (vector) Page 759

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPS <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPS <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTPS (scalar)
Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Plus Infinity
rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 1 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTPS (scalar) Page 762

(Armv8.2)
FCVTPS <Wd>, <Hn>

(Armv8.2)
FCVTPS <Xd>, <Hn>
FCVTPS <Wd>, <Sn>
FCVTPS <Xd>, <Sn>
FCVTPS <Wd>, <Dn>
FCVTPS <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FCVTPU (vector)
Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPU <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPU <V><d>, <V><n>


FCVTPU (vector) Page 765

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPU <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
FCVTPU <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTPU (scalar)
Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Plus Infinity rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 1 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTPU (scalar) Page 768

(Armv8.2)
FCVTPU <Wd>, <Hn>

(Armv8.2)
FCVTPU <Xd>, <Hn>
FCVTPU <Wd>, <Sn>
FCVTPU <Xd>, <Sn>
FCVTPU <Wd>, <Dn>
FCVTPU <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FCVTXN, FCVTXN2
Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element
in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to
Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
This instruction uses the Round to Odd rounding mode which is not defined by the IEEE 754-2008 standard. This
rounding mode ensures that if the result of the conversion is inexact the least significant bit of the mantissa is forced
to 1. This rounding mode enables a floating-point value to be converted to a lower precision format via an intermediate
precision format while avoiding double rounding errors. For example, a 64-bit floating-point value can be converted to
a correctly rounded 16-bit floating-point value by first using this instruction to produce a 32-bit value and then using
another instruction with the wanted rounding mode to convert the 32-bit value to the final 16-bit floating-point value.
The FCVTXN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the FCVTXN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
exception traps.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
Scalar
FCVTXN <Vb><d>, <Va><n>
if sz == '0' then UNDEFINED;

integer esize = 32;
integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
Vector
FCVTXN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer esize = 32;
Assembler Symbols
FCVTXN, FCVTXN2 Page 771

Q 2
0 [absent]
1 [present]
sz Q <Tb>
0 x RESERVED
1 0 2S
1 1 4S
sz <Ta>
0 RESERVED
1 2D
<Vb> Is the destination width specifier, encoded in “sz”:

sz <Vb>
0 RESERVED
1 S
<Va> Is the source width specifier, encoded in “sz”:
sz <Va>
0 RESERVED
1 D
Operation
Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], FPCR, FPRounding_ODD);
FCVTXN, FCVTXN2 Page 772

FCVTZS (vector, fixed-point)
Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and
writes the result to the SIMD&FP destination register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
Scalar
FCVTZS <V><d>, <V><n>, #<fbits>
if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer fracbits = (esize * 2) - UInt(immh:immb);

FPRounding rounding = FPRounding_ZERO;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
Vector
FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits>
if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;

Assembler Symbols
<V> Is a width specifier, encoded in “immh”:
FCVTZS (vector, fixed-point) Page 773

immh <V>
000x RESERVED
001x H
01xx S
1xxx D
<T> Is an arrangement specifier, encoded in “immh:Q”:
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:
immh <fbits>
000x RESERVED
001x (32-Uint(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:
immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
Operation
Elem[result, e, esize] = FPToFixed(element, fracbits, unsigned, FPCR, rounding);
V[d] = result;
FCVTZS (vector, fixed-point) Page 774

FCVTZS (vector, integer)
Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode,

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZS <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZS <V><d>, <V><n>


FCVTZS (vector, integer) Page 775

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZS <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZS <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTZS (scalar, fixed-point)
Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point signed integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 1 1 0 0 0 scale Rn Rd
rmode opcode

(Armv8.2)
FCVTZS <Wd>, <Hn>, #<fbits>

(Armv8.2)
FCVTZS <Xd>, <Hn>, #<fbits>
FCVTZS <Wd>, <Sn>, #<fbits>
FCVTZS <Xd>, <Sn>, #<fbits>
FCVTZS <Wd>, <Dn>, #<fbits>
FCVTZS <Xd>, <Dn>, #<fbits>

integer fltsize;
case ftype of
when '00' fltsize = 32;
when '11'
fltsize = 16;
else
UNDEFINED;
if sf == '0' && scale<5> == '0' then UNDEFINED;

integer fracbits = 64 - UInt(scale);
FCVTZS (scalar, fixed-point) Page 778

Assembler Symbols
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 32, encoded as 64
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
minus "scale".
Operation
fltval = V[n];
intval = FPToFixed(fltval, fracbits, FALSE, FPCR, FPRounding_ZERO);
X[d] = intval;
FCVTZS (scalar, fixed-point) Page 779

FCVTZS (scalar, integer)
Floating-point Convert to Signed integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Zero rounding
mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 1 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTZS (scalar, integer) Page 780

(Armv8.2)
FCVTZS <Wd>, <Hn>

(Armv8.2)
FCVTZS <Xd>, <Hn>
FCVTZS <Wd>, <Sn>
FCVTZS <Xd>, <Sn>
FCVTZS <Wd>, <Dn>
FCVTZS <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FCVTZU (vector, fixed-point)
Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or
each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
Scalar
FCVTZU <V><d>, <V><n>, #<fbits>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
Vector
FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits>


Assembler Symbols
FCVTZU (vector, fixed-point) Page 783

immh <V>
000x RESERVED
001x H
01xx S
1xxx D
immh Q <T>
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
“immh:immb”:
immh <fbits>
000x RESERVED
“immh:immb”:
immh <fbits>
0001 RESERVED
Operation
Elem[result, e, esize] = FPToFixed(element, fracbits, unsigned, FPCR, rounding);
V[d] = result;
FCVTZU (vector, fixed-point) Page 784

FCVTZU (vector, integer)
Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZU <Hd>, <Hn>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZU <V><d>, <V><n>


FCVTZU (vector, integer) Page 785

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZU <Vd>.<T>, <Vn>.<T>
integer esize = 16;


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
FCVTZU <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H

“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;

FCVTZU (scalar, fixed-point)
Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point unsigned integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
rmode opcode

(Armv8.2)
FCVTZU <Wd>, <Hn>, #<fbits>

(Armv8.2)
FCVTZU <Xd>, <Hn>, #<fbits>
FCVTZU <Wd>, <Sn>, #<fbits>
FCVTZU <Xd>, <Sn>, #<fbits>
FCVTZU <Wd>, <Dn>, #<fbits>
FCVTZU <Xd>, <Dn>, #<fbits>

integer fltsize;
case ftype of
when '11'
fltsize = 16;
else
UNDEFINED;

FCVTZU (scalar, fixed-point) Page 788

Assembler Symbols
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
minus "scale".
Operation
fltval = V[n];
intval = FPToFixed(fltval, fracbits, TRUE, FPCR, FPRounding_ZERO);
X[d] = intval;
FCVTZU (scalar, fixed-point) Page 789

FCVTZU (scalar, integer)
Floating-point Convert to Unsigned integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Zero rounding
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 1 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
FCVTZU (scalar, integer) Page 790

(Armv8.2)
FCVTZU <Wd>, <Hn>

(Armv8.2)
FCVTZU <Xd>, <Hn>
FCVTZU <Wd>, <Sn>
FCVTZU <Xd>, <Sn>
FCVTZU <Wd>, <Dn>
FCVTZU <Xd>, <Dn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
fltval = V[n];
X[d] = intval;

FDIV (vector)
Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source
SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register,
places the results in a vector, and writes the vector to the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
Half-precision
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 4H
1 8H
FDIV (vector) Page 793

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
Elem[result, e, esize] = FPDiv(element1, element2, FPCR);
V[d] = result;
FDIV (vector) Page 794

FDIV (scalar)
Floating-point Divide (scalar). This instruction divides the floating-point value of the first source SIMD&FP register by
the floating-point value of the second source SIMD&FP register, and writes the result to the destination SIMD&FP
register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 1 1 0 Rn Rd

(Armv8.2)
FDIV <Hd>, <Hn>, <Hm>
FDIV <Sd>, <Sn>, <Sm>
FDIV <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FDIV (scalar) Page 795

Operation
result = FPDiv(operand1, operand2, FPCR);
V[d] = result;
FDIV (scalar) Page 796

FJCVTZS
Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero. This instruction converts the double-
precision floating-point value in the SIMD&FP source register to a 32-bit signed integer using the Round towards Zero
rounding mode, and writes the result to the general-purpose destination register. If the result is too large to be
accommodated as a signed 32-bit integer, then the result is the integer modulo 232, as held in a 32-bit signed integer.
exception traps.
Double-precision to 32-bit
(Armv8.3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 Rn Rd
sf ftype rmode opcode
FJCVTZS <Wd>, <Dn>
if !HaveFJCVTZSExt() then UNDEFINED;
Assembler Symbols
Operation
bits(64) fltval;
bits(32) intval;
bit Z;
fltval = V[n];
(intval, Z) = FPToFixedJS(fltval, FPCR, TRUE);
PSTATE.<N,Z,C,V> = '0':Z:'00';
X[d] = intval;
FJCVTZS Page 797

FMADD
Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 0 Ra Rn Rd
o1 o0

(Armv8.2)
FMADD <Hd>, <Hn>, <Hm>, <Ha>
FMADD <Sd>, <Sn>, <Sm>, <Sa>
FMADD <Dd>, <Dn>, <Dm>, <Da>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Ha> Is the 16-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
FMADD Page 798

<Sn> Is the 32-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Sm> Is the 32-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
Operation
bits(datasize) operanda = V[a];
result = FPMulAdd(operanda, operand1, operand2, FPCR);
V[d] = result;
FMADD Page 799

FMAX (vector)
Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to
the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
Half-precision
FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

boolean minimum = (o1 == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMAX (vector) Page 800

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
Elem[result, e, esize] = FPMin(element1, element2, FPCR);
else
Elem[result, e, esize] = FPMax(element1, element2, FPCR);
V[d] = result;
FMAX (vector) Page 801

FMAX (scalar)
Floating-point Maximum (scalar). This instruction compares the two source SIMD&FP registers, and writes the larger
of the two floating-point values to the destination SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 0 0 1 0 Rn Rd
op

(Armv8.2)
FMAX <Hd>, <Hn>, <Hm>
FMAX <Sd>, <Sn>, <Sm>
FMAX <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FMAX (scalar) Page 802

Operation
result = FPMax(operand1, operand2, FPCR);

V[d] = result;
FMAX (scalar) Page 803

FMAXNM (vector)
Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
Half-precision
FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

boolean minimum = (a == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMAXNM (vector) Page 804

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR);
else
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR);
V[d] = result;
FMAXNM (vector) Page 805

FMAXNM (scalar)
Floating-point Maximum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the larger of the two floating-point values to the destination SIMD&FP register.
NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 1 0 1 0 Rn Rd
op

(Armv8.2)
FMAXNM <Hd>, <Hn>, <Hm>
FMAXNM <Sd>, <Sn>, <Sm>
FMAXNM <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FMAXNM (scalar) Page 806

Operation
result = FPMaxNum(operand1, operand2, FPCR);

V[d] = result;
FMAXNM (scalar) Page 807

FMAXNMP (scalar)
Floating-point Maximum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the largest of the floating-point values as a scalar to the destination SIMD&FP
register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1 sz
Half-precision
FMAXNMP <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
FMAXNMP <V><d>, <Vn>.<T>
integer esize = 32;

Assembler Symbols
sz <V>
0 H
1 RESERVED
sz <V>
0 S
1 D
FMAXNMP (scalar) Page 808

sz <T>
0 2H
1 RESERVED
“sz”:
sz <T>
0 2S
1 2D
Operation
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize);
FMAXNMP (scalar) Page 809

FMAXNMP (vector)
Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of
values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
floating-point values.
NaN, the result is the numerical value, otherwise the result is identical to FMAX (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
Half-precision
FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

FMAXNMP (vector) Page 810

Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMAXNMP (vector) Page 811

FMAXNMV
Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values
in this instruction are floating-point values.
NaN, the result of the comparison is the numerical value, otherwise the result is identical to FMAX (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
Half-precision
FMAXNMV <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
FMAXNMV <V><d>, <Vn>.<T>
if sz:Q != '01' then UNDEFINED; // .4S only

Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, H.

sz <V>
0 S
1 RESERVED
FMAXNMV Page 812

Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize);
FMAXNMV Page 813

FMAXP (scalar)
Floating-point Maximum of Pair of elements (scalar). This instruction compares two vector elements in the source
SIMD&FP register and writes the largest of the floating-point values as a scalar to the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1 sz
Half-precision
FMAXP <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
FMAXP <V><d>, <Vn>.<T>
integer esize = 32;

Assembler Symbols
sz <V>
0 H
1 RESERVED
sz <V>
0 S
1 D
FMAXP (scalar) Page 814

sz <T>
0 2H
1 RESERVED
“sz”:
sz <T>
0 2S
1 2D
Operation
V[d] = Reduce(ReduceOp_FMAX, operand, esize);
FMAXP (scalar) Page 815

FMAXP (vector)
Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
Half-precision
FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMAXP (vector) Page 816

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMAXP (vector) Page 817

FMAXV
Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
Half-precision
FMAXV <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
FMAXV <V><d>, <Vn>.<T>
if sz:Q != '01' then UNDEFINED;

Assembler Symbols

sz <V>
0 S
1 RESERVED
FMAXV Page 818

Q <T>
0 4H
1 8H
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
V[d] = Reduce(ReduceOp_FMAX, operand, esize);
FMAXV Page 819

FMIN (vector)
Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
Half-precision
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMIN (vector) Page 820

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMIN (vector) Page 821

FMIN (scalar)
Floating-point Minimum (scalar). This instruction compares the first and second source SIMD&FP register values, and
writes the smaller of the two floating-point values to the destination SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 0 1 1 0 Rn Rd
op

(Armv8.2)
FMIN <Hd>, <Hn>, <Hm>
FMIN <Sd>, <Sn>, <Sm>
FMIN <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FMIN (scalar) Page 822

Operation
result = FPMin(operand1, operand2, FPCR);

V[d] = result;
FMIN (scalar) Page 823

FMINNM (vector)
Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the
NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
Half-precision
FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMINNM (vector) Page 824

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMINNM (vector) Page 825

FMINNM (scalar)
Floating-point Minimum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the smaller of the two floating-point values to the destination SIMD&FP register.
NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 1 1 1 0 Rn Rd
op

(Armv8.2)
FMINNM <Hd>, <Hn>, <Hm>
FMINNM <Sd>, <Sn>, <Sm>
FMINNM <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FMINNM (scalar) Page 826

Operation
result = FPMinNum(operand1, operand2, FPCR);
V[d] = result;
FMINNM (scalar) Page 827

FMINNMP (scalar)
Floating-point Minimum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the smallest of the floating-point values as a scalar to the destination SIMD&FP
register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1 sz
Half-precision
FMINNMP <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
FMINNMP <V><d>, <Vn>.<T>
integer esize = 32;

Assembler Symbols
sz <V>
0 H
1 RESERVED
sz <V>
0 S
1 D
FMINNMP (scalar) Page 828

sz <T>
0 2H
1 RESERVED
“sz”:
sz <T>
0 2S
1 2D
Operation
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize);
FMINNMP (scalar) Page 829

FMINNMP (vector)
Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of
floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this
NaN, the result is the numerical value, otherwise the result is identical to FMIN (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
Half-precision
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

FMINNMP (vector) Page 830

Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMINNMP (vector) Page 831

FMINNMV
Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the
values in this instruction are floating-point values.
NaN, the result of the comparison is the numerical value, otherwise the result is identical to FMIN (scalar).
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
Half-precision
FMINNMV <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
FMINNMV <V><d>, <Vn>.<T>
if sz:Q != '01' then UNDEFINED; // .4S only

Assembler Symbols

sz <V>
0 S
1 RESERVED
FMINNMV Page 832

Q <T>
0 4H
1 8H
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize);
FMINNMV Page 833

FMINP (scalar)
Floating-point Minimum of Pair of elements (scalar). This instruction compares two vector elements in the source
SIMD&FP register and writes the smallest of the floating-point values as a scalar to the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1 sz
Half-precision
FMINP <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
FMINP <V><d>, <Vn>.<T>
integer esize = 32;

Assembler Symbols
sz <V>
0 H
1 RESERVED
sz <V>
0 S
1 D
FMINP (scalar) Page 834

sz <T>
0 2H
1 RESERVED
“sz”:
sz <T>
0 2S
1 2D
Operation
V[d] = Reduce(ReduceOp_FMIN, operand, esize);
FMINP (scalar) Page 835

FMINP (vector)
Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
Half-precision
FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
FMINP (vector) Page 836

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if pair then
else
if minimum then
else
V[d] = result;
FMINP (vector) Page 837

FMINV
Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
Half-precision
FMINV <V><d>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
FMINV <V><d>, <Vn>.<T>
if sz:Q != '01' then UNDEFINED;

Assembler Symbols

sz <V>
0 S
1 RESERVED
FMINV Page 838

Q <T>
0 4H
1 8H
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
V[d] = Reduce(ReduceOp_FMIN, operand, esize);
FMINV Page 839

FMLA (by element)
Floating-point fused Multiply-Add to accumulator (by element). This instruction multiplies the vector elements in the
first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
results in the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-point
values.
exception traps.
It has encodings from 4 classes: Scalar, half-precision , Scalar, single-precision and double-precision , Vector, half-
precision and Vector, single-precision and double-precision
Scalar, half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2
FMLA <Hd>, <Hn>, <Vm>.H[<index>]
integer idxdsize = if H == '1' then 128 else 64;

integer esize = 16;

Scalar, single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2
FMLA (by element) Page 840

FMLA <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of
when '0x' index = UInt(H:L);
when '10' index = UInt(H);
integer m = UInt(Rmhi:Rm);

Vector, half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

integer esize = 16;

Vector, single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2

FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

Assembler Symbols
sz <V>
0 S
1 D
<T> For the vector, half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the vector, single-precision and double-precision variant: is an arrangement specifier, encoded in
“Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.
<Ts> Is an element size specifier, encoded in “sz”:
sz <Ts>
0 S
1 D
<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
bits(idxdsize) operand2 = V[m];
bits(esize) element2 = Elem[operand2, index, esize];
if sub_op then element1 = FPNeg(element1);
Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR);
V[d] = result;

FMLA (vector)
Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point
values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of
the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 0 1 1 Rn Rd
a
Half-precision
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
boolean sub_op = (a == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op
FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
boolean sub_op = (op == '1');
Assembler Symbols
FMLA (vector) Page 844

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FMLA (vector) Page 845

FMLAL, FMLAL2 (by element)
Floating-point fused Multiply-Add Long to accumulator (by element). This instruction multiplies the vector elements in
the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round the
result of the multiply before the accumulation.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.
It has encodings from 2 classes: FMLAL and FMLAL2
FMLAL
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 0 0 0 H 0 Rn Rd
sz S
FMLAL
FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]
if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer m = UInt('0':Rm); // Vm can only be in bottom 16 registers.
integer esize = 32;

boolean sub_op = (S == '1');

integer part = 0;
FMLAL2
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 0 0 0 H 0 Rn Rd
sz S
FMLAL, FMLAL2 (by

Page 846
element)
FMLAL2
FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

integer esize = 32;


integer part = 1;
Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 2H
1 4H
<index> Is the element index, encoded in the "H:L:M" fields.
Operation
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(esize DIV 2) element1;
bits(esize DIV 2) element2 = Elem[operand2, index, esize DIV 2];
element1 = Elem[operand1, e, esize DIV 2];
Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR);
V[d] = result;
FMLAL, FMLAL2 (by

Page 847
element)
FMLAL, FMLAL2 (vector)
Floating-point fused Multiply-Add Long to accumulator (vector). This instruction multiplies corresponding half-
precision floating-point values in the vectors in the two source SIMD&FP registers, and accumulates the product to
the corresponding vector element of the destination SIMD&FP register. The instruction does not round the result of
the multiply before the accumulation.
support it.
It has encodings from 2 classes: FMLAL and FMLAL2
FMLAL
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz
FMLAL
FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer esize = 32;
integer part = 0;
FMLAL2
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz
FMLAL2
FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer esize = 32;
integer part = 1;
FMLAL, FMLAL2 (vector) Page 848

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 2H
1 4H
Operation
bits(datasize DIV 2) operand2 = Vpart[m, part];
V[d] = result;
FMLAL, FMLAL2 (vector) Page 849

FMLS (by element)
Floating-point fused Multiply-Subtract from accumulator (by element). This instruction multiplies the vector elements
in the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the
results from the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-
point values.
exception traps.
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2
FMLS <Hd>, <Hn>, <Vm>.H[<index>]

integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2
FMLS (by element) Page 850

FMLS <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2
FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2

FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
sz <Ts>
0 S
1 D

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
V[d] = result;

FMLS (vector)
Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-
point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the
corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP
register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
a
Half-precision
FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
boolean sub_op = (a == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op
FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
boolean sub_op = (op == '1');
Assembler Symbols
FMLS (vector) Page 854

Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FMLS (vector) Page 855

FMLSL, FMLSL2 (by element)
Floating-point fused Multiply-Subtract Long from accumulator (by element). This instruction multiplies the negated
vector elements in the first source SIMD&FP register by the specified value in the second source SIMD&FP register,
and accumulates the product to the corresponding vector element of the destination SIMD&FP register. The
instruction does not round the result of the multiply before the accumulation.
support it.
It has encodings from 2 classes: FMLSL and FMLSL2
FMLSL
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 1 0 0 H 0 Rn Rd
sz S
FMLSL
FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

integer esize = 32;


integer part = 0;
FMLSL2
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 1 0 0 H 0 Rn Rd
sz S
FMLSL, FMLSL2 (by

Page 856
element)
FMLSL2
FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

integer esize = 32;


integer part = 1;
Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 2H
1 4H
<index> Is the element index, encoded in the "H:L:M" fields.
Operation
bits(esize DIV 2) element2 = Elem[operand2, index, esize DIV 2];
V[d] = result;
FMLSL, FMLSL2 (by

Page 857
element)
FMLSL, FMLSL2 (vector)
Floating-point fused Multiply-Subtract Long from accumulator (vector). This instruction negates the values in the
vector of one SIMD&FP register, multiplies these with the corresponding values in another vector, and accumulates
the product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round
the result of the multiply before the accumulation.
support it.
It has encodings from 2 classes: FMLSL and FMLSL2
FMLSL
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz
FMLSL
FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer esize = 32;
integer part = 0;
FMLSL2
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz
FMLSL2
FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer esize = 32;
integer part = 1;
FMLSL, FMLSL2 (vector) Page 858

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 2H
1 4H
Operation
bits(datasize DIV 2) operand2 = Vpart[m, part];
V[d] = result;
FMLSL, FMLSL2 (vector) Page 859

FMOV (vector, immediate)
Floating-point move immediate (vector). This instruction copies an immediate floating-point constant into every
element of the SIMD&FP destination register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 1 1 d e f g h Rd
Half-precision
FMOV <Vd>.<T>, #<imm>

bits(datasize) imm;
imm8 = a:b:c:d:e:f:g:h;
imm16 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>, 2):imm8<5:0>:Zeros(6);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 0 1 d e f g h Rd
cmode
Single-precision (op == 0)
FMOV <Vd>.<T>, #<imm>
Double-precision (Q == 1 && op == 1)
FMOV <Vd>.2D, #<imm>

bits(datasize) imm;
bits(64) imm64;
if cmode:op == '11111' then


Assembler Symbols
FMOV (vector, immediate) Page 860

Q <T>
0 4H
1 8H
For the single-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 2S
1 4S
<imm> Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision, encoded in
"a:b:c:d:e:f:g:h". For details of the range of constants available and the encoding of <imm>, see
Modified immediate constants in A64 floating-point instructions.
Operation
V[rd] = imm;
FMOV (vector, immediate) Page 861

FMOV (register)
Floating-point Move register without conversion. This instruction copies the floating-point value in the SIMD&FP
source register to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 0 1 0 0 0 0 Rn Rd
opc

(Armv8.2)
FMOV <Hd>, <Hn>
FMOV <Sd>, <Sn>
FMOV <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
Operation
V[d] = operand;
FMOV (register) Page 862

FMOV (general)
Floating-point Move to or from general-purpose register without conversion. This instruction transfers the contents of
a SIMD&FP register to a general-purpose register, or the contents of a general-purpose register to a SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 x 1 1 x 0 0 0 0 0 0 Rn Rd
rmode opcode
FMOV (general) Page 863

Half-precision to 32-bit (sf == 0 && ftype == 11 && rmode == 00 && opcode == 110)
(Armv8.2)
FMOV <Wd>, <Hn>
Half-precision to 64-bit (sf == 1 && ftype == 11 && rmode == 00 && opcode == 110)
(Armv8.2)
FMOV <Xd>, <Hn>
32-bit to half-precision (sf == 0 && ftype == 11 && rmode == 00 && opcode == 111)
(Armv8.2)
FMOV <Hd>, <Wn>
32-bit to single-precision (sf == 0 && ftype == 00 && rmode == 00 && opcode == 111)
FMOV <Sd>, <Wn>
Single-precision to 32-bit (sf == 0 && ftype == 00 && rmode == 00 && opcode == 110)
FMOV <Wd>, <Sn>
64-bit to half-precision (sf == 1 && ftype == 11 && rmode == 00 && opcode == 111)
(Armv8.2)
FMOV <Hd>, <Xn>
64-bit to double-precision (sf == 1 && ftype == 01 && rmode == 00 && opcode == 111)
FMOV <Dd>, <Xn>
64-bit to top half of 128-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 111)
FMOV <Vd>.D[1], <Xn>
Double-precision to 64-bit (sf == 1 && ftype == 01 && rmode == 00 && opcode == 110)
FMOV <Xd>, <Dn>
Top half of 128-bit to 64-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 110)
FMOV <Xd>, <Vn>.D[1]


integer fltsize;
FPConvOp op;
boolean unsigned;
integer part;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
if opcode<2:1>:rmode != '11 01' then UNDEFINED;
fltsize = 128;
when '11'
fltsize = 16;
else
UNDEFINED;
case opcode<2:1>:rmode of
when '00 xx' // FCVT[NPMZ][US]
unsigned = (opcode<0> == '1');
op = FPConvOp_CVT_FtoI;
when '01 00' // [US]CVTF
rounding = FPRoundingMode(FPCR);
op = FPConvOp_CVT_ItoF;
when '10 00' // FCVTA[US]
rounding = FPRounding_TIEAWAY;
op = FPConvOp_CVT_FtoI;
when '11 00' // FMOV
if fltsize != 16 && fltsize != intsize then UNDEFINED;
op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
part = 0;
when '11 01' // FMOV D[1]
if intsize != 64 || fltsize != 128 then UNDEFINED;
op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
part = 1;
fltsize = 64; // size of D[1] is 64
when '11 11' // FJCVTZS
if !HaveFJCVTZSExt() then UNDEFINED;
rounding = FPRounding_ZERO;
op = FPConvOp_CVT_FtoI_JS;
otherwise
UNDEFINED;
Assembler Symbols

Operation
case op of
when FPConvOp_CVT_FtoI
fltval = V[n];
intval = FPToFixed(fltval, 0, unsigned, FPCR, rounding);
X[d] = intval;
when FPConvOp_CVT_ItoF
intval = X[n];
fltval = FixedToFP(intval, 0, unsigned, FPCR, rounding);
V[d] = fltval;
when FPConvOp_MOV_FtoI
fltval = Vpart[n, part];
intval = ZeroExtend(fltval, intsize);
X[d] = intval;
when FPConvOp_MOV_ItoF
intval = X[n];
fltval = intval<fltsize-1:0>;
Vpart[d, part] = fltval;
when FPConvOp_CVT_FtoI_JS
bit Z;
fltval = V[n];
(intval, Z) = FPToFixedJS(fltval, FPCR, TRUE);
PSTATE.<N,Z,C,V> = '0':Z:'00';
X[d] = intval;

FMOV (scalar, immediate)
Floating-point move immediate (scalar). This instruction copies a floating-point immediate constant into the SIMD&FP
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 imm8 1 0 0 0 0 0 0 0 Rd

(Armv8.2)
FMOV <Hd>, #<imm>
FMOV <Sd>, #<imm>
FMOV <Dd>, #<imm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
bits(datasize) imm = VFPExpandImm(imm8);
Assembler Symbols
<imm> Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision, encoded in
the "imm8" field. For details of the range of constants available and the encoding of <imm>, see
Modified immediate constants in A64 floating-point instructions.
Operation
V[d] = imm;
FMOV (scalar, immediate) Page 867

FMSUB
Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to
the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 1 Ra Rn Rd
o1 o0

(Armv8.2)
FMSUB <Hd>, <Hn>, <Hm>, <Ha>
FMSUB <Sd>, <Sn>, <Sm>, <Sa>
FMSUB <Dd>, <Dn>, <Dm>, <Da>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
field.
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
field.
field.
FMSUB Page 868

<Ha> Is the 16-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
field.
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
Operation
operand1 = FPNeg(operand1);
V[d] = result;
FMSUB Page 869

FMUL (by element)
Floating-point Multiply (by element). This instruction multiplies the vector elements in the first source SIMD&FP
register by the specified value in the second source SIMD&FP register, places the results in a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
exception traps.
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
FMUL <Hd>, <Hn>, <Vm>.H[<index>]

integer esize = 16;

boolean mulx_op = (U == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
FMUL (by element) Page 870

FMUL <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
sz <Ts>
0 S
1 D

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
if mulx_op then
Elem[result, e, esize] = FPMulX(element1, element2, FPCR);
else
Elem[result, e, esize] = FPMul(element1, element2, FPCR);
V[d] = result;

FMUL (vector)
Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the
two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
Half-precision
FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 4H
1 8H
FMUL (vector) Page 874

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FMUL (vector) Page 875

FMUL (scalar)
Floating-point Multiply (scalar). This instruction multiplies the floating-point values of the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 0 1 0 Rn Rd
op

(Armv8.2)
FMUL <Hd>, <Hn>, <Hm>
FMUL <Sd>, <Sn>, <Sm>
FMUL <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FMUL (scalar) Page 876

Operation
result = FPMul(operand1, operand2, FPCR);
V[d] = result;
FMUL (scalar) Page 877

FMULX (by element)
Floating-point Multiply extended (by element). This instruction multiplies the floating-point values in the vector
elements in the first source SIMD&FP register by the specified floating-point value in the second source SIMD&FP
register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
exception traps.
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
FMULX <Hd>, <Hn>, <Vm>.H[<index>]

integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
FMULX (by element) Page 878

FMULX <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi = M;
case sz:L of

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
sz <Ts>
0 S
1 D

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
if mulx_op then
else
V[d] = result;

FMULX
Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the
two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
FMULX <Hd>, <Hn>, <Hm>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
FMULX <V><d>, <V><n>, <V><m>

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
FMULX Page 882

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FMULX Page 883

Operation
V[d] = result;
FMULX Page 884

FNEG (vector)
Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP
register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U
Half-precision
FNEG <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U
FNEG <Vd>.<T>, <Vn>.<T>

Assembler Symbols
Q <T>
0 4H
1 8H
FNEG (vector) Page 885

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
if neg then
element = FPNeg(element);
else
element = FPAbs(element);
V[d] = result;
FNEG (vector) Page 886

FNEG (scalar)
Floating-point Negate (scalar). This instruction negates the value in the SIMD&FP source register and writes the
result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 0 1 0 0 0 0 Rn Rd
opc

(Armv8.2)
FNEG <Hd>, <Hn>
FNEG <Sd>, <Sn>
FNEG <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
Operation
result = FPNeg(operand);
V[d] = result;
FNEG (scalar) Page 887

FNEG (scalar) Page 888

FNMADD
Floating-point Negated fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP
source registers, negates the product, subtracts the value of the third SIMD&FP source register, and writes the result
to the destination SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 0 Ra Rn Rd
o1 o0

(Armv8.2)
FNMADD <Hd>, <Hn>, <Hm>, <Ha>
FNMADD <Sd>, <Sn>, <Sm>, <Sa>
FNMADD <Dd>, <Dn>, <Dm>, <Da>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
field.
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
field.
field.
<Ha> Is the 16-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
FNMADD Page 889

field.
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
Operation
operanda = FPNeg(operanda);
operand1 = FPNeg(operand1);
V[d] = result;
FNMADD Page 890

FNMSUB
Floating-point Negated fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two
SIMD&FP source registers, subtracts the value of the third SIMD&FP source register, and writes the result to the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 1 Ra Rn Rd
o1 o0

(Armv8.2)
FNMSUB <Hd>, <Hn>, <Hm>, <Ha>
FNMSUB <Sd>, <Sn>, <Sm>, <Sa>
FNMSUB <Dd>, <Dn>, <Dm>, <Da>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
field.
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
field.
field.
FNMSUB Page 891

<Ha> Is the 16-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
field.
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
Operation
operanda = FPNeg(operanda);
V[d] = result;
FNMSUB Page 892

FNMUL (scalar)
Floating-point Multiply-Negate (scalar). This instruction multiplies the floating-point values of the two source
SIMD&FP registers, and writes the negation of the result to the destination SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 1 0 0 0 1 0 Rn Rd
op

(Armv8.2)
FNMUL <Hd>, <Hn>, <Hm>
FNMUL <Sd>, <Sn>, <Sm>
FNMUL <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FNMUL (scalar) Page 893

Operation
result = FPMul(operand1, operand2, FPCR);
result = FPNeg(result);
V[d] = result;
FNMUL (scalar) Page 894

FRECPE
Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element
in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
FRECPE <Hd>, <Hn>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
FRECPE <V><d>, <V><n>


(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
FRECPE Page 895

FRECPE <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
FRECPE <Vd>.<T>, <Vn>.<T>

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FRECPE Page 896

Operation
Elem[result, e, esize] = FPRecipEstimate(element, FPCR);
V[d] = result;
FRECPE Page 897

FRECPS
Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the
two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a
vector, and writes the vector to the destination SIMD&FP register.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
FRECPS <Hd>, <Hn>, <Hm>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
FRECPS <V><d>, <V><n>, <V><m>

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
FRECPS Page 898

FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FRECPS Page 899

Operation
Elem[result, e, esize] = FPRecipStepFused(element1, element2);
V[d] = result;
FRECPS Page 900

FRECPX
Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector
element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination
SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd
Half-precision
FRECPX <Hd>, <Hn>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
FRECPX <V><d>, <V><n>

Assembler Symbols
sz <V>
0 S
1 D
FRECPX Page 901

Operation
Elem[result, e, esize] = FPRecpX(element, FPCR);
V[d] = result;
FRECPX Page 902

FRINT32X (vector)
Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.

(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 0 1 0 Rn Rd
U op
FRINT32X <Vd>.<T>, <Vn>.<T>
if !HaveFrintExt() then UNDEFINED;


integer intsize = if op == '0' then 32 else 64;
FPRounding rounding = if U == '0' then FPRounding_ZERO else FPRoundingMode(FPCR);
Assembler Symbols
<T> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
Elem[result, e, esize] = FPRoundIntN(element, FPCR, rounding, intsize);
V[d] = result;
FRINT32X (vector) Page 903


FRINT32X (scalar)
Floating-point Round to 32-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns {for the
corresponding result value} the most negative integer representable in the destination size, and an Invalid Operation
floating-point exception is raised.
Floating-point
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 0 1 1 0 0 0 0 Rn Rd
ftype op
FRINT32X <Sd>, <Sn>
FRINT32X <Dd>, <Dn>

integer datasize;
case ftype of
when '1x' UNDEFINED;
FPRounding rounding = FPRoundingMode(FPCR);
Assembler Symbols
Operation
result = FPRoundIntN(operand, FPCR, rounding, 32);
V[d] = result;
FRINT32X (scalar) Page 905


FRINT32Z (vector)
Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.

(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 0 1 0 Rn Rd
U op
FRINT32Z <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINT32Z (vector) Page 907


FRINT32Z (scalar)
Floating-point Round to 32-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
{corresponding} input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns {for the corresponding result value} the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
Floating-point
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 0 0 1 0 0 0 0 Rn Rd
ftype op
FRINT32Z <Sd>, <Sn>
FRINT32Z <Dd>, <Dn>

integer datasize;
case ftype of
Assembler Symbols
Operation
result = FPRoundIntN(operand, FPCR, FPRounding_ZERO, 32);
V[d] = result;
FRINT32Z (scalar) Page 909


FRINT64X (vector)
Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.

(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
U op
FRINT64X <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;


FRINT64X (scalar)
Floating-point Round to 64-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns {for the
corresponding result value} the most negative integer representable in the destination size, and an Invalid Operation
floating-point exception is raised.
Floating-point
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 1 1 1 0 0 0 0 Rn Rd
ftype op
FRINT64X <Sd>, <Sn>
FRINT64X <Dd>, <Dn>

integer datasize;
case ftype of
Assembler Symbols
Operation
result = FPRoundIntN(operand, FPCR, rounding, 64);
V[d] = result;


FRINT64Z (vector)
Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.

(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
U op
FRINT64Z <Vd>.<T>, <Vn>.<T>


Assembler Symbols
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;


FRINT64Z (scalar)
Floating-point Round to 64-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
{corresponding} input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns {for the corresponding result value} the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
Floating-point
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 1 0 1 0 0 0 0 Rn Rd
ftype op
FRINT64Z <Sd>, <Sn>
FRINT64Z <Dd>, <Dn>

integer datasize;
case ftype of
Assembler Symbols
Operation
result = FPRoundIntN(operand, FPCR, FPRounding_ZERO, 64);
V[d] = result;


FRINTA (vector)
Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-
point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to
Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
Half-precision
FRINTA <Vd>.<T>, <Vn>.<T>
integer esize = 16;

boolean exact = FALSE;

case U:o1:o2 of
when '0xx' rounding = FPDecodeRounding(o1:o2);
when '100' rounding = FPRounding_TIEAWAY;
when '110' rounding = FPRoundingMode(FPCR); exact = TRUE;
when '111' rounding = FPRoundingMode(FPCR);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
FRINTA (vector) Page 919

FRINTA <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
Elem[result, e, esize] = FPRoundInt(element, FPCR, rounding, exact);
V[d] = result;
FRINTA (vector) Page 920

FRINTA (scalar)
Floating-point Round to Integral, to nearest with ties to Away (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest with Ties
to Away rounding mode, and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 0 0 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTA <Hd>, <Hn>
FRINTA <Sd>, <Sn>
FRINTA <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTA (scalar) Page 921

Operation
result = FPRoundInt(operand, FPCR, FPRounding_TIEAWAY, FALSE);
V[d] = result;
FRINTA (scalar) Page 922

FRINTI (vector)
Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-
point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding
mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
Half-precision
FRINTI <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
FRINTI (vector) Page 923

FRINTI <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTI (vector) Page 924

FRINTI (scalar)
Floating-point Round to Integral, using current rounding mode (scalar). This instruction rounds a floating-point value
in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode that is
determined by the FPCR, and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 1 1 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTI <Hd>, <Hn>
FRINTI <Sd>, <Sn>
FRINTI <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTI (scalar) Page 925

Operation
result = FPRoundInt(operand, FPCR, rounding, FALSE);
V[d] = result;
FRINTI (scalar) Page 926

FRINTM (vector)
Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards
Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
Half-precision
FRINTM <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
FRINTM (vector) Page 927

FRINTM <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTM (vector) Page 928

FRINTM (scalar)
Floating-point Round to Integral, toward Minus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Minus Infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 0 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTM <Hd>, <Hn>
FRINTM <Sd>, <Sn>
FRINTM <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding('10');
Assembler Symbols
FRINTM (scalar) Page 929

Operation
V[d] = result;
FRINTM (scalar) Page 930

FRINTN (vector)
Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
Half-precision
FRINTN <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
FRINTN (vector) Page 931

FRINTN <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTN (vector) Page 932

FRINTN (scalar)
Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 0 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTN <Hd>, <Hn>
FRINTN <Sd>, <Sn>
FRINTN <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTN (scalar) Page 933

Operation
V[d] = result;
FRINTN (scalar) Page 934

FRINTP (vector)
Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values
in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
Half-precision
FRINTP <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
FRINTP (vector) Page 935

FRINTP <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTP (vector) Page 936

FRINTP (scalar)
Floating-point Round to Integral, toward Plus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Plus Infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 1 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTP <Hd>, <Hn>
FRINTP <Sd>, <Sn>
FRINTP <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTP (scalar) Page 937

Operation
V[d] = result;
FRINTP (scalar) Page 938

FRINTX (vector)
Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised. A zero
input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is
propagated as for normal arithmetic.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
Half-precision
FRINTX <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
FRINTX (vector) Page 939

FRINTX <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTX (vector) Page 940

FRINTX (scalar)
Floating-point Round to Integral exact, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode
that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
When the result value is not numerically equal to the input value, an Inexact exception is raised. A zero input gives a
zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as
for normal arithmetic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 1 0 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTX <Hd>, <Hn>
FRINTX <Sd>, <Sn>
FRINTX <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTX (scalar) Page 941

Operation
result = FPRoundInt(operand, FPCR, rounding, TRUE);
V[d] = result;
FRINTX (scalar) Page 942

FRINTZ (vector)
Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the
SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
Half-precision
FRINTZ <Vd>.<T>, <Vn>.<T>
integer esize = 16;


case U:o1:o2 of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
FRINTZ (vector) Page 943

FRINTZ <Vd>.<T>, <Vn>.<T>


case U:o1:o2 of
Assembler Symbols
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
V[d] = result;
FRINTZ (vector) Page 944

FRINTZ (scalar)
Floating-point Round to Integral, toward Zero (scalar). This instruction rounds a floating-point value in the SIMD&FP
source register to an integral floating-point value of the same size using the Round towards Zero rounding mode, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 1 1 0 0 0 0 Rn Rd
rmode

(Armv8.2)
FRINTZ <Hd>, <Hn>
FRINTZ <Sd>, <Sn>
FRINTZ <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FRINTZ (scalar) Page 945

Operation
V[d] = result;
FRINTZ (scalar) Page 946

FRSQRTE
Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate square root for each
vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination
SIMD&FP register.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
FRSQRTE <Hd>, <Hn>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
FRSQRTE <V><d>, <V><n>


(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
FRSQRTE Page 947

FRSQRTE <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
FRSQRTE <Vd>.<T>, <Vn>.<T>

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FRSQRTE Page 948

Operation
Elem[result, e, esize] = FPRSqrtEstimate(element, FPCR);
V[d] = result;
FRSQRTE Page 949

FRSQRTS
Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the
vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0,
places the results into a vector, and writes the vector to the destination SIMD&FP register.
exception traps.

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd
FRSQRTS <Hd>, <Hn>, <Hm>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd
FRSQRTS <V><d>, <V><n>, <V><m>

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd
FRSQRTS Page 950

FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd
FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
FRSQRTS Page 951

Operation
Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);
V[d] = result;
FRSQRTS Page 952

FSQRT (vector)
Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source
SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd
Half-precision
FSQRT <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
FSQRT <Vd>.<T>, <Vn>.<T>

Assembler Symbols
Q <T>
0 4H
1 8H
FSQRT (vector) Page 953

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
Elem[result, e, esize] = FPSqrt(element, FPCR);
V[d] = result;
FSQRT (vector) Page 954

FSQRT (scalar)
Floating-point Square Root (scalar). This instruction calculates the square root of the value in the SIMD&FP source
register and writes the result to the SIMD&FP destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 1 1 0 0 0 0 Rn Rd
opc

(Armv8.2)
FSQRT <Hd>, <Hn>
FSQRT <Sd>, <Sn>
FSQRT <Dd>, <Dn>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FSQRT (scalar) Page 955

Operation
result = FPSqrt(operand, FPCR);
V[d] = result;
FSQRT (scalar) Page 956

FSUB (vector)
Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP
register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into
elements of a vector, and writes the vector to the destination SIMD&FP register.
exception traps.
Half-precision
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
Half-precision
FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 4H
1 8H
FSUB (vector) Page 957

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
Operation
bits(esize) diff;
diff = FPSub(element1, element2, FPCR);
Elem[result, e, esize] = if abs then FPAbs(diff) else diff;
V[d] = result;
FSUB (vector) Page 958

FSUB (scalar)
Floating-point Subtract (scalar). This instruction subtracts the floating-point value of the second source SIMD&FP
register from the floating-point value of the first source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
exception traps.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 1 1 0 Rn Rd
op

(Armv8.2)
FSUB <Hd>, <Hn>, <Hm>
FSUB <Sd>, <Sn>, <Sm>
FSUB <Dd>, <Dn>, <Dm>
integer datasize;
case ftype of
when '11'
datasize = 16;
else
UNDEFINED;
Assembler Symbols
FSUB (scalar) Page 959

Operation
result = FPSub(operand1, operand2, FPCR);

V[d] = result;
FSUB (scalar) Page 960

INS (element)
Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
This instruction is used by the alias MOV (element).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd
Advanced SIMD
INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]

integer dst_index = UInt(imm5<4:size+1>);

integer src_index = UInt(imm4<3:size>);
// imm4<size-1:0> is IGNORED
Assembler Symbols
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index1> Is the destination element index encoded in “imm5”:

imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<index2> Is the source element index encoded in “imm5:imm4”:
imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
INS (element) Page 961

Operation
bits(128) result;
result = V[d];
Elem[result, dst_index, esize] = Elem[operand, src_index, esize];
V[d] = result;
If PSTATE.DIT is 1:
INS (element) Page 962

INS (general)
Insert vector element from general-purpose register. This instruction copies the contents of the source general-
purpose register to the specified vector element in the destination SIMD&FP register.
bits to zero.
This instruction is used by the alias MOV (from general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 0 imm5 0 0 0 1 1 1 Rn Rd
Advanced SIMD
INS <Vd>.<Ts>[<index>], <R><n>

Assembler Symbols
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
INS (general) Page 963

Operation
bits(esize) element = X[n];
bits(128) result;
result = V[d];
Elem[result, index, esize] = element;
V[d] = result;
If PSTATE.DIT is 1:
INS (general) Page 964

LD1 (multiple structures)
Load multiple single-element structures to one, two, three, or four registers. This instruction loads multiple single-
element structures from memory and writes the result to one, two, three, or four SIMD&FP registers.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 x x 1 x size Rn Rt
L opcode
One register (opcode == 0111)
LD1 { <Vt>.<T> }, [<Xn|SP>]
Two registers (opcode == 1010)
LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Three registers (opcode == 0110)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
Four registers (opcode == 0010)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
integer m = integer UNKNOWN;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm x x 1 x size Rn Rt
L opcode
LD1 (multiple structures) Page 965

One register, immediate offset (Rm == 11111 && opcode == 0111)
LD1 { <Vt>.<T> }, [<Xn|SP>], <imm>
One register, register offset (Rm != 11111 && opcode == 0111)
LD1 { <Vt>.<T> }, [<Xn|SP>], <Xm>
Two registers, immediate offset (Rm == 11111 && opcode == 1010)
LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
Two registers, register offset (Rm != 11111 && opcode == 1010)
LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
Three registers, immediate offset (Rm == 11111 && opcode == 0110)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
Three registers, register offset (Rm != 11111 && opcode == 0110)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>
Four registers, immediate offset (Rm == 11111 && opcode == 0010)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
Four registers, register offset (Rm != 11111 && opcode == 0010)
LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #8
1 #16
For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #24
1 #48
For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #32
1 #64
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer rpt; // number of iterations

integer selem; // structure elements
case opcode of
when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
for s = 0 to selem-1
rval = V[tt];
if memop == MemOp_LOAD then
Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[tt] = rval;
else // memop == MemOp_STORE
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
offs = offs + ebytes;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;

LD1 (single structure)
Load one single-element structure to one lane of one register. This instruction loads a single-element structure from
memory and writes the result to the specified lane of the SIMD&FP register without affecting the other bits of the
register.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
8-bit (opcode == 000)
LD1 { <Vt>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 010 && size == x0)
LD1 { <Vt>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 100 && size == 00)
LD1 { <Vt>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 100 && S == 0 && size == 01)
LD1 { <Vt>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 0 S size Rn Rt
L R opcode
LD1 (single structure) Page 969

8-bit, immediate offset (Rm == 11111 && opcode == 000)
LD1 { <Vt>.B }[<index>], [<Xn|SP>], #1
8-bit, register offset (Rm != 11111 && opcode == 000)
LD1 { <Vt>.B }[<index>], [<Xn|SP>], <Xm>
16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)
LD1 { <Vt>.H }[<index>], [<Xn|SP>], #2
16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)
LD1 { <Vt>.H }[<index>], [<Xn|SP>], <Xm>
32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)
LD1 { <Vt>.S }[<index>], [<Xn|SP>], #4
32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)
LD1 { <Vt>.S }[<index>], [<Xn|SP>], <Xm>
64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)
LD1 { <Vt>.D }[<index>], [<Xn|SP>], #8
64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)
LD1 { <Vt>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".

Shared Decode
integer scale = UInt(opcode<2:1>);

integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;
case scale of
when 3
// load and replicate
if L == '0' || S == '1' then UNDEFINED;
scale = UInt(size);
replicate = TRUE;
when 0
index = UInt(Q:S:size); // B[0-15]
when 1
if size<0> == '1' then UNDEFINED;
index = UInt(Q:S:size<1>); // H[0-7]
when 2
if size<0> == '0' then
index = UInt(Q:S); // S[0-3]
else
if S == '1' then UNDEFINED;
index = UInt(Q); // D[0-1]
scale = 3;

integer esize = 8 << scale;

Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
// load and replicate to all elements
element = Mem[address+offs, ebytes, AccType_VEC];
// replicate to fill 128- or 64-bit register
V[t] = Replicate(element, datasize DIV esize);
t = (t + 1) MOD 32;
else
// load/store one element per register
rval = V[t];
// insert into one lane of 128-bit register
Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[t] = rval;
// extract from one lane of 128-bit register
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

LD1R
Load one single-element structure and Replicate to all lanes (of one register). This instruction loads a single-element
structure from memory and replicates the structure to all the lanes of the SIMD&FP register.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S
No offset
LD1R { <Vt>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 0 0 size Rn Rt
L R opcode S
Immediate offset (Rm == 11111)
LD1R { <Vt>.<T> }, [<Xn|SP>], <imm>
Register offset (Rm != 11111)
LD1R { <Vt>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
LD1R Page 973

<imm> Is the post-index immediate offset, encoded in “size”:
size <imm>
00 #1
01 #2
10 #4
11 #8
Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;

LD1R Page 974

Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else
LD1R Page 975

Load multiple 2-element structures to two registers. This instruction loads multiple 2-element structures from memory
and writes the result to the two SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 size Rn Rt
L opcode
No offset
LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 1 0 0 0 size Rn Rt
L opcode
LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<imm> Is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Load single 2-element structure to one lane of two registers. This instruction loads a 2-element structure from memory
and writes the result to the corresponding elements of the two SIMD&FP registers without affecting the other bits of
the registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 010 && size == x0)
LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 100 && size == 00)
LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 100 && S == 0 && size == 01)
LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 0 S size Rn Rt
L R opcode

LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #2
LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>
LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #4
LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>
LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #8
LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>
LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #16
LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

LD2R
Load single 2-element structure and Replicate to all lanes of two registers. This instruction loads a 2-element structure
from memory and replicates the structure to all the lanes of the two SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S
No offset
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 0 0 size Rn Rt
L R opcode S
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
LD2R Page 983

size <imm>
00 #2
01 #4
10 #8
11 #16
Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;

LD2R Page 984

Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else
LD2R Page 985

Load multiple 3-element structures to three registers. This instruction loads multiple 3-element structures from
memory and writes the result to the three SIMD&FP registers, with de-interleaving.
The following figure shows an example of the operation of de-interleaving of a LD3.16 (multiple 3-element structures)
instruction:.
Memory
A[0].x
A[0].y
A[0].z
A[1].x
A[1].y
A is a packed array of
A[1].z
3-element structures.
A[2].x
Each element is a 16-bit
A[2].y
halfword. A[2].z
A[3].x
A[3].y X3X2X1X0D0
A[3].z Y3Y2Y1Y0D1 Registers
Z3Z2Z1Z0D2
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 size Rn Rt
L opcode
No offset
LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 1 0 0 size Rn Rt
L opcode
LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Q <imm>
0 #24
1 #48
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Load single 3-element structure to one lane of three registers). This instruction loads a 3-element structure from
memory and writes the result to the corresponding elements of the three SIMD&FP registers without affecting the
other bits of the registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 011 && size == x0)
LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 101 && size == 00)
LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 101 && S == 0 && size == 01)
LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 1 S size Rn Rt
L R opcode

LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3
LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>
LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6
LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>
LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12
LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>
LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24
LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

LD3R
Load single 3-element structure and Replicate to all lanes of three registers. This instruction loads a 3-element
structure from memory and replicates the structure to all the lanes of the three SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S
No offset
LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 1 0 size Rn Rt
L R opcode S
LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
LD3R Page 993

size <imm>
00 #3
01 #6
10 #12
11 #24
Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;

LD3R Page 994

Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else
LD3R Page 995

Load multiple 4-element structures to four registers. This instruction loads multiple 4-element structures from
memory and writes the result to the four SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 size Rn Rt
L opcode
No offset
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 0 0 0 size Rn Rt
L opcode
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

Q <imm>
0 #32
1 #64
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Load single 4-element structure to one lane of four registers. This instruction loads a 4-element structure from
memory and writes the result to the corresponding elements of the four SIMD&FP registers without affecting the
other bits of the registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 011 && size == x0)
LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 101 && size == 00)
LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 101 && S == 0 && size == 01)
LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 1 S size Rn Rt
L R opcode

LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4
LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>
LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8
LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>
LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16
LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>
LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32
LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

LD4R
Load single 4-element structure and Replicate to all lanes of four registers. This instruction loads a 4-element
structure from memory and replicates the structure to all the lanes of the four SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S
No offset
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 1 0 size Rn Rt
L R opcode S
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
LD4R Page 1003

size <imm>
00 #4
01 #8
10 #16
11 #32
Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;

LD4R Page 1004

Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else
LD4R Page 1005

LDNP (SIMD&FP)
Load Pair of SIMD&FP registers, with Non-temporal hint. This instruction loads a pair of SIMD&FP registers from
memory, issuing a hint to the memory system that the access is non-temporal. The address that is used for the load is
calculated from a base register value and an optional immediate offset.
For information about non-temporal pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 0 1 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
LDNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 01)
LDNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
128-bit (opc == 10)
LDNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]
// Empty.
UNPREDICTABLE behaviors, and particularly LDNP (SIMD&FP).
Assembler Symbols
<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
For the 128-bit variant: is the optional signed immediate byte offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
Shared Decode
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
LDNP (SIMD&FP) Page 1006

Operation
bits(64) address;
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = Mem[address, dbytes, AccType_VECSTREAM];

data2 = Mem[address+dbytes, dbytes, AccType_VECSTREAM];
if rt_unknown then
V[t] = data1;
V[t2] = data2;
LDNP (SIMD&FP) Page 1007

LDP (SIMD&FP)
Load Pair of SIMD&FP registers. This instruction loads a pair of SIMD&FP registers from memory. The address that is
used for the load is calculated from a base register value and an optional immediate offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 1 1 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
LDP <St1>, <St2>, [<Xn|SP>], #<imm>
64-bit (opc == 01)
LDP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>
128-bit (opc == 10)
LDP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 1 1 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
LDP <St1>, <St2>, [<Xn|SP>, #<imm>]!
64-bit (opc == 01)
LDP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!
128-bit (opc == 10)
LDP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!

Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 0 1 imm7 Rt2 Rn Rt
L
LDP (SIMD&FP) Page 1008

32-bit (opc == 00)
LDP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 01)
LDP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
128-bit (opc == 10)
LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

UNPREDICTABLE behaviors, and particularly LDP (SIMD&FP).
Assembler Symbols
For the 128-bit post-index and 128-bit pre-index variant: is the signed immediate byte offset, a multiple
of 16 in the range -1024 to 1008, encoded in the "imm7" field as <imm>/16.
Shared Decode

Operation
bits(64) address;
if t == t2 then
case c of
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
data1 = Mem[address, dbytes, AccType_VEC];

data2 = Mem[address+dbytes, dbytes, AccType_VEC];
if rt_unknown then
V[t] = data1;
V[t2] = data2;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;

LDR (immediate, SIMD&FP)
Load SIMD&FP Register (immediate offset). This instruction loads an element from memory, and writes the result as a
scalar to the SIMD&FP register. The address that is used for the load is calculated from a base register value, a signed
immediate offset, and an optional offset that is a multiple of the element size.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 0 1 Rn Rt
opc
8-bit (size == 00 && opc == 01)
LDR <Bt>, [<Xn|SP>], #<simm>
16-bit (size == 01 && opc == 01)
LDR <Ht>, [<Xn|SP>], #<simm>
32-bit (size == 10 && opc == 01)
LDR <St>, [<Xn|SP>], #<simm>
64-bit (size == 11 && opc == 01)
LDR <Dt>, [<Xn|SP>], #<simm>
128-bit (size == 00 && opc == 11)
LDR <Qt>, [<Xn|SP>], #<simm>

integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 1 1 Rn Rt
opc
LDR (immediate, SIMD&FP) Page 1011

8-bit (size == 00 && opc == 01)
LDR <Bt>, [<Xn|SP>, #<simm>]!
16-bit (size == 01 && opc == 01)
LDR <Ht>, [<Xn|SP>, #<simm>]!
32-bit (size == 10 && opc == 01)
LDR <St>, [<Xn|SP>, #<simm>]!
64-bit (size == 11 && opc == 01)
LDR <Dt>, [<Xn|SP>, #<simm>]!
128-bit (size == 00 && opc == 11)
LDR <Qt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 1 x 1 imm12 Rn Rt
opc
8-bit (size == 00 && opc == 01)
LDR <Bt>, [<Xn|SP>{, #<pimm>}]
16-bit (size == 01 && opc == 01)
LDR <Ht>, [<Xn|SP>{, #<pimm>}]
32-bit (size == 10 && opc == 01)
LDR <St>, [<Xn|SP>{, #<pimm>}]
64-bit (size == 11 && opc == 01)
LDR <Dt>, [<Xn|SP>{, #<pimm>}]
128-bit (size == 00 && opc == 11)
LDR <Qt>, [<Xn|SP>{, #<pimm>}]


Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<pimm> For the 8-bit variant: is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to
0 and encoded in the "imm12" field.
Shared Decode
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
case memop of
when MemOp_STORE
data = V[t];
Mem[address, datasize DIV 8, AccType_VEC] = data;
when MemOp_LOAD
data = Mem[address, datasize DIV 8, AccType_VEC];
V[t] = data;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;

LDR (literal, SIMD&FP)
Load SIMD&FP Register (PC-relative literal). This instruction loads a SIMD&FP register from memory. The address
that is used for the load is calculated from the PC value and an immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 0 1 1 1 0 0 imm19 Rt
32-bit (opc == 00)
LDR <St>, <label>
64-bit (opc == 01)
LDR <Dt>, <label>
128-bit (opc == 10)
LDR <Qt>, <label>
integer size;
bits(64) offset;
case opc of
when '00'
size = 4;
when '01'
size = 8;
when '10'
size = 16;
when '11'
UNDEFINED;
Assembler Symbols
<Dt> Is the 64-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
Operation

bits(size*8) data;
data = Mem[address, size, AccType_VEC];

V[t] = data;
LDR (literal, SIMD&FP) Page 1015

LDR (literal, SIMD&FP) Page 1016

LDR (register, SIMD&FP)
Load SIMD&FP Register (register offset). This instruction loads a SIMD&FP register from memory. The address that is
used for the load is calculated from a base register value and an offset register value. The offset can be optionally
shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 1 Rm option S 1 0 Rn Rt
opc
8-fsreg,LDR-8-fsreg (size == 00 && opc == 01 && option != 011)
LDR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
8-fsreg,LDR-8-fsreg (size == 00 && opc == 01 && option == 011)
LDR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]
16-fsreg,LDR-16-fsreg (size == 01 && opc == 01)
LDR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
LDR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
LDR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
LDR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:
LDR (register, SIMD&FP) Page 1017

option <extend>
010 UXTW
110 SXTW
111 SXTX
For the 128-bit, 16-bit, 32-bit and 64-bit variant: is the index extend/shift specifier, defaulting to LSL,
and which must be omitted for the LSL option when <amount> is omitted. encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 8-bit variant: is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if
present.
S <amount>
0 #0
1 #1
S <amount>
0 #0
1 #2
S <amount>
0 #0
1 #3
S <amount>
0 #0
1 #4
Shared Decode

Operation

bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = V[t];
when MemOp_LOAD
V[t] = data;

LDUR (SIMD&FP)
Load SIMD&FP Register (unscaled offset). This instruction loads a SIMD&FP register from memory. The address that
is used for the load is calculated from a base register value and an optional immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 0 0 Rn Rt
opc
8-bit (size == 00 && opc == 01)
LDUR <Bt>, [<Xn|SP>{, #<simm>}]
16-bit (size == 01 && opc == 01)
LDUR <Ht>, [<Xn|SP>{, #<simm>}]
32-bit (size == 10 && opc == 01)
LDUR <St>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11 && opc == 01)
LDUR <Dt>, [<Xn|SP>{, #<simm>}]
128-bit (size == 00 && opc == 11)
LDUR <Qt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode
LDUR (SIMD&FP) Page 1020

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = V[t];
when MemOp_LOAD
V[t] = data;
LDUR (SIMD&FP) Page 1021

MLA (by element)
Multiply-Add to accumulator (vector, by element). This instruction multiplies the vector elements in the first source
SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the results with
the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 0 0 0 H 0 Rn Rd
o2
Vector
MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;

Assembler Symbols
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
<index> Is the element index, encoded in “size:L:H:M”:
MLA (by element) Page 1022

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
bits(esize) product;
element2 = UInt(Elem[operand2, index, esize]);

element1 = UInt(Elem[operand1, e, esize]);
product = (element1*element2)<esize-1:0>;
if sub_op then
Elem[result, e, esize] = Elem[operand3, e, esize] - product;
else
Elem[result, e, esize] = Elem[operand3, e, esize] + product;
V[d] = result;
If PSTATE.DIT is 1:
MLA (by element) Page 1023

MLA (vector)
Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two
source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 0 1 Rn Rd
U
MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
product = (UInt(element1)*UInt(element2))<esize-1:0>;
if sub_op then
else
V[d] = result;
MLA (vector) Page 1024

If PSTATE.DIT is 1:
MLA (vector) Page 1025

MLS (by element)
Multiply-Subtract from accumulator (vector, by element). This instruction multiplies the vector elements in the first
source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the results
from the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer
values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 1 0 0 H 0 Rn Rd
o2
Vector
MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
MLS (by element) Page 1026

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
MLS (by element) Page 1027

MLS (vector)
Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the
two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 1 0 1 Rn Rd
U
MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
if sub_op then
else
V[d] = result;
MLS (vector) Page 1028

If PSTATE.DIT is 1:
MLS (vector) Page 1029

MOV (scalar)
Move vector element to scalar. This instruction duplicates the specified vector element in the SIMD&FP source
register into a scalar, and writes the result to the SIMD&FP destination register.
This is an alias of DUP (element). This means:
• The encodings in this description are named to match the encodings of DUP (element).
• The description of DUP (element) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd
Scalar
MOV <V><d>, <Vn>.<T>[<index>]
is equivalent to
DUP <V><d>, <Vn>.<T>[<index>]
Assembler Symbols
<V> Is the destination width specifier, encoded in “imm5”:

imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<T> Is the element width specifier, encoded in “imm5”:
imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
Operation
The description of DUP (element) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (scalar) Page 1030

MOV (scalar) Page 1031

MOV (element)
Move vector element to another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
bits to zero.
This is an alias of INS (element). This means:
• The encodings in this description are named to match the encodings of INS (element).
• The description of INS (element) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd
Advanced SIMD
MOV <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]
is equivalent to
INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]
Assembler Symbols
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index1> Is the destination element index encoded in “imm5”:

imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<index2> Is the source element index encoded in “imm5:imm4”:
imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
Operation
The description of INS (element) gives the operational pseudocode for this instruction.
MOV (element) Page 1032

If PSTATE.DIT is 1:
MOV (element) Page 1033

MOV (from general)
Move general-purpose register to a vector element. This instruction copies the contents of the source general-purpose
register to the specified vector element in the destination SIMD&FP register.
bits to zero.
This is an alias of INS (general). This means:
• The encodings in this description are named to match the encodings of INS (general).
• The description of INS (general) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 0 imm5 0 0 0 1 1 1 Rn Rd
Advanced SIMD
MOV <Vd>.<Ts>[<index>], <R><n>
is equivalent to
INS <Vd>.<Ts>[<index>], <R><n>
Assembler Symbols
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
Operation
The description of INS (general) gives the operational pseudocode for this instruction.
MOV (from general) Page 1034

If PSTATE.DIT is 1:
MOV (from general) Page 1035

MOV (vector)
Move vector. This instruction copies the vector in the source SIMD&FP register into the destination SIMD&FP
register.
This is an alias of ORR (vector, register). This means:
• The encodings in this description are named to match the encodings of ORR (vector, register).
• The description of ORR (vector, register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
size
MOV <Vd>.<T>, <Vn>.<T>
is equivalent to
ORR <Vd>.<T>, <Vn>.<T>, <Vn>.<T>
and is the preferred disassembly when Rm == Rn.
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
The description of ORR (vector, register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (vector) Page 1036

MOV (to general)
Move vector element to general-purpose register. This instruction reads the unsigned integer from the source
SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-
purpose register.
This is an alias of UMOV. This means:
• The encodings in this description are named to match the encodings of UMOV.
• The description of UMOV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 x x x 0 0 0 0 1 1 1 1 Rn Rd
imm5
32-bit (Q == 0 && imm5 == xx100)
MOV <Wd>, <Vn>.S[<index>]
is equivalent to
UMOV <Wd>, <Vn>.S[<index>]
64-reg,UMOV-64-reg (Q == 1 && imm5 == x1000)
MOV <Xd>, <Vn>.D[<index>]
is equivalent to
UMOV <Xd>, <Vn>.D[<index>]
Assembler Symbols
<index> For the 32-bit variant: is the element index encoded in "imm5<4:3>".
For the 64-reg,UMOV-64-reg variant: is the element index encoded in "imm5<4>".
Operation
The description of UMOV gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MOV (to general) Page 1037

MOV (to general) Page 1038

MOVI
Move Immediate (vector). This instruction places an immediate constant into every vector element of the destination
SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c cmode 0 1 d e f g h Rd
8-bit (op == 0 && cmode == 1110)
MOVI <Vd>.<T>, #<imm8>{, LSL #0}
16-bit shifted immediate (op == 0 && cmode == 10x0)
MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit shifted immediate (op == 0 && cmode == 0xx0)
MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit shifting ones (op == 0 && cmode == 110x)
MOVI <Vd>.<T>, #<imm8>, MSL #<amount>
64-bit scalar (Q == 0 && op == 1 && cmode == 1110)
MOVI <Dd>, #<imm>
64-bit vector (Q == 1 && op == 1 && cmode == 1110)
MOVI <Vd>.2D, #<imm>

bits(datasize) imm;
bits(64) imm64;
case cmode:op of
when '0xx00' operation = ImmediateOp_MOVI;
when '0xx10' operation = ImmediateOp_ORR;
when '10x00' operation = ImmediateOp_MOVI;
when '10x10' operation = ImmediateOp_ORR;
when '11110' operation = ImmediateOp_MOVI;
when '11111'

MOVI Page 1039

Assembler Symbols
<imm> Is a 64-bit immediate 'aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh',
encoded in "a:b:c:d:e:f:g:h".
Q <T>
0 8B
1 16B
Q <T>
0 4H
1 8H
Q <T>
0 2S
1 4S

<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:
cmode<0> <amount>
0 8
1 16
Operation
case operation of
result = imm;
result = NOT(imm);
operand = V[rd];
operand = V[rd];
V[rd] = result;
MOVI Page 1040

If PSTATE.DIT is 1:
MOVI Page 1041

MUL (by element)
Multiply (vector, by element). This instruction multiplies the vector elements in the first source SIMD&FP register by
the specified value in the second source SIMD&FP register, places the results in a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 0 0 H 0 Rn Rd
Vector
MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
MUL (by element) Page 1042

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

Elem[result, e, esize] = product;
V[d] = result;
If PSTATE.DIT is 1:
MUL (by element) Page 1043

MUL (vector)
Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 1 1 Rn Rd
U
MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
if U == '1' && size != '00' then UNDEFINED;
boolean poly = (U == '1');
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
if poly then
product = PolynomialMult(element1, element2)<esize-1:0>;
else
V[d] = result;
MUL (vector) Page 1044

If PSTATE.DIT is 1:
MUL (vector) Page 1045

MVN
Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the
inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
This is an alias of NOT. This means:
• The encodings in this description are named to match the encodings of NOT.
• The description of NOT gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
Vector
MVN <Vd>.<T>, <Vn>.<T>
is equivalent to
NOT <Vd>.<T>, <Vn>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
The description of NOT gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
MVN Page 1046

MVNI
Move inverted Immediate (vector). This instruction places the inverse of an immediate constant into every vector
element of the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c cmode 0 1 d e f g h Rd
op
16-bit shifted immediate (cmode == 10x0)
MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit shifted immediate (cmode == 0xx0)
MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit shifting ones (cmode == 110x)
MVNI <Vd>.<T>, #<imm8>, MSL #<amount>

bits(datasize) imm;
bits(64) imm64;
case cmode:op of
when '11111'

Assembler Symbols
Q <T>
0 4H
1 8H
Q <T>
0 2S
1 4S

<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:
MVNI Page 1047

cmode<1> <amount>
0 0
1 8
For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:
cmode<0> <amount>
0 8
1 16
Operation
case operation of
result = imm;
result = NOT(imm);
operand = V[rd];
operand = V[rd];
V[rd] = result;
If PSTATE.DIT is 1:
MVNI Page 1048

NEG (vector)
Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value,
puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
Scalar
NEG <V><d>, <V><n>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
Vector
NEG <Vd>.<T>, <Vn>.<T>

Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
NEG (vector) Page 1049

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
if neg then
element = -element;
else
V[d] = result;
If PSTATE.DIT is 1:
NEG (vector) Page 1050

NOT
Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the
inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
Vector
NOT <Vd>.<T>, <Vn>.<T>
integer esize = 8;
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
Elem[result, e, esize] = NOT(element);
V[d] = result;
If PSTATE.DIT is 1:
NOT Page 1051

ORN (vector)
Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
size
ORN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
V[d] = result;
If PSTATE.DIT is 1:
ORN (vector) Page 1052

ORR (vector, immediate)
Bitwise inclusive OR (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise OR between each result and an immediate constant, places the result into a vector, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode
16-bit (cmode == 10x1)
ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}
32-bit (cmode == 0xx1)
ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}

bits(datasize) imm;
bits(64) imm64;
case cmode:op of
when '0xx00' operation = ImmediateOp_MOVI;
when '0xx10' operation = ImmediateOp_ORR;
when '10x10' operation = ImmediateOp_ORR;
when '11110' operation = ImmediateOp_MOVI;
Assembler Symbols
<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
Q <T>
0 2S
1 4S

<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:
ORR (vector, immediate) Page 1053

cmode<2:1> <amount>
00 0
01 8
10 16
11 24
Operation
case operation of
result = imm;
result = NOT(imm);
operand = V[rd];
operand = V[rd];
V[rd] = result;
If PSTATE.DIT is 1:
ORR (vector, immediate) Page 1054

ORR (vector, register)
Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP
This instruction is used by the alias MOV (vector).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
size
ORR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
Q <T>
0 8B
1 16B
Alias Conditions

MOV (vector) Rm == Rn
Operation
V[d] = result;
If PSTATE.DIT is 1:
ORR (vector, register) Page 1055

ORR (vector, register) Page 1056

PMUL
Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 1 1 1 Rn Rd
U
PMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
if U == '1' && size != '00' then UNDEFINED;
boolean poly = (U == '1');
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
Operation
if poly then
product = PolynomialMult(element1, element2)<esize-1:0>;
else
V[d] = result;
PMUL Page 1057

If PSTATE.DIT is 1:
PMUL Page 1058

PMULL, PMULL2
Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors
of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
The PMULL instruction extracts each source vector from the lower half of each source register, while the PMULL2
instruction extracts each source vector from the upper half of each source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 1 0 0 0 Rn Rd
PMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if size == '11' && !HaveBit128PMULLExt() then UNDEFINED;
Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 RESERVED
10 RESERVED
11 1Q
The '1Q' arrangement is only allocated in an implementation that includes the Cryptographic Extension,
and is otherwise RESERVED.
size Q <Tb>
00 0 8B
00 1 16B
01 x RESERVED
10 x RESERVED
11 0 1D
11 1 2D
PMULL, PMULL2 Page 1059

Operation
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
Elem[result, e, 2*esize] = PolynomialMult(element1, element2);
V[d] = result;
If PSTATE.DIT is 1:
PMULL, PMULL2 Page 1060

RADDHN, RADDHN2
Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register
to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are rounded. For truncated results, see ADDHN.
The RADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1
RADDHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
RADDHN, RADDHN2 Page 1061

Operation
bits(2*esize) sum;
if sub_op then
else
If PSTATE.DIT is 1:
RADDHN, RADDHN2 Page 1062

RAX1
Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1,
performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register,
and writes the result to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 1 Rn Rd
Advanced SIMD
RAX1 <Vd>.2D, <Vn>.2D, <Vm>.2D

Assembler Symbols
Operation
V[d] = Vn EOR (ROL(Vm<127:64>, 1):ROL(Vm<63:0>, 1));
If PSTATE.DIT is 1:
RAX1 Page 1063

RBIT (vector)
Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the
bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
Vector
RBIT <Vd>.<T>, <Vn>.<T>
integer esize = 8;
Assembler Symbols
Q <T>
0 8B
1 16B
Operation
bits(esize) rev;
for i = 0 to esize-1
rev<esize-1-i> = element<i>;
Elem[result, e, esize] = rev;
V[d] = result;
If PSTATE.DIT is 1:
RBIT (vector) Page 1064

REV16 (vector)
Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of
the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 1 1 0 Rn Rd
U o0
Vector
REV16 <Vd>.<T>, <Vn>.<T>
// size=esize: B(0), H(1), S(1), D(S)

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// => 3-(op+size) (index bits in group)
// 64/B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// index bits within group: 1, 2, 3

if UInt(op) + UInt(size) >= 3 then UNDEFINED;
case op of
when '10' container_size = 16;

integer elements_per_container = container_size DIV esize;
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
REV16 (vector) Page 1065

Operation
integer element = 0;
integer rev_element;
rev_element = element + elements_per_container - 1;
Elem[result, rev_element, esize] = Elem[operand, element, esize];
element = element + 1;
rev_element = rev_element - 1;
V[d] = result;
If PSTATE.DIT is 1:

REV32 (vector)
Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word
of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0
Vector
REV32 <Vd>.<T>, <Vn>.<T>
// size=esize: B(0), H(1), S(1), D(S)

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// 64/B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X

case op of

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
1x x RESERVED

Operation
V[d] = result;
If PSTATE.DIT is 1:

REV64
Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements
in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0
Vector
REV64 <Vd>.<T>, <Vn>.<T>
// size=esize: B(0), H(1), S(1), D(S)

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// 64/B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X

case op of

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
REV64 Page 1069

Operation
V[d] = result;
If PSTATE.DIT is 1:
REV64 Page 1070

RSHRN, RSHRN2
Rounding Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the vector in the
source SIMD&FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes
the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as
long as the source vector elements. The results are rounded. For truncated results, see SHRN.
The RSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
RSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
Vector
RSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
Assembler Symbols
Q 2
0 [absent]
1 [present]
<Tb> Is an arrangement specifier, encoded in “immh:Q”:
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Ta> Is an arrangement specifier, encoded in “immh”:
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
RSHRN, RSHRN2 Page 1071

<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:
immh <shift>
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
1xxx RESERVED
Operation
bits(datasize*2) operand = V[n];
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
If PSTATE.DIT is 1:
RSHRN, RSHRN2 Page 1072

RSUBHN, RSUBHN2
Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most
significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP
register.
The results are rounded. For truncated results, see SUBHN.
The RSUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RSUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 0 Rn Rd
U o1
RSUBHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
RSUBHN, RSUBHN2 Page 1073

Operation
bits(2*esize) sum;
if sub_op then
else
If PSTATE.DIT is 1:
RSUBHN, RSUBHN2 Page 1074

SABA
Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 1 1 Rn Rd
U ac
SABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean accumulate = (ac == '1');
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
bits(esize) absdiff;
result = if accumulate then V[d] else Zeros();

absdiff = Abs(element1-element2)<esize-1:0>;
Elem[result, e, esize] = Elem[result, e, esize] + absdiff;
V[d] = result;
SABA Page 1075

If PSTATE.DIT is 1:
SABA Page 1076

SABAL, SABAL2
Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper
half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP
register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP
register. The destination vector elements are twice as long as the source vector elements.
The SABAL instruction extracts each source vector from the lower half of each source register, while the SABAL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 0 Rn Rd
U op
SABAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

boolean accumulate = (op == '0');

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SABAL, SABAL2 Page 1077

Operation
integer element1;
integer element2;
bits(2*esize) absdiff;

absdiff = Abs(element1-element2)<2*esize-1:0>;
Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + absdiff;
V[d] = result;
If PSTATE.DIT is 1:
SABAL, SABAL2 Page 1078

SABD
Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP
register from the corresponding elements of the first source SIMD&FP register, places the the absolute values of the
results into a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 1 Rn Rd
U ac
SABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;

V[d] = result;
SABD Page 1079

If PSTATE.DIT is 1:
SABD Page 1080

SABDL, SABDL2
Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP
register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the
results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements.
The SABDL instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SABDL2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 0 Rn Rd
U op
SABDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SABDL, SABDL2 Page 1081

Operation
integer element1;
integer element2;

V[d] = result;
If PSTATE.DIT is 1:
SABDL, SABDL2 Page 1082

SADALP
Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the
vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 0 1 0 Rn Rd
U op
Vector
SADALP <Vd>.<Ta>, <Vn>.<Tb>

integer elements = datasize DIV (2 * esize);
boolean acc = (op == '1');
Assembler Symbols
<Ta> Is an arrangement specifier, encoded in “size:Q”:
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SADALP Page 1083

Operation
bits(2*esize) sum;
integer op1;
integer op2;
if acc then result = V[d];

op1 = Int(Elem[operand, 2*e+0, esize], unsigned);
sum = (op1+op2)<2*esize-1:0>;
if acc then
Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + sum;
else
Elem[result, e, 2*esize] = sum;
V[d] = result;
If PSTATE.DIT is 1:
SADALP Page 1084

SADDL, SADDL2
Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results
into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as
long as the source vector elements. All the values in this instruction are signed integer values.
The SADDL instruction extracts each source vector from the lower half of each source register, while the SADDL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 0 Rn Rd
U o1
SADDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SADDL, SADDL2 Page 1085

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
SADDL, SADDL2 Page 1086

SADDLP
Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source
SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 0 1 0 Rn Rd
U op
Vector
SADDLP <Vd>.<Ta>, <Vn>.<Tb>

Assembler Symbols
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SADDLP Page 1087

Operation
bits(2*esize) sum;
integer op1;
integer op2;

if acc then
else
V[d] = result;
If PSTATE.DIT is 1:
SADDLP Page 1088

SADDLV
Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together,
and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source
vector elements. All the values in this instruction are signed integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Advanced SIMD
SADDLV <V><d>, <Vn>.<T>

Assembler Symbols

size <V>
00 H
01 S
10 D
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer sum;
sum = Int(Elem[operand, 0, esize], unsigned);

sum = sum + Int(Elem[operand, e, esize], unsigned);
V[d] = sum<2*esize-1:0>;
SADDLV Page 1089

If PSTATE.DIT is 1:
SADDLV Page 1090

SADDW, SADDW2
Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding
vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and
writes the vector to the SIMD&FP destination register.
The SADDW instruction extracts the second source vector from the lower half of the second source register, while the
SADDW2 instruction extracts the second source vector from the upper half of the second source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 0 Rn Rd
U o1
SADDW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SADDW, SADDW2 Page 1091

Operation
integer element1;
integer element2;
integer sum;
element1 = Int(Elem[operand1, e, 2*esize], unsigned);
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SADDW, SADDW2 Page 1092

SCVTF (vector, fixed-point)
Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
Scalar
SCVTF <V><d>, <V><n>, #<fbits>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
Vector
SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>


Assembler Symbols
SCVTF (vector, fixed-point) Page 1093

immh <V>
000x RESERVED
001x H
01xx S
1xxx D
immh Q <T>
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
“immh:immb”:
immh <fbits>
000x RESERVED
“immh:immb”:
immh <fbits>
0001 RESERVED
Operation
Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, FPCR, rounding);
V[d] = result;
SCVTF (vector, fixed-point) Page 1094

SCVTF (vector, integer)
Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed
integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
SCVTF <Hd>, <Hn>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
SCVTF <V><d>, <V><n>


(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
SCVTF (vector, integer) Page 1095

SCVTF <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
SCVTF <Vd>.<T>, <Vn>.<T>

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
Elem[result, e, esize] = FixedToFP(element, 0, unsigned, FPCR, rounding);
V[d] = result;

SCVTF (scalar, fixed-point)
Signed fixed-point Convert to Floating-point (scalar). This instruction converts the signed value in the 32-bit or 64-bit
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
rmode opcode
SCVTF (scalar, fixed-point) Page 1098

32-bit to half-precision (sf == 0 && ftype == 11)
(Armv8.2)
SCVTF <Hd>, <Wn>, #<fbits>
32-bit to single-precision (sf == 0 && ftype == 00)
SCVTF <Sd>, <Wn>, #<fbits>
32-bit to double-precision (sf == 0 && ftype == 01)
SCVTF <Dd>, <Wn>, #<fbits>

(Armv8.2)
SCVTF <Hd>, <Xn>, #<fbits>
SCVTF <Sd>, <Xn>, #<fbits>
SCVTF <Dd>, <Xn>, #<fbits>

integer fltsize;
case ftype of
when '11'
fltsize = 16;
else
UNDEFINED;

Assembler Symbols
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 32, encoded as 64
minus "scale".

For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
minus "scale".
Operation
intval = X[n];
fltval = FixedToFP(intval, fracbits, FALSE, FPCR, rounding);
V[d] = fltval;

SCVTF (scalar, integer)
Signed integer Convert to Floating-point (scalar). This instruction converts the signed integer value in the general-
purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 1 0 0 0 0 0 0 0 Rn Rd
rmode opcode
SCVTF (scalar, integer) Page 1101

(Armv8.2)
SCVTF <Hd>, <Wn>
SCVTF <Sd>, <Wn>
SCVTF <Dd>, <Wn>

(Armv8.2)
SCVTF <Hd>, <Xn>
SCVTF <Sd>, <Xn>
SCVTF <Dd>, <Xn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
intval = X[n];
fltval = FixedToFP(intval, 0, FALSE, FPCR, rounding);
V[d] = fltval;

SDOT (by element)
Dot Product signed arithmetic (vector, by element). This instruction performs the dot product of the four 8-bit
elements in each 32-bit element of the first source register with the four 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
support it.
ID_AA64ISAR0_EL1.DP indicates whether this instruction is supported.
Vector
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 1 0 H 0 Rn Rd
U
Vector
SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]
if !HaveDOTPExt() then UNDEFINED;

boolean signed = (U == '0');
integer index = UInt(H:L);

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
<index> Is the element index, encoded in the "H:L" fields.
SDOT (by element) Page 1104

Operation
bits(datasize) result = V[d];
integer res = 0;
integer element1, element2;
for i = 0 to 3
if signed then
element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
element2 = SInt(Elem[operand2, 4*index+i, esize DIV 4]);
else
element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
element2 = UInt(Elem[operand2, 4*index+i, esize DIV 4]);
res = res + element1 * element2;
Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
SDOT (by element) Page 1105

SDOT (vector)
Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in
each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element
in the second source register, accumulating the result into the corresponding 32-bit element of the destination
register.
support it.
Vector
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U
Vector
SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
SDOT (vector) Page 1106

Operation
result = V[d];
integer res = 0;
for i = 0 to 3
if signed then
else
V[d] = result;
SDOT (vector) Page 1107

SHA1C
SHA1 hash update (choose).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
Advanced SIMD
SHA1C <Qd>, <Sn>, <Vm>.4S
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
t = SHAchoose(X<63:32>, X<95:64>, X<127:96>);
Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
X<63:32> = ROL(X<63:32>, 30);
<Y, X> = ROL(Y:X, 32);
V[d] = X;
If PSTATE.DIT is 1:
SHA1C Page 1108

SHA1H
SHA1 fixed rotate.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 Rn Rd
Advanced SIMD
SHA1H <Sd>, <Sn>
Assembler Symbols
Operation
bits(32) operand = V[n]; // read element [0] only, [1-3] zeroed

V[d] = ROL(operand, 30);
If PSTATE.DIT is 1:
SHA1H Page 1109

SHA1M
SHA1 hash update (majority).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 1 0 0 0 Rn Rd
Advanced SIMD
SHA1M <Qd>, <Sn>, <Vm>.4S
Assembler Symbols
Operation
bits(128) X = V[d];
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
t = SHAmajority(X<63:32>, X<95:64>, X<127:96>);
Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
X<63:32> = ROL(X<63:32>, 30);
<Y, X> = ROL(Y:X, 32);
V[d] = X;
If PSTATE.DIT is 1:
SHA1M Page 1110

SHA1P
SHA1 hash update (parity).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 0 1 0 0 Rn Rd
Advanced SIMD
SHA1P <Qd>, <Sn>, <Vm>.4S
Assembler Symbols
Operation
bits(128) X = V[d];
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
t = SHAparity(X<63:32>, X<95:64>, X<127:96>);
Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
X<63:32> = ROL(X<63:32>, 30);
<Y, X> = ROL(Y:X, 32);
V[d] = X;
If PSTATE.DIT is 1:
SHA1P Page 1111

SHA1SU0
SHA1 schedule update 0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 1 1 0 0 Rn Rd
Advanced SIMD
SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4S
Assembler Symbols
Operation

bits(128) result;
result = operand2<63:0>:operand1<127:64>;
result = result EOR operand1 EOR operand3;
V[d] = result;
If PSTATE.DIT is 1:
SHA1SU0 Page 1112

SHA1SU1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 Rn Rd
Advanced SIMD
SHA1SU1 <Vd>.4S, <Vn>.4S
Assembler Symbols
Operation

bits(128) result;
bits(128) T = operand1 EOR LSR(operand2, 32);
result<31:0> = ROL(T<31:0>, 1);
result<63:32> = ROL(T<63:32>, 1);
result<95:64> = ROL(T<95:64>, 1);
result<127:96> = ROL(T<127:96>, 1) EOR ROL(T<31:0>, 2);
V[d] = result;
If PSTATE.DIT is 1:
SHA1SU1 Page 1113

SHA256H2
SHA256 hash update (part 2).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 0 1 0 0 Rn Rd
P
Advanced SIMD
SHA256H2 <Qd>, <Qn>, <Vm>.4S
Assembler Symbols
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
bits(128) result;
result = SHA256hash(V[n], V[d], V[m], FALSE);
V[d] = result;
If PSTATE.DIT is 1:
SHA256H2 Page 1114

SHA256H
SHA256 hash update (part 1).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 0 0 0 0 Rn Rd
P
Advanced SIMD
SHA256H <Qd>, <Qn>, <Vm>.4S
Assembler Symbols
Operation
bits(128) result;
result = SHA256hash(V[d], V[n], V[m], TRUE);
V[d] = result;
If PSTATE.DIT is 1:
SHA256H Page 1115

SHA256SU0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 Rn Rd
Advanced SIMD
SHA256SU0 <Vd>.4S, <Vn>.4S
Assembler Symbols
Operation

bits(128) result;
bits(128) T = operand2<31:0>:operand1<127:32>;
bits(32) elt;
for e = 0 to 3
elt = Elem[T, e, 32];
elt = ROR(elt, 7) EOR ROR(elt, 18) EOR LSR(elt, 3);
Elem[result, e, 32] = elt + Elem[operand1, e, 32];
V[d] = result;
If PSTATE.DIT is 1:
SHA256SU0 Page 1116

SHA256SU1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 1 0 0 0 Rn Rd
Advanced SIMD
SHA256SU1 <Vd>.4S, <Vn>.4S, <Vm>.4S
Assembler Symbols
Operation

bits(128) result;
bits(128) T0 = operand3<31:0>:operand2<127:32>;
bits(64) T1;
bits(32) elt;
T1 = operand3<127:64>;
for e = 0 to 1
elt = Elem[T1, e, 32];
elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
Elem[result, e, 32] = elt;
T1 = result<63:0>;
for e = 2 to 3
elt = Elem[T1, e-2, 32];
elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
Elem[result, e, 32] = elt;
V[d] = result;
If PSTATE.DIT is 1:
SHA256SU1 Page 1117

SHA512H2
SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns
this value to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 1 Rn Rd
Advanced SIMD
SHA512H2 <Qd>, <Qn>, <Vm>.2D

Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
Operation
bits(128) Vtmp;
bits(64) NSigma0;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];
NSigma0 = ROR(Y<63:0>, 28) EOR ROR(Y<63:0>, 34) EOR ROR(Y<63:0>, 39);

Vtmp<127:64> = (X<63:0> AND Y<127:64>) EOR (X<63:0> AND Y<63:0>) EOR (Y<127:64> AND Y<63:0>);
Vtmp<127:64> = (Vtmp<127:64> + NSigma0 + W<127:64>);
NSigma0 = ROR(Vtmp<127:64>, 28) EOR ROR(Vtmp<127:64>, 34) EOR ROR(Vtmp<127:64>, 39);
Vtmp<63:0> = (Vtmp<127:64> AND Y<63:0>) EOR (Vtmp<127:64> AND Y<127:64>) EOR (Y<127:64> AND Y<63:0>);
Vtmp<63:0> = (Vtmp<63:0> + NSigma0 + W<63:0>);
V[d] = Vtmp;
If PSTATE.DIT is 1:
SHA512H2 Page 1118

SHA512H
SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this
value to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 0 Rn Rd
Advanced SIMD
SHA512H <Qd>, <Qn>, <Vm>.2D

Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
Operation
bits(128) Vtmp;
bits(64) MSigma1;
bits(64) tmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];
MSigma1 = ROR(Y<127:64>, 14) EOR ROR(Y<127:64>, 18) EOR ROR(Y<127:64>, 41);

Vtmp<127:64> = (Y<127:64> AND X<63:0>) EOR (NOT(Y<127:64>) AND X<127:64>);
Vtmp<127:64> = (Vtmp<127:64> + MSigma1 + W<127:64>);
tmp = Vtmp<127:64> + Y<63:0>;
MSigma1 = ROR(tmp, 14) EOR ROR(tmp, 18) EOR ROR(tmp, 41);
Vtmp<63:0> = (tmp AND Y<127:64>) EOR (NOT(tmp) AND X<63:0>);
Vtmp<63:0> = (Vtmp<63:0> + MSigma1 + W<63:0>);
V[d] = Vtmp;
If PSTATE.DIT is 1:
SHA512H Page 1119

SHA512SU0
SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed
after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 Rn Rd
Advanced SIMD
SHA512SU0 <Vd>.2D, <Vn>.2D

Assembler Symbols
Operation
bits(64) sig0;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) W = V[d];
sig0 = ROR(W<127:64>, 1) EOR ROR(W<127:64>, 8) EOR ('0000000':W<127:71>);
Vtmp<63:0> = W<63:0> + sig0;
sig0 = ROR(X<63:0>, 1) EOR ROR(X<63:0>, 8) EOR ('0000000':X<63:7>);
Vtmp<127:64> = W<127:64> + sig0;
V[d] = Vtmp;
If PSTATE.DIT is 1:
SHA512SU0 Page 1120

SHA512SU1
SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output
value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after
the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 0 Rn Rd
Advanced SIMD
SHA512SU1 <Vd>.2D, <Vn>.2D, <Vm>.2D

Assembler Symbols
Operation
bits(64) sig1;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];

Vtmp<127:64> = W<127:64> + sig1 + Y<127:64>;
Vtmp<63:0> = W<63:0> + sig1 + Y<63:0>;
V[d] = Vtmp;
If PSTATE.DIT is 1:
SHA512SU1 Page 1121

SHADD
Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SRHADD.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U
SHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer sum;
Elem[result, e, esize] = sum<esize:1>;
V[d] = result;
SHADD Page 1122

If PSTATE.DIT is 1:
SHADD Page 1123

SHL
Shift Left (immediate). This instruction reads each value from a vector, left shifts each result by an immediate value,
writes the final result to a vector, and writes the vector to the destination SIMD&FP register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
Scalar
SHL <V><d>, <V><n>, #<shift>
if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer shift = UInt(immh:immb) - esize;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
Vector
SHL <Vd>.<T>, <Vn>.<T>, #<shift>

Assembler Symbols

immh <V>
0xxx RESERVED
1xxx D
SHL Page 1124

immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
Operation
Elem[result, e, esize] = LSL(Elem[operand, e, esize], shift);
V[d] = result;
If PSTATE.DIT is 1:
SHL Page 1125

SHLL, SHLL2
Shift Left Long (by element size). This instruction reads each vector element in the lower or upper half of the source
SIMD&FP register, left shifts each result by the element size, writes the final result to a vector, and writes the vector
to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
The SHLL instruction extracts vector elements from the lower half of the source register, while the SHLL2 instruction
extracts vector elements from the upper half of the source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 1 1 0 Rn Rd
Vector
SHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

integer shift = esize;

boolean unsigned = FALSE; // Or TRUE without change of functionality
Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<shift> Is the left shift amount, which must be equal to the source element width in bits, encoded in “size”:
size <shift>
00 8
01 16
10 32
11 RESERVED
SHLL, SHLL2 Page 1126

Operation
integer element;
element = Int(Elem[operand, e, esize], unsigned) << shift;
Elem[result, e, 2*esize] = element<2*esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
SHLL, SHLL2 Page 1127

SHRN, SHRN2
Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the source SIMD&FP
register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the
lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the
source vector elements. The results are truncated. For rounded results, see RSHRN.
The RSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
RSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
Vector
SHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
SHRN, SHRN2 Page 1128

<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:
immh <shift>
1xxx RESERVED
Operation
integer element;
element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
If PSTATE.DIT is 1:
SHRN, SHRN2 Page 1129

SHSUB
Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register
from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit,
places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U
SHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer diff;
diff = element1 - element2;
Elem[result, e, esize] = diff<esize:1>;
V[d] = result;
If PSTATE.DIT is 1:
SHSUB Page 1130

SHSUB Page 1131

SLI
Shift Left and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, left
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the left of each vector element in the source register are lost.
The following figure shows an example of the operation of shift left by 3 for an 8-bit vector element.
63 5655 0
Vn.B[7]
63 5655 0
Vd.B[7] after operation
63 5655 0
Vd.B[7] before operation
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
Scalar
SLI <V><d>, <V><n>, #<shift>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
SLI Page 1132

Vector
SLI <Vd>.<T>, <Vn>.<T>, #<shift>

Assembler Symbols

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
immh <shift>
SLI Page 1133

Operation
bits(esize) mask = LSL(Ones(esize), shift);
bits(esize) shifted;
shifted = LSL(Elem[operand, e, esize], shift);
Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
If PSTATE.DIT is 1:
SLI Page 1134

SM3PARTW1
SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations, see the Operation pseudocode for more information.
This instruction is implemented only when ARMv8.2-SM is implemented.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 0 Rn Rd
Advanced SIMD
SM3PARTW1 <Vd>.4S, <Vn>.4S, <Vm>.4S
if !HaveSM3Ext() then UNDEFINED;

Assembler Symbols
Operation
bits(128) Vd = V[d];
bits(128) result;
result<95:0> = (Vd EOR Vn)<95:0> EOR (ROL(Vm<127:96>, 15):ROL(Vm<95:64>, 15):ROL(Vm<63:32>, 15));
for i = 0 to 3
if i == 3 then
result<127:96> = (Vd EOR Vn)<127:96> EOR (ROL(result<31:0>, 15));
result<(32*i)+31:(32*i)> = result<(32*i)+31:(32*i)> EOR ROL(result<(32*i)+31:(32*i)>, 15) EOR ROL(res
V[d] = result;
If PSTATE.DIT is 1:
SM3PARTW1 Page 1135

SM3PARTW2
SM3PARTW2 takes three 128-bit vectors from three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations, see the Operation pseudocode for more information.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 1 Rn Rd
Advanced SIMD
SM3PARTW2 <Vd>.4S, <Vn>.4S, <Vm>.4S

Assembler Symbols
Operation
bits(128) result;
bits(128) tmp;
bits(32) tmp2;
tmp<127:0> = Vn EOR (ROL(Vm<127:96>, 7):ROL(Vm<95:64>, 7):ROL(Vm<63:32>, 7):ROL(Vm<31:0>, 7));
result<127:0> = Vd<127:0> EOR tmp<127:0>;
tmp2 = ROL(tmp<31:0>, 15);
tmp2 = tmp2 EOR ROL(tmp2, 15) EOR ROL(tmp2, 23);
result<127:96> = result<127:96> EOR tmp2;
V[d] = result;
If PSTATE.DIT is 1:
SM3PARTW2 Page 1136

SM3SS1
SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register by 12, and adds that 32-bit
value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third source
SIMD&FP registers, rotating this result left by 7 and writing the final result into the top 32 bits of the vector in the
destination SIMD&FP register, with the bottom 96 bits of the vector being written to 0.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 0 Ra Rn Rd
Advanced SIMD
SM3SS1 <Vd>.4S, <Vn>.4S, <Vm>.4S, <Va>.4S

Assembler Symbols
Operation
Vd<127:96> = ROL((ROL(Vn<127:96>, 12) + Vm<127:96> + Va<127:96>), 7);
Vd<95:0> = Zeros();
V[d] = Vd;
If PSTATE.DIT is 1:
SM3SS1 Page 1137

SM3TT1A
SM3TT1A takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
• A 32-bit element indexed out of the third source vector, Vm.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 0 Rn Rd
Advanced SIMD
SM3TT1A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

integer i = UInt(imm2);
Assembler Symbols
<imm2> Is a 32-bit element indexed out of <Vm>, encoded in "imm2".
Operation
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;
WjPrime = Elem[Vm, i, 32];

SS2 = Vn<127:96> EOR ROL(Vd<127:96>, 12);
TT1 = Vd<63:32> EOR (Vd<127:96> EOR Vd<95:64>);
TT1 = (TT1+Vd<31:0>+SS2+WjPrime)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 9);
result<95:64> = Vd<127:96>;
result<127:96> = TT1;
V[d] = result;
SM3TT1A Page 1138

If PSTATE.DIT is 1:
SM3TT1A Page 1139

SM3TT1B
SM3TT1B takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit majority function between the three
32-bit fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the
following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the 32-bit majority function.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 1 Rn Rd
Advanced SIMD
SM3TT1B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

Assembler Symbols
Operation
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;
WjPrime = Elem[Vm, i, 32];

SS2 = Vn<127:96> EOR ROL(Vd<127:96>, 12);
TT1 = (Vd<127:96> AND Vd<63:32>) OR (Vd<127:96> AND Vd<95:64>) OR (Vd<63:32> AND Vd<95:64>);
TT1 = (TT1+Vd<31:0>+SS2+WjPrime)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 9);
result<95:64> = Vd<127:96>;
result<127:96> = TT1;
V[d] = result;
SM3TT1B Page 1140

If PSTATE.DIT is 1:
SM3TT1B Page 1141

SM3TT2A
SM3TT2A takes three 128-bit vectors from three source SIMD&FP register and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 0 Rn Rd
Advanced SIMD
SM3TT2A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

Assembler Symbols
Operation
bits(32) Wj;
bits(128) result;
bits(32) TT2;
Wj = Elem[Vm, i, 32];
TT2 = Vd<63:32> EOR (Vd<127:96> EOR Vd<95:64>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;
SM3TT2A Page 1142

If PSTATE.DIT is 1:
SM3TT2A Page 1143

SM3TT2B
SM3TT2B takes three 128-bit vectors from three source SIMD&FP registers, and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit majority function between the three
32-bit fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the
following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the 32-bit majority function.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 1 Rn Rd
Advanced SIMD
SM3TT2B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

Assembler Symbols
Operation
bits(32) Wj;
bits(128) result;
bits(32) TT2;
Wj = Elem[Vm, i, 32];
TT2 = (Vd<127:96> AND Vd<95:64>) OR (NOT(Vd<127:96>) AND Vd<63:32>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;
SM3TT2B Page 1144

If PSTATE.DIT is 1:
SM3TT2B Page 1145

SM4E
SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the
round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by
four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 Rn Rd
Advanced SIMD
SM4E <Vd>.4S, <Vn>.4S

Assembler Symbols
Operation
bits(32) intval;
bits(8) sboxout;
bits(128) roundresult;
bits(32) roundkey;
roundresult = V[d];
for index = 0 to 3
roundkey = Elem[Vn, index, 32];
intval = roundresult<127:96> EOR roundresult<95:64> EOR roundresult<63:32> EOR roundkey;
for i = 0 to 3
Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);
intval = intval EOR ROL(intval, 2) EOR ROL(intval, 10) EOR ROL(intval, 18) EOR ROL(intval, 24);
intval = intval EOR roundresult<31:0>;
roundresult<31:0> = roundresult<63:32>;
roundresult<127:96> = intval;
V[d] = roundresult;
If PSTATE.DIT is 1:
SM4E Page 1146

SM4E Page 1147

SM4EKEY
SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the
second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning
the 128-bit result to the destination SIMD&FP register.
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 1 0 Rn Rd
Advanced SIMD
SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4S

Assembler Symbols
Operation
bits(32) intval;
bits(8) sboxout;
bits(128) result;
bits(32) const;
bits(128) roundresult;
roundresult = V[n];
for index = 0 to 3
const = Elem[Vm, index, 32];
intval = roundresult<127:96> EOR roundresult<95:64> EOR roundresult<63:32> EOR const;
for i = 0 to 3
Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);
intval = intval EOR ROL(intval, 13) EOR ROL(intval, 23);

intval = intval EOR roundresult<31:0>;
roundresult<127:96> = intval;
V[d] = roundresult;
If PSTATE.DIT is 1:
SM4EKEY Page 1148

SM4EKEY Page 1149

SMAX
Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1
SMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
SMAX Page 1150

If PSTATE.DIT is 1:
SMAX Page 1151

SMAXP
Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1
SMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
element1 = Int(Elem[concat, 2*e, esize], unsigned);
element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
V[d] = result;
SMAXP Page 1152

If PSTATE.DIT is 1:
SMAXP Page 1153

SMAXV
Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op
Advanced SIMD
SMAXV <V><d>, <Vn>.<T>


boolean min = (op == '1');
Assembler Symbols

size <V>
00 B
01 H
10 S
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer maxmin;
integer element;
maxmin = Int(Elem[operand, 0, esize], unsigned);

element = Int(Elem[operand, e, esize], unsigned);
maxmin = if min then Min(maxmin, element) else Max(maxmin, element);
V[d] = maxmin<esize-1:0>;
SMAXV Page 1154

If PSTATE.DIT is 1:
SMAXV Page 1155

SMIN
Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1
SMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
SMIN Page 1156

If PSTATE.DIT is 1:
SMIN Page 1157

SMINP
Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1
SMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
SMINP Page 1158

If PSTATE.DIT is 1:
SMINP Page 1159

SMINV
Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op
Advanced SIMD
SMINV <V><d>, <Vn>.<T>


Assembler Symbols

size <V>
00 B
01 H
10 S
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer maxmin;
integer element;

SMINV Page 1160

If PSTATE.DIT is 1:
SMINV Page 1161

SMLAL, SMLAL2 (by element)
Signed Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or upper
half of the first source SIMD&FP register by the specified vector element in the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer
values.
The SMLAL instruction extracts vector elements from the lower half of the first source register, while the SMLAL2
instruction extracts vector elements from the upper half of the first source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2
Vector
SMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
SMLAL, SMLAL2 (by

Page 1162
element)
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
bits(2*datasize) operand3 = V[d];
integer element1;
integer element2;
bits(2*esize) product;
element2 = Int(Elem[operand2, index, esize], unsigned);

product = (element1*element2)<2*esize-1:0>;
if sub_op then
Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] - product;
else
Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] + product;
V[d] = result;
If PSTATE.DIT is 1:
SMLAL, SMLAL2 (by

Page 1163
element)
SMLAL, SMLAL2 (vector)
Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements
of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLAL instruction extracts each source vector from the lower half of each source register, while the SMLAL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1
SMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SMLAL, SMLAL2 (vector) Page 1164

Operation
integer element1;
integer element2;
bits(2*esize) accum;
if sub_op then
accum = Elem[operand3, e, 2*esize] - product;
else
accum = Elem[operand3, e, 2*esize] + product;
Elem[result, e, 2*esize] = accum;
V[d] = result;
If PSTATE.DIT is 1:
SMLAL, SMLAL2 (vector) Page 1165

SMLSL, SMLSL2 (by element)
Signed Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The SMLSL instruction extracts vector elements from the lower half of the first source register, while the SMLSL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2
Vector
SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
SMLSL, SMLSL2 (by

Page 1166
element)
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SMLSL, SMLSL2 (by

Page 1167
element)
SMLSL, SMLSL2 (vector)
Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of
the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLSL instruction extracts each source vector from the lower half of each source register, while the SMLSL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
U o1
SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SMLSL, SMLSL2 (vector) Page 1168

Operation
integer element1;
integer element2;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SMLSL, SMLSL2 (vector) Page 1169

SMMLA (vector)
Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer
values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
U B
Vector
SMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B
if !HaveInt8MatMulExt() then UNDEFINED;

Assembler Symbols
Operation
bits(128) addend = V[d];
V[d] = MatMulAdd(addend, operand1, operand2, FALSE, FALSE);
SMMLA (vector) Page 1170

SMOV
Signed Move vector element to general-purpose register. This instruction reads the signed integer from the source
SIMD&FP register, sign-extends it to form a 32-bit or 64-bit value, and writes the result to destination general-purpose
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 0 1 1 Rn Rd
32-bit (Q == 0)
SMOV <Wd>, <Vn>.<Ts>[<index>]
64-reg,SMOV-64-reg (Q == 1)
SMOV <Xd>, <Vn>.<Ts>[<index>]
integer size;
case Q:imm5 of
when 'xxxxx1' size = 0; // SMOV [WX]d, Vn.B
when 'xxxx10' size = 1; // SMOV [WX]d, Vn.H
when '1xx100' size = 2; // SMOV Xd, Vn.S

Assembler Symbols
<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xxx00 RESERVED
xxxx1 B
xxx10 H
For the 64-reg,SMOV-64-reg variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S
<index> For the 32-bit variant: is the element index encoded in “imm5”:
imm5 <index>
xxx00 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
For the 64-reg,SMOV-64-reg variant: is the element index encoded in “imm5”:
SMOV Page 1171

imm5 <index>
xx000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
Operation
X[d] = SignExtend(Elem[operand, index, esize], datasize);
If PSTATE.DIT is 1:
SMOV Page 1172

SMULL, SMULL2 (by element)
Signed Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of
the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places the
result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The SMULL instruction extracts vector elements from the lower half of the first source register, while the SMULL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
U
Vector
SMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SMULL, SMULL2 (by

Page 1173
element)
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

Elem[result, e, 2*esize] = product;
V[d] = result;
If PSTATE.DIT is 1:
SMULL, SMULL2 (by

Page 1174
element)
SMULL, SMULL2 (vector)
Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper
half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the
The destination vector elements are twice as long as the elements that are multiplied.
The SMULL instruction extracts each source vector from the lower half of each source register, while the SMULL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 0 0 0 Rn Rd
U
SMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SMULL, SMULL2 (vector) Page 1175

Operation
integer element1;
integer element2;
Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
SMULL, SMULL2 (vector) Page 1176

SQABS
Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts
the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values
in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
Scalar
SQABS <V><d>, <V><n>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
Vector
SQABS <Vd>.<T>, <Vn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
SQABS Page 1177

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
boolean sat;
if neg then
element = -element;
else
(Elem[result, e, esize], sat) = SignedSatQ(element, esize);
if sat then FPSR.QC = '1';
V[d] = result;
SQABS Page 1178

SQADD
Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
Scalar
SQADD <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
Vector
SQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
SQADD Page 1179

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
integer sum;
boolean sat;
(Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
V[d] = result;
SQADD Page 1180

SQDMLAL, SQDMLAL2 (by element)
Signed saturating Doubling Multiply-Add Long (by element). This instruction multiplies each vector element in the
lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and accumulates the final results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
bit FPSR.QC is set.
The SQDMLAL instruction extracts vector elements from the lower half of the first source register, while the SQDMLAL2
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
o2
Scalar
SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
o2
SQDMLAL, SQDMLAL2 (by

Page 1181
element)
Vector
SQDMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<Vb> Is the source width specifier, encoded in “size”:
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

Page 1182
element)
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
integer accum;
boolean sat1;
boolean sat2;
element2 = SInt(Elem[operand2, index, esize]);

element1 = SInt(Elem[operand1, e, esize]);
(product, sat1) = SignedSatQ(2 * element1 * element2, 2 * esize);
if sub_op then
accum = SInt(Elem[operand3, e, 2*esize]) - SInt(product);
else
accum = SInt(Elem[operand3, e, 2*esize]) + SInt(product);
(Elem[result, e, 2*esize], sat2) = SignedSatQ(accum, 2 * esize);
if sat1 || sat2 then FPSR.QC = '1';
V[d] = result;

Page 1183
element)
SQDMLAL, SQDMLAL2 (vector)
Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the
lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final
results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as
long as the elements that are multiplied.
bit FPSR.QC is set.
The SQDMLAL instruction extracts each source vector from the lower half of each source register, while the SQDMLAL2
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
o1
Scalar
SQDMLAL <Va><d>, <Vb><n>, <Vb><m>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
o1
Vector
SQDMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

SQDMLAL, SQDMLAL2
Page 1184
(vector)
Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
SQDMLAL, SQDMLAL2
Page 1185
(vector)
Operation
integer element1;
integer element2;
integer accum;
boolean sat1;
boolean sat2;
if sub_op then
else
V[d] = result;
SQDMLAL, SQDMLAL2
Page 1186
(vector)
SQDMLSL, SQDMLSL2 (by element)
Signed saturating Doubling Multiply-Subtract Long (by element). This instruction multiplies each vector element in
the lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and subtracts the final results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are signed integer values.
bit FPSR.QC is set.
The SQDMLSL instruction extracts vector elements from the lower half of the first source register, while the SQDMLSL2
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
o2
Scalar
SQDMLSL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
o2
SQDMLSL, SQDMLSL2 (by

Page 1187
element)
Vector
SQDMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

Page 1188
element)
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
integer accum;
boolean sat1;
boolean sat2;

if sub_op then
else
V[d] = result;

Page 1189
element)
SQDMLSL, SQDMLSL2 (vector)
Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in
the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the
final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice
as long as the elements that are multiplied.
bit FPSR.QC is set.
The SQDMLSL instruction extracts each source vector from the lower half of each source register, while the SQDMLSL2
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
o1
Scalar
SQDMLSL <Va><d>, <Vb><n>, <Vb><m>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
o1
Vector
SQDMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

SQDMLSL, SQDMLSL2
Page 1190
(vector)
Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
SQDMLSL, SQDMLSL2
Page 1191
(vector)
Operation
integer element1;
integer element2;
integer accum;
boolean sat1;
boolean sat2;
if sub_op then
else
V[d] = result;
SQDMLSL, SQDMLSL2
Page 1192
(vector)
SQDMULH (by element)
Signed saturating Doubling Multiply returning High half (by element). This instruction multiplies each vector element
in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, doubles
the results, places the most significant half of the final results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SQRDMULH.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
op
Scalar
SQDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
op
SQDMULH (by element) Page 1193

Vector
SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
integer product;
boolean sat;

product = (2 * element1 * element2) + round_const;
// The following only saturates if element1 and element2 equal -(2^(esize-1))
(Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
V[d] = result;

SQDMULH (vector)
Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding
elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results
The results are truncated. For rounded results, see SQRDMULH.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
Scalar
SQDMULH <V><d>, <V><n>, <V><m>
boolean rounding = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
Vector
SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
SQDMULH (vector) Page 1196

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
V[d] = result;
SQDMULH (vector) Page 1197

SQDMULL, SQDMULL2 (by element)
Signed saturating Doubling Multiply Long (by element). This instruction multiplies each vector element in the lower or
register, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP
register. All the values in this instruction are signed integer values.
bit FPSR.QC is set.
The SQDMULL instruction extracts the first source vector from the lower half of the first source register, while the
SQDMULL2 instruction extracts the first source vector from the upper half of the first source register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd
Scalar
SQDMULL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd
SQDMULL, SQDMULL2 (by

Page 1198
element)
Vector
SQDMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

Page 1199
element)
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation

integer element1;
integer element2;
boolean sat;

(product, sat) = SignedSatQ(2 * element1 * element2, 2 * esize);
V[d] = result;

Page 1200
element)
SQDMULL, SQDMULL2 (vector)
Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes
the vector to the destination SIMD&FP register.
bit FPSR.QC is set.
The SQDMULL instruction extracts each source vector from the lower half of each source register, while the SQDMULL2
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd
Scalar
SQDMULL <Va><d>, <Vb><n>, <Vb><m>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd
Vector
SQDMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
SQDMULL, SQDMULL2
Page 1201
(vector)
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
Operation
integer element1;
integer element2;
boolean sat;
(product, sat) = SignedSatQ(2 * element1 * element2, 2 * esize);
V[d] = result;
SQDMULL, SQDMULL2
Page 1202
(vector)
SQNEG
Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates
each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are signed integer values.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
Scalar
SQNEG <V><d>, <V><n>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
Vector
SQNEG <Vd>.<T>, <Vn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
SQNEG Page 1203

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element;
boolean sat;
if neg then
element = -element;
else
(Elem[result, e, esize], sat) = SignedSatQ(element, esize);
V[d] = result;
SQNEG Page 1204

SQRDMLAH (by element)
Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element). This instruction
multiplies the vector elements of the first source SIMD&FP register with the value of a vector element of the second
source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most
significant half of the final results with the vector elements of the destination SIMD&FP register. The results are
rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Scalar
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S
Scalar
SQRDMLAH <V><d>, <V><n>, <Vm>.<Ts>[<index>]
if !HaveQRDMLAHExt() then UNDEFINED;

integer index;
bit Rmhi;
case size of

boolean rounding = TRUE;

Vector
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S
SQRDMLAH (by element) Page 1205

Vector
SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
boolean sat;

if sub_op then
accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
else
accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
(Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
V[d] = result;

SQRDMLAH (vector)
Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies
the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant
half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
Scalar
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S
Scalar
SQRDMLAH <V><d>, <V><n>, <V><m>
Vector
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S
Vector
SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
SQRDMLAH (vector) Page 1208

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer element3;
integer product;
boolean sat;
if sub_op then
else
V[d] = result;
SQRDMLAH (vector) Page 1209

SQRDMLSH (by element)
Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element). This instruction multiplies
the vector elements of the first source SIMD&FP register with the value of a vector element of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
Scalar
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S
Scalar
SQRDMLSH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Vector
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S
SQRDMLSH (by element) Page 1210

Vector
SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
integer element3;
integer product;
boolean sat;

if sub_op then
else
V[d] = result;

SQRDMLSH (vector)
Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the
vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
Scalar
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S
Scalar
SQRDMLSH <V><d>, <V><n>, <V><m>
Vector
(Armv8.1)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S
Vector
SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
SQRDMLSH (vector) Page 1213

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer element3;
integer product;
boolean sat;
if sub_op then
else
V[d] = result;
SQRDMLSH (vector) Page 1214

SQRDMULH (by element)
Signed saturating Rounding Doubling Multiply returning High half (by element). This instruction multiplies each
vector element in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to
The results are rounded. For truncated results, see SQDMULH.
If any of the results overflows, they are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op
Scalar
SQRDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op
SQRDMULH (by element) Page 1215

Vector
SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;
integer product;
boolean sat;

// The following only saturates if element1 and element2 equal -(2^(esize-1))
V[d] = result;

SQRDMULH (vector)
Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of
corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of
the final results into a vector, and writes the vector to the destination SIMD&FP register.
The results are rounded. For truncated results, see SQDMULH.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
Scalar
SQRDMULH <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
Vector
SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 RESERVED
SQRDMULH (vector) Page 1218

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
V[d] = result;
SQRDMULH (vector) Page 1219

SQRSHL
Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the
second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see SQSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
Scalar
SQRSHL <V><d>, <V><n>, <V><m>
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
Vector
SQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
SQRSHL Page 1220

size <V>
00 B
01 H
10 S
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer round_const = 0;
integer shift;
integer element;
boolean sat;
shift = SInt(Elem[operand2, e, esize]<7:0>);
if rounding then
round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
if saturating then
(Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
else
V[d] = result;
SQRSHL Page 1221

SQRSHRN, SQRSHRN2
Signed saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are half
as long as the source vector elements. The results are rounded. For truncated results, see SQSHRN.
The SQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
Scalar
SQRSHRN <Vb><d>, <Va><n>, #<shift>
if immh == '0000' then UNDEFINED;

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
Vector
SQRSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


SQRSHRN, SQRSHRN2 Page 1222

Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<Va> Is the source width specifier, encoded in “immh”:
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
immh <shift>
0000 RESERVED
1xxx RESERVED
For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
immh <shift>
1xxx RESERVED

Operation
integer element;
boolean sat;
element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;

SQRSHRUN, SQRSHRUN2
Signed saturating Rounded Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value
in the vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are rounded. For truncated results, see SQSHRUN.
The SQRSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other
bits of the register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
Scalar
SQRSHRUN <Vb><d>, <Va><n>, #<shift>

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
Vector
SQRSHRUN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


SQRSHRUN, SQRSHRUN2 Page 1225

Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
immh <shift>
0000 RESERVED
1xxx RESERVED
immh <shift>
1xxx RESERVED

Operation
integer element;
boolean sat;
element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
(Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);

SQSHL (immediate)
Signed saturating Shift Left (immediate). This instruction reads each vector element in the source SIMD&FP register,
shifts each result by an immediate value, places the final result in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see UQRSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
Scalar
SQSHL <V><d>, <V><n>, #<shift>

boolean src_unsigned;
boolean dst_unsigned;
case op:U of
when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
SQSHL (immediate) Page 1228

Vector
SQSHL <Vd>.<T>, <Vn>.<T>, #<shift>

case op:U of
Assembler Symbols

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
immh <shift>
0000 RESERVED

immh <shift>
Operation
integer element;
boolean sat;
element = Int(Elem[operand, e, esize], src_unsigned) << shift;
(Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
V[d] = result;

SQSHL (register)
Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts each element by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see SQRSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
Scalar
SQSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
Vector
SQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
SQSHL (register) Page 1231

size <V>
00 B
01 H
10 S
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
SQSHL (register) Page 1232

SQSHLU
Signed saturating Shift Left Unsigned (immediate). This instruction reads each signed integer value in the vector of
the source SIMD&FP register, shifts each value by an immediate value, saturates the shifted result to an unsigned
integer value, places the result in a vector, and writes the vector to the destination SIMD&FP register. The results are
truncated. For rounded results, see UQRSHL.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 1 0 0 1 Rn Rd
U immh op
Scalar
SQSHLU <V><d>, <V><n>, #<shift>

case op:U of
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 1 0 0 1 Rn Rd
U immh op
SQSHLU Page 1233

Vector
SQSHLU <Vd>.<T>, <Vn>.<T>, #<shift>

case op:U of
Assembler Symbols

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0000 RESERVED
SQSHLU Page 1234

immh <shift>
Operation
integer element;
boolean sat;
V[d] = result;
SQSHLU Page 1235

SQSHRN, SQSHRN2
Signed saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts and truncates each result by an immediate value, saturates each shifted result to a value that is
half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector
elements are half as long as the source vector elements. For rounded results, see SQRSHRN.
The SQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
Scalar
SQSHRN <Vb><d>, <Va><n>, #<shift>

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
Vector
SQSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


SQSHRN, SQSHRN2 Page 1236

Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
immh <shift>
0000 RESERVED
1xxx RESERVED
immh <shift>
1xxx RESERVED

Operation
integer element;
boolean sat;

SQSHRUN, SQSHRUN2
Signed saturating Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the
vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are truncated. For rounded results, see SQRSHRUN.
The SQSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
Scalar
SQSHRUN <Vb><d>, <Va><n>, #<shift>

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
Vector
SQSHRUN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


SQSHRUN, SQSHRUN2 Page 1239

Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
immh <shift>
0000 RESERVED
1xxx RESERVED
immh <shift>
1xxx RESERVED

Operation
integer element;
boolean sat;
element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
(Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);

SQSUB
Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
Scalar
SQSUB <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
Vector
SQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
SQSUB Page 1242

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
integer diff;
boolean sat;
(Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
V[d] = result;
SQSUB Page 1243

SQXTN, SQXTN2
Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or
upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector
elements. All the values in this instruction are signed integer values.
bit FPSR.QC is set.
The SQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
Scalar
SQXTN <Vb><d>, <Va><n>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
Vector
SQXTN{2} <Vd>.<Tb>, <Vn>.<Ta>

Assembler Symbols
SQXTN, SQXTN2 Page 1244

Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vb> Is the destination width specifier, encoded in “size”:

size <Vb>
00 B
01 H
10 S
11 RESERVED
<Va> Is the source width specifier, encoded in “size”:
size <Va>
00 H
01 S
10 D
11 RESERVED
Operation
bits(2*esize) element;
boolean sat;
element = Elem[operand, e, 2*esize];
(Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
SQXTN, SQXTN2 Page 1245

SQXTUN, SQXTUN2
Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the
source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The
destination vector elements are half as long as the source vector elements.
The SQXTUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQXTUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd
Scalar
SQXTUN <Vb><d>, <Va><n>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd
Vector
SQXTUN{2} <Vd>.<Tb>, <Vn>.<Ta>

Assembler Symbols
Q 2
0 [absent]
1 [present]
SQXTUN, SQXTUN2 Page 1246

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

size <Vb>
00 B
01 H
10 S
11 RESERVED
size <Va>
00 H
01 S
10 D
11 RESERVED
Operation
boolean sat;
(Elem[result, e, esize], sat) = UnsignedSatQ(SInt(element), esize);
SQXTUN, SQXTUN2 Page 1247

SRHADD
Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
The results are rounded. For truncated results, see SHADD.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 1 Rn Rd
U
SRHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
Elem[result, e, esize] = (element1+element2+1)<esize:1>;
V[d] = result;
SRHADD Page 1248

SRI
Shift Right and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the right of each vector element of the source register are lost.
The following figure shows an example of the operation of shift right by 3 for an 8-bit vector element.
63 5655 0
Vn.B[7]
63 5655 0
Vd.B[7] after operation
63 5655 0
Vd.B[7] before operation
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
immh
Scalar
SRI <V><d>, <V><n>, #<shift>

integer shift = (esize * 2) - UInt(immh:immb);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
immh
SRI Page 1249

Vector
SRI <Vd>.<T>, <Vn>.<T>, #<shift>

Assembler Symbols

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
SRI Page 1250

Operation
bits(esize) mask = LSR(Ones(esize), shift);
bits(esize) shifted;
shifted = LSR(Elem[operand, e, esize], shift);
Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
If PSTATE.DIT is 1:
SRI Page 1251

SRSHL
Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. For a
truncating shift, see SSHL.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
Scalar
SRSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
Vector
SRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
SRSHL Page 1252

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
SRSHL Page 1253

SRSHR
Signed Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register,
right shifts each result by an immediate value, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are signed integer values. The results are rounded. For
truncated results, see SSHR.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
Scalar
SRSHR <V><d>, <V><n>, #<shift>


boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
Vector
SRSHR <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
SRSHR Page 1254

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;
operand2 = if accumulate then V[d] else Zeros();

element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;
V[d] = result;
SRSHR Page 1255

SRSRA
Signed Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, and accumulates the final results with the vector
elements of the destination SIMD&FP register. All the values in this instruction are signed integer values. The results
are rounded. For truncated results, see SSRA.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
Scalar
SRSRA <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
Vector
SRSRA <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
SRSRA Page 1256

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
SRSRA Page 1257

SSHL
Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP
register, shifts each value by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see SRSHL.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
Scalar
SSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
Vector
SSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
SSHL Page 1258

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
If PSTATE.DIT is 1:
SSHL Page 1259

SSHLL, SSHLL2
Signed Shift Left Long (immediate). This instruction reads each vector element from the source SIMD&FP register,
left shifts each vector element by the specified shift amount, places the result into a vector, and writes the vector to
the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
All the values in this instruction are signed integer values.
The SSHLL instruction extracts vector elements from the lower half of the source register, while the SSHLL2 instruction
This instruction is used by the alias SXTL, SXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
U immh
Vector
SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>


Assembler Symbols
Q 2
0 [absent]
1 [present]
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
SSHLL, SSHLL2 Page 1260

<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:
immh <shift>
1xxx RESERVED
Alias Conditions

SXTL, SXTL2 immb == '000' && BitCount(immh) == 1
Operation
bits(datasize*2) result;
integer element;
V[d] = result;
If PSTATE.DIT is 1:
SSHLL, SSHLL2 Page 1261

SSHR
Signed Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each result by an immediate value, places the final result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded
results, see SRSHR.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
Scalar
SSHR <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
Vector
SSHR <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
SSHR Page 1262

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
If PSTATE.DIT is 1:
SSHR Page 1263

SSRA
Signed Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of
the destination SIMD&FP register. All the values in this instruction are signed integer values. The results are
truncated. For rounded results, see SRSRA.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
Scalar
SSRA <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
Vector
SSRA <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
SSRA Page 1264

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
If PSTATE.DIT is 1:
SSRA Page 1265

SSUBL, SSUBL2
Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into
a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed
integer values. The destination vector elements are twice as long as the source vector elements.
The SSUBL instruction extracts each source vector from the lower half of each source register, while the SSUBL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 0 Rn Rd
U o1
SSUBL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SSUBL, SSUBL2 Page 1266

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SSUBL, SSUBL2 Page 1267

SSUBW, SSUBW2
Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a
vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer
values.
The SSUBW instruction extracts the second source vector from the lower half of the second source register, while the
SSUBW2 instruction extracts the second source vector from the upper half of the second source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 0 Rn Rd
U o1
SSUBW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
SSUBW, SSUBW2 Page 1268

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SSUBW, SSUBW2 Page 1269

ST1 (multiple structures)
Store multiple single-element structures from one, two, three, or four registers. This instruction stores elements to
memory from one, two, three, or four SIMD&FP registers, without interleaving. Every element of each register is
stored.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 x x 1 x size Rn Rt
L opcode
One register (opcode == 0111)
ST1 { <Vt>.<T> }, [<Xn|SP>]
Two registers (opcode == 1010)
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Three registers (opcode == 0110)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
Four registers (opcode == 0010)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm x x 1 x size Rn Rt
L opcode
ST1 (multiple structures) Page 1270

One register, immediate offset (Rm == 11111 && opcode == 0111)
ST1 { <Vt>.<T> }, [<Xn|SP>], <imm>
One register, register offset (Rm != 11111 && opcode == 0111)
ST1 { <Vt>.<T> }, [<Xn|SP>], <Xm>
Two registers, immediate offset (Rm == 11111 && opcode == 1010)
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
Two registers, register offset (Rm != 11111 && opcode == 1010)
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
Three registers, immediate offset (Rm == 11111 && opcode == 0110)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
Three registers, register offset (Rm != 11111 && opcode == 0110)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>
Four registers, immediate offset (Rm == 11111 && opcode == 0010)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
Four registers, register offset (Rm != 11111 && opcode == 0010)
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #8
1 #16
For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #24
1 #48
For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #32
1 #64
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

ST1 (single structure)
Store a single-element structure from one lane of one register. This instruction stores the specified element of a
SIMD&FP register to memory.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 0 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
ST1 { <Vt>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 010 && size == x0)
ST1 { <Vt>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 100 && size == 00)
ST1 { <Vt>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 100 && S == 0 && size == 01)
ST1 { <Vt>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 0 Rm x x 0 S size Rn Rt
L R opcode
ST1 (single structure) Page 1274

ST1 { <Vt>.B }[<index>], [<Xn|SP>], #1
ST1 { <Vt>.B }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.H }[<index>], [<Xn|SP>], #2
ST1 { <Vt>.H }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.S }[<index>], [<Xn|SP>], #4
ST1 { <Vt>.S }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.D }[<index>], [<Xn|SP>], #8
ST1 { <Vt>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store multiple 2-element structures from two registers. This instruction stores multiple 2-element structures from two
SIMD&FP registers to memory, with interleaving. Every element of each register is stored.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 size Rn Rt
L opcode
No offset
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 1 0 0 0 size Rn Rt
L opcode
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

Q <imm>
0 #16
1 #32
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store single 2-element structure from one lane of two registers. This instruction stores a 2-element structure to
memory from corresponding elements of two SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 1 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 010 && size == x0)
ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 100 && size == 00)
ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 100 && S == 0 && size == 01)
ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 1 Rm x x 0 S size Rn Rt
L R opcode

ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #2
ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>
ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #4
ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>
ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #8
ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>
ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #16
ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store multiple 3-element structures from three registers. This instruction stores multiple 3-element structures to
memory from three SIMD&FP registers, with interleaving. Every element of each register is stored.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 size Rn Rt
L opcode
No offset
ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 0 1 0 0 size Rn Rt
L opcode
ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

Q <imm>
0 #24
1 #48
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store single 3-element structure from one lane of three registers. This instruction stores a 3-element structure to
memory from corresponding elements of three SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 0 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 011 && size == x0)
ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 101 && size == 00)
ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 101 && S == 0 && size == 01)
ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 0 Rm x x 1 S size Rn Rt
L R opcode

ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3
ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>
ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6
ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>
ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12
ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>
ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24
ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store multiple 4-element structures from four registers. This instruction stores multiple 4-element structures to
memory from four SIMD&FP registers, with interleaving. Every element of each register is stored.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 size Rn Rt
L opcode
No offset
ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 0 0 0 0 size Rn Rt
L opcode
ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

Q <imm>
0 #32
1 #64
Shared Decode


case opcode of


Operation
bits(64) address;
bits(64) offs;
integer tt;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
tt = (t + r) MOD 32;
rval = V[tt];
V[tt] = rval;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

Store single 4-element structure from one lane of four registers. This instruction stores a 4-element structure to
memory from corresponding elements of four SIMD&FP registers.
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 1 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]
16-bit (opcode == 011 && size == x0)
ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]
32-bit (opcode == 101 && size == 00)
ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]
64-bit (opcode == 101 && S == 0 && size == 01)
ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 1 Rm x x 1 S size Rn Rt
L R opcode

ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4
ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>
ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8
ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>
ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16
ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>
ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32
ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>
Assembler Symbols

Shared Decode

integer index;
case scale of
when 3
scale = UInt(size);
replicate = TRUE;
when 0
when 1
when 2
else
scale = 3;


Operation
bits(64) address;
bits(64) offs;
bits(128) rval;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
t = (t + 1) MOD 32;
else
rval = V[t];
V[t] = rval;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
else

STNP (SIMD&FP)
Store Pair of SIMD&FP registers, with Non-temporal hint. This instruction stores a pair of SIMD&FP registers to
memory, issuing a hint to the memory system that the access is non-temporal. The address used for the store is
calculated from an address from a base register value and an immediate offset. For information about non-temporal
pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 0 0 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
STNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 01)
STNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
128-bit (opc == 10)
STNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]
// Empty.
Assembler Symbols
For the 128-bit variant: is the optional signed immediate byte offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
Shared Decode
STNP (SIMD&FP) Page 1299

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VECSTREAM] = data1;
Mem[address+dbytes, dbytes, AccType_VECSTREAM] = data2;
STNP (SIMD&FP) Page 1300

STP (SIMD&FP)
Store Pair of SIMD&FP registers. This instruction stores a pair of SIMD&FP registers to memory. The address used for
the store is calculated from a base register value and an immediate offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 1 0 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
STP <St1>, <St2>, [<Xn|SP>], #<imm>
64-bit (opc == 01)
STP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>
128-bit (opc == 10)
STP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 1 0 imm7 Rt2 Rn Rt
L
32-bit (opc == 00)
STP <St1>, <St2>, [<Xn|SP>, #<imm>]!
64-bit (opc == 01)
STP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!
128-bit (opc == 10)
STP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!

Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 0 0 imm7 Rt2 Rn Rt
L
STP (SIMD&FP) Page 1301

32-bit (opc == 00)
STP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
64-bit (opc == 01)
STP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
128-bit (opc == 10)
STP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

Assembler Symbols
For the 128-bit post-index and 128-bit pre-index variant: is the signed immediate byte offset, a multiple
of 16 in the range -1024 to 1008, encoded in the "imm7" field as <imm>/16.
Shared Decode

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VEC] = data1;
Mem[address+dbytes, dbytes, AccType_VEC] = data2;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;

STR (immediate, SIMD&FP)
Store SIMD&FP register (immediate offset). This instruction stores a single SIMD&FP register to memory. The
address that is used for the store is calculated from a base register value and an immediate offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 0 1 Rn Rt
opc
8-bit (size == 00 && opc == 00)
STR <Bt>, [<Xn|SP>], #<simm>
16-bit (size == 01 && opc == 00)
STR <Ht>, [<Xn|SP>], #<simm>
32-bit (size == 10 && opc == 00)
STR <St>, [<Xn|SP>], #<simm>
64-bit (size == 11 && opc == 00)
STR <Dt>, [<Xn|SP>], #<simm>
128-bit (size == 00 && opc == 10)
STR <Qt>, [<Xn|SP>], #<simm>

Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 1 1 Rn Rt
opc
STR (immediate, SIMD&FP) Page 1304

8-bit (size == 00 && opc == 00)
STR <Bt>, [<Xn|SP>, #<simm>]!
16-bit (size == 01 && opc == 00)
STR <Ht>, [<Xn|SP>, #<simm>]!
32-bit (size == 10 && opc == 00)
STR <St>, [<Xn|SP>, #<simm>]!
64-bit (size == 11 && opc == 00)
STR <Dt>, [<Xn|SP>, #<simm>]!
128-bit (size == 00 && opc == 10)
STR <Qt>, [<Xn|SP>, #<simm>]!

Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 1 x 0 imm12 Rn Rt
opc
8-bit (size == 00 && opc == 00)
STR <Bt>, [<Xn|SP>{, #<pimm>}]
16-bit (size == 01 && opc == 00)
STR <Ht>, [<Xn|SP>{, #<pimm>}]
32-bit (size == 10 && opc == 00)
STR <St>, [<Xn|SP>{, #<pimm>}]
64-bit (size == 11 && opc == 00)
STR <Dt>, [<Xn|SP>{, #<pimm>}]
128-bit (size == 00 && opc == 10)
STR <Qt>, [<Xn|SP>{, #<pimm>}]


Assembler Symbols
<pimm> For the 8-bit variant: is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to
0 and encoded in the "imm12" field.
Shared Decode

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
case memop of
when MemOp_STORE
data = V[t];
when MemOp_LOAD
V[t] = data;
if wback then
if postindex then
if n == 31 then
SP[] = address;
else
X[n] = address;

STR (register, SIMD&FP)
Store SIMD&FP register (register offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an offset register value. The offset can be
optionally shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 1 Rm option S 1 0 Rn Rt
opc
8-fsreg,STR-8-fsreg (size == 00 && opc == 00 && option != 011)
STR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
8-fsreg,STR-8-fsreg (size == 00 && opc == 00 && option == 011)
STR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]
16-fsreg,STR-16-fsreg (size == 01 && opc == 00)
STR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
STR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
STR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
STR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

Assembler Symbols
"Rm" field.
"Rm" field.
<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:
STR (register, SIMD&FP) Page 1308

option <extend>
010 UXTW
110 SXTW
111 SXTX
For the 128-bit, 16-bit, 32-bit and 64-bit variant: is the index extend/shift specifier, defaulting to LSL,
and which must be omitted for the LSL option when <amount> is omitted. encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 8-bit variant: is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if
present.
S <amount>
0 #0
1 #1
S <amount>
0 #0
1 #2
S <amount>
0 #0
1 #3
S <amount>
0 #0
1 #4
Shared Decode

Operation

bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = V[t];
when MemOp_LOAD
V[t] = data;

STUR (SIMD&FP)
Store SIMD&FP register (unscaled offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an optional immediate offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 0 0 Rn Rt
opc
8-bit (size == 00 && opc == 00)
STUR <Bt>, [<Xn|SP>{, #<simm>}]
16-bit (size == 01 && opc == 00)
STUR <Ht>, [<Xn|SP>{, #<simm>}]
32-bit (size == 10 && opc == 00)
STUR <St>, [<Xn|SP>{, #<simm>}]
64-bit (size == 11 && opc == 00)
STUR <Dt>, [<Xn|SP>{, #<simm>}]
128-bit (size == 00 && opc == 10)
STUR <Qt>, [<Xn|SP>{, #<simm>}]

Assembler Symbols
the "imm9" field.
Shared Decode
STUR (SIMD&FP) Page 1311

Operation
bits(64) address;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = V[t];
when MemOp_LOAD
V[t] = data;
STUR (SIMD&FP) Page 1312

SUB (vector)
Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the
corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
Scalar
SUB <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
Vector
SUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
SUB (vector) Page 1313

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
SUB (vector) Page 1314

SUBHN, SUBHN2
Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP
register from the corresponding vector element in the first source SIMD&FP register, places the most significant half
of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the
values in this instruction are signed integer values.
The results are truncated. For rounded results, see RSUBHN.
The SUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 0 0 Rn Rd
U o1
SUBHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
SUBHN, SUBHN2 Page 1315

Operation
bits(2*esize) sum;
if sub_op then
else
If PSTATE.DIT is 1:
SUBHN, SUBHN2 Page 1316

SUDOT (by element)
Dot product index form with signed and unsigned integers. This instruction performs the dot product of the four
signed 8-bit integer values in each 32-bit element of the first source register with the four unsigned 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination vector.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 1 1 1 1 H 0 Rn Rd
US
Vector
SUDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

boolean op1_unsigned = (US == '1');
Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
<index> Is the immediate index of a quadtuplet of four 8-bit elements in the range 0 to 3, encoded in the "H:L"
fields.
SUDOT (by element) Page 1317

Operation
bits(32) res = Elem[operand3, e, 32];
for b = 0 to 3
integer element1 = Int(Elem[operand1, 4*e+b, 8], op1_unsigned);
integer element2 = Int(Elem[operand2, 4*i+b, 8], op2_unsigned);
Elem[result, e, 32] = res;
V[d] = result;
SUDOT (by element) Page 1318

SUQADD
Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector
elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the
destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Scalar
SUQADD <V><d>, <V><n>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Vector
SUQADD <Vd>.<T>, <Vn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
SUQADD Page 1319

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation

integer op1;
integer op2;
boolean sat;
op1 = Int(Elem[operand, e, esize], !unsigned);
op2 = Int(Elem[operand2, e, esize], unsigned);
(Elem[result, e, esize], sat) = SatQ(op1 + op2, esize, unsigned);
V[d] = result;
SUQADD Page 1320

SXTL, SXTL2
Signed extend Long. This instruction duplicates each vector element in the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
The SXTL instruction extracts the source vector from the lower half of the source register, while the SXTL2 instruction
extracts the source vector from the upper half of the source register.
This is an alias of SSHLL, SSHLL2. This means:
• The encodings in this description are named to match the encodings of SSHLL, SSHLL2.
• The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 0 0 0 1 0 1 0 0 1 Rn Rd
U immh immb
Vector
SXTL{2} <Vd>.<Ta>, <Vn>.<Tb>
is equivalent to
SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0
and is the preferred disassembly when BitCount(immh) == 1.
Assembler Symbols
Q 2
0 [absent]
1 [present]
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
Operation
The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.
SXTL, SXTL2 Page 1321

If PSTATE.DIT is 1:
SXTL, SXTL2 Page 1322

TBL
Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP
register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source
table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP
register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is
used to describe the table, the first source register describes the lowest bytes of the table.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 Rm 0 len 0 0 0 Rn Rd
op
Two register table (len == 01)
TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>
Three register table (len == 10)
TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>
Four register table (len == 11)
TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>
Single register table (len == 00)
TBL <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>

integer regs = UInt(len) + 1;
boolean is_tbl = (op == '0');
Assembler Symbols
Q <Ta>
0 8B
1 16B
<Vn> For the four register table, three register table and two register table variant: is the name of the first
SIMD&FP table register, encoded in the "Rn" field.
For the single register table variant: is the name of the SIMD&FP table register, encoded in the "Rn"
field.
<Vn+1> Is the name of the second SIMD&FP table register, encoded as "Rn" plus 1 modulo 32.
<Vn+2> Is the name of the third SIMD&FP table register, encoded as "Rn" plus 2 modulo 32.
<Vn+3> Is the name of the fourth SIMD&FP table register, encoded as "Rn" plus 3 modulo 32.
<Vm> Is the name of the SIMD&FP index register, encoded in the "Rm" field.
TBL Page 1323

Operation
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
integer index;
// Create table from registers

for i = 0 to regs-1
table<128*i+127:128*i> = V[n];
n = (n + 1) MOD 32;
result = if is_tbl then Zeros() else V[d];

for i = 0 to elements-1
index = UInt(Elem[indices, i, 8]);
if index < 16 * regs then
Elem[result, i, 8] = Elem[table, index, 8];
V[d] = result;
If PSTATE.DIT is 1:
TBL Page 1324

TBX
Table vector lookup extension. This instruction reads each value from the vector elements in the index source
SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four
source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination
SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination
register is left unchanged. If more than one source register is used to describe the table, the first source register
describes the lowest bytes of the table.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 Rm 0 len 1 0 0 Rn Rd
op
Two register table (len == 01)
TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>
Three register table (len == 10)
TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>
Four register table (len == 11)
TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>
Single register table (len == 00)
TBX <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>

integer regs = UInt(len) + 1;
boolean is_tbl = (op == '0');
Assembler Symbols
Q <Ta>
0 8B
1 16B
<Vn> For the four register table, three register table and two register table variant: is the name of the first
SIMD&FP table register, encoded in the "Rn" field.
For the single register table variant: is the name of the SIMD&FP table register, encoded in the "Rn"
field.
<Vn+1> Is the name of the second SIMD&FP table register, encoded as "Rn" plus 1 modulo 32.
<Vn+2> Is the name of the third SIMD&FP table register, encoded as "Rn" plus 2 modulo 32.
<Vn+3> Is the name of the fourth SIMD&FP table register, encoded as "Rn" plus 3 modulo 32.
<Vm> Is the name of the SIMD&FP index register, encoded in the "Rm" field.
TBX Page 1325

Operation
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
integer index;
// Create table from registers

for i = 0 to regs-1
table<128*i+127:128*i> = V[n];
n = (n + 1) MOD 32;
result = if is_tbl then Zeros() else V[d];

for i = 0 to elements-1
index = UInt(Elem[indices, i, 8]);
if index < 16 * regs then
Elem[result, i, 8] = Elem[table, index, 8];
V[d] = result;
If PSTATE.DIT is 1:
TBX Page 1326

TRN1
Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two
source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the
vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-
numbered elements of the destination vector, starting at zero, while vector elements from the second source register
are placed into odd-numbered elements of the destination vector.
By using this instruction with TRN2, a 2 x 2 matrix can be transposed.
The following figure shows an example of the operation of TRN1 and TRN2 halfword operations where Q = 0.
TRN1.16 TRN2.16
Vn Vn
3 2 1 0 3 2 1 0
Vd Vd
Vm Vm
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 0 1 0 1 0 Rn Rd
op
Advanced SIMD
TRN1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer part = UInt(op);
integer pairs = elements DIV 2;
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
TRN1 Page 1327

Operation
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
V[d] = result;
If PSTATE.DIT is 1:
TRN1 Page 1328

TRN2
Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two
source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the
destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements
of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-
numbered elements of the destination vector.
By using this instruction with TRN1, a 2 x 2 matrix can be transposed.
The following figure shows an example of the operation of TRN1 and TRN2 halfword operations where Q = 0.
TRN1.16 TRN2.16
Vn Vn
3 2 1 0 3 2 1 0
Vd Vd
Vm Vm
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 1 1 0 1 0 Rn Rd
op
Advanced SIMD
TRN2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
TRN2 Page 1329

Operation
V[d] = result;
If PSTATE.DIT is 1:
TRN2 Page 1330

UABA
Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second
source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 1 1 Rn Rd
U ac
UABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;

V[d] = result;
UABA Page 1331

If PSTATE.DIT is 1:
UABA Page 1332

UABAL, UABAL2
Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or
upper half of the second source SIMD&FP register from the corresponding vector elements of the first source
SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in
this instruction are unsigned integer values.
The UABAL instruction extracts each source vector from the lower half of each source register, while the UABAL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 0 0 Rn Rd
U op
UABAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UABAL, UABAL2 Page 1333

Operation
integer element1;
integer element2;

V[d] = result;
If PSTATE.DIT is 1:
UABAL, UABAL2 Page 1334

UABD
Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the the absolute
values of the results into a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 0 1 Rn Rd
U ac
UABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;

V[d] = result;
UABD Page 1335

If PSTATE.DIT is 1:
UABD Page 1336

UABDL, UABDL2
Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the
second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register,
places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements. All the values in this instruction are
unsigned integer values.
The UABDL instruction extracts each source vector from the lower half of each source register, while the UABDL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 0 0 Rn Rd
U op
UABDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UABDL, UABDL2 Page 1337

Operation
integer element1;
integer element2;

V[d] = result;
If PSTATE.DIT is 1:
UABDL, UABDL2 Page 1338

UADALP
Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the
vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 0 1 0 Rn Rd
U op
Vector
UADALP <Vd>.<Ta>, <Vn>.<Tb>

Assembler Symbols
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UADALP Page 1339

Operation
bits(2*esize) sum;
integer op1;
integer op2;

if acc then
else
V[d] = result;
If PSTATE.DIT is 1:
UADALP Page 1340

UADDL, UADDL2
Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements. All the values in this instruction are unsigned integer values.
The UADDL instruction extracts each source vector from the lower half of each source register, while the UADDL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 0 0 Rn Rd
U o1
UADDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UADDL, UADDL2 Page 1341

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UADDL, UADDL2 Page 1342

UADDLP
Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the
source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
The destination vector elements are twice as long as the source vector elements.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 0 1 0 Rn Rd
U op
Vector
UADDLP <Vd>.<Ta>, <Vn>.<Tb>

Assembler Symbols
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UADDLP Page 1343

Operation
bits(2*esize) sum;
integer op1;
integer op2;

if acc then
else
V[d] = result;
If PSTATE.DIT is 1:
UADDLP Page 1344

UADDLV
Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register
together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as
the source vector elements. All the values in this instruction are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Advanced SIMD
UADDLV <V><d>, <Vn>.<T>

Assembler Symbols

size <V>
00 H
01 S
10 D
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer sum;
sum = Int(Elem[operand, 0, esize], unsigned);

sum = sum + Int(Elem[operand, e, esize], unsigned);
V[d] = sum<2*esize-1:0>;
UADDLV Page 1345

If PSTATE.DIT is 1:
UADDLV Page 1346

UADDW, UADDW2
Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the
corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register
and the first source register are twice as long as the vector elements of the second source register. All the values in
this instruction are unsigned integer values.
The UADDW instruction extracts vector elements from the lower half of the second source register, while the UADDW2
instruction extracts vector elements from the upper half of the second source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 1 0 0 Rn Rd
U o1
UADDW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UADDW, UADDW2 Page 1347

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UADDW, UADDW2 Page 1348

UCVTF (vector, fixed-point)
Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
Scalar
UCVTF <V><d>, <V><n>, #<fbits>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
Vector
UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>


Assembler Symbols
UCVTF (vector, fixed-point) Page 1349

immh <V>
000x RESERVED
001x H
01xx S
1xxx D
immh Q <T>
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
“immh:immb”:
immh <fbits>
000x RESERVED
“immh:immb”:
immh <fbits>
0001 RESERVED
Operation
Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, FPCR, rounding);
V[d] = result;
UCVTF (vector, fixed-point) Page 1350

UCVTF (vector, integer)
Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an
unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the

(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
UCVTF <Hd>, <Hn>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
UCVTF <V><d>, <V><n>


(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
UCVTF (vector, integer) Page 1351

UCVTF <Vd>.<T>, <Vn>.<T>
integer esize = 16;

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
UCVTF <Vd>.<T>, <Vn>.<T>

Assembler Symbols
sz <V>
0 S
1 D
Q <T>
0 4H
1 8H
“sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

Operation
Elem[result, e, esize] = FixedToFP(element, 0, unsigned, FPCR, rounding);
V[d] = result;

UCVTF (scalar, fixed-point)
Unsigned fixed-point Convert to Floating-point (scalar). This instruction converts the unsigned value in the 32-bit or
64-bit general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR,
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
rmode opcode
UCVTF (scalar, fixed-point) Page 1354

(Armv8.2)
UCVTF <Hd>, <Wn>, #<fbits>
UCVTF <Sd>, <Wn>, #<fbits>
UCVTF <Dd>, <Wn>, #<fbits>

(Armv8.2)
UCVTF <Hd>, <Xn>, #<fbits>
UCVTF <Sd>, <Xn>, #<fbits>
UCVTF <Dd>, <Xn>, #<fbits>

integer fltsize;
case ftype of
when '11'
fltsize = 16;
else
UNDEFINED;

Assembler Symbols
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
minus "scale".

For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
minus "scale".
Operation
intval = X[n];
fltval = FixedToFP(intval, fracbits, TRUE, FPCR, rounding);
V[d] = fltval;

UCVTF (scalar, integer)
Unsigned integer Convert to Floating-point (scalar). This instruction converts the unsigned integer value in the
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 1 1 0 0 0 0 0 0 Rn Rd
rmode opcode
UCVTF (scalar, integer) Page 1357

(Armv8.2)
UCVTF <Hd>, <Wn>
UCVTF <Sd>, <Wn>
UCVTF <Dd>, <Wn>

(Armv8.2)
UCVTF <Hd>, <Xn>
UCVTF <Sd>, <Xn>
UCVTF <Dd>, <Xn>

integer fltsize;
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
fltsize = 16;
else
UNDEFINED;
Assembler Symbols

Operation
intval = X[n];
fltval = FixedToFP(intval, 0, TRUE, FPCR, rounding);
V[d] = fltval;

UDOT (by element)
Dot Product unsigned arithmetic (vector, by element). This instruction performs the dot product of the four 8-bit
elements in each 32-bit element of the first source register with the four 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
support it.
Vector
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 1 0 H 0 Rn Rd
U
Vector
UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

integer index = UInt(H:L);

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
<index> Is the element index, encoded in the "H:L" fields.
UDOT (by element) Page 1360

Operation
bits(datasize) result = V[d];
integer res = 0;
for i = 0 to 3
if signed then
element2 = SInt(Elem[operand2, 4*index+i, esize DIV 4]);
else
element2 = UInt(Elem[operand2, 4*index+i, esize DIV 4]);
V[d] = result;
UDOT (by element) Page 1361

UDOT (vector)
Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit
elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding
32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the
support it.
Vector
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U
Vector
UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
UDOT (vector) Page 1362

Operation
result = V[d];
integer res = 0;
for i = 0 to 3
if signed then
else
V[d] = result;
UDOT (vector) Page 1363

UHADD
Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see URHADD.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U
UHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer sum;
Elem[result, e, esize] = sum<esize:1>;
V[d] = result;
UHADD Page 1364

If PSTATE.DIT is 1:
UHADD Page 1365

UHSUB
Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register
from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places
each result into a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U
UHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer diff;
Elem[result, e, esize] = diff<esize:1>;
V[d] = result;
If PSTATE.DIT is 1:
UHSUB Page 1366

UHSUB Page 1367

UMAX
Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1
UMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
UMAX Page 1368

If PSTATE.DIT is 1:
UMAX Page 1369

UMAXP
Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1
UMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
UMAXP Page 1370

If PSTATE.DIT is 1:
UMAXP Page 1371

UMAXV
Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op
Advanced SIMD
UMAXV <V><d>, <Vn>.<T>


Assembler Symbols

size <V>
00 B
01 H
10 S
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer maxmin;
integer element;

UMAXV Page 1372

If PSTATE.DIT is 1:
UMAXV Page 1373

UMIN
Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP
registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1
UMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
UMIN Page 1374

If PSTATE.DIT is 1:
UMIN Page 1375

UMINP
Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into
a vector, and writes the vector to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1
UMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
integer maxmin;
V[d] = result;
UMINP Page 1376

If PSTATE.DIT is 1:
UMINP Page 1377

UMINV
Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op
Advanced SIMD
UMINV <V><d>, <Vn>.<T>


Assembler Symbols

size <V>
00 B
01 H
10 S
11 RESERVED
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
integer maxmin;
integer element;

UMINV Page 1378

If PSTATE.DIT is 1:
UMINV Page 1379

UMLAL, UMLAL2 (by element)
Unsigned Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or
register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination
The UMLAL instruction extracts vector elements from the lower half of the first source register, while the UMLAL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2
Vector
UMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
UMLAL, UMLAL2 (by

Page 1380
element)
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UMLAL, UMLAL2 (by

Page 1381
element)
UMLAL, UMLAL2 (vector)
Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the
first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied.
The UMLAL instruction extracts vector elements from the lower half of the first source register, while the UMLAL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1
UMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UMLAL, UMLAL2 (vector) Page 1382

Operation
integer element1;
integer element2;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UMLAL, UMLAL2 (vector) Page 1383

UMLSL, UMLSL2 (by element)
Unsigned Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
The UMLSL instruction extracts vector elements from the lower half of the first source register, while the UMLSL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2
Vector
UMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
UMLSL, UMLSL2 (by

Page 1384
element)
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UMLSL, UMLSL2 (by

Page 1385
element)
UMLSL, UMLSL2 (vector)
Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are unsigned integer values.
The UMLSL instruction extracts each source vector from the lower half of each source register, while the UMLSL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
U o1
UMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UMLSL, UMLSL2 (vector) Page 1386

Operation
integer element1;
integer element2;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
UMLSL, UMLSL2 (vector) Page 1387

UMMLA (vector)
Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer
values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
U B
Vector
UMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B

Assembler Symbols
Operation
V[d] = MatMulAdd(addend, operand1, operand2, TRUE, TRUE);
UMMLA (vector) Page 1388

UMOV
Unsigned Move vector element to general-purpose register. This instruction reads the unsigned integer from the
source SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination
general-purpose register.
This instruction is used by the alias MOV (to general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 1 1 1 Rn Rd
32-bit (Q == 0)
UMOV <Wd>, <Vn>.<Ts>[<index>]
64-reg,UMOV-64-reg (Q == 1 && imm5 == x1000)
UMOV <Xd>, <Vn>.<Ts>[<index>]
integer size;
case Q:imm5 of
when '0xxxx1' size = 0; // UMOV Wd, Vn.B
when '0xxx10' size = 1; // UMOV Wd, Vn.H
when '0xx100' size = 2; // UMOV Wd, Vn.S
when '1x1000' size = 3; // UMOV Xd, Vn.D

Assembler Symbols
<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S
For the 64-reg,UMOV-64-reg variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
x0000 RESERVED
xxxx1 RESERVED
xxx10 RESERVED
xx100 RESERVED
x1000 D
<index> For the 32-bit variant: is the element index encoded in “imm5”:
UMOV Page 1389

imm5 <index>
xx000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
For the 64-reg,UMOV-64-reg variant: is the element index encoded in "imm5<4>".
Alias Conditions

MOV (to general) imm5 == 'x1000'
MOV (to general) imm5 == 'xx100'
Operation
X[d] = ZeroExtend(Elem[operand, index, esize], datasize);
If PSTATE.DIT is 1:
UMOV Page 1390

UMULL, UMULL2 (by element)
Unsigned Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half
of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places
the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The UMULL instruction extracts vector elements from the lower half of the first source register, while the UMULL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
U
Vector
UMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer index;
bit Rmhi;
case size of

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UMULL, UMULL2 (by

Page 1391
element)
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
integer element1;
integer element2;

V[d] = result;
If PSTATE.DIT is 1:
UMULL, UMULL2 (by

Page 1392
element)
UMULL, UMULL2 (vector)
Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half
of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this
instruction are unsigned integer values.
The UMULL instruction extracts each source vector from the lower half of each source register, while the UMULL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 1 0 0 0 0 Rn Rd
U
UMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
UMULL, UMULL2 (vector) Page 1393

Operation
integer element1;
integer element2;
Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;
V[d] = result;
If PSTATE.DIT is 1:
UMULL, UMULL2 (vector) Page 1394

UQADD
Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
Scalar
UQADD <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
Vector
UQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
UQADD Page 1395

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
integer sum;
boolean sat;
(Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
V[d] = result;
UQADD Page 1396

UQRSHL
Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source
SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector
element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see UQSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
Scalar
UQRSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
Vector
UQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
UQRSHL Page 1397

size <V>
00 B
01 H
10 S
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
UQRSHL Page 1398

UQRSHRN, UQRSHRN2
Unsigned saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the
source SIMD&FP register, right shifts each result by an immediate value, puts the final result into a vector, and writes
the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are
unsigned integer values. The results are rounded. For truncated results, see UQSHRN.
The UQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
Scalar
UQRSHRN <Vb><d>, <Va><n>, #<shift>

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
Vector
UQRSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


UQRSHRN, UQRSHRN2 Page 1399

Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
immh <shift>
0000 RESERVED
1xxx RESERVED
immh <shift>
1xxx RESERVED

Operation
integer element;
boolean sat;

UQSHL (immediate)
Unsigned saturating Shift Left (immediate). This instruction takes each vector element in the source SIMD&FP
register, shifts it by an immediate value, places the results in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see UQRSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
Scalar
UQSHL <V><d>, <V><n>, #<shift>

case op:U of
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
UQSHL (immediate) Page 1402

Vector
UQSHL <Vd>.<T>, <Vn>.<T>, #<shift>

case op:U of
Assembler Symbols

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0000 RESERVED

immh <shift>
Operation
integer element;
boolean sat;
V[d] = result;

UQSHL (register)
Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source
SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the
second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see UQRSHL.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
Scalar
UQSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
Vector
UQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
UQSHL (register) Page 1405

size <V>
00 B
01 H
10 S
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
UQSHL (register) Page 1406

UQSHRN, UQSHRN2
Unsigned saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see UQRSHRN.
The UQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
Scalar
UQSHRN <Vb><d>, <Va><n>, #<shift>

integer part = 0;

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
UQSHRN, UQSHRN2 Page 1407

Vector
UQSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>


Assembler Symbols
Q 2
0 [absent]
1 [present]
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

immh <shift>
0000 RESERVED
1xxx RESERVED
immh <shift>
1xxx RESERVED
Operation
integer element;
boolean sat;

UQSUB
Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
Scalar
UQSUB <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
Vector
UQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
UQSUB Page 1410

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer element1;
integer element2;
integer diff;
boolean sat;
(Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
V[d] = result;
UQSUB Page 1411

UQXTN, UQXTN2
Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates each value to half the original width, places the result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values.
The UQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
UQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
Scalar
UQXTN <Vb><d>, <Va><n>

integer part = 0;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
Vector
UQXTN{2} <Vd>.<Tb>, <Vn>.<Ta>

Assembler Symbols
UQXTN, UQXTN2 Page 1412

Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

size <Vb>
00 B
01 H
10 S
11 RESERVED
size <Va>
00 H
01 S
10 D
11 RESERVED
Operation
boolean sat;
(Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
UQXTN, UQXTN2 Page 1413

URECPE
Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register,
calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector
to the destination SIMD&FP register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
Vector
URECPE <Vd>.<T>, <Vn>.<T>

integer esize = 32;
Assembler Symbols
sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED
Operation
bits(32) element;
element = Elem[operand, e, 32];
Elem[result, e, 32] = UnsignedRecipEstimate(element);
V[d] = result;
URECPE Page 1414

URHADD
Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
The results are rounded. For truncated results, see UHADD.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 1 0 1 Rn Rd
U
URHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
integer element1;
integer element2;
Elem[result, e, esize] = (element1+element2+1)<esize:1>;
V[d] = result;
URHADD Page 1415

URSHL
Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts the vector element by a value from the least significant byte of the corresponding element of the second
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
Scalar
URSHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
Vector
URSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
URSHL Page 1416

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
URSHL Page 1417

URSHR
Unsigned Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are rounded.
For truncated results, see USHR.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
Scalar
URSHR <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
Vector
URSHR <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
URSHR Page 1418

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
URSHR Page 1419

URSQRTE
Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP
register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
Vector
URSQRTE <Vd>.<T>, <Vn>.<T>

integer esize = 32;
Assembler Symbols
sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED
Operation
bits(32) element;
element = Elem[operand, e, 32];
Elem[result, e, 32] = UnsignedRSqrtEstimate(element);
V[d] = result;
URSQRTE Page 1420

URSRA
Unsigned Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, and accumulates the final results with the vector
elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The
results are rounded. For truncated results, see USRA.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
Scalar
URSRA <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
Vector
URSRA <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
URSRA Page 1421

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
URSRA Page 1422

USDOT (by element)
Dot Product index form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination register.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 1 1 1 1 H 0 Rn Rd
US
Vector
USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
<index> Is the immediate index of a quadtuplet of four 8-bit elements in the range 0 to 3, encoded in the "H:L"
fields.
USDOT (by element) Page 1423

Operation
for b = 0 to 3
integer element1 = Int(Elem[operand1, 4*e+b, 8], op1_unsigned);
integer element2 = Int(Elem[operand2, 4*i+b, 8], op2_unsigned);
V[d] = result;
USDOT (by element) Page 1424

USDOT (vector)
Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in the corresponding 32-bit element of the second source register, accumulating the result into the
corresponding 32-bit element of the destination register.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 0 Rm 1 0 0 1 1 1 Rn Rd
Vector
USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

Assembler Symbols
Q <Ta>
0 2S
1 4S
Q <Tb>
0 8B
1 16B
Operation
for b = 0 to 3
integer element1 = UInt(Elem[operand1, 4*e+b, 8]);
integer element2 = SInt(Elem[operand2, 4*e+b, 8]);
V[d] = result;
USDOT (vector) Page 1425

USDOT (vector) Page 1426

USHL
Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register,
shifts each element by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see URSHL.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
Scalar
USHL <V><d>, <V><n>, <V><m>
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
Vector
USHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
Assembler Symbols

size <V>
0x RESERVED
10 RESERVED
11 D
USHL Page 1427

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation
integer shift;
integer element;
boolean sat;
if rounding then
if saturating then
else
V[d] = result;
If PSTATE.DIT is 1:
USHL Page 1428

USHLL, USHLL2
Unsigned Shift Left Long (immediate). This instruction reads each vector element in the lower or upper half of the
source SIMD&FP register, shifts the unsigned integer value left by the specified number of bits, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements.
The USHLL instruction extracts vector elements from the lower half of the source register, while the USHLL2 instruction
This instruction is used by the alias UXTL, UXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
U immh
Vector
USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>


Assembler Symbols
Q 2
0 [absent]
1 [present]
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
USHLL, USHLL2 Page 1429

<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:
immh <shift>
1xxx RESERVED
Alias Conditions

UXTL, UXTL2 immb == '000' && BitCount(immh) == 1
Operation
bits(datasize*2) result;
integer element;
V[d] = result;
If PSTATE.DIT is 1:
USHLL, USHLL2 Page 1430

USHR
Unsigned Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see URSHR.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
Scalar
USHR <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
Vector
USHR <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
USHR Page 1431

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
If PSTATE.DIT is 1:
USHR Page 1432

USMMLA (vector)
Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned
8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source
vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator
in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
Vector
(Armv8.6)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 1 1 Rn Rd
U B
Vector
USMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B

Assembler Symbols
Operation
V[d] = MatMulAdd(addend, operand1, operand2, TRUE, FALSE);
USMMLA (vector) Page 1433

USQADD
Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector
elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the
destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the
bit FPSR.QC is set.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Scalar
USQADD <V><d>, <V><n>

Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
Vector
USQADD <Vd>.<T>, <Vn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
USQADD Page 1434

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
Operation

integer op1;
integer op2;
boolean sat;
op1 = Int(Elem[operand, e, esize], !unsigned);
op2 = Int(Elem[operand2, e, esize], unsigned);
(Elem[result, e, esize], sat) = SatQ(op1 + op2, esize, unsigned);
V[d] = result;
USQADD Page 1435

USRA
Unsigned Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of
the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are
truncated. For rounded results, see URSRA.
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
Scalar
USRA <V><d>, <V><n>, #<shift>


Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
Vector
USRA <Vd>.<T>, <Vn>.<T>, #<shift>


Assembler Symbols
USRA Page 1436

immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
immh <shift>
0xxx RESERVED
“immh:immb”:
immh <shift>
Operation
integer element;

V[d] = result;
If PSTATE.DIT is 1:
USRA Page 1437

USUBL, USUBL2
Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second
source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
unsigned integer values. The destination vector elements are twice as long as the source vector elements.
The USUBL instruction extracts each source vector from the lower half of each source register, while the USUBL2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 0 0 Rn Rd
U o1
USUBL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
USUBL, USUBL2 Page 1438

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
USUBL, USUBL2 Page 1439

USUBW, USUBW2
Unsigned Subtract Wide. This instruction subtracts each vector element of the second source SIMD&FP register from
the corresponding vector element in the lower or upper half of the first source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed
integer values.
The vector elements of the destination register and the first source register are twice as long as the vector elements of
the second source register.
The USUBW instruction extracts vector elements from the lower half of the first source register, while the USUBW2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 0 0 Rn Rd
U o1
USUBW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>


Assembler Symbols
Q 2
0 [absent]
1 [present]
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
USUBW, USUBW2 Page 1440

Operation
integer element1;
integer element2;
integer sum;
if sub_op then
else
V[d] = result;
If PSTATE.DIT is 1:
USUBW, USUBW2 Page 1441

UXTL, UXTL2
Unsigned extend Long. This instruction copies each vector element from the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements.
The UXTL instruction extracts vector elements from the lower half of the source register, while the UXTL2 instruction
This is an alias of USHLL, USHLL2. This means:
• The encodings in this description are named to match the encodings of USHLL, USHLL2.
• The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 0 0 0 1 0 1 0 0 1 Rn Rd
U immh immb
Vector
UXTL{2} <Vd>.<Ta>, <Vn>.<Tb>
is equivalent to
USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0
and is the preferred disassembly when BitCount(immh) == 1.
Assembler Symbols
Q 2
0 [absent]
1 [present]
immh <Ta>
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh Q <Tb>
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
Operation
The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.
UXTL, UXTL2 Page 1442

If PSTATE.DIT is 1:
UXTL, UXTL2 Page 1443

UZP1
Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source
SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the
lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a
This instruction can be used with UZP2 to de-interleave two vectors.
The following figure shows an example of the operation of UZP1 and UZP2 with the arrangement specifier 8B.
Vn A7 A6 A5 A4 A3 A2 A1 A0
Vm B7 B6 B5 B4 B3 B2 B1 B0
UZP1.8, doubleword UZP2.8, doubleword

Vd B6 B4 B2 B0 A6 A4 A2 A0 Vd B7 B 5 B 3 B 1 A7 A5 A3 A1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 0 0 1 1 0 Rn Rd
op
Advanced SIMD
UZP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
UZP1 Page 1444

Operation
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize*2) zipped = operandh:operandl;

Elem[result, e, esize] = Elem[zipped, 2*e+part, esize];
V[d] = result;
If PSTATE.DIT is 1:
UZP1 Page 1445

UZP2
Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source
SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a
vector, and the result from the second source register into consecutive elements in the upper half of a vector, and
This instruction can be used with UZP1 to de-interleave two vectors.
The following figure shows an example of the operation of UZP1 and UZP2 with the arrangement specifier 8B.
Vn A7 A6 A5 A4 A3 A2 A1 A0
Vm B7 B6 B5 B4 B3 B2 B1 B0
UZP1.8, doubleword UZP2.8, doubleword

Vd B6 B4 B2 B0 A6 A4 A2 A0 Vd B7 B 5 B 3 B 1 A7 A5 A3 A1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 1 0 1 1 0 Rn Rd
op
Advanced SIMD
UZP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
UZP2 Page 1446

Operation
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize*2) zipped = operandh:operandl;

V[d] = result;
If PSTATE.DIT is 1:
UZP2 Page 1447

XAR
Exclusive OR and Rotate performs a bitwise exclusive OR of the 128-bit vectors in the two source SIMD&FP registers,
rotates each 64-bit element of the resulting 128-bit vector right by the value specified by a 6-bit immediate value, and
Advanced SIMD
(Armv8.2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 0 0 Rm imm6 Rn Rd
Advanced SIMD
XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>

Assembler Symbols
<imm6> Is a rotation right, encoded in "imm6".
Operation
bits(128) tmp;
tmp = Vn EOR Vm;
V[d] = ROR(tmp<127:64>, UInt(imm6)):ROR(tmp<63:0>, UInt(imm6));
If PSTATE.DIT is 1:
XAR Page 1448

XTN, XTN2
Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to
half the original width, places the result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
The XTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
XTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd
Vector
XTN{2} <Vd>.<Tb>, <Vn>.<Ta>

Assembler Symbols
Q 2
0 [absent]
1 [present]
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
XTN, XTN2 Page 1449

Operation
If PSTATE.DIT is 1:
XTN, XTN2 Page 1450

ZIP1
Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.
This instruction can be used with ZIP2 to interleave two vectors.
The following figure shows an example of the operation of ZIP1 and ZIP2 with the arrangement specifier 8B.
Vn A7 A6 A5 A4 A3 A2 A1 A0
Vm B7 B6 B5 B4 B3 B2 B1 B0
ZIP1.8, doubleword ZIP2.8, doubleword

Vd B3 A3 B2 A2 B1 A1 B0 A0 Vd B7 A7 B 6 A6 B 5 A5 B 4 A4
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 0 1 1 1 0 Rn Rd
op
Advanced SIMD
ZIP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
ZIP1 Page 1451

Operation
integer base = part * pairs;
Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
V[d] = result;
If PSTATE.DIT is 1:
ZIP1 Page 1452

ZIP2
Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.
This instruction can be used with ZIP1 to interleave two vectors.
The following figure shows an example of the operation of ZIP1 and ZIP2 with the arrangement specifier 8B.
Vn A7 A6 A5 A4 A3 A2 A1 A0
Vm B7 B6 B5 B4 B3 B2 B1 B0
ZIP1.8, doubleword ZIP2.8, doubleword

Vd B3 A3 B2 A2 B1 A1 B0 A0 Vd B7 A7 B 6 A6 B 5 A5 B 4 A4
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 1 1 1 1 0 Rn Rd
op
Advanced SIMD
ZIP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
ZIP2 Page 1453

Operation
V[d] = result;
If PSTATE.DIT is 1:
ZIP2 Page 1454

A64 -- SVE Instructions (alphabetic order)
ABS: Absolute value (predicated).
ADD (immediate): Add immediate (unpredicated).
ADD (vectors, predicated): Add vectors (predicated).
ADD (vectors, unpredicated): Add vectors (unpredicated).
ADDPL: Add multiple of predicate register size to scalar register.
ADDVL: Add multiple of vector register size to scalar register.
ADR: Compute vector address.
AND (immediate): Bitwise AND with immediate (unpredicated).
AND (vectors, predicated): Bitwise AND vectors (predicated).
AND (vectors, unpredicated): Bitwise AND vectors (unpredicated).
AND, ANDS (predicates): Bitwise AND predicates.
ANDV: Bitwise AND reduction to scalar.
ASR (immediate, predicated): Arithmetic shift right by immediate (predicated).
ASR (immediate, unpredicated): Arithmetic shift right by immediate (unpredicated).
ASR (vectors): Arithmetic shift right by vector (predicated).
ASR (wide elements, predicated): Arithmetic shift right by 64-bit wide elements (predicated).
ASR (wide elements, unpredicated): Arithmetic shift right by 64-bit wide elements (unpredicated).
ASRD: Arithmetic shift right for divide by immediate (predicated).
ASRR: Reversed arithmetic shift right by vector (predicated).
BFCVT: Floating-point down convert to BFloat16 format (predicated).
BFCVTNT: Floating-point down convert and narrow to BFloat16 (top, predicated).
BFDOT (indexed): BFloat16 floating-point indexed dot product.
BFDOT (vectors): BFloat16 floating-point dot product.
BFMLALB (indexed): BFloat16 floating-point multiply-add long to single-precision (bottom, indexed).
BFMLALB (vectors): BFloat16 floating-point multiply-add long to single-precision (bottom).
BFMLALT (indexed): BFloat16 floating-point multiply-add long to single-precision (top, indexed).
BFMLALT (vectors): BFloat16 floating-point multiply-add long to single-precision (top).
BFMMLA: BFloat16 floating-point matrix multiply-accumulate.
BIC (immediate): Bitwise clear bits using immediate (unpredicated): an alias of AND (immediate).
BIC (vectors, predicated): Bitwise clear vectors (predicated).
BIC (vectors, unpredicated): Bitwise clear vectors (unpredicated).
BIC, BICS (predicates): Bitwise clear predicates.
BRKA, BRKAS: Break after first true condition.
BRKB, BRKBS: Break before first true condition.
Page 1455
BRKN, BRKNS: Propagate break to next partition.
BRKPA, BRKPAS: Break after first true condition, propagating from previous partition.
BRKPB, BRKPBS: Break before first true condition, propagating from previous partition.
CLASTA (scalar): Conditionally extract element after last to general-purpose register.
CLASTA (SIMD&FP scalar): Conditionally extract element after last to SIMD&FP scalar register.
CLASTA (vectors): Conditionally extract element after last to vector register.
CLASTB (scalar): Conditionally extract last element to general-purpose register.
CLASTB (SIMD&FP scalar): Conditionally extract last element to SIMD&FP scalar register.
CLASTB (vectors): Conditionally extract last element to vector register.
CLS: Count leading sign bits (predicated).
CLZ: Count leading zero bits (predicated).
CMP<cc> (immediate): Compare vector to immediate.
CMP<cc> (vectors): Compare vectors.
CMP<cc> (wide elements): Compare vector to 64-bit wide elements.
CMPLE (vectors): Compare signed less than or equal to vector, setting the condition flags: an alias of CMP<cc>
(vectors).
CMPLO (vectors): Compare unsigned lower than vector, setting the condition flags: an alias of CMP<cc> (vectors).
CMPLS (vectors): Compare unsigned lower or same as vector, setting the condition flags: an alias of CMP<cc>
(vectors).
CMPLT (vectors): Compare signed less than vector, setting the condition flags: an alias of CMP<cc> (vectors).
CNOT: Logically invert boolean condition in vector (predicated).
CNT: Count non-zero bits (predicated).
CNTB, CNTD, CNTH, CNTW: Set scalar to multiple of predicate constraint element count.
CNTP: Set scalar to count of true predicate elements.
COMPACT: Shuffle active elements of vector to the right and fill with zero.
CPY (immediate, merging): Copy signed integer immediate to vector elements (merging).
CPY (immediate, zeroing): Copy signed integer immediate to vector elements (zeroing).
CPY (scalar): Copy general-purpose register to vector elements (predicated).
CPY (SIMD&FP scalar): Copy SIMD&FP scalar register to vector elements (predicated).
CTERMEQ, CTERMNE: Compare and terminate loop.
DECB, DECD, DECH, DECW (scalar): Decrement scalar by multiple of predicate constraint element count.
DECD, DECH, DECW (vector): Decrement vector by multiple of predicate constraint element count.
DECP (scalar): Decrement scalar by count of true predicate elements.
DECP (vector): Decrement vector by count of true predicate elements.
DUP (immediate): Broadcast signed immediate to vector elements (unpredicated).
DUP (indexed): Broadcast indexed element to vector (unpredicated).
DUP (scalar): Broadcast general-purpose register to vector elements (unpredicated).
Page 1456
DUPM: Broadcast logical bitmask immediate to vector (unpredicated).
EON: Bitwise exclusive OR with inverted immediate (unpredicated): an alias of EOR (immediate).
EOR (immediate): Bitwise exclusive OR with immediate (unpredicated).
EOR (vectors, predicated): Bitwise exclusive OR vectors (predicated).
EOR (vectors, unpredicated): Bitwise exclusive OR vectors (unpredicated).
EOR, EORS (predicates): Bitwise exclusive OR predicates.
EORV: Bitwise exclusive OR reduction to scalar.
EXT: Extract vector from pair of vectors.
FABD: Floating-point absolute difference (predicated).
FABS: Floating-point absolute value (predicated).
FAC<cc>: Floating-point absolute compare vectors.
FACLE: Floating-point absolute compare less than or equal: an alias of FAC<cc>.
FACLT: Floating-point absolute compare less than: an alias of FAC<cc>.
FADD (immediate): Floating-point add immediate (predicated).
FADD (vectors, predicated): Floating-point add vector (predicated).
FADD (vectors, unpredicated): Floating-point add vector (unpredicated).
FADDA: Floating-point add strictly-ordered reduction, accumulating in scalar.
FADDV: Floating-point add recursive reduction to scalar.
FCADD: Floating-point complex add with rotate (predicated).
FCM<cc> (vectors): Floating-point compare vectors.
FCM<cc> (zero): Floating-point compare vector with zero.
FCMLA (indexed): Floating-point complex multiply-add by indexed values with rotate.
FCMLA (vectors): Floating-point complex multiply-add with rotate (predicated).
FCMLE (vectors): Floating-point compare less than or equal to vector: an alias of FCM<cc> (vectors).
FCMLT (vectors): Floating-point compare less than vector: an alias of FCM<cc> (vectors).
FCPY: Copy 8-bit floating-point immediate to vector elements (predicated).
FCVT: Floating-point convert precision (predicated).
FCVTZS: Floating-point convert to signed integer, rounding toward zero (predicated).
FCVTZU: Floating-point convert to unsigned integer, rounding toward zero (predicated).
FDIV: Floating-point divide by vector (predicated).
FDIVR: Floating-point reversed divide by vector (predicated).
FDUP: Broadcast 8-bit floating-point immediate to vector elements (unpredicated).
FEXPA: Floating-point exponential accelerator.
FMAD: Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].
FMAX (immediate): Floating-point maximum with immediate (predicated).
FMAX (vectors): Floating-point maximum (predicated).
Page 1457
FMAXNM (immediate): Floating-point maximum number with immediate (predicated).
FMAXNM (vectors): Floating-point maximum number (predicated).
FMAXNMV: Floating-point maximum number recursive reduction to scalar.
FMAXV: Floating-point maximum recursive reduction to scalar.
FMIN (immediate): Floating-point minimum with immediate (predicated).
FMIN (vectors): Floating-point minimum (predicated).
FMINNM (immediate): Floating-point minimum number with immediate (predicated).
FMINNM (vectors): Floating-point minimum number (predicated).
FMINNMV: Floating-point minimum number recursive reduction to scalar.
FMINV: Floating-point minimum recursive reduction to scalar.
FMLA (indexed): Floating-point fused multiply-add by indexed elements (Zda = Zda + Zn * Zm[indexed]).
FMLA (vectors): Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].
FMLS (indexed): Floating-point fused multiply-subtract by indexed elements (Zda = Zda + -Zn * Zm[indexed]).
FMLS (vectors): Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm].
FMMLA: Floating-point matrix multiply-accumulate.
FMOV (immediate, predicated): Move 8-bit floating-point immediate to vector elements (predicated): an alias of FCPY.
FMOV (immediate, unpredicated): Move 8-bit floating-point immediate to vector elements (unpredicated): an alias of
FDUP.
FMOV (zero, predicated): Move floating-point +0.0 to vector elements (predicated): an alias of CPY (immediate,
merging).
FMOV (zero, unpredicated): Move floating-point +0.0 to vector elements (unpredicated): an alias of DUP (immediate).
FMSB: Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm].
FMUL (immediate): Floating-point multiply by immediate (predicated).
FMUL (indexed): Floating-point multiply by indexed elements.
FMUL (vectors, predicated): Floating-point multiply vectors (predicated).
FMUL (vectors, unpredicated): Floating-point multiply vectors (unpredicated).
FMULX: Floating-point multiply-extended vectors (predicated).
FNEG: Floating-point negate (predicated).
FNMAD: Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn *
Zm].
FNMLA: Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm].
FNMLS: Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm].
FNMSB: Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn *
Zm].
FRECPE: Floating-point reciprocal estimate (unpredicated).
FRECPS: Floating-point reciprocal step (unpredicated).
FRECPX: Floating-point reciprocal exponent (predicated).
FRINT<r>: Floating-point round to integral value (predicated).
Page 1458
FRSQRTE: Floating-point reciprocal square root estimate (unpredicated).
FRSQRTS: Floating-point reciprocal square root step (unpredicated).
FSCALE: Floating-point adjust exponent by vector (predicated).
FSQRT: Floating-point square root (predicated).
FSUB (immediate): Floating-point subtract immediate (predicated).
FSUB (vectors, predicated): Floating-point subtract vectors (predicated).
FSUB (vectors, unpredicated): Floating-point subtract vectors (unpredicated).
FSUBR (immediate): Floating-point reversed subtract from immediate (predicated).
FSUBR (vectors): Floating-point reversed subtract vectors (predicated).
FTMAD: Floating-point trigonometric multiply-add coefficient.
FTSMUL: Floating-point trigonometric starting value.
FTSSEL: Floating-point trigonometric select coefficient.
INCB, INCD, INCH, INCW (scalar): Increment scalar by multiple of predicate constraint element count.
INCD, INCH, INCW (vector): Increment vector by multiple of predicate constraint element count.
INCP (scalar): Increment scalar by count of true predicate elements.
INCP (vector): Increment vector by count of true predicate elements.
INDEX (immediate, scalar): Create index starting from immediate and incremented by general-purpose register.
INDEX (immediates): Create index starting from and incremented by immediate.
INDEX (scalar, immediate): Create index starting from general-purpose register and incremented by immediate.
INDEX (scalars): Create index starting from and incremented by general-purpose register.
INSR (scalar): Insert general-purpose register in shifted vector.
INSR (SIMD&FP scalar): Insert SIMD&FP scalar register in shifted vector.
LASTA (scalar): Extract element after last to general-purpose register.
LASTA (SIMD&FP scalar): Extract element after last to SIMD&FP scalar register.
LASTB (scalar): Extract last element to general-purpose register.
LASTB (SIMD&FP scalar): Extract last element to SIMD&FP scalar register.
LD1B (scalar plus immediate): Contiguous load unsigned bytes to vector (immediate index).
LD1B (scalar plus scalar): Contiguous load unsigned bytes to vector (scalar index).
LD1B (scalar plus vector): Gather load unsigned bytes to vector (vector index).
LD1B (vector plus immediate): Gather load unsigned bytes to vector (immediate index).
LD1D (scalar plus immediate): Contiguous load doublewords to vector (immediate index).
LD1D (scalar plus scalar): Contiguous load doublewords to vector (scalar index).
LD1D (scalar plus vector): Gather load doublewords to vector (vector index).
LD1D (vector plus immediate): Gather load doublewords to vector (immediate index).
LD1H (scalar plus immediate): Contiguous load unsigned halfwords to vector (immediate index).
LD1H (scalar plus scalar): Contiguous load unsigned halfwords to vector (scalar index).
Page 1459
LD1H (scalar plus vector): Gather load unsigned halfwords to vector (vector index).
LD1H (vector plus immediate): Gather load unsigned halfwords to vector (immediate index).
LD1RB: Load and broadcast unsigned byte to vector.
LD1RD: Load and broadcast doubleword to vector.
LD1RH: Load and broadcast unsigned halfword to vector.
LD1ROB (scalar plus immediate): Contiguous load and replicate thirty-two bytes (immediate index).
LD1ROB (scalar plus scalar): Contiguous load and replicate thirty-two bytes (scalar index).
LD1ROD (scalar plus immediate): Contiguous load and replicate four doublewords (immediate index).
LD1ROD (scalar plus scalar): Contiguous load and replicate four doublewords (scalar index).
LD1ROH (scalar plus immediate): Contiguous load and replicate sixteen halfwords (immediate index).
LD1ROH (scalar plus scalar): Contiguous load and replicate sixteen halfwords (scalar index).
LD1ROW (scalar plus immediate): Contiguous load and replicate eight words (immediate index).
LD1ROW (scalar plus scalar): Contiguous load and replicate eight words (scalar index).
LD1RQB (scalar plus immediate): Contiguous load and replicate sixteen bytes (immediate index).
LD1RQB (scalar plus scalar): Contiguous load and replicate sixteen bytes (scalar index).
LD1RQD (scalar plus immediate): Contiguous load and replicate two doublewords (immediate index).
LD1RQD (scalar plus scalar): Contiguous load and replicate two doublewords (scalar index).
LD1RQH (scalar plus immediate): Contiguous load and replicate eight halfwords (immediate index).
LD1RQH (scalar plus scalar): Contiguous load and replicate eight halfwords (scalar index).
LD1RQW (scalar plus immediate): Contiguous load and replicate four words (immediate index).
LD1RQW (scalar plus scalar): Contiguous load and replicate four words (scalar index).
LD1RSB: Load and broadcast signed byte to vector.
LD1RSH: Load and broadcast signed halfword to vector.
LD1RSW: Load and broadcast signed word to vector.
LD1RW: Load and broadcast unsigned word to vector.
LD1SB (scalar plus immediate): Contiguous load signed bytes to vector (immediate index).
LD1SB (scalar plus scalar): Contiguous load signed bytes to vector (scalar index).
LD1SB (scalar plus vector): Gather load signed bytes to vector (vector index).
LD1SB (vector plus immediate): Gather load signed bytes to vector (immediate index).
LD1SH (scalar plus immediate): Contiguous load signed halfwords to vector (immediate index).
LD1SH (scalar plus scalar): Contiguous load signed halfwords to vector (scalar index).
LD1SH (scalar plus vector): Gather load signed halfwords to vector (vector index).
LD1SH (vector plus immediate): Gather load signed halfwords to vector (immediate index).
LD1SW (scalar plus immediate): Contiguous load signed words to vector (immediate index).
LD1SW (scalar plus scalar): Contiguous load signed words to vector (scalar index).
LD1SW (scalar plus vector): Gather load signed words to vector (vector index).
Page 1460
LD1SW (vector plus immediate): Gather load signed words to vector (immediate index).
LD1W (scalar plus immediate): Contiguous load unsigned words to vector (immediate index).
LD1W (scalar plus scalar): Contiguous load unsigned words to vector (scalar index).
LD1W (scalar plus vector): Gather load unsigned words to vector (vector index).
LD1W (vector plus immediate): Gather load unsigned words to vector (immediate index).
LD2B (scalar plus immediate): Contiguous load two-byte structures to two vectors (immediate index).
LD2B (scalar plus scalar): Contiguous load two-byte structures to two vectors (scalar index).
LD2D (scalar plus immediate): Contiguous load two-doubleword structures to two vectors (immediate index).
LD2D (scalar plus scalar): Contiguous load two-doubleword structures to two vectors (scalar index).
LD2H (scalar plus immediate): Contiguous load two-halfword structures to two vectors (immediate index).
LD2H (scalar plus scalar): Contiguous load two-halfword structures to two vectors (scalar index).
LD2W (scalar plus immediate): Contiguous load two-word structures to two vectors (immediate index).
LD2W (scalar plus scalar): Contiguous load two-word structures to two vectors (scalar index).
LD3B (scalar plus immediate): Contiguous load three-byte structures to three vectors (immediate index).
LD3B (scalar plus scalar): Contiguous load three-byte structures to three vectors (scalar index).
LD3D (scalar plus immediate): Contiguous load three-doubleword structures to three vectors (immediate index).
LD3D (scalar plus scalar): Contiguous load three-doubleword structures to three vectors (scalar index).
LD3H (scalar plus immediate): Contiguous load three-halfword structures to three vectors (immediate index).
LD3H (scalar plus scalar): Contiguous load three-halfword structures to three vectors (scalar index).
LD3W (scalar plus immediate): Contiguous load three-word structures to three vectors (immediate index).
LD3W (scalar plus scalar): Contiguous load three-word structures to three vectors (scalar index).
LD4B (scalar plus immediate): Contiguous load four-byte structures to four vectors (immediate index).
LD4B (scalar plus scalar): Contiguous load four-byte structures to four vectors (scalar index).
LD4D (scalar plus immediate): Contiguous load four-doubleword structures to four vectors (immediate index).
LD4D (scalar plus scalar): Contiguous load four-doubleword structures to four vectors (scalar index).
LD4H (scalar plus immediate): Contiguous load four-halfword structures to four vectors (immediate index).
LD4H (scalar plus scalar): Contiguous load four-halfword structures to four vectors (scalar index).
LD4W (scalar plus immediate): Contiguous load four-word structures to four vectors (immediate index).
LD4W (scalar plus scalar): Contiguous load four-word structures to four vectors (scalar index).
LDFF1B (scalar plus scalar): Contiguous load first-fault unsigned bytes to vector (scalar index).
LDFF1B (scalar plus vector): Gather load first-fault unsigned bytes to vector (vector index).
LDFF1B (vector plus immediate): Gather load first-fault unsigned bytes to vector (immediate index).
LDFF1D (scalar plus scalar): Contiguous load first-fault doublewords to vector (scalar index).
LDFF1D (scalar plus vector): Gather load first-fault doublewords to vector (vector index).
LDFF1D (vector plus immediate): Gather load first-fault doublewords to vector (immediate index).
LDFF1H (scalar plus scalar): Contiguous load first-fault unsigned halfwords to vector (scalar index).
Page 1461
LDFF1H (scalar plus vector): Gather load first-fault unsigned halfwords to vector (vector index).
LDFF1H (vector plus immediate): Gather load first-fault unsigned halfwords to vector (immediate index).
LDFF1SB (scalar plus scalar): Contiguous load first-fault signed bytes to vector (scalar index).
LDFF1SB (scalar plus vector): Gather load first-fault signed bytes to vector (vector index).
LDFF1SB (vector plus immediate): Gather load first-fault signed bytes to vector (immediate index).
LDFF1SH (scalar plus scalar): Contiguous load first-fault signed halfwords to vector (scalar index).
LDFF1SH (scalar plus vector): Gather load first-fault signed halfwords to vector (vector index).
LDFF1SH (vector plus immediate): Gather load first-fault signed halfwords to vector (immediate index).
LDFF1SW (scalar plus scalar): Contiguous load first-fault signed words to vector (scalar index).
LDFF1SW (scalar plus vector): Gather load first-fault signed words to vector (vector index).
LDFF1SW (vector plus immediate): Gather load first-fault signed words to vector (immediate index).
LDFF1W (scalar plus scalar): Contiguous load first-fault unsigned words to vector (scalar index).
LDFF1W (scalar plus vector): Gather load first-fault unsigned words to vector (vector index).
LDFF1W (vector plus immediate): Gather load first-fault unsigned words to vector (immediate index).
LDNF1B: Contiguous load non-fault unsigned bytes to vector (immediate index).
LDNF1D: Contiguous load non-fault doublewords to vector (immediate index).
LDNF1H: Contiguous load non-fault unsigned halfwords to vector (immediate index).
LDNF1SB: Contiguous load non-fault signed bytes to vector (immediate index).
LDNF1SH: Contiguous load non-fault signed halfwords to vector (immediate index).
LDNF1SW: Contiguous load non-fault signed words to vector (immediate index).
LDNF1W: Contiguous load non-fault unsigned words to vector (immediate index).
LDNT1B (scalar plus immediate): Contiguous load non-temporal bytes to vector (immediate index).
LDNT1B (scalar plus scalar): Contiguous load non-temporal bytes to vector (scalar index).
LDNT1D (scalar plus immediate): Contiguous load non-temporal doublewords to vector (immediate index).
LDNT1D (scalar plus scalar): Contiguous load non-temporal doublewords to vector (scalar index).
LDNT1H (scalar plus immediate): Contiguous load non-temporal halfwords to vector (immediate index).
LDNT1H (scalar plus scalar): Contiguous load non-temporal halfwords to vector (scalar index).
LDNT1W (scalar plus immediate): Contiguous load non-temporal words to vector (immediate index).
LDNT1W (scalar plus scalar): Contiguous load non-temporal words to vector (scalar index).
LDR (predicate): Load predicate register.
LDR (vector): Load vector register.
LSL (immediate, predicated): Logical shift left by immediate (predicated).
LSL (immediate, unpredicated): Logical shift left by immediate (unpredicated).
LSL (vectors): Logical shift left by vector (predicated).
LSL (wide elements, predicated): Logical shift left by 64-bit wide elements (predicated).
LSL (wide elements, unpredicated): Logical shift left by 64-bit wide elements (unpredicated).
Page 1462
LSLR: Reversed logical shift left by vector (predicated).
LSR (immediate, predicated): Logical shift right by immediate (predicated).
LSR (immediate, unpredicated): Logical shift right by immediate (unpredicated).
LSR (vectors): Logical shift right by vector (predicated).
LSR (wide elements, predicated): Logical shift right by 64-bit wide elements (predicated).
LSR (wide elements, unpredicated): Logical shift right by 64-bit wide elements (unpredicated).
LSRR: Reversed logical shift right by vector (predicated).
MAD: Multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].
MLA: Multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].
MLS: Multiply-subtract vectors (predicated), writing addend [Zda = Zda - Zn * Zm].
MOV (bitmask immediate): Move logical bitmask immediate to vector (unpredicated): an alias of DUPM.
MOV (immediate, predicated, merging): Move signed integer immediate to vector elements (merging): an alias of CPY
(immediate, merging).
MOV (immediate, predicated, zeroing): Move signed integer immediate to vector elements (zeroing): an alias of CPY
(immediate, zeroing).
MOV (immediate, unpredicated): Move signed immediate to vector elements (unpredicated): an alias of DUP
(immediate).
MOV (predicate, predicated, merging): Move predicates (merging): an alias of SEL (predicates).
MOV (predicate, predicated, zeroing): Move predicates (zeroing): an alias of AND, ANDS (predicates).
MOV (predicate, unpredicated): Move predicate (unpredicated): an alias of ORR, ORRS (predicates).
MOV (scalar, predicated): Move general-purpose register to vector elements (predicated): an alias of CPY (scalar).
MOV (scalar, unpredicated): Move general-purpose register to vector elements (unpredicated): an alias of DUP
(scalar).
MOV (SIMD&FP scalar, predicated): Move SIMD&FP scalar register to vector elements (predicated): an alias of CPY
(SIMD&FP scalar).
MOV (SIMD&FP scalar, unpredicated): Move indexed element or SIMD&FP scalar to vector (unpredicated): an alias of
DUP (indexed).
MOV (vector, predicated): Move vector elements (predicated): an alias of SEL (vectors).
MOV (vector, unpredicated): Move vector register (unpredicated): an alias of ORR (vectors, unpredicated).
MOVPRFX (predicated): Move prefix (predicated).
MOVPRFX (unpredicated): Move prefix (unpredicated).
MOVS (predicated): Move predicates (zeroing), setting the condition flags: an alias of AND, ANDS (predicates).
MOVS (unpredicated): Move predicate (unpredicated), setting the condition flags: an alias of ORR, ORRS (predicates).
MSB: Multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za - Zdn * Zm].
MUL (immediate): Multiply by immediate (unpredicated).
MUL (vectors): Multiply vectors (predicated).
NAND, NANDS: Bitwise NAND predicates.
NEG: Negate (predicated).
NOR, NORS: Bitwise NOR predicates.
Page 1463
NOT (predicate): Bitwise invert predicate: an alias of EOR, EORS (predicates).
NOT (vector): Bitwise invert vector (predicated).
NOTS: Bitwise invert predicate, setting the condition flags: an alias of EOR, EORS (predicates).
ORN (immediate): Bitwise inclusive OR with inverted immediate (unpredicated): an alias of ORR (immediate).
ORN, ORNS (predicates): Bitwise inclusive OR inverted predicate.
ORR (immediate): Bitwise inclusive OR with immediate (unpredicated).
ORR (vectors, predicated): Bitwise inclusive OR vectors (predicated).
ORR (vectors, unpredicated): Bitwise inclusive OR vectors (unpredicated).
ORR, ORRS (predicates): Bitwise inclusive OR predicate.
ORV: Bitwise inclusive OR reduction to scalar.
PFALSE: Set all predicate elements to false.
PFIRST: Set the first active predicate element to true.
PNEXT: Find next active predicate.
PRFB (scalar plus immediate): Contiguous prefetch bytes (immediate index).
PRFB (scalar plus scalar): Contiguous prefetch bytes (scalar index).
PRFB (scalar plus vector): Gather prefetch bytes (scalar plus vector).
PRFB (vector plus immediate): Gather prefetch bytes (vector plus immediate).
PRFD (scalar plus immediate): Contiguous prefetch doublewords (immediate index).
PRFD (scalar plus scalar): Contiguous prefetch doublewords (scalar index).
PRFD (scalar plus vector): Gather prefetch doublewords (scalar plus vector).
PRFD (vector plus immediate): Gather prefetch doublewords (vector plus immediate).
PRFH (scalar plus immediate): Contiguous prefetch halfwords (immediate index).
PRFH (scalar plus scalar): Contiguous prefetch halfwords (scalar index).
PRFH (scalar plus vector): Gather prefetch halfwords (scalar plus vector).
PRFH (vector plus immediate): Gather prefetch halfwords (vector plus immediate).
PRFW (scalar plus immediate): Contiguous prefetch words (immediate index).
PRFW (scalar plus scalar): Contiguous prefetch words (scalar index).
PRFW (scalar plus vector): Gather prefetch words (scalar plus vector).
PRFW (vector plus immediate): Gather prefetch words (vector plus immediate).
PTEST: Set condition flags for predicate.
PTRUE, PTRUES: Initialise predicate from named constraint.
PUNPKHI, PUNPKLO: Unpack and widen half of predicate.
RBIT: Reverse bits (predicated).
RDFFR (unpredicated): Read the first-fault register.
RDFFR, RDFFRS (predicated): Return predicate of succesfully loaded elements.
RDVL: Read multiple of vector register size to scalar register.
Page 1464
REV (predicate): Reverse all elements in a predicate.
REV (vector): Reverse all elements in a vector (unpredicated).
REVB, REVH, REVW: Reverse bytes / halfwords / words within elements (predicated).
SABD: Signed absolute difference (predicated).
SADDV: Signed add reduction to scalar.
SCVTF: Signed integer convert to floating-point (predicated).
SDIV: Signed divide (predicated).
SDIVR: Signed reversed divide (predicated).
SDOT (indexed): Signed integer indexed dot product.
SDOT (vectors): Signed integer dot product.
SEL (predicates): Conditionally select elements from two predicates.
SEL (vectors): Conditionally select elements from two vectors.
SETFFR: Initialise the first-fault register to all true.
SMAX (immediate): Signed maximum with immediate (unpredicated).
SMAX (vectors): Signed maximum vectors (predicated).
SMAXV: Signed maximum reduction to scalar.
SMIN (immediate): Signed minimum with immediate (unpredicated).
SMIN (vectors): Signed minimum vectors (predicated).
SMINV: Signed minimum reduction to scalar.
SMMLA: Signed integer matrix multiply-accumulate.
SMULH: Signed multiply returning high half (predicated).
SPLICE: Splice two vectors under predicate control.
SQADD (immediate): Signed saturating add immediate (unpredicated).
SQADD (vectors): Signed saturating add vectors (unpredicated).
SQDECB: Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count.
SQDECD (scalar): Signed saturating decrement scalar by multiple of 64-bit predicate constraint element count.
SQDECD (vector): Signed saturating decrement vector by multiple of 64-bit predicate constraint element count.
SQDECH (scalar): Signed saturating decrement scalar by multiple of 16-bit predicate constraint element count.
SQDECH (vector): Signed saturating decrement vector by multiple of 16-bit predicate constraint element count.
SQDECP (scalar): Signed saturating decrement scalar by count of true predicate elements.
SQDECP (vector): Signed saturating decrement vector by count of true predicate elements.
SQDECW (scalar): Signed saturating decrement scalar by multiple of 32-bit predicate constraint element count.
SQDECW (vector): Signed saturating decrement vector by multiple of 32-bit predicate constraint element count.
SQINCB: Signed saturating increment scalar by multiple of 8-bit predicate constraint element count.
SQINCD (scalar): Signed saturating increment scalar by multiple of 64-bit predicate constraint element count.
SQINCD (vector): Signed saturating increment vector by multiple of 64-bit predicate constraint element count.
Page 1465
SQINCH (scalar): Signed saturating increment scalar by multiple of 16-bit predicate constraint element count.
SQINCH (vector): Signed saturating increment vector by multiple of 16-bit predicate constraint element count.
SQINCP (scalar): Signed saturating increment scalar by count of true predicate elements.
SQINCP (vector): Signed saturating increment vector by count of true predicate elements.
SQINCW (scalar): Signed saturating increment scalar by multiple of 32-bit predicate constraint element count.
SQINCW (vector): Signed saturating increment vector by multiple of 32-bit predicate constraint element count.
SQSUB (immediate): Signed saturating subtract immediate (unpredicated).
SQSUB (vectors): Signed saturating subtract vectors (unpredicated).
ST1B (scalar plus immediate): Contiguous store bytes from vector (immediate index).
ST1B (scalar plus scalar): Contiguous store bytes from vector (scalar index).
ST1B (scalar plus vector): Scatter store bytes from a vector (vector index).
ST1B (vector plus immediate): Scatter store bytes from a vector (immediate index).
ST1D (scalar plus immediate): Contiguous store doublewords from vector (immediate index).
ST1D (scalar plus scalar): Contiguous store doublewords from vector (scalar index).
ST1D (scalar plus vector): Scatter store doublewords from a vector (vector index).
ST1D (vector plus immediate): Scatter store doublewords from a vector (immediate index).
ST1H (scalar plus immediate): Contiguous store halfwords from vector (immediate index).
ST1H (scalar plus scalar): Contiguous store halfwords from vector (scalar index).
ST1H (scalar plus vector): Scatter store halfwords from a vector (vector index).
ST1H (vector plus immediate): Scatter store halfwords from a vector (immediate index).
ST1W (scalar plus immediate): Contiguous store words from vector (immediate index).
ST1W (scalar plus scalar): Contiguous store words from vector (scalar index).
ST1W (scalar plus vector): Scatter store words from a vector (vector index).
ST1W (vector plus immediate): Scatter store words from a vector (immediate index).
ST2B (scalar plus immediate): Contiguous store two-byte structures from two vectors (immediate index).
ST2B (scalar plus scalar): Contiguous store two-byte structures from two vectors (scalar index).
ST2D (scalar plus immediate): Contiguous store two-doubleword structures from two vectors (immediate index).
ST2D (scalar plus scalar): Contiguous store two-doubleword structures from two vectors (scalar index).
ST2H (scalar plus immediate): Contiguous store two-halfword structures from two vectors (immediate index).
ST2H (scalar plus scalar): Contiguous store two-halfword structures from two vectors (scalar index).
ST2W (scalar plus immediate): Contiguous store two-word structures from two vectors (immediate index).
ST2W (scalar plus scalar): Contiguous store two-word structures from two vectors (scalar index).
ST3B (scalar plus immediate): Contiguous store three-byte structures from three vectors (immediate index).
ST3B (scalar plus scalar): Contiguous store three-byte structures from three vectors (scalar index).
ST3D (scalar plus immediate): Contiguous store three-doubleword structures from three vectors (immediate index).
ST3D (scalar plus scalar): Contiguous store three-doubleword structures from three vectors (scalar index).
Page 1466
ST3H (scalar plus immediate): Contiguous store three-halfword structures from three vectors (immediate index).
ST3H (scalar plus scalar): Contiguous store three-halfword structures from three vectors (scalar index).
ST3W (scalar plus immediate): Contiguous store three-word structures from three vectors (immediate index).
ST3W (scalar plus scalar): Contiguous store three-word structures from three vectors (scalar index).
ST4B (scalar plus immediate): Contiguous store four-byte structures from four vectors (immediate index).
ST4B (scalar plus scalar): Contiguous store four-byte structures from four vectors (scalar index).
ST4D (scalar plus immediate): Contiguous store four-doubleword structures from four vectors (immediate index).
ST4D (scalar plus scalar): Contiguous store four-doubleword structures from four vectors (scalar index).
ST4H (scalar plus immediate): Contiguous store four-halfword structures from four vectors (immediate index).
ST4H (scalar plus scalar): Contiguous store four-halfword structures from four vectors (scalar index).
ST4W (scalar plus immediate): Contiguous store four-word structures from four vectors (immediate index).
ST4W (scalar plus scalar): Contiguous store four-word structures from four vectors (scalar index).
STNT1B (scalar plus immediate): Contiguous store non-temporal bytes from vector (immediate index).
STNT1B (scalar plus scalar): Contiguous store non-temporal bytes from vector (scalar index).
STNT1D (scalar plus immediate): Contiguous store non-temporal doublewords from vector (immediate index).
STNT1D (scalar plus scalar): Contiguous store non-temporal doublewords from vector (scalar index).
STNT1H (scalar plus immediate): Contiguous store non-temporal halfwords from vector (immediate index).
STNT1H (scalar plus scalar): Contiguous store non-temporal halfwords from vector (scalar index).
STNT1W (scalar plus immediate): Contiguous store non-temporal words from vector (immediate index).
STNT1W (scalar plus scalar): Contiguous store non-temporal words from vector (scalar index).
STR (predicate): Store predicate register.
STR (vector): Store vector register.
SUB (immediate): Subtract immediate (unpredicated).
SUB (vectors, predicated): Subtract vectors (predicated).
SUB (vectors, unpredicated): Subtract vectors (unpredicated).
SUBR (immediate): Reversed subtract from immediate (unpredicated).
SUBR (vectors): Reversed subtract vectors (predicated).
SUDOT: Signed by unsigned integer indexed dot product.
SUNPKHI, SUNPKLO: Signed unpack and extend half of vector.
SXTB, SXTH, SXTW: Signed byte / halfword / word extend (predicated).
TBL: Programmable table lookup in single vector table.
TRN1, TRN2 (predicates): Interleave even or odd elements from two predicates.
TRN1, TRN2 (vectors): Interleave even or odd elements from two vectors.
UABD: Unsigned absolute difference (predicated).
UADDV: Unsigned add reduction to scalar.
UCVTF: Unsigned integer convert to floating-point (predicated).
Page 1467
UDIV: Unsigned divide (predicated).
UDIVR: Unsigned reversed divide (predicated).
UDOT (indexed): Unsigned integer indexed dot product.
UDOT (vectors): Unsigned integer dot product.
UMAX (immediate): Unsigned maximum with immediate (unpredicated).
UMAX (vectors): Unsigned maximum vectors (predicated).
UMAXV: Unsigned maximum reduction to scalar.
UMIN (immediate): Unsigned minimum with immediate (unpredicated).
UMIN (vectors): Unsigned minimum vectors (predicated).
UMINV: Unsigned minimum reduction to scalar.
UMMLA: Unsigned integer matrix multiply-accumulate.
UMULH: Unsigned multiply returning high half (predicated).
UQADD (immediate): Unsigned saturating add immediate (unpredicated).
UQADD (vectors): Unsigned saturating add vectors (unpredicated).
UQDECB: Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count.
UQDECD (scalar): Unsigned saturating decrement scalar by multiple of 64-bit predicate constraint element count.
UQDECD (vector): Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count.
UQDECH (scalar): Unsigned saturating decrement scalar by multiple of 16-bit predicate constraint element count.
UQDECH (vector): Unsigned saturating decrement vector by multiple of 16-bit predicate constraint element count.
UQDECP (scalar): Unsigned saturating decrement scalar by count of true predicate elements.
UQDECP (vector): Unsigned saturating decrement vector by count of true predicate elements.
UQDECW (scalar): Unsigned saturating decrement scalar by multiple of 32-bit predicate constraint element count.
UQDECW (vector): Unsigned saturating decrement vector by multiple of 32-bit predicate constraint element count.
UQINCB: Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count.
UQINCD (scalar): Unsigned saturating increment scalar by multiple of 64-bit predicate constraint element count.
UQINCD (vector): Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count.
UQINCH (scalar): Unsigned saturating increment scalar by multiple of 16-bit predicate constraint element count.
UQINCH (vector): Unsigned saturating increment vector by multiple of 16-bit predicate constraint element count.
UQINCP (scalar): Unsigned saturating increment scalar by count of true predicate elements.
UQINCP (vector): Unsigned saturating increment vector by count of true predicate elements.
UQINCW (scalar): Unsigned saturating increment scalar by multiple of 32-bit predicate constraint element count.
UQINCW (vector): Unsigned saturating increment vector by multiple of 32-bit predicate constraint element count.
UQSUB (immediate): Unsigned saturating subtract immediate (unpredicated).
UQSUB (vectors): Unsigned saturating subtract vectors (unpredicated).
USDOT (indexed): Unsigned by signed integer indexed dot product.
USDOT (vectors): Unsigned by signed integer dot product.
Page 1468
USMMLA: Unsigned by signed integer matrix multiply-accumulate.
UUNPKHI, UUNPKLO: Unsigned unpack and extend half of vector.
UXTB, UXTH, UXTW: Unsigned byte / halfword / word extend (predicated).
UZP1, UZP2 (predicates): Concatenate even or odd elements from two predicates.
UZP1, UZP2 (vectors): Concatenate even or odd elements from two vectors.
WHILELE: While incrementing signed scalar less than or equal to scalar.
WHILELO: While incrementing unsigned scalar lower than scalar.
WHILELS: While incrementing unsigned scalar lower or same as scalar.
WHILELT: While incrementing signed scalar less than scalar.
WRFFR: Write the first-fault register.
ZIP1, ZIP2 (predicates): Interleave elements from two half predicates.
ZIP1, ZIP2 (vectors): Interleave elements from two half vectors.
Page 1469
ABS
Absolute value (predicated).
Compute the absolute value of the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 0 1 0 1 Pg Zn Zd
SVE
ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>
if !HaveSVE() then UNDEFINED;

integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
bits(VL) result = Z[d];
integer element = SInt(Elem[operand, e, esize]);
if ElemP[mask, e, esize] == '1' then
Z[d] = result;
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
ABS Page 1470

ABS Page 1471

ADD (vectors, predicated)
Add vectors (predicated).
Add active elements of the second source vector to corresponding elements of the first source vector and destructively
place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector
register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 0 0 0 0 Pg Zm Zdn
SVE
ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

integer dn = UInt(Zdn);
integer m = UInt(Zm);
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
bits(esize) element1 = Elem[operand1, e, esize];
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
UNPREDICTABLE:
ADD (vectors, predicated) Page 1472

ADD (vectors, predicated) Page 1473

ADD (immediate)
Add immediate (unpredicated).
Add an unsigned immediate to each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 0 1 1 sh imm8 Zdn
SVE
ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if size:sh == '001' then UNDEFINED;
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 + imm;
Z[dn] = result;
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.


ADD (vectors, unpredicated)
Add vectors (unpredicated).
Add all elements of the second source vector to corresponding elements of the first source vector and place the results
in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 0 0 0 Zn Zd
SVE
ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) result;
Z[d] = result;
ADD (vectors, unpredicated) Page 1476

ADDPL
Add multiple of predicate register size to scalar register.
Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Rn 0 1 0 1 0 imm6 Rd
SVE
ADDPL <Xd|SP>, <Xn|SP>, #<imm>

integer imm = SInt(imm6);
Assembler Symbols
field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.
Operation
CheckSVEEnabled();
bits(64) result = operand1 + (imm * (PL DIV 8));
if d == 31 then
SP[] = result;
else
X[d] = result;
ADDPL Page 1477

ADDVL
Add multiple of vector register size to scalar register.
Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Rn 0 1 0 1 0 imm6 Rd
SVE
ADDVL <Xd|SP>, <Xn|SP>, #<imm>

Assembler Symbols
field.
Operation
CheckSVEEnabled();
bits(64) result = operand1 + (imm * (VL DIV 8));
if d == 31 then
SP[] = result;
else
X[d] = result;
ADDVL Page 1478

ADR
Compute vector address.
Optionally sign or zero-extend the least significant 32-bits of each element from a vector of offsets or indices in the
second source vector, scale each index by 2, 4 or 8, add to a vector of base addresses from the first source vector, and
place the resulting addresses in the destination vector. This instruction is unpredicated.
It has encodings from 3 classes: Packed offsets , Unpacked 32-bit signed offsets and Unpacked 32-bit unsigned offsets
Packed offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 sz 1 Zm 1 0 1 0 msz Zn Zd
Packed offsets
ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>{, <mod> <amount>}]

integer osize = esize;
boolean unsigned = TRUE;
integer mbytes = 1 << UInt(msz);
Unpacked 32-bit signed offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Zm 1 0 1 0 msz Zn Zd
Unpacked 32-bit signed offsets
ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW{ <amount>}]

integer esize = 64;
integer osize = 32;
boolean unsigned = FALSE;
Unpacked 32-bit unsigned offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 1 0 1 0 msz Zn Zd
ADR Page 1479

Unpacked 32-bit unsigned offsets
ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW{ <amount>}]

integer esize = 64;
integer osize = 32;
Assembler Symbols
<T> Is the size specifier, encoded in “sz”:
sz <T>
0 S
1 D
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
<mod> Is the index extend and shift specifier, encoded in “msz”:
msz <mod>
00 [absent]
x1 LSL
10 LSL
<amount> Is the index shift amount, encoded in “msz”:

msz <amount>
00 [absent]
01 #1
10 #2
11 #3
Operation
CheckSVEEnabled();
bits(VL) base = Z[n];
bits(VL) offs = Z[m];
bits(VL) result;
bits(esize) addr = Elem[base, e, esize];
integer offset = Int(Elem[offs, e, esize]<osize-1:0>, unsigned);
Elem[result, e, esize] = addr + (offset * mbytes);
Z[d] = result;
ADR Page 1480

AND, ANDS (predicates)
Bitwise AND predicates.
Bitwise AND active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
This instruction is used by the aliases MOVS (predicated), and MOV (predicate, predicated, zeroing).
It has encodings from 2 classes: Not setting the condition flags and Setting the condition flags
Not setting the condition flags
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;
Setting the condition flags
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
boolean setflags = TRUE;
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions

MOVS (predicated) S == '1' && Pn == Pm
AND, ANDS (predicates) Page 1481

MOV (predicate, predicated, zeroing) S == '0' && Pn == Pm
Operation
CheckSVEEnabled();
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
bit element1 = ElemP[operand1, e, esize];
ElemP[result, e, esize] = element1 AND element2;
else
ElemP[result, e, esize] = '0';
if setflags then
PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
AND, ANDS (predicates) Page 1482

AND (vectors, predicated)
Bitwise AND vectors (predicated).
Bitwise AND active elements of the second source vector with corresponding elements of the first source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 0 Pg Zm Zdn
SVE
AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 AND element2;
else
Z[dn] = result;
UNPREDICTABLE:
AND (vectors, predicated) Page 1483

AND (vectors, predicated) Page 1484

AND (immediate)
Bitwise AND with immediate (unpredicated).
Bitwise AND an immediate with each 64-bit element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction BIC (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn
SVE
AND <Zdn>.<T>, <Zdn>.<T>, #<const>

bits(64) imm;
(imm, -) = DecodeBitMasks(imm13<12>, imm13<5:0>, imm13<11:6>, TRUE);
Assembler Symbols
<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;
bits(64) element1 = Elem[operand, e, 64];
Elem[result, e, 64] = element1 AND imm;
Z[dn] = result;
UNPREDICTABLE:


AND (vectors, unpredicated)
Bitwise AND vectors (unpredicated).
Bitwise AND all elements of the second source vector with corresponding elements of the first source vector and place
the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
AND <Zd>.D, <Zn>.D, <Zm>.D

Assembler Symbols
Operation
CheckSVEEnabled();
Z[d] = operand1 AND operand2;
AND (vectors, unpredicated) Page 1487

ANDV
Bitwise AND reduction to scalar.
Bitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Inactive elements in the source vector are treated as all ones.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 1 Pg Zn Vd
SVE
ANDV <V><d>, <Pg>, <Zn>.<T>

integer d = UInt(Vd);
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) result = Ones(esize);
result = result AND Elem[operand, e, esize];
V[d] = result;
ANDV Page 1488

ASR (immediate, predicated)
Arithmetic shift right by immediate (predicated).
Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the
range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 0 0 1 0 0 Pg tszl imm3 Zdn
L U
SVE
ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

bits(4) tsize = tszh:tszl;
case tsize of
when '0001' esize = 8;
when '001x' esize = 16;
when '01xx' esize = 32;
when '1xxx' esize = 64;
integer shift = (2 * esize) - UInt(tsize:imm3);
Assembler Symbols
<T> Is the size specifier, encoded in “tszh:tszl”:
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = ASR(element1, shift);
else
Z[dn] = result;
ASR (immediate, predicated) Page 1489

UNPREDICTABLE:
ASR (immediate, predicated) Page 1490

ASR (wide elements, predicated)
Arithmetic shift right by 64-bit wide elements (predicated).
Shift right, preserving the sign bit, active elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and destructively place the results in the corresponding elements of the first
source vector. The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant,
and not used modulo the destination element size. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 0 Pg Zm Zdn
R L U
SVE
ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
integer shift = Min(UInt(element2), esize);
else
Z[dn] = result;
ASR (wide elements,

Page 1491
predicated)
UNPREDICTABLE:
and destination element size as this instruction.
ASR (wide elements,

Page 1492
predicated)
ASR (vectors)
Arithmetic shift right by vector (predicated).
Shift right, preserving the sign bit, active elements of the first source vector by corresponding elements of the second
source vector and destructively place the results in the corresponding elements of the first source vector. The shift
amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element
size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 0 Pg Zm Zdn
R L U
SVE
ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
ASR (vectors) Page 1493

ASR (vectors) Page 1494

ASR (immediate, unpredicated)
Arithmetic shift right by immediate (unpredicated).
Shift right by immediate, preserving the sign bit, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 0 0 Zn Zd
U
SVE
ASR <Zd>.<T>, <Zn>.<T>, #<const>

case tsize of
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
ASR (immediate,
Page 1495
unpredicated)
ASR (wide elements, unpredicated)
Arithmetic shift right by 64-bit wide elements (unpredicated).
Shift right, preserving the sign bit, all elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and place the first in the corresponding elements of the destination vector. The
shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo
the destination element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 0 Zn Zd
U
SVE
ASR <Zd>.<T>, <Zn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
ASR (wide elements,

Page 1496
unpredicated)
ASRD
Arithmetic shift right for divide by immediate (predicated).
Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The
immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
L U
SVE
ASRD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

case tsize of
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
Operation
CheckSVEEnabled();
bits(VL) result;
integer element1 = SInt(Elem[operand1, e, esize]);
if element1 < 0 then
element1 = element1 + ((1 << shift) - 1);
Elem[result, e, esize] = (element1 >> shift)<esize-1:0>;
else
Z[dn] = result;
ASRD Page 1497

UNPREDICTABLE:
ASRD Page 1498

ASRR
Reversed arithmetic shift right by vector (predicated).
Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of
the first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 0 Pg Zm Zdn
R L U
SVE
ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
ASRR Page 1499

ASRR Page 1500

BFCVT
Floating-point down convert to BFloat16 format (predicated).
Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
Since the result type is smaller than the input type, the results are zero-extended to fill each destination element.
Unlike the BFloat16 matrix multiplication and dot product instructions, this instruction honors all of the FPCR bits
that apply to single-precision arithmetic. It can also generate a floating-point exception that causes cumulative
exception bits in the FPSR to be set, or a synchronous exception to be taken, depending on the enable bits in the
FPCR.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd
SVE
BFCVT <Zd>.H, <Pg>/M, <Zn>.S
if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

Assembler Symbols
Operation
CheckSVEEnabled();
bits(32) element = Elem[operand, e, 32];
if ElemP[mask, e, 32] == '1' then
Elem[result, 2*e, 16] = FPConvertBF(element, FPCR<31:0>);
Elem[result, 2*e+1, 16] = Zeros();
Z[d] = result;
UNPREDICTABLE:
BFCVT Page 1501

BFCVTNT
Floating-point down convert and narrow to BFloat16 (top, predicated).
Convert active 32-bit single-precision elements from the source vector to BFloat16 format, and place the results in the
odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive
elements in the destination vector register remain unmodified.
Unlike the BFloat16 matrix multiplication and dot product instructions, this instruction honors all of the FPCR bits
that apply to single-precision arithmetic. It can also generate a floating-point exception that causes cumulative
exception bits in the FPSR to be set, or a synchronous exception to be taken, depending on the enable bits in the
FPCR.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd
SVE
BFCVTNT <Zd>.H, <Pg>/M, <Zn>.S

Assembler Symbols
Operation
CheckSVEEnabled();
bits(32) element = Elem[operand, e, 32];
Elem[result, 2*e+1, 16] = FPConvertBF(element, FPCR<31:0>);
Z[d] = result;
BFCVTNT Page 1502

BFDOT (vectors)
BFloat16 floating-point dot product.
The BFloat16 floating-point (BF16) dot product instruction computes the dot product of a pair of BF16 values held in
each 32-bit element of the first source vector multiplied by a pair of BF16 values in the corresponding 32-bit element
of the second source vector, and then destructively adds the single-precision dot product to the corresponding single-
precision element of the destination vector.
This instruction is unpredicated.
All floating-point calculations performed by this instruction are performed with the following behaviors, irrespective of
the value in FPCR:
* Uses the non-IEEE 754 Round-to-Odd mode, which forces bit 0 of an inexact result to 1, and rounds an overflow to an
appropriately signed Infinity.
* The cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC and IOC) are not modified.
* Trapped floating-point exceptions are disabled, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE and IOE)
are all zero.
* Denormalized inputs and results are flushed to zero, as if FPCR.FZ == 1.
* Only the Default NaN is generated, as if FPCR.DN == 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 0 0 0 0 0 Zn Zda
SVE
BFDOT <Zda>.S, <Zn>.H, <Zm>.H

integer da = UInt(Zda);
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
Operation
CheckSVEEnabled();
bits(VL) operand3 = Z[da];
bits(VL) result;
bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];

Z[da] = result;
BFDOT (vectors) Page 1503

UNPREDICTABLE:
BFDOT (vectors) Page 1504

BFDOT (indexed)
BFloat16 floating-point indexed dot product.
The BFloat16 floating-point (BF16) indexed dot product instruction computes the dot product of a pair of BF16 values
held in each 32-bit element of the first source vector multiplied by a pair of BF16 values in an indexed 32-bit element
of the second source vector, and then destructively adds the single-precision dot product to the corresponding single-
precision element of the destination vector.
The BF16 pairs within the second source vector are specified using an immediate index which selects the same BF16
pair position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
the value in FPCR:
are all zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 i2 Zm 0 1 0 0 0 0 Zn Zda
SVE
BFDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

integer index = UInt(i2);
Assembler Symbols
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 3, encoded in the "i2" field.
BFDOT (indexed) Page 1505

Operation
CheckSVEEnabled();
integer eltspersegment = 128 DIV 32;
bits(VL) result;
integer segmentbase = e - (e MOD eltspersegment);
integer s = segmentbase + index;
bits(16) elt2_a = Elem[operand2, 2 * s + 0, 16];
bits(16) elt2_b = Elem[operand2, 2 * s + 1, 16];

Z[da] = result;
UNPREDICTABLE:
BFDOT (indexed) Page 1506

BFMLALB (vectors)
BFloat16 floating-point multiply-add long to single-precision (bottom).
This BFloat16 floating-point multiply-add long instruction widens the even-numbered 16-bit BFloat16 elements in the
first source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the overlapping 32-bit single-precision
elements of the addend and destination vector. This instruction is unpredicated.
Unlike the BFloat16 matrix multiplication and dot product instructions, this instruction performs a fused multiply-add
that honors all of the FPCR bits that apply to single-precision arithmetic. It can also generate a floating-point
exception that causes cumulative exception bits in the FPSR to be set, or a synchronous exception to be taken,
depending on the enable bits in the FPCR.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 0 Zn Zda
o2 op T
SVE
BFMLALB <Zda>.S, <Zn>.H, <Zm>.H

Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result;
bits(32) element1 = Elem[operand1, 2 * e + 0, 16] : Zeros(16);
bits(32) element3 = Elem[operand3, e, 32];
Elem[result, e, 32] = FPMulAdd(element3, element1, element2, FPCR<31:0>);
Z[da] = result;
UNPREDICTABLE:
BFMLALB (vectors) Page 1507

BFMLALB (indexed)
BFloat16 floating-point multiply-add long to single-precision (bottom, indexed).
This BFloat16 floating-point multiply-add long instruction widens the even-numbered 16-bit BFloat16 elements in the
first source vector and the indexed element from the corresponding 128-bit segment in the second source vector to
single-precision format and then destructively multiplies and adds these values without intermediate rounding to the
overlapping 32-bit single-precision elements of the addend and destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 0 Zn Zda
o2 op T
SVE
BFMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

integer index = UInt(i3h:i3l);
Assembler Symbols
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
Operation
CheckSVEEnabled();
bits(VL) result;
integer s = 2 * segmentbase + index;
bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
Z[da] = result;
UNPREDICTABLE:
BFMLALB (indexed) Page 1508

BFMLALB (indexed) Page 1509

BFMLALT (vectors)
BFloat16 floating-point multiply-add long to single-precision (top).
This BFloat16 floating-point multiply-add long instruction widens the odd-numbered 16-bit BFloat16 elements in the
first source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the overlapping 32-bit single-precision
elements of the addend and destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 1 Zn Zda
o2 op T
SVE
BFMLALT <Zda>.S, <Zn>.H, <Zm>.H

Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result;
Z[da] = result;
UNPREDICTABLE:
BFMLALT (vectors) Page 1510

BFMLALT (indexed)
BFloat16 floating-point multiply-add long to single-precision (top, indexed).
This BFloat16 floating-point multiply-add long instruction widens the odd-numbered 16-bit BFloat16 elements in the
first source vector and the indexed element from the corresponding 128-bit segment in the second source vector to
single-precision format and then destructively multiplies and adds these values without intermediate rounding to the
overlapping 32-bit single-precision elements of the addend and destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 1 Zn Zda
o2 op T
SVE
BFMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

Assembler Symbols
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
Operation
CheckSVEEnabled();
bits(VL) result;
integer s = 2 * segmentbase + index;
bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
Z[da] = result;
UNPREDICTABLE:
BFMLALT (indexed) Page 1511

BFMLALT (indexed) Page 1512

BFMMLA
BFloat16 floating-point matrix multiply-accumulate.
This BFloat16 floating-point (BF16) matrix multiply-accumulate instruction multiplies the 2×4 matrix of BF16 values
held in each 128-bit segment of the first source vector by the 4×2 BF16 matrix in the corresponding segment of the
second source vector. The resulting 2×2 single-precision (FP32) matrix product is then destructively added to the
FP32 matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent
to performing a 4-way dot product per destination element.
This instruction is unpredicated and vector length agnostic.
the value in FPCR:
are all zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 1 1 0 0 1 Zn Zda
SVE
BFMMLA <Zda>.S, <Zn>.H, <Zm>.H

Assembler Symbols
Operation
CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) result;
bits(128) op1, op2;
bits(128) res, addend;
for s = 0 to segments-1
op1 = Elem[operand1, s, 128];
addend = Elem[operand3, s, 128];
res = BFMatMulAdd(addend, op1, op2);
Elem[result, s, 128] = res;
Z[da] = result;
BFMMLA Page 1513

UNPREDICTABLE:
BFMMLA Page 1514

BIC (immediate)
Bitwise clear bits using immediate (unpredicated).
Bitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
This is a pseudo-instruction of AND (immediate). This means:
• The encodings in this description are named to match the encodings of AND (immediate).
• The description of AND (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn
SVE
BIC <Zdn>.<T>, <Zdn>.<T>, #<const>
is equivalent to
AND <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
The description of AND (immediate) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
BIC (immediate) Page 1515

BIC, BICS (predicates)
Bitwise clear predicates.
Bitwise AND inverted active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on
the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
BICS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
BIC, BICS (predicates) Page 1516

Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = element1 AND (NOT element2);
else
if setflags then
P[d] = result;
BIC, BICS (predicates) Page 1517

BIC (vectors, predicated)
Bitwise clear vectors (predicated).
Bitwise AND inverted active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 0 0 0 Pg Zm Zdn
SVE
BIC <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 AND (NOT element2);
else
Z[dn] = result;
UNPREDICTABLE:
BIC (vectors, predicated) Page 1518

BIC (vectors, predicated) Page 1519

BIC (vectors, unpredicated)
Bitwise clear vectors (unpredicated).
Bitwise AND inverted all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
BIC <Zd>.D, <Zn>.D, <Zm>.D

Assembler Symbols
Operation
CheckSVEEnabled();
Z[d] = operand1 AND (NOT operand2);
BIC (vectors, unpredicated) Page 1520

BRKA, BRKAS
Break after first true condition.
Sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Optionally sets the FIRST (N), NONE (Z), !LAST
(C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S
BRKA <Pd>.B, <Pg>/<ZM>, <Pn>.B

integer esize = 8;
boolean merging = (M == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M
BRKAS <Pd>.B, <Pg>/Z, <Pn>.B

integer esize = 8;
boolean merging = FALSE;
Assembler Symbols
<ZM> Is the predication qualifier, encoded in “M”:
M <ZM>
0 Z
1 M
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
BRKA, BRKAS Page 1521

Operation
CheckSVEEnabled();
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
boolean element = ElemP[operand, e, esize] == '1';
ElemP[result, e, esize] = if !break then '1' else '0';
break = break || element;
elsif merging then
ElemP[result, e, esize] = ElemP[operand2, e, esize];
else
if setflags then
P[d] = result;
BRKA, BRKAS Page 1522

BRKB, BRKBS
Break before first true condition.
Sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Optionally sets the FIRST (N), NONE (Z), !LAST
(C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S
BRKB <Pd>.B, <Pg>/<ZM>, <Pn>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M
BRKBS <Pd>.B, <Pg>/Z, <Pn>.B

integer esize = 8;
Assembler Symbols
M <ZM>
0 Z
1 M
BRKB, BRKBS Page 1523

Operation
CheckSVEEnabled();
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
boolean element = ElemP[operand, e, esize] == '1';
break = break || element;
ElemP[result, e, esize] = if !break then '1' else '0';
elsif merging then
ElemP[result, e, esize] = ElemP[operand2, e, esize];
else
if setflags then
P[d] = result;
BRKB, BRKBS Page 1524

BRKN, BRKNS
Propagate break to next partition.
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
leaves the destination and second source predicate unchanged. Inactive elements in the destination predicate register
are set to zero. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the
V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S
BRKN <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.B

integer dm = UInt(Pdm);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S
BRKNS <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.B

integer dm = UInt(Pdm);
Assembler Symbols
<Pdm> Is the name of the second source and destination scalable predicate register, encoded in the "Pdm"
field.
BRKN, BRKNS Page 1525

Operation
CheckSVEEnabled();
bits(PL) operand2 = P[dm];
bits(PL) result;
if LastActive(mask, operand1, 8) == '1' then

result = operand2;
else
result = Zeros();
if setflags then
PSTATE.<N,Z,C,V> = PredTest(Ones(PL), result, 8);
P[dm] = result;
BRKN, BRKNS Page 1526

BRKPA, BRKPAS
Break after first true condition, propagating from previous partition.
sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Optionally sets
the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B
BRKPA <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B
BRKPAS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
BRKPA, BRKPAS Page 1527

Operation
CheckSVEEnabled();
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
ElemP[result, e, 8] = if last then '1' else '0';
last = last && (ElemP[operand2, e, 8] == '0');
else
ElemP[result, e, 8] = '0';
if setflags then
P[d] = result;
BRKPA, BRKPAS Page 1528

BRKPB, BRKPBS
Break before first true condition, propagating from previous partition.
sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Optionally sets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B
BRKPB <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B
BRKPBS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
BRKPB, BRKPBS Page 1529

Operation
CheckSVEEnabled();
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
last = last && (ElemP[operand2, e, 8] == '0');
ElemP[result, e, 8] = if last then '1' else '0';
else
ElemP[result, e, 8] = '0';
if setflags then
P[d] = result;
BRKPB, BRKPBS Page 1530

CLASTA (scalar)
Conditionally extract element after last to general-purpose register.
From the source vector register extract the element after the last active element, or if the last active element is the
final element extract element zero, and then zero-extend that element to destructively place in the destination and
first source general-purpose register. If there are no active elements then destructively zero-extend the least
significant element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 1 0 1 Pg Zm Rdn
B
SVE
CLASTA <R><dn>, <Pg>, <R><dn>, <Zm>.<T>

integer dn = UInt(Rdn);
integer csize = if esize < 64 then 32 else 64;
boolean isBefore = FALSE;
Assembler Symbols
<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X
<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 B
01 H
10 S
11 D
CLASTA (scalar) Page 1531

Operation
CheckSVEEnabled();
bits(esize) operand1 = X[dn];
bits(csize) result;
integer last = LastActiveElement(mask, esize);
if last < 0 then

result = ZeroExtend(operand1);
else
if !isBefore then
last = last + 1;
if last >= elements then last = 0;
result = ZeroExtend(Elem[operand2, last, esize]);
X[dn] = result;
CLASTA (scalar) Page 1532

CLASTA (SIMD&FP scalar)
Conditionally extract element after last to SIMD&FP scalar register.
From the source vector register extract the element after the last active element, or if the last active element is the
final element extract element zero, and then zero-extend that element to destructively place in the destination and
first source SIMD & floating-point scalar register. If there are no active elements then destructively zero-extend the
least significant element-size bits of the destination and first source SIMD & floating-point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 Pg Zm Vdn
B
SVE
CLASTA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

integer dn = UInt(Vdn);
Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) operand1 = V[dn];
bits(esize) result;
if last < 0 then

else
if !isBefore then
last = last + 1;
result = Elem[operand2, last, esize];
V[dn] = result;
CLASTA (SIMD&FP scalar) Page 1533

CLASTA (SIMD&FP scalar) Page 1534

CLASTA (vectors)
Conditionally extract element after last to vector register.
From the second source vector register extract the element after the last active element, or if the last active element
is the final element extract element zero, and then replicate that element to destructively fill the destination and first
source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 Pg Zm Zdn
B
SVE
CLASTA <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
if last < 0 then

result = operand1;
else
if !isBefore then
last = last + 1;
Elem[result, e, esize] = Elem[operand2, last, esize];
Z[dn] = result;
CLASTA (vectors) Page 1535

UNPREDICTABLE:
CLASTA (vectors) Page 1536

CLASTB (scalar)
Conditionally extract last element to general-purpose register.
From the source vector register extract the last active element, and then zero-extend that element to destructively
place in the destination and first source general-purpose register. If there are no active elements then destructively
zero-extend the least significant element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 1 0 1 Pg Zm Rdn
B
SVE
CLASTB <R><dn>, <Pg>, <R><dn>, <Zm>.<T>

integer csize = if esize < 64 then 32 else 64;
boolean isBefore = TRUE;
Assembler Symbols

size <R>
01 W
x0 W
11 X
<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) operand1 = X[dn];
bits(csize) result;
if last < 0 then

else
if !isBefore then
last = last + 1;
result = ZeroExtend(Elem[operand2, last, esize]);
X[dn] = result;
CLASTB (scalar) Page 1537

CLASTB (scalar) Page 1538

CLASTB (SIMD&FP scalar)
Conditionally extract last element to SIMD&FP scalar register.
From the source vector register extract the last active element, and then zero-extend that element to destructively
place in the destination and first source SIMD & floating-point scalar register. If there are no active elements then
destructively zero-extend the least significant element-size bits of the destination and first source SIMD & floating-
point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 Pg Zm Vdn
B
SVE
CLASTB <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) result;
if last < 0 then

else
if !isBefore then
last = last + 1;
result = Elem[operand2, last, esize];
V[dn] = result;
CLASTB (SIMD&FP scalar) Page 1539

CLASTB (SIMD&FP scalar) Page 1540

CLASTB (vectors)
Conditionally extract last element to vector register.
From the second source vector register extract the last active element, and then replicate that element to
destructively fill the destination and first source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 Pg Zm Zdn
B
SVE
CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
if last < 0 then

result = operand1;
else
if !isBefore then
last = last + 1;
Elem[result, e, esize] = Elem[operand2, last, esize];
Z[dn] = result;
UNPREDICTABLE:
CLASTB (vectors) Page 1541

CLASTB (vectors) Page 1542

CLS
Count leading sign bits (predicated).
Count leading sign bits in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 1 Pg Zn Zd
SVE
CLS <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) element = Elem[operand, e, esize];
Elem[result, e, esize] = CountLeadingSignBits(element)<esize-1:0>;
Z[d] = result;
UNPREDICTABLE:
CLS Page 1543

CLZ
Count leading zero bits (predicated).
Count leading zero bits in each active element of the source vector, and place the results in the corresponding
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 1 Pg Zn Zd
SVE
CLZ <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = CountLeadingZeroBits(element)<esize-1:0>;
Z[d] = result;
UNPREDICTABLE:
CLZ Page 1544

CMP<cc> (immediate)
Compare vector to immediate.
Compare active integer elements in the source vector with an immediate, and place the boolean results of the
specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal , Greater than , Greater than or equal , Higher , Higher or same , Less than ,
Less than or equal , Lower , Lower or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 0 Pd
ne
Equal
CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_EQ;
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 1 Pd
lt ne
Greater than
CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_GT;
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 0 Pd
lt ne
CMP<cc> (immediate) Page 1545

CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_GE;
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 1 Pd
lt ne
Higher
CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_GT;
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 0 Pd
lt ne
Higher or same
CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_GE;
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 0 Pd
lt ne

Less than
CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_LT;
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 1 Pd
lt ne
Less than or equal
CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_LE;
Lower
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 0 Pd
lt ne
Lower
CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_LT;
Lower or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 1 Pd
lt ne

Lower or same
CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_LE;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 1 Pd
ne
Not equal
CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

SVECmp op = Cmp_NE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
<imm> For the equal, greater than, greater than or equal, less than, less than or equal and not equal variant: is
the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
For the higher, higher or same, lower and lower or same variant: is the unsigned immediate operand, in
the range 0 to 127, encoded in the "imm7" field.

Operation
CheckSVEEnabled();
bits(PL) result;
integer element1 = Int(Elem[operand1, e, esize], unsigned);
boolean cond;
case op of
when Cmp_EQ cond = element1 == imm;
when Cmp_NE cond = element1 != imm;
when Cmp_GE cond = element1 >= imm;
when Cmp_LT cond = element1 < imm;
when Cmp_GT cond = element1 > imm;
when Cmp_LE cond = element1 <= imm;
ElemP[result, e, esize] = if cond then '1' else '0';
else

P[d] = result;

CMP<cc> (wide elements)
Compare vector to 64-bit wide elements.
Compare active integer elements in the first source vector with overlapping 64-bit doubleword elements in the second
source vector, and place the boolean results of the specified comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal , Greater than , Greater than or equal , Higher , Higher or same , Less than ,
Less than or equal , Lower , Lower or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 0 Pd
ne
Equal
CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_EQ;
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 1 Pd
U lt ne
Greater than
CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_GT;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 0 Pd
U lt ne
CMP<cc> (wide elements) Page 1550

CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_GE;
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 1 Pd
U lt ne
Higher
CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_GT;
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 0 Pd
U lt ne
Higher or same
CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_GE;
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 0 Pd
U lt ne

Less than
CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_LT;
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 1 Pd
U lt ne
Less than or equal
CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_LE;
Lower
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 0 Pd
U lt ne
Lower
CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_LT;
Lower or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 1 Pd
U lt ne

Lower or same
CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_LE;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 1 Pd
ne
Not equal
CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

SVECmp op = Cmp_NE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED

Operation
CheckSVEEnabled();
bits(PL) result;
integer element2 = Int(Elem[operand2, (e * esize) DIV 64, 64], unsigned);
boolean cond;
case op of
when Cmp_EQ cond = element1 == element2;
when Cmp_NE cond = element1 != element2;
when Cmp_GE cond = element1 >= element2;
when Cmp_LT cond = element1 < element2;
when Cmp_GT cond = element1 > element2;
when Cmp_LE cond = element1 <= element2;
else

P[d] = result;

CMP<cc> (vectors)
Compare vectors.
Compare active integer elements in the first source vector with corresponding elements in the second source vector,
and place the boolean results of the specified comparison in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS or NE.
This instruction is used by the pseudo-instructions CMPLE (vectors), CMPLO (vectors), CMPLS (vectors), and CMPLT
(vectors).
It has encodings from 6 classes: Equal , Greater than , Greater than or equal , Higher , Higher or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 0 Pd
ne
Equal
CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_EQ;
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 1 Pd
ne
Greater than
CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GT;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne
CMP<cc> (vectors) Page 1555

CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GE;
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 1 Pd
ne
Higher
CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GT;
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 0 Pd
ne
Higher or same
CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GE;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 1 Pd
ne

Not equal
CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_NE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) result;
boolean cond;
case op of
when Cmp_EQ cond = element1 == element2;
when Cmp_NE cond = element1 != element2;
when Cmp_GE cond = element1 >= element2;
when Cmp_LT cond = element1 < element2;
when Cmp_GT cond = element1 > element2;
when Cmp_LE cond = element1 <= element2;
else

P[d] = result;

CMPLE (vectors)
Compare signed less than or equal to vector, setting the condition flags.
Compare active signed integer elements in the first source vector being less than or equal to corresponding signed
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
This is a pseudo-instruction of CMP<cc> (vectors). This means:
• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne
CMPLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
CMPLE (vectors) Page 1558

CMPLO (vectors)
Compare unsigned lower than vector, setting the condition flags.
Compare active unsigned integer elements in the first source vector being lower than corresponding unsigned
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 1 Pd
ne
Higher
CMPLO <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CMPLO (vectors) Page 1559

CMPLS (vectors)
Compare unsigned lower or same as vector, setting the condition flags.
Compare active unsigned integer elements in the first source vector being lower than or same as corresponding
unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding
elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the
FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 0 Pd
ne
Higher or same
CMPLS <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CMPLS (vectors) Page 1560

CMPLT (vectors)
Compare signed less than vector, setting the condition flags.
Compare active signed integer elements in the first source vector being less than corresponding signed elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 1 Pd
ne
Greater than
CMPLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CMPLT (vectors) Page 1561

CNOT
Logically invert boolean condition in vector (predicated).
Logically invert the boolean value in each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
Boolean TRUE is any non-zero value in a source, and one in a result element. Boolean FALSE is always zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 1 Pg Zn Zd
SVE
CNOT <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = ZeroExtend(IsZeroBit(element), esize);
Z[d] = result;
UNPREDICTABLE:
CNOT Page 1562

CNOT Page 1563

CNT
Count non-zero bits (predicated).
Count non-zero bits in each active element of the source vector, and place the results in the corresponding elements of
the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 1 0 1 Pg Zn Zd
SVE
CNT <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = BitCount(element)<esize-1:0>;
Z[d] = result;
UNPREDICTABLE:
CNT Page 1564

CNTB, CNTD, CNTH, CNTW
Set scalar to multiple of predicate constraint element count.
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then places the result in the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
Undefined Instruction exception.
It has encodings from 4 classes: Byte , Doubleword , Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Byte
CNTB <Xd>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Doubleword
CNTD <Xd>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
CNTB, CNTD, CNTH, CNTW Page 1565

Halfword
CNTH <Xd>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Word
CNTW <Xd>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
X[d] = (count * imm)<63:0>;


CNTP
Set scalar to count of true predicate elements.
Counts the number of active and true elements in the source predicate and places the scalar result in the destination
general-purpose register. Inactive predicate elements are not counted.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 0 1 0 Pg 0 Pn Rd
SVE
CNTP <Xd>, <Pg>, <Pn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(64) sum = Zeros();
if ElemP[mask, e, esize] == '1' && ElemP[operand, e, esize] == '1' then
sum = sum + 1;
X[d] = sum;
CNTP Page 1568

COMPACT
Shuffle active elements of vector to the right and fill with zero.
Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination
vector. Then set any remaining elements of the destination vector to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 0 Pg Zn Zd
SVE
COMPACT <Zd>.<T>, <Pg>, <Zn>.<T>

if size == '0x' then UNDEFINED;
Assembler Symbols
<T> Is the size specifier, encoded in “size<0>”:
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer x = 0;
Elem[result, e, esize] = Zeros();
bits(esize) element = Elem[operand1, e, esize];
Elem[result, x, esize] = element;
x = x + 1;
Z[d] = result;
COMPACT Page 1569

CPY (immediate, zeroing)
Copy signed integer immediate to vector elements (zeroing).
Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
This instruction is used by the alias MOV (immediate, predicated, zeroing).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 0 sh imm8 Zd
M
SVE
CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
sh <shift>
0 LSL #0
1 LSL #8
CPY (immediate, zeroing) Page 1570

Operation
CheckSVEEnabled();
bits(VL) dest = Z[d];
bits(VL) result;
Elem[result, e, esize] = imm<esize-1:0>;
elsif merging then
Elem[result, e, esize] = Elem[dest, e, esize];
else
Z[d] = result;
CPY (immediate, zeroing) Page 1571

CPY (immediate, merging)
Copy signed integer immediate to vector elements (merging).
Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
This instruction is used by the aliases FMOV (zero, predicated), and MOV (immediate, predicated, merging).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M
SVE
CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}

boolean merging = TRUE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
CPY (immediate, merging) Page 1572

Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = imm<esize-1:0>;
elsif merging then
else
Z[d] = result;
UNPREDICTABLE:
CPY (immediate, merging) Page 1573

CPY (scalar)
Copy general-purpose register to vector elements (predicated).
Copy the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
This instruction is used by the alias MOV (scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 1 Pg Rn Zd
SVE
CPY <Zd>.<T>, <Pg>/M, <R><n|SP>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.
Operation
CheckSVEEnabled();
bits(64) operand1;
if n == 31 then
operand1 = SP[];
else
operand1 = X[n];
Elem[result, e, esize] = operand1<esize-1:0>;
Z[d] = result;
CPY (scalar) Page 1574

UNPREDICTABLE:
CPY (scalar) Page 1575

CPY (SIMD&FP scalar)
Copy SIMD&FP scalar register to vector elements (predicated).
Copy the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
This instruction is used by the alias MOV (SIMD&FP scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 0 Pg Vn Zd
SVE
CPY <Zd>.<T>, <Pg>/M, <V><n>

integer n = UInt(Vn);
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
size <V>
00 B
01 H
10 S
11 D
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.
Operation
CheckSVEEnabled();
bits(esize) operand1 = V[n];
Elem[result, e, esize] = operand1;
Z[d] = result;
UNPREDICTABLE:
CPY (SIMD&FP scalar) Page 1576

CPY (SIMD&FP scalar) Page 1577

CTERMEQ, CTERMNE
Compare and terminate loop.
Detect termination conditions in serialized vector loops. Tests whether the comparison between the scalar source
operands holds true and if not tests the state of the !LAST condition flag (C) which indicates whether the previous flag-
setting predicate instruction selected the last element of the vector partition.
The Z and C condition flags are preserved by this instruction. The N and V condition flags are set as a pair to generate
one of the following conditions for a subsequent conditional instruction:
* GE (N=0 & V=0): continue loop (compare failed and last element not selected);
* LT (N=0 & V=1): terminate loop (last element selected);
* LT (N=1 & V=0): terminate loop (compare succeeded);
The scalar source operands are 32-bit or 64-bit general-purpose registers of the same size.
It has encodings from 2 classes: Equal and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 0 0 0 0 0
ne
Equal
CTERMEQ <R><n>, <R><m>

SVECmp op = Cmp_EQ;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 1 0 0 0 0
ne
Not equal
CTERMNE <R><n>, <R><m>

SVECmp op = Cmp_NE;
Assembler Symbols
<R> Is a width specifier, encoded in “sz”:

sz <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
CTERMEQ, CTERMNE Page 1578

Operation
CheckSVEEnabled();
bits(esize) operand1 = X[n];
bits(esize) operand2 = X[m];
integer element1 = UInt(operand1);
integer element2 = UInt(operand2);
boolean term;
case op of
when Cmp_EQ term = element1 == element2;
when Cmp_NE term = element1 != element2;
if term then
PSTATE.N = '1';
PSTATE.V = '0';
else
PSTATE.N = '0';
PSTATE.V = (NOT PSTATE.C);
CTERMEQ, CTERMNE Page 1579

DECB, DECD, DECH, DECW (scalar)
Decrement scalar by multiple of predicate constraint element count.
in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Byte
DECB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Doubleword
DECD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
DECB, DECD, DECH, DECW

Page 1580
(scalar)
Halfword
DECH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Word
DECW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(64) operand1 = X[dn];
X[dn] = operand1 - (count * imm);

Page 1581
(scalar)

Page 1582
(scalar)
DECD, DECH, DECW (vector)
Decrement vector by multiple of predicate constraint element count.
in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements.
It has encodings from 3 classes: Doubleword , Halfword and Word
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
Doubleword
DECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
Halfword
DECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
DECD, DECH, DECW (vector) Page 1583

Word
DECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Elem[operand1, e, esize] - (count * imm);
Z[dn] = result;
UNPREDICTABLE:


DECP (scalar)
Decrement scalar by count of true predicate elements.
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 1 0 0 Pm Rdn
D
SVE
DECP <Xdn>, <Pm>.<T>

Assembler Symbols
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer count = 0;
if ElemP[operand2, e, esize] == '1' then
count = count + 1;
X[dn] = operand1 - count;
DECP (scalar) Page 1586

DECP (vector)
Decrement vector by count of true predicate elements.
Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 0 0 0 Pm Zdn
D
SVE
DECP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
Elem[result, e, esize] = Elem[operand1, e, esize] - count;
Z[dn] = result;
UNPREDICTABLE:
DECP (vector) Page 1587

DECP (vector) Page 1588

DUP (immediate)
Broadcast signed immediate to vector elements (unpredicated).
Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is
unpredicated.
This instruction is used by the aliases FMOV (zero, unpredicated), and MOV (immediate, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 sh imm8 Zd
SVE
DUP <Zd>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result = Replicate(imm<esize-1:0>);
Z[d] = result;
DUP (immediate) Page 1589

DUP (scalar)
Broadcast general-purpose register to vector elements (unpredicated).
Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
instruction is unpredicated.
This instruction is used by the alias MOV (scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Zd
SVE
DUP <Zd>.<T>, <R><n|SP>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <R>
01 W
x0 W
11 X
field.
Operation
CheckSVEEnabled();
bits(64) operand;
if n == 31 then
operand = SP[];
else
operand = X[n];
bits(VL) result;
Elem[result, e, esize] = operand<esize-1:0>;
Z[d] = result;
DUP (scalar) Page 1590

DUP (indexed)
Broadcast indexed element to vector (unpredicated).
Unconditionally broadcast the indexed source vector element into each element of the destination vector. This
The immediate element index is in the range of 0 to 63 (bytes), 31 (halfwords), 15 (words), 7 (doublewords) or 3
(quadwords). Selecting an element beyond the accessible vector length causes the destination vector to be set to zero.
This instruction is used by the alias MOV (SIMD&FP scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 imm2 1 tsz 0 0 1 0 0 0 Zn Zd
SVE
DUP <Zd>.<T>, <Zn>.<T>[<imm>]

bits(7) imm = imm2:tsz;
case tsz of
when '10000' esize = 128; index = UInt(imm<6:5>);
when 'x1000' esize = 64; index = UInt(imm<6:4>);
when 'xx100' esize = 32; index = UInt(imm<6:3>);
when 'xxx10' esize = 16; index = UInt(imm<6:2>);
when 'xxxx1' esize = 8; index = UInt(imm<6:1>);
Assembler Symbols
<T> Is the size specifier, encoded in “tsz”:
tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<imm> Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in
"imm2:tsz".
Alias Conditions

MOV (SIMD&FP scalar, unpredicated) BitCount(imm2:tsz) == 1
MOV (SIMD&FP scalar, unpredicated) BitCount(imm2:tsz) > 1
DUP (indexed) Page 1591

Operation
CheckSVEEnabled();
bits(VL) result;
if index >= elements then

element = Zeros();
else
element = Elem[operand1, index, esize];
result = Replicate(element);
Z[d] = result;
DUP (indexed) Page 1592

DUPM
Broadcast logical bitmask immediate to vector (unpredicated).
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction
is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16,
32 or 64 bits.
This instruction is used by the alias MOV (bitmask immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd
SVE
DUPM <Zd>.<T>, #<const>

integer esize = 64;
bits(esize) imm;
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Alias Conditions

MOV (bitmask immediate) SVEMoveMaskPreferred(imm13)
Operation
CheckSVEEnabled();
bits(VL) result = Replicate(imm);
Z[d] = result;
DUPM Page 1593

EON
Bitwise exclusive OR with inverted immediate (unpredicated).
Bitwise exclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This is a pseudo-instruction of EOR (immediate). This means:
• The encodings in this description are named to match the encodings of EOR (immediate).
• The description of EOR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn
SVE
EON <Zdn>.<T>, <Zdn>.<T>, #<const>
is equivalent to
EOR <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
The description of EOR (immediate) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
EON Page 1594

EOR, EORS (predicates)
Bitwise exclusive OR predicates.
Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source
This instruction is used by the aliases NOTS, and NOT (predicate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
Alias Conditions

NOTS Pm == Pg
EOR, EORS (predicates) Page 1595

NOT (predicate) Pm == Pg
Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = element1 EOR element2;
else
if setflags then
P[d] = result;
EOR, EORS (predicates) Page 1596

EOR (vectors, predicated)
Bitwise exclusive OR vectors (predicated).
Bitwise exclusive OR active elements of the second source vector with corresponding elements of the first source
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 0 Pg Zm Zdn
SVE
EOR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 EOR element2;
else
Z[dn] = result;
UNPREDICTABLE:
EOR (vectors, predicated) Page 1597

EOR (vectors, predicated) Page 1598

EOR (immediate)
Bitwise exclusive OR with immediate (unpredicated).
Bitwise exclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
This instruction is used by the pseudo-instruction EON.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn
SVE
EOR <Zdn>.<T>, <Zdn>.<T>, #<const>

bits(64) imm;
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, 64] = element1 EOR imm;
Z[dn] = result;
UNPREDICTABLE:


EOR (vectors, unpredicated)
Bitwise exclusive OR vectors (unpredicated).
Bitwise exclusive OR all elements of the second source vector with corresponding elements of the first source vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
EOR <Zd>.D, <Zn>.D, <Zm>.D

Assembler Symbols
Operation
CheckSVEEnabled();
Z[d] = operand1 EOR operand2;
EOR (vectors, unpredicated) Page 1601

EORV
Bitwise exclusive OR reduction to scalar.
Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 1 Pg Zn Vd
SVE
EORV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) result = Zeros(esize);
result = result EOR Elem[operand, e, esize];
V[d] = result;
EORV Page 1602

EXT
Extract vector from pair of vectors.
Copy the indexed byte up to the last byte of the first source vector to the bottom of the result vector, then fill the
remainder of the result starting from the first byte of the second source vector. The result is placed destructively in the
first source vector. This instruction is unpredicated.
An index that is greater than or equal to the vector length in bytes is treated as zero, leaving the destination and first
source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 imm8h 0 0 0 imm8l Zm Zdn
SVE
EXT <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>

integer esize = 8;
integer position = UInt(imm8h:imm8l);
Assembler Symbols
<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8h:imm8l" fields.
Operation
CheckSVEEnabled();
bits(VL) result;
if position >= elements then

position = 0;
position = position << 3;

bits(VL*2) concat = operand2 : operand1;
result = concat<position+VL-1:position>;
Z[dn] = result;
UNPREDICTABLE:
EXT Page 1603

FABD
Floating-point absolute difference (predicated).
Compute the absolute difference of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the result in the corresponding elements of
the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 0 1 0 0 Pg Zm Zdn
SVE
FABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPAbs(FPSub(element1, element2, FPCR<31:0>));
else
Elem[result, e, esize] = element1;
Z[dn] = result;
UNPREDICTABLE:
FABD Page 1604

FABD Page 1605

FABS
Floating-point absolute value (predicated).
Take the absolute value of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This clears the sign bit and cannot signal a floating-point exception.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 0 1 0 1 Pg Zn Zd
SVE
FABS <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = FPAbs(element);
Z[d] = result;
UNPREDICTABLE:
FABS Page 1606

FABS Page 1607

FAC<cc>
Floating-point absolute compare vectors.
Compare active absolute values of floating-point elements in the first source vector with corresponding absolute
values of elements in the second source vector, and place the boolean results of the specified comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: GE, GT, LE, or LT.
This instruction is used by the pseudo-instructions FACLE, and FACLT.
It has encodings from 2 classes: Greater than and Greater than or equal
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 1 Pg Zn 1 Pd
Greater than
FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GT;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd
FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FAC<cc> Page 1608

Operation
CheckSVEEnabled();
bits(PL) result;
case op of
when Cmp_GE res = FPCompareGE(FPAbs(element1), FPAbs(element2), FPCR<31:0>);
when Cmp_GT res = FPCompareGT(FPAbs(element1), FPAbs(element2), FPCR<31:0>);
ElemP[result, e, esize] = if res then '1' else '0';
else
P[d] = result;
FAC<cc> Page 1609

FACLE
Floating-point absolute compare less than or equal.
Compare active absolute values of floating-point elements in the first source vector being less than or equal to
corresponding absolute values of elements in the second source vector, and place the boolean results of the
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.
This is a pseudo-instruction of FAC<cc>. This means:
• The encodings in this description are named to match the encodings of FAC<cc>.
• The description of FAC<cc> gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd
FACLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FAC<cc> gives the operational pseudocode for this instruction.
FACLE Page 1610

FACLT
Floating-point absolute compare less than.
Compare active absolute values of floating-point elements in the first source vector being less than corresponding
absolute values of elements in the second source vector, and place the boolean results of the comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.
This is a pseudo-instruction of FAC<cc>. This means:
• The encodings in this description are named to match the encodings of FAC<cc>.
• The description of FAC<cc> gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 1 Pg Zn 1 Pd
Greater than
FACLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FAC<cc> gives the operational pseudocode for this instruction.
FACLT Page 1611

FADD (immediate)
Floating-point add immediate (predicated).
Add an immediate to each active floating-point element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPOne('0');
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<const> Is the floating-point immediate value, encoded in “i1”:
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPAdd(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FADD (immediate) Page 1612

FADD (immediate) Page 1613

FADD (vectors, predicated)
Floating-point add vector (predicated).
Add active floating-point elements of the second source vector to corresponding floating-point elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 0 Pg Zm Zdn
SVE
FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPAdd(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FADD (vectors, predicated) Page 1614

FADD (vectors, predicated) Page 1615

FADD (vectors, unpredicated)
Floating-point add vector (unpredicated).
Add all floating-point elements of the second source vector to corresponding elements of the first source vector and
place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 0 Zn Zd
SVE
FADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPAdd(element1, element2, FPCR<31:0>);
Z[d] = result;
FADD (vectors, unpredicated) Page 1616

FADDA
Floating-point add strictly-ordered reduction, accumulating in scalar.
Floating-point add a SIMD&FP scalar source and all active lanes of the vector source and place the result
destructively in the SIMD&FP scalar source register. Vector elements are processed strictly in order from low to high,
with the scalar source providing the initial value. Inactive elements in the source vector are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 0 0 1 Pg Zm Vdn
SVE
FADDA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) result = operand1;
result = FPAdd(result, element, FPCR<31:0>);
V[dn] = result;
FADDA Page 1617

FADDV
Floating-point add recursive reduction to scalar.
Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in
the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +0.0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd
SVE
FADDV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) identity = FPZero('0');
V[d] = ReducePredicated(ReduceOp_FADD, operand, mask, identity);
FADDV Page 1618

FCADD
Floating-point complex add with rotate (predicated).
Add the real and imaginary components of the active floating-point complex numbers from the first source vector to
the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the
direction from the positive real axis towards the positive imaginary axis, when considered in polar representation,
equivalent to multiplying the complex numbers in the second source vector by ± J beforehand. Destructively place the
results in the corresponding elements of the first source vector. Inactive elements in the destination vector register
remain unmodified.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 0 0 0 0 rot 1 0 0 Pg Zm Zdn
SVE
FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>

boolean sub_i = (rot == '0');
boolean sub_r = (rot == '1');
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<const> Is the const specifier, encoded in “rot”:
rot <const>
0 #90
1 #270
FCADD Page 1619

Operation
CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
bits(VL) result;
acc_r = Elem[operand1, 2 * p + 0, esize];
acc_i = Elem[operand1, 2 * p + 1, esize];
elt2_r = Elem[operand2, 2 * p + 0, esize];
elt2_i = Elem[operand2, 2 * p + 1, esize];
if ElemP[mask, 2 * p + 0, esize] == '1' then
if sub_i then elt2_i = FPNeg(elt2_i);
acc_r = FPAdd(acc_r, elt2_i, FPCR<31:0>);
if sub_r then elt2_r = FPNeg(elt2_r);
acc_i = FPAdd(acc_i, elt2_r, FPCR<31:0>);
Elem[result, 2 * p + 0, esize] = acc_r;
Elem[result, 2 * p + 1, esize] = acc_i;
Z[dn] = result;
UNPREDICTABLE:
FCADD Page 1620

FCM<cc> (zero)
Floating-point compare vector with zero.
Compare active floating-point elements in the source vector with zero, and place the boolean results of the specified
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, LE, LT, or NE.
It has encodings from 6 classes: Equal , Greater than , Greater than or equal , Less than , Less than or equal and Not
equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 0 0 0 1 Pg Zn 0 Pd
eq lt ne
Equal
FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_EQ;
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 1 Pd
eq lt ne
Greater than
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_GT;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 0 Pd
eq lt ne
FCM<cc> (zero) Page 1621

FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_GE;
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 0 Pd
eq lt ne
Less than
FCMLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_LT;
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 1 Pd
eq lt ne
Less than or equal
FCMLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_LE;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 1 0 0 1 Pg Zn 0 Pd
eq lt ne

Not equal
FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

SVECmp op = Cmp_NE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) result;
case op of
when Cmp_EQ res = FPCompareEQ(element, 0<esize-1:0>, FPCR<31:0>);
when Cmp_GE res = FPCompareGE(element, 0<esize-1:0>, FPCR<31:0>);
when Cmp_GT res = FPCompareGT(element, 0<esize-1:0>, FPCR<31:0>);
when Cmp_NE res = FPCompareNE(element, 0<esize-1:0>, FPCR<31:0>);
when Cmp_LT res = FPCompareGT(0<esize-1:0>, element, FPCR<31:0>);
when Cmp_LE res = FPCompareGE(0<esize-1:0>, element, FPCR<31:0>);
else
P[d] = result;

FCM<cc> (vectors)
Floating-point compare vectors.
Compare active floating-point elements in the first source vector with corresponding elements in the second source
vector, and place the boolean results of the specified comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, or NE, with the addition of UO for
an unordered comparison.
This instruction is used by the pseudo-instructions FCMLE (vectors), and FCMLT (vectors).
It has encodings from 5 classes: Equal , Greater than , Greater than or equal , Not equal and Unordered
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 0 Pd
cmph cmpl
Equal
FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_EQ;
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 1 Pd
cmph cmpl
Greater than
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GT;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl
FCM<cc> (vectors) Page 1624

FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_GE;
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 1 Pd
cmph cmpl
Not equal
FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_NE;
Unordered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 0 Pd
Unordered
FCMUO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

SVECmp op = Cmp_UN;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D

Operation
CheckSVEEnabled();
bits(PL) result;
case op of
when Cmp_EQ res = FPCompareEQ(element1, element2, FPCR<31:0>);
when Cmp_GE res = FPCompareGE(element1, element2, FPCR<31:0>);
when Cmp_GT res = FPCompareGT(element1, element2, FPCR<31:0>);
when Cmp_UN res = FPCompareUN(element1, element2, FPCR<31:0>);
when Cmp_NE res = FPCompareNE(element1, element2, FPCR<31:0>);
when Cmp_LT res = FPCompareGT(element2, element1, FPCR<31:0>);
when Cmp_LE res = FPCompareGE(element2, element1, FPCR<31:0>);
else
P[d] = result;

FCMLA (vectors)
Floating-point complex multiply-add with rotate (predicated).
Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in the first source vector by the corresponding complex number in the second
source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive
imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
even-numbered element and the imaginary part in the odd-numbered element. Inactive elements in the destination
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 Zm 0 rot Pg Zn Zda
SVE
FCMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>, <const>

integer sel_a = UInt(rot<0>);
integer sel_b = UInt(NOT(rot<0>));
boolean neg_i = (rot<1> == '1');
boolean neg_r = (rot<0> != rot<1>);
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
rot <const>
00 #0
01 #90
10 #180
11 #270
FCMLA (vectors) Page 1627

Operation
CheckSVEEnabled();
bits(VL) result;
addend_r = Elem[operand3, 2 * p + 0, esize];
addend_i = Elem[operand3, 2 * p + 1, esize];
elt1_a = Elem[operand1, 2 * p + sel_a, esize];
elt2_b = Elem[operand2, 2 * p + sel_b, esize];
if neg_r then elt2_a = FPNeg(elt2_a);
addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR<31:0>);
if neg_i then elt2_b = FPNeg(elt2_b);
addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR<31:0>);
Elem[result, 2 * p + 0, esize] = addend_r;
Elem[result, 2 * p + 1, esize] = addend_i;
Z[da] = result;
UNPREDICTABLE:
FCMLA (vectors) Page 1628

FCMLA (indexed)
Floating-point complex multiply-add by indexed values with rotate.
Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in each 128-bit segment of the first source vector by the specified complex number
in the corresponding the second source vector segment rotated by 0, 90, 180 or 270 degrees in the direction from the
positive real axis towards the positive imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
even-numbered element and the imaginary part in the odd-numbered element.
The complex numbers within the second source vector are specified using an immediate index which selects the same
complex number position within each 128-bit vector segment. The index range is from 0 to one less than the number
of complex numbers per 128-bit segment, encoded in 1 to 2 bits depending on the size of the complex number. This
It has encodings from 2 classes: Half-precision and Single-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>
Half-precision
FCMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>

integer esize = 16;
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>
Single-precision
FCMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>

integer esize = 32;
FCMLA (indexed) Page 1629

Assembler Symbols
<Zm> For the half-precision variant: is the name of the second source scalable vector register Z0-Z7, encoded
in the "Zm" field.
For the single-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 3, encoded in
the "i2" field.
For the single-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 1, encoded
in the "i1" field.
rot <const>
00 #0
01 #90
10 #180
11 #270
Operation
CheckSVEEnabled();
integer pairspersegment = 128 DIV (2 * esize);
bits(VL) result;
segmentbase = p - (p MOD pairspersegment);
s = segmentbase + index;
addend_r = Elem[operand3, 2 * p + 0, esize];
addend_i = Elem[operand3, 2 * p + 1, esize];
elt2_a = Elem[operand2, 2 * s + sel_a, esize];
elt2_b = Elem[operand2, 2 * s + sel_b, esize];
if neg_r then elt2_a = FPNeg(elt2_a);
if neg_i then elt2_b = FPNeg(elt2_b);
addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR<31:0>);
addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR<31:0>);
Elem[result, 2 * p + 0, esize] = addend_r;
Elem[result, 2 * p + 1, esize] = addend_i;
Z[da] = result;
UNPREDICTABLE:
FCMLA (indexed) Page 1630

FCMLE (vectors)
Floating-point compare less than or equal to vector.
Compare active floating-point elements in the first source vector being less than or equal to corresponding elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
This is a pseudo-instruction of FCM<cc> (vectors). This means:
• The encodings in this description are named to match the encodings of FCM<cc> (vectors).
• The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl
FCMLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
FCMLE (vectors) Page 1631

FCMLT (vectors)
Floating-point compare less than vector.
Compare active floating-point elements in the first source vector being less than corresponding elements in the second
source vector, and place the boolean results of the comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
This is a pseudo-instruction of FCM<cc> (vectors). This means:
• The encodings in this description are named to match the encodings of FCM<cc> (vectors).
• The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 1 Pd
cmph cmpl
Greater than
FCMLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
FCMLT (vectors) Page 1632

FCPY
Copy 8-bit floating-point immediate to vector elements (predicated).
Copy a floating-point immediate into each active element in the destination vector. Inactive elements in the destination
This instruction is used by the alias FMOV (immediate, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd
SVE
FCPY <Zd>.<T>, <Pg>/M, #<const>

bits(esize) imm = VFPExpandImm(imm8);
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<const> Is a floating-point immediate value expressable as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
Elem[result, e, esize] = imm;
Z[d] = result;
UNPREDICTABLE:
FCPY Page 1633

FCPY Page 1634

FCVT
Floating-point convert precision (predicated).
Convert the size and precision of each active floating-point element of the source vector, and place the results in the
unmodified.
Since the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 6 classes: Half-precision to single-precision , Half-precision to double-precision , Single-

precision to half-precision , Single-precision to double-precision , Double-precision to half-precision and Double-
precision to single-precision
Half-precision to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 1 Pg Zn Zd
Half-precision to single-precision
FCVT <Zd>.S, <Pg>/M, <Zn>.H

integer esize = 32;
integer s_esize = 16;
integer d_esize = 32;
Half-precision to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 Pg Zn Zd
Half-precision to double-precision
FCVT <Zd>.D, <Pg>/M, <Zn>.H

integer esize = 64;
Single-precision to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 1 Pg Zn Zd
FCVT Page 1635

Single-precision to half-precision
FCVT <Zd>.H, <Pg>/M, <Zn>.S

integer esize = 32;
Single-precision to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 Pg Zn Zd
Single-precision to double-precision
FCVT <Zd>.D, <Pg>/M, <Zn>.S

integer esize = 64;
Double-precision to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 0 1 0 1 Pg Zn Zd
Double-precision to half-precision
FCVT <Zd>.H, <Pg>/M, <Zn>.D

integer esize = 64;
Double-precision to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 0 1 0 1 Pg Zn Zd
Double-precision to single-precision
FCVT <Zd>.S, <Pg>/M, <Zn>.D

integer esize = 64;
FCVT Page 1636

Assembler Symbols
Operation
CheckSVEEnabled();
bits(d_esize) res = FPConvertSVE(element<s_esize-1:0>, FPCR<31:0>);
Elem[result, e, esize] = ZeroExtend(res);
Z[d] = result;
UNPREDICTABLE:
FCVT Page 1637

FCVTZS
Floating-point convert to signed integer, rounding toward zero (predicated).
Convert to the signed integer nearer to zero from each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
When the result is the smaller type the results are sign-extended to fill each destination element.
It has encodings from 7 classes: Half-precision to 16-bit , Half-precision to 32-bit , Half-precision to 64-bit , Single-
precision to 32-bit , Single-precision to 64-bit , Double-precision to 32-bit and Double-precision to 64-bit
Half-precision to 16-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 Pg Zn Zd
int_U
FCVTZS <Zd>.H, <Pg>/M, <Zn>.H

integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
FCVTZS <Zd>.S, <Pg>/M, <Zn>.H

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U
FCVTZS Page 1638

FCVTZS <Zd>.D, <Pg>/M, <Zn>.H

integer esize = 64;
Single-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
FCVTZS <Zd>.S, <Pg>/M, <Zn>.S

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
FCVTZS <Zd>.D, <Pg>/M, <Zn>.S

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 0 1 0 1 Pg Zn Zd
int_U
FCVTZS Page 1639

FCVTZS <Zd>.S, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U
FCVTZS <Zd>.D, <Pg>/M, <Zn>.D

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR<31:0>, rounding);
Elem[result, e, esize] = Extend(res, unsigned);
Z[d] = result;
UNPREDICTABLE:
FCVTZS Page 1640

FCVTZS Page 1641

FCVTZU
Floating-point convert to unsigned integer, rounding toward zero (predicated).
Convert to the unsigned integer nearer to zero from each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
It has encodings from 7 classes: Half-precision to 16-bit , Half-precision to 32-bit , Half-precision to 64-bit , Single-
precision to 32-bit , Single-precision to 64-bit , Double-precision to 32-bit and Double-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 Pg Zn Zd
int_U
FCVTZU <Zd>.H, <Pg>/M, <Zn>.H

integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
FCVTZU <Zd>.S, <Pg>/M, <Zn>.H

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U
FCVTZU Page 1642

FCVTZU <Zd>.D, <Pg>/M, <Zn>.H

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
FCVTZU <Zd>.S, <Pg>/M, <Zn>.S

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
FCVTZU <Zd>.D, <Pg>/M, <Zn>.S

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0 1 Pg Zn Zd
int_U
FCVTZU Page 1643

FCVTZU <Zd>.S, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U
FCVTZU <Zd>.D, <Pg>/M, <Zn>.D

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR<31:0>, rounding);
Elem[result, e, esize] = Extend(res, unsigned);
Z[d] = result;
UNPREDICTABLE:
FCVTZU Page 1644

FCVTZU Page 1645

FDIV
Floating-point divide by vector (predicated).
Divide active floating-point elements of the first source vector by corresponding floating-point elements of the second
source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 0 Pg Zm Zdn
SVE
FDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPDiv(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FDIV Page 1646

FDIV Page 1647

FDIVR
Floating-point reversed divide by vector (predicated).
Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of
the first source vector and destructively place the quotient in the corresponding elements of the first source vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 0 Pg Zm Zdn
SVE
FDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPDiv(element2, element1, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FDIVR Page 1648

FDIVR Page 1649

FDUP
Broadcast 8-bit floating-point immediate to vector elements (unpredicated).
Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.
This instruction is used by the alias FMOV (immediate, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd
SVE
FDUP <Zd>.<T>, #<const>

bits(esize) imm = VFPExpandImm(imm8);
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = imm;
Z[d] = result;
FDUP Page 1650

FEXPA
Floating-point exponential accelerator.
The FEXPA instruction accelerates the polynomial series calculation of the EXP(X) function.
The double-precision variant copies the low 52 bits of an entry from a hard-wired table of 64-bit coefficients, indexed
by the low 6 bits of each element of the source vector, and prepends to that the next 11 bits of the source element
(src<16:6>), setting the sign bit to zero.
The single-precision variant copies the low 23 bits of an entry from hard-wired table of 32-bit coefficients, indexed by
the low 6 bits of each element of the source vector, and prepends to that the next 8 bits of the source element
(src<13:6>), setting the sign bit to zero.
The half-precision variant copies the low 10 bits of an entry from hard-wired table of 16-bit coefficients, indexed by the
low 5 bits of each element of the source vector, and prepends to that the next 5 bits of the source element (src<9:5>),
setting the sign bit to zero.
A coefficient table entry with index M holds the floating-point value 2(m/64), or for the half-precision variant 2(m/32).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Zn Zd
SVE
FEXPA <Zd>.<T>, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPExpA(element);
Z[d] = result;
FEXPA Page 1651

FMAD
Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first
source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 0 Pg Zm Zdn
N op
SVE
FMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

integer a = UInt(Za);
boolean op1_neg = FALSE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
FMAD Page 1652

Operation
CheckSVEEnabled();
bits(VL) operand3 = Z[a];
bits(VL) result;

if op1_neg then element1 = FPNeg(element1);
Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMAD Page 1653

FMAX (immediate)
Floating-point maximum with immediate (predicated).
Determine the maximum of an immediate and each active floating-point element of the source vector, and
destructively place the results in the corresponding elements of the source vector. The immediate may take the value
+0.0 or +1.0 only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

bits(esize) imm = if i1 == '0' then Zeros() else FPOne('0');
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMax(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMAX (immediate) Page 1654

FMAX (immediate) Page 1655

FMAX (vectors)
Floating-point maximum (predicated).
Determine the maximum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive elements in the destination vector register
remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 0 Pg Zm Zdn
SVE
FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMax(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMAX (vectors) Page 1656

FMAX (vectors) Page 1657

FMAXNM (immediate)
Floating-point maximum number with immediate (predicated).
Determine the maximum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If the element value is NaN then the result is the immediate. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMaxNum(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMAXNM (immediate) Page 1658

FMAXNM (immediate) Page 1659

FMAXNM (vectors)
Floating-point maximum number (predicated).
Determine the maximum number value of active floating-point elements of the second source vector and
corresponding floating-point elements of the first source vector and destructively place the results in the
corresponding elements of the first source vector. If one element value is NaN then the result is the numeric value.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 0 Pg Zm Zdn
SVE
FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMAXNM (vectors) Page 1660

FMAXNM (vectors) Page 1661

FMAXNMV
Floating-point maximum number recursive reduction to scalar.
Floating-point maximum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 0 0 1 Pg Zn Vd
SVE
FMAXNMV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) identity = FPDefaultNaN();
V[d] = ReducePredicated(ReduceOp_FMAXNUM, operand, mask, identity);
FMAXNMV Page 1662

FMAXV
Floating-point maximum recursive reduction to scalar.
Floating-point maximum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as -Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 0 0 1 Pg Zn Vd
SVE
FMAXV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) identity = FPInfinity('1');
V[d] = ReducePredicated(ReduceOp_FMAX, operand, mask, identity);
FMAXV Page 1663

FMIN (immediate)
Floating-point minimum with immediate (predicated).
Determine the minimum of an immediate and each active floating-point element of the source vector, and destructively
place the results in the corresponding elements of the source vector. The immediate may take the value +0.0 or +1.0
only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMin(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMIN (immediate) Page 1664

FMIN (immediate) Page 1665

FMIN (vectors)
Floating-point minimum (predicated).
Determine the minimum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive elements in the destination vector register
remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 0 Pg Zm Zdn
SVE
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMin(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMIN (vectors) Page 1666

FMIN (vectors) Page 1667

FMINNM (immediate)
Floating-point minimum number with immediate (predicated).
Determine the minimum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If the element value is NaN then the result is the immediate. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMinNum(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMINNM (immediate) Page 1668

FMINNM (immediate) Page 1669

FMINNM (vectors)
Floating-point minimum number (predicated).
Determine the minimum number value of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the results in the corresponding elements of
the first source vector. If one element value is NaN then the result is the numeric value. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 1 0 0 Pg Zm Zdn
SVE
FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMINNM (vectors) Page 1670

FMINNM (vectors) Page 1671

FMINNMV
Floating-point minimum number recursive reduction to scalar.
Floating-point minimum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 0 0 1 Pg Zn Vd
SVE
FMINNMV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) identity = FPDefaultNaN();
V[d] = ReducePredicated(ReduceOp_FMINNUM, operand, mask, identity);
FMINNMV Page 1672

FMINV
Floating-point minimum recursive reduction to scalar.
Floating-point minimum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 0 0 1 Pg Zn Vd
SVE
FMINV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 RESERVED
01 H
10 S
11 D
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) identity = FPInfinity('0');
V[d] = ReducePredicated(ReduceOp_FMIN, operand, mask, identity);
FMINV Page 1673

FMLA (vectors)
Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].
the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and
third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 0 Pg Zn Zda
N op
SVE
FMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FMLA (vectors) Page 1674

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[da] = result;
UNPREDICTABLE:
FMLA (vectors) Page 1675

FMLA (indexed)
Floating-point fused multiply-add by indexed elements (Zda = Zda + Zn * Zm[indexed]).
Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The products are then destructively added without intermediate
rounding to the corresponding elements of the addend and destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision , Single-precision and Double-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 0 Zn Zda
op
Half-precision
FMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]

integer esize = 16;
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op
Single-precision
FMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]

integer esize = 32;
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op
FMLA (indexed) Page 1676

Double-precision
FMLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]

integer esize = 64;
Assembler Symbols
<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.
Operation
CheckSVEEnabled();
integer eltspersegment = 128 DIV esize;
bits(VL) result = Z[da];
bits(esize) element2 = Elem[operand2, s, esize];
bits(esize) element3 = Elem[result, e, esize];
Z[da] = result;
UNPREDICTABLE:
FMLA (indexed) Page 1677

FMLS (vectors)
Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm].
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 1 Pg Zn Zda
N op
SVE
FMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

boolean op1_neg = TRUE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FMLS (vectors) Page 1678

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[da] = result;
UNPREDICTABLE:
FMLS (vectors) Page 1679

FMLS (indexed)
Floating-point fused multiply-subtract by indexed elements (Zda = Zda + -Zn * Zm[indexed]).
the corresponding second source vector segment. The products are then destructively subtracted without intermediate
rounding from the corresponding elements of the addend and destination vector.
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 1 Zn Zda
op
Half-precision
FMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]

integer esize = 16;
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op
Single-precision
FMLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]

integer esize = 32;
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op
FMLS (indexed) Page 1680

Double-precision
FMLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result = Z[da];
bits(esize) element3 = Elem[result, e, esize];
Z[da] = result;
UNPREDICTABLE:
FMLS (indexed) Page 1681

FMMLA
Floating-point matrix multiply-accumulate.
The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in
a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of
the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2
matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction
is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the
current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the
trailing bits are set to zero.
ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.
ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 Zm 1 1 1 0 0 1 Zn Zda
32-bit element
FMMLA <Zda>.S, <Zn>.S, <Zm>.S
if !HaveSVEFP32MatMulExt() then UNDEFINED;

integer esize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 1 1 0 0 1 Zn Zda
64-bit element
FMMLA <Zda>.D, <Zn>.D, <Zm>.D

integer esize = 64;
Assembler Symbols
FMMLA Page 1682

Operation
CheckSVEEnabled();
if VL < esize * 4 then UNDEFINED;
integer segments = VL DIV (4 * esize);
bits(VL) result = Zeros();
bits(4*esize) op1, op2;
bits(4*esize) res, addend;
op1 = Elem[operand1, s, 4*esize];
op2 = Elem[operand2, s, 4*esize];
addend = Elem[operand3, s, 4*esize];
res = FPMatMulAdd(addend, op1, op2, esize, FPCR<31:0>);
Elem[result, s, 4*esize] = res;
Z[da] = result;
UNPREDICTABLE:
FMMLA Page 1683

FMOV (zero, predicated)
Move floating-point +0.0 to vector elements (predicated).
Move floating-point constant +0.0 to to each active element in the destination vector. Inactive elements in the
This is a pseudo-instruction of CPY (immediate, merging). This means:
• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 0 0 0 0 0 0 0 0 0 Zd
M sh imm8
SVE
FMOV <Zd>.<T>, <Pg>/M, #0.0
is equivalent to
CPY <Zd>.<T>, <Pg>/M, #0
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
FMOV (zero, predicated) Page 1684

FMOV (zero, unpredicated)
Move floating-point +0.0 to vector elements (unpredicated).
Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction
is unpredicated.
This is a pseudo-instruction of DUP (immediate). This means:
• The encodings in this description are named to match the encodings of DUP (immediate).
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 Zd
sh imm8
SVE
FMOV <Zd>.<T>, #0.0
is equivalent to
DUP <Zd>.<T>, #0
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of DUP (immediate) gives the operational pseudocode for this instruction.
FMOV (zero, unpredicated) Page 1685

FMOV (immediate, predicated)
Move 8-bit floating-point immediate to vector elements (predicated).
Move a floating-point immediate into each active element in the destination vector. Inactive elements in the
This is an alias of FCPY. This means:
• The encodings in this description are named to match the encodings of FCPY.
• The description of FCPY gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd
SVE
FMOV <Zd>.<T>, <Pg>/M, #<const>
is equivalent to
FCPY <Zd>.<T>, <Pg>/M, #<const>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FCPY gives the operational pseudocode for this instruction.
UNPREDICTABLE:
FMOV (immediate,
Page 1686
predicated)
FMOV (immediate, unpredicated)
Move 8-bit floating-point immediate to vector elements (unpredicated).
Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.
This is an alias of FDUP. This means:
• The encodings in this description are named to match the encodings of FDUP.
• The description of FDUP gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd
SVE
FMOV <Zd>.<T>, #<const>
is equivalent to
FDUP <Zd>.<T>, #<const>
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of FDUP gives the operational pseudocode for this instruction.
FMOV (immediate,
Page 1687
unpredicated)
FMSB
Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm].
elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 1 Pg Zm Zdn
N op
SVE
FMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FMSB Page 1688

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[dn] = result;
UNPREDICTABLE:
FMSB Page 1689

FMUL (immediate)
Floating-point multiply by immediate (predicated).
Multiply by an immediate each active floating-point element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate may take the value +0.5 or +2.0 only. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPTwo('0');
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.5
1 #2.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMul(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMUL (immediate) Page 1690

FMUL (immediate) Page 1691

FMUL (vectors, predicated)
Floating-point multiply vectors (predicated).
Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector and destructively place the results in the corresponding elements of the first source vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 0 Pg Zm Zdn
SVE
FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMul(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMUL (vectors, predicated) Page 1692

FMUL (vectors, predicated) Page 1693

FMUL (vectors, unpredicated)
Floating-point multiply vectors (unpredicated).
Multiply all elements of the first source vector by corresponding floating-point elements of the second source vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 0 Zn Zd
SVE
FMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
FMUL (vectors, unpredicated) Page 1694

FMUL (indexed)
Floating-point multiply by indexed elements.
the corresponding second source vector segment. The results are placed in the corresponding elements of the
destination vector.
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 1 0 0 0 Zn Zd
Half-precision
FMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]

integer esize = 16;
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 1 0 0 0 Zn Zd
size<1>size<0>
Single-precision
FMUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]

integer esize = 32;
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 1 0 0 0 Zn Zd
size<1>size<0>
Double-precision
FMUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]

integer esize = 64;
FMUL (indexed) Page 1695

Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
FMUL (indexed) Page 1696

FMULX
Floating-point multiply-extended vectors (predicated).
Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector except that ∞×0.0 gives 2.0 instead of NaN, and destructively place the results in the
corresponding elements of the first source vector. Inactive elements in the destination vector register remain
unmodified.
The instruction can be used with FRECPX to safely convert arbitrary elements in mathematical vector space to UNIT
VECTORS or DIRECTION VECTORS with length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 1 0 1 0 0 Pg Zm Zdn
SVE
FMULX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPMulX(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FMULX Page 1697

FMULX Page 1698

FNEG
Floating-point negate (predicated).
Negate each active floating-point element of the source vector, and place the results in the corresponding elements of
the destination vector. This inverts the sign bit and cannot signal a floating-point exception. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 1 1 0 1 Pg Zn Zd
SVE
FNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = FPNeg(element);
Z[d] = result;
UNPREDICTABLE:
FNEG Page 1699

FNEG Page 1700

FNMAD
Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn * Zm].
the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 0 Pg Zm Zdn
N op
SVE
FNMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FNMAD Page 1701

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[dn] = result;
UNPREDICTABLE:
FNMAD Page 1702

FNMLA
Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm].
the third source (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 0 Pg Zn Zda
N op
SVE
FNMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FNMLA Page 1703

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[da] = result;
UNPREDICTABLE:
FNMLA Page 1704

FNMLS
Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm].
elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in
the destination and third source (addend) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 1 Pg Zn Zda
N op
SVE
FNMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FNMLS Page 1705

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[da] = result;
UNPREDICTABLE:
FNMLS Page 1706

FNMSB
Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn * Zm].
elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 1 Pg Zm Zdn
N op
SVE
FNMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
FNMSB Page 1707

Operation
CheckSVEEnabled();
bits(VL) result;

else
Z[dn] = result;
UNPREDICTABLE:
FNMSB Page 1708

FRECPE
Floating-point reciprocal estimate (unpredicated).
Find the approximate reciprocal of each floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 0 0 0 1 1 0 0 Zn Zd
SVE
FRECPE <Zd>.<T>, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPRecipEstimate(element, FPCR<31:0>);
Z[d] = result;
FRECPE Page 1709

FRECPS
Floating-point reciprocal step (unpredicated).
Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 2.0
without intermediate rounding and place the results in the corresponding elements of the destination vector. This
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal of a vector of
floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 0 Zn Zd
SVE
FRECPS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPRecipStepFused(element1, element2);
Z[d] = result;
FRECPS Page 1710

FRECPX
Floating-point reciprocal exponent (predicated).
Invert the exponent and zero the fractional part of each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
The result of this instruction can be used with FMULX to convert arbitrary elements in mathematical vector space to
"unit vectors" or "direction vectors" of length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 1 Pg Zn Zd
SVE
FRECPX <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = FPRecpX(element, FPCR<31:0>);
Z[d] = result;
UNPREDICTABLE:
FRECPX Page 1711

FRECPX Page 1712

FRINT<r>
Floating-point round to integral value (predicated).
Round to an integral floating-point value with the specified rounding option from each active floating-point element of
the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in
The <r> symbol specifies one of the following rounding options: N (to nearest, with ties to even), A (to nearest, with
ties away from zero), M (toward minus Infinity), P (toward plus Infinity), Z (toward zero), I (current FPCR rounding
mode), or X (current FPCR rounding mode, signalling inexact).
It has encodings from 7 classes: Current mode , Current mode signalling inexact , Nearest with ties to away , Nearest
with ties to even , Toward zero , Toward minus infinity and Toward plus infinity
Current mode
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 1 Pg Zn Zd
Current mode
FRINTI <Zd>.<T>, <Pg>/M, <Zn>.<T>

FPRounding rounding = FPRoundingMode(FPCR<31:0>);
Current mode signalling inexact
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 1 Pg Zn Zd
Current mode signalling inexact
FRINTX <Zd>.<T>, <Pg>/M, <Zn>.<T>

boolean exact = TRUE;
Nearest with ties to away
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 1 Pg Zn Zd
FRINT<r> Page 1713

Nearest with ties to away
FRINTA <Zd>.<T>, <Pg>/M, <Zn>.<T>

Nearest with ties to even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 1 Pg Zn Zd
Nearest with ties to even
FRINTN <Zd>.<T>, <Pg>/M, <Zn>.<T>

FPRounding rounding = FPRounding_TIEEVEN;
Toward zero
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 1 Pg Zn Zd
Toward zero
FRINTZ <Zd>.<T>, <Pg>/M, <Zn>.<T>

Toward minus infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 1 Pg Zn Zd
FRINT<r> Page 1714

Toward minus infinity
FRINTM <Zd>.<T>, <Pg>/M, <Zn>.<T>

FPRounding rounding = FPRounding_NEGINF;
Toward plus infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 1 Pg Zn Zd
Toward plus infinity
FRINTP <Zd>.<T>, <Pg>/M, <Zn>.<T>

FPRounding rounding = FPRounding_POSINF;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = FPRoundInt(element, FPCR<31:0>, rounding, exact);
Z[d] = result;
UNPREDICTABLE:
FRINT<r> Page 1715

FRINT<r> Page 1716

FRSQRTE
Floating-point reciprocal square root estimate (unpredicated).
Find the approximate reciprocal square root of each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 1 0 0 1 1 0 0 Zn Zd
SVE
FRSQRTE <Zd>.<T>, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPRSqrtEstimate(element, FPCR<31:0>);
Z[d] = result;
FRSQRTE Page 1717

FRSQRTS
Floating-point reciprocal square root step (unpredicated).
Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 3.0
and divide the results by 2.0 without any intermediate rounding and place the results in the corresponding elements of
the destination vector. This instruction is unpredicated.
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal square root of
a vector of floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 1 Zn Zd
SVE
FRSQRTS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);
Z[d] = result;
FRSQRTS Page 1718

FSCALE
Floating-point adjust exponent by vector (predicated).
Multiply the active floating-point elements of the first source vector by 2.0 to the power of the signed integer values in
the corresponding elements of the second source vector and destructively place the results in the corresponding
elements of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 1 1 0 0 Pg Zm Zdn
SVE
FSCALE <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPScale(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FSCALE Page 1719

FSCALE Page 1720

FSQRT
Floating-point square root (predicated).
Calculate the square root of each active floating-point element of the source vector, and place the results in the
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 1 Pg Zn Zd
SVE
FSQRT <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = FPSqrt(element, FPCR<31:0>);
Z[d] = result;
UNPREDICTABLE:
FSQRT Page 1721

FSQRT Page 1722

FSUB (immediate)
Floating-point subtract immediate (predicated).
Subtract an immediate from each active floating-point element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPSub(element1, imm, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FSUB (immediate) Page 1723

FSUB (immediate) Page 1724

FSUB (vectors, predicated)
Floating-point subtract vectors (predicated).
Subtract active floating-point elements of the second source vector from corresponding floating-point elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 0 Pg Zm Zdn
SVE
FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPSub(element1, element2, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FSUB (vectors, predicated) Page 1725

FSUB (vectors, predicated) Page 1726

FSUB (vectors, unpredicated)
Floating-point subtract vectors (unpredicated).
Subtract all floating-point elements of the second source vector from corresponding elements of the first source vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 1 Zn Zd
SVE
FSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
FSUB (vectors, unpredicated) Page 1727

FSUBR (immediate)
Floating-point reversed subtract from immediate (predicated).
Reversed subtract from an immediate each active floating-point element of the source vector, and destructively place
the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn
SVE
FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPSub(imm, element1, FPCR<31:0>);
else
Z[dn] = result;
UNPREDICTABLE:
FSUBR (immediate) Page 1728

FSUBR (immediate) Page 1729

FSUBR (vectors)
Floating-point reversed subtract vectors (predicated).
Reversed subtract active floating-point elements of the first source vector from corresponding floating-point elements
of the second source vector and destructively place the results in the corresponding elements of the first source
vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 0 Pg Zm Zdn
SVE
FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
FSUBR (vectors) Page 1730

FSUBR (vectors) Page 1731

FTMAD
Floating-point trigonometric multiply-add coefficient.
The FTMAD instruction calculates the series terms for either SIN(X) or COS(X), where the argument X has been adjusted
to be in the range -π/4 < X ≤ π/4.
To calculate the series terms of SIN(X) and COS(X) the initial source operands of FTMAD should be zero in the first source
vector and X2 in the second source vector. The FTMAD instruction is then executed eight times to calculate the sum of
eight series terms, which gives a result of sufficient precision.
The FTMAD instruction multiplies each element of the first source vector by the absolute value of the corresponding
element of the second source vector and performs a fused addition of each product with a value obtained from a table
of hard-wired coefficients, and places the results destructively in the first source vector.
The coefficients are different for SIN(X) and COS(X), and are selected by a combination of the sign bit in the second
source element and an immediate index in the range 0 to 7.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 imm3 1 0 0 0 0 0 Zm Zdn
SVE
FTMAD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<imm>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<imm> Is the unsigned immediate operand, in the range 0 to 7, encoded in the "imm3" field.
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPTrigMAdd(imm, element1, element2, FPCR<31:0>);
Z[dn] = result;
FTMAD Page 1732

UNPREDICTABLE:
FTMAD Page 1733

FTSMUL
Floating-point trigonometric starting value.
The FTSMUL instruction calculates the initial value for the FTMAD instruction. The instruction squares each element in
the first source vector and then sets the sign bit to a copy of bit 0 of the corresponding element in the second source
register, and places the results in the destination vector. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding value of the quadrant Q number as an integer not a
floating-point value. The value Q satisfies the relationship (2q-1) × π/4 < X ≤ (2q+1) × π/4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 1 Zn Zd
SVE
FTSMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPTrigSMul(element1, element2, FPCR<31:0>);
Z[d] = result;
FTSMUL Page 1734

FTSSEL
Floating-point trigonometric select coefficient.
The FTSSEL instruction selects the coefficient for the final multiplication in the polynomial series approximation. The
instruction places the value 1.0 or a copy of the first source vector element in the destination element, depending on
bit 0 of the quadrant number Q held in the corresponding element of the second source vector. The sign bit of the
destination element is copied from bit 1 of the corresponding value of Q. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding value of the quadrant Q number as an integer not a
floating-point value. The value Q satisfies the relationship (2q-1) × π/4 < X ≤ (2q+1) × π/4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 0 Zn Zd
SVE
FTSSEL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = FPTrigSSel(element1, element2);
Z[d] = result;
FTSSEL Page 1735

INCB, INCD, INCH, INCW (scalar)
Increment scalar by multiple of predicate constraint element count.
in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Byte
INCB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Doubleword
INCD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
INCB, INCD, INCH, INCW

Page 1736
(scalar)
Halfword
INCH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Word
INCW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
X[dn] = operand1 + (count * imm);

Page 1737
(scalar)

Page 1738
(scalar)
INCD, INCH, INCW (vector)
Increment vector by multiple of predicate constraint element count.
in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements.
It has encodings from 3 classes: Doubleword , Halfword and Word
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
Doubleword
INCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
Halfword
INCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
INCD, INCH, INCW (vector) Page 1739

Word
INCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Elem[operand1, e, esize] + (count * imm);
Z[dn] = result;
UNPREDICTABLE:


INCP (scalar)
Increment scalar by count of true predicate elements.
Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 1 0 0 Pm Rdn
D
SVE
INCP <Xdn>, <Pm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer count = 0;
count = count + 1;
X[dn] = operand1 + count;
INCP (scalar) Page 1742

INCP (vector)
Increment vector by count of true predicate elements.
Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 0 0 0 Pm Zdn
D
SVE
INCP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
Elem[result, e, esize] = Elem[operand1, e, esize] + count;
Z[dn] = result;
UNPREDICTABLE:
INCP (vector) Page 1743

INCP (vector) Page 1744

INDEX (immediates)
Create index starting from and incremented by immediate.
Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5b 0 1 0 0 0 0 imm5 Zd
SVE
INDEX <Zd>.<T>, #<imm1>, #<imm2>

integer imm1 = SInt(imm5);
integer imm2 = SInt(imm5b);
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
<imm1> Is the first signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
<imm2> Is the second signed immediate operand, in the range -16 to 15, encoded in the "imm5b" field.
Operation
CheckSVEEnabled();
bits(VL) result;
integer index = imm1 + e * imm2;
Elem[result, e, esize] = index<esize-1:0>;
Z[d] = result;
INDEX (immediates) Page 1745

INDEX (immediate, scalar)
Create index starting from immediate and incremented by general-purpose register.
Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operand is a general-purpose register in which only the least significant bits corresponding to the vector
element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 0 imm5 Zd
SVE
INDEX <Zd>.<T>, #<imm>, <R><m>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
field.
Operation
CheckSVEEnabled();
integer element2 = SInt(operand2);
bits(VL) result;
integer index = imm + e * element2;
Z[d] = result;
INDEX (immediate, scalar) Page 1746

INDEX (scalar, immediate)
Create index starting from general-purpose register and incremented by immediate.
Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
The scalar source operand is a general-purpose register in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5 0 1 0 0 0 1 Rn Zd
SVE
INDEX <Zd>.<T>, <R><n>, #<imm>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <R>
01 W
x0 W
11 X
field.
Operation
CheckSVEEnabled();
bits(VL) result;
integer index = element1 + e * imm;
Z[d] = result;
INDEX (scalar, immediate) Page 1747

INDEX (scalars)
Create index starting from and incremented by general-purpose register.
Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operands are general-purpose registers in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 1 Rn Zd
SVE
INDEX <Zd>.<T>, <R><n>, <R><m>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <R>
01 W
x0 W
11 X
field.
field.
Operation
CheckSVEEnabled();
bits(VL) result;
integer index = element1 + e * element2;
Z[d] = result;
INDEX (scalars) Page 1748

INSR (scalar)
Insert general-purpose register in shifted vector.
Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-
purpose register in element 0 of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 0 0 1 1 1 0 Rm Zdn
SVE
INSR <Zdn>.<T>, <R><m>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <R>
01 W
x0 W
11 X
field.
Operation
CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = X[m];
Z[dn] = dest<VL-esize-1:0> : src;
UNPREDICTABLE:
INSR (scalar) Page 1749

INSR (SIMD&FP scalar)
Insert SIMD&FP scalar register in shifted vector.
Shift the destination vector left by one element, and then place a copy of the SIMD&FP scalar register in element 0 of
the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 0 1 1 1 0 Vm Zdn
SVE
INSR <Zdn>.<T>, <V><m>

integer m = UInt(Vm);
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <V>
00 B
01 H
10 S
11 D
<m> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vm" field.
Operation
CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = V[m];
Z[dn] = dest<VL-esize-1:0> : src;
UNPREDICTABLE:
INSR (SIMD&FP scalar) Page 1750

LASTA (scalar)
Extract element after last to general-purpose register.
If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then zero-extend and place
the extracted element in the destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 1 Pg Zn Rd
B
SVE
LASTA <R><d>, <Pg>, <Zn>.<T>

integer rsize = if esize < 64 then 32 else 64;
Assembler Symbols

size <R>
01 W
x0 W
11 X
<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(rsize) result;
if isBefore then
if last < 0 then last = elements - 1;
else
last = last + 1;
result = ZeroExtend(Elem[operand, last, esize]);
X[d] = result;
LASTA (scalar) Page 1751

LASTA (scalar) Page 1752

LASTA (SIMD&FP scalar)
Extract element after last to SIMD&FP scalar register.
If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then place the extracted
element in the destination SIMD&FP scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 0 1 0 0 Pg Zn Vd
B
SVE
LASTA <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
if isBefore then
else
last = last + 1;
V[d] = Elem[operand, last, esize];
LASTA (SIMD&FP scalar) Page 1753

LASTB (scalar)
Extract last element to general-purpose register.
If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then zero-extend and place the extracted element in the
destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 1 Pg Zn Rd
B
SVE
LASTB <R><d>, <Pg>, <Zn>.<T>

integer rsize = if esize < 64 then 32 else 64;
Assembler Symbols

size <R>
01 W
x0 W
11 X
<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(rsize) result;
if isBefore then
else
last = last + 1;
result = ZeroExtend(Elem[operand, last, esize]);
X[d] = result;
LASTB (scalar) Page 1754

LASTB (scalar) Page 1755

LASTB (SIMD&FP scalar)
Extract last element to SIMD&FP scalar register.
If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then place the extracted element in the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 1 1 0 0 Pg Zn Vd
B
SVE
LASTB <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
if isBefore then
else
last = last + 1;
V[d] = Elem[operand, last, esize];
LASTB (SIMD&FP scalar) Page 1756

LD1B (vector plus immediate)
Gather load unsigned bytes to vector (immediate index).
Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements will not read Device memory or signal
faults, and are set to zero in the destination vector.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LD1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer t = UInt(Zt);
integer esize = 32;
integer msize = 8;
integer offset = UInt(imm5);
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LD1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
LD1B (vector plus immediate) Page 1757

Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
data = Mem[addr, mbytes, AccType_NORMAL];
Elem[result, e, esize] = Extend(data, esize, unsigned);
else
Z[t] = result;
LD1B (vector plus immediate) Page 1758

LD1B (scalar plus immediate)
Contiguous load unsigned bytes to vector (immediate index).
Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not read Device memory or signal a fault, and are
set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element , 16-bit element , 32-bit element and 64-bit element
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
8-bit element
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer msize = 8;
integer offset = SInt(imm4);
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LD1B (scalar plus immediate) Page 1759

32-bit element
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
if LastActiveElement(mask, esize) >= 0 ||
ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
base = SP[];
else
base = X[n];
addr = base + offset * elements * mbytes;

else
addr = addr + mbytes;
Z[t] = result;

LD1B (scalar plus scalar)
Contiguous load unsigned bytes to vector (scalar index).
Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not read Device memory or signal a fault,
and are set to zero in the destination vector.
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
8-bit element
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if Rm == '11111' then UNDEFINED;
integer esize = 8;
integer msize = 8;
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
LD1B (scalar plus scalar) Page 1762

32-bit element
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 64;
integer msize = 8;
Assembler Symbols
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
bits(64) offset = X[m];
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
addr = base + UInt(offset) * mbytes;
else
offset = offset + 1;
Z[t] = result;

LD1B (scalar plus vector)
Gather load unsigned bytes to vector (vector index).
Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements will not read Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset , 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 8;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 0;
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1B (scalar plus vector) Page 1765

LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 8;
boolean offs_unsigned = TRUE;
integer scale = 0;
Assembler Symbols
<mod> Is the index extend and shift specifier, encoded in “xs”:
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset = Z[m];
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
addr = base + (off << scale);
else
Z[t] = result;
LD1B (scalar plus vector) Page 1766

LD1D (vector plus immediate)
Gather load doublewords to vector (immediate index).
Gather load of doublewords to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not read Device
memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
SVE
LD1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1D (vector plus immediate) Page 1767

LD1D (scalar plus immediate)
Contiguous load doublewords to vector (immediate index).
Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
"imm4" field.
LD1D (scalar plus immediate) Page 1768

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;

LD1D (scalar plus scalar)
Contiguous load doublewords to vector (scalar index).
Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 8 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not read Device memory or
signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer msize = 64;
Assembler Symbols
LD1D (scalar plus scalar) Page 1770

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1D (scalar plus vector)
Gather load doublewords to vector (vector index).
Gather load of doublewords to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 8. Inactive elements will not read Device memory or signal faults, and are set to zero in the
destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset , 32-bit unpacked unscaled offset , 64-bit scaled offset
and 64-bit unscaled offset
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 64;
integer scale = 0;
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff
LD1D (scalar plus vector) Page 1772

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 64;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1H (vector plus immediate)
Gather load unsigned halfwords to vector (immediate index).
Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not read
Device memory or signal faults, and are set to zero in the destination vector.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LD1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LD1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LD1H (vector plus immediate) Page 1775

Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1H (vector plus immediate) Page 1776

LD1H (scalar plus immediate)
Contiguous load unsigned halfwords to vector (immediate index).
Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not read Device memory or signal a
fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element , 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 16;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LD1H (scalar plus immediate) Page 1777

64-bit element
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;

LD1H (scalar plus scalar)
Contiguous load unsigned halfwords to vector (scalar index).
Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not read Device memory
or signal a fault, and are set to zero in the destination vector.
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer msize = 16;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
LD1H (scalar plus scalar) Page 1779

64-bit element
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 64;
integer msize = 16;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1H (scalar plus vector)
Gather load unsigned halfwords to vector (vector index).
Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 2. Inactive elements will not read Device memory or signal faults, and are set to zero in
the destination vector.
It has encodings from 6 classes: 32-bit scaled offset , 32-bit unpacked scaled offset , 32-bit unpacked unscaled offset ,
32-bit unscaled offset , 64-bit scaled offset and 64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1H (scalar plus vector) Page 1781

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 16;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1RB
Load and broadcast unsigned byte to vector.
Load a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is in the range 0 to 63.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
8-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 0 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>
8-bit element
LD1RB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 8;
integer msize = 8;
16-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 0 1 imm6 1 0 1 Pg Rn Zt
16-bit element
LD1RB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 16;
integer msize = 8;
32-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 0 1 imm6 1 1 0 Pg Rn Zt
LD1RB Page 1784

32-bit element
LD1RB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 0 1 imm6 1 1 1 Pg Rn Zt
64-bit element
LD1RB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm6" field.
LD1RB Page 1785

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
addr = base + offset * mbytes;
else
Z[t] = result;
LD1RB Page 1786

LD1RD
Load and broadcast doubleword to vector.
Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 8 in the range 0 to 504.
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 1 1 imm6 1 1 1 Pg Rn Zt
SVE
LD1RD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
LD1RD Page 1787

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RD Page 1788

LD1RH
Load and broadcast unsigned halfword to vector.
Load a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 2 in the range 0 to 126.
16-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 1 1 imm6 1 0 1 Pg Rn Zt
16-bit element
LD1RH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 16;
integer msize = 16;
32-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 1 1 imm6 1 1 0 Pg Rn Zt
32-bit element
LD1RH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 1 1 imm6 1 1 1 Pg Rn Zt
LD1RH Page 1789

64-bit element
LD1RH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RH Page 1790

LD1ROB (scalar plus immediate)
Contiguous load and replicate thirty-two bytes (immediate index).
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not read Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 8;
Assembler Symbols
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
LD1ROB (scalar plus

Page 1791
immediate)
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(64) addr;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
addr = base + offset * 32;

Elem[result, e, esize] = Mem[addr, mbytes, AccType_NORMAL];
else
Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);
LD1ROB (scalar plus

Page 1792
immediate)
LD1ROB (scalar plus scalar)
Contiguous load and replicate thirty-two bytes (scalar index).
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is added to the base address.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
Assembler Symbols
LD1ROB (scalar plus scalar) Page 1793

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1ROB (scalar plus scalar) Page 1794

LD1ROD (scalar plus immediate)
Contiguous load and replicate four doublewords (immediate index).
Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
Assembler Symbols
LD1ROD (scalar plus

Page 1795
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1ROD (scalar plus

Page 1796
immediate)
LD1ROD (scalar plus scalar)
Contiguous load and replicate four doublewords (scalar index).
Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
Assembler Symbols
LD1ROD (scalar plus scalar) Page 1797

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1ROD (scalar plus scalar) Page 1798

LD1ROH (scalar plus immediate)
Contiguous load and replicate sixteen halfwords (immediate index).
Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
address.
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 16;
Assembler Symbols
LD1ROH (scalar plus

Page 1799
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1ROH (scalar plus

Page 1800
immediate)
LD1ROH (scalar plus scalar)
Contiguous load and replicate sixteen halfwords (scalar index).
Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
Assembler Symbols
LD1ROH (scalar plus scalar) Page 1801

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1ROH (scalar plus scalar) Page 1802

LD1ROW (scalar plus immediate)
Contiguous load and replicate eight words (immediate index).
Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
Assembler Symbols
LD1ROW (scalar plus

Page 1803
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1ROW (scalar plus

Page 1804
immediate)
LD1ROW (scalar plus scalar)
Contiguous load and replicate eight words (scalar index).
Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is multiplied by 4 and added to the base address.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
Assembler Symbols
LD1ROW (scalar plus scalar) Page 1805

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(256) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1ROW (scalar plus scalar) Page 1806

LD1RQB (scalar plus immediate)
Contiguous load and replicate sixteen bytes (immediate index).
Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the
base address.
Inactive elements will not read Device memory or signal a fault, and are set to zero. The resulting short vector is then
replicated to fill the long destination vector. Only the first sixteen predicate elements are used and higher numbered
predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 8;
Assembler Symbols
LD1RQB (scalar plus

Page 1807
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = Replicate(result, VL DIV 128);
LD1RQB (scalar plus

Page 1808
immediate)
LD1RQB (scalar plus scalar)
Contiguous load and replicate sixteen bytes (scalar index).
Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and scalar index which is added to the base address.
replicated to fill the long destination vector. Only the first sixteen predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
Assembler Symbols
LD1RQB (scalar plus scalar) Page 1809

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1RQB (scalar plus scalar) Page 1810

LD1RQD (scalar plus immediate)
Contiguous load and replicate two doublewords (immediate index).
Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
replicated to fill the long destination vector. Only the first two predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
Assembler Symbols
LD1RQD (scalar plus

Page 1811
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1RQD (scalar plus

Page 1812
immediate)
LD1RQD (scalar plus scalar)
Contiguous load and replicate two doublewords (scalar index).
Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
replicated to fill the long destination vector. Only the first two predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
Assembler Symbols
LD1RQD (scalar plus scalar) Page 1813

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1RQD (scalar plus scalar) Page 1814

LD1RQH (scalar plus immediate)
Contiguous load and replicate eight halfwords (immediate index).
Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
replicated to fill the long destination vector. Only the first eight predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 16;
Assembler Symbols
LD1RQH (scalar plus

Page 1815
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1RQH (scalar plus

Page 1816
immediate)
LD1RQH (scalar plus scalar)
Contiguous load and replicate eight halfwords (scalar index).
Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 2 and added to the base address.
replicated to fill the long destination vector. Only the first eight predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
Assembler Symbols
LD1RQH (scalar plus scalar) Page 1817

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1RQH (scalar plus scalar) Page 1818

LD1RQW (scalar plus immediate)
Contiguous load and replicate four words (immediate index).
Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
address.
replicated to fill the long destination vector. Only the first four predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
Assembler Symbols
LD1RQW (scalar plus

Page 1819
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
LD1RQW (scalar plus

Page 1820
immediate)
LD1RQW (scalar plus scalar)
Contiguous load and replicate four words (scalar index).
Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
replicated to fill the long destination vector. Only the first four predicate elements are used and higher numbered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
SVE
LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
Assembler Symbols
LD1RQW (scalar plus scalar) Page 1821

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(128) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];

else
LD1RQW (scalar plus scalar) Page 1822

LD1RSB
Load and broadcast signed byte to vector.
Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is in the range 0 to 63.
16-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 1 1 imm6 1 1 0 Pg Rn Zt
16-bit element
LD1RSB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 16;
integer msize = 8;
32-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 1 1 imm6 1 0 1 Pg Rn Zt
32-bit element
LD1RSB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 1 1 imm6 1 0 0 Pg Rn Zt
LD1RSB Page 1823

64-bit element
LD1RSB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm6" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RSB Page 1824

LD1RSH
Load and broadcast signed halfword to vector.
Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate
32-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 0 1 imm6 1 0 1 Pg Rn Zt
32-bit element
LD1RSH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 0 1 imm6 1 0 0 Pg Rn Zt
64-bit element
LD1RSH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LD1RSH Page 1825

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RSH Page 1826

LD1RSW
Load and broadcast signed word to vector.
Load a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 4 in the range 0 to 252.
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 0 1 1 imm6 1 0 0 Pg Rn Zt
SVE
LD1RSW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LD1RSW Page 1827

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RSW Page 1828

LD1RW
Load and broadcast unsigned word to vector.
Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate
32-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 0 1 imm6 1 1 0 Pg Rn Zt
32-bit element
LD1RW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 32;
integer msize = 32;
64-bit element
31302928272625 24 23 2221201918171615 14 13 121110 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 1 0 1 0 1 imm6 1 1 1 Pg Rn Zt
64-bit element
LD1RW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LD1RW Page 1829

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

if last >= 0 then
else
Z[t] = result;
LD1RW Page 1830

LD1SB (vector plus immediate)
Gather load signed bytes to vector (immediate index).
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is in the range 0 to 31. Inactive elements will not read Device memory or signal
faults, and are set to zero in the destination vector.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LD1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LD1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm5" field.
LD1SB (vector plus

Page 1831
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1SB (vector plus

Page 1832
immediate)
LD1SB (scalar plus immediate)
Contiguous load signed bytes to vector (immediate index).
Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LD1SB (scalar plus

Page 1833
immediate)
64-bit element
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LD1SB (scalar plus

Page 1834
immediate)
LD1SB (scalar plus scalar)
Contiguous load signed bytes to vector (scalar index).
Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and scalar index which is added to the base address. After each element access the index value is incremented,
but the index register is not updated. Inactive elements will not read Device memory or signal a fault, and are set to
zero in the destination vector.
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
LD1SB (scalar plus scalar) Page 1835

64-bit element
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 64;
integer msize = 8;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;
LD1SB (scalar plus scalar) Page 1836

LD1SB (scalar plus vector)
Gather load signed bytes to vector (vector index).
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
U ff
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1SB (scalar plus vector) Page 1837

LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 8;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;
LD1SB (scalar plus vector) Page 1838

LD1SH (vector plus immediate)
Gather load signed halfwords to vector (immediate index).
Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not read Device
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LD1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LD1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LD1SH (vector plus

Page 1839
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1SH (vector plus

Page 1840
immediate)
LD1SH (scalar plus immediate)
Contiguous load signed halfwords to vector (immediate index).
Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"imm4" field.
LD1SH (scalar plus

Page 1841
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LD1SH (scalar plus

Page 1842
immediate)
LD1SH (scalar plus scalar)
Contiguous load signed halfwords to vector (scalar index).
Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LD1SH (scalar plus scalar) Page 1843

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;
LD1SH (scalar plus scalar) Page 1844

LD1SH (scalar plus vector)
Gather load signed halfwords to vector (vector index).
Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a 64-bit
destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1SH (scalar plus vector) Page 1845

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
U ff
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 0 Pg Rn Zt
U ff
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 16;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1SW (vector plus immediate)
Gather load signed words to vector (immediate index).
Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not read Device
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
SVE
LD1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1SW (vector plus

Page 1848
immediate)
LD1SW (scalar plus immediate)
Contiguous load signed words to vector (immediate index).
Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"imm4" field.
LD1SW (scalar plus

Page 1849
immediate)
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LD1SW (scalar plus

Page 1850
immediate)
LD1SW (scalar plus scalar)
Contiguous load signed words to vector (scalar index).
Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LD1SW (scalar plus scalar) Page 1851

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;
LD1SW (scalar plus scalar) Page 1852

LD1SW (scalar plus vector)
Gather load signed words to vector (vector index).
Gather load of signed words to active elements of a vector register from memory addresses generated by a 64-bit
destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 0 Pg Rn Zt
U ff
LD1SW (scalar plus vector) Page 1853

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 32;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1W (vector plus immediate)
Gather load unsigned words to vector (immediate index).
Gather load of unsigned words to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not read Device
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LD1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LD1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LD1W (vector plus

Page 1856
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
else
Z[t] = result;
LD1W (vector plus

Page 1857
immediate)
LD1W (scalar plus immediate)
Contiguous load unsigned words to vector (immediate index).
Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"imm4" field.
LD1W (scalar plus immediate) Page 1858

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;

LD1W (scalar plus scalar)
Contiguous load unsigned words to vector (scalar index).
Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LD1W (scalar plus scalar) Page 1860

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

LD1W (scalar plus vector)
Gather load unsigned words to vector (vector index).
Gather load of unsigned words to active elements of a vector register from memory addresses generated by a 64-bit
destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]

integer esize = 32;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
LD1W (scalar plus vector) Page 1862

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 0 Pg Rn Zt
U ff
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 32;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
else
Z[t] = result;

Contiguous load two-byte structures to two vectors (immediate index).
Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication,
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory or
signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 2;
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
array [0..1] of bits(VL) values;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
addr = base + offset * elements * nreg * mbytes;

for r = 0 to nreg-1
Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_NORMAL];
else
Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
Z[(t+r) MOD 32] = values[r];

Contiguous load two-byte structures to two vectors (scalar index).
Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register and added to the base address. After each
structure access the index value is incremented by two. The index register is not updated by the instruction.
two consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
offset = offset + nreg;
for r = 0 to nreg-1

Contiguous load two-doubleword structures to two vectors (immediate index).
Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to
14 that is multiplied by the vector's in-memory size, irrespective of predication,
two consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load two-doubleword structures to two vectors (scalar index).
Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by two. The index
register is not updated by the instruction.
two consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load two-halfword structures to two vectors (immediate index).
Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
two consecutive halfwords in memory which make up each structure. Inactive elements will not read Device memory
or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load two-halfword structures to two vectors (scalar index).
Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
two consecutive halfwords in memory which make up each structure. Inactive elements will not read Device memory
or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load two-word structures to two vectors (immediate index).
Contiguous load two-word structures, each to the same element number in two vector registers from the memory
two consecutive words in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load two-word structures to two vectors (scalar index).
Contiguous load two-word structures, each to the same element number in two vector registers from the memory
two consecutive words in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-byte structures to three vectors (immediate index).
Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory
or signal a fault, and the corresponding element is set to zero in each of the three destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 3;
Assembler Symbols
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-byte structures to three vectors (scalar index).
Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
structure access the index value is incremented by three. The index register is not updated by the instruction.
the three consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-doubleword structures to three vectors (immediate index).
Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
the three consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-doubleword structures to three vectors (scalar index).
Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
option) and added to the base address. After each structure access the index value is incremented by three. The index
the three consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-halfword structures to three vectors (immediate index).
Contiguous load three-halfword structures, each to the same element number in three vector registers from the
the three consecutive halfwords in memory which make up each structure. Inactive elements will not read Device
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-halfword structures to three vectors (scalar index).
Contiguous load three-halfword structures, each to the same element number in three vector registers from the
the three consecutive halfwords in memory which make up each structure. Inactive elements will not read Device
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-word structures to three vectors (immediate index).
Contiguous load three-word structures, each to the same element number in three vector registers from the memory
the three consecutive words in memory which make up each structure. Inactive elements will not read Device memory
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load three-word structures to three vectors (scalar index).
Contiguous load three-word structures, each to the same element number in three vector registers from the memory
and added to the base address. After each structure access the index value is incremented by three. The index register
the three consecutive words in memory which make up each structure. Inactive elements will not read Device memory
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-byte structures to four vectors (immediate index).
Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory or
signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 4;
Assembler Symbols
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-byte structures to four vectors (scalar index).
Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
structure access the index value is incremented by four. The index register is not updated by the instruction.
four consecutive bytes in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-doubleword structures to four vectors (immediate index).
Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
four consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-doubleword structures to four vectors (scalar index).
Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
option) and added to the base address. After each structure access the index value is incremented by four. The index
four consecutive doublewords in memory which make up each structure. Inactive elements will not read Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-halfword structures to four vectors (immediate index).
Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
four consecutive halfwords in memory which make up each structure. Inactive elements will not read Device memory
or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-halfword structures to four vectors (scalar index).
Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
and added to the base address. After each structure access the index value is incremented by four. The index register
four consecutive halfwords in memory which make up each structure. Inactive elements will not read Device memory
or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-word structures to four vectors (immediate index).
Contiguous load four-word structures, each to the same element number in four vector registers from the memory
four consecutive words in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

for r = 0 to nreg-1
else
for r = 0 to nreg-1

Contiguous load four-word structures to four vectors (scalar index).
Contiguous load four-word structures, each to the same element number in four vector registers from the memory
four consecutive words in memory which make up each structure. Inactive elements will not read Device memory or
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
else
for r = 0 to nreg-1

LDFF1B (vector plus immediate)
Gather load first-fault unsigned bytes to vector (immediate index).
Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will
not read Device memory or signal faults, and are set to zero in the destination vector.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LDFF1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LDFF1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm5" field.
LDFF1B (vector plus

Page 1913
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
if first then
// Mem[] will not return if a fault is detected for the first active element
first = FALSE;
else
// MemNF[] will return fault=TRUE if access is not performed for any reason
(data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
else
(data, fault) = (Zeros(msize), FALSE);
// FFR elements set to FALSE following a supressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';
// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

unknown = unknown || ElemFFR[e, esize] == '0';
if unknown then
if !fault && ConstrainUnpredictableBool(Unpredictable_SVELDNFDATA) then
elsif ConstrainUnpredictableBool(Unpredictable_SVELDNFZERO) then
else // merge
Elem[result, e, esize] = Elem[orig, e, esize];
else
Z[t] = result;
LDFF1B (vector plus

Page 1914
immediate)
LDFF1B (scalar plus scalar)
Contiguous load first-fault unsigned bytes to vector (scalar index).
Contiguous load with first-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not read Device
memory or signal a fault, and are set to zero in the destination vector.
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
8-bit element
LDFF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 8;
integer msize = 8;
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDFF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDFF1B (scalar plus scalar) Page 1915

32-bit element
LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
(data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1B (scalar plus vector)
Gather load first-fault unsigned bytes to vector (vector index).
Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended
from 32 to 64 bits. Inactive elements will not read Device memory or signal faults, and are set to zero in the
destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1B (scalar plus vector) Page 1918

LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 8;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1D (vector plus immediate)
Gather load first-fault doublewords to vector (immediate index).
Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements
will not read Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
SVE
LDFF1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
LDFF1D (vector plus

Page 1921
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1D (vector plus

Page 1922
immediate)
LDFF1D (scalar plus scalar)
Contiguous load first-fault doublewords to vector (scalar index).
Contiguous load with first-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not read
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
"Rm" field.
LDFF1D (scalar plus scalar) Page 1923

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1D (scalar plus scalar) Page 1924

LDFF1D (scalar plus vector)
Gather load first-fault doublewords to vector (vector index).
Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 8. Inactive elements will not read Device memory or signal faults, and are
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 64;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff
LDFF1D (scalar plus vector) Page 1925

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 64;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1H (vector plus immediate)
Gather load first-fault unsigned halfwords to vector (immediate index).
Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LDFF1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LDFF1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LDFF1H (vector plus

Page 1928
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1H (vector plus

Page 1929
immediate)
LDFF1H (scalar plus scalar)
Contiguous load first-fault unsigned halfwords to vector (scalar index).
Contiguous load with first-faulting behavior of unsigned halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 2 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not read Device memory or signal a fault, and are set to zero in the destination vector.
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDFF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

integer esize = 16;
integer msize = 16;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDFF1H (scalar plus scalar) Page 1930

64-bit element
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"Rm" field.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1H (scalar plus vector)
Gather load first-fault unsigned halfwords to vector (vector index).
Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 2. Inactive elements will not read Device memory or
signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1H (scalar plus vector) Page 1933

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 16;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1SB (vector plus immediate)
Gather load first-fault signed bytes to vector (immediate index).
Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not read
Device memory or signal faults, and are set to zero in the destination vector.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LDFF1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LDFF1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm5" field.
LDFF1SB (vector plus

Page 1937
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1SB (vector plus

Page 1938
immediate)
LDFF1SB (scalar plus scalar)
Contiguous load first-fault signed bytes to vector (scalar index).
Contiguous load with first-faulting behavior of signed bytes to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is added to the base address. After each element access the
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDFF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDFF1SB (scalar plus scalar) Page 1939

64-bit element
LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"Rm" field.

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1SB (scalar plus vector)
Gather load first-fault signed bytes to vector (vector index).
Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to
64 bits. Inactive elements will not read Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
U ff
LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1SB (scalar plus vector) Page 1942

LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 8;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1SH (vector plus immediate)
Gather load first-fault signed halfwords to vector (immediate index).
Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LDFF1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LDFF1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
LDFF1SH (vector plus

Page 1945
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1SH (vector plus

Page 1946
immediate)
LDFF1SH (scalar plus scalar)
Contiguous load first-fault signed halfwords to vector (scalar index).
Contiguous load with first-faulting behavior of signed halfwords to elements of a vector register from the memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"Rm" field.
LDFF1SH (scalar plus scalar) Page 1947

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1SH (scalar plus scalar) Page 1948

LDFF1SH (scalar plus vector)
Gather load first-fault signed halfwords to vector (vector index).
Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1SH (scalar plus vector) Page 1949

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
U ff
LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 1 Pg Rn Zt
U ff
LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 16;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1SW (vector plus immediate)
Gather load first-fault signed words to vector (immediate index).
Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements
will not read Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
SVE
LDFF1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LDFF1SW (vector plus

Page 1953
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1SW (vector plus

Page 1954
immediate)
LDFF1SW (scalar plus scalar)
Contiguous load first-fault signed words to vector (scalar index).
Contiguous load with first-faulting behavior of signed words to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not read
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"Rm" field.
LDFF1SW (scalar plus scalar) Page 1955

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1SW (scalar plus scalar) Page 1956

LDFF1SW (scalar plus vector)
Gather load first-fault signed words to vector (vector index).
Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 4. Inactive elements will not read Device memory or signal faults, and are
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 1 Pg Rn Zt
U ff
LDFF1SW (scalar plus vector) Page 1957

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 32;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDFF1W (vector plus immediate)
Gather load first-fault unsigned words to vector (immediate index).
Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
32-bit element
LDFF1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
LDFF1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
LDFF1W (vector plus

Page 1960
immediate)
Operation
CheckSVEEnabled();
bits(64) addr;
bits(VL) result;
bits(msize) data;
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1W (vector plus

Page 1961
immediate)
LDFF1W (scalar plus scalar)
Contiguous load first-fault unsigned words to vector (scalar index).
Contiguous load with first-faulting behavior of unsigned words to elements of a vector register from the memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"Rm" field.
LDFF1W (scalar plus scalar) Page 1962

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDFF1W (scalar plus scalar) Page 1963

LDFF1W (scalar plus vector)
Gather load first-fault unsigned words to vector (vector index).
Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]

integer esize = 32;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
LDFF1W (scalar plus vector) Page 1964

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 1 Pg Rn Zt
U ff
LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 32;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = Z[m];
if first then
first = FALSE;
else
else

if faulted then

if unknown then
else // merge
else
Z[t] = result;

LDNF1B
Contiguous load non-fault unsigned bytes to vector (immediate index).
Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not read Device
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
8-bit element
LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer msize = 8;
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDNF1B Page 1968

32-bit element
LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm4" field.
LDNF1B Page 1969

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
if ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1B Page 1970

LDNF1D
Contiguous load non-fault doublewords to vector (immediate index).
Contiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-
memory size, irrespective of predication, and added to the base address. Inactive elements will not read Device
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
"imm4" field.
LDNF1D Page 1971

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1D Page 1972

LDNF1H
Contiguous load non-fault unsigned halfwords to vector (immediate index).
Contiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 16;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDNF1H Page 1973

64-bit element
LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"imm4" field.
LDNF1H Page 1974

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1H Page 1975

LDNF1SB
Contiguous load non-fault signed bytes to vector (immediate index).
Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer msize = 8;
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
LDNF1SB Page 1976

64-bit element
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm4" field.
LDNF1SB Page 1977

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1SB Page 1978

LDNF1SH
Contiguous load non-fault signed halfwords to vector (immediate index).
Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
"imm4" field.
LDNF1SH Page 1979

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1SH Page 1980

LDNF1SW
Contiguous load non-fault signed words to vector (immediate index).
Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
SVE
LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"imm4" field.
LDNF1SW Page 1981

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1SW Page 1982

LDNF1W
Contiguous load non-fault unsigned words to vector (immediate index).
Contiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
"imm4" field.
LDNF1W Page 1983

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
bits(msize) data;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else

if faulted then

if unknown then
else // merge
else
Z[t] = result;
LDNF1W Page 1984

LDNT1B (scalar plus immediate)
Contiguous load non-temporal bytes to vector (immediate index).
Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

Elem[result, e, esize] = Mem[addr, mbytes, AccType_STREAM];
else
Z[t] = result;
LDNT1B (scalar plus

Page 1985
immediate)
LDNT1B (scalar plus

Page 1986
immediate)
LDNT1B (scalar plus scalar)
Contiguous load non-temporal bytes to vector (scalar index).
Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not read Device memory or signal a fault,
and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

integer esize = 8;
Assembler Symbols
LDNT1B (scalar plus scalar) Page 1987

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];
else
Z[t] = result;
LDNT1B (scalar plus scalar) Page 1988

LDNT1D (scalar plus immediate)
Contiguous load non-temporal doublewords to vector (immediate index).
Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LDNT1D (scalar plus

Page 1989
immediate)
LDNT1D (scalar plus

Page 1990
immediate)
LDNT1D (scalar plus scalar)
Contiguous load non-temporal doublewords to vector (scalar index).
Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not read Device
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
Assembler Symbols
LDNT1D (scalar plus scalar) Page 1991

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];
else
Z[t] = result;
LDNT1D (scalar plus scalar) Page 1992

LDNT1H (scalar plus immediate)
Contiguous load non-temporal halfwords to vector (immediate index).
Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LDNT1H (scalar plus

Page 1993
immediate)
LDNT1H (scalar plus

Page 1994
immediate)
LDNT1H (scalar plus scalar)
Contiguous load non-temporal halfwords to vector (scalar index).
Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
Assembler Symbols
LDNT1H (scalar plus scalar) Page 1995

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];
else
Z[t] = result;
LDNT1H (scalar plus scalar) Page 1996

LDNT1W (scalar plus immediate)
Contiguous load non-temporal words to vector (immediate index).
Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

else
Z[t] = result;
LDNT1W (scalar plus

Page 1997
immediate)
LDNT1W (scalar plus

Page 1998
immediate)
LDNT1W (scalar plus scalar)
Contiguous load non-temporal words to vector (scalar index).
Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
SVE
LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
Assembler Symbols
LDNT1W (scalar plus scalar) Page 1999

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(64) offset;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
offset = X[m];
else
Z[t] = result;
LDNT1W (scalar plus scalar) Page 2000

LDR (predicate)
Load predicate register.
Load a predicate register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The load is performed as a stream of bytes containing 8 consecutive predicate bits in ascending element order, without
any endian conversion.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt
SVE
LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]

integer t = UInt(Pt);
integer imm = SInt(imm9h:imm9l);
Assembler Symbols
<Pt> Is the name of the destination scalable predicate register, encoded in the "Pt" field.
"imm9h:imm9l" fields.
Operation
CheckSVEEnabled();
integer elements = PL DIV 8;
bits(64) base;
integer offset = imm * elements;
bits(PL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
boolean aligned = AArch64.CheckAlignment(base + offset, 2, AccType_NORMAL, FALSE);

Elem[result, e, 8] = AArch64.MemSingle[base + offset, 1, AccType_NORMAL, aligned];
P[t] = result;
LDR (predicate) Page 2001

LDR (vector)
Load vector register.
Load a vector register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The load is performed as a stream of byte elements in ascending element order, without any endian conversion.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt
SVE
LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]

Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
boolean aligned = AArch64.CheckAlignment(base + offset, 16, AccType_NORMAL, FALSE);

Elem[result, e, 8] = AArch64.MemSingle[base + offset, 1, AccType_NORMAL, aligned];
Z[t] = result;
LDR (vector) Page 2002

LSL (immediate, predicated)
Logical shift left by immediate (predicated).
Shift left by immediate each active element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to
number of bits per element minus 1. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
L U
SVE
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

case tsize of
integer shift = UInt(tsize:imm3) - esize;
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = LSL(element1, shift);
else
Z[dn] = result;
LSL (immediate, predicated) Page 2003

UNPREDICTABLE:
LSL (immediate, predicated) Page 2004

LSL (wide elements, predicated)
Logical shift left by 64-bit wide elements (predicated).
Shift left active elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount is
a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSL (wide elements,

Page 2005
predicated)
LSL (wide elements,

Page 2006
predicated)
LSL (vectors)
Logical shift left by vector (predicated).
Shift left active elements of the first source vector by corresponding elements of the second source vector and
destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a
vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSL (vectors) Page 2007

LSL (vectors) Page 2008

LSL (immediate, unpredicated)
Logical shift left by immediate (unpredicated).
Shift left by immediate each element of the source vector, and place the results in the corresponding elements of the
destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element
minus 1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SVE
LSL <Zd>.<T>, <Zn>.<T>, #<const>

case tsize of
integer shift = UInt(tsize:imm3) - esize;
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
LSL (immediate,
Page 2009
unpredicated)
LSL (wide elements, unpredicated)
Logical shift left by 64-bit wide elements (unpredicated).
Shift left all elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and place the first in the corresponding elements of the destination vector. The shift amount is a vector of
unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination element
size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 1 1 Zn Zd
SVE
LSL <Zd>.<T>, <Zn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
LSL (wide elements,

Page 2010
unpredicated)
LSLR
Reversed logical shift left by vector (predicated).
Reversed shift left active elements of the second source vector by corresponding elements of the first source vector
and destructively place the results in the corresponding elements of the first source vector. The shift amount operand
is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSLR Page 2011

LSLR Page 2012

LSR (immediate, predicated)
Logical shift right by immediate (predicated).
Shift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
L U
SVE
LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

case tsize of
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = LSR(element1, shift);
else
Z[dn] = result;
LSR (immediate, predicated) Page 2013

UNPREDICTABLE:
LSR (immediate, predicated) Page 2014

LSR (wide elements, predicated)
Logical shift right by 64-bit wide elements (predicated).
Shift right, inserting zeroes, active elements of the first source vector by corresponding overlapping 64-bit elements of
the second source vector and destructively place the results in the corresponding elements of the first source vector.
The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used
modulo the destination element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSR (wide elements,

Page 2015
predicated)
LSR (wide elements,

Page 2016
predicated)
LSR (vectors)
Logical shift right by vector (predicated).
Shift right, inserting zeroes, active elements of the first source vector by corresponding elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount
operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSR (vectors) Page 2017

LSR (vectors) Page 2018

LSR (immediate, unpredicated)
Logical shift right by immediate (unpredicated).
Shift right by immediate, inserting zeroes, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
SVE
LSR <Zd>.<T>, <Zn>.<T>, #<const>

case tsize of
Assembler Symbols
tszh tszl <T>
00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
LSR (immediate,
Page 2019
unpredicated)
LSR (wide elements, unpredicated)
Logical shift right by 64-bit wide elements (unpredicated).
Shift right, inserting zeroes, all elements of the first source vector by corresponding overlapping 64-bit elements of the
second source vector and place the first in the corresponding elements of the destination vector. The shift amount is a
vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 1 Zn Zd
U
SVE
LSR <Zd>.<T>, <Zn>.<T>, <Zm>.D

Assembler Symbols
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
LSR (wide elements,

Page 2020
unpredicated)
LSRR
Reversed logical shift right by vector (predicated).
Reversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 0 Pg Zm Zdn
R L U
SVE
LSRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
LSRR Page 2021

LSRR Page 2022

MAD
Multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].
Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
(addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 0 Pg Za Zdn
op
SVE
MAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

boolean sub_op = FALSE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer element1 = UInt(Elem[operand1, e, esize]);
integer product = element1 * element2;
if sub_op then
else
else
Z[dn] = result;
MAD Page 2023

UNPREDICTABLE:
MAD Page 2024

MLA
Multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].
Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
source (addend) vector. Destructively place the results in the destination and third source (addend) vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn Zda
op
SVE
MLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

boolean sub_op = FALSE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
if sub_op then
else
else
Z[da] = result;
MLA Page 2025

UNPREDICTABLE:
MLA Page 2026

MLS
Multiply-subtract vectors (predicated), writing addend [Zda = Zda - Zn * Zm].
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn Zda
op
SVE
MLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

boolean sub_op = TRUE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
if sub_op then
else
else
Z[da] = result;
MLS Page 2027

UNPREDICTABLE:
MLS Page 2028

MOV (predicate, predicated, zeroing)
Move predicates (zeroing).
Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
This is an alias of AND, ANDS (predicates). This means:
• The encodings in this description are named to match the encodings of AND, ANDS (predicates).
• The description of AND, ANDS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
MOV <Pd>.B, <Pg>/Z, <Pn>.B
is equivalent to
AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B
and is the preferred disassembly when S == '0' && Pn == Pm.
Assembler Symbols
Operation
The description of AND, ANDS (predicates) gives the operational pseudocode for this instruction.
MOV (predicate, predicated,

Page 2029
zeroing)
MOV (immediate, predicated, zeroing)
Move signed integer immediate to vector elements (zeroing).
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
This is an alias of CPY (immediate, zeroing). This means:
• The encodings in this description are named to match the encodings of CPY (immediate, zeroing).
• The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 0 sh imm8 Zd
M
SVE
MOV <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}
is equivalent to
CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.
MOV (immediate, predicated,

Page 2030
zeroing)
MOV (immediate, predicated, merging)
Move signed integer immediate to vector elements (merging).
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
This is an alias of CPY (immediate, merging). This means:
• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M
SVE
MOV <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}
is equivalent to
CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
UNPREDICTABLE:

Page 2031
merging)

Page 2032
merging)
MOV (scalar, predicated)
Move general-purpose register to vector elements (predicated).
Move the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
This is an alias of CPY (scalar). This means:
• The encodings in this description are named to match the encodings of CPY (scalar).
• The description of CPY (scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 1 Pg Rn Zd
SVE
MOV <Zd>.<T>, <Pg>/M, <R><n|SP>
is equivalent to
CPY <Zd>.<T>, <Pg>/M, <R><n|SP>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
field.
Operation
The description of CPY (scalar) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
MOV (scalar, predicated) Page 2033

MOV (SIMD&FP scalar, predicated)
Move SIMD&FP scalar register to vector elements (predicated).
Move the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
This is an alias of CPY (SIMD&FP scalar). This means:
• The encodings in this description are named to match the encodings of CPY (SIMD&FP scalar).
• The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 0 Pg Vn Zd
SVE
MOV <Zd>.<T>, <Pg>/M, <V><n>
is equivalent to
CPY <Zd>.<T>, <Pg>/M, <V><n>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
size <V>
00 B
01 H
10 S
11 D
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.
Operation
The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
MOV (SIMD&FP scalar,

Page 2034
predicated)
MOV (immediate, unpredicated)
Move signed immediate to vector elements (unpredicated).
Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is
unpredicated.
This is an alias of DUP (immediate). This means:
• The encodings in this description are named to match the encodings of DUP (immediate).
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 sh imm8 Zd
SVE
MOV <Zd>.<T>, #<imm>{, <shift>}
is equivalent to
DUP <Zd>.<T>, #<imm>{, <shift>}
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of DUP (immediate) gives the operational pseudocode for this instruction.
MOV (immediate,
Page 2035
unpredicated)
MOV (scalar, unpredicated)
Move general-purpose register to vector elements (unpredicated).
Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
This is an alias of DUP (scalar). This means:
• The encodings in this description are named to match the encodings of DUP (scalar).
• The description of DUP (scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Zd
SVE
MOV <Zd>.<T>, <R><n|SP>
is equivalent to
DUP <Zd>.<T>, <R><n|SP>
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

size <R>
01 W
x0 W
11 X
field.
Operation
The description of DUP (scalar) gives the operational pseudocode for this instruction.
MOV (scalar, unpredicated) Page 2036

MOV (SIMD&FP scalar, unpredicated)
Move indexed element or SIMD&FP scalar to vector (unpredicated).
Unconditionally broadcast the SIMD&FP scalar into each element of the destination vector. This instruction is
unpredicated.
This is an alias of DUP (indexed). This means:
• The encodings in this description are named to match the encodings of DUP (indexed).
• The description of DUP (indexed) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 imm2 1 tsz 0 0 1 0 0 0 Zn Zd
MOV <Zd>.<T>, <V><n>
is equivalent to
DUP <Zd>.<T>, <Zn>.<T>[0]
and is the preferred disassembly when BitCount(imm2:tsz) == 1.
MOV <Zd>.<T>, <Zn>.<T>[<imm>]
is equivalent to
DUP <Zd>.<T>, <Zn>.<T>[<imm>]
and is the preferred disassembly when BitCount(imm2:tsz) > 1.
Assembler Symbols
<T> Is the size specifier, encoded in “tsz”:
tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<imm> Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in
"imm2:tsz".
<V> Is a width specifier, encoded in “tsz”:
tsz <V>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Zn" field.
Operation
The description of DUP (indexed) gives the operational pseudocode for this instruction.

Page 2037
unpredicated)

Page 2038
unpredicated)
MOV (bitmask immediate)
Move logical bitmask immediate to vector (unpredicated).
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction
is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16,
32 or 64 bits.
This is an alias of DUPM. This means:
• The encodings in this description are named to match the encodings of DUPM.
• The description of DUPM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd
SVE
MOV <Zd>.<T>, #<const>
is equivalent to
DUPM <Zd>.<T>, #<const>
and is the preferred disassembly when SVEMoveMaskPreferred(imm13).
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
The description of DUPM gives the operational pseudocode for this instruction.
MOV (bitmask immediate) Page 2039

MOV (predicate, unpredicated)
Move predicate (unpredicated).
Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Does not set the condition flags.
This is an alias of ORR, ORRS (predicates). This means:
• The encodings in this description are named to match the encodings of ORR, ORRS (predicates).
• The description of ORR, ORRS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
MOV <Pd>.B, <Pn>.B
is equivalent to
ORR <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.B
and is the preferred disassembly when S == '0' && Pn == Pm && Pm == Pg.
Assembler Symbols
Operation
The description of ORR, ORRS (predicates) gives the operational pseudocode for this instruction.
MOV (predicate,
Page 2040
unpredicated)
MOV (vector, unpredicated)
Move vector register (unpredicated).
Move vector register. This instruction is unpredicated.
This is an alias of ORR (vectors, unpredicated). This means:
• The encodings in this description are named to match the encodings of ORR (vectors, unpredicated).
• The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
MOV <Zd>.D, <Zn>.D
is equivalent to
ORR <Zd>.D, <Zn>.D, <Zn>.D
and is the preferred disassembly when Zn == Zm.
Assembler Symbols
Operation
The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.
MOV (vector, unpredicated) Page 2041

MOV (predicate, predicated, merging)
Move predicates (merging).
Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.
This is an alias of SEL (predicates). This means:
• The encodings in this description are named to match the encodings of SEL (predicates).
• The description of SEL (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
SVE
MOV <Pd>.B, <Pg>/M, <Pn>.B
is equivalent to
SEL <Pd>.B, <Pg>, <Pn>.B, <Pd>.B
and is the preferred disassembly when Pd == Pm.
Assembler Symbols
Operation
The description of SEL (predicates) gives the operational pseudocode for this instruction.
MOV (predicate, predicated,

Page 2042
merging)
MOV (vector, predicated)
Move vector elements (predicated).
Move elements from the source vector to the corresponding elements of the destination vector. Inactive elements in
This is an alias of SEL (vectors). This means:
• The encodings in this description are named to match the encodings of SEL (vectors).
• The description of SEL (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 1 1 Pg Zn Zd
SVE
MOV <Zd>.<T>, <Pg>/M, <Zn>.<T>
is equivalent to
SEL <Zd>.<T>, <Pg>, <Zn>.<T>, <Zd>.<T>
and is the preferred disassembly when Zd == Zm.
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
The description of SEL (vectors) gives the operational pseudocode for this instruction.
MOV (vector, predicated) Page 2043

MOVPRFX (predicated)
Move prefix (predicated).
The predicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions with or
without combining is identical. The choice of combined versus discrete operation may vary dynamically.
Unless the combination of a constructive operation with merging predication is specifically required, it is strongly
recommended that for performance reasons software should prefer to use the zeroing form of predicated MOVPRFX or
the unpredicated MOVPRFX instruction.
Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed
instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with
merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the
same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in
any other operand position, even if they have different names but refer to the same architectural register state. Any
other use is UNPREDICTABLE.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 M 0 0 1 Pg Zn Zd
SVE
MOVPRFX <Zd>.<T>, <Pg>/<ZM>, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
M <ZM>
0 Z
1 M
MOVPRFX (predicated) Page 2044

Operation
CheckSVEEnabled();
bits(VL) result;
elsif merging then
else
Z[d] = result;
MOVPRFX (predicated) Page 2045

MOVPRFX (unpredicated)
Move prefix (unpredicated).
The unpredicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions with or
without combining is identical. The choice of combined versus discrete operation may vary dynamically.
Although the operation of the instruction is defined as a simple unpredicated vector copy, it is required that the
prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation
with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any
other operand position, even if they have different names but refer to the same architectural register state. Any other
use is UNPREDICTABLE.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 1 Zn Zd
SVE
MOVPRFX <Zd>, <Zn>

Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result = Z[n];
Z[d] = result;
MOVPRFX (unpredicated) Page 2046

MOVS (predicated)
Move predicates (zeroing), setting the condition flags.
Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition
This is an alias of AND, ANDS (predicates). This means:
• The encodings in this description are named to match the encodings of AND, ANDS (predicates).
• The description of AND, ANDS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
MOVS <Pd>.B, <Pg>/Z, <Pn>.B
is equivalent to
ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B
and is the preferred disassembly when S == '1' && Pn == Pm.
Assembler Symbols
Operation
The description of AND, ANDS (predicates) gives the operational pseudocode for this instruction.
MOVS (predicated) Page 2047

MOVS (unpredicated)
Move predicate (unpredicated), setting the condition flags.
Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
This is an alias of ORR, ORRS (predicates). This means:
• The encodings in this description are named to match the encodings of ORR, ORRS (predicates).
• The description of ORR, ORRS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
MOVS <Pd>.B, <Pn>.B
is equivalent to
ORRS <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.B
and is the preferred disassembly when S == '1' && Pn == Pm && Pm == Pg.
Assembler Symbols
Operation
The description of ORR, ORRS (predicates) gives the operational pseudocode for this instruction.
MOVS (unpredicated) Page 2048

MSB
Multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za - Zdn * Zm].
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 1 Pg Za Zdn
op
SVE
MSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

boolean sub_op = TRUE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
if sub_op then
else
else
Z[dn] = result;
MSB Page 2049

UNPREDICTABLE:
MSB Page 2050

MUL (vectors)
Multiply vectors (predicated).
Multiply active elements of the first source vector by corresponding elements of the second source vector and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 0 0 0 Pg Zm Zdn
H U
SVE
MUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = product<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
MUL (vectors) Page 2051

MUL (vectors) Page 2052

MUL (immediate)
Multiply by immediate (unpredicated).
Multiply by an immediate each element of the source vector, and destructively place the results in the corresponding
elements of the source vector. The immediate is a signed 8-bit value in the range -128 to +127, inclusive. This
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 0 0 0 0 1 1 0 imm8 Zdn
SVE
MUL <Zdn>.<T>, <Zdn>.<T>, #<imm>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = (element1 * imm)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
MUL (immediate) Page 2053

NAND, NANDS
Bitwise NAND predicates.
Bitwise NAND active elements of the second source predicate with corresponding elements of the first source
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
NAND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
NANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
NAND, NANDS Page 2054

Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = NOT(element1 AND element2);
else
if setflags then
P[d] = result;
NAND, NANDS Page 2055

NEG
Negate (predicated).
Negate the signed integer value in each active element of the source vector, and place the results in the corresponding
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 1 0 1 Pg Zn Zd
SVE
NEG <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
element = -element;
Z[d] = result;
UNPREDICTABLE:
NEG Page 2056

NOR, NORS
Bitwise NOR predicates.
Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
NOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
NORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
NOR, NORS Page 2057

Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = NOT(element1 OR element2);
else
if setflags then
P[d] = result;
NOR, NORS Page 2058

NOT (predicate)
Bitwise invert predicate.
Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
This is an alias of EOR, EORS (predicates). This means:
• The encodings in this description are named to match the encodings of EOR, EORS (predicates).
• The description of EOR, EORS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
NOT <Pd>.B, <Pg>/Z, <Pn>.B
is equivalent to
EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.B
and is the preferred disassembly when Pm == Pg.
Assembler Symbols
Operation
The description of EOR, EORS (predicates) gives the operational pseudocode for this instruction.
NOT (predicate) Page 2059

NOT (vector)
Bitwise invert vector (predicated).
Bitwise invert each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 1 0 1 0 1 Pg Zn Zd
SVE
NOT <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = NOT element;
Z[d] = result;
UNPREDICTABLE:
NOT (vector) Page 2060

NOTS
Bitwise invert predicate, setting the condition flags.
Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the
This is an alias of EOR, EORS (predicates). This means:
• The encodings in this description are named to match the encodings of EOR, EORS (predicates).
• The description of EOR, EORS (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
NOTS <Pd>.B, <Pg>/Z, <Pn>.B
is equivalent to
EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.B
and is the preferred disassembly when Pm == Pg.
Assembler Symbols
Operation
The description of EOR, EORS (predicates) gives the operational pseudocode for this instruction.
NOTS Page 2061

ORN (immediate)
Bitwise inclusive OR with inverted immediate (unpredicated).
Bitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This is a pseudo-instruction of ORR (immediate). This means:
• The encodings in this description are named to match the encodings of ORR (immediate).
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn
SVE
ORN <Zdn>.<T>, <Zdn>.<T>, #<const>
is equivalent to
ORR <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
The description of ORR (immediate) gives the operational pseudocode for this instruction.
UNPREDICTABLE:
ORN (immediate) Page 2062

ORN, ORNS (predicates)
Bitwise inclusive OR inverted predicate.
Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first
source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in
the destination predicate register are set to zero. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags
based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
ORN <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
ORNS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
ORN, ORNS (predicates) Page 2063

Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = element1 OR (NOT element2);
else
if setflags then
P[d] = result;
ORN, ORNS (predicates) Page 2064

ORR, ORRS (predicates)
Bitwise inclusive OR predicate.
Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source
This instruction is used by the aliases MOVS (unpredicated), and MOV (predicate, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
ORRS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
Alias Conditions

MOVS (unpredicated) S == '1' && Pn == Pm && Pm == Pg
ORR, ORRS (predicates) Page 2065

MOV (predicate, unpredicated) S == '0' && Pn == Pm && Pm == Pg
Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = element1 OR element2;
else
if setflags then
P[d] = result;
ORR, ORRS (predicates) Page 2066

ORR (vectors, predicated)
Bitwise inclusive OR vectors (predicated).
Bitwise inclusive OR active elements of the second source vector with corresponding elements of the first source
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 0 Pg Zm Zdn
SVE
ORR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 OR element2;
else
Z[dn] = result;
UNPREDICTABLE:
ORR (vectors, predicated) Page 2067

ORR (vectors, predicated) Page 2068

ORR (immediate)
Bitwise inclusive OR with immediate (unpredicated).
Bitwise inclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
This instruction is used by the pseudo-instruction ORN (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn
SVE
ORR <Zdn>.<T>, <Zdn>.<T>, #<const>

bits(64) imm;
Assembler Symbols
imm13<12> imm13<5:0> <T>
0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, 64] = element1 OR imm;
Z[dn] = result;
UNPREDICTABLE:


ORR (vectors, unpredicated)
Bitwise inclusive OR vectors (unpredicated).
Bitwise inclusive OR all elements of the second source vector with corresponding elements of the first source vector
and place the first in the corresponding elements of the destination vector. This instruction is unpredicated.
This instruction is used by the alias MOV (vector, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
ORR <Zd>.D, <Zn>.D, <Zm>.D

Assembler Symbols
Alias Conditions

MOV (vector, unpredicated) Zn == Zm
Operation
CheckSVEEnabled();
Z[d] = operand1 OR operand2;
ORR (vectors, unpredicated) Page 2071

ORV
Bitwise inclusive OR reduction to scalar.
Bitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 1 Pg Zn Vd
SVE
ORV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(esize) result = Zeros(esize);
result = result OR Elem[operand, e, esize];
V[d] = result;
ORV Page 2072

PFALSE
Set all predicate elements to false.
Set all elements in the destination predicate to false.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 Pd
S
SVE
PFALSE <Pd>.B

Assembler Symbols
Operation
CheckSVEEnabled();
P[d] = Zeros(PL);
PFALSE Page 2073

PFIRST
Set the first active predicate element to true.
Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are
passed through unchanged. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and
the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn
S
SVE
PFIRST <Pdn>.B, <Pg>, <Pdn>.B

integer esize = 8;
integer dn = UInt(Pdn);
Assembler Symbols
<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.
Operation
CheckSVEEnabled();
bits(PL) result = P[dn];
integer first = -1;
if ElemP[mask, e, esize] == '1' && first == -1 then
first = e;
if first >= 0 then

ElemP[result, first, esize] = '1';

P[dn] = result;
PFIRST Page 2074

PNEXT
Find next active predicate.
An instruction used to construct a loop which iterates over all active elements in a predicate. If all source predicate
elements are false it sets the first active predicate element in the destination predicate to true. Otherwise it
determines the next active predicate element following the last true source predicate element, and if one is found sets
the corresponding destination predicate element to true. All other destination predicate elements are set to false. Sets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 1 1 1 0 0 0 1 0 Pg 0 Pdn
SVE
PNEXT <Pdn>.<T>, <Pg>, <Pdn>.<T>

integer dn = UInt(Pdn);
Assembler Symbols
<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) operand = P[dn];
bits(PL) result;
integer next = LastActiveElement(operand, esize) + 1;
while next < elements && (ElemP[mask, next, esize] == '0') do

next = next + 1;
result = Zeros();
if next < elements then
ElemP[result, next, esize] = '1';

P[dn] = result;
PNEXT Page 2075

PRFB (vector plus immediate)
Gather prefetch bytes (vector plus immediate).
Gather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The
index is in the range 0 to 31. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
32-bit element
PRFB <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 0;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
PRFB <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer scale = 0;
Assembler Symbols
<prfop> Is the prefetch operation specifier, encoded in “prfop”:
PRFB (vector plus immediate) Page 2076

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
"imm5" field.
Operation
CheckSVEEnabled();
bits(VL) base;
bits(64) addr;
base = Z[n];
addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
Hint_Prefetch(addr, pref_hint, level, stream);
PRFB (vector plus immediate) Page 2077

PRFB (scalar plus immediate)
Contiguous prefetch bytes (immediate index).
Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer scale = 0;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
"imm6" field.
PRFB (scalar plus immediate) Page 2078

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];
addr = base + ((offset * elements) << scale);

addr = addr + (1 << scale);
PRFB (scalar plus immediate) Page 2079

PRFB (scalar plus scalar)
Contiguous prefetch bytes (scalar index).
Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and scalar index
which is added to the base address. After each element prefetch the index value is incremented, but the index register
is not updated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFB <prfop>, <Pg>, [<Xn|SP>, <Xm>]

integer esize = 8;
integer scale = 0;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
PRFB (scalar plus scalar) Page 2080

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];
addr = base + (UInt(offset) << scale);
PRFB (scalar plus scalar) Page 2081

PRFB (scalar plus vector)
Gather prefetch bytes (scalar plus vector).
Gather prefetch of bytes from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally sign or zero-extended from 32 to 64 bits. Inactive addresses are not prefetched from
memory.
It has encodings from 3 classes: 32-bit scaled offset , 32-bit unpacked scaled offset and 64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
boolean offs_unsigned = (xs == '0');
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 0 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFB (scalar plus vector) Page 2082

PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer scale = 0;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
if n == 31 then
base = SP[];
else
base = X[n];
offset = Z[m];


PRFD (vector plus immediate)
Gather prefetch doublewords (vector plus immediate).
Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index.
The index is a multiple of 8 in the range 0 to 248. Inactive addresses are not prefetched from memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
32-bit element
PRFD <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer scale = 3;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
PRFD <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer scale = 3;
Assembler Symbols
PRFD (vector plus immediate) Page 2085

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
Operation
CheckSVEEnabled();
bits(VL) base;
bits(64) addr;
base = Z[n];
PRFD (vector plus immediate) Page 2086

PRFD (scalar plus immediate)
Contiguous prefetch doublewords (immediate index).
Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and
immediate index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication,
and added to the base address.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer scale = 3;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
"imm6" field.
PRFD (scalar plus immediate) Page 2087

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];

PRFD (scalar plus immediate) Page 2088

PRFD (scalar plus scalar)
Contiguous prefetch doublewords (scalar index).
Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 8 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFD <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer scale = 3;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
PRFD (scalar plus scalar) Page 2089

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];
PRFD (scalar plus scalar) Page 2090

PRFD (scalar plus vector)
Gather prefetch doublewords (scalar plus vector).
Gather prefetch of doublewords from the active memory addresses generated by a 64-bit scalar base plus vector
index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 8. Inactive
addresses are not prefetched from memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #3]

integer esize = 32;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]

integer esize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 1 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFD (scalar plus vector) Page 2091

PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]

integer esize = 64;
integer scale = 3;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
if n == 31 then
base = SP[];
else
base = X[n];
offset = Z[m];


PRFH (vector plus immediate)
Gather prefetch halfwords (vector plus immediate).
Gather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 2 in the range 0 to 62. Inactive addresses are not prefetched from memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
32-bit element
PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer scale = 1;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
PRFH <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer scale = 1;
Assembler Symbols
PRFH (vector plus immediate) Page 2094

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
Operation
CheckSVEEnabled();
bits(VL) base;
bits(64) addr;
base = Z[n];
PRFH (vector plus immediate) Page 2095

PRFH (scalar plus immediate)
Contiguous prefetch halfwords (immediate index).
Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer scale = 1;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
"imm6" field.
PRFH (scalar plus immediate) Page 2096

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];

PRFH (scalar plus immediate) Page 2097

PRFH (scalar plus scalar)
Contiguous prefetch halfwords (scalar index).
Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 2 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFH <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer scale = 1;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
PRFH (scalar plus scalar) Page 2098

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];
PRFH (scalar plus scalar) Page 2099

PRFH (scalar plus vector)
Gather prefetch halfwords (scalar plus vector).
Gather prefetch of halfwords from the active memory addresses generated by a 64-bit scalar base plus vector index.
The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 2. Inactive
addresses are not prefetched from memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 0 1 Pg Rn 0 prfop
msz<1>msz<0>
PRFH (scalar plus vector) Page 2100

PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer scale = 1;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
if n == 31 then
base = SP[];
else
base = X[n];
offset = Z[m];


PRFW (vector plus immediate)
Gather prefetch words (vector plus immediate).
Gather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
32-bit element
PRFW <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer scale = 2;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
PRFW <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer scale = 2;
Assembler Symbols
PRFW (vector plus

Page 2103
immediate)
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
Operation
CheckSVEEnabled();
bits(VL) base;
bits(64) addr;
base = Z[n];
PRFW (vector plus

Page 2104
immediate)
PRFW (scalar plus immediate)
Contiguous prefetch words (immediate index).
Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer scale = 2;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
"imm6" field.
PRFW (scalar plus immediate) Page 2105

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];

PRFW (scalar plus immediate) Page 2106

PRFW (scalar plus scalar)
Contiguous prefetch words (scalar index).
Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and scalar index
which is multiplied by 4 and added to the base address. After each element prefetch the index value is incremented,
but the index register is not updated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
SVE
PRFW <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer scale = 2;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
PRFW (scalar plus scalar) Page 2107

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
base = SP[];
else
base = X[n];
PRFW (scalar plus scalar) Page 2108

PRFW (scalar plus vector)
Gather prefetch words (scalar plus vector).
Gather prefetch of words from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 4. Inactive addresses
are not prefetched from memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]

integer esize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
PRFW (scalar plus vector) Page 2109

PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer scale = 2;
Assembler Symbols

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) offset;
if n == 31 then
base = SP[];
else
base = X[n];
offset = Z[m];


PTEST
Set condition flags for predicate.
Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate source register, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 1 1 Pg 0 Pn 0 0 0 0 0
S
SVE
PTEST <Pg>, <Pn>.B

integer esize = 8;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(PL) result = P[n];
PTEST Page 2112

PTRUE, PTRUES
Initialise predicate from named constraint.
Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to
false otherwise. If the constraint specifies more elements than are available at the current vector length then all
elements of the destination predicate are set to false.
Undefined Instruction exception. Optionally sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 0 1 1 1 0 0 0 pattern 0 Pd
S
PTRUE <Pd>.<T>{, <pattern>}

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 1 1 1 1 0 0 0 pattern 0 Pd
S
PTRUES <Pd>.<T>{, <pattern>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
PTRUE, PTRUES Page 2113

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = if e < count then '1' else '0';
if setflags then
PSTATE.<N,Z,C,V> = PredTest(result, result, esize);
P[d] = result;
PTRUE, PTRUES Page 2114

PUNPKHI, PUNPKLO
Unpack and widen half of predicate.
Unpack elements from the lowest or highest half of the source predicate and place in elements of twice their size
within the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 Pn 0 Pd
H
High half
PUNPKHI <Pd>.H, <Pn>.B

integer esize = 16;
boolean hi = TRUE;
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 Pn 0 Pd
H
Low half
PUNPKLO <Pd>.H, <Pn>.B

integer esize = 16;
boolean hi = FALSE;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = ElemP[operand, if hi then e + elements else e, esize DIV 2];
P[d] = result;
PUNPKHI, PUNPKLO Page 2115

RBIT
Reverse bits (predicated).
Reverse bits in each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 1 1 1 0 0 Pg Zn Zd
SVE
RBIT <Zd>.<T>, <Pg>/M, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = BitReverse(element);
Z[d] = result;
UNPREDICTABLE:
RBIT Page 2116

RDFFR (unpredicated)
Read the first-fault register.
Read the first-fault register (FFR) and place in the destination predicate without predication.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 Pd
S
SVE
RDFFR <Pd>.B

Assembler Symbols
Operation
CheckSVEEnabled();
bits(PL) ffr = FFR[];
P[d] = ffr;
RDFFR (unpredicated) Page 2117

RDFFR, RDFFRS (predicated)
Return predicate of succesfully loaded elements.
Read the first-fault register (FFR) and place active elements in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Optionally sets the FIRST (N), NONE
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
S
RDFFR <Pd>.B, <Pg>/Z

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
S
RDFFRS <Pd>.B, <Pg>/Z

Assembler Symbols
Operation
CheckSVEEnabled();
bits(PL) ffr = FFR[];
bits(PL) result = ffr AND mask;
if setflags then
PSTATE.<N,Z,C,V> = PredTest(mask, result, 8);
P[d] = result;
RDFFR, RDFFRS (predicated) Page 2118

RDVL
Read multiple of vector register size to scalar register.
Multiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the
64-bit destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 0 imm6 Rd
SVE
RDVL <Xd>, #<imm>

Assembler Symbols
Operation
CheckSVEEnabled();
integer len = imm * (VL DIV 8);
X[d] = len<63:0>;
RDVL Page 2119

REV (predicate)
Reverse all elements in a predicate.
Reverse the order of all elements in the source predicate and place in the destination predicate. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 1 0 0 0 0 0 Pn 0 Pd
SVE
REV <Pd>.<T>, <Pn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) result = Reverse(operand, esize DIV 8);
P[d] = result;
REV (predicate) Page 2120

REV (vector)
Reverse all elements in a vector (unpredicated).
Reverse the order of all elements in the source vector and place in the destination vector. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 1 0 0 0 0 0 1 1 1 0 Zn Zd
SVE
REV <Zd>.<T>, <Zn>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result = Reverse(operand, esize);
Z[d] = result;
REV (vector) Page 2121

REVB, REVH, REVW
Reverse bytes / halfwords / words within elements (predicated).
Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and
place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector
It has encodings from 3 classes: Byte , Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 1 0 0 Pg Zn Zd
Byte
REVB <Zd>.<T>, <Pg>/M, <Zn>.<T>

integer swsize = 8;
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 1 1 0 0 Pg Zn Zd
Halfword
REVH <Zd>.<T>, <Pg>/M, <Zn>.<T>

if size != '1x' then UNDEFINED;
integer swsize = 16;
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 1 0 1 0 0 Pg Zn Zd
Word
REVW <Zd>.D, <Pg>/M, <Zn>.D

integer swsize = 32;
REVB, REVH, REVW Page 2122

Assembler Symbols
<T> For the byte variant: is the size specifier, encoded in “size”:
size <T>
00 RESERVED
01 H
10 S
11 D
For the halfword variant: is the size specifier, encoded in “size<0>”:
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = Reverse(element, swsize);
Z[d] = result;
UNPREDICTABLE:
REVB, REVH, REVW Page 2123

SABD
Signed absolute difference (predicated).
Compute the absolute difference between signed integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 0 0 0 0 Pg Zm Zdn
U
SVE
SABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer absdiff = Abs(element1 - element2);
Elem[result, e, esize] = absdiff<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
SABD Page 2124

SABD Page 2125

SADDV
Signed add reduction to scalar.
Signed add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first sign-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd
U
SVE
SADDV <Dd>, <Pg>, <Zn>.<T>

Assembler Symbols
<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
integer sum = 0;
sum = sum + element;
V[d] = sum<63:0>;
SADDV Page 2126

SCVTF
Signed integer convert to floating-point (predicated).
Convert to floating-point from the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 7 classes: 16-bit to half-precision , 32-bit to half-precision , 32-bit to single-precision , 32-bit to
double-precision , 64-bit to half-precision , 64-bit to single-precision and 64-bit to double-precision
16-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 Pg Zn Zd
int_U
SCVTF <Zd>.H, <Pg>/M, <Zn>.H

integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
SCVTF <Zd>.H, <Pg>/M, <Zn>.S

integer esize = 32;
32-bit to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
SCVTF Page 2127

SCVTF <Zd>.S, <Pg>/M, <Zn>.S

integer esize = 32;
32-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 1 Pg Zn Zd
int_U
SCVTF <Zd>.D, <Pg>/M, <Zn>.S

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U
SCVTF <Zd>.H, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
SCVTF Page 2128

SCVTF <Zd>.S, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U
SCVTF <Zd>.D, <Pg>/M, <Zn>.D

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(d_esize) fpval = FixedToFP(element<s_esize-1:0>, 0, unsigned, FPCR<31:0>, rounding);
Elem[result, e, esize] = ZeroExtend(fpval);
Z[d] = result;
UNPREDICTABLE:
SCVTF Page 2129

SCVTF Page 2130

SDIV
Signed divide (predicated).
Signed divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 0 0 0 Pg Zm Zdn
R U
SVE
SDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer quotient;
if element2 == 0 then
quotient = 0;
else
quotient = RoundTowardsZero(Real(element1) / Real(element2));
Elem[result, e, esize] = quotient<esize-1:0>;
else
Z[dn] = result;
SDIV Page 2131

UNPREDICTABLE:
SDIV Page 2132

SDIVR
Signed reversed divide (predicated).
Signed reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 0 0 0 0 Pg Zm Zdn
R U
SVE
SDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer quotient;
quotient = 0;
else
else
Z[dn] = result;
SDIVR Page 2133

UNPREDICTABLE:
SDIVR Page 2134

SDOT (vectors)
Signed integer dot product.
The signed integer dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit integer
values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit or
16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then destructively
adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 0 Zn Zda
U
SVE
SDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

Assembler Symbols
size<0> <T>
0 S
1 D
<Tb> Is the size specifier, encoded in “size<0>”:
size<0> <Tb>
0 B
1 H
Operation
CheckSVEEnabled();
bits(VL) result;
bits(esize) res = Elem[operand3, e, esize];
for i = 0 to 3
integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
Elem[result, e, esize] = res;
Z[da] = result;
SDOT (vectors) Page 2135

UNPREDICTABLE:
SDOT (vectors) Page 2136

SDOT (indexed)
Signed integer indexed dot product.
The signed integer indexed dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit
or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then destructively adds
the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U
32-bit
SDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

integer esize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U
64-bit
SDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]

integer esize = 64;
Assembler Symbols
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
vector segment, in the range 0 to 3, encoded in the "i2" field.
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
SDOT (indexed) Page 2137

Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
Z[da] = result;
UNPREDICTABLE:
SDOT (indexed) Page 2138

SEL (predicates)
Conditionally select elements from two predicates.
Read active elements from the first source predicate and inactive elements from the second source predicate and
place in the corresponding elements of the destination predicate. Does not set the condition flags.
This instruction is used by the alias MOV (predicate, predicated, merging).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
SVE
SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B

integer esize = 8;
Assembler Symbols
Alias Conditions

MOV (predicate, predicated, merging) Pd == Pm
Operation
CheckSVEEnabled();
bits(PL) result;
ElemP[result, e, esize] = element1;
else
ElemP[result, e, esize] = element2;
P[d] = result;
SEL (predicates) Page 2139

SEL (vectors)
Conditionally select elements from two vectors.
Read active elements from the first source vector and inactive elements from the second source vector and place in
the corresponding elements of the destination vector.
This instruction is used by the alias MOV (vector, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 1 1 Pg Zn Zd
SVE
SEL <Zd>.<T>, <Pg>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Alias Conditions

MOV (vector, predicated) Zd == Zm
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[d] = result;
SEL (vectors) Page 2140

SEL (vectors) Page 2141

SETFFR
Initialise the first-fault register to all true.
Initialise the first-fault register (FFR) to all true prior to a sequence of first-fault or non-fault loads. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
SVE
SETFFR
Operation
CheckSVEEnabled();
FFR[] = Ones(PL);
SETFFR Page 2142

SMAX (vectors)
Signed maximum vectors (predicated).
Determine the signed maximum of active elements of the second source vector and corresponding elements of the first
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 0 Pg Zm Zdn
U
SVE
SMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer maximum = Max(element1, element2);
Elem[result, e, esize] = maximum<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
SMAX (vectors) Page 2143

SMAX (vectors) Page 2144

SMAX (immediate)
Signed maximum with immediate (unpredicated).
Determine the signed maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 1 0 imm8 Zdn
U
SVE
SMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>

integer imm = Int(imm8, unsigned);
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
SMAX (immediate) Page 2145

SMAXV
Signed maximum reduction to scalar.
Signed maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the minimum signed integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 1 Pg Zn Vd
U
SVE
SMAXV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer maximum = if unsigned then 0 else -(2^(esize-1));
integer element = Int(Elem[operand, e, esize], unsigned);
maximum = Max(maximum, element);
V[d] = maximum<esize-1:0>;
SMAXV Page 2146

SMIN (vectors)
Signed minimum vectors (predicated).
Determine the signed minimum of active elements of the second source vector and corresponding elements of the first
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 0 0 0 0 Pg Zm Zdn
U
SVE
SMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer minimum = Min(element1, element2);
Elem[result, e, esize] = minimum<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
SMIN (vectors) Page 2147

SMIN (vectors) Page 2148

SMIN (immediate)
Signed minimum with immediate (unpredicated).
Determine the signed minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 1 0 imm8 Zdn
U
SVE
SMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
SMIN (immediate) Page 2149

SMINV
Signed minimum reduction to scalar.
Signed minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the maximum signed integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 0 0 0 1 Pg Zn Vd
U
SVE
SMINV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);
minimum = Min(minimum, element);
V[d] = minimum<esize-1:0>;
SMINV Page 2150

SMMLA
Signed integer matrix multiply-accumulate.
The signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of signed 8-bit integer values
held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 0 0 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
SVE
SMMLA <Zda>.S, <Zn>.B, <Zm>.B
if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

boolean op1_unsigned = FALSE;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(128) op1, op2;
res = MatMulAdd(addend, op1, op2, op1_unsigned, op2_unsigned);
Z[da] = result;
UNPREDICTABLE:
SMMLA Page 2151

SMMLA Page 2152

SMULH
Signed multiply returning high half (predicated).
Widening multiply signed integer values in active elements of the first source vector by corresponding elements of the
second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 0 0 0 0 Pg Zm Zdn
H U
SVE
SMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer product = (element1 * element2) >> esize;
else
Z[dn] = result;
UNPREDICTABLE:
SMULH Page 2153

SMULH Page 2154

SPLICE
Splice two vectors under predicate control.
Copy the first active to last active elements (inclusive) from the first source vector to the lowest-numbered elements of
the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second
source vector. The result is placed destructively in the first source vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 Pg Zm Zdn
SVE
SPLICE <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer x = 0;
boolean active = FALSE;
integer lastnum = LastActiveElement(mask, esize);
if lastnum >= 0 then

for e = 0 to lastnum
active = active || ElemP[mask, e, esize] == '1';
if active then
Elem[result, x, esize] = Elem[operand1, e, esize];
x = x + 1;
elements = elements - x - 1;
for e = 0 to elements
Elem[result, x, esize] = Elem[operand2, e, esize];
x = x + 1;
Z[dn] = result;
SPLICE Page 2155

UNPREDICTABLE:
SPLICE Page 2156

SQADD (immediate)
Signed saturating add immediate (unpredicated).
Signed saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2(N-1) to (2(N-1) )-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 0 0 1 1 sh imm8 Zdn
U
SVE
SQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);
Z[dn] = result;
SQADD (immediate) Page 2157

UNPREDICTABLE:
SQADD (immediate) Page 2158

SQADD (vectors)
Signed saturating add vectors (unpredicated).
Signed saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's signed integer range -2(N-1) to (2(N-1) )-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 0 0 Zn Zd
U
SVE
SQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 + element2, esize, unsigned);
Z[d] = result;
SQADD (vectors) Page 2159

SQDECB
Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count.
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
32-bit
SQDECB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 1 0 pattern Rdn
64-bit
SQDECB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 64;
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
SQDECB Page 2160

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) operand1 = X[dn];
bits(ssize) result;
integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
SQDECB Page 2161

SQDECD (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 1 0 pattern Rdn
32-bit
SQDECD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 1 0 pattern Rdn
64-bit
SQDECD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 64;
Assembler Symbols
SQDECD (scalar) Page 2162

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQDECD (scalar) Page 2163

SQDECD (vector)
Signed saturating decrement vector by multiple of 64-bit predicate constraint element count.
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 64-bit signed integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
SVE
SQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQDECD (vector) Page 2164

Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
SQDECD (vector) Page 2165

SQDECH (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 1 0 pattern Rdn
32-bit
SQDECH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 1 0 pattern Rdn
64-bit
SQDECH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 64;
Assembler Symbols
SQDECH (scalar) Page 2166

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQDECH (scalar) Page 2167

SQDECH (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
SVE
SQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQDECH (vector) Page 2168

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
SQDECH (vector) Page 2169

SQDECP (scalar)
Signed saturating decrement scalar by count of true predicate elements.
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 1 0 0 Pm Rdn
D U sf
32-bit
SQDECP <Xdn>, <Pm>.<T>, <Wdn>

integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 1 1 0 Pm Rdn
D U sf
64-bit
SQDECP <Xdn>, <Pm>.<T>

integer ssize = 64;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
SQDECP (scalar) Page 2170

Operation
CheckSVEEnabled();
bits(ssize) result;
integer count = 0;
count = count + 1;
integer element = Int(operand1, unsigned);

(result, -) = SatQ(element - count, ssize, unsigned);
SQDECP (scalar) Page 2171

SQDECP (vector)
Signed saturating decrement vector by count of true predicate elements.
vector elements. The results are saturated to the element signed integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 0 0 0 Pm Zdn
D U
SVE
SQDECP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
integer element = Int(Elem[operand1, e, esize], unsigned);
(Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
SQDECP (vector) Page 2172

SQDECP (vector) Page 2173

SQDECW (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 0 pattern Rdn
32-bit
SQDECW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 0 pattern Rdn
64-bit
SQDECW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 64;
Assembler Symbols
SQDECW (scalar) Page 2174

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQDECW (scalar) Page 2175

SQDECW (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
SVE
SQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQDECW (vector) Page 2176

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
SQDECW (vector) Page 2177

SQINCB
Signed saturating increment scalar by multiple of 8-bit predicate constraint element count.
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
32-bit
SQINCB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
64-bit
SQINCB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 64;
Assembler Symbols
SQINCB Page 2178

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
SQINCB Page 2179

SQINCD (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
32-bit
SQINCD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
64-bit
SQINCD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 64;
Assembler Symbols
SQINCD (scalar) Page 2180

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQINCD (scalar) Page 2181

SQINCD (vector)
Signed saturating increment vector by multiple of 64-bit predicate constraint element count.
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
SVE
SQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQINCD (vector) Page 2182

Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
SQINCD (vector) Page 2183

SQINCH (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
32-bit
SQINCH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
64-bit
SQINCH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 64;
Assembler Symbols
SQINCH (scalar) Page 2184

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQINCH (scalar) Page 2185

SQINCH (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
SVE
SQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQINCH (vector) Page 2186

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
SQINCH (vector) Page 2187

SQINCP (scalar)
Signed saturating increment scalar by count of true predicate elements.
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 0 0 Pm Rdn
D U sf
32-bit
SQINCP <Xdn>, <Pm>.<T>, <Wdn>

integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 1 0 Pm Rdn
D U sf
64-bit
SQINCP <Xdn>, <Pm>.<T>

integer ssize = 64;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
SQINCP (scalar) Page 2188

Operation
CheckSVEEnabled();
bits(ssize) result;
integer count = 0;
count = count + 1;

(result, -) = SatQ(element + count, ssize, unsigned);
SQINCP (scalar) Page 2189

SQINCP (vector)
Signed saturating increment vector by count of true predicate elements.
vector elements. The results are saturated to the element signed integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 0 0 0 Pm Zdn
D U
SVE
SQINCP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
(Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
SQINCP (vector) Page 2190

SQINCP (vector) Page 2191

SQINCW (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
32-bit
SQINCW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
64-bit
SQINCW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 64;
Assembler Symbols
SQINCW (scalar) Page 2192

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

SQINCW (scalar) Page 2193

SQINCW (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
SVE
SQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
SQINCW (vector) Page 2194

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
SQINCW (vector) Page 2195

SQSUB (immediate)
Signed saturating subtract immediate (unpredicated).
Signed saturating subtract of an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2(N-1) to (2(N-1) )-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 0 1 1 sh imm8 Zdn
U
SVE
SQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);
Z[dn] = result;
SQSUB (immediate) Page 2196

UNPREDICTABLE:
SQSUB (immediate) Page 2197

SQSUB (vectors)
Signed saturating subtract vectors (unpredicated).
Signed saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's signed integer range -2(N-1) to (2(N-1) )-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 0 Zn Zd
U
SVE
SQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);
Z[d] = result;
SQSUB (vectors) Page 2198

ST1B (vector plus immediate)
Scatter store bytes from a vector (immediate index).
Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements are not written to memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
32-bit element
ST1B { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 8;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
ST1B { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 8;
Assembler Symbols
"imm5" field.
ST1B (vector plus immediate) Page 2199

Operation
CheckSVEEnabled();
bits(VL) src = Z[t];
bits(64) addr;
Mem[addr, mbytes, AccType_NORMAL] = Elem[src, e, esize]<msize-1:0>;
ST1B (vector plus immediate) Page 2200

ST1B (scalar plus immediate)
Contiguous store bytes from vector (immediate index).
Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer msize = 8;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
"imm4" field.
ST1B (scalar plus immediate) Page 2201

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];


ST1B (scalar plus scalar)
Contiguous store bytes from vector (scalar index).
Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is added to the base address. After each element access the index value is incremented, but the
index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 size Rm 0 1 0 Pg Rn Zt
SVE
ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>]

integer msize = 8;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
ST1B (scalar plus scalar) Page 2203

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1B (scalar plus vector)
Scatter store bytes from a vector (vector index).
Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a 64-bit
elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1B { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 8;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
ST1B (scalar plus vector) Page 2205

ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 8;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
ST1B (scalar plus vector) Page 2206

ST1D (vector plus immediate)
Scatter store doublewords from a vector (immediate index).
Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements are not written
to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
SVE
ST1D { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) addr;
ST1D (vector plus immediate) Page 2207

ST1D (scalar plus immediate)
Contiguous store doublewords from vector (immediate index).
Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer msize = 64;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1D (scalar plus immediate) Page 2208

ST1D (scalar plus scalar)
Contiguous store doublewords from vector (scalar index).
Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 1 Rm 0 1 0 Pg Rn Zt
SVE
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer msize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
ST1D (scalar plus scalar) Page 2209

ST1D (scalar plus vector)
Scatter store doublewords from a vector (vector index).
Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 8. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 64;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
ST1D (scalar plus vector) Page 2210

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]

integer esize = 64;
integer msize = 64;
integer scale = 3;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 64;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1H (vector plus immediate)
Scatter store halfwords from a vector (immediate index).
Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements are not written
to memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
32-bit element
ST1H { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 16;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
ST1H { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 16;
Assembler Symbols
ST1H (vector plus immediate) Page 2213

Operation
CheckSVEEnabled();
bits(64) addr;
ST1H (vector plus immediate) Page 2214

ST1H (scalar plus immediate)
Contiguous store halfwords from vector (immediate index).
Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer msize = 16;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
"imm4" field.
ST1H (scalar plus immediate) Page 2215

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];


ST1H (scalar plus scalar)
Contiguous store halfwords from vector (scalar index).
Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and scalar index which is multiplied by 2 and added to the base address. After each element access the index
value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 size Rm 0 1 0 Pg Rn Zt
SVE
ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer msize = 16;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
ST1H (scalar plus scalar) Page 2217

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1H (scalar plus vector)
Scatter store halfwords from a vector (vector index).
Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a 64-bit
optionally multiplied by 2. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]

integer esize = 32;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1H (scalar plus vector) Page 2219

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 16;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]

integer esize = 64;
integer msize = 16;
integer scale = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 16;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1W (vector plus immediate)
Scatter store words from a vector (immediate index).
Scatter store of words from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements are not written to
memory.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
32-bit element
ST1W { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

integer esize = 32;
integer msize = 32;
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
ST1W { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

integer esize = 64;
integer msize = 32;
Assembler Symbols
ST1W (vector plus immediate) Page 2222

Operation
CheckSVEEnabled();
bits(64) addr;
ST1W (vector plus immediate) Page 2223

ST1W (scalar plus immediate)
Contiguous store words from vector (immediate index).
Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer msize = 32;
Assembler Symbols
size<0> <T>
0 S
1 D
"imm4" field.
ST1W (scalar plus immediate) Page 2224

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];


ST1W (scalar plus scalar)
Contiguous store words from vector (scalar index).
Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is multiplied by 4 and added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 size Rm 0 1 0 Pg Rn Zt
SVE
ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer msize = 32;
Assembler Symbols
size<0> <T>
0 S
1 D
ST1W (scalar plus scalar) Page 2226

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

ST1W (scalar plus vector)
Scatter store words from a vector (vector index).
Scatter store of words from the active elements of a vector register to the memory addresses generated by a 64-bit
optionally multiplied by 4. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]

integer esize = 32;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1W (scalar plus vector) Page 2228

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

integer esize = 64;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

integer esize = 32;
integer msize = 32;
integer scale = 0;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]

integer esize = 64;
integer msize = 32;
integer scale = 2;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

integer esize = 64;
integer msize = 32;
integer scale = 0;
Assembler Symbols
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];

Contiguous store two-byte structures from two vectors (immediate index).
Contiguous store two-byte structures, each from the same element number in two vector registers to the memory
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
values[r] = Z[(t+r) MOD 32];

for r = 0 to nreg-1
Mem[addr, mbytes, AccType_NORMAL] = Elem[values[r], e, esize];

Contiguous store two-byte structures from two vectors (scalar index).
Contiguous store two-byte structures, each from the same element number in two vector registers to the memory
structure access the index value is incremented by two. The index register is not updated by the instruction.
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store two-doubleword structures from two vectors (immediate index).
Contiguous store two-doubleword structures, each from the same element number in two vector registers to the
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store two-doubleword structures from two vectors (scalar index).
Contiguous store two-doubleword structures, each from the same element number in two vector registers to the
option) and added to the base address. After each structure access the index value is incremented by two. The index
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store two-halfword structures from two vectors (immediate index).
Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store two-halfword structures from two vectors (scalar index).
Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store two-word structures from two vectors (immediate index).
Contiguous store two-word structures, each from the same element number in two vector registers to the memory
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store two-word structures from two vectors (scalar index).
Contiguous store two-word structures, each from the same element number in two vector registers to the memory
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 2;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store three-byte structures from three vectors (immediate index).
Contiguous store three-byte structures, each from the same element number in three vector registers to the memory
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store three-byte structures from three vectors (scalar index).
Contiguous store three-byte structures, each from the same element number in three vector registers to the memory
structure access the index value is incremented by three. The index register is not updated by the instruction.
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store three-doubleword structures from three vectors (immediate index).
Contiguous store three-doubleword structures, each from the same element number in three vector registers to the
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store three-doubleword structures from three vectors (scalar index).
Contiguous store three-doubleword structures, each from the same element number in three vector registers to the
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store three-halfword structures from three vectors (immediate index).
Contiguous store three-halfword structures, each from the same element number in three vector registers to the
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store three-halfword structures from three vectors (scalar index).
Contiguous store three-halfword structures, each from the same element number in three vector registers to the
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store three-word structures from three vectors (immediate index).
Contiguous store three-word structures, each from the same element number in three vector registers to the memory
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store three-word structures from three vectors (scalar index).
Contiguous store three-word structures, each from the same element number in three vector registers to the memory
and added to the base address. After each structure access the index value is incremented by three. The index register
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 3;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store four-byte structures from four vectors (immediate index).
Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store four-byte structures from four vectors (scalar index).
Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
structure access the index value is incremented by four. The index register is not updated by the instruction.
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>, <Xm>]

integer esize = 8;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store four-doubleword structures from four vectors (immediate index).
Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store four-doubleword structures from four vectors (scalar index).
Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
option) and added to the base address. After each structure access the index value is incremented by four. The index
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store four-halfword structures from four vectors (immediate index).
Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store four-halfword structures from four vectors (scalar index).
Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

Contiguous store four-word structures from four vectors (immediate index).
Contiguous store four-word structures, each from the same element number in four vector registers to the memory
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1

for r = 0 to nreg-1

Contiguous store four-word structures from four vectors (scalar index).
Contiguous store four-word structures, each from the same element number in four vector registers to the memory
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
integer nreg = 4;
Assembler Symbols

Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
for r = 0 to nreg-1
for r = 0 to nreg-1

STNT1B (scalar plus immediate)
Contiguous store non-temporal bytes from vector (immediate index).
Contiguous store non-temporal of bytes from elements of a vector register to the memory address generated by a
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 8;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];

Mem[addr, mbytes, AccType_STREAM] = Elem[src, e, esize];
STNT1B (scalar plus

Page 2279
immediate)
STNT1B (scalar plus scalar)
Contiguous store non-temporal bytes from vector (scalar index).
Contiguous store non-temporal of bytes from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>, <Xm>]

integer esize = 8;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];
STNT1B (scalar plus scalar) Page 2280

STNT1B (scalar plus scalar) Page 2281

STNT1D (scalar plus immediate)
Contiguous store non-temporal doublewords from vector (immediate index).
Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 64;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];

STNT1D (scalar plus

Page 2282
immediate)
STNT1D (scalar plus scalar)
Contiguous store non-temporal doublewords from vector (scalar index).
Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];
STNT1D (scalar plus scalar) Page 2283

STNT1D (scalar plus scalar) Page 2284

STNT1H (scalar plus immediate)
Contiguous store non-temporal halfwords from vector (immediate index).
Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 16;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];

STNT1H (scalar plus

Page 2285
immediate)
STNT1H (scalar plus scalar)
Contiguous store non-temporal halfwords from vector (scalar index).
Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

integer esize = 16;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];
STNT1H (scalar plus scalar) Page 2286

STNT1H (scalar plus scalar) Page 2287

STNT1W (scalar plus immediate)
Contiguous store non-temporal words from vector (immediate index).
Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

integer esize = 32;
Assembler Symbols
"imm4" field.
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];

STNT1W (scalar plus

Page 2288
immediate)
STNT1W (scalar plus scalar)
Contiguous store non-temporal words from vector (scalar index).
Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
SVE
STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

integer esize = 32;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(64) base;
bits(64) addr;
bits(VL) src;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];
STNT1W (scalar plus scalar) Page 2289

STNT1W (scalar plus scalar) Page 2290

STR (predicate)
Store predicate register.
Store a predicate register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The store is performed as a stream of bytes containing 8 consecutive predicate bits in ascending element order,
without any endian conversion.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt
SVE
STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]

integer t = UInt(Pt);
Assembler Symbols
<Pt> Is the name of the scalable predicate transfer register, encoded in the "Pt" field.
Operation
CheckSVEEnabled();
integer elements = PL DIV 8;
bits(PL) src;
bits(64) base;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = P[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 2, AccType_NORMAL, TRUE);
AArch64.MemSingle[base + offset, 1, AccType_NORMAL, aligned] = Elem[src, e, 8];
STR (predicate) Page 2291

STR (vector)
Store vector register.
Store a vector register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The store is performed as a stream of byte elements in ascending element order, without any endian conversion.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt
SVE
STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]

Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) src;
bits(64) base;
if n == 31 then
CheckSPAlignment();
base = SP[];
else
base = X[n];
src = Z[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 16, AccType_NORMAL, TRUE);
AArch64.MemSingle[base + offset, 1, AccType_NORMAL, aligned] = Elem[src, e, 8];
STR (vector) Page 2292

SUB (vectors, predicated)
Subtract vectors (predicated).
Subtract active elements of the second source vector from corresponding elements of the first source vector and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 0 Pg Zm Zdn
SVE
SUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
SUB (vectors, predicated) Page 2293

SUB (vectors, predicated) Page 2294

SUB (immediate)
Subtract immediate (unpredicated).
Subtract an unsigned immediate from each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 1 1 1 sh imm8 Zdn
SVE
SUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = element1 - imm;
Z[dn] = result;
UNPREDICTABLE:


SUB (vectors, unpredicated)
Subtract vectors (unpredicated).
Subtract all elements of the second source vector from corresponding elements of the first source vector and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 0 0 1 Zn Zd
SVE
SUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Z[d] = result;
SUB (vectors, unpredicated) Page 2297

SUBR (vectors)
Reversed subtract vectors (predicated).
Reversed subtract active elements of the first source vector from corresponding elements of the second source vector
and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 1 1 0 0 0 Pg Zm Zdn
SVE
SUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
else
Z[dn] = result;
UNPREDICTABLE:
SUBR (vectors) Page 2298

SUBR (vectors) Page 2299

SUBR (immediate)
Reversed subtract from immediate (unpredicated).
Reversed subtract from an unsigned immediate each element of the source vector, and destructively place the results
in the corresponding elements of the source vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 1 1 1 1 sh imm8 Zdn
SVE
SUBR <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = (imm - element1)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
SUBR (immediate) Page 2300

SUBR (immediate) Page 2301

SUDOT
Signed by unsigned integer indexed dot product.
The signed by unsigned integer indexed dot product instruction computes the dot product of a group of four signed
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four unsigned 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 1 Zn Zda
size<1>size<0> U
SVE
SUDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

integer esize = 32;
Assembler Symbols
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.
Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
Z[da] = result;
UNPREDICTABLE:
SUDOT Page 2302

SUDOT Page 2303

SUNPKHI, SUNPKLO
Signed unpack and extend half of vector.
Unpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 0 0 1 1 1 0 Zn Zd
U H
High half
SUNPKHI <Zd>.<T>, <Zn>.<Tb>

boolean hi = TRUE;
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 0 0 1 1 1 0 Zn Zd
U H
Low half
SUNPKLO <Zd>.<T>, <Zn>.<Tb>

boolean hi = FALSE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<Tb> Is the size specifier, encoded in “size”:
SUNPKHI, SUNPKLO Page 2304

size <Tb>
00 RESERVED
01 B
10 H
11 S
Operation
CheckSVEEnabled();
integer hsize = esize DIV 2;
bits(VL) result;
bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
Elem[result, e, esize] = Extend(element, esize, unsigned);
Z[d] = result;
SUNPKHI, SUNPKLO Page 2305

SXTB, SXTH, SXTW
Signed byte / halfword / word extend (predicated).
Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the
unmodified.
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 1 Pg Zn Zd
U
Byte
SXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>

Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 0 1 0 1 Pg Zn Zd
U
Halfword
SXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>

Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 1 Pg Zn Zd
U
SXTB, SXTH, SXTW Page 2306

Word
SXTW <Zd>.D, <Pg>/M, <Zn>.D

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);
Z[d] = result;
UNPREDICTABLE:
SXTB, SXTH, SXTW Page 2307

TBL
Programmable table lookup in single vector table.
Reads each element of the second source (index) vector and uses its value to select an indexed element from the first
source (table) vector, and places the indexed table element in the destination vector element corresponding to the
index vector element. If an index value is greater than or equal to the number of vector elements then it places zero in
the corresponding destination vector element.
Since the index values can select any element in a vector this operation is not naturally vector length agnostic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 0 1 1 0 0 Zn Zd
SVE
TBL <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer idx = UInt(Elem[operand2, e, esize]);
Elem[result, e, esize] = if idx < elements then Elem[operand1, idx, esize] else Zeros();
Z[d] = result;
TBL Page 2308

TRN1, TRN2 (predicates)
Interleave even or odd elements from two predicates.
Interleave alternating even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 0 0 Pn 0 Pd
H
Even
TRN1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 0;
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 1 0 Pn 0 Pd
H
Odd
TRN2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 1;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
TRN1, TRN2 (predicates) Page 2309

Operation
CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) result;
Elem[result, 2*p+0, esize DIV 8] = Elem[operand1, 2*p+part, esize DIV 8];
Elem[result, 2*p+1, esize DIV 8] = Elem[operand2, 2*p+part, esize DIV 8];
P[d] = result;
TRN1, TRN2 (predicates) Page 2310

TRN1, TRN2 (vectors)
Interleave even or odd elements from two vectors.
Interleave alternating even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: Even , Even (quadwords) , Odd and Odd (quadwords)
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 0 Zn Zd
H
Even
TRN1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 0;
Even (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 0 Zn Zd
H
Even (quadwords)
TRN1 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer esize = 128;
integer part = 0;
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 1 Zn Zd
H
Odd
TRN2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 1;
TRN1, TRN2 (vectors) Page 2311

Odd (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 1 Zn Zd
H
Odd (quadwords)
TRN2 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer part = 1;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
Z[d] = result;
TRN1, TRN2 (vectors) Page 2312

UABD
Unsigned absolute difference (predicated).
Compute the absolute difference between unsigned integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 1 0 0 0 Pg Zm Zdn
U
SVE
UABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer absdiff = Abs(element1 - element2);
Elem[result, e, esize] = absdiff<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
UABD Page 2313

UABD Page 2314

UADDV
Unsigned add reduction to scalar.
Unsigned add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 1 Pg Zn Vd
U
SVE
UADDV <Dd>, <Pg>, <Zn>.<T>

Assembler Symbols
<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer sum = 0;
integer element = UInt(Elem[operand, e, esize]);
sum = sum + element;
V[d] = sum<63:0>;
UADDV Page 2315

UCVTF
Unsigned integer convert to floating-point (predicated).
Convert to floating-point from the unsigned integer in each active element of the source vector, and place the results
in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 7 classes: 16-bit to half-precision , 32-bit to half-precision , 32-bit to single-precision , 32-bit to
double-precision , 64-bit to half-precision , 64-bit to single-precision and 64-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 1 Pg Zn Zd
int_U
UCVTF <Zd>.H, <Pg>/M, <Zn>.H

integer esize = 16;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
UCVTF <Zd>.H, <Pg>/M, <Zn>.S

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
UCVTF Page 2316

UCVTF <Zd>.S, <Pg>/M, <Zn>.S

integer esize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 1 1 0 1 Pg Zn Zd
int_U
UCVTF <Zd>.D, <Pg>/M, <Zn>.S

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U
UCVTF <Zd>.H, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
UCVTF Page 2317

UCVTF <Zd>.S, <Pg>/M, <Zn>.D

integer esize = 64;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U
UCVTF <Zd>.D, <Pg>/M, <Zn>.D

integer esize = 64;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(d_esize) fpval = FixedToFP(element<s_esize-1:0>, 0, unsigned, FPCR<31:0>, rounding);
Elem[result, e, esize] = ZeroExtend(fpval);
Z[d] = result;
UNPREDICTABLE:
UCVTF Page 2318

UCVTF Page 2319

UDIV
Unsigned divide (predicated).
Unsigned divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 0 0 0 Pg Zm Zdn
R U
SVE
UDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer quotient;
quotient = 0;
else
else
Z[dn] = result;
UDIV Page 2320

UNPREDICTABLE:
UDIV Page 2321

UDIVR
Unsigned reversed divide (predicated).
Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 0 0 0 Pg Zm Zdn
R U
SVE
UDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer quotient;
quotient = 0;
else
else
Z[dn] = result;
UDIVR Page 2322

UNPREDICTABLE:
UDIVR Page 2323

UDOT (vectors)
Unsigned integer dot product.
The unsigned integer dot product instruction computes the dot product of a group of four unsigned 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four unsigned
8-bit or 16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 1 Zn Zda
U
SVE
UDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

Assembler Symbols
size<0> <T>
0 S
1 D
<Tb> Is the size specifier, encoded in “size<0>”:
size<0> <Tb>
0 B
1 H
Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
Z[da] = result;
UDOT (vectors) Page 2324

UNPREDICTABLE:
UDOT (vectors) Page 2325

UDOT (indexed)
Unsigned integer indexed dot product.
The unsigned integer indexed dot product instruction computes the dot product of a group of four unsigned 8-bit or
16-bit integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four
unsigned 8-bit or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U
32-bit
UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

integer esize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U
64-bit
UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]

integer esize = 64;
Assembler Symbols
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
UDOT (indexed) Page 2326

Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
Z[da] = result;
UNPREDICTABLE:
UDOT (indexed) Page 2327

UMAX (vectors)
Unsigned maximum vectors (predicated).
Determine the unsigned maximum of active elements of the second source vector and corresponding elements of the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 0 Pg Zm Zdn
U
SVE
UMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer maximum = Max(element1, element2);
Elem[result, e, esize] = maximum<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
UMAX (vectors) Page 2328

UMAX (vectors) Page 2329

UMAX (immediate)
Unsigned maximum with immediate (unpredicated).
Determine the unsigned maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 1 0 imm8 Zdn
U
SVE
UMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
UMAX (immediate) Page 2330

UMAXV
Unsigned maximum reduction to scalar.
Unsigned maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 1 Pg Zn Vd
U
SVE
UMAXV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer maximum = if unsigned then 0 else -(2^(esize-1));
maximum = Max(maximum, element);
V[d] = maximum<esize-1:0>;
UMAXV Page 2331

UMIN (vectors)
Unsigned minimum vectors (predicated).
Determine the unsigned minimum of active elements of the second source vector and corresponding elements of the
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 0 Pg Zm Zdn
U
SVE
UMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer minimum = Min(element1, element2);
Elem[result, e, esize] = minimum<esize-1:0>;
else
Z[dn] = result;
UNPREDICTABLE:
UMIN (vectors) Page 2332

UMIN (vectors) Page 2333

UMIN (immediate)
Unsigned minimum with immediate (unpredicated).
Determine the unsigned minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 1 0 imm8 Zdn
U
SVE
UMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;
Z[dn] = result;
UNPREDICTABLE:
UMIN (immediate) Page 2334

UMINV
Unsigned minimum reduction to scalar.
Unsigned minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 1 Pg Zn Vd
U
SVE
UMINV <V><d>, <Pg>, <Zn>.<T>

Assembler Symbols

size <V>
00 B
01 H
10 S
11 D
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);
minimum = Min(minimum, element);
V[d] = minimum<esize-1:0>;
UMINV Page 2335

UMMLA
Unsigned integer matrix multiply-accumulate.
The unsigned integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit integer
values held in each 128-bit segment of the first source vector by the 8×2 matrix of unsigned 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 1 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
SVE
UMMLA <Zda>.S, <Zn>.B, <Zm>.B

boolean op1_unsigned = TRUE;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(128) op1, op2;
Z[da] = result;
UNPREDICTABLE:
UMMLA Page 2336

UMMLA Page 2337

UMULH
Unsigned multiply returning high half (predicated).
Widening multiply unsigned integer values in active elements of the first source vector by corresponding elements of
the second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 0 0 0 Pg Zm Zdn
H U
SVE
UMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer product = (element1 * element2) >> esize;
else
Z[dn] = result;
UNPREDICTABLE:
UMULH Page 2338

UMULH Page 2339

UQADD (immediate)
Unsigned saturating add immediate (unpredicated).
Unsigned saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 0 1 1 1 sh imm8 Zdn
U
SVE
UQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);
Z[dn] = result;
UQADD (immediate) Page 2340

UNPREDICTABLE:
UQADD (immediate) Page 2341

UQADD (vectors)
Unsigned saturating add vectors (unpredicated).
Unsigned saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's unsigned integer range 0 to (2N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 0 1 Zn Zd
U
SVE
UQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 + element2, esize, unsigned);
Z[d] = result;
UQADD (vectors) Page 2342

UQDECB
Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count.
saturated to the general-purpose register's unsigned integer range.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
32-bit
UQDECB <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
64-bit
UQDECB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 64;
Assembler Symbols
UQDECB Page 2343

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQDECB Page 2344

UQDECD (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
32-bit
UQDECD <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
64-bit
UQDECD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 64;
Assembler Symbols
UQDECD (scalar) Page 2345

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQDECD (scalar) Page 2346

UQDECD (vector)
Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count.
results are saturated to the 64-bit unsigned integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
SVE
UQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQDECD (vector) Page 2347

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQDECD (vector) Page 2348

UQDECH (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
32-bit
UQDECH <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
64-bit
UQDECH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 64;
Assembler Symbols
UQDECH (scalar) Page 2349

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQDECH (scalar) Page 2350

UQDECH (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
SVE
UQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQDECH (vector) Page 2351

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQDECH (vector) Page 2352

UQDECP (scalar)
Unsigned saturating decrement scalar by count of true predicate elements.
destination. The result is saturated to the general-purpose register's unsigned integer range.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 0 0 Pm Rdn
D U sf
32-bit
UQDECP <Wdn>, <Pm>.<T>

integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 1 0 Pm Rdn
D U sf
64-bit
UQDECP <Xdn>, <Pm>.<T>

integer ssize = 64;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
UQDECP (scalar) Page 2353

Operation
CheckSVEEnabled();
bits(ssize) result;
integer count = 0;
count = count + 1;

(result, -) = SatQ(element - count, ssize, unsigned);
UQDECP (scalar) Page 2354

UQDECP (vector)
Unsigned saturating decrement vector by count of true predicate elements.
vector elements. The results are saturated to the element unsigned integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 0 0 0 Pm Zdn
D U
SVE
UQDECP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
(Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
UQDECP (vector) Page 2355

UQDECP (vector) Page 2356

UQDECW (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
32-bit
UQDECW <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
64-bit
UQDECW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 64;
Assembler Symbols
UQDECW (scalar) Page 2357

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQDECW (scalar) Page 2358

UQDECW (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
SVE
UQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQDECW (vector) Page 2359

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQDECW (vector) Page 2360

UQINCB
Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
32-bit
UQINCB <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
64-bit
UQINCB <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 8;
integer ssize = 64;
Assembler Symbols
UQINCB Page 2361

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQINCB Page 2362

UQINCD (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
32-bit
UQINCD <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
64-bit
UQINCD <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 64;
integer ssize = 64;
Assembler Symbols
UQINCD (scalar) Page 2363

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQINCD (scalar) Page 2364

UQINCD (vector)
Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
SVE
UQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

integer esize = 64;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQINCD (vector) Page 2365

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQINCD (vector) Page 2366

UQINCH (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
32-bit
UQINCH <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
64-bit
UQINCH <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 16;
integer ssize = 64;
Assembler Symbols
UQINCH (scalar) Page 2367

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQINCH (scalar) Page 2368

UQINCH (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
SVE
UQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

integer esize = 16;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQINCH (vector) Page 2369

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQINCH (vector) Page 2370

UQINCP (scalar)
Unsigned saturating increment scalar by count of true predicate elements.
destination. The result is saturated to the general-purpose register's unsigned integer range.
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 0 0 Pm Rdn
D U sf
32-bit
UQINCP <Wdn>, <Pm>.<T>

integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 1 0 Pm Rdn
D U sf
64-bit
UQINCP <Xdn>, <Pm>.<T>

integer ssize = 64;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
UQINCP (scalar) Page 2371

Operation
CheckSVEEnabled();
bits(ssize) result;
integer count = 0;
count = count + 1;

(result, -) = SatQ(element + count, ssize, unsigned);
UQINCP (scalar) Page 2372

UQINCP (vector)
Unsigned saturating increment vector by count of true predicate elements.
vector elements. The results are saturated to the element unsigned integer range.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 0 0 0 Pm Zdn
D U
SVE
UQINCP <Zdn>.<T>, <Pm>.<T>

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
integer count = 0;
count = count + 1;
(Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);
Z[dn] = result;
UNPREDICTABLE:
UQINCP (vector) Page 2373

UQINCP (vector) Page 2374

UQINCW (scalar)
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
32-bit
UQINCW <Wdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 32;
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
64-bit
UQINCW <Xdn>{, <pattern>{, MUL #<imm>}}

integer esize = 32;
integer ssize = 64;
Assembler Symbols
UQINCW (scalar) Page 2375

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
Operation
CheckSVEEnabled();
bits(ssize) result;

UQINCW (scalar) Page 2376

UQINCW (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
SVE
UQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

integer esize = 32;
Assembler Symbols
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
UQINCW (vector) Page 2377

Operation
CheckSVEEnabled();
bits(VL) result;
Z[dn] = result;
UNPREDICTABLE:
UQINCW (vector) Page 2378

UQSUB (immediate)
Unsigned saturating subtract immediate (unpredicated).
Unsigned saturating subtract an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 1 1 1 sh imm8 Zdn
U
SVE
UQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);
Z[dn] = result;
UQSUB (immediate) Page 2379

UNPREDICTABLE:
UQSUB (immediate) Page 2380

UQSUB (vectors)
Unsigned saturating subtract vectors (unpredicated).
Unsigned saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's unsigned integer range 0 to (2N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 1 Zn Zd
U
SVE
UQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL) result;
(Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);
Z[d] = result;
UQSUB (vectors) Page 2381

USDOT (vectors)
Unsigned by signed integer dot product.
The unsigned by signed integer dot product instruction computes the dot product of a group of four unsigned 8-bit
integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit integer
values in the corresponding 32-bit element of the second source vector, and then destructively adds the widened dot
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 0 Zm 0 1 1 1 1 0 Zn Zda
size<1>size<0>
SVE
USDOT <Zda>.S, <Zn>.B, <Zm>.B

integer esize = 32;
Assembler Symbols
Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
Z[da] = result;
UNPREDICTABLE:
USDOT (vectors) Page 2382

USDOT (indexed)
Unsigned by signed integer indexed dot product.
The unsigned by signed integer indexed dot product instruction computes the dot product of a group of four unsigned
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 0 Zn Zda
size<1>size<0> U
SVE
USDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

integer esize = 32;
Assembler Symbols
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.
Operation
CheckSVEEnabled();
bits(VL) result;
for i = 0 to 3
integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
Z[da] = result;
UNPREDICTABLE:
USDOT (indexed) Page 2383

USDOT (indexed) Page 2384

USMMLA
Unsigned by signed integer matrix multiply-accumulate.
The unsigned by signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit
integer values held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values
in the corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is
then destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing an 8-way dot product per destination element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 0 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
SVE
USMMLA <Zda>.S, <Zn>.B, <Zm>.B

Assembler Symbols
Operation
CheckSVEEnabled();
bits(128) op1, op2;
Z[da] = result;
UNPREDICTABLE:
USMMLA Page 2385

USMMLA Page 2386

UUNPKHI, UUNPKLO
Unsigned unpack and extend half of vector.
Unpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 1 0 0 1 1 1 0 Zn Zd
U H
High half
UUNPKHI <Zd>.<T>, <Zn>.<Tb>

boolean hi = TRUE;
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 0 0 0 1 1 1 0 Zn Zd
U H
Low half
UUNPKLO <Zd>.<T>, <Zn>.<Tb>

boolean hi = FALSE;
Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
<Tb> Is the size specifier, encoded in “size”:
UUNPKHI, UUNPKLO Page 2387

size <Tb>
00 RESERVED
01 B
10 H
11 S
Operation
CheckSVEEnabled();
integer hsize = esize DIV 2;
bits(VL) result;
bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
Elem[result, e, esize] = Extend(element, esize, unsigned);
Z[d] = result;
UUNPKHI, UUNPKLO Page 2388

UXTB, UXTH, UXTW
Unsigned byte / halfword / word extend (predicated).
Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the
unmodified.
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 1 Pg Zn Zd
U
Byte
UXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>

Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 1 Pg Zn Zd
U
Halfword
UXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>

Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 1 Pg Zn Zd
U
UXTB, UXTH, UXTW Page 2389

Word
UXTW <Zd>.D, <Pg>/M, <Zn>.D

Assembler Symbols
size <T>
00 RESERVED
01 H
10 S
11 D
size<0> <T>
0 S
1 D
Operation
CheckSVEEnabled();
Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);
Z[d] = result;
UNPREDICTABLE:
UXTB, UXTH, UXTW Page 2390

UZP1, UZP2 (predicates)
Concatenate even or odd elements from two predicates.
Concatenate adjacent even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 0 0 Pn 0 Pd
H
Even
UZP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 0;
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 1 0 Pn 0 Pd
H
Odd
UZP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 1;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
UZP1, UZP2 (predicates) Page 2391

Operation
CheckSVEEnabled();
bits(PL) result;
bits(PL*2) zipped = operand2:operand1;

Elem[result, e, esize DIV 8] = Elem[zipped, 2*e+part, esize DIV 8];
P[d] = result;
UZP1, UZP2 (predicates) Page 2392

UZP1, UZP2 (vectors)
Concatenate even or odd elements from two vectors.
Concatenate adjacent even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
It has encodings from 4 classes: Even , Even (quadwords) , Odd and Odd (quadwords)
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 0 Zn Zd
H
Even
UZP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 0;
Even (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 0 Zn Zd
H
Even (quadwords)
UZP1 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer part = 0;
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 1 Zn Zd
H
Odd
UZP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 1;
UZP1, UZP2 (vectors) Page 2393

Odd (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 1 Zn Zd
H
Odd (quadwords)
UZP2 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer part = 1;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(VL*2) zipped = operand2:operand1;

Z[d] = result;
UZP1, UZP2 (vectors) Page 2394

WHILELE
While incrementing signed scalar less than or equal to scalar.
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first,
signed scalar operand is less than or equal to the second scalar operand and false thereafter up to the highest
numbered element.
If the second scalar operand is equal to the maximum signed integer value then a condition which includes an equality
test can never fail and the result will be an all-true predicate.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 1 Pd
U lt eq
SVE
WHILELE <Pd>.<T>, <R><n>, <R><m>

integer rsize = 32 << UInt(sf);
SVECmp op = Cmp_LE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
<R> Is a width specifier, encoded in “sf”:

sf <R>
0 W
1 X
field.
field.
WHILELE Page 2395

Operation
CheckSVEEnabled();
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;
boolean cond;
case op of
when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));
last = last && cond;

ElemP[result, e, esize] = if last then '1' else '0';
operand1 = operand1 + 1;

P[d] = result;
WHILELE Page 2396

WHILELO
While incrementing unsigned scalar lower than scalar.
unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered
element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 0 Pd
U lt eq
SVE
WHILELO <Pd>.<T>, <R><n>, <R><m>

SVECmp op = Cmp_LT;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

sf <R>
0 W
1 X
field.
field.
WHILELO Page 2397

Operation
CheckSVEEnabled();
bits(PL) result;
boolean cond;
case op of


P[d] = result;
WHILELO Page 2398

WHILELS
While incrementing unsigned scalar lower or same as scalar.
unsigned scalar operand is lower or same as the second scalar operand and false thereafter up to the highest
numbered element.
If the second scalar operand is equal to the maximum unsigned integer value then a condition which includes an
equality test can never fail and the result will be an all-true predicate.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 1 Pd
U lt eq
SVE
WHILELS <Pd>.<T>, <R><n>, <R><m>

SVECmp op = Cmp_LE;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

sf <R>
0 W
1 X
field.
field.
WHILELS Page 2399

Operation
CheckSVEEnabled();
bits(PL) result;
boolean cond;
case op of


P[d] = result;
WHILELS Page 2400

WHILELT
While incrementing signed scalar less than scalar.
signed scalar operand is less than the second scalar operand and false thereafter up to the highest numbered element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 0 Pd
U lt eq
SVE
WHILELT <Pd>.<T>, <R><n>, <R><m>

SVECmp op = Cmp_LT;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D

sf <R>
0 W
1 X
field.
field.
WHILELT Page 2401

Operation
CheckSVEEnabled();
bits(PL) result;
boolean cond;
case op of


P[d] = result;
WHILELT Page 2402

WRFFR
Write the first-fault register.
Read the source predicate register and place in the first-fault register (FFR). This instruction is intended to restore a
saved FFR and is not recommended for general use by applications.
This instruction requires that the source predicate contains a MONOTONIC predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining bit positions. If the source is not a monotonic
predicate value, then the resulting value in the FFR will be UNPREDICTABLE. It is not possible to generate a non-
monotonic value in FFR when using SETFFR followed by first-fault or non-fault loads.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0
SVE
WRFFR <Pn>.B

Assembler Symbols
Operation
CheckSVEEnabled();
hsb = HighestSetBit(operand);
if hsb < 0 || IsOnes(operand<hsb:0>) then
FFR[] = operand;
else // not a monotonic predicate
FFR[] = bits(PL) UNKNOWN;
WRFFR Page 2403

ZIP1, ZIP2 (predicates)
Interleave elements from two half predicates.
Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place
in elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High halves and Low halves
High halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 1 0 Pn 0 Pd
H
High halves
ZIP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 1;
Low halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 0 0 Pn 0 Pd
H
Low halves
ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

integer part = 0;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
ZIP1, ZIP2 (predicates) Page 2404

Operation
CheckSVEEnabled();
bits(PL) result;

Elem[result, 2*p+0, esize DIV 8] = Elem[operand1, base+p, esize DIV 8];
Elem[result, 2*p+1, esize DIV 8] = Elem[operand2, base+p, esize DIV 8];
P[d] = result;
ZIP1, ZIP2 (predicates) Page 2405

ZIP1, ZIP2 (vectors)
Interleave elements from two half vectors.
Interleave alternating elements from the lowest or highest halves of the first and second source vectors and place in
elements of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction
requires that the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of
256 bits then the trailing bits are set to zero.
It has encodings from 4 classes: High halves , High halves (quadwords) , Low halves and Low halves (quadwords)
High halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 1 Zn Zd
H
High halves
ZIP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 1;
High halves (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 1 Zn Zd
H
High halves (quadwords)
ZIP2 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer part = 1;
Low halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 0 Zn Zd
H
Low halves
ZIP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

integer part = 0;
ZIP1, ZIP2 (vectors) Page 2406

Low halves (quadwords)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 0 Zn Zd
H
Low halves (quadwords)
ZIP1 <Zd>.Q, <Zn>.Q, <Zm>.Q

integer part = 0;
Assembler Symbols
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();

Z[d] = result;
ZIP1, ZIP2 (vectors) Page 2407

Top-level encodings for A64

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0
Decode fields
Instruction details
op0
0000 Reserved
0001 UNALLOCATED
0010 SVE encodings
0011 UNALLOCATED
100x Data Processing -- Immediate
101x Branches, Exception Generating and System instructions
x1x0 Loads and Stores
x101 Data Processing -- Register
x111 Data Processing -- Scalar Floating-Point and Advanced SIMD
Reserved
These instructions are under the top-level.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0000 op1
Decode fields
Instruction details
op0 op1
000 000000000 UDF
!= 000000000 UNALLOCATED
!= 000 UNALLOCATED
SVE encodings
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0010 op1 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
000 0x 0xxxx x1xxxx SVE Integer Multiply-Add - Predicated
000 0x 0xxxx 000xxx SVE Integer Binary Arithmetic - Predicated
000 0x 0xxxx 001xxx SVE Integer Reduction
000 0x 0xxxx 100xxx SVE Bitwise Shift - Predicated
000 0x 0xxxx 101xxx SVE Integer Unary Arithmetic - Predicated
000 0x 1xxxx 000xxx SVE integer add/subtract vectors (unpredicated)
000 0x 1xxxx 001xxx SVE Bitwise Logical - Unpredicated
000 0x 1xxxx 0100xx SVE Index Generation
000 0x 1xxxx 0101xx SVE Stack Allocation
000 0x 1xxxx 011xxx UNALLOCATED
000 0x 1xxxx 100xxx SVE Bitwise Shift - Unpredicated
000 0x 1xxxx 1010xx SVE address generation
000 0x 1xxxx 1011xx SVE Integer Misc - Unpredicated
Page 2408
000 0x 1xxxx 11xxxx SVE Element Count

000 1x 00xxx SVE Bitwise Immediate
000 1x 01xxx SVE Integer Wide Immediate - Predicated
000 1x 1xxxx 001xxx SVE Permute Vector - Unpredicated
000 1x 1xxxx 010xxx SVE Permute Predicate
000 1x 1xxxx 011xxx SVE permute vector elements
000 1x 1xxxx 10xxxx SVE Permute Vector - Predicated
000 1x 1xxxx 11xxxx SEL (vectors)
000 10 1xxxx 000xxx SVE Permute Vector - Extract
000 11 1xxxx 000xxx SVE permute vector segments
001 0x 0xxxx SVE Integer Compare - Vectors
001 0x 1xxxx SVE integer compare with unsigned immediate
001 1x 0xxxx x0xxxx SVE integer compare with signed immediate
001 1x 00xxx 01xxxx SVE predicate logical operations
001 1x 00xxx 11xxxx SVE Propagate Break
001 1x 01xxx 01xxxx SVE Partition Break
001 1x 01xxx 11xxxx SVE Predicate Misc
001 1x 1xxxx 00xxxx SVE Integer Compare - Scalars
001 1x 1xxxx 01xxxx UNALLOCATED
001 1x 1xxxx 11xxxx SVE Integer Wide Immediate - Unpredicated
001 1x 100xx 10xxxx SVE predicate count
001 1x 101xx 1000xx SVE Inc/Dec by Predicate Count
001 1x 101xx 1001xx SVE Write FFR
001 1x 101xx 101xxx UNALLOCATED
001 1x 11xxx 10xxxx UNALLOCATED
010 0x 0xxxx 0xxxxx SVE Integer Multiply-Add - Unpredicated
010 0x 0xxxx 1xxxxx UNALLOCATED
010 0x 1xxxx SVE Multiply - Indexed
010 1x 0xxxx 0xxxxx UNALLOCATED
010 1x 0xxxx 10xxxx SVE Misc
010 1x 0xxxx 11xxxx UNALLOCATED
010 1x 1xxxx UNALLOCATED
011 0x 0xxxx 0xxxxx FCMLA (vectors)
011 0x 00x1x 1xxxxx UNALLOCATED
011 0x 00000 100xxx FCADD
011 0x 00000 101xxx UNALLOCATED
011 0x 00000 11xxxx UNALLOCATED
011 0x 00001 1xxxxx UNALLOCATED
011 0x 0010x 100xxx UNALLOCATED
011 0x 0010x 101xxx SVE floating-point convert precision odd elements
011 0x 0010x 11xxxx UNALLOCATED
011 0x 01xxx 1xxxxx UNALLOCATED
011 0x 1xxxx x0x01x UNALLOCATED
011 0x 1xxxx 00000x SVE floating-point multiply-add (indexed)
011 0x 1xxxx 0001xx SVE floating-point complex multiply-add (indexed)
011 0x 1xxxx 001000 SVE floating-point multiply (indexed)
011 0x 1xxxx 001001 UNALLOCATED
011 0x 1xxxx 0011xx UNALLOCATED
011 0x 1xxxx 01x0xx SVE Floating Point Widening Multiply-Add - Indexed
Page 2409
011 0x 1xxxx 01x1xx UNALLOCATED

011 0x 1xxxx 10x00x SVE Floating Point Widening Multiply-Add
011 0x 1xxxx 10x1xx UNALLOCATED
011 0x 1xxxx 110xxx UNALLOCATED
011 0x 1xxxx 111000 UNALLOCATED
011 0x 1xxxx 111001 SVE floating point matrix multiply accumulate
011 0x 1xxxx 11101x UNALLOCATED
011 0x 1xxxx 1111xx UNALLOCATED
011 1x 0xxxx x1xxxx SVE floating-point compare vectors
011 1x 0xxxx 000xxx SVE floating-point arithmetic (unpredicated)
011 1x 0xxxx 100xxx SVE Floating Point Arithmetic - Predicated
011 1x 0xxxx 101xxx SVE Floating Point Unary Operations - Predicated
011 1x 000xx 001xxx SVE floating-point recursive reduction
011 1x 001xx 0010xx UNALLOCATED
011 1x 001xx 0011xx SVE Floating Point Unary Operations - Unpredicated
011 1x 010xx 001xxx SVE Floating Point Compare - with Zero
011 1x 011xx 001xxx SVE floating-point serial reduction (predicated)
011 1x 1xxxx SVE Floating Point Multiply-Add
100 SVE Memory - 32-bit Gather and Unsized Contiguous
101 SVE Memory - Contiguous Load
110 SVE Memory - 64-bit Gather
111 0x0xxx SVE Memory - Contiguous Store and Unsized Contiguous
111 0x1xxx SVE Memory - Non-temporal and Multi-register Store
111 1x0xxx SVE Memory - Scatter with Optional Sign Extend
111 101xxx SVE Memory - Scatter
111 111xxx SVE Memory - Contiguous Store with Immediate Offset
SVE Integer Multiply-Add - Predicated
These instructions are under SVE encodings.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 1
Decode fields
Instruction details
op0
0 SVE integer multiply-accumulate writing addend (predicated)
1 SVE integer multiply-add writing multiplicand (predicated)
SVE integer multiply-accumulate writing addend (predicated)
These instructions are under SVE Integer Multiply-Add - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 op Pg Zn Zda
Decode fields
Instruction Details
op
0 MLA
1 MLS
Page 2410
SVE integer multiply-add writing multiplicand (predicated)
These instructions are under SVE Integer Multiply-Add - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 op Pg Za Zdn
Decode fields
Instruction Details
op
0 MAD
1 MSB
SVE Integer Binary Arithmetic - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 000
Decode fields
Instruction details
op0
00x SVE integer add/subtract vectors (predicated)
01x SVE integer min/max/difference (predicated)
100 SVE integer multiply vectors (predicated)
101 SVE integer divide vectors (predicated)
11x SVE bitwise logical operations (predicated)
SVE integer add/subtract vectors (predicated)
These instructions are under SVE Integer Binary Arithmetic - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 opc 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
000 ADD (vectors, predicated)
001 SUB (vectors, predicated)
010 UNALLOCATED
011 SUBR (vectors)
1xx UNALLOCATED
SVE integer min/max/difference (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 opc U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc U
00 0 SMAX (vectors)
00 1 UMAX (vectors)
01 0 SMIN (vectors)
01 1 UMIN (vectors)
10 0 SABD
Page 2411
Decode fields
Instruction Details
opc U
10 1 UABD
11 UNALLOCATED
SVE integer multiply vectors (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 H U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
H U
0 0 MUL (vectors)
0 1 UNALLOCATED
1 0 SMULH
1 1 UMULH
SVE integer divide vectors (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 R U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R U
0 0 SDIV
0 1 UDIV
1 0 SDIVR
1 1 UDIVR
SVE bitwise logical operations (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
000 ORR (vectors, predicated)
001 EOR (vectors, predicated)
010 AND (vectors, predicated)
011 BIC (vectors, predicated)
1xx UNALLOCATED
SVE Integer Reduction
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 001
Page 2412
Decode fields
Instruction details
op0
00 SVE integer add reduction (predicated)
01 SVE integer min/max reduction (predicated)
10 SVE constructive prefix (predicated)
11 SVE bitwise logical reduction (predicated)
SVE integer add reduction (predicated)
These instructions are under SVE Integer Reduction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 opc U 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
opc U
00 0 SADDV
00 1 UADDV
01 UNALLOCATED
1x UNALLOCATED
SVE integer min/max reduction (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 opc U 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
opc U
00 0 SMAXV
00 1 UMAXV
01 0 SMINV
01 1 UMINV
1x UNALLOCATED
SVE constructive prefix (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc M 0 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
00 MOVPRFX (predicated)
01 UNALLOCATED
1x UNALLOCATED
SVE bitwise logical reduction (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 0 0 1 Pg Zn Vd
Page 2413
Decode fields
Instruction Details
opc
000 ORV
001 EORV
010 ANDV
011 UNALLOCATED
1xx UNALLOCATED
SVE Bitwise Shift - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 100
Decode fields
Instruction details
op0
0x SVE bitwise shift by immediate (predicated)
10 SVE bitwise shift by vector (predicated)
11 SVE bitwise shift by wide elements (predicated)
SVE bitwise shift by immediate (predicated)
These instructions are under SVE Bitwise Shift - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 opc L U 1 0 0 Pg tszl imm3 Zdn
Decode fields
Instruction Details
opc L U
00 0 0 ASR (immediate, predicated)
00 0 1 LSR (immediate, predicated)
00 1 0 UNALLOCATED
00 1 1 LSL (immediate, predicated)
01 0 0 ASRD
01 0 1 UNALLOCATED
01 1 UNALLOCATED
1x UNALLOCATED
SVE bitwise shift by vector (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 R L U 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R L U
1 0 UNALLOCATED
0 0 0 ASR (vectors)
0 0 1 LSR (vectors)
0 1 1 LSL (vectors)
1 0 0 ASRR
1 0 1 LSRR
Page 2414
Decode fields
Instruction Details
R L U
1 1 1 LSLR
SVE bitwise shift by wide elements (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 R L U 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R L U
0 0 0 ASR (wide elements, predicated)
0 0 1 LSR (wide elements, predicated)
0 1 0 UNALLOCATED
0 1 1 LSL (wide elements, predicated)
1 UNALLOCATED
SVE Integer Unary Arithmetic - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 101
Decode fields
Instruction details
op0
0x UNALLOCATED
10 SVE integer unary operations (predicated)
11 SVE bitwise unary operations (predicated)
SVE integer unary operations (predicated)
These instructions are under SVE Integer Unary Arithmetic - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
000 SXTB, SXTH, SXTW — SXTB
001 UXTB, UXTH, UXTW — UXTB
010 SXTB, SXTH, SXTW — SXTH
011 UXTB, UXTH, UXTW — UXTH
100 SXTB, SXTH, SXTW — SXTW
101 UXTB, UXTH, UXTW — UXTW
110 ABS
111 NEG
SVE bitwise unary operations (predicated)
These instructions are under SVE Integer Unary Arithmetic - Predicated.
Page 2415
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
000 CLS
001 CLZ
010 CNT
011 CNOT
100 FABS
101 FNEG
110 NOT (vector)
111 UNALLOCATED
SVE integer add/subtract vectors (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
000 ADD (vectors, unpredicated)
001 SUB (vectors, unpredicated)
01x UNALLOCATED
100 SQADD (vectors)
101 UQADD (vectors)
110 SQSUB (vectors)
111 UQSUB (vectors)
SVE Bitwise Logical - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 001 op0 op1
Decode fields
Instruction details
op0 op1
0 UNALLOCATED
1 00 SVE bitwise logical operations (unpredicated)
1 != 00 UNALLOCATED
SVE bitwise logical operations (unpredicated)
These instructions are under SVE Bitwise Logical - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 0 0 1 1 0 0 Zn Zd
Decode fields
Instruction Details
opc
00 AND (vectors, unpredicated)
01 ORR (vectors, unpredicated)
Page 2416
Decode fields
Instruction Details
opc
10 EOR (vectors, unpredicated)
11 BIC (vectors, unpredicated)
SVE Index Generation
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 0100 op0
Decode fields
Instruction details
op0
00 INDEX (immediates)
01 INDEX (scalar, immediate)
10 INDEX (immediate, scalar)
11 INDEX (scalars)
SVE Stack Allocation
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 op0 1 0101 op1
Decode fields
Instruction details
op0 op1
0 0 SVE stack frame adjustment
1 0 SVE stack frame size
1 UNALLOCATED
SVE stack frame adjustment
These instructions are under SVE Stack Allocation.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 op 1 Rn 0 1 0 1 0 imm6 Rd
Decode fields
Instruction Details
op
0 ADDVL
1 ADDPL
SVE stack frame size
These instructions are under SVE Stack Allocation.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 op 1 opc2 0 1 0 1 0 imm6 Rd
Decode fields
Instruction Details
op opc2
0 0xxxx UNALLOCATED
0 10xxx UNALLOCATED
Page 2417
Decode fields
Instruction Details
op opc2
0 110xx UNALLOCATED
0 1110x UNALLOCATED
0 11110 UNALLOCATED
0 11111 RDVL
1 UNALLOCATED
SVE Bitwise Shift - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 100 op0
Decode fields
Instruction details
op0
0 SVE bitwise shift by wide elements (unpredicated)
1 SVE bitwise shift by immediate (unpredicated)
SVE bitwise shift by wide elements (unpredicated)
These instructions are under SVE Bitwise Shift - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
00 ASR (wide elements, unpredicated)
01 LSR (wide elements, unpredicated)
10 UNALLOCATED
11 LSL (wide elements, unpredicated)
SVE bitwise shift by immediate (unpredicated)
These instructions are under SVE Bitwise Shift - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 opc Zn Zd
Decode fields
Instruction Details
opc
00 ASR (immediate, unpredicated)
01 LSR (immediate, unpredicated)
10 UNALLOCATED
11 LSL (immediate, unpredicated)
SVE address generation
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 1 0 1 0 msz Zn Zd
Page 2418
Decode fields
Instruction Details
opc
00 ADR — Unpacked 32-bit signed offsets
01 ADR — Unpacked 32-bit unsigned offsets
1x ADR — Packed offsets
SVE Integer Misc - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 1011 op0
Decode fields
Instruction details
op0
0x SVE floating-point trig select coefficient
10 SVE floating-point exponential accelerator
11 SVE constructive prefix (unpredicated)
SVE floating-point trig select coefficient
These instructions are under SVE Integer Misc - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 op Zn Zd
Decode fields
Instruction Details
op
0 FTSSEL
1 UNALLOCATED
SVE floating-point exponential accelerator
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 opc 1 0 1 1 1 0 Zn Zd
Decode fields
Instruction Details
opc
00000 FEXPA
00001 UNALLOCATED
0001x UNALLOCATED
001xx UNALLOCATED
01xxx UNALLOCATED
1xxxx UNALLOCATED
SVE constructive prefix (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 opc2 1 0 1 1 1 1 Zn Zd
Page 2419
Decode fields
Instruction Details
opc opc2
00 00000 MOVPRFX (unpredicated)
00 00001 UNALLOCATED
00 0001x UNALLOCATED
00 001xx UNALLOCATED
00 01xxx UNALLOCATED
00 1xxxx UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED
SVE Element Count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 op0 11 op1
Decode fields
Instruction details
op0 op1
0 00x SVE saturating inc/dec vector by element count
0 100 SVE element count
0 101 UNALLOCATED
1 000 SVE inc/dec vector by element count
1 100 SVE inc/dec register by element count
1 x01 UNALLOCATED
01x UNALLOCATED
11x SVE saturating inc/dec register by element count
SVE saturating inc/dec vector by element count
These instructions are under SVE Element Count.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 0 0 D U pattern Zdn
Decode fields
Instruction Details
size D U
00 UNALLOCATED
01 0 0 SQINCH (vector)
01 0 1 UQINCH (vector)
01 1 0 SQDECH (vector)
01 1 1 UQDECH (vector)
10 0 0 SQINCW (vector)
10 0 1 UQINCW (vector)
10 1 0 SQDECW (vector)
10 1 1 UQDECW (vector)
11 0 0 SQINCD (vector)
11 0 1 UQINCD (vector)
11 1 0 SQDECD (vector)
11 1 1 UQDECD (vector)
Page 2420
SVE element count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 1 0 0 op pattern Rd
Decode fields
Instruction Details
size op
1 UNALLOCATED
00 0 CNTB, CNTD, CNTH, CNTW — CNTB
01 0 CNTB, CNTD, CNTH, CNTW — CNTH
10 0 CNTB, CNTD, CNTH, CNTW — CNTW
11 0 CNTB, CNTD, CNTH, CNTW — CNTD
SVE inc/dec vector by element count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 0 0 0 D pattern Zdn
Decode fields
Instruction Details
size D
00 UNALLOCATED
01 0 INCD, INCH, INCW (vector) — INCH
01 1 DECD, DECH, DECW (vector) — DECH
10 0 INCD, INCH, INCW (vector) — INCW
10 1 DECD, DECH, DECW (vector) — DECW
11 0 INCD, INCH, INCW (vector) — INCD
11 1 DECD, DECH, DECW (vector) — DECD
SVE inc/dec register by element count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 1 0 0 D pattern Rdn
Decode fields
Instruction Details
size D
00 0 INCB, INCD, INCH, INCW (scalar) — INCB
00 1 DECB, DECD, DECH, DECW (scalar) — DECB
01 0 INCB, INCD, INCH, INCW (scalar) — INCH
01 1 DECB, DECD, DECH, DECW (scalar) — DECH
10 0 INCB, INCD, INCH, INCW (scalar) — INCW
10 1 DECB, DECD, DECH, DECW (scalar) — DECW
11 0 INCB, INCD, INCH, INCW (scalar) — INCD
11 1 DECB, DECD, DECH, DECW (scalar) — DECD
SVE saturating inc/dec register by element count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 sf imm4 1 1 1 1 D U pattern Rdn
Page 2421
Decode fields
Instruction Details
size sf D U
00 0 0 0 SQINCB — 32-bit
00 0 0 1 UQINCB — 32-bit
00 0 1 0 SQDECB — 32-bit
00 0 1 1 UQDECB — 32-bit
00 1 0 0 SQINCB — 64-bit
00 1 0 1 UQINCB — 64-bit
00 1 1 0 SQDECB — 64-bit
00 1 1 1 UQDECB — 64-bit
01 0 0 0 SQINCH (scalar) — 32-bit
01 0 0 1 UQINCH (scalar) — 32-bit
01 0 1 0 SQDECH (scalar) — 32-bit
01 0 1 1 UQDECH (scalar) — 32-bit
01 1 0 0 SQINCH (scalar) — 64-bit
01 1 0 1 UQINCH (scalar) — 64-bit
01 1 1 0 SQDECH (scalar) — 64-bit
01 1 1 1 UQDECH (scalar) — 64-bit
10 0 0 0 SQINCW (scalar) — 32-bit
10 0 0 1 UQINCW (scalar) — 32-bit
10 0 1 0 SQDECW (scalar) — 32-bit
10 0 1 1 UQDECW (scalar) — 32-bit
10 1 0 0 SQINCW (scalar) — 64-bit
10 1 0 1 UQINCW (scalar) — 64-bit
10 1 1 0 SQDECW (scalar) — 64-bit
10 1 1 1 UQDECW (scalar) — 64-bit
11 0 0 0 SQINCD (scalar) — 32-bit
11 0 0 1 UQINCD (scalar) — 32-bit
11 0 1 0 SQDECD (scalar) — 32-bit
11 0 1 1 UQDECD (scalar) — 32-bit
11 1 0 0 SQINCD (scalar) — 64-bit
11 1 0 1 UQINCD (scalar) — 64-bit
11 1 1 0 SQDECD (scalar) — 64-bit
11 1 1 1 UQDECD (scalar) — 64-bit
SVE Bitwise Immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 00 op1
Decode fields
Instruction details
op0 op1
11 00 DUPM
!= 11 00 SVE bitwise logical with immediate (unpredicated)
!= 00 UNALLOCATED
SVE bitwise logical with immediate (unpredicated)
These instructions are under SVE Bitwise Immediate.
Page 2422
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 != 11 0 0 0 0 imm13 Zdn
opc
The following constraints also apply to this encoding: opc != 11 && opc != 11
Decode fields
Instruction Details
opc
00 ORR (immediate)
01 EOR (immediate)
10 AND (immediate)
SVE Integer Wide Immediate - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 01 op0
Decode fields
Instruction details
op0
0xx SVE copy integer immediate (predicated)
10x UNALLOCATED
110 FCPY
111 UNALLOCATED
SVE copy integer immediate (predicated)
These instructions are under SVE Integer Wide Immediate - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 M sh imm8 Zd
Decode fields
Instruction Details
M
0 CPY (immediate, zeroing)
1 CPY (immediate, merging)
SVE Permute Vector - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 op2 001 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
00 00 0 1 10 DUP (scalar)
00 10 0 1 10 INSR (scalar)
00 x0 0 0 01 UNALLOCATED
00 x0 0 1 x1 UNALLOCATED
00 x1 1 1x UNALLOCATED
00 x1 01 UNALLOCATED
00 1 1 1x UNALLOCATED
00 1 01 UNALLOCATED
Page 2423
00 0 1x UNALLOCATED
01 != 00 UNALLOCATED
10 0x 0 01 UNALLOCATED
10 0x 1 10 SVE unpack vector elements
10 0x 1 x1 UNALLOCATED
10 10 0 0 01 UNALLOCATED
10 10 0 1 10 INSR (SIMD&FP scalar)
10 10 0 1 x1 UNALLOCATED
10 11 01 UNALLOCATED
10 1x 1 1 1x UNALLOCATED
11 00 0 1 10 REV (vector)
11 0x 1 1 1x UNALLOCATED
11 != 00 1 1x UNALLOCATED
11 != 00 01 UNALLOCATED
1x 0 1x UNALLOCATED
0 00 DUP (indexed)
1 00 TBL
SVE unpack vector elements
These instructions are under SVE Permute Vector - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 U H 0 0 1 1 1 0 Zn Zd
Decode fields
Instruction Details
U H
0 0 SUNPKHI, SUNPKLO — SUNPKLO
0 1 SUNPKHI, SUNPKLO — SUNPKHI
1 0 UUNPKHI, UUNPKLO — UUNPKLO
1 1 UUNPKHI, UUNPKLO — UUNPKHI
SVE Permute Predicate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 1 op1 010 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
00 1000x 0000 0 SVE unpack predicate elements
0xxxx xxx0 0 SVE permute predicate elements
0xxxx xxx1 0 UNALLOCATED
Page 2424
10100 0000 0 REV (predicate)

10x0x 1000 0 UNALLOCATED
10x0x x100 0 UNALLOCATED
10x0x xx10 0 UNALLOCATED
10x0x xxx1 0 UNALLOCATED
10x1x 0 UNALLOCATED
11xxx 0 UNALLOCATED
1 UNALLOCATED
SVE unpack predicate elements
These instructions are under SVE Permute Predicate.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 H 0 1 0 0 0 0 0 Pn 0 Pd
Decode fields
Instruction Details
H
0 PUNPKHI, PUNPKLO — PUNPKLO
1 PUNPKHI, PUNPKLO — PUNPKHI
SVE permute predicate elements
These instructions are under SVE Permute Predicate.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 opc H 0 Pn 0 Pd
Decode fields
Instruction Details
opc H
00 0 ZIP1, ZIP2 (predicates) — ZIP1
00 1 ZIP1, ZIP2 (predicates) — ZIP2
01 0 UZP1, UZP2 (predicates) — UZP1
01 1 UZP1, UZP2 (predicates) — UZP2
10 0 TRN1, TRN2 (predicates) — TRN1
10 1 TRN1, TRN2 (predicates) — TRN2
11 UNALLOCATED
SVE permute vector elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 opc Zn Zd
Decode fields
Instruction Details
opc
000 ZIP1, ZIP2 (vectors) — ZIP1
001 ZIP1, ZIP2 (vectors) — ZIP2
010 UZP1, UZP2 (vectors) — UZP1
011 UZP1, UZP2 (vectors) — UZP2
100 TRN1, TRN2 (vectors) — TRN1
Page 2425
Decode fields
Instruction Details
opc
101 TRN1, TRN2 (vectors) — TRN2
11x UNALLOCATED
SVE Permute Vector - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 op2 10 op3
Decode fields
Instruction details
op0 op1 op2 op3
0 000 0 0 CPY (SIMD&FP scalar)
0 000 1 0 COMPACT
0 000 1 SVE extract element to general register
0 001 0 SVE extract element to SIMD&FP scalar register
0 01x 0 SVE reverse within elements
0 01x 1 UNALLOCATED
0 100 0 1 CPY (scalar)
0 100 1 1 UNALLOCATED
0 100 0 SVE conditionally broadcast element to vector
0 101 0 SVE conditionally extract element to SIMD&FP scalar
0 110 0 0 SPLICE
0 110 1 UNALLOCATED
0 111 0 UNALLOCATED
0 111 1 UNALLOCATED
0 x01 1 UNALLOCATED
1 000 0 UNALLOCATED
1 000 1 SVE conditionally extract element to general register
SVE extract element to general register
These instructions are under SVE Permute Vector - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 B 1 0 1 Pg Zn Rd
Decode fields
Instruction Details
B
0 LASTA (scalar)
1 LASTB (scalar)
SVE extract element to SIMD&FP scalar register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 B 1 0 0 Pg Zn Vd
Page 2426
Decode fields
Instruction Details
B
0 LASTA (SIMD&FP scalar)
1 LASTB (SIMD&FP scalar)
SVE reverse within elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 opc 1 0 0 Pg Zn Zd
Decode fields
Instruction Details
opc
00 REVB, REVH, REVW — REVB
01 REVB, REVH, REVW — REVH
10 REVB, REVH, REVW — REVW
11 RBIT
SVE conditionally broadcast element to vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 B 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
B
0 CLASTA (vectors)
1 CLASTB (vectors)
SVE conditionally extract element to SIMD&FP scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 B 1 0 0 Pg Zm Vdn
Decode fields
Instruction Details
B
0 CLASTA (SIMD&FP scalar)
1 CLASTB (SIMD&FP scalar)
SVE conditionally extract element to general register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 B 1 0 1 Pg Zm Rdn
Decode fields
Instruction Details
B
0 CLASTA (scalar)
1 CLASTB (scalar)
Page 2427
SVE Permute Vector - Extract
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000001010 op0 1 000
Decode fields
Instruction details
op0
0 EXT
1 UNALLOCATED
SVE permute vector segments
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 op 1 Zm 0 0 0 opc2 Zn Zd
Decode fields
Instruction Details
op opc2
0 000 ZIP1, ZIP2 (vectors) — ZIP1
0 001 ZIP1, ZIP2 (vectors) — ZIP2
0 010 UZP1, UZP2 (vectors) — UZP1
0 011 UZP1, UZP2 (vectors) — UZP2
0 10x UNALLOCATED
0 110 TRN1, TRN2 (vectors) — TRN1
0 111 TRN1, TRN2 (vectors) — TRN2
1 UNALLOCATED
SVE Integer Compare - Vectors
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100100 0 op0
Decode fields
Instruction details
op0
0 SVE integer compare vectors
1 SVE integer compare with wide elements
SVE integer compare vectors
These instructions are under SVE Integer Compare - Vectors.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm op 0 o2 Pg Zn ne Pd
Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (vectors) — CMPHS
0 0 1 CMP<cc> (vectors) — CMPHI
0 1 0 CMP<cc> (wide elements) — CMPEQ
0 1 1 CMP<cc> (wide elements) — CMPNE
1 0 0 CMP<cc> (vectors) — CMPGE
Page 2428
Decode fields
Instruction Details
op o2 ne
1 0 1 CMP<cc> (vectors) — CMPGT
1 1 0 CMP<cc> (vectors) — CMPEQ
1 1 1 CMP<cc> (vectors) — CMPNE
SVE integer compare with wide elements
These instructions are under SVE Integer Compare - Vectors.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm U 1 lt Pg Zn ne Pd
Decode fields
Instruction Details
U lt ne
0 0 0 CMP<cc> (wide elements) — CMPGE
0 0 1 CMP<cc> (wide elements) — CMPGT
0 1 0 CMP<cc> (wide elements) — CMPLT
0 1 1 CMP<cc> (wide elements) — CMPLE
1 0 0 CMP<cc> (wide elements) — CMPHS
1 0 1 CMP<cc> (wide elements) — CMPHI
1 1 0 CMP<cc> (wide elements) — CMPLO
1 1 1 CMP<cc> (wide elements) — CMPLS
SVE integer compare with unsigned immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 lt Pg Zn ne Pd
Decode fields
Instruction Details
lt ne
0 0 CMP<cc> (immediate) — CMPHS
0 1 CMP<cc> (immediate) — CMPHI
1 0 CMP<cc> (immediate) — CMPLO
1 1 CMP<cc> (immediate) — CMPLS
SVE integer compare with signed immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 op 0 o2 Pg Zn ne Pd
Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (immediate) — CMPGE
0 0 1 CMP<cc> (immediate) — CMPGT
0 1 0 CMP<cc> (immediate) — CMPLT
0 1 1 CMP<cc> (immediate) — CMPLE
1 0 0 CMP<cc> (immediate) — CMPEQ
1 0 1 CMP<cc> (immediate) — CMPNE
1 1 UNALLOCATED
Page 2429
SVE predicate logical operations
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 0 1 Pg o2 Pn o3 Pd
Decode fields
Instruction Details
op S o2 o3
0 0 0 0 AND, ANDS (predicates) — not setting the condition flags
0 0 0 1 BIC, BICS (predicates) — not setting the condition flags
0 0 1 0 EOR, EORS (predicates) — not setting the condition flags
0 0 1 1 SEL (predicates)
0 1 0 0 AND, ANDS (predicates) — setting the condition flags
0 1 0 1 BIC, BICS (predicates) — setting the condition flags
0 1 1 0 EOR, EORS (predicates) — setting the condition flags
0 1 1 1 UNALLOCATED
1 0 0 0 ORR, ORRS (predicates) — not setting the condition flags
1 0 0 1 ORN, ORNS (predicates) — not setting the condition flags
1 0 1 0 NOR, NORS — not setting the condition flags
1 0 1 1 NAND, NANDS — not setting the condition flags
1 1 0 0 ORR, ORRS (predicates) — setting the condition flags
1 1 0 1 ORN, ORNS (predicates) — setting the condition flags
1 1 1 0 NOR, NORS — setting the condition flags
1 1 1 1 NAND, NANDS — setting the condition flags
SVE Propagate Break
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 00 11 op0
Decode fields
Instruction details
op0
0 SVE propagate break from previous partition
1 UNALLOCATED
SVE propagate break from previous partition
These instructions are under SVE Propagate Break.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 1 1 Pg 0 Pn B Pd
Decode fields
Instruction Details
op S B
0 0 0 BRKPA, BRKPAS — not setting the condition flags
0 0 1 BRKPB, BRKPBS — not setting the condition flags
0 1 0 BRKPA, BRKPAS — setting the condition flags
0 1 1 BRKPB, BRKPBS — setting the condition flags
1 UNALLOCATED
Page 2430
SVE Partition Break
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 op0 01 op1 01 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
0 1000 0 0 SVE propagate break to next partition
0 x1xx UNALLOCATED
0 xx1x UNALLOCATED
0 xxx1 UNALLOCATED
0000 0 SVE partition break condition
SVE propagate break to next partition
These instructions are under SVE Partition Break.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 S 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
Decode fields
Instruction Details
S
0 BRKN, BRKNS — not setting the condition flags
1 BRKN, BRKNS — setting the condition flags
SVE partition break condition
These instructions are under SVE Partition Break.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 B S 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
Decode fields
Instruction Details
B S M
1 1 UNALLOCATED
0 0 BRKA, BRKAS — not setting the condition flags
0 1 0 BRKA, BRKAS — setting the condition flags
1 0 BRKB, BRKBS — not setting the condition flags
1 1 0 BRKB, BRKBS — setting the condition flags
SVE Predicate Misc
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 01 op0 11 op1 op2 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
0000 x0 0 SVE predicate test
Page 2431
0x10 x0 0 UNALLOCATED
0xx1 x0 0 UNALLOCATED
0xxx x1 0 UNALLOCATED
1000 000 00 0 SVE predicate first active
1000 000 != 00 0 UNALLOCATED
1000 100 10 0000 0 SVE predicate zero
1000 100 10 != 0000 0 UNALLOCATED
1000 110 00 0 SVE predicate read from FFR (predicated)
1001 000 0x 0 UNALLOCATED
1001 000 10 0 PNEXT
1001 110 00 0000 0 SVE predicate read from FFR (unpredicated)
1001 110 00 != 0000 0 UNALLOCATED
100x 010 0 UNALLOCATED
100x 100 0x 0 SVE predicate initialize
100x 100 11 0 UNALLOCATED
100x 110 != 00 0 UNALLOCATED
100x xx1 0 UNALLOCATED
110x 0 UNALLOCATED
1x1x 0 UNALLOCATED
1 UNALLOCATED
SVE predicate test
These instructions are under SVE Predicate Misc.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 0 0 0 0 1 1 Pg 0 Pn 0 opc2
Decode fields
Instruction Details
op S opc2
0 0 UNALLOCATED
0 1 0000 PTEST
0 1 001x UNALLOCATED
0 1 01xx UNALLOCATED
0 1 1xxx UNALLOCATED
1 UNALLOCATED
SVE predicate first active
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn
Decode fields
Instruction Details
op S
0 0 UNALLOCATED
0 1 PFIRST
Page 2432
Decode fields
Instruction Details
op S
1 UNALLOCATED
SVE predicate zero
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 Pd
Decode fields
Instruction Details
op S
0 0 PFALSE
0 1 UNALLOCATED
1 UNALLOCATED
SVE predicate read from FFR (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
Decode fields
Instruction Details
op S
0 0 RDFFR, RDFFRS (predicated) — not setting the condition flags
0 1 RDFFR, RDFFRS (predicated) — setting the condition flags
1 UNALLOCATED
SVE predicate read from FFR (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 Pd
Decode fields
Instruction Details
op S
0 0 RDFFR (unpredicated)
0 1 UNALLOCATED
1 UNALLOCATED
SVE predicate initialize
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 S 1 1 1 0 0 0 pattern 0 Pd
Decode fields
Instruction Details
S
0 PTRUE, PTRUES — not setting the condition flags
1 PTRUE, PTRUES — setting the condition flags
Page 2433
SVE Integer Compare - Scalars
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 00 op0 op1 op2
Decode fields
Instruction details
op0 op1 op2
0 SVE integer compare scalar count and limit
1 000 0000 SVE conditionally terminate scalars
1 000 != 0000 UNALLOCATED
SVE integer compare scalar count and limit
These instructions are under SVE Integer Compare - Scalars.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf U lt Rn eq Pd
Decode fields
Instruction Details
U lt eq
0 UNALLOCATED
0 1 0 WHILELT
0 1 1 WHILELE
1 1 0 WHILELO
1 1 1 WHILELS
SVE conditionally terminate scalars
These instructions are under SVE Integer Compare - Scalars.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op sz 1 Rm 0 0 1 0 0 0 Rn ne 0 0 0 0
Decode fields
Instruction Details
op ne
0 UNALLOCATED
1 0 CTERMEQ, CTERMNE — CTERMEQ
1 1 CTERMEQ, CTERMNE — CTERMNE
SVE Integer Wide Immediate - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 op0 op1 11
Decode fields
Instruction details
op0 op1
00 SVE integer add/subtract immediate (unpredicated)
01 SVE integer min/max immediate (unpredicated)
10 SVE integer multiply immediate (unpredicated)
11 0 SVE broadcast integer immediate (unpredicated)
11 1 SVE broadcast floating-point immediate (unpredicated)
Page 2434
SVE integer add/subtract immediate (unpredicated)
These instructions are under SVE Integer Wide Immediate - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 1 sh imm8 Zdn
Decode fields
Instruction Details
opc
000 ADD (immediate)
001 SUB (immediate)
010 UNALLOCATED
011 SUBR (immediate)
100 SQADD (immediate)
101 UQADD (immediate)
110 SQSUB (immediate)
111 UQSUB (immediate)
SVE integer min/max immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 opc 1 1 o2 imm8 Zdn
Decode fields
Instruction Details
opc o2
0xx 1 UNALLOCATED
000 0 SMAX (immediate)
001 0 UMAX (immediate)
010 0 SMIN (immediate)
011 0 UMIN (immediate)
1xx UNALLOCATED
SVE integer multiply immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 0 opc 1 1 o2 imm8 Zdn
Decode fields
Instruction Details
opc o2
000 0 MUL (immediate)
000 1 UNALLOCATED
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
SVE broadcast integer immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 0 1 1 sh imm8 Zd
Page 2435
Decode fields
Instruction Details
opc
00 DUP (immediate)
01 UNALLOCATED
1x UNALLOCATED
SVE broadcast floating-point immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 1 1 1 o2 imm8 Zd
Decode fields
Instruction Details
opc o2
00 0 FDUP
00 1 UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED
SVE predicate count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 0 Pg o2 Pn Rd
Decode fields
Instruction Details
opc o2
000 0 CNTP
000 1 UNALLOCATED
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
SVE Inc/Dec by Predicate Count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 1000 op1
Decode fields
Instruction details
op0 op1
0 0 SVE saturating inc/dec vector by predicate count
0 1 SVE saturating inc/dec register by predicate count
1 0 SVE inc/dec vector by predicate count
1 1 SVE inc/dec register by predicate count
SVE saturating inc/dec vector by predicate count
These instructions are under SVE Inc/Dec by Predicate Count.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 0 opc Pm Zdn
Page 2436
Decode fields
Instruction Details
D U opc
01 UNALLOCATED
1x UNALLOCATED
0 0 00 SQINCP (vector)
0 1 00 UQINCP (vector)
1 0 00 SQDECP (vector)
1 1 00 UQDECP (vector)
SVE saturating inc/dec register by predicate count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 1 sf op Pm Rdn
Decode fields
Instruction Details
D U sf op
1 UNALLOCATED
0 0 0 0 SQINCP (scalar) — 32-bit
0 0 1 0 SQINCP (scalar) — 64-bit
0 1 0 0 UQINCP (scalar) — 32-bit
0 1 1 0 UQINCP (scalar) — 64-bit
1 0 0 0 SQDECP (scalar) — 32-bit
1 0 1 0 SQDECP (scalar) — 64-bit
1 1 0 0 UQDECP (scalar) — 32-bit
1 1 1 0 UQDECP (scalar) — 64-bit
SVE inc/dec vector by predicate count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 0 opc2 Pm Zdn
Decode fields
Instruction Details
op D opc2
0 01 UNALLOCATED
0 1x UNALLOCATED
0 0 00 INCP (vector)
0 1 00 DECP (vector)
1 UNALLOCATED
SVE inc/dec register by predicate count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 1 opc2 Pm Rdn
Decode fields
Instruction Details
op D opc2
0 01 UNALLOCATED
0 1x UNALLOCATED
Page 2437
Decode fields
Instruction Details
op D opc2
0 0 00 INCP (scalar)
0 1 00 DECP (scalar)
1 UNALLOCATED
SVE Write FFR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 op1 1001 op2 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
0 00 000 00000 SVE FFR write from predicate
1 00 000 0000 00000 SVE FFR initialise
1 00 000 1xxx 00000 UNALLOCATED
1 00 000 x1xx 00000 UNALLOCATED
1 00 000 xx1x 00000 UNALLOCATED
1 00 000 xxx1 00000 UNALLOCATED
!= 00 UNALLOCATED
SVE FFR write from predicate
These instructions are under SVE Write FFR.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0
Decode fields
Instruction Details
opc
00 WRFFR
01 UNALLOCATED
1x UNALLOCATED
SVE FFR initialise
These instructions are under SVE Write FFR.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Decode fields
Instruction Details
opc
00 SETFFR
01 UNALLOCATED
1x UNALLOCATED
SVE Integer Multiply-Add - Unpredicated
Page 2438
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 0 0 op0 op1 op2
Decode fields
Instruction details
op0 op1 op2
0 000 SVE integer dot product (unpredicated)
1 0xx UNALLOCATED
1 10x UNALLOCATED
1 110 UNALLOCATED
1 111 0 SVE mixed sign dot product
1 111 1 UNALLOCATED
SVE integer dot product (unpredicated)
These instructions are under SVE Integer Multiply-Add - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 U Zn Zda
Decode fields
Instruction Details
U
0 SDOT (vectors)
1 UDOT (vectors)
SVE mixed sign dot product
These instructions are under SVE Integer Multiply-Add - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 1 1 1 1 0 Zn Zda
Decode fields
Instruction Details
size
0x UNALLOCATED
10 USDOT (vectors)
11 UNALLOCATED
SVE Multiply - Indexed
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 1 op0 op1
Decode fields
Instruction details
op0 op1
000 00 SVE integer dot product (indexed)
000 01 UNALLOCATED
000 10 UNALLOCATED
000 11 SVE mixed sign dot product (indexed)
!= 000 UNALLOCATED
SVE integer dot product (indexed)
These instructions are under SVE Multiply - Indexed.
Page 2439
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 0 0 U Zn Zda
Decode fields
Instruction Details
size U
0x UNALLOCATED
10 0 SDOT (indexed) — 32-bit
10 1 UDOT (indexed) — 32-bit
11 0 SDOT (indexed) — 64-bit
11 1 UDOT (indexed) — 64-bit
SVE mixed sign dot product (indexed)
These instructions are under SVE Multiply - Indexed.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 1 1 U Zn Zda
Decode fields
Instruction Details
size U
0x UNALLOCATED
10 0 USDOT (indexed)
10 1 SUDOT
11 UNALLOCATED
SVE Misc
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000101 0 10 op0
Decode fields
Instruction details
op0
00xx UNALLOCATED
010x UNALLOCATED
0110 SVE integer matrix multiply accumulate
0111 UNALLOCATED
1xxx UNALLOCATED
SVE integer matrix multiply accumulate
These instructions are under SVE Misc.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 uns 0 Zm 1 0 0 1 1 0 Zn Zd
Decode fields
Instruction Details
uns
00 SMMLA
01 UNALLOCATED
10 USMMLA
11 UMMLA
Page 2440
SVE floating-point convert precision odd elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 0 0 1 0 opc2 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc opc2
0x UNALLOCATED
10 0x UNALLOCATED
10 10 BFCVTNT
10 11 UNALLOCATED
11 UNALLOCATED
SVE floating-point multiply-add (indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 0 0 op Zn Zda
Decode fields
Instruction Details
size op
0x 0 FMLA (indexed) — half-precision
0x 1 FMLS (indexed) — half-precision
10 0 FMLA (indexed) — single-precision
10 1 FMLS (indexed) — single-precision
11 0 FMLA (indexed) — double-precision
11 1 FMLS (indexed) — double-precision
SVE floating-point complex multiply-add (indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 1 rot Zn Zda
Decode fields
Instruction Details
size
0x UNALLOCATED
10 FCMLA (indexed) — half-precision
11 FCMLA (indexed) — single-precision
SVE floating-point multiply (indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 1 0 0 0 Zn Zd
Decode fields
Instruction Details
size
0x FMUL (indexed) — half-precision
10 FMUL (indexed) — single-precision
11 FMUL (indexed) — double-precision
Page 2441
SVE Floating Point Widening Multiply-Add - Indexed
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 01 op1 0 op2
Decode fields
Instruction details
op0 op1 op2
0 0 00 SVE BFloat16 floating-point dot product (indexed)
0 1 UNALLOCATED
1 SVE floating-point multiply-add long (indexed)
SVE floating-point multiply-add long (indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 i3h Zm 0 1 op 0 i3l T Zn Zda
Decode fields
Instruction Details
o2 op T
0 UNALLOCATED
1 0 0 BFMLALB (indexed)
1 0 1 BFMLALT (indexed)
1 1 UNALLOCATED
SVE BFloat16 floating-point dot product (indexed)
These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 i2 Zm 0 1 0 0 0 0 Zn Zda
Decode fields
Instruction Details
op
0 UNALLOCATED
1 BFDOT (indexed)
SVE floating-point multiply-add long (indexed)
These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 i3h Zm 0 1 op 0 i3l T Zn Zda
Decode fields
Instruction Details
o2 op T
0 UNALLOCATED
1 0 0 BFMLALB (indexed)
1 0 1 BFMLALT (indexed)
1 1 UNALLOCATED
Page 2442
SVE Floating Point Widening Multiply-Add
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 10 op1 00 op2
Decode fields
Instruction details
op0 op1 op2
0 0 0 SVE BFloat16 floating-point dot product
0 0 1 UNALLOCATED
0 1 UNALLOCATED
1 SVE floating-point multiply-add long
SVE floating-point multiply-add long
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 Zm 1 0 op 0 0 T Zn Zda
Decode fields
Instruction Details
o2 op T
0 UNALLOCATED
1 0 0 BFMLALB (vectors)
1 0 1 BFMLALT (vectors)
1 1 UNALLOCATED
SVE BFloat16 floating-point dot product
These instructions are under SVE Floating Point Widening Multiply-Add.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 Zm 1 0 0 0 0 0 Zn Zda
Decode fields
Instruction Details
op
0 UNALLOCATED
1 BFDOT (vectors)
SVE floating-point multiply-add long
These instructions are under SVE Floating Point Widening Multiply-Add.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 Zm 1 0 op 0 0 T Zn Zda
Decode fields
Instruction Details
o2 op T
0 UNALLOCATED
1 0 0 BFMLALB (vectors)
1 0 1 BFMLALT (vectors)
1 1 UNALLOCATED
SVE floating point matrix multiply accumulate
Page 2443
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 1 Zm 1 1 1 0 0 1 Zn Zda
Decode fields
Instruction Details
opc
00 UNALLOCATED
01 BFMMLA
10 FMMLA — 32-bit element
11 FMMLA — 64-bit element
SVE floating-point compare vectors
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm op 1 o2 Pg Zn o3 Pd
Decode fields
Instruction Details
op o2 o3
0 0 0 FCM<cc> (vectors) — FCMGE
0 0 1 FCM<cc> (vectors) — FCMGT
0 1 0 FCM<cc> (vectors) — FCMEQ
0 1 1 FCM<cc> (vectors) — FCMNE
1 0 0 FCM<cc> (vectors) — FCMUO
1 0 1 FAC<cc> — FACGE
1 1 0 UNALLOCATED
1 1 1 FAC<cc> — FACGT
SVE floating-point arithmetic (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
000 FADD (vectors, unpredicated)
001 FSUB (vectors, unpredicated)
010 FMUL (vectors, unpredicated)
011 FTSMUL
10x UNALLOCATED
110 FRECPS
111 FRSQRTS
SVE Floating Point Arithmetic - Predicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 0 op0 100 op1 op2
Decode fields
Instruction details
op0 op1 op2
0x SVE floating-point arithmetic (predicated)
Page 2444
10 000 FTMAD
11 0000 SVE floating-point arithmetic with immediate (predicated)
SVE floating-point arithmetic (predicated)
These instructions are under SVE Floating Point Arithmetic - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 opc 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
0000 FADD (vectors, predicated)
0001 FSUB (vectors, predicated)
0010 FMUL (vectors, predicated)
0011 FSUBR (vectors)
0100 FMAXNM (vectors)
0101 FMINNM (vectors)
0110 FMAX (vectors)
0111 FMIN (vectors)
1000 FABD
1001 FSCALE
1010 FMULX
1011 UNALLOCATED
1100 FDIVR
1101 FDIV
111x UNALLOCATED
SVE floating-point arithmetic with immediate (predicated)
These instructions are under SVE Floating Point Arithmetic - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 opc 1 0 0 Pg 0 0 0 0 i1 Zdn
Decode fields
Instruction Details
opc
000 FADD (immediate)
001 FSUB (immediate)
010 FMUL (immediate)
011 FSUBR (immediate)
100 FMAXNM (immediate)
101 FMINNM (immediate)
110 FMAX (immediate)
111 FMIN (immediate)
SVE Floating Point Unary Operations - Predicated
Page 2445
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 0 op0 101
Decode fields
Instruction details
op0
00x SVE floating-point round to integral value
010 SVE floating-point convert precision
011 SVE floating-point unary operations
10x SVE integer convert to floating-point
11x SVE floating-point convert to integer
SVE floating-point round to integral value
These instructions are under SVE Floating Point Unary Operations - Predicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 opc 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
000 FRINT<r> — nearest with ties to even
001 FRINT<r> — toward plus infinity
010 FRINT<r> — toward minus infinity
011 FRINT<r> — toward zero
100 FRINT<r> — nearest with ties to away
101 UNALLOCATED
110 FRINT<r> — current mode signalling inexact
111 FRINT<r> — current mode
SVE floating-point convert precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 opc 0 0 1 0 opc2 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc opc2
0x UNALLOCATED
10 00 FCVT — single-precision to half-precision
10 01 FCVT — half-precision to single-precision
10 10 BFCVT
10 11 UNALLOCATED
11 00 FCVT — double-precision to half-precision
11 01 FCVT — half-precision to double-precision
11 10 FCVT — double-precision to single-precision
11 11 FCVT — single-precision to double-precision
SVE floating-point unary operations
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 opc 1 0 1 Pg Zn Zd
Page 2446
Decode fields
Instruction Details
opc
00 FRECPX
01 FSQRT
1x UNALLOCATED
SVE integer convert to floating-point
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 opc 0 1 0 opc2 U 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc opc2 U
00 UNALLOCATED
01 00 UNALLOCATED
01 01 0 SCVTF — 16-bit to half-precision
01 01 1 UCVTF — 16-bit to half-precision
10 0x UNALLOCATED
10 10 0 SCVTF — 32-bit to single-precision
10 10 1 UCVTF — 32-bit to single-precision
10 11 UNALLOCATED
11 00 0 SCVTF — 32-bit to double-precision
11 00 1 UCVTF — 32-bit to double-precision
11 01 UNALLOCATED
11 10 0 SCVTF — 64-bit to single-precision
11 10 1 UCVTF — 64-bit to single-precision
11 11 0 SCVTF — 64-bit to double-precision
11 11 1 UCVTF — 64-bit to double-precision
SVE floating-point convert to integer
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 opc 0 1 1 opc2 U 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc opc2 U
00 UNALLOCATED
01 00 UNALLOCATED
01 01 0 FCVTZS — half-precision to 16-bit
01 01 1 FCVTZU — half-precision to 16-bit
Page 2447
Decode fields
Instruction Details
opc opc2 U
10 0x UNALLOCATED
10 10 0 FCVTZS — single-precision to 32-bit
10 10 1 FCVTZU — single-precision to 32-bit
10 11 UNALLOCATED
11 00 0 FCVTZS — double-precision to 32-bit
11 00 1 FCVTZU — double-precision to 32-bit
11 01 UNALLOCATED
11 10 0 FCVTZS — single-precision to 64-bit
11 10 1 FCVTZU — single-precision to 64-bit
11 11 0 FCVTZS — double-precision to 64-bit
11 11 1 FCVTZU — double-precision to 64-bit
SVE floating-point recursive reduction
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 opc 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
opc
000 FADDV
001 UNALLOCATED
01x UNALLOCATED
100 FMAXNMV
101 FMINNMV
110 FMAXV
111 FMINV
SVE Floating Point Unary Operations - Unpredicated
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 001 0011 op0
Decode fields
Instruction details
op0
00 SVE floating-point reciprocal estimate (unpredicated)
!= 00 UNALLOCATED
SVE floating-point reciprocal estimate (unpredicated)
These instructions are under SVE Floating Point Unary Operations - Unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 opc 0 0 1 1 0 0 Zn Zd
Decode fields
Instruction Details
opc
0xx UNALLOCATED
10x UNALLOCATED
Page 2448
Decode fields
Instruction Details
opc
110 FRECPE
111 FRSQRTE
SVE Floating Point Compare - with Zero
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 010 op0 001
Decode fields
Instruction details
op0
0 SVE floating-point compare with zero
1 UNALLOCATED
SVE floating-point compare with zero
These instructions are under SVE Floating Point Compare - with Zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 eq lt 0 0 1 Pg Zn ne Pd
Decode fields
Instruction Details
eq lt ne
0 0 0 FCM<cc> (zero) — FCMGE
0 0 1 FCM<cc> (zero) — FCMGT
0 1 0 FCM<cc> (zero) — FCMLT
0 1 1 FCM<cc> (zero) — FCMLE
1 1 UNALLOCATED
1 0 0 FCM<cc> (zero) — FCMEQ
1 1 0 FCM<cc> (zero) — FCMNE
SVE floating-point serial reduction (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 opc 0 0 1 Pg Zm Vdn
Decode fields
Instruction Details
opc
000 FADDA
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
SVE Floating Point Multiply-Add
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 1 op0
Page 2449
Decode fields
Instruction details
op0
0 SVE floating-point multiply-accumulate writing addend
1 SVE floating-point multiply-accumulate writing multiplicand
SVE floating-point multiply-accumulate writing addend
These instructions are under SVE Floating Point Multiply-Add.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 opc Pg Zn Zda
Decode fields
Instruction Details
opc
00 FMLA (vectors)
01 FMLS (vectors)
10 FNMLA
11 FNMLS
SVE floating-point multiply-accumulate writing multiplicand
These instructions are under SVE Floating Point Multiply-Add.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 opc Pg Zm Zdn
Decode fields
Instruction Details
opc
00 FMAD
01 FMSB
10 FNMAD
11 FNMSB
SVE Memory - 32-bit Gather and Unsized Contiguous
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1000010 op0 op1 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
00 x1 0xx 0 SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
00 x1 0xx 1 UNALLOCATED
01 x1 0xx SVE 32-bit gather load halfwords (scalar plus 32-bit scaled offsets)
10 x1 0xx SVE 32-bit gather load words (scalar plus 32-bit scaled offsets)
11 0x 000 0 LDR (predicate)
11 0x 010 LDR (vector)
11 0x 0x1 UNALLOCATED
11 1x 0xx 0 SVE contiguous prefetch (scalar plus immediate)
11 1x 0xx 1 UNALLOCATED
!= 11 x0 0xx SVE 32-bit gather load (scalar plus 32-bit unscaled offsets)
00 10x UNALLOCATED
Page 2450
00 110 0 SVE contiguous prefetch (scalar plus scalar)

00 111 0 SVE 32-bit gather prefetch (vector plus immediate)
00 11x 1 UNALLOCATED
01 1xx SVE 32-bit gather load (vector plus immediate)
1x 1xx SVE load and broadcast element
SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
These instructions are under SVE Memory - 32-bit Gather and Unsized Contiguous.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 msz Pg Rn 0 prfop
Decode fields
Instruction Details
msz
00 PRFB (scalar plus vector)
01 PRFH (scalar plus vector)
10 PRFW (scalar plus vector)
11 PRFD (scalar plus vector)
SVE 32-bit gather load halfwords (scalar plus 32-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 U ff Pg Rn Zt
Decode fields
Instruction Details
U ff
0 0 LD1SH (scalar plus vector)
0 1 LDFF1SH (scalar plus vector)
1 0 LD1H (scalar plus vector)
1 1 LDFF1H (scalar plus vector)
SVE 32-bit gather load words (scalar plus 32-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 U ff Pg Rn Zt
Decode fields
Instruction Details
U ff
0 UNALLOCATED
1 0 LD1W (scalar plus vector)
1 1 LDFF1W (scalar plus vector)
SVE contiguous prefetch (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 msz Pg Rn 0 prfop
Page 2451
Decode fields
Instruction Details
msz
00 PRFB (scalar plus immediate)
01 PRFH (scalar plus immediate)
10 PRFW (scalar plus immediate)
11 PRFD (scalar plus immediate)
SVE 32-bit gather load (scalar plus 32-bit unscaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 != 11 xs 0 Zm 0 U ff Pg Rn Zt
opc
Decode fields
Instruction Details
opc U ff
00 0 0 LD1SB (scalar plus vector)
00 0 1 LDFF1SB (scalar plus vector)
00 1 0 LD1B (scalar plus vector)
00 1 1 LDFF1B (scalar plus vector)
01 0 0 LD1SH (scalar plus vector)
01 0 1 LDFF1SH (scalar plus vector)
01 1 0 LD1H (scalar plus vector)
01 1 1 LDFF1H (scalar plus vector)
10 0 UNALLOCATED
10 1 0 LD1W (scalar plus vector)
10 1 1 LDFF1W (scalar plus vector)
SVE contiguous prefetch (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 msz 0 0 Rm 1 1 0 Pg Rn 0 prfop
Decode fields
Instruction Details
msz
00 PRFB (scalar plus scalar)
01 PRFH (scalar plus scalar)
10 PRFW (scalar plus scalar)
11 PRFD (scalar plus scalar)
SVE 32-bit gather prefetch (vector plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 msz 0 0 imm5 1 1 1 Pg Zn 0 prfop
Page 2452
Decode fields
Instruction Details
msz
00 PRFB (vector plus immediate)
01 PRFH (vector plus immediate)
10 PRFW (vector plus immediate)
11 PRFD (vector plus immediate)
SVE 32-bit gather load (vector plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 msz 0 1 imm5 1 U ff Pg Zn Zt
Decode fields
Instruction Details
msz U ff
00 0 0 LD1SB (vector plus immediate)
00 0 1 LDFF1SB (vector plus immediate)
00 1 0 LD1B (vector plus immediate)
00 1 1 LDFF1B (vector plus immediate)
01 0 0 LD1SH (vector plus immediate)
01 0 1 LDFF1SH (vector plus immediate)
01 1 0 LD1H (vector plus immediate)
01 1 1 LDFF1H (vector plus immediate)
10 0 UNALLOCATED
10 1 0 LD1W (vector plus immediate)
10 1 1 LDFF1W (vector plus immediate)
11 UNALLOCATED
SVE load and broadcast element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 dtypeh 1 imm6 1 dtypel Pg Rn Zt
Decode fields
Instruction Details
dtypeh dtypel
00 00 LD1RB — 8-bit element
01 00 LD1RSW
01 01 LD1RH — 16-bit element
10 00 LD1RSH — 64-bit element
10 01 LD1RSH — 32-bit element
10 10 LD1RW — 32-bit element
10 11 LD1RW — 64-bit element
11 00 LD1RSB — 64-bit element
Page 2453
Decode fields
Instruction Details
dtypeh dtypel
11 11 LD1RD
SVE Memory - Contiguous Load
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1010010 op0 op1 op2
Decode fields
Instruction details
op0 op1 op2
00 0 111 SVE contiguous non-temporal load (scalar plus immediate)
00 110 SVE contiguous non-temporal load (scalar plus scalar)
!= 00 0 111 SVE load multiple structures (scalar plus immediate)
!= 00 110 SVE load multiple structures (scalar plus scalar)
0 001 SVE load and broadcast quadword (scalar plus immediate)
0 101 SVE contiguous load (scalar plus immediate)
1 001 UNALLOCATED
1 101 SVE contiguous non-fault load (scalar plus immediate)
1 111 UNALLOCATED
000 SVE load and broadcast quadword (scalar plus scalar)
010 SVE contiguous load (scalar plus scalar)
011 SVE contiguous first-fault load (scalar plus scalar)
100 UNALLOCATED
SVE contiguous non-temporal load (scalar plus immediate)
These instructions are under SVE Memory - Contiguous Load.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz 0 0 0 imm4 1 1 1 Pg Rn Zt
Decode fields
Instruction Details
msz
00 LDNT1B (scalar plus immediate)
01 LDNT1H (scalar plus immediate)
10 LDNT1W (scalar plus immediate)
11 LDNT1D (scalar plus immediate)
SVE contiguous non-temporal load (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz 0 0 Rm 1 1 0 Pg Rn Zt
Decode fields
Instruction Details
msz
00 LDNT1B (scalar plus scalar)
01 LDNT1H (scalar plus scalar)
10 LDNT1W (scalar plus scalar)
Page 2454
Decode fields
Instruction Details
msz
11 LDNT1D (scalar plus scalar)
SVE load multiple structures (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz != 00 0 imm4 1 1 1 Pg Rn Zt
opc
Decode fields
Instruction Details
msz opc
00 01 LD2B (scalar plus immediate)
01 01 LD2H (scalar plus immediate)
10 01 LD2W (scalar plus immediate)
11 01 LD2D (scalar plus immediate)
SVE load multiple structures (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz != 00 Rm 1 1 0 Pg Rn Zt
opc
Decode fields
Instruction Details
msz opc
00 01 LD2B (scalar plus scalar)
01 01 LD2H (scalar plus scalar)
10 01 LD2W (scalar plus scalar)
11 01 LD2D (scalar plus scalar)
Page 2455
SVE load and broadcast quadword (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz ssz 0 imm4 0 0 1 Pg Rn Zt
Decode fields
Instruction Details
msz ssz
1x UNALLOCATED
00 00 LD1RQB (scalar plus immediate)
00 01 LD1ROB (scalar plus immediate)
01 00 LD1RQH (scalar plus immediate)
01 01 LD1ROH (scalar plus immediate)
10 00 LD1RQW (scalar plus immediate)
10 01 LD1ROW (scalar plus immediate)
11 00 LD1RQD (scalar plus immediate)
11 01 LD1ROD (scalar plus immediate)
SVE contiguous load (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 dtype 0 imm4 1 0 1 Pg Rn Zt
Decode fields
Instruction Details
dtype
0000 LD1B (scalar plus immediate) — 8-bit element
0100 LD1SW (scalar plus immediate)
0101 LD1H (scalar plus immediate) — 16-bit element
1000 LD1SH (scalar plus immediate) — 64-bit element
1001 LD1SH (scalar plus immediate) — 32-bit element
1010 LD1W (scalar plus immediate) — 32-bit element
1011 LD1W (scalar plus immediate) — 64-bit element
1100 LD1SB (scalar plus immediate) — 64-bit element
1111 LD1D (scalar plus immediate)
SVE contiguous non-fault load (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 dtype 1 imm4 1 0 1 Pg Rn Zt
Decode fields
Instruction Details
dtype
0000 LDNF1B — 8-bit element
Page 2456
Decode fields
Instruction Details
dtype
0100 LDNF1SW
0101 LDNF1H — 16-bit element
1000 LDNF1SH — 64-bit element
1001 LDNF1SH — 32-bit element
1010 LDNF1W — 32-bit element
1011 LDNF1W — 64-bit element
1100 LDNF1SB — 64-bit element
1111 LDNF1D
SVE load and broadcast quadword (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 msz ssz Rm 0 0 0 Pg Rn Zt
Decode fields
Instruction Details
msz ssz
1x UNALLOCATED
00 00 LD1RQB (scalar plus scalar)
00 01 LD1ROB (scalar plus scalar)
01 00 LD1RQH (scalar plus scalar)
01 01 LD1ROH (scalar plus scalar)
10 00 LD1RQW (scalar plus scalar)
10 01 LD1ROW (scalar plus scalar)
11 00 LD1RQD (scalar plus scalar)
11 01 LD1ROD (scalar plus scalar)
SVE contiguous load (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 dtype Rm 0 1 0 Pg Rn Zt
Decode fields
Instruction Details
dtype
0000 LD1B (scalar plus scalar) — 8-bit element
0100 LD1SW (scalar plus scalar)
0101 LD1H (scalar plus scalar) — 16-bit element
Page 2457
Decode fields
Instruction Details
dtype
1000 LD1SH (scalar plus scalar) — 64-bit element
1001 LD1SH (scalar plus scalar) — 32-bit element
1010 LD1W (scalar plus scalar) — 32-bit element
1011 LD1W (scalar plus scalar) — 64-bit element
1100 LD1SB (scalar plus scalar) — 64-bit element
1111 LD1D (scalar plus scalar)
SVE contiguous first-fault load (scalar plus scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 dtype Rm 0 1 1 Pg Rn Zt
Decode fields
Instruction Details
dtype
0000 LDFF1B (scalar plus scalar) — 8-bit element
0100 LDFF1SW (scalar plus scalar)
0101 LDFF1H (scalar plus scalar) — 16-bit element
1000 LDFF1SH (scalar plus scalar) — 64-bit element
1001 LDFF1SH (scalar plus scalar) — 32-bit element
1010 LDFF1W (scalar plus scalar) — 32-bit element
1011 LDFF1W (scalar plus scalar) — 64-bit element
1100 LDFF1SB (scalar plus scalar) — 64-bit element
1111 LDFF1D (scalar plus scalar)
SVE Memory - 64-bit Gather
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1100010 op0 op1 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
00 01 0xx 1 UNALLOCATED
00 11 1xx 0 SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
00 11 1 UNALLOCATED
00 x1 0xx 0 SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
Page 2458
!= 00 11 1xx SVE 64-bit gather load (scalar plus 64-bit scaled offsets)
!= 00 x1 0xx SVE 64-bit gather load (scalar plus 32-bit unpacked scaled offsets)
00 10x UNALLOCATED
00 110 UNALLOCATED
00 111 0 SVE 64-bit gather prefetch (vector plus immediate)
01 1xx SVE 64-bit gather load (vector plus immediate)
10 1xx SVE 64-bit gather load (scalar plus 64-bit unscaled offsets)
x0 0xx SVE 64-bit gather load (scalar plus unpacked 32-bit unscaled offsets)
SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
These instructions are under SVE Memory - 64-bit Gather.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 msz Pg Rn 0 prfop
Decode fields
Instruction Details
msz
SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 msz Pg Rn 0 prfop
Decode fields
Instruction Details
msz
SVE 64-bit gather load (scalar plus 64-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 != 00 1 1 Zm 1 U ff Pg Rn Zt
opc
Decode fields
Instruction Details
opc U ff
Page 2459
Decode fields
Instruction Details
opc U ff
10 0 0 LD1SW (scalar plus vector)
10 0 1 LDFF1SW (scalar plus vector)
11 0 UNALLOCATED
11 1 0 LD1D (scalar plus vector)
11 1 1 LDFF1D (scalar plus vector)
SVE 64-bit gather load (scalar plus 32-bit unpacked scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 != 00 xs 1 Zm 0 U ff Pg Rn Zt
opc
Decode fields
Instruction Details
opc U ff
11 0 UNALLOCATED
SVE 64-bit gather prefetch (vector plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 msz 0 0 imm5 1 1 1 Pg Zn 0 prfop
Decode fields
Instruction Details
msz
00 PRFB (vector plus immediate)
01 PRFH (vector plus immediate)
10 PRFW (vector plus immediate)
11 PRFD (vector plus immediate)
SVE 64-bit gather load (vector plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 msz 0 1 imm5 1 U ff Pg Zn Zt
Page 2460
Decode fields
Instruction Details
msz U ff
00 0 0 LD1SB (vector plus immediate)
00 0 1 LDFF1SB (vector plus immediate)
00 1 0 LD1B (vector plus immediate)
00 1 1 LDFF1B (vector plus immediate)
01 0 0 LD1SH (vector plus immediate)
01 0 1 LDFF1SH (vector plus immediate)
01 1 0 LD1H (vector plus immediate)
01 1 1 LDFF1H (vector plus immediate)
10 0 0 LD1SW (vector plus immediate)
10 0 1 LDFF1SW (vector plus immediate)
10 1 0 LD1W (vector plus immediate)
10 1 1 LDFF1W (vector plus immediate)
11 0 UNALLOCATED
11 1 0 LD1D (vector plus immediate)
11 1 1 LDFF1D (vector plus immediate)
SVE 64-bit gather load (scalar plus 64-bit unscaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 msz 1 0 Zm 1 U ff Pg Rn Zt
Decode fields
Instruction Details
msz U ff
11 0 UNALLOCATED
SVE 64-bit gather load (scalar plus unpacked 32-bit unscaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 msz xs 0 Zm 0 U ff Pg Rn Zt
Page 2461
Decode fields
Instruction Details
msz U ff
11 0 UNALLOCATED
SVE Memory - Contiguous Store and Unsized Contiguous
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1110010 op0 0 op1 0 op2
Decode fields
Instruction details
op0 op1 op2
0xx 0 UNALLOCATED
10x 0 UNALLOCATED
110 0 0 STR (predicate)
110 0 1 UNALLOCATED
110 1 STR (vector)
111 0 UNALLOCATED
!= 110 1 SVE contiguous store (scalar plus scalar)
SVE contiguous store (scalar plus scalar)
These instructions are under SVE Memory - Contiguous Store and Unsized Contiguous.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 != 110 o2 Rm 0 1 0 Pg Rn Zt
opc
Decode fields
Instruction Details
opc o2
00x ST1B (scalar plus scalar)
01x ST1H (scalar plus scalar)
10x ST1W (scalar plus scalar)
111 0 UNALLOCATED
111 1 ST1D (scalar plus scalar)
Page 2462
SVE Memory - Non-temporal and Multi-register Store
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1110010 op0 0 op1 1
Decode fields
Instruction details
op0 op1
00 1 SVE contiguous non-temporal store (scalar plus scalar)
!= 00 1 SVE store multiple structures (scalar plus scalar)
0 UNALLOCATED
SVE contiguous non-temporal store (scalar plus scalar)
These instructions are under SVE Memory - Non-temporal and Multi-register Store.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 0 Rm 0 1 1 Pg Rn Zt
Decode fields
Instruction Details
msz
00 STNT1B (scalar plus scalar)
01 STNT1H (scalar plus scalar)
10 STNT1W (scalar plus scalar)
11 STNT1D (scalar plus scalar)
SVE store multiple structures (scalar plus scalar)
These instructions are under SVE Memory - Non-temporal and Multi-register Store.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz != 00 Rm 0 1 1 Pg Rn Zt
opc
Decode fields
Instruction Details
msz opc
00 01 ST2B (scalar plus scalar)
01 01 ST2H (scalar plus scalar)
10 01 ST2W (scalar plus scalar)
SVE Memory - Scatter with Optional Sign Extend
Page 2463
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1110010 op0 1 0
Decode fields
Instruction details
op0
00 SVE 64-bit scatter store (scalar plus unpacked 32-bit unscaled offsets)
01 SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offsets)
10 SVE 32-bit scatter store (scalar plus 32-bit unscaled offsets)
11 SVE 32-bit scatter store (scalar plus 32-bit scaled offsets)
SVE 64-bit scatter store (scalar plus unpacked 32-bit unscaled offsets)
These instructions are under SVE Memory - Scatter with Optional Sign Extend.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 0 Zm 1 xs 0 Pg Rn Zt
Decode fields
Instruction Details
msz
00 ST1B (scalar plus vector)
01 ST1H (scalar plus vector)
10 ST1W (scalar plus vector)
11 ST1D (scalar plus vector)
SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 1 Zm 1 xs 0 Pg Rn Zt
Decode fields
Instruction Details
msz
00 UNALLOCATED
SVE 32-bit scatter store (scalar plus 32-bit unscaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 1 0 Zm 1 xs 0 Pg Rn Zt
Decode fields
Instruction Details
msz
11 UNALLOCATED
SVE 32-bit scatter store (scalar plus 32-bit scaled offsets)
Page 2464
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 1 1 Zm 1 xs 0 Pg Rn Zt
Decode fields
Instruction Details
msz
00 UNALLOCATED
11 UNALLOCATED
SVE Memory - Scatter
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1110010 op0 101
Decode fields
Instruction details
op0
00 SVE 64-bit scatter store (scalar plus 64-bit unscaled offsets)
01 SVE 64-bit scatter store (scalar plus 64-bit scaled offsets)
10 SVE 64-bit scatter store (vector plus immediate)
11 SVE 32-bit scatter store (vector plus immediate)
SVE 64-bit scatter store (scalar plus 64-bit unscaled offsets)
These instructions are under SVE Memory - Scatter.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 0 Zm 1 0 1 Pg Rn Zt
Decode fields
Instruction Details
msz
SVE 64-bit scatter store (scalar plus 64-bit scaled offsets)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 1 Zm 1 0 1 Pg Rn Zt
Decode fields
Instruction Details
msz
00 UNALLOCATED
SVE 64-bit scatter store (vector plus immediate)
Page 2465
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 1 0 imm5 1 0 1 Pg Zn Zt
Decode fields
Instruction Details
msz
00 ST1B (vector plus immediate)
01 ST1H (vector plus immediate)
10 ST1W (vector plus immediate)
11 ST1D (vector plus immediate)
SVE 32-bit scatter store (vector plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 1 1 imm5 1 0 1 Pg Zn Zt
Decode fields
Instruction Details
msz
00 ST1B (vector plus immediate)
01 ST1H (vector plus immediate)
10 ST1W (vector plus immediate)
11 UNALLOCATED
SVE Memory - Contiguous Store with Immediate Offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1110010 op0 op1 111
Decode fields
Instruction details
op0 op1
00 1 SVE contiguous non-temporal store (scalar plus immediate)
!= 00 1 SVE store multiple structures (scalar plus immediate)
0 SVE contiguous store (scalar plus immediate)
SVE contiguous non-temporal store (scalar plus immediate)
These instructions are under SVE Memory - Contiguous Store with Immediate Offset.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz 0 0 1 imm4 1 1 1 Pg Rn Zt
Decode fields
Instruction Details
msz
00 STNT1B (scalar plus immediate)
01 STNT1H (scalar plus immediate)
10 STNT1W (scalar plus immediate)
11 STNT1D (scalar plus immediate)
SVE store multiple structures (scalar plus immediate)
Page 2466
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz != 00 1 imm4 1 1 1 Pg Rn Zt
opc
Decode fields
Instruction Details
msz opc
00 01 ST2B (scalar plus immediate)
01 01 ST2H (scalar plus immediate)
10 01 ST2W (scalar plus immediate)
11 01 ST2D (scalar plus immediate)
SVE contiguous store (scalar plus immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 msz size 0 imm4 1 1 1 Pg Rn Zt
Decode fields
Instruction Details
msz
00 ST1B (scalar plus immediate)
01 ST1H (scalar plus immediate)
10 ST1W (scalar plus immediate)
11 ST1D (scalar plus immediate)
Data Processing -- Immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
100 op0
Decode fields
Instruction details
op0
00x PC-rel. addressing
010 Add/subtract (immediate)
011 Add/subtract (immediate, with tags)
100 Logical (immediate)
101 Move wide (immediate)
110 Bitfield
111 Extract
Page 2467
PC-rel. addressing
These instructions are under Data Processing -- Immediate.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op immlo 1 0 0 0 0 immhi Rd
Decode fields
Instruction Details
op
0 ADR
1 ADRP
Add/subtract (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 0 0 0 1 0 sh imm12 Rn Rd
Decode fields
Instruction Details
sf op S
0 0 0 ADD (immediate) — 32-bit
0 0 1 ADDS (immediate) — 32-bit
0 1 0 SUB (immediate) — 32-bit
0 1 1 SUBS (immediate) — 32-bit
1 0 0 ADD (immediate) — 64-bit
1 0 1 ADDS (immediate) — 64-bit
1 1 0 SUB (immediate) — 64-bit
1 1 1 SUBS (immediate) — 64-bit
Add/subtract (immediate, with tags)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 0 0 0 1 1 o2 uimm6 op3 uimm4 Rn Rd
Decode fields
Instruction Details Architecture Version
sf op S o2
1 UNALLOCATED -
0 0 UNALLOCATED -
1 1 0 UNALLOCATED -
1 0 0 0 ADDG ARMv8.5-MemTag
1 1 0 0 SUBG ARMv8.5-MemTag
Logical (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf opc 1 0 0 1 0 0 N immr imms Rn Rd
Decode fields
Instruction Details
sf opc N
0 1 UNALLOCATED
0 00 0 AND (immediate) — 32-bit
Page 2468
Decode fields
Instruction Details
sf opc N
0 01 0 ORR (immediate) — 32-bit
0 10 0 EOR (immediate) — 32-bit
0 11 0 ANDS (immediate) — 32-bit
1 00 AND (immediate) — 64-bit
1 01 ORR (immediate) — 64-bit
1 10 EOR (immediate) — 64-bit
1 11 ANDS (immediate) — 64-bit
Move wide (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf opc 1 0 0 1 0 1 hw imm16 Rd
Decode fields
Instruction Details
sf opc hw
01 UNALLOCATED
0 1x UNALLOCATED
0 00 0x MOVN — 32-bit
0 10 0x MOVZ — 32-bit
0 11 0x MOVK — 32-bit
1 00 MOVN — 64-bit
1 10 MOVZ — 64-bit
1 11 MOVK — 64-bit
Bitfield
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf opc 1 0 0 1 1 0 N immr imms Rn Rd
Decode fields
Instruction Details
sf opc N
11 UNALLOCATED
0 1 UNALLOCATED
0 00 0 SBFM — 32-bit
0 01 0 BFM — 32-bit
0 10 0 UBFM — 32-bit
1 0 UNALLOCATED
1 00 1 SBFM — 64-bit
1 01 1 BFM — 64-bit
1 10 1 UBFM — 64-bit
Extract
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op21 1 0 0 1 1 1 N o0 Rm imms Rn Rd
Page 2469
Decode fields
Instruction Details
sf op21 N o0 imms
x1 UNALLOCATED
00 1 UNALLOCATED
1x UNALLOCATED
0 1xxxxx UNALLOCATED
0 1 UNALLOCATED
0 00 0 0 0xxxxx EXTR — 32-bit
1 0 UNALLOCATED
1 00 1 0 EXTR — 64-bit
Branches, Exception Generating and System instructions
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 101 op1 op2
Decode fields
Instruction details
op0 op1 op2
010 0xxxxxxxxxxxxx Conditional branch (immediate)
110 00xxxxxxxxxxxx Exception generation
110 01000000110010 11111 Hints
110 01000000110011 Barriers
110 0100000xxx0100 PSTATE
110 0100x01xxxxxxx System instructions
110 0100x1xxxxxxxx System register move
110 1xxxxxxxxxxxxx Unconditional branch (register)
x00 Unconditional branch (immediate)
x01 0xxxxxxxxxxxxx Compare and branch (immediate)
x01 1xxxxxxxxxxxxx Test and branch (immediate)
Conditional branch (immediate)
These instructions are under Branches, Exception Generating and System instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 o1 imm19 o0 cond
Decode fields
Instruction Details
o1 o0
0 0 B.cond
0 1 UNALLOCATED
1 UNALLOCATED
Exception generation
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 opc imm16 op2 LL
Page 2470
Decode fields
Instruction Details
opc op2 LL
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
000 000 01 SVC
000 000 10 HVC
000 000 11 SMC
001 000 x1 UNALLOCATED
001 000 00 BRK
010 000 x1 UNALLOCATED
010 000 00 HLT
100 000 UNALLOCATED
101 000 01 DCPS1
101 000 10 DCPS2
101 000 11 DCPS3
110 000 UNALLOCATED
111 000 UNALLOCATED
Hints
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 CRm op2 1 1 1 1 1
Decode fields
CRm op2
HINT -
0000 000 NOP -
0000 001 YIELD -
0000 010 WFE -
0000 011 WFI -
0000 100 SEV -
0000 101 SEVL -
0000 110 DGH ARMv8.0-DGH
0000 111 XPACD, XPACI, XPACLRI ARMv8.3-PAuth
0001 000 PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA — PACIA1716 ARMv8.3-PAuth
0001 010 PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB — PACIB1716 ARMv8.3-PAuth
0001 100 AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA — AUTIA1716 ARMv8.3-PAuth
0001 110 AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB — AUTIB1716 ARMv8.3-PAuth
0010 000 ESB RAS
0010 001 PSB CSYNC SPE
0010 010 TSB CSYNC ARMv8.4-Trace
0010 100 CSDB -
Page 2471
Decode fields
CRm op2
0011 000 PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA — PACIAZ ARMv8.3-PAuth
0011 001 PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA — PACIASP ARMv8.3-PAuth
0011 010 PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB — PACIBZ ARMv8.3-PAuth
0011 011 PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB — PACIBSP ARMv8.3-PAuth
0011 100 AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA — AUTIAZ ARMv8.3-PAuth
0011 101 AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA — AUTIASP ARMv8.3-PAuth
0011 110 AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB — AUTIBZ ARMv8.3-PAuth
0011 111 AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB — AUTIBSP ARMv8.3-PAuth
0100 xx0 BTI ARMv8.5-BTI
Barriers
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm op2 Rt
Decode fields
Instruction Details
CRm op2 Rt
000 UNALLOCATED
010 11111 CLREX
101 11111 DMB
110 11111 ISB
111 11111 SB
!= 0x00 100 11111 DSB
0000 100 11111 SSBB
001x 011 UNALLOCATED
01xx 011 UNALLOCATED
0100 100 11111 PSSBB
1xxx 011 UNALLOCATED
PSTATE
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 op1 0 1 0 0 CRm op2 Rt
Decode fields
op1 op2 Rt
!= 11111 UNALLOCATED -
11111 MSR (immediate) -
000 000 11111 CFINV ARMv8.4-CondM
000 001 11111 XAFLAG ARMv8.5-CondM
000 010 11111 AXFLAG ARMv8.5-CondM
Page 2472
System instructions
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 L 0 1 op1 CRn CRm op2 Rt
Decode fields
Instruction Details
L
0 SYS
1 SYSL
System register move
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 L 1 o0 op1 CRn CRm op2 Rt
Decode fields
Instruction Details
L
0 MSR (register)
1 MRS
Unconditional branch (register)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 opc op2 op3 Rn op4
Decode fields Architecture

Instruction Details
opc op2 op3 Rn op4 Version
!= UNALLOCATED -
11111
0000 11111 000000 != UNALLOCATED -
00000
0000 11111 000000 00000 BR -
0000 11111 000001 UNALLOCATED -
0000 11111 000010 != UNALLOCATED -
11111
0000 11111 000010 11111 BRAA, BRAAZ, BRAB, BRABZ — key A, ARMv8.3-PAuth
zero modifier
0000 11111 000011 != UNALLOCATED -
11111
0000 11111 000011 11111 BRAA, BRAAZ, BRAB, BRABZ — key B, ARMv8.3-PAuth
zero modifier
0000 11111 0001xx UNALLOCATED -
0000 11111 001xxx UNALLOCATED -
0000 11111 01xxxx UNALLOCATED -
0000 11111 1xxxxx UNALLOCATED -
0001 11111 000000 != UNALLOCATED -
00000
0001 11111 000000 00000 BLR -
0001 11111 000001 UNALLOCATED -
Page 2473

Instruction Details
0001 11111 000010 != UNALLOCATED -
11111
0001 11111 000010 11111 BLRAA, BLRAAZ, BLRAB, BLRABZ — key ARMv8.3-PAuth
A, zero modifier
0001 11111 000011 != UNALLOCATED -
11111
0001 11111 000011 11111 BLRAA, BLRAAZ, BLRAB, BLRABZ — key ARMv8.3-PAuth
B, zero modifier
0010 11111 000000 != UNALLOCATED -
00000
0010 11111 000000 00000 RET -
0010 11111 000001 UNALLOCATED -
0010 11111 000010 != != UNALLOCATED -
11111 11111
0010 11111 000010 11111 11111 RETAA, RETAB — RETAA ARMv8.3-PAuth
0010 11111 000011 != != UNALLOCATED -
11111 11111
0010 11111 000011 11111 11111 RETAA, RETAB — RETAB ARMv8.3-PAuth
0011 11111 UNALLOCATED -
0100 11111 000000 != != UNALLOCATED -
11111 00000
0100 11111 000000 != 00000 UNALLOCATED -
11111
0100 11111 000000 11111 != UNALLOCATED -
00000
0100 11111 000000 11111 00000 ERET -
0100 11111 000001 UNALLOCATED -
0100 11111 000010 != != UNALLOCATED -
11111 11111
0100 11111 000010 != 11111 UNALLOCATED -
11111
0100 11111 000010 11111 != UNALLOCATED -
11111
0100 11111 000010 11111 11111 ERETAA, ERETAB — ERETAA ARMv8.3-PAuth
0100 11111 000011 != != UNALLOCATED -
11111 11111
0100 11111 000011 != 11111 UNALLOCATED -
11111
0100 11111 000011 11111 != UNALLOCATED -
11111
Page 2474

Instruction Details
0100 11111 000011 11111 11111 ERETAA, ERETAB — ERETAB ARMv8.3-PAuth
0101 11111 != UNALLOCATED -
000000
0101 11111 000000 != != UNALLOCATED -
11111 00000
0101 11111 000000 != 00000 UNALLOCATED -
11111
0101 11111 000000 11111 != UNALLOCATED -
00000
0101 11111 000000 11111 00000 DRPS -
011x 11111 UNALLOCATED -
1000 11111 00000x UNALLOCATED -
1000 11111 000010 BRAA, BRAAZ, BRAB, BRABZ — key A, ARMv8.3-PAuth
register modifier
1000 11111 000011 BRAA, BRAAZ, BRAB, BRABZ — key B, ARMv8.3-PAuth
register modifier
1001 11111 00000x UNALLOCATED -
1001 11111 000010 BLRAA, BLRAAZ, BLRAB, BLRABZ — key ARMv8.3-PAuth
A, register modifier
1001 11111 000011 BLRAA, BLRAAZ, BLRAB, BLRABZ — key ARMv8.3-PAuth
B, register modifier
11xx 11111 UNALLOCATED -
Unconditional branch (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op 0 0 1 0 1 imm26
Decode fields
Instruction Details
op
0 B
1 BL
Compare and branch (immediate)
Page 2475
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 op imm19 Rt
Decode fields
Instruction Details
sf op
0 0 CBZ — 32-bit
0 1 CBNZ — 32-bit
1 0 CBZ — 64-bit
1 1 CBNZ — 64-bit
Test and branch (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 op b40 imm14 Rt
Decode fields
Instruction Details
op
0 TBZ
1 TBNZ
Loads and Stores
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 1 op1 0 op2 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
0x00 1 00 000000 Advanced SIMD load/store multiple structures
0x00 1 01 0xxxxx Advanced SIMD load/store multiple structures (post-indexed)
0x00 1 0x 1xxxxx UNALLOCATED
0x00 1 10 x00000 Advanced SIMD load/store single structure
0x00 1 11 Advanced SIMD load/store single structure (post-indexed)
0x00 1 x0 x1xxxx UNALLOCATED
0x00 1 x0 xx1xxx UNALLOCATED
0x00 1 x0 xxx1xx UNALLOCATED
0x00 1 x0 xxxx1x UNALLOCATED
0x00 1 x0 xxxxx1 UNALLOCATED
1101 0 1x 1xxxxx Load/store memory tags
1x00 1 UNALLOCATED
xx00 0 0x Load/store exclusive
xx01 0 1x 0xxxxx 00 LDAPR/STLR (unscaled immediate)
xx01 0x Load register (literal)
xx10 00 Load/store no-allocate pair (offset)
xx10 01 Load/store register pair (post-indexed)
xx10 10 Load/store register pair (offset)
xx10 11 Load/store register pair (pre-indexed)
xx11 0x 0xxxxx 00 Load/store register (unscaled immediate)
xx11 0x 0xxxxx 01 Load/store register (immediate post-indexed)
xx11 0x 0xxxxx 10 Load/store register (unprivileged)
xx11 0x 0xxxxx 11 Load/store register (immediate pre-indexed)
Page 2476
xx11 0x 1xxxxx 00 Atomic memory operations

xx11 0x 1xxxxx 10 Load/store register (register offset)
xx11 0x 1xxxxx x1 Load/store register (pac)
xx11 1x Load/store register (unsigned immediate)
Advanced SIMD load/store multiple structures
These instructions are under Loads and Stores.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 L 0 0 0 0 0 0 opcode size Rn Rt
Decode fields
Instruction Details
L opcode
0 0000 ST4 (multiple structures)
0 0001 UNALLOCATED
0 0010 ST1 (multiple structures) — four registers
0 0011 UNALLOCATED
0 0101 UNALLOCATED
0 0110 ST1 (multiple structures) — three registers
0 0111 ST1 (multiple structures) — one register
0 1001 UNALLOCATED
0 1010 ST1 (multiple structures) — two registers
0 1011 UNALLOCATED
0 11xx UNALLOCATED
1 0000 LD4 (multiple structures)
1 0001 UNALLOCATED
1 0010 LD1 (multiple structures) — four registers
1 0011 UNALLOCATED
1 0101 UNALLOCATED
1 0110 LD1 (multiple structures) — three registers
1 0111 LD1 (multiple structures) — one register
1 1001 UNALLOCATED
1 1010 LD1 (multiple structures) — two registers
1 1011 UNALLOCATED
1 11xx UNALLOCATED
Advanced SIMD load/store multiple structures (post-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 L 0 Rm opcode size Rn Rt
Decode fields
Instruction Details
L Rm opcode
0 0001 UNALLOCATED
0 0011 UNALLOCATED
Page 2477
Decode fields
Instruction Details
L Rm opcode
0 0101 UNALLOCATED
0 1001 UNALLOCATED
0 1011 UNALLOCATED
0 11xx UNALLOCATED
0 != 11111 0000 ST4 (multiple structures) — register offset
0 != 11111 0010 ST1 (multiple structures) — four registers, register offset
0 != 11111 0110 ST1 (multiple structures) — three registers, register offset
0 != 11111 0111 ST1 (multiple structures) — one register, register offset
0 != 11111 1010 ST1 (multiple structures) — two registers, register offset
0 11111 0000 ST4 (multiple structures) — immediate offset
0 11111 0010 ST1 (multiple structures) — four registers, immediate offset
0 11111 0110 ST1 (multiple structures) — three registers, immediate offset
0 11111 0111 ST1 (multiple structures) — one register, immediate offset
0 11111 1010 ST1 (multiple structures) — two registers, immediate offset
1 0001 UNALLOCATED
1 0011 UNALLOCATED
1 0101 UNALLOCATED
1 1001 UNALLOCATED
1 1011 UNALLOCATED
1 11xx UNALLOCATED
1 != 11111 0000 LD4 (multiple structures) — register offset
1 != 11111 0010 LD1 (multiple structures) — four registers, register offset
1 != 11111 0110 LD1 (multiple structures) — three registers, register offset
1 != 11111 0111 LD1 (multiple structures) — one register, register offset
1 != 11111 1010 LD1 (multiple structures) — two registers, register offset
1 11111 0000 LD4 (multiple structures) — immediate offset
1 11111 0010 LD1 (multiple structures) — four registers, immediate offset
1 11111 0110 LD1 (multiple structures) — three registers, immediate offset
1 11111 0111 LD1 (multiple structures) — one register, immediate offset
1 11111 1010 LD1 (multiple structures) — two registers, immediate offset
Advanced SIMD load/store single structure
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 L R 0 0 0 0 0 opcode S size Rn Rt
Decode fields
Instruction Details
L R opcode S size
0 11x UNALLOCATED
Page 2478
Decode fields
Instruction Details
L R opcode S size
0 0 000 ST1 (single structure) — 8-bit
0 0 010 x0 ST1 (single structure) — 16-bit
0 0 010 x1 UNALLOCATED
0 0 100 00 ST1 (single structure) — 32-bit
0 0 100 0 01 ST1 (single structure) — 64-bit
1 0 000 LD1 (single structure) — 8-bit
1 0 010 x0 LD1 (single structure) — 16-bit
1 0 100 00 LD1 (single structure) — 32-bit
1 0 100 0 01 LD1 (single structure) — 64-bit
1 0 110 0 LD1R
Page 2479
Decode fields
Instruction Details
L R opcode S size
1 0 111 0 LD3R
1 1 110 0 LD2R
1 1 111 0 LD4R
Advanced SIMD load/store single structure (post-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 L R Rm opcode S size Rn Rt
Decode fields
Instruction Details
L R Rm opcode S size
0 11x UNALLOCATED
0 0 != 11111 000 ST1 (single structure) — 8-bit, register offset
0 0 != 11111 010 x0 ST1 (single structure) — 16-bit, register offset
0 0 != 11111 100 00 ST1 (single structure) — 32-bit, register offset
0 0 != 11111 100 0 01 ST1 (single structure) — 64-bit, register offset
Page 2480
Decode fields
Instruction Details
0 0 11111 000 ST1 (single structure) — 8-bit, immediate offset
0 0 11111 010 x0 ST1 (single structure) — 16-bit, immediate offset
0 0 11111 100 00 ST1 (single structure) — 32-bit, immediate offset
0 0 11111 100 0 01 ST1 (single structure) — 64-bit, immediate offset
1 0 != 11111 000 LD1 (single structure) — 8-bit, register offset
1 0 != 11111 010 x0 LD1 (single structure) — 16-bit, register offset
1 0 != 11111 100 00 LD1 (single structure) — 32-bit, register offset
1 0 != 11111 100 0 01 LD1 (single structure) — 64-bit, register offset
Page 2481
Decode fields
Instruction Details
1 0 != 11111 110 0 LD1R — register offset
1 0 11111 000 LD1 (single structure) — 8-bit, immediate offset
1 0 11111 010 x0 LD1 (single structure) — 16-bit, immediate offset
1 0 11111 100 00 LD1 (single structure) — 32-bit, immediate offset
1 0 11111 100 0 01 LD1 (single structure) — 64-bit, immediate offset
1 0 11111 110 0 LD1R — immediate offset
Load/store memory tags
Page 2482
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 opc 1 imm9 op2 Rn Rt
Decode fields
opc imm9 op2
00 01 STG — post-index ARMv8.5-MemTag
00 10 STG — signed offset ARMv8.5-MemTag
00 11 STG — pre-index ARMv8.5-MemTag
00 000000000 00 STZGM ARMv8.5-MemTag
01 00 LDG ARMv8.5-MemTag
01 01 STZG — post-index ARMv8.5-MemTag
01 10 STZG — signed offset ARMv8.5-MemTag
01 11 STZG — pre-index ARMv8.5-MemTag
10 01 ST2G — post-index ARMv8.5-MemTag
10 10 ST2G — signed offset ARMv8.5-MemTag
10 11 ST2G — pre-index ARMv8.5-MemTag
10 != 000000000 00 UNALLOCATED -
10 000000000 00 STGM ARMv8.5-MemTag
11 01 STZ2G — post-index ARMv8.5-MemTag
11 10 STZ2G — signed offset ARMv8.5-MemTag
11 11 STZ2G — pre-index ARMv8.5-MemTag
11 != 000000000 00 UNALLOCATED -
11 000000000 00 LDGM ARMv8.5-MemTag
Load/store exclusive
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 0 0 1 0 0 0 o2 L o1 Rs o0 Rt2 Rn Rt

Instruction Details
size o2 L o1 o0 Rt2 Version
1 1 != UNALLOCATED -
11111
0x 0 1 != UNALLOCATED -
11111
00 0 0 0 0 STXRB -
00 0 0 0 1 STLXRB -
00 0 0 1 0 11111 CASP, CASPA, CASPAL, CASPL — 32-bit ARMv8.1-LSE
CASP
CASPL
00 0 1 0 0 LDXRB -
00 0 1 0 1 LDAXRB -
CASPA
CASPAL
00 1 0 0 0 STLLRB ARMv8.1-LOR
00 1 0 0 1 STLRB -
00 1 0 1 0 11111 CASB, CASAB, CASALB, CASLB — CASB ARMv8.1-LSE
00 1 0 1 1 11111 CASB, CASAB, CASALB, CASLB — CASLB ARMv8.1-LSE
00 1 1 0 0 LDLARB ARMv8.1-LOR
Page 2483

Instruction Details
00 1 1 0 1 LDARB -
00 1 1 1 0 11111 CASB, CASAB, CASALB, CASLB — CASAB ARMv8.1-LSE
00 1 1 1 1 11111 CASB, CASAB, CASALB, CASLB — ARMv8.1-LSE
CASALB
01 0 0 0 0 STXRH -
01 0 0 0 1 STLXRH -
CASP
CASPL
01 0 1 0 0 LDXRH -
01 0 1 0 1 LDAXRH -
CASPA
CASPAL
01 1 0 0 0 STLLRH ARMv8.1-LOR
01 1 0 0 1 STLRH -
01 1 0 1 0 11111 CASH, CASAH, CASALH, CASLH — CASH ARMv8.1-LSE
01 1 0 1 1 11111 CASH, CASAH, CASALH, CASLH — CASLH ARMv8.1-LSE
01 1 1 0 0 LDLARH ARMv8.1-LOR
01 1 1 0 1 LDARH -
01 1 1 1 0 11111 CASH, CASAH, CASALH, CASLH — ARMv8.1-LSE
CASAH
01 1 1 1 1 11111 CASH, CASAH, CASALH, CASLH — ARMv8.1-LSE
CASALH
10 0 0 0 0 STXR — 32-bit -
10 0 0 0 1 STLXR — 32-bit -
10 0 0 1 0 STXP — 32-bit -
10 0 0 1 1 STLXP — 32-bit -
10 0 1 0 0 LDXR — 32-bit -
10 0 1 0 1 LDAXR — 32-bit -
10 0 1 1 0 LDXP — 32-bit -
10 0 1 1 1 LDAXP — 32-bit -
10 1 0 0 0 STLLR — 32-bit ARMv8.1-LOR
10 1 0 0 1 STLR — 32-bit -
10 1 0 1 0 11111 CAS, CASA, CASAL, CASL — 32-bit CAS ARMv8.1-LSE
10 1 0 1 1 11111 CAS, CASA, CASAL, CASL — 32-bit CASL ARMv8.1-LSE
10 1 1 0 0 LDLAR — 32-bit ARMv8.1-LOR
10 1 1 0 1 LDAR — 32-bit -
10 1 1 1 0 11111 CAS, CASA, CASAL, CASL — 32-bit CASA ARMv8.1-LSE
10 1 1 1 1 11111 CAS, CASA, CASAL, CASL — 32-bit CASAL ARMv8.1-LSE
11 0 0 0 0 STXR — 64-bit -
11 0 0 0 1 STLXR — 64-bit -
11 0 0 1 0 STXP — 64-bit -
11 0 0 1 1 STLXP — 64-bit -
11 0 1 0 0 LDXR — 64-bit -
11 0 1 0 1 LDAXR — 64-bit -
11 0 1 1 0 LDXP — 64-bit -
Page 2484

Instruction Details
11 0 1 1 1 LDAXP — 64-bit -
11 1 0 0 0 STLLR — 64-bit ARMv8.1-LOR
11 1 0 0 1 STLR — 64-bit -
11 1 0 1 0 11111 CAS, CASA, CASAL, CASL — 64-bit CAS ARMv8.1-LSE
11 1 0 1 1 11111 CAS, CASA, CASAL, CASL — 64-bit CASL ARMv8.1-LSE
11 1 1 0 0 LDLAR — 64-bit ARMv8.1-LOR
11 1 1 0 1 LDAR — 64-bit -
11 1 1 1 0 11111 CAS, CASA, CASAL, CASL — 64-bit CASA ARMv8.1-LSE
11 1 1 1 1 11111 CAS, CASA, CASAL, CASL — 64-bit CASAL ARMv8.1-LSE
LDAPR/STLR (unscaled immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 0 1 1 0 0 1 opc 0 imm9 0 0 Rn Rt
Decode fields
size opc
00 00 STLURB ARMv8.4-RCPC
00 01 LDAPURB ARMv8.4-RCPC
00 10 LDAPURSB — 64-bit ARMv8.4-RCPC
00 11 LDAPURSB — 32-bit ARMv8.4-RCPC
01 00 STLURH ARMv8.4-RCPC
01 01 LDAPURH ARMv8.4-RCPC
01 10 LDAPURSH — 64-bit ARMv8.4-RCPC
01 11 LDAPURSH — 32-bit ARMv8.4-RCPC
10 00 STLUR — 32-bit ARMv8.4-RCPC
10 01 LDAPUR — 32-bit ARMv8.4-RCPC
10 10 LDAPURSW ARMv8.4-RCPC
10 11 UNALLOCATED -
11 00 STLUR — 64-bit ARMv8.4-RCPC
11 01 LDAPUR — 64-bit ARMv8.4-RCPC
11 10 UNALLOCATED -
11 11 UNALLOCATED -
Load register (literal)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 0 1 1 V 0 0 imm19 Rt
Decode fields
Instruction Details
opc V
00 0 LDR (literal) — 32-bit
00 1 LDR (literal, SIMD&FP) — 32-bit
01 0 LDR (literal) — 64-bit
10 0 LDRSW (literal)
Page 2485
Decode fields
Instruction Details
opc V
11 0 PRFM (literal)
11 1 UNALLOCATED
Load/store no-allocate pair (offset)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 V 0 0 0 L imm7 Rt2 Rn Rt
Decode fields
Instruction Details
opc V L
00 0 0 STNP — 32-bit
00 0 1 LDNP — 32-bit
00 1 0 STNP (SIMD&FP) — 32-bit
00 1 1 LDNP (SIMD&FP) — 32-bit
01 0 UNALLOCATED
10 0 0 STNP — 64-bit
10 0 1 LDNP — 64-bit
11 UNALLOCATED
Load/store register pair (post-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Decode fields
opc V L
00 0 0 STP — 32-bit -
00 0 1 LDP — 32-bit -
00 1 0 STP (SIMD&FP) — 32-bit -
00 1 1 LDP (SIMD&FP) — 32-bit -
01 0 0 STGP ARMv8.5-MemTag
01 0 1 LDPSW -
01 1 0 STP (SIMD&FP) — 64-bit -
01 1 1 LDP (SIMD&FP) — 64-bit -
10 0 0 STP — 64-bit -
10 0 1 LDP — 64-bit -
10 1 0 STP (SIMD&FP) — 128-bit -
10 1 1 LDP (SIMD&FP) — 128-bit -
11 UNALLOCATED -
Page 2486
Load/store register pair (offset)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Decode fields
opc V L
00 0 0 STP — 32-bit -
00 0 1 LDP — 32-bit -
00 1 0 STP (SIMD&FP) — 32-bit -
00 1 1 LDP (SIMD&FP) — 32-bit -
01 0 1 LDPSW -
01 1 0 STP (SIMD&FP) — 64-bit -
01 1 1 LDP (SIMD&FP) — 64-bit -
10 0 0 STP — 64-bit -
10 0 1 LDP — 64-bit -
10 1 0 STP (SIMD&FP) — 128-bit -
10 1 1 LDP (SIMD&FP) — 128-bit -
11 UNALLOCATED -
Load/store register pair (pre-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Decode fields
opc V L
00 0 0 STP — 32-bit -
00 0 1 LDP — 32-bit -
00 1 0 STP (SIMD&FP) — 32-bit -
00 1 1 LDP (SIMD&FP) — 32-bit -
01 0 1 LDPSW -
01 1 0 STP (SIMD&FP) — 64-bit -
01 1 1 LDP (SIMD&FP) — 64-bit -
10 0 0 STP — 64-bit -
10 0 1 LDP — 64-bit -
10 1 0 STP (SIMD&FP) — 128-bit -
10 1 1 LDP (SIMD&FP) — 128-bit -
11 UNALLOCATED -
Load/store register (unscaled immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 V 0 0 opc 0 imm9 0 0 Rn Rt
Page 2487
Decode fields
Instruction Details
size V opc
x1 1 1x UNALLOCATED
00 0 00 STURB
00 0 01 LDURB
00 0 10 LDURSB — 64-bit
00 0 11 LDURSB — 32-bit
00 1 00 STUR (SIMD&FP) — 8-bit
00 1 01 LDUR (SIMD&FP) — 8-bit
01 0 00 STURH
01 0 01 LDURH
01 0 10 LDURSH — 64-bit
01 0 11 LDURSH — 32-bit
1x 0 11 UNALLOCATED
1x 1 1x UNALLOCATED
10 0 00 STUR — 32-bit
10 0 01 LDUR — 32-bit
10 0 10 LDURSW
11 0 00 STUR — 64-bit
11 0 01 LDUR — 64-bit
11 0 10 PRFUM
Load/store register (immediate post-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Decode fields
Instruction Details
size V opc
x1 1 1x UNALLOCATED
00 0 00 STRB (immediate)
00 0 01 LDRB (immediate)
00 0 10 LDRSB (immediate) — 64-bit
00 1 00 STR (immediate, SIMD&FP) — 8-bit
00 1 01 LDR (immediate, SIMD&FP) — 8-bit
01 0 00 STRH (immediate)
01 0 01 LDRH (immediate)
01 0 10 LDRSH (immediate) — 64-bit
Page 2488
Decode fields
Instruction Details
size V opc
1x 0 11 UNALLOCATED
1x 1 1x UNALLOCATED
10 0 00 STR (immediate) — 32-bit
10 0 01 LDR (immediate) — 32-bit
10 0 10 LDRSW (immediate)
11 0 10 UNALLOCATED
Load/store register (unprivileged)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Decode fields
Instruction Details
size V opc
1 UNALLOCATED
00 0 00 STTRB
00 0 01 LDTRB
00 0 10 LDTRSB — 64-bit
00 0 11 LDTRSB — 32-bit
01 0 00 STTRH
01 0 01 LDTRH
01 0 10 LDTRSH — 64-bit
01 0 11 LDTRSH — 32-bit
1x 0 11 UNALLOCATED
10 0 00 STTR — 32-bit
10 0 01 LDTR — 32-bit
10 0 10 LDTRSW
11 0 00 STTR — 64-bit
11 0 01 LDTR — 64-bit
11 0 10 UNALLOCATED
Load/store register (immediate pre-indexed)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Page 2489
Decode fields
Instruction Details
size V opc
x1 1 1x UNALLOCATED
1x 0 11 UNALLOCATED
1x 1 1x UNALLOCATED
11 0 10 UNALLOCATED
Atomic memory operations
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 V 0 0 A R 1 Rs o3 opc 0 0 Rn Rt

Instruction Details
size V A R o3 opc Version
0 1 11x UNALLOCATED -
0 0 1 100 UNALLOCATED -
0 0 1 1 001 UNALLOCATED -
Page 2490

Instruction Details
1 UNALLOCATED -
00 0 0 0 0 000 LDADDB, LDADDAB, LDADDALB, LDADDLB — ARMv8.1-LSE
LDADDB
00 0 0 0 0 001 LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB — ARMv8.1-LSE
LDCLRB
00 0 0 0 0 010 LDEORB, LDEORAB, LDEORALB, LDEORLB — ARMv8.1-LSE
LDEORB
00 0 0 0 0 011 LDSETB, LDSETAB, LDSETALB, LDSETLB — LDSETB ARMv8.1-LSE
00 0 0 0 0 100 LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB — ARMv8.1-LSE
LDSMAXB
00 0 0 0 0 101 LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB — ARMv8.1-LSE
LDSMINB
00 0 0 0 0 110 LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB — ARMv8.1-LSE
LDUMAXB
00 0 0 0 0 111 LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB — ARMv8.1-LSE
LDUMINB
00 0 0 0 1 000 SWPB, SWPAB, SWPALB, SWPLB — SWPB ARMv8.1-LSE
00 0 0 0 1 001 UNALLOCATED -
00 0 0 0 1 010 UNALLOCATED -
00 0 0 0 1 011 UNALLOCATED -
00 0 0 0 1 101 UNALLOCATED -
LDADDLB
LDCLRLB
LDEORLB
00 0 0 1 0 011 LDSETB, LDSETAB, LDSETALB, LDSETLB — ARMv8.1-LSE
LDSETLB
LDSMAXLB
LDSMINLB
LDUMAXLB
LDUMINLB
00 0 0 1 1 000 SWPB, SWPAB, SWPALB, SWPLB — SWPLB ARMv8.1-LSE
LDADDAB
LDCLRAB
LDEORAB
LDSETAB
LDSMAXAB
LDSMINAB
LDUMAXAB
Page 2491

Instruction Details
LDUMINAB
00 0 1 0 1 000 SWPB, SWPAB, SWPALB, SWPLB — SWPAB ARMv8.1-LSE
00 0 1 0 1 100 LDAPRB ARMv8.3-RCPC
LDADDALB
LDCLRALB
LDEORALB
LDSETALB
LDSMAXALB
LDSMINALB
LDUMAXALB
LDUMINALB
00 0 1 1 1 000 SWPB, SWPAB, SWPALB, SWPLB — SWPALB ARMv8.1-LSE
01 0 0 0 0 000 LDADDH, LDADDAH, LDADDALH, LDADDLH — ARMv8.1-LSE
LDADDH
01 0 0 0 0 001 LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH — ARMv8.1-LSE
LDCLRH
01 0 0 0 0 010 LDEORH, LDEORAH, LDEORALH, LDEORLH — ARMv8.1-LSE
LDEORH
01 0 0 0 0 011 LDSETH, LDSETAH, LDSETALH, LDSETLH — ARMv8.1-LSE
LDSETH
01 0 0 0 0 100 LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH — ARMv8.1-LSE
LDSMAXH
01 0 0 0 0 101 LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH — ARMv8.1-LSE
LDSMINH
01 0 0 0 0 110 LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH ARMv8.1-LSE
— LDUMAXH
01 0 0 0 0 111 LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH — ARMv8.1-LSE
LDUMINH
01 0 0 0 1 000 SWPH, SWPAH, SWPALH, SWPLH — SWPH ARMv8.1-LSE
01 0 0 0 1 001 UNALLOCATED -
01 0 0 0 1 010 UNALLOCATED -
01 0 0 0 1 011 UNALLOCATED -
01 0 0 0 1 101 UNALLOCATED -
LDADDLH
LDCLRLH
LDEORLH
LDSETLH
LDSMAXLH
LDSMINLH
Page 2492

Instruction Details
— LDUMAXLH
LDUMINLH
01 0 0 1 1 000 SWPH, SWPAH, SWPALH, SWPLH — SWPLH ARMv8.1-LSE
LDADDAH
LDCLRAH
LDEORAH
LDSETAH
LDSMAXAH
LDSMINAH
— LDUMAXAH
LDUMINAH
01 0 1 0 1 000 SWPH, SWPAH, SWPALH, SWPLH — SWPAH ARMv8.1-LSE
01 0 1 0 1 100 LDAPRH ARMv8.3-RCPC
LDADDALH
LDCLRALH
LDEORALH
LDSETALH
LDSMAXALH
LDSMINALH
— LDUMAXALH
LDUMINALH
01 0 1 1 1 000 SWPH, SWPAH, SWPALH, SWPLH — SWPALH ARMv8.1-LSE
10 0 0 0 0 000 LDADD, LDADDA, LDADDAL, LDADDL — 32-bit ARMv8.1-LSE
LDADD
10 0 0 0 0 001 LDCLR, LDCLRA, LDCLRAL, LDCLRL — 32-bit LDCLR ARMv8.1-LSE
10 0 0 0 0 010 LDEOR, LDEORA, LDEORAL, LDEORL — 32-bit ARMv8.1-LSE
LDEOR
10 0 0 0 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 32-bit LDSET ARMv8.1-LSE
10 0 0 0 0 100 LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL — 32-bit ARMv8.1-LSE
LDSMAX
10 0 0 0 0 101 LDSMIN, LDSMINA, LDSMINAL, LDSMINL — 32-bit ARMv8.1-LSE
LDSMIN
10 0 0 0 0 110 LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL — ARMv8.1-LSE
32-bit LDUMAX
10 0 0 0 0 111 LDUMIN, LDUMINA, LDUMINAL, LDUMINL — 32-bit ARMv8.1-LSE
LDUMIN
10 0 0 0 1 000 SWP, SWPA, SWPAL, SWPL — 32-bit SWP ARMv8.1-LSE
Page 2493

Instruction Details
10 0 0 0 1 001 UNALLOCATED -
10 0 0 0 1 010 UNALLOCATED -
10 0 0 0 1 011 UNALLOCATED -
10 0 0 0 1 101 UNALLOCATED -
LDADDL
10 0 0 1 0 001 LDCLR, LDCLRA, LDCLRAL, LDCLRL — 32-bit ARMv8.1-LSE
LDCLRL
LDEORL
10 0 0 1 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 32-bit LDSETL ARMv8.1-LSE
LDSMAXL
LDSMINL
32-bit LDUMAXL
LDUMINL
10 0 0 1 1 000 SWP, SWPA, SWPAL, SWPL — 32-bit SWPL ARMv8.1-LSE
LDADDA
LDCLRA
LDEORA
10 0 1 0 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 32-bit LDSETA ARMv8.1-LSE
LDSMAXA
LDSMINA
32-bit LDUMAXA
LDUMINA
10 0 1 0 1 000 SWP, SWPA, SWPAL, SWPL — 32-bit SWPA ARMv8.1-LSE
10 0 1 0 1 100 LDAPR — 32-bit ARMv8.3-RCPC
LDADDAL
LDCLRAL
LDEORAL
10 0 1 1 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 32-bit ARMv8.1-LSE
LDSETAL
LDSMAXAL
LDSMINAL
32-bit LDUMAXAL
LDUMINAL
10 0 1 1 1 000 SWP, SWPA, SWPAL, SWPL — 32-bit SWPAL ARMv8.1-LSE
Page 2494

Instruction Details
LDADD
11 0 0 0 0 001 LDCLR, LDCLRA, LDCLRAL, LDCLRL — 64-bit LDCLR ARMv8.1-LSE
LDEOR
11 0 0 0 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 64-bit LDSET ARMv8.1-LSE
LDSMAX
LDSMIN
64-bit LDUMAX
LDUMIN
11 0 0 0 1 000 SWP, SWPA, SWPAL, SWPL — 64-bit SWP ARMv8.1-LSE
LDADDL
LDCLRL
LDEORL
11 0 0 1 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 64-bit LDSETL ARMv8.1-LSE
LDSMAXL
LDSMINL
64-bit LDUMAXL
LDUMINL
11 0 0 1 1 000 SWP, SWPA, SWPAL, SWPL — 64-bit SWPL ARMv8.1-LSE
LDADDA
LDCLRA
LDEORA
11 0 1 0 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 64-bit LDSETA ARMv8.1-LSE
LDSMAXA
LDSMINA
64-bit LDUMAXA
LDUMINA
11 0 1 0 1 000 SWP, SWPA, SWPAL, SWPL — 64-bit SWPA ARMv8.1-LSE
11 0 1 0 1 100 LDAPR — 64-bit ARMv8.3-RCPC
LDADDAL
LDCLRAL
LDEORAL
Page 2495

Instruction Details
11 0 1 1 0 011 LDSET, LDSETA, LDSETAL, LDSETL — 64-bit ARMv8.1-LSE
LDSETAL
LDSMAXAL
LDSMINAL
64-bit LDUMAXAL
LDUMINAL
11 0 1 1 1 000 SWP, SWPA, SWPAL, SWPL — 64-bit SWPAL ARMv8.1-LSE
Load/store register (register offset)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 V 0 0 opc 1 Rm option S 1 0 Rn Rt
Decode fields
Instruction Details
size V opc option
x1 1 1x UNALLOCATED
00 0 00 != 011 STRB (register) — extended register
00 0 00 011 STRB (register) — shifted register
00 0 01 != 011 LDRB (register) — extended register
00 0 01 011 LDRB (register) — shifted register
00 0 10 != 011 LDRSB (register) — 64-bit with extended register offset
00 0 10 011 LDRSB (register) — 64-bit with shifted register offset
00 0 11 != 011 LDRSB (register) — 32-bit with extended register offset
00 0 11 011 LDRSB (register) — 32-bit with shifted register offset
00 1 00 != 011 STR (register, SIMD&FP)
00 1 00 011 STR (register, SIMD&FP)
00 1 01 != 011 LDR (register, SIMD&FP)
00 1 01 011 LDR (register, SIMD&FP)
00 1 10 STR (register, SIMD&FP)
00 1 11 LDR (register, SIMD&FP)
01 0 00 STRH (register)
01 0 01 LDRH (register)
01 0 10 LDRSH (register) — 64-bit
01 0 11 LDRSH (register) — 32-bit
1x 0 11 UNALLOCATED
1x 1 1x UNALLOCATED
10 0 00 STR (register) — 32-bit
10 0 01 LDR (register) — 32-bit
10 0 10 LDRSW (register)
11 0 00 STR (register) — 64-bit
Page 2496
Decode fields
Instruction Details
size V opc option
11 0 01 LDR (register) — 64-bit
11 0 10 PRFM (register)
Load/store register (pac)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 V 0 0 M S 1 imm9 W 1 Rn Rt
Decode fields
size V M W
!= 11 UNALLOCATED -
11 0 0 0 LDRAA, LDRAB — key A, offset ARMv8.3-PAuth
11 0 0 1 LDRAA, LDRAB — key A, pre-indexed ARMv8.3-PAuth
11 0 1 0 LDRAA, LDRAB — key B, offset ARMv8.3-PAuth
11 0 1 1 LDRAA, LDRAB — key B, pre-indexed ARMv8.3-PAuth
11 1 UNALLOCATED -
Load/store register (unsigned immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 V 0 1 opc imm12 Rn Rt
Decode fields
Instruction Details
size V opc
x1 1 1x UNALLOCATED
1x 0 11 UNALLOCATED
1x 1 1x UNALLOCATED
Page 2497
Decode fields
Instruction Details
size V opc
11 0 10 PRFM (immediate)
Data Processing -- Register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 op1 101 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
0 1 0110 Data-processing (2 source)
1 1 0110 Data-processing (1 source)
0 0xxx Logical (shifted register)
0 1xx0 Add/subtract (shifted register)
0 1xx1 Add/subtract (extended register)
1 0000 000000 Add/subtract (with carry)
1 0000 x00001 Rotate right into flags
1 0000 xx0010 Evaluate into flags
1 0010 xxxx0x Conditional compare (register)
1 0010 xxxx1x Conditional compare (immediate)
1 0100 Conditional select
1 1xxx Data-processing (3 source)
Data-processing (2 source)
These instructions are under Data Processing -- Register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 S 1 1 0 1 0 1 1 0 Rm opcode Rn Rd
Decode fields
sf S opcode
000001 UNALLOCATED -
011xxx UNALLOCATED -
1xxxxx UNALLOCATED -
0 00011x UNALLOCATED -
1 0001xx UNALLOCATED -
1 001xxx UNALLOCATED -
1 01xxxx UNALLOCATED -
Page 2498
Decode fields
sf S opcode
0 0 000010 UDIV — 32-bit -
0 0 000011 SDIV — 32-bit -
0 0 001000 LSLV — 32-bit -
0 0 001001 LSRV — 32-bit -
0 0 001010 ASRV — 32-bit -
0 0 001011 RORV — 32-bit -
0 0 001100 UNALLOCATED -
0 0 010x11 UNALLOCATED -
0 0 010000 CRC32B, CRC32H, CRC32W, CRC32X — CRC32B -
0 0 010001 CRC32B, CRC32H, CRC32W, CRC32X — CRC32H -
0 0 010010 CRC32B, CRC32H, CRC32W, CRC32X — CRC32W -
0 0 010100 CRC32CB, CRC32CH, CRC32CW, CRC32CX — CRC32CB -
0 0 010101 CRC32CB, CRC32CH, CRC32CW, CRC32CX — CRC32CH -
0 0 010110 CRC32CB, CRC32CH, CRC32CW, CRC32CX — CRC32CW -
1 0 000000 SUBP ARMv8.5-MemTag
1 0 000010 UDIV — 64-bit -
1 0 000011 SDIV — 64-bit -
1 0 000100 IRG ARMv8.5-MemTag
1 0 000101 GMI ARMv8.5-MemTag
1 0 001000 LSLV — 64-bit -
1 0 001001 LSRV — 64-bit -
1 0 001010 ASRV — 64-bit -
1 0 001011 RORV — 64-bit -
1 0 001100 PACGA ARMv8.3-PAuth
1 0 010xx0 UNALLOCATED -
1 0 010x0x UNALLOCATED -
1 0 010011 CRC32B, CRC32H, CRC32W, CRC32X — CRC32X -
1 0 010111 CRC32CB, CRC32CH, CRC32CW, CRC32CX — CRC32CX -
1 1 000000 SUBPS ARMv8.5-MemTag
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 S 1 1 0 1 0 1 1 0 opcode2 opcode Rn Rd

Instruction Details
sf S opcode2 opcode Rn Version
xxx1x UNALLOCATED -
xx1xx UNALLOCATED -
x1xxx UNALLOCATED -
1xxxx UNALLOCATED -
1 UNALLOCATED -
Page 2499

Instruction Details
sf S opcode2 opcode Rn Version
0 0 00000 000000 RBIT — 32-bit -
0 0 00000 000001 REV16 — 32-bit -
0 0 00000 000010 REV — 32-bit -
0 0 00000 000011 UNALLOCATED -
0 0 00000 000100 CLZ — 32-bit -
0 0 00000 000101 CLS — 32-bit -
1 0 00000 000000 RBIT — 64-bit -
1 0 00000 000001 REV16 — 64-bit -
1 0 00000 000010 REV32 -
1 0 00000 000011 REV — 64-bit -
1 0 00000 000100 CLZ — 64-bit -
1 0 00000 000101 CLS — 64-bit -
1 0 00001 000000 PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA ARMv8.3-PAuth
— PACIA
1 0 00001 000001 PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB ARMv8.3-PAuth
— PACIB
1 0 00001 000010 PACDA, PACDZA — PACDA ARMv8.3-PAuth
1 0 00001 000011 PACDB, PACDZB — PACDB ARMv8.3-PAuth
1 0 00001 000100 AUTIA, AUTIA1716, AUTIASP, AUTIAZ, ARMv8.3-PAuth
AUTIZA — AUTIA
1 0 00001 000101 AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, ARMv8.3-PAuth
AUTIZB — AUTIB
1 0 00001 000110 AUTDA, AUTDZA — AUTDA ARMv8.3-PAuth
1 0 00001 000111 AUTDB, AUTDZB — AUTDB ARMv8.3-PAuth
1 0 00001 001000 11111 PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA ARMv8.3-PAuth
— PACIZA
1 0 00001 001001 11111 PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB ARMv8.3-PAuth
— PACIZB
1 0 00001 001010 11111 PACDA, PACDZA — PACDZA ARMv8.3-PAuth
1 0 00001 001011 11111 PACDB, PACDZB — PACDZB ARMv8.3-PAuth
1 0 00001 001100 11111 AUTIA, AUTIA1716, AUTIASP, AUTIAZ, ARMv8.3-PAuth
AUTIZA — AUTIZA
1 0 00001 001101 11111 AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, ARMv8.3-PAuth
AUTIZB — AUTIZB
1 0 00001 001110 11111 AUTDA, AUTDZA — AUTDZA ARMv8.3-PAuth
1 0 00001 001111 11111 AUTDB, AUTDZB — AUTDZB ARMv8.3-PAuth
1 0 00001 010000 11111 XPACD, XPACI, XPACLRI — XPACI ARMv8.3-PAuth
1 0 00001 010001 11111 XPACD, XPACI, XPACLRI — XPACD ARMv8.3-PAuth
1 0 00001 01001x UNALLOCATED -
1 0 00001 0101xx UNALLOCATED -
1 0 00001 011xxx UNALLOCATED -
Logical (shifted register)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf opc 0 1 0 1 0 shift N Rm imm6 Rn Rd
Page 2500
Decode fields
Instruction Details
sf opc N imm6
0 00 0 AND (shifted register) — 32-bit
0 00 1 BIC (shifted register) — 32-bit
0 01 0 ORR (shifted register) — 32-bit
0 01 1 ORN (shifted register) — 32-bit
0 10 0 EOR (shifted register) — 32-bit
0 10 1 EON (shifted register) — 32-bit
0 11 0 ANDS (shifted register) — 32-bit
0 11 1 BICS (shifted register) — 32-bit
1 00 0 AND (shifted register) — 64-bit
1 00 1 BIC (shifted register) — 64-bit
1 01 0 ORR (shifted register) — 64-bit
1 01 1 ORN (shifted register) — 64-bit
1 10 0 EOR (shifted register) — 64-bit
1 10 1 EON (shifted register) — 64-bit
1 11 0 ANDS (shifted register) — 64-bit
1 11 1 BICS (shifted register) — 64-bit
Add/subtract (shifted register)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
Decode fields
Instruction Details
sf op S shift imm6
11 UNALLOCATED
0 0 0 ADD (shifted register) — 32-bit
0 0 1 ADDS (shifted register) — 32-bit
0 1 0 SUB (shifted register) — 32-bit
0 1 1 SUBS (shifted register) — 32-bit
1 0 0 ADD (shifted register) — 64-bit
1 0 1 ADDS (shifted register) — 64-bit
1 1 0 SUB (shifted register) — 64-bit
1 1 1 SUBS (shifted register) — 64-bit
Add/subtract (extended register)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 0 1 0 1 1 opt 1 Rm option imm3 Rn Rd
Decode fields
Instruction Details
sf op S opt imm3
1x1 UNALLOCATED
11x UNALLOCATED
x1 UNALLOCATED
Page 2501
Decode fields
Instruction Details
sf op S opt imm3
1x UNALLOCATED
0 0 0 00 ADD (extended register) — 32-bit
0 0 1 00 ADDS (extended register) — 32-bit
0 1 0 00 SUB (extended register) — 32-bit
0 1 1 00 SUBS (extended register) — 32-bit
1 0 0 00 ADD (extended register) — 64-bit
1 0 1 00 ADDS (extended register) — 64-bit
1 1 0 00 SUB (extended register) — 64-bit
1 1 1 00 SUBS (extended register) — 64-bit
Add/subtract (with carry)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
Decode fields
Instruction Details
sf op S
0 0 0 ADC — 32-bit
0 0 1 ADCS — 32-bit
0 1 0 SBC — 32-bit
0 1 1 SBCS — 32-bit
1 0 0 ADC — 64-bit
1 0 1 ADCS — 64-bit
1 1 0 SBC — 64-bit
1 1 1 SBCS — 64-bit
Rotate right into flags
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 0 0 0 imm6 0 0 0 0 1 Rn o2 mask
Decode fields
sf op S o2
0 UNALLOCATED -
1 0 0 UNALLOCATED -
1 0 1 0 RMIF ARMv8.4-CondM
1 1 UNALLOCATED -
Evaluate into flags
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 0 0 0 opcode2 sz 0 0 1 0 Rn o3 mask
Page 2502

Instruction Details
sf op S opcode2 sz o3 mask Version
0 0 0 UNALLOCATED -
0 0 1 != UNALLOCATED -
000000
0 0 1 000000 0 != UNALLOCATED -
1101
0 0 1 000000 1 UNALLOCATED -
0 0 1 000000 0 0 1101 SETF8, SETF16 — SETF8 ARMv8.4-CondM
0 0 1 000000 1 0 1101 SETF8, SETF16 — ARMv8.4-CondM
SETF16
0 1 UNALLOCATED -
1 UNALLOCATED -
Conditional compare (register)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 0 1 0 Rm cond 0 o2 Rn o3 nzcv
Decode fields
Instruction Details
sf op S o2 o3
1 UNALLOCATED
1 UNALLOCATED
0 UNALLOCATED
0 0 1 0 0 CCMN (register) — 32-bit
0 1 1 0 0 CCMP (register) — 32-bit
1 0 1 0 0 CCMN (register) — 64-bit
1 1 1 0 0 CCMP (register) — 64-bit
Conditional compare (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 0 1 0 imm5 cond 1 o2 Rn o3 nzcv
Decode fields
Instruction Details
sf op S o2 o3
1 UNALLOCATED
1 UNALLOCATED
0 UNALLOCATED
0 0 1 0 0 CCMN (immediate) — 32-bit
0 1 1 0 0 CCMP (immediate) — 32-bit
1 0 1 0 0 CCMN (immediate) — 64-bit
1 1 1 0 0 CCMP (immediate) — 64-bit
Conditional select
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op S 1 1 0 1 0 1 0 0 Rm cond op2 Rn Rd
Page 2503
Decode fields
Instruction Details
sf op S op2
1x UNALLOCATED
1 UNALLOCATED
0 0 0 00 CSEL — 32-bit
0 0 0 01 CSINC — 32-bit
0 1 0 00 CSINV — 32-bit
0 1 0 01 CSNEG — 32-bit
1 0 0 00 CSEL — 64-bit
1 0 0 01 CSINC — 64-bit
1 1 0 00 CSINV — 64-bit
1 1 0 01 CSNEG — 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf op54 1 1 0 1 1 op31 Rm o0 Ra Rn Rd
Decode fields
Instruction Details
sf op54 op31 o0
00 011 UNALLOCATED
00 100 UNALLOCATED
00 111 UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED
0 00 000 0 MADD — 32-bit
0 00 000 1 MSUB — 32-bit
1 00 000 0 MADD — 64-bit
1 00 000 1 MSUB — 64-bit
1 00 001 0 SMADDL
1 00 001 1 SMSUBL
1 00 010 0 SMULH
1 00 101 0 UMADDL
1 00 101 1 UMSUBL
1 00 110 0 UMULH
Data Processing -- Scalar Floating-Point and Advanced SIMD
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 111 op1 op2 op3
Page 2504

Instruction details
op0 op1 op2 op3 version
0000 0x x101 00xxxxx10 UNALLOCATED -
0100 0x x101 00xxxxx10 Cryptographic AES -
0101 0x x0xx xxx0xxx00 Cryptographic three-register SHA -
0101 0x x0xx xxx0xxx10 UNALLOCATED -
0101 0x x101 00xxxxx10 Cryptographic two-register SHA -
0111 0x x0xx xxx0xxxx0 UNALLOCATED -
01x1 00 00xx xxx0xxxx1 Advanced SIMD scalar copy -
01x1 01 00xx xxx0xxxx1 UNALLOCATED -
01x1 0x 0111 00xxxxx10 UNALLOCATED -
01x1 0x 10xx xxx00xxx1 Advanced SIMD scalar three same FP16 ARMv8.2-FP16
01x1 0x 10xx xxx01xxx1 UNALLOCATED -
01x1 0x 1111 00xxxxx10 Advanced SIMD scalar two-register ARMv8.2-FP16
miscellaneous FP16
01x1 0x x0xx xxx1xxxx0 UNALLOCATED -
01x1 0x x0xx xxx1xxxx1 Advanced SIMD scalar three same extra ARMv8.1-RDMA
01x1 0x x100 00xxxxx10 Advanced SIMD scalar two-register -
miscellaneous
01x1 0x x110 00xxxxx10 Advanced SIMD scalar pairwise ARMv8.2-FP16
01x1 0x x1xx 1xxxxxx10 UNALLOCATED -
01x1 0x x1xx x1xxxxx10 UNALLOCATED -
01x1 0x x1xx xxxxxxx00 Advanced SIMD scalar three different -
01x1 0x x1xx xxxxxxxx1 Advanced SIMD scalar three same -
01x1 10 xxxxxxxx1 Advanced SIMD scalar shift by immediate -
01x1 11 xxxxxxxx1 UNALLOCATED -
01x1 1x xxxxxxxx0 Advanced SIMD scalar x indexed element ARMv8.2-FP16
0x00 0x x0xx xxx0xxx00 Advanced SIMD table lookup -
0x00 0x x0xx xxx0xxx10 Advanced SIMD permute -
0x10 0x x0xx xxx0xxxx0 Advanced SIMD extract -
0xx0 00 00xx xxx0xxxx1 Advanced SIMD copy -
0xx0 01 00xx xxx0xxxx1 UNALLOCATED -
0xx0 0x 0111 00xxxxx10 UNALLOCATED -
0xx0 0x 10xx xxx00xxx1 Advanced SIMD three same (FP16) ARMv8.2-FP16
0xx0 0x 10xx xxx01xxx1 UNALLOCATED -
0xx0 0x 1111 00xxxxx10 Advanced SIMD two-register miscellaneous ARMv8.2-FP16
(FP16)
0xx0 0x x0xx xxx1xxxx0 UNALLOCATED -
0xx0 0x x0xx xxx1xxxx1 Advanced SIMD three-register extension ARMv8.2-DotProd
0xx0 0x x100 00xxxxx10 Advanced SIMD two-register miscellaneous ARMv8.5-FRINT
0xx0 0x x110 00xxxxx10 Advanced SIMD across lanes ARMv8.2-FP16
0xx0 0x x1xx 1xxxxxx10 UNALLOCATED -
0xx0 0x x1xx x1xxxxx10 UNALLOCATED -
0xx0 0x x1xx xxxxxxx00 Advanced SIMD three different -
0xx0 0x x1xx xxxxxxxx1 Advanced SIMD three same ARMv8.2-FHM
0xx0 10 0000 xxxxxxxx1 Advanced SIMD modified immediate ARMv8.2-FP16
Page 2505
0xx0 10 != xxxxxxxx1 Advanced SIMD shift by immediate -

0000
0xx0 11 xxxxxxxx1 UNALLOCATED -
0xx0 1x xxxxxxxx0 Advanced SIMD vector x indexed element ARMv8.2-DotProd
1100 00 10xx xxx10xxxx Cryptographic three-register, imm2 ARMv8.2-SM3
1100 00 11xx xxx1x00xx Cryptographic three-register SHA 512 ARMv8.2-SHA2
1100 00 xxx0xxxxx Cryptographic four-register ARMv8.2-SHA3
1100 01 00xx XAR ARMv8.2-SHA3
1100 01 1000 0001000xx Cryptographic two-register SHA 512 ARMv8.2-SHA2
1xx0 1x UNALLOCATED -
x0x1 0x x0xx Conversion between floating-point and fixed- ARMv8.2-FP16
point
x0x1 0x x1xx xxx000000 Conversion between floating-point and integer ARMv8.3-JConv
x0x1 0x x1xx xxxx10000 Floating-point data-processing (1 source) ARMv8.5-FRINT
x0x1 0x x1xx xxxxx1000 Floating-point compare ARMv8.2-FP16
x0x1 0x x1xx xxxxxx100 Floating-point immediate ARMv8.2-FP16
x0x1 0x x1xx xxxxxxx01 Floating-point conditional compare ARMv8.2-FP16
x0x1 0x x1xx xxxxxxx10 Floating-point data-processing (2 source) ARMv8.2-FP16
x0x1 0x x1xx xxxxxxx11 Floating-point conditional select ARMv8.2-FP16
x0x1 1x Floating-point data-processing (3 source) ARMv8.2-FP16
Cryptographic AES
These instructions are under Data Processing -- Scalar Floating-Point and Advanced SIMD.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 size 1 0 1 0 0 opcode 1 0 Rn Rd
Decode fields
Instruction Details
size opcode
x1xxx UNALLOCATED
000xx UNALLOCATED
1xxxx UNALLOCATED
x1 UNALLOCATED
00 00100 AESE
00 00101 AESD
00 00110 AESMC
00 00111 AESIMC
1x UNALLOCATED
Cryptographic three-register SHA
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 0 Rm 0 opcode 0 0 Rn Rd
Decode fields
Instruction Details
size opcode
111 UNALLOCATED
x1 UNALLOCATED
00 000 SHA1C
Page 2506
Decode fields
Instruction Details
size opcode
00 001 SHA1P
00 010 SHA1M
00 011 SHA1SU0
00 100 SHA256H
00 101 SHA256H2
00 110 SHA256SU1
1x UNALLOCATED
Cryptographic two-register SHA
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 1 0 0 opcode 1 0 Rn Rd
Decode fields
Instruction Details
size opcode
xx1xx UNALLOCATED
x1xxx UNALLOCATED
1xxxx UNALLOCATED
x1 UNALLOCATED
00 00000 SHA1H
00 00001 SHA1SU1
00 00010 SHA256SU0
1x UNALLOCATED
Advanced SIMD scalar copy
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 op 1 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd
Decode fields
Instruction Details
op imm5 imm4
0 xxx1 UNALLOCATED
0 xx1x UNALLOCATED
0 x1xx UNALLOCATED
0 0000 DUP (element)
0 1xxx UNALLOCATED
1 UNALLOCATED
Advanced SIMD scalar three same FP16
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 a 1 0 Rm 0 0 opcode 1 Rn Rd
Page 2507
Decode fields
U a opcode
110 UNALLOCATED -
1 011 UNALLOCATED -
0 0 011 FMULX ARMv8.2-FP16
0 0 100 FCMEQ (register) ARMv8.2-FP16
0 0 111 FRECPS ARMv8.2-FP16
0 1 111 FRSQRTS ARMv8.2-FP16
1 0 100 FCMGE (register) ARMv8.2-FP16
1 0 101 FACGE ARMv8.2-FP16
1 1 010 FABD ARMv8.2-FP16
1 1 100 FCMGT (register) ARMv8.2-FP16
1 1 101 FACGT ARMv8.2-FP16
Advanced SIMD scalar two-register miscellaneous FP16
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 a 1 1 1 1 0 0 opcode 1 0 Rn Rd
Decode fields
U a opcode
00xxx UNALLOCATED -
010xx UNALLOCATED -
10xxx UNALLOCATED -
1100x UNALLOCATED -
11110 UNALLOCATED -
0 0 11010 FCVTNS (vector) ARMv8.2-FP16
0 0 11011 FCVTMS (vector) ARMv8.2-FP16
0 0 11100 FCVTAS (vector) ARMv8.2-FP16
0 0 11101 SCVTF (vector, integer) ARMv8.2-FP16
0 1 01100 FCMGT (zero) ARMv8.2-FP16
0 1 01101 FCMEQ (zero) ARMv8.2-FP16
0 1 01110 FCMLT (zero) ARMv8.2-FP16
0 1 11010 FCVTPS (vector) ARMv8.2-FP16
0 1 11011 FCVTZS (vector, integer) ARMv8.2-FP16
0 1 11101 FRECPE ARMv8.2-FP16
0 1 11111 FRECPX ARMv8.2-FP16
1 0 11010 FCVTNU (vector) ARMv8.2-FP16
1 0 11011 FCVTMU (vector) ARMv8.2-FP16
Page 2508
Decode fields
U a opcode
1 0 11100 FCVTAU (vector) ARMv8.2-FP16
1 0 11101 UCVTF (vector, integer) ARMv8.2-FP16
1 1 01100 FCMGE (zero) ARMv8.2-FP16
1 1 01101 FCMLE (zero) ARMv8.2-FP16
1 1 11010 FCVTPU (vector) ARMv8.2-FP16
1 1 11011 FCVTZU (vector, integer) ARMv8.2-FP16
1 1 11101 FRSQRTE ARMv8.2-FP16
Advanced SIMD scalar three same extra
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 size 0 Rm 1 opcode 1 Rn Rd
Decode fields
U opcode
001x UNALLOCATED -
01xx UNALLOCATED -
1xxx UNALLOCATED -
1 0000 SQRDMLAH (vector) ARMv8.1-RDMA
1 0001 SQRDMLSH (vector) ARMv8.1-RDMA
Advanced SIMD scalar two-register miscellaneous
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 size 1 0 0 0 0 opcode 1 0 Rn Rd
Decode fields
Instruction Details
U size opcode
0000x UNALLOCATED
00010 UNALLOCATED
0010x UNALLOCATED
00110 UNALLOCATED
01111 UNALLOCATED
1000x UNALLOCATED
10011 UNALLOCATED
10101 UNALLOCATED
10111 UNALLOCATED
1100x UNALLOCATED
11110 UNALLOCATED
0x 011xx UNALLOCATED
Page 2509
Decode fields
Instruction Details
U size opcode
0 00011 SUQADD
0 00111 SQABS
0 01000 CMGT (zero)
0 01001 CMEQ (zero)
0 01010 CMLT (zero)
0 01011 ABS
0 10010 UNALLOCATED
0 10100 SQXTN, SQXTN2
0 0x 11010 FCVTNS (vector)
0 0x 11011 FCVTMS (vector)
0 0x 11100 FCVTAS (vector)
0 0x 11101 SCVTF (vector, integer)
0 1x 01100 FCMGT (zero)
0 1x 01101 FCMEQ (zero)
0 1x 01110 FCMLT (zero)
0 1x 11010 FCVTPS (vector)
0 1x 11011 FCVTZS (vector, integer)
0 1x 11101 FRECPE
0 1x 11111 FRECPX
1 00011 USQADD
1 00111 SQNEG
1 01000 CMGE (zero)
1 01001 CMLE (zero)
1 01010 UNALLOCATED
1 01011 NEG (vector)
1 10010 SQXTUN, SQXTUN2
1 10100 UQXTN, UQXTN2
1 0x 10110 FCVTXN, FCVTXN2
1 0x 11010 FCVTNU (vector)
1 0x 11011 FCVTMU (vector)
1 0x 11100 FCVTAU (vector)
1 0x 11101 UCVTF (vector, integer)
1 1x 01100 FCMGE (zero)
1 1x 01101 FCMLE (zero)
1 1x 11010 FCVTPU (vector)
1 1x 11011 FCVTZU (vector, integer)
1 1x 11101 FRSQRTE
Advanced SIMD scalar pairwise
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 size 1 1 0 0 0 opcode 1 0 Rn Rd
Page 2510
Decode fields
U size opcode
00xxx UNALLOCATED -
010xx UNALLOCATED -
01110 UNALLOCATED -
10xxx UNALLOCATED -
1100x UNALLOCATED -
11010 UNALLOCATED -
111xx UNALLOCATED -
0 11011 ADDP (scalar) -
0 00 01100 FMAXNMP (scalar) — half-precision ARMv8.2-FP16
0 00 01101 FADDP (scalar) — half-precision ARMv8.2-FP16
0 00 01111 FMAXP (scalar) — half-precision ARMv8.2-FP16
0 10 01100 FMINNMP (scalar) — half-precision ARMv8.2-FP16
0 10 01111 FMINP (scalar) — half-precision ARMv8.2-FP16
1 0x 01100 FMAXNMP (scalar) — single-precision and double-precision -
1 0x 01101 FADDP (scalar) — single-precision and double-precision -
1 0x 01111 FMAXP (scalar) — single-precision and double-precision -
1 1x 01100 FMINNMP (scalar) — single-precision and double-precision -
1 1x 01111 FMINP (scalar) — single-precision and double-precision -
Advanced SIMD scalar three different
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 size 1 Rm opcode 0 0 Rn Rd
Decode fields
Instruction Details
U opcode
00xx UNALLOCATED
01xx UNALLOCATED
1000 UNALLOCATED
1010 UNALLOCATED
1100 UNALLOCATED
111x UNALLOCATED
0 1001 SQDMLAL, SQDMLAL2 (vector)
0 1011 SQDMLSL, SQDMLSL2 (vector)
0 1101 SQDMULL, SQDMULL2 (vector)
1 1001 UNALLOCATED
1 1011 UNALLOCATED
1 1101 UNALLOCATED
Page 2511
Advanced SIMD scalar three same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 0 size 1 Rm opcode 1 Rn Rd
Decode fields
Instruction Details
U size opcode
00000 UNALLOCATED
0001x UNALLOCATED
00100 UNALLOCATED
011xx UNALLOCATED
1001x UNALLOCATED
0 00001 SQADD
0 00101 SQSUB
0 00110 CMGT (register)
0 00111 CMGE (register)
0 01000 SSHL
0 01001 SQSHL (register)
0 01010 SRSHL
0 01011 SQRSHL
0 10000 ADD (vector)
0 10001 CMTST
0 10100 UNALLOCATED
0 10101 UNALLOCATED
0 10110 SQDMULH (vector)
0 10111 UNALLOCATED
0 0x 11011 FMULX
0 0x 11100 FCMEQ (register)
0 0x 11111 FRECPS
0 1x 11111 FRSQRTS
1 00001 UQADD
1 00101 UQSUB
1 00110 CMHI (register)
1 00111 CMHS (register)
1 01000 USHL
1 01001 UQSHL (register)
1 01010 URSHL
Page 2512
Decode fields
Instruction Details
U size opcode
1 01011 UQRSHL
1 10000 SUB (vector)
1 10001 CMEQ (register)
1 10100 UNALLOCATED
1 10101 UNALLOCATED
1 10110 SQRDMULH (vector)
1 10111 UNALLOCATED
1 0x 11100 FCMGE (register)
1 0x 11101 FACGE
1 1x 11010 FABD
1 1x 11100 FCMGT (register)
1 1x 11101 FACGT
Advanced SIMD scalar shift by immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 1 0 immh immb opcode 1 Rn Rd
Decode fields
Instruction Details
U immh opcode
!= 0000 00001 UNALLOCATED
!= 0000 101xx UNALLOCATED
!= 0000 110xx UNALLOCATED
0000 UNALLOCATED
0 != 0000 00000 SSHR
0 != 0000 00010 SSRA
0 != 0000 00100 SRSHR
0 != 0000 00110 SRSRA
Page 2513
Decode fields
Instruction Details
U immh opcode
0 != 0000 01010 SHL
0 != 0000 01110 SQSHL (immediate)
0 != 0000 10010 SQSHRN, SQSHRN2
0 != 0000 10011 SQRSHRN, SQRSHRN2
0 != 0000 11100 SCVTF (vector, fixed-point)
0 != 0000 11111 FCVTZS (vector, fixed-point)
1 != 0000 00000 USHR
1 != 0000 00010 USRA
1 != 0000 00100 URSHR
1 != 0000 00110 URSRA
1 != 0000 01000 SRI
1 != 0000 01010 SLI
1 != 0000 01100 SQSHLU
1 != 0000 01110 UQSHL (immediate)
1 != 0000 10000 SQSHRUN, SQSHRUN2
1 != 0000 10001 SQRSHRUN, SQRSHRUN2
1 != 0000 10010 UQSHRN, UQSHRN2
1 != 0000 10011 UQRSHRN, UQRSHRN2
1 != 0000 11100 UCVTF (vector, fixed-point)
1 != 0000 11111 FCVTZU (vector, fixed-point)
Advanced SIMD scalar x indexed element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 U 1 1 1 1 1 size L M Rm opcode H 0 Rn Rd
Decode fields
U size opcode
0000 UNALLOCATED -
0010 UNALLOCATED -
0100 UNALLOCATED -
0110 UNALLOCATED -
1000 UNALLOCATED -
1010 UNALLOCATED -
1110 UNALLOCATED -
0 0011 SQDMLAL, SQDMLAL2 (by element) -
0 0111 SQDMLSL, SQDMLSL2 (by element) -
0 1011 SQDMULL, SQDMULL2 (by element) -
0 1100 SQDMULH (by element) -
0 1101 SQRDMULH (by element) -
Page 2514
Decode fields
U size opcode
0 00 0001 FMLA (by element) — half-precision ARMv8.2-FP16
0 00 0101 FMLS (by element) — half-precision ARMv8.2-FP16
0 00 1001 FMUL (by element) — half-precision ARMv8.2-FP16
0 1x 0001 FMLA (by element) — single-precision and double-precision -
0 1x 0101 FMLS (by element) — single-precision and double-precision -
0 1x 1001 FMUL (by element) — single-precision and double-precision -
1 1101 SQRDMLAH (by element) ARMv8.1-RDMA
1 1111 SQRDMLSH (by element) ARMv8.1-RDMA
1 00 1001 FMULX (by element) — half-precision ARMv8.2-FP16
1 1x 0001 UNALLOCATED -
1 1x 1001 FMULX (by element) — single-precision and double-precision -
Advanced SIMD table lookup
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 op2 0 Rm 0 len op 0 0 Rn Rd
Decode fields
Instruction Details
op2 len op
x1 UNALLOCATED
00 00 0 TBL — single register table
00 00 1 TBX — single register table
00 01 0 TBL — two register table
00 01 1 TBX — two register table
00 10 0 TBL — three register table
00 10 1 TBX — three register table
00 11 0 TBL — four register table
00 11 1 TBX — four register table
1x UNALLOCATED
Advanced SIMD permute
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 opcode 1 0 Rn Rd
Decode fields
Instruction Details
opcode
000 UNALLOCATED
Page 2515
Decode fields
Instruction Details
opcode
001 UZP1
010 TRN1
011 ZIP1
100 UNALLOCATED
101 UZP2
110 TRN2
111 ZIP2
Advanced SIMD extract
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 op2 0 Rm 0 imm4 0 Rn Rd
Decode fields
Instruction Details
op2
x1 UNALLOCATED
00 EXT
1x UNALLOCATED
Advanced SIMD copy
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd
Decode fields
Instruction Details
Q op imm5 imm4
x0000 UNALLOCATED
0 0000 DUP (element)
0 0001 DUP (general)
0 0010 UNALLOCATED
0 0100 UNALLOCATED
0 0110 UNALLOCATED
0 1xxx UNALLOCATED
0 0 0101 SMOV
0 0 0111 UMOV
0 1 UNALLOCATED
1 0 0011 INS (general)
1 0 0101 SMOV
1 0 x1000 0111 UMOV
1 1 INS (element)
Advanced SIMD three same (FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 a 1 0 Rm 0 0 opcode 1 Rn Rd
Page 2516
Decode fields
U a opcode
0 0 000 FMAXNM (vector) ARMv8.2-FP16
0 0 001 FMLA (vector) ARMv8.2-FP16
0 0 010 FADD (vector) ARMv8.2-FP16
0 0 011 FMULX ARMv8.2-FP16
0 0 100 FCMEQ (register) ARMv8.2-FP16
0 0 110 FMAX (vector) ARMv8.2-FP16
0 0 111 FRECPS ARMv8.2-FP16
0 1 000 FMINNM (vector) ARMv8.2-FP16
0 1 001 FMLS (vector) ARMv8.2-FP16
0 1 010 FSUB (vector) ARMv8.2-FP16
0 1 110 FMIN (vector) ARMv8.2-FP16
0 1 111 FRSQRTS ARMv8.2-FP16
1 0 000 FMAXNMP (vector) ARMv8.2-FP16
1 0 010 FADDP (vector) ARMv8.2-FP16
1 0 011 FMUL (vector) ARMv8.2-FP16
1 0 100 FCMGE (register) ARMv8.2-FP16
1 0 101 FACGE ARMv8.2-FP16
1 0 110 FMAXP (vector) ARMv8.2-FP16
1 0 111 FDIV (vector) ARMv8.2-FP16
1 1 000 FMINNMP (vector) ARMv8.2-FP16
1 1 010 FABD ARMv8.2-FP16
1 1 100 FCMGT (register) ARMv8.2-FP16
1 1 101 FACGT ARMv8.2-FP16
1 1 110 FMINP (vector) ARMv8.2-FP16
Advanced SIMD two-register miscellaneous (FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 a 1 1 1 1 0 0 opcode 1 0 Rn Rd
Decode fields
U a opcode
00xxx UNALLOCATED -
010xx UNALLOCATED -
10xxx UNALLOCATED -
11110 UNALLOCATED -
Page 2517
Decode fields
U a opcode
0 0 11000 FRINTN (vector) ARMv8.2-FP16
0 0 11001 FRINTM (vector) ARMv8.2-FP16
0 0 11010 FCVTNS (vector) ARMv8.2-FP16
0 0 11011 FCVTMS (vector) ARMv8.2-FP16
0 0 11100 FCVTAS (vector) ARMv8.2-FP16
0 0 11101 SCVTF (vector, integer) ARMv8.2-FP16
0 1 01100 FCMGT (zero) ARMv8.2-FP16
0 1 01101 FCMEQ (zero) ARMv8.2-FP16
0 1 01110 FCMLT (zero) ARMv8.2-FP16
0 1 01111 FABS (vector) ARMv8.2-FP16
0 1 11000 FRINTP (vector) ARMv8.2-FP16
0 1 11001 FRINTZ (vector) ARMv8.2-FP16
0 1 11010 FCVTPS (vector) ARMv8.2-FP16
0 1 11011 FCVTZS (vector, integer) ARMv8.2-FP16
0 1 11101 FRECPE ARMv8.2-FP16
1 0 11000 FRINTA (vector) ARMv8.2-FP16
1 0 11001 FRINTX (vector) ARMv8.2-FP16
1 0 11010 FCVTNU (vector) ARMv8.2-FP16
1 0 11011 FCVTMU (vector) ARMv8.2-FP16
1 0 11100 FCVTAU (vector) ARMv8.2-FP16
1 0 11101 UCVTF (vector, integer) ARMv8.2-FP16
1 1 01100 FCMGE (zero) ARMv8.2-FP16
1 1 01101 FCMLE (zero) ARMv8.2-FP16
1 1 01111 FNEG (vector) ARMv8.2-FP16
1 1 11001 FRINTI (vector) ARMv8.2-FP16
1 1 11010 FCVTPU (vector) ARMv8.2-FP16
1 1 11011 FCVTZU (vector, integer) ARMv8.2-FP16
1 1 11101 FRSQRTE ARMv8.2-FP16
1 1 11111 FSQRT (vector) ARMv8.2-FP16
Advanced SIMD three-register extension
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 size 0 Rm 1 opcode 1 Rn Rd
Decode fields
Q U size opcode
0 0010 SDOT (vector) ARMv8.2-DotProd
0 1xxx UNALLOCATED -
0 10 0011 USDOT (vector) ARMv8.2-I8MM
Page 2518
Decode fields
Q U size opcode
1 0000 SQRDMLAH (vector) ARMv8.1-RDMA
1 0001 SQRDMLSH (vector) ARMv8.1-RDMA
1 0010 UDOT (vector) ARMv8.2-DotProd
1 10xx FCMLA ARMv8.3-CompNum
1 11x0 FCADD ARMv8.3-CompNum
1 01 1111 BFDOT (vector) ARMv8.2-BF16
1 11 1111 BFMLALB, BFMLALT (vector) ARMv8.2-BF16
1 0x 01xx UNALLOCATED -
1 1x 011x UNALLOCATED -
1 0 10 0100 SMMLA (vector) ARMv8.2-I8MM
1 0 10 0101 USMMLA (vector) ARMv8.2-I8MM
1 1 01 1101 BFMMLA ARMv8.2-BF16
1 1 10 0100 UMMLA (vector) ARMv8.2-I8MM
Advanced SIMD two-register miscellaneous
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 size 1 0 0 0 0 opcode 1 0 Rn Rd
Decode fields
U size opcode
1000x UNALLOCATED -
10101 UNALLOCATED -
0x 011xx UNALLOCATED -
0 00000 REV64 -
0 00001 REV16 (vector) -
0 00010 SADDLP -
0 00011 SUQADD -
0 00100 CLS (vector) -
0 00101 CNT -
0 00110 SADALP -
0 00111 SQABS -
0 01000 CMGT (zero) -
0 01001 CMEQ (zero) -
0 01010 CMLT (zero) -
0 01011 ABS -
Page 2519
Decode fields
U size opcode
0 10010 XTN, XTN2 -
0 10100 SQXTN, SQXTN2 -
0 0x 10110 FCVTN, FCVTN2 -
0 0x 10111 FCVTL, FCVTL2 -
0 0x 11000 FRINTN (vector) -
0 0x 11001 FRINTM (vector) -
0 0x 11010 FCVTNS (vector) -
0 0x 11011 FCVTMS (vector) -
0 0x 11100 FCVTAS (vector) -
0 0x 11101 SCVTF (vector, integer) -
0 0x 11110 FRINT32Z (vector) ARMv8.5-FRINT
0 0x 11111 FRINT64Z (vector) ARMv8.5-FRINT
0 1x 01100 FCMGT (zero) -
0 1x 01101 FCMEQ (zero) -
0 1x 01110 FCMLT (zero) -
0 1x 01111 FABS (vector) -
0 1x 11000 FRINTP (vector) -
0 1x 11001 FRINTZ (vector) -
0 1x 11010 FCVTPS (vector) -
0 1x 11011 FCVTZS (vector, integer) -
0 1x 11100 URECPE -
0 1x 11101 FRECPE -
0 10 10110 BFCVTN, BFCVTN2 ARMv8.2-BF16
1 00000 REV32 (vector) -
1 00010 UADDLP -
1 00011 USQADD -
1 00100 CLZ (vector) -
1 00110 UADALP -
1 00111 SQNEG -
1 01000 CMGE (zero) -
1 01001 CMLE (zero) -
1 01011 NEG (vector) -
1 10010 SQXTUN, SQXTUN2 -
1 10011 SHLL, SHLL2 -
1 10100 UQXTN, UQXTN2 -
1 0x 10110 FCVTXN, FCVTXN2 -
1 0x 11000 FRINTA (vector) -
1 0x 11001 FRINTX (vector) -
1 0x 11010 FCVTNU (vector) -
1 0x 11011 FCVTMU (vector) -
1 0x 11100 FCVTAU (vector) -
1 0x 11101 UCVTF (vector, integer) -
Page 2520
Decode fields
U size opcode
1 0x 11110 FRINT32X (vector) ARMv8.5-FRINT
1 0x 11111 FRINT64X (vector) ARMv8.5-FRINT
1 00 00101 NOT -
1 01 00101 RBIT (vector) -
1 1x 01100 FCMGE (zero) -
1 1x 01101 FCMLE (zero) -
1 1x 01111 FNEG (vector) -
1 1x 11001 FRINTI (vector) -
1 1x 11010 FCVTPU (vector) -
1 1x 11011 FCVTZU (vector, integer) -
1 1x 11100 URSQRTE -
1 1x 11101 FRSQRTE -
1 1x 11111 FSQRT (vector) -
Advanced SIMD across lanes
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 size 1 1 0 0 0 opcode 1 0 Rn Rd
Decode fields
U size opcode
0000x UNALLOCATED -
00010 UNALLOCATED -
001xx UNALLOCATED -
0100x UNALLOCATED -
01011 UNALLOCATED -
01101 UNALLOCATED -
01110 UNALLOCATED -
10xxx UNALLOCATED -
1100x UNALLOCATED -
111xx UNALLOCATED -
0 00011 SADDLV -
0 01010 SMAXV -
0 11010 SMINV -
0 11011 ADDV -
0 00 01100 FMAXNMV — half-precision ARMv8.2-FP16
0 00 01111 FMAXV — half-precision ARMv8.2-FP16
0 10 01100 FMINNMV — half-precision ARMv8.2-FP16
0 10 01111 FMINV — half-precision ARMv8.2-FP16
Page 2521
Decode fields
U size opcode
1 00011 UADDLV -
1 01010 UMAXV -
1 11010 UMINV -
1 0x 01100 FMAXNMV — single-precision and double-precision -
1 0x 01111 FMAXV — single-precision and double-precision -
1 1x 01100 FMINNMV — single-precision and double-precision -
1 1x 01111 FMINV — single-precision and double-precision -
Advanced SIMD three different
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 size 1 Rm opcode 0 0 Rn Rd
Decode fields
Instruction Details
U opcode
1111 UNALLOCATED
0 0000 SADDL, SADDL2
0 0001 SADDW, SADDW2
0 0010 SSUBL, SSUBL2
0 0011 SSUBW, SSUBW2
0 0100 ADDHN, ADDHN2
0 0101 SABAL, SABAL2
0 0110 SUBHN, SUBHN2
0 0111 SABDL, SABDL2
0 1000 SMLAL, SMLAL2 (vector)
0 1001 SQDMLAL, SQDMLAL2 (vector)
0 1010 SMLSL, SMLSL2 (vector)
0 1011 SQDMLSL, SQDMLSL2 (vector)
0 1100 SMULL, SMULL2 (vector)
0 1101 SQDMULL, SQDMULL2 (vector)
0 1110 PMULL, PMULL2
1 0000 UADDL, UADDL2
1 0001 UADDW, UADDW2
1 0010 USUBL, USUBL2
1 0011 USUBW, USUBW2
1 0100 RADDHN, RADDHN2
1 0101 UABAL, UABAL2
1 0110 RSUBHN, RSUBHN2
1 0111 UABDL, UABDL2
1 1000 UMLAL, UMLAL2 (vector)
1 1001 UNALLOCATED
1 1010 UMLSL, UMLSL2 (vector)
1 1011 UNALLOCATED
1 1100 UMULL, UMULL2 (vector)
1 1101 UNALLOCATED
1 1110 UNALLOCATED
Page 2522
Advanced SIMD three same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 0 size 1 Rm opcode 1 Rn Rd
Decode fields
U size opcode
0 00000 SHADD -
0 00001 SQADD -
0 00010 SRHADD -
0 00100 SHSUB -
0 00101 SQSUB -
0 00110 CMGT (register) -
0 00111 CMGE (register) -
0 01000 SSHL -
0 01001 SQSHL (register) -
0 01010 SRSHL -
0 01011 SQRSHL -
0 01100 SMAX -
0 01101 SMIN -
0 01110 SABD -
0 01111 SABA -
0 10000 ADD (vector) -
0 10001 CMTST -
0 10010 MLA (vector) -
0 10011 MUL (vector) -
0 10100 SMAXP -
0 10101 SMINP -
0 10110 SQDMULH (vector) -
0 10111 ADDP (vector) -
0 0x 11000 FMAXNM (vector) -
0 0x 11001 FMLA (vector) -
0 0x 11010 FADD (vector) -
0 0x 11011 FMULX -
0 0x 11100 FCMEQ (register) -
0 0x 11110 FMAX (vector) -
0 0x 11111 FRECPS -
0 00 00011 AND (vector) -
0 00 11101 FMLAL, FMLAL2 (vector) — FMLAL ARMv8.2-FHM
0 01 00011 BIC (vector, register) -
0 1x 11000 FMINNM (vector) -
0 1x 11001 FMLS (vector) -
0 1x 11010 FSUB (vector) -
0 1x 11110 FMIN (vector) -
0 1x 11111 FRSQRTS -
0 10 00011 ORR (vector, register) -
Page 2523
Decode fields
U size opcode
0 10 11101 FMLSL, FMLSL2 (vector) — FMLSL ARMv8.2-FHM
0 11 00011 ORN (vector) -
1 00000 UHADD -
1 00001 UQADD -
1 00010 URHADD -
1 00100 UHSUB -
1 00101 UQSUB -
1 00110 CMHI (register) -
1 00111 CMHS (register) -
1 01000 USHL -
1 01001 UQSHL (register) -
1 01010 URSHL -
1 01011 UQRSHL -
1 01100 UMAX -
1 01101 UMIN -
1 01110 UABD -
1 01111 UABA -
1 10000 SUB (vector) -
1 10001 CMEQ (register) -
1 10010 MLS (vector) -
1 10011 PMUL -
1 10100 UMAXP -
1 10101 UMINP -
1 10110 SQRDMULH (vector) -
1 0x 11000 FMAXNMP (vector) -
1 0x 11010 FADDP (vector) -
1 0x 11011 FMUL (vector) -
1 0x 11100 FCMGE (register) -
1 0x 11101 FACGE -
1 0x 11110 FMAXP (vector) -
1 0x 11111 FDIV (vector) -
1 00 00011 EOR (vector) -
1 00 11001 FMLAL, FMLAL2 (vector) — FMLAL2 ARMv8.2-FHM
1 01 00011 BSL -
1 1x 11000 FMINNMP (vector) -
1 1x 11010 FABD -
1 1x 11100 FCMGT (register) -
1 1x 11101 FACGT -
1 1x 11110 FMINP (vector) -
1 10 00011 BIT -
1 10 11001 FMLSL, FMLSL2 (vector) — FMLSL2 ARMv8.2-FHM
1 11 00011 BIF -
Page 2524
Decode fields
U size opcode
Advanced SIMD modified immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c cmode o2 1 d e f g h Rd
Decode fields
Q op cmode o2
0 0xxx 1 UNALLOCATED -
0 0xx0 0 MOVI — 32-bit shifted immediate -
0 0xx1 0 ORR (vector, immediate) — 32-bit -
0 10xx 1 UNALLOCATED -
0 10x0 0 MOVI — 16-bit shifted immediate -
0 10x1 0 ORR (vector, immediate) — 16-bit -
0 110x 0 MOVI — 32-bit shifting ones -
0 1110 0 MOVI — 8-bit -
0 1111 0 FMOV (vector, immediate) — single-precision -
0 1111 1 FMOV (vector, immediate) — half-precision ARMv8.2-FP16
1 1 UNALLOCATED -
1 0xx0 0 MVNI — 32-bit shifted immediate -
1 0xx1 0 BIC (vector, immediate) — 32-bit -
1 10x0 0 MVNI — 16-bit shifted immediate -
1 10x1 0 BIC (vector, immediate) — 16-bit -
1 110x 0 MVNI — 32-bit shifting ones -
0 1 1110 0 MOVI — 64-bit scalar -
1 1 1110 0 MOVI — 64-bit vector -
1 1 1111 0 FMOV (vector, immediate) — double-precision -
Advanced SIMD shift by immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 1 0 != 0000 immb opcode 1 Rn Rd
immh
The following constraints also apply to this encoding: immh != 0000 && immh != 0000
Decode fields
Instruction Details
U opcode
00001 UNALLOCATED
00011 UNALLOCATED
00101 UNALLOCATED
00111 UNALLOCATED
01001 UNALLOCATED
Page 2525
Decode fields
Instruction Details
U opcode
01011 UNALLOCATED
01101 UNALLOCATED
01111 UNALLOCATED
10101 UNALLOCATED
1011x UNALLOCATED
110xx UNALLOCATED
11101 UNALLOCATED
11110 UNALLOCATED
0 00000 SSHR
0 00010 SSRA
0 00100 SRSHR
0 00110 SRSRA
0 01000 UNALLOCATED
0 01010 SHL
0 01100 UNALLOCATED
0 01110 SQSHL (immediate)
0 10000 SHRN, SHRN2
0 10001 RSHRN, RSHRN2
0 10010 SQSHRN, SQSHRN2
0 10011 SQRSHRN, SQRSHRN2
0 10100 SSHLL, SSHLL2
0 11100 SCVTF (vector, fixed-point)
0 11111 FCVTZS (vector, fixed-point)
1 00000 USHR
1 00010 USRA
1 00100 URSHR
1 00110 URSRA
1 01000 SRI
1 01010 SLI
1 01100 SQSHLU
1 01110 UQSHL (immediate)
1 10000 SQSHRUN, SQSHRUN2
1 10001 SQRSHRUN, SQRSHRUN2
1 10010 UQSHRN, UQSHRN2
1 10011 UQRSHRN, UQRSHRN2
1 10100 USHLL, USHLL2
1 11100 UCVTF (vector, fixed-point)
1 11111 FCVTZU (vector, fixed-point)
Advanced SIMD vector x indexed element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q U 0 1 1 1 1 size L M Rm opcode H 0 Rn Rd
Decode fields
U size opcode
Page 2526
Decode fields
U size opcode
0 0010 SMLAL, SMLAL2 (by element) -
0 0011 SQDMLAL, SQDMLAL2 (by element) -
0 0110 SMLSL, SMLSL2 (by element) -
0 0111 SQDMLSL, SQDMLSL2 (by element) -
0 1000 MUL (by element) -
0 1010 SMULL, SMULL2 (by element) -
0 1011 SQDMULL, SQDMULL2 (by element) -
0 1100 SQDMULH (by element) -
0 1101 SQRDMULH (by element) -
0 1110 SDOT (by element) ARMv8.2-DotProd
0 00 0001 FMLA (by element) — half-precision ARMv8.2-FP16
0 00 0101 FMLS (by element) — half-precision ARMv8.2-FP16
0 00 1001 FMUL (by element) — half-precision ARMv8.2-FP16
0 00 1111 SUDOT (by element) ARMv8.2-I8MM
0 01 1111 BFDOT (by element) ARMv8.2-BF16
0 1x 0001 FMLA (by element) — single-precision and double-precision -
0 1x 0101 FMLS (by element) — single-precision and double-precision -
0 1x 1001 FMUL (by element) — single-precision and double-precision -
0 10 0000 FMLAL, FMLAL2 (by element) — FMLAL ARMv8.2-FHM
0 10 0100 FMLSL, FMLSL2 (by element) — FMLSL ARMv8.2-FHM
0 10 1111 USDOT (by element) ARMv8.2-I8MM
0 11 1111 BFMLALB, BFMLALT (by element) ARMv8.2-BF16
1 0000 MLA (by element) -
1 0010 UMLAL, UMLAL2 (by element) -
1 0100 MLS (by element) -
1 0110 UMLSL, UMLSL2 (by element) -
1 1010 UMULL, UMULL2 (by element) -
1 1101 SQRDMLAH (by element) ARMv8.1-RDMA
1 1110 UDOT (by element) ARMv8.2-DotProd
1 1111 SQRDMLSH (by element) ARMv8.1-RDMA
1 00 1001 FMULX (by element) — half-precision ARMv8.2-FP16
1 01 0xx1 FCMLA (by element) ARMv8.3-CompNum
1 1x 1001 FMULX (by element) — single-precision and double-precision -
1 10 0xx1 FCMLA (by element) ARMv8.3-CompNum
Page 2527
Decode fields
U size opcode
1 10 1000 FMLAL, FMLAL2 (by element) — FMLAL2 ARMv8.2-FHM
1 10 1100 FMLSL, FMLSL2 (by element) — FMLSL2 ARMv8.2-FHM
Cryptographic three-register, imm2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 opcode Rn Rd
Decode fields
opcode
00 SM3TT1A ARMv8.2-SM3
01 SM3TT1B ARMv8.2-SM3
10 SM3TT2A ARMv8.2-SM3
11 SM3TT2B ARMv8.2-SM3
Cryptographic three-register SHA 512
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 O 0 0 opcode Rn Rd
Decode fields
O opcode
0 00 SHA512H ARMv8.2-SHA2
0 01 SHA512H2 ARMv8.2-SHA2
0 10 SHA512SU1 ARMv8.2-SHA2
0 11 RAX1 ARMv8.2-SHA3
1 00 SM3PARTW1 ARMv8.2-SM3
1 01 SM3PARTW2 ARMv8.2-SM3
1 10 SM4EKEY ARMv8.2-SM4
1 11 UNALLOCATED -
Cryptographic four-register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 Op0 Rm 0 Ra Rn Rd
Decode fields
Op0
00 EOR3 ARMv8.2-SHA3
01 BCAX ARMv8.2-SHA3
Page 2528
Decode fields
Op0
10 SM3SS1 ARMv8.2-SM3
11 UNALLOCATED -
Cryptographic two-register SHA 512
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 opcode Rn Rd
Decode fields
opcode
00 SHA512SU0 ARMv8.2-SHA2
01 SM4E ARMv8.2-SM4
1x UNALLOCATED -
Conversion between floating-point and fixed-point
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 S 1 1 1 1 0 ptype 0 rmode opcode scale Rn Rd

Instruction Details
sf S ptype rmode opcode scale Version
1xx UNALLOCATED -
x0 00x UNALLOCATED -
0x 00x UNALLOCATED -
10 UNALLOCATED -
1 UNALLOCATED -
0 0xxxxx UNALLOCATED -
0 0 00 00 010 SCVTF (scalar, fixed-point) — 32-bit to -
single-precision
0 0 00 00 011 UCVTF (scalar, fixed-point) — 32-bit to -
single-precision
0 0 00 11 000 FCVTZS (scalar, fixed-point) — single- -
precision to 32-bit
0 0 00 11 001 FCVTZU (scalar, fixed-point) — single- -
precision to 32-bit
double-precision
double-precision
0 0 01 11 000 FCVTZS (scalar, fixed-point) — double- -
precision to 32-bit
0 0 01 11 001 FCVTZU (scalar, fixed-point) — double- -
precision to 32-bit
0 0 11 00 010 SCVTF (scalar, fixed-point) — 32-bit to ARMv8.2-FP16
half-precision
0 0 11 00 011 UCVTF (scalar, fixed-point) — 32-bit to ARMv8.2-FP16
half-precision
Page 2529

Instruction Details
sf S ptype rmode opcode scale Version
0 0 11 11 000 FCVTZS (scalar, fixed-point) — half- ARMv8.2-FP16
precision to 32-bit
0 0 11 11 001 FCVTZU (scalar, fixed-point) — half- ARMv8.2-FP16
precision to 32-bit
single-precision
single-precision
1 0 00 11 000 FCVTZS (scalar, fixed-point) — single- -
precision to 64-bit
1 0 00 11 001 FCVTZU (scalar, fixed-point) — single- -
precision to 64-bit
double-precision
double-precision
1 0 01 11 000 FCVTZS (scalar, fixed-point) — double- -
precision to 64-bit
1 0 01 11 001 FCVTZU (scalar, fixed-point) — double- -
precision to 64-bit
1 0 11 00 010 SCVTF (scalar, fixed-point) — 64-bit to ARMv8.2-FP16
half-precision
1 0 11 00 011 UCVTF (scalar, fixed-point) — 64-bit to ARMv8.2-FP16
half-precision
1 0 11 11 000 FCVTZS (scalar, fixed-point) — half- ARMv8.2-FP16
precision to 64-bit
1 0 11 11 001 FCVTZU (scalar, fixed-point) — half- ARMv8.2-FP16
precision to 64-bit
Conversion between floating-point and integer
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 S 1 1 1 1 0 ptype 1 rmode opcode 0 0 0 0 0 0 Rn Rd

Instruction Details
sf S ptype rmode opcode Version
1 UNALLOCATED -
0 0 00 x1 11x UNALLOCATED -
0 0 00 00 000 FCVTNS (scalar) — single-precision to 32-bit -
0 0 00 00 001 FCVTNU (scalar) — single-precision to 32-bit -
0 0 00 00 010 SCVTF (scalar, integer) — 32-bit to single- -
precision
0 0 00 00 011 UCVTF (scalar, integer) — 32-bit to single- -
precision
0 0 00 00 100 FCVTAS (scalar) — single-precision to 32-bit -
0 0 00 00 101 FCVTAU (scalar) — single-precision to 32-bit -
Page 2530

Instruction Details
0 0 00 00 110 FMOV (general) — single-precision to 32-bit -
0 0 00 00 111 FMOV (general) — 32-bit to single-precision -
0 0 00 01 000 FCVTPS (scalar) — single-precision to 32-bit -
0 0 00 01 001 FCVTPU (scalar) — single-precision to 32-bit -
0 0 00 1x 11x UNALLOCATED -
0 0 00 10 000 FCVTMS (scalar) — single-precision to 32-bit -
0 0 00 10 001 FCVTMU (scalar) — single-precision to 32-bit -
0 0 00 11 000 FCVTZS (scalar, integer) — single-precision to -
32-bit
0 0 00 11 001 FCVTZU (scalar, integer) — single-precision to -
32-bit
0 0 01 00 000 FCVTNS (scalar) — double-precision to 32-bit -
0 0 01 00 001 FCVTNU (scalar) — double-precision to 32-bit -
0 0 01 00 010 SCVTF (scalar, integer) — 32-bit to double- -
precision
0 0 01 00 011 UCVTF (scalar, integer) — 32-bit to double- -
precision
0 0 01 00 100 FCVTAS (scalar) — double-precision to 32-bit -
0 0 01 00 101 FCVTAU (scalar) — double-precision to 32-bit -
0 0 01 01 000 FCVTPS (scalar) — double-precision to 32-bit -
0 0 01 01 001 FCVTPU (scalar) — double-precision to 32-bit -
0 0 01 10 000 FCVTMS (scalar) — double-precision to 32-bit -
0 0 01 10 001 FCVTMU (scalar) — double-precision to 32-bit -
0 0 01 10 11x UNALLOCATED -
0 0 01 11 000 FCVTZS (scalar, integer) — double-precision to -
32-bit
0 0 01 11 001 FCVTZU (scalar, integer) — double-precision to -
32-bit
0 0 01 11 110 FJCVTZS ARMv8.3-JConv
0 0 11 00 000 FCVTNS (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 00 001 FCVTNU (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 00 010 SCVTF (scalar, integer) — 32-bit to half- ARMv8.2-FP16
precision
0 0 11 00 011 UCVTF (scalar, integer) — 32-bit to half- ARMv8.2-FP16
precision
0 0 11 00 100 FCVTAS (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 00 101 FCVTAU (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 00 110 FMOV (general) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 00 111 FMOV (general) — 32-bit to half-precision ARMv8.2-FP16
0 0 11 01 000 FCVTPS (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 01 001 FCVTPU (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 10 000 FCVTMS (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 10 001 FCVTMU (scalar) — half-precision to 32-bit ARMv8.2-FP16
0 0 11 11 000 FCVTZS (scalar, integer) — half-precision to ARMv8.2-FP16
32-bit
0 0 11 11 001 FCVTZU (scalar, integer) — half-precision to ARMv8.2-FP16
32-bit
Page 2531

Instruction Details
1 0 00 00 000 FCVTNS (scalar) — single-precision to 64-bit -
1 0 00 00 001 FCVTNU (scalar) — single-precision to 64-bit -
1 0 00 00 010 SCVTF (scalar, integer) — 64-bit to single- -
precision
1 0 00 00 011 UCVTF (scalar, integer) — 64-bit to single- -
precision
1 0 00 00 100 FCVTAS (scalar) — single-precision to 64-bit -
1 0 00 00 101 FCVTAU (scalar) — single-precision to 64-bit -
1 0 00 01 000 FCVTPS (scalar) — single-precision to 64-bit -
1 0 00 01 001 FCVTPU (scalar) — single-precision to 64-bit -
1 0 00 10 000 FCVTMS (scalar) — single-precision to 64-bit -
1 0 00 10 001 FCVTMU (scalar) — single-precision to 64-bit -
1 0 00 11 000 FCVTZS (scalar, integer) — single-precision to -
64-bit
1 0 00 11 001 FCVTZU (scalar, integer) — single-precision to -
64-bit
1 0 01 00 000 FCVTNS (scalar) — double-precision to 64-bit -
1 0 01 00 001 FCVTNU (scalar) — double-precision to 64-bit -
1 0 01 00 010 SCVTF (scalar, integer) — 64-bit to double- -
precision
1 0 01 00 011 UCVTF (scalar, integer) — 64-bit to double- -
precision
1 0 01 00 100 FCVTAS (scalar) — double-precision to 64-bit -
1 0 01 00 101 FCVTAU (scalar) — double-precision to 64-bit -
1 0 01 00 110 FMOV (general) — double-precision to 64-bit -
1 0 01 00 111 FMOV (general) — 64-bit to double-precision -
1 0 01 01 000 FCVTPS (scalar) — double-precision to 64-bit -
1 0 01 01 001 FCVTPU (scalar) — double-precision to 64-bit -
1 0 01 10 000 FCVTMS (scalar) — double-precision to 64-bit -
1 0 01 10 001 FCVTMU (scalar) — double-precision to 64-bit -
1 0 01 11 000 FCVTZS (scalar, integer) — double-precision to -
64-bit
1 0 01 11 001 FCVTZU (scalar, integer) — double-precision to -
64-bit
1 0 10 01 110 FMOV (general) — top half of 128-bit to 64-bit -
1 0 10 01 111 FMOV (general) — 64-bit to top half of 128-bit -
1 0 11 00 000 FCVTNS (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 00 001 FCVTNU (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 00 010 SCVTF (scalar, integer) — 64-bit to half- ARMv8.2-FP16
precision
1 0 11 00 011 UCVTF (scalar, integer) — 64-bit to half- ARMv8.2-FP16
precision
1 0 11 00 100 FCVTAS (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 00 101 FCVTAU (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 00 110 FMOV (general) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 00 111 FMOV (general) — 64-bit to half-precision ARMv8.2-FP16
1 0 11 01 000 FCVTPS (scalar) — half-precision to 64-bit ARMv8.2-FP16
Page 2532

Instruction Details
1 0 11 01 001 FCVTPU (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 10 000 FCVTMS (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 10 001 FCVTMU (scalar) — half-precision to 64-bit ARMv8.2-FP16
1 0 11 11 000 FCVTZS (scalar, integer) — half-precision to ARMv8.2-FP16
64-bit
1 0 11 11 001 FCVTZU (scalar, integer) — half-precision to ARMv8.2-FP16
64-bit
Floating-point data-processing (1 source)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 opcode 1 0 0 0 0 Rn Rd
Decode fields
M S ptype opcode
1 UNALLOCATED -
0 0 00 000000 FMOV (register) — single-precision -
0 0 00 000001 FABS (scalar) — single-precision -
0 0 00 000010 FNEG (scalar) — single-precision -
0 0 00 000011 FSQRT (scalar) — single-precision -
0 0 00 000101 FCVT — single-precision to double-precision -
0 0 00 000111 FCVT — single-precision to half-precision -
0 0 00 001000 FRINTN (scalar) — single-precision -
0 0 00 001001 FRINTP (scalar) — single-precision -
0 0 00 001010 FRINTM (scalar) — single-precision -
0 0 00 001011 FRINTZ (scalar) — single-precision -
0 0 00 001100 FRINTA (scalar) — single-precision -
0 0 00 001110 FRINTX (scalar) — single-precision -
0 0 00 001111 FRINTI (scalar) — single-precision -
0 0 00 010000 FRINT32Z (scalar) — single-precision ARMv8.5-FRINT
0 0 00 010001 FRINT32X (scalar) — single-precision ARMv8.5-FRINT
0 0 00 010010 FRINT64Z (scalar) — single-precision ARMv8.5-FRINT
0 0 00 010011 FRINT64X (scalar) — single-precision ARMv8.5-FRINT
0 0 01 000000 FMOV (register) — double-precision -
0 0 01 000001 FABS (scalar) — double-precision -
0 0 01 000010 FNEG (scalar) — double-precision -
0 0 01 000011 FSQRT (scalar) — double-precision -
0 0 01 000100 FCVT — double-precision to single-precision -
0 0 01 000110 BFCVT ARMv8.2-BF16
0 0 01 000111 FCVT — double-precision to half-precision -
Page 2533
Decode fields
M S ptype opcode
0 0 01 001000 FRINTN (scalar) — double-precision -
0 0 01 001001 FRINTP (scalar) — double-precision -
0 0 01 001010 FRINTM (scalar) — double-precision -
0 0 01 001011 FRINTZ (scalar) — double-precision -
0 0 01 001100 FRINTA (scalar) — double-precision -
0 0 01 001110 FRINTX (scalar) — double-precision -
0 0 01 001111 FRINTI (scalar) — double-precision -
0 0 01 010000 FRINT32Z (scalar) — double-precision ARMv8.5-FRINT
0 0 01 010001 FRINT32X (scalar) — double-precision ARMv8.5-FRINT
0 0 01 010010 FRINT64Z (scalar) — double-precision ARMv8.5-FRINT
0 0 01 010011 FRINT64X (scalar) — double-precision ARMv8.5-FRINT
0 0 10 0xxxxx UNALLOCATED -
0 0 11 000000 FMOV (register) — half-precision ARMv8.2-FP16
0 0 11 000001 FABS (scalar) — half-precision ARMv8.2-FP16
0 0 11 000010 FNEG (scalar) — half-precision ARMv8.2-FP16
0 0 11 000011 FSQRT (scalar) — half-precision ARMv8.2-FP16
0 0 11 000100 FCVT — half-precision to single-precision -
0 0 11 000101 FCVT — half-precision to double-precision -
0 0 11 001000 FRINTN (scalar) — half-precision ARMv8.2-FP16
0 0 11 001001 FRINTP (scalar) — half-precision ARMv8.2-FP16
0 0 11 001010 FRINTM (scalar) — half-precision ARMv8.2-FP16
0 0 11 001011 FRINTZ (scalar) — half-precision ARMv8.2-FP16
0 0 11 001100 FRINTA (scalar) — half-precision ARMv8.2-FP16
0 0 11 001110 FRINTX (scalar) — half-precision ARMv8.2-FP16
0 0 11 001111 FRINTI (scalar) — half-precision ARMv8.2-FP16
0 0 11 01xxxx UNALLOCATED -
1 UNALLOCATED -
Floating-point compare
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 Rm op 1 0 0 0 Rn opcode2
Decode fields
M S ptype op opcode2
xxxx1 UNALLOCATED -
xxx1x UNALLOCATED -
xx1xx UNALLOCATED -
x1 UNALLOCATED -
1x UNALLOCATED -
10 UNALLOCATED -
1 UNALLOCATED -
Page 2534
Decode fields
M S ptype op opcode2
0 0 00 00 00000 FCMP -
0 0 00 00 01000 FCMP -
0 0 00 00 10000 FCMPE -
0 0 00 00 11000 FCMPE -
0 0 01 00 00000 FCMP -
0 0 01 00 01000 FCMP -
0 0 01 00 10000 FCMPE -
0 0 01 00 11000 FCMPE -
0 0 11 00 00000 FCMP ARMv8.2-FP16
0 0 11 00 01000 FCMP ARMv8.2-FP16
0 0 11 00 10000 FCMPE ARMv8.2-FP16
0 0 11 00 11000 FCMPE ARMv8.2-FP16
1 UNALLOCATED -
Floating-point immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 imm8 1 0 0 imm5 Rd
Decode fields
M S ptype imm5
xxxx1 UNALLOCATED -
xxx1x UNALLOCATED -
xx1xx UNALLOCATED -
x1xxx UNALLOCATED -
1xxxx UNALLOCATED -
10 UNALLOCATED -
1 UNALLOCATED -
0 0 00 00000 FMOV (scalar, immediate) — single-precision -
0 0 01 00000 FMOV (scalar, immediate) — double-precision -
0 0 11 00000 FMOV (scalar, immediate) — half-precision ARMv8.2-FP16
1 UNALLOCATED -
Floating-point conditional compare
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 Rm cond 0 1 Rn op nzcv
Decode fields
M S ptype op
10 UNALLOCATED -
1 UNALLOCATED -
0 0 00 0 FCCMP — single-precision -
0 0 00 1 FCCMPE — single-precision -
0 0 01 0 FCCMP — double-precision -
0 0 01 1 FCCMPE — double-precision -
Page 2535
Decode fields
M S ptype op
0 0 11 0 FCCMP — half-precision ARMv8.2-FP16
0 0 11 1 FCCMPE — half-precision ARMv8.2-FP16
1 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 Rm opcode 1 0 Rn Rd
Decode fields
M S ptype opcode
1xx1 UNALLOCATED -
1x1x UNALLOCATED -
11xx UNALLOCATED -
10 UNALLOCATED -
1 UNALLOCATED -
0 0 00 0000 FMUL (scalar) — single-precision -
0 0 00 0001 FDIV (scalar) — single-precision -
0 0 00 0010 FADD (scalar) — single-precision -
0 0 00 0011 FSUB (scalar) — single-precision -
0 0 00 0100 FMAX (scalar) — single-precision -
0 0 00 0101 FMIN (scalar) — single-precision -
0 0 00 0110 FMAXNM (scalar) — single-precision -
0 0 00 0111 FMINNM (scalar) — single-precision -
0 0 00 1000 FNMUL (scalar) — single-precision -
0 0 01 0000 FMUL (scalar) — double-precision -
0 0 01 0001 FDIV (scalar) — double-precision -
0 0 01 0010 FADD (scalar) — double-precision -
0 0 01 0011 FSUB (scalar) — double-precision -
0 0 01 0100 FMAX (scalar) — double-precision -
0 0 01 0101 FMIN (scalar) — double-precision -
0 0 01 0110 FMAXNM (scalar) — double-precision -
0 0 01 0111 FMINNM (scalar) — double-precision -
0 0 01 1000 FNMUL (scalar) — double-precision -
0 0 11 0000 FMUL (scalar) — half-precision ARMv8.2-FP16
0 0 11 0001 FDIV (scalar) — half-precision ARMv8.2-FP16
0 0 11 0010 FADD (scalar) — half-precision ARMv8.2-FP16
0 0 11 0011 FSUB (scalar) — half-precision ARMv8.2-FP16
0 0 11 0100 FMAX (scalar) — half-precision ARMv8.2-FP16
0 0 11 0101 FMIN (scalar) — half-precision ARMv8.2-FP16
0 0 11 0110 FMAXNM (scalar) — half-precision ARMv8.2-FP16
0 0 11 0111 FMINNM (scalar) — half-precision ARMv8.2-FP16
0 0 11 1000 FNMUL (scalar) — half-precision ARMv8.2-FP16
1 UNALLOCATED -
Page 2536
Floating-point conditional select
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 0 ptype 1 Rm cond 1 1 Rn Rd
Decode fields
M S ptype
10 UNALLOCATED -
1 UNALLOCATED -
0 0 00 FCSEL — single-precision -
0 0 01 FCSEL — double-precision -
0 0 11 FCSEL — half-precision ARMv8.2-FP16
1 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 S 1 1 1 1 1 ptype o1 Rm o0 Ra Rn Rd
Decode fields
M S ptype o1 o0
10 UNALLOCATED -
1 UNALLOCATED -
0 0 00 0 0 FMADD — single-precision -
0 0 00 0 1 FMSUB — single-precision -
0 0 00 1 0 FNMADD — single-precision -
0 0 00 1 1 FNMSUB — single-precision -
0 0 01 0 0 FMADD — double-precision -
0 0 01 0 1 FMSUB — double-precision -
0 0 01 1 0 FNMADD — double-precision -
0 0 01 1 1 FNMSUB — double-precision -
0 0 11 0 0 FMADD — half-precision ARMv8.2-FP16
0 0 11 0 1 FMSUB — half-precision ARMv8.2-FP16
0 0 11 1 0 FNMADD — half-precision ARMv8.2-FP16
0 0 11 1 1 FNMSUB — half-precision ARMv8.2-FP16
1 UNALLOCATED -
Page 2537
Shared Pseudocode Functions
This page displays common pseudocode functions shared by many pages.
Pseudocodes
Library pseudocode for aarch32/debug/VCRMatch/AArch32.VCRMatch
// AArch32.VCRMatch()
// ==================
boolean AArch32.VCRMatch(bits(32) vaddress)
if UsingAArch32() && ELUsingAArch32(EL1) && IsZero(vaddress<1:0>) && PSTATE.EL != EL2 then

// Each bit position in this string corresponds to a bit in DBGVCR and an exception vector.
match_word = Zeros(32);
if vaddress<31:5> == ExcVectorBase()<31:5> then

if HaveEL(EL3) && !IsSecure() then
match_word<UInt(vaddress<4:2>) + 24> = '1'; // Non-secure vectors
else
match_word<UInt(vaddress<4:2>) + 0> = '1'; // Secure vectors (or no EL3)
if HaveEL(EL3) && ELUsingAArch32(EL3) && IsSecure() && vaddress<31:5> == MVBAR<31:5> then

match_word<UInt(vaddress<4:2>) + 8> = '1'; // Monitor vectors
// Mask out bits not corresponding to vectors.

if !HaveEL(EL3) then
mask = '00000000':'00000000':'00000000':'11011110'; // DBGVCR[31:8] are RES0
elsif !ELUsingAArch32(EL3) then
mask = '11011110':'00000000':'00000000':'11011110'; // DBGVCR[15:8] are RES0
else
mask = '11011110':'00000000':'11011100':'11011110';
match_word = match_word AND DBGVCR AND mask;

match = !IsZero(match_word);
// Check for UNPREDICTABLE case - match on Prefetch Abort and Data Abort vectors
if !IsZero(match_word<28:27,12:11,4:3>) && DebugTarget() == PSTATE.EL then
match = ConstrainUnpredictableBool(Unpredictable_VCMATCHDAPA);
else
match = FALSE;
return match;
Library pseudocode for aarch32/debug/authentication/

AArch32.SelfHostedSecurePrivilegedInvasiveDebugEnabled
// AArch32.SelfHostedSecurePrivilegedInvasiveDebugEnabled()
// ========================================================
boolean AArch32.SelfHostedSecurePrivilegedInvasiveDebugEnabled()
// The definition of this function is IMPLEMENTATION DEFINED.
// In the recommended interface, AArch32.SelfHostedSecurePrivilegedInvasiveDebugEnabled returns
// the state of the (DBGEN AND SPIDEN) signal.
if !HaveEL(EL3) && !IsSecure() then return FALSE;
return DBGEN == HIGH && SPIDEN == HIGH;
Shared Pseudocode Functions Page 2538

Library pseudocode for aarch32/debug/breakpoint/AArch32.BreakpointMatch
// AArch32.BreakpointMatch()
// =========================
// Breakpoint matching in an AArch32 translation regime.
(boolean,boolean) AArch32.BreakpointMatch(integer n, bits(32) vaddress, integer size)

assert ELUsingAArch32(S1TranslationRegime());
assert n <= UInt(DBGDIDR.BRPs);
enabled = DBGBCR[n].E == '1';

ispriv = PSTATE.EL != EL0;
linked = DBGBCR[n].BT == '0x01';
isbreakpnt = TRUE;
linked_to = FALSE;
state_match = AArch32.StateMatch(DBGBCR[n].SSC, DBGBCR[n].HMC, DBGBCR[n].PMC,

linked, DBGBCR[n].LBN, isbreakpnt, ispriv);
(value_match, value_mismatch) = AArch32.BreakpointValueMatch(n, vaddress, linked_to);
if size == 4 then // Check second halfword

// If the breakpoint address and BAS of an Address breakpoint match the address of the
// second halfword of an instruction, but not the address of the first halfword, it is
// CONSTRAINED UNPREDICTABLE whether or not this breakpoint generates a Breakpoint debug
// event.
(match_i, mismatch_i) = AArch32.BreakpointValueMatch(n, vaddress + 2, linked_to);
if !value_match && match_i then
value_match = ConstrainUnpredictableBool(Unpredictable_BPMATCHHALF);
if value_mismatch && !mismatch_i then
value_mismatch = ConstrainUnpredictableBool(Unpredictable_BPMISMATCHHALF);
if vaddress<1> == '1' && DBGBCR[n].BAS == '1111' then
// The above notwithstanding, if DBGBCR[n].BAS == '1111', then it is CONSTRAINED
// UNPREDICTABLE whether or not a Breakpoint debug event is generated for an instruction
// at the address DBGBVR[n]+2.
if value_match then value_match = ConstrainUnpredictableBool(Unpredictable_BPMATCHHALF);
if !value_mismatch then value_mismatch = ConstrainUnpredictableBool(Unpredictable_BPMISMATCHHALF)
match = value_match && state_match && enabled;

mismatch = value_mismatch && state_match && enabled;
return (match, mismatch);

Library pseudocode for aarch32/debug/breakpoint/AArch32.BreakpointValueMatch

// AArch32.BreakpointValueMatch()
// ==============================
// The first result is whether an Address Match or Context breakpoint is programmed on the
// instruction at "address". The second result is whether an Address Mismatch breakpoint is
// programmed on the instruction, that is, whether the instruction should be stepped.
(boolean,boolean) AArch32.BreakpointValueMatch(integer n, bits(32) vaddress, boolean linked_to)
// "n" is the identity of the breakpoint unit to match against.

// "vaddress" is the current instruction address, ignored if linked_to is TRUE and for Context
// matching breakpoints.
// "linked_to" is TRUE if this is a call from StateMatch for linking.
// If a non-existent breakpoint then it is CONSTRAINED UNPREDICTABLE whether this gives

// no match or the breakpoint is mapped to another UNKNOWN implemented breakpoint.
if n > UInt(DBGDIDR.BRPs) then
(c, n) = ConstrainUnpredictableInteger(0, UInt(DBGDIDR.BRPs), Unpredictable_BPNOTIMPL);
assert c IN {Constraint_DISABLED, Constraint_UNKNOWN};
if c == Constraint_DISABLED then return (FALSE,FALSE);
// If this breakpoint is not enabled, it cannot generate a match. (This could also happen on a
// call from StateMatch for linking).
if DBGBCR[n].E == '0' then return (FALSE,FALSE);
context_aware = (n >= UInt(DBGDIDR.BRPs) - UInt(DBGDIDR.CTX_CMPs));
// If BT is set to a reserved type, behaves either as disabled or as a not-reserved type.

dbgtype = DBGBCR[n].BT;
if ((dbgtype IN {'011x','11xx'} && !HaveVirtHostExt() && !HaveV82Debug()) || // Context matching

(dbgtype == '010x' && HaltOnBreakpointOrWatchpoint()) || // Address mismatch
(dbgtype != '0x0x' && !context_aware) || // Context matching
(dbgtype == '1xxx' && !HaveEL(EL2))) then // EL2 extension
(c, dbgtype) = ConstrainUnpredictableBits(Unpredictable_RESBPTYPE);
if c == Constraint_DISABLED then return (FALSE,FALSE);
// Otherwise the value returned by ConstrainUnpredictableBits must be a not-reserved value
// Determine what to compare against.

match_addr = (dbgtype == '0x0x');
mismatch = (dbgtype == '010x');
match_vmid = (dbgtype == '10xx');
match_cid1 = (dbgtype == 'xx1x');
match_cid2 = (dbgtype == '11xx');
linked = (dbgtype == 'xxx1');
// If this is a call from StateMatch, return FALSE if the breakpoint is not programmed for a
// VMID and/or context ID match, of if not context-aware. The above assertions mean that the
// code can just test for match_addr == TRUE to confirm all these things.
if linked_to && (!linked || match_addr) then return (FALSE,FALSE);
// If called from BreakpointMatch return FALSE for Linked context ID and/or VMID matches.
if !linked_to && linked && !match_addr then return (FALSE,FALSE);
// Do the comparison.
if match_addr then
byte = UInt(vaddress<1:0>);
assert byte IN {0,2}; // "vaddress" is halfword aligned
byte_select_match = (DBGBCR[n].BAS<byte> == '1');
BVR_match = vaddress<31:2> == DBGBVR[n]<31:2> && byte_select_match;
elsif match_cid1 then
BVR_match = (PSTATE.EL != EL2 && CONTEXTIDR == DBGBVR[n]<31:0>);
if match_vmid then
if ELUsingAArch32(EL2) then
vmid = ZeroExtend(VTTBR.VMID, 16);
bvr_vmid = ZeroExtend(DBGBXVR[n]<7:0>, 16);
elsif !Have16bitVMID() || VTCR_EL2.VS == '0' then
vmid = ZeroExtend(VTTBR_EL2.VMID<7:0>, 16);
bvr_vmid = ZeroExtend(DBGBXVR[n]<7:0>, 16);
else

vmid = VTTBR_EL2.VMID;
bvr_vmid = DBGBXVR[n]<15:0>;
BXVR_match = (PSTATE.EL IN {EL0, EL1} && EL2Enabled() &&
vmid == bvr_vmid);
BXVR_match = ((HaveVirtHostExt() || HaveV82Debug()) && EL2Enabled() &&
!ELUsingAArch32(EL2) &&
DBGBXVR[n]<31:0> == CONTEXTIDR_EL2);
bvr_match_valid = (match_addr || match_cid1);

bxvr_match_valid = (match_vmid || match_cid2);
match = (!bxvr_match_valid || BXVR_match) && (!bvr_match_valid || BVR_match);
return (match && !mismatch, !match && mismatch);

Library pseudocode for aarch32/debug/breakpoint/AArch32.StateMatch
// AArch32.StateMatch()
// ====================
// Determine whether a breakpoint or watchpoint is enabled in the current mode and state.
boolean AArch32.StateMatch(bits(2) SSC, bit HMC, bits(2) PxC, boolean linked, bits(4) LBN,
boolean isbreakpnt, boolean ispriv)
// "SSC", "HMC", "PxC" are the control fields from the DBGBCR[n] or DBGWCR[n] register.
// "linked" is TRUE if this is a linked breakpoint/watchpoint type.
// "LBN" is the linked breakpoint number from the DBGBCR[n] or DBGWCR[n] register.
// "isbreakpnt" is TRUE for breakpoints, FALSE for watchpoints.
// "ispriv" is valid for watchpoints, and selects between privileged and unprivileged accesses.
// If parameters are set to a reserved type, behaves as either disabled or a defined type
(c, SSC, HMC, PxC) = CheckValidStateMatch(SSC, HMC, PxC, isbreakpnt);
if c == Constraint_DISABLED then return FALSE;
// Otherwise the HMC,SSC,PxC values are either valid or the values returned by
// CheckValidStateMatch are valid.
PL2_match = HaveEL(EL2) && ((HMC == '1' && (SSC:PxC != '1000')) || SSC == '11');
PL1_match = PxC<0> == '1';
PL0_match = PxC<1> == '1';
SSU_match = isbreakpnt && HMC == '0' && PxC == '00' && SSC != '11';
if !ispriv && !isbreakpnt then

priv_match = PL0_match;
elsif SSU_match then
priv_match = PSTATE.M IN {M32_User,M32_Svc,M32_System};
else
case PSTATE.EL of
when EL3 priv_match = PL1_match; // EL3 and EL1 are both PL1
when EL2 priv_match = PL2_match;
case SSC of
when '00' security_state_match = TRUE; // Both
when '01' security_state_match = !IsSecure(); // Non-secure only
when '10' security_state_match = IsSecure(); // Secure only
when '11' security_state_match = (HMC == '1' || IsSecure()); // HMC=1 -> Both, 0 -> Secure only
if linked then
// "LBN" must be an enabled context-aware breakpoint unit. If it is not context-aware then
// it is CONSTRAINED UNPREDICTABLE whether this gives no match, or LBN is mapped to some
// UNKNOWN breakpoint that is context-aware.
lbn = UInt(LBN);
first_ctx_cmp = (UInt(DBGDIDR.BRPs) - UInt(DBGDIDR.CTX_CMPs));
last_ctx_cmp = UInt(DBGDIDR.BRPs);
if (lbn < first_ctx_cmp || lbn > last_ctx_cmp) then
(c, lbn) = ConstrainUnpredictableInteger(first_ctx_cmp, last_ctx_cmp, Unpredictable_BPNOTCTXC
assert c IN {Constraint_DISABLED, Constraint_NONE, Constraint_UNKNOWN};
case c of
when Constraint_DISABLED return FALSE; // Disabled
when Constraint_NONE linked = FALSE; // No linking
// Otherwise ConstrainUnpredictableInteger returned a context-aware breakpoint
if linked then
vaddress = bits(32) UNKNOWN;
linked_to = TRUE;
(linked_match,-) = AArch32.BreakpointValueMatch(lbn, vaddress, linked_to);
return priv_match && security_state_match && (!linked || linked_match);

Library pseudocode for aarch32/debug/enables/AArch32.GenerateDebugExceptions
// AArch32.GenerateDebugExceptions()
// =================================
boolean AArch32.GenerateDebugExceptions()
return AArch32.GenerateDebugExceptionsFrom(PSTATE.EL, IsSecure());
Library pseudocode for aarch32/debug/enables/AArch32.GenerateDebugExceptionsFrom
// AArch32.GenerateDebugExceptionsFrom()
// =====================================
boolean AArch32.GenerateDebugExceptionsFrom(bits(2) from, boolean secure)
if from == EL0 && !ELStateUsingAArch32(EL1, secure) then

mask = bit UNKNOWN; // PSTATE.D mask, unused for EL0 case
return AArch64.GenerateDebugExceptionsFrom(from, secure, mask);
if DBGOSLSR.OSLK == '1' || DoubleLockStatus() || Halted() then

return FALSE;
if HaveEL(EL3) && secure then

spd = if ELUsingAArch32(EL3) then SDCR.SPD else MDCR_EL3.SPD32;
if spd<1> == '1' then
enabled = spd<0> == '1';
else
// SPD == 0b01 is reserved, but behaves the same as 0b00.
enabled = AArch32.SelfHostedSecurePrivilegedInvasiveDebugEnabled();
if from == EL0 then enabled = enabled || SDER.SUIDEN == '1';
else
enabled = from != EL2;
return enabled;
Library pseudocode for aarch32/debug/pmu/AArch32.CheckForPMUOverflow
// AArch32.CheckForPMUOverflow()
// =============================
// Signal Performance Monitors overflow IRQ and CTI overflow events
boolean AArch32.CheckForPMUOverflow()
if !ELUsingAArch32(EL1) then return AArch64.CheckForPMUOverflow();

pmuirq = PMCR.E == '1' && PMINTENSET<31> == '1' && PMOVSSET<31> == '1';
for n = 0 to UInt(PMCR.N) - 1
if HaveEL(EL2) then
hpmn = if !ELUsingAArch32(EL2) then MDCR_EL2.HPMN else HDCR.HPMN;
hpme = if !ELUsingAArch32(EL2) then MDCR_EL2.HPME else HDCR.HPME;
E = (if n < UInt(hpmn) then PMCR.E else hpme);
else
E = PMCR.E;
if E == '1' && PMINTENSET<n> == '1' && PMOVSSET<n> == '1' then pmuirq = TRUE;
SetInterruptRequestLevel(InterruptID_PMUIRQ, if pmuirq then HIGH else LOW);
CTI_SetEventLevel(CrossTriggerIn_PMUOverflow, if pmuirq then HIGH else LOW);
// The request remains set until the condition is cleared. (For example, an interrupt handler
// or cross-triggered event handler clears the overflow status flag by writing to PMOVSCLR_EL0.)
return pmuirq;

Library pseudocode for aarch32/debug/pmu/AArch32.CountEvents
// AArch32.CountEvents()
// =====================
// Return TRUE if counter "n" should count its event. For the cycle counter, n == 31.
boolean AArch32.CountEvents(integer n)
assert n == 31 || n < UInt(PMCR.N);
if !ELUsingAArch32(EL1) then return AArch64.CountEvents(n);

// Event counting is disabled in Debug state
debug = Halted();
// In Non-secure state, some counters are reserved for EL2

if HaveEL(EL2) then
hpmn = if !ELUsingAArch32(EL2) then MDCR_EL2.HPMN else HDCR.HPMN;
hpme = if !ELUsingAArch32(EL2) then MDCR_EL2.HPME else HDCR.HPME;
if HaveHPMDExt() then
hpmd = if !ELUsingAArch32(EL2) then MDCR_EL2.HPMD else HDCR.HPMD;
E = if n < UInt(hpmn) || n == 31 then PMCR.E else hpme;
else
E = PMCR.E;
enabled = E == '1' && PMCNTENSET<n> == '1';
// Event counting in Secure state is prohibited unless any one of:

// * EL3 is not implemented
// * EL3 is using AArch64 and MDCR_EL3.SPME == 1
// * EL3 is using AArch32 and SDCR.SPME == 1
// * Executing at EL0, and SDER.SUNIDEN == 1.
spme = (if ELUsingAArch32(EL3) then SDCR.SPME else MDCR_EL3.SPME);
prohibited = HaveEL(EL3) && IsSecure() && spme == '0' && (PSTATE.EL != EL0 || SDER.SUNIDEN == '0');
// Event counting at EL2 is prohibited if all of:

// * The HPMD Extension is implemented
// * Executing at EL2
// * PMNx is not reserved for EL2
// * HDCR.HPMD == 1
if !prohibited && HaveEL(EL2) && HaveHPMDExt() && PSTATE.EL == EL2 && (n < UInt(hpmn) || n == 31) the
prohibited = (hpmd == '1');
// The IMPLEMENTATION DEFINED authentication interface might override software controls

if prohibited && !HaveNoSecurePMUDisableOverride() then
prohibited = !ExternalSecureNoninvasiveDebugEnabled();
// For the cycle counter, PMCR.DP enables counting when otherwise prohibited
if prohibited && n == 31 then prohibited = (PMCR.DP == '1');
// Event counting can be filtered by the {P, U, NSK, NSU, NSH} bits
filter = if n == 31 then PMCCFILTR else PMEVTYPER[n];
P = filter<31>;
U = filter<30>;
NSK = if HaveEL(EL3) then filter<29> else '0';
NSU = if HaveEL(EL3) then filter<28> else '0';
NSH = if HaveEL(EL2) then filter<27> else '0';
case PSTATE.EL of
when EL0 filtered = if IsSecure() then U == '1' else U != NSU;
when EL1 filtered = if IsSecure() then P == '1' else P != NSK;
when EL2 filtered = (NSH == '0');
when EL3 filtered = (P == '1');
return !debug && enabled && !prohibited && !filtered;

Library pseudocode for aarch32/debug/takeexceptiondbg/AArch32.EnterHypModeInDebugState
// AArch32.EnterHypModeInDebugState()
// ==================================
// Take an exception in Debug state to Hyp mode.
AArch32.EnterHypModeInDebugState(ExceptionRecord exception)
SynchronizeContext();
assert HaveEL(EL2) && !IsSecure() && ELUsingAArch32(EL2);
AArch32.ReportHypEntry(exception);
AArch32.WriteMode(M32_Hyp);
SPSR[] = bits(32) UNKNOWN;
ELR_hyp = bits(32) UNKNOWN;
// In Debug state, the PE always execute T32 instructions when in AArch32 state, and
// PSTATE.{SS,A,I,F} are not observable so behave as UNKNOWN.
PSTATE.T = '1'; // PSTATE.J is RES0
PSTATE.<SS,A,I,F> = bits(4) UNKNOWN;
DLR = bits(32) UNKNOWN;
DSPSR = bits(32) UNKNOWN;
PSTATE.E = HSCTLR.EE;
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HaveSSBSExt() then PSTATE.SSBS = bit UNKNOWN;
EDSCR.ERR = '1';
UpdateEDSCRFields();
EndOfInstruction();
Library pseudocode for aarch32/debug/takeexceptiondbg/AArch32.EnterModeInDebugState
// AArch32.EnterModeInDebugState()
// ===============================
// Take an exception in Debug state to a mode other than Monitor and Hyp mode.
AArch32.EnterModeInDebugState(bits(5) target_mode)
assert ELUsingAArch32(EL1) && PSTATE.EL != EL2;
if PSTATE.M == M32_Monitor then SCR.NS = '0';

AArch32.WriteMode(target_mode);
R[14] = bits(32) UNKNOWN;
PSTATE.E = SCTLR.EE;
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HavePANExt() && SCTLR.SPAN == '0' then PSTATE.PAN = '1';
EDSCR.ERR = '1';
UpdateEDSCRFields(); // Update EDSCR processor state flags.
EndOfInstruction();

Library pseudocode for aarch32/debug/takeexceptiondbg/
AArch32.EnterMonitorModeInDebugState
// AArch32.EnterMonitorModeInDebugState()
// ======================================
// Take an exception in Debug state to Monitor mode.
AArch32.EnterMonitorModeInDebugState()
assert HaveEL(EL3) && ELUsingAArch32(EL3);
from_secure = IsSecure();
AArch32.WriteMode(M32_Monitor);
R[14] = bits(32) UNKNOWN;
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HavePANExt() then
if !from_secure then
PSTATE.PAN = '0';
elsif SCTLR.SPAN == '0' then
PSTATE.PAN = '1';
EDSCR.ERR = '1';
EndOfInstruction();

Library pseudocode for aarch32/debug/watchpoint/AArch32.WatchpointByteMatch
// AArch32.WatchpointByteMatch()
// =============================
boolean AArch32.WatchpointByteMatch(integer n, bits(32) vaddress)
bottom = if DBGWVR[n]<2> == '1' then 2 else 3; // Word or doubleword

byte_select_match = (DBGWCR[n].BAS<UInt(vaddress<bottom-1:0>)> != '0');
mask = UInt(DBGWCR[n].MASK);
// If DBGWCR[n].MASK is non-zero value and DBGWCR[n].BAS is not set to '11111111', or

// DBGWCR[n].BAS specifies a non-contiguous set of bytes behavior is CONSTRAINED
// UNPREDICTABLE.
if mask > 0 && !IsOnes(DBGWCR[n].BAS) then
byte_select_match = ConstrainUnpredictableBool(Unpredictable_WPMASKANDBAS);
else
LSB = (DBGWCR[n].BAS AND NOT(DBGWCR[n].BAS - 1)); MSB = (DBGWCR[n].BAS + LSB);
if !IsZero(MSB AND (MSB - 1)) then // Not contiguous
byte_select_match = ConstrainUnpredictableBool(Unpredictable_WPBASCONTIGUOUS);
bottom = 3; // For the whole doubleword
// If the address mask is set to a reserved value, the behavior is CONSTRAINED UNPREDICTABLE.
if mask > 0 && mask <= 2 then
(c, mask) = ConstrainUnpredictableInteger(3, 31, Unpredictable_RESWPMASK);
case c of
when Constraint_NONE mask = 0; // No masking
// Otherwise the value returned by ConstrainUnpredictableInteger is a not-reserved value
if mask > bottom then

WVR_match = (vaddress<31:mask> == DBGWVR[n]<31:mask>);
// If masked bits of DBGWVR_EL1[n] are not zero, the behavior is CONSTRAINED UNPREDICTABLE.
if WVR_match && !IsZero(DBGWVR[n]<mask-1:bottom>) then
WVR_match = ConstrainUnpredictableBool(Unpredictable_WPMASKEDBITS);
else
WVR_match = vaddress<31:bottom> == DBGWVR[n]<31:bottom>;
return WVR_match && byte_select_match;
Library pseudocode for aarch32/debug/watchpoint/AArch32.WatchpointMatch
// AArch32.WatchpointMatch()
// =========================
// Watchpoint matching in an AArch32 translation regime.
boolean AArch32.WatchpointMatch(integer n, bits(32) vaddress, integer size, boolean ispriv,

boolean iswrite)
assert n <= UInt(DBGDIDR.WRPs);
// "ispriv" is FALSE for LDRT/STRT instructions executed at EL1 and all

// load/stores at EL0, TRUE for all other load/stores. "iswrite" is TRUE for stores, FALSE for
// loads.
enabled = DBGWCR[n].E == '1';
linked = DBGWCR[n].WT == '1';
isbreakpnt = FALSE;
state_match = AArch32.StateMatch(DBGWCR[n].SSC, DBGWCR[n].HMC, DBGWCR[n].PAC,

linked, DBGWCR[n].LBN, isbreakpnt, ispriv);
ls_match = (DBGWCR[n].LSC<(if iswrite then 1 else 0)> == '1');
value_match = FALSE;
for byte = 0 to size - 1
value_match = value_match || AArch32.WatchpointByteMatch(n, vaddress + byte);
return value_match && state_match && ls_match && enabled;

Library pseudocode for aarch32/exceptions/aborts/AArch32.Abort
// AArch32.Abort()
// ===============
// Abort and Debug exception handling in an AArch32 translation regime.
AArch32.Abort(bits(32) vaddress, FaultRecord fault)
// Check if routed to AArch64 state

route_to_aarch64 = PSTATE.EL == EL0 && !ELUsingAArch32(EL1);
if !route_to_aarch64 && EL2Enabled() && !ELUsingAArch32(EL2) then

route_to_aarch64 = (HCR_EL2.TGE == '1' || IsSecondStage(fault) ||
(HaveRASExt() && HCR2.TEA == '1' && IsExternalAbort(fault)) ||
(IsDebugException(fault) && MDCR_EL2.TDE == '1'));
if !route_to_aarch64 && HaveEL(EL3) && !ELUsingAArch32(EL3) then

route_to_aarch64 = SCR_EL3.EA == '1' && IsExternalAbort(fault);
if route_to_aarch64 then
AArch64.Abort(ZeroExtend(vaddress), fault);
elsif fault.acctype == AccType_IFETCH then
AArch32.TakePrefetchAbortException(vaddress, fault);
else
AArch32.TakeDataAbortException(vaddress, fault);
Library pseudocode for aarch32/exceptions/aborts/AArch32.AbortSyndrome
// AArch32.AbortSyndrome()
// =======================
// Creates an exception syndrome record for Abort exceptions taken to Hyp mode
// from an AArch32 translation regime.
ExceptionRecord AArch32.AbortSyndrome(Exception exceptype, FaultRecord fault, bits(32) vaddress)

exception = ExceptionSyndrome(exceptype);
d_side = exceptype == Exception_DataAbort;
exception.syndrome = AArch32.FaultSyndrome(d_side, fault);

exception.vaddress = ZeroExtend(vaddress);
if IPAValid(fault) then
exception.ipavalid = TRUE;
exception.NS = fault.ipaddress.NS;
exception.ipaddress = ZeroExtend(fault.ipaddress.address);
else
exception.ipavalid = FALSE;
return exception;
Library pseudocode for aarch32/exceptions/aborts/AArch32.CheckPCAlignment
// AArch32.CheckPCAlignment()
// ==========================
AArch32.CheckPCAlignment()
bits(32) pc = ThisInstrAddr();
if (CurrentInstrSet() == InstrSet_A32 && pc<1> == '1') || pc<0> == '1' then
if AArch32.GeneralExceptionsToAArch64() then AArch64.PCAlignmentFault();
// Generate an Alignment fault Prefetch Abort exception

vaddress = pc;
acctype = AccType_IFETCH;
iswrite = FALSE;
secondstage = FALSE;
AArch32.Abort(vaddress, AArch32.AlignmentFault(acctype, iswrite, secondstage));

Library pseudocode for aarch32/exceptions/aborts/AArch32.ReportDataAbort
// AArch32.ReportDataAbort()
// =========================
// Report syndrome information for aborts taken to modes other than Hyp mode.
AArch32.ReportDataAbort(boolean route_to_monitor, FaultRecord fault, bits(32) vaddress)
// The encoding used in the IFSR or DFSR can be Long-descriptor format or Short-descriptor
// format. Normally, the current translation table format determines the format. For an abort
// from Non-secure state to Monitor mode, the IFSR or DFSR uses the Long-descriptor format if
// any of the following applies:
// * The Secure TTBCR.EAE is set to 1.
// * The abort is synchronous and either:
// - It is taken from Hyp mode.
// - It is taken from EL1 or EL0, and the Non-secure TTBCR.EAE is set to 1.
long_format = FALSE;
if route_to_monitor && !IsSecure() then
long_format = TTBCR_S.EAE == '1';
if !IsSErrorInterrupt(fault) && !long_format then
long_format = PSTATE.EL == EL2 || TTBCR.EAE == '1';
else
long_format = TTBCR.EAE == '1';
d_side = TRUE;
if long_format then
syndrome = AArch32.FaultStatusLD(d_side, fault);
else
syndrome = AArch32.FaultStatusSD(d_side, fault);
if fault.acctype == AccType_IC then

if (!long_format &&
boolean IMPLEMENTATION_DEFINED "Report I-cache maintenance fault in IFSR") then
i_syndrome = syndrome;
syndrome<10,3:0> = EncodeSDFSC(Fault_ICacheMaint, 1);
else
i_syndrome = bits(32) UNKNOWN;
if route_to_monitor then
IFSR_S = i_syndrome;
else
IFSR = i_syndrome;
DFSR_S = syndrome;
DFAR_S = vaddress;
else
DFSR = syndrome;
DFAR = vaddress;
return;

Library pseudocode for aarch32/exceptions/aborts/AArch32.ReportPrefetchAbort
// AArch32.ReportPrefetchAbort()
// =============================
// Report syndrome information for aborts taken to modes other than Hyp mode.
AArch32.ReportPrefetchAbort(boolean route_to_monitor, FaultRecord fault, bits(32) vaddress)

// The encoding used in the IFSR can be Long-descriptor format or Short-descriptor format.
// Normally, the current translation table format determines the format. For an abort from
// Non-secure state to Monitor mode, the IFSR uses the Long-descriptor format if any of the
// following applies:
// * The Secure TTBCR.EAE is set to 1.
// * It is taken from Hyp mode.
// * It is taken from EL1 or EL0, and the Non-secure TTBCR.EAE is set to 1.
long_format = FALSE;
if route_to_monitor && !IsSecure() then
long_format = TTBCR_S.EAE == '1' || PSTATE.EL == EL2 || TTBCR.EAE == '1';
else
long_format = TTBCR.EAE == '1';
d_side = FALSE;
if long_format then
fsr = AArch32.FaultStatusLD(d_side, fault);
else
fsr = AArch32.FaultStatusSD(d_side, fault);
IFSR_S = fsr;
IFAR_S = vaddress;
else
IFSR = fsr;
IFAR = vaddress;
return;
Library pseudocode for aarch32/exceptions/aborts/AArch32.TakeDataAbortException
// AArch32.TakeDataAbortException()
// ================================
AArch32.TakeDataAbortException(bits(32) vaddress, FaultRecord fault)

route_to_monitor = HaveEL(EL3) && SCR.EA == '1' && IsExternalAbort(fault);
route_to_hyp = (HaveEL(EL2) && !IsSecure() && PSTATE.EL IN {EL0, EL1} &&
(HCR.TGE == '1' || IsSecondStage(fault) ||
(IsDebugException(fault) && HDCR.TDE == '1')));
bits(32) preferred_exception_return = ThisInstrAddr();
vect_offset = 0x10;
lr_offset = 8;
if IsDebugException(fault) then DBGDSCRext.MOE = fault.debugmoe;

AArch32.ReportDataAbort(route_to_monitor, fault, vaddress);
AArch32.EnterMonitorMode(preferred_exception_return, lr_offset, vect_offset);
elsif PSTATE.EL == EL2 || route_to_hyp then
exception = AArch32.AbortSyndrome(Exception_DataAbort, fault, vaddress);
AArch32.EnterHypMode(exception, preferred_exception_return, vect_offset);
else
AArch32.EnterHypMode(exception, preferred_exception_return, 0x14);
else
AArch32.EnterMode(M32_Abort, preferred_exception_return, lr_offset, vect_offset);

Library pseudocode for aarch32/exceptions/aborts/AArch32.TakePrefetchAbortException
// AArch32.TakePrefetchAbortException()
// ====================================
AArch32.TakePrefetchAbortException(bits(32) vaddress, FaultRecord fault)

route_to_monitor = HaveEL(EL3) && SCR.EA == '1' && IsExternalAbort(fault);
route_to_hyp = (HaveEL(EL2) && !IsSecure() && PSTATE.EL IN {EL0, EL1} &&
(HCR.TGE == '1' || IsSecondStage(fault) ||
(IsDebugException(fault) && HDCR.TDE == '1')));
vect_offset = 0x0C;
lr_offset = 4;
if IsDebugException(fault) then DBGDSCRext.MOE = fault.debugmoe;

AArch32.ReportPrefetchAbort(route_to_monitor, fault, vaddress);
if fault.statuscode == Fault_Alignment then // PC Alignment fault
exception = ExceptionSyndrome(Exception_PCAlignment);
exception.vaddress = ThisInstrAddr();
else
exception = AArch32.AbortSyndrome(Exception_InstructionAbort, fault, vaddress);
else
else
AArch32.ReportPrefetchAbort(route_to_monitor, fault, vaddress);
Library pseudocode for aarch32/exceptions/aborts/BranchTargetException
// BranchTargetException
// =====================
// Raise branch target exception.
AArch64.BranchTargetException(bits(52) vaddress)
route_to_el2 = PSTATE.EL == EL0 && EL2Enabled() && HCR_EL2.TGE == '1';

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_BranchTarget);
exception.syndrome<1:0> = PSTATE.BTYPE;
exception.syndrome<24:2> = Zeros(); // RES0
if UInt(PSTATE.EL) > UInt(EL1) then

AArch64.TakeException(PSTATE.EL, exception, preferred_exception_return, vect_offset);
elsif route_to_el2 then
AArch64.TakeException(EL2, exception, preferred_exception_return, vect_offset);
else

Library pseudocode for aarch32/exceptions/asynch/AArch32.TakePhysicalFIQException
// AArch32.TakePhysicalFIQException()
// ==================================
AArch32.TakePhysicalFIQException()

route_to_aarch64 = HCR_EL2.TGE == '1' || (HCR_EL2.FMO == '1' && !IsInHost());

route_to_aarch64 = SCR_EL3.FIQ == '1';
if route_to_aarch64 then AArch64.TakePhysicalFIQException();

route_to_monitor = HaveEL(EL3) && SCR.FIQ == '1';
route_to_hyp = (PSTATE.EL IN {EL0, EL1} && EL2Enabled() &&
(HCR.TGE == '1' || HCR.FMO == '1'));
vect_offset = 0x1C;
lr_offset = 4;
exception = ExceptionSyndrome(Exception_FIQ);
else
AArch32.EnterMode(M32_FIQ, preferred_exception_return, lr_offset, vect_offset);
Library pseudocode for aarch32/exceptions/asynch/AArch32.TakePhysicalIRQException
// AArch32.TakePhysicalIRQException()
// ==================================
// Take an enabled physical IRQ exception.
AArch32.TakePhysicalIRQException()

route_to_aarch64 = HCR_EL2.TGE == '1' || (HCR_EL2.IMO == '1' && !IsInHost());
route_to_aarch64 = SCR_EL3.IRQ == '1';
if route_to_aarch64 then AArch64.TakePhysicalIRQException();
route_to_monitor = HaveEL(EL3) && SCR.IRQ == '1';

(HCR.TGE == '1' || HCR.IMO == '1'));
vect_offset = 0x18;
lr_offset = 4;
exception = ExceptionSyndrome(Exception_IRQ);
else
AArch32.EnterMode(M32_IRQ, preferred_exception_return, lr_offset, vect_offset);

Library pseudocode for aarch32/exceptions/asynch/AArch32.TakePhysicalSErrorException
// AArch32.TakePhysicalSErrorException()
// =====================================
AArch32.TakePhysicalSErrorException(boolean parity, bit extflag, bits(2) errortype,

boolean impdef_syndrome, bits(24) full_syndrome)
ClearPendingPhysicalSError();

route_to_aarch64 = (HCR_EL2.TGE == '1' || (!IsInHost() && HCR_EL2.AMO == '1'));
route_to_aarch64 = SCR_EL3.EA == '1';
AArch64.TakePhysicalSErrorException(impdef_syndrome, full_syndrome);
route_to_monitor = HaveEL(EL3) && SCR.EA == '1';

(HCR.TGE == '1' || HCR.AMO == '1'));
vect_offset = 0x10;
lr_offset = 8;
fault = AArch32.AsynchExternalAbort(parity, errortype, extflag);

else
else
Library pseudocode for aarch32/exceptions/asynch/AArch32.TakeVirtualFIQException
// AArch32.TakeVirtualFIQException()
// =================================
AArch32.TakeVirtualFIQException()
assert PSTATE.EL IN {EL0, EL1} && EL2Enabled();
if ELUsingAArch32(EL2) then // Virtual IRQ enabled if TGE==0 and FMO==1
assert HCR.TGE == '0' && HCR.FMO == '1';
else
assert HCR_EL2.TGE == '0' && HCR_EL2.FMO == '1';
if PSTATE.EL == EL0 && !ELUsingAArch32(EL1) then AArch64.TakeVirtualFIQException();

vect_offset = 0x1C;
lr_offset = 4;
AArch32.EnterMode(M32_FIQ, preferred_exception_return, lr_offset, vect_offset);

Library pseudocode for aarch32/exceptions/asynch/AArch32.TakeVirtualIRQException
// AArch32.TakeVirtualIRQException()
// =================================
AArch32.TakeVirtualIRQException()
if ELUsingAArch32(EL2) then // Virtual IRQs enabled if TGE==0 and IMO==1

assert HCR.TGE == '0' && HCR.IMO == '1';
else
assert HCR_EL2.TGE == '0' && HCR_EL2.IMO == '1';

if PSTATE.EL == EL0 && !ELUsingAArch32(EL1) then AArch64.TakeVirtualIRQException();

vect_offset = 0x18;
lr_offset = 4;
AArch32.EnterMode(M32_IRQ, preferred_exception_return, lr_offset, vect_offset);
Library pseudocode for aarch32/exceptions/asynch/AArch32.TakeVirtualSErrorException
// AArch32.TakeVirtualSErrorException()
// ====================================
AArch32.TakeVirtualSErrorException(bit extflag, bits(2) errortype, boolean impdef_syndrome, bits(24) full

if ELUsingAArch32(EL2) then // Virtual SError enabled if TGE==0 and AMO==1
assert HCR.TGE == '0' && HCR.AMO == '1';
else
assert HCR_EL2.TGE == '0' && HCR_EL2.AMO == '1';
if PSTATE.EL == EL0 && !ELUsingAArch32(EL1) then AArch64.TakeVirtualSErrorException(impdef_syndrome,
route_to_monitor = FALSE;

vect_offset = 0x10;
lr_offset = 8;

parity = FALSE;
if HaveRASExt() then
fault = AArch32.AsynchExternalAbort(FALSE, VDFSR.AET, VDFSR.ExT);
else
fault = AArch32.AsynchExternalAbort(FALSE, VSESR_EL2.AET, VSESR_EL2.ExT);
else
fault = AArch32.AsynchExternalAbort(parity, errortype, extflag);
ClearPendingVirtualSError();

Library pseudocode for aarch32/exceptions/debug/AArch32.SoftwareBreakpoint
// AArch32.SoftwareBreakpoint()
// ============================
AArch32.SoftwareBreakpoint(bits(16) immediate)
if (EL2Enabled() && !ELUsingAArch32(EL2) &&

(HCR_EL2.TGE == '1' || MDCR_EL2.TDE == '1')) || !ELUsingAArch32(EL1) then
AArch64.SoftwareBreakpoint(immediate);
acctype = AccType_IFETCH; // Take as a Prefetch Abort
iswrite = FALSE;
entry = DebugException_BKPT;
fault = AArch32.DebugFault(acctype, iswrite, entry);

AArch32.Abort(vaddress, fault);
Library pseudocode for aarch32/exceptions/debug/DebugException
constant bits(4) DebugException_Breakpoint = '0001';

constant bits(4) DebugException_BKPT = '0011';
constant bits(4) DebugException_VectorCatch = '0101';
constant bits(4) DebugException_Watchpoint = '1010';
Library pseudocode for aarch32/exceptions/exceptions/

AArch32.CheckAdvSIMDOrFPRegisterTraps
// AArch32.CheckAdvSIMDOrFPRegisterTraps()
// =======================================
// Check if an instruction that accesses an Advanced SIMD and
// floating-point System register is trapped by an appropriate HCR.TIDx
// ID group trap control.
AArch32.CheckAdvSIMDOrFPRegisterTraps(bits(4) reg)
if PSTATE.EL == EL1 && EL2Enabled() then

tid0 = if ELUsingAArch32(EL2) then HCR.TID0 else HCR_EL2.TID0;
tid3 = if ELUsingAArch32(EL2) then HCR.TID3 else HCR_EL2.TID3;
if (tid0 == '1' && reg == '0000') // FPSID

|| (tid3 == '1' && reg IN {'0101', '0110', '0111'}) then // MVFRx
AArch32.SystemAccessTrap(M32_Hyp, 0x8); // Exception_AdvSIMDFPAccessTrap
else
AArch64.AArch32SystemAccessTrap(EL2, 0x8); // Exception_AdvSIMDFPAccessTrap

Library pseudocode for aarch32/exceptions/exceptions/AArch32.ExceptionClass
// AArch32.ExceptionClass()
// ========================
// Returns the Exception Class and Instruction Length fields to be reported in HSR
(integer,bit) AArch32.ExceptionClass(Exception exceptype)
il = if ThisInstrLength() == 32 then '1' else '0';
case exceptype of
when Exception_Uncategorized ec = 0x00; il = '1';
when Exception_WFxTrap ec = 0x01;
when Exception_CP15RTTrap ec = 0x03;
when Exception_CP15RRTTrap ec = 0x04;
when Exception_CP14RTTrap ec = 0x05;
when Exception_CP14DTTrap ec = 0x06;
when Exception_AdvSIMDFPAccessTrap ec = 0x07;
when Exception_FPIDTrap ec = 0x08;
when Exception_PACTrap ec = 0x09;
when Exception_CP14RRTTrap ec = 0x0C;
when Exception_BranchTarget ec = 0x0D;
when Exception_IllegalState ec = 0x0E; il = '1';
when Exception_SupervisorCall ec = 0x11;
when Exception_HypervisorCall ec = 0x12;
when Exception_MonitorCall ec = 0x13;
when Exception_ERetTrap ec = 0x1A;
when Exception_PACFail ec = 0x1C;
when Exception_InstructionAbort ec = 0x20; il = '1';
when Exception_PCAlignment ec = 0x22; il = '1';
when Exception_DataAbort ec = 0x24;
when Exception_NV2DataAbort ec = 0x25;
when Exception_FPTrappedException ec = 0x28;
otherwise Unreachable();
if ec IN {0x20,0x24} && PSTATE.EL == EL2 then

ec = ec + 1;
return (ec,il);
Library pseudocode for aarch32/exceptions/exceptions/AArch32.GeneralExceptionsToAArch64
// AArch32.GeneralExceptionsToAArch64()
// ====================================
// Returns TRUE if exceptions normally routed to EL1 are being handled at an Exception
// level using AArch64, because either EL1 is using AArch64 or TGE is in force and EL2
// is using AArch64.
boolean AArch32.GeneralExceptionsToAArch64()
return ((PSTATE.EL == EL0 && !ELUsingAArch32(EL1)) ||
(EL2Enabled() && !ELUsingAArch32(EL2) && HCR_EL2.TGE == '1'));

Library pseudocode for aarch32/exceptions/exceptions/AArch32.ReportHypEntry
// AArch32.ReportHypEntry()
// ========================
// Report syndrome information to Hyp mode registers.
AArch32.ReportHypEntry(ExceptionRecord exception)
Exception exceptype = exception.exceptype;
(ec,il) = AArch32.ExceptionClass(exceptype);
iss = exception.syndrome;
// IL is not valid for Data Abort exceptions without valid instruction syndrome information
if ec IN {0x24,0x25} && iss<24> == '0' then
il = '1';
HSR = ec<5:0>:il:iss;
if exceptype IN {Exception_InstructionAbort, Exception_PCAlignment} then

HIFAR = exception.vaddress<31:0>;
HDFAR = bits(32) UNKNOWN;
elsif exceptype == Exception_DataAbort then
HIFAR = bits(32) UNKNOWN;
HDFAR = exception.vaddress<31:0>;
if exception.ipavalid then
HPFAR<31:4> = exception.ipaddress<39:12>;
else
HPFAR<31:4> = bits(28) UNKNOWN;
return;
Library pseudocode for aarch32/exceptions/exceptions/AArch32.ResetControlRegisters
// Resets System registers and memory-mapped control registers that have architecturally-defined
// reset values to those values.
AArch32.ResetControlRegisters(boolean cold_reset);

Library pseudocode for aarch32/exceptions/exceptions/AArch32.TakeReset
// AArch32.TakeReset()
// ===================
// Reset into AArch32 state
AArch32.TakeReset(boolean cold_reset)
assert HighestELUsingAArch32();
// Enter the highest implemented Exception level in AArch32 state

if HaveEL(EL3) then
AArch32.WriteMode(M32_Svc);
SCR.NS = '0'; // Secure state
elsif HaveEL(EL2) then
else
AArch32.WriteMode(M32_Svc);
// Reset the CP14 and CP15 registers and other system components
AArch32.ResetControlRegisters(cold_reset);
FPEXC.EN = '0';
// Reset all other PSTATE fields, including instruction set and endianness according to the
// SCTLR values produced by the above call to ResetControlRegisters()
PSTATE.<A,I,F> = '111'; // All asynchronous exceptions masked
PSTATE.IT = '00000000'; // IT block state reset
PSTATE.T = SCTLR.TE; // Instruction set: TE=0: A32, TE=1: T32. PSTATE.J is RES0.
PSTATE.E = SCTLR.EE; // Endianness: EE=0: little-endian, EE=1: big-endian
PSTATE.IL = '0'; // Clear Illegal Execution state bit
// All registers, bits and fields not reset by the above pseudocode or by the BranchTo() call
// below are UNKNOWN bitstrings after reset. In particular, the return information registers
// R14 or ELR_hyp and SPSR have UNKNOWN values, so that it
// is impossible to return from a reset in an architecturally defined way.
AArch32.ResetGeneralRegisters();
AArch32.ResetSIMDFPRegisters();
AArch32.ResetSpecialRegisters();
ResetExternalDebugRegisters(cold_reset);
bits(32) rv; // IMPLEMENTATION DEFINED reset vector
if HaveEL(EL3) then
if MVBAR<0> == '1' then // Reset vector in MVBAR
rv = MVBAR<31:1>:'0';
else
rv = bits(32) IMPLEMENTATION_DEFINED "reset vector address";
else
rv = RVBAR<31:1>:'0';
// The reset vector must be correctly aligned

assert rv<0> == '0' && (PSTATE.T == '1' || rv<1> == '0');
BranchTo(rv, BranchType_RESET);
Library pseudocode for aarch32/exceptions/exceptions/ExcVectorBase
// ExcVectorBase()
// ===============
bits(32) ExcVectorBase()
if SCTLR.V == '1' then // Hivecs selected, base = 0xFFFF0000
return Ones(16):Zeros(16);
else
return VBAR<31:5>:Zeros(5);

Library pseudocode for aarch32/exceptions/ieeefp/AArch32.FPTrappedException
// AArch32.FPTrappedException()
// ============================
AArch32.FPTrappedException(bits(8) accumulated_exceptions)
if AArch32.GeneralExceptionsToAArch64() then
is_ase = FALSE;
element = 0;
AArch64.FPTrappedException(is_ase, element, accumulated_exceptions);
FPEXC.DEX = '1';
FPEXC.TFV = '1';
FPEXC<7,4:0> = accumulated_exceptions<7,4:0>; // IDF,IXF,UFF,OFF,DZF,IOF
FPEXC<10:8> = '111'; // VECITR is RES1
AArch32.TakeUndefInstrException();
Library pseudocode for aarch32/exceptions/syscalls/AArch32.CallHypervisor
// AArch32.CallHypervisor()
// ========================
// Performs a HVC call
AArch32.CallHypervisor(bits(16) immediate)
assert HaveEL(EL2);
if !ELUsingAArch32(EL2) then
AArch64.CallHypervisor(immediate);
else
AArch32.TakeHVCException(immediate);
Library pseudocode for aarch32/exceptions/syscalls/AArch32.CallSupervisor
// AArch32.CallSupervisor()
// ========================
// Calls the Supervisor
AArch32.CallSupervisor(bits(16) immediate)
if AArch32.CurrentCond() != '1110' then

immediate = bits(16) UNKNOWN;
AArch64.CallSupervisor(immediate);
else
AArch32.TakeSVCException(immediate);
Library pseudocode for aarch32/exceptions/syscalls/AArch32.TakeHVCException
// AArch32.TakeHVCException()
// ==========================
AArch32.TakeHVCException(bits(16) immediate)
AArch32.ITAdvance();
SSAdvance();
bits(32) preferred_exception_return = NextInstrAddr();
vect_offset = 0x08;
exception = ExceptionSyndrome(Exception_HypervisorCall);
exception.syndrome<15:0> = immediate;

else

Library pseudocode for aarch32/exceptions/syscalls/AArch32.TakeSMCException
// AArch32.TakeSMCException()
// ==========================
AArch32.TakeSMCException()
SSAdvance();
vect_offset = 0x08;
lr_offset = 0;
Library pseudocode for aarch32/exceptions/syscalls/AArch32.TakeSVCException
// AArch32.TakeSVCException()
// ==========================
AArch32.TakeSVCException(bits(16) immediate)
SSAdvance();
route_to_hyp = PSTATE.EL == EL0 && EL2Enabled() && HCR.TGE == '1';

vect_offset = 0x08;
lr_offset = 0;
if PSTATE.EL == EL2 || route_to_hyp then

exception = ExceptionSyndrome(Exception_SupervisorCall);
else
else
AArch32.EnterMode(M32_Svc, preferred_exception_return, lr_offset, vect_offset);

Library pseudocode for aarch32/exceptions/takeexception/AArch32.EnterHypMode
// AArch32.EnterHypMode()
// ======================
// Take an exception to Hyp mode.
AArch32.EnterHypMode(ExceptionRecord exception, bits(32) preferred_exception_return,

integer vect_offset)
spsr = GetPSRFromPSTATE();
if !(exception.exceptype IN {Exception_IRQ, Exception_FIQ}) then
AArch32.ReportHypEntry(exception);
SPSR[] = spsr;
ELR_hyp = preferred_exception_return;
PSTATE.T = HSCTLR.TE; // PSTATE.J is RES0
PSTATE.SS = '0';
if !HaveEL(EL3) || SCR_GEN[].EA == '0' then PSTATE.A = '1';
if !HaveEL(EL3) || SCR_GEN[].IRQ == '0' then PSTATE.I = '1';
if !HaveEL(EL3) || SCR_GEN[].FIQ == '0' then PSTATE.F = '1';
PSTATE.E = HSCTLR.EE;
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HaveSSBSExt() then PSTATE.SSBS = HSCTLR.DSSBS;
BranchTo(HVBAR<31:5>:vect_offset<4:0>, BranchType_EXCEPTION);
EndOfInstruction();
Library pseudocode for aarch32/exceptions/takeexception/AArch32.EnterMode
// AArch32.EnterMode()
// ===================
// Take an exception to a mode other than Monitor and Hyp mode.
AArch32.EnterMode(bits(5) target_mode, bits(32) preferred_exception_return, integer lr_offset,

assert ELUsingAArch32(EL1) && PSTATE.EL != EL2;
AArch32.WriteMode(target_mode);
SPSR[] = spsr;
R[14] = preferred_exception_return + lr_offset;
PSTATE.T = SCTLR.TE; // PSTATE.J is RES0
PSTATE.SS = '0';
if target_mode == M32_FIQ then
PSTATE.<A,I,F> = '111';
elsif target_mode IN {M32_Abort, M32_IRQ} then
PSTATE.<A,I> = '11';
else
PSTATE.I = '1';
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HavePANExt() && SCTLR.SPAN == '0' then PSTATE.PAN = '1';
if HaveSSBSExt() then PSTATE.SSBS = SCTLR.DSSBS;
BranchTo(ExcVectorBase()<31:5>:vect_offset<4:0>, BranchType_EXCEPTION);
EndOfInstruction();

Library pseudocode for aarch32/exceptions/takeexception/AArch32.EnterMonitorMode
// AArch32.EnterMonitorMode()
// ==========================
// Take an exception to Monitor mode.
AArch32.EnterMonitorMode(bits(32) preferred_exception_return, integer lr_offset,

from_secure = IsSecure();
AArch32.WriteMode(M32_Monitor);
SPSR[] = spsr;
R[14] = preferred_exception_return + lr_offset;
PSTATE.T = SCTLR.TE; // PSTATE.J is RES0
PSTATE.SS = '0';
PSTATE.<A,I,F> = '111';
PSTATE.IL = '0';
PSTATE.IT = '00000000';
if HavePANExt() then
if !from_secure then
PSTATE.PAN = '0';
elsif SCTLR.SPAN == '0' then
PSTATE.PAN = '1';
if HaveSSBSExt() then PSTATE.SSBS = SCTLR.DSSBS;
BranchTo(MVBAR<31:5>:vect_offset<4:0>, BranchType_EXCEPTION);
EndOfInstruction();

Library pseudocode for aarch32/exceptions/traps/AArch32.CheckAdvSIMDOrFPEnabled
// AArch32.CheckAdvSIMDOrFPEnabled()
// =================================
// Check against CPACR, FPEXC, HCPTR, NSACR, and CPTR_EL3.
AArch32.CheckAdvSIMDOrFPEnabled(boolean fpexc_check, boolean advsimd)

if PSTATE.EL == EL0 && (!HaveEL(EL2) || (!ELUsingAArch32(EL2) && HCR_EL2.TGE == '0')) && !ELUsingAArc
// The PE behaves as if FPEXC.EN is 1
elsif PSTATE.EL == EL0 && HaveEL(EL2) && !ELUsingAArch32(EL2) && HCR_EL2.TGE == '1' && !ELUsingAArch3
if fpexc_check && HCR_EL2.RW == '0' then
fpexc_en = bits(1) IMPLEMENTATION_DEFINED "FPEXC.EN value when TGE==1 and RW==0";
if fpexc_en == '0' then UNDEFINED;
else
cpacr_asedis = CPACR.ASEDIS;
cpacr_cp10 = CPACR.cp10;
if HaveEL(EL3) && ELUsingAArch32(EL3) && !IsSecure() then

// Check if access disabled in NSACR
if NSACR.NSASEDIS == '1' then cpacr_asedis = '1';
if NSACR.cp10 == '0' then cpacr_cp10 = '00';
if PSTATE.EL != EL2 then

// Check if Advanced SIMD disabled in CPACR
if advsimd && cpacr_asedis == '1' then UNDEFINED;
// Check if access disabled in CPACR

case cpacr_cp10 of
when '00' disabled = TRUE;
when '01' disabled = PSTATE.EL == EL0;
when '10' disabled = ConstrainUnpredictableBool(Unpredictable_RESCPACR);
when '11' disabled = FALSE;
if disabled then UNDEFINED;
// If required, check FPEXC enabled bit.

if fpexc_check && FPEXC.EN == '0' then UNDEFINED;
AArch32.CheckFPAdvSIMDTrap(advsimd); // Also check against HCPTR and CPTR_EL3

Library pseudocode for aarch32/exceptions/traps/AArch32.CheckFPAdvSIMDTrap
// AArch32.CheckFPAdvSIMDTrap()
// ============================
// Check against CPTR_EL2 and CPTR_EL3.
AArch32.CheckFPAdvSIMDTrap(boolean advsimd)
if EL2Enabled() && !ELUsingAArch32(EL2) then
AArch64.CheckFPAdvSIMDTrap();
else
if HaveEL(EL2) && !IsSecure() then
hcptr_tase = HCPTR.TASE;
hcptr_cp10 = HCPTR.TCP10;
if HaveEL(EL3) && ELUsingAArch32(EL3) && !IsSecure() then

// Check if access disabled in NSACR
if NSACR.NSASEDIS == '1' then hcptr_tase = '1';
if NSACR.cp10 == '0' then hcptr_cp10 = '1';
// Check if access disabled in HCPTR

if (advsimd && hcptr_tase == '1') || hcptr_cp10 == '1' then
exception = ExceptionSyndrome(Exception_AdvSIMDFPAccessTrap);
exception.syndrome<24:20> = ConditionSyndrome();
if advsimd then
exception.syndrome<5> = '1';
else
exception.syndrome<3:0> = '1010'; // coproc field, always 0xA

AArch32.TakeUndefInstrException(exception);
else
AArch32.TakeHypTrapException(exception);
if HaveEL(EL3) && !ELUsingAArch32(EL3) then

// Check if access disabled in CPTR_EL3
if CPTR_EL3.TFP == '1' then AArch64.AdvSIMDFPAccessTrap(EL3);
return;
Library pseudocode for aarch32/exceptions/traps/AArch32.CheckForSMCUndefOrTrap
// AArch32.CheckForSMCUndefOrTrap()
// ================================
// Check for UNDEFINED or trap on SMC instruction
AArch32.CheckForSMCUndefOrTrap()
if !HaveEL(EL3) || PSTATE.EL == EL0 then
UNDEFINED;
if EL2Enabled() && !ELUsingAArch32(EL2) then

AArch64.CheckForSMCUndefOrTrap(Zeros(16));
else
route_to_hyp = HaveEL(EL2) && !IsSecure() && PSTATE.EL == EL1 && HCR.TSC == '1';
if route_to_hyp then
exception = ExceptionSyndrome(Exception_MonitorCall);

Library pseudocode for aarch32/exceptions/traps/AArch32.CheckForSVCTrap
// AArch32.CheckForSVCTrap()
// =========================
// Check for trap on SVC instruction
AArch32.CheckForSVCTrap(bits(16) immediate)
if HaveFGTExt() then
route_to_el2 = FALSE;
route_to_el2 = !ELUsingAArch32(EL1) && EL2Enabled() && HFGITR_EL2.SVC_EL0 == '1' &&
if route_to_el2 then
vect_offset = 0x0;
Library pseudocode for aarch32/exceptions/traps/AArch32.CheckForWFxTrap
// AArch32.CheckForWFxTrap()
// =========================
// Check for trap on WFE or WFI instruction
AArch32.CheckForWFxTrap(bits(2) target_el, boolean is_wfe)

assert HaveEL(target_el);
// Check for routing to AArch64

if !ELUsingAArch32(target_el) then
AArch64.CheckForWFxTrap(target_el, is_wfe);
return;
case target_el of
when EL1
trap = (if is_wfe then SCTLR.nTWE else SCTLR.nTWI) == '0';
when EL2
trap = (if is_wfe then HCR.TWE else HCR.TWI) == '1';
when EL3
trap = (if is_wfe then SCR.TWE else SCR.TWI) == '1';
if trap then
if target_el == EL1 && EL2Enabled() && !ELUsingAArch32(EL2) && HCR_EL2.TGE == '1' then
AArch64.WFxTrap(target_el, is_wfe);
if target_el == EL3 then

AArch32.TakeMonitorTrapException();
elsif target_el == EL2 then
exception = ExceptionSyndrome(Exception_WFxTrap);
exception.syndrome<0> = if is_wfe then '1' else '0';
else

Library pseudocode for aarch32/exceptions/traps/AArch32.CheckITEnabled
// AArch32.CheckITEnabled()
// ========================
// Check whether the T32 IT instruction is disabled.
AArch32.CheckITEnabled(bits(4) mask)
it_disabled = HSCTLR.ITD;
else
it_disabled = (if ELUsingAArch32(EL1) then SCTLR.ITD else SCTLR[].ITD);
if it_disabled == '1' then
if mask != '1000' then UNDEFINED;
// Otherwise whether the IT block is allowed depends on hw1 of the next instruction.
next_instr = AArch32.MemSingle[NextInstrAddr(), 2, AccType_IFETCH, TRUE];
if next_instr IN {'11xxxxxxxxxxxxxx', '1011xxxxxxxxxxxx', '10100xxxxxxxxxxx',

'01001xxxxxxxxxxx', '010001xxx1111xxx', '010001xx1xxxx111'} then
// It is IMPLEMENTATION DEFINED whether the Undefined Instruction exception is
// taken on the IT instruction or the next instruction. This is not reflected in
// the pseudocode, which always takes the exception on the IT instruction. This
// also does not take into account cases where the next instruction is UNPREDICTABLE.
UNDEFINED;
return;
Library pseudocode for aarch32/exceptions/traps/AArch32.CheckIllegalState
// AArch32.CheckIllegalState()
// ===========================
// Check PSTATE.IL bit and generate Illegal Execution state exception if set.
AArch32.CheckIllegalState()
AArch64.CheckIllegalState();
elsif PSTATE.IL == '1' then

vect_offset = 0x04;
if PSTATE.EL == EL2 || route_to_hyp then

exception = ExceptionSyndrome(Exception_IllegalState);
else
else
Library pseudocode for aarch32/exceptions/traps/AArch32.CheckSETENDEnabled
// AArch32.CheckSETENDEnabled()
// ============================
// Check whether the AArch32 SETEND instruction is disabled.
AArch32.CheckSETENDEnabled()
setend_disabled = HSCTLR.SED;
else
setend_disabled = (if ELUsingAArch32(EL1) then SCTLR.SED else SCTLR[].SED);
if setend_disabled == '1' then
UNDEFINED;
return;

Library pseudocode for aarch32/exceptions/traps/AArch32.SystemAccessTrap
// AArch32.SystemAccessTrap()
// ==========================
// Trapped system register access.
AArch32.SystemAccessTrap(bits(5) mode, integer ec)

(valid, target_el) = ELFromM32(mode);
assert valid && HaveEL(target_el) && target_el != EL0 && UInt(target_el) >= UInt(PSTATE.EL);

exception = AArch32.SystemAccessTrapSyndrome(ThisInstr(), ec);
else

Library pseudocode for aarch32/exceptions/traps/AArch32.SystemAccessTrapSyndrome
// AArch32.SystemAccessTrapSyndrome()
// ==================================
// Returns the syndrome information for traps on AArch32 MCR, MCRR, MRC, MRRC, and VMRS, VMSR instruction
// other than traps that are due to HCPTR or CPACR.
ExceptionRecord AArch32.SystemAccessTrapSyndrome(bits(32) instr, integer ec)

ExceptionRecord exception;
case ec of
when 0x0 exception = ExceptionSyndrome(Exception_Uncategorized);
when 0x3 exception = ExceptionSyndrome(Exception_CP15RTTrap);
when 0x4 exception = ExceptionSyndrome(Exception_CP15RRTTrap);
when 0x6 exception = ExceptionSyndrome(Exception_CP14DTTrap);
when 0x7 exception = ExceptionSyndrome(Exception_AdvSIMDFPAccessTrap);
when 0x8 exception = ExceptionSyndrome(Exception_FPIDTrap);
when 0xC exception = ExceptionSyndrome(Exception_CP14RRTTrap);
bits(20) iss = Zeros();
if exception.exceptype IN {Exception_FPIDTrap, Exception_CP14RTTrap, Exception_CP15RTTrap} then

// Trapped MRC/MCR, VMRS on FPSID
iss<13:10> = instr<19:16>; // CRn, Reg in case of VMRS
iss<8:5> = instr<15:12>; // Rt
iss<9> = '0'; // RES0
if exception.exceptype != Exception_FPIDTrap then // When trap is not for VMRS

iss<19:17> = instr<7:5>; // opc2
iss<16:14> = instr<23:21>; // opc1
iss<4:1> = instr<3:0>; //CRm
else //VMRS Access
iss<19:17> = '000'; //opc2 - Hardcoded for VMRS
iss<16:14> = '111'; //opc1 - Hardcoded for VMRS
iss<4:1> = '0000'; //CRm - Hardcoded for VMRS
elsif exception.exceptype IN {Exception_CP14RRTTrap, Exception_AdvSIMDFPAccessTrap, Exception_CP15RRT
// Trapped MRRC/MCRR, VMRS/VMSR
iss<19:16> = instr<7:4>; // opc1
iss<13:10> = instr<19:16>; // Rt2
iss<8:5> = instr<15:12>; // Rt
iss<4:1> = instr<3:0>; // CRm
elsif exception.exceptype == Exception_CP14DTTrap then
// Trapped LDC/STC
iss<19:12> = instr<7:0>; // imm8
iss<4> = instr<23>; // U
iss<2:1> = instr<24,21>; // P,W
if instr<19:16> == '1111' then // Rn==15, LDC(Literal addressing)/STC
iss<8:5> = bits(4) UNKNOWN;
iss<3> = '1';
elsif exception.exceptype == Exception_Uncategorized then
// Trapped for unknown reason
iss<8:5> = instr<19:16>; // Rn
iss<3> = '0';
iss<0> = instr<20>; // Direction
exception.syndrome<19:0> = iss;
return exception;

Library pseudocode for aarch32/exceptions/traps/AArch32.TakeHypTrapException
// AArch32.TakeHypTrapException()
// ==============================
// Exceptions routed to Hyp mode as a Hyp Trap exception.
AArch32.TakeHypTrapException(integer ec)
// AArch32.TakeHypTrapException()
// ==============================
// Exceptions routed to Hyp mode as a Hyp Trap exception.
AArch32.TakeHypTrapException(ExceptionRecord exception)

vect_offset = 0x14;
Library pseudocode for aarch32/exceptions/traps/AArch32.TakeMonitorTrapException
// AArch32.TakeMonitorTrapException()
// ==================================
// Exceptions routed to Monitor mode as a Monitor Trap exception.
AArch32.TakeMonitorTrapException()

vect_offset = 0x04;
lr_offset = if CurrentInstrSet() == InstrSet_A32 then 4 else 2;
Library pseudocode for aarch32/exceptions/traps/AArch32.TakeUndefInstrException
// AArch32.TakeUndefInstrException()
// =================================
AArch32.TakeUndefInstrException()
exception = ExceptionSyndrome(Exception_Uncategorized);
AArch32.TakeUndefInstrException(exception);
// AArch32.TakeUndefInstrException()
// =================================
AArch32.TakeUndefInstrException(ExceptionRecord exception)

vect_offset = 0x04;
lr_offset = if CurrentInstrSet() == InstrSet_A32 then 4 else 2;

elsif route_to_hyp then
else
AArch32.EnterMode(M32_Undef, preferred_exception_return, lr_offset, vect_offset);

Library pseudocode for aarch32/exceptions/traps/AArch32.UndefinedFault
// AArch32.UndefinedFault()
// ========================
AArch32.UndefinedFault()
if AArch32.GeneralExceptionsToAArch64() then AArch64.UndefinedFault();

Library pseudocode for aarch32/functions/aborts/AArch32.CreateFaultRecord
// AArch32.CreateFaultRecord()
// ===========================
FaultRecord AArch32.CreateFaultRecord(Fault statuscode, bits(40) ipaddress, bits(4) domain,

integer level, AccType acctype, boolean write, bit extflag,
bits(4) debugmoe, bits(2) errortype, boolean secondstage, boolean s
FaultRecord fault;
fault.statuscode = statuscode;
if (statuscode != Fault_None && PSTATE.EL != EL2 && TTBCR.EAE == '0' && !secondstage && !s2fs1walk &&
AArch32.DomainValid(statuscode, level)) then
fault.domain = domain;
else
fault.domain = bits(4) UNKNOWN;
fault.debugmoe = debugmoe;
fault.errortype = errortype;
fault.ipaddress.NS = bit UNKNOWN;
fault.ipaddress.address = ZeroExtend(ipaddress);
fault.level = level;
fault.acctype = acctype;
fault.write = write;
fault.extflag = extflag;
fault.secondstage = secondstage;
fault.s2fs1walk = s2fs1walk;
return fault;
Library pseudocode for aarch32/functions/aborts/AArch32.DomainValid
// AArch32.DomainValid()
// =====================
// Returns TRUE if the Domain is valid for a Short-descriptor translation scheme.
boolean AArch32.DomainValid(Fault statuscode, integer level)

assert statuscode != Fault_None;
case statuscode of
when Fault_Domain
return TRUE;
when Fault_Translation, Fault_AccessFlag, Fault_SyncExternalOnWalk, Fault_SyncParityOnWalk
return level == 2;
otherwise
return FALSE;

Library pseudocode for aarch32/functions/aborts/AArch32.FaultStatusLD
// AArch32.FaultStatusLD()
// =======================
// Creates an exception fault status value for Abort and Watchpoint exceptions taken
// to Abort mode using AArch32 and Long-descriptor format.
bits(32) AArch32.FaultStatusLD(boolean d_side, FaultRecord fault)

assert fault.statuscode != Fault_None;
bits(32) fsr = Zeros();

if HaveRASExt() && IsAsyncAbort(fault) then fsr<15:14> = fault.errortype;
if d_side then
if fault.acctype IN {AccType_DC, AccType_IC, AccType_AT} then
fsr<13> = '1'; fsr<11> = '1';
else
fsr<11> = if fault.write then '1' else '0';
if IsExternalAbort(fault) then fsr<12> = fault.extflag;
fsr<9> = '1';
fsr<5:0> = EncodeLDFSC(fault.statuscode, fault.level);
return fsr;
Library pseudocode for aarch32/functions/aborts/AArch32.FaultStatusSD
// AArch32.FaultStatusSD()
// =======================
// Creates an exception fault status value for Abort and Watchpoint exceptions taken
// to Abort mode using AArch32 and Short-descriptor format.
bits(32) AArch32.FaultStatusSD(boolean d_side, FaultRecord fault)

bits(32) fsr = Zeros();

if HaveRASExt() && IsAsyncAbort(fault) then fsr<15:14> = fault.errortype;
if d_side then
if fault.acctype IN {AccType_DC, AccType_IC, AccType_AT} then
fsr<13> = '1'; fsr<11> = '1';
else
fsr<11> = if fault.write then '1' else '0';
if IsExternalAbort(fault) then fsr<12> = fault.extflag;
fsr<9> = '0';
fsr<10,3:0> = EncodeSDFSC(fault.statuscode, fault.level);
if d_side then
fsr<7:4> = fault.domain; // Domain field (data fault only)
return fsr;

Library pseudocode for aarch32/functions/aborts/AArch32.FaultSyndrome
// AArch32.FaultSyndrome()
// =======================
// Creates an exception syndrome value for Abort and Watchpoint exceptions taken to
// AArch32 Hyp mode.
bits(25) AArch32.FaultSyndrome(boolean d_side, FaultRecord fault)


if HaveRASExt() && IsAsyncAbort(fault) then iss<11:10> = fault.errortype; // AET
if d_side then
if IsSecondStage(fault) && !fault.s2fs1walk then iss<24:14> = LSInstructionSyndrome();
if fault.acctype IN {AccType_DC, AccType_DC_UNPRIV, AccType_IC, AccType_AT} then
iss<8> = '1'; iss<6> = '1';
else
iss<6> = if fault.write then '1' else '0';
if IsExternalAbort(fault) then iss<9> = fault.extflag;
iss<7> = if fault.s2fs1walk then '1' else '0';
iss<5:0> = EncodeLDFSC(fault.statuscode, fault.level);
return iss;

Library pseudocode for aarch32/functions/aborts/EncodeSDFSC
// EncodeSDFSC()
// =============
// Function that gives the Short-descriptor FSR code for different types of Fault
bits(5) EncodeSDFSC(Fault statuscode, integer level)
bits(5) result;
case statuscode of
when Fault_AccessFlag
assert level IN {1,2};
result = if level == 1 then '00011' else '00110';
when Fault_Alignment
result = '00001';
when Fault_Permission
when Fault_Domain
when Fault_Translation
when Fault_SyncExternal
result = '01000';
when Fault_SyncExternalOnWalk
when Fault_SyncParity
result = '11001';
when Fault_SyncParityOnWalk
when Fault_AsyncParity
result = '11000';
when Fault_AsyncExternal
result = '10110';
when Fault_Debug
result = '00010';
when Fault_TLBConflict
result = '10000';
when Fault_Lockdown
result = '10100'; // IMPLEMENTATION DEFINED
when Fault_Exclusive
result = '10101'; // IMPLEMENTATION DEFINED
when Fault_ICacheMaint
result = '00100';
otherwise
Unreachable();
return result;
Library pseudocode for aarch32/functions/common/A32ExpandImm
// A32ExpandImm()
// ==============
bits(32) A32ExpandImm(bits(12) imm12)
// PSTATE.C argument to following function call does not affect the imm32 result.
(imm32, -) = A32ExpandImm_C(imm12, PSTATE.C);
return imm32;

Library pseudocode for aarch32/functions/common/A32ExpandImm_C
// A32ExpandImm_C()
// ================
(bits(32), bit) A32ExpandImm_C(bits(12) imm12, bit carry_in)
unrotated_value = ZeroExtend(imm12<7:0>, 32);

(imm32, carry_out) = Shift_C(unrotated_value, SRType_ROR, 2*UInt(imm12<11:8>), carry_in);
return (imm32, carry_out);
Library pseudocode for aarch32/functions/common/DecodeImmShift
// DecodeImmShift()
// ================
(SRType, integer) DecodeImmShift(bits(2) srtype, bits(5) imm5)
case srtype of
when '00'
shift_t = SRType_LSL; shift_n = UInt(imm5);
when '01'
shift_t = SRType_LSR; shift_n = if imm5 == '00000' then 32 else UInt(imm5);
when '10'
shift_t = SRType_ASR; shift_n = if imm5 == '00000' then 32 else UInt(imm5);
when '11'
if imm5 == '00000' then
shift_t = SRType_RRX; shift_n = 1;
else
shift_t = SRType_ROR; shift_n = UInt(imm5);
return (shift_t, shift_n);
Library pseudocode for aarch32/functions/common/DecodeRegShift
// DecodeRegShift()
// ================
SRType DecodeRegShift(bits(2) srtype)

case srtype of
when '00' shift_t = SRType_LSL;
when '01' shift_t = SRType_LSR;
when '10' shift_t = SRType_ASR;
when '11' shift_t = SRType_ROR;
return shift_t;
Library pseudocode for aarch32/functions/common/RRX
// RRX()
// =====
bits(N) RRX(bits(N) x, bit carry_in)

(result, -) = RRX_C(x, carry_in);
return result;
Library pseudocode for aarch32/functions/common/RRX_C
// RRX_C()
// =======
(bits(N), bit) RRX_C(bits(N) x, bit carry_in)

result = carry_in : x<N-1:1>;
carry_out = x<0>;
return (result, carry_out);

Library pseudocode for aarch32/functions/common/SRType
enumeration SRType {SRType_LSL, SRType_LSR, SRType_ASR, SRType_ROR, SRType_RRX};
Library pseudocode for aarch32/functions/common/Shift
// Shift()
// =======
bits(N) Shift(bits(N) value, SRType srtype, integer amount, bit carry_in)

(result, -) = Shift_C(value, srtype, amount, carry_in);
return result;
Library pseudocode for aarch32/functions/common/Shift_C
// Shift_C()
// =========
(bits(N), bit) Shift_C(bits(N) value, SRType srtype, integer amount, bit carry_in)
assert !(srtype == SRType_RRX && amount != 1);
if amount == 0 then
(result, carry_out) = (value, carry_in);
else
case srtype of
when SRType_LSL
(result, carry_out) = LSL_C(value, amount);
when SRType_LSR
(result, carry_out) = LSR_C(value, amount);
when SRType_ASR
(result, carry_out) = ASR_C(value, amount);
when SRType_ROR
(result, carry_out) = ROR_C(value, amount);
when SRType_RRX
(result, carry_out) = RRX_C(value, carry_in);
return (result, carry_out);
Library pseudocode for aarch32/functions/common/T32ExpandImm
// T32ExpandImm()
// ==============
bits(32) T32ExpandImm(bits(12) imm12)
// PSTATE.C argument to following function call does not affect the imm32 result.
(imm32, -) = T32ExpandImm_C(imm12, PSTATE.C);
return imm32;

Library pseudocode for aarch32/functions/common/T32ExpandImm_C
// T32ExpandImm_C()
// ================
(bits(32), bit) T32ExpandImm_C(bits(12) imm12, bit carry_in)
if imm12<11:10> == '00' then

case imm12<9:8> of
when '00'
imm32 = ZeroExtend(imm12<7:0>, 32);
when '01'
imm32 = '00000000' : imm12<7:0> : '00000000' : imm12<7:0>;
when '10'
imm32 = imm12<7:0> : '00000000' : imm12<7:0> : '00000000';
when '11'
imm32 = imm12<7:0> : imm12<7:0> : imm12<7:0> : imm12<7:0>;
carry_out = carry_in;
else
unrotated_value = ZeroExtend('1':imm12<6:0>, 32);
(imm32, carry_out) = ROR_C(unrotated_value, UInt(imm12<11:7>));
return (imm32, carry_out);
Library pseudocode for aarch32/functions/coproc/AArch32.CheckCP15InstrCoarseTraps
// AArch32.CheckCP15InstrCoarseTraps()
// ===================================
// Check for coarse-grained CP15 traps in HSTR and HCR.
boolean AArch32.CheckCP15InstrCoarseTraps(integer CRn, integer nreg, integer CRm)
// Check for coarse-grained Hyp traps

if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then
if PSTATE.EL == EL0 && !ELUsingAArch32(EL2) then
return AArch64.CheckCP15InstrCoarseTraps(CRn, nreg, CRm);
// Check for MCR, MRC, MCRR and MRRC disabled by HSTR<CRn/CRm>
major = if nreg == 1 then CRn else CRm;
if !(major IN {4,14}) && HSTR<major> == '1' then
return TRUE;
// Check for MRC and MCR disabled by HCR.TIDCP

if (HCR.TIDCP == '1' && nreg == 1 &&
((CRn == 9 && CRm IN {0,1,2, 5,6,7,8 }) ||
(CRn == 10 && CRm IN {0,1, 4, 8 }) ||
(CRn == 11 && CRm IN {0,1,2,3,4,5,6,7,8,15}))) then
return TRUE;
return FALSE;

Library pseudocode for aarch32/functions/exclusive/AArch32.ExclusiveMonitorsPass
// AArch32.ExclusiveMonitorsPass()
// ===============================
// Return TRUE if the Exclusives monitors for the current PE include all of the addresses
// associated with the virtual address region of size bytes starting at address.
// The immediately following memory write must be to the same addresses.
boolean AArch32.ExclusiveMonitorsPass(bits(32) address, integer size)
// It is IMPLEMENTATION DEFINED whether the detection of memory aborts happens

// before or after the check on the local Exclusives monitor. As a result a failure
// of the local monitor can occur on some implementations even if the memory
// access would give an memory abort.
acctype = AccType_ATOMIC;
iswrite = TRUE;
aligned = AArch32.CheckAlignment(address, size, acctype, iswrite);
passed = AArch32.IsExclusiveVA(address, ProcessorID(), size);

if !passed then
return FALSE;
memaddrdesc = AArch32.TranslateAddress(address, acctype, iswrite, aligned, size);

// Check for aborts or debug exceptions
if IsFault(memaddrdesc) then
AArch32.Abort(address, memaddrdesc.fault);
passed = IsExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);

if passed then
if memaddrdesc.memattrs.shareable then
passed = IsExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
return passed;
Library pseudocode for aarch32/functions/exclusive/AArch32.IsExclusiveVA
// An optional IMPLEMENTATION DEFINED test for an exclusive access to a virtual

// address region of size bytes starting at address.
//
// It is permitted (but not required) for this function to return FALSE and
// cause a store exclusive to fail if the virtual address region is not
// totally included within the region recorded by MarkExclusiveVA().
//
// It is always safe to return TRUE which will check the physical address only.
boolean AArch32.IsExclusiveVA(bits(32) address, integer processorid, integer size);
Library pseudocode for aarch32/functions/exclusive/AArch32.MarkExclusiveVA
// Optionally record an exclusive access to the virtual address region of size bytes
// starting at address for processorid.
AArch32.MarkExclusiveVA(bits(32) address, integer processorid, integer size);

Library pseudocode for aarch32/functions/exclusive/AArch32.SetExclusiveMonitors
// AArch32.SetExclusiveMonitors()
// ==============================
// Sets the Exclusives monitors for the current PE to record the addresses associated
// with the virtual address region of size bytes starting at address.
AArch32.SetExclusiveMonitors(bits(32) address, integer size)
iswrite = FALSE;

return;
MarkExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
MarkExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);
AArch32.MarkExclusiveVA(address, ProcessorID(), size);
Library pseudocode for aarch32/functions/float/CheckAdvSIMDEnabled
// CheckAdvSIMDEnabled()
// =====================
CheckAdvSIMDEnabled()
fpexc_check = TRUE;
advsimd = TRUE;
AArch32.CheckAdvSIMDOrFPEnabled(fpexc_check, advsimd);
// Return from CheckAdvSIMDOrFPEnabled() occurs only if Advanced SIMD access is permitted
// Make temporary copy of D registers

// _Dclone[] is used as input data for instruction pseudocode
for i = 0 to 31
_Dclone[i] = D[i];
return;
Library pseudocode for aarch32/functions/float/CheckAdvSIMDOrVFPEnabled
// CheckAdvSIMDOrVFPEnabled()
// ==========================
CheckAdvSIMDOrVFPEnabled(boolean include_fpexc_check, boolean advsimd)

AArch32.CheckAdvSIMDOrFPEnabled(include_fpexc_check, advsimd);
// Return from CheckAdvSIMDOrFPEnabled() occurs only if VFP access is permitted
return;
Library pseudocode for aarch32/functions/float/CheckCryptoEnabled32
// CheckCryptoEnabled32()
// ======================
CheckCryptoEnabled32()
CheckAdvSIMDEnabled();
// Return from CheckAdvSIMDEnabled() occurs only if access is permitted
return;

Library pseudocode for aarch32/functions/float/CheckVFPEnabled
// CheckVFPEnabled()
// =================
CheckVFPEnabled(boolean include_fpexc_check)
advsimd = FALSE;
AArch32.CheckAdvSIMDOrFPEnabled(include_fpexc_check, advsimd);
// Return from CheckAdvSIMDOrFPEnabled() occurs only if VFP access is permitted
return;
Library pseudocode for aarch32/functions/float/FPHalvedSub
// FPHalvedSub()
// =============
bits(N) FPHalvedSub(bits(N) op1, bits(N) op2, FPCRType fpcr)

assert N IN {16,32,64};
rounding = FPRoundingMode(fpcr);
(type1,sign1,value1) = FPUnpack(op1, fpcr);
(done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);
if !done then
inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
if inf1 && inf2 && sign1 == sign2 then
result = FPDefaultNaN();
FPProcessException(FPExc_InvalidOp, fpcr);
elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
result = FPInfinity('0');
elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
result = FPInfinity('1');
elsif zero1 && zero2 && sign1 != sign2 then
result = FPZero(sign1);
else
result_value = (value1 - value2) / 2.0;
if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
result = FPZero(result_sign);
else
result = FPRound(result_value, fpcr);
return result;
Library pseudocode for aarch32/functions/float/FPRSqrtStep
// FPRSqrtStep()
// =============
bits(N) FPRSqrtStep(bits(N) op1, bits(N) op2)

assert N IN {16,32};
FPCRType fpcr = StandardFPSCRValue();
if !done then
bits(N) product;
if (inf1 && zero2) || (zero1 && inf2) then
product = FPZero('0');
else
product = FPMul(op1, op2, fpcr);
bits(N) three = FPThree('0');
result = FPHalvedSub(three, product, fpcr);
return result;

Library pseudocode for aarch32/functions/float/FPRecipStep
// FPRecipStep()
// =============
bits(N) FPRecipStep(bits(N) op1, bits(N) op2)

assert N IN {16,32};
FPCRType fpcr = StandardFPSCRValue();
if !done then
bits(N) product;
product = FPZero('0');
else
product = FPMul(op1, op2, fpcr);
bits(N) two = FPTwo('0');
result = FPSub(two, product, fpcr);
return result;
Library pseudocode for aarch32/functions/float/StandardFPSCRValue
// StandardFPSCRValue()
// ====================
FPCRType StandardFPSCRValue()
return '00000' : FPSCR.AHP : '110000' : FPSCR.FZ16 : '0000000000000000000';
Library pseudocode for aarch32/functions/memory/AArch32.CheckAlignment
// AArch32.CheckAlignment()
// ========================
boolean AArch32.CheckAlignment(bits(32) address, integer alignment, AccType acctype,

boolean iswrite)
if PSTATE.EL == EL0 && !ELUsingAArch32(S1TranslationRegime()) then

A = SCTLR[].A; //use AArch64 register, when higher Exception level is using AArch64
elsif PSTATE.EL == EL2 then
A = HSCTLR.A;
else
A = SCTLR.A;
aligned = (address == Align(address, alignment));
atomic = acctype IN { AccType_ATOMIC, AccType_ATOMICRW, AccType_ORDEREDATOMIC, AccType_ORDEREDATOMIC
ordered = acctype IN { AccType_ORDERED, AccType_ORDEREDRW, AccType_LIMITEDORDERED, AccType_ORDEREDATO
vector = acctype == AccType_VEC;
// AccType_VEC is used for SIMD element alignment checks only

check = (atomic || ordered || vector || A == '1');
if check && !aligned then

AArch32.Abort(address, AArch32.AlignmentFault(acctype, iswrite, secondstage));
return aligned;

Library pseudocode for aarch32/functions/memory/AArch32.MemSingle
// AArch32.MemSingle[] - non-assignment (read) form

// ================================================
// Perform an atomic, little-endian read of 'size' bytes.
bits(size*8) AArch32.MemSingle[bits(32) address, integer size, AccType acctype, boolean wasaligned]

assert size IN {1, 2, 4, 8, 16};
assert address == Align(address, size);
AddressDescriptor memaddrdesc;
bits(size*8) value;
iswrite = FALSE;
memaddrdesc = AArch32.TranslateAddress(address, acctype, iswrite, wasaligned, size);

// Memory array access

accdesc = CreateAccessDescriptor(acctype);
if AArch64.AccessIsTagChecked(ZeroExtend(address, 64), acctype) then
bits(4) ptag = AArch64.PhysicalTag(ZeroExtend(address, 64));
if !AArch64.CheckTag(memaddrdesc, ptag, iswrite) then
AArch64.TagCheckFail(ZeroExtend(address, 64), acctype, iswrite);
value = _Mem[memaddrdesc, size, accdesc];
return value;
// AArch32.MemSingle[] - assignment (write) form

// =============================================
// Perform an atomic, little-endian write of 'size' bytes.
AArch32.MemSingle[bits(32) address, integer size, AccType acctype, boolean wasaligned] = bits(size*8) val
assert size IN {1, 2, 4, 8, 16};
iswrite = TRUE;

// Effect on exclusives
ClearExclusiveByAddress(memaddrdesc.paddress, ProcessorID(), size);

_Mem[memaddrdesc, size, accdesc] = value;
return;
Library pseudocode for aarch32/functions/memory/Hint_PreloadData
Hint_PreloadData(bits(32) address);
Library pseudocode for aarch32/functions/memory/Hint_PreloadDataForWrite
Hint_PreloadDataForWrite(bits(32) address);

Library pseudocode for aarch32/functions/memory/Hint_PreloadInstr
Hint_PreloadInstr(bits(32) address);
Library pseudocode for aarch32/functions/memory/MemA
// MemA[] - non-assignment form

// ============================
bits(8*size) MemA[bits(32) address, integer size]

return Mem_with_type[address, size, acctype];
// MemA[] - assignment form

// ========================
MemA[bits(32) address, integer size] = bits(8*size) value

Mem_with_type[address, size, acctype] = value;
return;
Library pseudocode for aarch32/functions/memory/MemO
// MemO[] - non-assignment form

// ============================
bits(8*size) MemO[bits(32) address, integer size]

acctype = AccType_ORDERED;
// MemO[] - assignment form

// ========================
MemO[bits(32) address, integer size] = bits(8*size) value

acctype = AccType_ORDERED;
return;
Library pseudocode for aarch32/functions/memory/MemU
// MemU[] - non-assignment form

// ============================
bits(8*size) MemU[bits(32) address, integer size]

// MemU[] - assignment form

// ========================
MemU[bits(32) address, integer size] = bits(8*size) value

return;

Library pseudocode for aarch32/functions/memory/MemU_unpriv
// MemU_unpriv[] - non-assignment form

// ===================================
bits(8*size) MemU_unpriv[bits(32) address, integer size]

// MemU_unpriv[] - assignment form

// ===============================
MemU_unpriv[bits(32) address, integer size] = bits(8*size) value

return;

Library pseudocode for aarch32/functions/memory/Mem_with_type
// Mem_with_type[] - non-assignment (read) form

// ============================================
// Perform a read of 'size' bytes. The access byte order is reversed for a big-endian access.
// Instruction fetches would call AArch32.MemSingle directly.
bits(size*8) Mem_with_type[bits(32) address, integer size, AccType acctype]

assert size IN {1, 2, 4, 8, 16};
bits(size*8) value;
boolean iswrite = FALSE;

if !aligned then
assert size > 1;
value<7:0> = AArch32.MemSingle[address, 1, acctype, aligned];
// For subsequent bytes it is CONSTRAINED UNPREDICTABLE whether an unaligned Device memory

// access will generate an Alignment Fault, as to get this far means the first byte did
// not, so we must be changing to a new translation page.
c = ConstrainUnpredictable(Unpredictable_DEVPAGE2);
assert c IN {Constraint_FAULT, Constraint_NONE};
if c == Constraint_NONE then aligned = TRUE;
for i = 1 to size-1
value<8*i+7:8*i> = AArch32.MemSingle[address+i, 1, acctype, aligned];
else
value = AArch32.MemSingle[address, size, acctype, aligned];
if BigEndian() then
value = BigEndianReverse(value);
return value;
// Mem_with_type[] - assignment (write) form

// =========================================
// Perform a write of 'size' bytes. The byte order is reversed for a big-endian access.
Mem_with_type[bits(32) address, integer size, AccType acctype] = bits(size*8) value

boolean iswrite = TRUE;
if BigEndian() then
value = BigEndianReverse(value);
if !aligned then
assert size > 1;
AArch32.MemSingle[address, 1, acctype, aligned] = value<7:0>;
// For subsequent bytes it is CONSTRAINED UNPREDICTABLE whether an unaligned Device memory

// access will generate an Alignment Fault, as to get this far means the first byte did
// not, so we must be changing to a new translation page.
c = ConstrainUnpredictable(Unpredictable_DEVPAGE2);
assert c IN {Constraint_FAULT, Constraint_NONE};
if c == Constraint_NONE then aligned = TRUE;
for i = 1 to size-1
AArch32.MemSingle[address+i, 1, acctype, aligned] = value<8*i+7:8*i>;
else
AArch32.MemSingle[address, size, acctype, aligned] = value;
return;

Library pseudocode for aarch32/functions/ras/AArch32.ESBOperation
// AArch32.ESBOperation()
// ======================
// Perform the AArch32 ESB operation for ESB executed in AArch32 state
AArch32.ESBOperation()

route_to_aarch64 = HCR_EL2.TGE == '1' || HCR_EL2.AMO == '1';
route_to_aarch64 = SCR_EL3.EA == '1';
return;
route_to_monitor = HaveEL(EL3) && ELUsingAArch32(EL3) && SCR.EA == '1';

route_to_hyp = PSTATE.EL IN {EL0, EL1} && EL2Enabled() && (HCR.TGE == '1' || HCR.AMO == '1');
target = M32_Monitor;
elsif route_to_hyp || PSTATE.M == M32_Hyp then
target = M32_Hyp;
else
target = M32_Abort;
if IsSecure() then
mask_active = TRUE;
elsif target == M32_Monitor then
mask_active = SCR.AW == '1' && (!HaveEL(EL2) || (HCR.TGE == '0' && HCR.AMO == '0'));
else
mask_active = target == M32_Abort || PSTATE.M == M32_Hyp;
mask_set = PSTATE.A == '1';

(-, el) = ELFromM32(target);
intdis = Halted() || ExternalDebugInterruptsDisabled(el);
masked = intdis || (mask_active && mask_set);
// Check for a masked Physical SError pending that can be synchronized

// by an Error synchronization event.
if masked && IsSynchronizablePhysicalSErrorPending() then
syndrome32 = AArch32.PhysicalSErrorSyndrome();
DISR = AArch32.ReportDeferredSError(syndrome32.AET, syndrome32.ExT);
return;
Library pseudocode for aarch32/functions/ras/AArch32.PhysicalSErrorSyndrome
// Return the SError syndrome

AArch32.SErrorSyndrome AArch32.PhysicalSErrorSyndrome();

Library pseudocode for aarch32/functions/ras/AArch32.ReportDeferredSError
// AArch32.ReportDeferredSError()
// ==============================
// Return deferred SError syndrome
bits(32) AArch32.ReportDeferredSError(bits(2) AET, bit ExT)

bits(32) target;
target<31> = '1'; // A
syndrome = Zeros(16);
syndrome<11:10> = AET; // AET
syndrome<9> = ExT; // EA
syndrome<5:0> = '010001'; // DFSC
else
syndrome<15:14> = AET; // AET
syndrome<12> = ExT; // ExT
syndrome<9> = TTBCR.EAE; // LPAE
if TTBCR.EAE == '1' then // Long-descriptor format
syndrome<5:0> = '010001'; // STATUS
else // Short-descriptor format
syndrome<10,3:0> = '10110'; // FS
if HaveAnyAArch64() then
target<24:0> = ZeroExtend(syndrome);// Any RES0 fields must be set to zero
else
target<15:0> = syndrome;
return target;
Library pseudocode for aarch32/functions/ras/AArch32.SErrorSyndrome
type AArch32.SErrorSyndrome is (
bits(2) AET,
bit ExT
)
Library pseudocode for aarch32/functions/ras/AArch32.vESBOperation
// AArch32.vESBOperation()
// =======================
// Perform the ESB operation for virtual SError interrupts executed in AArch32 state
AArch32.vESBOperation()
// Check for EL2 using AArch64 state

AArch64.vESBOperation();
return;
// If physical SError interrupts are routed to Hyp mode, and TGE is not set, then a
// virtual SError interrupt might be pending
vSEI_enabled = HCR.TGE == '0' && HCR.AMO == '1';
vSEI_pending = vSEI_enabled && HCR.VA == '1';
vintdis = Halted() || ExternalDebugInterruptsDisabled(EL1);
vmasked = vintdis || PSTATE.A == '1';
// Check for a masked virtual SError pending

if vSEI_pending && vmasked then
VDISR = AArch32.ReportDeferredSError(VDFSR<15:14>, VDFSR<12>);
HCR.VA = '0'; // Clear pending virtual SError
return;

Library pseudocode for aarch32/functions/registers/AArch32.ResetGeneralRegisters
// AArch32.ResetGeneralRegisters()
// ===============================
AArch32.ResetGeneralRegisters()
for i = 0 to 7
R[i] = bits(32) UNKNOWN;
for i = 8 to 12
Rmode[i, M32_User] = bits(32) UNKNOWN;
Rmode[i, M32_FIQ] = bits(32) UNKNOWN;
if HaveEL(EL2) then Rmode[13, M32_Hyp] = bits(32) UNKNOWN; // No R14_hyp
for i = 13 to 14
Rmode[i, M32_User] = bits(32) UNKNOWN;
Rmode[i, M32_FIQ] = bits(32) UNKNOWN;
Rmode[i, M32_IRQ] = bits(32) UNKNOWN;
Rmode[i, M32_Svc] = bits(32) UNKNOWN;
Rmode[i, M32_Abort] = bits(32) UNKNOWN;
Rmode[i, M32_Undef] = bits(32) UNKNOWN;
if HaveEL(EL3) then Rmode[i, M32_Monitor] = bits(32) UNKNOWN;
return;
Library pseudocode for aarch32/functions/registers/AArch32.ResetSIMDFPRegisters
// AArch32.ResetSIMDFPRegisters()
// ==============================
AArch32.ResetSIMDFPRegisters()
for i = 0 to 15
Q[i] = bits(128) UNKNOWN;
return;
Library pseudocode for aarch32/functions/registers/AArch32.ResetSpecialRegisters
// AArch32.ResetSpecialRegisters()
// ===============================
AArch32.ResetSpecialRegisters()
// AArch32 special registers

SPSR_fiq = bits(32) UNKNOWN;
SPSR_irq = bits(32) UNKNOWN;
SPSR_svc = bits(32) UNKNOWN;
SPSR_abt = bits(32) UNKNOWN;
SPSR_und = bits(32) UNKNOWN;
if HaveEL(EL2) then
SPSR_hyp = bits(32) UNKNOWN;
ELR_hyp = bits(32) UNKNOWN;
if HaveEL(EL3) then
SPSR_mon = bits(32) UNKNOWN;
// External debug special registers

return;
Library pseudocode for aarch32/functions/registers/AArch32.ResetSystemRegisters
AArch32.ResetSystemRegisters(boolean cold_reset);

Library pseudocode for aarch32/functions/registers/ALUExceptionReturn
// ALUExceptionReturn()
// ====================
ALUExceptionReturn(bits(32) address)
UNDEFINED;
elsif PSTATE.M IN {M32_User,M32_System} then
UNPREDICTABLE; // UNDEFINED or NOP
else
AArch32.ExceptionReturn(address, SPSR[]);
Library pseudocode for aarch32/functions/registers/ALUWritePC
// ALUWritePC()
// ============
ALUWritePC(bits(32) address)
if CurrentInstrSet() == InstrSet_A32 then
BXWritePC(address, BranchType_INDIR);
else
BranchWritePC(address, BranchType_INDIR);
Library pseudocode for aarch32/functions/registers/BXWritePC
// BXWritePC()
// ===========
BXWritePC(bits(32) address, BranchType branch_type)

if address<0> == '1' then
SelectInstrSet(InstrSet_T32);
address<0> = '0';
else
SelectInstrSet(InstrSet_A32);
// For branches to an unaligned PC counter in A32 state, the processor takes the branch
// and does one of:
// * Forces the address to be aligned
// * Leaves the PC unaligned, meaning the target generates a PC Alignment fault.
if address<1> == '1' && ConstrainUnpredictableBool(Unpredictable_A32FORCEALIGNPC) then
address<1> = '0';
BranchTo(address, branch_type);
Library pseudocode for aarch32/functions/registers/BranchWritePC
// BranchWritePC()
// ===============
BranchWritePC(bits(32) address, BranchType branch_type)

if CurrentInstrSet() == InstrSet_A32 then
address<1:0> = '00';
else
address<0> = '0';
BranchTo(address, branch_type);

Library pseudocode for aarch32/functions/registers/D
// D[] - non-assignment form

// =========================
bits(64) D[integer n]
assert n >= 0 && n <= 31;
base = (n MOD 2) * 64;
bits(128) vreg = V[n DIV 2];
return vreg<base+63:base>;
// D[] - assignment form

// =====================
D[integer n] = bits(64) value

assert n >= 0 && n <= 31;
base = (n MOD 2) * 64;
vreg<base+63:base> = value;
V[n DIV 2] = vreg;
return;
Library pseudocode for aarch32/functions/registers/Din
// Din[] - non-assignment form

// ===========================
bits(64) Din[integer n]
assert n >= 0 && n <= 31;
return _Dclone[n];
Library pseudocode for aarch32/functions/registers/LR
// LR - assignment form
// ====================
LR = bits(32) value
R[14] = value;
return;
// LR - non-assignment form
// ========================
bits(32) LR
return R[14];
Library pseudocode for aarch32/functions/registers/LoadWritePC
// LoadWritePC()
// =============
LoadWritePC(bits(32) address)
BXWritePC(address, BranchType_INDIR);

Library pseudocode for aarch32/functions/registers/LookUpRIndex
// LookUpRIndex()
// ==============
integer LookUpRIndex(integer n, bits(5) mode)

assert n >= 0 && n <= 14;
case n of // Select index by mode: usr fiq irq svc abt und hyp
when 8 result = RBankSelect(mode, 8, 24, 8, 8, 8, 8, 8);
otherwise result = n;
return result;
Library pseudocode for aarch32/functions/registers/Monitor_mode_registers
bits(32) SP_mon;
bits(32) LR_mon;
Library pseudocode for aarch32/functions/registers/PC
// PC - non-assignment form
// ========================
bits(32) PC
return R[15]; // This includes the offset from AArch32 state
Library pseudocode for aarch32/functions/registers/PCStoreValue
// PCStoreValue()
// ==============
bits(32) PCStoreValue()
// This function returns the PC value. On architecture versions before Armv7, it
// is permitted to instead return PC+4, provided it does so consistently. It is
// used only to describe A32 instructions, so it returns the address of the current
// instruction plus 8 (normally) or 12 (when the alternative is permitted).
return PC;
Library pseudocode for aarch32/functions/registers/Q
// Q[] - non-assignment form

// =========================
bits(128) Q[integer n]
assert n >= 0 && n <= 15;
return V[n];
// Q[] - assignment form

// =====================
Q[integer n] = bits(128) value

assert n >= 0 && n <= 15;
V[n] = value;
return;

Library pseudocode for aarch32/functions/registers/Qin
// Qin[] - non-assignment form

// ===========================
bits(128) Qin[integer n]
assert n >= 0 && n <= 15;
return Din[2*n+1]:Din[2*n];
Library pseudocode for aarch32/functions/registers/R
// R[] - assignment form

// =====================
R[integer n] = bits(32) value

Rmode[n, PSTATE.M] = value;
return;
// R[] - non-assignment form

// =========================
bits(32) R[integer n]
if n == 15 then
offset = (if CurrentInstrSet() == InstrSet_A32 then 8 else 4);
return _PC<31:0> + offset;
else
return Rmode[n, PSTATE.M];
Library pseudocode for aarch32/functions/registers/RBankSelect
// RBankSelect()
// =============
integer RBankSelect(bits(5) mode, integer usr, integer fiq, integer irq,

integer svc, integer abt, integer und, integer hyp)
case mode of
when M32_User result = usr; // User mode
when M32_FIQ result = fiq; // FIQ mode
when M32_IRQ result = irq; // IRQ mode
when M32_Svc result = svc; // Supervisor mode
when M32_Abort result = abt; // Abort mode
when M32_Hyp result = hyp; // Hyp mode
when M32_Undef result = und; // Undefined mode
when M32_System result = usr; // System mode uses User mode registers
otherwise Unreachable(); // Monitor mode
return result;

Library pseudocode for aarch32/functions/registers/Rmode
// Rmode[] - non-assignment form

// =============================
bits(32) Rmode[integer n, bits(5) mode]

assert n >= 0 && n <= 14;
// Check for attempted use of Monitor mode in Non-secure state.

if !IsSecure() then assert mode != M32_Monitor;
assert !BadMode(mode);
if mode == M32_Monitor then

if n == 13 then return SP_mon;
elsif n == 14 then return LR_mon;
else return _R[n]<31:0>;
else
return _R[LookUpRIndex(n, mode)]<31:0>;
// Rmode[] - assignment form

// =========================
Rmode[integer n, bits(5) mode] = bits(32) value

assert n >= 0 && n <= 14;
// Check for attempted use of Monitor mode in Non-secure state.

if !IsSecure() then assert mode != M32_Monitor;
assert !BadMode(mode);
if mode == M32_Monitor then

if n == 13 then SP_mon = value;
elsif n == 14 then LR_mon = value;
else _R[n]<31:0> = value;
else
// It is CONSTRAINED UNPREDICTABLE whether the upper 32 bits of the X
// register are unchanged or set to zero. This is also tested for on
// exception entry, as this applies to all AArch32 registers.
if !HighestELUsingAArch32() && ConstrainUnpredictableBool(Unpredictable_ZEROUPPER) then
_R[LookUpRIndex(n, mode)] = ZeroExtend(value);
else
_R[LookUpRIndex(n, mode)]<31:0> = value;
return;
Library pseudocode for aarch32/functions/registers/S
// S[] - non-assignment form

// =========================
bits(32) S[integer n]
assert n >= 0 && n <= 31;
base = (n MOD 4) * 32;
return vreg<base+31:base>;
// S[] - assignment form

// =====================
S[integer n] = bits(32) value

assert n >= 0 && n <= 31;
base = (n MOD 4) * 32;
vreg<base+31:base> = value;
V[n DIV 4] = vreg;
return;

Library pseudocode for aarch32/functions/registers/SP
// SP - assignment form
// ====================
SP = bits(32) value
R[13] = value;
return;
// SP - non-assignment form
// ========================
bits(32) SP
return R[13];
Library pseudocode for aarch32/functions/registers/_Dclone
array bits(64) _Dclone[0..31];
Library pseudocode for aarch32/functions/system/AArch32.ExceptionReturn
// AArch32.ExceptionReturn()
// =========================
AArch32.ExceptionReturn(bits(32) new_pc, bits(32) spsr)
// Attempts to change to an illegal mode or state will invoke the Illegal Execution state
// mechanism
SetPSTATEFromPSR(spsr);
SendEventLocal();
if PSTATE.IL == '1' then

// If the exception return is illegal, PC[1:0] are UNKNOWN
new_pc<1:0> = bits(2) UNKNOWN;
else
// LR[1:0] or LR[0] are treated as being 0, depending on the target instruction set state
if PSTATE.T == '1' then
new_pc<0> = '0'; // T32
else
new_pc<1:0> = '00'; // A32
BranchTo(new_pc, BranchType_ERET);
Library pseudocode for aarch32/functions/system/AArch32.ExecutingATS1xPInstr
// AArch32.ExecutingATS1xPInstr()
// ==============================
// Return TRUE if current instruction is AT S1CPR/WP
boolean AArch32.ExecutingATS1xPInstr()
if !HavePrivATExt() then return FALSE;
instr = ThisInstr();
if instr<24+:4> == '1110' && instr<8+:4> == '1111' then
opc1 = instr<21+:3>;
CRn = instr<16+:4>;
CRm = instr<0+:4>;
opc2 = instr<5+:3>;
return (opc1 == '000' && CRn == '0111' && CRm == '1001' && opc2 IN {'000','001'});
else
return FALSE;

Library pseudocode for aarch32/functions/system/AArch32.ExecutingCP10or11Instr
// AArch32.ExecutingCP10or11Instr()
// ================================
boolean AArch32.ExecutingCP10or11Instr()
instr_set = CurrentInstrSet();
assert instr_set IN {InstrSet_A32, InstrSet_T32};
if instr_set == InstrSet_A32 then

return ((instr<27:24> == '1110' || instr<27:25> == '110') && instr<11:8> == '101x');
else // InstrSet_T32
return (instr<31:28> == '111x' && (instr<27:24> == '1110' || instr<27:25> == '110') && instr<11:8
Library pseudocode for aarch32/functions/system/AArch32.ExecutingLSMInstr
// AArch32.ExecutingLSMInstr()
// ===========================
// Returns TRUE if processor is executing a Load/Store Multiple instruction
boolean AArch32.ExecutingLSMInstr()
instr_set = CurrentInstrSet();
assert instr_set IN {InstrSet_A32, InstrSet_T32};
if instr_set == InstrSet_A32 then

return (instr<28+:4> != '1111' && instr<25+:3> == '100');
else // InstrSet_T32
if ThisInstrLength() == 16 then
return (instr<12+:4> == '1100');
else
return (instr<25+:7> == '1110100' && instr<22> == '0');
Library pseudocode for aarch32/functions/system/AArch32.ITAdvance
// AArch32.ITAdvance()
// ===================
AArch32.ITAdvance()
if PSTATE.IT<2:0> == '000' then
PSTATE.IT = '00000000';
else
PSTATE.IT<4:0> = LSL(PSTATE.IT<4:0>, 1);
return;
Library pseudocode for aarch32/functions/system/AArch32.SysRegRead
// Read from a 32-bit AArch32 System register and return the register's contents.
bits(32) AArch32.SysRegRead(integer cp_num, bits(32) instr);
Library pseudocode for aarch32/functions/system/AArch32.SysRegRead64
// Read from a 64-bit AArch32 System register and return the register's contents.
bits(64) AArch32.SysRegRead64(integer cp_num, bits(32) instr);

Library pseudocode for aarch32/functions/system/AArch32.SysRegReadCanWriteAPSR
// AArch32.SysRegReadCanWriteAPSR()
// ================================
// Determines whether the AArch32 System register read instruction can write to APSR flags.
boolean AArch32.SysRegReadCanWriteAPSR(integer cp_num, bits(32) instr)

assert UsingAArch32();
assert (cp_num IN {14,15});
assert cp_num == UInt(instr<11:8>);
opc1 = UInt(instr<23:21>);
opc2 = UInt(instr<7:5>);
CRn = UInt(instr<19:16>);
CRm = UInt(instr<3:0>);
if cp_num == 14 && opc1 == 0 && CRn == 0 && CRm == 1 && opc2 == 0 then // DBGDSCRint
return TRUE;
return FALSE;
Library pseudocode for aarch32/functions/system/AArch32.SysRegWrite
// Write to a 32-bit AArch32 System register.

AArch32.SysRegWrite(integer cp_num, bits(32) instr, bits(32) val);
Library pseudocode for aarch32/functions/system/AArch32.SysRegWrite64
// Write to a 64-bit AArch32 System register.

AArch32.SysRegWrite64(integer cp_num, bits(32) instr, bits(64) val);
Library pseudocode for aarch32/functions/system/AArch32.WriteMode
// AArch32.WriteMode()
// ===================
// Function for dealing with writes to PSTATE.M from AArch32 state only.
// This ensures that PSTATE.EL and PSTATE.SP are always valid.
AArch32.WriteMode(bits(5) mode)
(valid,el) = ELFromM32(mode);
assert valid;
PSTATE.M = mode;
PSTATE.EL = el;
PSTATE.nRW = '1';
PSTATE.SP = (if mode IN {M32_User,M32_System} then '0' else '1');
return;

Library pseudocode for aarch32/functions/system/AArch32.WriteModeByInstr
// AArch32.WriteModeByInstr()
// ==========================
// Function for dealing with writes to PSTATE.M from an AArch32 instruction, and ensuring that
// illegal state changes are correctly flagged in PSTATE.IL.
AArch32.WriteModeByInstr(bits(5) mode)
(valid,el) = ELFromM32(mode);
// 'valid' is set to FALSE if' mode' is invalid for this implementation or the current value
// of SCR.NS/SCR_EL3.NS. Additionally, it is illegal for an instruction to write 'mode' to
// PSTATE.EL if it would result in any of:
// * A change to a mode that would cause entry to a higher Exception level.
if UInt(el) > UInt(PSTATE.EL) then
valid = FALSE;
// * A change to or from Hyp mode.

if (PSTATE.M == M32_Hyp || mode == M32_Hyp) && PSTATE.M != mode then
valid = FALSE;
// * When EL2 is implemented, the value of HCR.TGE is '1', a change to a Non-secure EL1 mode.
if PSTATE.M == M32_Monitor && HaveEL(EL2) && el == EL1 && SCR.NS == '1' && HCR.TGE == '1' then
valid = FALSE;
if !valid then
PSTATE.IL = '1';
else
AArch32.WriteMode(mode);
Library pseudocode for aarch32/functions/system/BadMode
// BadMode()
// =========
boolean BadMode(bits(5) mode)

// Return TRUE if 'mode' encodes a mode that is not valid for this implementation
case mode of
when M32_Monitor
valid = HaveAArch32EL(EL3);
when M32_Hyp
when M32_FIQ, M32_IRQ, M32_Svc, M32_Abort, M32_Undef, M32_System
// If EL3 is implemented and using AArch32, then these modes are EL3 modes in Secure
// state, and EL1 modes in Non-secure state. If EL3 is not implemented or is using
// AArch64, then these modes are EL1 modes.
// Therefore it is sufficient to test this implementation supports EL1 using AArch32.
when M32_User
otherwise
valid = FALSE; // Passed an illegal mode value
return !valid;

Library pseudocode for aarch32/functions/system/BankedRegisterAccessValid
// BankedRegisterAccessValid()
// ===========================
// Checks for MRS (Banked register) or MSR (Banked register) accesses to registers
// other than the SPSRs that are invalid. This includes ELR_hyp accesses.
BankedRegisterAccessValid(bits(5) SYSm, bits(5) mode)
case SYSm of
when '000xx', '00100' // R8_usr to R12_usr
if mode != M32_FIQ then UNPREDICTABLE;
when '00101' // SP_usr
if mode == M32_System then UNPREDICTABLE;
when '00110' // LR_usr
if mode IN {M32_Hyp,M32_System} then UNPREDICTABLE;
when '010xx', '0110x', '01110' // R8_fiq to R12_fiq, SP_fiq, LR_fiq
if mode == M32_FIQ then UNPREDICTABLE;
when '1000x' // LR_irq, SP_irq
if mode == M32_IRQ then UNPREDICTABLE;
when '1001x' // LR_svc, SP_svc
if mode == M32_Svc then UNPREDICTABLE;
when '1010x' // LR_abt, SP_abt
if mode == M32_Abort then UNPREDICTABLE;
when '1011x' // LR_und, SP_und
if mode == M32_Undef then UNPREDICTABLE;
when '1110x' // LR_mon, SP_mon
if !HaveEL(EL3) || !IsSecure() || mode == M32_Monitor then UNPREDICTABLE;
when '11110' // ELR_hyp, only from Monitor or Hyp mode
if !HaveEL(EL2) || !(mode IN {M32_Monitor,M32_Hyp}) then UNPREDICTABLE;
when '11111' // SP_hyp, only from Monitor mode
if !HaveEL(EL2) || mode != M32_Monitor then UNPREDICTABLE;
otherwise
UNPREDICTABLE;
return;

Library pseudocode for aarch32/functions/system/CPSRWriteByInstr
// CPSRWriteByInstr()
// ==================
// Update PSTATE.<N,Z,C,V,Q,GE,E,A,I,F,M> from a CPSR value written by an MSR instruction.
CPSRWriteByInstr(bits(32) value, bits(4) bytemask)

privileged = PSTATE.EL != EL0; // PSTATE.<A,I,F,M> are not writable at EL0
// Write PSTATE from 'value', ignoring bytes masked by 'bytemask'

if bytemask<3> == '1' then
PSTATE.<N,Z,C,V,Q> = value<31:27>;
// Bits <26:24> are ignored

if HaveSSBSExt() then
PSTATE.SSBS = value<23>;
if privileged then
PSTATE.PAN = value<22>;
if HaveDITExt() then
PSTATE.DIT = value<21>;
// Bit <20> is RES0
PSTATE.GE = value<19:16>;

// Bits <15:10> are RES0
PSTATE.E = value<9>; // PSTATE.E is writable at EL0
if privileged then
PSTATE.A = value<8>;

if privileged then
PSTATE.<I,F> = value<7:6>;
// Bit <5> is RES0
// AArch32.WriteModeByInstr() sets PSTATE.IL to 1 if this is an illegal mode change.
AArch32.WriteModeByInstr(value<4:0>);
return;
Library pseudocode for aarch32/functions/system/ConditionPassed
// ConditionPassed()
// =================
boolean ConditionPassed()
return ConditionHolds(AArch32.CurrentCond());
Library pseudocode for aarch32/functions/system/CurrentCond
bits(4) AArch32.CurrentCond();
Library pseudocode for aarch32/functions/system/InITBlock
// InITBlock()
// ===========
boolean InITBlock()
if CurrentInstrSet() == InstrSet_T32 then
return PSTATE.IT<3:0> != '0000';
else
return FALSE;

Library pseudocode for aarch32/functions/system/LastInITBlock
// LastInITBlock()
// ===============
boolean LastInITBlock()
return (PSTATE.IT<3:0> == '1000');
Library pseudocode for aarch32/functions/system/SPSRWriteByInstr
// SPSRWriteByInstr()
// ==================
SPSRWriteByInstr(bits(32) value, bits(4) bytemask)
new_spsr = SPSR[];

new_spsr<31:24> = value<31:24>; // N,Z,C,V,Q flags, IT[1:0],J bits

new_spsr<23:16> = value<23:16>; // IL bit, GE[3:0] flags

new_spsr<15:8> = value<15:8>; // IT[7:2] bits, E bit, A interrupt mask

new_spsr<7:0> = value<7:0>; // I,F interrupt masks, T bit, Mode bits
SPSR[] = new_spsr; // UNPREDICTABLE if User or System mode
return;
Library pseudocode for aarch32/functions/system/SPSRaccessValid
// SPSRaccessValid()
// =================
// Checks for MRS (Banked register) or MSR (Banked register) accesses to the SPSRs
// that are UNPREDICTABLE
SPSRaccessValid(bits(5) SYSm, bits(5) mode)

case SYSm of
when '01110' // SPSR_fiq
if mode == M32_FIQ then UNPREDICTABLE;
when '10000' // SPSR_irq
if mode == M32_IRQ then UNPREDICTABLE;
when '10010' // SPSR_svc
if mode == M32_Svc then UNPREDICTABLE;
when '10100' // SPSR_abt
if mode == M32_Abort then UNPREDICTABLE;
when '10110' // SPSR_und
if mode == M32_Undef then UNPREDICTABLE;
when '11100' // SPSR_mon
if !HaveEL(EL3) || mode == M32_Monitor || !IsSecure() then UNPREDICTABLE;
when '11110' // SPSR_hyp
if !HaveEL(EL2) || mode != M32_Monitor then UNPREDICTABLE;
otherwise
UNPREDICTABLE;
return;

Library pseudocode for aarch32/functions/system/SelectInstrSet
// SelectInstrSet()
// ================
SelectInstrSet(InstrSet iset)
assert CurrentInstrSet() IN {InstrSet_A32, InstrSet_T32};
assert iset IN {InstrSet_A32, InstrSet_T32};
PSTATE.T = if iset == InstrSet_A32 then '0' else '1';
return;
Library pseudocode for aarch32/functions/v6simd/Sat
// Sat()
// =====
bits(N) Sat(integer i, integer N, boolean unsigned)

result = if unsigned then UnsignedSat(i, N) else SignedSat(i, N);
return result;
Library pseudocode for aarch32/functions/v6simd/SignedSat
// SignedSat()
// ===========
bits(N) SignedSat(integer i, integer N)

(result, -) = SignedSatQ(i, N);
return result;
Library pseudocode for aarch32/functions/v6simd/UnsignedSat
// UnsignedSat()
// =============
bits(N) UnsignedSat(integer i, integer N)

(result, -) = UnsignedSatQ(i, N);
return result;

Library pseudocode for aarch32/translation/attrs/AArch32.CombineS1S2Desc
// AArch32.CombineS1S2Desc()
// =========================
// Combines the address descriptors from stage 1 and stage 2
AddressDescriptor AArch32.CombineS1S2Desc(AddressDescriptor s1desc, AddressDescriptor s2desc)
AddressDescriptor result;
result.paddress = s2desc.paddress;
apply_force_writeback = HaveStage2MemAttrControl() && HCR_EL2.FWB == '1';

if IsFault(s1desc) || IsFault(s2desc) then
result = if IsFault(s1desc) then s1desc else s2desc;
else
result.fault = AArch32.NoFault();
if s2desc.memattrs.memtype == MemType_Device || (
(apply_force_writeback && s1desc.memattrs.memtype == MemType_Device && s2desc.memattrs.inner.
(!apply_force_writeback && s1desc.memattrs.memtype == MemType_Device) ) then
result.memattrs.memtype = MemType_Device;
if s1desc.memattrs.memtype == MemType_Normal then
result.memattrs.device = s2desc.memattrs.device;
elsif s2desc.memattrs.memtype == MemType_Normal then
result.memattrs.device = s1desc.memattrs.device;
else // Both Device
result.memattrs.device = CombineS1S2Device(s1desc.memattrs.device,
s2desc.memattrs.device);
result.memattrs.tagged = FALSE;
// S1 can be either Normal or Device, S2 is Normal.
else
result.memattrs.memtype = MemType_Normal;
result.memattrs.device = DeviceType UNKNOWN;
result.memattrs.inner = CombineS1S2AttrHints(s1desc.memattrs.inner, s2desc.memattrs.inner);
result.memattrs.outer = CombineS1S2AttrHints(s1desc.memattrs.outer, s2desc.memattrs.outer);
result.memattrs.shareable = (s1desc.memattrs.shareable || s2desc.memattrs.shareable);
result.memattrs.outershareable = (s1desc.memattrs.outershareable ||
s2desc.memattrs.outershareable);
result.memattrs.tagged = (s1desc.memattrs.tagged &&
result.memattrs.inner.attrs == MemAttr_WB &&
result.memattrs.inner.hints == MemHint_RWA &&
result.memattrs.outer.attrs == MemAttr_WB &&
result.memattrs.outer.hints == MemHint_RWA);
result.memattrs = MemAttrDefaults(result.memattrs);
return result;

Library pseudocode for aarch32/translation/attrs/AArch32.DefaultTEXDecode
// AArch32.DefaultTEXDecode()
// ==========================
MemoryAttributes AArch32.DefaultTEXDecode(bits(3) TEX, bit C, bit B, bit S, AccType acctype)
MemoryAttributes memattrs;
// Reserved values map to allocated values

if (TEX == '001' && C:B == '01') || (TEX == '010' && C:B != '00') || TEX == '011' then
bits(5) texcb;
(-, texcb) = ConstrainUnpredictableBits(Unpredictable_RESTEXCB);
TEX = texcb<4:2>; C = texcb<1>; B = texcb<0>;
case TEX:C:B of
when '00000'
// Device-nGnRnE
memattrs.memtype = MemType_Device;
memattrs.device = DeviceType_nGnRnE;
when '00001', '01000'
// Device-nGnRE
memattrs.device = DeviceType_nGnRE;
when '00010', '00011', '00100'
// Write-back or Write-through Read allocate, or Non-cacheable
memattrs.memtype = MemType_Normal;
memattrs.inner = ShortConvertAttrsHints(C:B, acctype, FALSE);
memattrs.outer = ShortConvertAttrsHints(C:B, acctype, FALSE);
memattrs.shareable = (S == '1');
when '00110'
memattrs = MemoryAttributes IMPLEMENTATION_DEFINED;
when '00111'
// Write-back Read and Write allocate
memattrs.inner = ShortConvertAttrsHints('01', acctype, FALSE);
memattrs.outer = ShortConvertAttrsHints('01', acctype, FALSE);
when '1xxxx'
// Cacheable, TEX<1:0> = Outer attrs, {C,B} = Inner attrs
memattrs.inner = ShortConvertAttrsHints(C:B, acctype, FALSE);
memattrs.outer = ShortConvertAttrsHints(TEX<1:0>, acctype, FALSE);
otherwise
// Reserved, handled above
Unreachable();
// transient bits are not supported in this format

memattrs.inner.transient = FALSE;
memattrs.outer.transient = FALSE;
// distinction between inner and outer shareable is not supported in this format
memattrs.outershareable = memattrs.shareable;
memattrs.tagged = FALSE;
return MemAttrDefaults(memattrs);

Library pseudocode for aarch32/translation/attrs/AArch32.InstructionDevice
// AArch32.InstructionDevice()
// ===========================
// Instruction fetches from memory marked as Device but not execute-never might generate a
// Permission Fault but are otherwise treated as if from Normal Non-cacheable memory.
AddressDescriptor AArch32.InstructionDevice(AddressDescriptor addrdesc, bits(32) vaddress,

bits(40) ipaddress, integer level, bits(4) domain,
AccType acctype, boolean iswrite, boolean secondstage,
boolean s2fs1walk)
c = ConstrainUnpredictable(Unpredictable_INSTRDEVICE);
assert c IN {Constraint_NONE, Constraint_FAULT};
if c == Constraint_FAULT then
addrdesc.fault = AArch32.PermissionFault(ipaddress, domain, level, acctype, iswrite,
secondstage, s2fs1walk);
else
addrdesc.memattrs.memtype = MemType_Normal;
addrdesc.memattrs.inner.attrs = MemAttr_NC;
addrdesc.memattrs.inner.hints = MemHint_No;
addrdesc.memattrs.outer = addrdesc.memattrs.inner;
addrdesc.memattrs.tagged = FALSE;
addrdesc.memattrs = MemAttrDefaults(addrdesc.memattrs);
return addrdesc;

Library pseudocode for aarch32/translation/attrs/AArch32.RemappedTEXDecode
// AArch32.RemappedTEXDecode()
// ===========================
MemoryAttributes AArch32.RemappedTEXDecode(bits(3) TEX, bit C, bit B, bit S, AccType acctype)
region = UInt(TEX<0>:C:B); // TEX<2:1> are ignored in this mapping scheme

if region == 6 then
memattrs = MemoryAttributes IMPLEMENTATION_DEFINED;
else
base = 2 * region;
attrfield = PRRR<base+1:base>;
if attrfield == '11' then // Reserved, maps to allocated value

(-, attrfield) = ConstrainUnpredictableBits(Unpredictable_RESPRRR);
case attrfield of
when '00' // Device-nGnRnE
memattrs.device = DeviceType_nGnRnE;
when '01' // Device-nGnRE
memattrs.device = DeviceType_nGnRE;
when '10'
memattrs.inner = ShortConvertAttrsHints(NMRR<base+1:base>, acctype, FALSE);
memattrs.outer = ShortConvertAttrsHints(NMRR<base+17:base+16>, acctype, FALSE);
s_bit = if S == '0' then PRRR.NS0 else PRRR.NS1;
memattrs.shareable = (s_bit == '1');
memattrs.outershareable = (s_bit == '1' && PRRR<region+24> == '0');
when '11'
Unreachable();
// transient bits are not supported in this format

memattrs.inner.transient = FALSE;
memattrs.outer.transient = FALSE;

Library pseudocode for aarch32/translation/attrs/AArch32.S1AttrDecode
// AArch32.S1AttrDecode()
// ======================
// Converts the Stage 1 attribute fields, using the MAIR, to orthogonal
// attributes and hints.
MemoryAttributes AArch32.S1AttrDecode(bits(2) SH, bits(3) attr, AccType acctype)

mair = HMAIR1:HMAIR0;
else
mair = MAIR1:MAIR0;
index = 8 * UInt(attr);
attrfield = mair<index+7:index>;
if ((attrfield<7:4> != '0000' && attrfield<7:4> != '1111' && attrfield<3:0> == '0000') ||
(attrfield<7:4> == '0000' && attrfield<3:0> != 'xx00')) then
// Reserved, maps to an allocated value
(-, attrfield) = ConstrainUnpredictableBits(Unpredictable_RESMAIR);
if !HaveMTEExt() && attrfield<7:4> == '1111' && attrfield<3:0> == '0000' then
// Reserved, maps to an allocated value
(-, attrfield) = ConstrainUnpredictableBits(Unpredictable_RESMAIR);
if attrfield<7:4> == '0000' then // Device

case attrfield<3:0> of
when '0000' memattrs.device = DeviceType_nGnRnE;
when '0100' memattrs.device = DeviceType_nGnRE;
when '1000' memattrs.device = DeviceType_nGRE;
when '1100' memattrs.device = DeviceType_GRE;
otherwise Unreachable(); // Reserved, handled above
elsif attrfield<3:0> != '0000' then // Normal

memattrs.outer = LongConvertAttrsHints(attrfield<7:4>, acctype);
memattrs.inner = LongConvertAttrsHints(attrfield<3:0>, acctype);
memattrs.shareable = SH<1> == '1';
memattrs.outershareable = SH == '10';
elsif HaveMTEExt() && attrfield == '11110000' then // Normal, Tagged WB-RWA
memattrs.outer = LongConvertAttrsHints('1111', acctype); // WB_RWA
memattrs.inner = LongConvertAttrsHints('1111', acctype); // WB_RWA
memattrs.shareable = SH<1> == '1';
memattrs.outershareable = SH == '10';
memattrs.tagged = TRUE;
else
Unreachable(); // Reserved, handled above

Library pseudocode for aarch32/translation/attrs/AArch32.TranslateAddressS1Off
// AArch32.TranslateAddressS1Off()
// ===============================
// Called for stage 1 translations when translation is disabled to supply a default translation.
// Note that there are additional constraints on instruction prefetching that are not described in
// this pseudocode.
TLBRecord AArch32.TranslateAddressS1Off(bits(32) vaddress, AccType acctype, boolean iswrite)

TLBRecord result;
default_cacheable = (HasS2Translation() && ((if ELUsingAArch32(EL2) then HCR.DC else HCR_EL2.DC) == '
if default_cacheable then
// Use default cacheable settings
result.addrdesc.memattrs.memtype = MemType_Normal;
result.addrdesc.memattrs.inner.attrs = MemAttr_WB; // Write-back
result.addrdesc.memattrs.inner.hints = MemHint_RWA;
result.addrdesc.memattrs.shareable = FALSE;
result.addrdesc.memattrs.outershareable = FALSE;
result.addrdesc.memattrs.tagged = HCR_EL2.DCT == '1';
elsif acctype != AccType_IFETCH then
// Treat data as Device
result.addrdesc.memattrs.memtype = MemType_Device;
result.addrdesc.memattrs.device = DeviceType_nGnRnE;
result.addrdesc.memattrs.inner = MemAttrHints UNKNOWN;
result.addrdesc.memattrs.tagged = FALSE;
else
// Instruction cacheability controlled by SCTLR/HSCTLR.I
cacheable = HSCTLR.I == '1';
else
cacheable = SCTLR.I == '1';
result.addrdesc.memattrs.memtype = MemType_Normal;
if cacheable then
result.addrdesc.memattrs.inner.attrs = MemAttr_WT;
result.addrdesc.memattrs.inner.hints = MemHint_RA;
else
result.addrdesc.memattrs.inner.attrs = MemAttr_NC;
result.addrdesc.memattrs.inner.hints = MemHint_No;
result.addrdesc.memattrs.shareable = TRUE;
result.addrdesc.memattrs.outershareable = TRUE;
result.addrdesc.memattrs.tagged = FALSE;
result.addrdesc.memattrs.outer = result.addrdesc.memattrs.inner;
result.addrdesc.memattrs = MemAttrDefaults(result.addrdesc.memattrs);
result.perms.ap = bits(3) UNKNOWN;

result.perms.xn = '0';
result.perms.pxn = '0';
result.nG = bit UNKNOWN;

result.contiguous = boolean UNKNOWN;
result.domain = bits(4) UNKNOWN;
result.level = integer UNKNOWN;
result.blocksize = integer UNKNOWN;
result.addrdesc.paddress.address = ZeroExtend(vaddress);
result.addrdesc.paddress.NS = if IsSecure() then '0' else '1';
result.addrdesc.fault = AArch32.NoFault();
result.descupdate.AF = FALSE;
result.descupdate.AP = FALSE;
result.descupdate.descaddr = result.addrdesc;
return result;

Library pseudocode for aarch32/translation/checks/AArch32.AccessIsPrivileged
// AArch32.AccessIsPrivileged()
// ============================
boolean AArch32.AccessIsPrivileged(AccType acctype)
el = AArch32.AccessUsesEL(acctype);
if el == EL0 then
ispriv = FALSE;
elsif el != EL1 then
ispriv = TRUE;
else
ispriv = (acctype != AccType_UNPRIV);
return ispriv;
Library pseudocode for aarch32/translation/checks/AArch32.AccessUsesEL
// AArch32.AccessUsesEL()
// ======================
// Returns the Exception Level of the regime that will manage the translation for a given access type.
bits(2) AArch32.AccessUsesEL(AccType acctype)

if acctype == AccType_UNPRIV then
return EL0;
else
return PSTATE.EL;
Library pseudocode for aarch32/translation/checks/AArch32.CheckDomain
// AArch32.CheckDomain()
// =====================
(boolean, FaultRecord) AArch32.CheckDomain(bits(4) domain, bits(32) vaddress, integer level,

AccType acctype, boolean iswrite)
index = 2 * UInt(domain);
attrfield = DACR<index+1:index>;
if attrfield == '10' then // Reserved, maps to an allocated value

// Reserved value maps to an allocated value
(-, attrfield) = ConstrainUnpredictableBits(Unpredictable_RESDACR);
if attrfield == '00' then

fault = AArch32.DomainFault(domain, level, acctype, iswrite);
else
fault = AArch32.NoFault();
permissioncheck = (attrfield == '01');
return (permissioncheck, fault);

Library pseudocode for aarch32/translation/checks/AArch32.CheckPermission

// AArch32.CheckPermission()
// =========================
// Function used for permission checking from AArch32 stage 1 translations
FaultRecord AArch32.CheckPermission(Permissions perms, bits(32) vaddress, integer level,

bits(4) domain, bit NS, AccType acctype, boolean iswrite)
if PSTATE.EL != EL2 then

wxn = SCTLR.WXN == '1';
if TTBCR.EAE == '1' || SCTLR.AFE == '1' || perms.ap<0> == '1' then
priv_r = TRUE;
priv_w = perms.ap<2> == '0';
user_r = perms.ap<1> == '1';
user_w = perms.ap<2:1> == '01';
else
priv_r = perms.ap<2:1> != '00';
priv_w = perms.ap<2:1> == '01';
user_r = perms.ap<1> == '1';
user_w = FALSE;
uwxn = SCTLR.UWXN == '1';
ispriv = AArch32.AccessIsPrivileged(acctype);
pan = if HavePANExt() then PSTATE.PAN else '0';

is_ldst = !(acctype IN {AccType_DC, AccType_DC_UNPRIV, AccType_AT, AccType_IFETCH});
is_ats1xp = (acctype == AccType_AT && AArch32.ExecutingATS1xPInstr());
if pan == '1' && user_r && ispriv && (is_ldst || is_ats1xp) then
priv_r = FALSE;
priv_w = FALSE;
user_xn = !user_r || perms.xn == '1' || (user_w && wxn);

priv_xn = (!priv_r || perms.xn == '1' || perms.pxn == '1' ||
(priv_w && wxn) || (user_w && uwxn));
if ispriv then
(r, w, xn) = (priv_r, priv_w, priv_xn);
else
(r, w, xn) = (user_r, user_w, user_xn);
else
// Access from EL2
wxn = HSCTLR.WXN == '1';
r = TRUE;
w = perms.ap<2> == '0';
xn = perms.xn == '1' || (w && wxn);
// Restriction on Secure instruction fetch

if HaveEL(EL3) && IsSecure() && NS == '1' then
secure_instr_fetch = if ELUsingAArch32(EL3) then SCR.SIF else SCR_EL3.SIF;
if secure_instr_fetch == '1' then xn = TRUE;
if acctype == AccType_IFETCH then

fail = xn;
failedread = TRUE;
elsif acctype IN { AccType_ATOMICRW, AccType_ORDEREDRW, AccType_ORDEREDATOMICRW } then
fail = !r || !w;
failedread = !r;
elsif acctype == AccType_DC then
// DC maintenance instructions operating by VA, cannot fault from stage 1 translation.
fail = FALSE;
elsif iswrite then
fail = !w;
failedread = FALSE;
else
fail = !r;
failedread = TRUE;
if fail then
s2fs1walk = FALSE;

ipaddress = bits(40) UNKNOWN;
return AArch32.PermissionFault(ipaddress, domain, level, acctype,
!failedread, secondstage, s2fs1walk);
else
return AArch32.NoFault();
Library pseudocode for aarch32/translation/checks/AArch32.CheckS2Permission
// AArch32.CheckS2Permission()
// ===========================
// Function used for permission checking from AArch32 stage 2 translations
FaultRecord AArch32.CheckS2Permission(Permissions perms, bits(32) vaddress, bits(40) ipaddress,

integer level, AccType acctype, boolean iswrite,
boolean s2fs1walk)
assert HaveEL(EL2) && !IsSecure() && ELUsingAArch32(EL2) && HasS2Translation();
r = perms.ap<1> == '1';
w = perms.ap<2> == '1';
if HaveExtendedExecuteNeverExt() then
case perms.xn:perms.xxn of
when '00' xn = !r;
when '01' xn = !r || PSTATE.EL == EL1;
when '10' xn = TRUE;
when '11' xn = !r || PSTATE.EL == EL0;
else
xn = !r || perms.xn == '1';
// Stage 1 walk is checked as a read, regardless of the original type
if acctype == AccType_IFETCH && !s2fs1walk then
fail = xn;
failedread = TRUE;
elsif (acctype IN { AccType_ATOMICRW, AccType_ORDEREDRW, AccType_ORDEREDATOMICRW }) && !s2fs1walk the
fail = !r || !w;
failedread = !r;
elsif acctype == AccType_DC && !s2fs1walk then
// DC maintenance instructions operating by VA, do not generate Permission faults
// from stage 2 translation, other than from stage 1 translation table walk.
fail = FALSE;
elsif iswrite && !s2fs1walk then
fail = !w;
failedread = FALSE;
else
fail = !r;
failedread = !iswrite;
if fail then
domain = bits(4) UNKNOWN;
secondstage = TRUE;
return AArch32.PermissionFault(ipaddress, domain, level, acctype,
!failedread, secondstage, s2fs1walk);
else

Library pseudocode for aarch32/translation/debug/AArch32.CheckBreakpoint
// AArch32.CheckBreakpoint()
// =========================
// Called before executing the instruction of length "size" bytes at "vaddress" in an AArch32
// translation regime, when either debug exceptions are enabled, or halting debug is enabled
// and halting is allowed.
FaultRecord AArch32.CheckBreakpoint(bits(32) vaddress, integer size)

assert size IN {2,4};
match = FALSE;
mismatch = FALSE;
for i = 0 to UInt(DBGDIDR.BRPs)
(match_i, mismatch_i) = AArch32.BreakpointMatch(i, vaddress, size);
match = match || match_i;
mismatch = mismatch || mismatch_i;
if match && HaltOnBreakpointOrWatchpoint() then

reason = DebugHalt_Breakpoint;
Halt(reason);
elsif (match || mismatch) then
iswrite = FALSE;
debugmoe = DebugException_Breakpoint;
return AArch32.DebugFault(acctype, iswrite, debugmoe);
else
Library pseudocode for aarch32/translation/debug/AArch32.CheckDebug
// AArch32.CheckDebug()
// ====================
// Called on each access to check for a debug exception or entry to Debug state.
FaultRecord AArch32.CheckDebug(bits(32) vaddress, AccType acctype, boolean iswrite, integer size)
FaultRecord fault = AArch32.NoFault();
d_side = (acctype != AccType_IFETCH);

generate_exception = AArch32.GenerateDebugExceptions() && DBGDSCRext.MDBGen == '1';
halt = HaltOnBreakpointOrWatchpoint();
// Relative priority of Vector Catch and Breakpoint exceptions not defined in the architecture
vector_catch_first = ConstrainUnpredictableBool(Unpredictable_BPVECTORCATCHPRI);
if !d_side && vector_catch_first && generate_exception then

fault = AArch32.CheckVectorCatch(vaddress, size);
if fault.statuscode == Fault_None && (generate_exception || halt) then

if d_side then
fault = AArch32.CheckWatchpoint(vaddress, acctype, iswrite, size);
else
fault = AArch32.CheckBreakpoint(vaddress, size);
if fault.statuscode == Fault_None && !d_side && !vector_catch_first && generate_exception then

return AArch32.CheckVectorCatch(vaddress, size);
return fault;

Library pseudocode for aarch32/translation/debug/AArch32.CheckVectorCatch
// AArch32.CheckVectorCatch()
// ==========================
// Called before executing the instruction of length "size" bytes at "vaddress" in an AArch32
// translation regime, when debug exceptions are enabled.
FaultRecord AArch32.CheckVectorCatch(bits(32) vaddress, integer size)

match = AArch32.VCRMatch(vaddress);
if size == 4 && !match && AArch32.VCRMatch(vaddress + 2) then
match = ConstrainUnpredictableBool(Unpredictable_VCMATCHHALF);
if match then
iswrite = FALSE;
debugmoe = DebugException_VectorCatch;
else
Library pseudocode for aarch32/translation/debug/AArch32.CheckWatchpoint
// AArch32.CheckWatchpoint()
// =========================
// Called before accessing the memory location of "size" bytes at "address",
// when either debug exceptions are enabled for the access, or halting debug
// is enabled and halting is allowed.
FaultRecord AArch32.CheckWatchpoint(bits(32) vaddress, AccType acctype,

boolean iswrite, integer size)
match = FALSE;
ispriv = AArch32.AccessIsPrivileged(acctype);
for i = 0 to UInt(DBGDIDR.WRPs)
match = match || AArch32.WatchpointMatch(i, vaddress, size, ispriv, iswrite);
if match && HaltOnBreakpointOrWatchpoint() then

reason = DebugHalt_Watchpoint;
Halt(reason);
elsif match then
debugmoe = DebugException_Watchpoint;
else
Library pseudocode for aarch32/translation/faults/AArch32.AccessFlagFault
// AArch32.AccessFlagFault()
// =========================
FaultRecord AArch32.AccessFlagFault(bits(40) ipaddress, bits(4) domain, integer level,

boolean s2fs1walk)
extflag = bit UNKNOWN;

debugmoe = bits(4) UNKNOWN;
errortype = bits(2) UNKNOWN;
return AArch32.CreateFaultRecord(Fault_AccessFlag, ipaddress, domain, level, acctype, iswrite,
extflag, debugmoe, errortype, secondstage, s2fs1walk);

Library pseudocode for aarch32/translation/faults/AArch32.AddressSizeFault
// AArch32.AddressSizeFault()
// ==========================
FaultRecord AArch32.AddressSizeFault(bits(40) ipaddress, bits(4) domain, integer level,

boolean s2fs1walk)

return AArch32.CreateFaultRecord(Fault_AddressSize, ipaddress, domain, level, acctype, iswrite,
Library pseudocode for aarch32/translation/faults/AArch32.AlignmentFault
// AArch32.AlignmentFault()
// ========================
FaultRecord AArch32.AlignmentFault(AccType acctype, boolean iswrite, boolean secondstage)

level = integer UNKNOWN;
s2fs1walk = boolean UNKNOWN;
return AArch32.CreateFaultRecord(Fault_Alignment, ipaddress, domain, level, acctype, iswrite,

Library pseudocode for aarch32/translation/faults/AArch32.AsynchExternalAbort
// AArch32.AsynchExternalAbort()
// =============================
// Wrapper function for asynchronous external aborts
FaultRecord AArch32.AsynchExternalAbort(boolean parity, bits(2) errortype, bit extflag)
faulttype = if parity then Fault_AsyncParity else Fault_AsyncExternal;

iswrite = boolean UNKNOWN;
s2fs1walk = FALSE;
return AArch32.CreateFaultRecord(faulttype, ipaddress, domain, level, acctype, iswrite, extflag,

debugmoe, errortype, secondstage, s2fs1walk);

Library pseudocode for aarch32/translation/faults/AArch32.DebugFault
// AArch32.DebugFault()
// ====================
FaultRecord AArch32.DebugFault(AccType acctype, boolean iswrite, bits(4) debugmoe)

s2fs1walk = FALSE;
return AArch32.CreateFaultRecord(Fault_Debug, ipaddress, domain, level, acctype, iswrite,

Library pseudocode for aarch32/translation/faults/AArch32.DomainFault
// AArch32.DomainFault()
// =====================
FaultRecord AArch32.DomainFault(bits(4) domain, integer level, AccType acctype, boolean iswrite)

s2fs1walk = FALSE;
return AArch32.CreateFaultRecord(Fault_Domain, ipaddress, domain, level, acctype, iswrite,

Library pseudocode for aarch32/translation/faults/AArch32.NoFault
// AArch32.NoFault()
// =================
FaultRecord AArch32.NoFault()

iswrite = boolean UNKNOWN;
s2fs1walk = FALSE;
return AArch32.CreateFaultRecord(Fault_None, ipaddress, domain, level, acctype, iswrite,


Library pseudocode for aarch32/translation/faults/AArch32.PermissionFault
// AArch32.PermissionFault()
// =========================
FaultRecord AArch32.PermissionFault(bits(40) ipaddress, bits(4) domain, integer level,

boolean s2fs1walk)

return AArch32.CreateFaultRecord(Fault_Permission, ipaddress, domain, level, acctype, iswrite,
Library pseudocode for aarch32/translation/faults/AArch32.TranslationFault
// AArch32.TranslationFault()
// ==========================
FaultRecord AArch32.TranslationFault(bits(40) ipaddress, bits(4) domain, integer level,

boolean s2fs1walk)

return AArch32.CreateFaultRecord(Fault_Translation, ipaddress, domain, level, acctype, iswrite,

Library pseudocode for aarch32/translation/translation/AArch32.FirstStageTranslate
// AArch32.FirstStageTranslate()
// =============================
// Perform a stage 1 translation walk. The function used by Address Translation operations is
// similar except it uses the translation regime specified for the instruction.
AddressDescriptor AArch32.FirstStageTranslate(bits(32) vaddress, AccType acctype, boolean iswrite,

boolean wasaligned, integer size)

s1_enabled = HSCTLR.M == '1';
elsif EL2Enabled() then
tge = (if ELUsingAArch32(EL2) then HCR.TGE else HCR_EL2.TGE);
dc = (if ELUsingAArch32(EL2) then HCR.DC else HCR_EL2.DC);
s1_enabled = tge == '0' && dc == '0' && SCTLR.M == '1';
else
s1_enabled = SCTLR.M == '1';
TLBRecord S1;
S1.addrdesc.fault = AArch32.NoFault();
s2fs1walk = FALSE;
if s1_enabled then // First stage enabled

use_long_descriptor_format = PSTATE.EL == EL2 || TTBCR.EAE == '1';
if use_long_descriptor_format then
S1 = AArch32.TranslationTableWalkLD(ipaddress, vaddress, acctype, iswrite, secondstage,
s2fs1walk, size);
permissioncheck = TRUE; domaincheck = FALSE;
else
S1 = AArch32.TranslationTableWalkSD(vaddress, acctype, iswrite, size);
permissioncheck = TRUE; domaincheck = TRUE;
else
S1 = AArch32.TranslateAddressS1Off(vaddress, acctype, iswrite);
permissioncheck = FALSE; domaincheck = FALSE;
InGuardedPage = FALSE; // No memory is guarded when stage 1 address translation i
if !IsFault(S1.addrdesc) && UsingAArch32() && HaveTrapLoadStoreMultipleDeviceExt() && AArch32.Executi

if S1.addrdesc.memattrs.memtype == MemType_Device && S1.addrdesc.memattrs.device != DeviceType_GR
nTLSMD = if S1TranslationRegime() == EL2 then HSCTLR.nTLSMD else SCTLR.nTLSMD;
if nTLSMD == '0' then
S1.addrdesc.fault = AArch32.AlignmentFault(acctype, iswrite, secondstage);
// Check for unaligned data accesses to Device memory

if ((!wasaligned && acctype != AccType_IFETCH) || (acctype == AccType_DCZVA))
&& !IsFault(S1.addrdesc) && S1.addrdesc.memattrs.memtype == MemType_Device then
if !IsFault(S1.addrdesc) && domaincheck then
(permissioncheck, abort) = AArch32.CheckDomain(S1.domain, vaddress, S1.level, acctype,
iswrite);
S1.addrdesc.fault = abort;
if !IsFault(S1.addrdesc) && permissioncheck then

S1.addrdesc.fault = AArch32.CheckPermission(S1.perms, vaddress, S1.level,
S1.domain, S1.addrdesc.paddress.NS,
acctype, iswrite);
// Check for instruction fetches from Device memory not marked as execute-never. If there has
// not been a Permission Fault then the memory is not marked execute-never.
if (!IsFault(S1.addrdesc) && S1.addrdesc.memattrs.memtype == MemType_Device &&
acctype == AccType_IFETCH) then
S1.addrdesc = AArch32.InstructionDevice(S1.addrdesc, vaddress, ipaddress, S1.level,
S1.domain, acctype, iswrite,
return S1.addrdesc;

Library pseudocode for aarch32/translation/translation/AArch32.FullTranslate
// AArch32.FullTranslate()
// =======================
// Perform both stage 1 and stage 2 translation walks for the current translation regime. The
// function used by Address Translation operations is similar except it uses the translation
// regime specified for the instruction.
AddressDescriptor AArch32.FullTranslate(bits(32) vaddress, AccType acctype, boolean iswrite,

// First Stage Translation

S1 = AArch32.FirstStageTranslate(vaddress, acctype, iswrite, wasaligned, size);
if !IsFault(S1) && !(HaveNV2Ext() && acctype == AccType_NV2REGISTER) && HasS2Translation() then
s2fs1walk = FALSE;
result = AArch32.SecondStageTranslate(S1, vaddress, acctype, iswrite, wasaligned, s2fs1walk,
size);
else
result = S1;
return result;

Library pseudocode for aarch32/translation/translation/AArch32.SecondStageTranslate
// AArch32.SecondStageTranslate()
// ==============================
// Perform a stage 2 translation walk. The function used by Address Translation operations is
// similar except it uses the translation regime specified for the instruction.
AddressDescriptor AArch32.SecondStageTranslate(AddressDescriptor S1, bits(32) vaddress,

AccType acctype, boolean iswrite, boolean wasaligned,
boolean s2fs1walk, integer size)
assert HasS2Translation();
assert IsZero(S1.paddress.address<47:40>);
hwupdatewalk = FALSE;
return AArch64.SecondStageTranslate(S1, ZeroExtend(vaddress, 64), acctype, iswrite,
wasaligned, s2fs1walk, size, hwupdatewalk);
s2_enabled = HCR.VM == '1' || HCR.DC == '1';

secondstage = TRUE;
if s2_enabled then // Second stage enabled

ipaddress = S1.paddress.address<39:0>;
S2 = AArch32.TranslationTableWalkLD(ipaddress, vaddress, acctype, iswrite, secondstage,
s2fs1walk, size);
// Check for unaligned data accesses to Device memory

if ((!wasaligned && acctype != AccType_IFETCH) || (acctype == AccType_DCZVA))
&& S2.addrdesc.memattrs.memtype == MemType_Device && !IsFault(S2.addrdesc) then
// Check for permissions on Stage2 translations

if !IsFault(S2.addrdesc) then
S2.addrdesc.fault = AArch32.CheckS2Permission(S2.perms, vaddress, ipaddress, S2.level,
acctype, iswrite, s2fs1walk);
// Check for instruction fetches from Device memory not marked as execute-never. As there
// has not been a Permission Fault then the memory is not marked execute-never.
if (!s2fs1walk && !IsFault(S2.addrdesc) && S2.addrdesc.memattrs.memtype == MemType_Device &&
acctype == AccType_IFETCH) then
S2.addrdesc = AArch32.InstructionDevice(S2.addrdesc, vaddress, ipaddress, S2.level,
domain, acctype, iswrite,
if (s2fs1walk && !IsFault(S2.addrdesc) &&

S2.addrdesc.memattrs.memtype == MemType_Device) then
// Check for protected table walk.
if HCR.PTW == '1' then
S2.addrdesc.fault = AArch32.PermissionFault(ipaddress,
domain, S2.level,
acctype, iswrite, secondstage, s2fs1walk);
else
// Translation table walk occurs as Normal Non-cacheable memory.
S2.addrdesc.memattrs.memtype = MemType_Normal;
S2.addrdesc.memattrs.inner.attrs = MemAttr_NC;
S2.addrdesc.memattrs.outer.attrs = MemAttr_NC;
S2.addrdesc.memattrs.shareable = TRUE;
S2.addrdesc.memattrs.outershareable = TRUE;
result = AArch32.CombineS1S2Desc(S1, S2.addrdesc);

else
result = S1;
return result;

Library pseudocode for aarch32/translation/translation/AArch32.SecondStageWalk
// AArch32.SecondStageWalk()
// =========================
// Perform a stage 2 translation on a stage 1 translation page table walk access.
AddressDescriptor AArch32.SecondStageWalk(AddressDescriptor S1, bits(32) vaddress, AccType acctype,

boolean iswrite, integer size)
assert HasS2Translation();
s2fs1walk = TRUE;
wasaligned = TRUE;
return AArch32.SecondStageTranslate(S1, vaddress, acctype, iswrite, wasaligned, s2fs1walk,
size);
Library pseudocode for aarch32/translation/translation/AArch32.TranslateAddress
// AArch32.TranslateAddress()
// ==========================
// Main entry point for translating an address
AddressDescriptor AArch32.TranslateAddress(bits(32) vaddress, AccType acctype, boolean iswrite,

if !ELUsingAArch32(S1TranslationRegime()) then
return AArch64.TranslateAddress(ZeroExtend(vaddress, 64), acctype, iswrite, wasaligned,
size);
result = AArch32.FullTranslate(vaddress, acctype, iswrite, wasaligned, size);
if !(acctype IN {AccType_PTW, AccType_IC, AccType_AT}) && !IsFault(result) then

result.fault = AArch32.CheckDebug(vaddress, acctype, iswrite, size);
// Update virtual address for abort functions

result.vaddress = ZeroExtend(vaddress);
return result;

Library pseudocode for aarch32/translation/walk/AArch32.TranslationTableWalkLD

// AArch32.TranslationTableWalkLD()
// ================================
// Returns a result of a translation table walk using the Long-descriptor format
//
// Implementations might cache information from memory in any number of non-coherent TLB
// caching structures, and so avoid memory accesses that have been expressed in this
// pseudocode. The use of such TLBs is not expressed in this pseudocode.
TLBRecord AArch32.TranslationTableWalkLD(bits(40) ipaddress, bits(32) vaddress,

boolean s2fs1walk, integer size)
if !secondstage then
else
assert HaveEL(EL2) && !IsSecure() && ELUsingAArch32(EL2) && HasS2Translation();
TLBRecord result;
AddressDescriptor descaddr;
bits(64) baseregister;
bits(40) inputaddr; // Input Address is 'vaddress' for stage 1, 'ipaddress' for stage 2
bit nswalk; // Stage 2 translation table walks are to Secure or to Non-secure PA s
result.descupdate.AF = FALSE;
result.descupdate.AP = FALSE;
descaddr.memattrs.memtype = MemType_Normal;
// Fixed parameters for the page table walk:

// grainsize = Log2(Size of Table) - Size of Table is 4KB in AArch32
// stride = Log2(Address per Level) - Bits of address consumed at each level
constant integer grainsize = 12; // Log2(4KB page size)
constant integer stride = grainsize - 3; // Log2(page size / 8 bytes)
// Derived parameters for the page table walk:

// inputsize = Log2(Size of Input Address) - Input Address size in bits
// level = Level to start walk from
// This means that the number of levels after start level = 3-level
// First stage translation
inputaddr = ZeroExtend(vaddress);
el = AArch32.AccessUsesEL(acctype);
if el == EL2 then
inputsize = 32 - UInt(HTCR.T0SZ);
basefound = inputsize == 32 || IsZero(inputaddr<31:inputsize>);
disabled = FALSE;
baseregister = HTTBR;
descaddr.memattrs = WalkAttrDecode(HTCR.SH0, HTCR.ORGN0, HTCR.IRGN0, secondstage);
reversedescriptors = HSCTLR.EE == '1';
lookupsecure = FALSE;
singlepriv = TRUE;
hierattrsdisabled = AArch32.HaveHPDExt() && HTCR.HPD == '1';
else
basefound = FALSE;
disabled = FALSE;
t0size = UInt(TTBCR.T0SZ);
if t0size == 0 || IsZero(inputaddr<31:(32-t0size)>) then
inputsize = 32 - t0size;
basefound = TRUE;
baseregister = TTBR0;
descaddr.memattrs = WalkAttrDecode(TTBCR.SH0, TTBCR.ORGN0, TTBCR.IRGN0, secondstage);
hierattrsdisabled = AArch32.HaveHPDExt() && TTBCR.T2E == '1' && TTBCR2.HPD0 == '1';
t1size = UInt(TTBCR.T1SZ);
if (t1size == 0 && !basefound) || (t1size > 0 && IsOnes(inputaddr<31:(32-t1size)>)) then
inputsize = 32 - t1size;
basefound = TRUE;
baseregister = TTBR1;
descaddr.memattrs = WalkAttrDecode(TTBCR.SH1, TTBCR.ORGN1, TTBCR.IRGN1, secondstage);

hierattrsdisabled = AArch32.HaveHPDExt() && TTBCR.T2E == '1' && TTBCR2.HPD1 == '1';
reversedescriptors = SCTLR.EE == '1';
lookupsecure = IsSecure();
singlepriv = FALSE;
// The starting level is the number of strides needed to consume the input address
level = 4 - (1 + ((inputsize - grainsize - 1) DIV stride));
else
// Second stage translation
inputaddr = ipaddress;
inputsize = 32 - SInt(VTCR.T0SZ);
// VTCR.S must match VTCR.T0SZ[3]
if VTCR.S != VTCR.T0SZ<3> then
(-, inputsize) = ConstrainUnpredictableInteger(32-7, 32+8, Unpredictable_RESVTCRS);
basefound = inputsize == 40 || IsZero(inputaddr<39:inputsize>);
disabled = FALSE;
descaddr.memattrs = WalkAttrDecode(VTCR.SH0, VTCR.ORGN0, VTCR.IRGN0, secondstage);
reversedescriptors = HSCTLR.EE == '1';
singlepriv = TRUE;
lookupsecure = FALSE;
baseregister = VTTBR;
startlevel = UInt(VTCR.SL0);
level = 2 - startlevel;
if level <= 0 then basefound = FALSE;
// Number of entries in the starting level table =

// (Size of Input Address)/((Address per level)^(Num levels remaining)*(Size of Table))
startsizecheck = inputsize - ((3 - level)*stride + grainsize); // Log2(Num of entries)
// Check for starting level table with fewer than 2 entries or longer than 16 pages.
// Lower bound check is: startsizecheck < Log2(2 entries)
// That is, VTCR.SL0 == '00' and SInt(VTCR.T0SZ) > 1, Size of Input Address < 2^31 bytes
// Upper bound check is: startsizecheck > Log2(pagesize/8*16)
// That is, VTCR.SL0 == '01' and SInt(VTCR.T0SZ) < -2, Size of Input Address > 2^34 bytes
if startsizecheck < 1 || startsizecheck > stride + 4 then basefound = FALSE;
if !basefound || disabled then
level = 1; // AArch64 reports this as a level 0 fault
result.addrdesc.fault = AArch32.TranslationFault(ipaddress, domain, level, acctype, iswrite,
return result;
if !IsZero(baseregister<47:40>) then
level = 0;
result.addrdesc.fault = AArch32.AddressSizeFault(ipaddress, domain, level, acctype, iswrite,
return result;
// Bottom bound of the Base address is:

// Log2(8 bytes per entry)+Log2(Number of entries in starting level table)
// Number of entries in starting level table =
// (Size of Input Address)/((Address per level)^(Num levels remaining)*(Size of Table))
baselowerbound = 3 + inputsize - ((3-level)*stride + grainsize); // Log2(Num of entries*8)
baseaddress = baseregister<39:baselowerbound>:Zeros(baselowerbound);
ns_table = if lookupsecure then '0' else '1';

ap_table = '00';
xn_table = '0';
pxn_table = '0';
addrselecttop = inputsize - 1;
repeat
addrselectbottom = (3-level)*stride + grainsize;
bits(40) index = ZeroExtend(inputaddr<addrselecttop:addrselectbottom>:'000');

descaddr.paddress.address = ZeroExtend(baseaddress OR index);
descaddr.paddress.NS = ns_table;
// If there are two stages of translation, then the first stage table walk addresses

// are themselves subject to translation
if secondstage || !HasS2Translation() || (HaveNV2Ext() && acctype == AccType_NV2REGISTER) then
descaddr2 = descaddr;
else
descaddr2 = AArch32.SecondStageWalk(descaddr, vaddress, acctype, iswrite, 8);
// Check for a fault on the stage 2 walk
if IsFault(descaddr2) then
result.addrdesc.fault = descaddr2.fault;
return result;

descaddr2.vaddress = ZeroExtend(vaddress);
accdesc = CreateAccessDescriptorPTW(acctype, secondstage, s2fs1walk, level);

desc = _Mem[descaddr2, 8, accdesc];
if reversedescriptors then desc = BigEndianReverse(desc);
if desc<0> == '0' || (desc<1:0> == '01' && level == 3) then

// Fault (00), Reserved (10), or Block (01) at level 3.
result.addrdesc.fault = AArch32.TranslationFault(ipaddress, domain, level, acctype,
iswrite, secondstage, s2fs1walk);
return result;
// Valid Block, Page, or Table entry

if desc<1:0> == '01' || level == 3 then // Block (01) or Page (11)
blocktranslate = TRUE;
else // Table (11)
if !IsZero(desc<47:40>) then
result.addrdesc.fault = AArch32.AddressSizeFault(ipaddress, domain, level, acctype,
return result;
baseaddress = desc<39:grainsize>:Zeros(grainsize);
// Unpack the upper and lower table attributes
ns_table = ns_table OR desc<63>;
if !secondstage && !hierattrsdisabled then
ap_table<1> = ap_table<1> OR desc<62>; // read-only
xn_table = xn_table OR desc<60>;

// pxn_table and ap_table[0] apply only in EL1&0 translation regimes
if !singlepriv then
pxn_table = pxn_table OR desc<59>;
ap_table<0> = ap_table<0> OR desc<61>; // privileged
level = level + 1;
addrselecttop = addrselectbottom - 1;
blocktranslate = FALSE;
until blocktranslate;
// Unpack the descriptor into address and upper and lower block attributes
outputaddress = desc<39:addrselectbottom>:inputaddr<addrselectbottom-1:0>;
// Check the output address is inside the supported range

if !IsZero(desc<47:40>) then
result.addrdesc.fault = AArch32.AddressSizeFault(ipaddress, domain, level, acctype,
return result;
// Check the access flag

if desc<10> == '0' then
result.addrdesc.fault = AArch32.AccessFlagFault(ipaddress, domain, level, acctype,
return result;
xn = desc<54>; // Bit[54] of the block/page descriptor hol
pxn = desc<53>; // Bit[53] of the block/page descriptor hol
ap = desc<7:6>:'1'; // Bits[7:6] of the block/page descriptor h
contiguousbit = desc<52>;
nG = desc<11>;

sh = desc<9:8>;
memattr = desc<5:2>; // AttrIndx and NS bit in stage 1
result.domain = bits(4) UNKNOWN; // Domains not used

result.level = level;
result.blocksize = 2^((3-level)*stride + grainsize);
// Stage 1 translation regimes also inherit attributes from the tables

result.perms.xn = xn OR xn_table;
result.perms.ap<2> = ap<2> OR ap_table<1>; // Force read-only
// PXN, nG and AP[1] apply only in EL1&0 stage 1 translation regimes
if !singlepriv then
result.perms.ap<1> = ap<1> AND NOT(ap_table<0>); // Force privileged only
result.perms.pxn = pxn OR pxn_table;
// Pages from Non-secure tables are marked non-global in Secure EL1&0
if IsSecure() then
result.nG = nG OR ns_table;
else
result.nG = nG;
else
result.perms.ap<1> = '1';
result.nG = '0';
result.GP = desc<50>; // Stage 1 block or pages might be guarded
result.addrdesc.memattrs = AArch32.S1AttrDecode(sh, memattr<2:0>, acctype);
result.addrdesc.paddress.NS = memattr<3> OR ns_table;
else
result.perms.ap<2:1> = ap<2:1>;
result.perms.xn = xn;
if HaveExtendedExecuteNeverExt() then result.perms.xxn = desc<53>;
result.nG = '0';
if s2fs1walk then
result.addrdesc.memattrs = S2AttrDecode(sh, memattr, AccType_PTW);
else
result.addrdesc.memattrs = S2AttrDecode(sh, memattr, acctype);
result.addrdesc.paddress.NS = '1';
result.addrdesc.paddress.address = ZeroExtend(outputaddress);
result.contiguous = contiguousbit == '1';
if HaveCommonNotPrivateTransExt() then result.CnP = baseregister<0>;
return result;

Library pseudocode for aarch32/translation/walk/AArch32.TranslationTableWalkSD

// AArch32.TranslationTableWalkSD()
// ================================
// Returns a result of a translation table walk using the Short-descriptor format
//
// Implementations might cache information from memory in any number of non-coherent TLB
// caching structures, and so avoid memory accesses that have been expressed in this
// pseudocode. The use of such TLBs is not expressed in this pseudocode.
TLBRecord AArch32.TranslationTableWalkSD(bits(32) vaddress, AccType acctype, boolean iswrite,

integer size)
// This is only called when address translation is enabled

TLBRecord result;
AddressDescriptor l1descaddr;
AddressDescriptor l2descaddr;
bits(40) outputaddress;
// Variables for Abort functions

s2fs1walk = FALSE;
NS = bit UNKNOWN;
// Default setting of the domain

level = 1;
// Determine correct Translation Table Base Register to use.

bits(64) ttbr;
n = UInt(TTBCR.N);
if n == 0 || IsZero(vaddress<31:(32-n)>) then
ttbr = TTBR0;
disabled = (TTBCR.PD0 == '1');
else
ttbr = TTBR1;
disabled = (TTBCR.PD1 == '1');
n = 0; // TTBR1 translation always works like N=0 TTBR0 translation
// Check this Translation Table Base Register is not disabled.

if disabled then
level = 1;
result.addrdesc.fault = AArch32.TranslationFault(ipaddress, domain, level, acctype, iswrite,
return result;
// Obtain descriptor from initial lookup.

l1descaddr.paddress.address = ZeroExtend(ttbr<31:14-n>:vaddress<31-n:20>:'00');
l1descaddr.paddress.NS = if IsSecure() then '0' else '1';
IRGN = ttbr<0>:ttbr<6>; // TTBR.IRGN
RGN = ttbr<4:3>; // TTBR.RGN
SH = ttbr<1>:ttbr<5>; // TTBR.S:TTBR.NOS
l1descaddr.memattrs = WalkAttrDecode(SH, RGN, IRGN, secondstage);
if !HaveEL(EL2) || (IsSecure() && !IsSecureEL2Enabled()) then

// if only 1 stage of translation
l1descaddr2 = l1descaddr;
else
l1descaddr2 = AArch32.SecondStageWalk(l1descaddr, vaddress, acctype, iswrite, 4);
if IsFault(l1descaddr2) then
result.addrdesc.fault = l1descaddr2.fault;
return result;

l1descaddr2.vaddress = ZeroExtend(vaddress);

l1desc = _Mem[l1descaddr2, 4,accdesc];

if SCTLR.EE == '1' then l1desc = BigEndianReverse(l1desc);
// Process descriptor from initial lookup.

case l1desc<1:0> of
when '00' // Fault, Reserved
level = 1;
return result;
when '01' // Large page or Small page

domain = l1desc<8:5>;
level = 2;
pxn = l1desc<2>;
NS = l1desc<3>;
// Obtain descriptor from level 2 lookup.

l2descaddr.paddress.address = ZeroExtend(l1desc<31:10>:vaddress<19:12>:'00');
l2descaddr.paddress.NS = if IsSecure() then '0' else '1';
l2descaddr.memattrs = l1descaddr.memattrs;
if !HaveEL(EL2) || (IsSecure() && !IsSecureEL2Enabled()) then

// if only 1 stage of translation
l2descaddr2 = l2descaddr;
else
l2descaddr2 = AArch32.SecondStageWalk(l2descaddr, vaddress, acctype, iswrite, 4);
if IsFault(l2descaddr2) then
result.addrdesc.fault = l2descaddr2.fault;
return result;

l2descaddr2.vaddress = ZeroExtend(vaddress);

l2desc = _Mem[l2descaddr2, 4, accdesc];
if SCTLR.EE == '1' then l2desc = BigEndianReverse(l2desc);
// Process descriptor from level 2 lookup.

if l2desc<1:0> == '00' then
return result;
nG = l2desc<11>;
S = l2desc<10>;
ap = l2desc<9,5:4>;
if SCTLR.AFE == '1' && l2desc<4> == '0' then

// Armv8 VMSAv8-32 does not support hardware management of the Access flag.
return result;
if l2desc<1> == '0' then // Large page

xn = l2desc<15>;
tex = l2desc<14:12>;
c = l2desc<3>;
b = l2desc<2>;
blocksize = 64;
outputaddress = ZeroExtend(l2desc<31:16>:vaddress<15:0>);
else // Small page
tex = l2desc<8:6>;
c = l2desc<3>;
b = l2desc<2>;
xn = l2desc<0>;
blocksize = 4;

when '1x' // Section or Supersection
NS = l1desc<19>;
nG = l1desc<17>;
S = l1desc<16>;
ap = l1desc<15,11:10>;
tex = l1desc<14:12>;
xn = l1desc<4>;
c = l1desc<3>;
b = l1desc<2>;
pxn = l1desc<0>;
level = 1;
if SCTLR.AFE == '1' && l1desc<10> == '0' then

// Armv8 VMSAv8-32 does not support hardware management of the Access flag.
return result;
if l1desc<18> == '0' then // Section

domain = l1desc<8:5>;
blocksize = 1024;
else // Supersection
domain = '0000';
blocksize = 16384;
outputaddress = l1desc<8:5>:l1desc<23:20>:l1desc<31:24>:vaddress<23:0>;
// Decode the TEX, C, B and S bits to produce the TLBRecord's memory attributes
if SCTLR.TRE == '0' then
if RemapRegsHaveResetValues() then
result.addrdesc.memattrs = AArch32.DefaultTEXDecode(tex, c, b, S, acctype);
else
result.addrdesc.memattrs = MemoryAttributes IMPLEMENTATION_DEFINED;
else
result.addrdesc.memattrs = AArch32.RemappedTEXDecode(tex, c, b, S, acctype);
// Set the rest of the TLBRecord, try to add it to the TLB, and return it.
result.perms.ap = ap;
result.perms.xn = xn;
result.perms.pxn = pxn;
result.nG = nG;
result.domain = domain;
result.level = level;
result.blocksize = blocksize;
result.addrdesc.paddress.address = ZeroExtend(outputaddress);
result.addrdesc.paddress.NS = if IsSecure() then NS else '1';
return result;
Library pseudocode for aarch32/translation/walk/RemapRegsHaveResetValues
boolean RemapRegsHaveResetValues();

Library pseudocode for aarch64/debug/breakpoint/AArch64.BreakpointMatch
// AArch64.BreakpointMatch()
// =========================
// Breakpoint matching in an AArch64 translation regime.
boolean AArch64.BreakpointMatch(integer n, bits(64) vaddress, AccType acctype, integer size)

assert !ELUsingAArch32(S1TranslationRegime());
assert n <= UInt(ID_AA64DFR0_EL1.BRPs);
enabled = DBGBCR_EL1[n].E == '1';

ispriv = PSTATE.EL != EL0;
linked = DBGBCR_EL1[n].BT == '0x01';
isbreakpnt = TRUE;
linked_to = FALSE;
state_match = AArch64.StateMatch(DBGBCR_EL1[n].SSC, DBGBCR_EL1[n].HMC, DBGBCR_EL1[n].PMC,

linked, DBGBCR_EL1[n].LBN, isbreakpnt, acctype, ispriv);
value_match = AArch64.BreakpointValueMatch(n, vaddress, linked_to);
if HaveAnyAArch32() && size == 4 then // Check second halfword

// If the breakpoint address and BAS of an Address breakpoint match the address of the
// second halfword of an instruction, but not the address of the first halfword, it is
// CONSTRAINED UNPREDICTABLE whether or not this breakpoint generates a Breakpoint debug
// event.
match_i = AArch64.BreakpointValueMatch(n, vaddress + 2, linked_to);
if !value_match && match_i then
value_match = ConstrainUnpredictableBool(Unpredictable_BPMATCHHALF);
if vaddress<1> == '1' && DBGBCR_EL1[n].BAS == '1111' then
// The above notwithstanding, if DBGBCR_EL1[n].BAS == '1111', then it is CONSTRAINED
// UNPREDICTABLE whether or not a Breakpoint debug event is generated for an instruction
// at the address DBGBVR_EL1[n]+2.
if value_match then value_match = ConstrainUnpredictableBool(Unpredictable_BPMATCHHALF);
match = value_match && state_match && enabled;
return match;

Library pseudocode for aarch64/debug/breakpoint/AArch64.BreakpointValueMatch

// AArch64.BreakpointValueMatch()
// ==============================
boolean AArch64.BreakpointValueMatch(integer n, bits(64) vaddress, boolean linked_to)
// "n" is the identity of the breakpoint unit to match against.

// "vaddress" is the current instruction address, ignored if linked_to is TRUE and for Context
// matching breakpoints.
// "linked_to" is TRUE if this is a call from StateMatch for linking.
// If a non-existent breakpoint then it is CONSTRAINED UNPREDICTABLE whether this gives

// no match or the breakpoint is mapped to another UNKNOWN implemented breakpoint.
if n > UInt(ID_AA64DFR0_EL1.BRPs) then
(c, n) = ConstrainUnpredictableInteger(0, UInt(ID_AA64DFR0_EL1.BRPs), Unpredictable_BPNOTIMPL);
// If this breakpoint is not enabled, it cannot generate a match. (This could also happen on a
// call from StateMatch for linking).
if DBGBCR_EL1[n].E == '0' then return FALSE;
context_aware = (n >= UInt(ID_AA64DFR0_EL1.BRPs) - UInt(ID_AA64DFR0_EL1.CTX_CMPs));
// If BT is set to a reserved type, behaves either as disabled or as a not-reserved type.

dbgtype = DBGBCR_EL1[n].BT;
if ((dbgtype IN {'011x','11xx'} && !HaveVirtHostExt() && !HaveV82Debug()) || // Context matching

dbgtype == '010x' || // Reserved
(dbgtype != '0x0x' && !context_aware) || // Context matching
(dbgtype == '1xxx' && !HaveEL(EL2))) then // EL2 extension
(c, dbgtype) = ConstrainUnpredictableBits(Unpredictable_RESBPTYPE);
// Otherwise the value returned by ConstrainUnpredictableBits must be a not-reserved value
// Determine what to compare against.

match_addr = (dbgtype == '0x0x');
match_vmid = (dbgtype == '10xx');
match_cid = (dbgtype == '001x');
match_cid1 = (dbgtype IN { '101x', 'x11x'});
match_cid2 = (dbgtype == '11xx');
linked = (dbgtype == 'xxx1');
// If this is a call from StateMatch, return FALSE if the breakpoint is not programmed for a
// VMID and/or context ID match, of if not context-aware. The above assertions mean that the
// code can just test for match_addr == TRUE to confirm all these things.
if linked_to && (!linked || match_addr) then return FALSE;
// If called from BreakpointMatch return FALSE for Linked context ID and/or VMID matches.
if !linked_to && linked && !match_addr then return FALSE;
// Do the comparison.
if match_addr then
byte = UInt(vaddress<1:0>);
if HaveAnyAArch32() then
// T32 instructions can be executed at EL0 in an AArch64 translation regime.
assert byte IN {0,2}; // "vaddress" is halfword aligned
byte_select_match = (DBGBCR_EL1[n].BAS<byte> == '1');
else
assert byte == 0; // "vaddress" is word aligned
byte_select_match = TRUE; // DBGBCR_EL1[n].BAS<byte> is RES1
top = AddrTop(vaddress, TRUE, PSTATE.EL);
BVR_match = vaddress<top:2> == DBGBVR_EL1[n]<top:2> && byte_select_match;
elsif match_cid then
if IsInHost() then
BVR_match = (CONTEXTIDR_EL2 == DBGBVR_EL1[n]<31:0>);
else
BVR_match = (PSTATE.EL IN {EL0, EL1} && CONTEXTIDR_EL1 == DBGBVR_EL1[n]<31:0>);
BVR_match = (PSTATE.EL IN {EL0, EL1} && !IsInHost() && CONTEXTIDR_EL1 == DBGBVR_EL1[n]<31:0>);

if match_vmid then
if !Have16bitVMID() || VTCR_EL2.VS == '0' then
vmid = ZeroExtend(VTTBR_EL2.VMID<7:0>, 16);
bvr_vmid = ZeroExtend(DBGBVR_EL1[n]<39:32>, 16);
else
vmid = VTTBR_EL2.VMID;
bvr_vmid = DBGBVR_EL1[n]<47:32>;
BXVR_match = (PSTATE.EL IN {EL0, EL1} && EL2Enabled() &&
!IsInHost() &&
vmid == bvr_vmid);
BXVR_match = ((HaveVirtHostExt() || HaveV82Debug()) && EL2Enabled() &&
DBGBVR_EL1[n]<63:32> == CONTEXTIDR_EL2);
bvr_match_valid = (match_addr || match_cid || match_cid1);

bxvr_match_valid = (match_vmid || match_cid2);
match = (!bxvr_match_valid || BXVR_match) && (!bvr_match_valid || BVR_match);
return match;

Library pseudocode for aarch64/debug/breakpoint/AArch64.StateMatch
// AArch64.StateMatch()
// ====================
// Determine whether a breakpoint or watchpoint is enabled in the current mode and state.
boolean AArch64.StateMatch(bits(2) SSC, bit HMC, bits(2) PxC, boolean linked, bits(4) LBN,
boolean isbreakpnt, AccType acctype, boolean ispriv)
// "SSC", "HMC", "PxC" are the control fields from the DBGBCR[n] or DBGWCR[n] register.
// "linked" is TRUE if this is a linked breakpoint/watchpoint type.
// "LBN" is the linked breakpoint number from the DBGBCR[n] or DBGWCR[n] register.
// "isbreakpnt" is TRUE for breakpoints, FALSE for watchpoints.
// "ispriv" is valid for watchpoints, and selects between privileged and unprivileged accesses.
// If parameters are set to a reserved type, behaves as either disabled or a defined type
(c, SSC, HMC, PxC) = CheckValidStateMatch(SSC, HMC, PxC, isbreakpnt);
// Otherwise the HMC,SSC,PxC values are either valid or the values returned by
// CheckValidStateMatch are valid.
EL3_match = HaveEL(EL3) && HMC == '1' && SSC<0> == '0';

EL2_match = HaveEL(EL2) && ((HMC == '1' && (SSC:PxC != '1000')) || SSC == '11');
EL1_match = PxC<0> == '1';
EL0_match = PxC<1> == '1';
if HaveNV2Ext() && acctype == AccType_NV2REGISTER && !isbreakpnt then

priv_match = EL2_match;
elsif !ispriv && !isbreakpnt then
priv_match = EL0_match;
else
case PSTATE.EL of
when EL3 priv_match = EL3_match;
case SSC of
when '00' security_state_match = TRUE; // Both
when '01' security_state_match = !IsSecure(); // Non-secure only
when '10' security_state_match = IsSecure(); // Secure only
when '11' security_state_match = (HMC == '1' || IsSecure()); // HMC=1 -> Both, 0 -> Secure only
if linked then
// "LBN" must be an enabled context-aware breakpoint unit. If it is not context-aware then
// it is CONSTRAINED UNPREDICTABLE whether this gives no match, or LBN is mapped to some
// UNKNOWN breakpoint that is context-aware.
lbn = UInt(LBN);
first_ctx_cmp = (UInt(ID_AA64DFR0_EL1.BRPs) - UInt(ID_AA64DFR0_EL1.CTX_CMPs));
last_ctx_cmp = UInt(ID_AA64DFR0_EL1.BRPs);
if (lbn < first_ctx_cmp || lbn > last_ctx_cmp) then
(c, lbn) = ConstrainUnpredictableInteger(first_ctx_cmp, last_ctx_cmp, Unpredictable_BPNOTCTXC
case c of
when Constraint_NONE linked = FALSE; // No linking
// Otherwise ConstrainUnpredictableInteger returned a context-aware breakpoint
if linked then
linked_to = TRUE;
linked_match = AArch64.BreakpointValueMatch(lbn, vaddress, linked_to);
return priv_match && security_state_match && (!linked || linked_match);

Library pseudocode for aarch64/debug/enables/AArch64.GenerateDebugExceptions
// AArch64.GenerateDebugExceptions()
// =================================
boolean AArch64.GenerateDebugExceptions()
return AArch64.GenerateDebugExceptionsFrom(PSTATE.EL, IsSecure(), PSTATE.D);
Library pseudocode for aarch64/debug/enables/AArch64.GenerateDebugExceptionsFrom
// AArch64.GenerateDebugExceptionsFrom()
// =====================================
boolean AArch64.GenerateDebugExceptionsFrom(bits(2) from, boolean secure, bit mask)
if OSLSR_EL1.OSLK == '1' || DoubleLockStatus() || Halted() then

return FALSE;
route_to_el2 = HaveEL(EL2) && (!secure || IsSecureEL2Enabled()) && (HCR_EL2.TGE == '1' || MDCR_EL2.TD

target = (if route_to_el2 then EL2 else EL1);
enabled = !HaveEL(EL3) || !secure || MDCR_EL3.SDD == '0';
if from == target then

enabled = enabled && MDSCR_EL1.KDE == '1' && mask == '0';
else
enabled = enabled && UInt(target) > UInt(from);
return enabled;
Library pseudocode for aarch64/debug/pmu/AArch64.CheckForPMUOverflow
// AArch64.CheckForPMUOverflow()
// =============================
// Signal Performance Monitors overflow IRQ and CTI overflow events
boolean AArch64.CheckForPMUOverflow()
pmuirq = PMCR_EL0.E == '1' && PMINTENSET_EL1<31> == '1' && PMOVSSET_EL0<31> == '1';

for n = 0 to UInt(PMCR_EL0.N) - 1
if HaveEL(EL2) then
E = (if n < UInt(MDCR_EL2.HPMN) then PMCR_EL0.E else MDCR_EL2.HPME);
else
E = PMCR_EL0.E;
if E == '1' && PMINTENSET_EL1<n> == '1' && PMOVSSET_EL0<n> == '1' then pmuirq = TRUE;
SetInterruptRequestLevel(InterruptID_PMUIRQ, if pmuirq then HIGH else LOW);
CTI_SetEventLevel(CrossTriggerIn_PMUOverflow, if pmuirq then HIGH else LOW);
// The request remains set until the condition is cleared. (For example, an interrupt handler
// or cross-triggered event handler clears the overflow status flag by writing to PMOVSCLR_EL0.)
return pmuirq;

Library pseudocode for aarch64/debug/pmu/AArch64.CountEvents
// AArch64.CountEvents()
// =====================
// Return TRUE if counter "n" should count its event. For the cycle counter, n == 31.
boolean AArch64.CountEvents(integer n)
assert n == 31 || n < UInt(PMCR_EL0.N);
// Event counting is disabled in Debug state

debug = Halted();
// In Non-secure state, some counters are reserved for EL2

if HaveEL(EL2) then
E = if n < UInt(MDCR_EL2.HPMN) || n == 31 then PMCR_EL0.E else MDCR_EL2.HPME;
else
E = PMCR_EL0.E;
enabled = E == '1' && PMCNTENSET_EL0<n> == '1';
// Event counting in Secure state is prohibited unless any one of:

// * EL3 is not implemented
// * EL3 is using AArch64 and MDCR_EL3.SPME == 1
prohibited = HaveEL(EL3) && IsSecure() && MDCR_EL3.SPME == '0';
// Event counting at EL2 is prohibited if all of:

// * The HPMD Extension is implemented
// * Executing at EL2
// * PMNx is not reserved for EL2
// * MDCR_EL2.HPMD == 1
if !prohibited && HaveEL(EL2) && HaveHPMDExt() && PSTATE.EL == EL2 && (n < UInt(MDCR_EL2.HPMN) || n =
prohibited = (MDCR_EL2.HPMD == '1');
// The IMPLEMENTATION DEFINED authentication interface might override software controls

if prohibited && !HaveNoSecurePMUDisableOverride() then
prohibited = !ExternalSecureNoninvasiveDebugEnabled();
// For the cycle counter, PMCR_EL0.DP enables counting when otherwise prohibited
if prohibited && n == 31 then prohibited = (PMCR_EL0.DP == '1');
// Event counting can be filtered by the {P, U, NSK, NSU, NSH, M, SH} bits
filter = if n == 31 then PMCCFILTR else PMEVTYPER[n];
P = filter<31>;
U = filter<30>;
NSK = if HaveEL(EL3) then filter<29> else '0';
NSU = if HaveEL(EL3) then filter<28> else '0';
NSH = if HaveEL(EL2) then filter<27> else '0';
M = if HaveEL(EL3) then filter<26> else '0';
SH = if HaveEL(EL3) && HaveSecureEL2Ext() then filter<24> else '0';
case PSTATE.EL of
when EL0 filtered = if IsSecure() then U == '1' else U != NSU;
when EL1 filtered = if IsSecure() then P == '1' else P != NSK;
when EL2 filtered = (if IsSecure() then NSH == SH else NSH == '0');
when EL3 filtered = (M != P);
return !debug && enabled && !prohibited && !filtered;

Library pseudocode for aarch64/debug/statisticalprofiling/CheckProfilingBufferAccess
// CheckProfilingBufferAccess()
// ============================
SysRegAccess CheckProfilingBufferAccess()
if !HaveStatisticalProfiling() || PSTATE.EL == EL0 || UsingAArch32() then
return SysRegAccess_UNDEFINED;
if PSTATE.EL == EL1 && EL2Enabled() && MDCR_EL2.E2PB<0> != '1' then

return SysRegAccess_TrapToEL2;
if HaveEL(EL3) && PSTATE.EL != EL3 && MDCR_EL3.NSPB != SCR_EL3.NS:'1' then

return SysRegAccess_OK;
Library pseudocode for aarch64/debug/statisticalprofiling/CheckStatisticalProfilingAccess
// CheckStatisticalProfilingAccess()
// =================================
SysRegAccess CheckStatisticalProfilingAccess()
if !HaveStatisticalProfiling() || PSTATE.EL == EL0 || UsingAArch32() then
return SysRegAccess_UNDEFINED;
if PSTATE.EL == EL1 && EL2Enabled() && MDCR_EL2.TPMS == '1' then

if HaveEL(EL3) && PSTATE.EL != EL3 && MDCR_EL3.NSPB != SCR_EL3.NS:'1' then

return SysRegAccess_OK;
Library pseudocode for aarch64/debug/statisticalprofiling/CollectContextIDR1
// CollectContextIDR1()
// ====================
boolean CollectContextIDR1()
if !StatisticalProfilingEnabled() then return FALSE;
if PSTATE.EL == EL2 then return FALSE;
if EL2Enabled() && HCR_EL2.TGE == '1' then return FALSE;
return PMSCR_EL1.CX == '1';
Library pseudocode for aarch64/debug/statisticalprofiling/CollectContextIDR2
// CollectContextIDR2()
// ====================
boolean CollectContextIDR2()
if !EL2Enabled() then return FALSE;
return PMSCR_EL2.CX == '1';
Library pseudocode for aarch64/debug/statisticalprofiling/CollectPhysicalAddress
// CollectPhysicalAddress()
// ========================
boolean CollectPhysicalAddress()
(secure, el) = ProfilingBufferOwner();
if ((!secure && HaveEL(EL2)) || IsSecureEL2Enabled()) then
return PMSCR_EL2.PA == '1' && (el == EL2 || PMSCR_EL1.PA == '1');
else
return PMSCR_EL1.PA == '1';

Library pseudocode for aarch64/debug/statisticalprofiling/CollectRecord
// CollectRecord()
// ===============
boolean CollectRecord(bits(64) events, integer total_latency, OpType optype)

assert StatisticalProfilingEnabled();
// Filtering by event
if PMSFCR_EL1.FE == '1' && !IsZero(PMSEVFR_EL1) then
bits(64) mask = 0xFFFF0000FF00F0AA<63:0>; // Bits [63:48,31:24,15:12,7,5,3,1]
if HaveStatisticalProfiling() then
mask<11> = '1'; // Alignment flag
if HaveSVE() then mask<18:17> = Ones(); // Predicate flags
e = events AND mask;
m = PMSEVFR_EL1 AND mask;
if !IsZero(NOT(e) AND m) then return FALSE;
// Filtering by type
if PMSFCR_EL1.FT == '1' && !IsZero(PMSFCR_EL1.<B,LD,ST>) then
case optype of
when OpType_Branch
if PMSFCR_EL1.B == '0' then return FALSE;
when OpType_Load
if PMSFCR_EL1.LD == '0' then return FALSE;
when OpType_Store
if PMSFCR_EL1.ST == '0' then return FALSE;
when OpType_LoadAtomic
if PMSFCR_EL1.<LD,ST> == '00' then return FALSE;
otherwise
return FALSE;
// Filtering by latency
if PMSFCR_EL1.FL == '1' && !IsZero(PMSLATFR_EL1.MINLAT) then
if total_latency < UInt(PMSLATFR_EL1.MINLAT) then
return FALSE;
// Check for UNPREDICTABLE cases

if ((PMSFCR_EL1.FE == '1' && !IsZero(PMSEVFR_EL1)) ||
(PMSFCR_EL1.FT == '1' && !IsZero(PMSFCR_EL1.<B,LD,ST>)) ||
(PMSFCR_EL1.FL == '1' && !IsZero(PMSLATFR_EL1.MINLAT))) then
return ConstrainUnpredictableBool(Unpredictable_BADPMSFCR);
return TRUE;

Library pseudocode for aarch64/debug/statisticalprofiling/CollectTimeStamp
// CollectTimeStamp()
// ==================
TimeStamp CollectTimeStamp()
if !StatisticalProfilingEnabled() then return TimeStamp_None;
if el == EL2 then
if PMSCR_EL2.TS == '0' then return TimeStamp_None;
else
if PMSCR_EL1.TS == '0' then return TimeStamp_None;
if EL2Enabled() then
case PMSCR_EL2.PCT of
when '00'
return TimeStamp_Virtual;
when '01'
if el == EL2 then return TimeStamp_Physical;
when '11'
if (el == EL2 || PMSCR_EL1.PCT != '00') && HaveECVExt() then
return TimeStamp_OffsetPhysical;
otherwise
Unreachable();
case PMSCR_EL1.PCT of
when '00' return TimeStamp_Virtual;
when '01' return TimeStamp_Physical;
when '11' if HaveECVExt() then return TimeStamp_OffsetPhysical;
Library pseudocode for aarch64/debug/statisticalprofiling/OpType
enumeration OpType {
OpType_Load, // Any memory-read operation other than atomics, compare-and-swap, and swap
OpType_Store, // Any memory-write operation, including atomics without return
OpType_LoadAtomic, // Atomics with return, compare-and-swap and swap
OpType_Branch, // Software write to the PC
OpType_Other // Any other class of operation
};
Library pseudocode for aarch64/debug/statisticalprofiling/ProfilingBufferEnabled
// ProfilingBufferEnabled()
// ========================
boolean ProfilingBufferEnabled()
if !HaveStatisticalProfiling() then return FALSE;
non_secure_bit = if secure then '0' else '1';
return (!ELUsingAArch32(el) && non_secure_bit == SCR_EL3.NS &&
PMBLIMITR_EL1.E == '1' && PMBSR_EL1.S == '0');
Library pseudocode for aarch64/debug/statisticalprofiling/ProfilingBufferOwner
// ProfilingBufferOwner()
// ======================
(boolean, bits(2)) ProfilingBufferOwner()

secure = if HaveEL(EL3) then (MDCR_EL3.NSPB<1> == '0') else IsSecure();
el = if !secure && HaveEL(EL2) && MDCR_EL2.E2PB == '00' then EL2 else EL1;
return (secure, el);

Library pseudocode for aarch64/debug/statisticalprofiling/ProfilingSynchronizationBarrier
// Barrier to ensure that all existing profiling data has been formatted, and profiling buffer
// addresses have been translated such that writes to the profiling buffer have been initiated.
// A following DSB completes when writes to the profiling buffer have completed.
Library pseudocode for aarch64/debug/statisticalprofiling/StatisticalProfilingEnabled
// StatisticalProfilingEnabled()
// =============================
boolean StatisticalProfilingEnabled()
if !HaveStatisticalProfiling() || UsingAArch32() || !ProfilingBufferEnabled() then
return FALSE;
in_host = EL2Enabled() && HCR_EL2.TGE == '1';

if UInt(el) < UInt(PSTATE.EL) || secure != IsSecure() || (in_host && el == EL1) then
return FALSE;
case PSTATE.EL of
when EL3 Unreachable();
when EL2 spe_bit = PMSCR_EL2.E2SPE;
when EL1 spe_bit = PMSCR_EL1.E1SPE;
when EL0 spe_bit = (if in_host then PMSCR_EL2.E0HSPE else PMSCR_EL1.E0SPE);
return spe_bit == '1';
Library pseudocode for aarch64/debug/statisticalprofiling/SysRegAccess
enumeration SysRegAccess { SysRegAccess_OK,

SysRegAccess_UNDEFINED,
SysRegAccess_TrapToEL1,
SysRegAccess_TrapToEL2,
SysRegAccess_TrapToEL3 };
Library pseudocode for aarch64/debug/statisticalprofiling/TimeStamp
enumeration TimeStamp {
TimeStamp_None, // No timestamp
TimeStamp_CoreSight, // CoreSight time (IMPLEMENTATION DEFINED)
TimeStamp_Physical, // Physical counter value with no offset
TimeStamp_OffsetPhysical, // Physical counter value minus CNTPOFF_EL2
TimeStamp_Virtual }; // Physical counter value minus CNTVOFF_EL2

Library pseudocode for aarch64/debug/takeexceptiondbg/AArch64.TakeExceptionInDebugState
// AArch64.TakeExceptionInDebugState()
// ===================================
// Take an exception in Debug state to an Exception Level using AArch64.
AArch64.TakeExceptionInDebugState(bits(2) target_el, ExceptionRecord exception)

assert HaveEL(target_el) && !ELUsingAArch32(target_el) && UInt(target_el) >= UInt(PSTATE.EL);
sync_errors = HaveIESB() && SCTLR[target_el].IESB == '1';

if HaveDoubleFaultExt() then
sync_errors = sync_errors || (SCR_EL3.EA == '1' && SCR_EL3.NMEA == '1' && target_el == EL3);
// SCTLR[].IESB might be ignored in Debug state.
if !ConstrainUnpredictableBool(Unpredictable_IESBinDebug) then
sync_errors = FALSE;
// If coming from AArch32 state, the top parts of the X[] registers might be set to zero
from_32 = UsingAArch32();
if from_32 then AArch64.MaybeZeroRegisterUppers();
MaybeZeroSVEUppers(target_el);
AArch64.ReportException(exception, target_el);
PSTATE.EL = target_el;
PSTATE.nRW = '0';
PSTATE.SP = '1';

ELR[] = bits(64) UNKNOWN;
// PSTATE.{SS,D,A,I,F} are not observable and ignored in Debug state, so behave as if UNKNOWN.
PSTATE.<SS,D,A,I,F> = bits(5) UNKNOWN;
PSTATE.IL = '0';
if from_32 then // Coming from AArch32
PSTATE.IT = '00000000';
if (HavePANExt() && (PSTATE.EL == EL1 || (PSTATE.EL == EL2 && ELIsInHost(EL0))) &&
SCTLR[].SPAN == '0') then
PSTATE.PAN = '1';
if HaveUAOExt() then PSTATE.UAO = '0';
if HaveBTIExt() then PSTATE.BTYPE = '00';
if HaveMTEExt() then PSTATE.TCO = '1';
DLR_EL0 = bits(64) UNKNOWN;

DSPSR_EL0 = bits(32) UNKNOWN;
EDSCR.ERR = '1';
if sync_errors then
EndOfInstruction();

Library pseudocode for aarch64/debug/watchpoint/AArch64.WatchpointByteMatch
// AArch64.WatchpointByteMatch()
// =============================
boolean AArch64.WatchpointByteMatch(integer n, AccType acctype, bits(64) vaddress)
el = if HaveNV2Ext() && acctype == AccType_NV2REGISTER then EL2 else PSTATE.EL;

top = AddrTop(vaddress, FALSE, el);
bottom = if DBGWVR_EL1[n]<2> == '1' then 2 else 3; // Word or doubleword
byte_select_match = (DBGWCR_EL1[n].BAS<UInt(vaddress<bottom-1:0>)> != '0');
mask = UInt(DBGWCR_EL1[n].MASK);
// If DBGWCR_EL1[n].MASK is non-zero value and DBGWCR_EL1[n].BAS is not set to '11111111', or

// DBGWCR_EL1[n].BAS specifies a non-contiguous set of bytes behavior is CONSTRAINED
// UNPREDICTABLE.
if mask > 0 && !IsOnes(DBGWCR_EL1[n].BAS) then
byte_select_match = ConstrainUnpredictableBool(Unpredictable_WPMASKANDBAS);
else
LSB = (DBGWCR_EL1[n].BAS AND NOT(DBGWCR_EL1[n].BAS - 1)); MSB = (DBGWCR_EL1[n].BAS + LSB);
if !IsZero(MSB AND (MSB - 1)) then // Not contiguous
byte_select_match = ConstrainUnpredictableBool(Unpredictable_WPBASCONTIGUOUS);
bottom = 3; // For the whole doubleword
// If the address mask is set to a reserved value, the behavior is CONSTRAINED UNPREDICTABLE.
if mask > 0 && mask <= 2 then
(c, mask) = ConstrainUnpredictableInteger(3, 31, Unpredictable_RESWPMASK);
case c of
when Constraint_NONE mask = 0; // No masking
// Otherwise the value returned by ConstrainUnpredictableInteger is a not-reserved value
if mask > bottom then

WVR_match = (vaddress<top:mask> == DBGWVR_EL1[n]<top:mask>);
// If masked bits of DBGWVR_EL1[n] are not zero, the behavior is CONSTRAINED UNPREDICTABLE.
if WVR_match && !IsZero(DBGWVR_EL1[n]<mask-1:bottom>) then
WVR_match = ConstrainUnpredictableBool(Unpredictable_WPMASKEDBITS);
else
WVR_match = vaddress<top:bottom> == DBGWVR_EL1[n]<top:bottom>;
return WVR_match && byte_select_match;

Library pseudocode for aarch64/debug/watchpoint/AArch64.WatchpointMatch
// AArch64.WatchpointMatch()
// =========================
// Watchpoint matching in an AArch64 translation regime.
boolean AArch64.WatchpointMatch(integer n, bits(64) vaddress, integer size, boolean ispriv,

AccType acctype, boolean iswrite)
assert !ELUsingAArch32(S1TranslationRegime());
assert n <= UInt(ID_AA64DFR0_EL1.WRPs);
// "ispriv" is FALSE for LDTR/STTR instructions executed at EL1 and all

// load/stores at EL0, TRUE for all other load/stores. "iswrite" is TRUE for stores, FALSE for
// loads.
enabled = DBGWCR_EL1[n].E == '1';
linked = DBGWCR_EL1[n].WT == '1';
isbreakpnt = FALSE;
state_match = AArch64.StateMatch(DBGWCR_EL1[n].SSC, DBGWCR_EL1[n].HMC, DBGWCR_EL1[n].PAC,

linked, DBGWCR_EL1[n].LBN, isbreakpnt, acctype, ispriv);
ls_match = (DBGWCR_EL1[n].LSC<(if iswrite then 1 else 0)> == '1');
value_match = FALSE;
for byte = 0 to size - 1
value_match = value_match || AArch64.WatchpointByteMatch(n, acctype, vaddress + byte);
return value_match && state_match && ls_match && enabled;
Library pseudocode for aarch64/exceptions/aborts/AArch64.Abort
// AArch64.Abort()
// ===============
// Abort and Debug exception handling in an AArch64 translation regime.
AArch64.Abort(bits(64) vaddress, FaultRecord fault)
if IsDebugException(fault) then
if fault.acctype == AccType_IFETCH then
if UsingAArch32() && fault.debugmoe == DebugException_VectorCatch then
AArch64.VectorCatchException(fault);
else
AArch64.BreakpointException(fault);
else
AArch64.WatchpointException(vaddress, fault);
elsif fault.acctype == AccType_IFETCH then
AArch64.InstructionAbort(vaddress, fault);
else
AArch64.DataAbort(vaddress, fault);

Library pseudocode for aarch64/exceptions/aborts/AArch64.AbortSyndrome
// AArch64.AbortSyndrome()
// =======================
// Creates an exception syndrome record for Abort and Watchpoint exceptions
// from an AArch64 translation regime.
ExceptionRecord AArch64.AbortSyndrome(Exception exceptype, FaultRecord fault, bits(64) vaddress)

exception = ExceptionSyndrome(exceptype);
d_side = exceptype IN {Exception_DataAbort, Exception_NV2DataAbort, Exception_Watchpoint, Exception_N
exception.syndrome = AArch64.FaultSyndrome(d_side, fault);

exception.vaddress = ZeroExtend(vaddress);
if IPAValid(fault) then
exception.ipavalid = TRUE;
exception.NS = fault.ipaddress.NS;
exception.ipaddress = fault.ipaddress.address;
else
exception.ipavalid = FALSE;
return exception;
Library pseudocode for aarch64/exceptions/aborts/AArch64.CheckPCAlignment
// AArch64.CheckPCAlignment()
// ==========================
AArch64.CheckPCAlignment()
bits(64) pc = ThisInstrAddr();
if pc<1:0> != '00' then
AArch64.PCAlignmentFault();
Library pseudocode for aarch64/exceptions/aborts/AArch64.DataAbort
// AArch64.DataAbort()
// ===================
AArch64.DataAbort(bits(64) vaddress, FaultRecord fault)

route_to_el3 = HaveEL(EL3) && SCR_EL3.EA == '1' && IsExternalAbort(fault);
route_to_el2 = (PSTATE.EL IN {EL0, EL1} && EL2Enabled() && (HCR_EL2.TGE == '1' ||
(HaveRASExt() && HCR_EL2.TEA == '1' && IsExternalAbort(fault)) ||
(HaveNV2Ext() && fault.acctype == AccType_NV2REGISTER) ||
IsSecondStage(fault)));

if (HaveDoubleFaultExt() && (PSTATE.EL == EL3 || route_to_el3) &&
IsExternalAbort(fault) && SCR_EL3.EASE == '1') then
vect_offset = 0x180;
else
vect_offset = 0x0;
if HaveNV2Ext() && fault.acctype == AccType_NV2REGISTER then
exception = AArch64.AbortSyndrome(Exception_NV2DataAbort, fault, vaddress);
else
if PSTATE.EL == EL3 || route_to_el3 then
elsif PSTATE.EL == EL2 || route_to_el2 then
else

Library pseudocode for aarch64/exceptions/aborts/AArch64.EffectiveTCF
// AArch64.EffectiveTCF()
// ======================
// Returns the TCF field applied to tag check faults in the given Exception Level.
bits(2) AArch64.EffectiveTCF(bits(2) el)

bits(2) tcf;
if el == EL3 then
tcf = SCTLR_EL3.TCF;
elsif el == EL2 then
elsif el == EL0 && HCR_EL2.<E2H,TGE> == '11' then
tcf = SCTLR_EL2.TCF0;
elsif el == EL0 && HCR_EL2.<E2H,TGE> != '11' then
tcf = SCTLR_EL1.TCF0;
if tcf == '11' then

(-,tcf) = ConstrainUnpredictableBits(Unpredictable_RESTCF);
return tcf;
Library pseudocode for aarch64/exceptions/aborts/AArch64.InstructionAbort
// AArch64.InstructionAbort()
// ==========================
AArch64.InstructionAbort(bits(64) vaddress, FaultRecord fault)

// External aborts on instruction fetch must be taken synchronously
if HaveDoubleFaultExt() then assert fault.statuscode != Fault_AsyncExternal;
route_to_el3 = HaveEL(EL3) && SCR_EL3.EA == '1' && IsExternalAbort(fault);
route_to_el2 = (PSTATE.EL IN {EL0, EL1} && EL2Enabled() &&
(HCR_EL2.TGE == '1' || IsSecondStage(fault) ||
(HaveRASExt() && HCR_EL2.TEA == '1' && IsExternalAbort(fault))));
if (HaveDoubleFaultExt() && (PSTATE.EL == EL3 || route_to_el3) &&

IsExternalAbort(fault) && SCR_EL3.EASE == '1') then
else
vect_offset = 0x0;
exception = AArch64.AbortSyndrome(Exception_InstructionAbort, fault, vaddress);

else

Library pseudocode for aarch64/exceptions/aborts/AArch64.PCAlignmentFault
// AArch64.PCAlignmentFault()
// ==========================
// Called on unaligned program counter in AArch64 state.
AArch64.PCAlignmentFault()

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_PCAlignment);
exception.vaddress = ThisInstrAddr();

elsif EL2Enabled() && HCR_EL2.TGE == '1' then
else
Library pseudocode for aarch64/exceptions/aborts/AArch64.RaiseTagCheckFault
// AArch64.RaiseTagCheckFault()
// ============================
// Raise a tag check fault exception.
AArch64.RaiseTagCheckFault(bits(64) va, boolean write)

bits(2) target_el;
integer vect_offset = 0x0;

target_el = if HCR_EL2.TGE == '0' then EL1 else EL2;
else
target_el = PSTATE.EL;
exception = ExceptionSyndrome(Exception_DataAbort);
exception.syndrome<5:0> = '010001';
if write then
exception.vaddress = bits(4) UNKNOWN : va<59:0>;
AArch64.TakeException(target_el, exception, preferred_exception_return, vect_offset);

Library pseudocode for aarch64/exceptions/aborts/AArch64.ReportTagCheckFail
// AArch64.ReportTagCheckFail()
// ============================
// Records a tag check fault exception into the appropriate TCFR_ELx.
AArch64.ReportTagCheckFail(bits(2) el, bit ttbr)

if el == EL3 then
assert ttbr == '0';
TFSR_EL3.TF0 = '1';
if ttbr == '0' then
TFSR_EL2.TF0 = '1';
else
TFSR_EL2.TF1 = '1';
if ttbr == '0' then
TFSR_EL1.TF0 = '1';
else
TFSR_EL1.TF1 = '1';
if ttbr == '0' then
TFSRE0_EL1.TF0 = '1';
else
TFSRE0_EL1.TF1 = '1';
Library pseudocode for aarch64/exceptions/aborts/AArch64.SPAlignmentFault
// AArch64.SPAlignmentFault()
// ==========================
// Called on an unaligned stack pointer in AArch64 state.
AArch64.SPAlignmentFault()

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_SPAlignment);

elsif EL2Enabled() && HCR_EL2.TGE == '1' then
else
Library pseudocode for aarch64/exceptions/aborts/AArch64.TagCheckFault
// AArch64.TagCheckFault()
// =======================
// Handle a tag check fault condition.
AArch64.TagCheckFail(bits(64) vaddress, AccType acctype, boolean iswrite)

bits(2) el;
if AArch64.AccessIsPrivileged(acctype) then
el = PSTATE.EL;
else
el = EL0;
bits(2) tcf = AArch64.EffectiveTCF(el);

if tcf == '01' then
AArch64.RaiseTagCheckFault(vaddress, iswrite);
elsif tcf == '10' then
AArch64.ReportTagCheckFail(el, vaddress<55>);

Library pseudocode for aarch64/exceptions/aborts/BranchTargetException
// BranchTargetException
// =====================
// Raise branch target exception.
AArch64.BranchTargetException(bits(52) vaddress)

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_BranchTarget);
exception.syndrome<1:0> = PSTATE.BTYPE;
exception.syndrome<24:2> = Zeros(); // RES0

else
Library pseudocode for aarch64/exceptions/asynch/AArch64.TakePhysicalFIQException
// AArch64.TakePhysicalFIQException()
// ==================================
AArch64.TakePhysicalFIQException()
route_to_el3 = HaveEL(EL3) && SCR_EL3.FIQ == '1';

(HCR_EL2.TGE == '1' || HCR_EL2.FMO == '1'));
assert PSTATE.EL != EL3;
else
assert PSTATE.EL IN {EL0, EL1};

Library pseudocode for aarch64/exceptions/asynch/AArch64.TakePhysicalIRQException
// AArch64.TakePhysicalIRQException()
// ==================================
// Take an enabled physical IRQ exception.
AArch64.TakePhysicalIRQException()
route_to_el3 = HaveEL(EL3) && SCR_EL3.IRQ == '1';

(HCR_EL2.TGE == '1' || HCR_EL2.IMO == '1'));
vect_offset = 0x80;
else
assert PSTATE.EL IN {EL0, EL1};
Library pseudocode for aarch64/exceptions/asynch/AArch64.TakePhysicalSErrorException
// AArch64.TakePhysicalSErrorException()
// =====================================
AArch64.TakePhysicalSErrorException(boolean impdef_syndrome, bits(24) syndrome)
route_to_el3 = HaveEL(EL3) && SCR_EL3.EA == '1';

(HCR_EL2.TGE == '1' || (!IsInHost() && HCR_EL2.AMO == '1')));
exception = ExceptionSyndrome(Exception_SError);
exception.syndrome<24> = if impdef_syndrome then '1' else '0';
exception.syndrome<23:0> = syndrome;

else
Library pseudocode for aarch64/exceptions/asynch/AArch64.TakeVirtualFIQException
// AArch64.TakeVirtualFIQException()
// =================================
AArch64.TakeVirtualFIQException()
assert HCR_EL2.TGE == '0' && HCR_EL2.FMO == '1'; // Virtual IRQ enabled if TGE==0 and FMO==1


Library pseudocode for aarch64/exceptions/asynch/AArch64.TakeVirtualIRQException
// AArch64.TakeVirtualIRQException()
// =================================
AArch64.TakeVirtualIRQException()
assert HCR_EL2.TGE == '0' && HCR_EL2.IMO == '1'; // Virtual IRQ enabled if TGE==0 and IMO==1

vect_offset = 0x80;
Library pseudocode for aarch64/exceptions/asynch/AArch64.TakeVirtualSErrorException
// AArch64.TakeVirtualSErrorException()
// ====================================
AArch64.TakeVirtualSErrorException(boolean impdef_syndrome, bits(24) syndrome)

assert HCR_EL2.TGE == '0' && HCR_EL2.AMO == '1'; // Virtual SError enabled if TGE==0 and AMO==1

exception = ExceptionSyndrome(Exception_SError);
if HaveRASExt() then
exception.syndrome<24> = VSESR_EL2.IDS;
exception.syndrome<23:0> = VSESR_EL2.ISS;
else
exception.syndrome<24> = if impdef_syndrome then '1' else '0';
if impdef_syndrome then exception.syndrome<23:0> = syndrome;
ClearPendingVirtualSError();
Library pseudocode for aarch64/exceptions/debug/AArch64.BreakpointException
// AArch64.BreakpointException()
// =============================
AArch64.BreakpointException(FaultRecord fault)

(HCR_EL2.TGE == '1' || MDCR_EL2.TDE == '1'));

vect_offset = 0x0;

exception = AArch64.AbortSyndrome(Exception_Breakpoint, fault, vaddress);

else

Library pseudocode for aarch64/exceptions/debug/AArch64.SoftwareBreakpoint
// AArch64.SoftwareBreakpoint()
// ============================
AArch64.SoftwareBreakpoint(bits(16) immediate)
route_to_el2 = (PSTATE.EL IN {EL0, EL1} &&

EL2Enabled() && (HCR_EL2.TGE == '1' || MDCR_EL2.TDE == '1'));

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_SoftwareBreakpoint);

else
Library pseudocode for aarch64/exceptions/debug/AArch64.SoftwareStepException
// AArch64.SoftwareStepException()
// ===============================
AArch64.SoftwareStepException()


vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_SoftwareStep);
if SoftwareStep_DidNotStep() then
else
exception.syndrome<6> = if SoftwareStep_SteppedEX() then '1' else '0';

else
Library pseudocode for aarch64/exceptions/debug/AArch64.VectorCatchException
// AArch64.VectorCatchException()
// ==============================
// Vector Catch taken from EL0 or EL1 to EL2. This can only be called when debug exceptions are
// being routed to EL2, as Vector Catch is a legacy debug event.
AArch64.VectorCatchException(FaultRecord fault)
assert EL2Enabled() && (HCR_EL2.TGE == '1' || MDCR_EL2.TDE == '1');

vect_offset = 0x0;

exception = AArch64.AbortSyndrome(Exception_VectorCatch, fault, vaddress);

Library pseudocode for aarch64/exceptions/debug/AArch64.WatchpointException
// AArch64.WatchpointException()
// =============================
AArch64.WatchpointException(bits(64) vaddress, FaultRecord fault)



vect_offset = 0x0;

exception = AArch64.AbortSyndrome(Exception_NV2Watchpoint, fault, vaddress);
else
exception = AArch64.AbortSyndrome(Exception_Watchpoint, fault, vaddress);

else

Library pseudocode for aarch64/exceptions/exceptions/AArch64.ExceptionClass
// AArch64.ExceptionClass()
// ========================
// Returns the Exception Class and Instruction Length fields to be reported in ESR
(integer,bit) AArch64.ExceptionClass(Exception exceptype, bits(2) target_el)
il = if ThisInstrLength() == 32 then '1' else '0';

assert from_32 || il == '1'; // AArch64 instructions always 32-bit
case exceptype of
when Exception_Uncategorized ec = 0x00; il = '1';
when Exception_WFxTrap ec = 0x01;
when Exception_CP15RTTrap ec = 0x03; assert from_32;
when Exception_CP15RRTTrap ec = 0x04; assert from_32;
when Exception_CP14RTTrap ec = 0x05; assert from_32;
when Exception_CP14DTTrap ec = 0x06; assert from_32;
when Exception_AdvSIMDFPAccessTrap ec = 0x07;
when Exception_FPIDTrap ec = 0x08;
when Exception_PACTrap ec = 0x09;
when Exception_CP14RRTTrap ec = 0x0C; assert from_32;
when Exception_BranchTarget ec = 0x0D;
when Exception_IllegalState ec = 0x0E; il = '1';
when Exception_SupervisorCall ec = 0x11;
when Exception_HypervisorCall ec = 0x12;
when Exception_MonitorCall ec = 0x13;
when Exception_SystemRegisterTrap ec = 0x18; assert !from_32;
when Exception_SVEAccessTrap ec = 0x19; assert !from_32;
when Exception_ERetTrap ec = 0x1A;
when Exception_PACFail ec = 0x1C;
when Exception_InstructionAbort ec = 0x20; il = '1';
when Exception_PCAlignment ec = 0x22; il = '1';
when Exception_DataAbort ec = 0x24;
when Exception_NV2DataAbort ec = 0x25;
when Exception_SPAlignment ec = 0x26; il = '1'; assert !from_32;
when Exception_FPTrappedException ec = 0x28;
when Exception_SError ec = 0x2F; il = '1';
when Exception_Breakpoint ec = 0x30; il = '1';
when Exception_SoftwareStep ec = 0x32; il = '1';
when Exception_Watchpoint ec = 0x34; il = '1';
when Exception_NV2Watchpoint ec = 0x35; il = '1';
when Exception_SoftwareBreakpoint ec = 0x38;
when Exception_VectorCatch ec = 0x3A; il = '1'; assert from_32;
if ec IN {0x20,0x24,0x30,0x32,0x34} && target_el == PSTATE.EL then

ec = ec + 1;
if ec IN {0x11,0x12,0x13,0x28,0x38} && !from_32 then

ec = ec + 4;
return (ec,il);

Library pseudocode for aarch64/exceptions/exceptions/AArch64.ReportException
// AArch64.ReportException()
// =========================
// Report syndrome information for exception taken to AArch64 state.
AArch64.ReportException(ExceptionRecord exception, bits(2) target_el)
Exception exceptype = exception.exceptype;
(ec,il) = AArch64.ExceptionClass(exceptype, target_el);

iss = exception.syndrome;
// IL is not valid for Data Abort exceptions without valid instruction syndrome information
if ec IN {0x24,0x25} && iss<24> == '0' then
il = '1';
ESR[target_el] = ec<5:0>:il:iss;
if exceptype IN {Exception_InstructionAbort, Exception_PCAlignment, Exception_DataAbort,

Exception_NV2DataAbort, Exception_NV2Watchpoint,
Exception_Watchpoint} then
FAR[target_el] = exception.vaddress;
else
FAR[target_el] = bits(64) UNKNOWN;

if exception.ipavalid then
HPFAR_EL2<43:4> = exception.ipaddress<51:12>;
if IsSecureEL2Enabled() && IsSecure() then
HPFAR_EL2.NS = exception.NS;
else
HPFAR_EL2.NS = '0';
else
HPFAR_EL2<43:4> = bits(40) UNKNOWN;
return;
Library pseudocode for aarch64/exceptions/exceptions/AArch64.ResetControlRegisters
// Resets System registers and memory-mapped control registers that have architecturally-defined
// reset values to those values.
AArch64.ResetControlRegisters(boolean cold_reset);

Library pseudocode for aarch64/exceptions/exceptions/AArch64.TakeReset
// AArch64.TakeReset()
// ===================
// Reset into AArch64 state
AArch64.TakeReset(boolean cold_reset)
assert !HighestELUsingAArch32();
// Enter the highest implemented Exception level in AArch64 state

PSTATE.nRW = '0';
if HaveEL(EL3) then
PSTATE.EL = EL3;
PSTATE.EL = EL2;
else
PSTATE.EL = EL1;
// Reset the system registers and other system components

AArch64.ResetControlRegisters(cold_reset);
// Reset all other PSTATE fields

PSTATE.SP = '1'; // Select stack pointer
PSTATE.<D,A,I,F> = '1111'; // All asynchronous exceptions masked
PSTATE.SS = '0'; // Clear software step bit
PSTATE.DIT = '0'; // PSTATE.DIT is reset to 0 when resetting into AArch64
PSTATE.IL = '0'; // Clear Illegal Execution state bit
// All registers, bits and fields not reset by the above pseudocode or by the BranchTo() call
// below are UNKNOWN bitstrings after reset. In particular, the return information registers
// ELR_ELx and SPSR_ELx have UNKNOWN values, so that it
// is impossible to return from a reset in an architecturally defined way.
AArch64.ResetGeneralRegisters();
AArch64.ResetSIMDFPRegisters();
AArch64.ResetSpecialRegisters();
ResetExternalDebugRegisters(cold_reset);
bits(64) rv; // IMPLEMENTATION DEFINED reset vector
if HaveEL(EL3) then
rv = RVBAR_EL3;
rv = RVBAR_EL2;
else
rv = RVBAR_EL1;
// The reset vector must be correctly aligned

assert IsZero(rv<63:PAMax()>) && IsZero(rv<1:0>);
BranchTo(rv, BranchType_RESET);

Library pseudocode for aarch64/exceptions/ieeefp/AArch64.FPTrappedException
// AArch64.FPTrappedException()
// ============================
AArch64.FPTrappedException(boolean is_ase, integer element, bits(8) accumulated_exceptions)

exception = ExceptionSyndrome(Exception_FPTrappedException);
if is_ase then
if boolean IMPLEMENTATION_DEFINED "vector instructions set TFV to 1" then
exception.syndrome<23> = '1'; // TFV
else
else
exception.syndrome<10:8> = bits(3) UNKNOWN; // VECITR
if exception.syndrome<23> == '1' then
exception.syndrome<7,4:0> = accumulated_exceptions<7,4:0>; // IDF,IXF,UFF,OFF,DZF,IOF
else
exception.syndrome<7,4:0> = bits(6) UNKNOWN;
route_to_el2 = EL2Enabled() && HCR_EL2.TGE == '1';

vect_offset = 0x0;

else
Library pseudocode for aarch64/exceptions/syscalls/AArch64.CallHypervisor
// AArch64.CallHypervisor()
// ========================
// Performs a HVC call
AArch64.CallHypervisor(bits(16) immediate)
assert HaveEL(EL2);
if UsingAArch32() then AArch32.ITAdvance();

SSAdvance();
vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_HypervisorCall);

else

Library pseudocode for aarch64/exceptions/syscalls/AArch64.CallSecureMonitor
// AArch64.CallSecureMonitor()
// ===========================
AArch64.CallSecureMonitor(bits(16) immediate)
assert HaveEL(EL3) && !ELUsingAArch32(EL3);
SSAdvance();
vect_offset = 0x0;
Library pseudocode for aarch64/exceptions/syscalls/AArch64.CallSupervisor
// AArch64.CallSupervisor()
// ========================
// Calls the Supervisor
AArch64.CallSupervisor(bits(16) immediate)

SSAdvance();

vect_offset = 0x0;

else

Library pseudocode for aarch64/exceptions/takeexception/AArch64.TakeException

// AArch64.TakeException()
// =======================
// Take an exception to an Exception Level using AArch64.
AArch64.TakeException(bits(2) target_el, ExceptionRecord exception,

bits(64) preferred_exception_return, integer vect_offset)
assert HaveEL(target_el) && !ELUsingAArch32(target_el) && UInt(target_el) >= UInt(PSTATE.EL);
sync_errors = HaveIESB() && SCTLR[target_el].IESB == '1';

if HaveDoubleFaultExt() then
sync_errors = sync_errors || (SCR_EL3.EA == '1' && SCR_EL3.NMEA == '1' && target_el == EL3);
if sync_errors && InsertIESBBeforeException(target_el) then
iesb_req = FALSE;
sync_errors = FALSE;
TakeUnmaskedPhysicalSErrorInterrupts(iesb_req);
// If coming from AArch32 state, the top parts of the X[] registers might be set to zero
if from_32 then AArch64.MaybeZeroRegisterUppers();
MaybeZeroSVEUppers(target_el);
if UInt(target_el) > UInt(PSTATE.EL) then

boolean lower_32;
if EL2Enabled() then
lower_32 = ELUsingAArch32(EL2);
else
elsif IsInHost() && PSTATE.EL == EL0 && target_el == EL2 then
else
lower_32 = ELUsingAArch32(target_el - 1);
vect_offset = vect_offset + (if lower_32 then 0x600 else 0x400);
elsif PSTATE.SP == '1' then

vect_offset = vect_offset + 0x200;
if PSTATE.EL == EL1 && target_el == EL1 && EL2Enabled() then

if HaveNV2Ext() && (HCR_EL2.<NV,NV1,NV2> == '100' || HCR_EL2.<NV,NV1,NV2> == '111') then
spsr<3:2> = '10';
else
if HaveNVExt() && HCR_EL2.<NV,NV1> == '10' then
spsr<3:2> = '10';
// SPSR[].BTYPE is only guaranteed valid for these exception types
if exception.exceptype IN {Exception_SError, Exception_IRQ, Exception_FIQ,
Exception_SoftwareStep, Exception_PCAlignment,
Exception_InstructionAbort, Exception_Breakpoint,
Exception_VectorCatch, Exception_SoftwareBreakpoint,
Exception_IllegalState, Exception_BranchTarget} then
zero_btype = FALSE;
else
zero_btype = ConstrainUnpredictableBool(Unpredictable_ZEROBTYPE);
if zero_btype then spsr<11:10> = '00';
if HaveNV2Ext() && exception.exceptype == Exception_NV2DataAbort && target_el == EL3 then

// external aborts are configured to be taken to EL3
exception.exceptype = Exception_DataAbort;
if !(exception.exceptype IN {Exception_IRQ, Exception_FIQ}) then
AArch64.ReportException(exception, target_el);
PSTATE.EL = target_el;
PSTATE.nRW = '0';
PSTATE.SP = '1';

SPSR[] = spsr;
ELR[] = preferred_exception_return;
PSTATE.SS = '0';
PSTATE.<D,A,I,F> = '1111';
PSTATE.IL = '0';
if from_32 then // Coming from AArch32
PSTATE.IT = '00000000';
if (HavePANExt() && (PSTATE.EL == EL1 || (PSTATE.EL == EL2 && ELIsInHost(EL0))) &&
SCTLR[].SPAN == '0') then
PSTATE.PAN = '1';
if HaveUAOExt() then PSTATE.UAO = '0';
if HaveBTIExt() then PSTATE.BTYPE = '00';
if HaveSSBSExt() then PSTATE.SSBS = SCTLR[].DSSBS;
if HaveMTEExt() then PSTATE.TCO = '1';
BranchTo(VBAR[]<63:11>:vect_offset<10:0>, BranchType_EXCEPTION);
if sync_errors then
iesb_req = TRUE;
TakeUnmaskedPhysicalSErrorInterrupts(iesb_req);
EndOfInstruction();
Library pseudocode for aarch64/exceptions/traps/AArch64.AArch32SystemAccessTrap
// AArch64.AArch32SystemAccessTrap()
// =================================
// Trapped AARCH32 system register access.
AArch64.AArch32SystemAccessTrap(bits(2) target_el, integer ec)

assert HaveEL(target_el) && target_el != EL0 && UInt(target_el) >= UInt(PSTATE.EL);

vect_offset = 0x0;
exception = AArch64.AArch32SystemAccessTrapSyndrome(ThisInstr(), ec);


Library pseudocode for aarch64/exceptions/traps/AArch64.AArch32SystemAccessTrapSyndrome

// AArch64.AArch32SystemAccessTrapSyndrome()
// =========================================
// Returns the syndrome information for traps on AArch32 MCR, MCRR, MRC, MRRC, and VMRS, VMSR instruction
// other than traps that are due to HCPTR or CPACR.
ExceptionRecord AArch64.AArch32SystemAccessTrapSyndrome(bits(32) instr, integer ec)

case ec of
when 0x0 exception = ExceptionSyndrome(Exception_Uncategorized);
when 0x4 exception = ExceptionSyndrome(Exception_CP15RRTTrap);
when 0x6 exception = ExceptionSyndrome(Exception_CP14DTTrap);
when 0x7 exception = ExceptionSyndrome(Exception_AdvSIMDFPAccessTrap);
when 0x8 exception = ExceptionSyndrome(Exception_FPIDTrap);
when 0xC exception = ExceptionSyndrome(Exception_CP14RRTTrap);
if exception.exceptype IN {Exception_FPIDTrap, Exception_CP14RTTrap, Exception_CP15RTTrap} then

// Trapped MRC/MCR, VMRS on FPSID
if exception.exceptype != Exception_FPIDTrap then // When trap is not for VMRS
iss<19:17> = instr<7:5>; // opc2
iss<16:14> = instr<23:21>; // opc1
iss<13:10> = instr<19:16>; // CRn
iss<4:1> = instr<3:0>; // CRm
else
iss<19:17> = '000';
iss<16:14> = '111';
iss<13:10> = instr<19:16>; // reg
iss<4:1> = '0000';
if instr<20> == '1' && instr<15:12> == '1111' then // MRC, Rt==15

iss<9:5> = '11111';
elsif instr<20> == '0' && instr<15:12> == '1111' then // MCR, Rt==15
else
iss<9:5> = LookUpRIndex(UInt(instr<15:12>), PSTATE.M)<4:0>;
elsif exception.exceptype IN {Exception_CP14RRTTrap, Exception_AdvSIMDFPAccessTrap, Exception_CP15RRT
// Trapped MRRC/MCRR, VMRS/VMSR
iss<19:16> = instr<7:4>; // opc1
if instr<19:16> == '1111' then // Rt2==15
else
if instr<15:12> == '1111' then // Rt==15

else
iss<4:1> = instr<3:0>; // CRm
elsif exception.exceptype == Exception_CP14DTTrap then
// Trapped LDC/STC
iss<19:12> = instr<7:0>; // imm8
iss<4> = instr<23>; // U
iss<2:1> = instr<24,21>; // P,W
if instr<19:16> == '1111' then // Rn==15, LDC(Literal addressing)/STC
iss<3> = '1';
elsif exception.exceptype == Exception_Uncategorized then
// Trapped for unknown reason
iss<9:5> = LookUpRIndex(UInt(instr<19:16>), PSTATE.M)<4:0>; // Rn
iss<3> = '0';
iss<0> = instr<20>; // Direction
exception.syndrome<19:0> = iss;

return exception;
Library pseudocode for aarch64/exceptions/traps/AArch64.AdvSIMDFPAccessTrap
// AArch64.AdvSIMDFPAccessTrap()
// =============================
// Trapped access to Advanced SIMD or FP registers due to CPACR[].
AArch64.AdvSIMDFPAccessTrap(bits(2) target_el)
vect_offset = 0x0;
route_to_el2 = (target_el == EL1 && EL2Enabled() && HCR_EL2.TGE == '1');
else
return;
Library pseudocode for aarch64/exceptions/traps/AArch64.CheckCP15InstrCoarseTraps
// AArch64.CheckCP15InstrCoarseTraps()
// ===================================
// Check for coarse-grained AArch32 CP15 traps in HSTR_EL2 and HCR_EL2.
boolean AArch64.CheckCP15InstrCoarseTraps(integer CRn, integer nreg, integer CRm)
// Check for coarse-grained Hyp traps

if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then
// Check for MCR, MRC, MCRR and MRRC disabled by HSTR_EL2<CRn/CRm>
major = if nreg == 1 then CRn else CRm;
if !IsInHost() && !(major IN {4,14}) && HSTR_EL2<major> == '1' then
return TRUE;
// Check for MRC and MCR disabled by HCR_EL2.TIDCP

if (HCR_EL2.TIDCP == '1' && nreg == 1 &&
((CRn == 9 && CRm IN {0,1,2, 5,6,7,8 }) ||
(CRn == 10 && CRm IN {0,1, 4, 8 }) ||
(CRn == 11 && CRm IN {0,1,2,3,4,5,6,7,8,15}))) then
return TRUE;
return FALSE;
Library pseudocode for aarch64/exceptions/traps/AArch64.CheckFPAdvSIMDEnabled
// AArch64.CheckFPAdvSIMDEnabled()
// ===============================
// Check against CPACR[]
AArch64.CheckFPAdvSIMDEnabled()
if PSTATE.EL IN {EL0, EL1} && !IsInHost() then
// Check if access disabled in CPACR_EL1
case CPACR_EL1.FPEN of
when 'x0' disabled = TRUE;
when '01' disabled = PSTATE.EL == EL0;
if disabled then AArch64.AdvSIMDFPAccessTrap(EL1);
AArch64.CheckFPAdvSIMDTrap(); // Also check against CPTR_EL2 and CPTR_EL3

Library pseudocode for aarch64/exceptions/traps/AArch64.CheckFPAdvSIMDTrap
// AArch64.CheckFPAdvSIMDTrap()
// ============================
// Check against CPTR_EL2 and CPTR_EL3.
AArch64.CheckFPAdvSIMDTrap()
if PSTATE.EL IN {EL0, EL1, EL2} && EL2Enabled() then
if HaveVirtHostExt() && HCR_EL2.E2H == '1' then
case CPTR_EL2.FPEN of
when 'x0' disabled = TRUE;
when '01' disabled = PSTATE.EL == EL0 && HCR_EL2.TGE == '1';
if disabled then AArch64.AdvSIMDFPAccessTrap(EL2);
else
if HaveEL(EL3) then
return;
Library pseudocode for aarch64/exceptions/traps/AArch64.CheckForERetTrap
// AArch64.CheckForERetTrap()
// ==========================
// Check for trap on ERET, ERETAA, ERETAB instruction
AArch64.CheckForERetTrap(boolean eret_with_pac, boolean pac_uses_key_a)
// Non-secure EL1 execution of ERET, ERETAA, ERETAB when either HCR_EL2.NV or HFGITR_EL2.ERET is set,
// is trapped to EL2
route_to_el2 = PSTATE.EL == EL1 && EL2Enabled() && ( (HaveNVExt() && HCR_EL2.NV
vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_ERetTrap);
if !eret_with_pac then // ERET
exception.syndrome<0> = '0'; // RES0
else
if pac_uses_key_a then // ERETAA
else // ERETAB

Library pseudocode for aarch64/exceptions/traps/AArch64.CheckForSMCUndefOrTrap
// AArch64.CheckForSMCUndefOrTrap()
// ================================
// Check for UNDEFINED or trap on SMC instruction
AArch64.CheckForSMCUndefOrTrap(bits(16) imm)
route_to_el2 = PSTATE.EL == EL1 && EL2Enabled() && HCR_EL2.TSC == '1';
if !HaveEL(EL3) then
if PSTATE.EL == EL1 && EL2Enabled() then
if HaveNVExt() && HCR_EL2.NV == '1' && HCR_EL2.TSC == '1' then
route_to_el2 = TRUE;
else
UNDEFINED;
else
UNDEFINED;
else
route_to_el2 = PSTATE.EL == EL1 && EL2Enabled() && HCR_EL2.TSC == '1';
vect_offset = 0x0;
exception.syndrome<15:0> = imm;
Library pseudocode for aarch64/exceptions/traps/AArch64.CheckForSVCTrap
// AArch64.CheckForSVCTrap()
// =========================
// Check for trap on SVC instruction
AArch64.CheckForSVCTrap(bits(16) immediate)
if HaveFGTExt() then
route_to_el2 = !ELUsingAArch32(EL0) && !ELUsingAArch32(EL1) && EL2Enabled() && HFGITR_EL2.SV
elsif PSTATE.EL == EL1 then

route_to_el2 = !ELUsingAArch32(EL1) && EL2Enabled() && HFGITR_EL2.SVC_EL1 == '1' &&
vect_offset = 0x0;
Library pseudocode for aarch64/exceptions/traps/AArch64.CheckForWFxTrap
// AArch64.CheckForWFxTrap()
// =========================
// Check for trap on WFE or WFI instruction
AArch64.CheckForWFxTrap(bits(2) target_el, boolean is_wfe)

assert HaveEL(target_el);
case target_el of
when EL1
trap = (if is_wfe then SCTLR[].nTWE else SCTLR[].nTWI) == '0';
when EL2
trap = (if is_wfe then HCR_EL2.TWE else HCR_EL2.TWI) == '1';
when EL3
trap = (if is_wfe then SCR_EL3.TWE else SCR_EL3.TWI) == '1';
if trap then
AArch64.WFxTrap(target_el, is_wfe);

Library pseudocode for aarch64/exceptions/traps/AArch64.CheckIllegalState
// AArch64.CheckIllegalState()
// ===========================
// Check PSTATE.IL bit and generate Illegal Execution state exception if set.
AArch64.CheckIllegalState()
if PSTATE.IL == '1' then

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_IllegalState);

else
Library pseudocode for aarch64/exceptions/traps/AArch64.MonitorModeTrap
// AArch64.MonitorModeTrap()
// =========================
// Trapped use of Monitor mode features in a Secure EL1 AArch32 mode
AArch64.MonitorModeTrap()
vect_offset = 0x0;
if IsSecureEL2Enabled() then
Library pseudocode for aarch64/exceptions/traps/AArch64.SystemAccessTrap
// AArch64.SystemAccessTrap()
// ==========================
// Trapped access to AArch64 system register or system instruction.
AArch64.SystemAccessTrap(bits(2) target_el, integer ec)

assert HaveEL(target_el) && target_el != EL0 && UInt(target_el) >= UInt(PSTATE.EL);

vect_offset = 0x0;


Library pseudocode for aarch64/exceptions/traps/AArch64.SystemAccessTrapSyndrome
// AArch64.SystemAccessTrapSyndrome()
// ==================================
// Returns the syndrome information for traps on AArch64 MSR/MRS instructions.
ExceptionRecord AArch64.SystemAccessTrapSyndrome(bits(32) instr, integer ec)

case ec of
when 0x0 // Trapped access due to unknown rea
when 0x7 // Trapped access to SVE, Advance SI
when 0x18 // Trapped access to system register
exception = ExceptionSyndrome(Exception_SystemRegisterTrap);
exception.syndrome<21:20> = instr<20:19>; // Op0
exception.syndrome<13:10> = instr<15:12>; // CRn
exception.syndrome<9:5> = instr<4:0>; // Rt
exception.syndrome<4:1> = instr<11:8>; // CRm
exception.syndrome<0> = instr<21>; // Direction
when 0x19 // Trapped access to SVE System regi
exception = ExceptionSyndrome(Exception_SVEAccessTrap);
otherwise
Unreachable();
return exception;
Library pseudocode for aarch64/exceptions/traps/AArch64.UndefinedFault
// AArch64.UndefinedFault()
// ========================
AArch64.UndefinedFault()

vect_offset = 0x0;

else

Library pseudocode for aarch64/exceptions/traps/AArch64.WFETrapDelay
// AArch64.WFETrapDelay()
// ======================
// Returns TRUE when delay in trap to WFE is enabled with value to amount of delay,
// FALSE otherwise.
(boolean, integer) WFETrapDelay(bits(2) target_el)

case target_el of
when EL1
if !IsInHost() then
delay_enabled = SCTLR_EL1.TWEDEn == '1';
delay = 1 << (UInt(SCTLR_EL1.TWEDEL) + 8);
else
delay_enabled = SCTLR_EL2.TWEDEn == '1';
delay = 1 << (UInt(SCTLR_EL2.TWEDEL) + 8);
when EL2
delay_enabled = HCR_EL2.TWEDEn == '1';
delay = 1 << (UInt(HCR_EL2.TWEDEL) + 8);
when EL3
delay_enabled = SCR_EL3.TWEDEn == '1';
delay = 1 << (UInt(SCR_EL3.TWEDEL) + 8);
return (delay_enabled, delay);
Library pseudocode for aarch64/exceptions/traps/AArch64.WFxTrap
// AArch64.WFxTrap()
// =================
AArch64.WFxTrap(bits(2) target_el, boolean is_wfe)

assert UInt(target_el) > UInt(PSTATE.EL);

vect_offset = 0x0;
exception = ExceptionSyndrome(Exception_WFxTrap);
exception.syndrome<0> = if is_wfe then '1' else '0';
if target_el == EL1 && EL2Enabled() && HCR_EL2.TGE == '1' then

else
Library pseudocode for aarch64/exceptions/traps/AArch64.WaitForEventUntilDelay
// AArch64.WaitForEventUntilDelay()
// ================================
// Returns TRUE if WaitForEvent() returns before WFE trap delay expires,
// FALSE otherwise.
boolean AArch64.WaitForEventUntilDelay(boolean delay_enabled, integer delay)

boolean eventarrived = FALSE;
// set eventarrived to TRUE if WaitForEvent() returns before
// 'delay' expires when delay_enabled is TRUE.
return eventarrived;
Library pseudocode for aarch64/exceptions/traps/CheckFPAdvSIMDEnabled64
// CheckFPAdvSIMDEnabled64()
// =========================
// AArch64 instruction wrapper
CheckFPAdvSIMDEnabled64()

Library pseudocode for aarch64/functions/aborts/AArch64.CreateFaultRecord
// AArch64.CreateFaultRecord()
// ===========================
FaultRecord AArch64.CreateFaultRecord(Fault statuscode, bits(52) ipaddress, boolean NS,

integer level, AccType acctype, boolean write, bit extflag,
bits(2) errortype, boolean secondstage, boolean s2fs1walk)
FaultRecord fault;
fault.statuscode = statuscode;
fault.domain = bits(4) UNKNOWN; // Not used from AArch64
fault.debugmoe = bits(4) UNKNOWN; // Not used from AArch64
fault.errortype = errortype;
fault.ipaddress.NS = if NS then '1' else '0';
fault.ipaddress.address = ipaddress;
fault.level = level;
fault.acctype = acctype;
fault.write = write;
fault.extflag = extflag;
fault.secondstage = secondstage;
fault.s2fs1walk = s2fs1walk;
return fault;
Library pseudocode for aarch64/functions/aborts/AArch64.FaultSyndrome
// AArch64.FaultSyndrome()
// =======================
// Creates an exception syndrome value for Abort and Watchpoint exceptions taken to
// an Exception Level using AArch64.
bits(25) AArch64.FaultSyndrome(boolean d_side, FaultRecord fault)


if HaveRASExt() && IsExternalSyncAbort(fault) then iss<12:11> = fault.errortype; // SET
if d_side then
if IsSecondStage(fault) && !fault.s2fs1walk then iss<24:14> = LSInstructionSyndrome();
iss<13> = '1'; // Value of '1' indicates fault is generated by use of VNCR_EL2
if fault.acctype IN {AccType_DC, AccType_DC_UNPRIV, AccType_IC, AccType_AT} then
iss<8> = '1'; iss<6> = '1';
else
iss<6> = if fault.write then '1' else '0';
if IsExternalAbort(fault) then iss<9> = fault.extflag;
iss<7> = if fault.s2fs1walk then '1' else '0';
iss<5:0> = EncodeLDFSC(fault.statuscode, fault.level);
return iss;

Library pseudocode for aarch64/functions/exclusive/AArch64.ExclusiveMonitorsPass
// AArch64.ExclusiveMonitorsPass()
// ===============================
// Return TRUE if the Exclusives monitors for the current PE include all of the addresses
// associated with the virtual address region of size bytes starting at address.
// The immediately following memory write must be to the same addresses.
boolean AArch64.ExclusiveMonitorsPass(bits(64) address, integer size)
// It is IMPLEMENTATION DEFINED whether the detection of memory aborts happens

// before or after the check on the local Exclusives monitor. As a result a failure
// of the local monitor can occur on some implementations even if the memory
// access would give an memory abort.
iswrite = TRUE;
passed = AArch64.IsExclusiveVA(address, ProcessorID(), size);

if !passed then
return FALSE;

passed = IsExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);

if passed then
passed = IsExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
return passed;
Library pseudocode for aarch64/functions/exclusive/AArch64.IsExclusiveVA
// An optional IMPLEMENTATION DEFINED test for an exclusive access to a virtual

// address region of size bytes starting at address.
//
// It is permitted (but not required) for this function to return FALSE and
// cause a store exclusive to fail if the virtual address region is not
// totally included within the region recorded by MarkExclusiveVA().
//
// It is always safe to return TRUE which will check the physical address only.
boolean AArch64.IsExclusiveVA(bits(64) address, integer processorid, integer size);
Library pseudocode for aarch64/functions/exclusive/AArch64.MarkExclusiveVA
// Optionally record an exclusive access to the virtual address region of size bytes
// starting at address for processorid.
AArch64.MarkExclusiveVA(bits(64) address, integer processorid, integer size);

Library pseudocode for aarch64/functions/exclusive/AArch64.SetExclusiveMonitors
// AArch64.SetExclusiveMonitors()
// ==============================
// Sets the Exclusives monitors for the current PE to record the addresses associated
// with the virtual address region of size bytes starting at address.
AArch64.SetExclusiveMonitors(bits(64) address, integer size)
iswrite = FALSE;

return;
MarkExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
MarkExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);
AArch64.MarkExclusiveVA(address, ProcessorID(), size);
Library pseudocode for aarch64/functions/fusedrstep/FPRSqrtStepFused
// FPRSqrtStepFused()
// ==================
bits(N) FPRSqrtStepFused(bits(N) op1, bits(N) op2)

assert N IN {16, 32, 64};
bits(N) result;
op1 = FPNeg(op1);
(type1,sign1,value1) = FPUnpack(op1, FPCR);
(done,result) = FPProcessNaNs(type1, type2, op1, op2, FPCR);
if !done then
inf1 = (type1 == FPType_Infinity);
zero1 = (type1 == FPType_Zero);
result = FPOnePointFive('0');
elsif inf1 || inf2 then
result = FPInfinity(sign1 EOR sign2);
else
// Fully fused multiply-add and halve
result_value = (3.0 + (value1 * value2)) / 2.0;
if result_value == 0.0 then
// Sign of exact zero result depends on rounding mode
sign = if FPRoundingMode(FPCR) == FPRounding_NEGINF then '1' else '0';
result = FPZero(sign);
else
result = FPRound(result_value, FPCR);
return result;

Library pseudocode for aarch64/functions/fusedrstep/FPRecipStepFused
// FPRecipStepFused()
// ==================
bits(N) FPRecipStepFused(bits(N) op1, bits(N) op2)

assert N IN {16, 32, 64};
bits(N) result;
op1 = FPNeg(op1);
(done,result) = FPProcessNaNs(type1, type2, op1, op2, FPCR);
if !done then
result = FPTwo('0');
elsif inf1 || inf2 then
result = FPInfinity(sign1 EOR sign2);
else
// Fully fused multiply-add
result_value = 2.0 + (value1 * value2);
if result_value == 0.0 then
// Sign of exact zero result depends on rounding mode
sign = if FPRoundingMode(FPCR) == FPRounding_NEGINF then '1' else '0';
result = FPZero(sign);
else
result = FPRound(result_value, FPCR);
return result;
Library pseudocode for aarch64/functions/memory/AArch64.AccessIsTagChecked
// AArch64.AccessIsTagChecked()
// ============================
// TRUE if a given access is tag-checked, FALSE otherwise.
boolean AArch64.AccessIsTagChecked(bits(64) vaddr, AccType acctype)

if PSTATE.M<4> == '1' then return FALSE;
if EffectiveTBI(vaddr, FALSE, PSTATE.EL) == '0' then

return FALSE;
if EffectiveTCMA(vaddr, PSTATE.EL) == '1' && (vaddr<59:55> == '00000' || vaddr<59:55> == '11111') the

return FALSE;
if !AArch64.AllocationTagAccessIsEnabled(acctype) then
return FALSE;
if acctype IN {AccType_IFETCH, AccType_PTW} then

return FALSE;
if acctype == AccType_NV2REGISTER then

return FALSE;
if PSTATE.TCO=='1' then
return FALSE;
if !IsTagCheckedInstruction() then
return FALSE;
return TRUE;

Library pseudocode for aarch64/functions/memory/AArch64.AddressWithAllocationTag
// AArch64.AddressWithAllocationTag()
// ==================================
// Generate a 64-bit value containing a Logical Address Tag from a 64-bit
// virtual address and an Allocation Tag.
// If the extension is disabled, treats the Allocation Tag as '0000'.
bits(64) AArch64.AddressWithAllocationTag(bits(64) address, AccType acctype, bits(4) allocation_tag)

bits(64) result = address;
bits(4) tag;
if AArch64.AllocationTagAccessIsEnabled(acctype) then
tag = allocation_tag;
else
tag = '0000';
result<59:56> = tag;
return result;
Library pseudocode for aarch64/functions/memory/AArch64.AllocationTagFromAddress
// AArch64.AllocationTagFromAddress()
// ==================================
// Generate an ALlocation Tag from a 64-bit value containing a Logical Address Tag.
bits(4) AArch64.AllocationTagFromAddress(bits(64) tagged_address)

return tagged_address<59:56>;
Library pseudocode for aarch64/functions/memory/AArch64.CheckAlignment
// AArch64.CheckAlignment()
// ========================
boolean AArch64.CheckAlignment(bits(64) address, integer alignment, AccType acctype,

boolean iswrite)
aligned = (address == Align(address, alignment));

atomic = acctype IN { AccType_ATOMIC, AccType_ATOMICRW, AccType_ORDEREDATOMIC, AccType_ORDEREDATOMIC
ordered = acctype IN { AccType_ORDERED, AccType_ORDEREDRW, AccType_LIMITEDORDERED, AccType_ORDEREDATO
vector = acctype == AccType_VEC;
if SCTLR[].A == '1' then check = TRUE;
elsif HaveUA16Ext() then
check = (UInt(address<0+:4>) + alignment > 16) && ((ordered && SCTLR[].nAA == '0') || atomic);
else check = atomic || ordered;
if check && !aligned then

return aligned;
Library pseudocode for aarch64/functions/memory/AArch64.CheckTag
// AArch64.CheckTag()
// ==================
// Performs a Tag Check operation for a memory access and returns
// whether the check passed
boolean AArch64.CheckTag(AddressDescriptor memaddrdesc, bits(4) ptag, boolean write)

if memaddrdesc.memattrs.tagged then
return ptag == _MemTag[memaddrdesc];
else
return TRUE;

Library pseudocode for aarch64/functions/memory/AArch64.MemSingle
// AArch64.MemSingle[] - non-assignment (read) form

// ================================================
// Perform an atomic, little-endian read of 'size' bytes.
bits(size*8) AArch64.MemSingle[bits(64) address, integer size, AccType acctype, boolean wasaligned]

assert size IN {1, 2, 4, 8, 16};
bits(size*8) value;
iswrite = FALSE;


value = _Mem[memaddrdesc, size, accdesc];
return value;
// AArch64.MemSingle[] - assignment (write) form

// =============================================
// Perform an atomic, little-endian write of 'size' bytes.
AArch64.MemSingle[bits(64) address, integer size, AccType acctype, boolean wasaligned] = bits(size*8) val
assert size IN {1, 2, 4, 8, 16};
iswrite = TRUE;

// Effect on exclusives
ClearExclusiveByAddress(memaddrdesc.paddress, ProcessorID(), size);

_Mem[memaddrdesc, size, accdesc] = value;
return;

Library pseudocode for aarch64/functions/memory/AArch64.MemTag
// AArch64.MemTag[] - non-assignment (read) form

// =============================================
// Load an Allocation Tag from memory.
bits(4) AArch64.MemTag[bits(64) address, AccType acctype]

assert acctype == AccType_NORMAL;
bits(4) value;
iswrite = FALSE;
memaddrdesc = AArch64.TranslateAddress(address, acctype, iswrite, TRUE, TAG_GRANULE);

// Return the granule tag if tagging is enabled...

if AArch64.AllocationTagAccessIsEnabled(acctype) && memaddrdesc.memattrs.tagged then
return _MemTag[memaddrdesc];
else
// ...otherwise read tag as zero.
return '0000';
// AArch64.MemTag[] - assignment (write) form

// ==========================================
// Store an Allocation Tag to memory.
AArch64.MemTag[bits(64) address, AccType acctype] = bits(4) value

assert acctype == AccType_NORMAL;
iswrite = TRUE;
// Stores of allocation tags must be aligned

if address != Align(address, TAG_GRANULE) then
boolean secondstage = FALSE;
wasaligned = TRUE;
memaddrdesc = AArch64.TranslateAddress(address, acctype, iswrite, wasaligned, TAG_GRANULE);


if AArch64.AllocationTagAccessIsEnabled(acctype) && memaddrdesc.memattrs.tagged then
_MemTag[memaddrdesc] = value;
Library pseudocode for aarch64/functions/memory/AArch64.PhysicalTag
// AArch64.PhysicalTag()
// =====================
// Generate a Physical Tag from a Logical Tag in an address
bits(4) AArch64.PhysicalTag(bits(64) vaddr)

return vaddr<59:56>;

Library pseudocode for aarch64/functions/memory/AArch64.TranslateAddressForAtomicAccess
// AArch64.TranslateAddressForAtomicAccess()
//

Armv86bisa PDF

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Armv86bisa PDF

Transféré par

Droits d'auteur :

Formats disponibles

Arm A64 Instruction Set Architecture

Armv8, for Armv8-A architecture profile

This document may include technical inaccuracies or typographical errors.

Arm Limited. Company 02557590 registered in England.

A64 -- Base Instructions (alphabetic order)

ADC: Add with Carry.

ADCS: Add with Carry, setting flags.

ADD (extended register): Add (extended register).

ADD (immediate): Add (immediate).

ADD (shifted register): Add (shifted register).

ADDG: Add with Tag.

ADDS (extended register): Add (extended register), setting flags.

ADDS (immediate): Add (immediate), setting flags.

ADDS (shifted register): Add (shifted register), setting flags.

ADR: Form PC-relative address.

ADRP: Form PC-relative address to 4KB page.

AND (immediate): Bitwise AND (immediate).

AND (shifted register): Bitwise AND (shifted register).

ANDS (immediate): Bitwise AND (immediate), setting flags.

ANDS (shifted register): Bitwise AND (shifted register), setting flags.

ASR (immediate): Arithmetic Shift Right (immediate): an alias of SBFM.

ASR (register): Arithmetic Shift Right (register): an alias of ASRV.

ASRV: Arithmetic Shift Right Variable.

AT: Address Translate: an alias of SYS.

AUTDA, AUTDZA: Authenticate Data address, using key A.

AUTDB, AUTDZB: Authenticate Data address, using key B.

AXFLAG: Convert floating-point condition flags from Arm to external format.

B.cond: Branch conditionally.

BFC: Bitfield Clear: an alias of BFM.

BFI: Bitfield Insert: an alias of BFM.

BFM: Bitfield Move.

BFXIL: Bitfield extract and insert at low end: an alias of BFM.

BIC (shifted register): Bitwise Bit Clear (shifted register).

BL: Branch with Link.

BLR: Branch with Link to Register.

BR: Branch to Register.

BRAA, BRAAZ, BRAB, BRABZ: Branch to Register, with pointer authentication.

BRK: Breakpoint instruction.

BTI: Branch Target Identification.

CASB, CASAB, CASALB, CASLB: Compare and Swap byte in memory.

CASH, CASAH, CASALH, CASLH: Compare and Swap halfword in memory.

CBNZ: Compare and Branch on Nonzero.

CBZ: Compare and Branch on Zero.

CCMN (immediate): Conditional Compare Negative (immediate).

CCMN (register): Conditional Compare Negative (register).

CCMP (immediate): Conditional Compare (immediate).

CCMP (register): Conditional Compare (register).

CFINV: Invert Carry Flag.

CFP: Control Flow Prediction Restriction by Context: an alias of SYS.

CINC: Conditional Increment: an alias of CSINC.

CINV: Conditional Invert: an alias of CSINV.

CLREX: Clear Exclusive.

CLS: Count Leading Sign bits.

CLZ: Count Leading Zeros.

CMN (immediate): Compare Negative (immediate): an alias of ADDS (immediate).

CMP (immediate): Compare (immediate): an alias of SUBS (immediate).

CMPP: Compare with Tag: an alias of SUBPS.

CNEG: Conditional Negate: an alias of CSNEG.

CPP: Cache Prefetch Prediction Restriction by Context: an alias of SYS.

CRC32B, CRC32H, CRC32W, CRC32X: CRC32 checksum.

CRC32CB, CRC32CH, CRC32CW, CRC32CX: CRC32C checksum.

CSDB: Consumption of Speculative Data Barrier.

CSEL: Conditional Select.