AARCHMRS Schema 2.7.4

ISA Data Model User Guide

This guide introduces the AARCHMRS Instruction Set Architecture (ISA) and the underlying data models that represent it. The target audience is people who want to manipulate or read ISAs and their contents programmatically.

This guide describes a fictitious ISA, named B64, which should not be conflated with a real ISA.

Instruction.Instructions wrapper

Instructions holds all instruction-related content. It has three properties: instructions, assembly_rules, and operations.

Building the B64 instruction set

The example below includes the following concepts, contained in an Instruction.Instructions object:

{
    "_type": "Instruction.Instructions",
    "instructions": [
        {
            "_type": "Instruction.InstructionSet",
            "name": "B64",
            "read_width": 32,
            "operation_id": "unalloc",
            "encoding": {
                "_type": "Instruction.Encodeset.Encodeset",
                "width": 32,
                "values": [
                    {
                        "_type": "Instruction.Encodeset.Field",
                        "name": "op0",
                        "range": {
                            "_type": "Range",
                            "start": 25,
                            "width": 7
                        },
                        "value": {
                            "_type": "Values.Value",
                            "value": "'xxxxxxx'",
                            "meaning": null
                        },
                        "should_be_mask": {
                            "_type": "Values.Value",
                            "value": "'0000000'"
                        }
                    }
                ]
            },
            "children": []
        }
    ],
    "assembly_rules": {},
    "operations": {
        "unalloc": {
            "_type": "Instruction.Operation",
            "title": "Default Behavior",
            "operation": [
                [
                    "REG_LASTINSTRUCTION_OP = encoding[31:0];",
                    "RaiseException(UNALLOCATED);"
                ]
            ],
            "brief": "Default behavior of the B64 instruction set",
            "description": "If no child instruction is matched, the following operation is executed."
        }
    }
}

In this example, B64 is an instruction set whose encodings are 32 bits wide. Its default behavior, which applies when no child instruction is matched, is defined by the operation identified by unalloc.
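As a minimal sketch of reading this structure programmatically, the Python below loads a trimmed copy of the wrapper and follows operation_id into the operations map. The helper name resolve_operation is illustrative and is not part of the schema. Only keys that appear in the example above are used.

```python
import json

# Trimmed copy of the wrapper shown above; only the keys used here are kept.
doc = json.loads("""
{
  "_type": "Instruction.Instructions",
  "instructions": [
    {"_type": "Instruction.InstructionSet", "name": "B64",
     "read_width": 32, "operation_id": "unalloc", "children": []}
  ],
  "assembly_rules": {},
  "operations": {
    "unalloc": {"_type": "Instruction.Operation", "title": "Default Behavior"}
  }
}
""")


def resolve_operation(wrapper, node):
    """Return the Operation that a node's operation_id refers to (hypothetical helper)."""
    return wrapper["operations"][node["operation_id"]]


b64 = doc["instructions"][0]
default_op = resolve_operation(doc, b64)
print(b64["name"], b64["read_width"], default_op["title"])
```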

Note

Each instruction uses an Instruction.Encodeset.Encodeset to define its bits. For more information, see Encodeset.

Note

For ease of use from this point, instructions and operations are shown independently. In a real instruction set, they are defined within Instruction.Instructions, as described above.

Adding instruction groups and individual instructions to B64

In the examples that follow, a group, B64.arithmetic, is added. It has two children - B64.arithmetic.register and B64.arithmetic.immediate - and each has an ADD child and a SUB child.

The following is being represented:

B64 --> arithmetic --> register  --> ADD
                   |             \-> SUB
                   \-> immediate --> ADD
                                 \-> SUB

Warning

The grouping shown above is useful for representational purposes, but the following is semantically equivalent:

B64 --> ADD_register
    |-> SUB_register
    |-> ADD_immediate
    \-> SUB_immediate

No architectural meaning should be inferred from syntactic differences between semantically equivalent arrangements of the tree.

The grouping provides an organisational representation of an architecture that may contain many instructions. It also allows common fields to be defined at the group level, so that inheritance can be used to reduce duplication of information in the data.
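The relationship between the grouped tree and its leaf instructions can be sketched with a short tree walk. The Python below uses only the name and children keys and omits everything else from the schema. The helper names leaf_paths and leaf are illustrative assumptions.

```python
def leaf_paths(node, prefix=""):
    """Yield the dotted path of every node that has no children."""
    path = f"{prefix}.{node['name']}" if prefix else node["name"]
    children = node.get("children", [])
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)


def leaf(name):
    """Build a childless node (illustrative stand-in for an instruction)."""
    return {"name": name, "children": []}


grouped = {"name": "B64", "children": [
    {"name": "arithmetic", "children": [
        {"name": "register", "children": [leaf("ADD"), leaf("SUB")]},
        {"name": "immediate", "children": [leaf("ADD"), leaf("SUB")]},
    ]},
]}

paths = list(leaf_paths(grouped))
for p in paths:
    print(p)
```

The walk reaches the same four instructions whether they sit under groups or directly under B64. Only their paths differ.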

The four instructions in B64.arithmetic define the following:

The first and second operands of each are dst (the destination register) and src0 (the first source register).

The third operand of B64.arithmetic.register instructions is src1 (short for "second source register"). The third operand of B64.arithmetic.immediate instructions is imm (short for "immediate").

B64.arithmetic

The B64.arithmetic group is first defined as follows:

{
    "_type": "Instruction.InstructionGroup",
    "name": "arithmetic",
    "encoding": {
        "_type": "Instruction.Encodeset.Encodeset",
        "width": 32,
        "values": [
            {
                "_type": "Instruction.Encodeset.Bits",
                "range": {
                    "_type": "Range",
                    "start": 30,
                    "width": 2
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'00'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'00'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "op0",
                "range": {
                    "_type": "Range",
                    "start": 26,
                    "width": 4
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'xxxx'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0000'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Bits",
                "range": {
                    "_type": "Range",
                    "start": 25,
                    "width": 1
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'0'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "subtype",
                "range": {
                    "_type": "Range",
                    "start": 24,
                    "width": 1
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'x'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "dst",
                "range": {
                    "_type": "Range",
                    "start": 16,
                    "width": 8
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'xxxxxxxx'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'00000000'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "src0",
                "range": {
                    "_type": "Range",
                    "start": 8,
                    "width": 8
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'xxxxxxxx'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'00000000'"
                }
            }
        ]
    },
    "children": []
}

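In the above, bits [31:30] and bit [25] are fixed to 0 using Instruction.Encodeset.Bits. The named fields are op0 at bits [29:26], subtype at bit [24], dst at bits [23:16], and src0 at bits [15:8]. Bits [7:0] are not defined at this level and are left to the child groups.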
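The Range objects in the B64.arithmetic encoding can be converted to bit positions with a small helper. This sketch assumes that start is the least significant bit of the range, which is the reading under which the ranges in this guide do not overlap. The helper name bit_span and the labels given to the unnamed Bits entries are illustrative.

```python
def bit_span(rng):
    """Convert a Range (start, width) into an inclusive (msb, lsb) pair."""
    lsb = rng["start"]
    return lsb + rng["width"] - 1, lsb


# (label, Range) pairs copied from the B64.arithmetic encoding.
arithmetic = [
    ("fixed '00'", {"start": 30, "width": 2}),
    ("op0",        {"start": 26, "width": 4}),
    ("fixed '0'",  {"start": 25, "width": 1}),
    ("subtype",    {"start": 24, "width": 1}),
    ("dst",        {"start": 16, "width": 8}),
    ("src0",       {"start": 8,  "width": 8}),
]

layout = {label: bit_span(rng) for label, rng in arithmetic}
for label, (msb, lsb) in layout.items():
    print(f"[{msb}:{lsb}] {label}")

# Bits described at this level; the rest of the 32 bits are left to children.
covered = sum(rng["width"] for _, rng in arithmetic)
print("bits covered:", covered)
```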

B64.arithmetic.register and B64.arithmetic.immediate

B64.arithmetic.register (a child of B64.arithmetic) is shown below:

{
    "_type": "Instruction.InstructionGroup",
    "name": "register",
    "encoding": {
        "_type": "Instruction.Encodeset.Encodeset",
        "width": 32,
        "values": [
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "subtype",
                "range": {
                    "_type": "Range",
                    "start": 24,
                    "width": 1
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'0'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "src1",
                "range": {
                    "_type": "Range",
                    "start": 0,
                    "width": 8
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'xxxxxxxx'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'00000000'"
                }
            }
        ]
    },
    "children": []
}

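In the above, the subtype field inherited from B64.arithmetic is constrained to '0', and a new field, src1, is defined at bits [7:0].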

B64.arithmetic.immediate (another child of B64.arithmetic) is shown below:

{
    "_type": "Instruction.InstructionGroup",
    "name": "immediate",
    "encoding": {
        "_type": "Instruction.Encodeset.Encodeset",
        "width": 32,
        "values": [
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "subtype",
                "range": {
                    "_type": "Range",
                    "start": 24,
                    "width": 1
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'1'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "invert",
                "range": {
                    "_type": "Range",
                    "start": 7,
                    "width": 1
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'x'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0'"
                }
            },
            {
                "_type": "Instruction.Encodeset.Field",
                "name": "imm",
                "range": {
                    "_type": "Range",
                    "start": 0,
                    "width": 7
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'xxxxxxx'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'0000000'"
                }
            }
        ]
    },
    "children": []
}

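In the above, the subtype field inherited from B64.arithmetic is constrained to '1', and two new fields are defined: invert at bit [7] and imm at bits [6:0].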
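One way to picture how a child group refines its parent is to overlay the child's values on the parent's pattern. The Python below is a sketch under two assumptions that this guide does not state: start is the least significant bit of each range, and an 'x' in a child never clears a bit that an ancestor has fixed. The function name overlay is illustrative.

```python
def overlay(width, *levels):
    """Overlay encoding levels, outermost first, into one pattern string (MSB first)."""
    bits = ["x"] * width              # bits[i] holds bit i; index 0 is the LSB
    for level in levels:
        for start, value in level:
            # Values are written MSB first, so walk them from the LSB end.
            for i, ch in enumerate(reversed(value)):
                if ch != "x":         # assumption: 'x' leaves inherited bits alone
                    bits[start + i] = ch
    return "".join(reversed(bits))


# (start, value) pairs copied from the three encodings above, quotes removed.
arithmetic = [(30, "00"), (26, "xxxx"), (25, "0"), (24, "x"),
              (16, "x" * 8), (8, "x" * 8)]
register = [(24, "0"), (0, "x" * 8)]
immediate = [(24, "1"), (7, "x"), (0, "x" * 7)]

register_pattern = overlay(32, arithmetic, register)
immediate_pattern = overlay(32, arithmetic, immediate)
print(register_pattern)
print(immediate_pattern)
```

Under these assumptions the two groups differ only in bit [24], the subtype field.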

B64.arithmetic.register.ADD

ADD_reg operation

To reduce duplication of information in the data, Instruction.Operation contains a single definition of execute behavior that can be attached to more than one instruction.

Note

This is only a compression technique - defining the same behavior independently for each instruction would be semantically equivalent.

The following shows the "ADD (register)" operation (referenced as ADD_reg):

{
    "ADD_reg": {
        "_type": "Instruction.Operation",
        "title": "ADD (register)",
        "operation": "R[d] = R[s0] + R[s1]",
        "brief": "Add values",
        "description": "Add registers at locations `s0` and `s1` and store the result in the register located at `d`."
    }
}

Note

The names d, s0, and s1 come from the decode in the instruction, which is shown in the next section. The R refers to a general-purpose register in the B64 architecture.
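The sharing of one operation by several instructions can be sketched as a lookup table keyed by operation_id. In the Python below, the function add_reg stands in for the operation text, and the 256-entry register file is an assumption based on the 8-bit register fields. Register width and overflow are not modelled. DOUBLE is the alias introduced later in this guide.

```python
def add_reg(R, d, s0, s1):
    """Python stand-in for the operation text R[d] = R[s0] + R[s1]."""
    R[d] = R[s0] + R[s1]


# One definition, referenced by more than one instruction via operation_id.
operations = {"ADD_reg": add_reg}
instructions = [
    {"name": "ADD", "operation_id": "ADD_reg"},
    {"name": "DOUBLE", "operation_id": "ADD_reg"},
]

R = [0] * 256            # assumption: 8-bit register index, so 256 registers
R[6], R[9] = 2, 3

execute = operations[instructions[0]["operation_id"]]
execute(R, 5, 6, 9)      # d=5, s0=6, s1=9
print(R[5])
```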

ADD instruction

The ADD instruction is defined under B64.arithmetic.register and then connected to its operation via the operation_id key "ADD_reg":

{
    "_type": "Instruction.Instruction",
    "name": "ADD",
    "operation_id": "ADD_reg",
    "decode": [
        [
            "integer d = UInt(dst);",
            "integer s0 = UInt(src0);",
            "integer s1 = UInt(src1);"
        ]
    ],
    "encoding": {
        "_type": "Instruction.Encodeset.Encodeset",
        "width": 32,
        "values": [
            {
                "_type": "Instruction.Encodeset.Bits",
                "range": {
                    "_type": "Range",
                    "start": 26,
                    "width": 4
                },
                "value": {
                    "_type": "Values.Value",
                    "value": "'0000'",
                    "meaning": null
                },
                "should_be_mask": {
                    "_type": "Values.Value",
                    "value": "'1000'"
                }
            }
        ]
    },
    "properties": {
        "operands": {
            "destination": {
                "isread": false,
                "iswritten": true,
                "index": null
            },
            "source0": {
                "isread": true,
                "iswritten": false,
                "index": null
            },
            "source1": {
                "isread": true,
                "iswritten": false,
                "index": null
            }
        }
    },
    "assembly": {
        "_type": "Instruction.Assembly",
        "symbols": [
            {
                "_type": "Instruction.Symbols.Literal",
                "value": "ADD"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "SPACE"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "Rd"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "COMMA"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "Rs0"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "COMMA"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "Rs1"
            }
        ]
    },
    "assemble": [
        "dst = operands.destination.index[7:0];",
        "src0 = operands.source0.index[7:0];",
        "src1 = operands.source1.index[7:0];"
    ],
    "disassemble": [
        "operands.destination.index = UInt(dst);",
        "operands.source0.index = UInt(src0);",
        "operands.source1.index = UInt(src1);"
    ],
    "children": []
}

The example above includes the following concepts:

R[d] = R[s0] + R[s1]
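The decode step of ADD (register) can be sketched as matching an instruction word against the fixed bits and then extracting the named fields. The pattern below assumes that the fixed bits of B64.arithmetic, B64.arithmetic.register, and ADD combine so that bits [31:24] are all zero. The helper names field and matches are illustrative.

```python
def field(word, start, width):
    """Extract an unsigned bit field, mirroring UInt() on an encoding field."""
    return (word >> start) & ((1 << width) - 1)


def matches(word, pattern):
    """True if every fixed bit of pattern (MSB first, 'x' = any) equals the word's bit."""
    width = len(pattern)
    return all(p == "x" or int(p) == (word >> (width - 1 - i)) & 1
               for i, p in enumerate(pattern))


# Assumed combined pattern: fixed bits [31:24] are 0, the rest are operand fields.
ADD_REG_PATTERN = "00000000" + "x" * 24

word = (5 << 16) | (6 << 8) | 9       # dst=5, src0=6, src1=9
is_add = matches(word, ADD_REG_PATTERN)

# Mirrors the decode block: d = UInt(dst), s0 = UInt(src0), s1 = UInt(src1).
d, s0, s1 = field(word, 16, 8), field(word, 8, 8), field(word, 0, 8)
print(hex(word), is_add, d, s0, s1)
```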

Adding assembly to B64

An assembly layer can be added to an instruction, defining a human-friendly representation of the underlying machine instruction. There might not be a one-to-one relationship between an assembly symbol and an encoding field. For this reason, the properties section of an Instruction.Instruction is used to hold intermediate assembly data, such as the operands entries in the ADD example.

Here are some of the possible ways to describe the "ADD (register)" assembly:

ADD R5, R6, R9
ADD R[5], R[6], R[9]
ADD register 6 and register 9 and store the result in register 5
R5 = R6 + R9

This User Guide uses the first assembly notation - ADD R5, R6, R9. To define the assembly in the B64.arithmetic.register.ADD instruction, the following are defined:

{
    "COMMA": {
        "_type": "Instruction.Rules.Token",
        "pattern": ",\\s+",
        "default": ", "
    },
    "SPACE": {
        "_type": "Instruction.Rules.Token",
        "pattern": "\\s+",
        "default": " "
    },
    "UInteger": {
        "_type": "Instruction.Rules.Token",
        "pattern": "[1-9][0-9]*|0"
    },
    "Rd": {
        "_type": "Instruction.Rules.Rule",
        "display": "Rd",
        "description": "Is the index of the general-purpose destination register.",
        "symbols": {
            "_type": "Instruction.Assembly",
            "symbols": [
                {
                    "_type": "Instruction.Symbols.Literal",
                    "value": "R"
                },
                {
                    "_type": "Instruction.Symbols.RuleReference",
                    "rule_id": "UInteger"
                }
            ]
        },
        "assemble": "operands.destination.index = UInteger;",
        "disassemble": "return \"R\" ++ operands.destination.index;"
    },
    "Rs0": {
        "_type": "Instruction.Rules.Rule",
        "display": "Rs0",
        "description": "Is the index of the first general-purpose source register.",
        "symbols": {
            "_type": "Instruction.Assembly",
            "symbols": [
                {
                    "_type": "Instruction.Symbols.Literal",
                    "value": "R"
                },
                {
                    "_type": "Instruction.Symbols.RuleReference",
                    "rule_id": "UInteger"
                }
            ]
        },
        "assemble": "operands.source0.index = UInteger;",
        "disassemble": "return \"R\" ++ operands.source0.index;"
    },
    "Rs1": {
        "_type": "Instruction.Rules.Rule",
        "display": "Rs1",
        "description": "Is the index of the second general-purpose source register.",
        "symbols": {
            "_type": "Instruction.Assembly",
            "symbols": [
                {
                    "_type": "Instruction.Symbols.Literal",
                    "value": "R"
                },
                {
                    "_type": "Instruction.Symbols.RuleReference",
                    "rule_id": "UInteger"
                }
            ]
        },
        "assemble": "operands.source1.index = UInteger;",
        "disassemble": "return \"R\" ++ operands.source1.index;"
    }
}

Note

The content of the above example is added to the assembly_rules property of the Instruction.Instructions mentioned in the "Instruction.Instructions wrapper" section, above.

Warning

The assemble and disassemble properties shown above are represented as strings, but in the data model they are represented as AST.StatementBlock. Strings are used here for readability.

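The above example contains two types of Instruction.Rules: Instruction.Rules.Token (COMMA, SPACE, and UInteger) and Instruction.Rules.Rule (Rd, Rs0, and Rs1).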

The defined rules are then connected to the assembly property of the B64.arithmetic.register.ADD instruction using RuleReference as shown in the assembly of the instruction.

This allows rules to be reused when creating a SUB (register) instruction. For ADD (immediate), the only new rule required is for "immediate".
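The path from assembly text to encoding fields can be sketched with regular expressions built from the token patterns above. This is an illustration of the idea rather than a description of how any real tool processes the rules. The names reg, ADD_ASM, and assemble_add are assumptions, and the fixed bits of ADD (register) are taken to be zero.

```python
import re

# Token patterns copied from the assembly rules above.
TOKENS = {"COMMA": r",\s+", "SPACE": r"\s+", "UInteger": r"[1-9][0-9]*|0"}


def reg(operand):
    """Each register rule is the literal 'R' followed by a UInteger token."""
    return rf"R(?P<{operand}>{TOKENS['UInteger']})"


# Symbol sequence of the ADD assembly: ADD SPACE Rd COMMA Rs0 COMMA Rs1.
ADD_ASM = re.compile(
    "ADD" + TOKENS["SPACE"] + reg("destination") + TOKENS["COMMA"]
    + reg("source0") + TOKENS["COMMA"] + reg("source1")
)


def assemble_add(text):
    """Parse 'ADD Rd, Rs0, Rs1' and pack dst, src0, and src1 into a word."""
    m = ADD_ASM.fullmatch(text)
    if m is None:
        raise ValueError(f"not an ADD (register) assembly: {text!r}")
    operands = {name: int(value) for name, value in m.groupdict().items()}
    # Mirrors the instruction's assemble block: field = operands.<name>.index[7:0].
    dst = operands["destination"] & 0xFF
    src0 = operands["source0"] & 0xFF
    src1 = operands["source1"] & 0xFF
    return (dst << 16) | (src0 << 8) | src1


word = assemble_add("ADD R5, R6, R9")
print(hex(word))
```

The operands dictionary plays the role of the intermediate operands data held in the instruction's properties. The rules write to it, and the instruction's assemble block reads from it.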

Alias

An example of an alias is "DOUBLE", which is an "ADD (register)" instruction where the dst, src0, and src1 fields have the same value. One operand is sufficient to disassemble this alias.

The following assemblies assemble to the same instruction encoding:

DOUBLE R5
ADD R5, R5, R5

Both assemblies have the same behavior:

R[5] = R[5] + R[5]

To create this alias, a new rule, RdRs0Rs1, is needed:

{
    "_type": "Instruction.Rules.Rule",
    "display": "RdRs0Rs1",
    "description": "Is the index of the destination and first and second general-purpose source registers.",
    "symbols": {
        "_type": "Instruction.Assembly",
        "symbols": [
            {
                "_type": "Instruction.Symbols.Literal",
                "value": "R"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "UInteger"
            }
        ]
    },
    "assemble": [
        "operands.destination.index = UInteger;",
        "operands.source0.index = UInteger;",
        "operands.source1.index = UInteger;"
    ],
    "disassemble": "return \"R\" ++ operands.destination.index;"
}

The main difference is that, to assemble, the rule RdRs0Rs1 sets all the operands.*.index to the same value. This is because the operand in the DOUBLE assembly maps to all three operand properties.

The alias DOUBLE, under B64.arithmetic.register.ADD, is now introduced:

{
    "_type": "Instruction.InstructionAlias",
    "name": "DOUBLE",
    "condition": {
        "_type": "AST.BinaryOp",
        "left": {
            "_type": "AST.BinaryOp",
            "left": {
                "_type": "AST.Identifier",
                "value": "dst"
            },
            "op": "==",
            "right": {
                "_type": "AST.Identifier",
                "value": "src0"
            }
        },
        "op": "&&",
        "right": {
            "_type": "AST.BinaryOp",
            "left": {
                "_type": "AST.Identifier",
                "value": "src0"
            },
            "op": "==",
            "right": {
                "_type": "AST.Identifier",
                "value": "src1"
            }
        }
    },
    "preferred": {
        "_type": "AST.Bool",
        "value": true
    },
    "operation_id": "ADD_reg",
    "assembly": {
        "_type": "Instruction.Assembly",
        "symbols": [
            {
                "_type": "Instruction.Symbols.Literal",
                "value": "DOUBLE"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "SPACE"
            },
            {
                "_type": "Instruction.Symbols.RuleReference",
                "rule_id": "RdRs0Rs1"
            }
        ]
    }
}

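In the above example, condition describes when the alias applies (dst, src0, and src1 hold the same value), preferred is true, operation_id reuses the ADD_reg operation, and the assembly refers to the RdRs0Rs1 rule.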
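The effect of the alias on disassembly can be sketched as follows. The Python below assumes that a preferred alias is chosen over the base instruction whenever its condition holds. That is an interpretation of the preferred property and not a statement of the schema's rules. The function name disassemble_add_reg is illustrative.

```python
def disassemble_add_reg(word):
    """Disassemble an ADD (register) word, using DOUBLE when its condition holds."""
    dst = (word >> 16) & 0xFF
    src0 = (word >> 8) & 0xFF
    src1 = word & 0xFF
    # Alias condition: all three register fields hold the same value.
    if dst == src0 == src1:
        return f"DOUBLE R{dst}"
    return f"ADD R{dst}, R{src0}, R{src1}"


same = disassemble_add_reg(0x00050505)     # dst = src0 = src1 = 5
mixed = disassemble_add_reg(0x00050609)    # dst=5, src0=6, src1=9
print(same)
print(mixed)
```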

Sub-instruction

A sub-instruction is a parent-child relationship between Instruction.Instruction objects in which a child node can override the behavior of its parent. The same principle extends to longer ancestor-descendant chains, permitting more than one level of sub-instruction.
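The override relationship can be pictured as a lookup that prefers the deepest node that defines a property. The Python below is a sketch of that idea only. The child names, the operation id ADD_child_b_op, and the helper effective are hypothetical, and this guide does not specify which properties a child may override.

```python
def effective(path, key):
    """Return the value of key from the deepest node on path that defines it."""
    for node in reversed(path):
        if key in node:
            return node[key]
    raise KeyError(key)


parent = {"name": "ADD", "operation_id": "ADD_reg"}
# Hypothetical children: one inherits the parent's operation, one overrides it.
child_inherits = {"name": "ADD_child_a"}
child_overrides = {"name": "ADD_child_b", "operation_id": "ADD_child_b_op"}

inherited = effective([parent, child_inherits], "operation_id")
overridden = effective([parent, child_overrides], "operation_id")
print(inherited, overridden)
```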


See the following for more information: