University of Huddersfield Repository

Mencák, Jirí

Extended update plans

Original Citation


This version is available at http://eprints.hud.ac.uk/5935/

The University Repository is a digital collection of the research output of the University, available on Open Access. Copyright and Moral Rights for the items on this site are retained by the individual author and/or other copyright owners. Users may access full items free of charge; copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational or not-for-profit purposes without prior permission or charge, provided:

- The authors, title and full bibliographic details is credited in any copy;
- A hyperlink and/or URL is included for the original metadata page; and
- The content is not changed in any way.

For more information, including our policy and submission procedure, please contact the Repository Team at: E.mailbox@hud.ac.uk.

http://eprints.hud.ac.uk/
Extended Update Plans

Jiří Mencák

A thesis submitted to the University of Huddersfield
in partial fulfilment of the requirements for
the degree of Doctor of Philosophy

The University of Huddersfield

School of Computing and Engineering May 2003
Abstract

Formal methods are gaining popularity as a way of increasing the reliability of systems through the use of mathematically based techniques. Their domain is no longer restricted to purely academic environments and examples, as they are slowly moving into industrial settings. The slow rate at which this transition takes place is mainly due to the perceived difficulty of formalising the behaviour of systems. While this is undoubtedly true, it is not the case with all formal methods.

Update Plans are a powerful formalism for the description of computer architectures and intermediate to low-level languages. They are a declarative specification language with an underlying imperative machine model. The descriptions using Update Plans are clear, compact, intuitive, unambiguous and simple to read. These characteristics allow for the minimisation of possible errors at early stages of the development process even before a verification takes place.

In this thesis an overview of the Update Plans formalism is given and a number of real-world applications is shown. The investigation of the application area focuses on computer architectures for which various specifications already exist. The comparison of Update Plan specifications to other specifications provides a useful insight into the strengths and shortcomings of the formalism. The shortcomings, in particular the lack of synchronisation primitives and modularity, are addressed by the development and evaluation of several syntactic and semantic extensions described in this thesis. The extended formalism is also compared to other specification languages and conclusions are drawn.
Acknowledgements

The research detailed in this thesis was funded by the Engineering and Physical Sciences Research Council (EPSRC) and partially by the School of Computing and Engineering of the University of Huddersfield. I wish to express my gratitude to both of these institutions for their financial support.

I am indebted to my director of studies Dr. Hugh R. Osborne and my supervisor Dr. Adrian R. Jackson, for their help and guidance over the past three years.

Special thanks belong to my parents and my girlfriend Kamila for their continuous support and encouragement.

Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Brno, 4th April 2004
Contents

Introduction ............................................. 1
1 Context ................................................. 1
2 Update Plans ............................................ 2
  2.1 Research questions .................................. 3
3 Organisation ............................................. 4
4 Notational conventions ................................. 6

I Basic Update Plans ........................................... 7

1 Update Plans ............................................. 9
  1 Basic Update Plans .................................... 9
  2 Typing .................................................. 13
  3 Archetypes ............................................ 14
    3.1 Syntax ............................................ 15
    3.2 Expansion ........................................ 16
    3.3 Syntactic sugar .................................... 17
  4 Parallelism ............................................ 19

5 Examples ................................................. 19
  5.1 ADTs ............................................... 19
  5.2 Archetype expansion ............................... 20

2 PDP-11 .................................................... 22
  1 Addressing modes ..................................... 22
  2 Instructions ......................................... 24
    2.1 Single operand instructions ...................... 24
    2.2 Double operand instructions .................... 26
    2.3 Condition code and program flow operations ....... 27
    2.4 Interrupts ........................................ 28
    2.5 Other instructions ................................ 29
  3 Conclusions ........................................... 29
### Contents

3.1 Constants ........................................ 60  
3.2 Type grammar ................................... 61  

4 Comments ........................................... 61  

5 Conclusions ....................................... 62  

6 Semantic Extensions .............................. 63  
1 Parallel blocks .................................. 63  
2 Sequential update schemes ..................... 64  
2.1 Background/Motivation ....................... 64  
2.2 Syntax ..................................... 66  
2.3 Semantics .................................. 67  
2.4 Canonical form ................................ 69  
2.5 Implementation ................................ 72  

3 Sequential archetypes ........................... 76  
3.1 Background/Motivation ....................... 76  
3.2 Syntax ..................................... 77  
3.3 Semantics .................................. 77  
3.4 Special types of sequential archetypes ..... 80  
3.5 Parameters .................................. 81  
3.6 Syntactic sugar ................................ 82  
3.7 Limitations .................................. 83  

4 Special cases of archetype expansion .......... 83  
4.1 Alternatives ................................ 83  
4.2 Parallel blocks ................................ 83  

5 Conclusions ....................................... 84  

7 PRAM .................................................. 85  
1 Informal description ............................ 85  
2 2-PRAM memory models ......................... 87  
3 \( n \)-PRAM memory models ..................... 89  

4 Instructions ....................................... 91  
4.1 Addressing modes ............................. 91  
4.2 Accumulator loading instructions .......... 93  
4.3 General purpose register loading instructions 95  
4.4 Program counter loading instructions ....... 95  
4.5 Memory read instructions .................... 95  
4.6 Memory write instructions ................... 95  
4.7 The instruction set ........................... 96  

5 Conclusion ........................................ 96
# CONTENTS

<table>
<thead>
<tr>
<th>8 Other Methods</th>
<th>97</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 Specification methods</td>
<td>97</td>
</tr>
<tr>
<td>1.1 Hardware</td>
<td>97</td>
</tr>
<tr>
<td>1.2 Concrete machines and instruction sets</td>
<td>100</td>
</tr>
<tr>
<td>1.3 Parallelism</td>
<td>101</td>
</tr>
<tr>
<td>1.4 Protocols</td>
<td>103</td>
</tr>
<tr>
<td>1.5 Z/VDM</td>
<td>104</td>
</tr>
<tr>
<td>2 Verification</td>
<td>104</td>
</tr>
<tr>
<td>2.1 ACL2</td>
<td>105</td>
</tr>
<tr>
<td>3 Conclusions</td>
<td>106</td>
</tr>
<tr>
<td>3.1 Integrated specification/verification methodologies</td>
<td>106</td>
</tr>
<tr>
<td>3.2 Specification methods</td>
<td>106</td>
</tr>
<tr>
<td>3.3 Summary</td>
<td>108</td>
</tr>
<tr>
<td>9 Conclusions and Future Research</td>
<td>109</td>
</tr>
<tr>
<td>1 Update Plans applications</td>
<td>109</td>
</tr>
<tr>
<td>2 Update Plans extensions</td>
<td>110</td>
</tr>
<tr>
<td>3 Future considerations</td>
<td>111</td>
</tr>
<tr>
<td>3.1 Theory</td>
<td>111</td>
</tr>
<tr>
<td>3.2 Applications</td>
<td>112</td>
</tr>
<tr>
<td>3.3 Implementation</td>
<td>113</td>
</tr>
</tbody>
</table>

Bibliography

| A Extended Update Plans Grammar | 121 |
| B PDP-11 | 129 |
| C SPARC-V9 | 133 |
| D Java Virtual Machine | 140 |
| E n-PRAM | 144 |
| F Glossary | 151 |
Introduction

1 Context

The first recorded ideas about modern formal methods were those of Leibniz (1646–1716), who dreamt [47] about a machine called “calculus ratiocinator” which would decide any question given in a language “characteristica universalis” rich enough to describe any kind of phenomena.

Many years have passed since these ideas surfaced, and formal methods are today advocated as a means of increasing the reliability of systems as exhaustive testing through simulation is no longer possible due to the ever increasing complexity of integrated circuits. Furthermore, the gap between the performance of current systems used for simulation and the complexity of systems under development is increasing rather than shrinking.

The current uses of formal methods include but are not limited to safety and security-critical systems in domains such as transport, defence, banking and medical applications, where the cost of failure is unacceptably high. The losses suffered by bad microprocessor design in particular can be devastating. For example, the cost of Ariane 5 to the European Space Agency was $7,000 million and 10 years of design due to an overflow error during a conversion of a number from a 64-bit format to a 16-bit format [27]. The Pentium floating-point bug [63] in the division algorithm cost Intel an estimated $500 million. Perhaps not surprisingly the use of formal methods is spreading from government agencies such as NASA or MoD into a wider industrial sphere.

The “weakest link” in the development of a system is usually its specification. Many inconsistencies and ambiguities can be discovered just by going through the process of rigorous specification. By writing a formal specification and devising a verification strategy one is forced to think about the system in new ways, new questions are raised, new insights into the system are gained and thus the number of possible errors is minimised at early stages of the development.

While there are methods for formal development of high level software (e.g. [7, 14]), and for hardware description (e.g. [36, 57, 73]), there seems to be a shortage of specification and development formalisms at the intermediate level of machine architectures and instruction sets.
as the semantics of low-level languages is still [48, 70, 77] unjustly assumed trivial. Although it is possible to describe the semantics of low-level languages by various methods [13, 16, 34, 66, 69], they either lack formal semantics themselves or the readability of specifications aimed at this level is generally bad.

Update Plans [60], on the other hand, aim to fill the gap among formal languages by providing a formal, clear, unambiguous and expressive way of describing intermediate to low-levels of machine architectures.

Many formal methods (some of which are briefly described and compared with Update Plans in this thesis) are difficult to grasp. The learning time for the production of usable designs with such methods is usually very long, and an efficient use of a specification language relies heavily on its detailed knowledge. This also results in all sorts of other difficulties. Firstly, it is very easy to formalise the wrong thing in this environment. Secondly, a hard to understand design description makes any verification more complicated.

Update Plans, on the other hand, have a short learning curve—they are easy to learn and understand as its basic concepts are very simple and confidence in using the formalism is achieved very soon. Designs written in Update Plans are highly communicable and compact formal descriptions.

2 Update Plans

Update Plans (UP for short) are a formalism for the description of (abstract) machines and algorithms. A specification produced by UP combines both the structural and behavioural model of a system. The structural descriptions are concrete, detailed and low-level (although it is easy to abstract away from irrelevant details). With respect to algorithmic structures the descriptions are abstract and high-level. These characteristics make UP particularly suitable as a specification language for the description of large classes of machine architectures [55, 58, 59, 61] and as a target language in compiler design [54, 60].

In [54] the Update Plans formalism (then called Update Schemes) was introduced by Hans Meijer as a target language in the framework of the design of a translator generator. Since then it has been extended to the description of machines and algorithms.

Hugh Osborne designed a formal semantics [58] and developed a powerful macro-like mechanism called archetypes. He also simplified the typing regime, introduced parallelism [60], constructed a prototype implementation based on [45], investigated a number of typical applications [59, 61] and approached the problem of the formal verification of UP specifications.

A brief description of most of the above-mentioned past areas of Update Plans research is given in the following points.

- Semantics. The presence of a formal semantics for any specification language is a necessity for any kind of formal reasoning about specifications written in that language. In UP
this has been done in [58] by defining the underlying imperative model and the referentially transparent semantics of UP.

- **Archetypes.** Archetypes are an abstraction and structure reuse mechanism that has been introduced into Update Plans in order to further increase their expressiveness [60]. Although the primary motivation was abstracting away the details of addressing modes, they can be used for a variety of other purposes where it is desirable to express multiple update schemes by a single archetype call.

- **Parallelism.** Parallelism is inherent in Update Plans. While there is (to some extent) no need for an explicit mechanism to describe it, it has been introduced [60] to increase legibility by providing a way to allow many update schemes to be combined into one atomic update.

- **Typing.** The type information introduced to Update Plans in [60] is not only important for type checking, but it also has serious consequences for the implementability and formal verification in Update Plans.

- **Verification.** Perhaps most importantly, the work on proving semantic equivalence between update plans (possibly representing various levels of description) has started [60].

- **Applications.** A number of machines varying from very abstract to concrete have been described. One of the first larger applications was a machine for a simple functional language [59]. Two lower level specifications have been given for a more realistic machine model, taken from [25]. The more abstract of these was a machine for a tree language, typical of intermediate code generated by a compiler. The more concrete was a PDP-11 style machine. A linearisation of the tree code (register allocation) was given transforming it to concrete machine code. A proof of the semantic equivalence of the two specifications under the transformation was shown, which verified the register allocation algorithm. A specification of a pipelined RISC processor has been given [61]. UP have also been applied to a (partial) specification of the Java Virtual Machine [55]. Using an abstraction mechanism similar to the archetype mechanism, elementary VLSI components have been specified [53].

Update Plans have also proved very useful in education. Apart from their didactic use in explaining the intricacies of addressing modes and the semantics of instructions of a computer architecture simulator [62], they can also be used as a compact means of explaining algorithms such as rebalancing operations in AVL trees [60].

### 2.1 Research questions

The main aim of this project is to develop UP as a specification and verification formalism, primarily at the level of low-level code and hardware. To achieve this a hierarchical modular specification and validation structure for UP needs to developed. This would provide a formalism in which multi-level specifications of architectures can be given, each level being a translation or transformation of its neighbouring levels. The formally defined translation
between levels will provide a proof of the semantic equivalence of the specifications.

A second aim is to extend the syntax and semantics of UP to cover a wide range of parallel paradigms and, in particular, their communication and synchronisation primitives.

To achieve this the problems of parallelism, modularity and formal verification need to be addressed.

- **Parallelism.** More general mechanisms for specifying parallel systems need to be introduced, allowing for the specification of asynchronous parallelism and related communication primitives. The semantics of the new constructs must be clearly defined. While some preliminary investigation of parallelism in UP has been done, this has not gone much further than replacing nondeterministic execution of an applicable update scheme with simultaneous execution of all applicable update schemes; so in order to maintain well-behaviour a host of artificial semaphore-like constructs have to be used.

- **Modularity.** A modular structure needs to be introduced into UP in order to provide the formalism with a mechanism for structure reuse, for information hiding and to encourage hierarchical specification. A specification at one level, e.g. that of the fetch/execute cycle could then be defined in terms of a lower level specification such as microcode, and in turn be used to specify higher level activities such as the instruction set.

- **Formal verification.** A proof system needs to be developed that, given UP specifications of two machine architectures and a translation, would prove the semantic equivalence of the specifications under the translation. This work should be based on the concepts of semantic equivalence, translation and irrelevance [60].

## 3 Organisation

This thesis contains an evaluation of Update Plans as a specification formalism for hardware architectures, and suggests extensions based on this evaluation. It is divided into two parts. Unless indicated otherwise by marginal notes giving forward references to sections of chapter 5, the first part uses basic Update Plans as defined by [60], and the suggested extensions are described in the second part. The first part of this thesis is an evaluation of basic Update Plans, and consists of three case studies involving instruction set specifications of machines differing in the degree of abstraction.

A tutorial on basic Update Plans is given in chapter 1. A lot of material in this chapter was adapted with minor amendments from [60] and [61].

The PDP-11 specification in chapter 2 is the first attempt to specify a large subset of a real machine's instruction set using Update Plans. It was chosen as a candidate for specification because some historically important formal specifications of the PDP-11 exist [20, 69]. Consideration of a more detailed specification at the fetch/execute cycle drove the development of some semantic extensions described in chapter 6.
INTRODUCTION

The first chapters to make some advance use of the syntactic extensions to Update Plans described in chapter 5 are chapters 3 and 4. Both of these chapters feature modern, abstract, but very different architectures. Chapter 3 has the UP specification for the SPARC-V9 architecture. Again, there was already a partial specification of the SPARC-V9 architecture [65], but this is aimed at the specification of connections and communication rather than at a concise specification of the instruction set. In chapter 4 the Java Virtual Machine (JVM) was chosen to test UP's suitability for describing more abstract instruction sets using a wide variety of types.

The second part of this thesis proposes some extensions to Update Plans and contains a comparison of Update Plans to other formal methods.

Chapter 5 covers the syntactic extensions brought about during the development of the SPARC-V9 and JVM instruction set specifications in the first part of the thesis. In particular this chapter discusses the changes in the revised Update Plans grammar and the extensions to the typing mechanism which, in many cases, allow multi-level specifications.

The concept of sequential update schemes and archetypes is introduced in chapter 6 together with other semantic extensions to make the Update Plans formalism more expressive and consistent. More importantly, this chapter gives possible answers to the research questions of parallelism and modularity. The problem with synchronisation of parallel processes is solved by the introduction of a general synchronisation primitive (sequential update schemes) that augments non-deterministic model of Update Plans by explicitly stating the order in which updates will be applied. Sequential archetypes extend the possibilities for information hiding and structure reuse by encapsulating a series of synchronised updates at one level of abstraction into a modular update easily identifiable against a corresponding update at another level of description.

Chapter 7 contains a more comprehensive example of the application of sequential update schemes and archetypes in a specification of a theoretical model of computation—Parallel Random Access Machine (PRAM). PRAM provides a useful test-bed for the semantic extensions as it specifies strictly sequential operations (the fetch/execute cycle) within a massively parallel context.

The penultimate chapter evaluates Update Plans. Other existing formalisms which could be used for the specification of intermediate levels of hardware architectures are considered. A brief description of each of the formalisms is given and they are compared to Update Plans. This chapter uses a few simple examples to show the advantages the introduction of sequences brings to UP.

Finally, conclusions are drawn and future research directions are suggested in chapter 9.
4 Notational conventions

Throughout this thesis, the following notational conventions are observed.

- Production rules used to describe basic and Extended Update Plans grammars given in this thesis make use of the notational conventions of the specification formalism ASF+SDF [43, 74] in which the notation \( \{ S \text{ term} \}^+ \) indicates a list of one or more \( S \)'s separated by the terminal \( \text{term} \). If the \( ^+ \) is replaced by a \( ^* \) the list may also be empty. The suffix \(-\text{opt}\) indicates zero or one occurrences of its nonterminal. More notational conventions used by the ASF+SDF specification formalism can be found in appendix A on page 121.

- Update plans given in this thesis use the typewriter font. This convention has two exceptions. Firstly, update plan comments and 'meta-variables' such as \( \text{lhs} \) use the italic typestyle. The second exception is update plan elements which have been commented out. They use the slanted typewriter font.

- All emphasised words appearing freely in the text are glossary terms explained in appendix F for quick reference. The only exception are references to update plan 'meta-variables' which use the emphasised (italic) typestyle.

- Whenever a reference to an instruction field is made, a sans serif typeface is used. The \( \| \) symbol designates concatenation of bit vectors, the \( \% \) symbol is arithmetic modulo, and \( \times \) is used for multiplication. Symbols \&,, |, and ^ are used for bit-wise AND, OR, and XOR operations respectively. And finally, symbols \( =, \neq, <, >, \leq \) and \( \geq \) have their usual mathematical meanings.

- Marginal notes used in the first part of this thesis are forward references to sections of chapter 5 where syntactic UP extensions are explained. References to the trivial extension introduced in section 4 of the same chapter are not made.

- The American spelling 'program' is used to denote computer software.
Part I

Basic Update Plans
Introduction

Part one of this thesis serves as an introduction to Update Plans with the focus on the description of hardware architectures. It shows three specifications of instruction sets of machines at various degrees of abstraction, ordered from the most concrete to the most abstract. Each of the specifications provide some interesting insights into the formalism and uncovers its various shortcomings, which are addressed in the second part of the thesis. Although the specifications use basic Update Plans as described in [60], some advance use (in chapters 3 and 4) is made of syntactic changes to the formalism described in chapter 5.
Chapter 1

Update Plans

This chapter gives an informal overview of basic Update Plans. Only the basic facts necessary to understand the specifications in the first part and the reasons for the extensions in the second part of the thesis are introduced. More information on Update Plans along with their complete formal syntactic and semantic definition can be found in [58–61].

1 Basic Update Plans

An update plan specifies state transitions in an abstract machine. This machine consists of a number of stores, each containing a linear countably infinite sequence of memory cells (e.g. bytes or machine words) between locators (addresses). Note that it is not the cells themselves that are addressed, but the boundaries between cells as shown in figure 1.1.

Triples $\alpha[\xi]\beta$ are called locator expressions, where $\alpha$ and $\beta$ are locators, $\alpha \leq \beta$ and the cells between the addresses $\alpha$ and $\beta$ contain (a particular representation of) the value of $\xi$. A set of locator expressions is consistent if there are no two or more expressions in the set which specify the contents of some cell to have different values. A consistent set of locator expressions describes a (sub)configuration of the machine.

State transitions in the abstract machine are described by update schemes, which are constructed from two sets of locator expressions forming the left-hand side (lhs) and the right-hand side (rhs), separated by a guard ($\models \gamma \models$) which carries an applicability condition $\gamma$. A guard whose condition is always true can be simply written as $\models$.

An update plan is a set of update schemes, each of which may contain unspecified values (variables). Variables are indicated by lower case words, constants are indicated either by a value or, symbolically, by upper case words. An update scheme containing no unspecified values is called an update rule. Update schemes yield update rules by instantiation. Both the left and right-hand sides of the resulting update rule must be self-consistent, i.e. all locator expressions in the left and right-hand side must be mutually consistent.
An update rule is applicable to a given configuration if its left-hand side is a subset of that configuration and its guard is true. The memory then may be minimally updated such that thereafter all locator expressions in its right-hand side are satisfied.

An initial configuration, which specifies the initial state of the memory before any update scheme is applied, and an update plan form an update script. The update plan is executed by repeatedly (non-deterministically) choosing an applicable update scheme from the plan, and applying it, until the configuration is such that no scheme is applicable. This final configuration is the result of the script.

The overview of the basic Update Plans syntax is given by the following grammar.

\[
\begin{align*}
\langle \text{script} \rangle & \rightarrow \langle \text{configuration} \rangle \ "\cdot\" \langle \text{plan} \rangle \\
\langle \text{plan} \rangle & \rightarrow \langle \text{item} \rangle^* \\
\langle \text{item} \rangle & \rightarrow \langle \text{scheme} \rangle \ "\cdot\" \\
\langle \text{configuration} \rangle & \rightarrow \langle \text{locator expression} \rangle^* \\
\langle \text{scheme} \rangle & \rightarrow \langle \text{configuration} \rangle \langle \text{guard} \rangle \langle \text{configuration} \rangle \\
\langle \text{guard} \rangle & \rightarrow \ "\{\" \langle \text{term} \rangle \ "\Rightarrow\" \\
\langle \text{locator expression} \rangle & \rightarrow \langle \text{locator} \rangle \ "\[\" \langle \text{text} \rangle \ "\]\" \langle \text{locator} \rangle \\
\langle \text{locator} \rangle & \rightarrow \langle \text{term} \rangle \\
\langle \text{text} \rangle & \rightarrow \langle \text{term} \rangle^*
\end{align*}
\]

A \langle \text{term} \rangle is an expression built from constants, variables and operators. Note that the grammar presented here is a very simplified version of the Update Plans grammar to demonstrate the basic structure. This grammar will be extended at several places in this chapter, but will still not be presented in its entirety. The complete basic Update Plans grammar can be found in [60], and a complete and updated grammar for Extended Update Plans in appendix A.

The classic two-scheme update plan in example 1 demonstrates Euclid’s algorithm for computing the GCD of the number initially between A and B and that initially between B and C. The constants A, B and C are fixed locators and x and y are unspecified values.

---

**Example 1**

\[
\begin{align*}
A[x]B & \iff B[y \iff B[y - x]C. \\
\end{align*}
\]
CHAPTER 1: UPDATE PLANS

If at any stage of the computation the machine configuration contains $A[B]$ and $B[C]$, (only) the second update scheme (instantiating into an update rule) in example 1 is applicable, whereupon the $9$ is replaced by a $3$.

Syntactic sugar

The following points describe notational simplifications which make update plans more legible.

- A superfluous locator may be omitted. A locator is superfluous if its removal does not lead to any confusion.
- Contiguous sequences may be concatenated. Two expressions $x[s]y$ and $y[t]z$ may be written as $x[s|y|t]z$.
- Locators may be also omitted when concatenating contiguous sequences so that $x[s|y|t]z$ may also be written as $x[s \cdot t]z$.
- Identical left-hand sides may be shared. A repeat of the previous left-hand side is indicated by the ‘repeat’ symbol ‘$\ddagger$’.
- Other notational conventions (I/O, the introduction of the program counter and alternatives) are described in sections of their own.

I/O

As input/output instructions are one of most frequently used operations, a mechanism for hiding the details and so sweetening the syntax is provided. Input streams which are described by an update scheme

$$\cdots IP[s] i[input]j \cdots = [g] \Rightarrow \cdots IP[j] \cdots$$

may be written as

$$\cdots ?IP[input] \cdots = [g] \Rightarrow \cdots$$

For the standard input stream the ‘? [input]’ can be used when this is unambiguously defined.

Similarly, output streams

$$\cdots OP[o] \cdots = [g] \Rightarrow \cdots OP[p] o[output]p \cdots$$

may be written as

$$\cdots = [g] \Rightarrow \cdots !OP[output] \cdots$$

Again, the ‘! [output]’ can be used for the standard output stream, if it is defined.

Program counter

By acknowledging the existence of a program counter at a fixed locator (by convention $PC$), update schemes exhibiting the pattern
CHAPTER 1: UPDATE PLANS

J-d

Figure 1.2: Visualisation of the PUSH operation on a stack machine


may be written as so-called commands

\[ \text{OP args } lc = [g] \Rightarrow \text{next } rc. \]

with the program counter hidden, where at least one of \( \text{OP args} \) and \( \text{next} \) is a non-empty command sequence, and \( \text{lc/rc} \) are left/right contexts. The first term of a command will in most cases be a constant, known as the opcode. An update plan in which all update schemes are commands is know as a command driven plan. A configuration is in command form if it contains a non-empty command sequence, or one in which the contents of the register PC are not specified.

---

Example 2

The following two commands may be part of a stack machine.

\[
\begin{align*}
PUSH \ a \ & a[x] \ SP[t] \Rightarrow s[x]t \ SP[s]. \\
POP \ & a[s] \ s[x]t \Rightarrow a[x] \ SP[t].
\end{align*}
\]

They will be desugared as

\[
\begin{align*}
PC[pc] \ pc[PUSH \ a]qc \ a[x] \ SP[t] \Rightarrow PC[qc] \ s[x]t \ SP[s]. \\
PC[pc] \ pc[POP \ a]qc \ SP[s] \ a[x]t \Rightarrow PC[qc] \ a[x] \ SP[t].
\end{align*}
\]

A graphical description of the PUSH operation is in figure 1.2.

---

Alternatives

A set of update schemes with mutually exclusive guards often performs some form of case analysis. Such a set

\[
\begin{align*}
\text{lhs}_1 = & [g_1] \Rightarrow \text{rhs}_1. \\
\text{lhs}_2 = & [\neg g_1 \land g_2] \Rightarrow \text{rhs}_2. \\
\vdots \\
\text{lhs}_n = & (\bigwedge_{i=1}^{n-1} \neg g_i) \land g_n \Rightarrow \text{rhs}_n.
\end{align*}
\]
can then be written as a series of $n$ update schemes known as *alternatives*

\[
\begin{align*}
\text{lhs}_1 &= \{ g_1 \} \Rightarrow \text{rhs}_1; \\
\text{lhs}_2 &= \{ g_2 \} \Rightarrow \text{rhs}_2; \\
\vdots \\
\text{lhs}_n &= \{ g_n \} \Rightarrow \text{rhs}_n.
\end{align*}
\]

The update schemes are divided by semicolons and only the first applicable update scheme, reading from top to bottom, will be applied. The following production rules are added to the basic Update Plans grammar.

\[
\begin{align*}
\langle \text{item} \rangle &\rightarrow \langle \text{alternatives} \rangle \ "::" \\
\langle \text{alternatives} \rangle &\rightarrow \{ \langle \text{update scheme} \rangle \ "::" \}^+ 
\end{align*}
\]

## 2 Typing

The basic type in Update Plans is the locator. In fact all objects appearing in an update plan are considered to be locators.

New types may be defined by combining existing types using any of the standard operators of regular expressions. Such a declaration is said to define a *type alias*. Store names are lower case words with an initial upper case letter, and must, therefore, contain at least two letters, in order to ensure that store names can be distinguished from constants. Stores are declared by listing them between braces, e.g.

\[
\{ \text{Bit, Bool, Int, Stack, Heap} \}.
\]

Each store name is said to be a *type primitive*. Type aliases are declared similarly

\[
\{ \text{Num = Byte | Short | Long} \}.
\]

Type aliases may not be recursive, either directly or indirectly. This ensures that any type alias can be expressed as a regular expression containing only type primitives. Every object appearing in an update plan must be typed. For some objects this may be done implicitly. Each symbolic constant is considered to have its own unique type, again unless indicated otherwise. Other objects—expressions for which no type can be determined automatically—must have their type indicated. This can be done by means of a global declaration valid throughout an update plan, e.g. ‘$v :: \text{Store1}$’. A global type definition can be overridden by *casting* a term within an update scheme with a different type, e.g. ‘$(v :: \text{Store2})$’ such a cast determining only the type of the term to which it is applied. The syntax of type declarations is obtained by adding the following rules to the basic UP grammar

\[
\begin{align*}
\langle \text{item} \rangle &\rightarrow \langle \text{store declaration} \rangle \ "::" | \langle \text{type declaration} \rangle \ "::" \\
\langle \text{store declaration} \rangle &\rightarrow "\{" \{\langle \text{store} \rangle \ "::"\}^+ "\}
\end{align*}
\]
A \texttt{store} name is a lower case word with a leading upper case letter, and a \texttt{store structure} is a regular expression over the set of store names. Some extensions have been added to the typing mechanism, and can be found in chapter 5 alongside an updated type grammar.

A very important concept related to typing and implementability of update plans is grounding. It is extensively described in [60], but the following terminology is provided for convenience as it will be used in this thesis.

A ground expression is one for which a unique variable-free expression can be derived, possibly by instantiation of variables with respect to the current configuration; a semi-ground expression is one for which a finite number of such expressions can be derived.

---

Example 3

This example assumes that the length of an object of the type \texttt{Bit} is 1.

Given the following update plan

\[
\begin{align*}
x, y &:: \texttt{Bit}. \\
x, s, y &:: (\texttt{Bit})^*.
\end{align*}
\]

\(A, B\) and \(C\) are ground as they are constants; \(b\) and \(c\) are also ground \((b = A + 1, c = B + 2)\); \(d\) is non-ground, since the length of \(ys\) cannot be established; \(x\) is ground, since both of its locators are ground and its value can be determined by a reference to a configuration. The ‘sequence’ \(xs\) is ground for \(b \leq c \leq b + 1\), semi-ground otherwise. Finally, both \(ys\) and \(y\) are non-ground.

---

3 Archetypes

The expressive power of Update Plans is greatly increased by the use of a macro-like mechanism known as archetypes. An update plan often contains a set of similar update schemes sharing certain parts of their left/right-hand sides. Such patterns can be moved out of update schemes into an archetype definition, and replaced by archetype calls. This will almost certainly mean a reduction in the number of update schemes necessary for a description as duplicate update schemes will be omitted from the update plan.
The primary motivation for the introduction of the archetype mechanism was the complexity of many machines due to the number of addressing modes they use. In an update plan, addressing modes can easily be replaced by a single archetype call, thus making it possible to express many update schemes as one.

Archetypes can also be viewed as a mechanism of abstraction, where certain frequently used actions are be grouped into "modular updates". This view will be reinforced in the second part of the thesis, where the concept of sequential update schemes and archetypes is introduced.

3.1 Syntax

The basic Update Plans grammar is updated by the following production rules.

\[
\begin{align*}
\text{(item)} & \rightarrow \text{(archetype definition)} \\
\text{(archetype definition)} & \rightarrow \\
& \text{(basic archetype definition)} \mid \\
& \text{(command archetype definition)} \\
\text{(basic archetype definition)} & \rightarrow \text{(basic declaration)} \text{(basic definition)}^* \\
\text{(basic declaration)} & \rightarrow \text{(basic archetype name)} \text{(parameters)} \\
\text{(basic definition)} & \rightarrow "=" \text{(basic body)} \\
\text{(basic body)} & \rightarrow \\
& \text{(configuration)} \text{(guard)} \text{(configuration)} \mid \\
& \text{(repeat)} \text{(guard)} \text{(configuration)} \mid \\
& \text{(configuration)} \mid \\
& \text{(text)} "[" \text{(context)} \text{(guard)} \text{(context)} \mid \\
& \text{(text)} "[" \text{(repeat)} \text{(guard)} \text{(context)} \mid \\
& \text{(text)} "[" \text{(context)} \\
\text{(command archetype definition)} & \rightarrow \\
& \text{(command declaration)} \text{(command definition)}^* \\
\text{(command declaration)} & \rightarrow \\
& \text{(command archetype name)} \text{(parameters)} \text{(text)} \\
\text{(command definition)} & \rightarrow "=" \text{(command body)} \\
\text{(command body)} & \rightarrow \\
& \text{(context)} \text{(guard)} \text{(context)} \mid \\
& \text{(repeat)} \text{(guard)} \text{(context)} \mid \\
& \text{(context)} \\
\text{(parameters)} & \rightarrow "(" \{\text{\text{(term)} "," \} \} "")"
\end{align*}
\]
A basic archetype name is an identifier, a command archetype name is a constant. The syntax of an archetype call is basically that of an archetype declaration. The only difference is when archetype calls are 'coupled' by an index.

\[
\begin{align*}
\langle \text{term} \rangle & \rightarrow \langle \text{archetype call} \rangle \\
\langle \text{archetype call} \rangle & \rightarrow \langle \text{archetype name} \rangle \langle \text{index} \rangle \text{-opt} \langle \text{parameters} \rangle \\
\langle \text{archetype name} \rangle & \rightarrow \langle \text{basic archetype name} \rangle \mid \langle \text{command archetype name} \rangle \\
\langle \text{index} \rangle & \rightarrow \langle \text{number} \rangle \mid \"\{\" \langle \text{number} \rangle \"\}\"
\end{align*}
\]

Again, the archetype grammar has been adapted with minor amendments from [60].

3.2 Expansion

The archetype expansion mechanism consists of two stages, the textual expansion and the parameter resolution stage.

The parameter resolution phase is necessary due to the fact that parameters do not have to be variables—they can be complex expressions. The aim of parameter resolution is to derive semi-ground expressions for non-ground terms. Parameters are resolved by using a resolution set, to which equations are added as rewriting takes place. If at any point in this process a non-trivial equation is derived relating two semi-ground expressions in an update scheme, it is added to the scheme's guard. The method is discussed in great detail in [60].

An archetype's body consists of a left and a right-hand side expansion and context. The expansion is the text that actually replaces the archetype call, the left/right-hand side expansion replacing the call on the left/right-hand side. The contexts are simply added to the call's scheme, again the left/right-hand side context is added to the left/right-hand side of the scheme. Archetype variables conflicting with variables of the call's scheme will be renamed before expansion.

See, for example, an archetype autoinc in example 4a which could define an autoincrement addressing mode. The (output) parameter \(v\) obtains its value by means of instantiation against the current configuration, and the contents of \(r\) are updated.

\[
\text{Example 4a}
\]

\[
\text{autoinc}(v) = \text{AUTOINC} r b[v]c \implies r[c].
\]

In the example, the left-hand side expansion is \(\text{AUTOINC} r\) and the left-hand side context \(r[b] b[v]c\).

Archetype calls occur in indexed pairs, with one element of the pair on the left-hand side of the update scheme or archetype definition in which it occurs, and the other on the right-hand side. The resolution mechanism is demonstrated in example 4b, which gives a possible expansion of \(\text{ADD } r \text{ autoinc}_1(x) \text{ autoinc}_2(y) \implies \text{autoinc}_1(x) \text{ autoinc}_2(y) r[x + y]\).
As will be noted in the following section, syntactic sugar allows one of the elements of an archetype call pair to be omitted if the corresponding expansion is empty. The indices may then also be omitted, as they are superfluous. The update plan from the previous example can then be rewritten as ADD r autoinc(x) autoinc(y) \( \Rightarrow r[x + y] \).

Archetype calls can be recursive. However, there are several limitations as to where recursive and non-recursive archetype calls may appear in an update plan. For example, a guard may not contain an archetype call and a locator and an archetype parameter may not contain a call of a recursive archetype.

### 3.3 Syntactic sugar

The bodies of archetype definitions inherit syntactic sugar as described in section 1. Additionally, syntactic sugar also makes it possible to

1. omit the guard and right-hand side from the body of an archetype definition, if the right-hand side and the guard are both empty
2. replace irrelevant parameters by the “don’t care” symbol ‘.’
3. share identical archetype declarations (see the grammar)

Other forms of syntactic sugar are left/right-handed, ambidextrous and command archetypes.

#### Left/Right-handed archetypes

If the expansions (text) of all right/left-hand sides of a given archetype in all of its definitions are empty then the right/left-hand side of that archetype may be omitted. If all right/left-hand side calls are omitted in an update plan, such an archetype is called a left/right-handed archetype.

Left/Right-handed archetypes further contribute to sweetening of an update plan by the omission of archetype call’s indices which are rendered superfluous as all calls of the same name appear only on one (left/right) side of an update scheme.

#### Ambidextrous archetypes

A left/right-handed archetype are restricted in that they may only be called on the left/right-hand side. An ambidextrous archetype is a context independent archetype, which can be called on both sides of an update scheme.

A definition of an ambidextrous archetype \( a \) is equivalent to a pair of left and right archetype definitions \( a_l \) and \( a_r \).
which can be 'compressed' into one ambidextrous archetype definition

\[ a(params) = \text{text } lc \rightarrow rc. \]

One use of ambidextrous archetypes can be to describe repetitive computations occurring on both sides of an update scheme.

--- Example 5 ---

Consider the following recursive ambidextrous archetype pad(x) and an update scheme containing a call of this archetype. The archetype expands x padding bytes PAD to align address a of the JMP instruction so that a begins at an address that is a multiple of 4 bytes.

\[
\begin{align*}
\text{pad}(0) &= . \\
\text{pad}(n) &= \text{PAD } \text{pad}(n - 1) \quad |n > 0 \Rightarrow . \\
\text{PC}[pc] &= \text{PC}[\text{JMP } \text{pad}(4 - (pc + 1) \% 4) a] \Rightarrow \text{PC}[a].
\end{align*}
\]

The purpose of making the pad(n) archetype ambidextrous is that it can be called on both sides of an update scheme without the need to define the following pad\(_L\)(n) and pad\(_R\)(n) pair

\[
\begin{align*}
\text{pad}_L(n) &= \text{PAD } \text{pad}(n - 1) \quad |n > 0 \Rightarrow . \quad \# \text{To be called on the lhs} \\
\text{pad}_R(n) &= |n > 0 \Rightarrow \text{PAD } \text{pad}(n - 1). \quad \# \text{To be called on the rhs}
\end{align*}
\]

in which case the pad\(_L\)(n) and pad\(_R\)(n) archetypes would have to be called on the left/right-hand side of an update scheme respectively, depending on the side padding is needed.

---

**Command archetypes**

An ambidextrous archetype \( a \) whose text part begins with a constant can be further compressed by using this constant as a name of a **command archetype**, unless this constant has already been used for this purpose. The command archetype

\[
\text{CONST}(params) \quad \text{text } lc \rightarrow rc.
\]

is the sugared version of

\[
a(params) = \text{CONST } \text{text } lc \rightarrow rc.
\]

with calls of \( a \) replaced throughout the update plan by calls of \( \text{CONST} \). Note that the syntax of ambidextrous archetypes changed (see chapter 5) in Extended Update Plans to be more consistent with command archetypes.
4 Parallelism

Parallelism is inherent in Update Plans, as application of an update rule is atomic and many cells may be changed simultaneously as a result. However, relying purely on this mechanism may create update schemes of unmanageable length. The solution is to have a set of non-related update schemes, instantiations of which will be applied simultaneously in a parallel block. These instantiations will only be applied if the conditions for application are satisfied for every single update rule and their right-hand sides are mutually consistent. An alternative view of parallel blocks will be given in chapter 6.

Parallel blocks are delimited by the open parallel block symbol, ‘(∥’, and the close parallel block symbol, ‘∥)’. The use of the double pipeline symbol ‘∥’ is encouraged to separate the individual update schemes in a parallel block. An m update scheme parallel block can then be written as

\[
(∥ l_{hs_1} \{ g_1 \} \Rightarrow r_{hs_1}.
\]

\[
∥ l_{hs_2} \{ g_2 \} \Rightarrow r_{hs_2}.
\]

\[\
\]

\[
∥ l_{hs_m} \{ g_m \} \Rightarrow r_{hs_m}.
\]

or making use of typesetting possibilities as

\[
\]

\[
\]

\[
\]

\[
\]

\[
∥\]

To accommodate for parallel blocks, the following production rules need to be added to the basic Update Plans grammar.

\[
\]

\[
(\text{item}) \rightarrow (\text{parallel block})
\]

\[
(\text{parallel block}) \rightarrow "(∥" \ (\text{alternatives} \ "\cdot")^+ "∥")"
\]

Note that in Extended Update Plans not only alternatives can be in the body of a parallel block as will be shown in the second part of this thesis.

5 Examples

5.1 ADTs

The use of Update Plans is not limited to specification of hardware architectures although this is their primary aim. As has already been shown in this chapter, Update Plans can also be used to describe algorithms. This section shows their use in the description and implementation of abstract data types (ADTs), in particular singly linked lists.
An item of a list can be described as item[data item]. It consists of a payload data left of locator item and locator item addressing a following item. To make a list, items can be simply linked in a list of items, and a reference to its head will be added as shown in figure 1.3.

The easiest operation on a list is to find about its emptiness. The archetype is_empty simply checks whether the head of a list points to the constant NULL and expands the appropriate truth value.

Two versions of insert item operation are provided. The difference is only in the placement of item. Note that an additional update scheme inserting an item at the start of the list would be required to complete the ADT if a list without a dummy head and the second version of insert archetype was to be used.

Similarly to the second insert item archetype, the remove item archetype doesn't take into account removal of the first item in a list of linked items. Thus a similar function now removing the first item in a list would be required if a list without a dummy head was to be used.

Finally, the recursive length archetype calculates length of a list of items. The remaining ADT operations such as “retrieve n-th item” can be defined in a similar fashion.

5.2 Archetype expansion

The following example uses archetypes and an update scheme from chapter 2 to illustrate the archetype expansion mechanism. In this example the update scheme

\[ [2.2] \text{arithm}(x, r_1) \rightarrow \text{dop}(a, x) \Rightarrow a[r_1] \text{ cc}(r_1). \]

will be expanded into one update scheme showing the effects of the INC instruction in direct autodecrement mode from the PDP-11 instruction set.

The command scheme 2.2 expands as shown in figure 1.4. The archetypes and their index numbers from chapter 2 are included in figure 1.4 as comments for convenience. After substitution using equations derived at each step the full expansion of our update scheme is
### CHAPTER 1: UPDATE PLANS

<table>
<thead>
<tr>
<th>Rule</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>arithm(x, r_1) dop(a, x) \rightarrow a[r_1] cc(r_1).</code></td>
<td><code>- [2.2] arithm(x, x + 1) = INC. x = x, r_1 = x + 1</code></td>
</tr>
<tr>
<td><code>INCI dop(a, x) \rightarrow a[r_1] cc(r_1).</code></td>
<td><code>- [1.4] dop(a, v_1) = AUTODEC(a, v_1). a = a, x = v_1</code></td>
</tr>
<tr>
<td><code>INCI AUTODEC(a, v_1) \rightarrow a[r_1] cc(r_1).</code></td>
<td><code>- [1.1] AUTODEC(a, v_1) r = r[b] a[v_1] b \rightarrow r[a]. a = a, v_1 = v_1</code></td>
</tr>
<tr>
<td><code>INCI AUTODEC r r[b] a[v_1] b \rightarrow r[a] a[r_1] cc(r_1).</code></td>
<td><code>- [2.1] cc(v_2) = CC\text{\textsubscript{n}}[(v_2 &lt; s 0) (v_2 = 0) (\neg(MIN_s \leq v_2 \leq MAX_s)) (v_2 &gt; u MAX_u)]</code></td>
</tr>
</tbody>
</table>

![Figure 1.4: Example of archetype expansion](image)

A slightly more complex example of archetype expansion can also be found in chapter 3.
Chapter 2

PDP-11

The PDP-11 is one of the most widely known and used machines in the history of computing science, and its importance in mission-critical applications can hardly be questioned. Although one might argue the relevance of a specification of a historically dead machine (although sales of hardware products for the PDP-11 continued until as recently as September 1997), the very fact of its historical importance serves as a counterargument.

The instruction set of the PDP-11 was designed to provide a clean, general, symmetric instruction set. Word length is 16 bits with the leftmost, most significant bit being bit 15. There are eight general registers of 16 bits each. Register 7 is the program counter (PC) and, by convention, register 6 is the stack pointer (SP). There is also a Processor Status Register/Word (PSW) which among other things indicates the 4 condition code bits (N, Z, V, C).

In this chapter, a formal specification (based on [21, 23, 69]) of the DEC PDP-11 machine instruction set is given using the Update Plans formalism. Although some formal specifications of the PDP-11 instruction set already exist [20, 69], this chapter gives a clear, compact, unambiguous [60], and easy to follow specification which can serve as a comparison to those already existing.

1 Addressing modes

Basic addressing modes of the PDP-11 machine are given in 1.1. They define two results—the effective address of the value and the value (v) itself. The following list describes PDP-11 addressing modes informally. "The register" refers to one of eight general purpose registers (r), identified by an instruction word. "The displacement" refers to a word in the memory stored after an instruction (d).

- **register.** Direct access to a general register. The content of the selected register is taken as the value.
• **autoincrement.** At the start of an instruction’s execution, the register contains the address of the value, and after the value is accessed and the instruction is executed, the address is incremented.

• **autodecrement.** The register has been decremented, before the value is accessed at this new address.

• **index.** The content of the register is added to the displacement to produce the address of the value.

• **register deferred.** The address of the value is stored in the register.

• **autoincrement deferred.** The register contains a pointer to the address of the value. The pointer is automatically incremented after the value is retrieved.

• **autodecrement deferred.** The content of the register is first decremented and then contains a pointer to the address of the value.

• **index deferred.** The displacement is added to the base address stored in the register. The result is a pointer to the address of the value.

\[ \begin{align*}
\text{REG}(r, v) &\quad r \quad \text{reg}(r, v) \quad r \quad \text{r} \quad r \quad v \quad r \quad v \\
\text{AUTOINC}(b, v) &\quad r \quad b \quad r \quad b \quad c \quad c \quad c \quad \text{r} \quad c \quad c \quad c \\
\text{AUTODEC}(a, v) &\quad r \quad c \quad a \quad (v) \quad a \quad (v) \quad a \quad (v) \quad a \quad (v) \\
\text{INDEX}(b + d, v) &\quad r \quad d \quad r \quad b \quad d \quad v \quad r \quad b \quad d \quad v \\
\text{REGDEF}(b, v) &\quad r \quad b \quad \text{r} \quad b \quad v \quad v \\
\text{AUTOINCDEF}(b, v) &\quad r \quad b \quad b \quad v \quad b \quad v \quad b \quad v \quad b \quad v \\
\text{AUTODECDEF}(a, v) &\quad r \quad b \quad a \quad (v) \quad a \quad (v) \quad a \quad (v) \quad a \quad (v) \\
\text{INDEXDEF}(b, v) &\quad r \quad d \quad r \quad b \quad d \quad b \quad v \quad r \quad b \quad d \quad b \quad v \\
\end{align*} \]

In addition to these basic addressing modes there are 4 other special PC addressing modes. These addressing modes (1.2) come into effect when referencing the PC (register 7). Again, they define (apart from the immediate mode) two results—the effective address of the value and the value itself. “The word” refers to the contents of the location following the instruction.

• **PC immediate.** The word is the resulting value itself.

• **PC absolute.** The word is interpreted an address of the value.

• **PC relative.** The value’s address is calculated by adding the word to the PC.

• **PC relative deferred.** A pointer to the value’s address is calculated by adding the word to the PC.

\[ \begin{align*}
\text{IMM}(v) &\quad v \quad \text{im}(v) \quad v \quad v \\
\text{ABS}(a, v) &\quad a \quad (v) \quad \text{abs}(a, v) \quad a \quad (v) \quad a \quad (v) \\
\text{REL}(p, d, v) &\quad p \quad d \quad \text{rel}(p, d, v) \quad p \quad d \quad v \quad p \quad d \quad v \\
\text{RELDEF}(a, v) &\quad p \quad a \quad \text{rel}(a, v) \quad p \quad a \quad a \quad (v) \quad a \quad (v) \\
\end{align*} \]
Archetypes in 1.3 and 1.4 define two classes of addressing modes—the source (sop), and the destination (dop) operand.

\begin{align*}
\text{[1.3]} & \quad \text{sop}(v) = \text{REG}(-, v). & \text{[1.4]} & \quad \text{dop}(a, v) = \text{REG}(a, v). \\
& = \text{AUTOINC}(-, v). & & = \text{AUTOINC}(a, v). \\
& = \text{AUTODEC}(-, v). & & = \text{AUTODEC}(a, v). \\
& = \text{INDEX}(-, v). & & = \text{INDEX}(a, v). \\
& = \text{REGDEF}(-, v). & & = \text{REGDEF}(a, v). \\
& = \text{AUTOINCDEF}(-, v). & & = \text{AUTOINCDEF}(a, v). \\
& = \text{AUTODECDEF}(-, v). & & = \text{AUTODECDEF}(a, v). \\
& = \text{INDEXDEF}(-, v). & & = \text{INDEXDEF}(a, v). \\
& = \text{IMM}(-, v). & & = \text{ABS}(a, v). \\
& = \text{ABS}(-, v). & & = \text{REL}(a, v). \\
& = \text{REL}(-, v). & & = \text{RELDEF}(a, v). \\
& = \text{RELDEF}(-, v). & & &
\end{align*}

2 Instructions

2.1 Single operand instructions

In order to make the Update Plans specification closer to the real implementation, the following data types are defined. Promotions and conversions between these individual data types are considered to be defined elsewhere.

\{ \text{Nibble = Bit Bit Bit Bit, Byte = Nibble Nibble, Word = Byte Byte} \}.

The following text shows the layout of condition codes in the PSW register. We make use of the \text{CC}[\text{NZVC}] locators later on in this specification. The registers \text{CC}[\text{NZVC}] carry the negative, zero, overflow and carry bits respectively.

\[ b_n, b_2, b_v, b_c : : \text{Bit.} \]
\[ \text{nibble} : : \text{Nibble.} \]
\[ \text{byte} : : \text{Byte.} \]
\[ \text{PSW[byte nibble]}\text{CC}_n[b_n]\text{CC}_2[b_2]\text{CC}_v[b_v]\text{CC}_c[b_c]. \]

Archetype 2.1 sets the condition codes according to the value of the parameter \( v \).

\[ \text{cc}(v) = \text{CC}_n[(v <_s 0) \quad (v = 0) \quad \neg(\text{MIN}_s \leq v \leq \text{MAX}_s)) \quad (v > \text{MAX}_u)] \].

The constants \text{MIN}_s/\text{MAX}_s are the smallest/largest negative 16-bit two’s complement signed integer (-32768/32767). \text{MAX}_u is the largest 16-bit unsigned integer (65535). The operators \(<_s, \leq_s \) (signed comparison operators), >_u (unsigned comparison operator) and = are assumed to be defined elsewhere.
It is assumed that the standard arithmetic operators are already defined. The arithmetic instructions are then defined as archetypes.

\[2.2\]

\[
\begin{align*}
\text{arithm}(\_, 0) &= \text{CLR.} \quad \text{# clear destination} \\
\text{arithm}(x, \neg x) &= \text{COM.} \quad \text{# complement destination} \\
\text{arithm}(x, x + 1) &= \text{INC.} \quad \text{# increment destination} \\
\text{arithm}(x, x - 1) &= \text{DEC.} \quad \text{# decrement destination} \\
\text{arithm}(x, -x) &= \text{NEG.} \quad \text{# negate destination} \\
\text{arithm}(x, x + c) &= \text{ADC CC_c} \quad \text{# add carry to destination} \\
\text{arithm}(x, x - c) &= \text{SBC CC_c} \quad \text{# subtract carry from destination} \\
\text{arithm}(x, x/2) &= \text{ASR.} \quad \text{# arithmetic shift right destination} \\
\text{arithm}(x, x \times 2) &= \text{ASL.} \quad \text{# arithmetic shift left destination} \\
\text{arithm}(\_, -1 \times n) &= \text{STX CC_0} \quad \text{# sign extend destination} \\
\text{arithm}(x, r) \, \text{dop}(a, x) &= a[x \, cc(r)].
\end{align*}
\]

This specification covers mainly integer operand instructions. The byte operand instructions could easily be defined. For example, SWAB is defined by

\[2.3\]

\[
\begin{align*}
v_1, v_0 :: \text{Byte.} \\
\text{SWAB dop}(a, \_) &= a[v_1 \, v_0] \Rightarrow a[v_0 \, v_1]. \quad \text{# swap bytes of destination}
\end{align*}
\]

The specification of INCB is slightly more complicated. Here is an example of how the specification would change if we were to specify the instruction set including the byte operand instructions.

\[2.4\]

\[
\begin{align*}
\text{arithm}(x, x + 1) &= \text{INC} \Rightarrow \text{W.} \quad \text{# increment destination} \\
\text{arithm}(x, x + 1) &= \text{INCB} \Rightarrow \text{B.} \quad \text{# increment destination byte} \\
\text{arithm}(x, r) \, \text{dop}(a, x) &= rcc(a, r, \text{arithm}(\_, \_)).
\end{align*}
\]

The other byte/word operand pairs can be defined similarly. The constants \(W\) and \(B\) are used to pass typing information to the new condition code archetype \(rcc\). Note that the left-hand side call of \(\text{arithm}\) occurs in the text part of the left-hand side, while the right-hand side call is a parameter of the \(rcc\) archetype so that, e.g., the INC in update scheme 2.4 will appear in the text of the expansion, while the constant \(W\) will be passed to \(rcc\) on the right-hand side. A small example is shown in figure 2.1. As this is a simple example (of a partial expansion), parameters are resolved every time an archetype is expanded. Some extra archetypes, which are used for byte operand instructions are shown in 2.5.

\[2.5\]

\[
\begin{align*}
\text{type}(W) &= \text{Word}. \\
\text{type}(B) &= \text{Byte}. \\
\text{rcc}(a, v, w) &= cc(v, w) \Rightarrow a[(v :: \text{type}(w))]. \\
\text{cc}(v, W) &= \text{CC_0}((v <\leq 0) (v = 0) (\neg((\text{MIN}_s \leq v \leq s \text{MAX}_s)) (v >u \text{MAX}_u))]. \\
\text{cc}(v, B) &= \text{CC_0}((v <\leq 0) (v = 0) (\neg((\text{MIN}_B \leq bs v \leq s \text{MAX}_B)) (v >bu \text{MAX}_B)].
\end{align*}
\]
arithmetic(x, x) \implies rcc(a, x, arithmetic(x, -)). \hfill \# [2.4]

\# [2.4] arithmetic(x, x + 1) = INCB \implies B. \quad x = x, r = x + 1

INCB \text{ dop}(a, x) \implies rcc(a, x + 1, B).

\# [2.5] rcc(a, v, w) = \text{cc}(v, w) | \implies a[[v :: \text{type}(w)]]. \quad a = a, x + 1 = v, B = w

INCB \text{ dop}(a, x) \implies \text{cc}(x + 1, B) a[(x + 1 :: \text{type}(B))].

\# [2.5] \text{cc}(v, B) = \text{CCN}[(v <_{bs} 0) (v = 0) (\neg (\text{MIN} \_B_{bs} \leq_{bs} v \leq_{bs} \text{MAX} \_B_{bs})) (v >_{bs} \text{MAX} \_B_{bs})] \\ x + 1 = v

INCB \text{ dop}(a, x) \implies \text{CCN}[(x + 1 <_{bs} 0) (x + 1 = 0) (\neg (\text{MIN} \_B_{bs} <_{bs} x + 1 <_{bs} \text{MAX} \_B_{bs})) (x + 1 >_{bs} \text{MAX} \_B_{bs})] a[(x + 1 :: \text{type}(B))].

Figure 2.1: An example of archetype expansion

Again, the byte constants \text{MIN} \_B_{bs}, \text{MAX} \_B_{bs}, \text{MAX} \_B_{us}, and the byte operators $<_{bs}, \leq_{bs}, >_{bs}$ have similar meaning to their integer (word) counterparts, and are assumed to be defined elsewhere.

\[2.6\] \text{TST} \text{ dop}(-, x) \implies \text{cc}(x). \hfill \# \text{test destination}

\begin{align*}
\text{ROR} \text{ dop}(a, x) \text{ \text{CCN}[c]} & \implies a[(x \gg 1)][(c \ll 15)] \text{ \text{CCN}[x \& 1 :: \text{Bit}]]. \\
\text{ROL} \text{ dop}(a, x) \text{ \text{CCN}[c]} & \implies a[(x \ll 1)] [c] \text{ \text{CCN}[(x \gg 15) \& 1 :: \text{Bit}]].
\end{align*}

The \text{TST} instruction sets the condition codes according to the contents of the destination word.

\text{ROR/ROL}: the PSW C-bit and the destination word (considered as a 17-bit "word") is rotated right/left one bit. The vacated high/low-order bit (bit 15/0) is loaded with the contents of the C-bit. The low/high-order bit of the destination (bit 0/15) is loaded into the C-bit.

### 2.2 Double operand instructions

The \text{MOV} instruction moves a copy of the source word contents to the destination. Its command archetype definition is given in 2.7.

\[2.7\] \text{MOV} \text{ sop}(v) \text{ dop}(a, -) \implies a[v].

The following arithmetic and bit-wise instructions perform the specified operations, and set the destination operand and the condition codes in accordance with the result.

\[2.8\] arithmetic(x, y, x + y) = \text{ADD}. \hfill \# \text{add source to destination}

\begin{align*}
\text{arithmetic}(x, y, x - y) & = \text{SUB}. \hfill \# \text{subtract source from destination} \\
\text{arithmetic}(x, y, \neg x \& y) & = \text{BIC}. \hfill \# \text{bit clear destination from source} \\
\text{arithmetic}(x, y, x \mid y) & = \text{BIS}. \hfill \# \text{bit set destination from source} \\
\text{arithmetic}(x, y, x) \text{ sop}(x) \text{ dop}(a, y) & \implies a[x] \text{ \text{cc}(r)}.
\end{align*}

The comparison instructions are similar to the above instructions, however, the destination operand is not affected.
2.9 \( \text{cmps}(x, y, x - y) = \text{CMP} \). \# compare source to destination
\( \text{cmps}(x, y, x \& y) = \text{BIT} \). \# bit test source and destination bytes
\( \text{cmps}(x, y, x) \text{ sop}(x) \text{ dop}(., y) \Rightarrow \text{cc}(x) \).

2.3 Condition code and program flow operations

The condition code operations \( \text{SEc} \) and \( \text{CLc} \), where \( c \) is any of the condition codes \( N, Z, V, \) and \( C \) are shown in 2.10. The \( \text{SEc} \) operations set the condition codes, and \( \text{CLc} \) clear the condition code. \text{TRUE} and \text{FALSE} are predefined constants of the type \text{Bit}, values of which are 1 and 0 respectively.

\[\begin{align*}
\text{SE}N & \Rightarrow \text{CC}N[\text{TRUE}]. \quad \text{CL}N \Rightarrow \text{CC}N[\text{FALSE}]. \\
\text{SE}Z & \Rightarrow \text{CC}Z[\text{TRUE}]. \quad \text{CL}Z \Rightarrow \text{CC}Z[\text{FALSE}]. \\
\text{SE}V & \Rightarrow \text{CC}V[\text{TRUE}]. \quad \text{CL}V \Rightarrow \text{CC}V[\text{FALSE}]. \\
\text{SE}C & \Rightarrow \text{CC}C[\text{TRUE}]. \quad \text{CL}C \Rightarrow \text{CC}C[\text{FALSE}]. \\
\text{S}CC & \Rightarrow \text{CC}N[\text{TRUE}] \text{ CC}Z[\text{TRUE}] \text{ CC}V[\text{TRUE}] \text{ CC}C[\text{TRUE}]. \\
\text{CCC} & \Rightarrow \text{CC}N[\text{FALSE}] \text{ CC}Z[\text{FALSE}] \text{ CC}V[\text{FALSE}] \text{ CC}C[\text{FALSE}].
\end{align*}\]

The full suite of program flow instructions is given in the following definitions. First, the conditional branches are given (2.11), and then the unconditional jump is defined in 2.12.

\[\begin{align*}
\text{branch}(\text{true}) & = \text{BR}. \quad \# \text{branch always} \\
\text{branch}(\neg z) & = \text{BNE} \text{ CC}Z[z]. \quad \# \neq 0 \\
\text{branch}(z) & = \text{BEQ} \text{ CC}Z[z]. \quad \# = 0 \\
\text{branch}(\neg(n \wedge v)) & = \text{BGE} \text{ CC}N[n] \text{ CC}V[v]. \quad \# \geq 0 \\
\text{branch}(n \wedge v) & = \text{BLT} \text{ CC}N[n] \text{ CC}V[v]. \quad \# < 0 \\
\text{branch}(\neg(z)(n \wedge v))) & = \text{BGT} \text{ CC}Z[z] \text{ CC}N[n] \text{ CC}V[v]. \quad \# > 0 \\
\text{branch}(z)(n \wedge v)) & = \text{BLE} \text{ CC}Z[z] \text{ CC}N[n] \text{ CC}V[v]. \quad \# \leq 0 \\
\text{branch}(\neg n) & = \text{BPL} \text{ CC}N[n]. \quad \# + \\
\text{branch}(n) & = \text{BMI} \text{ CC}N[n]. \quad \# - \\
\text{branch}(\neg(c|z)) & = \text{BHI} \text{ CC}C[c] \text{ CC}Z[z]. \quad \# \text{higher (unsigned comparison)} \\
\text{branch}(\neg c) & = \text{BCC} \text{ CC}C[c]. \quad \# \text{carry clear} \\
& = \text{BHIS} \text{ CC}C[c]. \quad \# \text{higher or same (unsigned compar.)} \\
\text{branch}(c) & = \text{BCS} \text{ CC}C[c]. \quad \# \text{carry set} \\
& = \text{BLO} \text{ CC}C[c]. \quad \# \text{lower (unsigned comparison)} \\
\text{branch}(c|z) & = \text{BLOS} \text{ CC}C[c] \text{ CC}Z[z]. \quad \# \text{lower or same (unsigned compar.)} \\
\text{branch}(\neg v) & = \text{BVC} \text{ CC}V[v]. \quad \# \text{overflow clear} \\
\text{branch}(v) & = \text{BVS} \text{ CC}V[v]. \quad \# \text{overflow set}
\end{align*}\]

\( \text{PC}[pc] \text{ pc}[\text{branch}(c) \text{ d|cp} = [c] \Rightarrow \text{PC}[cp + d]; \quad \# \text{branch} \\
& \Rightarrow \text{PC}[cp]. \quad \# \text{no branch} \)
All branch instructions jump relative to the contents of the program counter based on condition codes. The displacement \( d \) is contained in the instruction word. The target absolute address of the jump instruction defined by the following command is dependent on an addressing mode encoded in the instruction word.

[2.12] \( \text{JMP } d_{op(-, a)} \Rightarrow PC[a] \).

The following register addressing archetype is defined for convenience. Note that use cannot be made of the register addressing command archetype defined in 1.1, since it expands the additional REG text.

[2.13] \( \text{reg}(r, v) = r[v] \).

JSR calculates the destination address \( (a) \), saves the contents \( (v) \) of the source register \( (r) \) on the hardware stack, and saves the address of the following instruction in the source register. (This saving of the PC provides the linkage from the subroutine back to the calling program.) Finally, the PC is given the destination address, which produces the actual jump to the subroutine. Return to the calling routine is provided by the companion instruction RTS. The contents of the specified register are loaded into the PC, and the top of the hardware stack is popped into the register. Note that register 6 is by convention the stack pointer.

[2.14] \( \text{PC}[pc] \text{pc}[\text{JSR } r_{dop(a, -)}]qc \text{reg}(r, v) \text{reg}(6, tp) \Rightarrow \text{PC}[a] \text{reg}(6, sp) sp[v]tp \text{reg}(r, qc). \)

\( \text{RTS } r \text{reg}(r, pc) \text{reg}(6, sp) sp[v]tp \Rightarrow \text{PC}[pc] \text{reg}(6, tp) \text{reg}(r, v). \)

### 2.4 Interrupts

Software interrupt instructions perform a trap through the vector at (fixed) memory locations. The effect of all of these instructions is the same: stack the contents of the PC and PSW (program status word), and then load the PC and PSW with the contents of a fixed memory location. The following definitions specify interrupt traps.

[2.15] \( \text{trap}(148, 168) = \text{BPT. } \# \text{breakpoint trap} \)
\( \text{trap}(208, 228) = \text{IOT. } \# \text{input/output trap} \)
\( \text{trap}(308, 328) = \text{EMT. } \# \text{emulator trap} \)
\( \text{trap}(348, 368) = \text{TRAP. } \# \text{TRAP} \)
\( \text{trap}(\text{pc}a, \text{psw}a) \text{reg}(6, tp) \text{pc}a[pc] \text{psw}a[psw] \text{PC}[pc] \text{PSW}[psw] \Rightarrow \text{sp}[pc] \text{psw}a[tp] \text{reg}(6, sp) \text{PC}[pc] \text{PSW}[psw]. \)

Returns from interrupts, on the other hand, restore the original contents (just before an interrupt) of the PC and PSW registers from stack.

[2.16] \( \text{ret()} = \text{RTI. } \# \text{return from interrupt} \)
\( = \text{RTT. } \# \text{return from trap} \)
\( \text{ret()} \text{reg}(6, sp) \text{sp}[pc \text{psw}]tp \Rightarrow \text{PC}[pc] \text{PSW}[psw] \text{reg}(6, tp). \)
2.5 Other instructions

The **NOP** instruction is the standard *no operation* instruction. It produces no effect, but occupies one word of memory and requires a complete execution cycle for execution.

\[ \text{NOP} \rightarrow .\# \text{ no operation} \]

3 Conclusions

A significant subset of the PDP-11 instruction set has been described using the Update Plans formalism. Although this specification adheres strictly to the Update Plans formalism as described in [60], it already raises several issues addressed in the second part of this thesis. In particular, it prompted the development of sequential update schemes and archetypes as explained in chapter 6. A new, more complex specification would be based on the concepts described in the same chapter, possibly on a fetch/execute cycle level description. This specification is rather an abstract one. Use of the syntactic extensions described in the second part of the thesis could be made, together with additional typing information to define the precise layout of instruction words, as is done, for example, in the following chapter.
Chapter 3

SPARC-V9

The SPARC has been implemented in processors used in a range of computers from laptops to supercomputers, and today boasts over 8,000 software application programs. SPARC-V9, like its predecessor SPARC-V8, is a modern microprocessor specification created by the SPARC Architecture Committee of SPARC International. SPARC-V9 is not a specific chip. It is an architectural specification that can be implemented as a microprocessor by those obtaining a license from SPARC International.

In this chapter, a formal specification of the SPARC-V9 abstract machine instruction set is given using the Update Plans formalism. Although a partial specification of the SPARC-V9 architecture already exists [50], this is aimed at a different level of description.

The structure of this chapter is as follows. Section 1 defines types and constants used throughout the specification. The SPARC-V9 has a rich set of registers using a relatively new concept of windowing which is both informally described and then specified using Update Plans in section 2. The most important instructions of this architecture are specified in section 3. An example taken from the specification is shown in section 4. Finally, conclusions and some future research suggestions are given section 5.

1 Types and constants

Throughout the specification, the following types are used. The remaining types used in this specification which are not displayed here are declared in sections of their immediate use. Bit is considered to be a primitive type having the usual meaning.

\[
\begin{align*}
\text{Bit} & \text{,} \\
\{\text{Bit}\} & = \{\text{Bit}\}_{8}, \text{Halfword} = \{\text{Bit}\}_{16}, \text{Word} = \{\text{Bit}\}_{32}, \\
\text{Extended word} & = \{\text{Bit}\}_{64}, \\
\{\text{Fp-single}\} & = \{\text{Bit}\}_{32}, \{\text{Fp-double}\} = \{\text{Bit}\}_{64}, \{\text{Fp-quad}\} = \{\text{Bit}\}_{128}.
\end{align*}
\]

\[\text{Scalable Processor ARCHitecture}\]
In Update Plans, constants are uppercase words. However, as will be shown in chapter 5 any sequence of letters (store) can be made into a constant by assigning it a type Constant. The main reason for listing these constants is to show their bit values, supply grounding information where necessary, and use names easily identifiable against the official (informal) description of the SPARC-V9 architecture [77].

nPC :: Constant.
IMM13(12) :: Bit.
BRZ(012), BRLEZ(102), BRLZ(112) :: {Bit}2.
ADD(0002), SUB(1002), SAVE(1002), RESTORE(1012) :: {Bit}3.
BPA(10002), BPN(00002), BPNE(10012), BPE(00012) :: {Bit}4.
LDSTUB(0011012) :: {Bit}6.
FADD(00100002), FSUB(00100012) :: {Bit}7.
REGRES(0000000002) :: {Bit}9.

2 Registers

A SPARC-V9 processor includes two types of registers: general-purpose, or working data registers, and control/status registers. In the following sections only integer and floating point registers are described. For the structure of control/status registers, please refer to the informal specification [77].

2.1 General purpose r registers

An implementation of the instruction unit may contain anything from 64 to 528 general-purpose 64-bit r registers. At any time, an instruction can access the first 8 global registers (r[0-7]), and a 24-register window into the r registers. A register window comprises the 8 'in' and 8 'local' registers, together with the 8 'in' registers of an adjacent register set, which are addressable from the current window as 'out' registers. See figure 3.1 for more graphic explanation.

The following two sets of archetypes are provided to facilitate reading (2.1) and writing (2.2) of r registers, and expansion of a 5-bit (a :: {Bit}5) register address in an instruction word.

[2.1] \( rr(0,0) = . \) # reading r[0] yields 0
\[ rr(a,v) = a[v] \]
\[ = \begin{cases} 
  \text{CWP}[w] \text{ outr}(a-8,v,w) & \text{if } 8 \leq a < 16 \\
  \text{CWP}[w] \text{ locr}(a-16,v,w) & \text{if } 16 \leq a < 24 \\
  \text{CWP}[w] \text{ inr}(a-24,v,w) & \text{if } 24 \leq a < 32 
\end{cases} \] # global registers
CHAPTER 3: SPARC-V9

Figure 3.1: Register Window 0

CWP is a constant locator pointing to the left boundary of the CWP (Current Window Pointer) register containing a 5-bit (w :: {Bit}5) address of the current 24 r register window.

A typical use of archetypes rr is through archetypes sop1 and sop2 (3.1) both of which will be described in section 3.1.

\[ \text{rw}(a, v) \quad a = \text{CWP}[w] \times \text{rw}(a, v, w). \]

\[ \text{rw}(a, v, w) = \begin{cases} 0 = a \to & \text{# writing } r[0] \text{ has no effect} \\ 1 \leq a < 8 \to & a[v]. \quad \text{# global registers} \\ 8 \leq a < 16 \to & \text{outw}(a - 8, v, w). \\ 16 \leq a < 24 \to & \text{locw}(a - 16, v, w). \\ 24 \leq a < 32 \to & \text{inw}(a - 24, v, w). \end{cases} \]

Global register zero r[0] always reads as zero (archetype 2.1), and writes to it have no program-visible effect (archetype 2.2).

\[ \text{out}(a, v, w) = \text{ilocr}(a, v, w). \]

\[ \text{outw}(a, v, w) = \text{ilocrw}(a, v, w). \]

\[ \text{locr}(a, v, w) = \text{ilocr}(a + 8, v, w). \]

\[ \text{locw}(a, v, w) = \text{ilocrw}(a + 8, v, w). \]

\[ \text{inr}(a, v, w) = \text{ilocr}(a, v, w - 1). \]

\[ \text{inw}(a, v, w) = \text{ilocrw}(a, v, w - 1). \]

Archetypes outr/outw, locr/locw and inr/inw (used to read/write access 'out', 'local' and 'in' registers) are defined in terms of archetypes ilocr/ilocw (2.4) which calculate left locators for these 3 types of r registers.

\[ \text{ilocr}(a, v, w) = (NWINDOWS - 1 - (w \% NWINDOWS)) \times 16 + a[v]. \]

\[ \text{ilocw}(a, v, w) = (NWINDOWS - 1 - (w \% NWINDOWS)) \times 16 + a[v]. \]

The number of windows or register sets, NWINDOWS, is implementation-dependent and ranges from 3 to 32. Note that r register with address o, where 8 ≤ o < 16, refers to exactly the same register as o + 16 does after the CWP is incremented by 1 (% NWINDOWS). Likewise, r register with address i, where 24 ≤ i < 32, refers to exactly the same register as address i - 16 does after the CWP is decremented by 1 (% NWINDOWS).
2.2 Floating-point f registers

The floating-point unit contains

- 32 single-precision (32-bit) floating-point registers, numbered \( f[0], f[1], \ldots, f[31] \)
- 32 double-precision (64-bit) floating-point registers, numbered \( f[0], f[2], \ldots, f[62] \)
- 16 quad-precision (128-bit) floating-point registers, numbered \( f[0], f[4], \ldots, f[60] \)

The floating-point registers are arranged so that some of them overlap, that is, are aliased. Layout and numbering of the floating-point registers is apparent from the following archetypes.

\[
\begin{align*}
fr(a, v, s) a &= a[(v::Fp\text{-single})] =\{s = 012\} \Rightarrow .
\[2.5\]
fr(a, v, s) a &= a[(v::Fp\text{-double})] =\{s = 102\} \Rightarrow .
\[2.5\]
fr(a, v, s) a &= a[(v::Fp\text{-quad})] =\{s = 112\} \Rightarrow .
\[2.5\]
fw(a, v, s) a &= \equiv\{s = 012\} \Rightarrow a[(v::Fp\text{-single})].
\[2.5\]
fw(a, v, s) a &= \equiv\{s = 102\} \Rightarrow a[(v::Fp\text{-double})].
\[2.5\]
fw(a, v, s) a &= \equiv\{s = 112\} \Rightarrow a[(v::Fp\text{-quad})].
\[2.5\]
\end{align*}
\]

Archetypes (2.5) are provided to encapsulate reading (\( fr \)) and writing (\( fw \)) of different types of floating-point values into \( f \) registers. Similarly to \( r \) registers, address \( a \) of an \( f \) register is 5-bit wide (\( a :: \{\text{Bit}\}5 \)).

Unlike the windowed \( r \) registers, all of the floating-point registers are accessible at any time.

3 Instructions

3.1 Arithmetic and logical operations

Arithmetic and logical instructions (and some others) use the instruction word format shown in detail in figure 3.2. The value (major opcode) of two highest bits of all instructions falling into this category is 10, for which a constant OP:ARM is used. Other fields in the instruction word are 5-bit addresses of source (\( rs1, rs2 \)) and destination (\( rd \)) registers, minor opcode \( op3 \), and if bit 13 is set to 1, a 13-bit immediate value \( simm13 \) always sign-extended to 64 bits.

In order to make our specification as compact and clear as possible, a frequent use of the archetype mechanism is made. Firstly, we define archetypes for accessing values in source registers, which at the same time describe the layout of the instruction word.

\[
\begin{align*}
sop1(v) &= rr(a, v) a.
sop2(v) &= \text{REGRES}(v).
\end{align*}
\]

Command archetypes \( \text{REGRES} \) and \( \text{IMM13} \) help in defining the instruction word format and return a value of a source operand in variable \( v \). The function \( \text{sign\_ext} \) sign-extends an integer value and is assumed to be defined elsewhere.
Secondly, there must be a way of reading and writing carry bit in the condition codes register. Archetypes 3.3 and 3.4 do exactly that. Note that arithmetic and logical instructions always read the carry bit from integer condition code register (ICC), whereas they write to both ICC and the extended integer condition code (XCC) registers.

Archetype \textit{iccrb} reads the carry flag to \( c \), if the carry bit in the op3 field (bit 22) is set. Similarly, \textit{ccwb} sets the condition code registers XCC and ICC (only) if the ccwb bit (23) in the instruction is 1.

The constants \( \text{MIN}_{n} \text{/MAX}_{n} \) are the smallest/largest negative \( n \)-byte two's complement signed integer. \( \text{MAX}_{n} \) is the largest \( n \)-byte unsigned integer. The \( n \)-byte operators \( <_{n}, \leq_{n}, >_{n}, \geq_{n} \) (signed comparison operators), \( >_{u} \) (unsigned comparison) and \( = \) are assumed to be defined elsewhere.

Archetype \textit{aloper} (3.5) shows the behaviour of 8 different arithmetic instructions and defines the layout of their op3 field making use of the previously defined archetypes \textit{iccrb} and \textit{ccwb}. These instructions include add/subtract, add/subtract and modify condition codes, add/subtract with carry, and add/subtract with carry and modify condition codes. The $n$ symbol used in archetype bodies is used to reference the archetype's $n$th parameter. Thus the $\S 3$ of the first \textit{aloper} archetype is merely shorthand for saying '\( x + y + c \)'.
### 3.2 Register window manipulation instructions

All \( r \) registers (apart from the first eight) are windowed. This means that there must be a mechanism to access registers in various windows. There are two instructions that do that—SAVE and RESTORE.

The SAVE instruction provides the routine executing it with a new register window. The 'out' registers from the old window become the 'in' registers of the new window.
The **RESTORE** instruction restores the register window saved by the last **SAVE** instruction executed by the current process. The 'in' registers of the old window become the 'out' registers of the new window. The 'in' and 'local' registers in the new window contain the previous values.

\[
\begin{align*}
[3.9] & \quad \text{sr}(x,y,x+y+c,w+1) = \\
& \quad 1_0 \text{ ccwb}($3) \text{ iccr}(c) \text{ SAVE CWP}[w] \Longrightarrow . \quad \# \text{SAVE} = 1002 \\
& \quad \text{sr}(x,y,x+y+c,w-1) = \\
& \quad 1_0 \text{ ccwb}($3) \text{ iccr}(c) \text{ RESTORE CWP}[w] \Longrightarrow . \quad \# \text{RESTORE} = 1012 \\
& \quad \text{OP:ARM} \text{ rw}(a,r,w) \text{ a sr}(x,y,r,w) \text{ sop}1(x) \text{ sop}2(y) \Longrightarrow \text{CWP}[w \% \text{NWINDOWS}].
\end{align*}
\]

Furthermore, **SAVE** and **RESTORE** behave like an ordinary **ADD** instruction, except that the source operands \(r[rs1]\) and/or \(r[rs2]\) are read from the old window (that is, the window addressed by the original CWP) and the sum is written into \(r[rd]\) of the new window (that is, the window addressed by the new CWP).

### 3.3 Load/Store instructions

#### 3.3.1 Load-store unsigned byte

The load-store unsigned byte instruction copies a byte from memory into \(r[rd]\), and then rewrites the addressed byte in memory to all ones. The fetched byte is right-justified in the destination register \(r[rd]\) and zero-filled on the left. The format of this instruction is the same as the format of arithmetic instructions, with the exception of the major opcode.

\[
\begin{align*}
[3.10] & \quad r :: \text{Byte}. \\
& \quad \text{ldstub}(a,r) = \text{LDSTUB} a[r] \Longrightarrow a[111111112]. \quad \# \text{LDSTUB} = 0011012 \\
& \quad \text{OP:LS} \text{ rw}(\_,r) \text{ ldstub}(x+y,r) \text{ sop}1(x) \text{ sop}2(y) \Longrightarrow . \quad \# \text{OP:LS} = 112
\end{align*}
\]

#### 3.3.2 Load integer from alternate space

The load integer from alternate space instructions copy a byte, a halfword, a word or an extended word from memory into \(r[rd]\). A fetched byte, halfword, or word is right-justified in the destination register; it is either sign-extended or zero-filled on the left (to 64 bits), depending on whether the opcode specifies a signed or unsigned operation, respectively.

For each instruction access and each normal data access, the IU appends an 8-bit address space identifier (ASI) to the 64-bit memory address. Load/Store alternate instructions can provide an arbitrary ASI with their data addresses (if bit 13 of the instruction word = 0), or use the ASI value contained in the ASI register (if bit 13 of the instruction word = 1).

Archetypes defined in 3.11, 3.13 and the type alias declaration in 3.12 are provided in order to make the final command update scheme in 3.14 more compact.
CHAPTER 3: SPARC-V9

Figure 3.3: Format of load/store integer or from/into alternate space instructions

[3.11] signed(s) = zero_fill = \{ s = 0 \}\Rightarrow .
      = sign_ext = \{ s = 1 \}\Rightarrow .

[3.12] \{ Integer = Byte | Halfword | Word \}.

The archetype 3.13 makes sure that we get the correct value of ASI regardless of the value
of bit 13 in the instruction field, fetches values x and y from source registers, and sets the
format of the instruction word for bits 0-18.

     sopasi(x, y, asi) = sop1(x) 02 asi sop1(y).
     = sop1(x) IMM13(y) ASI[asi].

Load/Store instructions use rather inconsistently different bit values to denote the type
Extended_word, and that is why a separate archetype lias is provided for this type. The
ASI a locator points to a normal or alternative address area.

[3.14] s :: Bit.
     lias(asi, a, signed(s) (r)) = s 02 Integer asia[r :: Integer]].
     lias(asi, a, r) = 10112 asi:a[(r :: Extended_word)].
     OP: LS rw(-, r) 012 lias(asi, x + y, r) sopasi(x, y, asi) \Rightarrow .

3.3.3 Store integer into alternate space

The store integer into alternate space instructions copy the whole extended (64-bit) integer,
the less-significant word, the least-significant halfword, or the least-significant byte of r[rd]
into memory. Again, due to the inconsistency described in the previous section a separate
archetype sias is provided for the type Extended_word.

[3.15] sias(asi, a, r) = 012 Integer \Rightarrow asia[(r :: Integer)].
     sias(asi, a, r) = 11102 \Rightarrow asia[(r :: Extended_word)].
     OP:LS rr(-, r) 012 sias(asi, x + y, r) sopasi(x, y, asi) \Rightarrow .
3.4 Floating-point instructions

3.4.1 Floating-point ADD and SUB instructions

The floating-point add instructions add the floating-point register specified by the rs1 field and the floating-point register specified by the rs2 field, and write the sum into the floating-point register specified by the rd field. The floating-point subtract instructions subtract the floating-point register specified by the rs2 field from the floating-point register specified by the rs1 field, and write the difference into the floating-point register specified by the rd field. Note that the command archetype in 3.16 makes use of archetypes fr defined in section 2.2 which read appropriate floating-point type values (based on the value in the two-bit cell s) into variables x and y. The archetype fw writes the result (r) into a floating-point register specified by the rd field, again, its type is based on the value of s.

\[ s :: \{\text{Bit}\}2. \]

\[ \text{farithm}(x, y, x + y) = \text{FADD}. \# \text{FADD} = 00100002 \]
\[ \text{farithm}(x, y, x - y) = \text{FSUB}. \# \text{FSUB} = 00100012 \]
\[ \text{OP:ARM } \text{fw}(\_, r, s) 1101002 \text{ fr}(\_, x, s) \text{ farithm}(x, y, r) s \text{ fr}(\_, y, s) \implies . \]

3.4.2 Floating-point compare

These instructions compare the floating-point register specified by the rs1 field with the floating-point register specified by the rs2 field, and set the selected floating-point condition code (fccn) as specified by the 3.17 archetype.

\[ f\text{cmp}(x, y) = \begin{cases} \text{\&[ } x = y \text{ ]} & 002. \\ \text{\&[ } x < y \text{ ]} & 012. \\ \text{\&[ } x > y \text{ ]} & 102. \\ \text{\&[ } x \ ? y \text{ ]} & 112. \# \text{unordered (x or/and y is NaN)} \end{cases} \]

Archetypes fccnr and fccnw read/write a selected floating point condition code field, and finally, update scheme 3.19 makes use of all the archetypes from this section to specify floating-point compare instruction and its field layout.

\[ v :: \{\text{Bit}\}2. \]

\[ \text{fccnr}(fcc, v) = \text{FSR:FCC}:fcc[v]. \]
\[ \text{fccnw}(fcc, v) = \implies \text{FSR:FCC}:fcc[v]. \]
### 3.4.3 Load floating-point from alternate space

The load single floating-point from alternate space instruction LDFA (see figure 3.6) copies a word from memory into f[rd]. The load doubleword floating-point from alternate space instruction (LDDFA) copies a word-aligned double-word from memory into a double-precision floating-point register. The load quad floating-point from alternate space instruction (LDQFA) copies a word-aligned quadword from memory into a quad-precision floating-point register.

The command update scheme in 3.20 makes use of archetypes defined earlier in 2.5 and 3.13.

\[
\begin{align*}
[3.20] \text{ldfa(asi, a, r, 01)} &= 00_2 \text{ asi:a[(r :: Fp.single)]}. \ # \text{LDFA} \\
\text{ldfa(asi, a, r, 10)} &= 11_2 \text{ asi:a[(r :: Fp.double)]}. \ # \text{LDDFA} \\
\text{ldfa(asi, a, r, 11)} &= 10_2 \text{ asi:a[(r :: Fp.quad)]}. \ # \text{LDQFA}
\end{align*}
\]

\[
\text{OP:LS fr(-, r, s) 110101 \text{ ldfa(asi, x+y, r, s) sopasi(x, y, asi) }}
\]

### 3.4.4 Store floating-point into alternate space

The store single floating-point into alternate space instruction (STFA) copies f[rd] into memory. The store doubleword floating-point into alternate space instruction (STDFA) copies a doubleword from a double floating-point register into a word-aligned doubleword in memory. The store quad floating-point into alternate space instruction (STQFA) copies the contents of a quad floating-point register into a word-aligned quadword in memory.

Similarly to load floating-point instructions from ASI (and load/store integer from/into ASI) store floating-point instructions into ASI contain the ASI to be used for the load in the imm.asi field if bit 13 = 0 in the instruction word, or in the ASI register if the same bit (i) reads as 1. The effective address for these instructions is r[rs1] + r[rs2] if i = 0, or r[rs1] + sign_ext(simm13) if i = 1.

\[
\begin{align*}
[3.21] \text{stfa(asi, a, r, 01)} &= 00_2 \text{ asi:a[(r :: Fp.single)]}. \ # \text{STFA} \\
\text{stfa(asi, a, r, 10)} &= 11_2 \text{ asi:a[(r :: Fp.double)]}. \ # \text{STDFA} \\
\text{stfa(asi, a, r, 11)} &= 10_2 \text{ asi:a[(r :: Fp.quad)]}. \ # \text{STQFA}
\end{align*}
\]

\[
\text{OP:LS fr(-, r, s) 110100 \text{ stfa(asi, x+y, r, s) sopasi(x, y, asi) }}
\]
3.5 Control transfer instructions

3.5.1 Jump and link

One of the simplest control transfer instructions is the JMPL instruction. The JMPL writes the contents of the PC (program counter), which points to the JMPL instruction itself, into the \( r[d] \) register\(^2\) and then causes a delayed transfer of control (the register nPC [next PC] is written) to the address given by \( r[rs1] + r[rs2] \) or \( r[rs1] + \text{sign-ext}(\text{simm13}) \) based on the value of bit 13 of the instruction word. Note that its format is the same as that of arithmetic instructions (figure 3.2). The value 111000\(_2\) is the content of the op3 instruction field and simply distinguishes the JMPL instruction from other (arithmetic) instructions.

\[ 3.22 \text{ PC}[pc] \rightarrow \text{nPC}[npc] = \text{OP: ARM \text{rw}(-, pc) 111000_2 \quad sop1(x) \quad sop2(y)} \] 

3.5.2 Branch on integer register with prediction (BPr)

These instructions branch based on the contents of \( r[rs1] \). They treat the register contents as a signed integer value. A BPr instruction examines all 64 bits of \( r[rs1] \) according to the cond field of the instruction, producing either a TRUE or FALSE result. If TRUE, the branch is taken; that is, the instruction causes a PC-relative, delayed control transfer to the address \( \text{PC} + (4 \times \text{sign-ext}(d16hi | d16lo)) \). If FALSE, the branch is not taken. If the branch is taken, the delay instruction is always executed, regardless of the value of the annul bit. If the branch is not taken and the annul bit (a) is 1, the delay instruction is annulled (not executed)—see update scheme 3.25.

The predict bit (p) is used to give the hardware a hint about whether the branch is expected to be taken. A 1 in the p bit indicates that the branch is expected to be taken; a 0 indicates that the branch is expected not to be taken.

First, let us define some stores which will be used later in the specification.

\(^2\)The value written into the register is visible to the instruction in the delay slot.
CHAPTER 3: SPARC-V9

Figure 3.8: Format of BPr instructions

<table>
<thead>
<tr>
<th>00</th>
<th>a</th>
<th>0</th>
<th>cond</th>
<th>011</th>
<th>d16hi</th>
<th>p</th>
<th>rs1</th>
<th>d16lo</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>30</td>
<td>29</td>
<td>28</td>
<td>27</td>
<td>25</td>
<td>24</td>
<td>22</td>
<td>21</td>
</tr>
</tbody>
</table>

a, p, n :: Bit. # annul, prediction and "negate condition" bits
d16hi :: {Bit}2. # 2-bit PC-relative displacement
rs1 :: {Bit}5. # address of the 1st source register
d16lo :: {Bit}14. # 14-bit PC-relative displacement

The values of constants BRZ, BRLEZ, BRLZ and OP:BRS can be found in section 1.

![3.23](#) bpr-cond(v, neg(n) (v = 0)) = n BRZ. # Branch on Register Zero
bpr-cond(v, neg(n) (v ≤ 0)) = n BRLEZ. # " Register Less Than or Equal to Zero
bpr-cond(v, neg(n) (v < 0)) = n BRLZ. # " Register Less Than Zero

Note that the archetypes bpr-cond can expand into 6 different update schemes, based on the value of bit 27 (n) of the instruction word. The archetype neg is defined in 3.6 on page 35.

Ambidextrous archetypes ea are used throughout the specification, and expand an effective jump address to branches.

![3.24](#) ea(pc, displ) pc + (4 x sign-ext(displ)) = .

![3.25](#) a BPA (Branch Always with Prediction) causes an unconditional PC-relative, delayed control transfer to the address PC + (4 x sign-ext(disp19)).
If the BPN's or BPA's a field is 1, the following (delay) instruction is not executed. If the a field is 0, the following instruction is executed.

3.5.3 Branch on integer condition codes with prediction (BPcc)

Unconditional Branches (BPA, BPN). A BPN (Branch Never with Prediction) in no case causes a transfer of control to take place. This instruction may be treated as a NOP by an implementation.

BPA (Branch Always with Prediction) causes an unconditional PC-relative, delayed control transfer to the address PC + (4 x sign-ext(disp19)).

If the BPN's or BPA's a field is 1, the following (delay) instruction is not executed. If the a field is 0, the following instruction is executed.

Conditional Branches. Conditional BPcc instructions (except BPA and BPN) evaluate one of the two integer condition codes (icc or xcc), as selected by cco and ccl, according to the cond field of the instruction, producing either a TRUE or FALSE result.

If TRUE, the branch is taken; that is, the instruction causes a PC-relative, delayed control transfer to the address PC + (4 x sign-ext(disp19)). If FALSE, the branch is not taken. If a
conditional branch is taken, the delay instruction is always executed regardless of the value of the annul (a) field. If a conditional branch is not taken and the a field is 1, the delay instruction is not executed.

Again, let us define some stores which will be used later in the following sections of this specification.

\[
\begin{align*}
n, z, v, c & :: \text{Bit. } \# \text{ bits of the ICC/XCC register} \\
cc1, cc0 & :: \text{Bit. } \# \text{ condition codes selection} \\
\text{cond} & :: \{\text{Bit}\}4. \# \text{ the condition field} \\
\text{disp19} & :: \{\text{Bit}\}19. \# \text{ branch's PC-relative displacement}
\end{align*}
\]

The \text{bp-cond} archetypes are used for reading condition code values, and making a decision about a condition's validity. It also defines the \text{cond} field in an instruction word (the 4-bit values of the constants BPA, BPN, ..., BPVS are assumed to be defined elsewhere). \text{COND} is a 3-bit offset into the instruction word.

\[
\begin{align*}
\text{bp-cond}(., \text{TRUE}) &= \text{BPA. } \# \text{ Branch Always} \\
\text{bp-cond}(., \text{FALSE}) &= \text{BPN. } \# \text{ " Never} \\
\text{bp-cond}(\text{ncc}, \neg z) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPNE. } \# \text{ " on } \neq \\
\text{bp-cond}(\text{ncc}, z) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPE. } \# \text{ " on } = \\
\text{bp-cond}(\text{ncc}, \neg(z \lor (n \land v))) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPG. } \# \text{ " on } > \\
\text{bp-cond}(\text{ncc}, z \lor (n \land v)) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPLE. } \# \text{ " on } \leq \\
\text{bp-cond}(\text{ncc}, \neg(n \land v)) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPGE. } \# \text{ " on } \geq \\
\text{bp-cond}(\text{ncc}, n \land v) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPL. } \# \text{ " on } < \\
\text{bp-cond}(\text{ncc}, \neg(c \lor z)) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPGU. } \# \text{ " on } > \text{ Unsigned} \\
\text{bp-cond}(\text{ncc}, c \land z) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPLEU. } \# \text{ " on } \leq \text{ Unsigned} \\
\text{bp-cond}(\text{ncc}, \neg c) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPCC. } \# \text{ " on } \geq \text{ Unsigned} \\
\text{bp-cond}(\text{ncc}, c) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPCS. } \# \text{ " on } < \text{ Unsigned} \\
\text{bp-cond}(\text{ncc}, \neg n) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPPOS. } \# \text{ " on Positive} \\
\text{bp-cond}(\text{ncc}, n) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPNEG. } \# \text{ " on Negative} \\
\text{bp-cond}(\text{ncc}, \neg v) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPVC. } \# \text{ " on Overflow Clear} \\
\text{bp-cond}(\text{ncc}, v) &= \text{CCR}: \text{ncc}[n \ z \ v \ c] \ \text{BPVS. } \# \text{ " on Overflow Set}
\end{align*}
\]
" \( c \land a = 0 \) \( \Rightarrow \) \( PC[\text{npc}] \text{nPC}[\text{ea}(\text{pc}, \text{disp}19)] \).
" \( c \land a = 1 \land \text{cond} = \text{BPA} \) \( \Rightarrow \) \( PC[\text{ea}(\text{pc}, \text{disp}19)] \text{nPC}[\text{ea}(\text{pc}, \text{disp}19) + 4] \).
" \( c \land a = 1 \land \text{cond} \neq \text{BPA} \) \( \Rightarrow \) \( PC[\text{npc}] \text{nPC}[\text{ea}(\text{pc}, \text{disp}19)] \).

Note that the value of the CCR:ncc locator (CCR: ICC or CCR: XCC) depends on the values of condition codes cc1 and cc0 (002 \( \approx \) XCC, 102 \( \approx \) ICC). The reason for using so many guards in update scheme 3.27 is mainly the different effects of the annul bit on conditional and unconditional branches described earlier in this section.

3.5.4 Branch on floating-point condition codes with prediction (FBPfcc)

The effects of these instructions are exactly the same as those of their integer counterparts described in section 3.5.3 as far as control transfer and annulment of the instruction in a delay slot is concerned. See figure 3.10 for their format.

The following set of archetypes reads a floating-point condition register selected by a 2-bit value fcc and compares it to constants: \( E(002), L(012), G(102), U(112) :: \{\text{Bit}\}2 \). As a result TRUE or FALSE value is instantiated as the archetype's second parameter. Again, the content of the cond field of the instruction word is defined by 4-bit constants (FBPA, FBPN, ..., FBPO) which are assumed to be defined elsewhere.

\[
\begin{align*}
\text{[3.28]} & \quad \text{fbp\_cond}(_, \text{TRUE}) = \text{FBPA}. & \# \text{Branch Always} \\
& \quad \text{fbp\_cond}(_, \text{FALSE}) = \text{FBPN}. & \# \text{"Never} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{U}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPU}. & \# \text{" on Unordered} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{G}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPG}. & \# \text{" on >} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{U} \lor \text{v} = \text{G}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPG}. & \# \text{" on Unordered or >} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{L}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPL}. & \# \text{" on <} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{U} \lor \text{v} = \text{L}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPLG}. & \# \text{" on Unordered or <} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{L} \lor \text{v} = \text{G}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPLG}. & \# \text{" on < or >} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{~E}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPN}. & \# \text{" on \text{\neq}} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{E}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPE}. & \# \text{" on =} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{U} \lor \text{v} = \text{E}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPUE}. & \# \text{" on Unordered or =} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{G} \lor \text{v} = \text{E}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPGE}. & \# \text{" on \geq} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{~L}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPUGE}. & \# \text{" on Unordered or \geq} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{L} \lor \text{v} = \text{E}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPLE}. & \# \text{" on \leq} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{~G}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPULE}. & \# \text{" on Unordered or \leq} \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{v} = \text{~U}) = \text{ffcnr}(\text{fcc}, \text{v}) \text{FBPO}. & \# \text{" on Ordered}
\end{align*}
\]

A more complex, but concise alternative to archetypes 3.28 is given in 3.29.

\[
\begin{align*}
\text{[3.29]} & \quad e :: \text{Bit}. \text{lgu} :: \{\text{Bit}\}3. \\
& \quad \text{fbp\_cond}(\text{fcc}, \text{neg}(\text{e}) (dm(\text{~v}) \land \text{~lgu} + 1)) = \text{ffcnr}(\text{fcc}, \text{v}) \text{e lgu}.
\end{align*}
\]
The function $dm$ is an ordinary demultiplexor, which could be defined using the arithmetic shift left operator $\ll$ as $dm(v) = 1_2 \ll v \{v \mid v \in \mathbb{N}_0\}$.

\[\begin{align*}
[3.30] & \quad \text{PC}[pc] \text{nPC}[npc] \text{pc:COND}[cond] \\
& \quad \text{pc}[OP:BRS a][pc:COND][\text{fbpCond}][cc1 | cc0, c] 1012 \text{ cc1 cc0 p disp19]} \\
& \quad = \neg c \land a = 0 \implies \text{PC}[npc] \text{nPC}[npc + 4]. \\
& \quad = \neg c \land a = 1 \implies \text{PC}[npc + 4] \text{nPC}[npc + 8]. \\
& \quad = c \land a = 0 \implies \text{PC}[npc] \text{nPC}[ea(pc, disp19)]. \\
& \quad = c \land a = 1 \land \text{cond} = \text{FBPA} \implies \text{PC}[ea(pc, disp19)] \text{nPC}[ea(pc, disp19) + 4]. \\
& \quad = c \land a = 1 \land \text{cond} \neq \text{FBPA} \implies \text{PC}[npc] \text{nPC}[ea(pc, disp19)].
\end{align*}\]

Again, the annul bit has a different effect for conditional branches than it does for unconditional branches (in this case opcode FBPA). If the annul bit (a) is set, it annuls the delay instruction for unconditional branches always, whereas for conditional branches it annuls the delay only if the branch is not taken. Please refer to [77] for details.

### 3.5.5 Call and link

The CALL instruction writes the contents of the PC into r[15] ('out' register\(^3\) 7) and then causes an unconditional delayed transfer of control to a PC-relative effective address.

\[\begin{align*}
[3.31] & \quad \text{disp30 :: } \{\text{Bit}\}30. \\
& \quad \text{PC}[pc] \text{nPC}[npc] \text{CWP}[v] \text{pc}[OP:CLL \text{ disp30]} & \# \text{OP:CLL} = 012 \\
& \quad \implies \text{PC}[npc] \text{nPC}[ea(pc, disp30)] \text{rw}(15, pc, v).
\end{align*}\]

### 3.5.6 Return

The RETURN instruction causes a delayed transfer of control to the target address and has the window semantics of a RESTORE instruction; that is, it restores the register window prior to the last SAVE instruction. The target address is $r[rs1] + r[rs2]$ if bit 13 of the instruction word (i) is 0, or $r[rs1] + \text{sign-ext}(\text{sim13})$ if $i = 1$.

\(^3\)see footnote 2 on page 40
CHAPTER 3: SPARC-V9

Figure 3.12: Format of the RETURN instruction

Figure 3.13: Format of the NOP instruction

Note that registers r[rs1] and r[rs2] come from the old window.

3.6 Miscellaneous instructions

3.6.1 No operation

The NOP instruction changes no program-visible state (except the PC and nPC registers).

An example

The following example uses archetypes from sections 2 and 3 to illustrate the archetype expansion mechanism. In this example the update scheme

will be expanded into one update scheme showing the effects of the JMPL (Jump and Link) instruction described in section 3.5.1. For maximum simplicity, the instruction uses as its destination and first source operands global r registers (1 ≤ r < 8) and as its second source operand an immediate value carried by the instruction itself.

The arithmetic command scheme 3.22 expands as shown in figure 3.14. The archetypes and their index numbers from sections 2 and 3 are included as comments for convenience. After substitution using equations derived at each step the full expansion of our update scheme is

PC[pc] nPC[npc] CWP[w] a1[x] # initial configuration
pc[OP:ARM 1110002 a1 IMM13 (v3 :: {Bit}13)] # field format of the current instruction
⇒ (1 ≤ r < 8) ∧ (1 ≤ a1 < 8) ⇒ # conditions for application
PC[npc] nPC[x + sign_ext(v3)] -[pc]. # effects of instruction on configuration
CHAPTER 3: SPARC-V9

46

For a slightly longer example of archetype expansion please refer to appendix C, where the same instruction uses as its destination operand one of 7 global r registers, as its first source operand one of 8 'in' registers, and finally as its second source operand an immediate value.

5 Conclusions

This chapter gives a (partial) specification of the SPARC-V9 CPU at the instruction set level. The main achievement is its compactness (the number of update schemes necessary to specify the whole instruction set of this processor is less than the number of opcodes appearing in the informal specification [77]) and clarity. The next step could be a specification of a SPARC-V9 compliant specific chip (such as UltraSPARC III [70]) and proof of their (partial) equivalence. Furthermore, since the same formalism can be used to specify lower level representations (such as the microcode), it would be possible to prove transformations between these levels and reason about the specifications.
Chapter 4

Java Virtual Machine

The Java Virtual Machine (JVM) is a platform independent abstract computing machine. The JVM has a bytecoded instruction set designed to be compact and easily interpreted in either software or hardware. In this chapter, a formal specification of a subset of the JVM instructions is given. Instructions are described using the basic Update Plans formalism briefly introduced in chapter 1. This is not the first attempt at a formal description of Java bytecode semantics as a similar work has already been done [11], however, the following description of JVM instruction semantics reflects the readability and flexibility of Update Plans for this purpose and also demonstrates some of the new Update Plans features introduced to the formalism by this thesis. The specification is based on informal specifications [19, 48, 75] and a part of it was already published in [55].

1 Types, constants, variables

1.1 Types

The following types are used throughout the specification. Store Bit is considered to be a primitive type having the usual meaning.

\[\text{1.1 } \{\text{Bit}\}.\]

\[
\begin{align*}
\text{Boolean} &= \{\text{Bit}\}_{32}, \\
\text{Byte} &= \{\text{Bit}\}_{32}, \\
\text{Char} &= \{\text{Bit}\}_{32}, \\
\text{Short} &= \{\text{Bit}\}_{32}, \\
\text{Int} &= \{\text{Bit}\}_{32}, \\
\text{Float} &= \{\text{Bit}\}_{32}, \\
\text{Reference} &= \{\text{Bit}\}_{32}, \\
\text{ReturnAddress} &= \{\text{Bit}\}_{32}, \\
\text{Long} &= \{\text{Bit}\}_{64}, \\
\text{Double} &= \{\text{Bit}\}_{64}.
\end{align*}
\]

Since many JVM instructions are defined in terms of so called category types [48], the following type aliases are provided.

\[\text{1.2 } \{\text{Category1} = \text{Boolean} | \text{Byte} | \text{Char} | \text{Short} | \text{Int} | \text{Float} | \text{Reference} | \text{ReturnAddress}, \]

\[\text{Category2} = \text{Long} | \text{Double}, \]

\[\text{Category12} = \text{Category1} | \text{Category2}.\]

47
For the purposes of this specification, additional types Signed/Unsigned Byte/Short are declared. The main reason is to supply grounding information for Update Plans. Signed values use two's-complement encoding similarly to Int and Long types.

\[ \text{[1.3]} \quad \{\text{UnsignedByte} = \{\text{Bit}\}_{8}, \text{SignedByte} = \{\text{Bit}\}_{8}, \text{UnsignedShort} = \{\text{Bit}\}_{16}, \text{SignedShort} = \{\text{Bit}\}_{16}\}. \]

In order to make the definitions of the instructions as compact and clear as possible, an alias type Word is also defined, which is used mainly by stack manipulation instructions.

\[ \text{[1.4]} \quad \{\text{Word} = \text{Category1 Category1} | \text{Category2}\}. \]

1.2 Constants

This section contains a list of all JVM opcodes used in this specification with their values.

\begin{align*}
\text{POP}(87), \text{POP2}(88), \text{DUP}(89), \text{DUP.X1}(90), \text{DUP.X2}(91), \text{DUP2}(92) \\
\text{DUP2.X1}(93), \text{DUP2.X2}(94), \text{SWAP}(95), \\
\text{ILOAD}(21), \text{ILOAD.0}(26), \text{ILOAD.1}(27), \text{ILOAD.2}(28), \text{ILOAD.3}(29), \\
\text{ISTORE}(54), \text{ISTORE.0}(59), \text{ISTORE.1}(60), \text{ISTORE.2}(61), \text{ISTORE.3}(62), \\
\text{IADD}(96), \text{ISUB}(100), \text{IMUL}(104), \text{IDIV}(108), \text{IREM}(112), \\
\text{IAND}(126), \text{IOR}(128), \text{IXOR}(130), \text{ISHL}(120), \text{ISHR}(122), \text{IUSHR}(124), \\
\text{INEG}(116), \text{IINC}(132), \\
\text{ICONST.M1}(2), \text{ICONST.0}(3), \text{ICONST.1}(4), \text{ICONST.2}(5), \text{ICONST.3}(6), \text{ICONST.4}(7), \\
\text{ICONST.5}(8), \\
\text{BIPUSH}(16), \text{SIPUSH}(17), \\
\text{IFEQ}(153), \text{IFNE}(154), \text{IFLT}(155), \text{IFGE}(156), \text{IFGT}(157), \text{IFLE}(158), \\
\text{IF_ICMPEQ}(159), \text{IF_ICMPNE}(160), \text{IF_ICMPLT}(161), \text{IF_ICMPGE}(162), \text{IF_ICMPGT}(163), \\
\text{IF_ICMPLE}(164), \\
\text{GOTO}(167), \text{GOTO.W}(200), \text{JSR}(168), \text{JSR.W}(201), \text{RET}(169), \\
\text{WIDE}(196) :: \text{UnsignedByte}. 
\end{align*}

1.3 Variables

Unless stated otherwise, throughout the whole document the following variables, used for description of instruction parameters, are assumed to have the following types

\begin{align*}
b_a & :: \text{SignedByte}. \\
b_u, \text{opc} & :: \text{UnsignedByte}. \\
s_s, \delta_s & :: \text{SignedShort}. \\
s_u & :: \text{UnsignedShort}. \\
i, j, \delta_i & :: \text{Int}. \\
z & :: \text{Category12}. 
\end{align*}
2 Instructions

2.1 Operand stack management

A number of instructions are provided for direct manipulation of the operand stack in the current frame: POP, POP2, DUP, DUP_X1, DUP_X2, DUP2_X1, DUP2_X2 and SWAP. One of many possible UP definitions is as follows

\[ \begin{align*}
\text{POP} & \quad \text{SP}[t] s[v]t \mapsto \text{SP}[s] . \\
\text{POP2} & \quad \text{SP}[t] s[w]t \mapsto \text{SP}[s] . \\
\text{DUP} & \quad \text{SP}[t] s[v]t \mapsto \text{SP}[t'] s[v v']t' . \\
\text{DUP2} & \quad \text{SP}[t] s[w]t \mapsto \text{SP}[t'] s[w w']t' . \\
\text{DUP_X1} & \quad \text{SP}[t] s[v_1 v_0]t \mapsto \text{SP}[t'] s[v_0 v_1 v_0]t' . \\
\text{DUP2_X1} & \quad \text{SP}[t] s[v_1 v_0]t \mapsto \text{SP}[t'] s[v_0 v_1 v_0]t' . \\
\text{DUP_X2} & \quad \text{SP}[t] s[w_1 v_0]t \mapsto \text{SP}[t'] s[w_0 v_1 v_0]t' . \\
\text{DUP2_X2} & \quad \text{SP}[t] s[w_1 v_0]t \mapsto \text{SP}[t'] s[w_0 v_1 v_0]t' . \\
\text{SWAP} & \quad \text{SP}[t] s[v_1 v_0]t \mapsto \text{SP}[t'] s[v_0 v_1]t .
\end{align*} \]

where SP denotes a stack pointer in the current frame and the s, t and t' variables are stack locators. In the specification above, the variables v, w, and their indexed variants, are of data types

\[ \begin{align*}
v & :: \text{Category1} . \\
w & :: \text{Word} .
\end{align*} \]

where variables of the type Category1 and Word occupy one and two stack items on the operand stack respectively.

Variables v, v_0 and v_1 in 2.1 can be viewed as two variables of the type Category1. Thus (for example) the definition of the instruction DUP2_X2 also covers definitions such as

\[ \begin{align*}
\text{DUP2_X2} & \quad \text{SP}[t] s[v_2 v_1 v_0]t \implies \text{SP}[t'] s[v_0 v_2 v_1 v_0]t' . \\
\text{DUP2_X2} & \quad \text{SP}[t] s[w_2 v_1 v_0]t \implies \text{SP}[t'] s[v_1 v_0 w_2 v_1 v_0]t' . \\
\text{DUP2_X2} & \quad \text{SP}[t] s[v_3 v_2 v_1 v_0]t \implies \text{SP}[t'] s[v_1 v_0 v_3 v_2 v_1 v_0]t' .
\end{align*} \]

The above specification assumes that the memory for the JVM stack is contiguous and as such is in fact implementation dependent. One of the ways around this particular problem is to use generic push and pop archetypes whose exact definitions would be defined later on in terms of specific/implementation dependent archetypes. For example, assuming there is a generic popsh(pop, psh) archetype combining the effects of pop and push instructions we can simply say that

\[ \begin{align*}
\text{push}(v) & = \text{popsh}(, v) . \\
\text{pop}(v) & = \text{popsh}(v, ) .
\end{align*} \]
The push and pop archetypes from 2.4 are frequently used in the following sections as a means of defining other JVM instructions which make changes to the operand stack.

2.2 Local variable access

The JVM instruction set includes a number of instructions for accessing local variables of different types in the current frame. The following ambidextrous archetype is used throughout this chapter to denote accesses to a local variable at index \( n \).

\[
[2.5] \text{var}(n, z) \ n[z] = .
\]

Note that the value \( z \) right of the locator \( n \) can be either of type `Category1` or `Category2` as declared in section 1.3. As the usual size of local variables is 32 bits (the width of the computational type (see [481] Int)), value \( z \) is stored in one or two local variables based on its size.

Many JVM instructions can be prefixed by the \texttt{WIDE} instruction, which results in different interpretation of the instruction modified in this way. The \texttt{WIDE} instruction takes one of two formats, depending on the instruction being modified. Archetypes in 2.6 are provided to illustrate the modifications. Two non-wide instruction formats are immediately followed by two wide formats. \( \alpha \) and \( \delta \) helper archetypes reconstruct (un)signed integer values from bytes of data.

\[
[2.6] \alpha(b_{u1}, b_{u2}) (((b_{u1} \ll 8))b_{u2}) :: \text{UnsignedShort} = .
\]

\[
\delta(b_{u1}, b_{u2}) (((b_{u1} \ll 8))b_{u2}) :: \text{SignedShort} = .
\]

\[
\delta(b_{u1}, b_{u2}, b_{u3}, b_{u4}) ((b_{u1} \ll 24)((b_{u2} \ll 16))((b_{u3} \ll 8))b_{u4} = .
\]

\[
\text{wide}(\text{opc}, b_{u}) = \text{opc} b_{u}.
\]

\[
\text{wide}(\text{opc}, b_{u}, b_{s}) = \text{opc} b_{u} b_{s}.
\]

\[
\text{wide}(\text{opc}, \alpha(b_{u1}, b_{u2})) = \text{WIDE} \text{opc} b_{u1} b_{u2}.
\]

\[
\text{wide}(\text{opc}, \alpha(b_{u1}, b_{u2}), \delta(b_{u3}, b_{u4})) = \text{WIDE} \text{opc} b_{u1} b_{u2} b_{u3} b_{u4}.
\]

The most common instructions that access local variables are load and store instructions which transfer values between JVM local variables and the operand stack. The following archetypes describe the instruction formats of all Int load/store instructions.

\[
[2.7] i3(0) = 0. \ i3(1) = 1. \ i3(2) = 2. \ i3(3) = 3.
\]

\[
\text{iload}(i) = \text{wide}(\text{ILOAD}, i).
\]

\[
\text{iload}(i) = \text{ILOAD} \_i3(i).
\]

\[
\text{istore}(i) = \text{wide}(\text{ISTORE}, i).
\]

\[
\text{istore}(i) = \text{ISTORE} \_i3(i).
\]

The purpose of the \( i3 \) archetypes is to condense the specification which would otherwise had to explicitly list all \texttt{ILOAD}\_\( n \) and \texttt{ISTORE}\_\( n \) instructions that implicitly access variables \( n \). Finally, update schemes in 2.8 show the effects that these instruction have on the JVM machine.
[2.8] iload(n) var(n, v) push(v) \rightarrow .
    istore(n) pop(v) \rightarrow var(n, v).

All the instructions described in this section operate on the data type Int. Similar archetypes and update schemes could be specified for the instructions operating on the data types Float, Long, Double, and Reference; that is, instructions FLOAD, LLOAD, DLOAD, and ALOAD respectively, plus their implicit immediate operand variants FLOAD.n, LLOAD.n, DLOAD.n, and ALOAD.n, where \( n \in \{0,1,2,3\} \).

2.3 Arithmetic instructions

The Java Virtual Machine set offers a wide range of arithmetic operators. It is assumed that the standard arithmetic operators are defined in the standard environment, so only these less common arithmetic operators are informally explained: shift left (<<), shift right (>>), and unsigned shift right (>>>). Archetypes and an update scheme for all the JVM's double operand Int arithmetic instructions are given in 2.9.

[2.9] iarithmetic.binary(i_1, i_2, i_1 + i_2) = IADD.
    iarithmetic.binary(i_1, i_2, i_1 - i_2) = ISUB.
    iarithmetic.binary(i_1, i_2, i_1 \times i_2) = IMUL.
    iarithmetic.binary(i_1, i_2, i_1 / i_2) = IDIV \{ i_2 \neq 0 \} \rightarrow .
    iarithmetic.binary(i_1, i_2, i_1 \% i_2) = IREM \{ i_2 \neq 0 \} \rightarrow .
    iarithmetic.binary(i_1, i_2, i_1 \& i_2) = IAND.
    iarithmetic.binary(i_1, i_2, i_1 \^ i_2) = IOR.
    iarithmetic.binary(i_1, i_2, i_1 \_\_ i_2) = IXOR.
    iarithmetic.binary(i_1, i_2, i_1 << i_2) = ISHL.
    iarithmetic.binary(i_1, i_2, i_1 >> i_2) = ISHR.
    iarithmetic.binary(i_1, i_2, i_1 >>> i_2) = IUSHR.

    iarithmetic.binary(i_1, i_2, i_1) popsh(i_1 i_2) \rightarrow .

Similarly, the only single operand^1 Int arithmetic instruction present in the JVM instruction set INEG can be defined as

[2.10] iarithmetic.monadic(i_1, -i_1) = INEG.
    iarithmetic.monadic(i, i) popsh(i, i) \rightarrow .

Note that because of the two's-complement representation used for negative numbers, negation of the minimum value of the type Int produces the same value, not the maximum value of this type as one might expect.

^1The Int bitwise negation is not present in the JVM instruction set, and is typically carried out by the instruction IXOR with the constant ICONST.M1.
And finally the last arithmetic instruction defined in this section is the Int increment of a local variable by a constant. It is the only arithmetic instruction, which can take the wide instruction format.

\[ \text{wide(IINC, } s_u, s_a) \text{ var}(s_u, i) \rightarrow \text{var}(s_u, i + s_a). \]

All the arithmetic instructions specified here operated on the data type Int. The instruction set of the Java Virtual Machine contains similar instructions for data types Long, Float, and Double, and so their formal description would not be beneficial to explanation of their semantics, or as a demonstration of the UP formalism.

2.4 Immediate operands

The instruction set of the Java Virtual Machine includes a number of instructions pushing an immediate operand onto the operand stack.

The simplest form of these instructions are pushes of implicit immediate operands. In a way similar to the \( i_3 \) archetypes from section 2.2, \( i_5 \) archetypes are used in an effort to make the specification as succinct as possible.

\[ i_5(-1) = \text{M1}. \quad i_5(0) = 0. \quad i_5(1) = 1. \quad i_5(2) = 2. \quad i_5(3) = 3. \quad i_5(4) = 4. \quad i_5(5) = 5. \]
\[ \text{iconst}(i) = \text{ICONST}._i5(i). \]

The rest of the instructions that push Int values onto the operand stack have explicit immediate operands.

\[ \text{iconst}(b_u) = \text{BIPUSH } b_u. \]
\[ \text{iconst}(b_u_1, b_u_2) = \text{SIPUSH } b_{u_1} b_{u_2}. \]
\[ \text{iconst}(i) \text{ push}(i) \rightarrow . \]

Please note that there is an implicit promotion of (un)signed types Byte and Short to Int described by [48].

Again, all the instructions described in this section operated on the data type Int. Similar archetypes and update schemes could be specified for the instructions operating on data types

- Float: \( \text{FCONST}_n \), where \( n \in \{0, 1, 2\} \)
- Long: \( \text{LCONST}_n \), where \( n \in \{0, 1\} \)
- Double: \( \text{DCONST}_n \), where \( n \in \{0, 1\} \)
- Reference: \( \text{ACONST.NULL} \) pushing a special NULL value

2.5 Control transfer

The control transfer instructions conditionally or unconditionally make the Java Virtual Machine continue execution with an instruction other than the one following the control transfer instruction. These instructions can be divided into three categories.
• conditional branches (e.g. IFEQ δ, IF_ICMPZEQ δ) comparing a top stack item against zero or comparing two topmost stack items against each other
• compound conditional branches (TABLESWITCH and LOOKUPSWITCH) with a variable number of operands (branches)
• unconditional branches (GOTO δ₁, GOTO_W δ₁, JSR δₙ, JSR_W δ₁, RET bₙ and WIDE RET sₙ)

As has already been pointed out, there are two groups of conditional branches comparing Int type values. The first one compares its parameter against an implicit immediate operand 0 and the other one compares its two parameters. Both versions are listed on the same line for the same type of conditional jump. The δ archetype defined in section 2.2, is used.

\[
\begin{align*}
\text{jmpc}(i, i = 0, δ(b₁, b₂)) &= \text{IFEQ } b₁, b₂. \\
\text{jmpc}(i, i \neq 0, δ(b₁, b₂)) &= \text{IFNE } b₁, b₂. \\
\text{jmpc}(i, i < 0, δ(b₁, b₂)) &= \text{IFLT } b₁, b₂. \\
\text{jmpc}(i, i \geq 0, δ(b₁, b₂)) &= \text{IFGE } b₁, b₂. \\
\text{jmpc}(i, i > 0, δ(b₁, b₂)) &= \text{IFGT } b₁, b₂. \\
\text{jmpc}(i, i < 0, δ(b₁, b₂)) &= \text{IFLCMP } b₁, b₂. \\
\text{jmpc}(i, i > 0, δ(b₁, b₂)) &= \text{IFICMPGE } b₁, b₂. \\
\text{jmpc}(i, i < 0, δ(b₁, b₂)) &= \text{IFICMPGT } b₁, b₂. \\
\text{jmpc}(i, j, i < j, δ(b₁, b₂)) &= \text{IFICMPLE } b₁, b₂.
\end{align*}
\]

The list of conditional branches in 2.14 is not complete. Again, only instructions comparing Int type values are included. There are four other conditional branches comparing values of the type Reference: IF_ACMPEQ δ₂, IF_ACMPNPE δ₂, IFNULL δ₂ and IFNULL δ₂; however, the semantics of these instructions is very similar to that of the instructions already defined in 2.14.

\[
\begin{align*}
\text{jmpu}(\text{TRUE}, δ(b₁, b₂)) &= \text{GOTO } b₁, b₂. \\
\text{jmpu}(\text{TRUE}, δ(b₁, b₂, b₃, b₄)) &= \text{GOTO_W } b₁, b₂, b₃, b₄.
\end{align*}
\]

Note that the only difference between the instructions GOTO δ₁ and GOTO_W δ₁ is the size of their operands. The wide variant of the unconditional jump takes a four-byte offset, whereas the standard variant takes only a two-byte offset. The aim of the archetype definitions above is to compress both the conditional and unconditional branches into one update scheme 2.17.

\[
\begin{align*}
\text{jump}(\text{cond}, δ₁) &= \text{jmpc}(i, \text{cond}, δ₁) \text{ pop}(i). \\
&= \text{jmpc}(i, j, \text{cond}, δ₁) \text{ pop}(i j). \\
&= \text{jmpu}(\text{cond}, δ₁).
\end{align*}
\]

\[
\begin{align*}
\text{PC}[\text{pc}]&[\text{jump}(\text{cond}, δ₁)] [\text{qc}] = [\text{cond}] \Rightarrow \text{PC}[\text{pc} + δ₁]. \\
&= [\neg \text{cond}] \Rightarrow \text{PC}[\text{qc}].
\end{align*}
\]

PC is the JVM program counter, which contains the address of the current instruction. Note that the target address of a GOTO(.W) instruction must be that of an opcode of an instruction within the method that contains this instruction. This is not dealt with in this specification, since this is the responsibility of the JVM bytecode verifier, which is not specified here.
The format of JSR(\_W) instructions is exactly the same as the format of GOTO(\_W) instructions. However, contrary to the GOTO(\_W) instructions, the JSR(\_W) instructions also push the return address on the operand stack as shown in 2.19.

\[ \text{jsr}(\delta(b_1, b_2)) = \text{JSR } b_1 \ b_2. \]
\[ \text{jsr}(\delta(b_1, b_2, b_3, b_4)) = \text{JSR}\_W b_1 \ b_2 \ b_3 \ b_4. \]

Note that the asymmetry of JSR and RET instructions is intentional. After a JSR instruction, the return address is stored into a local variable by one of ASTORE instructions, and this local variable may in turn be used by a RET instruction.

\[ \text{PC}[\text{pc}] \text{pc}[\text{jsr}(\delta_i)]\text{qc push(qc)} \iff \text{PC}[\text{pc} + \delta_i]. \]
\[ \text{PC}[\text{pc}] \text{pc}[\text{wide(RET, } n)]\text{qc var(n, a)} \iff \text{PC}[a]. \]

3 Conclusions

The specification of the subset of the JVM instruction set is the third and last demonstration of the original Update Plans, using only minor syntactic extensions from Extended Update Plans introduced in the second part of this thesis. It provides a proof that even more abstract instruction sets using a wide variety of types can easily be described by UP. An interesting exercise for the future could be a specification of a real Java processor (used widely in mobile devices) which usually implements one of several existing JVM subsets.
Part II

Extended Update Plans
Introduction

The first part of the thesis concentrated mainly on Update Plans as proposed by [60]. Several case studies have been performed testing the formalism on real examples, and a comparison of UP against other existing formal methods has been made. This prompted a number of research questions some of which are addressed in this second part of the thesis. This second part contains a number of syntactic and semantic extensions to Update Plans called Extended Update Plans (EUP). The main contribution of EUP is a concept of sequential update scheme and archetypes and improved consistency of the formalism in some areas. Finally, various other formal methods with similar application domains are examined and comparisons with the Update Plans formalism are drawn.
Chapter 5

Syntactic Extensions

The whole Update Plans grammar has been revised, and undergone changes to make it more compact and consistent in its structure and terminology. As a result, the formalism became much more simple to use without the need for frequent reference to its grammar. As an additional benefit, the implementation will be more transparent. Some syntactic changes, such as parallel blocks in archetypes introduced new possibilities in Update Plans and (inevitably) resulted in the need to define their semantics. These changes along with a completely new concept of sequential update schemes and archetypes will be discussed in chapter 6.

This chapter describes most of the important changes and additions to the grammar and the semantic impact of less significant syntactic changes. The complete revised grammar of Extended Update Plans can be found in appendix A.

1 Everything is an update

Most ‘inconsistencies’ in the original Update Plans grammar stemmed from the fact that there was no unifying concept of an ‘update’. In the new grammar, an update is either a parallel block, a sequential block or a set of alternatives. In other words, changes to a configuration are always caused by an update. Another significant inconsistency was the use of a dot ‘.’ to separate all (top-level) items other than parallel blocks. Strictly only (top-level) items are separated by a dot now. One of the consequences is that sugared multiple archetype definitions such as

\[
a(params) = update_1 = update_2 = \ldots = update_n.
\]

are perceived as one item. They are separated from other items in an update plan by the dot behind the last update update_n and from individual updates of the archetype by the ‘=’ sign.
A similar consideration applies to parallel blocks. The use of the double pipeline symbol \( \| \) to separate alternatives in a parallel block was entirely optional. As only items are now separated by a dot, it is necessary to separate individual updates in parallel blocks. This is done by changing the production rule for parallel blocks to
\[
\text{(parblock)} \rightarrow "\|" \{\text{(update)} \ "\|" \}^{+} \ "\|
\]

2 Archetypes

2.1 Grammar

The following grammar tries to be as compact as possible while preserving all features offered by the original grammar.

\[
\text{(item)} \rightarrow \text{(archetype definition)}
\text{(archetype definition)} \rightarrow
\text{(basic archetype definition)} \mid
\text{(ambidextrous archetype definition)}
\text{(basic archetype definition)} \rightarrow \text{(basic declaration)} \text{(basic definition)}^{+}
\text{(basic declaration)} \rightarrow \text{(basic archetype name)} \text{(parameters)}
\text{(ambidextrous archetype definition)} \rightarrow
\text{(ambidextrous declaration)} \text{(basic definition)}^{+}
\text{(ambidextrous declaration)} \rightarrow
\text{(archetype name)} \text{(parameters)} \text{(text)}
\text{(basic definition)} \rightarrow "\=" \text{(archetype body)}
\text{(archetype body)} \rightarrow \text{(update)} \mid \text{(configuration)}
\text{(parameters)} \rightarrow "(" \{\text{(text)}, "\," \}^{*} ")"
\]

A \text{(basic archetype name)} is an identifier, an \text{(archetype name)} is a symbolic constant or an identifier. Note that the pipeline symbol \( \| \) is no longer used to denote ambidextrous archetypes. The use of this symbol by ambidextrous archetypes and other means of separating the text by command (also ambidextrous) archetypes was one of the main inconsistencies. The consequences of these simplifications are discussed in the following two sections. The syntax of archetype calls has been left unchanged and can be found in appendix A or in the original work [60].

2.2 Ambidextrous archetypes

As mentioned earlier, the \( \| \) symbol no longer denotes the text to be expanded on the left or right-hand side of an update scheme during ambidextrous archetype expansion. Instead, the
text is placed between the parameters of an archetype and its definition, so that the archetype definitions

\[ a_l(params) = text \quad lc = \{ g \} \implies rc. \]
\[ a_r(params) = lc = \{ g \} \implies text \quad rc. \]
can be 'compressed' into one EUP ambidextrous archetype definition

\[ a(params) \quad text = \{ g \} \implies rc. \]

where \( lc/rc \) and \( text \) are left/right contexts and text shared among archetypes \( a_l, a_r \) and \( a \). Note that every archetype whose left and right-hand side expansions are empty is an ambidextrous archetype, which was not the case in basic Update Plans.

If the archetype body of an ambidextrous archetype consists of alternatives

\[ a(params) \quad text = lc_1 = \{ g_1 \} \implies rc_1; \quad lc_2 = \{ g_2 \} \implies rc_2; \quad \ldots; \quad lc_n = \{ g_n \} \implies rc_n. \]

it is interpreted as syntactic sugar for

\[ a(params) \quad text = lc_1 \quad g_1 \implies rc_1. \]
\[ a(params) \quad text = lc_2 \quad \neg g_1 \land g_2 \implies rc_2. \]
\[ \vdots \]
\[ a(params) \quad text = lc_n \quad \left( \bigwedge_{i=1}^{n-1} \neg g_i \right) \land g_n \implies rc_n. \]

Note that text in ambidextrous archetypes is not allowed to appear on the left or right-hand side of an archetype's body. The semantics of ambidextrous sequential archetypes is discussed in chapter 6.

2.3 Command archetypes

If \( a \) is an ambidextrous archetype and the expansion of all definitions of \( a \) begin with the same constant then that constant may be used as the archetype name, unless it has already been so used. The remainder of the expansion is then placed immediately after archetype's parameters. Such an archetype is called command archetype. The command archetype

\[ CONST(params) \quad text = update. \]

is the sugared version of

\[ a(params) \quad CONST \quad text = update. \]

with calls of \( a \) replaced throughout the update plan by calls of \( CONST \).

2.4 Archetype parameters

An archetype's parameters can be referenced in the archetype's body by the symbol '$' and a number directly corresponding to the position of the archetype's parameter where the parameters are numbered from the left starting from 1. For instance, the fourth parameter 'x + y' is referred to by $4$ in the following archetype.
\[ \text{add}(a, b, c, x + y) = A[a][a][x] b[y] \Rightarrow c[\text{§4}] \]

One of the changes to the archetype grammar is that the individual parameters of an archetype call can be empty. For example \( \text{add}(, b, c, r) \) is a perfectly valid archetype call, and an "empty string" is in place of parameter \$1\). No production rules are added to the grammar as the preprocessor can expand these references as simple macros.

### 2.5 Archetypes in guards

The last extension concerning archetypes concerns archetype calls appearing in guards. Archetype calls were not allowed in guards in basic Update Plans. As there is no reason for this restriction, archetype calls can be used in EUP guards unless they are recursive.

### 3 Types

#### 3.1 Constants

Constants are uppercase words in Update Plans. An exception to this rule is when making an explicit declaration of constants using the predefined type `Constant`. Then it is possible to compose constants of lowercase letters. For instance, \((c\text{ and } n\text{PC})\) are given "a constant status" by the following definition \(c, n\text{PC} :: \text{Constant}\).

Variables, on the other hand, are denoted by lowercase words. However, even if a word uses uppercase letters, it can be assigned an explicit type. Furthermore, individual types can be viewed as constants when used as an ordinary text in an update scheme. This feature can, for example, compress the following two update schemes

\[
\begin{align*}
\text{get}\_r(a, r) &= 0012 a[(r :: \text{Byte})]. \\
&= 0102 a[(r :: \text{Halfword})].
\end{align*}
\]

into only one

\[
\text{get}\_r(a, r) = \text{Integer} a[(r :: \text{Integer})].
\]

provided that a new type alias `Integer` is created, and that the constants `Byte` and `Halfword` are assigned symbolic values 0012 and 0102 respectively by an explicit list of types with their symbolic values.

\[
\{\text{Byte}(0012), \text{Halfword}(0102)\}. \\
\{\text{Integer} = \text{Byte} | \text{Halfword}\}.
\]

It is sometimes useful to initialise a variable or a constant to a value. The following example assumes that we want constants \(E, L, G\) and \(U\) to have symbolic values 002, 012, 102.
and 112 respectively. Note that this example makes use of the repeat construct introduced in section 3.2.

\[ E(002), L(012), G(102), U(112) :: \{\text{Bit}\}2. \]

A new meaning is given to the symbol "-". It is now used as the text concatenation symbol. For instance, a sequence 'T' E" X" T' is interpreted as a constant 'TEXT'. A more realistic example of its use can be found in section 2.4 of chapter 4.

An updated type grammar for these changes is given in the following section.

### 3.2 Type grammar

\[
\begin{align*}
\langle \text{item} \rangle & \rightarrow \langle \text{store declaration} \rangle \mid \langle \text{type declaration} \rangle \\
\langle \text{store declaration} \rangle & \rightarrow "\{" \{\langle \text{store} \rangle,"\} \{\\"\}" \\
\langle \text{type declaration} \rangle & \rightarrow \{(\langle \text{term} \rangle \langle \text{const}\rangle\text{-opt} ) "\} \{\\"\}+ "::" \langle \text{store structure} \rangle \\
\langle \text{term} \rangle & \rightarrow "(" \langle \text{term} \rangle "::" \langle \text{store structure} \rangle ")" \\
\langle \text{store} \rangle & \rightarrow \\
\quad \langle \text{store identifier} \rangle \mid \langle \text{store idenfier} \rangle "=" \langle \text{store structure} \rangle \\
\langle \text{store identifier} \rangle & \rightarrow \langle \text{store name} \rangle \langle \text{const}\rangle\text{-opt} \\
\langle \text{const} \rangle & \rightarrow "(" \langle \text{number} \rangle ")" \\
\end{align*}
\]

Note that \( \langle \text{store structure} \rangle \) is a regular expression over a set of \( \langle \text{store name} \rangle \)s, i.e. lower case words with a leading upper case letter.

A new type can be declared reusing an old one provided that the structure of the new type contains a pattern identical to the structure of the old type. In order to express the exact number of repetitions of the old structure in the new type, the syntax of the original Update Plans needs to be extended by adding the following rules.

\[
\langle \text{store structure} \rangle \rightarrow "\{" \{\langle \text{store structure} \rangle \} "\}\} \langle \text{number} \rangle \\
\]

For instance, assuming that the type Bit is already defined, the type Byte could be defined using the repeat construct as \( \{\text{Byte} = \{\text{Bit}\}8\} \).

### 4 Comments

The last trivial but useful syntactic addition to basic Update Plans are comments. A comment is started by the symbol ' # ' and is terminated by the end of the line. They would be removed by a preprocessor during lexical analysis.
5 Conclusions

Although not significant, the changes described in this chapter contribute to the usability of the formalism in three ways. Firstly, the grammar has now a slightly less restrictive syntax. As a result several simple semantic rules had to be added to the formalism, and the more will follow in the following chapter. This, however, is a small price to pay for its increased consistency and more intuitive use of the whole formalism. Secondly, the introduction of a few simple conventions improves the type reuse mechanism and allows even more compact, multi-level UP specifications. And finally, the changes to the grammar open new possibilities to UP such as nesting of sequential/parallel blocks within an update which will be described in the following chapter.
Chapter 6

Semantic Extensions

While parallel blocks form a useful extension to basic Update Plans by making them more readable, they do not add much power to the formalism as they can be interpreted as mere syntactic sugar. Moreover, one of the drawbacks of a parallel block lies in the nondeterministic execution of its constituent update schemes which makes any synchronisation of these schemes impossible.

Not only do the extensions described in this chapter address these problems, but they also provide Update Plans with modularity, which with the exception of the archetype mechanism was not present in basic Update Plans.

The layout of this chapter is as follows. Firstly, a simple convention concerning parallel blocks making Extended Update Plans more consistent is introduced in section 1. Secondly, the syntax, semantics and the implementation of sequential update schemes is presented in section 2. An extension, known as sequential archetypes, closely related to the archetype mechanism makes the concept of sequential update schemes more powerful and is introduced in section 3. The EUP grammar allows (in contrast to the basic UP grammar) any kind of update to appear in the body of an archetype. The consequences of this change are considered in section 4. Finally, conclusions are drawn in section 5.

1 Parallel blocks

Parallel blocks were introduced in [60] as a way of expressing synchronous parallelism. In the original Update Plans, a parallel block is a set of independent alternatives which are applied to a configuration simultaneously. The application of a parallel block is naturally atomic, so all changes to the configuration contributed by any of its alternatives will appear simultaneously. The only way to share data between the individual alternatives is to use constant locators.

The introduction of a simple convention allowing variables to be shared across all updates in a parallel block not only simplifies sharing of data among these updates and makes a parallel block consistent in this respect with its sequential counterpart, but it also makes
transformation of parallel blocks into canonical form slightly easier as there is no need to rename unrelated variables of the same name among the parallel block's updates.

--- Example 1 ---

Consider the following update plan, which best illustrates the difference between the semantics of parallel blocks in basic and Extended Update Plans.

```
A[0] B[1].    # initial configuration
\| A[a] \rightarrow A[a + 1]  # basic Update Plans require ',' here
\| B[a] \rightarrow B[a - 1].
```

In basic Update Plans, the first application of the parallel block will result in the configuration A[1] B[0]. In Extended Update Plans, the parallel block is not applicable, as the cells right of A and B have different values and as such the set of locator expressions on the left-hand side of the resulting update scheme is not consistent.

---

2 Sequential update schemes

2.1 Background/Motivation

Apart from improving modularity in Update Plans, the motivation for sequential update schemes had two sources. Firstly, at the formalism level, it was necessary to provide some synchronisation for update schemes, and on a related note, at the application level, the introduction of explicit sequential execution proved to be very useful as there is often a need to express a sequential execution of update schemes without resorting to techniques which would make the specification less transparent.

At the instruction level, consider any PDP-11-like instruction, but with three operands\(^1\), e.g. `ADD z x, y`, with x and y source operands and z the destination operand. The semantics of this instruction seems simple. Source operands x and y are added and the result is written into the destination z. Consider, however, what happens if two of the operands use addressing modes that address the same register, and change the value in that register—e.g. x is postincrement on register R1 (R1+) and y is predecrement on the same register (–R1). It is then not clear what values are being addressed. Should x be addressed first and then y—i.e. get x from @R1, increment R1 then decrement R1 and get y from @R1, or is y addressed first and then x, or are they in some way accessed in parallel. In the original Update Plans such an instruction was illegal.

\(^1\)PDP-11 instructions have a maximum of only 2 operands
The specification of the above-mentioned addressing modes and the arithmetic instruction is in 2.1.

\[ \text{POSTINC}(b, v) \ r = r[b] b[v]c \rightarrow r[c]. \]  
\[ \text{PREDEC}(a, v) \ r = r[b] a[v]b \rightarrow r[a]. \]  
\[ \text{arithm}(x, y, x + y) = \text{ADD}. \]  
\[ \text{arithm}(x, y, r) \ r_3 \text{ POSTINC}(b, x) \text{ PREDEC}(a, y) \rightarrow r_3[r]. \]

In the following text the expansion of an ADD R2 R1+ -R1 instruction is examined. First, before an archetype’s expansion, it is necessary to rename all archetype’s variables so that they do not conflict with variables within the scheme the archetype is being called from.

\[ \text{POSTINC}(b, v) \ r_1 = r_1[b] b[v]c \rightarrow r_1[c]. \]

The POSTINC archetype is now used and its parameters added to the resolution set.

\[ \text{arithm}(x, y, r) \ r_3 \text{ POSTINC} r_1 r_1[b] b[v]c \text{ PREDEC}(a, y) \rightarrow r_3[r] r_1[c]. \]

\# \{b = b, x = v\}

Similarly, all conflicting local variables of the PREDEC archetype are renamed.

\[ \text{PREDEC}(a, v_1) \ r_2 = r_2[b_1] a[v_1]b_1 \rightarrow r_2[a]. \]

Finally, after the PREDEC’s archetype expansion, the original update scheme becomes

\[ \text{arithm}(x, y, r) \ r_3 \text{ POSTINC} r_1 r_1[b] b[v]c \text{ PREDEC} r_2 r_2[b_1] a[v_1]b_1 \rightarrow r_3[r] r_1[c] r_2[a]. \]

\# \{a = a, y = v_1\}

If \( r_1 \) and \( r_2 \) refer to the same register (R1) then \( b = b_1 \), so it must be the case that \( a \neq c \). This is therefore a conflict between the cells \( r_1[c] \) and \( r_2[a] \), i.e. \( R1[c] \) and \( R1[a] \) (the right-hand side of the update scheme 2.5 is inconsistent). Although the update scheme is not yet fully expanded, we can already recognise the inconsistency at this early stage of archetype expansion.

Although there is a good reason for making these instructions illegal, there are situations when instruction operands are accessed sequentially in a clearly defined order during a f/e cycle. While even such cases can be specified using basic Update Plans, there would be a significant loss of clarity in the specification.

On most PDP-11 implementations, operands of an instruction are accessed in some sequence. In this example operand access order of the ADD z x y instruction is x, then y and finally z. Using the extension to the UP formalism introduced in this chapter, the behaviour of our arithmetic instruction can be described in terms of sequential update scheme S in 2.6.

\[ \begin{align*}
3 \ & \text{arithm}(x, y, r) \ r_3 \rightarrow r_3[r] \\
1 \ & \text{POSTINC}(b, x) \rightarrow \\
2 \ & \text{PREDEC}(a, y) \rightarrow .
\end{align*} \]
Informally, the above definition says: expand archetypes POSTINC, PREDEC, and arithm and apply their locator expressions to the configuration in this exact order (the application order). However, the ordering of text expanded from sequence's archetypes is from left to right/top to bottom (the textual order).

2.2 Syntax

Similarly to parallel blocks [60], sequential blocks are delimited by the open sequential block symbol, '(SEQ)' which takes an identifier (sequencer) here 'SEQ', and the close sequential block symbol, ').' The reason for using the identifier in the open sequential block symbol will become clear in section 3 where sequential archetypes are discussed.

While the double pipeline symbol '|' is entirely optional (with the exception of the open/close parallel block symbol) in basic Update Plans, the (single) pipeline symbol is essential in sequential blocks, in order to separate updates in these blocks. The pipeline symbol takes an additional number which is the order in which updates nested in the sequential block are applied to a configuration.

A basic notation for sequential update schemes (sequences for short) is

\[(SEQ|a \ \text{update}_a \ 
|b \ \text{update}_b 
\vdots 
|m \ \text{update}_m 
|)\]

or making use of typesetting possibilities

\[
\text{SEQ} \atop \text{a update}_a \atop \text{b update}_b \atop \vdots \atop \text{m update}_m 
\]

where updates are either alternatives, a parallel block, or a sequence.

Sequences either have all stages (sequence numbers/indicators; in the above generic sequential update scheme \(a, b, \ldots, m\)) in a sequential block tagged, or all stages left untagged. Such 'untagged' sequences are so-called stageless. A sequence with no sequential block identifier is referred to as an anonymous sequence. Sequencers do not have to be unique, but if they are not, the risk of creating an incorrect specification by an oversight is increased as a sequential archetype can expand into two or more unrelated sequences as will be seen in section 3.3.

The following production rules need to be added to the UP grammar.

\footnote{Note that in EUP the double pipeline symbol is no longer optional.}
CHAPTER 6: SEMANTIC EXTENSIONS

\[
\begin{align*}
\langle \text{item} \rangle & \rightarrow \langle \text{seqblock} \rangle \\
\langle \text{seqblock} \rangle & \rightarrow "(\langle \text{seqblock id} \rangle - \text{opt} "|"\} \{\langle \text{stage} \rangle - \text{opt} \langle \text{update} \rangle \} + "|"\})\]
\end{align*}
\]

\[
\begin{align*}
\langle \text{seqblock id} \rangle & \rightarrow \langle \text{symb-const} \rangle \\
\langle \text{stage} \rangle & \rightarrow \langle \text{symb-const} \rangle \mid \langle \text{variable} \rangle
\end{align*}
\]

The complete grammar of Extended Update Plans can be found in appendix A.

2.3 Semantics

Consider the generic sequential update scheme from section 2.2. There are \( n = \{|a, b, \ldots, m|\} \) updates in the sequence, where the variables \( a, b, \ldots, m \) are stages of application. All the stage-representing variables can be grouped into a countable set of variables \( \mathcal{V} = \{a, b, \ldots, m\} \). Every variable \( v \in \mathcal{V} \) instantiates a value \( i \in \mathcal{I} \), where \( \mathcal{I} \) is a countable instantiation set. The mapping between \( \mathcal{V} \) and \( \mathcal{I} \) is a many-to-one mapping, in other words, two or more stages in a sequential block can have equal instantiations, which allows for non-deterministic sequential updates schemes. The stages represent the application order for all the updates within a sequence and for every instantiation \( \alpha \)

\[
\{\alpha \mid \alpha \in (\mathcal{I} - \{\omega\})\}, \text{ where } \omega = \max(\mathcal{I})
\]

of these variables there is a direct successor function \( \text{succ}(\alpha) \). If for any \( \{\alpha \mid \text{succ}(\alpha) \notin \mathcal{I}\} \), the sequence is always only partially applicable, as explained later. If for all \( \{\alpha \mid \text{succ}(\alpha) \in \mathcal{I}\} \), stages do not restrict the applicability of the sequence, and it can be fully applicable depending on the applicability of its updates. The application/temporal order of updates in a sequence is defined as

\[
\min(\mathcal{I}), \text{succ}(\min(\mathcal{I})), \text{succ}(\text{succ}(\min(\mathcal{I}))), \ldots, \text{succ}^{n-1}(\min(\mathcal{I})) \equiv \max(\mathcal{I})
\]

where \( \min(\mathcal{I}) \) is the first stage, and \( \text{succ}^{n-1}(\min(\mathcal{I})) \) is the last stage of application.

Although the application of a sequence takes place in individual stages (steps), it is still atomic (as is for example the application of a parallel block) in relation to the rest of the updates outside the sequence. The syntax and the semantics of any of these \( n \) updates (including their atomicity of application) is naturally preserved. The semantics of sequential blocks nested in parallel blocks is slightly more complicated, and is explained in section 2.4.

Unlike parallel blocks, which are either applicable as a whole or simply not applicable at all, we define a full and partial applicability of sequences. A sequence is fully applicable if all of its constituent updates are applicable in the application order. A sequence is partially applicable, if one or more but less than \( n \) updates are applicable in the application order. For example, sequence 2.1 in chapter 7 will always be only partially applicable unless some sequential archetype expands its stage two. The effects of a sequential update scheme \( \mathcal{S} \) which is only partially applicable are permanent, unless there was another update which was applicable (in the case of a sequential update scheme this may be either fully or partially) before the
application of the scheme $S$. This behaviour is especially useful when simulating fatal failures of a system, when the system retains the state when the failure occurred. Breaking the application order by omitting a stage $\{\alpha | \text{succ}(\alpha) \notin \mathcal{I}\}$ results in a sequential block which is always only partially applicable.

Since text can be expanded as a result of a sequence's application, we need to define the placement of the text for the sequence as a whole, and also an internal *textual ordering*. The placement of text expanded as a result of a *top-level* sequence's application fits nicely with the concept of a command driven update plan. An update plan's top-level update is also an item in the update plan. In other words, a top-level update is always separated from other update plan's items by the '.' symbol. Since the application of a sequence can be viewed as an atomic action, it can also be viewed as a command update scheme

$$\text{ltxt } lc = \{ g \} \Rightarrow \text{rttxt } rc$$

which can be desugared for top-level sequences as

$$\text{PC}[pc] \text{pc}[\text{ltxt}]qc lc = \{ g \} \Rightarrow \text{PC}[pc'] \text{pc'}[\text{rttxt}]qc rc.$$  

where ltxt/rttxt or both are non-empty left/right-hand side texts, and lc/rc is the left/right-hand side context before/after application of a sequence, and PC is a program counter.

The (internal) textual ordering of left/right-hand side text $ltxt_i/rttxt_i$, $i \in \{a, b, \ldots, m\}$ from updates of the form

$$ltxt_i \ lci = \{ g_i \} \Rightarrow rttxt_i \ rc_i$$

is then

$$\text{ltxt}_a \ ltxt_b \ldots \ ltxt_m \ lc = \{ g \} \Rightarrow \text{rttxt}_a \ rtttxt_b \ldots \ rtttxt_m \ rc$$

which can again be desugared for top-level sequences as

$$\text{PC}[pc] \text{pc}[\text{lttxt}_a \ ltxt_b \ldots \ ltxt_m]qc lc = \{ g \} \Rightarrow \text{PC}[pc'] \text{pc'}[\text{rttxt}_a \ rtttxt_b \ldots \ rtttxt_m]qc rc.$$  

Not only are variables shared between left and right-hand sides of an update scheme, but they are also shared across all updates of a sequence. This further reduces the complexity of an update plan as it is possible to share data between updates directly rather than using additional constant locators. However, as will be shown in section 3.7, no forward references to variables are allowed.

The temporal ordering of stageless sequences is equivalent to its textual ordering. In other words, the stageless sequence

$$\text{SEQ} \left\| \text{update}_a \right. \left\| \text{update}_b \right. \left\| \ldots \right. \left\| \text{update}_m \right.$$  

is syntactic sugar for
2.4 Canonical form

A sequential block is said to have a canonical form if all of its updates are update schemes. The canonical form of sequential blocks is introduced to simplify the implementation of sequential update schemes and to define the semantics of sequential blocks nested in parallel blocks.

Every sequential block has its canonical form. The following text shows how a sequential update scheme can be translated into its canonical form.

One of the requirements for transformation of a sequential update scheme into canonical form is that all its stages and stages of all nested sequences must be ground. At first sight this might seem as a limitation, but as temporal ordering is also required for the implementation, and transformation into canonical form is primarily required as the first step of the implementation, this is not an issue.

The algorithm can be divided into two independent parts. Firstly, the sequential block is ‘linearised’ by three transformations so that all of updates in the block are alternatives or parallel blocks. Then, in the second part, any parallel blocks and alternatives are replaced by update schemes. Due to the possible presence of alternatives (or alternatives nested in parallel blocks), this step (conversions 4 and 5) may produce two or more sequential update schemes in canonical form.

The algorithm

while(sequential block contains other sequential blocks) {
  convert:
  1) sequential blocks in || blocks /* \tau_1 */
  2) sequential blocks in sequential blocks /* \tau_2 */
  3) || blocks in || blocks /* \tau_3 */
}
/* assertion: the only type of nested updates are || blocks or alternatives */
convert:
  4) all || blocks into update schemes /* \tau_{4a}, \tau_{4b} */
  5) all alternatives into update schemes /* \tau_5 */

All five individual steps of the algorithm are explained in more detail in the following sections.

Sequential blocks in a parallel block

Of all the transformations used in the algorithm the transformation of sequential blocks nested in a parallel block has the most noticeable impact on the structure of the resulting sequential
As a side effect, the transformation also shows the semantics of sequential blocks nested in a parallel block. The main idea is demonstrated by the following example. There are two sequential updates $A$ and $B$, and two non-sequential updates $update_1$ and $update_2$ in the parallel block. Let there be instantiation sets $\mathcal{I}_A$ and $\mathcal{I}_B$ containing instantiations of sequential blocks' $A$ and $B$ stages. The parallel block can then be replaced by a sequential block containing $n$ parallel blocks/stages, where $n = |\mathcal{I}_A \cup \mathcal{I}_B|$. Each of these parallel blocks/stages will contain updates to be applied at the same time. All non-sequential updates are grouped in the first stage of the newly created sequential block.

\[
\begin{align*}
\text{update}_1 & \quad | \quad \text{update}_1 \\
A & a_A \quad \text{update}_{a_A} \\
& b_A \quad \text{update}_{b_A} \\
& c_A \quad \text{update}_{c_A} \\
B & b_B \quad \text{update}_{b_B} \\
& b_B \quad \text{update}_{b_B} \\
& c_A \quad \text{update}_{c_A} \\
\text{update}_2 & \quad | \quad \text{update}_2 \\
\end{align*}
\]

Provided stages $a_A = a_B$ and $b_A = b_B$, the parallel block can be split into three separate parallel blocks as demonstrated above ($\tau_1$). As all updates in a parallel block are guaranteed to be desugared, no special care needs to be taken about textual ordering.

**Sequential blocks in a sequential block**

Let there be a sequential block $A$ with a nested sequential block $B$ in stage $b_A$. Then it is possible to transform ($\tau_2$) the original sequential block $A$ into a sequential block $AB$ by simply splitting $B$'s stages and adding them to the the newly created sequential block.

\[
\begin{align*}
\text{update}_1 & \quad | \quad \text{update}_1 \\
A & a_A \quad \text{update}_{a_A} \\
& b_A \quad \text{update}_{a_B} \\
& b_B \quad \text{update}_{b_B} \\
& c_A \quad \text{update}_{c_A} \\
\text{update}_2 & \quad | \quad \text{update}_2 \\
\end{align*}
\]

Values of stages of the new block $AB$ must preserve the application order to ensure the semantic equivalence of sequential blocks $A$ and $AB$. Values of all stages temporally preceding stage $b_A$ are left unchanged. For instance, assuming that $b_A$ temporally follows both the $a_A$ and $c_A$ stages, $a_{AB} = a_A$ and $d_{AB} = c_A$. All stages $B$ introduced from the block $B$ will have their value changed to $b_A + b - \min(B)$, where $b \in B$.

Values of all stages temporally following stage $b_A$ need to be incremented by the difference between the last and the first stage to be applied in block $B$, i.e. $\max(B) - \min(B)$. Textual ordering is naturally preserved.
CHAPTER 6: SEMANTIC EXTENSIONS

Parallel blocks in a parallel block

The third step of conversion of a sequential block into its canonical form is the simplest one. A parallel block in the sequential block $S$ having as its updates other parallel blocks will be transformed into a new parallel block using a simple manipulation demonstrated by the following figure.

$$
\begin{array}{c}
\text{update}_1 \\
\text{update}_2 \\
\text{update}_3 \\
\text{update}_4 \\
\end{array} \quad \xrightarrow{\tau_3} \quad 
\begin{array}{c}
\text{update}_1 \\
\text{update}_2 \\
\text{update}_3 \\
\text{update}_4 \\
\end{array}
$$

Parallel blocks

In contrast to the three transformations described above, transformation of parallel blocks into canonical form can produce two or more updates (update schemes) derived from a single parallel block. This fact is due to the possible presence of alternatives, which will now be separated into update schemes.

A parallel block $P$ containing $m$ alternatives with a variable amount of update schemes $(us_{ri})$ for each of the alternatives

$$
\begin{array}{c}
us_{11}; us_{12}; \ldots; us_{1a} \\
us_{21}; us_{22}; \ldots; us_{2b} \\
\vdots \\
us_{m1}; us_{m2}; \ldots; us_{mn}
\end{array}
$$

can be transformed ($\tau_4$) into $x = a \times b \times \cdots \times n$ parallel blocks

$$
\begin{array}{ccccccc}
us_{11} & us_{11} & us_{11} & us_{11} & us_{11} & us_{11} & us_{1a} \\
us_{21} & us_{21} & us_{21} & us_{22} \ldots & us_{22} \ldots & us_{2b} & us_{2b} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
us_{m1} & us_{m2} & us_{mn} & us_{m1} & us_{mn} & us_{m1} & us_{mn}
\end{array}
$$

This means that the sequential block containing the parallel block $P$ will be replaced by multiple $(x)$ instances of the sequential block where $P$'s occurrence will be replaced in each of the instances by a parallel block transformed as shown above.

Every parallel block consisting of only update schemes

$$
\begin{array}{c}
lhs_1 = \{g_1\} \Rightarrow rhs_1 \\
lhs_2 = \{g_2\} \Rightarrow rhs_2 \\
\vdots \\
lhs_m = \{g_m\} \Rightarrow rhs_m
\end{array}
$$

is said to be in canonical form and can be rewritten ($\tau_{4b}$) as a single update scheme simply by taking the unions of all left/right-hand sides and combining the guards.
Alternatives

Linearised sequential blocks containing alternatives will be normalised (\(\tau_3\)) into multiple sequential blocks using exactly the same principal as described in the previous section.

An alternative \(\mathcal{A}\) containing \(n\) update schemes

\[
lhs_1 \land g_1 \land \cdots \land g_n \Rightarrow \text{rhs}_1; \ldots; \ lhs_n \land g_n \Rightarrow \text{rhs}_n.
\]

can be rewritten as \(n\) update schemes

\[
\begin{align*}
\text{lhs}_1 & = \{ g_1 \} \Rightarrow \text{rhs}_1, \\
& \vdots \\
\text{lhs}_n & = \{ \bigwedge_{i=1}^{n-1} \neg g_i \land g_n \} \Rightarrow \text{rhs}_n.
\end{align*}
\]

The sequential block containing alternative \(\mathcal{A}\) will then be replaced by \(n\) instances of the sequential block where \(\mathcal{A}\)’s occurrence will be replaced in each of the instances by the individual update schemes shown above.

2.5 Implementation

In this section a simple translation of sequential update schemes to basic update schemes is presented. The advantages of such a translation are obvious. As an algorithm for implementation of basic Update Plans has already been described [60], it is not necessary to redesign the whole implementation. In other words, the proposed translation is a front-end, independent of the actual implementation of basic UP. The implementation is divided in three sections. Firstly, the main idea is presented in the following section. Secondly, the algorithm is described in section 2.5.2. Finally, section 2.5.3 gives an example of the implementation.

2.5.1 Preliminaries

The algorithm described in the following section assumes that the sequential update scheme under transformation is already in its canonical form and that all of its sequential archetypes have been expanded. For the expansion of sequential archetypes please refer to section 3.3. The main idea is to translate all sequential update schemes at compile-time into basic update schemes and adding a mechanism to synchronise the translated update schemes and existing top-level updates. Synchronisation is made possible by the introduction of a shared central update plan synchroniser \(\text{synchroniser} \ UP: \text{SEQ}\), which is a locator addressing the synchroniser’s stack. The role of the stack is twofold. Firstly, it serves as an extra guard to allow the application of only a particular update scheme from a sequential block, and secondly, if the application of a sequential block finished, or has yet to start, it allows the application of only top-level updates.
2.5.2 The algorithm

For the purposes of the implementation, the following popsh archetype is defined. It is a two-parameter stack management archetype, which combines the effects of the classical pop and push operations.

\[ \text{popsh}(v_1, v_2) = \text{UP:SEQ}[q] \ p[v_1]q \implies \text{UP:SEQ}[q'] \ p[v_2]q'. \] # pop and push

In order to prevent existing top-level update schemes from interfering with half-completed sequential update schemes, it is necessary to add an archetype call popsh(UP:TOP, UP:TOP) to every top-level update scheme in an update plan. The initial synchroniser's configuration is UP:SEQ[p] [UP:TOP]p, which enables the application of all top-level update schemes. The algorithm for the actual translation of sequential update schemes into basic update schemes is described below.

Consider the following generic top-level canonical sequential update scheme \( S \)

\[
\begin{align*}
\text{ltxti} & \text{ } \text{lc}i = g & \Rightarrow \text{rtxti} \text{ } \text{rc}i \\
\text{ltxtb} & \text{ } \text{lc}b = g & \Rightarrow \text{rtxtb} \text{ } \text{rc}b \\
\vdots
\end{align*}
\]

where \( \text{ltxti}/\text{rtxti} \) is the left/right-hand side text and \( \text{lc}i/\text{rc}i \) is the left/right-hand side context of an update scheme \( i \in \{a, b, \ldots, m\} \).

As already explained in section 2.3, not only are variables shared between left and right-hand sides of an update scheme, but they are also shared across all updates in a sequence. As the aim is to convert a sequential block into top-level update schemes, and the only way to share data across top-level updates is using ground (in other words constant) locators, additional ground locators will be used as addresses of the shared variables.

Let there be a set \( A \) of update schemes which share a variable \( v \) in the sequence \( S \). Let there also be a universe of locators \( U \) used in the transformed update plan and a locator \( \mathcal{V} \), where \( \mathcal{V} \notin U \). Then all update schemes from \( A \) need to be adapted by adding a locator expression \( \mathcal{V}[v] \) using the following two rules.

1. An update scheme from \( A \) in which \( v \) is ground is subject to transformation \( \tau_1 \)

\[
\text{lhs} = g \Rightarrow \text{rhs} \quad \tau_1 \quad \text{lhs} = [g] \Rightarrow \text{rhs} \mathcal{V}[v]
\]

2. An update scheme from \( A \) in which \( v \) is not ground is subject to transformation \( \tau_2 \)

\[
\text{lhs} = [g] \Rightarrow \text{rhs} \quad \tau_2 \quad \text{lhs} \mathcal{V}[v] = [g] \Rightarrow \text{rhs}
\]

It is also necessary to make the textual ordering of a sequence's text explicit, to conform to the semantics of sequences described in 2.3. Let there be an ordering \( B \) of \( n \) update schemes in the sequence \( S \) ordered textually in the order text is arranged in a configuration.
after expansion as described in section 2.3. Then every update scheme from that ordering is assigned a number $t$ in the range 1 to $n$ representing the position it appears in $B$ and is subject to transformation

$$ltxt\ lct = \{ g \} \Rightarrow rtxt\ rc$$

$$\tau_3$$

$$PC[pc] (pc + \sum_{j=1}^{t-1} |ltxt_j|)[ltxt] = \{ g \} \Rightarrow (pc + \delta - \sum_{j=t}^{n} |rtxt_j|)[rtxt] \Rightarrow pc$$

where $\delta = \sum_{j=1}^{n} |ltxt_j|$ and $pc \equiv PC[pc + \delta - \sum_{j=t}^{n} |rtxt_j|]$ if the update scheme under transformation is in the last (temporal) stage of $S$, otherwise $pc$ is empty. The ‘$|$’ symbol is the length operator.

Lastly, the application order of update schemes within $S$ needs to be made explicit together with an explicit statement of its entering and leaving.

**Rule 1.** The first update scheme $\mu_f$ of the sequence $S$ can only be applied if there is an appropriate unique value $\sigma$ corresponding to $\mu_f$ on the synchroniser. The following update scheme needs to be added for (top-level) sequence $S$ in order to enter sequential application of $S$’s schemes.

$$\text{popsh}(UP: TOP, UP: TOP \sigma) \Rightarrow \# \text{preamble, the first update scheme (rule 1)}$$

**Rule 2.** Let there be a temporal ordering $C$ of all sequence’s $S$ stages given by the successor function $\text{succ}(\alpha)$ as described in section 2.3. Then for every stage $\alpha \in C$ there is an update scheme $\mu$ in that stage (already adapted by transformations $\tau_1, \tau_2$ and $\tau_3$) which will be translated into a basic update scheme

$$\text{popsh}(v_1, v_2) \mu. \# \text{sequential block itself (rule 2 and 3)}$$

where $v_1$ and $v_2$ are both unique constants—the first one identifying the update scheme in stage $\alpha$, the second one identifying the update scheme in stage $\text{succ}(\alpha)$. The uniqueness of these constants in combination with the popsh archetype ensures sequentiality of application of all update schemes within the sequence $S$ and no interference with other updates.

**Rule 3.** The update scheme $\mu_t$ in the last stage of $S$’s application—$\text{succ}^{n-1}(\min(\mathcal{I}))$, where $\mathcal{I}$ is a countable instantiation set of the sequence $S$ as explained in section 2.3—is not subject to rule 2, as its successor function is not defined. There is, however, a need to transfer control back to (or enable execution of) top-level updates, in other words exit the sequential application of $S$. This is ensured by adding an update scheme 2.10 where $\mu \equiv \mu_t$, $v_2 = 'UP: TOP'$ and $v_1 = 'UP: TOP \psi'$, where $\psi$ is the constant which was used by rule 2 as $v_2$ for $\mu$ in stage $\text{succ}^{n-2}(\min(\mathcal{I}))$.

Anonymous and stageless sequences are mere syntactic sugar for ordinary sequences as described in sections 2.2 and 2.3. As such they do not need to be considered as a special case for implementation.
2.5.3 Example

Continuing with the ‘PDP-11’ example introduced in this chapter as one of the motivations for sequential update schemes, the implementation of the sequential update scheme $S$ shown in section 2.1 on page 65 is now examined. It is given again for convenience in 2.11.

\[ S \]

\[
\text{arithm}(x, y, r) \quad r_3 \implies r_3[x] \\
1 \quad \text{POSTINC}(b, x) \implies \\
2 \quad \text{PREDEC}(a, y) \implies .
\]

As the sequential update scheme is already in canonical form, we can start with transformations $\tau_1$ and $\tau_2$ and thus make sharing of variables between update schemes explicit.

\[ S \]

\[
\text{arithm}(x, y, r) \quad r_3 \implies x[y] \implies r_3[x] \\
1 \quad \text{POSTINC}(b, x) \implies x[x] \\
2 \quad \text{PREDEC}(a, y) \implies y[y].
\]

Note that variables $a$, $b$, $r_3$ and $r$ are not shared among the individual update schemes of $S$, and as such are not subject to transformations $\tau_1$ and $\tau_2$.

As mentioned in section 2.5.1, this is a compile-time transformation and so all text-expanding archetypes of a sequence must be expanded before the sequence can be translated into basic update schemes. This step is demonstrated by the update scheme 2.13.

\[ S \]

\[
\text{ADD} \quad r_3 \quad x[x] \quad y[y] \implies r_3[x] \\
1 \quad \text{POSTINC} \quad r_1 \quad r_1[b] \quad b[v] \implies x[x] \quad r_1[c] \\
2 \quad \text{PREDEC} \quad r_2 \quad r_2[b] \quad a[v_1] \implies y[y] \quad r_2[a].
\]

Note that the variables of archetype definitions have been renamed before the expansion of archetype calls as in section 2.1. As a sequential update scheme shares variables among its individual updates, variables in all updates within a sequence have to be taken into account during the renaming process.

Assuming that $|\text{ADD} r_3| = 16$, $|\text{POSTINC} r_1| = 8$, and $|\text{PREDEC} r_2| = 8$, internal and external textual orderings are made explicit by the transformation $\tau_3$ (see section 2.3) as shown by 2.14.

\[ S \]

\[
\text{ADD} \quad x[x] \quad y[y] \implies r_3[x] \\
1 \quad \text{POSTINC} \quad r_1[b] \quad b[v] \implies x[x] \quad r_1[c] \\
2 \quad \text{PREDEC} \quad r_2[b] \quad a[v_1] \implies y[y] \quad r_2[a].
\]

Finally, $S$ can be split into individual top-level update schemes using rules 1–3 from the previous section. As no text is expanded on any of the right-hand sides the ‘empty’ locator expressions $pc+32[]$ are discarded for better readability.

\[ S \]

\[
\text{popsh}(\text{UP:TOP}, \text{UP:TOP} \quad S_1) \implies . \\
\text{popsh}(S_1, S_2) \quad \text{PC}[pc] \quad pc+16[\text{POSTINC} \quad r_1] \quad r_1[b] \quad b[v] \implies x[x] \quad r_1[c]. \\
\text{popsh}(S_2, S_3) \quad \text{PC}[pc] \quad pc+24[\text{PREDEC} \quad r_2] \quad r_2[b] \quad a[v_1] \implies y[y] \quad r_2[a]. \\
\text{popsh}(\text{UP:TOP} \quad S_3, \text{UP:TOP}) \quad \text{PC}[pc] \quad \text{ADD} \quad r_3 \quad x[y] \implies r_3[r] \quad \text{PC}[pc+32].
\]
Since in our example \( r_1 \) and \( r_2 \) are identical registers, \( r_2 \) can be substituted by \( r_1 \). Also, based on the resolution set derived in 2.13, \( v, v_1 \) and \( r \) variables are replaced by \( x, y \) and \( x + y \) respectively.

\[ \text{popsh}(\text{UP:TPOP, UP:TOP S1}) \iff . \]

\[ \text{popsh}(S1, S2) \text{ PC}[pc][pc+16]\text{ POSTINC } r_1] r_1[b] b[x]c \iff X[x] r_1[c]. \]

\[ \text{popsh}(S2, S3) \text{ PC}[pc][pc+24]\text{ PREDEC } r_1] r_1[b_1] a[y]b_1 \iff Y[y] r_1[a]. \]

\[ \text{popsh}(\text{UP:TPOP S3, UP:TOP}) \text{ PC}[pc][pc+32]\text{ ADD } r_3] X[x] Y[y] \iff r_3[x + y] \text{ PC}[pc+32]. \]

To complete the example, all popsh archetypes are expanded.

\[ \text{UP:SEQ}[q] \text{ p}[\text{UP:TPOP}]q \iff \text{UP:SEQ}[q'] \text{ p}[\text{UP:TOP S1}]q'. \]

\[ \text{UP:SEQ}[q'] \text{ q}[S1]q' \text{ PC}[pc][pc+16]\text{ POSTINC } r_1] r_1[b] b[x]c \iff X[x] r_1[c] \text{ UP:SEQ}[q'] \text{ q}[S2]q'. \]

\[ \text{UP:SEQ}[q'] \text{ q}[S2]q' \text{ PC}[pc][pc+24]\text{ PREDEC } r_1] r_1[b_1] a[y]b_1 \iff Y[y] r_1[a] \text{ UP:SEQ}[q'] \text{ q}[S3]q'. \]

\[ \text{UP:SEQ}[q'] \text{ p}[\text{UP:TPOP S3}]q' \text{ PC}[pc][pc]\text{ ADD } r_3] X[x] Y[y] \iff r_3[x + y] \text{ PC}[pc+32] \text{ UP:SEQ}[q] \text{ p}[\text{UP:TPOP}]q. \]

Note that \( b_1 = c \), and since \( x \) and \( y \) are of the same type, \( x = y \) (and \( b = a \)). Also note that the value \( b \) in \( r_1[b] \) does not change with respect to the value before application of the update scheme 2.6 on page 65.

### 3 Sequential archetypes

#### 3.1 Background/Motivation

While sequential update schemes have already proved to be a good step towards more transparent and hierarchical UP specifications, they still provide only a limited improvement on the original model.

Consider the example of a simple logic circuit in figure 6.1. A basic Update Plans specification for AND and OR gates is as follows.

\[ \text{and}(a, b, c) = a[0] b[0] \iff c[0]. \]

\[ \text{or}(a, b, c) = a[0] b[0] \iff c[0]. \]

\[ = a[0] b[1] \iff c[0]. \]

\[ = a[1] b[0] \iff c[0]. \]

\[ = a[1] b[1] \iff c[1]. \]

\[ = a[0] b[1] \iff c[1]. \]

\[ = a[1] b[0] \iff c[1]. \]

\[ = a[1] b[1] \iff c[1]. \]
To connect the output D of the AND gate to the input D of the OR gate, sequential update scheme 3.2 is defined.

\[ \begin{align*}
0 & \text{ and}(A, B, D) \\
1 & \text{ or}(D, C, E).
\end{align*} \]

In the situation shown in figure 6.1 only one of 16 possible expansions is applicable

\[ \begin{align*}
1 & D[1] C[0] \rightarrow E[1]. \quad \# \text{OR gate}
\end{align*} \]

This example demonstrates that archetypes can be used in sequential update schemes in the same manner as in basic update schemes. Note that it would be possible to assign delays (or other metrics) to the individual and and or gates/archetypes, and that the overall delay of the logic circuit could then be easily calculated simply by adding maximum delays occurring in every stage of the sequential update scheme.

However, the main limitation of using basic archetypes in sequential update schemes is that sequences would be unduly complicated to encapsulate into non-sequential archetypes. Not only do sequential archetypes address this issue, but they also make it possible to explicitly predetermine locations of sequential archetypes’ updates in the hierarchy of the top-level sequential update scheme these archetypes are called from. They also provide the formalism with additional information-hiding facility with the help of which modular specifications can easily be defined.

### 3.2 Syntax

The syntax of **sequential archetype** definitions is that of ordinary archetype definitions where the \( \text{update} \) (see page 58) appearing in the archetype body is a sequential block. The syntax of archetype calls is the same for all types of archetypes. Using typesetting possibilities a sequential archetype \( a \) is defined

\[
a(params) = \text{SEQ}_a \begin{align*}
& \text{update}_a \\
& \text{update}_b \\
& \vdots \\
& \text{update}_m.
\end{align*}
\]

The new archetype grammar can be found in chapter 5, or alternatively refer to appendix A for the revised complete Extended Update Plans grammar.

### 3.3 Semantics

Syntactically, sequential archetype calls are equivalent to basic archetype calls. A sequential archetype definition, on the other hand, has a sequential block in its body instead of an update scheme. The individual stages (together with their updates) of this sequential block are then
matched during expansion against corresponding stages of the sequential update scheme they are called from.

As matching is done using sequencers and stages, both of these must be ground in the archetype body and in the sequential update scheme before any expansion can take place. The sequencer/stage pair has a similar purpose to that of indices in basic archetype calls, as described in [60], i.e. to ensure the correct placement of archetype bodies during an expansion.

In a sense, sequential archetypes are not as ‘dynamic’ as normal archetypes, and serve only as a framework to place update schemes in their predetermined locations of a sequential update scheme. Nevertheless, this is still a powerful concept, examples of which will be shown in chapter 7.

Expansion

If

\[
(\lambda x. (SEQ \{ a_1 \text{ update}_{a_1},
                     a_2 \text{ update}_{a_2},
                     \ldots,
                     a_n \text{ update}_{a_n} \}))(a(params))
\]

is equivalent to a (top-level or nested) sequential update scheme \( S \) containing one or more calls of the archetype \( a(params) \) and the definition of this archetype is

\[
a(params) = SEQ \{ b_1 \text{ update}_{b_1},
                     b_2 \text{ update}_{b_2},
                     \ldots,
                     b_n \text{ update}_{b_n} \}
\]

then the result of expanding the archetype in \( S \) is

\[
SEQ \{ c_1 \text{ update}_{c_1},
         c_2 \text{ update}_{c_2},
         \ldots,
         c_o \text{ update}_{c_o} \}
\]

where \( \{ c_1, c_2, \ldots, c_o \} = \{ a_1, a_2, \ldots, a_n \} \cup \{ b_1, b_2, \ldots, b_n \} \), and the individual \( \text{update}_{c_j} \) updates, where \( j \in \{ 1, \ldots, o \} \) are constructed as described in the following four sections.

As has already been shown in the introduction to this section, expansion of non-sequential archetypes in sequential blocks is allowed and they expand in the same manner as they do in the original UP.

3.3.1 Non-matching sequencers

If the archetype’s sequencer does not match any of the sequencers of the top-level sequential update scheme it is called from, it is matched against the sequencer of the closest sequence in which the archetype call \( a(params) \) appears—i.e. the lowest level sequence containing the
archetype call. Note that due to the syntactic sugar introduced in section 3.6.2, this is in effect an in situ expansion.

### 3.3.2 Non-matching stages

If during the archetype expansion any \( a(params) \) \( b \)-stage does not match any of the (sequencer matched) sequential update scheme's \( S \) stages, it is added temporally after the previous stage of \( S \).

---

**Example 2**

The sequential archetype `a`  
\[
a() = \begin{align*}
\text{ID} | 3 & \text{ update}_3 \\
\text{ID} | 2 & \text{ update}_2.
\end{align*}
\]

in a sequential update scheme

\[
\text{ID} | 1 & a() \text{ lhs } \rightarrow \text{ rhs}.
\]

expands as

\[
\text{ID} | 1 & \text{ lhs } \rightarrow \text{ rhs} \\
\text{ID} | 2 & \text{ update}_2 & \# 1 < 2 \\
\text{ID} | 3 & \text{ update}_3 & \# 2 < 3
\]

Although it might seem attractive to preserve textual ordering as it is in the sequential block of an archetype, this is not the primary aim. The primary aim is an accurate placement of a sequential archetype's updates in a sequence regardless of the order in which sequential archetypes expand (resulting in the normal form). This kind of expansion is also more consistent with the way placement of matching updates is performed.

### 3.3.3 \text{update}_a \text{ and update}_b \text{ are alternatives}

In case sequencer and stage matched updates \text{update}_a and \text{update}_b are both alternatives, the expansion mechanism is identical to the expansion mechanism in basic Update Plans. Consider the following sequential archetype and update scheme containing its calls.

\[
a(params) = \text{SEQ} \begin{align*}
1 & \text{ lttx}_1 \text{ lc}_1 = [g_1] \rightarrow \text{ rttx}_1 \text{ rc}_1 \\
2 & \text{ lttx}_2 \text{ lc}_2 = [g_2] \rightarrow \text{ rc}_2 \\
& \quad \vdots \\
n & \text{ lttx}_n \text{ lc}_n = [g_n] \rightarrow \text{ rttx}_n \text{ rc}_n.
\end{align*}
\]
Note the intentional omission of an archetype call $a(params)$ on the right-hand side of the alternative in stage 2. The archetype call can be omitted on the right/left-hand side provided the corresponding right/left expansion of the sequential archetype is empty. This is equivalent to the notion of left/right-handed archetypes introduced in [60].

Provided $m$ does not match any stage in the sequential update scheme and $m > n$ the expansion is

$$SEQ 1 \quad s_{lhs_1} a(params) = [s_{-g_1}] \Rightarrow a(params) s_{rhs_1}$$

$$2 \quad a(params) s_{lhs_2} = [s_{-g_2}] \Rightarrow s_{rhs_2}$$

$$\vdots$$

$$n \quad s_{lhs_n} = [s_{-g_n}] \Rightarrow s_{rhs_n}.$$
Ambidextrous sequential archetypes have an additional text, which is substituted for every instance of its call. This text precedes any text expanded as a result of expansion of left/right-hand side of an update scheme in a particular stage. Apart from this, the expansion mechanism of ambidextrous sequential archetypes is equivalent to the expansion mechanism of sequential archetypes described in the previous section.

Again, provided $m$ does not match any stage in the sequential update scheme and $m > n$ the expansion is

\[
\text{SEQ} \begin{align*}
1 \quad & \text{seq} \left( a(\text{params}) \right) \text{ text } \left( \text{ltxt} \right) \left( \text{lht} \right)_1 \left( \text{g} \right)_1 \rightarrow \left( \text{rtxt} \right)_1 \left( \text{rht} \right)_1 \\
2 \quad & \text{seq} \left( a(\text{params}) \right) \text{ text } \left( \text{ltxt} \right) \left( \text{lht} \right)_2 \left( \text{g} \right)_2 \rightarrow \left( \text{rtxt} \right)_2 \left( \text{rht} \right)_2 \\
\vdots \\
m \quad & \text{seq} \left( a(\text{params}) \right) \text{ text } \left( \text{ltxt} \right) \left( \text{lht} \right)_m \left( \text{g} \right)_m \rightarrow \left( \text{rtxt} \right)_m \left( \text{rht} \right)_m.
\end{align*}
\]

3.4.2 Command sequential archetypes

Command archetypes in general are a special case of ambidextrous archetypes and their semantics has already been explained in section 2.3 on page 59.

3.5 Parameters

Parameters of sequential and parallel archetypes have to be ground expressions before archetype expansion. The primary reason for this restriction is that the structure of a sequential block that is to be expanded must be known before transformation into canonical form which is an important part of the implementation of sequential update schemes.

--- Example 3 ---

The following archetype definition is perfectly valid as long as $n$ is known (ground) before archetype expansion and $n > 0$.

\[
\begin{align*}
a(0) = & \ . \\
a(n) = & \ s|_n b(n) a(n - 1).
\end{align*}
\]
Parameter resolution of archetype calls inside a sequential block is slightly complicated by the fact that variables are shared across individual stages of the sequential block application and the parameter resolution set has to reflect this.

3.6 Syntactic sugar

3.6.1 Update schemes in sequential and parallel blocks

If the right-hand side of an update scheme in a sequential or parallel block is empty and its guard is true, then the transition symbol ‘\(\rightarrow\)’ can be omitted.

3.6.2 An update is a sequential update

Every update update (excluding the updates which form archetype bodies) can be viewed as syntactic sugar for an anonymous one-stage sequential update \(l\) update.

The impact of this syntactic sugar is bigger than it may at first sight seem. It is significant as sequential archetypes do not necessarily have to be called from within sequential update schemes. The intention is to use this sugar in conjunction with the rule from section 3.3.1.

--- Example 4 ---

The following update plan defines a half adder.

\[ \text{xor}(x, y, s) \text{ and}(x, y, c) \rightarrow \text{.} \text{# half adder} \]

It can be viewed as syntactic sugar for

\[ l \text{xor}(x, y, s) \text{ and}(x, y, c) \rightarrow \text{.} \text{# half adder} \]

Given an archetype definition

\[
\text{xor}(x, y, s) =
\begin{align*}
1 & \quad \text{not}(x, w_2) \\
& \quad \text{not}(y, w_1) \\
2 & \quad \text{and}(x, w_1, w_3) \\
& \quad \text{and}(w_2, y, w_4) \\
3 & \quad \text{or}(w_3, w_4, s).
\end{align*}
\]

expanding the xor archetype in the definition of a half adder using the rule from section 3.3.4 gives

\[
\begin{align*}
1 & \quad \text{not}(x, w_2) \\
& \quad \text{not}(y, w_1) \\
2 & \quad \text{and}(x, w_1, w_3) \\
& \quad \text{and}(w_2, y, w_4) \\
3 & \quad \text{or}(w_3, w_4, s) \\
\text{and}(x, y, c) \quad \text{which can be transformed as}
1 & \quad \text{and}(x, y, c) \\
& \quad \text{not}(x, w_2) \\
2 & \quad \text{not}(y, w_1) \\
& \quad \text{and}(x, w_1, w_3) \\
& \quad \text{and}(w_2, y, w_4) \\
3 & \quad \text{or}(w_3, w_4, s).
\end{align*}
\]
3.7 Limitations

With the introduction of sequential update schemes a new set of problems needs to be addressed. The most obvious one stems from allowing variables to be shared among all stages of a sequential block. Consider example 5.

--- Example 5 ---

This rather forced example uses two sequential archetypes and a (sugared) sequential update scheme which calls these archetypes. One of these archetypes (r) reads a value of variable v, and the second one (w) writes a value of variable v.

\[
\begin{align*}
\text{r}(v) &= s_{12} \ A[v], \\
\text{w}(v) &= s_{11} \rightarrow B[v], \\
\text{r}(v) &\rightarrow \text{w}(v).
\end{align*}
\]

Clearly, the expansion of these archetypes the results in a "forward reference" to the variable v (v is not ground in stage 1), which is not a valid specification.

\[
\begin{align*}
1 &= \rightarrow B[v] \\
\end{align*}
\]

However, problems such as these are not exclusive to Extended UP, they were also present in basic UP, and can easily be detected by a trivial data flow analysis [1]. Any update scheme containing such a forward reference is illegal.

4 Special cases of archetype expansion

4.1 Alternatives

As a result of syntactic changes, alternatives can now (in Extended Update Plans) be bodies of archetypes. A single update scheme in which such an archetype is called will, assuming there are n update schemes in the archetype's alternative, expand into n update schemes. In other words, the archetype definition of such an archetype can be viewed as n archetype definitions of the same archetype containing update schemes with their guards adapted as shown by the transformation \( \tau_5 \) in section 2.4. Again, consistency is the primary reason for introducing alternatives into archetype's bodies.

4.2 Parallel blocks

The original Update Plans formalism does not allow the use of parallel blocks in archetypes. While this may seem unnecessary in the original Update Plans, in Extended Update Plans
parallel blocks can appear in archetypes not only as a part of a sequential update scheme, but also entirely on its own. This increases consistency and adds a degree of modularity and information hiding to the formalism.

An archetype definition
\[
a(params) = \begin{array}{c}
\text{update}_1 \\
\vdots \\
\text{update}_n
\end{array}
\]
is syntactic sugar for
\[
a(params) = \begin{array}{c}
\text{update}_1 \\
\vdots \\
\text{update}_n
\end{array}
\]
that is the same parallel block is embedded in a one-stage anonymous sequence. Thanks to the existence of the expansion rule introduced in section 3.3.1 and the syntactic sugar in section 3.6.2, the expansion of this kind of archetype is governed by the rule in section 3.3.4, which is thus in effect an in situ expansion.

5 Conclusions

In this chapter sequential update schemes and sequential archetypes were introduced as semantic extensions to basic Update Plans. They contribute to the usability of the formalism in two major areas.

Firstly, the obvious area of synchronisation. Synchronisation, or more precisely explicit temporal ordering of update schemes, was already possible in basic Update Plans. However, this required an introduction of artificial constructs unrelated to the architecture under description and made the whole specification less readable and elegant. Sequential update schemes augment the non-deterministic model of execution of basic Update Plans by explicitly stating the order in which updates will be applied and expressing a number of consecutive updates as one.

Secondly, the introduction of sequential archetypes extended the possibility for information hiding and structure reuse by encapsulating a series of synchronised updates rather than just a single atomic update. As a series of actions can be encapsulated into a module (a sequential archetype), it is possible to provide multiple definitions for exactly the same series of actions, but perhaps on a different level of abstraction. The next step is to design a mechanism proving equivalence of these levels or ideally a refinement methodology to automatically derive provably correct levels of description.

Overall, Extended Update Plans present a real improvement in readability of descriptions where sequential behaviour is required. These may include fetch/execute cycle and lower-level descriptions such as gate-level models of (a)synchronous circuits.
Chapter 7

PRAM

To demonstrate the power, compactness and intuitiveness of the use of sequential update schemes and archetypes, a specification of the parallel random access machine (PRAM) has been developed. The PRAM specification has been chosen in particular not only because of its historical importance as one of the first models of parallel computing [22], but also because of its ongoing relevance and the interest of researchers in this model [2, 40, 46, 51, 76].

A PRAM is characterised by its “memory models” which determine the PRAM’s behaviour when two or more random access machines (RAMs) attempt to read from or write to the same memory all. An informal description of these memory models is given in section 1. Section 2 gives an EUP specification of the memory models for PRAMs with two RAMs. More general n-RAM PRAM memory models are defined in section 3. The EUP specifications are used in section 4 where they are placed in the context of the remainder of a PRAM’s instruction set. Conclusions are drawn in section 5. The full specification for n-PRAM can be found in appendix E.

1 Informal description

The parallel random access machine is a theoretical uniform memory access shared memory model (UMA/SMP) [35] of parallel computing. This description and the formal specification is based on the informal PRAM specification given in [30]. PRAM consists of n random access machines [67, 72] with infinite shared memory and a common clock. Each of the RAMs can access shared memory independently from any other RAM in constant time.

Figure 7.1 shows a RAM consisting of R general-purpose registers, a program counter (PC), a signature register (SIG), an accumulator (ACC) and a memory address register (MAR). Note that the signature register is only used in the priority model, which will be explained later in this section.

A PRAM’s instruction set can be divided into four classes: arithmetic/logic instructions, load/store instructions, flow control instructions and read/write instructions. These instruc-
All RAMs execute the same program, but each one can be executing a separate code segment within the program. Execution of any instruction on the PRAM machine takes one clock cycle, i.e. a constant unit of time. One clock cycle is divided into four sequential phases (in a sense a f/e cycle) which are synchronous between processors (RAMs). In the first phase program counters of all processors that are not yet halted are increased to point to the following instruction. The second phase is a register read/write access phase and instruction execution phase. A read/write instruction will, on execution, prepare for shared memory access in the next two phases by loading the memory address register. All other instructions will update the contents of local registers. Finally the third and the fourth phases are shared memory read and write access phases respectively.

Depending on whether simultaneous reads of a same shared memory cell are allowed, two read models, the concurrent read model (CR), and the exclusive read model (ER) can be defined.

Similarly, two memory write models exist—a concurrent write (CW), and an exclusive write (EW). However, concurrent write models need further consideration. The result of two or more processors trying to write simultaneously into a same shared memory cell \( c \) has to be defined. There are many different write conflict resolution rules. Some of the most common ones are as follows.

**WEAK** Simultaneously writing the value zero to \( c \) by two or more RAMs is allowed and the value zero is stored into the cell. Simultaneously writing any other value to the cell by
two or more RAMs is forbidden and the execution of the PRAM ends in a write conflict if such a write is attempted.

**COMMON** Simultaneously writing a common value to $c$ by two or more RAMs is allowed and the common value is stored into the cell. Writing two or more different values simultaneously to $c$ by two or more RAMs is forbidden and the execution of the PRAM ends in a write conflict if such a write is attempted.

**TOLERANT** If two or more RAMs simultaneously try to write to $c$, then the value of the cell is not changed. The value of $c$ is changed only if just one RAM is writing to it at the time.

**COLLISION** If two or more RAMs simultaneously try to write to $c$, then a special collision symbol (COLL) is written to the cell, even if they are writing the same value.

**COLLISION+** If two or more RAMs simultaneously try to write two or more different values to $c$, then a special collision symbol (COLL) is written to the cell. The value of $c$ is changed normally if the RAMs are writing the same value.

**ARBITRARY** If two or more RAMs simultaneously try to write to $c$, then an arbitrarily chosen RAM writing to the cell succeeds in writing. There is no way to determine prior to the write which RAM will succeed, or determine after the write which RAM succeeded.

**PRIORITY** If two or more RAMs simultaneously try to write to $c$, then the RAM with the smallest RAM identifier, i.e. with the smallest value of the SIG register, succeeds in writing. In other words, RAM identifiers define an unequivocal and RAM-wise order of priority of RAMs so that the RAM with the smallest RAM identifier has the highest priority.

### 2-PRAM memory models

The aim of this section is to show a compact and elegant specification of a 2-RAM PRAM using EUP. This specification will serve as the basis for an $n$-PRAM specification presented in the following section. The EUP RAM instruction set specification is shared by 2- and $n$-PRAM specifications, and can be found in section 4.

As already mentioned in the previous section, the execution of an instruction on a PRAM machine is divided into four stages. The following sequential update scheme defines three of these stages. Phase two of the f/e cycle (the register read/write phase) is defined in terms of the instr archetype which is part of the definition of the pc archetype—i.e. when the instr archetype is expanded, appropriate actions for (not only) phase two of the f/e cycle will be added to the specification. The pc, shr, and shw archetypes are defined later in this section.
### 2.1 FE 1 pc(l) pc(2) # update PCs, block (stage 2) if neither P1 or P2 is running
3 shr()  # read shared memory
4 shw()  # write shared memory

The pc archetype in 2.2 increases program counters on both RAMs. \( C \) is the number of instructions in the program each of the RAMs are executing. A RAM having the value of its program counter \( pc < 0 \) or \( pc \geq C \) is halted. The constants \( P1 \) and \( P2 \) are used to distinguish between processor (RAM) 1 and 2. Note the use of the symbol \( '-' \) to expand these constants. For example, for \( P = 1 \), \( P^-p \) expands to \( P^-1 \) which is equivalent to \( P1 \). The ‘empty’ update scheme in stage two of the pc archetype prevents unwanted blocking behaviour of the FE sequence caused by the absence of the register access stage.

### 2.2 pc(p) FE 1 P^-p: PC\[pc\] pc[\text{instr}(p)]qc = [0 \leq pc < C] \Rightarrow P^-p:PC[qc] \# running
2 \Rightarrow .
\[\text{pc}(p) = P^-p:PC[pc] \Rightarrow 0 > pc \lor pc \geq C \Rightarrow .\] # don't block

In order to make the specification more concise, operators \( \oplus \) and \( \ominus \) are introduced.
\[
\{a_1 \ominus a_2 \mid a_1 = a_2 \land a_1 \neq \text{NULL}\}
\]
\[
\{a_1 \ominus a_2 \mid a_1 \neq a_2 \lor a_1 = \text{NULL} \lor a_2 = \text{NULL}\}
\]

Memory read conflicts occur only if the PRAM's memory read model (given by the memory read model register \( MR \)) is \( ER \), and both RAMs are trying to read the same shared memory cell. The shr (shared memory read) archetype is defined to read values from shared memory to the processors’ accumulators, and to act as a barrier to halt the entire ER PRAM should a read conflict occur.

### 2.3 shr() = PRAM:MR[mr] P1:MAR[a_1] P2:MAR[a_2] = [a_1 \ominus a_2 \land mr = ER] \Rightarrow P1:PC[C] P2:PC[C];
" \Rightarrow \text{read}(1, a_1) \text{ read}(2, a_2).

If no read conflicts occur and the current instruction is a memory read instruction, a value is read from shared memory into a RAM’s accumulator by the following archetype.

### 2.4 read(p, a) = a[v] = [a \neq \text{NULL}] \Rightarrow P^-p:ACC[v]; \# read value v from shared memory

The MAR on every RAM contains a special value \( \text{NULL} \) when the currently executed instruction doesn’t access shared memory.

The shw performs similar function to the shr archetype, but it writes values into shared memory. The PRAM’s memory write model is given by the memory write model register (\( MW \)).
Note that writing the constant C into a p's program counter register is effectively a \texttt{HALT} instruction.

If the addresses in \texttt{MAR} registers are not the same, there are clearly no write conflicts and both values are written to shared memory using the following archetype.

\begin{verbatim}
[2.6] write(a,v) = a \neq NULL \Rightarrow a[v]; \# write value v into shared memory

\end{verbatim}

\section*{3 $n$-PRAM memory models}

This section has a specification for an $n$-RAM PRAM. It is a relatively high-level specification abstracting away implementation details and giving only two memory write models. The complete EUP specification can be found in appendix E.

We first define the instruction cycle, as we did for the 2-PRAM. This time the PRAM is defined by the archetype \texttt{pram}(3.1), rather than just a sequential update scheme as for the 2-PRAM (2.1). The number of processors and memory read/write models are parameters of the specification rather than of the machine configuration. Again, only three out of four stages are included in the FE sequence, as the second (execution) stage of the instruction execution will be expanded by the \texttt{pc} archetype which was defined in section 2. As will be explained later, the enforced blocking behaviour when no RAM is running is intentional. The \texttt{pcs}, \texttt{shr} and \texttt{shw} archetypes now take additional arguments. The variable $n$ is the number of RAMs in the PRAM machine ($n > 0$), $rm$ and $wm$ are the memory read and write models respectively. For example, a 5-PRAM with concurrent read and common concurrent write would be specified by \texttt{pram}(5, CR, CW\texttt{-COMMON}).
As mentioned at the start of this section, this specification is relatively abstract. It uses the ‘limits’ notation (e.g. in 3.2) to define ‘multi-processor’ archetypes, and the (multi)set notation (e.g. in 3.6) to simplify guards. This notation can easily be implemented as recursive archetypes as shown in appendix E. For example, the pcs archetype (which expresses the first stage of an instruction execution) can be defined using the limits notation as

\[
\text{pcs}(n) = \underbrace{\text{pc}(1) \ldots \text{pc}(n)}_{n=1}.
\]

which is shorthand for \(\text{pc}(1) \ldots \text{pc}(n)\). The same archetype in 3.3 is an example of a recursive implementation. It increases the PC on every running RAM \(p\) using the archetype 2.2 from the previous section. If there are no running RAMs, execution is blocked (the FE sequence is only partially applicable), and the whole PRAM stops.

The following halt archetype serves to stop a specific RAM and also the whole PRAM. This archetype is used by shr/shw archetypes in terminal conflict models when a read/write conflict occurs.

\[
\text{halt}(p) = \rightarrow P \setminus p : \text{PC}(p). \quad \# \text{halt}(p) \text{ stops the whole PRAM}
\]

The shr archetype checks simultaneous reads from shared memory. It uses set theory in order to discover read accesses to the same memory location. Addresses waiting to be read in shared memory are stored in each one of RAM’s MAR registers. The archetype shr checks a RAM’s (non-NULL) MAR register value against the values in all the other RAM’s MARs, and if it finds a conflict, the entire PRAM is halted. The read archetype is defined in 2.4. The shrd archetype in 3.5 provides the address \(a_p\) from the memory address register of processor \(p\).

The notation \(\{a_p\}\) is the bag or multiset containing the elements \(a_p\), and an element \(a_p^{\text{NULL}} = a_p\) if and only if \(a_p \neq \text{NULL}\), otherwise it is equivalent to an empty/non-existent element.

\[
\text{shr}(n, \text{CR}) = \underbrace{\text{shrd}(p, a_p) \text{ read}(p, a_p)}_{p=1}.
\]

\[
\text{shr}(n, \text{ER}) = \underbrace{\text{shrd}(p, a_p) \text{ halt}(p)}_{p=1} = \left\{ \begin{array}{ll}
\{a_p^{\text{NULL}}\} & \neq \{a_p^{\text{NULL}}\} \Rightarrow \# \text{read conflict} \\
\{a_p^{\text{NULL}}\} & \neq \{a_p^{\text{NULL}}\} \Rightarrow \# \text{no read conflict}
\end{array} \right.
\]
A similar approach will be used to determine shared memory write conflicts, but simple
detection of the fact that 1 or more write conflicts are due to occur is not sufficient. Different
actions need to be taken in different memory models.

The number of RAMs (n) and the memory write model are now parameters of shw archetypes. The write archetype is defined in 2.6. The shwr archetype reads the address (ap) from
the memory address register of processor p and the value (vp) to be written at this address in
shared memory.

[3.7] shwr(p, a, v) = P\[\text{MAR}[a] P\[\text{ACC}[v].

An archetype is defined for each memory write model. Only the exclusive write and the
weak concurrent write models are described in detail here. The full specification is in
the appendix E.

[3.8] shw(n, EW)

\[
\begin{align*}
\text{shwr}(p, a_p, v_p) = & \begin{cases} 
\text{write}(a_p, v_p) & \text{# no write conflict} \\
\text{halt}(p) & \text{# write conflict}
\end{cases} \\
\text{shwr}(p, a_p, v_p) = & \begin{cases} 
\text{write}(a_p, v_p) & \text{# no write conflict} \\
\text{halt}(p) \quad \# write conflict
\end{cases}
\end{align*}
\]

4 Instructions

4.1 Addressing modes

4.1.1 Implicit modes

There are three implicit addressing modes on the PRAM—i.e. addressing modes that are
implicit in the opcode, rather than being explicitly provided as an operand. Access to these
addressing modes always takes place in the register read/write phase. The three implicit
addressing modes are accumulator addressing,

[4.1] accr(p, v) = FE\[2 P\[\text{ACC}[v]. \quad \# \text{accumulator read}
\text{accw}(p, v) = FE\[2 \quad \Rightarrow P\[\text{ACC}[v]. \quad \# \text{accumulator write}

signature register addressing
4.1.2 Register modes

We first define two 'textless' register addressing archetypes that can be used to read/write values from/to local registers.

\[ \text{reg}(p, r, v) = \text{FEI} \quad P-p: r[v]. \] # local register read
\[ \text{regw}(p, r, v) = \text{FEI} \quad P-p: r[v]. \] # local register write

These archetypes can, among other things, be used to define the operand register addressing modes. These are direct register addressing mode

\[ \text{DR}(p, r, v) \quad r = \text{reg}(p, r, v). \] # direct register (read)

and indirect register addressing mode.

\[ \text{IR}(p, r_1, v) \quad r = \text{reg}(p, r, r_1) \quad \text{reg}(p, r_1, v). \] # indirect register (read)

Note that archetypes in 4.5 and in 4.6 are command archetypes generating addressing mode mnemonic for a PRAM instruction. These two addressing modes are known as register modes.

\[ \text{rm}(p, a, v) = \text{DR}(p, a, v) \] # register modes
\[ = \text{IR}(p, a, v). \]

Parameter \( p \) is the processor number, \( a \) is the effective (register) address, and \( v \) is the value accessed. Register modes are the only addressing modes that may be used by a STORE instruction (the STORE instruction must have an effective (register) address in which to store the value—see section 4.3).

4.1.3 Immediate/Register modes

Most of the remaining instructions (jump instructions, arithmetic/logical instructions, the LOAD instruction) may also take an immediate value. Adding this to the register modes defines the set of immediate/register modes.

\[ \text{irm}(p, v) = \text{IMM} v \quad \# \text{immediate mode and register modes} \]
\[ \text{rm}(p, \_ , v). \]
4.1.4 Memory Modes

Memory read and memory write accesses can be direct or indirect. A direct memory read is defined by

\[
\text{DMR}(p) \begin{array}{c}
2 \\
3 \\
\end{array} \\
\text{P}^\text{p}: \text{MAR}[m] \\
\text{P}^\text{p}: \text{MAR}[^\text{m}].
\]

The MAR is loaded with the address in phase two (register read/write), and cleared in phase three (memory read).

The indirect memory read is

\[
\text{IMR}(p) \begin{array}{c}
2 \\
3 \\
\end{array} \\
\text{P}^\text{p}: x[m_1] \\
\text{P}^\text{p}: \text{MAR}[m_1].
\]

The addressing modes in 4.9 and 4.10 are combined to form a set of memory read modes for use with the READ instruction—see section 4.5.

\[\text{mrm}(p) = \text{DMR}(p) \cap \text{IMR}(p).\]

The memory write modes are defined similarly.

\[
\text{DMW}(p) \begin{array}{c}
3 \\
4 \\
\end{array} \\
\text{P}^\text{p}: \text{MAR}[m] \\
\text{P}^\text{p}: \text{MAR}[^\text{m}].
\]

\[
\text{IMW}(p) \begin{array}{c}
2 \\
3 \\
4 \\
\end{array} \\
\text{P}^\text{p}: x[m_1] \\
\text{P}^\text{p}: \text{MAR}[m_1] \\
\text{P}^\text{p}: \text{MAR}[^\text{m}].
\]

These two addressing modes form the memory write mode set.

\[\text{mwm}(p) = \text{DMW}(p) \cap \text{IMW}(p).\]

A read from shared memory will take place in the memory read phase (phase three), and a write to shared memory will take place in the memory write phase (phase four). A processor will read from memory if and only if its MAR contains a non-NULL address at the onset of phase three, and write to memory if and only if its MAR contains a non-NULL address at the onset of phase four. The mrm and mwm addressing modes are designed to load the processors’ MARs with non-NULL addresses at the correct stage of the f/e cycle.

4.2 Accumulator loading instructions

There are two groups of (local) accumulator loading instructions—the arithmetic/logic instructions and the load instructions. We will first define the way in which these instructions access the value to be written to the accumulator.
Arithmetic instructions

All arithmetic/logical instructions use only (local) registers (the imr addressing mode set), and hence take place wholly in phase two (the register read/write phase).

Arithmetic/logical operators can be distinguished as being binary operators

\[ \begin{align*}
\text{binary}(x, y, x + y) &= \text{ADD}. & \# \text{ addition} \\
\text{binary}(x, y, x - y) &= \text{SUB}. & \# \text{ subtraction} \\
\text{binary}(x, y, x \times y) &= \text{MUL}. & \# \text{ multiplication} \\
\text{binary}(x, y, x/y) &= \text{DIV}. & \# \text{ division} \\
\text{binary}(x, y, x \% y) &= \text{MOD}. & \# \text{ modulo} \\
\text{binary}(x, y, x \ll y) &= \text{SHIFT}. & \# \text{ shift left} \\
\text{binary}(x, y, x \& y) &= \text{AND}. & \# \text{ bitwise and} \\
\text{binary}(x, y, x | y) &= \text{OR}. & \# \text{ bitwise or} \\
\text{binary}(x, y, x \oplus y) &= \text{XOR}. & \# \text{ bitwise xor}
\end{align*} \]

or monadic operators

\[ \begin{align*}
\text{monadic}(x, \log(x)) &= \text{LOG}. & \# \logarithm \\
\text{monadic}(x, \text{not}(x)) &= \text{NOT}. & \# \text{ bitwise not}
\end{align*} \]

Both binary and monadic operators access one argument from the instruction (immediate/register addressed) operand. Binary operators access their other (first) argument from the processor's accumulator.

\[ \begin{align*}
\text{arlog}(p, r) &= \text{binary}(x, y, r) \text{ irm}(p, y) \text{ accru}(p, x) \\
&= \text{monadic}(y, r) \text{ irm}(p, y).
\end{align*} \]

Load instructions

There are three load instructions on the PRAM. They all load the processor's accumulator with a value. We will first define the value access and corresponding opcode. Note that LOADINDEX and LOADPC are zero operand instructions.

\[ \begin{align*}
\text{load}(p, v) &= \text{LOAD} \text{ irm}(p, v) \\
&= \text{LOADINDEX} \text{ sigr}(p, v) \\
&= \text{LOADPC} \text{ pcr}(p, v).
\end{align*} \]

Writing the result

We now combine these two sets of instructions and define the write to the accumulator.

\[ \begin{align*}
\text{toacc}(p, v) &= \text{arlog}(p, v) \implies \text{accw}(p, v) \\
&= \text{load}(p, v) \implies \text{accw}(p, v).
\end{align*} \]
4.3 General purpose register loading instructions

The store instruction writes the contents of the accumulator to a local register. The following update scheme defines the value and effective address access.

\[ \text{store}(p, r, v) = \text{STORE rm}(p, r, -) \text{ accr}(p, v). \]

The write to the register is defined by

\[ \text{toreg}(p, r, v) = \text{store}(p, r, v) \Rightarrow \text{regw}(p, r, v). \]

4.4 Program counter loading instructions

The jump family of instructions will load a processor's program counter. The value to be loaded is defined by

\[ \text{jump}(p, v > 0, a) = \text{JPOS irm}(p, a) \text{ accr}(p, v). \]
\[ \text{jump}(p, v = 0, a) = \text{JZERO irm}(p, a) \text{ accr}(p, v). \]
\[ \text{jump}(p, \text{TRUE, } a) = \text{JUMP irm}(p, a). \quad \# \text{unconditional jump} \]
\[ \text{jump}(\_, \text{TRUE, } C) = \text{HALT}. \]

The program counter is only updated if the condition in the second parameter of jump is TRUE.

\[ \text{topc}(p) = \text{jump}(p, \text{cond, } a) = \langle \text{cond} \rangle \Rightarrow \text{pcw}(p, a) \]
\[ = \text{jump}(p, \text{cond, } a) = \langle \neg \text{cond} \rangle \Rightarrow . \]

4.5 Memory read instructions

The READ instruction moves a value from shared memory into a RAM's accumulator. As mentioned earlier, the values are read from shared memory in phase three of the f/e cycle by the shr archetype, which checks memory read conflicts. The read archetype only generates the instruction and addressing mode mnemonic, and moves shared memory address into the MAR for the shr archetype.

\[ \text{read}(p) = \text{READ mrm}(p). \]

4.6 Memory write instructions

The WRITE instruction stores values from a RAM's accumulator into shared memory. Similarly to the READ instruction, the actual write is done by the shw archetype, which checks and resolves memory write conflicts.

\[ \text{write}(p) = \text{WRITE mwm}(p). \]
4.7 The instruction set

Finally, all 5 types of instructions are added to the instruction set.

\[4.25\]  
\[\text{instr}(p) = \text{toacc}(p, \_)
\]
\[= \text{toreg}(p, \_\_, \_\_)
\]
\[= \text{topc}(p, \_)
\]
\[= \text{read}(p)
\]
\[= \text{write}(p).
\]

5 Conclusion

Although [61] contains a specification of a parallel machine, this was only of pipelining in a relatively simple RISC processor. This specification, on the other hand, is a simple demonstration that a massively parallel system can easily be formally described with EUP. It could also be relatively easily adapted to describe more recent PRAM models such as [76].

While a similar specification of the PRAM machine would be possible using basic Update Plans, this could only be done at a more concrete level of description by enforcing sequential behaviour through constructs unspecified by the PRAM model—i.e. effectively preempting design decisions about the implementation of parallelism which the PRAM model was developed to avoid. The Extended Update Plan specification preserves the level of abstraction of the PRAM model, with the shr and shw archetypes containing all the characteristics of the various access models.

In addition the Extended Update Plan specification encapsulates most of the PRAM instructions in single sequential archetype definitions—corresponding to the descriptions with which a user would be familiar—even though the effects of these instructions take place across several clock cycles. This greatly simplifies the task of verification between multiple levels of specifications as a direct correspondence between these levels can easily be found.
Chapter 8

Other Methods

This chapter provides an overview of the most frequently used formal methods in the area of formal specification of hardware architectures. It is almost impossible to provide a complete list of methods that serve this purpose. For a fuller picture the reader is referred to comprehensive surveys in [18, 29, 42, 52, 68, 80], where methods for specification and verification, not only of computer hardware, have been described.

A group of specification methods (e.g. [31]) which is not considered in this chapter is based on process algebras. Although they are based on rigorous, well developed mathematical theories providing a variety of techniques for proving and verifying properties, this approach is still in its infancy and readability of such specifications is generally poor.

Several examples that demonstrate most of the formalisms and compare them to UP are given. Although the examples are too basic to exercise a wide range of language features, or to give a true test of ease of expressibility, they should provide some insight into the formalisms.

1 Specification methods

It is difficult to separate various formalisms into distinct classes as some of them overlap and every generalisation is bound to introduce some inaccuracies. Nevertheless, they will be categorised here by their area of predominant use.

1.1 Hardware

Hardware Description Languages (HDLs) have been used in the industry since the 1960s to document and simulate designs, mainly at the circuitry level of machine architectures. The most widely used HDLs are VHDL, Verilog and ELLA.
### 1.1.1 VHDL


Each VHDL description contains three main parts: the ENTITY section describing the entity interface, the ARCHITECTURE section of which there may be several instances for a particular entity and the CONFIGURATION section which defines the particular architecture to be used for an entity by its environment.

A rich set of tools and has been written for VHDL which aid the development, synthesis, testing and verification of hardware designs. There are also IEEE standard libraries for some pre-defined components.

Several methods have been developed to translate a VHDL subset into formats suitable for formal verification by both model checking and theorem proving [49, 71]. However, as VHDL semantics (in contrast to UP) is not formally defined, any such translations must be taken as only 'provisional'.

### 1.1.2 Verilog

Verilog HDL [73] and VHDL are essentially identical in function, however, Verilog is simpler (less general) than VHDL, and syntactically different. Its programming constructs are based on C, while those of VHDL are based on ADA. Verilog was made IEEE standard 8 years after VHDL in 1995. Similarly to VHDL, Verilog has a large library of predefined components.

Figure 8.1 shows a structural specification of a half adder in both Verilog and the UP formalism. An example of a sequential logic circuit is given in 8.2, now comparing Verilog to Extended UP as described in chapter 6.

The attempts [28] to formalise Verilog in order to facilitate formal verification are (in common with VHDL) still in very early stages. Unlike UP, Verilog is (in common with VHDL and ELLA) a deterministic language, so non-determinism must be emulated.

---

**Verilog**

module half_adder(a, b, s, c);
input a, b;
output s, c;
xor g1(a, b, s);
and g2(a, b, c);
endmodule

<table>
<thead>
<tr>
<th>Verilog</th>
<th>Update Plans</th>
</tr>
</thead>
<tbody>
<tr>
<td>module half_adder(a, b, s, c);</td>
<td>half_adder(a, b, s, c) = xor(a, b, s) and(a, b, c).</td>
</tr>
<tr>
<td>input a, b;</td>
<td></td>
</tr>
<tr>
<td>output s, c;</td>
<td></td>
</tr>
<tr>
<td>xor g1(a, b, s);</td>
<td></td>
</tr>
<tr>
<td>and g2(a, b, c);</td>
<td></td>
</tr>
<tr>
<td>endmodule</td>
<td></td>
</tr>
</tbody>
</table>

Figure 8.1: Verilog vs. UP
CHAPTER 8: OTHER METHODS

1.1.3 ELLA

ELLA [57] has been developed for circuit design, with the aim of supporting automatic synthesis from high-level behavioural descriptions to low-level structural descriptions. ELLA is a parallel language and describes circuit behaviour by defining nodes, connections and signal flows in that circuit.

One of the marked differences between ELLA and VHDL is that in VHDL the structural, behavioural and procedural design descriptions must be separate, whereas in ELLA they can be freely mixed. Also, ELLA is primarily a functional language, whereas VHDL relies on state transitions. It is not only the regional use\(^1\) and features described in this section that separate VHDL and ELLA. For a fuller comparison between VHDL and ELLA the reader is referred to [79].

As with other HDLs, though a complete formal semantics for ELLA does not currently exist, various verification strategies (based on model checking and theorem proving) for a subset of this specification language have been developed [5, 6, 12].

Two simple examples are provided to compare specification capabilities of ELLA and (E)UP. The first one (figure 8.3) is a two-bit multiplexer, the second one (figure 8.4) is a sequential parity checker. Note the clear arrangement of the parity checker modules in the EUP definition which makes the layout and the connections between the modules immediately obvious.

\(^1\)VHDL is the standard language for the US DoD, ELLA is likely to become its equivalent for the UK MoD.
1.2 Concrete machines and instruction sets

1.2.1 RTL, ISPS

Historically, the best known formalisms specifically aimed at the description of instruction sets are register transfer languages (RTL) and Instruction Set Processor Specification (ISPS) [69].

There is a variety of slightly differing RTL notations for describing the workings of computers at the register level. They all, however, semi-formally describe behaviour of computers as stepwise transformations on register contents, where variables correspond to hardware registers, using some FORTRAN, PASCAL or C constructs.

ISPS has its origins in the Instruction Set Processor (ISP) notation and has been frequently used in the past as a design tool to cover a wide area of applications. One of the most significant applications is the PDP-11 specification [20]. However, the specification was ambiguous and various implementations of the PDP-11 machine exhibited differing behaviour of several instructions. Formal semantics of the Update Plans formalism ensures that only unambiguous specifications are written.

1.2.2 RAPIDE

RAPIDE is a high level, event-based, concurrent, object-oriented specification and simulation language. It was designed for prototyping system architectures. Instruction sets are modelled by communicating modules forming an ‘architecture’. The formalism started as an effort to complement often over-specific and hard to understand HDL descriptions.

Unfortunately, RAPIDE lacks a formal semantics, which makes any rigorous formal verification methodology impossible. The only way to verify properties of a system under description

\[ \text{inv}(12, 11) \]
\[ \text{mux}(\text{in}, 11, 12, 13) \]
\[ \text{reg}(\text{HI}, 14) \]
\[ \text{mux}(14, 13, \text{HI}, \text{out}) \]
\[ \text{reg}(\text{out}, 12) \]

Figure 8.4: ELLA vs. EUP
**Rapide**

```plaintext
type Producer is interface
  action out Send(data: integer);
  action in Ready();
behavior
  function GenData() return integer;
begin
  Start => Send(GenData());
  Ready => Send(GenData());
constraint
  observe from Start, Send, Ready
  match (Start -> Send) -> [* rel ->] (Ready -> Send);
end Producer;
```

**Update Plans**

```plaintext
RECEIVED[0] START[1] READY[0].
  # Producer
  START[s] READY[v] = [ s \lor v ] \rightarrow
  SEND[gendata()] START[0] READY[0].
  # Consumer
  RECEIVED[r] SEND[data] READY[0] \rightarrow
  RECEIVED[r + 1] READY[1] r[data].
```

Figure 8.5: Rapide vs. UP

is traditional simulation. The authors [50, 66] argue that the strength of the Rapide formalism lies in partially ordered sets of events (posets) which are generated by executing a Rapide model. Where other concurrent event-based simulation languages produce linear traces of events, Rapide simulation shows dependency between events. However, the same effect (building posets) could easily be achieved by an UP simulator by annotating special "communication cells".

The power of Update Plans to describe concrete data manipulations in a parallel environment is demonstrated in figure 8.5 which shows the classic "producer/consumer" problem. The very first ‘Send’ is initiated by ‘Start’ and subsequent ‘Send’ signals causally follow ‘Ready’ (acknowledgement) signals. The Rapide specification of the ‘Consumer’ interface is not shown here for the sake of brevity, as its description is even longer than that of ‘Producer’. On the other hand, the UP specifications of both the ‘Producer’ and ‘Consumer’ processes are included. The ‘Consumer’ simply stores data sent by the ‘Producer’ into a buffer.

1.3 Parallelism

1.3.1 UNITY

The UNITY formalism [16] consists of two parts: a programming language based on transition systems, and a specification language, based on a linear temporal logic. Similarly to other formalisms, such as UP and CHAM, the UNITY computational model is liberated from control management.

A minimal UNITY program consists of three parts: a collection of variable declarations called the ‘declare’ section, a set of initial conditions called the ‘initially’ section, and a
CHAPTER 8: OTHER METHODS

UNITY

program TrafficLight
declare
ns, ew : {red, grn, yel}
initially
ns, ew = {red, red}
assign
ns := grn if (ns = red) ∧ (ew = red)
| ns := yel if (ns = grn)
| ns := red if (ns = yel)
| ew := grn if (ns = red) ∧ (ew = red)
| ew := yel if (ew = grn)
| ew := red if (ew = yel)
end TrafficLight

Figure 8.6: UNITY vs. UP program for a traffic light controller.

finite set of statements in the ‘assign’ section. The standard UNITY execution model is a non-deterministic, fair interleaved selection of all statements from the ‘assign’ section.

An example of a program for a traffic light controller in both UNITY and UP is given in figure 8.6. Although very similar to UP, UNITY lacks an appropriate abstraction mechanism and is more suited for the description of concurrent programs, rather than architectures. Also, as opposed to Extended UP, as described in the second part of this thesis, there is no concept of a sequence which is often needed to describe hardware architectures.

1.3.2 Γ, CHAM

The Chemical Abstract Machine (CHAM) model of computation [9, 10, 37] is fashioned on chemicals and chemical reactions. The model is built upon the Γ language [4] for parallel programming. Γ computation is a set of reactions that consume elements of a multiset and produce new ones according to the rules that constitute the program. As reactions can take place in any order (or even simultaneously), the model is inherently parallel in common with other formalisms such as UP and Petri Nets.

The Γ language is extended by the CHAM formalism by the provision of a classification scheme for reaction rules and a membrane construct which extends the use of multisets in such a way that they can form parts of molecules. The second extension allows the formalism to deal with abstraction and hierarchical programming, as a membrane can be porous to allow communication between an encapsulated solution (multisets of molecules) and its environment. Perhaps an overused example, but one which effectively demonstrates the philosophy of CHAM is that a solution originally made of all integers from 2 to n along with a rule that any integer destroys its multiple will result in a solution of prime numbers between 2 and n.

CHAM, as opposed to UP, is predominantly aimed at formally specifying and analysing
software architectures. In contrast to CHAM, UP specifications are also reasonably simple to implement on concrete machines.

1.4 Protocols

There are many competing, well established protocol description languages/environments. The main contributors are SDL, Estelle and LOTOS.

SDL stands for Specification and Description Language [15]. It is an object-oriented, formal language standardised by The International Telecommunications Union—Telecommunications Standardization Sector (formerly CCITT). It was developed to describe real-time and distributed communicating systems. The major drawback in using SDL for protocol specification, however, is that both the graphical and textual protocol specification descriptions tend to be large, and therefore difficult to understand and maintain.

Estelle (Extended State Transition Language) [38] is an ISO standardised, partly formalised technique for the specification of distributed and concurrent processing systems based on an extended state transition model (non-deterministic finite state machine augmented by the addition of variables). Estelle has been successfully used on the specification and analysis of many real protocols.

LOTOS and especially DILL deserve further attention as they have been used for the description of hardware, specifically sequential digital circuits.

1.4.1 LOTOS, DILL

LOTOS (Language Of Temporal Ordering Specification) [39] is a by-product of the effort of standardisation of the Open Systems Interconnection (OSI) within ISO. It is a standardised formal description technique designed to describe distributed concurrent information processing systems, in particular the OSI architecture and the related standards.

DILL (Digital Logic in LOTOS) [33, 34] uses LOTOS to formally specify digital hardware making use of a library of typical components and is realised through translation into LOTOS. The analysis and verification of properties takes place at the LOTOS level.

For a comparison between DILL and UP see figure 8.7. The basic component of a logic circuit (a NAND gate) is described using (only) a behavioural style in DILL, and both the behavioural and structural styles in UP. It turns out that great care must be taken to avoid non-deterministic behaviour of components when writing DILL specifications. Even if such care has been taken there is another problem—the DILL model is not suitable for circuits containing cyclic connections. Although the NAND gate is modelled in the manner "when all inputs arrive, then output happens" as in UP, the DILL specification will deadlock [33] if there is a cyclic connection within each stage.

Another disadvantage of this approach is that DILL only has a limited set of components in the hardware library and the construction of new components requires knowledge of both


**DILL**

<table>
<thead>
<tr>
<th>process Nand2 [Ip1, Ip2, Op]: noexit :=</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Ip1 ? dtIp1 : Bit; exit (dtIp1, any Bit))</td>
</tr>
<tr>
<td>(Ip2 ? dtIp2 : Bit; exit (any Bit, dtIp2))</td>
</tr>
<tr>
<td>&gt;&gt; accept dtIp1, dtIp2 : Bit in</td>
</tr>
<tr>
<td>(Op ! (dtIp1 nand dtIp2);</td>
</tr>
<tr>
<td>Nand2 [Ip1, Ip2, Op])</td>
</tr>
<tr>
<td>endproc (* Nand2 *)</td>
</tr>
</tbody>
</table>

**Update Plans (behavioural description)**

- `ip1, ip2 :: Bit.`
- `nand(ip1, ip2, ~(ip1 & ip2)).`

**Update Plans (structural description)**

- `nand(ip1, ip2, op) = ip1[0] ip2[0] -> op[1].`
- `= ip1[0] ip2[1] -> op[0].`
- `= ip1[1] ip2[0] -> op[0].`

**Figure 8.7:** DILL vs. UP modelling a NAND gate

DILL and LOTOS. However, it seems to be convenient for giving higher-level architecture specifications.

### 1.5 Z/VDM

The list of formal methods would not be complete without at least mentioning the pioneering Z notation and the Vienna Development Method (VDM) [32] both of which still have a large user base. Z and VDM are based on set theory and first order predicate calculus. ISO standards exist for both of them.

Z is a non-executable specification language developed mainly by the Programming Research Group at the Oxford University Computing Laboratory from the late 1970s. The most noticeable difference between Z and VDM are structures called schemas, by which programs are described.

VDM, being a method rather than just a notation for expressing software specification design and development, also contains an inference system for constructing correctness proofs and a methodology for developing software from a specification in a formally verifiable manner.

While different in syntax and structure, Z and VDM do not differ radically from one another. Unfortunately, neither Z nor VDM can handle concurrent systems [78].

Although there has been some work [24] in describing computer architectures using both these methods, they are used primarily for software requirements specification and program development.

### 2 Verification

While formal specification of a system often forms an important part of its design, it is sometimes not sufficient and needs to be complemented by some sort of a verification strategy. This section gives a short overview of different verification techniques.

The traditional method of trying to ensure correctness is through simulation and testing.
However, as Moore's prediction\(^3\) turned out to be very accurate, these techniques have their limitations as with ever-increasing complexity of hardware it is impossible to simulate all inputs or sequences of inputs. Formal techniques, on the other hand, are better able to scale with complexity by using various mathematical methodologies rather than exhaustive testing.

While there have been some attempts at the "ideal solution"—correct-by-construction synthesis [64], designers are still using hand-crafted custom design [44] in an effort to optimise performance of systems, which necessitates some post-design verification.

Formal post-design verification involves the use of analytical methods to prove that the implementation of a system conforms to the specification. Formal proofs are based on establishing that universal properties about the design hold independently of any particular set of inputs, or on showing the equivalence between several layers of a system specification with differing degree of abstraction. Verification methods can be classified into two major groups: model-checking and theorem-proving, based on these criteria.

In model-checking, the implementation description (model) is given as (or transformed to) a finite state machine (FSM), and the specification description by properties given in some kind of temporal logic. Correctness is then established by showing whether some property holds in the FSM model, or if not, a counter example is provided. Model-checking tools are fully automated, and perform exhaustive searches through the state space of the model. Unfortunately, this method is not scalable to larger circuits due to the state explosion problem. Various methodologies have been developed to alleviate the state explosion [17] usually by treating the sets of states symbolically or replacing the system to be checked by a simpler one in which irrelevant details are suppressed.

Some specification methods use first-order (Boyer-Moore/ACL2) or higher-order logic (HOL, PVS) for both the specification and implementation description. Theorem-proving then tries to establish whether the specification and implementation are equivalent or the language representing the implementation is contained in the language representing the specification. The main advantages of this approach are that the formal proof can be mechanically checked and, in contrast to model checking, theorem proving can deal directly with infinite state spaces. However, as opposed to model-checking the derivation of a formal proof is extremely tedious, as the theorem-proving tools are semi-automatic and need a large degree of expertise for efficient use. Moreover, a theorem-prover is not guaranteed to always give an answer because of decidability problems.

### 2.1 ACL2

One integrated specification and verification method for computer architectures that stands out of the rest mainly because of its use in the industry, is A Computational Logic for Applicative Common Lisp (ACL2) [13, 41]. ACL2 is a re-implementation of the Boyer-Moore

---

\(^3\)In 1965 Gordon Moore predicted that complexity of hardware devices would double every 18 months.
system Nqthm. Like Nqthm, ACL2 supports a Lisp-like, first-order mathematical logic.

Most of the Common Lisp functions were axiomatised or defined as functions or macros in ACL2. In contrast to Common Lisp, all functions in ACL2 are total.

There are two large scale verification projects based on ACL2. The first one is a formal executable specification of the Motorola CAP [26] digital signal processor. The second, perhaps even more important project was the application of ACL2 to formalise and prove the correctness of the microcode for the kernel of the floating point division operation used on the AMD5K86 microprocessor [56].

As ACL2 is based on theorem proving, it is more reliant on the user. Authors themselves admit [13] that ACL2 proofs require many skills including great familiarity and insight into the applications areas, engineering issues, mathematics, formal logic, the workings of the ACL2 proof tool, and a lot of persistence and dedication.

3 Conclusions

This chapter briefly described several specification and verification methods relevant to the area of the Update Plan formalism.

3.1 Integrated specification/verification methodologies

The most significant challenge to UP are integrated specification and verification methodologies such as ACL2. However these require a great deal of skill from the user, both in designing the specification in order to make it amenable to verification, and in performing the verification itself. A verification methodology is only useful if it is used. While Update Plans are not, as yet, embedded in a verification formalism (e.g. HOL) there do not seem to be any major obstacles to achieving this. This would provide a useful mechanism for producing intuitive, structured and verifiable specifications.

3.2 Specification methods

It is noticeable that all of the Update Plan specifications in section 1 are more compact but, arguably, more easily understood than the corresponding specifications using alternative methods. In contrast to many of the methods in section 1 the Update Plans formalism has a formal semantics, and can be applied to the specification of both hardware and software architectures.

3.2.1 Hardware

None of the HDLs described in section 1.1 has a formal semantics which has hindered the development of formal verification methodologies [79]. Although some pioneering work has
been done on verification of HDL specifications, this was either only a preliminary work to inspire further research [28], or developed verification methodologies for only a restricted subset/core of that language [5]. UP on the other hand has a simple but formally defined semantics.

The HDLs in section 1.1 are also generally not very well suited to specification of software architectures. Furthermore they are deterministic. This makes it relatively simple to specify sequences and modules consisting of sequentially connected elements, but difficult to specify parallel and/or synchronous processes—in general these must be emulated. On the other hand non-determinism is an inherent part of the semantics of Update Plans making the specification of parallel processes relatively easy. Indeed basic Update Plans were somewhat weak in specifying sequential behaviour. The introduction of sequential schemes and archetypes in Extended Update Plans has solved this, and sequential systems can now be specified with ease.

3.2.2 Concrete machines and instruction sets

Again, none of the methods described in section 1.2 has a formal semantics—and RTL in particular does not even have a standard syntax, as can be seen from the number of differing RTL notations. ISPS dates from the 1970s and is essentially an imperative procedural language with all the difficulties in formal reasoning that this entails. Though both RTLs and ISPS can provide detailed descriptions of the hardware/software interface neither is particularly suited to either the specification of hardware or the specification of software at a more abstract level. Nor are they strong in the description of parallel processes.

Though RAPIDE is a more recent development it still lacks a formal semantics, and verification can only be achieved through simulation. RAPIDE specifications also tend to be considerably longer than the corresponding UP specifications.

3.2.3 Parallelism

None of the specification methods discussed so far would be the tool of choice for defining parallel processes. The methods discussed in section 1.3 (UNITY, Γ and CHAM) were designed with parallelism explicitly in mind. However they are all far more suitable for the description of parallel programs rather than hardware. They also lack mechanisms, such as sequential Update Schemes and archetypes, for imposing the sequentiality and synchronisation that is almost always an essential part of hardware behaviour. In addition Γ and CHAM specifications are considerably harder to implement on real machines than the corresponding UP specifications.
3.2.4 Protocols

Of the specification methods discussed in section 1.4 LOTOS and DILL are the most suited to the description of systems across the hardware/software interface. LOTOS provides the higher level (protocol) specification, while DILL uses LOTOS to provide a limited hardware specification facility. This means, however, that writing non-trivial DILL specifications requires a good knowledge of both LOTOS and DILL. Also DILL is limited in the class of systems that it can specify—it cannot even be used to specify a simple RS flip-flop, since this contains cyclical connections. It also tends to introduce unexpected non-determinism unless used carefully.

3.2.5 Z/VDM

These methods are again more suited to higher level (software) specifications rather than hardware. In contrast to UP they do not produce executable specifications.

3.3 Summary

Though there is a large number of specification methods across the application area of Update Plans none is as suited to describing systems at the hardware/software interface as Update Plans. Almost all of the methods tend to be either hardware or software oriented. Most of the methods lack a formal semantics making verification of the systems specified difficult. Also the Extended Update Plans formalism can be used equally well for the specification of sequential and parallel systems, and the interaction between sequential and parallel processes. The other methods surveyed here are mostly suitable for describing sequential or parallel systems, but not both. In general Extended Update Plans provide a powerful method for specifying and reasoning about all the most common processes and systems that exist at the hardware/software interface.
Chapter 9

Conclusions and Future Research

Update Plans constitute a very flexible, clean formalism in which clear, elegant, compact, intuitive, simple to read and unambiguous low level specifications can be written. Contributions to the Update Plans formalism presented in this thesis can be divided into two major areas.

Firstly, in the application domain, instruction sets of four different machines (ranging from concrete to purely mathematical models) have been described using both original Update Plans and UP with the extensions described in the second part of the thesis.

Secondly, syntactic and semantic extensions to Update Plans made the whole formalism even more consistent and expressive. The semantic extensions in particular provide a way to make UP descriptions more compact and readable. This is mainly due to the advances in the area of synchronisation between parallel and sequential processes and the provision of a tool to design specifications with a better degree of modularity.

1 Update Plans applications

Update Plans applications featuring both in the first and second part of the thesis provide a good deal of evaluation of the Update Plans formalism which drove both the syntactic and semantic extensions. Specifications were produced for a variety of concrete and abstract machines.

The PDP-11 machine instruction set specification provided a useful comparison of an UP specification to the historically important formal specifications in [20, 69]. A significant subset of the PDP-11 instruction set, including the complete set of addressing modes, single and double operand instructions, condition code and program flow instruction, and the interrupt mechanism, was developed. This specification was unambiguous (in contrast to, for example, [20]), more compact than [20, 69] and significantly clearer. Consideration of a more detailed specification at the fetch/execute cycle drove the development of sequential update schemes—a semantic extension to the Update Plan model.
The SPARC-V9 was chosen to test UP's suitability for specifying modern RISC architectures. A partial specification of the SPARC-V9 architecture exists [50, 65], but this is aimed more at the specification of connections and communication than at a concise specification of the instruction set. The UP specification covers the SPARC-V9 register architecture (general purpose and floating-point registers, the SPARC-V9 register window mechanism and instructions); arithmetic, logical and floating-point instructions; data transfer instructions; and control transfer instructions. The main achievement of this specification, apart from its clarity and compactness, is that the extensions to the Update Plan formalism allowed simultaneous specification of the SPARC-V9's assembly language and machine code.

The Java Virtual Machine (JVM) was chosen to test Update Plans' suitability for describing more abstract instruction sets using a wide variety of types. Again a compact and readable specification of the JVM instruction set subset was produced. This specification exercise lead to some syntactic changes to the original Update Plans formalism.

The Parallel Random Access Machine (PRAM) [22] was chosen as a specification target because it provided a useful test-bed for the semantic extensions to Update Plans described in chapter 6. It was also a test of the expressive power of Update Plans in that the specification should be able to describe all of the PRAM's memory models within one specification. The "n-PRAM" specification is thus a family of specifications, parameterised by the number of RAMs, and the particular read/write model under consideration.

2 Update Plans extensions

The second class of advances achieved by the thesis are the extensions to the previous Update Plans formalism.

The syntactic extensions have a more significant impact than might at first sight seem. They now allow, in many cases, for specifications at multiple levels of abstraction within one update plan. The equivalence of the specifications is then implicit in that single specification, rather than requiring explicit proof through transformation. The SPARC-V9 specification is an example of such a specification, since it describes both SPARC-V9 assembly language and machine code.

The typing mechanism has been extended to allow a clearer match between "hardware types" (e.g. bits, bytes, words, etc.; registers, memory, etc.) and "software types" (e.g. integers, booleans, etc.; opcodes, operands, condition codes, instruction words, etc.). The JVM specification, for example, defines both the abstract and the physical structure of Java bytecode.

The problem in specifying parallel processes is often not the parallelism in itself, but the interaction between, and the synchronisation of, sequential and parallel processes. In the PRAM, for example, the massive parallelism of the memory access model must be syn-
chronised with the sequentiality of the fetch/execute cycles of each of the individual RAMs in the PRAM. When this research started parallel system could be specified using Update Plans, but synchronisation of the parallel processes thus specified required the introduction of artificial constructs unrelated to the architecture under consideration, leading to inelegant specifications that were hard to read. The introduction of sequential update schemes provides the formalism with a general synchronisation primitive that augments the non-deterministic model of Update Plans by explicitly stating the order in which updates will be applied. As a result it is now also possible to give more abstract descriptions in EUP than in UP.

The development of sequential update schemes led naturally to the development of sequential archetypes. These greatly extend the possibilities for information hiding and structure reuse by encapsulating a series of synchronised updates into a “module” (a sequential archetype). A sequential archetype can express an atomic action at one level of abstraction (e.g. an instruction execution) as a series of atomic actions at a lower level of specification (e.g. phases of the f/e cycle). This simplifies the task of proving the equivalence of the two levels of specification.

3 Future considerations

Suggestions for further UP-related work can be classified into three different areas. Firstly, the work on theoretical aspects of the formalism itself. Secondly, more research should be carried out on the application area of Update Plans. Finally, the implementation for the Update Plans formalism consisting of appropriate CAD tools should be developed.

3.1 Theory

3.1.1 Metrics

An annotation to indicate cost of applying an update or expanding an archetype coupled with a cost-derivation methodology should be developed. This (with a working UP simulator) would not only make the application of Update Plans to compiler optimisations possible as suggested in [60], but it would also prove useful in a variety of other applications such as automatic calculation of delays of (networks of) components in a logic circuit, simulation of latencies in a network environment or answering other performance-related questions.

3.1.2 Modularity

It would be useful to have a mechanism for restricting access of updates to only a selected set of cells in the memory by means of some kind of a modular structure or simply by restricting the domain of variables in some way. Not only would this automatically protect accesses to undesired parts of memory, but the current set of grounding rules [60] could be further
extended. The impact on the compactness and readability of UP specifications would in some cases be significant.

3.1.3 Verification

Verification, and possibly transformation methods should be developed, together with appropriate heuristic rules. Although a first step in this direction has been taken [60], further work is necessary.

On a higher level of abstraction, it is more likely that complex data manipulations will be described by a well-defined external function rather than Update Plans. On a lower level of abstraction, this function will in some cases be specified in terms of Update Plans. Therefore it will be necessary to find such a formalism and develop a methodology of proving semantic equivalence or containment between these two descriptions.

The majority of successful general-purpose proof checking systems that have proved themselves on several industrial-scale microprocessors are based on theorem proving in first or second order logic. Therefore a promising approach seems to be the formalisation of Update Plans in such a system (e.g. ACL2, PVS, HOL, Isabelle, IMPS or Nuprl).

Alternatively, use of existing model-checking tools (e.g. SPIN) could be made after the development of methodologies to transform UP descriptions into specification languages the tools use. It would then be possible to use the tool’s logic to specify and check desired properties.

As no method or tool is general enough and appropriate for all the varied, and sometimes conflicting requirements of hardware architecture description and analysis, a combination of existing methodologies will almost certainly need to be used. UP should be used in conjunction with other formal techniques and the traditional approaches such as testing through simulation.

Finally, methodologies to translate specification languages used by general theorem provers to hardware specification languages have been developed [8]. A similar approach could be taken to facilitate automatic synthesis and implementation of chips.

3.2 Applications

Although there have already been a lot of UP-based specifications of concrete and abstract machines, further validation of the Update Plans formalism is needed. This effort should concentrate on the specification of one of the concrete machines already specified, but on a lower level, e.g. microcode. These two levels should then be subject to formal verification for equivalence or containment by one of the methods suggested in section 3.1.3. As Extended Update Plans introduced a convenient way to describe sequences, libraries of components to describe the behaviour of basic building blocks starting from basic elements of logic circuits to blocks such as multipliers for an Update Plans simulator could be defined.
A true test of the (Extended) Update Plans formalism will be more sophisticated models of parallel computation such as F-PRAM [76], which introduce a communication network with latency and a synchronisation barrier among asynchronously running processors. On a slightly related note, attempts to enter into neighbouring domains such as communication protocol specification could be tried.

3.3 Implementation

A prototype implementation of Update Plans [60] which has been successfully used as a didactic aid has been developed. Completion of this work will greatly enhance the usefulness of the formalism not only in automatic derivation of specified properties, but also in prototyping and validation. A full implementation of UP (an UP simulator) will lead to better popularisation of UP, and helping to prompt more research in certain features of Update Plans and their refinement.
Bibliography


[47] G. W. Leibniz. *Œuvre philosophiques, latines et françaises, de feu Mr. de Leibniz, tirées de ses manuscrits, qui se conservent dans la bibliothèque royale à Hanovre et publiées par M. Rud. Eric Raspe*. Amsterdam/Leipzig, 1765.


Appendix A

Extended Update Plans Grammar

This appendix gives a full context free grammar for Extended Update Plans as specified in this thesis. The grammar is given as an ASF+SDF [43, 74] specification, a specification formalism developed at the University of Amsterdam and the Centrum voor Wiskunde en Informatica. The specification can be read as a context free grammar, with production rules reading from right to left. The following notational conventions apply:

- $S*$ defines zero or more repetitions of a symbol (non-terminal or literal) $S$;
- $S+$ defines one or more repetitions of a symbol $S$;
- $\{S \sep\}*$ defines zero or more repetitions of a symbol $S$ separated by the literal $\sep$;
- $\{S \sep\}+$ defines one or more repetitions of a symbol $S$ separated by the literal $\sep$;
- $($ denotes grouping of two or more symbols $';$);
- $[s]$ represents any one of the literals in the string $s$;
- $^{-}[s]$ represents any character not in the string $s$;
- $\t$ is the horizontal tabulation character;
- $\n$ is the newline character.

The {left} and {bracket} annotation is applied to disambiguate parsing, as is the information provided under a priorities header. The symbols appearing in the sorts section denote start symbols for the grammar.
module Main
imports Layout Schemes Types Stores Archetypes Updates
exports
  sorts SCRIPT
  context-free syntax
  CONFIGURATION "." PLAN → SCRIPT
  (ITEM "." )* → PLAN
  TYPE-DECLARATION → ITEM
  STORE-DECLARATION → ITEM
  ARCHETYPE-DEFINITION → ITEM
  UPDATE → ITEM

module Updates
imports Alternatives ParallelBlocks SequentialBlocks
exports
  context-free syntax
  ALTERNATIVES → UPDATE
  PARBLOCK → UPDATE
  SEQBLOCK → UPDATE

module Alternatives
imports Schemes
exports
  context-free syntax
  {SCHEME ";" }+ → ALTERNATIVES

module ParallelBlocks
imports Updates
exports
  context-free syntax
  "(\|" {UPDATE \|"}+ \")" → PARBLOCK
In the production rules for sequential blocks layout is forbidden between the pipeline symbol ‘|’ and the symbol/variable indicating a stage.

module SequentialBlocks
imports Lexicon Updates
exports
context-free syntax
“(" SEQBLOCK-ID-OPT "|" { (STAGE-OPT UPDATE) "|" }+ ")" → SEQBLOCK
SYMB-CONST → SEQBLOCK-ID-OPT
SYMB-CONST → SEQBLOCK-ID-OPT
SYMB-CONST | VARIABLE → STAGE-OPT
SYMB-CONST | VARIABLE → STAGE-OPT

In the production rules for update schemes layout is forbidden between a locator and the ‘[‘ or ‘]’ of the cell sequence of which it is a locator.

module Schemes
imports Terms
exports
context-free syntax
CONFIGURATION GUARD CONFIGURATION → SCHEME
REPEAT GUARD CONFIGURATION → SCHEME
TEXT CONTEXT → CONFIGURATION
TERM* → TEXT
LOC-EXPR* → CONTEXT
"=" → REPEAT
"==" → Guard
"[" TERM "]=>" → GUARD
"[" TERM-OPT "[" TEXT "]"]" → LOC-EXPR
"[" TERM-OPT "[" TEXT "]"]" → LOC-EXPR
LOCATOR "[" TEXT "]"]" → LEFT-SECTION
TERM-OPT → LOCATOR
module Archetypes
imports BasicArchetypes AmbidextrousArchetypes Lexicon Schemes Updates
exports
class context-free syntax
  BASIC-ARCHETYPE-DEFINITION → ARCHETYPE-DEFINITION
  AMBID-ARCHETYPE-DEFINITION → ARCHETYPE-DEFINITION
  "=" ARCHETYPE-BODY → BASIC-DEFINITION
  UPDATE | CONFIGURATION → ARCHETYPE-BODY
  ARCHETYPE-CALL → TERM
  ARCHETYPE-NAME INDEX-OPT PARAMETERS → ARCHETYPE-CALL
  INDEX → INDEX-OPT
  "(" TEXT "," OPT ")" → PARAMETERS

module BasicArchetypes
imports Archetypes Lexicon
exports
class context-free syntax
  BASIC-DECLARATION BASIC-DEFINITION+ → BASIC-ARCHETYPE-DEFINITION
  BASIC-ARCHETYPE-NAME PARAMETERS BASIC-DECLARATION → BASIC-DECLARATION

module AmbidextrousArchetypes
imports Archetypes Lexicon
exports
class context-free syntax
  AMBID-DECLARATION BASIC-DEFINITION+ → AMBID-ARCHETYPE-DEFINITION
  ARCHETYPE-NAME PARAMETERS TEXT → AMBID-DECLARATION
module Stores
imports Lexicon
exports
context-free syntax
  STORE-NAME → STORE-STRUCTURE
  "(" STORE-STRUCTURE ")" → STORE-STRUCTURE {bracket}
  STORE-STRUCTURE "++" STORE-STRUCTURE → STORE-STRUCTURE {left}
  STORE-STRUCTURE "|" STORE-STRUCTURE → STORE-STRUCTURE {left}
  STORE-STRUCTURE "*" → STORE-STRUCTURE
  "{" {STORE ","}+ "}" → STORE-DECLARATION
  STORE-IDENTIFIER "=" STORE-STRUCTURE → STORE
  STORE-IDENTIFIER STORE-STRUCTURE → STORE-IDENTIFIER
context-free priorities
  STORE-STRUCTURE "*" → STORE-STRUCTURE >
  STORE-STRUCTURE "++" STORE-STRUCTURE → STORE-STRUCTURE >
  STORE-STRUCTURE "|" STORE-STRUCTURE → STORE-STRUCTURE

dmodule Types
imports Terms Lexicon Stores
exports
context-free syntax
  {(TERM CONST-OPT) ","}+ "::" STORE-STRUCTURE → TYPE-DECLARATION

dmodule Terms
imports Archetypes BasicTerms Arithmetic Logic Ordering Memory
exports
context-free syntax
  "(" TERM ":=" STORE-STRUCTURE ")" → TERM
  TERM "+" → TERM
  TERM "++" TERM → TERM {left}
  TERM "|" TERM → TERM {left}
  "(" TERM ")" → TERM {bracket}
  TERM → TERM-OPT
context-free priorities
  TERM "+" → TERM > TERM "++" TERM → TERM > TERM "|" TERM → TERM
module BasicTerms
imports Lexicon
exports

context-free syntax
    NUMBER    -> TERM
    CHAR      -> TERM
    SYMB-CONST -> TERM
    VARIABLE  -> TERM
    DONTCARE  -> TERM

module Arithmetic
imports BasicTerms
exports

context-free syntax
    TERM "+" TERM -> TERM \{left\}
    TERM "-" TERM -> TERM \{left\}
    TERM "*" TERM -> TERM \{left\}
    TERM "/" TERM -> TERM \{left\}
    TERM "%" TERM -> TERM \{left\}
    TERM "^" TERM -> TERM \{left\}
    "-" TERM    -> TERM

context-free priorities
    "-" TERM -> TERM > TERM "^" TERM -> TERM >
    \{left: TERM "%" TERM -> \\
    TERM "/" TERM -> TERM \}
    \{left: TERM "*" TERM -> TERM \}
    \\{left: TERM "-" TERM -> TERM \}
    \\{left: TERM "+" TERM -> TERM \}

module Logic
imports BasicTerms
exports

context-free syntax
    TERM "\&" TERM -> TERM \{left\}
    TERM "\lor" TERM -> TERM \{left\}
    "-\!" TERM    -> TERM

context-free priorities
    "-\!" TERM -> TERM > TERM "\&" TERM -> TERM > TERM "\lor" TERM -> TERM
module Ordering
imports BasicTerms
exports
context-free syntax
TERM "<" TERM → TERM
TERM "≤" TERM → TERM
TERM "=" TERM → TERM
TERM "≠" TERM → TERM
TERM "≥" TERM → TERM
TERM ">" TERM → TERM

module Memory
imports Lexicon BasicTerms
exports
context-free syntax
"|" STORE-NAME "|" → TERM
"@" "(" TERM "," TERM ")" → TERM
"@" TERM → TERM

module Lexicon
exports
lexical syntax
[0-9]+ → NUMBER
"=" "[\n] "" → CHAR
[A-Z][A-Z'0-9.]* → SYMB-CONST
[a-z][a-z'0-9.]* → VARIABLE
"." → DONTCARE
[A-Z][A-Z'0-9.]* [a-z][A-Za-z'0-9.]* → STORE-NAME
[a-z][a-z'0-9.]* → BASIC-ARCHETYPE-NAME
[A-Z][A-Z'0-9.]* → COMMAND-ARCHETYPE-NAME
BASIC-ARCHETYPE-NAME → COMMAND-ARCHETYPE-NAME
COMMAND-ARCHETYPE-NAME → ARCHETYPE-NAME
NUMBER → INDEX
{" NUMBER "} → INDEX
{" NUMBER "} → INDEX
(" NUMBER ") → CONST-OPT
(" NUMBER ") → CONST-OPT
module Layout
exports
  lexical syntax
    \[\t\n]* \[\t\n]* LAYOUT
    "#" \[\n]* \[\n] LAYOUT
context-free restrictions
  LAYOUT? -/- \[\t\n]
Appendix B

PDP-11

\[
\begin{align*}
\text{REG}(r, v) & \quad = r[v]. \quad \# \text{register mode} \\
\text{AUTOINC}(b, v) & \quad = r[b] b[v]c \implies r[c]. \quad \# \text{autoincrement mode} \\
\text{AUTODEC}(a, v) & \quad = r[b] a[v]b \implies r[a]. \quad \# \text{autodecrement mode} \\
\text{INDEX}(b + d, v) & \quad = r[b] b+d[v]. \quad \# \text{index mode} \\
\text{REGDEF}(b, v) & \quad = r[b] b[v]. \quad \# \text{register deferred mode} \\
\text{AUTOINCDEF}(b_2, v) & \quad = r[b_1] b_1[b_2] b_2[v]c_2 \implies r[c_1]. \quad \# \text{autoincrement deferred mode} \\
\text{AUTODECDEF}(a_2, v) & \quad = r[a_1] a_1[a_2] a_2[v]b_2 \implies r[a_1]. \quad \# \text{autodecrement deferred mode} \\
\text{INDEXDEF}(b_2, v) & \quad = r[b_1] b_1+d[b_2] b_2[v]. \quad \# \text{index deferred mode} \\
\text{IMM}(v) & \quad = v. \quad \# \text{immediate mode} \\
\text{ABS}(a, v) & \quad = a[v]. \quad \# \text{absolute mode} \\
\text{REL}(p + d, v) & \quad = PC[p] p+d[v]. \quad \# \text{relative mode} \\
\text{RELDEF}(a, v) & \quad = PC[p] p+d[a] a[v]. \quad \# \text{relative deferred mode} \\
\end{align*}
\]

sop(v) = REG(-, v). \\
\quad = AUTOINC(-, v). \\
\quad = AUTODEC(-, v). \\
\quad = INDEX(-, v). \\
\quad = REGDEF(-, v). \\
\quad = AUTOINCDEF(-, v). \\
\quad = AUTODECDEF(-, v). \\
\quad = INDEXDEF(-, v). \\
\quad = IMM(-, v). \\
\quad = ABS(-, v). \\
\quad = REL(-, v). \\
\quad = RELDEF(-, v).

dop(a, v) = REG(a, v). \\
\quad = AUTOINC(a, v). \\
\quad = AUTODEC(a, v). \\
\quad = INDEX(a, v). \\
\quad = REGDEF(a, v). \\
\quad = AUTOINCDEF(a, v). \\
\quad = AUTODECDEF(a, v). \\
\quad = INDEXDEF(a, v). \\
\quad = IMM(a, v). \\
\quad = ABS(a, v). \\
\quad = REL(a, v). \\
\quad = RELDEF(a, v).
APPENDIX B: PDP-11

\{Nibble = Bit Bit Bit Bit, Byte = Nibble Nibble, Word = Byte Byte\}.

\texttt{b_1, b_2, b_v, b_c :: Bit.}
\texttt{nibble :: Nibble.}
\texttt{byte :: Byte.}
\texttt{PSW[byte nibble]} \texttt{CC_N[b_1][CC_N[b_2][CC_V[b_v]CC_C[b_c]]].}

\texttt{cc(v) = CC_N[(v < s 0) (v = 0) (\neg (MIN_s \le v \le \text{MAX_s})) (v > u \text{MAX_u})].}

\texttt{arithm(-, 0) = CLR. \# clear destination}
\texttt{arithm(x, \neg x) = COM. \# complement destination}
\texttt{arithm(x, x + 1) = INC. \# increment destination}
\texttt{arithm(x, x - 1) = DEC. \# decrement destination}
\texttt{arithm(x, \neg x) = NEG. \# negate destination}
\texttt{arithm(x, x + c) = ADC CC_C[c]. \# add carry to destination}
\texttt{arithm(x, x - c) = SBC CC_C[c]. \# subtract carry from destination}
\texttt{arithm(x, x / 2) = ASR. \# arithmetic shift right destination}
\texttt{arithm(x, x \times 2) = ASL. \# arithmetic shift left destination}
\texttt{arithm(_, \neg 1 \times n) = STX CC_N[n]. \# sign extend destination}
\texttt{arithm(x, r) dop(a, x) \rightarrow a[r] cc(r).}

\texttt{v_1, v_0 :: Byte.}
\texttt{SWAB dop(a, _) = a[v_1 v_0] \rightarrow a[v_0 v_1]. \# swap bytes of destination}
\texttt{arithm(x, x + 1) = INC \rightarrow W. \# increment destination}
\texttt{arithm(x, x + 1) = INCB \rightarrow B. \# increment destination byte}
\texttt{arithm(x, r) dop(a, x) \rightarrow rcc(a, r, arithm(-, -)).}

\texttt{type(W) = Word.}
\texttt{type(B) = Byte.}
\texttt{rcc(a, v, w) = cc(v, w) \rightarrow a[\langle v :: type(w)\rangle].}
\texttt{cc(v, w) = CC_N[(v < s 0) (v = 0) (\neg (MIN_s \le v \le \text{MAX_s})) (v > u \text{MAX_u})].}
\texttt{cc(v, B) = CC_N[(v < bs 0) (v = 0) (\neg (MIN_B_s \le bs v \le bs \text{MAX_B_s})) (v > bu \text{MAX_B_u})].}
\texttt{TST dop(-, x) \rightarrow cc(x). \# test destination}
\texttt{ROR dop(a, x) CC_C[c] \rightarrow a[(x>>1)|(c<<15)] CC_C[(x\&1 :: Bit)].}
\texttt{ROL dop(a, x) CC_C[c] \rightarrow a[(x<<1)|c] CC_C[(x>>15)\&1 :: Bit].}
\texttt{MOV sop(v) dop(a, _) \rightarrow a[v].}
app\ith(\textit{x}, \textit{y}, \textit{x} + \textit{y}) = \text{ADD}. \ # \textit{add source to destination}

app\ith(\textit{x}, \textit{y}, \textit{x} - \textit{y}) = \text{SUB}. \ # \textit{subtract source from destination}

app\ith(\textit{x}, \textit{y}, \neg \textit{x} \& \textit{y}) = \text{BIC}. \ # \textit{bit clear destination from source}

app\ith(\textit{x}, \textit{y}, \textit{x}|y) = \text{BIS}. \ # \textit{bit set destination from source}

\text{cmps}(\textit{x}, \textit{y}, \textit{x} - \textit{y}) = \text{CMP}. \ # \textit{compare source to destination}

\text{cmps}(\textit{x}, \textit{y}, \textit{x} \& \textit{y}) = \text{BIT}. \ # \textit{bit test source and destination bytes}

\text{cmps}(\textit{x}, \textit{y}, \textit{r}) \ \text{sop}(\textit{x}) \ \text{dop}(\textit{a}, \textit{y}) \Longrightarrow \textit{a}[\textit{r}] \ \text{cc}(\textit{r}).

\text{SEZ} \Longrightarrow \text{CCZ}[\text{TRUE}]. \ \text{CLZ} \Longrightarrow \text{CCZ}[\text{FALSE}].

\text{SEV} \Longrightarrow \text{CCV}[\text{TRUE}]. \ \text{CLV} \Longrightarrow \text{CCV}[\text{FALSE}].

\text{SEC} \Longrightarrow \text{CCC}[\text{TRUE}]. \ \text{CLC} \Longrightarrow \text{CCC}[\text{FALSE}].

\text{SCC} \Longrightarrow \text{CCN}[\text{TRUE}] \ \text{CCZ}[\text{TRUE}] \ \text{CCV}[\text{TRUE}] \ \text{CCC}[\text{TRUE}].

\text{CCC} \Longrightarrow \text{CCN}[\text{FALSE}] \ \text{CCZ}[\text{FALSE}] \ \text{CCV}[\text{FALSE}] \ \text{CCC}[\text{FALSE}].

\text{branch}(\text{true}) = \text{BR}. \ # \text{branch always}

\text{branch}(\neg \text{z}) = \text{BNE} \ \text{CCZ}[\text{z}]. \ # \neq 0

\text{branch}(\text{z}) = \text{BEQ} \ \text{CCZ}[\text{z}]. \ # = 0

\text{branch}(\neg (\text{n} \wedge \text{v})) = \text{BGE} \ \text{CCN}[\text{n}] \ \text{CCV}[\text{v}]. \ # \geq 0

\text{branch}(\text{n} \wedge \text{v}) = \text{BLT} \ \text{CCN}[\text{n}] \ \text{CCV}[\text{v}]. \ # < 0

\text{branch}(\neg (\text{z}\neg (\text{n} \wedge \text{v}))) = \text{BGT} \ \text{CCZ}[\text{z}] \ \text{CCN}[\text{n}] \ \text{CCV}[\text{v}]. \ # > 0

\text{branch}(\text{z}\neg (\text{n} \wedge \text{v})) = \text{BLE} \ \text{CCZ}[\text{z}] \ \text{CCN}[\text{n}] \ \text{CCV}[\text{v}]. \ # \leq 0

\text{branch}(\neg \text{n}) = \text{BPL} \ \text{CCN}[\text{n}]. \ # +

\text{branch}(\text{n}) = \text{BMI} \ \text{CCN}[\text{n}]. \ # -

\text{branch}(\neg (\text{c} \neg \text{z})) = \text{BHI} \ \text{CCZ}[\text{c}] \ \text{CCZ}[\text{z}]. \ # \text{higher (unsigned comparison)}

\text{branch}(\neg \text{c}) = \text{BCC} \ \text{CCZ}[\text{c}]. \ # \text{carry clear}

= \text{BHIS} \ \text{CCZ}[\text{c}]. \ # \text{higher or same (unsigned compar.)}

\text{branch}(\text{c}) = \text{BCS} \ \text{CCZ}[\text{c}]. \ # \text{carry set}

= \text{BLO} \ \text{CCZ}[\text{c}]. \ # \text{lower (unsigned comparison)}

\text{branch}(\text{c}\text{z}) = \text{BLOS} \ \text{CCZ}[\text{c}] \ \text{CCZ}[\text{z}]. \ # \text{lower or same (unsigned compar.)}

\text{branch}(\neg \text{v}) = \text{BVC} \ \text{CCV}[\text{v}]. \ # \text{overflow clear}

\text{branch}(\text{v}) = \text{BVS} \ \text{CCV}[\text{v}]. \ # \text{overflow set}

\text{PC}[	ext{pc}] = \text{pc}[	ext{branch}(\text{c}) \text{ d} \text{pc} = [\text{c}] \Longrightarrow \text{PC}[	ext{cp} + \text{d}]; \ # \text{branch}

" \Longrightarrow \text{PC}[	ext{cp}]. \ # \text{no branch}

\text{JMP} \ \text{dop}(\neg, \text{a}) \Longrightarrow \text{PC}[\text{a}].

\text{reg}(\text{r}, \text{v}) = \text{r}[\text{v}].
APPENDIX B: PDP-11

PC[pc] pc[JSR r dop(a,_,)]qc reg(r, v) reg(6, tp)
⇒ PC[a] reg(6, sp) sp[v]tp reg(r, qc).

RTS r reg(r, pc) reg(6, sp) sp[v]tp ⇒ PC[pc] reg(6, tp) reg(r, v).

trap(148, 168) = BPT. # breakpoint trap
trap(208, 228) = IOT. # input/output trap
trap(308, 328) = EMT. # emulator trap
trap(348, 368) = TRAP. # TRAP
trap(Pc7 pswa) reg(6, tp) Pca[pc] pswa[psw] PC[pc,] PSW[psw]
⇒ sp[pc, psws]tp reg(6, sp) PC[pc] PSW[psw].

ret() = RTI. # return from interrupt
⇒ RTT. # return from trap

NOP ⇒ . # no operation
Appendix C

SPARC-V9

\{\text{Bit}\}.
\{\text{Byte}(01_2) = \{\text{Bit}\}8, \text{Halfword}(10_2) = \{\text{Bit}\}16, \text{Word}(00_2) = \{\text{Bit}\}32, \text{Extended\_word} = \{\text{Bit}\}64\}.
\{\text{Fp\_single} = \{\text{Bit}\}32, \text{Fp\_double} = \{\text{Bit}\}64, \text{Fp\_quad} = \{\text{Bit}\}128\}.

\text{nPC :: Constant.}
\text{IMM13(1_2) :: Bit.}
\text{OP: ARM(10_2), OP: LS(11_2), OP: BRS(00_2), OP: CLL(01_2) :: \{\text{Bit}\}2.}
\text{BRZ(01_2), BRLEZ(10_2), BRLZ(11_2) :: \{\text{Bit}\}2.}
\text{ADD(000_2), SUB(100_2), SAVE(100_2), RESTORE(101_2) :: \{\text{Bit}\}3.}
\text{BPA(1000_2), BPN(0000_2), BPNE(1001_2), BPE(0001_2) :: \{\text{Bit}\}4.}
\text{LDSTUB(00 11 01_2) :: \{\text{Bit}\}6.}
\text{FADD(001 00 00_2), FSUB(001 00 01_2) :: \{\text{Bit}\}7.}
\text{REGRES(000 00 00 00_2) :: \{\text{Bit}\}9.}

\text{rr(0,0) = .} \quad \# \text{reading r[0] yields 0}
\text{rr(a,v) = a[v]} \quad \quad \left[ 1 \leq a < 8 \right] \Rightarrow \# \text{global registers}
\text{= CWP[w] outr(a-8,v,w)} \quad \left[ 8 \leq a < 16 \right] \Rightarrow .
\text{= CWP[w] locr(a-16,v,w)} \quad \left[ 16 \leq a < 24 \right] \Rightarrow .
\text{= CWP[w] inr(a-24,v,w)} \quad \left[ 24 \leq a < 32 \right] \Rightarrow .

\text{rw(a,v)} a = CWP[w] rw(a,v,w).
\text{rw(a,v,w) = =\{ 0 = a \} \Rightarrow .} \quad \# \text{writing r[0] has no effect}
\text{= =\{ 1 \leq a < 8 \} \Rightarrow a[v]. \# global registers}
\text{= =\{ 8 \leq a < 16 \} \Rightarrow outr(a-8,v,w).}
\text{= =\{ 16 \leq a < 24 \} \Rightarrow locr(a-16,v,w).}
\text{= =\{ 24 \leq a < 32 \} \Rightarrow inr(a-24,v,w).}
APPENDIX C: SPARC-V9

\[
\text{outr}(a, v, w) = \text{ilocr}(a, v, w).
\]
\[
\text{outw}(a, v, w) = \text{ilocw}(a, v, w).
\]
\[
\text{locr}(a, v, w) = \text{ilocr}(a + 8, v, w).
\]
\[
\text{locw}(a, v, w) = \text{ilocw}(a + 8, v, w).
\]
\[
\text{inr}(a, v, w) = \text{ilocr}(a, v, w + 1).
\]
\[
\text{inw}(a, v, w) = \text{ilocw}(a, v, w + 1).
\]
\[
\text{ilocr}(a, v, w) = (\text{NWINDOWS} - 1 - (w \mod \text{NWINDOWS})) \times 16 + a[v].
\]
\[
\text{ilocw}(a, v, w) = (\text{NWINDOWS} - 1 - (w \mod \text{NWINDOWS})) \times 16 + a[v].
\]

\[
\text{fr}(a, v, s) = a[(v :: \text{Fp_single}) \Rightarrow (s = 0) \Rightarrow a[(v :: \text{Fp_double}) \Rightarrow (s = 0) \Rightarrow a[(v :: \text{Fp_quad})]
\]
\[
\text{fw}(a, v, s) = a[(s = 0) \Rightarrow a[(v :: \text{Fp_single})]
\]
\[
\text{sop1}(v) = \text{rr}(a, v) a.
\]
\[
\text{sop2}(v) = \text{REGRES}(v).
\]
\[
\text{REGRES}(v) a = \text{rr}(a, v).
\]
\[
\text{IMM13}(\text{sign_ext}(v)) = v :: \{\text{Bit}\}13.
\]
\[
\text{iccr}(c) = \text{CCR: ICC: C}[c].
\]
\[
\text{iccrb}(c) = 12 \text{ iccr}(c).
\]
\[
\text{iccrb}(0) = 02.
\]
\[
\text{ccw}(v) = \text{CCR: ICC: C}[v < 4s 0] (v = 0) (\neg(MIN_{4s} \leq v \leq MAX_{4s})) (v > 4s MAX_{4s})
\]
\[
\text{ccw}(v) = \text{CCR: ICC: C}[v < 2s 0] (v = 0) (\neg(MIN_{2s} \leq v \leq MAX_{2s})) (v > 2s MAX_{2s}).
\]
\[
\text{c :: Bit}.
\]
\[
\text{aloper}(x, y, x + y + c) = 02 \text{ ccwb}(\$3) \text{ iccrb}(c) \text{ ADD}.
\]
\[
\text{aloper}(x, y, x - y - c) = 02 \text{ ccwb}(\$3) \text{ iccrb}(c) \text{ SUB}.
\]
APPENDIX C: SPARC-V9

n :: Bit.
op :: {Bit}2.
neg(v) = \neg \nu \equiv \begin{cases} v = 12 & \iff \# empty body if v \neq 12 \\
1 \& \nu = \{ \text{op} = 012 \} \iff \# \text{bit-wise and} \\
| = \{ \text{op} = 102 \} \iff \# \text{bit-wise or} \\
\wedge = \{ \text{op} = 112 \} \iff \# \text{bit-wise xor} 
\end{cases}
alopec(x, y, \neg(n) (x \text{ lop}(op) y)) = 02 \text{ ccwb}($3) 02 n \text{ op}.

PC[pc] pc[pc()]qc nPC[pc'] pc'[pc()]qc nPC[pc' + 4].

OP:ARM rw(\_, r) alopex, y, r) sop1(x) sop2(y) \iff \# OP:ARM = 102

sr(x, y, x + y + c, v + 1) =
\begin{align*}
10 & \text{ ccwb}($3) \text{ iccr}(c) \text{ SAVE} \text{ CWP}[w] \iff \# \text{SAVE} = 1002
sr(x, y, x + y + c, v - 1) =
\begin{align*}
10 & \text{ ccwb}($3) \text{ iccr}(c) \text{ RESTORE} \text{ CWP}[w] \iff \# \text{RESTORE} = 1012
\end{align*}
\end{align*}

OP:ARM rw(a, r, w) a sr(x, y, r, w) sop1(x) sop2(y) \iff \text{CWP}[w \% 13WINDOES].

r :: Byte.
ldstub(a, r) = LDSTUB a[r] \iff \text{LDSTUB} = 00 11111112.

OP:LS rw(\_, r) ldstub(x + y, r) sop1(x) sop2(y) \iff \# OP:LS = 112

\begin{align*}
\text{signed}(s) = \text{zero_fill} = \{ s = 0 \} \iff \\
\text{sign_ext} = \{ s = 1 \} \iff 
\end{align*}

\{Integer = Byte|Halfword|Word\}.

asi :: Byte.
sopasi(x, y, asi) = sop1(x) 02 asi sop1(y).
\begin{align*}
= \text{sop1}(x) \text{ IMM13}(y) \text{ ASI}[asi].
\end{align*}

s :: Bit.
lias(asi, a, signed(s) (r)) = s 02 \text{ Integer asi:a}(r :: \text{Integer}].
lias(asi, a, r) = 10112 asi:a(r :: \text{Extended_word}].

OP:LS rw(\_, r) 012 lias(asi, x + y, r) sopasi(x, y, asi) \iff .

sias(asi, a, r) = 012 \text{ Integer} \iff asi:a(r :: \text{Integer}].
sias(asi, a, r) = 11102 \iff asi:a(r :: \text{Extended_word}].

OP:LS rr(\_, r) 012 sias(asi, x + y, r) sopasi(x, y, asi) \iff .
APPENDIX C: SPARC-V9

s :: {Bit}2.

farithm(x, y, x + y) = FADD. # FADD = 0010 0002
farithm(x, y, x - y) = FSUB. # FSUB = 0010 0012
OP:ARM fr(-, r, s) 1101002 fr(-, x, s) f arithm(x, y, r) s fr(-, y, s) \rightarrow.

f cmp(x, y) =
\begin{align*}
&= [x = y] \rightarrow 00_2. \\
&= [x < y] \rightarrow 01_2. \\
&= [x > y] \rightarrow 10_2. \\
&= [x \neq y] \rightarrow 11_2. \quad \# \text{unordered} \ (x \text{ or/and} \ y \text{ is NaN})
\end{align*}

v :: {Bit}2.

fccnr(fcc, v) = FSR: FCC: fcc[v].

fccnw(fcc, v) = \quad \rightarrow FSR: FCC: fcc[v].

OP:ARM 0002 cc1 cc0 1101012 fr(-, x, s) 00101002 s fr(-, y, s) \rightarrow

fccnw(cc1 || cc0, f cmp(x, y)).

ldfa(asi, a, r, 01) = 002 asi:a[(r :: Fp.single)]. # LDFA
ldfa(asi, a, r, 10) = 112 asi:a[(r :: Fp.double)]. # LDDFA
ldfa(asi, a, r, 11) = 102 asi:a[(r :: Fp.quad)]. \quad # LDQFA
OP:LS fw(-, r, s) 11002 ldfa(asi, x + y, r, s) sopasi(x, y, asi) \rightarrow.

stfa(asi, a, r, 01) = 002 \rightarrow asi:a[(r :: Fp.single)]. # STFA
stfa(asi, a, r, 10) = 112 \rightarrow asi:a[(r :: Fp.double)]. # STDFA
stfa(asi, a, r, 11) = 102 \rightarrow asi:a[(r :: Fp.quad)]. \quad # STQFA
OP:LS fr(-, r, s) 11012 stfa(asi, x + y, r, s) sopasi(x, y, asi) \rightarrow.


a, p, n :: Bit. \quad \# \text{annul, prediction and "negate condition" bits}
da16hi :: {Bit}2. \quad \# 2-bit PC-relative displacement
rs1 :: {Bit}5. \quad \# \text{address of the 1st source register}
da16lo :: {Bit}14. \quad \# 14-bit PC-relative displacement

bpr cond(v, neg(n) (v = 0)) = n BRZ. \quad \# \text{Branch on Register Zero}
bpr cond(v, neg(n) (v \leq 0)) = n BRLEZ. \quad \# \" \text{Register Less Than or Equal to Zero}\"
bpr cond(v, neg(n) (v < 0)) = n BRLZ. \quad \# \" \text{Register Less Than Zero}\"
ea(pc, displ) pc + (4 \times \text{sign_ext}(\text{displ})) = .
APPENDIX C: SPARC-V9

PC[pc] nPC[npc] pc[OP: BRS a 02 bpr.cond(v, c) 0112 d16hi p rw(rs1, v) d16lo]
    = (c) \rightarrow PC[npc] nPC[ea(pc, d16hi | d16lo)].
" = [\neg c \land a = 0 ] \rightarrow PC[npc] nPC[npc + 4].  \# instruction in delay slot executed
" = [\neg c \land a = 1 ] \rightarrow PC[npc + 4] nPC[npc + 8].  \# instruction in delay slot annulled

n, z, v, c :: Bit. \# bits of the ICC/XCC register
ccl, ccO :: Bit. \# condition codes selection
cond :: {Bit}4. \# the condition field
disp19 :: {Bit}19. \# branch's PC-relative displacement

bp.cond(_, TRUE) = BPA. \# Branch Always
bp.cond(_, FALSE) = BPN. \# " Never
bp.cond(ncc, \neg z) = CCR:ncc[n z v c] BPNE. \# " on \neq
bp.cond(ncc, z) = CCR:ncc[n z v c] BPE. \# " on =
bp.cond(ncc, \neg(z \lor (n \lor v))) = CCR:ncc[n z v c] BPG. \# " on >
bp.cond(ncc, z \lor (n \lor v)) = CCR:ncc[n z v c] BPLE. \# " on \leq
bp.cond(ncc, \neg(n \lor v)) = CCR:ncc[n z v c] BPGE. \# " on \geq
bp.cond(ncc, n \lor v) = CCR:ncc[n z v c] BPL. \# " on <
bp.cond(ncc, \neg(c \lor v)) = CCR:ncc[n z v c] BPGU. \# " on > Unsigned
bp.cond(ncc, c \lor v) = CCR:ncc[n z v c] BPLEU. \# " on \leq Unsigned
bp.cond(ncc, \neg(c)) = CCR:ncc[n z v c] BPCS. \# " on < Unsigned
bp.cond(ncc, n) = CCR:ncc[n z v c] BPPOS. \# " on Positive
bp.cond(ncc, n \lor c) = CCR:ncc[n z v c] BPNEG. \# " on Negative
bp.cond(ncc, \neg(v)) = CCR:ncc[n z v c] BPVC. \# " on Overflow Clear
bp.cond(ncc, v) = CCR:ncc[n z v c] BPVS. \# " on Overflow Set

PC[pc] nPC[npc] pc:COND[cond]
pc[OP: BRS a]pc:COND[bp.cond(cc1 | cc0, c) 0012 cc1 cc0 p disp19]
    = (c) \rightarrow PC[npc] nPC[npc + 4].
" = (c \land a = 0 ) \rightarrow PC[npc + 4] nPC[npc + 8].
" = (c \land a = 1 ) \rightarrow PC[ea(pc, disp19)].
" = (c \land a = 1 \land cond = BPA ) \rightarrow PC[ea(pc, disp19)] nPC[ea(pc, disp19) + 4].
" = (c \land a = 1 \land cond \neq BPA ) \rightarrow PC[npc + 8] nPC[ea(pc, disp19)].

fbp.cond(_, TRUE) = FBPA. \# Branch Always
fbp.cond(_, FALSE) = FBN. \# " Never
fbp.cond(fcc, v = U) = fccnr(fcc, v) FBPU. \# " on Unordered
fbp.cond(fcc, v = G) = fccnr(fcc, v) FBPG. \# " on >
fbp.cond(fcc, v = U \lor v = G) = fccnr(fcc, v) FBPUG. \# " on Unordered or >
**APPENDIX C: SPARC-V9**

\[
\begin{align*}
\text{fbp\_cond}(fcc, v = L) &= fccnr(fcc, v) \text{ FBPL. } \# \text{ on } < \\
\text{fbp\_cond}(fcc, v = U \lor v = L) &= fccnr(fcc, v) \text{ FBPUL. } \# \text{ on Unordered or } < \\
\text{fbp\_cond}(fcc, v = L \lor v = G) &= fccnr(fcc, v) \text{ FBPLG. } \# \text{ on } < \text{ or } > \\
\text{fbp\_cond}(fcc, v = E) &= fccnr(fcc, v) \text{ FBPNE. } \# \text{ on } \neq \\
\text{fbp\_cond}(fcc, v = E) &= fccnr(fcc, v) \text{ FBPE. } \# \text{ on } = \\
\text{fbp\_cond}(fcc, v = U \lor v = E) &= fccnr(fcc, v) \text{ FBPUE. } \# \text{ on Unordered or } = \\
\text{fbp\_cond}(fcc, v = L \lor v = E) &= fccnr(fcc, v) \text{ FBPGE. } \# \text{ on Unordered or } \geq \\
\text{fbp\_cond}(fcc, v = L \lor v = E) &= fccnr(fcc, v) \text{ FBPL. } \# \text{ on Unordered or } \leq \\
\text{fbp\_cond}(fcc, v = U) &= fccnr(fcc, v) \text{ FBP0. } \# \text{ on } \text{Ordered} \\
\end{align*}
\]

\[
\# e:: \text{Bit} \cdot lgu:: \{\text{Bit}\}^3.
\]

\[
\# \text{fbp\_cond}(fcc, \neg(e) (dm(\neg v) \& (\neg lgu + 1))) = fccnr(fcc, v) \text{ e lgu.}
\]

\[
\begin{align*}
\text{PC}[pc] \text{ nPC}[npc] \text{ pc:COND[cond]} \\
\text{pc}[OP: BRS a][pc:COND[fbp\_cond(cc1 \mid cc0, c) 1012 cc1 cc0 p disp19)] \\
= [\neg c \land a = 0] \Rightarrow \text{PC}[npc] \text{ nPC}[npc + 4]. \\
" = [\neg c \land a = 1] \Rightarrow \text{PC}[npc + 4] \text{ nPC}[npc + 8]. \\
" = [c \land a = 0] \Rightarrow \text{PC}[npc] \text{ nPC}[ea(pc, disp19)]. \\
" = [c \land a = 1 \land \text{cond }= \text{ FBPA}] \Rightarrow \text{PC}[ea(pc, disp19)] \text{ nPC}[ea(pc, disp19) + 4]. \\
" = [c \land a = 1 \land \text{cond }\neq \text{ FBPA}] \Rightarrow \text{PC}[npc] \text{ nPC}[ea(pc, disp19)].
\end{align*}
\]

\[
\text{disp30}:: \{\text{Bit}\}^{30}.
\]

\[
\begin{align*}
\text{PC}[pc] \text{ nPC}[npc] \text{ CWP}[w] \text{ pc}[OP:CLL disp30] \# \text{OP:CLL }= 012 \\
\Rightarrow \text{PC}[npc] \text{ nPC}[ea(pc, disp30)] \text{ rw}(15, pc, w).
\end{align*}
\]

\[
\begin{align*}
\text{PC}[pc] \text{ nPC}[npc] \text{ CWP}[w] \text{ pc}[OP:ARM 000002 1110012 sop1(x) sop2(y)] \\
\Rightarrow \text{PC}[npc] \text{ nPC}[x + y] \text{ CWP}[(w - 1) \% \text{NWINDOWS}].
\end{align*}
\]

\[
\begin{align*}
\text{OP:BRS 000002 1002 002 000002 000002 000002 000002 } \Rightarrow .
\end{align*}
\]
APPENDIX C: SPARC-V9

Figure C.1: Example of archetype expansion

1. PC[pc] nPC[npc] pc[OP:ARM rw(, pc) 1110002 sop1(x) sop2(y)]
2. \[ \Rightarrow PC[npc] nPC[x + y]. \] # [3.22]
3. \[ \# [2.2] rw(a, v) a = CWP[w] rw(a, v, v). \] \[ = a, pc = v \]
4. PC[pc] nPC[npc] CWP[w] pc[OP:ARM rw(a, v, w) a 1110002 sop1(x) sop2(y)]
5. \[ \Rightarrow PC[npc] nPC[x + y]. \]
6. \[ \# [3.1] sop1(v1) = rr(a1, v1) a1. \] \[ x = v1 \]
7. PC[pc] nPC[npc] CWP[w] pc[OP:ARM rw(a, v, w) a 1110002 rr(a1, v1) a1 sop2(y)]
8. \[ \Rightarrow PC[npc] nPC[v1 + y]. \]
9. \[ \# [3.1] sop2(v2) = IMM13(v2). \] \[ y = v2 \]
10. PC[pc] nPC[npc] CWP[w] pc[OP:ARM rw(a, v, w) a 1110002 rr(a1, v1) a1 IMM13(v2)]
11. \[ \Rightarrow PC[npc] nPC[v1 + v2]. \]
12. \[ \# [2.2] rw(a, v, w) = \{ 1 \leq a < 8 \} a[v]. \]
13. PC[pc] nPC[npc] CWP[w] pc[OP:ARM a 1110002 rr(a1, v1) a1 IMM13(v2)]
14. \[ = \{ 1 \leq a < 8 \} PC[npc] nPC[v1 + v2] a[v]. \]
15. \[ \# [2.1] rr(a1, v1) = CWP[w1] inr(a1 - 24, v1, v1) = 24 \leq a1 < 32 \Rightarrow . \]
17. \[ = \{ 1 \leq a < 8 \} (24 \leq a1 < 32) PC[npc] nPC[v1 + v2] a[v]. \]
18. \[ \# [3.2] IMM13(sign_ext(v3)) = (v3 :: {Bit}13). \] \[ v2 = sign_ext(v3) \]
20. \[ = \{ 1 \leq a < 8 \} PC[npc] nPC[v1 + v2] a[v]. \]
21. \[ \# [2.3] inr(a2, v4, w1) = ilocr(a2, v4, w1 + 1). \] \[ a1 - 24 = a2, v1 = v4 \]
22. PC[pc] nPC[npc] CWP[w] CWP[w1] pc[OP:ARM a 1110002 ilocr(a2, v4, w1) a1 IMM13(v3 :: {Bit}13)]
23. \[ = \{ 1 \leq a < 8 \} (24 \leq a1 < 32) PC[npc] nPC[v1 + v2] a[v]. \]
24. \[ \# [2.4] ilocr(a2, v4, w2) = (NWINDOWS - 1 - (w2 \% NWINDOWS)) \times 16 + a2[v4]. \] \[ w1 + 1 = w2 \]
26. PC[OP:ARM a 1110002 a1 IMM13 (v3 :: {Bit}13)]
27. \[ = \{ 1 \leq a < 8 \} (24 \leq a1 < 32) PC[npc] nPC[v1 + v2] a[v]. \]
28. \[ \# after resolution and rearrangement of locators and text \]
29. PC[pc] nPC[npc] CWP[w] (NWINDOWS - 1 - ((w + 1) \% NWINDOWS)) \times 16 + (a1 - 24)[x]
30. PC[OP:ARM - 1110002 a1 IMM13 (v3 :: {Bit}13)] \# field format of current instruction
31. \[ = \{ 1 \leq a < 8 \} (24 \leq a1 < 32) PC[npc] nPC[x + sign_ext(v3)]..pc. \] \# effects of the instruction on configuration
Appendix D

Java Virtual Machine

{Bit}.
{Boolean = {Bit}32, Byte = {Bit}32, Char = {Bit}32, Short = {Bit}32,
Int = {Bit}32, Float = {Bit}32, Reference = {Bit}32,
ReturnAddress = {Bit}32, Long = {Bit}64, Double = {Bit}64}.

{Category1 = Boolean | Byte | Char | Short | Int | Float | Reference | ReturnAddress,
Category2 = Long | Double,
Category12 = Category1 | Category2}.

{UnsignedByte = {Bit}8, SignedByte = {Bit}8,
UnsignedShort = {Bit}16, SignedShort = {Bit}16}.

{Word = Category1 | Category1 | Category2}.

POP(87), POP2(88), DUP(89), DUP_X1(90), DUP_X2(91), DUP2(92)
DUP2_X1(93), DUP2_X2(94), SWAP(95),
ILOAD(21), ILOAD_0(26), ILOAD_1(27), ILOAD_2(28), ILOAD_3(29),
ISTORE(54), ISTORE_0(59), ISTORE_1(60), ISTORE_2(61), ISTORE_3(62),
IADD(96), ISUB(100), IMUL(104), IDIV(108), IREM(112),
IAND(126), IOR(128), IXOR(130), ISHL(120), ISHR(122), IUSHR(124),
INEG(116), IINC(132),
ICONST_M1(2), ICONST_0(3), ICONST_1(4), ICONST_2(5), ICONST_3(6), ICONST_4(7),
ICONST_5(8),
BIPUSH(16), SIPUSH(17),
IFEQ(153), IFNE(154), IFLT(155), IFGE(156), IFGT(157), IFLE(158),
IF_ICMPEQ(159), IF_ICMPNE(160), IF_ICMPLT(161), IF_ICMPGE(162), IF_ICMPGT(163),
IF_ICMPLE(164),
GOTO(167), GOTO_W(200), JSR(168), JSR_W(201), RET(169),
WIDE(196) :: UnsignedByte.
b_s :: SignedByte.
b_u, opc :: UnsignedByte.
s_s, δ_s :: SignedShort.
s_u :: UnsignedShort.
i, j, δ_ι :: Int.
z :: Category12.

POP \[SP[t] s[v]t \rightarrow SP[s]\].
PPOP \[SP[t] s[w]t \rightarrow SP[s]\].
DUP \[SP[t] s[v]t \rightarrow SP[t'] s[v v]t'\].
PPOP 2 \[SP[t] s[w]t \rightarrow SP[t'] s[w w]t’\].
DUP \[X\] \[SP[t] s[v_1]v_0]t \rightarrow SP[t'] s[v_0 v_1]t’\].
PPOP 2 \[X\] \[SP[t] s[v_1]v_0]t \rightarrow SP[t'] s[v_0 v_1]t’\].
DUP \[X\] \[SP[t] s[w_1]v_0]t \rightarrow SP[t'] s[v_0 w_1]t’\].
PPOP 2 \[X\] \[SP[t] s[w_1]v_0]t \rightarrow SP[t'] s[v_0 w_1]t’\].
SWAP \[SP[t] s[v_1]v_0]t \rightarrow SP[t'] s[v_0 v_1]t’\].

v :: Category1.
w :: Word.

push(v) = popsh(v, v).
pop(v) = popsh(v, v).

var(n, z) n[z] =

α(b_u1, b_u2) ((((b_u1 << 8)|b_u2) :: UnsignedShort) = .
δ(b_u1, b_u2) ((((b_u1 << 8)|b_u2) :: SignedShort) = .
δ(b_u1, b_u2, b_u3, b_u4) ((b_u1 << 24)|(b_u2 << 16)|(b_u3 << 8)|b_u4) = .
wide(opc, b_u) = opc b_u.
wide(opc, b_u, b_s) = opc b_u b_s.
wide(opc, α(b_u1, b_u2)) = WIDE opc b_u1 b_u2.
wide(opc, α(b_u1, b_u2), δ(b_u3, b_u4)) = WIDE opc b_u1 b_u2 b_u3 b_u4.

i3(0) = 0. i3(1) = 1. i3(2) = 2. i3(3) = 3.
iload(i) = wide(LOAD, i).
iload(i) = LOAD.i3(i).
istore(i) = wide(ISTORE, i).
istore(i) = ISTORE.i3(i).

iload(n) var(n, v) push(v) \rightarrow .
istore(n) pop(v) \rightarrow var(n, v).
APPENDIX D: JAVA VIRTUAL MACHINE

iarithm-binary(i1, i2, i1 + i2) = IADD.

iarithm-binary(i1, i2, i1 - i2) = ISUB.

iarithm-binary(i1, i2, i1 \times i2) = IMUL.

iarithm-binary(i1, i2, i1/i2) = IDIV (\text{if } i2 \neq 0)

iarithm-binary(i1, i2, i1 \% i2) = IREM (\text{if } i2 \neq 0)

iarithm-binary(i1, i2, i1 & i2) = IAND.

iarithm-binary(i1, i2, i1 | i2) = IOR.

iarithm-binary(i1, i2, i1 ^ i2) = IXOR.

iarithm-binary(i1, i2, i1 << i2) = ISHL.

iarithm-binary(i1, i2, i1 >> i2) = ISHR.

iarithm-binary(i1, i2, i1 >>> i2) = IUSHR.

iarithm-binary(i1, i2, ir) popsh(i1 i2, ir) ⇒

iarithm-monadic(i1, i1) = INEG.

iarithm-monadic(i, ir) popsh(i, ir) ⇒

\text{wide(IINC, s_u, s_s) \ var(s_u, i) ⇒ var(s_u, i + s_s).

i5(-1) = M1. i5(0) = 0. i5(1) = 1. i5(2) = 2. i5(3) = 3. i5(4) = 4. i5(5) = 5.

iconst(i) = ICONST-i5(i).

iconst(b_s) = BIPUSH b_s.

iconst(b_u1_2) = SIPUSH b_u1 b_u2.

iconst(i) push(i) ⇒

jmpc(i, i = 0, \delta(b_1, b_2)) = IFEQ b_1 b_2.

jmpc(i, i \neq 0, \delta(b_1, b_2)) = IFNE b_1 b_2.

jmpc(i, i < 0, \delta(b_1, b_2)) = IFLT b_1 b_2.

jmpc(i, i \geq 0, \delta(b_1, b_2)) = IFGE b_1 b_2.

jmpu(TRUE, b_1 b_2) = GOTO b_1 b_2.

jump(cond, \delta) = jmpc(i, cond, \delta) pop(i).

PC[pc] pc[jump(cond, \delta)]qc = cond ⇒ PC[pc + \delta].
\( \text{jsr}(\delta(b_1, b_2)) = \text{JSR } b_1 b_2. \)
\( \text{jsr}(\delta(b_1, b_2, b_3, b_4)) = \text{JSR}_W b_1 b_2 b_3 b_4. \)

\( \text{PC}[pc] \text{pc}[\text{jsr}(\delta_1)]qc \text{push}(qc) \rightarrow \text{PC}[pc + \delta_1]. \)
\( \text{PC}[pc] \text{pc}[\text{wide(RET, } n)]qc \text{var}(n, a) \rightarrow \text{PC}[a]. \)
Appendix E

\( n \)-PRAM

\{MrModel = \text{Int}, MwModel = \text{Int} \}. \# \text{type aliases for the memory read and write models}

\{ER, CR :: MrModel, \\
\mr :: MrModel\}.

\{EW, \text{CW}_\text{WEAK}, \text{CW}_\text{COMMON}, \text{CW}_\text{TOLERANT}, \text{CW}_\text{COLLISION}, \text{CW}_\text{COLLISIONP}, \\
\text{CW}_\text{ARBITRARY}, \text{CW}_\text{PRIORITY} :: MwModel, \\
w \cdot :: MwModel\}.

\text{pram}(n, \text{rm}, \text{wm}) = \text{FE} \begin{cases} 
\text{pcs}(n) & \# \text{update PCs, block (stage 2) if no RAM is running} \\
3 \text{shr}(n, n, \text{rm}, \emptyset) & \# \text{shared memory reads} \\
4 \text{shw}(n, n, \text{wm}, \emptyset) & \# \text{shared memory writes}
\end{cases}

\text{pcs}(0) = \cdot. \# \text{0 RAMs}

\text{pcs}(p) = \text{pc}(p) \text{pcs}(p - 1) = \{p > 0 \} \Rightarrow .

\text{pc}(p) = \text{FE} \begin{cases} 
\text{P}^{-} \text{p}: \text{PC}[\text{pc}] \text{pc}[\text{instr}(p)][\text{qc}] = \{0 \leq \text{pc} < \text{C} \} \Rightarrow \text{P}^{-} \text{p}: \text{PC}[\text{qc}] & \# \text{running} \\
2 \Rightarrow . & \# \text{don't block} \\
\text{pC}(p) = \text{P}^{-} \text{p}: \text{PC}[\text{pc}] = \{0 > \text{pc} \lor \text{pc} \geq \text{C} \} \Rightarrow . & \# \text{not running}
\end{cases}

\text{halt}(p) = \cdot. \# \text{halt 0 RAMs}

\text{halt}(p) = \text{halt}(p - 1) = \{p > 0 \} \Rightarrow \text{P}^{-} \text{p}: \text{PC}[\text{C}] \Rightarrow \text{halt p RAMs}

\# \text{shr}(p, a) = \text{P}^{-} \text{p}: \text{MAR}[a].

\# \text{shr}(n, \text{CR}) = \sum_{p=1}^{n} \text{shr}(p, a_p) \text{read}(p, a_p).

\# \text{shr}(n, \text{ER}) = \sum_{p=1}^{n} \text{shr}(p, a_p) \text{halt}(p) \Rightarrow \{ \{a_p^{\text{\text{null}}}\} \neq \{a_p^{\text{\text{null}}}\} \} \Rightarrow \# \text{read conflict}

\# \text{shr}(n, a_p) \text{read}(p, a_p) \Rightarrow \{ \{a_p^{\text{\text{null}}}\} \neq \{a_p^{\text{\text{null}}}\} \} \Rightarrow \# \text{no read conflict}
APPENDIX E: n-PRAM

\[ \text{shr}(0, \_, \_) = . \quad \# \text{0 RAMs, no conflicts} \]
\[ \text{shr}(p, n, \text{CR}, \_) = P^p: \text{MAR}[a] \quad \text{shr}(p - 1, n, \text{CR}, \_) \quad \text{read}(p, a) \]
\[ \begin{cases} p > 0 \Rightarrow . & \# \text{CR, no conflicts} \\ \end{cases} \]
\[ \text{shr}(p, n, \text{ER}, s) = P^p: \text{MAR}[a] \quad \text{shr}(p - 1, n, \text{ER}, s) \quad \text{halt}(n) \]
\[ \begin{cases} a \neq \text{NULL} \land a \in s \land p > 0 \Rightarrow . & \# \text{ conflict} \\ \end{cases} \]
\[ \text{shr}(p, n, \text{ER}, s) = P^p: \text{MAR}[a] \quad \text{shr}(p - 1, n, \text{ER}, s \cup \{a\}) \quad \text{read}(p, a) \]
\[ \begin{cases} (a = \text{NULL} \lor a \notin s) \land p > 0 \Rightarrow . & \# \text{ no conflict} \\ \end{cases} \]
\[ \text{read}(p, a) = a[v] = \{ a \neq \text{NULL} \} \Rightarrow P^p: \text{ACC}[v]; \quad \# \text{ read value v from shared memory} \]
\[ \Rightarrow . \quad \# \text{ not a memory read instruction} \]
\[ \text{shw}(0, \_, \_, \_) = . \quad \# \text{0 RAMs, no write conflicts} \]
\[ \text{shw}(p, n, \text{wm}, s) = P^p: \text{MAR}[a] \quad \text{shw}(p - 1, n, \text{wm}, s) = \{ a \in s \land p > 0 \} \Rightarrow \# \text{ a resolved} \]
\[ = P^p: \text{MAR}[a] \quad \text{shw}(p - 1, n, \text{wm}, s \cup \{a\}) \quad \text{sw}(p, n, \text{wm}) = \{ a \notin s \land p > 0 \} \Rightarrow . \]
\[ \text{sw}(p, n, \text{EW}) \quad \# \text{exclusive write} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{sw}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.\text{WEAK}}) \quad \# \text{concurrent write, weak model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{weak}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.\text{COMMON}}) \quad \# \text{concurrent write, common model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{common}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.TOLERANT}) \quad \# \text{concurrent write, tolerant model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{tolerant}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.COLLISION}) \quad \# \text{concurrent write, collision model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{collision}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.COLLISIONP}) \quad \# \text{concurrent write, collision+ model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{collisionp}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.ARBITRARY}) \quad \# \text{concurrent write, arbitrary model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad \text{arbitrary}(p, a, v, n, n). \]
\[ \text{sw}(p, n, \text{CW.PRIORITY}) \quad \# \text{concurrent write, priority model} \]
\[ = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v] \quad P^p: \text{SIG}[s] \quad \text{priority}(p, a, v, s, n, n). \]

\# \{ a_1 \otimes a_2 \mid a_1 = a_2 \land a_1 \neq \text{NULL} \} \\
\# \{ a_1 \otimes a_2 \mid a_1 \neq a_2 \lor a_1 = \text{NULL} \lor a_2 = \text{NULL} \} \\

\# \text{shwr}(p, a, v) = P^p: \text{MAR}[a] \quad P^p: \text{ACC}[v]. \]

\# shw(n, \text{EW}) 
\[ \begin{align*} 
\# &= \text{shwr}(p, a, v) \quad \text{halt}(p) \quad \{ a_p^{\text{NULL}} \} \neq \{ a_p^{\text{NULL}} \} \Rightarrow \# \text{ write conflict} \\
\# &= \text{shwr}(p, a, v) \quad \text{write}(a_p, v_p) \quad \{ a_p^{\text{NULL}} \} = \{ a_p^{\text{NULL}} \} \Rightarrow . \quad \# \text{ no write conflict} 
\end{align*} \]
APPENDIX E: $n$-PRAM

$\text{ew}(-, a, v, 0, -) = \text{write}(a, v)$.  
$\text{ew}(p_t, a_t, v_t, p, n)$

$= P^p:\text{MAR}[a] \text{ halt}(n)

= \begin{cases} \text{true} & (p_t \neq p \land a_t \oplus a \land p > 0) \Rightarrow \# \text{write conflict} \\ \text{false} & \end{cases}$

$= P^p:\text{MAR}[a] \text{ew}(p_t, a_t, v_t, p - 1, n)$

$= \begin{cases} \text{true} & (p_t = p \lor a_t \oplus a \land p > 0) \Rightarrow \# \text{no conflict} \\ \text{false} & \end{cases}$

$\# \text{shw}(n, \text{CW}_\text{WEAK})$

$\begin{cases} \exists \widetilde{\text{shw}}(p, a_p, v_p) \text{ halt}(p) \Rightarrow \# \text{write conflict} \\ \forall \widetilde{\text{shw}}(p, a_p, v_p) \Rightarrow \# \text{write conflict, halt} \\ \forall \widetilde{\text{shw}}(p, a_p, v_p) \Rightarrow \# \text{no conflict, check further} \end{cases}$

$\text{weak}(-, a, v, 0, -) = \text{write}(a, v)$.  
$\text{weak}(p_t, a_t, v_t, p, n)$

$= P^p:\text{MAR}[a] \text{ P^p:ACC}[v] \text{ weak}(p_t, a_t, v_t, p - 1, n)$

$= \begin{cases} \text{true} & (p_t \neq p \land a_t \oplus a \land v_t = 0 \land v = 0 \land p > 0) \Rightarrow \# \text{write 0 conflict} \\ \text{false} & \end{cases}$

$= P^p:\text{MAR}[a] \text{ P^p:ACC}[v] \text{ halt}(n)$

$= \begin{cases} \text{true} & (p_t \neq p \land a_t \oplus a \land (v_t \neq 0 \lor v \neq 0) \land p > 0) \Rightarrow \# \text{conflict, halt} \\ \text{false} & \end{cases}$

$= P^p:\text{MAR}[a] \text{ weak}(p_t, a_t, v_t, p - 1, n)$

$= \begin{cases} \text{true} & (p_t = p \lor a_t \oplus a \land p > 0) \Rightarrow \# \text{no conflict, check further} \\ \text{false} & \end{cases}$

$\# \text{shw}(n, \text{CW}_\text{COMMON})$

$\begin{cases} \exists \widetilde{\text{shw}}(p, a_p, v_p) \text{ halt}(p) \Rightarrow \# \text{write conflict} \\ \forall \widetilde{\text{shw}}(p, a_p, v_p) \Rightarrow \# \text{write conflict, halt} \\ \forall \widetilde{\text{shw}}(p, a_p, v_p) \Rightarrow \# \text{no conflict, check further} \end{cases}$

$\text{common}(-, a, v, 0, -) = \text{write}(a, v)$.  
$\text{common}(p_t, a_t, v_t, p, n)$

$= P^p:\text{MAR}[a] \text{ P^p:ACC}[v] \text{ common}(p_t, a_t, v_t, p - 1, n)$

$= \begin{cases} \text{true} & (p_t \neq p \land a_t \oplus a \land v_t = v \land p > 0) \Rightarrow \# \text{conflict, same value, proceed} \\ \text{false} & \end{cases}$

$= P^p:\text{MAR}[a] \text{ P^p:ACC}[v] \text{ halt}(n)$

$= \begin{cases} \text{true} & (a_t \oplus a \land v_t \neq v \land p > 0) \Rightarrow \# \text{conflict, halt} \\ \text{false} & \end{cases}$

$= P^p:\text{MAR}[a] \text{ common}(p_t, a_t, v_t, p - 1, n)$

$= \begin{cases} \text{true} & (p_t = p \lor a_t \oplus a \land p > 0) \Rightarrow \# \text{no conflict} \\ \text{false} & \end{cases}$
APPENDIX E: n-PRAM

\# shw\(n, CW\_\text{TOLERANT}\) = \(\bigwedge_{p=1}^{n} \text{shwr}(p, a_p, v_p) \text{write}(a_{wp}, v_p)\)
\# = \(\exists (a_p, v_p), (a_q, v_q) \in \{(a_p^{\text{NULL}}, v_p)\} \mid (a_{wp} := (p \neq q \land a_p = a_q) ? \text{NULL} : a_p) \Rightarrow \).

tolerant\((p, a, v, 0, \_\,) = \text{write}(a, v)\).

tolerant\((p_t, a_t, v_t, p, n)\) = \(P^p:\text{MAR}[a]\)
\((p_t = p \lor a_t \oplus a \land p > 0) \Rightarrow \) \# conflict, value not changed
\(P^p:\text{MAR}[a]\) tolerant\((p_t, a_t, v_t, p - 1, n)\) \# no conflict
\(= (p_t = p \lor a_t \oplus a) \land p > 0) \Rightarrow \).

\# shw\(n, CW\_\text{COLLISION}\) = \(\bigwedge_{p=1}^{n} \text{shwr}(p, a_p, v_p) \text{write}(a_p, v_{wp})\)
\# = \(\exists (a_p, v_p), (a_q, v_q) \in \{(a_p^{\text{NULL}}, v_p)\} \mid (v_{wp} := (p \neq q \land a_p = a_q) ? \text{COLL} : v_{wp}) \Rightarrow \).

collision\((\_, a, v, 0, \_\,) = \text{write}(a, v)\).

collision\((p_t, a_t, v_t, p, n)\) = \(P^p:\text{MAR}[a]\)
\((p_t = p \lor a_t \oplus a \land p > 0) \Rightarrow \) write\((a_t, \text{COLL})\) \# conflict, write a collision symbol
\(P^p:\text{MAR}[a]\) collision\((p_t, a_t, v_t, p - 1, n)\) \# no conflict
\(= (p_t = p \lor a_t \oplus a) \land p > 0) \Rightarrow \).

\# shw\(n, CW\_\text{COLLISIONP}\) = \(\bigwedge_{p=1}^{n} \text{shwr}(p, a_p, v_p) \text{write}(a_p, v_{wp})\)
\# = \(\exists (a_p, v_p), (a_q, v_q) \in \{(a_p^{\text{NULL}}, v_p)\} \mid (v_{wp} := (p \neq q \land a_p = a_q) ? \text{COLL} : v_{wp}) \Rightarrow \).

collisionp\((\_, a, v, 0, \_\,) = \text{write}(a, v)\).

collisionp\((p_t, a_t, v_t, p, n)\) = \(P^p:\text{MAR}[a]\)
\(P^p:\text{ACC}[v]\) collisionp\((p_t, a_t, v_t, p - 1, n)\)
\((p_t = p \lor a_t \oplus a \land v_t = v \land p > 0) \Rightarrow \) \# conflict, same value, proceed
\(P^p:\text{MAR}[a]\) \# conflict, write a collision symbol
\(P^p:\text{ACC}[v]\) collisionp\((p_t, a_t, v_t, p - 1, n)\) \# no conflict
\(= (p_t = p \lor a_t \oplus a) \land p > 0) \Rightarrow \).

\# shw\(n, CW\_\text{ARBITRARY}\) = \(\bigwedge_{p=1}^{n} \text{shwr}(p, a_p, v_p) \text{write}(a_p, v_{wp})\)
\# = \(\exists (a_p, v_p), (a_q, v_q) \in \{(a_p^{\text{NULL}}, v_p)\} \mid (v_{wp} := (p \neq q \land a_p = a_q) ? \text{rand}([v_p^{\_	ext{vs}}]) : v_p) \Rightarrow \).
arbitrary(_,-,a,v,0,_) = write(a,v).

arbitrary(p,a_t,v_t,p,n)
= P^-p:MAR[a] arbitrary(p_t,a_t,v_t,p-1,n)
= [p_t ≠ p ∧ a_t ⊕ a ∧ p > 0]⇒ # conflict, choose v_t
= P^-p:MAR[a] P^-p:ACC[v] arbitrary(p_t,a_t,v,p-1,n)
= [p_t ≠ p ∧ a_t ⊕ a ∧ p > 0]⇒ # conflict, choose v
= P^-p:MAR[a] arbitrary(p_t,a_t,v_t,p-1,n)
= [p_t = p ∨ a_t ⊕ a ∧ p > 0]⇒. # no conflict

# shwr(n,CW_PRIORITY) = shwr(p,a_p,v_p,s_p) write(a_p,v_wp)
# = { a_p ≠ NULL, (a_p, v_p) ∈ (a_p, v_p, s_p) | (v_wp := (p ≠ q ∧ a_p = a_q)? min_p, s_p : v_p) } ⇒ .

priority(_,-,a,v,_,0, _) = write(a,v).

priority(p,a_t,v_t,s_t,p,n)
= P^-p:MAR[a] P^-p:ACC[v] P^-p:SIG[s] priority(p_t,a_t,v_t,s_t,p-1,n)
= [p_t ≠ p ∧ a_t ⊕ a ∧ s_t < s ∧ p > 0]⇒ # conflict, p has lower priority
= P^-p:MAR[a] P^-p:ACC[v] P^-p:SIG[s] priority(p_t,a_t,v,s,p-1,n)
= [p_t ≠ p ∧ a_t ⊕ a ∧ s_t > s ∧ p > 0]⇒ # conflict, p has higher priority
= P^-p:MAR[a] P^-p:ACC[v] P^-p:SIG[s] priority(p_t,a_t,v_t,s_t,p-1,n)
= [p_t = p ∨ a_t ⊕ a ∧ p > 0]⇒. # no conflict

write(a,v) = [a ≠ NULL]⇒ a[v]; # write value v into shared memory
⇒⇒ . # not a memory write instruction

accr(p,v) = FE\_2 P^-p:ACC[v]. # accumulator read
accw(p,v) = FE\_2 ⇒ P^-p:ACC[v]. # accumulator write

sigr(p,v) = FE\_2 P^-p:SIG[v]. # signature register read
sigw(p,v) = FE\_2 ⇒ P^-p:SIG[v]. # signature register write

pcr(p,v) = FE\_2 P^-p:PC[v]. # program counter read
pcw(p,v) = FE\_2 ⇒ P^-p:PC[v]. # program counter write

regr(p,r,v) = FE\_2 P^-p:r[v]. # local register read
regw(p,r,v) = FE\_2 ⇒ P^-p:r[v]. # local register write

DR(p,r,v) = regr(p,r,v). # direct register (read)
IR(p,r_1,v) = regr(p,r,r_1) regr(p,r_1,v). # indirect register (read)
APPENDIX E: n-PRAM

\( \text{rm}(p, a, v) = \text{DR}(p, a, v) \) \# register modes
\[ = \text{IR}(p, a, v). \]

\( \text{irm}(p, v) = \text{IMM } v \) \# immediate mode and register modes
\[ = \text{rm}(p, -, v). \]

\( \text{DMR}(p) = \begin{cases} \text{FE} & 2 \\ \rightarrow P^p: \text{MAR}[m] \end{cases} \) \# direct memory read
\[ 3 \rightarrow P^p: \text{MAR}[^{\text{NULL}}]. \]

\( \text{IMR}(p) = \begin{cases} \text{FE} & 2 \\ P^p: r[m] \rightarrow P^p: \text{MAR}[m] \end{cases} \) \# indirect memory read
\[ 3 \rightarrow P^p: \text{MAR}[^{\text{NULL}}]. \]

\( \text{mrm}(p) = \text{DMR}(p) \)
\[ \rightarrow \text{IMR}(p). \]

\( \text{DMW}(p) = \begin{cases} \text{FE} & 3 \\ \rightarrow P^p: \text{MAR}[m] \end{cases} \) \# direct memory write
\[ 4 \rightarrow P^p: \text{MAR}[^{\text{NULL}}]. \]

\( \text{IMW}(p) = \begin{cases} \text{FE} & 2 \\ P^p: r[m] \rightarrow P^p: \text{MAR}[m] \end{cases} \) \# indirect memory write
\[ 3 \rightarrow P^p: \text{MAR}[m_1] \\
4 \rightarrow P^p: \text{MAR}[^{\text{NULL}}]. \]

\( \text{mwm}(p) = \text{DMW}(p) \)
\[ \rightarrow \text{IMW}(p). \]

\( \text{binary}(x, y, x + y) = \text{ADD}. \) \# addition
\( \text{binary}(x, y, x - y) = \text{SUB}. \) \# subtraction
\( \text{binary}(x, y, x \times y) = \text{MUL}. \) \# multiplication
\( \text{binary}(x, y, x/y) = \text{DIV}. \) \# division
\( \text{binary}(x, y, x \% y) = \text{MOD}. \) \# modulo
\( \text{binary}(x, y, x << y) = \text{SHIFT}. \) \# shift left
\( \text{binary}(x, y, x \& y) = \text{AND}. \) \# bitwise and
\( \text{binary}(x, y, x | y) = \text{OR}. \) \# bitwise or
\( \text{binary}(x, y, x ^{\text{t}} y) = \text{XOR}. \) \# bitwise xor

\( \text{monadic}(x, \log(x)) = \text{LOG}. \) \# logarithm
\( \text{monadic}(x, \text{not}(x)) = \text{NOT}. \) \# bitwise not

\( \text{arlog}(p, r) = \text{binary}(x, y, r) \text{irm}(p, y) \text{acrr}(p, x) \)
\[ = \text{monadic}(y, r) \text{irm}(p, y). \]

\( \text{load}(p, v) = \text{LOAD} \text{irm}(p, v) \)
\[ = \text{LOADINDEX} \text{sig}(p, v) \]
\[ = \text{LOADPC} \text{pcr}(p, v). \]
APPENDIX E: n-PRAM

toacc(p, v) = arlog(p, v) \implies accw(p, v)
  = load(p, v) \implies accw(p, v).

store(p, r, v) = STORE rm(p, r, \_ ) accr(p, v).
toreg(p, r, v) = store(p, r, v) \implies regw(p, r, v).

jump(p, v > 0, a) = JPOS irm(p, a) accr(p, v).
jump(p, v = 0, a) = JZERO irm(p, a) accr(p, v).
jump(p, TRUE, a) = JUMP irm(p, a). \# unconditional jump
jump(\_, TRUE, C) = HALT.

topc(p) = jump(p, cond, a) =\{ cond \} \Rightarrow pcw(p, a)
  = jump(p, cond, a) =\{ \neg cond \} \Rightarrow .

read(p) = READ mrm(p).
write(p) = WRITE mwm(p).

instr(p) = toacc(p, \_)
  = toreg(p, \_ , \_)
  = topc(p, \_)
  = read(p)
  = write(p).
Appendix F

Glossary

This glossary contains basic terminology for Extended Update Plans. Many terms have been adopted from the original work on Update Plans [60], others have been changed to accommodate for changes introduced in this thesis, and there is also a number of completely new terms exclusive to EUP.

alternatives, 14
  a series of update schemes; the first applicable scheme in the series is applied

ambidextrous archetype, 18
  a pair of archetypes of the same name, one left-handed, the other right-handed

anonymous sequence, 67
  a sequence with no sequential block identifier

applicable, 11
  an update rule is applicable if its left-hand side is consistent with the current configuration and its guard evaluates to true

application order, 68
  the order updates are applied to a configuration in a sequence

archetype, 15
  a macro-like mechanism

archetype expansion, 17
  the mechanism by which archetypes are expanded to give update schemes

canonical form, 70
  a sequential or a parallel block is in canonical form if it has only one type of updates—update schemes
casting, 14
  forcing the value of a term to be of a different type than the type globally declared for
  that term

cell, 10
  an element of configuration or memory

command, 13
  an update scheme in which both the left and right-hand sides are in command form

command archetype, 19
  an ambidextrous archetype whose name starts with a constant

command driven, 13
  an update plan in which all update schemes are commands

command form, 13
  a configuration containing a non-empty command sequence, or one in which the
  contents of the register PC are not specified

command sequence, 13
  a sequence of terms

configuration, 10
  a partial function from locators to values; a consistent set of locator expressions

consistent, 10
  a set of locator expressions is consistent if it does not specify conflicting contents for
  one and the same cell

context, 17
  the non-text part of a configuration, in particular in an archetype body

expansion, 17
  see archetype expansion; the text that actually replaces the archetype call

final configuration, 11
  a configuration to which none of the update schemes in an update plan are appli-
 icable

full applicability, 68
  a sequence is fully applicable if all of its constituent updates are applicable in the
  application order

ground term, 15
  a term for which a unique variable free value can be derived
guard, 10
   a condition for applicability

initial configuration, 11
   specifies the initial state of the memory before any update scheme is applied

left-handed archetype, 18
   an archetype having an empty right-hand side expansion

locator, 10
   an index to a memory

locator expression, 10
   a sequence of cells delimited on the left and right by a locator

memory, 11
   a function from locators to values

opcode, 13
   the first element of a command sequence

parallel block, 20
   a set of update schemes to be applied simultaneously

parallel block symbol (open), 20
   '{': indicates the start of a parallel block

parallel block symbol (close), 20
   '}': indicates the end of a parallel block

parameter resolution, 17
   the mechanism by which parameters of an archetype are rewritten to evaluable expressions

partial applicability, 68
   a sequence is partially applicable, if more than one but not all its constituent updates are applicable in the application order

program counter, 12
   the register PC

register, 12
   a constant locator

repeat, 12
   '~': indicates a repeat of the previous left-hand side
right-handed archetype, 18
an archetype having an empty left-hand side expansion

semi-ground term, 15
a term for which a finite number of variable free values can be derived

sequence, 67
a synonym for a sequential update scheme

sequencer, 67
a synonym for the sequential block identifier

sequential archetype, 78
an archetype whose body is a sequential block

sequential block, 67
a set of updates to be applied sequentially

sequential block symbol (open), 67
'(S)': indicates the start of a sequential block S

sequential block symbol (close), 67
'\)\)': indicates the end of a sequential block

sequential update scheme, 67
a top-level sequential block

stage, 67
one step of a sequence

stageless sequence, 67
a sequence with all of its stages left untagged

store, 10
a two-way countably infinite set of cells

store structure, 15
a regular expression over store names

synchroniser, 73
central update plan synchroniser (UP:SEQ), a constant used in the implementation of sequences

text, 17
a sequence of terms
textual expansion, 17
the replacement text of an archetype call, before parameter resolution

textual ordering, 69
the way text expanded in individual stages of a sequence is arranged

top-level, 69
an update plan's top-level update is also an item in the update plan

type alias, 14
a type declaration

type primitive, 14
a type name not appearing as the left-hand side of a type alias

update, 58
a parallel block, a sequential block or alternatives

update plan, 10
a set of updates, type and store declarations

update rule, 10
an update scheme which contains no variables and both its left and right-hand sides are self-consistent

update scheme, 10
consists of a left-hand side, a right-hand side (both configurations) and a guard

update script, 11
an update plan and an initial configuration