The ModulaTor

Oberon-2 and Modula-2 Technical Publication

Ubaye's First Independent Modula-2 & Oberon-2 Journal!

Nr. 2..5, Mar..Jun-1996 (revised 07-Sep-2009: added Chap. 12)

Porting the Oberon System to AlphaAXP

Copyright (1996-2009) by Günter Dotzel and Hartmut Goebel

Abstract

The Oberon System is an object-oriented programming framework supporting persistent objects and run-time extensibility. This paper describes the development of Alpha Oberon, an implementation of the Oberon System V4 for AlphaAXP under OpenVMS with X11 server. The port is based upon the ETH Zuerich's Oberon System for MIPS/Unix and ModulaWare's OpenVMS AXP stand-alone Oberon-2 compiler. The processor and operating system specific parts of the Oberon System and its boot-loader were rewritten in Oberon-2 from scratch. Details are provided for the module-loader, bootstrap-mechanism, garbage collector, OpenVMS' vs. Unix's system service support for exception- and stack-processing, procedure calling conventions, data alignment issues, and run-time data structures.

Keywords: Oberon, Alpha, AXP, OpenVMS, Compiler, OOP, Operating System, Exception handler, Garbage collector

0 Summary

This paper describes the development of Alpha Oberon, the Oberon System V4 for OpenVMS AXP with OSF/Motif (X11), with respect to the implementation of the module-loader, the bootstrap-mechanism, the garbage collector (GC), the exception-handler (EH), and the modifications to the stand-alone Oberon compiler.

1 Introduction

Oberon is a programming language as well as an operating system [Rei91], [Wir92]. The programming language Oberon is simpler than Modula-2, but supports object-oriented programming through extensible records and polymorphism. Oberon-2 [M~os91] adds type-bound procedures (methods) and dynamic arrays. The Oberon System provides an extensible but compact framework for Oberon-2, where all resources are used economically.

2 The Oberon System

The Oberon System is a single-process multitasking operating system with automatic garbage collection and dynamic module-loading, which allows run-time extensibility. See also [Rei91], [Rei92], [Wir88], and [Wir92].

The ETH Zuerich offers two versions of the Oberon System, version 3 and version 4. The design of V3 seems more powerful, because it is based entirely on persistent objects [Gut94]. However, the interfaces of V4 are stable, whereas V3 is still evolving. Moreover, the sources of V3 were not available at the start of the project, which ruled out starting with V3. There is, however, evidence from other implementors that migrating from V4 to V3 is not complicated.

Alpha Oberon is based on DECOberon, an implementation of Oberon System V4 for MIPS under Unix (DEC Ultrix). All processor- and platform-specific parts of the Oberon System, i.e. the module-loader, the boot-loader and the EH, were rewritten from scratch.

DECOberon uses X11 as graphical user interface which offers drawing functions, mouse- and keyboard-event processing needed by the Oberon System. Except for the foreign procedure type declarations, module X11 remained almost unchanged in Alpha Oberon.

In DECOberon, the file I/O interface and other Unix service-routine calls are located in module Unix. The interface of module Unix was kept essentially unchanged in Alpha Oberon, so that higher-level Oberon System modules didn't need any changes. The implementation of module Unix translates all Unix functions needed by the Oberon System into functionally equivalent OpenVMS calls. Those Unix services which don't have a direct equivalent in OpenVMS are emulated in module Unix. Such an emulation is not always trivial. For example, Alpha Oberon also supports reading OpenVMS variable-length record files, which involves a temporary file copy in Unix.Open.

Compared to other RISC processors, the AXP architecture [AXP] has some peculiarities: there are no symbolic references in the program code, there is no program status word, and there are 48 different modes for each floating-point instruction. The processor can execute multiple instructions in one cycle. In high-performance mode, it is not possible to locate the instruction that caused a trap (e.g. division by zero or access violation). The processor also requires a special EH mechanism and natural data alignment to avoid execution-time penalties.

3 The Alpha Oberon Compiler A2O

For other implementations of the Oberon System, a dedicated compiler was developed. For the Alpha Oberon port, a stand-alone, native-code Oberon-2 compiler was already at hand [Dot94]. During the development of Alpha Oberon only three small compiler errors were detected. A2O directly generates AXP machine code in OpenVMS object-file format. The A2O data sheet is available on ModulaWare's homepage (file /h2odat.txt).

Only a relatively small extension of A2O's compiler-backend (linker-interface) was necessary to generate Oberon load-files (OLF) instead of OpenVMS object-files (for stand-alone programs) and to support the GC (see chap. 7).

Unlike other implementations such as HP-Oberon [Sup94], no distinction has to be made between calling internal procedures (available within the Oberon System) and external procedures (shareable image), because A2O uses the OpenVMS AXP calling conventions [VMS2]. A2O generates identical code for the Oberon System and for stand-alone applications. Any run-time system (RTS) routine of A2O can easily be remapped to an external procedure, a feature which is used in the boot-loader (see chap. 5.1 and 6).

4 Concepts

This chapter summarises the concepts whose implementation details are presented in chapters 5 to 8.

4.1 Dynamic Module Loading

Dynamic module-loading means that a module is loaded only when it is needed. A module neither has to be present during system development, nor does it have to be declared. No separate link step is required; such an additional step would limit the extensibility of the system. If the user develops and compiles a module, this module is immediately available by loading it. The module-loader loads each module into the heap of the Oberon System, resolves all references and initializes the module body. If a module imports other modules, they are loaded as well, unless they are already located in the heap. This process runs recursively to make sure that all indirectly used modules are available. The so-called module-key guarantees the version consistency of the module interfaces.
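
The recursive loading scheme can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the Alpha Oberon loader: the table OBJECT_FILES, the key values and the names this_mod, loaded and init_order are hypothetical, and "loading" is reduced to creating a descriptor and recording the order in which module bodies would run.

```python
# Toy model of recursive module loading with key checking.
# OBJECT_FILES stands in for compiled modules on disk; all names are hypothetical.
OBJECT_FILES = {
    "Kernel":  {"key": 101, "imports": {}},
    "Files":   {"key": 202, "imports": {"Kernel": 101}},
    "Modules": {"key": 303, "imports": {"Kernel": 101, "Files": 202}},
}

loaded = {}        # module name -> descriptor of an already loaded module
init_order = []    # order in which module bodies were run


class KeyMismatch(Exception):
    """Raised when an importer was compiled against another interface version."""


def this_mod(name):
    """Return the loaded module `name`, loading it and its imports on demand."""
    if name in loaded:                       # already in the heap: reuse it
        return loaded[name]
    obj = OBJECT_FILES[name]
    for imp_name, expected_key in obj["imports"].items():
        imp = this_mod(imp_name)             # recursion loads indirect imports
        if imp["key"] != expected_key:
            raise KeyMismatch(f"{name} imports {imp_name} with wrong key")
    desc = {"name": name, "key": obj["key"], "refcnt": 0}
    for imp_name in obj["imports"]:
        loaded[imp_name]["refcnt"] += 1      # one more client of the import
    loaded[name] = desc
    init_order.append(name)                  # stands in for running the module body
    return desc
```

Calling this_mod("Modules") on an empty system loads Kernel first, then Files, then Modules, so every module body runs only after the bodies of its imports.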

4.2 Garbage Collection

The GC automatically collects all unused storage units so that they can be re-used for allocation. The GC recognizes a storage unit as unused if it is no longer referenced, directly or indirectly. The GC uses the so-called mark-sweep method: first all reachable objects are marked, then the heap is examined sequentially and the storage of all non-marked objects is released. This requires that the heap consists of one contiguous storage area. The boot-loader allocates this storage area from the operating system and makes it available to the Oberon System. It isn't necessary to initialize this (possibly large) storage area with NIL, because this is done at each allocation with Oberon NEW. As in most Oberon Systems which are built on top of an existing operating system, the GC is also responsible for closing all files. More details are presented in [Wir92], [Pfi91], [Tem91] and chap. 7.
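
The two phases of the mark-sweep method can be sketched in a few lines of Python. The model is an assumption made for illustration only: the heap is a list of Block objects in address order, pointers are Python references, and the names Block, mark, sweep and collect do not come from the Oberon sources.

```python
# Toy mark-sweep collector over a heap modelled as a list of blocks in address order.
class Block:
    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.refs = []        # outgoing pointers to other blocks
        self.marked = False


def mark(roots):
    """Mark phase: flag every block reachable from the roots."""
    stack = list(roots)
    while stack:
        b = stack.pop()
        if not b.marked:
            b.marked = True
            stack.extend(b.refs)


def sweep(heap):
    """Sweep phase: walk the heap sequentially and release unmarked blocks."""
    live, freed = [], 0
    for b in heap:
        if b.marked:
            b.marked = False  # reset the mark for the next collection
            live.append(b)
        else:
            freed += b.size
    return live, freed


def collect(heap, roots):
    """Run one full collection and return (surviving blocks, bytes released)."""
    mark(roots)
    return sweep(heap)
```

Because reachability is decided from the roots, two blocks that only reference each other are released as well; a reference-counting scheme would not reclaim such a cycle.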

4.3 Exception Handling

Programming errors (traps) have to be handled within the system; otherwise the system would terminate and control would return to the underlying operating system (if there is one). An example illustrates why an exception handler is needed. In a conventional operating system, an editor is activated by a command, which puts the computer into "edit" mode. If an error occurs during execution that isn't handled by the editor, the editor (program) terminates, and in most cases the modifications to the text are lost. In the Oberon System there is no edit mode: there is only a class 'Text' and a set of editing commands which operate on this class. Loading a text file is also just a command (Edit.Open), which loads the text and opens a so-called viewer to map part of that text to a display window. After execution of Edit.Open, control returns to the system's main loop, Oberon.Loop. If an error occurs during command execution that isn't handled by the command itself, control is likewise given back to the main loop. Thus the text is kept and remains editable.
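
The effect of returning to the main loop after a trap can be modelled in Python. This is a loose analogy, not the OpenVMS mechanism: a Python exception stands in for a hardware trap, and the names texts, trap_log, edit_open, faulty_command and loop are hypothetical.

```python
# Sketch of a command loop that survives traps raised inside commands.
texts = {}          # state owned by the system, not by any single command
trap_log = []       # names of the traps that were caught


def edit_open(name):
    """Stand-in for a command such as Edit.Open: load a text."""
    texts[name] = "contents of " + name


def faulty_command(name):
    """Modify a text, then trap before returning."""
    texts[name] += " (edited)"
    return 1 // 0            # simulated trap: division by zero


def loop(commands):
    """Run each command; an unhandled error returns control to the loop."""
    for cmd, arg in commands:
        try:
            cmd(arg)
        except Exception as trap:        # plays the role of the exception handler
            trap_log.append(type(trap).__name__)
```

The text modified by the faulty command stays in texts, and the commands that follow the trap are still executed.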

4.4 Bootstrap/Boot-loader

There are three module categories: core modules, system modules and tools. The bootstrap loads the Oberon System into the main storage of the process. There are two boot-phases: the first is under control of the boot-loader (module BootLoader) and the second under control of the Oberon core (module System). The boot-loader loads the code, reference and constant sections of all core modules into the heap. During the load, all references are resolved, the type descriptors for run-time type information are constructed and the global variable sections are reserved. Then the boot-loader transfers control to the body of the module at the top of the import hierarchy for initialisation. At this point, the Oberon System is initialized and the remaining system modules are loaded.

4.5 Extensibility to 64 Bit

The architecture of the original Oberon System is inherently designed for 32 Bit. In Oberon, the 32 bit dependency can be illustrated for example by the function SYSTEM.ADR, which is of type LONGINT (32 Bit). The Oberon System assumes at many places, that LONGINT has the same size as an address. Low-level modules like Kernel and the module-loader perform a lot of address computations using variables of type LONGINT. There are many 32 bit dependencies, which can't easily be detected and hence it seems impossible to have 64 bit addressing without also changing the pervasive type LONGINT to 64 bit integer.

The AXP processor has 64 bit addresses, but OpenVMS processes are currently restricted to a 32 bit virtual address space. The 64 bit address representation is the canonical sign extension of bit 31 into bits 32 to 63. This technique makes it possible to have compatible system services and data types on VAX and AXP, which eases migration to AXP. DEC is currently extending OpenVMS to allow a 64-bit virtual process space.
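
The canonical form can be made concrete with a short Python sketch. The function names to_canonical64 and is_canonical are hypothetical; the code only illustrates the sign extension of a 32 bit address to 64 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_canonical64(addr32):
    """Sign-extend a 32-bit address: copy bit 31 into bits 32..63."""
    addr32 &= MASK32
    if addr32 & 0x80000000:                  # bit 31 set: fill the upper half with ones
        return addr32 | (MASK64 ^ MASK32)
    return addr32                            # bit 31 clear: upper half stays zero


def is_canonical(addr64):
    """True if bits 32..63 of addr64 all equal bit 31."""
    return to_canonical64(addr64 & MASK32) == (addr64 & MASK64)
```

For example, the 32 bit address 80000000 (hex) becomes FFFFFFFF80000000 (hex), while 7FFF0000 (hex) is unchanged apart from leading zeros.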

One of the goals of Alpha Oberon was to prepare the low-level modules (module-loader, GC) for 64 bit addresses. This was done by introducing two types: ADDRESS = LONGINT, for variables which contain a 32 bit address, and ADDRESS64 = SYSTEM.SIGNED_64 (the 64-bit integer type of A2O), for variables which already contain a 64 bit address in canonical form. The heap data structures of Alpha Oberon are already prepared for 64 bit: where possible ADDRESS64 was used, otherwise fields of type ADDRESS are padded with fill-bytes.

5 The Module-Loader

A2O can generate OpenVMS object-files, which allow bindings to so-called shareable images. This feature makes it possible to connect at run-time to procedures that are not known at compile time. This is done with an operating system service (LIB$FIND_IMAGE_SYMBOL) via a symbol name, e.g. "Module.Procedure".

Using shareable images to implement the Oberon System's module-loader would make it possible to have a common object-file format for stand-alone and embedded Oberon programs. But this possibility turned out to be impractical for several reasons. Therefore the A2O back-end (linker interface) was extended to allow the generation of syntactically simple Oberon load-files (OLF).

At the instruction level, the generated code is identical for both formats. The reference section is of course different. What the OpenVMS linker resolves at link time to produce a static linkage section, as required by the so-called base pointer architecture, has to be done by the module-loader at run-time, with identical storage layout. Identical mapping is also required for the storage sections of constants and variables, and for the program code itself. Constants and program code are copied one-to-one from the OLF into the heap. Storage for the variable section is allocated in the heap.

An OLF contains all information, necessary for module binding by the module-loader: imported modules, exported procedures and commands. The GC needs pointer-offsets and the exception- and trap-handler (post-mortem dump viewer) needs the symbolic references (RefBlk; see [Wir92]). For each loaded module a structure ModuleDesc is allocated to store its descriptor:



  Name* = ARRAY 32 OF CHAR;
  Module* = POINTER TO ModuleDesc;
  ModuleDesc* = RECORD
    next-: Module;
    refcnt-: LONGINT;
    key: ModuleKey;
    imports: POINTER TO ARRAY OF (* Module *) ADDRESS;
    linkage: LinkageSectionPtr;
    procDescs-: POINTER TO ARRAY OF ProcDesc;
    data-, const, code-: POINTER TO ARRAY OF LONGINT;
    entries: POINTER TO ARRAY OF INTEGER;
    cmds-: POINTER TO ARRAY OF Cmd;
    ptrTab: POINTER TO ARRAY OF ADDRESS;
    tdescs-: POINTER TO ARRAY OF (* Kernel.Tag *) ADDRESS64;
    refs-: RefBlock;
    rtsFlags: SET;
    name-: Name;
    init: BOOLEAN;
  END ;
 
The types ModuleKey, Cmd and RefBlock aren't described here, because they are not needed for understanding the presented concepts. The most important tasks of the module-loader are the generation of the linkage section and of the type descriptor section.

5.1 Linkage-section

Most CISCs and 32-bit RISCs call a procedure directly via an address whose immediate value follows the call instruction. The 64-bit AXP RISC processor does not have any linker-relocatable addresses in the program section. Each processor instruction has a fixed width of 32 bit and there is no room to store any 64 bit virtual addresses in the instruction format. Data is accessed relative to base pointers and procedures are accessed via a so-called procedure linkage pair. The called procedure finds its context and base pointers by a pointer to its own procedure descriptor [VMS2].

Data base pointers, linkage pairs and procedure descriptors are stored in the linkage-section. The references, all 64 bits wide, are resolved by the binder or loader. The procedure descriptor stores information about the start address of the procedure code (entry), the parameter types, the stack-frame size, and the EH, if present. A linkage-pair consists of the entry address of the procedure code and the address of its procedure descriptor.

Before a procedure is called, in each case the corresponding procedure descriptor has to be loaded into a reserved register. Only by means of its own procedure descriptor, a procedure can load the required base registers to address its own or foreign global variables, constants, objects, and procedures.

With A2O, the linkage-section of a module contains the pointers to its own sections (constant, data, objects, run-time system), the procedure descriptors of its own procedures, pointers to the sections of imported modules (constant, data, objects), linkage-pairs for the run-time system procedures, and linkage-pairs for all procedures used from the module itself and from external modules.

These data structures are defined and documented in module A2OLayout.



  LinkageSection = RECORD (* <modulename>_$link$ = modula2_$vector$ *)
    linkdata: RECORD
      data   : ADDRESS64; (* modula2_$data$ *)
      const  : ADDRESS64; (* modula2_$strings$ *)
      rts    : ADDRESS64; (* Modula2$RunTimeSystem_$data$ *)
      object : ADDRESS64; (* oberon2_$objects$ *)
      unused1: ADDRESS64;
      unused2: ADDRESS64;
    END;
    procDesc : ARRAY nofOwnProcs OF ProcDesc;
    impMods  : ARRAY nofImportedModules OF ImportedModuleData;
    rts      : ARRAY 16 OF ProcLinkage;
    own      : ARRAY nofOwnProcs OF ProcLinkage;
    imp      : ARRAY nofImportedProcs OF ProcLinkage;
  END;
  
Pseudo-definition of the linkage-section

OpenVMS defines different formats of procedure descriptors; more details are in [VMS2]. For the module-loader this definition of the so-called full-frame procedure descriptor is sufficient:



  ProcDescPtr = POINTER TO ProcDesc;
  ProcDesc = RECORD
    data1 : SYSTEM.QUADWORD;
    entry : ADDRESS64;
    data2 : ARRAY 2 OF SYSTEM.QUADWORD;
    handlerData : ARRAY 2 OF ADDRESS64;
  END;
  
  ProcDesc = data1:8 entry:Num data2:16 handlerEntry:Num handlerData:Num .
  
  Block(86X); i := 0; t := S.ADR(m.code[0]);
  WHILE i < nofOwnProcs DO
    Files.ReadBytes(R, m.procDescs[i].data1, 8);
    Files.ReadNum (R, adr); m.procDescs[i].entry := t + adr;
    Files.ReadBytes(R, m.procDescs[i].data2, 16);
    Files.ReadNum (R, adr); m.procDescs[i].handlerData[0] := adr;
    Files.ReadNum (R, adr); m.procDescs[i].handlerData[1] := adr;
    INC(i);
  END;

Definition of procedure descriptor, OLF-representation and read routine

The data sections data1 and data2 are each read as one block and copied into storage without interpretation. For entry, the OLF stores the offset within the code section; the loader adds the base address of the code section. handlerData is used for the EH and is currently always zero. The procedure descriptors are followed by the data pointers for the imported modules: one each for the variable (data) section, the constant section and the type descriptor (object) section. They are used whenever data of an imported module is referenced. Because all imported modules are already loaded, the pointers can be copied from their linkage-sections.
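
The relocation of the entry field can be sketched as follows. This Python model assumes that the fields have already been decoded from the OLF; the dictionary layout and the name relocate_proc_descs are hypothetical and only mirror the read routine shown above.

```python
# Relocating procedure descriptors: the load file stores `entry` as an offset
# into the code section; the loader adds the base address of the code section.
def relocate_proc_descs(code_base, raw_descs):
    """raw_descs: list of dicts with already decoded fields (hypothetical format)."""
    result = []
    for raw in raw_descs:
        result.append({
            "data1": raw["data1"],                    # 8 bytes, copied uninterpreted
            "entry": code_base + raw["entry_offset"], # offset -> absolute address
            "data2": raw["data2"],                    # 16 bytes, copied uninterpreted
            "handlerData": (raw["handler0"], raw["handler1"]),
        })
    return result
```

Only entry depends on where the code section was placed in the heap; the other fields are position independent in this model.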



  ImportedModuleData = RECORD
    data  : ADDRESS64;
    const : ADDRESS64;
    object: ADDRESS64;
  END;

  i := 0;
  WHILE i < nofimp DO
    m1 := imports[i]; m.imports[i] := S.VAL(ADDRESS,m1);
    IF m1 # foreignMod THEN
      (* copy the section pointers from the imported module's linkage-section *)
      impMods[i].data := m1.linkage.linkdata.data;
      impMods[i].const := m1.linkage.linkdata.const;
      impMods[i].object:= m1.linkage.linkdata.object;
    END;
    INC(i);
  END;
 
The last part of the linkage-section contains the linkage-pairs. A linkage-pair has the following structure:


  LinkagePair = RECORD
    entryAdr : ADDRESS64;
    procDesc : ADDRESS64; (*ProcDescPtr;*)
  END;

Actually the item entryAdr is redundant, because it is also registered in the procedure descriptor. It serves to avoid an additional dereferencing during procedure call. Besides the redundancy, it also facilitates the construction of the linkage-pairs: only the address of the procedure descriptor is needed.


  PROCEDURE FillLinkagePair(VAR lp: LinkagePair; procDesc: ADDRESS);
  VAR pdesc: ProcDescPtr;
  BEGIN
    pdesc := SYSTEM.VAL(ProcDescPtr, procDesc);
    lp.entryAdr := pdesc.entry;
    lp.procDesc := procDesc;
  END FillLinkagePair;
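
The same construction can be modelled in Python to show why the descriptor address alone is sufficient. The dictionary memory, its addresses and the names fill_linkage_pair and call_target are hypothetical; they merely stand in for storage and for what a caller reads from a linkage-pair.

```python
# Model: `memory` maps the address of a procedure descriptor to its contents.
memory = {
    0x2000: {"entry": 0x10040},   # hypothetical descriptor at address 0x2000
    0x2030: {"entry": 0x10100},
}


def fill_linkage_pair(proc_desc_addr):
    """Build a linkage pair from nothing but the descriptor's address."""
    pdesc = memory[proc_desc_addr]
    return {"entryAdr": pdesc["entry"],   # redundant copy of the code address
            "procDesc": proc_desc_addr}


def call_target(pair):
    """What a caller needs: code address and descriptor address, no dereference."""
    return pair["entryAdr"], pair["procDesc"]
```

The redundancy is paid once, when the pair is filled; every later call reads both values directly from the pair.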
 
The linkage-pairs section is divided into three sub-sections: the linkage-pairs of the RTS-routines, of the own procedures, and of the imported procedures. The module-loader processes Oberon procedures and foreign language procedures differently. In A2O, foreign procedures are declared in so-called foreign interface modules. Like all external symbols, references to foreign procedures are resolved symbolically by procedure Unix.dllsym (see chap. 6.2) via an operating system service.

Perhaps the only 'hack' in Alpha Oberon is the installation of the storage allocation routine. O2NewCode is the only RTS-routine that is currently represented by an Oberon procedure. For stand-alone programs (OpenVMS linker), it is resolved directly to Storage.ALLOCATE. The module-loader 'patches' this entry to Kernel.ALLOCATE. This binding is done symbolically, by 'misusing' the RefBlk of module Kernel: a description of ALLOCATE is searched for in Kernel to resolve the reference. More information about RefBlk can be found in [Wir92].

This 'hack' is necessary to guarantee that, in the boot-loader, Kernel.ALLOCATE refers to the procedure in the subsequently loaded module Kernel. SYSTEM.VAL(ADDRESS, Kernel.ALLOCATE) would deliver a reference to the statically linked module Kernel, and the system wouldn't work with that procedure. Nor would the work-around for module Storage, as used for the storage management in the boot-loader itself, work with it.

The OLF does not contain data for the linkage-pairs of the module's own procedures. Their order corresponds to the order of the procedure descriptors, from which they are calculated.



  i := 0;
  WHILE i < nofOwnProcs DO
    FillLinkagePairP(ownLinkage, S.ADR(m.procDescs[i]));
    INC(i);
  END;
 
For the linkage pairs of the imported procedures the module number and the number 'num' (see below) of the export-entry are stored in the OLF. The module number specifies from which of the imported modules the procedure is derived. In the export list of this module, the first entry contains the number of the corresponding procedure descriptor. From that the linkage-pair is calculated.


  i := 0;
  WHILE i < nofImpProcs DO
    Files.ReadNum(R,num);
    IF num = -1 THEN
      Files.ReadString(R,name);
      Unix.dllsym(name, t); FillLinkagePairP(ownLinkage, t);
    ELSE
      m1 := S.VAL(Module, m.imports[num-1]);
      Files.ReadNum(R,num);
      FillLinkagePairP(ownLinkage, S.ADR(m1.procDescs[m1.entries[num]]));
    END;
    INC(i);
  END;
 
If the module number is equal to -1, this indicates a foreign procedure, which is to be referenced symbolically by its name (without module name). The symbol stored in the OLF is passed to Unix.dllsym, which resolves the symbol and returns its address. By definition, the value of a procedure symbol is identical to the address of the procedure descriptor. Thus the return value of Unix.dllsym can be used directly for the generation of linkage-pairs. More information about dllsym can be found in chap. 6.2.
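
The two cases of the loop above can be summarised in a small Python model. All data is hypothetical: the table foreign_symbols, the addresses and the names dllsym and resolve_import are assumptions made for illustration and do not describe the real Unix.dllsym interface.

```python
# Toy resolution of imported-procedure references.
FOREIGN = -1

foreign_symbols = {"XOpenDisplay": 0x7F000}   # name -> descriptor address (made up)


def dllsym(name):
    """Stand-in for the symbol lookup service: returns a descriptor address."""
    return foreign_symbols[name]


def resolve_import(ref, imports):
    """Return the procedure descriptor address for one import reference.

    ref is (FOREIGN, name) or (modnum, export_num), with modnum counted from 1.
    """
    modnum, item = ref
    if modnum == FOREIGN:
        return dllsym(item)                   # symbol value = descriptor address
    mod = imports[modnum - 1]
    desc_index = mod["entries"][item]         # export entry -> descriptor number
    return mod["procDescs"][desc_index]
```

In both cases the result is the address of a procedure descriptor, so the same routine can fill the linkage-pair afterwards.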

5.2 Type descriptor section

As described in chap. 7.3, the type descriptors have to be allocated in the heap individually. The layout corresponds to the structures used in DECOberon, however the addresses were expanded to 64 Bit. Here is the pseudo-definition of a type descriptor:



  TypeDesc = RECORD
    tag0     : ADDRESS64;
    tdSize   : INTEGER64;
    sentinel : INTEGER64;
    self     : ADDRESS64; (* points to recSize *)
    ext      : RECORD extlev, pad: LONGINT; END;
    module   : ADDRESS64; (* points to the types Modules.ModuleDesc *)
    name     : Name; 
    methods  : ARRAY nofMethods OF LinkagePair;
    keys     : ARRAY 32 (*=maxExts*) OF ADDRESS64;
    tag      : ADDRESS64; (* points to tdSize *)
    recSize  : INTEGER64;
    ptrs     : ARRAY nofPtrs+1 OF LONGINT;
  END;
 
The base address of a type descriptor is SYSTEM.ADR(recSize). It represents the tag of all records of this type. The fields tag0 to self correspond to the structure that is generated at the beginning of every allocated storage block; the only difference is that self points to recSize rather than to tdSize. tag and self connect the type descriptor with the type information used for persistent objects. The OLF contains all the information necessary to generate the type descriptors.


  TypeBlock = 89X maxIdentLen:Num {Type} -1:Num .
  Type      = recsize:Num nofmethods:Num nofptr:Num extLev:Num
              String BaseType Methods {offset:Num}*nofptr .
  BaseType  = [ -1:Num | modnum:Num offset:Num nofinhMeth:Num ] .
  Methods   = {methNum:Num entryNum:Num} -1:Num .

The storage requirement of a type descriptor is determined by nofmethods and nofptr; all other parts have constant size. The generated type descriptor must contain the keys of all base types, as well as the methods inherited from them. The type descriptor is simple to construct, because it differs from that of its direct base type only by one additional key and by the added or overridden methods. Consequently, the keys and methods of the base type can be copied and then completed. Because the base type always comes from the current module or from a directly imported one, it can be referenced simply by a number. nofinhMeth is the number of methods of the base type; it determines the size of the area to be copied. The corresponding section of LoadTypes is given below:


  Files.ReadNum(R, i); (* modnum *)
  IF i # -1 THEN (* inherit/copy data from basetype *)
    IF i = 0 THEN m1 := m; ELSE m1 := imports[i-1]; END;
    Files.ReadNum(R, i); S.GET(S.ADR(m1.tdescs[i]), i); (* base tdadr *)
    Files.ReadNum(R, j); j := j*AL.LnkPLSize + keySize + qwSize;
                         (* inherited methods + keys + tag *)
    S.MOVE(i-j, t-j, j-qwSize);
  END;
  S.PUT(t - (qwSize + (tdd.ext.extlev+1)*tagSize), t); (* implant own key *)
  p := S.VAL(ProcLinkageArrPtr, t-(qwSize+keySize+(nofmeth+1)*AL.LnkPLSize));
  LOOP
    Files.ReadNum(R, i); IF (i = -1) THEN EXIT; END;
    Files.ReadNum(R, j);
    FillLinkagePair(p[nofmeth-i], S.ADR(m.procDescs[j]));
  END;
 
After the base type data has been copied, the address of the type's own descriptor is written as a key into the key table. The last loop generates the linkage-pairs for the methods. Methods can only be defined in the module in which the type is defined; that's why only the method number and the number of the procedure descriptor are stored in the OLF. The linkage-pairs of the methods are stored in the type descriptor, because methods are called via the address of the type descriptor plus an offset. The procedure descriptors are in the linkage-section of the module. A2O also stores the associated linkage-pairs there, but they aren't needed.
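
The copy-then-complete construction can be modelled in Python. This is a sketch under assumptions, not the storage layout of Alpha Oberon: type descriptors are dictionaries, the key table is a list, and the function is_extension, which shows the usual purpose of such a key table (a type test by one comparison), is an addition for illustration that the text above does not describe.

```python
# Toy construction of a type descriptor from the descriptor of its base type.
MAX_EXTS = 32


def make_type_desc(name, base, new_methods):
    """base: descriptor of the direct base type or None.
    new_methods: dict method number -> procedure, added or overriding."""
    if base is None:
        keys, methods, extlev = [], {}, 0
    else:
        keys = list(base["keys"])            # copy the keys of all base types
        methods = dict(base["methods"])      # copy the inherited methods
        extlev = base["extlev"] + 1
    assert extlev < MAX_EXTS
    td = {"name": name, "extlev": extlev, "methods": methods}
    td["keys"] = keys + [td]                 # implant own key at index extlev
    methods.update(new_methods)              # add or overwrite methods
    return td


def is_extension(td, base):
    """Type test: compare the key stored at the extension level of `base`."""
    lev = base["extlev"]
    return lev < len(td["keys"]) and td["keys"][lev] is base
```

A type Circle derived from Figure that redefines method 0 keeps Figure's method 1 and replaces only method 0, while Figure's own descriptor is left untouched.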

6 The Bootstrap-Process

Because the module-loader runs within the Oberon System, the loader itself and all modules it uses must first be loaded into storage. Then the Oberon System has to be started so that it can boot itself. In most Oberon implementations the bootstrap works as follows: a special tool, the so-called boot-linker, combines the module-loader and all other required modules into a boot-file. The boot-file contains a direct memory image of these modules with their data sections, with all references in the code already relocated. The boot-loader requests the Oberon heap from the operating system, loads the boot-file into the heap and resolves the address references. Then it jumps to the entry point of the boot-file and the system boots. Only at this point is the heap initialized, in Kernel.Boot. Because the boot-file data forms one contiguous block, Kernel.Boot can recognize the end of the boot-file data and the beginning of free storage.

One disadvantage of this method is the code duplication caused by the boot-linker: it must contain the load routines, which are also in the module-loader. Also, the boot-linker must generate an exact image of the heap, by duplicating the storage management-routines from module Kernel. There is of course a good reason for having a boot-linker, namely cross development. Normally there is no stand-alone Oberon-2 compiler available on the target machine. The boot-loader is then written in another language for which a compiler is available.



    Target machine    Boot-loader language Module-loader language
    ____________________________________________________________
    Ceres NS32000     Assembler            Oberon
    Mac II            -                    MC68000 Assembler
    SPARCStation      -                    Modula-2
    DECstation        C                    Oberon
    RISC System/6000  -                    C
    Chameleon         Oberon               Oberon
    Mithril           Modula-2             Oberon
    Alpha AXP         Oberon               Oberon

Implementation language of the loader, extended from [Bra92]

Thanks to A2O, the Alpha Oberon bootstrap could be simplified. The boot-loader doesn't need a special boot-file; it reads each module as an OLF and therefore uses the same algorithm as the module-loader. Code duplication could be limited to a single procedure (BootLoader.LoadModule). But one problem remains to be solved: how do the modules get into the heap without leaving references to the boot-loader? Such references would confuse the GC.

The solution is to use the Oberon heap only and to cut all remaining references. A2O maps the functions NEW and SYSTEM.NEW to Storage.ALLOCATE, and therefore the boot-loader could use another storage module, which maps Storage.ALLOCATE to Kernel.ALLOCATE. So the boot-loader allocates and initializes the Oberon heap, and all demands for storage are satisfied by Kernel.ALLOCATE from there.



  PROCEDURE InitHeap; (* types are taken from Kernel *)
  TYPE
    FreePtr = POINTER TO RECORD
      (* off-8*) tag : ADDRESS;
      pad0: LONGINT; (* todo 64 bit: remove pad0 *)
      (* off0 *) size: LONGINT; (* field size aligned to 8-byte boundary,
                                   size MOD B = B-8 *)
      pad1: LONGINT; (* todo 64 bit: remove pad1 *)
      (* off8 *) next: ADDRESS;
      pad2: LONGINT; (* todo 64 bit: remove pad2 *)
    END ;
  VAR size, firstBlock, endBlock: LONGINT; rest: FreePtr;
  BEGIN
    heapSize := heapSize*1024*1024;
    heapAdr := Unix.Malloc(heapSize);
    
    firstBlock := heapAdr + (-heapAdr-8) MOD B;
    size := heapAdr + heapSize - firstBlock;
    DEC(size, size MOD B);
    IF size = heapSize THEN DEC(size,B) END; (* make room for rest^ *)
    endBlock := firstBlock + size;
    heapAdr := firstBlock; heapSize := size;
    (* save re-calculation in Kernel.Boot *)
    rest := S.VAL(FreePtr, firstBlock);
    rest.tag := S.VAL(LONGINT, S.VAL(SET, S.ADR(rest.size)) + free);
    rest.size := S.VAL(LONGINT, endBlock) - S.VAL(LONGINT, rest) - 8;
    rest.next := 0;
    Kernel.FindRoots := TheEmptyProc; (* there's nothing to collect *)
    Kernel.Boot;
  END InitHeap;
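
The address arithmetic at the beginning of InitHeap can be checked with a short Python sketch. The value B = 32 is an assumption, because the listing does not state the block granularity, and the name init_heap_bounds is hypothetical. Python's % operator yields a non-negative result for a positive divisor, which is the behaviour assumed here for Oberon's MOD.

```python
B = 32   # assumed block granularity; the listing does not state the value of B


def init_heap_bounds(heap_adr, heap_size):
    """Return (firstBlock, size) as computed at the top of InitHeap."""
    first_block = heap_adr + (-heap_adr - 8) % B   # afterwards first_block % B == B - 8
    size = heap_adr + heap_size - first_block
    size -= size % B                               # keep whole blocks only
    if size == heap_size:
        size -= B                                  # leave room at the end of the area
    return first_block, size
```

With this choice firstBlock + 8 is a multiple of B, so the field at offset 0 of the free block, which follows the 8-byte tag, lies on a block boundary.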
 
As soon as the boot-loader has obtained storage from the operating system, the storage is initialized as a single free block. Then Kernel.Boot is called, which in turn calls the GC. Because there are no objects in the heap yet, this merely builds the free-list. While module Unix is being loaded, the variable _Unix.dllsym is initialized with the procedure BootLoader.dllsym. Because _Unix.dllsym is the first variable in _Unix, this can be done without importing _Unix. Importing _Unix would not make sense, because the statically linked module would then be used rather than the dynamically loaded one.

After the system modules have been loaded into the heap, the heap also contains file buffers and other storage blocks. But the Oberon System should take over only the loaded modules. Removing the other blocks is done not by the boot-loader but by the system itself during boot-phase two. All references to the boot-loader have to be removed from the heap. "Module" is the only data type that should be taken over. Thus only the tags of the ModuleDesc records have to be patched to the corresponding tag in the Oberon System:



  PROCEDURE PatchTags;
  VAR m: Module; i: LONGINT; t: ADDRESS;
    td: POINTER TO RECORD filler: ARRAY 5 OF INTEGER64; name: Name END;

  BEGIN
    m := modules;
    WHILE (m # NIL) & (m.name # "Modules") DO m := m.next; END ;
    ASSERT(m # NIL);
    i := LEN(m.tdescs^);
    REPEAT
      DEC(i);
      t := SHORT(m.tdescs[i]); (* todo 64bit: remove SHORT *)
      S.GET(t - 8, td);
    UNTIL td.name = "ModuleDesc";
    ASSERT(t # 0);
    m := foreignMod;
    WHILE (m # NIL) DO
      S.PUT(S.VAL(LONGINT,m) - qwSize, t);
      m := m.next;
    END;
  END PatchTags;

The type 'ModuleDesc' is looked up among the type descriptors of 'Modules', using the information intended for persistent objects. Then the module list is traversed and the tag of each entry is overwritten with the newly determined value. The precondition is that 'Modules' has been loaded, directly or indirectly, so that the tag can be determined. This isn't a limitation, because 'Modules' implements the module-loader and is therefore an essential part of the Oberon System.

It was said above that all references to the boot-loader have to be removed, but strictly speaking Unix.dllsym must still point to BootLoader.dllsym. Because this reference is located in the variable section of a module and procedure variables aren't traced by the GC, it does not disturb the GC. At this point the initializing body of 'Modules' is called and boot-phase two starts.

6.1 The Context-Switch, Bootphase 2

When the body of 'Modules' gets control, it starts its initialization and calls Kernel.Boot. From there the GC is called and all modules are marked. Because the pointers in the variable sections of the modules were set to NIL, no further objects are marked. Thus all undesired blocks, e.g. the file buffers of the boot-loader, are released during the sweep phase. At the same time the free-list of the heap is built. The free-list of the statically linked module Kernel is located in the variable section of the boot-loader and is hence inaccessible.



  PROCEDURE Boot*; (* is called from Modules immediately after booting *)
    BEGIN
    IF ~ booted THEN booted := TRUE; (* avoid user call *)
      (* heap has been set up by boot-loader, so just get the values *)

      Unix.dllsym("heapAdr", S.VAL(ADDRESS, heapAdr));
      Unix.dllsym("heapSize", S.VAL(ADDRESS, heapSize));
      firstBlock := heapAdr;
      endBlock := firstBlock + heapSize;
      firstTry := TRUE;
      GCenabled := TRUE;
      GC(FALSE);
    END;
    Unix.Init;
  END Boot;

  Kernel.Boot
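
The effect of this first collection can be illustrated with a small mark-and-sweep model. The Python sketch below is a simplification under stated assumptions: blocks are objects with an explicit list of references, the modules are the only roots, and the names `Block` and `collect` are hypothetical. It is not the GC of Alpha Oberon.

```python
class Block:
    """A heap block with a name, a size and outgoing pointers."""
    def __init__(self, name, size, refs=()):
        self.name = name
        self.size = size
        self.refs = list(refs)
        self.marked = False


def collect(heap, roots):
    """Mark everything reachable from 'roots', then sweep the rest
    into a free-list. Returns (live_blocks, free_list)."""
    pending = list(roots)
    while pending:                     # mark phase
        block = pending.pop()
        if not block.marked:
            block.marked = True
            pending.extend(block.refs)
    live, free_list = [], []
    for block in heap:                 # sweep phase
        if block.marked:
            block.marked = False       # reset for the next collection
            live.append(block)
        else:
            free_list.append((block.name, block.size))
    return live, free_list
```

With the module blocks as roots and all their pointer variables set to NIL (an empty `refs` list), only the modules survive and everything else, such as a file buffer left behind by the boot-loader, ends up in the free-list.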
 
Now the heap contains only the data of the modules. Boot-phase two is completed by initializing the remaining modules which were loaded. Boot-phase three is started by 'Modules', as shown by the following sequence:


  VAR modPtr, cmdPtr: POINTER TO RECORD name: Name END;
  ...
  Unix.dllsym("modPtr", S.VAL(ADDRESS, modPtr));
  Unix.dllsym("cmdPtr", S.VAL(ADDRESS, cmdPtr));
  loop := ThisCommand(ThisMod(modPtr.name), cmdPtr.name)(*default: Oberon.Loop*)
 
modPtr and cmdPtr are pointers to character strings, which are specified by the boot-loader command qualifiers /Module[="Oberon"] and /Command[="Loop"]. With these defaults, module Oberon is loaded and its body is executed. The body calls module Configuration; control then returns to module 'Modules' and passes to Oberon.Loop, which waits for input.

6.2 Resolving foreign symbols: dllsym

Some problems remain to be solved: How does the reloaded module Kernel come to know where the heap starts and how large it is? How can the options of the boot-loader be transferred to the System? How can the module-loader reference system services?

This is done by BootLoader.dllsym. As mentioned in chap. 4.4, the procedure variable dllsym of the reloaded module Unix is assigned that procedure. This establishes the connection from the Oberon System to the outside world. dllsym transfers all required data to the Oberon System, so it is not necessary to reserve further variables in the Oberon System for the boot-loader. Instead, the System requests data from the boot-loader. This minimal interface makes the System more robust against changes:



  PROCEDURE dllsym * (name: ARRAY OF CHAR; VAR res: ADDRESS);

   PROCEDURE ResSym(pat, library: ARRAY OF CHAR): BOOLEAN;
   VAR i, len: LONGINT;
   BEGIN i := 0; len := LEN(name); IF LEN(pat) < len THEN len := LEN(pat) END;
     (* check the bounds first; & evaluates its operands from left to right *)
     WHILE (i < len) & (pat[i] # CHR(0)) & (name[i] = pat[i]) DO INC(i) END;
     IF pat[i] = CHR(0) THEN
       IF ODD(lib.LIB$FIND_IMAGE_SYMBOL(library,name,res)) THEN RETURN TRUE END;
     END;
     RETURN FALSE
   END ResSym;

  BEGIN
    res := 0;
    (* resolve our symbols first *)
    IF name = "heapAdr" THEN res := S.VAL(ADDRESS,heapAdr);
    ELSIF name = "heapSize" THEN res := S.VAL(ADDRESS,heapSize);
    ...
    ELSE (* foreign symbol *)
      IF ResSym("SYS$","SYS$SSISHR")
      ...
      OR ResSym("","LIBRTL")
      THEN (* okay *)
      ELSE
        Console.Str("Error: Can't resolve symbol ");
        Console.Str(name); Console.Ln; HALT(20);
      END;
    END;
  END dllsym;
 
The procedure receives the name of the symbol to be resolved. If the symbol is defined by the boot-loader, for example the symbol heapAdr, its value is returned directly. Otherwise, the operating system service LIB$FIND_IMAGE_SYMBOL tries to resolve the symbol. To find out which shareable image contains the symbol, the beginning of the symbol name is compared with a pattern; e.g., all symbols that start with "SYS$" are searched for in the shareable image SYS$SSISHR.EXE. If no pattern matches, the symbol is searched for in LIBRTL.EXE, X11 and other shareable image libraries.
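
The lookup order described above can be sketched in a few lines of Python. This is a model only: `lookup` is a hypothetical stand-in for LIB$FIND_IMAGE_SYMBOL that returns an address or None, and the addresses used in the usage example are made up.

```python
def resolve(name, own_symbols, prefix_table, lookup):
    """Return the address for 'name' or raise LookupError.

    own_symbols:  symbols defined by the boot-loader itself
    prefix_table: list of (prefix, image) pairs, tried in order
    lookup:       callable (image, name) -> address or None
    """
    if name in own_symbols:                  # resolve our symbols first
        return own_symbols[name]
    for prefix, image in prefix_table:       # then try the foreign images
        if name.startswith(prefix):
            address = lookup(image, name)
            if address is not None:
                return address
    raise LookupError("Can't resolve symbol " + name)


# usage with made-up image contents and addresses
images = {"SYS$SSISHR": {"SYS$PUTMSG": 0x1000},
          "LIBRTL": {"LIB$SIGNAL": 0x2000}}
table = [("SYS$", "SYS$SSISHR"), ("", "LIBRTL")]
address = resolve("LIB$SIGNAL", {"heapAdr": 0x10000}, table,
                  lambda image, name: images[image].get(name))
```

The empty prefix matches every name, so the last table entry acts as the fallback image, just as `ResSym("","LIBRTL")` does in the listing.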

7 The Garbage Collector

During development it turned out that the system would not work properly without the GC. As described in [Bra92], the GC is responsible for closing files, because the Oberon System has no way to close files physically, i.e., at the level of the underlying file system. Without the GC, a file couldn't be opened a second time after it was closed. The GC of DECOberon, which is itself based on SPARC-Oberon [Tem91], was used as the base. This GC is able to follow objects on the stack. This is important, because the GC is also called implicitly, e.g. when a file is opened but storage is insufficient.

The original GC of Oberon System V1 didn't need to search for objects on the stack, because it wasn't called implicitly but only as a command, i.e., when the stack was empty. To follow pointers in dynamically allocated arrays, a special type descriptor has to be used.



    Target machine   GC language        search on stack  follows dyn. arrays
    ___________________________________________________________________
    Ceres NS32000    NS32000 Assembler  no               no
    Mac II           MC68000 Assembler  no               no
    SPARCStation     Modula-2           yes              no
    DECstation       Oberon             yes              yes
    RISC System/6000 Oberon             yes              yes
    Chameleon        Oberon             yes              yes
    Mithril          Modula-2           yes              yes
    Alpha AXP        Oberon             yes              yes

Implementation language of the GC and search strategies, expanded from [Bra92]

The GC needs compiler support, i.e., for the alignment of record elements, storage allocation, and the layout of type descriptors. The GC must also be adapted to the particular heap structures, i.e., to the address extension to 64 bit.

7.1 Alignment of Record-Elements

The GC also has to follow objects on the stack. Therefore the stack is scanned in 4 byte increments, and the data retrieved is passed to the GC as pointer candidates (see also [Szy92]). Because local variables are 8 byte aligned, all local pointer variables are found. A pointer that is an element of a local record variable, however, could not be found reliably, because record elements were aligned at byte level. The alignment of record elements was therefore changed so that pointers within a record are always aligned to a 4 byte boundary and hence can be found by the GC.
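
Why alignment matters for such a scan can be shown with a short Python model. The sketch assumes 4 byte little-endian pointer words and a made-up heap range; `pointer_candidates` is a hypothetical name and the code is not taken from the Alpha Oberon GC.

```python
import struct


def pointer_candidates(stack, heap_start, heap_end, step=4):
    """Scan 'stack' (bytes) in 'step'-byte increments and return
    (offset, value) for every 4 byte word inside [heap_start, heap_end)."""
    found = []
    for offset in range(0, len(stack) - 3, step):
        (value,) = struct.unpack_from("<I", stack, offset)
        if heap_start <= value < heap_end:
            found.append((offset, value))
    return found


HEAP_START, HEAP_END = 0x20000, 0x30000       # made-up heap range

# frame with one aligned pointer and one byte-aligned record field
frame = bytearray(24)
struct.pack_into("<I", frame, 8, 0x20010)     # found: offset is a multiple of 4
struct.pack_into("<I", frame, 13, 0x20ABC)    # missed: straddles two scan steps
missed = pointer_candidates(frame, HEAP_START, HEAP_END)

# same record field after aligning it to a 4 byte boundary
aligned = bytearray(24)
struct.pack_into("<I", aligned, 8, 0x20010)
struct.pack_into("<I", aligned, 16, 0x20ABC)
both = pointer_candidates(aligned, HEAP_START, HEAP_END)
```

In the first frame the scan reports only the pointer at offset 8; the one at offset 13 is never read as a whole word. After alignment, both pointers appear as candidates.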

7.2 Storage-Allocation

The GC requires information about the size of each object in memory. This information has to be stored at allocation time. In DECOberon this is done by three procedures in module Kernel: one for records, one for dynamic arrays, and one for SYSTEM.NEW. As mentioned above, A2O maps NEW and SYSTEM.NEW to a single RTS routine. Changing A2O to call different allocation routines would have been too much work. To avoid that change, the compiler was modified to generate inline code that stores the data required by the GC.

The inline code has one further advantage: The compiler can generate a tight loop (using loop-unrolling) to clear the allocated memory, so that pointer variables are initialised to NIL, which is required by the GC.
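
What the allocation code has to do can be sketched as follows. The block layout used here (a single 8 byte size word in front of the payload, 8 byte granularity) is an assumption made for the illustration and not the documented Alpha Oberon heap format; the function names are hypothetical.

```python
import struct

HEADER_SIZE = 8   # assumption: one 64-bit word in front of each block


def allocate(heap, free_ptr, size):
    """Carve a block out of 'heap' (a bytearray) at 'free_ptr'.
    Stores the size for the GC and clears the payload, so that every
    pointer field starts out as NIL (zero).
    Returns (payload_address, new_free_ptr)."""
    size = (size + 7) & ~7                       # round up to 8 bytes
    end = free_ptr + HEADER_SIZE + size
    if end > len(heap):
        raise MemoryError("heap exhausted")
    struct.pack_into("<q", heap, free_ptr, size)     # size word for the GC
    heap[free_ptr + HEADER_SIZE:end] = bytes(size)   # clear the payload
    return free_ptr + HEADER_SIZE, end


def block_size(heap, address):
    """Read back the size stored in front of the block at 'address'."""
    (size,) = struct.unpack_from("<q", heap, address - HEADER_SIZE)
    return size
```

Clearing matters because a collector that traces pointer fields must never see stale bit patterns from a previously freed block in a freshly allocated one.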

7.3 Layout of type descriptors

In the stand-alone version, A2O generated the static type descriptors sequentially in the type descriptor section, followed by the area for the support of persistent objects. The type descriptors were accessed by base (type descriptor section) + offset (type). For the Oberon System, however, the type descriptors have to be allocated individually, because objects that need a descriptor may still be in the heap even though the corresponding module has already been released (see [Wir92]). Therefore the layout of the type descriptor section was changed: it now starts with an array of pointers to the actual type descriptors, and the type descriptors follow.

7.4 GC and 64-bit address extension

The Alpha AXP processor uses 64-bit addressing. OpenVMS AXP is said to be available with a 64-bit extension from spring 1996. At the moment each address managed by the operating system has 32 bit, sign-extended to 64 bit. AOS (and also A2O) is designed to allow the 64 bit address extension. In particular, the heap structures start at 8 byte addresses. Naturally, the GC is highly optimized for speed; although it is written in Oberon, it was difficult to understand and modify, because it uses low-level operations.
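
The relation between a 32 bit address and its sign-extended 64 bit form can be written down in a few lines. The Python sketch below is purely illustrative; the function names are hypothetical, and `shorten` only mirrors the effect that SHORT has on such an address in the listings of this paper.

```python
def sign_extend_32(address):
    """Return the 64-bit pattern of a 32-bit address extended by its sign bit."""
    address &= 0xFFFFFFFF
    if address & 0x80000000:                 # bit 31 set: fill the upper half
        return address | 0xFFFFFFFF00000000
    return address


def shorten(address64):
    """Drop the upper 32 bits again."""
    return address64 & 0xFFFFFFFF


low = sign_extend_32(0x7FFF0000)     # upper half stays zero
high = sign_extend_32(0x80000000)    # upper half becomes all ones
```

As long as every address is a sign-extended 32 bit value, shortening and extending are exact inverses. Once real 64 bit addresses appear, shortening loses information, which is why such conversions have to be removed for the 64-bit extension.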

8 Exception Handling

As already mentioned in chap. 4.3, traps should be handled within the Oberon System. There are two classes of traps: (1) those raised by the Oberon run-time system (e.g. array index out-of-range) and (2) those raised by OpenVMS (e.g. access violation, device full). A2O reports class 1 errors by a call to the operating system service LIB$SIGNAL, so that they can be processed like class 2 errors.

Exceptions are normally handled by the command interpreter. The OpenVMS message utility [VMS4] translates the error numbers to textual error messages with the SYS$PUTMSG service, which outputs the message. The EH of Alpha Oberon also uses the OpenVMS message utility, but specifies a call-back routine as an optional parameter of SYS$PUTMSG to receive the message. Thus error messages are displayed in an ordinary Oberon text viewer called System.Trap.

In contrast to other ports of the Oberon System, this has the advantage that there is no need to deal with message generation: it is not necessary to know which errors can occur and what the associated messages are. This allows private shareable images (dllsym) to raise their own exceptions, and the compiler's run-time errors to be extended, without any modification to module System.

After displaying the error message, the EH generates a list of procedures in the call sequence (procedure trace). The EH also displays the values of local variables according to their corresponding data type. This corresponds to the "facility for symbolic debugging" as described in [Wir92].

What happens if a trap occurs before the EH is installed by module System? The trap would be handled by the handler of the command interpreter. If the trap occurred after control was transferred to the Oberon System, there would be no possibility of locating the trap. Thus the boot-loader installs its own handler, which outputs the call trace on the console.

The actual EH System.Trap is installed twice: (1) in the initialization part of module System and (2) in Oberon.Loop. The first installation catches traps during the boot process. Because the handler is installed as a so-called stack-frame handler, it is removed again when the initialization body of module System is left. Thus Oberon.Loop installs it again, because control shall be transferred back to Oberon.Loop if a trap occurs.

8.1 Walking the call-chain

Processing the call-chain is highly system-dependent. Unix and most other operating systems require fiddling with the stack pointer. Under OpenVMS AXP this can be done by using two services to get the invocation context (INVO_CONTEXT) of the current procedure and all its predecessors:



  PROCEDURE LIB$GET_CURRENT_INVO_CONTEXT (VAR invoContext$N: Invo_Context);
  PROCEDURE LIB$GET_PREV_INVO_CONTEXT (VAR invoContext$N: Invo_Context): BOOLEAN;
 
LIB$GET_CURRENT_INVO_CONTEXT fills the parameter invoContext with the context of the currently active routine. For a given context K of a procedure P, LIB$GET_PREV_INVO_CONTEXT delivers the context K' of the calling procedure P'. Thus the remaining call-chain becomes shorter by one context with each call. If such a context K' doesn't exist, the system service returns FALSE. (Syntax note: the suffix $N in the procedure definition prevents A2O from passing the Oberon type tag of the record type Invo_Context to the system service.)


 TYPE
    InvoContext = RECORD
      length: LONGINT;
      flags: SET; (* upper 8 bits are version byte *)
      procDesc: ADDRESS64;
      progCnt: ADDRESS64;
      procStatus: SYSTEM.QUADWORD; (* 64 bit flag field *)
      iReg: ARRAY 31 OF SYSTEM.SIGNED_64;
      fReg: ARRAY 31 OF SYSTEM.QUADWORD;
    END;
 
For the call-trace, only the procedure descriptor 'procDesc' and the frame-pointer 'iReg[29]' from this structure are used. Without the above-mentioned system services, however, it would be more complicated to obtain both values. By using the system services, the EH of the boot-loader consists of only a few lines:


  EX.LIB$GET_CURRENT_INVO_CONTEXT(invoContext);
  WHILE ~ OutProcName(SHORT(invoContext.procDesc))
        & EX.LIB$GET_PREV_INVO_CONTEXT(invoContext) DO END;
  WHILE EX.LIB$GET_PREV_INVO_CONTEXT(invoContext) 
        & OutProcName(SHORT(invoContext.procDesc)) DO END;
  RETURN EX.SS$_RESIGNAL;
 
OutProcName displays the name of the procedure belonging to the INVO_CONTEXT. If the procedure isn't found, OutProcName returns FALSE. The first while-loop walks the call-chain until a procedure known to the Oberon System is found. The second while-loop executes as long as the procedures are known to the Oberon System; it terminates when an unknown procedure is found.
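
The interplay of the two loops can be replayed with a small Python model. In this sketch the call-chain is simply a list of procedure names, ordered from the current procedure outwards, and `known` is the set of procedures the Oberon System can name. All names, including the procedure names in the usage example, are made up for the illustration.

```python
def call_trace(call_chain, known):
    """Return the names that the two while-loops would print."""
    printed = []

    def out_proc_name(name):
        # models OutProcName: print the name if known, report success
        if name in known:
            printed.append(name)
            return True
        return False

    frames = iter(call_chain)
    current = next(frames, None)
    # first loop: skip contexts until a known procedure has been printed
    while current is not None and not out_proc_name(current):
        current = next(frames, None)
    # second loop: continue while the callers are known as well
    for current in frames:
        if not out_proc_name(current):
            break
    return printed


chain = ["Foreign.Raise", "Demo.Inner", "Demo.Outer", "Oberon.Loop",
         "BootLoader.Start", "Demo.Unreached"]
known = {"Demo.Inner", "Demo.Outer", "Oberon.Loop", "Demo.Unreached"}
trace = call_trace(chain, known)
```

The trace consists of the first contiguous run of known procedures: the unknown frame at the top is skipped, and the walk stops at the first unknown caller even if known procedures follow further out.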

Because the boot-loader has to terminate execution if an error occurred, the handler returns the status value SS$_RESIGNAL. Thus OpenVMS passes the exception to the next handler in the call-chain. In this way the EH of the command interpreter is reached and the boot-loader terminates. The symbolic information belonging to each procedure is searched for using its procedure descriptor. The symbolic name, type and offset data are contained in the RefBlk of the module. At the end of the EH the statement


 
  RETURN EX.SYS$UNWIND(mechArgs.depth,0);

transfers control to the procedure that installed the handler. The stack is reset, the saved register values are restored, and Oberon.Loop continues.

9 Conclusions

Despite its enormous functionality, the Oberon System is relatively easy to port. In a first step, after the bootstrap was done, it was quickly possible to do basic work with the System without GC and exception handler. But GC and EH are required to work properly with the System. Exactly these low-level parts make the port difficult, because they are hard to debug. The OpenVMS EH facilities proved to be a powerful instrument. Compared to Unix, the system services available under OpenVMS provide sufficient support for establishing exception handlers. Preparing for the 64-bit extensions also posed a challenge. The original sources frequently contained literal constants instead of named constants, which required many modifications all over the source text. A big success is that most extensions like Edit, Draw, the hypertext elements and many other applications ran after simple recompilation.

10 References

[AXP] Richard L. Sites (ed.): Alpha Architecture Reference Manual. Burlington, 1992.

[Bra92] Marc Brandis, et al: The Oberon System Family. Department Informatik, ETH Zuerich, Report No. 174 (1992)

[Dot94] Guenter Dotzel: Alpha AXP/OpenVMS Modula-2 and Oberon-2 Compiler Project. In: Peter Schulthess (ed.): Proceedings of the Joint Modular Languages Conference, University of Ulm, Germany, 28-30 September 1994. Universitätsverlag Ulm, 1994. An updated version of this paper is here: http://www.modulaware.com/max_sum.htm

[Gut94] Juerg Gutknecht: Oberon - Perspectives of Evolution. In: Peter Schulthess (ed.): Proceedings of the Joint Modular Languages Conference, University of Ulm, Germany, 28-30 September 1994. Universitätsverlag Ulm, 1994

[Kna94] Markus Knasmueller: Oberon Dialogs, User's Guide and Programming Interface. Institut fuer Informatik, Johannes Kepler Universität Linz, Report No. 1 (1994).

[M~os91] Hanspeter Mössenböck: The Programming Language Oberon-2. Department Informatik, ETH Zuerich, Report No. 160 (1991)

[M~os92] Hanspeter Mössenböck: Object Oriented Programming in Oberon-2. Springer Verlag, 1992.

[Pfi91] Cuno Pfister (ed.), et al: Oberon Technical Notes. Department Informatik, ETH Zuerich, Report No. 156 (1991)

[Rei91] Martin Reiser: The Oberon System, User Guide and Programmer's Manual. Addison-Wesley, 1991

[Rei92] Martin Reiser, Niklaus Wirth: Programming in Oberon, steps beyond Pascal and Modula. Addison-Wesley, 1992

[Sup94] Jacques Supcik: HP Oberon, The Oberon Implementation for Hewlett-Packard Apollo 9000 Series 700. Department Informatik, ETH Zuerich, Report No. 212 (1994)

[Szy92] Clemens A. Szyperski: Insight ETHOS: On Object-Orientation in Operating Systems. ETH Zuerich Dissertation (1992).

[Tem91] Josef Templ: Design and Implementation of SPARC-Oberon. Structured Programming, 15:12, 197-205. Dec. 1991,5

[VMS1] Digital Equipment Corporation: OpenVMS Programming Concepts Manual. Maynard, Massachusetts, 1994

[VMS2] Digital Equipment Corporation: OpenVMS Calling Standard. Maynard, Massachusetts, 1994

[VMS3] Digital Equipment Corporation: OpenVMS DCL Dictionary. Maynard, Massachusetts, 1994

[VMS4] Digital Equipment Corporation: OpenVMS Command Definition, Librarian, and Message Utilities Manual. Maynard, Massachusetts, 1993

[Wir88] Niklaus Wirth: The Oberon System. Department Informatik, ETH Zuerich, Report No. 88 (1988)

[Wir92] Niklaus Wirth, Juerg Gutknecht: Project Oberon, The Design of an Operating System and Compiler. Addison-Wesley, 1992

11 Acknowledgements

Thanks to Josef Templ for his hints concerning the GC and to A. Schuhmacher who helped with the translation from German into English.

12 More details

A more detailed description, which contains several illustrations, is given in the German description of this project: "64-Bit-Portierung des Alpha-Oberon-Systems und des Oberon-2-Compilers" by Hartmut Goebel


The ModulaTor Forum

Book recommendation

Michio Kitahara: "The entangled civilization: democracy, equality, and freedom at a loss", 369 pages. University Press of America, Lanham - New York, 1995 and Open Gate Press, London, 1995.

While reading this book, it became evident, that the author has read many good books on economy, political philosophy, history, socialism, statism, science, and psychology, while he lived in Japan, Europe and America. This book shows how vulnerable the Western civilisation got through socialism, how the self as an object in a collective setting is manipulated, that the cause for peoples' violent protest against nuclear power plants is based on egoist human thought.
When explaining how collectivism is emphasized at the expense of individualism, he writes on page 230:
"But the ironic point here is that collectivism is carried out on the basis of the individualistic perception of human behaviour without knowing or realizing this. This is another very important point in this book, and I would like to ask you to read the above sentence once again."
Unfortunately, I can't give you more quotes, because some visitor has stolen my copy of this book, which had a nice all black hard-cover. Explosive content without any journalistic hype: Must read!

Michio Kitahara: "The African Revenge: The Age of Regression and the Decline of the West", Columbus, Ohio: Pine Island Press, 1997.
Book description as provided by the author: By reflecting our evolutionary background, the structure of the human brain contains two primitive levels which deal with the basic existence of ourselves as animals, such as sex and territorial defense. On top of these, we have another level dealing with human characteristics, such as morality, ethics, reason, compassion, and the art of interhuman relations. Medieval Europeans were very much under the influence of the primitive parts of the brain. Along with the rise of the modern West, they learned to restrain them.
But the rise of the modern West also entailed colonialism and slavery. The Africans in America have been forced to suffer for centuries. There is now abundant scientific evidence that when humans experience hardship, adults become childish. When the hardship is extreme, humans tend to exist under the dominance of the two lower levels of the brain. As a result of their tragic past, the African-Americans have created a unique culture of their own, characterized by these tendencies.
This culture emphasizes sensuality, spontaneity, action, and emotions, which appeal to the more primitive aspects of human existence. For this reason, it is irresistible. African-American superstars in rock music, sports, and entertainment became the role models for everyone. But unfortunately, this culture is incompatible with the basic characteristics of the modern West, which emphasize logic, reason, rationality, and the restraint of emotions and spontaneity. The West is also being Africanized more and more in counterproductive ways, as seen in drugs, vandalism, violence, and crimes against persons. Western civilization's abuse of the Africans has boomeranged back upon itself.

About the Author: Michio Kitahara was born in Japan but received his Ph.D. from the University of Uppsala, Sweden. He has held teaching or research appointments at the Universities of Maryland, Michigan, and San Francisco, as well as the State University of New York at Buffalo. This is the third and final book of his trilogy on the rise of the modern West and its future, following "The Tragedy of Evolution" (1991) and The Entangled Civilization (1995). He currently lives in Sweden in order to study the fate of Scandinavian social democracy firsthand.


IMPRESSUM: The ModulaTor is an unrefereed journal. Technical papers are to be taken as working papers and personal rather than organizational statements. Items are printed at the discretion of the Editor based upon his judgement on the interest and relevancy to the readership. Letters, announcements, and other items of professional interest are selected on the same basis. Office of publication: The Editor of The ModulaTor is Guenter Dotzel; he can be reached by tel/fax: [removed due to abuse] or by mailto:[email deleted due to spam]

© (1996-2009) by modulaware.com
Webdesign by www.otolo.com/webworx, 09-Sep-2009