
The ModulaTor

Oberon-2 and Modula-2 Technical Publication

Ubaye's First Independent Modula-2 & Oberon-2 Journal! Nr. 81, Apr-2001


Oberon-2 from 32 bit to 64 bit -- without any programming language change

by Günter Dotzel, Apr-2001

Preface: This is an article I prepared in July 1999, but I decided to delay publication after the discussion on this topic became offensive in news:comp.lang.oberon at that time.

Definition is restriction. Thus language reports do not define the size of pervasive data types. This is the task of implementation notes.

A programming language standard defines neither the size, nor the internal representation, nor the storage allocation, nor the alignment of abstract whole number, real number, or any other data types. Any such specification would unnecessarily restrict compiler implementations on different processors and operating systems. In contrast, compiler implementation notes define such low-level details.

High-level programming languages provide an abstraction between the application programs and the hardware/operating system. The goal of a programming language specification is to provide this abstraction while keeping the programs' semantics independent of any specific compiler implementation. Application programs developed without assuming any implementation-specific details are portable.

For example, programs developed with portability in mind on 32 bit machines using a 32 bit compiler should compile without modification on a 64 bit machine with a 64 bit compiler.

Should this migration present any problems, this is mostly because the developer assumed a specific pervasive data type size or used low-level facilities. Relying on specific data type sizes, i.e., on the implementation notes, renders a program unportable. Low-level facilities should only be used in well-encapsulated, system dependent modules.

In the following, only Oberon-2 is considered. Oberon-2 is a modern, imperative, object-oriented programming language. Compilers for Oberon-2 are available for all popular processors and operating systems. A large number of free applications with complete source code are available.

Most compilers are 32 bit implementations. But there are also compilers for 64 bit machines (see "64 bit Oberon"), single-chip computers, and digital signal processors. Here is a summary of many available compiler implementations.

To clarify the term 64 bit Oberon-2 compiler, see What is a 64 bit compiler?

For example, a change in whole number data types from SIZE(LONGINT)=4 to SIZE(LONGINT)=8 only breaks lowest-level source code. The rest of the programs developed with 32 bit compilers, not using other implementation specific features, such as module SYSTEM, remain unchanged.

The Oberon System V4 is the largest Oberon-2 source base. V4 comprises an extensible programming system with a graphical user interface, a programming library, and many tools and applications.

When working on a 64 bit machine, one wants to use the flat 64 bit address space and 64 bit whole number arithmetic. This was the reason to construct 64 bit processors.

A 64 bit Oberon-2 compiler on a 64 bit machine thus requires that (1) pointers, addresses, and the pervasive type LONGINT are 64 bit wide, (2) the operating system provides a 64 bit storage allocation procedure, (3) 64 bit range constant integer expressions are allowed, (4) there are no 32 bit restrictions (e.g., in indexing arrays or in the size of structured data), and (5) the separate library and run-time system can handle 64 bit data types. 32 bit whole numbers are only needed where memory space considerations matter or where foreign language interfaces require 32 bit data types. To remain portable with existing source code, neither the language nor the library interface may be changed.

Library modules which contain procedures such as

PROCEDURE WriteInt (i, n: LONGINT);
PROCEDURE ReadInt (VAR i: LONGINT);

do not require any interface change, although the procedures' implementation will require some modification to provide for the larger 64 bit integer range.
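
As an illustration of such an implementation change, here is a minimal sketch of a WriteInt whose body makes no assumption about SIZE(LONGINT). The module name IntOut, the buffer length of 40 (enough decimal digits for LONGINT sizes up to 16 bytes) and the use of Out.Char from the Oakwood library are assumptions made for this example; it is not the code of any actual library. It relies on DIV and MOD rounding towards minus infinity, as defined in the Oberon-2 report.

```oberon
MODULE IntOut;  (* hypothetical module name, for illustration only *)
IMPORT Out;     (* Oakwood output module; assumed to be available *)

(* Write i right-justified in a field of at least n characters.
   No digit count and no MIN(LONGINT) string is hard-coded. *)
PROCEDURE WriteInt* (i, n: LONGINT);
  VAR buf: ARRAY 40 OF CHAR; k, d: LONGINT; neg: BOOLEAN;
BEGIN
  neg := i < 0; k := 0;
  IF neg THEN
    (* Peel off the last decimal digit before negating, so that
       MIN(LONGINT) is never negated and cannot overflow. *)
    d := (10 - i MOD 10) MOD 10;
    buf[k] := CHR(ORD("0") + d); INC(k);
    i := -((i + d) DIV 10)
  END;
  (* Collect the remaining digits, least significant first. *)
  WHILE (i # 0) OR (k = 0) DO
    buf[k] := CHR(ORD("0") + i MOD 10); INC(k);
    i := i DIV 10
  END;
  IF neg THEN buf[k] := "-"; INC(k) END;
  WHILE n > k DO Out.Char(" "); DEC(n) END;
  REPEAT DEC(k); Out.Char(buf[k]) UNTIL k = 0
END WriteInt;

END IntOut.
```

A client that calls IntOut.WriteInt(x, 0) needs no source change when LONGINT grows from 4 to 8 bytes; only the range of values that can be passed becomes larger.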

Product Terminology

A2O is a 64 bit native code Oberon-2 compiler for the OpenVMS Alpha operating system.

AOS is a 64 bit implementation of the Oberon System V4 on 64 bit Alpha OpenVMS.

OOC is an Oberon-2 to C translator for Unix.

The ISO Modula-2 library is written in Modula-2. To create stand-alone Oberon-2 applications with A2O, the ISO Modula-2 lib (32 bit and 64 bit implementation) is used.

Under AOS, the ISO M2 lib can only be used via a foreign language interface. Within AOS, M2 is a foreign language, just like C and Pascal. (If it were a goal to have the ISO Modula-2 lib under the Oberon System, it would have to be translated from Modula-2 to Oberon-2.)

AOS uses a special object file format which allows dynamic link load + metaprogramming + fine grained symbol files. The library of AOS is the Oberon System V4 API.

There is only one compiler (one source/one exe-file), called A2O, which is both,

  1. a stand-alone 32 bit compiler (* compatible with standard Oberon *)
    • with SHORTINT/INTEGER/LONGINT/SYSTEM.SIGNED_64 of size 1/2/4/8 bytes,
    • with 64 bit integer extensions and
    • LONGINT synonym for SYSTEM.SIGNED_32 and

  2. a stand-alone 64 bit compiler (* compatible with standard Oberon *)
    • with SHORTINT/INTEGER/SYSTEM.SIGNED_32/LONGINT of size 1/2/4/8 bytes,
    • with 32 bit integer extensions and
    • LONGINT synonym for SYSTEM.SIGNED_64.

A2O can generate 32 bit and 64 bit code (size of pointer/proc-var/LONGINT) for both (1) the OpenVMS linker and (2) the AOS linker/loader.

To compile a module, module Compiler.Compile calls A2O from AOS (Alpha Oberon System) only as a 64 bit compiler, because AOS is a 64 bit Oberon System.

AOS is a set of several hundred Oberon-2 modules which compile with A2O in either 32 bit or 64 bit mode without source code change. If you compile all modules in 32 bit mode, you get a 32 bit Oberon System.

According to the Oberon-2 report, SHORTINT/INTEGER/LONGINT are abstract whole number data types, only defined by the type inclusion SHORTINT <= INTEGER <= LONGINT. Data types with specific sizes are imported from SYSTEM.

LONGINT gets you the maximal integer range with any given compiler. If a program worked with a 32 bit compiler, it will also work with a 64 bit compiler, provided that the size of pointers and of SYSTEM.PTR matches the size of LONGINT and that the result type of ENTIER() and SYSTEM.ADR() is LONGINT.

If you follow these rules, no source code changes are required to existing programs, except where specific data type sizes are needed. These are imported from module SYSTEM.
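
A small sketch of what following these rules looks like in source code. The module name Portable and the record ForeignRec are hypothetical; SYSTEM.SIGNED_32 is the A2O-specific fixed-size type named in this article and is not part of the Oberon-2 report, so other compilers may offer a different name or none at all.

```oberon
MODULE Portable;  (* hypothetical module name, for illustration only *)
IMPORT SYSTEM;

TYPE
  (* Layout dictated by a foreign interface: the size is requested
     explicitly from SYSTEM (A2O-specific type name). *)
  ForeignRec* = RECORD
    id*, len*: SYSTEM.SIGNED_32
  END;

VAR
  adr, whole: LONGINT;  (* abstract type: 4 or 8 bytes, depending on the compiler *)
  x: LONGREAL;

BEGIN
  x := 12345.678D0;
  whole := ENTIER(x);    (* result type of ENTIER is LONGINT *)
  adr := SYSTEM.ADR(x)   (* result type of SYSTEM.ADR is LONGINT *)
END Portable.
```

Nothing in the module body depends on SIZE(LONGINT); only the record imported for the foreign interface pins down a size.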

If someone had reason to use the pervasive type SHORTINT, she knew about the range of this type and that it most probably matches the type size of CHAR and SYSTEM.BYTE.

If someone had reason to use INTEGER instead of LONGINT, she most probably knew that at least this data type size is needed to represent, for example, ORD(char).

All 32/64 bit compatible language extensions are described in "64 bit Oberon". Not a single language change that would invalidate existing source code was required.

Download the free AOS for OpenVMS Alpha

The port of the Oberon System V4 to the 64 bit Compaq Alpha under OpenVMS is one example of a successful 64 bit migration with more than 300 modules; it is described in the article entitled "64 bit Oberon".

To clarify the 64 bit Alpha Oberon System (AOS) port: No source change was required for 99% of all existing Oberon System V4 applications. Note, only 1% of all modules required changes -- not 1% of the source code. This is mostly because in some low-level modules, the programmers assumed that LONGINT would always be 32 bits wide. At the time when the Oberon System was developed, this was a practical, but nevertheless unnecessary assumption. Still, 99% of the modules simply compiled and worked without our even looking at the source code. (We would not have had the time to even look at the huge amount of source code already publicly available at this time.)
Only low-level modules needed some modifications. All these 32 bit dependencies were easily tracked down after being flagged at compile time; usually only a few source lines had to be changed. Such changes can be made in a way that the module still compiles with a 32 bit LONGINT size.
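
To give an idea of the kind of change involved, here is a hedged sketch of a low-level procedure in which a hard-coded stride of 4 bytes is replaced by SIZE(LONGINT). The module SumWords and the procedure Sum are invented for this illustration and are not taken from the Oberon System sources.

```oberon
MODULE SumWords;  (* hypothetical low-level module, for illustration only *)
IMPORT SYSTEM;

(* Sum n consecutive LONGINT values stored in memory starting at adr. *)
PROCEDURE Sum* (adr, n: LONGINT): LONGINT;
  VAR s, v: LONGINT;
BEGIN
  s := 0;
  WHILE n > 0 DO
    SYSTEM.GET(adr, v);
    s := s + v;
    (* was: INC(adr, 4), which silently assumes a 32 bit LONGINT *)
    INC(adr, SIZE(LONGINT));
    DEC(n)
  END;
  RETURN s
END Sum;

END SumWords.
```

The modified line compiles and behaves correctly with both a 4 byte and an 8 byte LONGINT, so the same source serves the 32 bit and the 64 bit mode.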

Our approach was validated by the successful port of a huge collection of applications within a few weeks.

Nevertheless, there are other proposals to provide for 64 bit extensions in Oberon-2:

One Oberon-2 compiler implementation (OOC) added a new pervasive integer type called HUGEINT. Its author even proposed this extension in an effort to standardize the Oberon-2 language and, even worse, tried to standardize the pervasive integer type sizes. But the HUGEINT approach is flawed because it changes the de-facto Oberon-2 language report. It

  1. introduces a new pervasive identifier HUGEINT in addition to SHORTINT/INTEGER/LONGINT,
  2. introduces SYSTEM.ADDRESS instead of SYSTEM.PTR,
  3. changes the function procedure result type of SYSTEM.ADR from LONGINT to SYSTEM.ADDRESS,
  4. introduces some magic mapping from either HUGEINT or LONGINT to SYSTEM.ADDRESS, depending on whether it is a 32 bit or a 64 bit implementation,
  5. requires an additional function procedure to convert from real to HUGEINT, assuming the author did not change the result type of the pervasive function procedure ENTIER, which is defined to be LONGINT,
  6. has the same problem as in (5) with the pervasive function procedure ASH (anyIntegerType, x), whose result type is LONGINT,
  7. has the same problem as in (5) with the first argument (memory address) of SYSTEM.GET, PUT, and BIT, which is of type LONGINT,
  8. has the same problem as in (5) with the three arguments of SYSTEM.MOVE (source, target address, and length), which are of type LONGINT,
  9. limits constant integer literals and expressions to the 32 bit LONGINT range (otherwise it would not be backward compatible, with HUGEINT being optional),
  10. makes array indexing with variables of type HUGEINT impossible with compiler implementations that do not support the type HUGEINT,
  11. does not offer seamless 64 bit migration, because library procedures such as WriteInt above can only accept actual parameters in the range of LONGINT, which is still 32 bits.

Such a HUGEINT approach would be appropriate for a 64 bit Oberon-2 extension, but not for a 64 bit Oberon-2 compiler (on a 64 bit machine).

I also checked the latest source of OOC (summer 1999): it is a 32 bit compiler, even when run on a 64 bit machine, because (1) the scanner cannot even parse 64 bit integer literals, (2) the library only provides 32 bit integer-to-string and string-to-integer conversion, and (3) all formal procedure parameters of type LONGINT must be converted to HUGEINT if they want to migrate to 64 bit. In addition, OOC does not have any number conversion from real to HUGEINT; the ENTIER() result type is always LONGINT. The compiler itself internally uses only LONGINT, which does not allow storing 64 bit constant integer values. OOC has 32 bit restrictions all over the place (size of structures, maximal index in array declarations, maximal index in dynamic arrays, size of local and global variables, size in SYSTEM.COPY, MOVE).

The compiler would have to use HUGEINT instead of LONGINT in most places. The resulting source of OOC would no longer compile on 32 bit machines, which proves that their 64 bit concept results in 32 bit incompatibility. MAX/MIN(HUGEINT) is even set to MAX/MIN(LONGINT) in the ANSI C back-end; there is no other (native code) back-end. OOC has a long way to go, because its HUGEINT extension does not allow seamless migration from 32 to 64 bit.

My observations lead me to conclude that they do not have any 64 bit experience, yet they want to standardize their implementation notes (concerning the size of pervasive integer types, SYSTEM.ADR(): SYSTEM.ADDRESS, the magic mapping from LONGINT or HUGEINT to ADDRESS, etc., apart from LONGCHAR, LONGCHR(), ...).

Note, this is the state of summer 1999; I don't know if OOC was modified in this respect since then.

Q&A

Q: What is a 32 bit restriction?

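A: You've got a 32 bit restriction if, for example, a module such as

MODULE AdrTest;
IMPORT SYSTEM;
VAR l: LONGINT;
BEGIN
  l := SYSTEM.ADR(l)
END AdrTest.

does not compile unchanged with both a 32 bit and a 64 bit implementation. Does it compile with both oo2c_32 (32 bit implementation) and oo2c_64 (64 bit implementation)?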

I guess it does not, because the result type of SYSTEM.ADR is SYSTEM.ADDRESS; on 32 bit systems this is an alias for LONGINT, on 64 bit systems an alias for HUGEINT. Later it was stated that SYSTEM.ADDRESS was not a synonym for HUGEINT. So are alias and synonym not identical? Still later it was clarified: "[in OOC], on systems with 32 bit pointers, ADDRESS is an alias for LONGINT, whereas on systems with 64 bit pointers, it is an alias for HUGEINT."

Anyway, this is a language change and it breaks existing code, because a new data type HUGEINT is needed to take advantage of 64 bit arithmetic/addressing.

And then, why are there two different compiler names for OOC?

A2O does not need such a distinction, because it is both (1) a 32 bit and (2) a 64 bit compiler without 32 bit restrictions; it's just a compilation switch. This is possible because A2O always runs on a 64 bit machine. (The 32 bit mode is no longer needed for AOS; it was only kept for stand-alone programs.) And if you want

This is what we did. It is simple. It has been proven. It works. It allows migration from 32 bit to 64 bit pointer/address/longint with maximal upward and backward compatibility. It does not require any change to the Oberon-2 language report; for example, the result type of SYSTEM.ADR() remains LONGINT. We are not pushing our language, because we did not make any language change to Oberon-2. (As for the language extensions: even without the new LONGSET type and the 64 bit hex literals which we introduced in A2O, all I said above is still valid.)

One ironic tragedy remains: If the majority adopted OOC as a de facto standard, it would create a lot of work to modify existing source code, which in turn could decrease the unemployment rate. ;-) And after all, the majority is always right. Right?


But OOCists really debated source code changes:

Q: "Why are you so against code changes? These modifications solve the problem once and for all."

A: Because the number of applications and tools that exist for the Oberon System is too large and life is too short to even look at them. I compiled and used them (the applications and tools) on a 64 bit Oberon System and they worked. This proves that our 64 bit concept (you might call it a 64 bit extension) works.
In addition, you have to consider Oberon as language and environment. The Oberon[-2] language is only one piece in extensible programming created by Wirth/Gutknecht. Any attempt to make the Oberon-2 language incompatible with existing source code developed for the Oberon System neglects their achievements.


An OOCist complained that

WriteInt(MAX(LONGINT), n);

will output different values, depending on size of LONGINT.

A: But it always outputs MAX(LONGINT), as required, e.g. 2^31-1 using a 32 bit compiler and 2^63-1 using a 64 bit compiler. LONGINT is an abstract whole number type and implementation dependent; only MAX(SYSTEM.SIGNED_32) is always 2^31-1.
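
A tiny test module makes the point concrete. This is only a sketch: the module name ShowMax is hypothetical, and the Oakwood module Out is assumed to be available with the procedures String, Int and Ln.

```oberon
MODULE ShowMax;  (* hypothetical module name, for illustration only *)
IMPORT Out;      (* Oakwood output module; assumed to be available *)

BEGIN
  (* Both values are whatever the compiler at hand defines them to be. *)
  Out.String("SIZE(LONGINT) = "); Out.Int(SIZE(LONGINT), 0); Out.Ln;
  Out.String("MAX(LONGINT)  = "); Out.Int(MAX(LONGINT), 0); Out.Ln
END ShowMax.
```

The source is identical for both compilers; the printed values differ because the implementation differs, which is exactly what an abstract type means.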


An OOCist disputed the suitability of our source base for testing the practicality of our simple approach:

"Once and for all. The Oberon Systems are a research product of various universities to test new aproaches in operating system design. Cool stuff. However the commerical effect is nearly zero, nothing. ... Oberon Systems are a dead end for my purposes. The official Oberon people have missed that badly."

A: There are real world applications for OS V4 and S3. I use Gisela's spreadsheet [which she developed for V4] under AOS. (Too bad that her source package is difficult to find on the web.) I use Kepler to draw illustrations, and other tools. I know of someone (other than the author) who uses Nepros (artificial simulation system). I use AOS for programming. My friend plays Tetris under AOS. These are all really useful, real world applications, and there are more.


Q: Can you compile your programs using X11 without interface or program code changes when your data type sizes change as a result of your type model?

A: Yes. The Alpha Oberon System (AOS) uses X11 in many lower level modules such as Font and Display. The whole AOS together with all its tools and applications can be compiled with LONGINT being 32 bit or 64 bit wide without any source code change. This was one of the goals of the port. Just for fun. The only limitation is that you can't use 64 bit addressing in 32 bit AOS, which is fair enough. 64 bit AOS can compile/run everything 32 bit AOS can, so there is no need for a 32 bit AOS.

Implementation Notes: The procedure calling conventions might be different on different operating systems or processors, but this does not change the interface seen by the Oberon programmer, unless you are talking about different releases of X11 (each time they changed a lot).

Because not all Oberon System implementations are based on X11, module X11 is normally not used directly in Oberon System programs. Such programs would not be portable. X11 is used to implement the modules Display, Font, etc. on Oberon Systems based on X11 (Linux/Unix and OpenVMS ports).

Last time I checked, X11 was still 32 bit only (pointers etc.). I guess this is because a 64 bit version of X11 would break too much existing C/C++ code, so they can't come up with a 64 bit X11 version. This is the reason why 64 bit OpenVMS has a 32 bit X11 implementation.

In AOS, global variables and the heap are always located above the 32 bit address space. The full 32 bit address space is reserved for the main stack and the coroutine stacks (OpenVMS stack restriction).

Because of the 32 bit X11 restrictions, AOS copies all data which goes to or comes from X11 via 32 bit memory, which is allocated on the local procedure stack (auxiliary local variables). For open array formal parameters, the procedure stack is enlarged dynamically. (This extra copying is not noticeable even on the slowest existing Alpha workstation.)

The copying must be done for all modules directly making data transfers to/from X11.

By the way, everything in AOS is written in O2. Not a single line of assembly, C language or any other foreign language is used -- in the case of AOS, even the primary Oberon System bootstrap loader is written in O2.

A2O can also be used to generate code for stand-alone programs, which by the way means that our concept even works for so-called command line compilers, i.e. outside the Oberon System. And yes, we can also use X11 in stand-alone 64 bit Oberon-2 programs, in what you call the "real world", and we can also import the 64 bit ISO Modula-2 library modules in Oberon-2.

We used the same concept in our 64 bit OpenVMS Alpha Modula-2 compiler (8 byte sized INTEGER/CARDINAL/subranges and SYSTEM.[UN]SIGNED_32 and _64).

The advantage of an extensible operating system interface is that the application program interface is mostly 64 bit ready. OpenVMS Alpha is one such example. For the I/O size, it does not matter whether you have a 32 bit or a 64 bit actual parameter: the immediate value is always passed in a 64 bit register (or on the stack from the 7th parameter on); the same applies to addresses. The exception is VAR parameters, where the callee needs to know the size of the actual parameter (good that Oberon requires type identity for VAR parameters).

Of course X11 is still restricted to 32 bit. But this is due to the inflexibility of C's type system (pointer size/long int). It seems that C's "long int" is not an abstract integer type, but 32 bits. (I wonder whether this size is really defined in the ANSI C or C++ standard.) So Oberon programmers have to live with C's flawed language design when inheriting applications/standards written in C.

VisualOberon

In 1998, I looked for a student who would port VisualOberon (VO) to AOS in the summer holidays, just out of curiosity and in order to prove that it is possible, but I couldn't find one. Most of the work would have gone into overcoming the 32 bit X11 restrictions.
(The resulting port would again be under the GPL of course, although I always had and still have mixed feelings about the GPL in general.)

Future

On a 128 bit compiler, the size of LONGINT and pointers will be 16 bytes. Again, no changes to existing source code will be required.

Summary

From what I've seen, the OOC compiler is great (apart from the fact that there is only an ANSI C back-end). One could make a 64 bit compiler from OOC by migrating LONGINT to 64 bit without changing too much (discard HUGEINT and put 32 bit ints into SYSTEM). But this is exactly what they are opposed to. They live in a 32 bit world, claiming that to interface foreign language libraries they need LONGINT to be 32 bit. But that will change on 64 bit operating systems which use 64 bit whole numbers in the API (size of an I/O transfer, size of a memory allocation, etc.).

Links:

"Porting the Oberon System to AlphaAXP", which was still a 32 bit system with 64 bit integer extension, was published at our web-site in Jun-1996.

"64 Bit Address Extension of the Alpha Oberon-2 Compiler", was published at our web-site in Sep-1996.

"64 bit Oberon", was published at our web-site in summer 1997 and in the ACM SIGPLAN notes in early 1998.


IMPRESSUM: The ModulaTor is an unrefereed journal. Technical papers are to be taken as working papers and personal rather than organizational statements. Items are printed at the discretion of the Editor based upon his judgement on the interest and relevancy to the readership. Letters, announcements, and other items of professional interest are selected on the same basis. Office of publication. The Editor of The ModulaTor is Günter Dotzel; he can be reached at [email deleted due to spam]

Most of the ModulaTor back-issues are available from http://www.modulaware.com/mdltr_.htm


© (1999) by modulaware.com
Created 12-Apr-2001 (assembled from replies in newsgroups of Jul-1999), revised 02-May-2001