The industry today is plagued by a variety of problems, including operating systems that lack security, viruses, worms, spam, theft of identity, intrusion into personal systems, wireless data interception, satellite data interception, hackers, and so on. The costs to industry from spam alone are high, and viruses have played havoc with business activity, even putting some companies out of business. Security threats to individuals, companies, and countries are increasing. It is high time that we addressed potential solutions and acted upon those that offer the most promise.
This article describes one possible solution and outlines a programming paradigm that could be developed as a standard. The solution has already been used successfully on several levels of computers, from mainframes and microcomputers to 8-bit RISC chips for smart cards and embedded systems. The paradigm proposed is a standard that can apply to all levels of programming activity, with considerable flexibility for customization. It has the potential to eliminate most of the problems mentioned earlier and is small enough that the entire system could be encrypted for each computer and server. It would simplify or eliminate all operating systems.
The paradigm utilizes an expandable language, which can be converted to a byte string on any computer. The byte string is completely independent of the target computer for the application. Using the rules established for the paradigm, a simple compiler can process any application written in terms of the expandable language. In fact, the rules are so simple that we can develop an application without the need for a compiler, generating a byte string for acceptance by the run system. The end of the article explains how readers can obtain a copy of the simple compiler and a basic expandable language, to try that phase out.
The run system processes the generated string of byte codes. To do this, it uses a double numeric system, which uniquely identifies every element needed within an application. This technique makes the virtual processor run system very tiny: from three or four thousand bytes for 8-bit RISC chips running typical smart card and embedded systems applications, to seven or eight thousand bytes for a microcomputer with simple graphics, to somewhat more for the processing of video images and other more demanding applications. The numeric coding system uniquely identifies every activity that the system will carry out, making for a fast-running application. Because the numeric codes of the paradigm are unique, new capability can be added without affecting what was already developed.
To compile or not to compile: that is the question
The author has had experience in developing compilers and a Java run system. Compilers for FORTRAN and COBOL (like those for C, C++, PL/I, and other languages) generate a machine language structure, which utilizes a library of subroutines. In Java, a string of byte codes is generated, which requires a library of methods and similar structures. The libraries for both approaches tend to be large. FORTRAN and COBOL are quite limited in their capabilities, while Java is verbose with a clumsy vocabulary. Even the very simple, ubiquitous “hello world” application in Java needs a large number of methods and resources.
The compiler for the paradigm of this paper is itself very small and can be placed, if desired, at the front of the run system, taking just a few hundred more bytes (as the compiler and run system share routines). In that situation, the language elements, rather than the byte codes, are presented to the system, which generates the byte codes first and then runs them. This mode is particularly useful for safety-critical applications (where the compiler and the application have to be tested whenever a change to either occurs). For the remainder of this article, let us assume that compilation is complete and the system has been presented with a string of byte codes. All elements of the run system are static. The only variable part of the paradigm is the generated byte stream, which will vary from application to application.
Every application consists of a string of language elements that may be associated with parameters such as numbers or variable names. With the exception of numeric data and literals, all language elements and variables are converted to a single byte. Each language element is associated with a number currently running from 0 to 255, although at present only a fraction are used. It is unlikely that more than 256 will ever be needed but, if so, the number can simply be increased to just short of 65536, without in any way affecting what has been developed before such an extension takes place. Some typical examples of language elements are:
looping 1 1 100 adr grt
screen ^hello world to^ name
arith alpha = beta + gamma / delta + 13
The language elements shown are “looping,” “screen,” and “arith”; a fourth element, “bitmap,” is also discussed below. Each has a numeric code associated with it, such as 3, 5, +, or *.
For the element “looping” the numbers 1 1 100 represent looping parameters going from 1 by steps of 1 to 100. The symbols adr and grt represent transfer points for the true or false result of the operation. Such a language statement may result in the byte code sequence 2(1(1(d25.
The (1 indicates that a number has been converted to its binary equivalent, in this case a 1. The 25 indicates that control transfers to either the second or the fifth named language statement, depending on the result of the looping arithmetic (these targets are assigned automatically during the compile process). Other language elements give alternate forms of loop control.
The element “screen” might result in the byte code sequence 511, indicating that the first literal is to be placed on the screen, followed by the first variable, which would likely contain the name of a person receiving the message. It has been ascertained that few applications contain more than 256 variable names. While this is the limit in the initial system, extension to just less than 65536 variable names can be accomplished without disturbing what has been developed previously.
The element “bitmap” would have the single byte code +, which would trigger a sequence of activity in the run system asking for the name of the bitmap image that should be produced.
The final language element “arith” might generate the byte codes
where the fourth variable has the result of taking the seventh variable, adding the sixth divided by the third, to which we add 13.
In this case, the % indicates that what follows is a floating point number whose length is 5 with the positive sign and whose value is 13.0. Again, the relative numbers used are a function of the compiler, and the programmer doesn’t have to be aware of the coded sequences.
An initial reading may suggest that the structure is complicated. However, it is this simple: the very tiny compiler does the numeric conversions and the run system processes them. The programmer does not need to know the coding system. The run system, working from that byte code stream, does exactly what is required.
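As a rough illustration of the numbering the compiler performs, the following Python sketch encodes a “screen” statement. The element code 5 is taken from the example above. The SymbolTable class, the compile_screen function, the numbering of names from 1 in order of first appearance, and the cap of 255 entries are all assumptions made for illustration; this is not the actual compiler.

```python
import re

# Hypothetical element table; 5 is the code the "screen" example above uses.
ELEMENT_CODES = {"screen": 5}

# A literal is text between carets; anything else is a whitespace-free word.
TOKEN = re.compile(r"\^([^^]*)\^|(\S+)")


class SymbolTable:
    """Numbers names 1, 2, 3, ... in order of first appearance (assumed)."""

    def __init__(self):
        self.numbers = {}

    def number_for(self, name):
        if name not in self.numbers:
            if len(self.numbers) >= 255:
                raise ValueError("more names than fit in a single byte")
            self.numbers[name] = len(self.numbers) + 1
        return self.numbers[name]


def compile_screen(statement, literals, variables):
    """Turn 'screen ^text^ name ...' into one byte per item."""
    tokens = TOKEN.findall(statement)
    if not tokens or tokens[0][1] != "screen":
        raise ValueError("not a screen statement")
    codes = [ELEMENT_CODES["screen"]]
    for literal, word in tokens[1:]:
        # Literals and variables are numbered in separate tables.
        table, key = (variables, word) if word else (literals, literal)
        codes.append(table.number_for(key))
    return bytes(codes)
```

With two fresh tables, the statement screen ^hello world to^ name produces the bytes 5, 1, 1, which corresponds to the 511 sequence described above.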
One important observation is that a spurious byte code introduced nefariously would likely cause the application to abort. For applications that are more critical, we could add a check sum at the end of the byte codes, giving the total value of all bytes, and more or less guaranteeing security from hackers and virus activity. This would be checked at the beginning of an application.
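The check sum idea can be sketched in a few lines of Python. The article specifies only that the check sum is the total value of all bytes, placed at the end; storing it as two big-endian bytes modulo 65536, and the function names append_checksum and verify_checksum, are assumptions for this sketch.

```python
def append_checksum(codes: bytes) -> bytes:
    """Append the sum of all byte codes as a two-byte trailer (assumed layout)."""
    total = sum(codes) % 65536
    return codes + total.to_bytes(2, "big")


def verify_checksum(stream: bytes) -> bytes:
    """Check the trailer before the application starts; return the byte codes."""
    if len(stream) < 2:
        raise ValueError("stream too short to hold a check sum")
    body, stored = stream[:-2], int.from_bytes(stream[-2:], "big")
    if sum(body) % 65536 != stored:
        raise ValueError("check sum mismatch: byte codes were altered")
    return body
```

A plain sum of this kind detects any single altered byte, but it cannot detect changes that cancel out, such as two bytes exchanging places.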
The virtual machine processor: the run system
The run system consists of about 30 small modules in native code, the number of modules depending on the functionality included. Most of the modules are independent of each other so that the size of the virtual processor (VP) can be reduced for the client needs (for example, smart card applications may not need the graphics or the bitmap modules). Even so, the VP is very small for most client needs, ranging in size from about 4k bytes to 10k, depending on the functionality included. Access to the VP is through a numeric code within the static part of the software.
Most of the modules require only a few bytes of machine code, the only exceptions being modules such as bitmap and the software floating point routines for add, subtract, multiply, divide and test floating point numbers (which are similar to but more accurate than the IEEE format). The technique of numeric coding for the static part of the system is what makes such a small VP size possible. The coding also enables the VP to go directly to both the module required and its associated parameters.
Most of the modules are concerned with data moves from direct or indirect addresses, and with binary arithmetic and logic routines. These are all that were necessary from a review of compiler-generated code from many applications in a business environment.
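The idea of small, independent modules selected directly by number can be sketched as a dispatch table. In this Python sketch the module numbers, the three modules chosen, the tuple format of the program, and the use of slot 0 as scratch storage and slot 100 for the constant are all assumptions; the variable numbers 4, 7, 6, and 3 follow the “arith” example given earlier.

```python
def mod_move(memory, dst, src):
    """Data move: copy one slot to another."""
    memory[dst] = memory[src]


def mod_add(memory, dst, a, b):
    """Arithmetic: dst = a + b."""
    memory[dst] = memory[a] + memory[b]


def mod_div(memory, dst, a, b):
    """Arithmetic: dst = a / b."""
    memory[dst] = memory[a] / memory[b]


# The position in this tuple is the (hypothetical) module number.
MODULES = (mod_move, mod_add, mod_div)


def run(program, memory):
    """Go directly from each number to its module and parameters."""
    for module_number, *params in program:
        MODULES[module_number](memory, *params)
    return memory


# alpha = beta + gamma / delta + 13, with alpha=4, beta=7, gamma=6, delta=3.
memory = {3: 2.0, 6: 8.0, 7: 1.0, 100: 13.0}
program = [
    (2, 0, 6, 3),    # scratch = gamma / delta
    (1, 0, 7, 0),    # scratch = beta + scratch
    (1, 4, 0, 100),  # alpha = scratch + 13
]
run(program, memory)
```

After the run, slot 4 holds 18.0, that is 1.0 + 8.0 / 2.0 + 13.0. Dropping a module from the table, as suggested for smart cards, removes its code without touching the others.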
Language and internal elements
In addition to the VP there is a static section (which is the main reason such a small but functionally powerful system can be built) consisting of a string of numbers associated with each language element (one for each element) and a set of internal element numeric strings which complement the modules in the VP. Both the language elements and the internal elements complementing the VP modules were identified by a study of existing business applications. Both can be augmented without affecting what has been developed previously, enabling a controlled development of the concept to take place.
The application programmer does not need to know the internal structure of the elements; this is of concern only to the very small number of people involved with system expansion. The general form of an element is shown here:
Name1 A, B, C, m, D, E, F, G, name2, n, H, I, o, name7, endit
This is the typical structure of an element. Each element, whether language or internal, is given a mnemonic associated with its function, such as Name1 in the illustration above.
A, B,… indicates an identifiable mnemonic calling for a VP module with appropriate parameters, such as “screen” or “looping.”
m, n… indicates a branch operation depending on whether the result of the previous activity was “true” or “false.”
name2, name7… refers to other internal element strings that will be used. The element called must not be a language element. This feature also contributes significantly to the very small size of the system, as several layers of internal elements may be addressed before the system returns to the VP module following the call to another internal module.
endit is a special function indicating the end of an element. It need not be placed at the physical end of the element, but it should be the element's logical termination point.
The various items (A, m, namex) are assigned by system developers, the system itself being designed to process the byte codes generated by the compile operation. As was mentioned earlier, the byte codes can be generated on any system with an appropriate compiler, or even be generated manually.
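The way an element is walked, including nested calls to other internal elements, can be sketched as follows. This Python sketch makes several assumptions that the article does not spell out: items are represented as (kind, argument) pairs rather than numbers, a branch item jumps to a position within the element only when the previous result was false, and the function and element names are invented for illustration.

```python
def run_element(elements, modules, start, state):
    """Walk element `start`, following nested calls, until its endit."""
    return_stack = []            # (element name, position to resume at)
    name, pos = start, 0
    result = True                # outcome of the most recent module
    while True:
        kind, arg = elements[name][pos]
        pos += 1
        if kind == "module":     # items like A, B: run a VP module
            result = modules[arg](state)
        elif kind == "branch":   # items like m, n: assumed to jump when false
            if not result:
                pos = arg
        elif kind == "call":     # items like name2: enter another element
            return_stack.append((name, pos))
            name, pos = arg, 0
        elif kind == "endit":    # logical end: resume the caller, or stop
            if not return_stack:
                return state
            name, pos = return_stack.pop()
        else:
            raise ValueError(f"unknown item kind: {kind}")


def increment(state):
    state["x"] += 1
    return True


def double(state):
    state["x"] *= 2
    return True


def is_positive(state):
    return state["x"] > 0


MODULES = {"increment": increment, "double": double, "is_positive": is_positive}

ELEMENTS = {
    "main": [
        ("module", "increment"),
        ("call", "twice"),
        ("module", "is_positive"),
        ("branch", 5),           # skip the next item when not positive
        ("module", "increment"),
        ("endit", None),
    ],
    "twice": [("module", "double"), ("endit", None)],
}
```

Starting “main” with x equal to 1 leaves x at 5: the value is incremented, doubled inside the nested element, found positive, and incremented again. The return stack is what lets several layers of internal elements be entered before control comes back to the item following the call.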
Structure of the numeric code
The numeric code is a number between 0 and 65535. Numbers above 65500 are for control functions, such as endit and error and termination activities. Numbers less than 512 indicate a logical transfer within the element, while numbers under 32768 but above 512 refer to an internal element name location. In this regard, it is not expected that the internal elements will ever need more than 32768 bytes; they require only about a quarter of that figure at the present time. As the concept becomes accepted as an industry standard, it is conceivable that internal elements could go higher than 32768, but a strategy has been developed for an orderly upgrade should that situation ever occur.
The numbers between 32768 and 65500 refer to the use of modules within the VP. One part of the number indicates the specific module to be used, while the balance of the number uniquely identifies the location of the parameters to be used. It should be stressed once again that these numbers are allocated during the compile stage from the language statements of the application, and need not be known by an application developer.
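The ranges just described can be expressed as a small classifier. In this Python sketch the range boundaries follow the text, but the treatment of the exact values 512 and 65500, and especially the split of a module number into 5 bits of module and 10 bits of parameter location, are assumptions; the article says only that one part of the number selects the module and the balance locates the parameters.

```python
MODULE_BASE = 32768
CONTROL_FLOOR = 65500      # codes above this are control functions
PARAM_BITS = 10            # assumed split: module in high bits, location in low


def classify(code):
    """Return what a 16-bit numeric code refers to."""
    if not 0 <= code <= 65535:
        raise ValueError("numeric code must fit in 16 bits")
    if code > CONTROL_FLOOR:
        return ("control", code)          # endit, error, termination
    if code >= MODULE_BASE:
        offset = code - MODULE_BASE
        module = offset >> PARAM_BITS
        location = offset & ((1 << PARAM_BITS) - 1)
        return ("module", module, location)
    if code >= 512:                       # 512 itself is assumed to belong here
        return ("element", code)          # location of an internal element
    return ("branch", code)               # logical transfer within the element
```

Under this assumed split, the module range holds module numbers 0 through 31, which is in line with the roughly 30 modules mentioned earlier.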
Most applications are relatively compact in their byte code structure and many applications can be resident simultaneously, even on smart cards with their limited real estate, as well as on embedded systems. In early development work a complete hospital information system, on-line accountancy, and a business credit reporting system were all using the same VP software, each with a string of their own applications.
Use of multiple applications with the same software does involve some control of the application names, to avoid duplication and ambiguity, but this is relatively simple to accomplish. The multiple applications can also be assigned priority level one, two, or three, if desired. In this context, all priority-one applications are processed once, then one priority-two application; this cycle repeats until every priority-two application has been processed once, at which time one priority-three application is processed. This round-robin priority ensures that all applications see some light of day during the course of ongoing operations. It is achieved by a simple “roll-in roll-out” process of variable data within the application, including the stack process that controls the multi-layer operation of the internal and language elements.
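The processing order described here can be sketched as a generator. This Python sketch reflects one reading of the description (every priority-one application, then one priority-two application, repeated until each priority-two application has had a turn, then one priority-three application); the function name schedule and the application names are invented for illustration, and the roll-in roll-out of application data is not modeled.

```python
from itertools import cycle, islice


def schedule(priority_one, priority_two, priority_three):
    """Yield application names in round-robin priority order, indefinitely."""
    if not (priority_one or priority_two or priority_three):
        return                                   # nothing to run
    threes = cycle(priority_three) if priority_three else None
    while True:
        if priority_two:
            for app_two in priority_two:
                yield from priority_one          # all priority ones, once each
                yield app_two                    # then a single priority two
        else:
            yield from priority_one
        if threes is not None:
            yield next(threes)                   # then a single priority three


# First 14 processing slots for two applications at each level.
order = list(islice(schedule(["A", "B"], ["X", "Y"], ["M", "N"]), 14))
```

Here order is A, B, X, A, B, Y, M, A, B, X, A, B, Y, N: priority-one applications run most often, yet every priority-three application is still reached.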
Another useful feature is that the language elements can be in any ethnic language, and the multiple applications do not have to be in the same ethnic language. Even within a single application, it is possible to use more than one ethnic language using synonyms. Use of such synonyms adds slightly to VP processing but not significantly so.
The VP, as well as the numeric elements, is quite small, and the elements could readily be encrypted with any one of the current algorithms, being decrypted only during their use. This would add a minor but continuous processing time penalty. An alternative would be to store the elements in encrypted form and decrypt them at the beginning of a run, which would be a reasonable strategy for discrete running applications but likely not quite as suitable for continuously running applications such as may be used for pipeline or nuclear power plant monitoring.
For those with less sensitive needs various check digits can be incorporated both for the VP and for the element segments, these check sums being verified at each run if necessary, or at random intervals. It is unlikely that any intrusion of elements or VP would go undetected.
The proposed standard for programming represents the culmination of years of development, most of it catering to the conventional approach. The standard now proposed is essentially a numeric table with a small number of associated modules that decode the numeric codes and carry out what previously has been done by computer instructions. The modules will grow very slowly as the concepts are accepted by industry, the numeric elements on a more accelerated basis as the functionality is enhanced. It is possible, with the approach outlined, to consider this as a single, unchanging technology that can accommodate all current and future application needs, from the molecular needs of nanotechnology to the mathematical expansion requirements of the most powerful supercomputers.
The earlier concepts demonstrated successful applications on mainframes, on midsize computers and microcomputers, and on microcontrollers with RISC chips.
It will take several years for these concepts to become dominant in the industry but dominate it they certainly will. In the first instance, they should be adapted to microcontrollers for smart cards and embedded systems, which constitute over 90 per cent of all installed computers, but which are not dominated by monopolistic software vendors, and where only limited interaction is required between processors.
This will necessitate establishing simple networks based on the concepts (communications and control of high speed networks were part of earlier development), in particular with the use of smart cards for a variety of purposes such as health, financial transactions, personal identification and the like. This would place the concepts in the chips on the cards, in the card readers and in the servers controlling the network.
At the same time, the numerical approach should be introduced to the embedded systems arena, by specific industries such as the automotive or aerospace.
Once the numeric approach is introduced at the microcontroller level, it could then move to the larger systems, at first integrating with their various operating systems, but then replacing them, as they will become redundant. Again, it would best be done by industry (servers, graphics, video etc.) but after successful implementation with the microcontroller world most industries will, by that time, be ready to move.
In order for this to be done in a controlled fashion by the PC+ industrial groups, which tend to favour proprietary software, it would be useful to establish a working group to oversee the orderly development of the numeric approach.
Free from viruses, worms and identity theft
One of the reasons that worms and viruses continue to exist is that the current software approach is based on computer languages requiring a huge infrastructure. The complexity of the operating systems used is such that it is impossible to guarantee that there are no security loopholes. Nefarious persons then exploit these loopholes and introduce code that is sent around the world, compromising millions of users' systems, and possibly creating a national security risk. Until we erase this vulnerability, no user, company, or country is safe from vicious attacks on its computing lifeblood.
One of the reasons for moving to the numeric approach is the need to get away from this multitude of problems. The numeric approach offers:
A very small VP which is static, with no opportunity to introduce spurious code if check sums are included.
A highly efficient VP that can be encrypted if necessary.
A static set of elements that numerically describe suites of applications in a form where each number within an element points directly to the VP process required.
A static set of numeric elements that can be verified through check sums.
Even in the unlikely event that a spurious number was introduced into a numeric element without affecting the check sum, applications would abort, due to the critical relationship that exists between each number within the element structure.
The one area where the numeric system does not have full control is in the byte stream code generated by the compile function. Even here, however, the byte code stream does not have access to the numerical elements nor to the small VP machine code. Neither, when multiple applications are running, can the data from one application corrupt the data from another application, with the exception that some applications can share data, and in such circumstances, the application developer would have to handle that aspect.
Error free
Although the analysis has not yet been done, it is believed that the VP is so tiny that it could be proved error free. In a similar way so could each of the numeric elements. Numeric elements and the VP are static so that, once verified, there would be little need, if any, for further verification.
The result is a single, very small piece of software with which all applications can be built and which can act as its own operating system. Placing it on a chip creates a universal Turing machine.