Despite efforts to define programming languages with a broad scope of applicability, it has become clear that no single programming language will support all the needs, real or perceived, of the information technology community. At various times, proponents of languages such as PL/I, Ada, C, and even Lisp have suggested they are adequate for nearly all information processing needs; and in all cases, the marketplace has demonstrated otherwise by economically supporting a variety of languages for different application domains. In fact, the current trend in large information system development is toward multilanguage integration, that is, the use of multiple languages to implement a single system.
Standards organizations have reacted to the reality of a multilingual user community by encouraging the development of standards not specified in terms of a single programming language. Policy decisions made at various levels of both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) have directed working groups to develop base specifications expressed in a language-independent fashion, plus a number of "bindings" explaining how the services provided by a particular specification may be obtained from a particular programming language.
The approach has met with some notable successes. One example is the work of the X3H2 standards committee in specifying a standard for SQL.(1) When the committee was formed, implementations provided a wide variety of functionality. However, aside from the issues of behavior, the various implementations used different syntaxes to embed SQL statements directly in the source language of the application programs invoking the database functions. The committee was confronted with the important problem of standardizing behavior while having to cope with the distraction of converging the syntaxes employed by several vendors to embed SQL in several different programming languages.
The solution adopted by X3H2 was bold: they discarded embedding as the reference interface to SQL functionality and developed a new abstraction, the "module," upon which applications would make procedural calls in order to invoke SQL functionality. The committee then defined procedural bindings to the module for various common programming languages. Finally, the committee described the traditional embedded form of binding as a syntactic transformation of an application program and its module.
In the case of X3H2, the intellectual payoff for the process was great. Not only did they find an economical way to express the bindings to the various programming languages, but they invented an entirely new binding, a procedural one, superior to the embedded binding for languages, such as Ada, that adhere to a "contract" model separating interface specification from implementation. Perhaps more importantly, though, the existence of a language-independent specification (LIS) enabled the committee to separate, or factor, their concerns: they could consider issues of functionality separately from those of interface without being distracted by the myriad detailed considerations relevant to the various programming languages. In fact, this last advantage is the one most often cited as the reason for developing language-independent specifications.
Not all efforts to produce language-independent specifications have been successful. An IEEE committee is working on standard specifications for POSIX, the Portable Operating System Interface for Unix-like systems. Although the initial POSIX specifications were specific to the C programming language, the IEEE has attempted to develop language-independent versions of those standards. The difficulties it has encountered are instructive to those who would attempt similar efforts.
At the outset, it should be noted that it is probably far more difficult to produce a language-independent standard for POSIX than for SQL. The scope of SQL functionality is clearly circumscribed by relational database management; the scope of POSIX functionality is open-ended, as evidenced by the frequent creation of new working groups in areas such as windowing, supercomputing, transaction processing and systems administration. Furthermore, for the most part, SQL functions are amenable to being called procedurally and returning directly, while some POSIX functions, such as signalling and multithreading, implicitly pervade the operation of a calling program.
The first and most apparent difficulty is to obtain the support of the volunteer labor needed to develop the LIS. Most of the labor in standards development committees is funded by product vendors or product users. Both vendors and users are concerned with actual implementations of products, not with language-independent abstractions which are at least once-removed from actual usable products. This was a particular problem for the POSIX community because the vast majority of its vendors and users are solidly based in the C community and perceive only marginal value in generalizing the specifications to accommodate the needs of other languages.
Another difficulty is the intellectual effort needed to deal with yet another level of "meta-ness." Product developers make choices among functions to be provided. Standards developers make choices among various constraints to be imposed upon the choices to be made by product developers; various functions may be required, prohibited, permitted, or even encouraged or discouraged. Language-independent specifications are another step further removed. Their developers make choices among abstractions that are to be mapped to the various functions and constraints that would otherwise be directly articulated in a language-specific standard.
These abstractions must be identified and articulated in a language-neutral fashion. Consider multithreading, for example. Since the C language does not directly support concurrency, it is clear the initiation, termination and coordination of multiple threads of execution must be expressed by procedure calls the application would make upon the underlying operating system. Ada, on the other hand, has concurrency mechanisms built directly into the language as an implicit part of its semantics; these implicit semantics should be mapped onto whatever underlying concurrency POSIX provides. It is extremely challenging to describe an abstraction of multithreading in a manner that does not preclude appropriate binding to either of these two languages.
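To make the C side of this contrast concrete, here is a minimal sketch using the POSIX threads (pthreads) interface. Every step that Ada's tasking model performs implicitly (starting tasks, coordinating access to shared data, waiting for termination) appears in C as an explicit procedure call. The worker function, the shared counter, and the helper `run_workers` are hypothetical illustrations, not part of any binding the committee produced.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS 1000

/* Hypothetical shared state; coordination must be requested explicitly. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: each increment is guarded by explicit lock/unlock calls. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Initiation (pthread_create) and termination (pthread_join) are library
   calls, not language constructs. Returns the final counter value, or -1
   if not every thread could be started. */
long run_workers(void) {
    pthread_t threads[NUM_THREADS];
    int started = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == NUM_THREADS ? counter : -1;
}
```

A single call to `run_workers()` should yield 4000 (four threads times 1000 increments) because the mutex serializes the updates. In Ada, by contrast, tasks are activated by the language when they are elaborated, and a master automatically waits for its dependent tasks to terminate, so there is nothing corresponding to the create and join calls for a binding to name.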
One does not have to consider semantics, though, to appreciate the difficulties in expressing language-neutral abstractions. Users of different programming languages organize their development tasks by considering various key issues that are different for the various languages. For instance, C or Fortran programmers tend to focus on the operations; hence their first question is, "What are the functions?" Ada programmers concentrate on program structure and modularity; they ask, "What are the packages?" Programmers of object-oriented languages such as Smalltalk and C++ ask, "What are the objects?" No LIS can address these fundamentally different approaches in an even-handed fashion.
Developers of language-independent specifications must also work in an environment that is tightly constrained by the pre-existence of language-specific standards. The C-based POSIX standard for base operating system functionality was already well on the way to international standardization when focused effort began on the LIS. Naturally, the working group developing the LIS was constrained by the requirement that the final LIS plus its binding to C had to specify functionality identical to the existing C-based standard. (This effort was further complicated by artifacts in the C-based standard that were variously characterized as either bugs or features.) Standardization is inherently a political process, and here its political nature becomes apparent in deciding an important question of value: is it more important to serve the existing large base of Unix C users, or to reach the communities that currently use other languages but desire the "openness" provided by POSIX?
Those who would write a LIS must identify data structures and operations appropriate for implementation in various languages. It would be tedious and unrewarding to take each such item, survey all languages that are candidates for bindings, and select a compromise description of the item equally suitable for all of the languages. Instead, LIS developers typically search for the abstractions that underlie the desired functionality and express those abstractions in some (semi-)formal manner. Identifying the details of such abstractions can be surprisingly difficult. For example, POSIX file descriptors, similar to those in Unix, are integers. Does that mean it is appropriate to increment a POSIX file descriptor in order to traverse the space of file descriptors? Is it appropriate to create an array indexed by file descriptors in order to retain supplementary information regarding files? Is it appropriate to add two file descriptors? Different experts involved in writing the POSIX/C standard give differing answers to these questions, but the answers have an important impact on the abstraction selected for file descriptors.
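The second question reflects common C practice. The following is a minimal sketch, assuming a POSIX system, of keeping supplementary information in an array indexed by descriptor. It leans on descriptors being small, dense, non-negative integers, which POSIX supports by requiring `open()` to return the lowest-numbered descriptor not currently open. The table, its size, and the helper names (`open_with_note`, `note_for`, `close_with_note`) are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical side table of per-file notes, indexed by descriptor value. */
#define TABLE_SIZE 256

struct file_note {
    int in_use;
    char label[32];
};

static struct file_note notes[TABLE_SIZE];

/* Open a file read-only and remember a label for it, keyed by descriptor.
   Returns the descriptor, or -1 if open fails or the descriptor is too
   large for the table. */
int open_with_note(const char *path, const char *label) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fd >= TABLE_SIZE) {      /* the integer assumption breaks down here */
        close(fd);
        return -1;
    }
    notes[fd].in_use = 1;
    strncpy(notes[fd].label, label, sizeof notes[fd].label - 1);
    notes[fd].label[sizeof notes[fd].label - 1] = '\0';
    return fd;
}

/* Look up the label; returns NULL for descriptors the table does not track. */
const char *note_for(int fd) {
    if (fd < 0 || fd >= TABLE_SIZE || !notes[fd].in_use)
        return NULL;
    return notes[fd].label;
}

/* Clear the entry, then close the descriptor. */
int close_with_note(int fd) {
    if (fd >= 0 && fd < TABLE_SIZE)
        notes[fd].in_use = 0;
    return close(fd);
}
```

Code like this is exactly what an LIS that treats descriptors as opaque handles would render non-portable; the `TABLE_SIZE` check marks the point where the integer abstraction leaks into the application.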
LIS developers also need to adopt a formalism for expressing the selected abstractions. Although most common programming languages are not strongly typed, LIS developers often select a strongly typed formalism for articulating the abstracted objects and operations. They face a dilemma, though, because the variety of types available in the programming languages cannot be expected to match the types used in the LIS formalism.
Each programming language describes a primitive or fundamental set of types and operations. Some languages are strongly typed, in that programmers can create new types, often with semantic constraints such as limited ranges that are enforced by the compiler. Other languages do not support user-defined types, or do so in a way that is not checked by the compiler. In addition, different languages place different semantics on what seems to be "the same type."
For instance, the notions of a character in C, Fortran and Ada are all different. If the programming language types are less strongly specified than the LIS types, then the application must be instructed to preserve some semantic properties. On the other hand, if the programming language types are more strongly specified than the LIS types, then language bindings lose some of the generality permitted by the LIS. An example of the first case is POSIX file descriptors. The LIS might describe them as ordinal integers that lack arithmetic operations. Programming language bindings might also describe them as integers but must then impose a requirement that the application not attempt to add two file descriptors.
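A C binding is not entirely without recourse. The sketch below, which is an illustration of the trade-off and not a proposed binding (POSIX/C itself uses a plain `int`), wraps the integer in a struct so that the compiler rejects `a + b` on two descriptors while still allowing the comparisons an ordinal type needs. All names (`lis_fd`, `lis_fd_from_int`, `lis_fd_equal`, `lis_fd_less`, `plain_fd`, `nonsense_sum`) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical binding type: a struct gives C's compiler something to check.
   Comparison functions are provided; arithmetic is not, so writing `a + b`
   on two lis_fd values is a compile-time error. */
typedef struct {
    int value;
} lis_fd;

/* Construct a descriptor handle from a raw integer (e.g., the result of open()). */
static inline lis_fd lis_fd_from_int(int raw) {
    lis_fd fd = { raw };
    return fd;
}

/* Descriptors may be compared for identity... */
static inline bool lis_fd_equal(lis_fd a, lis_fd b) {
    return a.value == b.value;
}

/* ...and ordered, reflecting the LIS notion of an ordinal type. */
static inline bool lis_fd_less(lis_fd a, lis_fd b) {
    return a.value < b.value;
}

/* By contrast, a typedef creates only an alias: the compiler accepts
   arithmetic on it, so the "do not add descriptors" rule lives in prose. */
typedef int plain_fd;

static inline plain_fd nonsense_sum(plain_fd a, plain_fd b) {
    return a + b;   /* compiles, but is meaningless under the LIS abstraction */
}
```

The cost of the struct approach is the one the paragraph predicts in reverse: every existing C program that passes descriptors to `read()` or `close()` as plain integers would have to be changed, which is precisely what a binding constrained to match the existing C standard cannot demand.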
An example of the opposite case also occurs in POSIX. According to the LIS, an implementation may place a restriction on the maximum number of file descriptors or may specify that a process may have an "unlimited" number of open files. Due to the limitations of digital computers, it is generally impossible to specify an integer type with no upper bound. Even in weakly typed languages such as C, there is some ultimate implementation limit on the representation of an integer. So the otherwise attractive mapping of LIS file descriptor to C integer has the effect of only approximating the notion of permitting an unlimited number of open files.
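POSIX/C does let an application ask about the limit at run time: `sysconf(_SC_OPEN_MAX)` returns -1 without changing `errno` when the limit is indeterminate, and -1 with `errno` set on error. The minimal sketch below, with hypothetical helper names `open_file_limit` and `max_descriptor_value`, shows where the approximation surfaces: even when no limit is reported, a descriptor is still an `int`.

```c
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Query the per-process open-file limit.
   Returns 1 and stores the limit in *out if a finite limit is reported,
   0 if no limit is reported (the LIS "unlimited" case), -1 on error. */
int open_file_limit(long *out) {
    errno = 0;
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit == -1)
        return errno != 0 ? -1 : 0;   /* errno unchanged means indeterminate */
    *out = limit;
    return 1;
}

/* Largest descriptor value the process could hold. Whatever sysconf()
   reports, descriptors in the C binding are ints, so "unlimited" is
   approximated by the range 0..INT_MAX. */
long max_descriptor_value(void) {
    long limit;
    if (open_file_limit(&limit) == 1 && limit >= 1 && limit - 1 <= INT_MAX)
        return limit - 1;
    return (long)INT_MAX;
}
```

On a system that reports a finite limit of, say, 1024, `max_descriptor_value()` returns 1023; on one that reports no limit, it falls back to `INT_MAX`, which is the implementation limit the paragraph describes.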
There are also important issues regarding the structure of the language bindings themselves. Many LIS developers would prefer the specifications of the language bindings to be simple documents that transform the LIS types and operations into programming language types and operations in a straightforward mechanical manner. This has the effect of confining the tough intellectual questions to the LIS specification and uniformly applying the results to the various programming languages. The problem, of course, is that such bindings do not exploit the particular strengths and characteristics of the various programming languages. The reporting of errors is an instructive example. An LIS that reports errors through the use of return codes is well suited for languages similar to Fortran and C, but poorly suited for Ada. On the other hand, an LIS that reports errors via exceptions would present great obstacles for most programming languages.
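The asymmetry in error reporting can be illustrated in C itself. The first function below is the natural direct mapping: a status result plus `errno`. The rest is a hypothetical emulation of what a C binding might have to do if it mirrored an exception-based LIS, using `setjmp`/`longjmp` as the only non-local transfer C offers. This is an illustrative sketch, not how any POSIX binding works; `close_status`, `close_or_raise`, `try_close`, and the one-level handler are invented names.

```c
#include <errno.h>
#include <setjmp.h>
#include <unistd.h>

/* Return-code style: the natural C mapping. The result says whether the
   call failed; *err says why. */
int close_status(int fd, int *err) {
    if (close(fd) == -1) {
        *err = errno;
        return -1;
    }
    *err = 0;
    return 0;
}

/* Exception style, emulated. Hypothetical one-level handler stack. */
static jmp_buf *current_handler = NULL;
static int raised_error = 0;

/* "Raise": record the error and jump to the innermost handler. */
static void raise_error(int err) {
    raised_error = err;
    longjmp(*current_handler, 1);
}

/* An operation that reports failure only by raising. */
void close_or_raise(int fd) {
    if (close(fd) == -1)
        raise_error(errno);
}

/* Install a handler, run the operation, and return the raised errno or 0. */
int try_close(int fd) {
    jmp_buf handler;
    jmp_buf *saved = current_handler;
    current_handler = &handler;
    if (setjmp(handler) != 0) {
        current_handler = saved;   /* the "exception" arrived here */
        return raised_error;
    }
    close_or_raise(fd);
    current_handler = saved;
    return 0;
}
```

Both `close_status(-1, &err)` and `try_close(-1)` should report `EBADF`, but the second requires every caller to establish a handler first, and `longjmp` skips any cleanup in intermediate frames, which suggests why an exception-based LIS would fit C poorly.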
This problem of developing language-appropriate bindings is often referred to as an issue of "thick" versus "thin" bindings, but there are actually two related issues we will call "thick" versus "thin" and "direct" versus "abstract."
The thickness terminology arises because advocates of language-independent specifications believe the factoring of concerns should be expressed in the nature of the respective documents. All of the functionality of the standard should be expressed in the base LIS standard, a physically thick document. The language bindings should confine themselves to expressing the syntax for accessing the functions of the base standard; they should be thin documents.
This approach has indisputable advantages for the developers of standards. However, it does present a problem for users. In order to understand any particular concrete construct of the standard, they must consult at least two documents, the base standard and the language binding document. Advocates of the thin approach suggest that end-users probably do not consult standards documents anyway; they ought to use secondary tutorial texts.
Obviously, it is easy to write a thin-bindings document if all of the LIS abstractions are mechanically translated into procedure calls of the application language. As is illustrated by the previous discussion, it is not always appropriate to provide a direct mapping of the LIS constructs. In some cases, an LIS construct should be mapped to some language feature other than a simple procedure call. This is called an abstract mapping. Abstract mappings complicate, sometimes fatally, the prospects for writing a thin-bindings document.
Ultimately, the basic problem with mechanical translation is to develop an appropriate meta-language that can capably and equitably define functions for function-oriented languages, packages or modules for modular languages, and objects for object-oriented languages. This meta-language must precisely specify semantics, but must not otherwise constrain the concrete syntax used by any particular programming language.
Unfortunately, discussions of the issues related to mechanical binding are often unproductive because advocates often confuse the technical issue of direct versus abstract with the expository issue of thick versus thin.
In summary, although language-independent standards clearly have desirable attributes, difficult problems impede the prospects for success. Aside from the difficulties in staffing an effort to produce a LIS, there are unresolved technical problems that complicate the development of suitable standards.
1 When used to refer to the standard, SQL is not an acronym--merely a short form for "Database Language SQL."
James Moore is a senior employee at the MITRE Corp. in Reston, Va. David Emery works for the MITRE Corporation in Bedford, Mass. Roy Rada is a professor of computer science at the University of Liverpool in England. All three are members of the ACM Technical Standards Committee.
Author: Moore, James W.; Emery, David; Rada, Roy
Publication: Communications of the ACM
Date: Dec 1, 1994