Strategic directions in research on programming languages.
A good programming language is a conceptual universe for thinking about programming.
ALAN PERLIS NATO Conference on Software Engineering Techniques Rome, 1969
Programming language research covers a broad spectrum from systems work to theory. On the theoretical side, programming language research has its roots in the work of Schonfinkel, Curry, Kleene, and Church in the 1920s and 1930s. On the practical side, it has been influenced by developments in hardware from the von Neumann, architecture of the 1950s to today's heterogeneous distributed networks. In addition, developments in software engineering with its requirements of feasibility, reliability, and performance have had a great impact on the area. A feature of programming language research that gives it much of its excitement is that individual researchers have the opportunity to work across broad bands of this spectrum.
The area is concerned mainly with the discovery of principles and generic techniques; it is not, with a few notable exceptions, about the invention of new languages. Many of the results of the research are generally applicable to a wide variety of language paradigms including procedural languages, functional and logic languages, object-oriented languages, and the like. Many common programming language features have emerged from this area; some of the more well-known examples are:
* Procedures: inspired by the notion of [Lambda]-abstraction and pervasive in modern programming languages.
* Recursion: primitive recursion (for loops) and general recursion (while loops) were introduced in recursion theory and are pervasive in modern high-level languages.
* Types: developed in the context of pure calculi such as the [Lambda]-calculus and combinatory logic but extended by programming language researchers and now pervasive.
* Exceptions: introduced in PL/1 but extensively studied and formalized in ML [Milner et al. 1990]; similar concepts appeared later in C + +.
* Monitors: introduced by Hoare in the 1970s [Hoare 1974] and now the basis for the concurrency features of Java.
In addition to language features, programming language research has also produced important techniques. The five areas that receive special attention in this report are as follows.
Semantics. Formal semantics is concerned with the description of program meanings by operational, denotational, or axiomatic specifications [Nielson and Nielson 1992; Tennent 1991]. It improves our understanding of new as well as familiar programming constructs and it provides a yardstick for implementation and a foundation for analysis and verification techniques and program transformation. Over the years different techniques have been developed to handle different programming paradigms and different applications.
Type systems. A type is a collection of values that share a common structure, operations, and other properties. A type system is a specification of how types are assigned to values [Cardelli and Wegner 1985]. Type safety - the prevention of certain classes of programming errors - is a desirable property of any programming language; many of the recently published standards for safety-critical software insist upon the use of strongly typed programming languages. Over the years different type systems have been developed and have found their ways into commercially successful languages.
Program analysis. Program analysis is concerned with the problem of statically predicting properties of the dynamic executions of programs [Aho et al. 1986; Cousot and Cousot 1977; Jones and Nielson 1995]. Traditionally, program analysis has been used extensively to make possible various optimizations and transformations in compilers; among the newer applications is the validation of software to reduce the likelihood of malicious behavior. Over the years a wide variety of techniques has been developed to handle the different analysis problems and different programming paradigms.
Program transformation. The goal of program transformation is to modify some representation of the program to change some of its properties while preserving others. For example, most useful program transformations preserve the input/output semantics of the program but can radically change the program's complexity. The transformation of programs is an important technique in the development of reliable and efficient software [Burstall and Darlington 1977; Jones et al. 1993]. Techniques have been developed for different stages of the software development process: when developing programs from specifications, when specializing existing programs to specific contexts and, in particular, when optimizing compilers.
Implementation. This area concerns compilation [Aho et al. 1986] and run-time support. Current concerns in compiler technology include compilation for distributed systems, optimizations across module boundaries, and correctness issues. Memory management, particularly on complex cache architectures, is a critical concern in run-time support. This area is increasingly concerned with the development of generic tools rather than tailored solutions.
Programming language research in the 1990s is slowly changing from what it was in the 1970s and 1980s: the traditional uniprocessor machines are being replaced by heterogeneous and physically distributed computer networks, the traditional line-based user interfaces are being replaced by graphical user interfaces, multimedia applications are the rule rather than the exception, and so on. There is a growing need for programming languages to adapt to this fundamental change and thereby for programming language research to address these problems more thoroughly. We need to understand the semantics of the language features developed for programming these systems; we need to understand what type security means in this setting; we need to develop program analyses that statically predict the behavior of these systems; and we need to develop implementation techniques that utilize the vast amount of parallelism present in these networks.
We believe that the history tells us that programming language research is in good shape to take up this challenge.
The views of programming language research expressed in this report are inevitably colored by the constitution of the working group; there were other working groups covering similar topics, notably Object-Oriented Programming, Software Engineering and Programming Languages, and Human-Computer Interaction. A recent publication containing more detailed discussion of some of the research themes identified is Hankin and Nielson .
2. STRATEGIC DIRECTIONS
This section presents five strategic directions identified by the working group. Our aim was to identify some general themes that will provide the impetus for new research over the next few years. Progress will be achieved by successful results in a number of technical areas, (some of) which are identified in the next section. The strategic directions represent a mix of continuing, long-term research going on within the field and more recent directions sparked by changes in the larger world of computing. In the past, the programming of single computers and computers in local-area networks has been the dominant programming task. In the coming years, programs will be needed for global computing, domain-specific computing, embedded systems, large-scale systems, and more. These application areas almost certainly require new programming concepts. For example, the heterogeneous, distributed, and dynamic nature of global computing networks raise issues to do with configuration, coordination, security, and exploitation of interprocessor parallelism. In the context of embedded systems, performance predictability and fault tolerance pose new challenges.
Each of the five strategic directions is addressed by many of the theme areas in programming language research discussed in the following. The alphanumeric codes identify relevant research topics in these themes, which are described more fully in the next section.
Distributed computing. One of the recurrent themes throughout the position statements is distributed computing. This poses new challenges in all areas of programming language research: how do we design languages for such systems that have firm semantic foundations, support the safe programming of distributed systems, and utilize the vast amount of parallelism offered by the networks. Persistence and support for transactions are examples of areas of increasing importance. A common feature of distributed systems, which is also of independent interest, is mixed languages - "integration of programming paradigms" is an important theme. A number of the topics described in the next section contribute to this direction: S-1, S-3, S-4, T-4, PA-4, PT-4, I-1, and I-4.
Incrementality, modularity, and abstraction. Software systems are long-lived and must survive many modifications in order to prove useful over their intended life span. The primary linguistic mechanisms for managing complexity are modularity (separating a system into parts) and abstraction (hiding details that are relevant only to the internal structure of each part). A challenge for future language design is to support modularity and abstraction in a manner that allows incremental changes to be made as easily as possible. Object-oriented concepts have much to offer and are the topic of much current investigation. Topics contributing to this direction include: S-3, T-1, PA-2, PT-2, I-2, and I-5.
Generic formalisms, integration of tools and techniques. Many of the formalisms we work with have been developed in the context of specific problem domains; there is the need to develop these further to provide the basis for generic formalisms. At the more systems-oriented end of the spectrum, there is the need to integrate the tools and techniques that are being produced by individual research activities into "real" systems. Part of the work in this direction also involves developing a better understanding of the relationship between different approaches. We have identified the following topics as contributing: S-2, S-3, T-2, T-3, PA-1, PA-2, PT-1, PT-3, and I-5.
Correctness, efficiency, engineering, and pragmatics. The goal of programming language research must be to define languages and techniques that improve the quality of software. The notion of quality is multifaceted but must include programmer productivity, verifiability, reliability, maintainability, and efficiency. This theme has assumed greater importance as a result of a number of well publicized disasters, such as the Pentium chip and the Ariane-5 rocket. A number of topics address this direction: T-3, T-4, PA-3, PT-2, PT-3, I-3, and I-4. Education and technology transfer.
Although only a few of the topics mention this aspect, the validity of all programming language research hinges on our ability to educate others and transfer our technology into practical systems. For example, we can consider TCL to be a failure of education and technology transfer because the language does not even have a syntax suitable for presentation by a grammar, and Java a success, since the industrial group responsible for its design used research developments of the past decade to great advantage. The following topics explicitly address this direction: PT-1 and PT-2.
3. RESEARCH AREAS
Programming language research is built on a bedrock of principles, ranging widely along the pure-applied and theory-systems spectra. In this working group we focus on aspects that play a central role in programming language research:
(1) semantics, (2) type systems, (3) program analysis, (4) program transformation, and (5) implementation.
None of these subareas can exist without the others, and indeed, the various subareas rely upon one another. For instance, program analysis has a strong theoretical basis: we trade precision for computability and thus cannot expect precise answers to our questions, but it is important for the practical applications that the answers are semantically sound and feasible to implement. Program transformation likewise has strong theoretical foundations and is often made possible by program analysis: it is crucial that the transformations preserve the semantics of the programs and transformations are often guided by the results of program analyses. Type systems ensure that the execution of well-typed programs does not lead to certain run-time errors. Implementations are required to be faithful to the semantics and rely on types, program analysis, and transformation.
The various subareas are useful in identifying future topics for programming language research. The following topics were abstracted from the position statements prepared by the participants of the working group.
S-1: Language design. An area in which there is current interest in language design is coordination languages [Carriero and Gelernter 1992; Ciancarini and Hankin 1996]. Coordination languages have been developed for programming systems in which multiple, heterogeneous agents cooperate in the execution of a particular task. The recent emergence of this class of languages has coincided with the emergence of the WWW; the latter provides a fertile area for applications. As yet, there is no clear consensus about a suitable set of language primitives for coordination. As in other areas of language design, there is the often unrealized expectation that studies in semantics will lead to improved language designs.
S-2: Foundations. Work on programming language semantics has produced a number of different formalisms. Operational semantics provide an abstract implementation-oriented account of program meaning, denotational semantics give a more abstract mathematical account, and axiomatic semantics focus on partial correctness issues (see Nielson and Nielson  and Tennent  for a thorough discussion). In operational semantics there is the choice between big-step natural semantics or small-step structural operational semantics. In denotational semantics there is a choice about which mathematical universe is used for program meanings (e.g., metric spaces or domains). These choices are sometimes a matter of taste rather than deep foundational differences. It is important that effort be devoted to understanding the capabilities and limitations of the different approaches so that informed choices can be made.
S-3: Paradigm integration. Mixed-paradigm languages are attractive because they allow the programmer to exploit the different strengths of the paradigms in addressing different aspects of a problem. A problem that arises, particularly if the mixing is ad hoc, is that there can be unwelcome interactions between the component sublanguages. Examples of successful integrations include functional and logic languages [Hanus 1994] and functional languages with imperative features [Tennent 1991]. The emergence of concurrent and distributed applications raises new issues and opportunities for successful integrations.
S-4: New directions. The last few years have seen the development of a number of new areas of computing that pose interesting challenges for semantics, among them artificial neural networks, embedded systems, graphical user interfaces, and global computing. Some of these may require essentially new approaches and theoretical tools. For example, some of the issues that must be addressed in global computing include type systems, security, reliability, modularity, and resource management. Another challenge is the "executable content in messages," which is supported to a limited degree by Java. There has been very little work so far on semantics-related issues that arise in these new areas.
3.2 Type Systems
T-l: Types for objects. Many type systems for object-oriented languages have been defined [Cardelli and Wegner 1985; Gunter and Mitchell 1994]. Their relative merits should be investigated, so as to build consensus on what features a type system should include, what the proper role of type inference is, and how these ideas can be integrated with module systems. Specific issues include the integration of subtyping and polymorphism, the current lack of principal types in implicit types systems with subtyping, and the typing of binary methods.
T-2: Type-based compilation. Types can be used by an optimizing compiler to make programs run faster and consume less space. Known ideas on typed intermediate languages, typed pointer data structures, and the like, should be extended and applied to the compilation of both conventional and modern languages. It should be further investigated how the type-based program analyses compare with more traditional program analyses, and how we can integrate type-based analysis and compilation. Specific issues include side effects, exceptions, and modules.
T-3: Type-aware programming environments. The editors, version managers, etc., for typed languages can be improved by making them aware of the type system. During programming, type information might be computed incrementally and used, for example, to warn about errors and to help the programmer understand the program. The type system itself might be made changeable by the programmer, leading to customizable tools.
T-4: New applications. The safety and security issues inherent in distributed programming [Bank 1995] may be partially solved using type systems. Known ideas on structural type matching, type systems for concurrency, and types for security should be extended and complemented. On a different front, there is a continuum between type systems and program analysis. Refined type systems are increasingly being used to capture program properties that have been the preserve of program analysis. One example is the property of deadlock-freedom in process calculi. Such type systems are usually undecidable, so we have to work with approximate types. This is an area that will be of growing importance.
3.3 Program Analysis
PA- 1: Unification of different techniques. Program analyses have been developed for a variety of programming languages (e.g., imperative, functional, logic, object-oriented) using different techniques (e.g., flow-based, semantics-based, inference-based, and abstract-interpretation-based) and with different semantic foundations (e.g., flow charts, operational semantics, denotational semantics). The techniques have been developed and further specialized by different subcommunities that have to some extent continued in different directions. There is a critical need to understand the relative merits of the different approaches and the extent to which they improve each other.
PA-2: Realization of program analyses. Program analyses have been implemented in a number of optimizing compilers but the reusability of these implementations seem to be rather limited. In particular, the automatic generation of program analyzers still lags far behind the technology developed for front-ends (parsers) and back-ends (code generators) [Aho et al. 1986]. Although the theoretical foundations of a particular technique may seem well understood, a generally useful analysis tool may well be far away.
PA-3: Choice of program analyses. Different analyses may claim to solve the same problem but may do so in different ways (even when using the same underlying framework) and may yield different precision. Sometimes a coarse analysis is acceptable if we get the answer fast and at other times we need the precision that a slower analysis can give. We need criteria for selecting an analysis with the "right" balance between precision and cost.
PA-4: New applications. Program analysis has been applied in compiler construction but the technology has much more to offer. How to analyze real-time systems and distributed systems is largely unexplored. This is relevant for safe systems on heterogeneous networks with thousands of workstations.
3.4 Program Transformation
PT-l: Automatic tools. The development of programs is to a large extent manual work. The development of semantics-based program manipulation tools offers the prospect of changing that: programs can be considered as data and thereby be manipulated automatically. Progress has been made in various areas such as partial evaluation and unfold/fold transformation, and in the application of techniques such as program slicing and staging transformations. But we need to push the state of the art so that semantics-based transformation tools can play a full role in the software development process.
PT-2: Algorithm development and design. Program transformations have been used to integrate algorithm design and program development. This has already led to a better understanding of known algorithms and to the discovery of new algorithms [Jones et al. 1993]. A next step would be to use program transformation technology to develop efficient implementations using the best known data structures.
PT-3: Foundations. Program transformations allow us to replace certain programs with others. In order for this to be sound, it is crucial that the transformations preserve the semantics of the original programs. Moreover, we must have strategies to direct the application of the transformation rules to achieve goals such as efficiency improvement, greater degree of parallelism, better modularity, and so on.
PT-4: New applications. Traditionally, program transformations have been used to improve the efficiency of programs run on a single machine. When programming complex parallel and distributed systems, we may need tools for automatically mapping programs onto the architecture. Sophisticated transformations are then required to enhance the performance of the overall system; there may even be a need to apply such transformations dynamically, thereby adapting the programs to a constantly changing environment.
I-1: Compilation for distributed systems. Programming languages with support for parallelism and distribution increase the demands for incremental compiler technology: occasionally decisions must be taken dynamically about where to store data, when to move data from one storage area to another, on which machine to perform specific computations, and so on. In general, the language support will be for coarse-grained parallelism, whereas the exploitation of fine-grained, instruction-level parallelism is left to the compiler itself. Techniques have been developed for single processors but the adaption to heterogeneous processor architectures is a complex task.
I-2: Optimization. Modularity is often associated with separate compilation, and separate compilation has traditionally been a barrier to aggressive optimizations, which often require global program analysis. Feasible global optimization might be achieved by compiling each module into a central persistent repository, and letting an optimizer roam it whenever there are cycles to spare. The delay of some optimization and code generation until link-time and run-time has shown promise and deserves further attention. Finally, it is important to get a better prediction of the cost/benefit of various optimizations.
I-3: Compiler correctness. Although techniques for proving the correctness of nonoptimizing compilers are basically well understood, it is not so clear how to prove the correctness of realistic optimizing compilers. Ideally one would want to prove each of the optimizations correct and then combine the results into a correctness theorem for the complete compiler. However, this is complicated by the fact that the optimizations often introduce new constructs in the intermediate language and this has to be reflected in the correctness predicates.
I-4: Memory management. New garbage-collection techniques, interacting with representation optimizations, can improve locality and latency properties of automatic memory management. Also, write-allocate caches and fast block moves would speed heap allocators and garbage collectors, but these are rare in current hardware. As garbage collection becomes customary, one can hope that memory system design will be adapted to optimize it. The increasing ratio of processor speed to memory speed means that cache performance will dominate all other back-end factors; for complex cache architectures it is far from clear how to achieve good cache performance.
I-5: Generic tools. Tools for developing compiler front-ends (scanners and parsers) are well known and widely used. The remainder of the compiler has not been as successfully automated: tools based on attribute grammars have been for semantic analysis and only very recently have tools for the optimization phase been developed. For the back-end, very recent tools for code selection based on pattern matching seem promising and tools based on graph coloring have been successful for register allocation. Further development and extension of these tools to languages and systems running on heterogeneous networks is needed.
AHO, A. V., SETHI, R., AND ULLMAN, J. D. 1986. Compilers: Principles, Techniques and Tools. Addison Wesley, Reading, MA.
BANK, J. 1995. Java security, available via http://www-swiss.ai.mit.edu/jbank/javapaper/java-paper.html.
BURSTALL, R. AND DARLINGTON, J. 1977. A transformation system for developing recursive programs. J. ACM 24, 1, 44-67.
CARDELLI, L. AND WEGNER, P. 1985. On understanding types, data abstraction and polymorphism. ACM Comput. Surv. 17, 4, 471-522.
CARRIERO, N. AND GELERNTER, D. 1992. Coordination languages and their significance. Commun. ACM 35, 2, 97-107.
CIANCARINI, P. AND HANKIN C. (EDS.) 1996. Coordination Languages and Models. LNCS 1061, Springer Verlag.
COHEN, J. 1981. Garbage collection of linked data structures. ACM Comput. Surv. 13, 3, 341-367.
COUSOT, P. AND COUSOT, R. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. Proceedings of the Fourth ACM Symposium on Principles of Programming Languages (POPL), ACM Press, New York 238-252.
GUNTER, C. AND MITCHELL, J. 1994. Theoretical Aspects of Object-Oriented Programming. MIT Press, Cambridge, MA.
HANKIN, C. AND NIELSON, H. R. (EDS.) 1996. Symposium on models of programming languages and computation. ACM Comput. Surv. 28, 2, 293-359.
HANUS, M. 1994. The integration of functions into logic programming: From theory to practice, J. Logic Program. 19, 20, 583-628.
HOARE, C. A. R. 1974. Monitors: An operating system structuring concept. Commun. ACM 17, 10, 549-557.
JONES, N. D., GOMARD, C. K., AND SESTOFT, P. 1993. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Englewood Cliffs, NJ.
JONES, N. D. AND NIELSON, F. 1995. Abstract interpretation: A semantic-based tool for program analysis. In Handbook of Logic in Computer Science, Vol. 4, S. Abramsky, D. Gabbay, and T. Maibaum, Eds. Clarendon Press, New York.
MILNER, R., TOFTE, M., AND HARPER, R. 1990. The Definition of Standard ML. MIT Press, Cambridge, MA.
NIELSON, F. AND NIELSON, H. R. 1992. Semantics with Applications, a Formal Introduction. Wiley, New York.
TENNENT, R. D. 1991. Semantics of Programming Languages. Prentice-Hall International, Englewood Cliffs, NJ.
|Printer friendly Cite/link Email Feedback|
|Title Annotation:||ACM 50th Anniversary Issue: Strategic Directions in Computing Research|
|Author:||Hankin, Chris; Nielson, Hanne Riis; Palsberg, Jens|
|Publication:||ACM Computing Surveys|
|Date:||Dec 1, 1996|
|Next Article:||Strategic directions in computer architecture.|