Java research directions

Work on Java security:

Java is not type-safe

Java is the programming language of our times ... net-aware, object-oriented, buzzword-compliant, hype-laden.

Where did the language come from?

The Oak project (1991?) at FirstPerson Inc., led by James Gosling, focused on programming embedded systems (controllers for desktop sets, coffee-makers ...). System design started in C++, which turned out to be too cumbersome for the team's purposes; garbage collection was a serious issue. The team then focused on defining a ``clean'' subset of C++, ``C++ without guns, knives and clubs'' (Gosling), with significant influence from Objective-C (interfaces).

The language design has almost no new features; most of its ideas had been introduced in other languages before. For instance, the notion of a virtual machine (VM) has been around at least since the popular P-code machine, developed by Niklaus Wirth and his collaborators in the 1970s as the target of a portable Pascal compiler.

Java's contribution to programming language design lies in its simplification of the C++ computation model and in its introduction of type-safe, arbitrarily user-extensible dynamic linking, loading and verification.
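User-extensible loading is exposed to programs through subclasses of `java.lang.ClassLoader`. The sketch below is a minimal illustration, not code from any particular system: a hypothetical `LoggingClassLoader` records every class name the JVM asks it to resolve and then falls back on the standard parent-first delegation, so the mapping from a class name to loaded code passes through ordinary user code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: records each class name it is asked to resolve,
// then falls back on the standard parent-first delegation.
class LoggingClassLoader extends ClassLoader {
    final List<String> requested = new ArrayList<>();

    LoggingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        requested.add(name);                    // user code sees the name first
        return super.loadClass(name, resolve);  // then delegates to the parent
    }
}

class LoaderDemo {
    public static void main(String[] args) throws Exception {
        LoggingClassLoader loader =
                new LoggingClassLoader(LoaderDemo.class.getClassLoader());
        // Ask the JVM to resolve a class name through the custom loader.
        Class<?> c = Class.forName("java.util.ArrayList", true, loader);
        System.out.println("requested: " + loader.requested);
        // Delegation returned the ordinary ArrayList class, so this prints true.
        System.out.println(c == ArrayList.class);
    }
}
```

Because the loader is ordinary user code, it could just as well define and return a class of its own for a requested name instead of delegating; that freedom is what makes class loaders delicate from a type-safety standpoint.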

Where is the language today?

So what are the core features of the language? It is strongly typed, garbage-collected, VM-based and dynamically linked. Dynamic linking is what makes possible code mobility over the net and ``sandbox'' security. The object model is simple and elegant, with single inheritance of signatures and implementations but multiple inheritance of interfaces.

The VM is carefully designed so that considerable compile-time information is carried over and is available at link time. Thus a number of transformations (e.g. adding new private members, or reimplementing methods without changing the public interface) can be performed on source code without breaking linkability against pre-existing, pre-compiled code. This kind of binary compatibility is extremely attractive, though it has not been pushed as far as it could be.

Of particular note in the Java design is the provision of class loaders. A class loader is, essentially, user-defined code that is called from the heart of the Java implementation to resolve a class name into binary code to be loaded by the Java Virtual Machine. The current design of Java suffers from a severe bug: people can write class loaders that compromise the type safety of the language.

Among the widespread, commercially available languages, Java is the best basis for large-scale distributed computing. This is due in part to recent extensive work (released in Java 1.1) on extending the Application Programming Interfaces (APIs) to support

Reflection (run-time determination of classes, their members, and their inheritance structure),
Remote Method Invocation (RMI), which allows a Java program running on one virtual machine to invoke a method on a Java object residing in another VM (possibly causing classes to be dynamically loaded over the net by the second VM),
Object Serialization (which allows the state of a network of objects, and not just of classes, to be streamed out so that it can, for instance, be stored in files).
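Two of the APIs listed above, Reflection and Serialization, can be exercised with a few lines of standard `java.lang.reflect` and `java.io` code. The sketch below is illustrative only; `Node` and `ApiDemo` are hypothetical names. It lists a class's superclass and declared methods at run time, then serializes a two-node cycle and reads it back, showing that a whole network of objects, shared references included, survives the round trip. RMI is left out because a meaningful demonstration needs a second VM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical node in a small object network, used only for illustration.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    String label;
    Node next;

    Node(String label) { this.label = label; }

    String getLabel() { return label; }
}

class ApiDemo {
    // Reflection: find a class's superclass and declared method names at run time.
    static String describe(Class<?> c) {
        Class<?> sup = c.getSuperclass();
        String[] names = Arrays.stream(c.getDeclaredMethods())
                .map(Method::getName).sorted().toArray(String[]::new);
        return c.getName() + " extends " + (sup == null ? "nothing" : sup.getName())
                + " " + Arrays.toString(names);
    }

    // Serialization: write an entire object graph to bytes and read it back.
    static Object roundTrip(Object root) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;                      // a cycle: a network, not just a tree
        System.out.println(describe(Node.class));
        Node copy = (Node) roundTrip(a);
        System.out.println(copy.getLabel() + " -> " + copy.next.getLabel());
        System.out.println(copy.next.next == copy); // prints true: the cycle is preserved
    }
}
```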

In addition, an object component model (JavaBeans) is being built on top of these facilities so that interoperable plug-and-play components can be developed.

Work to be done ...

Weaknesses ... and therefore opportunities for research.

No support for changeability, and hence for persistence. There is no support for reloading a new version of a class and adjusting existing instances, or for changing the class of instances at run time.
No support for resource (time, space) encapsulation or usage monitoring, as in, e.g., the KeyKOS nanokernel. (This is crucial in networked settings to prevent denial-of-service attacks.)
The typing model could be extended; for instance, types are not objects.
The security model is ad hoc. There is no notion of capability-based security: any class can be named in code.
Byte-code verification is specified by what may best be called a murky box: not white, not black, but some muddy in-between. I have taken a first-principles approach to specifying this problem, and have shown that it can in fact be solved fairly straightforwardly by translating a Java class file C into a concurrent constraint program T(C) such that C can be executed safely in a Java Virtual Machine if T(C) neither deadlocks nor yields false when executed at link time.
Java and the Java Virtual Machine do not have a formal semantic model. Of course, neither do C, C++, or any other widely used language. What is remarkable is that for Java such a model is in fact feasible, in the style of the structural operational semantics now very familiar from work in semantics. We are working on such a formal description, and expect that many other research groups are too.
The Java computation model is already somewhat complex. It is not geared towards fine-grained distributed programming (static variables, for instance, have meaning only within one VM). The visibility rules for methods (private, protected, etc.), inherited from C++, are somewhat cumbersome. Nested classes were added post hoc, resulting in some syntactic inelegance in the language.
Component model still being developed.
The reflection model is weak. For instance, the state of the VM is not made available at run time, and there is no run-time representation of threads.
Political problems. Sun is intent on transforming Java from a good, clean and simple programming language into a Windows-killer computing platform. This may tarnish the attraction of Java for some developers and researchers.
[Distributed GC?]
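The constraint-based translation T(C) mentioned in the list above is not reproduced here. As a much simpler illustration of the kind of property a bytecode verifier checks, the following sketch abstractly interprets straight-line code for a hypothetical five-instruction stack machine (`Op`, `Ty` and `ToyVerifier` are invented for this example and are not JVM opcodes or APIs). It tracks the type, not the value, of each stack slot and rejects stack underflow and type mismatches. A real JVM verifier must additionally merge type states at branch targets and check local variables, object initialization and subroutines.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical toy instruction set; these are not real JVM opcodes.
enum Op { ICONST, ACONST_NULL, IADD, POP, IRETURN }

// Abstract values: the verifier tracks types, never concrete values.
enum Ty { INT, REF }

class ToyVerifier {
    // Abstractly interprets straight-line code over a stack of types.
    // Returns empty if the code is type-safe, else the first violation found.
    static Optional<String> verify(List<Op> code) {
        Deque<Ty> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.size(); pc++) {
            switch (code.get(pc)) {
                case ICONST:
                    stack.push(Ty.INT);
                    break;
                case ACONST_NULL:
                    stack.push(Ty.REF);
                    break;
                case IADD:
                    // Both operands must be ints; the result is an int.
                    if (!popExpect(stack, Ty.INT) || !popExpect(stack, Ty.INT)) {
                        return Optional.of("pc " + pc + ": IADD needs two ints");
                    }
                    stack.push(Ty.INT);
                    break;
                case POP:
                    if (stack.isEmpty()) {
                        return Optional.of("pc " + pc + ": stack underflow");
                    }
                    stack.pop();
                    break;
                case IRETURN:
                    if (!popExpect(stack, Ty.INT)) {
                        return Optional.of("pc " + pc + ": IRETURN needs an int");
                    }
                    return Optional.empty(); // control leaves the method here
            }
        }
        return Optional.of("control falls off the end of the code");
    }

    // Pops the top type if it matches the expected one; reports success.
    private static boolean popExpect(Deque<Ty> stack, Ty expected) {
        if (stack.isEmpty() || stack.peek() != expected) {
            return false;
        }
        stack.pop();
        return true;
    }

    public static void main(String[] args) {
        // 1 + 2, then return: accepted, prints Optional.empty
        System.out.println(verify(Arrays.asList(Op.ICONST, Op.ICONST, Op.IADD, Op.IRETURN)));
        // null + 2: rejected, since IADD finds a reference where an int is required
        System.out.println(verify(Arrays.asList(Op.ACONST_NULL, Op.ICONST, Op.IADD, Op.IRETURN)));
    }
}
```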

Research areas

Involvement in developing several APIs.
Performance-enhancement techniques. Sun has bought HotSpot technology from the developers of the Self compilers; it allows extensive run-time optimizations based on run-time performance monitoring. There is substantial work ahead in exploiting all the extra information that Java compilers keep in compiled classes to improve run-time performance.
Critical study of the Java security model. This is a really important area of work. The Java security model has not been subjected to extensive peer review. In addition, the implementation is far from straightforward, and the language has now developed several interlocking features. A full analysis of the language from the viewpoint of security is sorely needed.