Java research directions
Work on Java security:
Java is the programming language of our times ... net-aware,
object-oriented, buzzword-compliant, hype-laden.
Where did the language come from?
Oak project (1991?), at FirstPerson Inc, led by James Gosling ---
focused on programming of embedded systems (controllers for
desktop sets, coffee-makers...). Started system design in C++ ---
turned out to be too cumbersome for their purposes.
Garbage-collection was a serious issue. Focused on defining a
``clean'' subset of C++ ... ``C++ without guns, knives and
clubs'' (Gosling), with significant influence from Objective C
(interfaces).
The language design has almost no new language features --- most of
the ideas have been introduced in other languages before. For
instance, the notion of a Virtual Machine (VM) has been around at
least since the popular P-code machine developed by Niklaus Wirth and
his collaborators in the 70s as a target for a portable Pascal
compiler.
Java's contribution to programming language design lies in its
simplification of the C++ computation model, and in its introduction of
type-safe, arbitrarily user-extensible dynamic linking, loading and
verification.
Where is the language today?
So what are the core features of the language? It is strongly
typed, garbage-collected, VM based, dynamically linked. The
latter makes possible the notion of code mobility over the net,
and ``sandbox''-security. The object model is simple and
elegant, with single-inheritance of signatures and
implementations, but multiple inheritance of interfaces. The VM is
carefully designed to allow considerable compile-time
information to be carried over and made available at link-time. Thus
a number of transformations (e.g. adding new private members,
reimplementing methods without changing the public interface) can
be performed on source code without affecting the linkability
against pre-existing, pre-compiled code. This kind of binary
compatibility is extremely attractive, though it has not yet been
pushed as far as it could be.
Of particular note in the Java design is the provision of
class loaders. A class loader is, essentially, user-defined
code that is called from the heart of the Java implementation, and
is used to resolve a class name into binary code to be loaded by
the Java Virtual Machine. The current design of Java suffers from
a severe bug: people can write class loaders that compromise
the type-security of the language.
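As a rough illustration of what a class loader does, the sketch below (written in present-day Java, not the Java of these notes) defines a loader that resolves a class name to bytes itself and hands them to the VM. The names LoaderSketch, Payload and BytesLoader are hypothetical, and the sketch assumes the compiled .class files can be read as resources from the application class path. It shows only that the same bytes loaded by two loaders yield two distinct classes with the same name, which is why type-safety depends on how loaders behave. It does not reproduce the bug mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class LoaderSketch {
    /** Trivial class whose bytes are loaded a second time below. */
    public static class Payload { }

    /** Hypothetical loader: turns a class name into bytes and defines the class itself. */
    static class BytesLoader extends ClassLoader {
        private final ClassLoader source; // where the raw .class bytes are read from

        BytesLoader(ClassLoader source) {
            super((ClassLoader) null);    // no parent: only bootstrap classes are delegated
            this.source = source;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                // Hand the binary code to the VM under the requested name.
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if the reloaded class shares its name with Payload but is a different type. */
    static boolean sameNameDistinctClass() {
        try {
            ClassLoader app = LoaderSketch.class.getClassLoader();
            Class<?> original = Payload.class;
            Class<?> reloaded = new BytesLoader(app).loadClass(original.getName());
            return original.getName().equals(reloaded.getName())
                    && original != reloaded
                    && !reloaded.isInstance(new Payload());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameNameDistinctClass());
    }
}
```

The VM identifies a class by the pair (name, defining loader), so an instance of the original Payload is not an instance of the reloaded one.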
Among the widespread commercially available languages, Java is
the best basis for large-scale distributed computing. This is
due in part to recent extensive work (released in Java 1.1) on
extending Application Programmer Interfaces (APIs) to support
- Reflection (run-time determination of classes, their members,
and their inheritance structure),
- Remote Method Invocation, RMI (which allows a Java program running
on one virtual machine to invoke a method on a Java object
running in another VM, possibly causing classes to be
dynamically loaded over the net on the second VM),
- Object Serializability (which allows for the state of a network
of objects, and not just classes, to be streamed out, so that it
can be stored in files, for instance).
In addition an object component model (JavaBeans) is being
developed on top of these facilities to allow interoperable
plug-and-play components to be developed.
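Two of the facilities listed above, reflection and object serialization, can be sketched in a few lines. This is a minimal example in present-day Java syntax, not code from these notes. The class and method names (ReflectSerializeSketch, Node, declaredFieldNames, roundTrip) are hypothetical. RMI is left out because it needs a second VM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReflectSerializeSketch {
    /** A tiny linked node; two of these can form a cyclic object graph. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String label;
        Node next;
        Node(String label) { this.label = label; }
    }

    /** Reflection: discover a class's declared fields at run time (sorted for stable output). */
    static List<String> declaredFieldNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Serialization: stream a graph of objects out to bytes and read it back in. */
    static Node roundTrip(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if a two-node cycle comes back as a fresh copy with the same shape. */
    static boolean cycleSurvivesRoundTrip() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        Node copy = roundTrip(a);
        return copy != a
                && copy.label.equals("a")
                && copy.next.label.equals("b")
                && copy.next.next == copy;
    }

    public static void main(String[] args) {
        System.out.println(declaredFieldNames(Node.class));
        System.out.println(cycleSurvivesRoundTrip());
    }
}
```

The byte array stands in for a file or a network connection. The point is that the state of the whole network of objects, including the cycle, is streamed, not just one object.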
Work to be done ...
Weaknesses ... and therefore opportunities for research.
- No support for changeability, and hence for persistence. No support for
reloading a new version of a class, and adjusting existing
instances, or changing the class of instances at run-time.
- No support for resource (time, space) encapsulation or usage
monitoring, as e.g. in the KeyKOS nano-kernel. (Crucial in
network spaces to prevent denial-of-service attacks.)
- Typing model can be extended; types are not objects.
- Security model is ad-hoc. No notion of capability-based
security. Any class can be named in code.
- Byte-code verification is given by what may best be called a
murky box -- not white, not black, some muddy
in-between. I have taken a first-principles
approach in specifying this problem, and show that in fact
it can be solved fairly straightforwardly by translating a Java
class file C into a concurrent constraint program
T(C), such that C can be executed
safely in a Java Virtual Machine if T(C) does not
deadlock or yield false when executed at link-time.
- Java and the Java Virtual Machine do not have a formal
semantic model. Of course, neither do C or C++, or any other
widely used language. What is remarkable is that for Java such a
model is in fact possible, in the style of structural operational
semantics now very familiar from work in semantics.
We are working on such a formal description, and expect that many
other research groups are too.
- The Java computation model is already somewhat
complex. It is not geared towards fine-grained distributed
programming (static variables, for instance, have meaning only
in one VM). Code visibility issues (private, protected, etc.
methods) are somewhat cumbersome, inherited from C++. Nested
classes were added post hoc, resulting in some syntactic
inelegance in the language.
- Component model still being developed.
- Reflection model is weak. For instance, the state of the VM is
not made available at runtime. There is no runtime representation of
threads.
- Political problems. Sun is intent on transforming Java from a
good, clean, and simple programming language to a Windows-killer
computing platform. This may tarnish the attraction of Java for
some developers and researchers.
- [Distributed GC?]
Research areas
- Involvement in developing several APIs.
- Performance enhancement techniques.
Sun has bought HotSpot technology from the developers of the Self
compilers; this allows extensive run-time optimizations based
on run-time performance-monitoring. There is substantial work
ahead in exploiting all the extra information Java compilers keep
in compiled classes to improve run-time performance.
- Critical study of the Java security model.
This is a really important area of work. The Java security model
has not been subject to extensive peer review. In addition the
implementation is far from straightforward, and the language has
now developed several interlocking features. A full analysis of
the language from the viewpoint of security is sorely needed.