Emory Utilities

Collection of Java 1.4-compatible utility classes for advanced systems and applications.

Overview

This package contains Java utility classes covering diverse programming areas, including: specialized and primitive collections, networking, stream-based I/O, class and resource loading, JAR files and resources, concurrent processing, distributed processing, security, XML parsing, handling of configuration files, Swing GUI, and memory management. Details can be found below. Some of these classes are relatively simple, whereas others are substantial; some introduce new APIs and functionality, while others work around problems found in the standard Java APIs.

These classes were originally developed to fulfill the needs of specialized projects by DCL, such as RMIX and H2O. Over time, they grew to become a separate, substantial, organized package. Importantly, because of their origin, these utilities solve actual programming problems that we have encountered along the way in our projects.

The Emory Utilities package includes backport-util-concurrent.

Acknowledgements

This package is an open-source research project of the Distributed Computing Laboratory, Dept. of Math and Computer Science, Emory University. The research is supported in part by U.S. DoE grant DE-FG02-02ER25537 and NSF grant ACI-0220183.

License

This software is released to the public domain. It can be used for any purpose, modified, and redistributed without acknowledgment. No warranty is provided, either express or implied.

Features

The following sections contain highlights of the most interesting classes you can find in this package:

Disclaimer: many, if not most, of these utility classes are used heavily in our projects and, as a result, are quite well tested. However, some classes (especially new ones) may have received less careful review. Some classes have very stable APIs that we are very confident about; others may be new and experimental, and may change (possibly breaking compatibility) from release to release. As a rule of thumb, the more complete and detailed the javadoc for a given class is, the more confidence we have in its reliability and stability.

We release this software AS IS, hoping that it will be useful. While we make efforts to make it so, we do not provide warranty of any kind. Please see the license for more details.

Specialized collections

RadkeHashMap and RadkeHashSet implement a hash map and a hash set based on Radke quadratic residue probing. The approach eliminates per-entry memory allocation: all data is kept in a few large arrays that are expanded as needed. They match the performance of the standard hash map and hash set while having a smaller memory footprint and lower garbage collection overhead.
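To illustrate the probing idea, here is a minimal sketch of a fixed-capacity set of non-negative ints. It is not the RadkeHashSet implementation; the class name RadkeProbeSketch and its methods are hypothetical. It assumes the caller passes a table size p that is prime with p % 4 == 3, in which case the sequence h, h+1, h-1, h+4, h-4, ... (mod p) visits every slot exactly once.

```java
import java.util.Arrays;

// Illustrative only: fixed capacity, no removal, no resizing.
final class RadkeProbeSketch {
    private static final int EMPTY = -1;   // sentinel, so keys must be >= 0
    private final int[] slots;             // one flat array, no per-entry objects
    private final int p;                   // table size; assumed prime, p % 4 == 3

    RadkeProbeSketch(int primeCapacity) {
        if (primeCapacity % 4 != 3) {
            throw new IllegalArgumentException("capacity must satisfy p % 4 == 3");
        }
        p = primeCapacity;                 // primality is the caller's responsibility
        slots = new int[p];
        Arrays.fill(slots, EMPTY);
    }

    // Slot visited at probe step k: h, h+1^2, h-1^2, h+2^2, h-2^2, ...
    static int probe(int h, int k, int p) {
        if (k == 0) return h;
        long i = (k + 1) / 2;
        long sq = (i * i) % p;
        long idx = (k % 2 == 1) ? h + sq : h - sq;
        return (int) (((idx % p) + p) % p);
    }

    boolean add(int key) {
        if (key < 0) throw new IllegalArgumentException("negative key");
        int h = key % p;
        for (int k = 0; k < p; k++) {
            int s = probe(h, k, p);
            if (slots[s] == key) return false;                       // already present
            if (slots[s] == EMPTY) { slots[s] = key; return true; }
        }
        throw new IllegalStateException("table full");
    }

    boolean contains(int key) {
        if (key < 0) return false;
        int h = key % p;
        for (int k = 0; k < p; k++) {
            int s = probe(h, k, p);
            if (slots[s] == key) return true;
            if (slots[s] == EMPTY) return false;                     // end of probe chain
        }
        return false;
    }
}
```

A real implementation would also grow the arrays and handle removals; the sketch only shows why no per-entry allocation is needed.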

WeakValueHashMap is a companion to the standard java.util.WeakHashMap that keeps the values (rather than keys) as weak references. Thus, when a value is no longer in ordinary use, it is subject to garbage collection and all entries referring to that value are removed from the map. The class may be indispensable in implementing certain types of caches.
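The weak-value idea can be sketched with the standard java.lang.ref classes. The code below is an assumption-laden illustration, not the WeakValueHashMap source: the class WeakValueCacheSketch is hypothetical and supports only put, get, and size.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class WeakValueCacheSketch<K, V> {
    // A weak reference that remembers its key, so a stale entry can be located.
    private static final class Ref<K, V> extends WeakReference<V> {
        final K key;
        Ref(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Ref<K, V>> map = new HashMap<K, Ref<K, V>>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<V>();

    synchronized void put(K key, V value) {
        expunge();
        map.put(key, new Ref<K, V>(key, value, queue));
    }

    // Returns null if the key is absent or its value has been collected.
    synchronized V get(K key) {
        expunge();
        Ref<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    synchronized int size() {
        expunge();
        return map.size();
    }

    // Remove entries whose values the garbage collector has already cleared.
    @SuppressWarnings("unchecked")
    private void expunge() {
        Ref<K, V> ref;
        while ((ref = (Ref<K, V>) queue.poll()) != null) {
            if (map.get(ref.key) == ref) {   // skip if the key was re-mapped since
                map.remove(ref.key);
            }
        }
    }
}
```

As long as some other part of the program holds a strong reference to a value, get returns it; once the value becomes unreachable, the entry disappears on a later call.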

Primitive collections

Emory Utilities include a primitive collections package, which is small, but consistent, very efficient, and offers some unique features. The package supports collections, lists, and sets of primitive types (currently int, short, and long), as well as maps with primitive keys and object values.

<Type>RadkeHashMap maps primitive keys to objects. It saves considerably on memory allocation costs (no per-entry allocation is performed upon insert), resulting in superior performance. The class may be useful in implementing "sparse arrays", i.e. large arrays with most entries being null.

<Type>RadkeHashSet is a companion hash set of primitive types, which similarly exhibits very high performance.

<Type>ArrayList provides a fast, growable primitive type array.

<Type>IntervalSet is a sorted set of integers, providing functionality similar to SortedSet and NavigableSet in the standard Java collections. However, as an integer set, this class possesses some interesting and unique characteristics. First, since it internally stores elements as a sorted sequence of intervals, it can sustain huge element counts (billions and more) with very low memory overhead, as long as the elements are clustered. This makes it particularly suitable for collision-detection arrays and sequence generators with entry reclaiming (think, for example, of the assignment of process IDs). Further, it provides a "complement set" view, which makes it possible, for example, to query for the smallest integer not contained in the set. Finally, in addition to standard iterators, it supports interval iterators, which quickly sweep through the consecutive intervals contained in the set.
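The interval representation and the "smallest integer not in the set" query can be sketched on top of java.util.TreeMap. This is a simplified illustration under our own assumptions, not the IntervalSet implementation; the class IntervalSetSketch and its method names are hypothetical, and removal is omitted.

```java
import java.util.Map;
import java.util.TreeMap;

final class IntervalSetSketch {
    // Maps interval start to inclusive end; intervals are disjoint and non-adjacent.
    private final TreeMap<Integer, Integer> intervals = new TreeMap<Integer, Integer>();

    boolean contains(int x) {
        Map.Entry<Integer, Integer> e = intervals.floorEntry(x);
        return e != null && e.getValue() >= x;
    }

    void add(int x) {
        if (contains(x)) return;
        int lo = x, hi = x;
        // Merge with a left neighbour that ends at x - 1.
        Map.Entry<Integer, Integer> left = intervals.floorEntry(x);
        if (left != null && x != Integer.MIN_VALUE && left.getValue() == x - 1) {
            lo = left.getKey();
            intervals.remove(lo);
        }
        // Merge with a right neighbour that starts at x + 1.
        if (x != Integer.MAX_VALUE) {
            Integer rightEnd = intervals.remove(x + 1);
            if (rightEnd != null) hi = rightEnd;
        }
        intervals.put(lo, hi);
    }

    // Complement-style query: smallest integer >= from that is NOT in the set.
    int firstMissing(int from) {
        Map.Entry<Integer, Integer> e = intervals.floorEntry(from);
        if (e == null || e.getValue() < from) return from;
        if (e.getValue() == Integer.MAX_VALUE) {
            throw new IllegalStateException("no missing value at or above " + from);
        }
        return e.getValue() + 1;   // intervals are non-adjacent, so this is free
    }

    int intervalCount() {
        return intervals.size();
    }
}
```

An ID allocator could call firstMissing(1) to pick the next free ID and add it; a million consecutive IDs would still occupy a single map entry.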

Macro expansion

Macro expansion utilities are particularly useful in handling configuration files and policies. They take a string and resolve (possibly recursive) macro occurrences within that string against specified macro templates. Macro templates are fully customizable; the default set includes a template that resolves system property names into their values. Thus, "${user.dir}" will be replaced by the value of the system property "user.dir", and "${/}" will be replaced by File.separator. As another example, "$trim{text}" uses a macro template named "trim" to operate on text; the macro simply trims the specified string.
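The property-resolving behaviour described above can be approximated in a few lines of standard Java. The sketch below is not the package's macro API: the class MacroSketch is hypothetical, it handles only the "${name}" and "${/}" forms (no named templates such as "$trim{...}"), leaves unknown macros untouched, and caps recursion at an arbitrary depth of 16.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MacroSketch {
    // Matches ${name}, where name contains no braces.
    private static final Pattern MACRO = Pattern.compile("\\$\\{([^{}]+)\\}");

    static String expand(String input) {
        return expand(input, 0);
    }

    private static String expand(String input, int depth) {
        if (depth > 16) {
            throw new IllegalArgumentException("macro recursion too deep");
        }
        Matcher m = MACRO.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = "/".equals(name) ? File.separator : System.getProperty(name);
            // Unknown macros are left as-is; known values are expanded recursively.
            String replacement = (value == null) ? m.group() : expand(value, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the property sketch.dir set to "data", the string "${sketch.dir}${/}cfg" expands to "data" followed by the platform file separator and "cfg".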

Networking

Connection pool manages a pool of socket connections to a single network endpoint. Pooling enables reusing connections for multiple, unrelated data transfers, and it can be used to implement certain connection-based protocols like HTTP 1.1. Additionally, pooling can aid in controlling network load: limiting the maximum pool size causes excess connection requests to be enqueued at the client side.

Socket wrappers make it possible to add functionality on top of existing network connections, so that the decorator is still perceived as a socket. This is very useful when it is impossible to write the decorator as a subclass, for instance when the base socket is created by an independent library. A practical application is supplied by compressing wrappers, which enable compression on top of existing socket connections and can readily serve as an RMI transport. In addition to plain wrappers, wrappers for SSL sockets are also provided.

In-process sockets provide a socket API on top of shared memory. They can connect to each other only within a process. They can be used to create local in-process bindings within APIs that assume remote access. For instance, when used as an RMI transport, in-process sockets can interconnect local objects while maintaining remote invocation semantics (pass-by-value etc.), avoiding the security risks associated with network sockets, and offering somewhat better performance than a loopback network interface.

Tunneled sockets enable a single server socket to act as a tunnel and a dispatcher for connection requests addressed to multiple virtual server sockets. The tunneling is firewall-friendly (it uses a single server port, usually predefined, which thus can be enabled in the firewall) and transparent to the application layer.

Stream-based I/O

Base64 encoder and decoder convert bytes to and from the base64 format. The classes can work both as filtering streams and offline (converting strings to byte arrays and vice versa).

Buffered pipe is an in-memory pipe that links an output stream with an input stream via a dynamically sized memory buffer. Such pipes are useful for decoupling data-producing threads from data-consuming threads in a way that keeps the memory footprint controlled and small.

Various ways of redirecting and splitting streams are handled by ForkOutputStream, RedirectingInputStream, RedirectingReader, and TeeInputStream classes. At the same time, NullInputStream and NullOutputStream provide stream "terminators", similar to /dev/null: the former is always at EOF, and the latter discards all the data written. They are useful to specify "no data" or to discard an output when working with stream-oriented APIs.
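To make the "fork" and "terminator" ideas concrete, here is a small sketch written against java.io only. The nested classes Fork and NullSink are hypothetical stand-ins and are not the package's ForkOutputStream or NullOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class StreamSketch {
    /** Discards everything written to it, like /dev/null. */
    static final class NullSink extends OutputStream {
        public void write(int b) { }
        public void write(byte[] buf, int off, int len) { }
    }

    /** Copies every write to two underlying streams. */
    static final class Fork extends OutputStream {
        private final OutputStream first;
        private final OutputStream second;

        Fork(OutputStream first, OutputStream second) {
            this.first = first;
            this.second = second;
        }

        public void write(int b) throws IOException {
            first.write(b);
            second.write(b);
        }

        public void write(byte[] buf, int off, int len) throws IOException {
            first.write(buf, off, len);
            second.write(buf, off, len);
        }

        public void flush() throws IOException {
            first.flush();
            second.flush();
        }

        public void close() throws IOException {
            try { first.close(); } finally { second.close(); }
        }
    }

    // Writes "hello" to a fork of a byte buffer and a null sink; returns the buffer.
    static String demo() {
        try {
            ByteArrayOutputStream kept = new ByteArrayOutputStream();
            OutputStream out = new Fork(kept, new NullSink());
            out.write("hello".getBytes("US-ASCII"));
            out.close();
            return kept.toString("US-ASCII");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a null sink where an API insists on an output stream is a simple way to say "discard this output" without special-casing null.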

CompressedOutputStream and CompressedInputStream provide data compression capabilities while maintaining strong flush semantics, i.e. flushing causes all buffered data to be written out. Because of this feature, these classes can be used for request-response-based applications, e.g. as RMI or RPC transport. Note that standard ZipOutputStream and GZIPOutputStream from java.util.zip package lack this feature, and thus cannot be used as an RMI transport.
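The difference between weak and strong flush semantics can be observed with the JDK itself. Since Java 7 (that is, later than the Java 1.4 platform this package targets), GZIPOutputStream accepts an optional syncFlush constructor argument; the sketch below uses it only to show what "flushing causes all buffered data to be written out" means. The class FlushSketch is hypothetical and says nothing about how CompressedOutputStream is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

final class FlushSketch {
    // Number of compressed bytes that reach the sink after write() + flush(),
    // not counting the GZIP header and before the stream is closed.
    static int bytesAfterFlush(boolean syncFlush) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(sink, syncFlush);
            int header = sink.size();                  // header is written on construction
            gz.write("request payload".getBytes("UTF-8"));
            gz.flush();
            int delivered = sink.size() - header;
            gz.close();
            return delivered;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without syncFlush, a small payload typically stays inside the compressor after flush(), so a peer waiting for the request would block; with syncFlush, the pending data is pushed out, which is the property a request-response transport needs.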

Class and resource loading, and JAR handling

A set of classes that support dynamic class and resource loading and simplify the development of custom class loaders. The design separates class loader functionality from (1) the policy defining where to find resources and (2) the mechanics of actually downloading and caching resources from network URLs.

URIClassLoader is a clean-room network class loader implementation that is virtually equivalent to java.net.URLClassLoader, but without bugs related to ill-formed URLs and with a customizable JAR caching policy. The standard URLClassLoader accepts URLs containing spaces and other characters that are forbidden in the URI syntax according to RFC 2396. As a workaround, Java escapes and un-escapes URLs in various arbitrary places; however, this is inconsistent and leads to numerous problems with URLs referring to local files with spaces in the path. Sun acknowledges the problem but refuses to modify the behavior for compatibility reasons; see Java Bug Parade 4273532, 4466485.

Additionally, the JAR caching policy used by URLClassLoader is system-wide and inflexible: once downloaded, JAR files are never re-downloaded, even if one creates a fresh instance of the class loader that happens to have the same URL in its search path. In fact, that policy is a security vulnerability: it is possible to crash any URL class loader, thus affecting a potentially separate part of the system, by creating a URL connection to one of the URLs of that class loader's search path and closing the associated JAR file. See Java Bug Parade 4405789, 4388666, 4639900.

This class avoids these problems by (1) using URIs instead of URLs for the search path (thus enforcing strict syntax conformance and defining precise escaping semantics), and (2) using a custom URLStreamHandler which ensures a per-classloader JAR caching policy.
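The strictness of java.net.URI that this design relies on is easy to observe with the standard library alone. The helper class UriSketch below is hypothetical and is not part of the package; it only shows that an unescaped space is rejected by URI, that the escaped form decodes to a well-defined path, and that File.toURI() produces the escaped form.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class UriSketch {
    // True if the string is accepted by java.net.URI's strict parser.
    static boolean isStrictUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Decoded path component of a syntactically valid URI string.
    static String decodedPath(String s) {
        return URI.create(s).getPath();
    }

    // Escaped URI form of a local file path (resolved against the working directory).
    static String fileUri(String path) {
        return new File(path).toURI().toString();
    }
}
```

For example, "file:/tmp/my lib/a.jar" is rejected, whereas "file:/tmp/my%20lib/a.jar" is accepted and its decoded path is "/tmp/my lib/a.jar".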

GenericClassLoader, base class of the URIClassLoader, enforces no constraints on the class and resource search algorithm and thus it can be used as a basis of custom class loaders that may load classes from sources different than the network, and/or using resolution approaches different than sequential search.

ResourceLoader is a lower-level utility class that locates a requested resource (specified via a relative path) given a set of base URLs. The class automatically handles JAR file caching, JAR class paths, and JAR indexes, performing a search similar to that of the standard Java URLClassLoader. In other words, this class is capable of scanning JAR dependencies and downloading JAR files as needed, while avoiding excessive downloads by using the JAR cache and examining JAR indexes.

The package contains an alternative implementation of JarURLStreamHandler that supports customizable JAR caching policies. It addresses bugs 4405789, 4388666, 4639900 in Java Bug Parade. Sun recommends disabling caches completely as a workaround for those bugs; however, this may significantly affect performance for resources downloaded from the network. This class is part of a solution that makes it possible to tailor the caching policy to the program's needs, with cache-per-classloader as the default policy. The class can be used as a system-wide JAR handler (by setting an appropriate stream factory), on individual URLs (using an overloaded constructor), with class loaders from this package (they use it by default), or with the standard URLClassLoader (using an overloaded constructor).

Concurrent processing

The Emory Utilities package contains backport-util-concurrent, a concurrency library of choice on Java platforms prior to 5.0. Additionally, certain extra concurrency utilities and extensions are provided (some of which are experimental), as discussed below.

DynamicArrayBlockingQueue is an unbounded blocking queue implementation based on an array.

Thread context and delegatable thread locals together make it possible to store, restore, and temporarily delegate thread state, including thread locals and context class loaders, to worker threads in thread pools.

The ExecutorUtils class provides new utility methods for handling secure thread pools. The ExecutorUtils.delegatedRunnable(runnable) method captures the caller's access control context and thread context, and returns a runnable wrapper which will run under that captured context (with restored delegatable thread locals and the context class loader) when later called, even if by a different thread. ExecutorUtils.safeThreadFactory() is similar to Executors.privilegedThreadFactory() from java.util.concurrent (or backport-util-concurrent) in that it captures its creator's access control context and context class loader and later creates all threads from within these contexts, but it additionally preserves delegatable thread locals and provides safeguards against abuse of thread priorities. (Note that the use of Executors.defaultThreadFactory() for secure thread pools should be discouraged, since it can cause non-uniformity among worker threads and lead to hard-to-track security bugs.)
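The capture-and-restore pattern behind a delegated runnable can be sketched for one piece of thread state, the context class loader. This is a deliberately reduced illustration: the class DelegationSketch and its methods are hypothetical, and capturing the access control context and delegatable thread locals, which ExecutorUtils.delegatedRunnable is described as doing, is omitted.

```java
final class DelegationSketch {
    // Wraps a task so that it runs with the context class loader of the thread
    // that called this method, whichever thread eventually executes it.
    static Runnable withCallerClassLoader(final Runnable task) {
        final ClassLoader captured = Thread.currentThread().getContextClassLoader();
        return new Runnable() {
            public void run() {
                Thread t = Thread.currentThread();
                ClassLoader saved = t.getContextClassLoader();
                t.setContextClassLoader(captured);
                try {
                    task.run();
                } finally {
                    t.setContextClassLoader(saved);   // leave the worker as we found it
                }
            }
        };
    }

    // Returns true if a task wrapped on the caller thread observes the caller's
    // loader while running on a separate worker thread.
    static boolean demo() {
        Thread caller = Thread.currentThread();
        ClassLoader original = caller.getContextClassLoader();
        final ClassLoader marker = new ClassLoader(original) { };   // distinct loader
        final ClassLoader[] seen = new ClassLoader[1];
        caller.setContextClassLoader(marker);
        try {
            Runnable wrapped = withCallerClassLoader(new Runnable() {
                public void run() {
                    seen[0] = Thread.currentThread().getContextClassLoader();
                }
            });
            caller.setContextClassLoader(original);   // worker will inherit 'original'
            Thread worker = new Thread(wrapped);
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            caller.interrupt();
            return false;
        } finally {
            caller.setContextClassLoader(original);
        }
        return seen[0] == marker;
    }
}
```

Restoring the previous loader in the finally block matters in a pool, because the same worker thread goes on to run tasks submitted by other callers.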

SecureThreadPoolExecutor is a thread pool executor that enforces strict context propagation from the submitter to the task, ensuring that tasks execute within the contexts of the clients who submit them. (In contrast, the standard ThreadPoolExecutor executes tasks within the security context of a worker thread, which is either (1) inherited from the thread factory creator if a privileged thread factory has been used, or (2) nondeterministic, that is, inherited from some previous caller, if the default thread factory has been used; see above.)

These extensions may be particularly useful in middleware systems, where it is not uncommon for separate executable actions to run with different access permissions, depending on the identity of the subject that requested them.

ReentrantNamedLock is a simple reentrant lock that can be shared between different parts of an application without sharing the lock object, by referring to a common name.
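The idea of sharing a lock by name rather than by object reference can be sketched with java.util.concurrent (available in Java 5 and later, or through backport-util-concurrent under a different package name). The class NamedLockSketch is hypothetical and is not the ReentrantNamedLock implementation; in particular, it never removes unused locks from its registry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

final class NamedLockSketch {
    // Registry of locks keyed by name; entries are never evicted in this sketch.
    private static final ConcurrentMap<String, ReentrantLock> LOCKS =
            new ConcurrentHashMap<String, ReentrantLock>();

    // Returns the single lock associated with the given name, creating it on demand.
    static ReentrantLock forName(String name) {
        ReentrantLock lock = LOCKS.get(name);
        if (lock == null) {
            ReentrantLock fresh = new ReentrantLock();
            lock = LOCKS.putIfAbsent(name, fresh);
            if (lock == null) {
                lock = fresh;          // we won the race to register this name
            }
        }
        return lock;
    }
}
```

Two unrelated components that both call NamedLockSketch.forName("config-file") contend on the same lock without ever exchanging an object reference.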

ThreadSerializingExecutor is an experimental executor wrapper which ensures that tasks submitted from any single thread are executed sequentially (not reordered and not running in parallel).

Distributed processing

Utility classes that extend traditional APIs to work across the network using RMI as a communication substrate.

Remote streams are networking byte streams that use RMI as the underlying transport layer. Since RMI itself can be enabled over various protocols and socket factories, these classes make it possible to tunnel byte streams through a variety of protocols. For example, if used with RMIX, it is possible to tunnel streams over SOAP/HTTP.

Remote process makes it possible to control native processes running on remote machines via an API analogous to java.lang.Process, using RMI as the interconnect.

Security

In a secure application, it is often necessary to read system properties using privileged blocks. Here you will find classes that allow you to do just that, without creating anonymous inner classes.

Also, the package contains utility classes to operate and manage keys, certificates, passwords, and digests.

XML parsing

Even though the emergence of XML Schema makes DTDs obsolete, and binding utilities such as JAXB reduce the need to use the DOM API explicitly, the applicability of these new technologies is diminished by the fact that they are not supported out of the box by J2SE 1.4. This package contains a set of utilities that simplify the use of DTD-validating parsers and DOM trees.
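For context, this is roughly the boilerplate that DTD validation with the plain JAXP API requires. The sketch uses only standard javax.xml.parsers and org.xml.sax classes; the class DtdSketch is hypothetical and does not show the package's own helpers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdSketch {
    // Parses the document with DTD validation enabled; returns false on any
    // well-formedness or validity error.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);               // validate against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Without an error handler, validity errors are merely reported
            // rather than thrown, so turn them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A document such as `<!DOCTYPE cfg [<!ELEMENT cfg (item*)><!ELEMENT item (#PCDATA)>]><cfg><item>a</item></cfg>` passes, while the same DTD with `<cfg><other/></cfg>` as the body is rejected.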

Swing GUI

One of the most interesting classes here is a detailed message box, that is, a message box with an expandable detail pane at the bottom of the window. The dialog lets its users show or hide the message details, which themselves may be represented by any Swing component.

Integration with native environment

ExecUtils class simplifies invocation of native shell processes and collecting their results. FilesystemUtils contains rudimentary tools to modify certain file attributes. PvmArch can detect the system architecture.

Integration with garbage collector

The edu.emory.mathcs.util.gc package provides utilities for "better finalization", that is, reclamation of resources associated with objects that have been garbage collected, and also reclamation of such resources at VM exit. The oldest way of implementing such reclamation is via the finalize() method and System.runFinalizersOnExit(). Those are problematic for many reasons: there is no control over the thread in which finalizers execute, object resurrection is possible, and finalization upon VM exit is inherently deadlock-prone and deprecated. The Emory APIs build upon java.lang.ref to provide a reliable finalization framework that addresses these shortcomings and gives users much more control over what happens and when. The framework is used in the H2O project to ensure that a client closes its remote sessions on remote machines when session objects are garbage collected or when its JVM exits.
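The general java.lang.ref technique that such a framework can build on is sketched below. It is an illustration under our own assumptions and not the edu.emory.mathcs.util.gc API: the class CleanupSketch and its methods are hypothetical. Cleanup actions are attached to phantom references; drain() runs the actions of owners that have been collected, and runRemaining() runs whatever is left, for example from a shutdown hook.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class CleanupSketch {
    // Phantom reference carrying the action to run once its owner is gone.
    private static final class Ref extends PhantomReference<Object> {
        final Runnable cleanup;
        Ref(Object owner, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(owner, queue);
            this.cleanup = cleanup;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // Keeps the Ref objects themselves reachable until their cleanup has run.
    private final Set<Ref> pending = Collections.synchronizedSet(new HashSet<Ref>());

    // The cleanup action must not reference the owner, or the owner never dies.
    void register(Object owner, Runnable cleanup) {
        pending.add(new Ref(owner, queue, cleanup));
    }

    // Runs cleanups for owners already collected; the caller chooses the thread.
    int drain() {
        int count = 0;
        Ref ref;
        while ((ref = (Ref) queue.poll()) != null) {
            if (pending.remove(ref)) {
                ref.cleanup.run();
                count++;
            }
        }
        return count;
    }

    // Runs every cleanup that has not run yet, e.g. from a shutdown hook.
    int runRemaining() {
        Ref[] rest;
        synchronized (pending) {
            rest = pending.toArray(new Ref[0]);
            pending.clear();
        }
        for (Ref ref : rest) {
            ref.cleanup.run();
        }
        return rest.length;
    }
}
```

Because the application decides which thread calls drain() and when runRemaining() is invoked, the thread-control and exit-time problems of finalize() do not arise, and a phantom reference never exposes its referent, so resurrection is impossible.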

Changelog

Version 2.1 (Jun 4, 2006) [CVS log]

Version 2.0_01 (Mar 20, 2006) [CVS log]

Version 2.0 (Jan 31, 2006) [CVS log]

Version 1.4 (Jul 17, 2004) [CVS log]