Group Members

Peter A. Buhr, Charlie Clarke, Gordon Cormack, Michael Godfrey, Ric Holt, Ondřej Lhoták, Steve MacDonald

Contact Person

Peter A. Buhr, 519-888-4567 ext. 34453, pabuhr@plg.uwaterloo.ca
URL: http://plg.uwaterloo.ca


Research Objective

The Programming Languages Group (PLG) conducts work in three applied research streams: programming languages, text retrieval, and software engineering. These streams are united by their underlying use of basic programming-language technologies: composition methodologies, scanning, parsing, symbol tables, type theory, code generation, and static/dynamic analysis and management.

Programming Languages

Peter Buhr's research concentrates on developing concurrent software in C++ for shared-memory multiprocessor computers running UNIX. Areas of research interest include: profiling, debugging and visualizing concurrent programs; advanced concurrent exception handling; and real-time specification and scheduling. The goal of this work is to provide a framework, and a set of tools within that framework, to help construct, understand and maintain robust real-time programs. Real-time programs form the core of many industry-critical applications, e.g., medical equipment, engine management, flight control, manufacturing and telephony. The number of real-time programs is increasing significantly as computers are embedded in more and more products. Unfortunately, real-time programs are extremely difficult to specify, code, debug and maintain, and therefore represent a major impediment to advances in these strategic and economically important areas. To ameliorate these difficulties, Buhr develops new programming-language facilities to simplify real-time programming and generalize real-time capabilities, along with tools for profiling, debugging and replaying concurrent real-time programs to help understand and control execution behaviour.
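
Buhr's facilities are C++ language extensions, but the flavour of the problem is language-independent. As a hedged illustration only (written in Java, and not Buhr's actual constructs), the following monitor-style bounded buffer shows the kind of synchronization abstraction a concurrent language must make safe and convenient to write:

    // Illustrative sketch only (Java, not Buhr's C++ extensions): a
    // monitor-style bounded buffer coordinating producers and consumers.
    public class BoundedBuffer<T> {
        private final Object[] slots;
        private int head = 0, tail = 0, count = 0;

        public BoundedBuffer(int capacity) { slots = new Object[capacity]; }

        // Block the producer while the buffer is full.
        public synchronized void insert(T item) throws InterruptedException {
            while (count == slots.length) wait();
            slots[tail] = item;
            tail = (tail + 1) % slots.length;
            count++;
            notifyAll();              // wake any waiting consumer
        }

        // Block the consumer while the buffer is empty.
        @SuppressWarnings("unchecked")
        public synchronized T remove() throws InterruptedException {
            while (count == 0) wait();
            T item = (T) slots[head];
            head = (head + 1) % slots.length;
            count--;
            notifyAll();              // wake any waiting producer
            return item;
        }
    }

In a language with built-in concurrency facilities, the explicit locking and signalling above would be implicit in the declaration of the type, which is one way language design can reduce concurrent-programming errors.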

Ondřej Lhoták's research concentrates on interprocedural static analysis of object-oriented and aspect-oriented languages. Program analysis has long been used to generate efficient code, and it is increasingly being used in software engineering tools. These applications require precise and efficient program analyses, and the increased modularity enabled by object-oriented and aspect-oriented languages makes interprocedural analysis necessary for precise results. Therefore, Lhoták is working on making precise interprocedural analyses efficient enough to be practical. Specifically, Lhoták and colleagues explored the use of binary decision diagrams (BDDs) to represent the large data structures that must be manipulated in interprocedural analyses; BDDs have been shown to compactly encode context-sensitive analysis facts. Currently, Lhoták is using his BDD-based analysis implementation to study the effects of context sensitivity on analysis precision, and to design efficient context-sensitive analyses. Lhoták and colleagues have applied program analyses both to generate efficient code (for Java and AspectJ) and to aid programmers through software development tools (in particular, the Eclipse IDE).
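
As a rough illustration of the relation such analyses compute (using explicit sets rather than BDDs, with hypothetical names; this is not Lhoták's implementation), consider a minimal flow-insensitive points-to analysis over allocations and copy assignments:

    import java.util.*;

    // Hedged sketch: a tiny points-to analysis over two statement kinds,
    // "a = new O" and "a = b". A BDD-based engine encodes the resulting
    // pointsTo relation as a boolean function, sharing structure across
    // contexts; the explicit sets here only show the relation computed.
    public class PointsTo {
        Map<String, Set<String>> pointsTo = new HashMap<>();
        List<String[]> copies = new ArrayList<>();  // {dst, src} for "dst = src"

        void alloc(String var, String obj) {
            pointsTo.computeIfAbsent(var, k -> new HashSet<>()).add(obj);
        }
        void copy(String dst, String src) { copies.add(new String[]{dst, src}); }

        // Propagate along copy edges until no set changes (a fixed point).
        void solve() {
            boolean changed = true;
            while (changed) {
                changed = false;
                for (String[] c : copies) {
                    Set<String> dst = pointsTo.computeIfAbsent(c[0], k -> new HashSet<>());
                    Set<String> src = pointsTo.getOrDefault(c[1], Set.of());
                    if (dst.addAll(src)) changed = true;
                }
            }
        }

        public static void main(String[] args) {
            PointsTo pt = new PointsTo();
            pt.alloc("a", "o1");   // a = new O1()
            pt.copy("b", "a");     // b = a
            pt.copy("c", "b");     // c = b
            pt.solve();
            System.out.println(pt.pointsTo);  // a, b and c each point to o1
        }
    }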

Steve MacDonald's research concentrates on controlling the execution of multithreaded programs using a combination of compiler analysis and aspect-oriented programming (see the sketch below). As well, the underlying hardware architecture is exploited in "design-pattern" environments, exploring the rapid development of concurrent applications in pattern-based systems, including difficult issues such as extensibility, flexibility, and support tools. Separately, extensions to the JVM are being examined to better support advanced execution features. Finally, a new project is under development to improve documentation generation for aspect-oriented programs, where the document generator weaves text from the aspects into the end product, as opposed to current approaches that simply reference separate documentation produced for the aspect.
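
A hedged sketch of the first idea, with hand-written synchronization standing in for code that analysis and aspect weaving would insert automatically: a semaphore forces one specific interleaving of two threads, making an otherwise nondeterministic execution repeatable:

    import java.util.concurrent.Semaphore;

    // Illustrative sketch: the kind of control code that could be woven into
    // a multithreaded program to force a particular interleaving, written by
    // hand here. Thread B cannot run its step until A's step completes.
    public class ForcedOrder {
        static int shared = 0;
        static final Semaphore aDone = new Semaphore(0);

        public static void main(String[] args) throws InterruptedException {
            Thread a = new Thread(() -> {
                shared = 1;          // step that must happen first
                aDone.release();     // "woven-in": signal A's step is done
            });
            Thread b = new Thread(() -> {
                try {
                    aDone.acquire(); // "woven-in": wait for A's step
                } catch (InterruptedException e) { return; }
                shared = shared * 2; // now guaranteed to see shared == 1
            });
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(shared); // always 2 under this forced order
        }
    }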

Text Retrieval

Charlie Clarke's research concentrates on information storage and retrieval, programming language implementation, and software tools. At present, Clarke is involved in a major project to develop and prototype scalable technologies for implementing digital library systems on parallel and distributed platforms. Issues addressed by the project include system performance, index and data compression, fast update, data distribution and load balancing; the goal is to provide scalable performance at low cost. A second project has the goal of exploiting large collections of Web data to perform fact-finding and question-answering tasks. Given a simple, factual question, a fast algorithm has been developed that finds short passages of text likely to contain the answer; by combining an analysis of several passages, a high-quality result can be obtained. The current focus is on improving both the speed and effectiveness of the basic technique (sketched below). Clarke is also involved in projects related to XML, program comprehension tools, and memory management.
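
The passage-finding idea can be sketched as follows (an illustrative simplification with hypothetical names, not the project's actual algorithm): find the shortest window of tokens containing all query terms, on the theory that short, complete passages are likely to contain the answer:

    import java.util.*;

    // Hedged sketch of the core idea: locate the shortest passage of a
    // tokenized document containing every query term, using a sliding window.
    public class ShortestPassage {
        static int[] shortestCover(String[] tokens, Set<String> query) {
            Map<String, Integer> need = new HashMap<>();
            int missing = query.size(), bestLo = -1, bestHi = tokens.length;
            for (int lo = 0, hi = 0; hi < tokens.length; hi++) {
                String t = tokens[hi].toLowerCase();
                if (query.contains(t) && need.merge(t, 1, Integer::sum) == 1) missing--;
                while (missing == 0) {            // window covers all terms: shrink it
                    if (hi - lo < bestHi - bestLo) { bestLo = lo; bestHi = hi; }
                    String u = tokens[lo++].toLowerCase();
                    if (query.contains(u) && need.merge(u, -1, Integer::sum) == 0) missing++;
                }
            }
            return bestLo < 0 ? null : new int[]{bestLo, bestHi};
        }

        public static void main(String[] args) {
            String[] doc = ("the first liquid fueled rocket was launched by "
                          + "Robert Goddard in 1926 near Auburn Massachusetts").split(" ");
            int[] span = shortestCover(doc, Set.of("rocket", "goddard", "1926"));
            // prints: rocket was launched by Robert Goddard in 1926
            System.out.println(String.join(" ",
                    Arrays.copyOfRange(doc, span[0], span[1] + 1)));
        }
    }

A production system would then score and combine several such passages drawn from a large Web collection, rather than trusting any single one.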

Gordon Cormack's research concentrates on programming language issues in concurrent and distributed systems, as applied to the vast data repository comprising the Internet. The principal objective of this research is to construct a global information architecture and operating environment in which data can be accessed by content, as opposed to through data structures, file systems, and databases with rigid schemata. The research issues addressed include associative searching of a large store (200+ GB), the construction of a file system and programming environment based on this store, and the investigation of distributed coherency problems that arise in this environment.
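
The core of content-based access can be sketched with an inverted index mapping terms to the items that contain them (a minimal illustration with hypothetical names; a real 200+ GB store would add compression and distribution):

    import java.util.*;

    // Hedged sketch: access a store by content via an inverted index from
    // each term to the set of item identifiers containing it.
    public class ContentIndex {
        private final Map<String, Set<String>> index = new HashMap<>();

        // Index one item of the store under every term it contains.
        public void add(String itemId, String text) {
            for (String term : text.toLowerCase().split("\\W+"))
                if (!term.isEmpty())
                    index.computeIfAbsent(term, k -> new HashSet<>()).add(itemId);
        }

        // Retrieve by content: items containing all query terms.
        public Set<String> query(String... terms) {
            Set<String> result = null;
            for (String term : terms) {
                Set<String> ids = index.getOrDefault(term.toLowerCase(), Set.of());
                if (result == null) result = new HashSet<>(ids);
                else result.retainAll(ids);
            }
            return result == null ? Set.of() : result;
        }

        public static void main(String[] args) {
            ContentIndex store = new ContentIndex();
            store.add("doc1", "distributed coherency protocols");
            store.add("doc2", "distributed file systems");
            System.out.println(store.query("distributed", "file")); // [doc2]
        }
    }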

Software Engineering

Michael Godfrey is investigating software architecture, software evolution, and rapid migration. The software architecture and evolution work involves performing detailed case studies of how and why large software systems evolve over time. This work is both labour and compute intensive: initial creation of architectural models is done manually, but the extraction of evolutionary histories of large systems is automated, hierarchically organizing millions of "facts" and subsequently allowing complex queries on the histories. A recent study of the growth of the Linux operating system kernel is typical of this work; the case study examined 96 versions of the Linux kernel comprising over 87 million lines of code in total. Ongoing projects include analyses of the history of the gcc compiler and the vim text editor; future projects will include commercially developed systems, such as the StarOffice tool suite.
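
One automated extraction step might look like the following sketch, which measures a crude size metric for each unpacked release of a system (the directory layout and metric are hypothetical examples, not Godfrey's actual infrastructure):

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.*;
    import java.util.stream.Stream;

    // Hedged sketch: total the lines in .c and .h files for every release
    // directory under "versions/", producing one point per version of a
    // growth history like the Linux kernel study.
    public class GrowthHistory {
        static long linesOfCode(Path versionDir) throws IOException {
            try (Stream<Path> files = Files.walk(versionDir)) {
                return files.filter(p -> p.toString().endsWith(".c")
                                      || p.toString().endsWith(".h"))
                            .mapToLong(p -> {
                                try (Stream<String> lines = Files.lines(p)) {
                                    return lines.count();
                                } catch (IOException | UncheckedIOException e) {
                                    return 0;   // skip unreadable files
                                }
                            }).sum();
            }
        }

        public static void main(String[] args) throws IOException {
            try (Stream<Path> versions = Files.list(Paths.get("versions"))) {
                for (Path v : (Iterable<Path>) versions.sorted()::iterator)
                    System.out.printf("%s %d%n", v.getFileName(), linesOfCode(v));
            }
        }
    }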

Ric Holt's research concentrates on extracting and viewing the architectures of large production systems, such as Linux and Netscape. Once a software system reaches a certain size, around 300,000 lines of code, its architecture becomes a central help or hindrance in further development and maintenance. The hierarchy in the architecture is commonly diagrammed as a nested set of box-and-arrow diagrams, where the boxes represent components of the software system and the edges represent interactions. Extracting and viewing an architecture is done by parsing the system's source code and emitting a huge graph, with hundreds of thousands of edges, where the edges represent "calls" and "data references"; to this is added the "contain" relation, giving the system hierarchy. This huge graph is manipulated using Tarski's algebra of typed graphs to produce various abstractions, such as a simple high-level view of the top-level components (sketched below). A web-based Java applet is used to visualize and navigate among the resulting views.
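
The relational manipulation can be sketched directly (a minimal illustration, not Holt's actual tools): with "contain" edges from components to functions and "calls" edges between functions, relational composition and inverse lift low-level calls up to component-level interactions:

    import java.util.*;

    // Hedged sketch of the algebraic step: lift function-level "calls" edges
    // to a high-level view of component interactions via
    //   liftedCalls = contain o calls o inverse(contain)
    public class RelAlgebra {
        record Pair(String a, String b) {}

        // Relational composition: (a,c) whenever (a,b) in r and (b,c) in s.
        static Set<Pair> compose(Set<Pair> r, Set<Pair> s) {
            Set<Pair> out = new HashSet<>();
            for (Pair p : r)
                for (Pair q : s)
                    if (p.b().equals(q.a())) out.add(new Pair(p.a(), q.b()));
            return out;
        }

        // Relational inverse: flip every edge.
        static Set<Pair> inverse(Set<Pair> r) {
            Set<Pair> out = new HashSet<>();
            for (Pair p : r) out.add(new Pair(p.b(), p.a()));
            return out;
        }

        public static void main(String[] args) {
            // contain: component -> function; calls: function -> function
            Set<Pair> contain = Set.of(new Pair("Kernel", "schedule"),
                                       new Pair("Drivers", "disk_read"));
            Set<Pair> calls = Set.of(new Pair("schedule", "disk_read"));
            Set<Pair> lifted = compose(compose(contain, calls), inverse(contain));
            System.out.println(lifted); // Kernel calls into Drivers
        }
    }

On the real extracted graphs, the same handful of operators, applied to hundreds of thousands of edges, yields the simple top-level views that the visualization applet presents.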