Object Oriented Queries Over Software Systems

ACM SIGPLAN 2007 Workshop on Partial Evaluation and Program Manipulation

Invited Talk by Oege de Moor

Joint work with Elnar Hajiyev, and Mathieu Verbaere

Abstract

Code queries are useful for enforcing coding conventions, navigating a large code base, and for identifying locations to refactor. The program understanding community has long advocated the use of a relational database to facilitate such code queries, but relational queries over code have not found widespread use.

We argue this is due to a lack of scalability (it takes too long to evaluate queries), and a lack of a modern query language with all the tool support that entails (writing code queries in SQL can take many pages). In this talk we demonstrate a new system that goes some way towards alleviating both problems.

First, we demonstrate IQL, an object-oriented query language. It allows the concise expression of complex queries; due to its object-oriented features, it is easy to extend and modify existing queries. Another benefit of object-orientation is that it gives natural tool support, for instance for auto-completion.

The semantics of IQL is defined by translation to Datalog. Datalog is a logic programming language like Prolog, but it lacks data structures. As a consequence, all queries in Datalog are guaranteed to terminate, and they have a very simple semantics, enabling aggressive optimisations.

While Datalog has been extensively studied in the theoretical database community, it lacks a proper type system. We introduce a type system with subtyping to account for the class hierarchy of IQL. Type inference is achieved via an abstract interpretation that maps each relation between runtime values to a relation between binding sites.

We implement Datalog itself via a compiler to procedural SQL; a configuration file allows us one target different relational database systems as the backend. The compiler performs many optimisations, ranging from straightforward constant propagation to a form of the well-known `magic sets' transformation. It also performs type specialisation, based on the abstract interpretation in the type inference algorithm.