If you are interested in doing a project in our lab, contact Dror Feitelson at .
If you are a double-major studying computer science and cognition, we can discuss options for a project related to the human cognitive aspects of software engineering.
This is an ongoing project (see paper) with many subprojects. We developed a game-like platform to measure how hard it is for developers to understand short snippets of code, and use it to measure snippets with different properties. Future work will focus on using negation in predicates, on composing various elements together, on interaction with short-term memory, and more.
It is commonly accepted that a method should perform a single task. But how small should it be? Making methods very small is good for unit testing, as only few tests are needed. But this may cause the code to become fragmented to the point that it is harder to understand. The project is to try to characterize this tradeoff, and also to review the situation in real open-source code.
Variable names should be "meaningful", but often they are not (see our paper on names, including misleading ones). This project is about using techniques developed for natural language processing, e.g. identification of synonyms using vectors of co-occurring words, to identify potentially problematic names. This includes the questions of whether this should use a text corpus or a code corpus, and what co-occurence window to use (measured in characters? tokens? or based on code structure?).
This projects extends a previous one on single-letter names. Issues include the correlation of names and types, use of different names in different domains, use of especaily long names, and preference of experts and novices for long or short names.
Complexity metrics for OOP are based on reasonable ideas, but have seldom been validated in any way on real code. The project is to study inheritence and composition as they appear in real code (using large popular open-source projects), and come up with hypotheses and experiments on how these properties affect complexity.
Linux (and many other large systems) follows a perpetual development lifecycle model, with a continuous backbone of development activity that is interrupted at times to produce a new stable release (see our paper characterizing this process). It is conjectured that the release activity is related to improving the code and the systems stability, but can this be quantified using code metrics? The project is to characterize what happens both in the preperation before a release and in the period following it, and how it differs from normal development on one hand and continued maintenance on the other.
A developer that enters a long term software project suffers an inherent disadvantage: the code he sees is the result of a long evolutionary process, and is therefore harder to understand than it was to the developers that actually followed the process of its creation. This project is about creating a tool to reconstruct historical views of the code, and assessing whether it indeed helps developers understand the code better. This may be based on file history flow graphs.
System growth is typically measured in LOC or number of modules. But in an operating system like Linux one may be able to measure functional growth. We have already shown that the growth in number of system calls is slowing down considerably, but the growth in configuration options is accelerating (see paper). The project is to extend this to other metrics, such as flags to system calls and ioctl calls.
It is generally accepted that programmers may have very different levels of competence, but this is rarely quantified. This project will attempt to derive a quantification of the relative contribution of different developers to a project, by studying data about the development of open source projets.
This is related to workload resampling, based on inspiration from Michal Irani's work on image and video processing. The idea there is that a lot of cool things can be done based on inherent redundancy in visual data, where patches of the image are similar to other patches. We want to apply the same ideas to workload data, and do things like
It is tempting to think of workloads as stable and stationary, meaning that the rate of arrival of new jobs is constant and within system capacity. But empirically this is not always the case, and systems may experience intermittent overload. So understanding and modeling such overloads is in order. The project is to analyze workload logs from parallel system, identify "red days" -- namely days in which the arriving load is higher than the system capacity -- and then
memory reference models like the stack model try to mimic the correct correlation structure, but ignore other structure that exists in real reference traces. The project is to create a more realistic model, based on components such as repeated use, scans, and random accesses.
Another deficiency of the stack model is that it leads to a "crawling phase shift": occasionally an address from deep in the stack is selected, and it then becomes popular. A more realistic model would include phase shifts explicitly, and change the popularities of a whole set of addresses collectively.
A deeper challenge is to figure out whether all this is important for evaluating real systems. One idea: real phase shifts may be important when evaluating processors with multiple issue and out of order execution, because they can overlap multiple requests for data items from the new locality.
Various parameters of real-world workloads are heavy tailed, meaning that there is a non-negligible probability of very large values. An example is Unix process lengths: most are short, but some are extremely long. Simulating system behavior under such conditions may not converge, and in any case is very sensitive to the few largest samples, which are largely random. The project is to help develop and evaluate methods to overcome this difficulty, based on truncated models of the distributions. This means that we artificially remove some of the most problematic samples in a controlled way, to try and characterize their effect.
We normally design new scheduling algorithms that operate with no constraints, and can schedule available jobs as they wish. But data from workload logs indicates that real systems actually operate under various constraints, and there are instances where the scheduler seems not to utilize the resources as well as it might. The project is to identify, analyze, and characterize such instances by a detailed comparison of the decisions made by simple schedulers like EASY as opposed to what is recorded in the logs. This can be augmented with knowledge of the scheduling mechanism, e.g. the SLURM resource manager that is used on machines like Curie. Finally, this can shed light on the relevance of simple simulations, and what needs to be added to make them more realistic.
Schedulers are often evaluated by showing how they perform as a function of the load. Thus it is required to generate workloads with varying load levels. The question is how to do this correctly; maybe all workload parameters need to come from distributions conditioned on the desired load level? An intriguing alternative is to try and analyze a single realistic run and extract data about different load conditions that occurred in it. And how can we decide which methodology is more representative?
One of the constraints on schedulers is that they need to operate in an on-line manner without knowing the future. But how much does knowing the future help? The project is to try and design a scheduler that exploits future knowledge, in the context of parallel job scheduling. This will shed light on the question of how much we should invest in being able to predict the future.
There are two basic models of how work arrives at a system: the open model and the closed model. The difference is that in the closed model there exists a feedback effect from the system's performance to the submittal of additional work. The problem is that usually we don't have any data about such feedback. The project is to generate such dependencies based on educated guessing. We will look at the trace, and try to guess whether submitted jobs depend on previous ones. Specifically, we will look at previous jobs submitted by the same user (as identified by field 12), and when they terminated. We will then assign dependencies, and re-write the trace with this added heuristic data in fields 17+18.
One common way to schedule parallel jobs is with backfilling. This requires job runtimes to be known in advance. Production systems typically just require users to provide runtime estimates, but there have been several research ideas about how to predict runtimes. The project is to compare these ideas in the context of the scheduling, i.e. to see how they affect the scheduling results.
Many years ago we published a paper on self tuning a system using genetic algorithms. Some years later someone actually implemented genetic algorithms in the Linux kernel! question is, can we really use this effectively?