Improving Usability of Refactoring Tools

Emerson Murphy-Hill, Portland State University, emerson@cs.pdx.edu

Abstract

Refactoring is the process of changing the structure of code without changing its behavior. Refactoring can be semi-automated with the help of tools, but many existing tools do a poor job of communicating errors triggered by the programmer. This poor communication causes programmers to refactor slowly, conservatively, and incorrectly. In this paper, I demonstrate the problems with current refactoring tools, characterize three new tools to assist in the Extract Method refactoring, and describe a user study that compares these new tools to existing tools. The results of the study show that these new tools increase both the speed and accuracy of refactoring. Based on the new tools and my observation of programmers, I present several guidelines to help build future refactoring tools.

Problem and Motivation

Refactoring, the process of changing the structure of code without changing the way a program behaves, is a potentially useful technique for building and maintaining code [Fowler99]. For example, using the Extract Method refactoring, a programmer may remove duplicated code by putting it into a new method and calling that new method instead. Semi-automated refactoring tools in most commercial programming environments relieve the programmer from having to make tedious and error-prone changes by hand. For example, a tool can encapsulate a field using getter and setter methods, and the tool will automatically replace all direct references to the field with references to the new methods. In this way, a refactoring tool offers the potential for tremendous increases in refactoring productivity.
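As a concrete illustration of the Extract Method example above, the following is a minimal Java sketch showing duplicated code before the refactoring and the result after it. The class and method names (InvoiceReport, sum, and so on) are hypothetical and chosen only for illustration.

```java
import java.util.List;

// Hypothetical example; all names here are illustrative only.
final class InvoiceReport {

    // Before: the summing loop is duplicated in both methods.
    static String summaryBefore(List<Double> amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return "Total: " + total;
    }

    static String averageBefore(List<Double> amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return "Average: " + (total / amounts.size());
    }

    // After Extract Method: the duplicated loop lives in one new method...
    static double sum(List<Double> amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    // ...and each caller invokes the new method instead of repeating the loop.
    static String summaryAfter(List<Double> amounts) {
        return "Total: " + sum(amounts);
    }

    static String averageAfter(List<Double> amounts) {
        return "Average: " + (sum(amounts) / amounts.size());
    }
}
```

The "before" and "after" versions return the same strings for the same input, which is the behavior-preserving property that a refactoring tool is expected to guarantee.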

In a small exploratory study, I set out to determine how refactoring tools affect productivity. I used the Extract Method refactoring tool in the Eclipse environment, because Extract Method is a common yet complex refactoring [Fowler99,Murphy06] and because Eclipse contains a mature Extract Method tool. In this exploration, I observed 11 programmers perform several Extract Method refactorings in Java for about half an hour per programmer. Six of the programmers were graduate students, two were professors, and three were commercial software developers. During each session, the programmer was allowed to extract methods anywhere in a large corpus of open-source code. Each programmer successfully extracted between 2 and 16 methods. Nine of the eleven programmers triggered at least one refactoring error message during the study. I made two particularly interesting observations:

In general, I regard both of these problems as significant hindrances to refactoring for several reasons. First, all refactorings require the programmer to select program elements and to understand violated preconditions, so the problems are not specific to Extract Method. Second, these problems caused programmers to use refactoring tools less often, because they misunderstood how the tool works. Third, while the frequency with which these problems appeared varied among programmers, the problems were sometimes severe. For example, in one session error messages resulted during more than 2/3 of the Extract Method attempts. While this study does not say with certainty that errors are the most important problem facing refactoring tools, the study does provide a glimpse into the usability problems with existing tools. Further details about this exploratory study can be found in my technical report [Hill06].

To address the problems exposed in the exploratory study, I created two tools to help the programmer select code and one tool to help the programmer understand violations of refactoring preconditions. These tools were written in Eclipse; a short movie and downloads are available at http://www.multiview.cs.pdx.edu/refactoring.

Figure 1. Selection Assist.

Figure 2. Box View.

Figure 3. Refactoring Annotations.

The two tools to assist the programmer in selecting statements suitable for input to Extract Method tools are called Selection Assist and Box View. Selection Assist (Figure 1) overlays program text with a light green color, in order to provide a visual cue to the programmer of the extent of a program statement. Box View (Figure 2) resides to the left of the program text, and represents each statement as a rectangle. When a box is selected, the corresponding program text is selected, and vice versa.

The tool that helps the programmer understand violations of refactoring preconditions is called Refactoring Annotations. Refactoring Annotations eliminate the need for several error messages — instead, a description of the error is displayed graphically. The code that is about to be extracted into a new method is first selected by the programmer, then Refactoring Annotations are activated before the refactoring takes place. Each variable involved in the refactoring is assigned a distinct color. Lines are drawn in the top of the selection to indicate parameters that must be passed in to the extracted method. Lines are drawn through the bottom of the selection identifying the value that will be returned. Other lines are drawn to indicate control flow. When a refactoring precondition is violated, an X is drawn on top of the appropriate line, indicating the location(s) of the offending code. Figure 3 shows an example of Refactoring Annotations where two values would be returned from the extracted method — a violation of a refactoring precondition.
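To make this kind of precondition violation concrete, here is a minimal Java sketch of a selection that would need to return two values. The class, method, and variable names are hypothetical and are not taken from the study materials.

```java
// Hypothetical example; the marked region is a candidate Extract Method selection.
final class OrderTotals {

    static String describe(int[] prices) {
        int total = 0;
        int count = 0;
        // --- start of selection ---
        for (int p : prices) {
            total += p;  // 'total' is assigned inside the selection...
            count++;     // ...and so is 'count'
        }
        // --- end of selection ---
        // Both 'total' and 'count' are read after the selection, so an extracted
        // method would have to return two values, which a Java method cannot do
        // directly. This is the violated precondition.
        return count + " items, total " + total;
    }
}
```

Following the description above, one would expect lines for prices, total, and count to enter at the top of this selection as parameters, and lines for both total and count to leave through the bottom, with the violation marked there.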

At this point, most research projects halt: a problem has been identified, a tool has been created, and the problem is considered solved. However, this is just the beginning of my research, because my objectives included showing measurable usability improvements in real refactoring situations and producing guidelines to help build other kinds of highly usable refactoring tools.

Background & Related Work

The three tools I have created provide only a limited technical contribution, as they can be viewed as simply another application of existing user interface techniques. DrScheme has a tool similar to Selection Assist [Findler02], and Box View's appearance is similar to a web page authoring tool in Adobe GoLive [Adobe05]. Refactoring Annotations are similar to Control Structure Diagrams [Hendrix00] and variable arrows drawn in DrScheme [Findler02]. Nevertheless, my tools represent a novel application of existing techniques.

Some tools avoid having to present precondition violations by silently resolving them. For instance, when you try to extract an invalid selection in Code Guide, the environment expands the selection to a valid list of statements [Omni05]. You may then end up extracting more than you intended. With X-Refactory, if you try to use Extract Method on code that would return more than one value, the tool generates a new tuple class [XRef07]. Again, this may or may not be what you intended, and is not the only solution to the violation.
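The tuple-class resolution can be sketched in Java as follows. This is an assumption about what such a resolution might look like in general, not X-Refactory's actual generated code; the names TotalAndCount and sumAndCount are hypothetical. The sketch supposes that the extracted code computes both a running total and an item count that later code reads.

```java
// Hypothetical sketch of resolving a two-return-value extraction with a tuple class.
final class TupleWorkaround {

    // A small immutable class that bundles the two values to be returned.
    static final class TotalAndCount {
        final int total;
        final int count;

        TotalAndCount(int total, int count) {
            this.total = total;
            this.count = count;
        }
    }

    // The extracted method returns one object carrying both results.
    static TotalAndCount sumAndCount(int[] prices) {
        int total = 0;
        int count = 0;
        for (int p : prices) {
            total += p;
            count++;
        }
        return new TotalAndCount(total, count);
    }

    // The original method unpacks the tuple where it previously read the variables.
    static String describe(int[] prices) {
        TotalAndCount result = sumAndCount(prices);
        return result.count + " items, total " + result.total;
    }
}
```

The extraction succeeds, but at the cost of a new class that the programmer did not ask for. A programmer might instead prefer to shrink the selection or extract two separate methods, which is why a silent resolution may not match the programmer's intent.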

When considering how to improve the human interface to refactoring tools, it is worthwhile to examine existing usability guidelines. For example, if we try to apply Smith and Mosier’s five objectives for data display [Smith86] to refactoring precondition violations, we find that the error messages presented by refactoring tools meet the objectives quite well. However, as we saw in the last section, there are still problems with these error messages. Generally, I have found that high-level guidelines are not specific enough to guide the development of the user interface to refactoring tools. Furthermore, if we try to apply lower-level guidelines, the advice can imply an overly restrictive user interface. For example, if we apply Shneiderman's principles for good error messages [Shn82], we are compelled to use a natural language notation to explain errors. But as I will show later in this paper, a graphical notation can be employed more effectively.

While the literature is rife with new refactoring tools and techniques, very little work exists on what the interface to refactoring tools should look like. Mealy and Strooper comparatively evaluated several refactoring tools, concluding that "usability of refactoring tools requires further research/consideration" [Mealy06]. My research provides a step in that direction.

Uniqueness of the Approach

This research is unique in two respects. First, the three tools I have built are based on empirical observation and have been validated in a controlled human-subjects experiment. Second, based on the observations of programmers, the new tools, and the old tools, I expose a set of guidelines that I anticipate will be useful for future refactoring tools. The experiment and derived guidelines are discussed in the next section.

Results and Contributions

I performed a controlled experiment in order to determine if and when the new refactoring tools allow programmers to perform better than with existing refactoring tools. The experiment had two parts. In the first part, programmers used the standard mouse and keyboard, Selection Assist, and Box View to select program statements. In the second part, programmers used the standard Eclipse Extract Method Wizard (with error messages) and Refactoring Annotations to identify problems in a selection that violated Extract Method preconditions. In both parts, I evaluated their answers for speed and correctness. Experiments were conducted with each subject individually, lasting between 0.5 and 1.5 hours each.

I drew subjects from an object-oriented programming class containing 18 students, 16 of whom elected to participate. Of these 16 participants, most had around 5 years of programming experience, but three students had around 20 years. Half of the students had used integrated development environments, but only two students had used refactoring tools.

Space constraints prohibit a full description of the experiment, explanation of the results, discussion of the threats to validity, and the contents of a post-test questionnaire; these can be found in my technical report [Hill06].

Experiment 1: Code Selection

In this experiment, I compared how fast people select statements in open-source Java code using three tools. Using a randomized blocked experiment design, I randomly assigned each participant to one of five groups. Within each group, each subject was told to use a keyboard or mouse (subject's choice), Selection Assist, or Box View to select every if statement in several predefined methods. Between groups, tool usage and code selection order were randomized. Essentially, each programmer was told to select about 20 if statements with each tool. Each subject was trained for a few minutes on how to use each tool and was allowed to practice using each tool on some example code. I recorded whether each programmer selected the statement correctly (ignoring whitespace), and how long it took to select each statement. A summary of the results across all participants is shown below:

Tool | Total Mis-selected If Statements | Total Correctly Selected If Statements | Mean Selection Time | Selection Time as Percentage of Mouse/Keyboard Selection Time
Keyboard/Mouse | 37 | 303 | 10.2 seconds | 100%
Selection Assist | 6 | 355 | 5.5 seconds | 54%
Box View | 2 | 357 | 7.8 seconds | 71%

The data in this table show that Selection Assist allowed programmers to select statements fastest, while Box View allowed the most accurate selection. Both Selection Assist and Box View were faster and more accurate than the keyboard or mouse.
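The accuracy comparison can be made explicit by converting the counts in the results table into mis-selection rates. The rate is a derived figure that the table itself does not report; the sketch below, with hypothetical class and method names, computes it as mis-selected statements divided by all selected statements.

```java
// Hypothetical helper; the counts passed in main are the ones reported in the table.
final class SelectionAccuracy {

    // Percentage of selections that were wrong: mis-selected / (mis-selected + correct).
    static double misSelectionRate(int misSelected, int correct) {
        return 100.0 * misSelected / (misSelected + correct);
    }

    public static void main(String[] args) {
        System.out.printf("Keyboard/Mouse:   %.1f%%%n", misSelectionRate(37, 303));
        System.out.printf("Selection Assist: %.1f%%%n", misSelectionRate(6, 355));
        System.out.printf("Box View:         %.1f%%%n", misSelectionRate(2, 357));
    }
}
```

On these counts, roughly 11% of keyboard or mouse selections were wrong, compared with under 2% for Selection Assist and under 1% for Box View.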

A post-experiment questionnaire was administered to gauge users' subjective feelings about the tools. Most users did not find the keyboard or mouse alone helpful in selecting if statements, and generally rated the mouse and keyboard lower than either Box View or Selection Assist. All users were either neutral or positive about the helpfulness of Box View, but were divided about whether they were likely to use it again. Selection Assist scored the highest marks of the selection tools, with 15 of 16 users reporting it was helpful and they were likely to use it again.

Experiment 2: Error Comprehension

In this experiment, I compared how well error messages and Refactoring Annotations helped programmers understand the causes of violated refactoring preconditions. I randomly assigned each participant to one of two groups. Both groups first used the standard Eclipse Extract Method Wizard (with error messages) on 4 Extract Method candidates, then used Refactoring Annotations on 4 different Extract Method candidates. Subject group 1 attempted Extract Method candidate set A, then candidate set B, whereas subject group 2 attempted set B and then set A. I pre-selected the Extract Method candidates from open-source Java code to vary in length and anticipated difficulty. However, set A and set B were chosen to contain candidates of approximately equal size and to contain the same number and kind of violated preconditions. Participants were instructed to use Eclipse error messages to determine the location(s) of code causing errors in 4 Extract Method candidates, then repeat the process for different code using Refactoring Annotations. For example, suppose the subject were given several statements to extract into a new method, but those statements contained assignments to two different variables whose values were used in the following code. I would then ask the subject to diagnose the problem using the assigned tool and to indicate the two offending variables. For each Extract Method candidate, the time to complete the task and the correctness of the response were recorded. Below is a summary of the results across all participants:

Tool | Missed Violation | Irrelevant Code | Mean Identification Time
Eclipse Error Messages | 11 | 28 | 164 seconds
Refactoring Annotations | 1 | 6 | 46 seconds

In the table, “Missed Violation” means that a subject failed to recognize that one or more preconditions were being violated. “Irrelevant Code” means that a subject indicated some piece of code that was irrelevant to the violated precondition, such as indicating a break statement when the problem was multiple return values. The data in the table show that, using Refactoring Annotations, programmers were several times less error prone (1 missed violation instead of 11, and 6 indications of irrelevant code instead of 28) and about three and a half times faster (46 seconds instead of 164).

In the post-experiment questionnaire, subjects were unanimously positive on the helpfulness of Refactoring Annotations, and almost all of them preferred Refactoring Annotations to the standard Eclipse Extract Method Wizard (with error messages). Concerning the standard Eclipse Extract Method Wizard, subjects reported that they “still have to find out what the problem is” and are “confused about the error message[s].” In reference to the error messages the Eclipse tool produced, one subject quipped, “who reads alert boxes?”

Interpretation

While the results show the new tools as promising alternatives to existing tools, the results are open to interpretation. Due to limitations of the experiments and variability in programming experience and context, I cannot claim that any tool is strictly "better." Furthermore, because the human subjects were students from one particular class and therefore do not represent a random sample of programmers, it would be inappropriate to apply common statistical techniques, such as analysis of variance. Instead, I encourage you to take the data at face value and build your own interpretation. What follows is my interpretation of the results.

Programmers can use either Box View or Selection Assist to improve code selection. Box View appears to be preferable when the probability of mis-selection is high, such as when statements span several lines or are formatted irregularly. Selection Assist appears to be preferable when a more lightweight mechanism is desired and statements are less than a few lines long. An effective statement selection tool is critical to a successful Extract Method refactoring.

Refactoring Annotations are preferable to a wizard-based approach for showing precondition violations during the Extract Method refactoring. The results of this study indicate that Refactoring Annotations communicate the location of precondition violations effectively. When a programmer has a good understanding of refactoring problems, I believe the programmer is likely to be able to correct the problems and successfully perform the refactoring.

Derived Guidelines

Based on the experiments, observations, and tools described in this paper, I have derived a set of guidelines for building future tools that help with refactoring. Tools that help the programmers with selection should:

Tools that communicate violated refactoring preconditions should:

While these guidelines may seem obvious, prior to this research I could identify only about half of them, and even then had little idea how they might be realized in a practical refactoring tool or whether they were truly important to supporting the programmer when refactoring. In current research, I am working towards expanding these guidelines to all phases of the refactoring process and for several refactorings.

Contributions

This research makes three contributions: three new tools designed to address observed usability problems with existing tools, a human-subjects experiment showing concrete usability improvements using these new tools, and guidelines for future refactoring tools based on my observations.

Conclusion

Refactoring is an important part of software development and refactoring tools are critical to making refactoring fast and behavior preserving. In this paper, I have presented three new tools that help programmers avoid selection errors and understand violations of refactoring preconditions. Through a user study, I have demonstrated that these tools exhibit several qualities that improve the experience of refactoring, help programmers correctly identify problems with a proposed refactoring, and increase speed of the refactoring process. I hope that these qualities will be adopted by new refactoring tools, make tools more usable and thus more used, and eventually contribute to the production of more reliable, on-time software.

References