Dr. rer. nat. Dipl.-Inform. Michael Burch
Email: michael.burch@visus.uni-stuttgart.de


VISUS - Institut für Visualisierung und Interaktive Systeme - Stuttgart



Indented Pixel Tree Plots

Paper pdf-Version
Indented Pixel Tree Plot for the NCBI Taxonomy

The NCBI Taxonomy with more than 324,000 nodes

Indented Pixel Tree Plots (IPTPs) were originally developed to display huge hierarchies. This novel pixel-based technique makes it possible to explore and understand the structure of a tree. The visualization is inspired by the visual metaphor of indented outlines, which is omnipresent in graphical file browsers and in pretty-printed source code.

The idea is to represent inner vertices as vertically arranged lines and leaf groups as horizontally arranged lines. To avoid scattering leaf nodes across the plot, we always group them at the rightmost position on their corresponding hierarchy level. This strategy leads to implicitly displayed edges, in contrast to the explicitly displayed edges of traditional node-link diagrams for trees. As a result, the visualization is sparse, free of redundancy, and minimalistic in that it needs as few graphical primitives as possible.
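
The layout idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the `layout` function, the one-column-per-vertex spacing, and the depth-first ordering are assumptions made for illustration. Each inner vertex receives one column (where a vertical line would be drawn at its depth), and the leaf children of a vertex are collected into one horizontal run that is placed after all of its inner children, one level deeper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node; a node without children is a leaf.
    name: str
    children: list = field(default_factory=list)


def layout(root):
    """Return (inner, leaf_groups) for an indented, pixel-style layout.

    inner:       list of (x, depth, name), one column per inner vertex
    leaf_groups: list of (x_start, x_end, depth, count), one horizontal
                 run per group of sibling leaves
    """
    inner = []
    leaf_groups = []
    next_x = 0  # next free column, advanced in depth-first order

    def visit(node, depth):
        nonlocal next_x
        inner.append((next_x, depth, node.name))
        next_x += 1
        # Inner children first, so that sibling leaves end up grouped
        # at the rightmost position of this subtree (assumed ordering).
        for child in node.children:
            if child.children:
                visit(child, depth + 1)
        leaves = [c for c in node.children if not c.children]
        if leaves:
            leaf_groups.append(
                (next_x, next_x + len(leaves) - 1, depth + 1, len(leaves))
            )
            next_x += len(leaves)

    visit(root, 0)
    return inner, leaf_groups


tree = Node("root", [
    Node("b"),
    Node("a", [Node("a1"), Node("a2")]),
    Node("c"),
])
inner, leaf_groups = layout(tree)
print(inner)
print(leaf_groups)
```

In this sketch no edge is stored or drawn: the parent of any element is the nearest inner vertex to its left that lies exactly one level higher, which is what makes the edges implicit.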

We conducted a user study with 30 subjects in which we compared IPTPs and node-link diagrams as a within-subjects variable. The study indicates that working with IPTPs can be learned in less than 10 minutes. Moreover, IPTPs are as effective as node-link diagrams in terms of accuracy and completion time for three typical tasks; participants generally preferred IPTPs. The usefulness of Indented Pixel Tree Plots is demonstrated by exploring and understanding hierarchical features of huge trees such as the NCBI taxonomy with more than 324,000 nodes. An example figure of the NCBI taxonomy as an IPTP is given at the top of this page. A blue-to-red color gradient additionally encodes the depth of hierarchical elements. The hierarchy is deeply structured, with a maximum depth of 40. The deepest substructures can be seen near the center of the plot, in the red color-coded part. Here, the order Perciformes is visually encoded in the hierarchy.

"The Perciformes, also called the Percomorphi or Acanthopteri, is one of the largest orders of vertebrates, containing about 40 percent of all bony fish. Perciformes means perch-like. They belong to the class of ray-finned fish and comprise over 7,000 species found in almost all aquatic environments. They are also the most variably sized order of vertebrates, ranging from the 7 millimeters (0.3 in) Schindleria brevipinguis to the 5 meters (16 ft) Makaira species. They first appeared and diversified in the Late Cretaceous" (taken from Wikipedia).

Their variety and diversity explain the deeply structured hierarchy there.


The NCBI Taxonomy in a blue, green, red color coding

As an additional feature, Indented Pixel Tree Plots can be extended by space-filling lines that all start at the same vertical or horizontal position and end at the corresponding point on the tree outline; see the figure to the right. This makes the diagram more aesthetically appealing. Furthermore, different color codings can be applied to the IPTPs. We also added some simple interactive features to the plots, such as scaling up and down, filtering, and details on demand. Expanding and collapsing subhierarchies is also possible. We applied the technique to a list of open source software systems and found very different hierarchical structures.
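
A depth-based color coding such as the blue-to-red gradient can be sketched in Python as follows. Linear interpolation in RGB space and the function name `depth_to_rgb` are assumptions for illustration; the exact color scales used for the figures are not specified on this page.

```python
def depth_to_rgb(depth, max_depth):
    """Map a hierarchy depth to an (R, G, B) color.

    Depth 0 maps to pure blue and max_depth to pure red, with linear
    interpolation in between (an assumed, simplified color scale).
    """
    if max_depth <= 0:
        return (0, 0, 255)
    # Clamp the normalized depth to the range [0, 1].
    t = min(max(depth / max_depth, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


# Example with the maximum depth of 40 reported for the NCBI taxonomy.
print(depth_to_rgb(0, 40))
print(depth_to_rgb(40, 40))
```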


User Study Details

In a user experiment, we first investigated the readability of IPTPs compared to node-link diagrams, both without color gradient and without interactive features. The user study followed a within-subjects design. We chose node-link diagrams as the basis for comparison because they are the most widely used, well-established, de facto standard for hierarchy visualization. Each test of the visualizations included three dataset sizes and three tasks. Questions were designed so that subjects had to answer in a forced-choice fashion.

A stochastic algorithm generated all datasets synthetically. The dataset construction was parameterized by the size of the hierarchy, in terms of the number of vertices, and by the maximal depth. One constraint was that the presentation space was identical for both techniques. The experiment included three tasks:

Tasks

  1. Find the least common ancestor of two leaf vertices.
  2. Check the existence of an identical subhierarchy elsewhere in the plot.
  3. Estimate which of two subhierarchies is the larger one.

All three tasks matter when exploring attributes that are attached to the hierarchy levels, because patterns in those attributes may stem from hierarchy levels that themselves show similar structural patterns. We selected the tasks based on a pilot study.
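
The ground truth for the three tasks corresponds to standard tree computations. The following Python sketch shows one way to compute it; the parent-map representation, all function names, and the choice to ignore child order when comparing subhierarchies are assumptions for illustration, not a description of the study software.

```python
def least_common_ancestor(parent, a, b):
    """Task 1: deepest vertex that is an ancestor of both a and b."""
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)
    node = b
    while node is not None:
        if node in ancestors:
            return node
        node = parent.get(node)
    return None


def children_of(parent):
    """Invert a child -> parent map into a parent -> children map."""
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    return children


def shape_signature(children, v):
    """Task 2: two subhierarchies with equal signatures have the same shape.

    Child signatures are sorted, so child order is ignored (an assumption).
    """
    parts = sorted(shape_signature(children, c) for c in children[v])
    return "(" + "".join(parts) + ")"


def subtree_size(children, v):
    """Task 3: number of vertices in the subhierarchy rooted at v."""
    return 1 + sum(subtree_size(children, c) for c in children[v])


# Hypothetical example hierarchy, given as child -> parent.
parent = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "b1": "b", "b2": "b",
}
children = children_of(parent)
print(least_common_ancestor(parent, "a1", "b2"))
print(shape_signature(children, "a") == shape_signature(children, "b"))
print(subtree_size(children, "a"), subtree_size(children, "root"))
```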

Study Method

We chose a within-subjects study design with 30 participants. Participants answered questions, and an operator recorded their answers. Subjects could additionally mark preferences and provide comments by filling in questionnaires.

Environment Conditions and Technical Setup

The user experiment was conducted in a laboratory that was insulated from outside distractions. All visualizations were presented on a 24-inch Dell UltraSharp 2408WFP TFT screen at a resolution of 1920 x 1080 pixels with 32-bit color depth. To avoid recording errors, the subjects' responses were recorded by an operator pressing one of two specially marked keys on a PC keyboard. We assume that the delay introduced by the operator was approximately the same for every task execution and recording loop (tolerance < 100 ms).

Subjects

Thirty subjects (23 male, 7 female) were recruited. Gender was not considered a confounding factor for this study. Twenty-seven participants were undergraduate students of our university and three were graduate students. Twenty-three subjects were computer scientists and seven were engineers. The average age was 27 years (minimum 22, maximum 53). Subjects were paid 10 Euros for participating in the user experiment. Twelve stated that they were familiar with visualization techniques or had attended a lecture on this topic. Eighteen stated that they were not familiar with visualization techniques. All participants had normal or corrected-to-normal vision and normal color vision, which we confirmed with a Snellen chart for visual acuity and an Ishihara test for color vision.

Study Procedure

First, subjects filled out a short questionnaire about age, field of study, and prior knowledge of visualization techniques. Then, they read a two-page instruction manual on IPTPs and node-link diagrams. After the participants had been given time to read this tutorial, we did a practice run-through of the user tasks. The complete training took 10 minutes. During this practice test, subjects could ask questions about the visualization techniques and clarify potential problems or misinterpretations. We also used the practice test to confirm that the subjects understood both IPTPs and node-link diagrams.
Then, we continued with the main evaluation, which took between 15 and 20 minutes depending on the subject. There was a "Give Up" option, but no subject used it. Tasks, tree sizes, and visualization types were randomized and balanced to compensate for learning effects. Each participant had to perform Task 1, Task 2, and Task 3 for each tree size and visualization type. Each task consisted of seven trials per tree size. The time limit for every trial was 20 seconds.
In Task 1, the child nodes were marked by red circles in node-link diagrams and by red triangles in IPTPs. The circles and triangles had equal area. Two possible ancestors were colored green and blue. In Task 2, one subhierarchy was marked with a red starting node. In Task 3, the starting nodes of two subhierarchies were marked green and blue. Subjects had to respond with blue or green (Task 1 and Task 3) or with Yes or No (Task 2). Completion times and correctness of answers were recorded for the seven trials. This procedure resulted in a total of 126 measurements for each participant, since we showed every combination of two visualization techniques, three tasks, three dataset sizes, and seven trials (2 x 3 x 3 x 7 = 126).
After the main evaluation, subjects were given a second questionnaire in which they marked their preference for one of the two visualization techniques. Finally, participants were given the opportunity to provide open, unconstrained comments.
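
The number of measurements per participant can be checked by enumerating the design. The Python sketch below builds one trial schedule; the condition labels, the function name `build_schedule`, and the plain seeded shuffle of condition blocks are assumptions, since the exact randomization and balancing scheme is not described here.

```python
import itertools
import random

TECHNIQUES = ["NLD", "IPTP"]
TASKS = [1, 2, 3]
SIZES = ["small", "medium", "large"]
TRIALS_PER_CONDITION = 7


def build_schedule(seed):
    """Return one participant's trials as (technique, task, size, trial)."""
    rng = random.Random(seed)
    # 2 techniques x 3 tasks x 3 sizes = 18 condition blocks.
    blocks = list(itertools.product(TECHNIQUES, TASKS, SIZES))
    # Assumed randomization: shuffle whole blocks, keep the seven
    # trials of a block together.
    rng.shuffle(blocks)
    return [
        (technique, task, size, trial)
        for technique, task, size in blocks
        for trial in range(1, TRIALS_PER_CONDITION + 1)
    ]


schedule = build_schedule(seed=0)
print(len(schedule))  # 126
```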

Study Results (Completion Times for Node-Link Diagrams (NLDs) and Indented Pixel Tree Plots (IPTPs))

          Task 1         Task 2         Task 3
Small
  t       0.45           0.17           0.002
  NLDs    1995 (545)     5328 (1634)    3576 (1215)
  IPTPs   2016 (934)     5090 (1293)    4514 (1473)
Medium
  t       0.37           0.21           0.07
  NLDs    2110 (541)     5930 (1723)    5107 (2183)
  IPTPs   2059 (987)     6156 (2018)    4500 (1166)
Large
  t       0.49           0.41           <0.001
  NLDs    2388 (562)     8685 (3001)    3325 (926)
  IPTPs   2386 (1315)    8603 (2181)    3976 (1257)