Parallel Coordinates and Radar Chart
Figure: The Parallel Coordinates Plot (PCP) and the Radar chart are both familiar InfoVis methods for exploring and analyzing multi-dimensional data.
The Radar chart is a well-known InfoVis method for exploring multi-dimensional data with an arbitrary number of data items/observations, and it is particularly well suited to making outliers stand out. Each star in the Radar chart represents a single data item (a country in the figure).
The Radar chart (or star plot) consists of a sequence of equi-angular spokes, called radii, with each spoke representing one of the variables. The length of a spoke is proportional to the magnitude of the variable for the data item relative to the maximum magnitude of the variable across all data points. A line is drawn connecting the data values on adjacent spokes. This gives the plot a star-like appearance. The Radar chart can be used to answer the following questions:
- Which data items, e.g. countries, are most similar, i.e., are there clusters of countries with the same profile? Radar charts are used to examine the relative values for a single data point (e.g., data item "Niger" is large for variables "age 0-14" and "Fertility rate" and small for "age 65+" and "age 15-64", while "Germany" is large for variable "age 65+").
- Locate similar data items (countries) or dissimilar data items
- Are there outliers?
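The spoke construction described above can be sketched in a few lines of Python. This is only an illustration, not the eXplorer implementation: the function name `radar_vertices`, the clockwise layout starting at 12 o'clock, and the sample numbers are all assumptions.

```python
import math

def radar_vertices(values, maxima):
    """Return the (x, y) vertices of the star for one data item.

    values: the item's value for each variable
    maxima: the maximum of each variable across all data items
    """
    n = len(values)
    vertices = []
    for k, (v, vmax) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n          # equi-angular spokes
        length = v / vmax if vmax else 0.0   # spoke length relative to the maximum
        # Assumed layout: first spoke at 12 o'clock, then clockwise.
        vertices.append((length * math.sin(angle), length * math.cos(angle)))
    return vertices

# Hypothetical values for one item on three variables (not real statistics).
star = radar_vertices([50.0, 20.0, 5.0], [50.0, 40.0, 20.0])
for x, y in star:
    print(round(x, 3), round(y, 3))
```

Connecting the returned vertices in order, and closing the polygon back to the first one, gives the star shape for that item.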
Radar charts are criticized as poorly suited for making trade-off decisions – when one chart is greater than another on some variables, but less on others. Further, it is hard to visually compare lengths of different spokes, because radial distances are hard to judge, though grid lines may help (see figure).
Figure: Radar Chart comparing 5 indicators for 3 countries: Sweden, Niger and Germany. Sweden and Germany have almost the same star profile, while Niger differs markedly.
Parallel Coordinates Plot (PCP)
The PCP is a proven information visualization technique for identifying trends and clusters in many scientific environments, and is among the most common subjects of academic papers in InfoVis, but it has not been used in statistical visualization. The PCP enables visual representation of spatio-temporal and multi-dimensional indicator data and hence could become an important mechanism in Statistics Analytics. Initially the PCP can be confusing to understand, but it is a powerful tool for exploring and gaining insight into a multi-dimensional numerical dataset.
The PCP was originally invented by Philbert Maurice d'Ocagne in 1885, later re-discovered by Al Inselberg in the early 60's, and popularised as a coordinate system in the early 80's with the growing interest in the information visualization (InfoVis) domain. Inselberg made a full review of how to visually read relational patterns in parallel coordinates. When most lines between two parallel axes are roughly parallel to each other, that suggests a positive relationship between the two dimensions. When lines cross in a kind of superposition of X-shapes, that suggests a negative relationship. When lines cross randomly, with no dominant pattern, there is no particular relationship.
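The reading rules above can be made concrete by counting line crossings between two adjacent axes. The sketch below is illustrative only; `crossing_fraction` is a hypothetical helper, and it assumes both axes run in the same direction and that there are at least two data items. Two lines cross exactly when the two items are ordered one way on the left axis and the opposite way on the right axis.

```python
from itertools import combinations

def crossing_fraction(a, b):
    """Fraction of line pairs that cross between two adjacent axes.

    a, b: values of the same data items on the left and the right axis.
    """
    pairs = list(combinations(range(len(a)), 2))
    crossings = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return crossings / len(pairs)

left = [1, 2, 3, 4]
print(crossing_fraction(left, [10, 20, 30, 40]))  # 0.0: no crossings, positive relationship
print(crossing_fraction(left, [40, 30, 20, 10]))  # 1.0: every pair crosses, negative relationship
```

A fraction near 0.5 corresponds to the "random crossing" case. When there are no tied values, this fraction f is related to Kendall's rank correlation by tau = 1 - 2f.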
There’s a simple and educational way of getting a better understanding of the PCP through the representation of a data table:
Figure: Each axis represents a single statistical indicator (column). Each country is represented by a string (line) passing through the “parallel axes”, intersecting each axis at a point that depends on the country's indicator value.
Figure: PCP comparing 5 indicators for 3 countries Sweden, Niger and Germany. Try this HTML5 based Demonstrator.
Each country in the table is represented by a string (line) passing through the “parallel axes”, intersecting each axis at a point that depends on the country's indicator value. Each axis represents a single statistical indicator. The highlighted strings form a visual representation (profile) of the characteristics of one country (Niger etc.).
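As a rough sketch of this table-to-plot mapping, the following Python turns each table row into a polyline with one point per axis. The function name `to_polylines`, the min-max scaling of each axis, and the sample table are assumptions made for illustration; the values are placeholders, not real statistics.

```python
def to_polylines(table, indicators):
    """Map each row of a data table to a PCP polyline.

    table:      dict of item name -> dict of indicator -> value
    indicators: axis order, left to right
    Returns a dict of item name -> list of (axis_index, position), where
    position is 0.0 at the axis minimum and 1.0 at the axis maximum.
    """
    lo = {ind: min(row[ind] for row in table.values()) for ind in indicators}
    hi = {ind: max(row[ind] for row in table.values()) for ind in indicators}
    lines = {}
    for name, row in table.items():
        points = []
        for axis, ind in enumerate(indicators):
            span = hi[ind] - lo[ind]
            # A constant column has no spread; place it mid-axis.
            pos = (row[ind] - lo[ind]) / span if span else 0.5
            points.append((axis, pos))
        lines[name] = points
    return lines

# Placeholder rows "A", "B", "C" with made-up values.
table = {
    "A": {"age 0-14": 50, "age 65+": 2},
    "B": {"age 0-14": 20, "age 65+": 12},
    "C": {"age 0-14": 10, "age 65+": 22},
}
print(to_polylines(table, ["age 0-14", "age 65+"]))
```

Each column of the table becomes one axis, and each row becomes one string whose height on an axis reflects where its value sits between that indicator's minimum and maximum.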
The following educational text and figures for the PCP can be tested and evaluated in the eXplorer applications: eXplorer
Selecting and changing the order of the variables (indicators)
- The “Ind.Select” icon or the Set Indicators in PCP opens a panel (figure A) with all currently active indicators. The panel shows which indicators are used and their order in the PCP. The user can now select the indicators to be used in the PCP and set their internal order.
- Five indicators (Population 0-14 yrs, Population 15-64 yrs, Population 65+, Life Expectancy and Fertility Rate) are selected for the PCP analysis (figure B);
- You can change the order of the indicators in the PCP by holding down the left mouse button and dragging the indicator to another position;
- Figure C shows the indicator Fertility Rate being dragged from its previous position into first place, before Population 0-14 yrs;
The revised order of the PCP indicators is shown here.
Figure: Select and Change the order of the variables (indicators). A) Settings Panel B) Select Variables C) Move a PCP axis
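In data terms, dragging an axis simply changes the position of one indicator in the ordered list of axes. A minimal sketch, assuming the axis order is held in a plain Python list (`move_indicator` is a hypothetical helper, not part of eXplorer):

```python
def move_indicator(order, name, new_index):
    """Return a new axis order with `name` moved to position `new_index`."""
    reordered = [ind for ind in order if ind != name]
    reordered.insert(new_index, name)
    return reordered

order = [
    "Population 0-14 yrs",
    "Population 15-64 yrs",
    "Population 65+",
    "Life Expectancy",
    "Fertility Rate",
]
# Drag "Fertility Rate" into first place.
print(move_indicator(order, "Fertility Rate", 0))
```

Because the PCP only shows relationships between neighbouring axes directly, reordering is how different pairs of indicators are brought side by side.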
Histograms attached to a PCP axis
Histograms attached to each PCP axis are used to visualize the distribution of the indicator data values, splitting the axis into a user-defined number of equally high rectangular areas (bins). A coloured region line intersects the PCP axis at a point depending on the indicator value and is then sorted into the corresponding bin. The length of a rectangle indicates the frequency of regions intersecting that bin: the more regions within a bin, the longer the rectangle. Histograms can be turned on/off and the number of bars can be specified. Histograms are updated dynamically during a time animation and can expose interesting trends over time.
Figure: Indicator “age group 65+” represented by one PCP axis with the minimum value at the bottom and the maximum at the top. The axis is divided into 15 equally sized bars. The length of each bar depends on the number of lines (regions) crossing the axis within the bar. Examples of 1, 3 and 4 lines crossing are displayed.
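The binning behind such an axis histogram can be sketched as follows. This is an assumption-laden illustration (`axis_histogram` is a hypothetical name, and the rule that the maximum value falls into the top bin is a choice made here), not the tool's actual code.

```python
def axis_histogram(values, bins=15):
    """Count how many lines cross each equally high bin of one axis.

    Bin 0 sits at the axis minimum (bottom), the last bin at the maximum (top).
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    for v in values:
        if hi == lo:
            index = 0
        else:
            # The maximum value is clamped into the top bin.
            index = min(int((v - lo) * bins / (hi - lo)), bins - 1)
        counts[index] += 1
    return counts

# Made-up indicator values for five regions, split into 5 bins.
print(axis_histogram([0, 1, 3, 5, 10], bins=5))  # [2, 1, 1, 0, 1]
```

Each count would then be drawn as the length of the bar attached to that segment of the axis.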
Dynamic animation in linked and coordinated views
Figure below: Three linked and coordinated views (PCP, Map and Data Table) showing a European NUTS2 data set with 4 indicators for France only, where “population age group 65+ %” is the coloured indicator. Each axis represents a single indicator, and regions are represented by strings coloured by “population 65+ %” passing through the axes. The PCP facilitates exploration and analysis of the relationships between several indicators and regions. A negative relationship, for example, is shown between the indicators "Population 15-64" and "Population 65+". Similar profiles of selected regions can be seen by highlighting. Paris, for example, is highlighted with a thick black line. The thick green line represents the mean value for each attribute and allows a comparison between actual and mean values. Each column in the Data Grid corresponds to an axis (indicator) in the PCP. A string corresponds to a row (region) in Excel.
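The mean line mentioned in the caption is just the per-indicator average over all regions, and comparing a highlighted region with it is a per-indicator difference. A small sketch with hypothetical helper names and made-up values:

```python
from statistics import mean

def mean_profile(table, indicators):
    """Mean value of each indicator across all regions (the mean line)."""
    return {ind: mean(row[ind] for row in table.values()) for ind in indicators}

def deviation_from_mean(table, indicators, region):
    """Difference between one region's values and the mean profile."""
    avg = mean_profile(table, indicators)
    return {ind: table[region][ind] - avg[ind] for ind in indicators}

# Placeholder regions and values, for illustration only.
table = {
    "R1": {"x": 10, "y": 1},
    "R2": {"x": 20, "y": 3},
    "R3": {"x": 30, "y": 5},
}
print(mean_profile(table, ["x", "y"]))
print(deviation_from_mean(table, ["x", "y"], "R1"))
```

A positive deviation means the highlighted string passes above the mean line on that axis, a negative one means it passes below.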
Scaling of individual PCP axes
The icon with two arrows is used to switch the min and max order along the axis. For example, in the figure, the indicator “unemployment rate” is set to have its max value at the bottom (worst scenario) while low unemployment rates are placed at the top.
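Switching the min and max order amounts to mirroring the normalized position along the axis. A minimal sketch, assuming positions are scaled to the range 0.0 (bottom) to 1.0 (top); `axis_position` and its `inverted` flag are hypothetical names:

```python
def axis_position(value, lo, hi, inverted=False):
    """Position of a value along a PCP axis: 0.0 = bottom, 1.0 = top.

    With inverted=True the maximum is drawn at the bottom.
    """
    pos = (value - lo) / (hi - lo)
    return 1.0 - pos if inverted else pos

# An axis ranging from 0 to 20, with illustrative values.
print(axis_position(5, 0, 20))                  # 0.25: default orientation
print(axis_position(5, 0, 20, inverted=True))   # 0.75: low values near the top
print(axis_position(20, 0, 20, inverted=True))  # 0.0: maximum at the bottom
```

Inverting an axis in this way lets "good" values sit at the top of every axis, so that favourable profiles read as consistently high strings.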
Visual Inquiries based on dynamic filter operations
A particular interactive advantage of the PCP technique is the ability to make visual inquiries and filter data dynamically. Filtering is a critical step in data exploration: it removes uninteresting regions and reduces the data set to a smaller, more manageable size. Each indicator axis has a pair of range sliders which define the bottom and top of the query range. The range of an indicator is specified by dragging the handles at the top and bottom of the corresponding range slider. Regions whose values for the selected indicator fall outside the specified range are filtered out. A combination of range slider movements can be used to formulate a more complex visual inquiry dynamically. These conditions and constraints are immediately reflected in all linked views. An example of a query using the sliders is shown below. After a dynamic query operation applied to the indicator “Labour Productivity”, regions with values below the mean (green line) were removed. A tooltip shows the exact value for the slider position along the indicator axis. A second condition is then given for the indicator “Unemployment rate”, where regions with a higher rate (above the mean value) are removed.
Figure: European NUTS2 regions. Filter operation is performed with range sliders positioned at the top and bottom of each indicator axis. Two filter conditions are performed here: 1) Keep regions with high “Labour Productivity” (above mean value) AND 2) Keep regions with low “Unemployment Rate” (below mean value). The map to the left shows the result after removing the regions that do not fulfil these two conditions. Only regions with yellow-red colours remain in the map and all other views. Use this Demonstrator http://mitweb.itn.liu.se/GAV/world/ to interactively learn more about this dynamic visual inquiry and filter operation feature.
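The two-condition query in this example can be expressed as a conjunction of per-axis ranges. The sketch below is illustrative only: `range_filter` is a hypothetical helper, the region names and values are made up, and open-ended slider positions are modelled with infinity.

```python
import math
from statistics import mean

def range_filter(table, ranges):
    """Keep regions whose values fall inside every given slider range.

    table:  dict of region -> dict of indicator -> value
    ranges: dict of indicator -> (lower, upper) slider positions
    """
    return {
        region: row
        for region, row in table.items()
        if all(lo <= row[ind] <= hi for ind, (lo, hi) in ranges.items())
    }

# Made-up values for four placeholder regions.
table = {
    "R1": {"Labour Productivity": 120, "Unemployment Rate": 4},
    "R2": {"Labour Productivity": 80, "Unemployment Rate": 5},
    "R3": {"Labour Productivity": 110, "Unemployment Rate": 12},
    "R4": {"Labour Productivity": 90, "Unemployment Rate": 11},
}
prod_mean = mean(r["Labour Productivity"] for r in table.values())
unemp_mean = mean(r["Unemployment Rate"] for r in table.values())

kept = range_filter(table, {
    "Labour Productivity": (prod_mean, math.inf),   # keep above the mean
    "Unemployment Rate": (-math.inf, unemp_mean),   # AND keep below the mean
})
print(sorted(kept))  # ['R1']
```

In a linked-view setting, the set of kept regions would then be passed to every other view so that the map and the table show the same subset.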
Visual Inquiries based on statistical methods
The range sliders described above give control over the data value used as the filtering threshold. However, they give no information on the number of data items removed, and they can only remove data outside the slider range. To support visual inquiries using statistical measures, a special filtering method based on percentile values has been implemented in the PCP. This statistical concept can be used to give a better overview and understanding of the data distribution. By showing the position of specific percentiles along the axes, the PCP lets the user easily see how regions are distributed for the current indicators. Given a range limited by an upper and a lower percentile value, filter operations can be performed either inside or outside that range by clicking on the removal circle containing an x symbol. This is shown in the figure below, where all regions between the 10th and 90th percentiles are removed (filtered out) with one click. This answers a query such as: "Find the extreme regions for indicator "age 65+", below the 10th or above the 90th percentile".
Figure: The filter operation for “Population age 65+” is here performed with a statistically correct operation based on percentiles. Select the option “Percentile” in the PCP GUI panel. Two percentile sliders marking the 90th and 10th percentiles are displayed for all indicators. Clicking on the x symbol between these two markers removes all regions between the 10th and 90th percentiles. The remaining regions, representing the “outliers”, are shown in the left map.
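A percentile-based filter of this kind can be sketched with the Python standard library. The helper name `percentile_outliers` is hypothetical, and the percentile definition used here (linear interpolation via `method="inclusive"`) is an assumption, since the text does not say which definition the PCP uses.

```python
from statistics import quantiles

def percentile_outliers(values, lower=10, upper=90):
    """Keep items below the lower or above the upper percentile.

    values: dict of region -> indicator value
    Assumption: percentiles use linear interpolation (method="inclusive").
    """
    cuts = quantiles(values.values(), n=100, method="inclusive")
    lo_cut, hi_cut = cuts[lower - 1], cuts[upper - 1]
    return {r: v for r, v in values.items() if v < lo_cut or v > hi_cut}

# Eleven placeholder regions with made-up values 1..11.
values = {f"R{k}": k for k in range(1, 12)}
print(percentile_outliers(values))  # {'R1': 1, 'R11': 11}
```

Unlike a plain range slider, the cut-off values here follow from the data distribution, so the share of regions kept at each end is known in advance.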
Special interactive exploration and discovery features
- Revealing correlation between indicators;
- Estimation of degree of similarity between regions;
- Finding clusters and outliers;
- Analysis of the characteristics of many regions;
- Picking and highlighting of interesting data items for profile and comparison;
- Comparison of individual characteristics of a region to the characteristics of all regions;
- Comparison of indicators associated with a selected region;
- Comparison of variations of values of different indicators;
- Dynamic range sliders and statistical methods for defining events such as exceeding of a given threshold and identification of outliers;
- Dynamic visual inquiries, filter operations using familiar statistical methods;
- Dynamic animation in linked and coordinated views;
Special limitations and comments
In addition to some experience in reading parallel coordinates, the best way to get to know a dataset using the technique is clearly interaction. The main interaction technique in parallel coordinates is called brushing, for reasons that should be obvious from looking at the image below. For this to make sense, we need to look at all axes.
The PCP also has some limitations. For example, when the number of data items gets very high, there is a lot of over-plotting, which can make it impossible to see anything. The number of indicators should be in the range 4-12; anything below or above that gets very difficult to interpret. The number of observations should be limited to a few thousand. The data must be numerical; the PCP does not work very well for categorical data (there are exceptions!).
Further testing and evaluation of the PCP is possible in the educational PCP Vislet
Last updated: 2013-12-09