Interactive Geovisual Analytics applied to BIG Statistical Regional Data
Collaboration with Statistics Sweden (SCB)
This "BIG data" research project is based on a close collaboration between Statistics Sweden (SCB) and NCVA. Having chosen Adobe Flash in 2008 as the programming platform for our visualization, we face two major challenges, performance and interactivity, when large geographical statistical datasets are applied to visual geographical components such as the interactive choropleth map, and also to the more traditional scatter plot and bar chart. The SCB eXplorer application (right) is based on more than 10,000 small ZIP code regions.
We have applied the well-known Information Visualization method "Focus & Context" (Overview & Detail) to reduce the amount of geographical statistical data involved in the interactive visual analysis. Zooming in on a smaller area to see details while still maintaining the overview gives the user an idea of where to find regions of interest, which can then be analysed in more detail. Because the detailed views handle only a reduced part of the data, this also helps us avoid performance problems.
NCVA supports Focus & Context in GAV Flash by implementing the concept of sub-datasets. A sub-dataset is a list of references to data items in a large dataset. Since a data item normally corresponds to a region in a map and vice versa, a sub-dataset corresponds to a sub-area of the whole map. Each component works on its sub-dataset input and visualizes only the items in that sub-dataset. To visualize the whole dataset, the sub-dataset is simply set to reference every item in the dataset.
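The sub-dataset idea can be illustrated with a minimal Python sketch. This is not the GAV Flash implementation: the names `SubDataset`, `whole`, `for_regions` and `visible_codes`, and the `"code"` field, are hypothetical and chosen only for illustration. The sketch assumes that a reference is simply an index into the full dataset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence

Item = Dict[str, object]  # one data item, i.e. one region in the map


@dataclass
class SubDataset:
    """A list of references (here: indices) to items in a larger dataset."""
    source: Sequence[Item]  # the full dataset; it is never copied
    indices: List[int]      # references to the items that are in focus

    @classmethod
    def whole(cls, source: Sequence[Item]) -> "SubDataset":
        # Visualizing everything: the sub-dataset references every item.
        return cls(source, list(range(len(source))))

    @classmethod
    def for_regions(cls, source: Sequence[Item],
                    region_codes: Iterable[str]) -> "SubDataset":
        # Keep references only to the regions selected in the context map.
        wanted = set(region_codes)
        return cls(source, [i for i, item in enumerate(source)
                            if item["code"] in wanted])

    def items(self) -> List[Item]:
        # Resolve the references; the items themselves stay shared.
        return [self.source[i] for i in self.indices]


def visible_codes(sub: SubDataset) -> List[str]:
    # Stand-in for a component: it only ever looks at its sub-dataset.
    return [str(item["code"]) for item in sub.items()]
```

Because a sub-dataset holds references rather than copies, switching the focus area only requires building a new index list, while the large dataset itself stays untouched.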
We demonstrate this approach using a large geographical statistical dataset with almost 10,000 Swedish zip code regions, including more than a million (X,Y) geographical coordinates and 50 indicators over several time steps. An interactive region selection mechanism is applied to the context choropleth map (left view) to focus on an area of interest. This area is then shown in the two right views (focus choropleth map and scatter plot/bar chart), which interact only with the requested reduced regional dataset for better performance and overview.
We use a simple dataset model, the "data cube", as a base for storing and communicating large statistical multivariate datasets. It is designed to manage data in three dimensions, represented by indicator, space and time, and can communicate the boundaries and content of these dimensions to the visualizations. To make it independent of the actual storage structure, we have used a static and simple interface for data access. This allows different storage structures to be implemented for different purposes. For large data, we have implemented a storage structure that is simply an array optimized for fast access, which can handle large spatio-temporal, multivariate datasets.
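A data cube with a fixed access interface and an array-backed storage structure could look roughly like the following Python sketch. The class and method names (`DataCube`, `ArrayDataCube`, `shape`, `value`, `set_value`) are assumptions made for this example and do not come from the GAV toolkit. The row-major layout is also just one possible choice.

```python
from abc import ABC, abstractmethod
from array import array
from typing import Tuple


class DataCube(ABC):
    """Storage-independent access interface (hypothetical names)."""

    @abstractmethod
    def shape(self) -> Tuple[int, int, int]:
        """Boundaries: (indicators, regions, time steps)."""

    @abstractmethod
    def value(self, indicator: int, region: int, time: int) -> float:
        """Content of one cell of the cube."""


class ArrayDataCube(DataCube):
    """One flat array of doubles; empty cells are NaN."""

    def __init__(self, n_indicators: int, n_regions: int, n_times: int):
        self._shape = (n_indicators, n_regions, n_times)
        size = n_indicators * n_regions * n_times
        self._values = array("d", [float("nan")]) * size

    def shape(self) -> Tuple[int, int, int]:
        return self._shape

    def _offset(self, indicator: int, region: int, time: int) -> int:
        n_indicators, n_regions, n_times = self._shape
        if not (0 <= indicator < n_indicators
                and 0 <= region < n_regions
                and 0 <= time < n_times):
            raise IndexError("cell outside the cube boundaries")
        # Row-major layout: time varies fastest, indicator slowest.
        return (indicator * n_regions + region) * n_times + time

    def value(self, indicator: int, region: int, time: int) -> float:
        return self._values[self._offset(indicator, region, time)]

    def set_value(self, indicator: int, region: int, time: int,
                  new_value: float) -> None:
        self._values[self._offset(indicator, region, time)] = new_value
```

A visualization written against `DataCube` does not need to know how the values are stored, so another storage structure can be swapped in without changing the components.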
This data model also facilitates implementation of geovisual analytics applications that typically have a pipeline architecture (see figure) of three modules. In this architecture, data is first loaded from data sources into the data provider module and then passed to the data transformation module for analysis and/or processing before being passed to the visualization module for visualization. The transformation is optional, and an application can even combine visualizations that display the data both before and after transformation. Visualizations can be linked to control the transformation thus giving the user direct access to how the data is manipulated.
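The three-module pipeline described above can be sketched in a few lines of Python. All names here (`load_records`, `add_share_elderly`, `bar_values`, `run_pipeline`) and the sample numbers are placeholders invented for the example, not real statistics or real GAV functions.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Records = List[Record]


def load_records() -> Records:
    # Data provider module: placeholder numbers, not real statistics.
    return [
        {"region": "A", "population": 1000.0, "aged_65_plus": 250.0},
        {"region": "B", "population": 500.0, "aged_65_plus": 250.0},
    ]


def add_share_elderly(records: Records) -> Records:
    # Data transformation module: derive a new indicator per region.
    return [{**r, "share_65_plus": r["aged_65_plus"] / r["population"]}
            for r in records]


def bar_values(records: Records, indicator: str) -> List[Tuple[str, float]]:
    # Visualization module stand-in: the (label, height) pairs to draw.
    return [(r["region"], r[indicator]) for r in records]


def run_pipeline(provider: Callable[[], Records],
                 visualize: Callable[[Records], Any],
                 transform: Optional[Callable[[Records], Records]] = None) -> Any:
    data = provider()
    if transform is not None:  # the transformation step is optional
        data = transform(data)
    return visualize(data)
```

Calling `run_pipeline` once without and once with `transform=add_share_elderly` gives two visualizations of the same data, one before and one after the transformation.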
Figure: The simple Data Cube Model, represented by the dimensions Indicator, Spatial and Time, is used to interact efficiently with large geographical statistical data. A cell represents, for example, (Stockholm; age group 65-84 and above; Year 2010). The user can interactively request all regions for an indicator such as "65 and above" to be visualized in a Bar Chart, or in a Scatter Plot with four indicators: age group 0-15, age group 65 and above, total population, and age group 65 and above (Colour).
Figure: 10,000 Swedish zip code regions and associated indicators over several years, visualized using the "Focus & Context" technique, linked and coordinated with a Bar Chart that uses the fisheye lens technique. Context is here represented by almost 3,500 regions (County Skåne) visualized in a linked and coordinated choropleth map and bar chart.
The application is available at:
Another important visualization technique applied to BIG data is the Fisheye Lens, which magnifies the centre of the field of view (Focus) with a continuous fall-off in magnification toward the edges (Context). The degree of interest is determined by the level of detail to be displayed, which is assigned through user interaction. The Fisheye Lens, here applied to the bar chart, is an interactive data reduction visualization tool for seeing both the local detail and the global context of a large number of bars simultaneously. It is an intuitive spatial navigation tool that clearly portrays the immediate relationships between the bars in focus and their neighbouring bars. NCVA developed this fisheye technique and integrated it into a bar chart as part of the GAV component toolkit in a joint research project with OECD staff as early as 2008, and it was first used in OECD Regional eXplorer.
Figure: A linked Bar Chart "fisheye" view with large statistical data representing almost 3,500 regions, showing the indicator elderly age group 65-84. The bars in the focus view all have the same, wider width, so data values and region names can easily be discovered and analysed. The bars in the context view are much thinner, but the user can still see the overall trend. The focus view is interactively controlled by the user through a dual slider at the bottom of the bar visualization. This means that the user can increase or decrease the width of the bars. The user can move the slider to move the focus view, or move the two control points of the slider to extend or shorten the focus view (i.e. to increase or decrease the number of items displayed in the focus view).
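The bar layout described in this caption can be approximated with a short Python sketch. It is a simplification and an assumption, not the NCVA algorithm: focus bars get one fixed width and all context bars share the remaining space equally, so there is no continuous fall-off. The function name `fisheye_widths` and its parameters are hypothetical.

```python
from typing import List


def fisheye_widths(n_bars: int, focus_start: int, focus_end: int,
                   total_width: float, focus_width: float) -> List[float]:
    """Width of every bar; the focus is the half-open range [start, end)."""
    if not 0 <= focus_start <= focus_end <= n_bars:
        raise ValueError("focus range must lie inside the bar range")
    n_focus = focus_end - focus_start
    n_context = n_bars - n_focus
    remaining = total_width - n_focus * focus_width
    if remaining < 0:
        raise ValueError("focus bars do not fit in the available width")
    # Context bars split whatever space the focus bars leave over.
    context_width = remaining / n_context if n_context else 0.0
    return [focus_width if focus_start <= i < focus_end else context_width
            for i in range(n_bars)]
```

Moving the slider corresponds to shifting `focus_start` and `focus_end` together, and dragging one of its control points corresponds to changing only one of them. With a fixed `focus_width`, extending the focus makes the context bars thinner.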
Figure: The Swedish map, based on zip code regions, with a focus view around Stockholm. The selected regions are shown in the fisheye bar chart.
Figure: A linked Scatter Plot view with statistical data representing almost 3,500 regions for county Skåne. The special Scatter Plot can represent four indicators: Y-axis: age group 0-15, X-axis: age group 65-84, Point Size: total population and Point Colour: age group 65 and above. The two selected zip code regions are also highlighted in the map.
See demonstrators at the Statistics Sweden Web site:
Geovisual Analytics applied to 2,000 OECD TL3 Geographical Regional Statistical Data
Demonstrator available at:
The OECD has long felt the need to make large regional data much more easily available on the web in an interactive and user-participative way. In particular, it wants to make more extensive use of dynamic web-enabled visualization, which can convey the four dimensions included in the regional database (statistical indicator, time, regional value and country value) more effectively than a static graph. In addition, timely information on the progress of a local community requires crossing different sources of information and new ways to generate and share information for decision-making. The OECD Regional eXplorer was introduced in 2008 on the public OECD Web site, immediately attracted a lot of interest and became a popular statistical Web site.
A large amount of OECD regional multivariate and temporal statistical data is explored here through time-linked views controlled by a time slider. Trends are detected through several visual representations simultaneously, each of which is best suited to highlighting different patterns and can help stimulate the analytical visual thinking process. Of particular interest is our Flash implementation of motion graph components: a dynamic bar chart with embedded focus & context technology, a scatter matrix coordinated with a scatter plot, a table lens, an extended parallel coordinates plot (PCP), a choropleth layered map structure, map glyphs and a special distribution plot developed in collaboration with OECD staff. Interactive features include tooltips, brushing, highlighting, visual inquiry, and conditioned statistics filter mechanisms that can discover outliers and simultaneously update all views.
Figure: A large amount of spatio-temporal and multivariate OECD regional statistical data is explored here simultaneously through dynamically linked multiple views to detect complex patterns and problems. The views are coordinated using the GAV data linking method, which is based on the data cube model and a colouring scheme. Any filtering or highlighting made in one of the linked functional components is transmitted to all the others. This is an example of an OECD Regional eXplorer linked view scenario (choropleth map, distribution plot and fisheye bar chart) applied to about 1200 OECD EU TL3 regions. The chosen colour map indicator is “percentage of population aged 65 and more”. A number of interesting regions with the highest levels of elderly people are selected and highlighted in all three views for comparison.
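The behaviour where a highlight made in one view is transmitted to all the others can be sketched with a shared selection object. This Python sketch is only an illustration of the general observer idea and makes no claim about how the GAV data linking method is implemented. `SelectionModel` and `View` are hypothetical names.

```python
from typing import Callable, FrozenSet, Iterable, List

Listener = Callable[[FrozenSet[str]], None]


class SelectionModel:
    """Shared state: the region ids that are currently highlighted."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def select(self, region_ids: Iterable[str]) -> None:
        selected = frozenset(region_ids)
        # Notify every linked view, including the one that made the change.
        for listener in self._listeners:
            listener(selected)


class View:
    """Stand-in for a map, distribution plot or bar chart component."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.highlighted: FrozenSet[str] = frozenset()
        self._model = model
        model.subscribe(self._on_selection)

    def _on_selection(self, selected: FrozenSet[str]) -> None:
        self.highlighted = selected  # a real view would redraw here

    def click(self, region_ids: Iterable[str]) -> None:
        # User interaction in this view is routed through the shared model.
        self._model.select(region_ids)
```

Since the views never talk to each other directly, a new component can be added to the linked scenario just by subscribing it to the same model.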
The OECD Distribution Plot (see figure below) is a more advanced variation of the box plot visualization method. The distribution plot was designed and implemented in close collaboration between NCVA and OECD, which provided direct and valuable feedback from end users. This visualization method shows how OECD TL3 regions are distributed within a country for a selected indicator (Population age group 65+). Each cluster (country) is represented as one row of dots, where each dot is a region belonging to that country. The method is based on the premise that a data item must belong to a cluster. It is a good way to visualize the differences within a cluster, showing the minimum, maximum and mean value as well as the interval of the values in the cluster. Use the menu on the left side to change which indicator is visualized, and use the green scroll bar at the top to "zoom" in on interesting areas. Tooltips provide information about the region name and the associated statistical value.
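The per-cluster summary that the distribution plot draws (minimum, maximum, mean and the individual dots) can be computed as in the Python sketch below. The function name `distribution_rows` and the input layout of (region, country, value) tuples are assumptions for illustration, and any numbers passed to it in an example are placeholders rather than OECD data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, str, float]  # (region name, country, indicator value)


def distribution_rows(regions: Iterable[Region]) -> Dict[str, dict]:
    """Group regions by country and summarize each row of the plot."""
    by_country: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for name, country, value in regions:
        # Every data item must belong to exactly one cluster (its country).
        by_country[country].append((value, name))

    rows = {}
    for country, members in by_country.items():
        values = [value for value, _ in members]
        rows[country] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            # Dots sorted by value, i.e. left to right along the row.
            "dots": sorted(members),
        }
    return rows
```

Each entry of the result corresponds to one row of dots, and the `min` and `max` fields give the interval of values covered by that row.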
Figure: OECD Distribution Plot showing the size of the ageing population (age group 65 and above) based on OECD regional data. Each row of dots represents one country. The red points (to the right of the row) represent regions with a high rate of elderly people, while the blue points (e.g. Inner London) have a very low rate. Regions with an extremely high level of elderly people are highlighted. The Distribution Plot is linked and coordinated with a European choropleth map. The user can click on a coloured circle and see the location of the region in Europe. The map below shows the highlighted regions.
Last updated: Tue Nov 22 15:21:55 CET 2016