Visualization on the Web Tutorials - Advanced D3: More on selections and data; scales; axis

Material based on Scott Murray’s book and the Selection Tutorial, various other examples by Mike Bostock, and Jerome Cukier’s tutorial on scales

Selections and Data

First, let’s go over again how selections and data mapping work. We talked about the selection methods select() and selectAll(), which take CSS selectors. That means that we can select by

elements – select("svg"),
classes – selectAll(".className"),
IDs – selectAll("#cancelButton"),
but also arbitrarily complex selector expressions, such as parent-child relationships – selectAll("p > span").

Here is a simple example:

So, this is not new, but let’s take a look at the data structure the selection uses:

[Figure: selection data structure]
The selection returns a Selection object which contains (among other things) a _groups array that holds a NodeList. In the NodeList we can see all the matching elements. The Selection object also has other fields, e.g., a list of _parents.

Mapping Data

When we map data to that object – called a data-join – we can see where the data is stored:

[Figure: data assigned]
If we look at the properties of the Selection object after we have assigned the data, we can see the distinct selections:

_enter for all data items that did not match a DOM element and are scheduled to be added,
_exit for all DOM elements that didn’t map to a data item,
and the familiar _groups for the updated selection.
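To make the three selections concrete, here is a minimal plain-JavaScript sketch of how a data join pairs elements with data when no key function is given, i.e., by index. The helper partitionByIndex is hypothetical, strings stand in for DOM elements, and the enter list is simplified to just the unmatched data; it is not D3's implementation.

```javascript
// Hypothetical helper: mimics how selection.data(data) pairs elements with
// data by index when no key function is supplied. Strings stand in for DOM
// elements; this is a simplified model, not D3's internal code.
function partitionByIndex(elements, data) {
  const n = Math.min(elements.length, data.length);
  return {
    // update: element i receives datum i
    update: elements.slice(0, n).map((el, i) => ({ el, d: data[i] })),
    // enter: data items with no element yet (to be appended)
    enter: data.slice(n),
    // exit: elements with no corresponding datum (candidates for removal)
    exit: elements.slice(n),
  };
}

// Two existing rects, three values: update has 2 pairs, enter is [30], exit is empty.
const grow = partitionByIndex(["rect#a", "rect#b"], [10, 20, 30]);

// Two existing rects, one value: update has 1 pair, enter is empty, exit is ["rect#b"].
const shrink = partitionByIndex(["rect#a", "rect#b"], [10]);
```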

When we drill down to the rect DOM elements, we can see a field __data__ that holds the actual data value. We can then access these data values through the first parameter passed into an accessor function:
.attr("width", function (d) {return d;})
or
.attr("width", d => d)

An optional second parameter
.attr("width", (d, i) => i)
is the index of the current element within the selection; for a plain selectAll(), this follows the order in which the elements occur in the DOM.
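The following sketch imitates, in plain JavaScript, how an attribute function is evaluated: for each element, D3 calls it with the bound datum d, the index i, and the current group of nodes, with this set to the element. The setAttr helper and the plain-object elements are hypothetical stand-ins, not D3 code.

```javascript
// Hypothetical stand-in for selection.attr(name, value): a constant value is
// set as-is; a function is called once per element as fn(d, i, nodes),
// with `this` bound to the element.
function setAttr(nodes, name, value) {
  nodes.forEach((node, i) => {
    const v = typeof value === "function"
      ? value.call(node, node.__data__, i, nodes)
      : value;
    node.attrs[name] = v;
  });
}

// Three "elements" with data already bound, as after a data join.
const rects = [5, 12, 8].map(d => ({ attrs: {}, __data__: d }));

setAttr(rects, "width", d => d * 10);    // from the datum: 50, 120, 80
setAttr(rects, "y", (d, i) => i * 20);   // from the index: 0, 20, 40
setAttr(rects, "height", 18);            // constant for every element
```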

The following figure illustrates the differences between the enter, update and exit selection:

[Figure: the enter, update, and exit selections]

Typically, we want to be able to create visualizations from scratch, i.e., we don’t want to select existing elements but add new ones. We can do this by appending elements to the enter selection:

Right after the data assignment, the enter selection contains an array of three generic EnterNode placeholder objects. We can see that the data is already bound to these placeholders.

[Figure: data assigned to placeholder nodes]

Next, we can use the append() function to create a concrete DOM element for each placeholder in the selection. Again, we only have to declare what we want to do once, and D3 takes care of the looping for us.

However, what we apply to the enter selection does not apply to elements in the update selection. For example, if we add a rectangle <rect x="0" y="10" height="10" width="50"></rect> to the SVG above, we will see that the data is not applied to it.

Updating

Here is an example that handles enter and update correctly:

What happens here is that we define what to do when we add something (we add a rectangle and set its class), what happens when we remove something (we just remove it), and when we update something (we apply the new data elements). Notice the call to the merge() function, which merges the enter selection with the update selection, so that we can later set attributes on all the DOM elements at the same time.

Here is the example using the new selection.join() syntax.

Notice that the “rect” parameter we pass to join("rect") specifies which DOM element to create, and that removals (the exit selection) are handled automatically as well.
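The pattern behind merge() and join("rect") can be illustrated with a toy model. In this sketch, plain objects stand in for SVG elements, and a hypothetical joinRects function performs the update (rebind), exit (remove), and enter (append) steps by index, then returns the merged result; it mirrors the idea only and is not how D3 is implemented.

```javascript
// Toy model of join("rect"): plain objects stand in for <rect> elements.
function joinRects(parent, data) {
  const existing = parent.children.filter(c => c.tag === "rect");
  const n = Math.min(existing.length, data.length);

  // update: rebind the new data to elements that already exist
  const update = existing.slice(0, n);
  update.forEach((el, i) => { el.__data__ = data[i]; });

  // exit: remove elements that no longer have a datum
  const exiting = new Set(existing.slice(n));
  parent.children = parent.children.filter(c => !exiting.has(c));

  // enter: append one new element per datum that has no element yet
  const entered = data.slice(n).map(d => ({ tag: "rect", attrs: {}, __data__: d }));
  parent.children.push(...entered);

  // merge: update + enter, so attributes can be set on all of them at once
  return update.concat(entered);
}

const svg = { children: [{ tag: "rect", attrs: {}, __data__: 1 }] };

// One existing rect, three values: 1 updated, 2 entered, 0 exited.
const first = joinRects(svg, [4, 8, 15]);
first.forEach(el => { el.attrs.width = el.__data__ * 10; });

// Three rects, one value: 1 updated, 0 entered, 2 exited.
const second = joinRects(svg, [16]);
second.forEach(el => { el.attrs.width = el.__data__ * 10; });
```

Setting the width on the merged result is what makes the same code work for both new and existing bars.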

Transitions

That’s great, but now let’s look at how we can do this with transitions:

That’s pretty smooth. We initialize new bars with a width of 0 and fade them in as they grow. Notice that D3 interpolates opacity, color, position, and size as we use transitions. For removing elements we simply fade them out with opacity.
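Under the hood, each frame of a transition evaluates an interpolator at an eased time t between 0 and 1. The sketch below uses the same number interpolation formula as d3.interpolateNumber and the cubic in-out easing that D3 transitions use by default (d3.easeCubic); the target width of 200 pixels is a hypothetical value.

```javascript
// Number interpolator: same formula as d3.interpolateNumber(a, b).
const interpolateNumber = (a, b) => t => a * (1 - t) + b * t;

// Cubic in-out easing, mirroring d3.easeCubicInOut (alias d3.easeCubic).
function cubicInOut(t) {
  t *= 2;
  return (t <= 1 ? t * t * t : (t -= 2) * t * t + 2) / 2;
}

// A new bar grows from width 0 to 200 while fading in from opacity 0 to 1.
const width = interpolateNumber(0, 200);
const opacity = interpolateNumber(0, 1);

// Print a few frames: raw time, eased width, eased opacity.
for (const raw of [0, 0.25, 0.5, 0.75, 1]) {
  const t = cubicInOut(raw);
  console.log(raw, width(t).toFixed(1), opacity(t).toFixed(3));
}
```

The easing makes the bar start and finish slowly; colors and positions are handled the same way with the appropriate interpolator.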

Groupings: Handling Nested Elements

In many cases we want to apply data not directly to low-level SVG elements, but instead use a hierarchy of elements. For the bar chart, for example, we might want to add a label showing the actual value. There are two approaches to doing this:

Laying out the numbers and the bars independently so that they match up.
Using a group element to define the commonalities between the bars and the numbers (i.e., the y position in our example).

The latter is the better approach, as we only have to define the common aspects of the group once. This might not make much of a difference for labels and bars, but we could also add tick marks to the bars, an overlay for highlights, etc. So, let’s try to add groups and take a similar approach to the example before:
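The grouping idea can be sketched with plain SVG markup strings: the shared y position is declared once on each g element through a transform, and the rect and text inside it are positioned relative to the group. The values for barHeight and pxPerUnit are hypothetical, and building strings is only for illustration; in D3 the same structure would be created with append().

```javascript
// Hypothetical values for illustration.
const data = [4, 8, 15];
const barHeight = 20;  // vertical space per bar, in pixels
const pxPerUnit = 10;  // horizontal pixels per data unit

// One <g> per datum: the y offset lives only on the group, so the
// <rect> and <text> children never need to know their own y position.
const markup = data
  .map((d, i) => [
    `<g transform="translate(0,${i * barHeight})">`,
    `  <rect width="${d * pxPerUnit}" height="${barHeight - 1}"></rect>`,
    `  <text x="${d * pxPerUnit - 3}" y="${barHeight / 2}" dy=".35em" text-anchor="end">${d}</text>`,
    `</g>`,
  ].join("\n"))
  .join("\n");

console.log(markup);
```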

[Figure: broken group]

This doesn’t work as we would hope: the update isn’t handled correctly. When we look at the selection, we can see why: the data in the g elements is updated correctly, but not the data in the rect and text elements, because these are only set on enter, not on update.

We can fix this by separating the update from the enter and by explicitly selecting the lower-level elements (the rectangle and the text). The selection propagates the data to the actual elements:
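Why does selecting the children help? D3’s selection.select() propagates each parent’s bound datum to the child it selects, whereas selectAll() does not. The toy model below shows that propagation step in isolation; plain objects stand in for g, rect, and text elements, and propagateToChild is a hypothetical helper.

```javascript
// Toy model of selection.select(childTag): the parent's __data__ is copied
// onto the selected child, so stale child data is refreshed.
function propagateToChild(groups, childTag) {
  return groups.map(g => {
    const child = g.children.find(c => c.tag === childTag);
    if (child && "__data__" in g) child.__data__ = g.__data__;
    return child;
  });
}

// After an update, the <g> elements hold new data but the children are stale.
const groups = [7, 3].map(d => ({
  __data__: d,
  children: [{ tag: "rect", __data__: 0 }, { tag: "text", __data__: 0 }],
}));

const rects = propagateToChild(groups, "rect"); // rects now carry 7 and 3
const texts = propagateToChild(groups, "text"); // texts now carry 7 and 3
```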

Scales

Up to this point, our data has conveniently had dimensions that we could plot directly without applying a transformation. Most of the time, however, the data we want to visualize doesn’t map neatly to pixels, so we will have to transform it.

Here is an example of a dataset that is not suitable for direct plotting:

Generally speaking, what we are looking for is a function f() that maps an input dataset D to a derived dataset D', i.e.,
D' = f(D)
Naturally, we want to choose f() such that the resulting derived dataset can be used for plotting.
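For the linear case that the following examples rely on, f is applied to every value x in D. Assuming an input domain [d_0, d_1] and an output range [r_0, r_1] (symbols introduced here for illustration), a linear mapping can be written as:

```latex
D' = \{\, f(x) \mid x \in D \,\}, \qquad
f(x) = r_0 + \frac{x - d_0}{d_1 - d_0}\,(r_1 - r_0)
```

For instance, with domain [0, 50] and range [0, 500], the value 25 maps to 0 + (25 / 50) × 500 = 250.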

Let’s write a function that we can call so that this dataset can be easily plotted:
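A minimal sketch of such a hand-written function, assuming the goal is to fit the largest value into a drawing area of a hypothetical 500 pixels; the name toPixels and the sample values are illustrative, not the tutorial's dataset.

```javascript
// Hypothetical hand-written transformation f(): scale every value so that
// the largest one maps to the full available width.
// Assumes positive values and a non-zero maximum.
function toPixels(values, width) {
  const max = Math.max(...values);
  return values.map(v => (v / max) * width);
}

const raw = [12000, 48000, 30000];  // far too large to use as pixels directly
const widths = toPixels(raw, 500);  // → [125, 500, 312.5]
```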

That’s nice! We could do this for any kind of data we have and achieve our goal. However, there are many potentially useful functions: besides linear scales, we could use logarithmic scales, power scales, discrete scales, etc.

And we don’t always want to produce output in screen coordinates; we may equally want to vary the saturation of a color, etc. Finally, we could have ordinal data or temporal data. So there are many possible ways to transform your input data to achieve a particular visual mapping. Fortunately, D3 provides a set of powerful scales:

Here we’ve successfully used a linear D3 scale. There are a couple of new things here. First, we create a scale of the linear type with the call d3.scaleLinear(). Next, we define the input domain and the output range. The domain defines the values that we expect in our dataset. Here we chose an input domain starting at 0 and going up to the highest value of the dataset, which we can conveniently determine with D3’s d3.max() function. We could use d3.min() to define the lower bound of the domain; however, that would mean that our bar chart doesn’t start at 0, which, as we will learn, is frequently a bad idea!

The range defines which output values we want from the scale, i.e., we’d typically pick something that we can easily use to draw on the screen.
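To show what the domain and range calls set up, here is a minimal chainable linear scale loosely modeled on d3.scaleLinear().domain([...]).range([...]). The linearScale function is hypothetical: it only supports two-element domains and ranges, and unlike D3 its domain() and range() methods are setters only.

```javascript
// Minimal sketch of a chainable linear scale; not D3's implementation.
function linearScale() {
  let domain = [0, 1];
  let range = [0, 1];
  const scale = x =>
    range[0] + ((x - domain[0]) / (domain[1] - domain[0])) * (range[1] - range[0]);
  scale.domain = d => { domain = d; return scale; };  // setter, returns scale for chaining
  scale.range = r => { range = r; return scale; };    // setter, returns scale for chaining
  return scale;
}

const data = [5, 20, 50];
const x = linearScale()
  .domain([0, Math.max(...data)])  // plays the role of d3.max(data)
  .range([0, 500]);                // output in pixels

const pixels = data.map(v => x(v)); // → [50, 200, 500]
```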

Clamping

What happens when you plot a value that’s larger than your domain? The scale still computes a value from its mapping function, but that value will fall outside your screen coordinates. One way to avoid drawing outside the chart is clamping, i.e., assigning every value larger than the extent of your domain to the maximum value of the domain (D3’s continuous scales can do this for you via clamp(true)). This can be very helpful, but should always be done with care: you need to make clear that you’ve broken the scale somehow, e.g., by using a label or a special color.
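The effect of clamping on a linear mapping can be sketched by limiting the normalized position to [0, 1], so the output always stays within the range. For a linear scale this has the same effect as D3’s scale.clamp(true); the linear helper and the domain and range values are hypothetical.

```javascript
// Sketch: a linear mapping with optional clamping.
function linear(domain, range, clamp = false) {
  return x => {
    let t = (x - domain[0]) / (domain[1] - domain[0]);  // normalized position
    if (clamp) t = Math.max(0, Math.min(1, t));          // keep inside [0, 1]
    return range[0] + t * (range[1] - range[0]);
  };
}

const free = linear([0, 100], [0, 400]);
const clamped = linear([0, 100], [0, 400], true);

free(150);     // → 600, beyond a 400px wide chart
clamped(150);  // → 400, pinned to the end of the range
```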

Color Scale

The next example shows how we can use a color scale to redundantly encode a bar chart with positive and negative values:

Setting up the color scale is very convenient. We simply define a domain of three points (the minimum, zero, and the maximum) and list the three corresponding colors (darkred, lightgray, and steelblue) in the range definition. When we call the scale, D3 interpolates between the colors for us.
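A three-stop scale like this is piecewise: a value first selects the segment of the domain it falls into, and the color is then interpolated within that segment. The sketch below does this channel by channel in RGB, which is also D3’s default for color strings, although D3’s exact rounding may differ. The domain [-40, 0, 60] and the colorFor helper are hypothetical; the RGB triples are the CSS values of darkred, lightgray, and steelblue.

```javascript
// Hypothetical three-stop diverging color scale: [min, 0, max] → three colors.
const stops = [
  { value: -40, rgb: [139, 0, 0] },      // darkred
  { value: 0,   rgb: [211, 211, 211] },  // lightgray
  { value: 60,  rgb: [70, 130, 180] },   // steelblue
];

function colorFor(v) {
  // Pick the segment containing v: negative values use the first segment.
  const [lo, hi] = v < stops[1].value ? [stops[0], stops[1]] : [stops[1], stops[2]];
  const t = (v - lo.value) / (hi.value - lo.value);
  // Interpolate each RGB channel linearly and round to an integer.
  const [r, g, b] = lo.rgb.map((c, k) => Math.round(c + t * (hi.rgb[k] - c)));
  return `rgb(${r}, ${g}, ${b})`;
}

colorFor(-20);  // → "rgb(175, 106, 106)", halfway between darkred and lightgray
```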

Axes

Drawing axes and legends is critical for data visualization. None of our examples, with the exception of the labelled bar charts, really told us about the absolute values of the data. However, drawing axes well can be a lot of work. Fortunately, D3 has extensive support for axes. Here is an example based on our previous bar chart:

This worked! We create a new axis by calling d3.axisBottom() and passing it the relevant scale, which tells the axis its value range. We then append the axis to the svg with this call: svg.append("g").call(xAxis). Now, we can see that it overlaps with our bars and that some of the numbers are close to the edge of the svg. To fix this, we have to style the chart and add some margins. We’ll also apply some custom styles to our axis.
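A common way to add those margins is the margin convention: subtract the margins from the SVG size to get the inner plotting area, draw the chart inside a group translated by the left and top margins, and move the x axis to the bottom of that area. The sketch below only computes those numbers and transform strings; the margin and size values are hypothetical. In D3, the strings would be set with .attr("transform", ...) on the respective g elements.

```javascript
// Hypothetical margins and outer SVG size, in pixels.
const margin = { top: 20, right: 30, bottom: 30, left: 40 };
const outerWidth = 600;
const outerHeight = 300;

// Inner plotting area left after reserving the margins.
const innerWidth = outerWidth - margin.left - margin.right;   // 530
const innerHeight = outerHeight - margin.top - margin.bottom; // 250

// Transform for the <g> holding the bars and the axis.
const chartTransform = `translate(${margin.left},${margin.top})`;
// Transform for the axis <g>, relative to the chart group: below the bars.
const xAxisTransform = `translate(0,${innerHeight})`;

// The x scale's range would then be [0, innerWidth] rather than [0, outerWidth].
```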

This is a nicer bar chart! We’ve used CSS to style the axis and introduced padding and a background for the chart. Also, we’ve called nice() on the scales, which extends the domain so that it starts and ends on round, human-readable values.
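The idea behind nice() can be sketched in a simplified, single-pass form: choose a round tick step (1, 2, 5, or 10 times a power of ten) for roughly count ticks, then extend the domain outward to multiples of that step. The niceDomain function is hypothetical and omits cases D3 handles, such as descending domains and repeated refinement.

```javascript
// Simplified sketch of a "nice" linear domain. Assumes stop > start.
function niceDomain([start, stop], count = 10) {
  const raw = (stop - start) / count;               // unrounded step
  const power = 10 ** Math.floor(Math.log10(raw));  // power of ten below raw
  const error = raw / power;                        // somewhere in [1, 10)
  const factor =
    error >= Math.sqrt(50) ? 10 :
    error >= Math.sqrt(10) ? 5 :
    error >= Math.sqrt(2) ? 2 : 1;
  const step = factor * power;
  // Extend both ends outward to the nearest multiple of the step.
  return [Math.floor(start / step) * step, Math.ceil(stop / step) * step];
}

niceDomain([0, 47]);   // → [0, 50]
niceDomain([0, 980]);  // → [0, 1000]
```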

More Complex Data

Up to this point we have only worked with one-dimensional arrays. Here is an example of how to handle arrays of objects, where each object has multiple values. We have three kinds of values: products (labels), types of products (categories), and tons (numerical data). We show the data in a colored bar chart, where the color corresponds to the product type and the bar width to the tonnage. In addition, you can sort the bar chart by each of the three data values: alphabetically by type and by product, and numerically by tonnage.

Among the new concepts introduced in this example are:

How to access data in objects.
How to sort data.
How to properly deal with padding in an SVG.
How to use a scale with bands to evenly space the bars.
How to update a scale.
How to ensure object constancy in transitions with the key function of the data mapping.
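Object constancy comes from the key function passed as the second argument of selection.data(data, key): elements are matched to data by key rather than by index, so a bar keeps following the same product when the data is re-sorted. The plain-JavaScript sketch below shows only that matching step; keyedJoin, the element objects, and the product rows are hypothetical, and this is not D3's implementation.

```javascript
// Sketch of a keyed join: match existing elements to new data by key.
function keyedJoin(elements, data, key) {
  const byKey = new Map(elements.map(el => [key(el.__data__), el]));
  const update = [];
  const enter = [];
  for (const d of data) {
    const el = byKey.get(key(d));
    if (el) {
      el.__data__ = d;       // same product: rebind, keep the element
      update.push(el);
      byKey.delete(key(d));
    } else {
      enter.push(d);         // new product: needs a new element
    }
  }
  const exit = [...byKey.values()];  // elements whose product disappeared
  return { update, enter, exit };
}

const key = d => d.product;
const bars = [
  { id: "bar-0", __data__: { product: "Wheat", tons: 30 } },
  { id: "bar-1", __data__: { product: "Corn", tons: 50 } },
];

// Re-sorted data: Corn moves first, Rice is new, Wheat is gone.
const next = [{ product: "Corn", tons: 55 }, { product: "Rice", tons: 20 }];
const { update, enter, exit } = keyedJoin(bars, next, key);
// update: bar-1 (Corn, now 55 tons); enter: Rice; exit: bar-0 (Wheat)
```

With an index-based join, bar-0 would instead have been reused for Corn, and its transition would animate from one product to another.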

An alternative to the sorting used here would be D3’s sort feature for selections, selection.sort().
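Both approaches come down to a comparator function. The ascending helper below follows the semantics of d3.ascending (negative, positive, or zero, and NaN for incomparable values); the sample rows are hypothetical and only mirror the product/type/tons shape described above. The same kind of comparator could be passed to selection.sort(), which compares the bound data and reorders the DOM elements accordingly.

```javascript
// Comparator modeled on d3.ascending.
const ascending = (a, b) => (a < b ? -1 : a > b ? 1 : a >= b ? 0 : NaN);

// Hypothetical sample rows: product (label), type (category), tons (number).
const rows = [
  { product: "Wheat", type: "Grain", tons: 30 },
  { product: "Apples", type: "Fruit", tons: 45 },
  { product: "Corn", type: "Grain", tons: 50 },
];

// slice() copies the array so each sort leaves the original order intact.
const byProduct = rows.slice().sort((a, b) => ascending(a.product, b.product)); // Apples, Corn, Wheat
const byType = rows.slice().sort((a, b) => ascending(a.type, b.type));          // Apples, Wheat, Corn
const byTons = rows.slice().sort((a, b) => ascending(a.tons, b.tons));          // Wheat, Apples, Corn
```

Because Array.prototype.sort is stable, rows with the same type keep their original relative order.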