Find out how data visualization has evolved over the years and what you can do with it.
In our daily work, at some point, we will be assessing information and trying to understand it. It is very common for work teams of all kinds to use metrics to evaluate how their work has evolved or if they are close to reaching a goal. When looking at data and striving to understand it, the use of a correct data display is key. And that is what we are going to talk about in this blog post.
I have been studying and working with data management as a Data Engineer for several years and part of that includes finding the best way to visualize the data in question. However, we don’t always succeed in displaying and visualizing data in an easily understandable way.
As software developers, we are often less prepared to tell stories with data. In our education we learn a lot about language and how to write papers and articles, and we also attend lectures on numbers and mathematics, but we are never really taught how to present data or quantifiable information in a simple way. For me, learning more about the world of data visualization brought these two areas of knowledge together.
The Evolution in Data Displays
Lately in the software world, we’ve seen a strong trend toward apps that display data through dashboard-style features, presenting information with graphics such as these:
But the first data visualization tools appeared long ago, in a far more analog form than the ones we know today.
One of the earliest data display formats we know of is the map. In the following image we see a world map from 1570, which showed the trade routes of that time. Its creator, Abraham Ortelius, published an atlas of 53 maps, widely regarded as the first modern atlas, as a first effort to bring together the world’s cartographic knowledge.
Then, in 1645, a mathematician sent a letter to the Queen of Spain giving her what is known as the world’s first statistical data chart. In this letter he reported mistakes he identified in the ranges of longitude on the world map.
In the graphic he included in the letter, he wanted to show the correct estimate of the distance between Toledo and Rome and how it differed from the maps known at that time. He could simply have made a table listing the longitude values, but he realized that this form would not achieve his purpose.
More than 250 years ago, a history teacher wanted to summarize all the time periods and dynasties in human history. To communicate this to his students and make it easier for them to remember, he created a timeline, which was one of the first important contributions he made to the world of data visualizations.
Like these, there are several other interesting examples of data visualization that communicate and display information more quickly.
In this one, Charles Joseph Minard, a French civil engineer, illustrated the toll of Napoleon’s 1812 campaign in Russia. The graphic shows the number of soldiers who started the campaign and how that number shrank as they advanced into Russian territory and battles took place. It also shows a line marking the retreat, and how the number of men who returned from the war was far lower than the starting number.
This next image refers to the cholera outbreak in London in 1854. At first, there were many hypotheses about how the disease was transmitted. Many said it spread through the air, but Dr. John Snow refused to accept this idea. To prove his theory that the illness was linked to something contaminated, he marked each cholera death with a small line at the victim’s address on a map. The result is often described as one of the first heat maps in history. Eventually, it became clear that the lines clustered around a contaminated water pump, which explained the outbreak.
Why Do We Display Data?
Data visualization is used to explore the information we have and see how it behaves, as well as understand why events occur in a certain way.
There are different types of data visualizations:
Quantity Visualization
Examples: bar charts, heat maps
- They are used to indicate the best or worst in a category by comparing two or more points, to compare performance against a target or goal, or to show what has or has not progressed over a certain period of time.
- They are useful for showing similarities and differences in a straightforward way.
Distribution Visualization
Examples: histogram, density plot, box plots
- They are used to indicate the highest, middle, and lowest values.
- They are recommended for finding out whether something stands out from the rest and for revealing outliers, the shape of the distribution, frequencies, ranges, etc.
Proportion Visualization
Examples: pie charts, bar charts, grouped bars, mosaic plots, etc.
- They indicate which parts make up the whole and serve to highlight which is larger or smaller, which is similar or different, etc.
- Highly recommended for showing summaries, similarities, anomalies, percentages of the total, etc.
X-Y Relationship Visualization
Examples: line graph, bubble chart, 2D bins, etc.
- They are useful for asking whether the relationship between two numerical variables is positive, negative, or neither, or for understanding how one X value or group is related to another Y value or group.
- Used to show atypical values, correlations, positive and negative relationships between two or more variables.
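As a rough illustration of the four types above, the following sketch draws one chart of each kind. It uses base R graphics and the `mtcars` data set that ships with R. The choice of data and variables is an assumption made for this example and does not come from the books cited later in the post.

```r
# Sketch: one chart per visualization type, using R's built-in mtcars data.
cars <- mtcars

# Quantity: average miles per gallon for each number of cylinders.
avg_mpg <- tapply(cars$mpg, cars$cyl, mean)

# Proportion: share of cars in each cylinder group (parts of a whole).
cyl_share <- prop.table(table(cars$cyl))

op <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels

barplot(avg_mpg, main = "Quantity",
        xlab = "Cylinders", ylab = "Average mpg")

hist(cars$mpg, main = "Distribution", xlab = "Miles per gallon")

pie(cyl_share, main = "Proportion")

plot(cars$wt, cars$mpg, main = "X-Y relationship",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

par(op)  # restore the previous graphics settings
```

Each panel answers a different question about the same data: which group has the highest value, how the values are spread, how the groups split the whole, and how two variables move together.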
Guidelines for Good Visualization:
- Understand the context: who is the audience for the data? Do they have technical knowledge about the topic? How much time do we have to show this data, and where are we going to show it? Will it be a face-to-face meeting for a hundred people using a projector or is it a one-on-one Zoom video call? What information is most essential?
- Choose an appropriate visualization: we have options to display data through texts, tables, graphics, among others. Depending on the information and what we want to highlight, it is important to choose the most effective visuals.
- Remove clutter and focus the reader’s attention where we want it through the choices we make when displaying the data: what shapes or colors we use, what elements we highlight, which ones we can take out because they don’t add value, etc.
- Try to tell a story to lead the reader toward what we want to reveal.
To discover the best way to display data, I recommend testing, experimenting, and making mistakes. Only by trying different alternatives and comparing different visualization styles can you evaluate which one really suits the guidelines and the situation.
It is important to help the reader by highlighting the most important info and taking them where we want to go. For example, if we want to highlight the number 3 in this image, why not make it a different color than the rest?
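The same idea applies to charts. A minimal sketch in base R, assuming a handful of made-up category counts: every bar is drawn in grey except the one we want the reader to notice. The names `values` and `highlight` are hypothetical and exist only for this example.

```r
# Hypothetical counts per category; the values are made up for illustration.
values <- c(A = 12, B = 9, C = 15, D = 7, E = 10)
highlight <- "C"

# Grey for every bar, one accent color for the bar we want noticed.
bar_colors <- ifelse(names(values) == highlight, "firebrick", "grey70")

barplot(values, col = bar_colors, border = NA,
        main = "Category C stands out")
```

Using a single accent color keeps the emphasis unambiguous. Adding a second or third highlight color quickly weakens the effect.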
However, balance is also essential. Avoid helping too much by rearranging things that we normally associate with a specific order or format. In the example below, the bars were reordered to make things easier for the viewer, but mixing up the natural order of the age ranges invites misreading:
Typical Mistakes:
- Overload of information: by trying to show too many things at once, you end up not getting anything across.
- Overusing pie charts: these are useful if we want to get a notion of parts vs. the whole. They are not recommended for measuring growth, for example, because it is difficult for the human eye to compare angles.
- Choosing the wrong color palettes that do not correctly reflect the differences you want to highlight.
- Not respecting scales or not showing the whole timeline.
- Showing variations (such as percentage changes) without the absolute numbers behind them, or the other way around, can also give the wrong picture.
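The last point is easy to check with a little arithmetic. In the sketch below, the two products and their sales figures are invented for illustration: the small product doubles while the large one grows by a tenth, yet the large one adds far more units.

```r
# Hypothetical monthly sales for two products (made-up numbers).
sales <- data.frame(
  product  = c("Small", "Large"),
  january  = c(2, 1000),
  february = c(4, 1100)
)

# Absolute change in units and relative change in percent.
sales$absolute_change <- sales$february - sales$january
sales$percent_change  <- 100 * sales$absolute_change / sales$january

print(sales)
```

A chart of `percent_change` alone would suggest the small product is the success story, while a chart of `absolute_change` alone would hide its growth rate. Showing both, or stating the baseline, avoids the distortion.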
Good Practices:
- Never use 3D graphics. It is difficult to draw the imaginary lines or planes that let us find the intersection with the axes. It is better to use more than one graph than a single 3D one to show three-way comparisons.
- In a correlation or an average, show the points. If not, we have no way of knowing which values produced that average.
- Inverted axes can be confusing. Follow the logic of how humans read and interpret information.
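One way to see why the points matter is Anscombe’s quartet, a classic data set included with R as `anscombe`. The sketch below is an illustration chosen for this post, not an example taken from the recommended books. It computes the correlation of each of the four x-y pairs, which all come out at roughly 0.82, and then plots the points with a dashed line at the average of y.

```r
# Anscombe's quartet ships with R: four x-y data sets with nearly identical
# means and correlations but very different shapes.
data(anscombe)

# Correlation between x and y for each of the four data sets.
correlations <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(correlations, 2))

op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 19, main = paste("Data set", i))
  abline(h = mean(y), lty = 2)  # the average alone hides the shape
}
par(op)
```

The summary numbers are almost the same in every panel, but only the plotted points reveal the curve, the outlier, and the single extreme x value.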
There’s Nothing Like an Example:
Let me show you how to use data visualization effectively with an example extracted from several books that I recommend for studying data visualization:
- Storytelling with You, by Cole Nussbaumer Knaflic
- Fundamentals of Data Visualization, by Claus O. Wilke
- Calling Bullshit, by Carl T. Bergstrom and Jevin D. West
- Data Visualization Made Simple, by Kristen Sosulski
In order to make the post more insightful, we will replicate the graphics and show code examples using R for its simplicity and popularity in data analysis. However, it is possible to find many more examples for different languages at http://chartmaker.visualisingdata.com/.
In this image we can see the number of tickets received and processed by a support team. The team leader wants to demonstrate that the resignation of a team member has significantly affected the team’s ticket processing capacity. However, as the image shows, it is hard to spot a change in the trend of tickets processed vs. received.
This is due to the fact that we are using bars instead of a line graph. If we change the data visualization format, we can showcase the trend better:
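The comparison can be sketched in base R as follows. The monthly ticket counts are hypothetical values invented for this sketch, not the figures from the original chart. They are built so that processed tickets start falling behind in May.

```r
# Hypothetical monthly ticket counts (made up for illustration only).
tickets <- data.frame(
  month     = 1:12,
  received  = c(160, 170, 165, 175, 180, 185, 178, 190, 195, 188, 192, 200),
  processed = c(158, 169, 164, 173, 170, 165, 160, 155, 150, 148, 150, 152)
)

op <- par(mfrow = c(1, 2))  # bars on the left, lines on the right

# Grouped bars: the eye has to compare 24 separate bar heights.
barplot(t(as.matrix(tickets[, c("received", "processed")])),
        beside = TRUE, names.arg = tickets$month,
        legend.text = c("Received", "Processed"),
        main = "Bars", xlab = "Month", ylab = "Tickets")

# Lines: the widening gap from May onward reads as a change in trend.
plot(tickets$month, tickets$received, type = "l", col = "steelblue",
     ylim = range(tickets$received, tickets$processed),
     main = "Lines", xlab = "Month", ylab = "Tickets")
lines(tickets$month, tickets$processed, col = "darkorange")
legend("topleft", legend = c("Received", "Processed"),
       col = c("steelblue", "darkorange"), lty = 1, bty = "n")

par(op)
```

Both panels show the same numbers. The line version simply connects them, so the two series can be read as trajectories instead of as individual bars.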
As you can see, it is much clearer that there was a “cut” in the trend from May (the month the member resigned). However, this graph could be improved further:
The rationale for the changes:
- We do not need a comparison of the values, so we removed the background and unnecessary lines.
- We added value labels to each point from August onward (where the biggest difference is), in a different color for easier reading.
- We replaced the numbers on the X-axis with abbreviated months to make it easier for users to read.
Finally, we made a few other improvements:
- Eliminated the colors since they can draw attention to areas we don’t want.
- Replaced the classic captions with labels to make the reading even easier.
- Added an eye-catching title, subtitles, and captions, and also improved the axis labels.
- Added a line with a description at the break point.
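A minimal base R sketch of these refinements could look like the following. The ticket counts, the title, and the annotation text are all hypothetical, and the styling choices are one possible reading of the list above, not the author’s actual code.

```r
# Hypothetical monthly ticket counts (made-up values for illustration).
received  <- c(160, 170, 165, 175, 180, 185, 178, 190, 195, 188, 192, 200)
processed <- c(158, 169, 164, 173, 170, 165, 160, 155, 150, 148, 150, 152)
months <- 1:12

# Greyscale lines, no box, no default axes; extra room on the right for labels.
plot(months, received, type = "l", col = "grey40", lwd = 2,
     ylim = c(0, 260), xlim = c(1, 14),
     axes = FALSE, xlab = "", ylab = "Number of tickets",
     main = "Processed tickets fell behind after May")
lines(months, processed, col = "grey10", lwd = 2)
mtext("Tickets per month (hypothetical data)", side = 3, line = 0.3, cex = 0.8)

# Abbreviated month names instead of numbers on the X-axis.
axis(1, at = months, labels = month.abb, tick = FALSE)
axis(2, las = 1)

# Direct labels at the end of each line replace the legend.
text(12.2, received[12],  "Received",  adj = 0, col = "grey40")
text(12.2, processed[12], "Processed", adj = 0, col = "grey10")

# Value labels only from August onward, where the gap is largest.
late <- 8:12
text(late, received[late],  received[late],  pos = 3, cex = 0.8, col = "grey40")
text(late, processed[late], processed[late], pos = 1, cex = 0.8, col = "grey10")

# Dotted vertical line and a short note at the break point.
abline(v = 5, lty = 3)
text(5.1, 250, "Team member resigned in May", adj = 0, cex = 0.8)
```

The title states the conclusion instead of describing the axes, so a reader who only glances at the chart still gets the message.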
If we compare the two graphs, the second one communicates our message better and can spark much more interest in our audience. And that’s what data visualization should be all about.
About the Author
Oscar Montañés is a graduated engineer with 10+ years of experience in different roles within information technology and data analysis. He specializes in R and project management.