1_data_visualization_fundamentals

Introduction to Data Visualization

Data visualization is the graphical representation of information and data. Using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Why Data Visualization Matters

Types of Data Visualization

Exploratory Data Visualization

Explanatory Data Visualization


Data Types and Field Classifications

Dimensions vs. Measures

Understanding your data fields is fundamental to choosing appropriate visualizations.

Dimensions (Qualitative/Discrete)

Measures (Quantitative/Continuous)
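A rough illustration of the two field roles, assuming a tiny hypothetical dataset (the field names `region` and `sales` and the helper `aggregate` are made up for this sketch): the dimension supplies the groups, and the measure is the number that gets aggregated within each group.

```python
from collections import defaultdict

# Hypothetical rows: "region" plays the dimension role, "sales" the measure role.
rows = [
    {"region": "East", "sales": 120.0},
    {"region": "West", "sales": 80.0},
    {"region": "East", "sales": 60.0},
]


def aggregate(rows, dimension, measure):
    """Sum the measure within each distinct value of the dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


totals = aggregate(rows, "region", "sales")
print(totals)
```

Each key in `totals` would become one mark (for example, one bar), and its value would set the size of that mark.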

Discrete vs. Continuous

Discrete Fields (Blue/Categorical):

Continuous Fields (Green/Numeric):


Chart Types and Their Applications

Column Charts

Best for: Comparing values across categories or showing trends over time

Description: Column charts use vertical rectangular bars whose heights represent the values. Categories run along the horizontal axis and values along the vertical axis.

Types:

When to Use:


Bar Charts

Best for: Comparing discrete values, especially with long category labels

Description: Similar to column charts but use horizontal rectangular bars. The length of each bar corresponds to the magnitude of the value.

Types:

When to Use:


Line Charts

Best for: Showing trends and changes over time (time series analysis)

Description: Line charts display information as a series of data points connected by straight line segments.

Types:

When to Use:


Area Charts

Best for: Showing magnitude of change over time and cumulative totals

Description: Similar to line charts but with the area below the line filled in. Multiple series can be stacked to show part-to-whole relationships.
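Stacking can be sketched as a running sum across series: each layer is drawn on top of the total of the layers beneath it. The snippet below is a minimal sketch with made-up data; `stack_series` and the traffic figures are hypothetical names and values, not part of any charting library.

```python
def stack_series(series):
    """Return the upper boundary of each layer in a stacked area chart."""
    length = len(next(iter(series.values())))
    baseline = [0] * length
    boundaries = {}
    for name, values in series.items():
        # The new layer sits on top of everything stacked so far.
        baseline = [b + v for b, v in zip(baseline, values)]
        boundaries[name] = baseline
    return boundaries


# Hypothetical monthly visits by channel.
traffic = {
    "Organic": [10, 12, 15],
    "Paid": [5, 6, 4],
    "Referral": [2, 3, 3],
}
tops = stack_series(traffic)
for name, top in tops.items():
    print(name, top)
```

The top boundary of the last layer is the overall total at each point in time, which is why stacked areas show both the parts and the whole.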

Types:

Benefits:


Pie and Donut Charts

Best for: Showing proportional composition of a whole

Description:

When to Use:

Caution: Difficult for precise comparison; use when approximate proportions suffice
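The arithmetic behind a pie or donut chart is a share of the total converted to an angle. A minimal sketch, assuming a hypothetical budget breakdown (`slice_angles` and the numbers are illustrative only):

```python
def slice_angles(parts):
    """Map each category to (percent of whole, angle in degrees)."""
    total = sum(parts.values())
    return {
        name: (round(100 * value / total, 1), round(360 * value / total, 1))
        for name, value in parts.items()
    }


# Hypothetical monthly budget.
budget = {"Rent": 1200, "Food": 600, "Transport": 300, "Other": 300}
slices = slice_angles(budget)
for name, (percent, degrees) in slices.items():
    print(f"{name}: {percent}% of the whole, {degrees} degrees")
```

Two slices that differ by only a few degrees are hard to tell apart by eye, which is the reason for the caution above.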


XY Scatter Plots

Best for: Identifying relationships and correlations between two variables

Description: Uses Cartesian coordinates to display values for two variables as points on a plane.

Types:

When to Use:

Key Insight: The pattern of points reveals the nature of the relationship:
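One common way to put a number on that pattern is the Pearson correlation coefficient: values near +1 suggest an upward linear trend, values near -1 a downward one, and values near 0 little linear relationship. The sketch below computes it by hand on made-up data (`pearson_r` and the sample lists are hypothetical).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one perfectly rising and one perfectly falling series.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(r_up, r_down)
```

The coefficient only captures linear association, so a curved pattern can still score near 0; the scatter plot itself remains the first thing to look at.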


Heat Maps

Best for: Visualizing complex data matrices and identifying patterns

Description: Two-dimensional matrix visualization where values are represented by colors.
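The value-to-color step can be sketched without a plotting library: rescale every cell to the range 0 to 1, then pick a shade from an ordered ramp. Here the "colors" are text characters, and `normalize`, `render`, `SHADES`, and the data are all hypothetical stand-ins.

```python
SHADES = " .:*#"  # ordered from lightest to darkest


def normalize(matrix):
    """Rescale all cells to the range 0..1 using the global min and max."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero when all cells are equal
    return [[(v - lo) / span for v in row] for row in matrix]


def render(matrix):
    """Turn each cell into a shade character, one string per row."""
    lines = []
    for row in normalize(matrix):
        chars = []
        for t in row:
            index = min(int(t * len(SHADES)), len(SHADES) - 1)
            chars.append(SHADES[index])
        lines.append("".join(chars))
    return lines


# Hypothetical visit counts: rows are days, columns are time slots.
visits = [[0, 5, 10], [20, 15, 10], [5, 0, 20]]
heat = render(visits)
for line in heat:
    print(repr(line))
```

A real heat map would swap the character ramp for a color palette, but the normalization step is the same idea.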

Characteristics:

When to Use:


Tree Maps

Best for: Displaying hierarchical data with proportional representation

Description: Rectangular visualization where each rectangle represents a category, with size proportional to a quantitative value. Rectangles can be nested for hierarchical data.
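The "size proportional to value" rule can be sketched with the simplest possible layout, which slices one rectangle into vertical strips. Production tree maps usually use more elaborate layouts (squarified ones, for instance); `slice_layout` and the disk-usage numbers are hypothetical.

```python
def slice_layout(sizes, width, height):
    """Split a width x height rectangle into strips whose areas match the sizes.

    Returns {name: (x, y, strip_width, strip_height)}.
    """
    total = sum(sizes.values())
    x = 0.0
    rects = {}
    for name, size in sizes.items():
        strip_width = width * size / total
        rects[name] = (x, 0.0, strip_width, float(height))
        x += strip_width
    return rects


# Hypothetical disk usage in GB.
disk = {"Videos": 50, "Photos": 30, "Docs": 20}
rects = slice_layout(disk, width=100, height=40)
for name, rect in rects.items():
    print(name, rect)
```

For nested data, the same function could be applied again inside each strip, using that strip as the new bounding rectangle.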

When to Use:


Radar (Spider) Charts

Best for: Comparing multiple quantitative variables across categories

Description: Plots three or more quantitative variables on axes that radiate from a common center; the values for each item are joined to form a polygon.
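The geometry can be sketched as a polar-to-Cartesian conversion: each variable gets an evenly spaced angle, and its value sets the distance from the center. This is only an illustration; `radar_points`, the variable names, and the scores are hypothetical.

```python
from math import cos, pi, sin


def radar_points(scores, max_value):
    """Return (name, x, y) for each variable, with the outer ring at radius 1."""
    n = len(scores)
    points = []
    for i, (name, value) in enumerate(scores.items()):
        angle = pi / 2 - 2 * pi * i / n  # first axis points up, then clockwise
        radius = value / max_value
        points.append((name, radius * cos(angle), radius * sin(angle)))
    return points


# Hypothetical product profile scored from 0 to 10.
profile = {"Speed": 10, "Power": 5, "Range": 10, "Cost": 5}
points = radar_points(profile, max_value=10)
for name, x, y in points:
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Joining the points in order and closing the loop gives the polygon drawn on the chart.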

When to Use:


Box and Whisker Charts (Box Plots)

Best for: Displaying data distribution and statistical summary

Description: Visual representation of statistical summary including:
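A box plot is commonly built from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes it with the standard library; the sample data and the helper name `five_number_summary` are hypothetical, and quartile conventions differ between tools, so other software may report slightly different quartiles for the same data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum, using the 'inclusive' quartile method."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical response times in seconds.
times = [2, 4, 4, 5, 7, 8, 9, 10, 12]
summary = five_number_summary(times)
iqr = summary["q3"] - summary["q1"]  # interquartile range: the height of the box
print(summary, iqr)
```

The box spans `q1` to `q3`, the line inside it marks the median, and the whiskers extend toward the minimum and maximum (many tools cap them and plot points beyond the cap as outliers).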

When to Use:


Histograms

Best for: Showing frequency distribution of continuous data

Description: Bar chart variant where bars represent ranges (bins) of data, showing how frequently values occur within each range.
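Binning can be sketched in a few lines: assign each value to a fixed-width range and count how many land in each. The function name `histogram_counts`, the bin width, and the ages are hypothetical choices for illustration.

```python
def histogram_counts(values, bin_width, start):
    """Count values per bin; each bin covers [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        lower = start + index * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages of survey respondents.
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
bins = histogram_counts(ages, bin_width=10, start=20)
for lower, count in bins.items():
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, so it is worth trying a few widths before drawing conclusions about the distribution.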

Key Concepts:

When to Use:


Pareto Charts

Best for: Identifying the most significant factors (80/20 principle)

Description: Combination of bar and line chart where bars represent individual values in descending order, and the line represents cumulative percentage.

The Pareto Principle: Roughly 80% of effects come from 20% of causes
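The two ingredients of a Pareto chart, descending bars and a cumulative-percentage line, can be sketched as a sort followed by a running total. The defect counts and the helper `pareto_table` are made up for this example.

```python
def pareto_table(counts):
    """Return (category, value, cumulative percent) rows, largest value first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    running = 0
    rows = []
    for category, value in ordered:
        running += value
        rows.append((category, value, round(100 * running / total, 1)))
    return rows


# Hypothetical defect counts from an inspection log.
defects = {
    "Dents": 30,
    "Other": 3,
    "Scratches": 45,
    "Discoloration": 7,
    "Misalignment": 15,
}
rows = pareto_table(defects)
for category, value, cumulative in rows:
    print(f"{category:<14} {value:>3} {cumulative:>6}%")
```

Reading down the cumulative column shows how few categories are needed to cover most of the total, which is the point of the chart.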

When to Use:


Waterfall Charts

Best for: Showing cumulative effect of sequential positive and negative values

Description: Shows how an initial value is affected by a series of intermediate positive or negative values, creating a “waterfall” effect.
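The floating bars can be sketched as a running total: each bar starts where the previous one ended and spans the size of the change. `waterfall_steps` and the figures below are hypothetical.

```python
def waterfall_steps(start, changes):
    """Return (label, bar_bottom, bar_height, running_total) for each change."""
    steps = []
    level = start
    for label, delta in changes:
        new_level = level + delta
        bottom = min(level, new_level)  # bars for decreases hang below the old level
        steps.append((label, bottom, abs(delta), new_level))
        level = new_level
    return steps


# Hypothetical cash flow, starting from an opening balance of 100.
changes = [("Sales", 50), ("Refunds", -20), ("Costs", -40), ("Grants", 15)]
steps = waterfall_steps(100, changes)
final_total = steps[-1][3]
for label, bottom, height, total in steps:
    print(f"{label:<8} bottom={bottom:>4} height={height:>3} total={total:>4}")
```

Charting tools typically color increases and decreases differently and add full-height bars for the opening and closing totals.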

When to Use:


Map Charts (Choropleth Maps)

Best for: Geographic data representation

Description: Color-coded maps where regions are shaded based on data values.

When to Use:


Advanced Visualization Techniques

Dual Axis Charts

Purpose: Display multiple data series with different scales on the same visualization

Types:

Use Cases:

Best Practices:

Blended Axis Charts

Difference from Dual Axis:

When to Use:


Chart Selection Guidelines

Choosing the Right Chart

Purpose                       Recommended Chart Types
Compare categories            Column, Bar
Show trends over time         Line, Area
Show composition/proportion   Pie, Donut, Stacked Column/Bar
Show distribution             Histogram, Box Plot
Show relationships            Scatter Plot
Show hierarchy                Tree Map
Show geographic data          Map Chart
Statistical summary           Box and Whisker
Multi-variable comparison     Radar Chart
Pattern identification        Heat Map
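The table above can be encoded as a simple lookup, which is handy as a checklist when reviewing a dashboard. The entries are copied from the table; the names `CHART_GUIDE` and `recommend` are hypothetical.

```python
# Purpose -> recommended chart types, taken from the table above.
CHART_GUIDE = {
    "compare categories": ["Column", "Bar"],
    "show trends over time": ["Line", "Area"],
    "show composition/proportion": ["Pie", "Donut", "Stacked Column/Bar"],
    "show distribution": ["Histogram", "Box Plot"],
    "show relationships": ["Scatter Plot"],
    "show hierarchy": ["Tree Map"],
    "show geographic data": ["Map Chart"],
    "statistical summary": ["Box and Whisker"],
    "multi-variable comparison": ["Radar Chart"],
    "pattern identification": ["Heat Map"],
}


def recommend(purpose):
    """Look up chart types for a purpose; unknown purposes return an empty list."""
    return CHART_GUIDE.get(purpose.strip().lower(), [])


print(recommend("Show distribution"))
print(recommend("Show trends over time"))
```

A lookup like this only covers the purpose; the selection criteria that follow (data type, number of variables, audience) still decide between the candidates it returns.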

Key Selection Criteria

  1. Data Type: Categorical vs. continuous vs. hierarchical
  2. Number of Variables: Single, dual, or multiple
  3. Comparison Type: Part-to-whole, time-based, or correlation
  4. Audience Needs: Precision vs. pattern recognition
  5. Message Priority: What insight should be most prominent?

Visualization Best Practices

Color Usage

Color Palettes

Color Guidelines

Clarity and Simplicity

Accurate Representation


Summary: Data Visualization Workflow

  1. Define Objectives: What questions need answering?
  2. Understand Data: What types and structures are available?
  3. Classify Fields: Identify dimensions vs. measures
  4. Select Chart Types: Match visualization to purpose
  5. Apply Design Principles: Ensure clarity and consistency
  6. Test and Refine: Validate with target audience
  7. Deploy and Monitor: Track usage and effectiveness

Key Takeaways