/
Tech-study-notes

3_advanced_data_analysis

Table Calculations vs. Database Calculations

Overview

Understanding when to perform calculations in the visualization layer versus the database layer is crucial for performance and accuracy.

Table Calculations

Common Table Calculations:

Database Calculations

When to Use Each:


Ranking Functions

Types of Rank Calculations

Standard Rank (RANK)

Dense Rank (RANK_DENSE)

Modified Rank (RANK_MODIFIED)

Unique Rank (RANK_UNIQUE)

Ranking Applications


Statistical Analysis Techniques

Reference Lines and Bands

Reference Lines

Reference Bands

Distribution Analysis

Box Plots (Box and Whisker)

Components:

Histogram Analysis

Key Insights:


Forecasting and Trend Analysis

Trend Lines

Types of Trend Models

  1. Linear: Constant rate of change

    • Formula: y = mx + b
    • Use: Steady growth or decline
  2. Exponential: Accelerating growth/decay

    • Formula: y = ae^(bx)
    • Use: Viral growth, compound interest
  3. Logarithmic: Rapid initial change, then leveling

    • Formula: y = a + bยทln(x)
    • Use: Diminishing returns
  4. Polynomial: Multiple inflection points

    • Formula: y = a + bx + cxยฒ + …
    • Use: Cyclical patterns, complex relationships

Forecasting Techniques

Time Series Forecasting

Forecast Components

Confidence Intervals


Conditional Logic and Segmentation

Calculated Fields with Logic

IF-THEN-ELSE Logic

Structure:

IF condition THEN result
ELSE IF condition THEN result
ELSE default_result
END

Applications:

CASE Statements

Structure:

CASE expression
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ELSE default_result
END

Use When: Multiple discrete conditions with specific values

Segmentation Analysis

Cohort Analysis

ABC Analysis (Pareto Segmentation)

RFM Analysis (for customer data)


Data Joins and Relationships

Types of Joins

Inner Join

Left Join (Left Outer Join)

Right Join (Right Outer Join)

Full Outer Join

Join Best Practices

Performance Considerations

Data Quality

Unions vs. Joins

Unions (Append Operations)

Key Differences

AspectJoinUnion
DirectionHorizontal (adds columns)Vertical (adds rows)
Result SizeMultiplicativeAdditive
MatchingRequires common keysRequires same structure
Use CaseRelate different entitiesCombine similar entities

Geographic Analysis

Geographic Data Types

Recognized Geographic Fields

Geographic Hierarchy

Country
  โ””โ”€โ”€ State/Province
       โ””โ”€โ”€ City
            โ””โ”€โ”€ Postal Code

Map Visualization Types

Filled Maps (Choropleth)

Symbol Maps (Proportional Symbol)

Heat Maps (Density)

Flow Maps

Spatial Analysis Techniques

Distance Calculations

Territory Analysis


Time Intelligence

Time-Based Calculations

Period-to-Date Calculations

Period Comparisons

Moving Calculations

Date Hierarchy and Drill-Down

Standard Hierarchy

Year
  โ””โ”€โ”€ Quarter
       โ””โ”€โ”€ Month
            โ””โ”€โ”€ Week
                 โ””โ”€โ”€ Day

Fiscal Calendar


Advanced Segmentation

Sets and Groups

Static Sets

Dynamic Sets (Conditional Sets)

Groups

Parameters

Types of Parameters

  1. Numeric Parameters

    • Integer or decimal values
    • Range controls (min, max, step)
    • Example: Top N selector (5, 10, 15, 20)
  2. String Parameters

    • Free text or predefined list
    • Example: Select metric (Revenue, Profit, Quantity)
  3. Date Parameters

    • Single dates or date ranges
    • Dynamic date options (Today, Yesterday, etc.)
  4. Boolean Parameters

    • True/False switches
    • Example: Show details Yes/No

Parameter Applications


Annotations and Context

Types of Annotations

Mark Annotations

Point Annotations

Area Annotations

Best Practices for Annotations


Data Storytelling

Narrative Structure

The Data Story Arc

  1. Setup: Context and baseline
  2. Conflict: Problem or question
  3. Rising Action: Analysis and exploration
  4. Climax: Key insight or discovery
  5. Falling Action: Implications and recommendations
  6. Resolution: Action items or conclusions

Story Points

Sequential Storytelling

Interactive Storytelling

Effective Data Communication

Key Principles

  1. Start with Questions: What does the audience need to know?
  2. Show, Don’t Tell: Let visualizations speak
  3. Highlight Insights: Direct attention to findings
  4. Provide Context: Baseline comparisons matter
  5. Recommend Actions: Data without action is just numbers

Common Pitfalls to Avoid


Performance and Optimization

Query Performance

Strategies for Large Datasets

  1. Extract vs. Live Connections

    • Extracts: Pre-aggregated, faster performance
    • Live: Real-time data, always current
    • Choose based on data size and freshness requirements
  2. Data Source Filters

    • Filter at source before loading
    • Reduces volume transferred
    • Improves processing speed
  3. Aggregation Strategy

    • Pre-aggregate when possible
    • Use appropriate granularity
    • Avoid unnecessary detail levels

Dashboard Performance

Loading Optimization

Interactive Performance


Key Takeaways

Calculations and Analysis

Data Relationships

Geographic and Time Analysis

Advanced Techniques

Performance