Module 2: Data Selection | MobiDaS Tutorial

1

Quick Intro

~2 min

WHAT

Data selection means picking the small number of well-chosen variables that directly support your core question, and rejecting the rest even when they are available.

WHY

Decision-makers have 5 minutes. Every extra variable steals attention from the answer. Bad data choice can't be rescued by good visualization downstream.

HOW IT FITS

This is pillar 1 of four (data + visualization + narration + interaction). It determines what your chart in Module 3 can actually show.

EXAMPLE

For "Should we expand cycling lanes?": pick cycling trips per district + collision counts. Reject GDP, even though it correlates with everything.

1

Essential vs. noise

30 sec

Essential data = directly answers your question. Noise = interesting but irrelevant.

Essential

Public transport (PT) ridership 2019-2023 (answers "did ridership recover?")

Noise

Average ticket price by region (interesting, but not the question)

Every dataset must pass the test: "Does this help answer my question?"

2

The 2-layer filter

30 sec

Layer 1: Is it accessible and traceable? Layer 2: Does it answer your question?

Layer 1: Quick Filter

✓ Can I download it?
✓ Is the source named (Destatis, KBA)?
✓ Is it clean/structured?

Layer 2: Story Fit

✓ Does it answer my question?
✓ Does it include crisis data?
✓ Does it have a turning point?

Only data that passes BOTH layers goes into your story.
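The two layers can be sketched as a pair of boolean checks. This is a minimal Python illustration, not part of the MobiDaS materials: the `Dataset` class, its field names, and the flag values given to the two sample entries are hypothetical, chosen only to mirror the checklist questions above.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # All field names are hypothetical; each one mirrors a checklist question.
    name: str
    downloadable: bool
    source_named: bool
    structured: bool
    answers_question: bool
    includes_crisis_data: bool
    has_turning_point: bool


def passes_layer1(d: Dataset) -> bool:
    """Layer 1 (Quick Filter): accessible and traceable."""
    return d.downloadable and d.source_named and d.structured


def passes_layer2(d: Dataset) -> bool:
    """Layer 2 (Story Fit): answers the question, with crisis data and a turning point."""
    return d.answers_question and d.includes_crisis_data and d.has_turning_point


def select_for_story(candidates: list[Dataset]) -> list[str]:
    """Keep only the datasets that pass BOTH layers."""
    return [d.name for d in candidates if passes_layer1(d) and passes_layer2(d)]


# Illustrative flag values (assumed, not measured).
candidates = [
    Dataset("PT ridership 2019-2023", True, True, True, True, True, True),
    Dataset("Average ticket price by region", True, True, True, False, False, False),
]
selected = select_for_story(candidates)
print(selected)
```

Keeping the layers as separate functions makes it visible which layer a rejected dataset failed: the ticket price entry is accessible but does not fit the story.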

3

Spot cherry-picking

30 sec

Cherry-picking = showing only data that supports your conclusion. Always include uncomfortable data.

Cherry-picked

"Ridership grew 15% from 2021 to 2023" (ignores 2020 crash)

Complete

"Ridership crashed 56% in 2020, then recovered to 96% of 2019 by 2023"

MobiDaS R2 (Integrity): Include the 2020 pandemic drop. Always.
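To see why the start year matters, both statements can be reproduced from one series. This is a minimal Python sketch, assuming ridership is indexed to 2019 = 100. The 2020 and 2023 values follow from the "Complete" statement, and the 2021 value is back-calculated from the 15% growth claim, so every number here is illustrative rather than an official statistic.

```python
# Ridership indexed to 2019 = 100. Values are derived from the example
# statements above for illustration only; they are not official statistics.
ridership_index = {
    2019: 100.0,
    2020: 44.0,         # "crashed 56% in 2020"
    2021: 96.0 / 1.15,  # back-calculated so that 2021 -> 2023 is +15%
    2023: 96.0,         # "recovered to 96% of 2019"
}


def percent_change(series: dict[int, float], start: int, end: int) -> float:
    """Percent change from the start year to the end year."""
    return (series[end] - series[start]) / series[start] * 100


cherry_picked = percent_change(ridership_index, 2021, 2023)
complete = percent_change(ridership_index, 2019, 2023)

print(f"2021 -> 2023: {cherry_picked:+.1f}%")  # baseline chosen after the crash
print(f"2019 -> 2023: {complete:+.1f}%")       # baseline before the crash
```

The same end value reads as growth or as a shortfall depending on the baseline, which is why the pre-crash year and the crash itself belong in the series.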


2

Guided Practice

~3 min

1

Choose the right dataset

1 min

Practice applying the 2-layer filter.

❓

Question: "Should the city invest in suburban rail to recover lost ridership?"

Which dataset is essential?

A: Ticket revenue 2019-2023

Source: Transport authority annual reports

B: Suburban rail ridership 2019-2023

Source: Destatis/VDV

C: Car ownership rates by district

Source: KBA vehicle registry

2

Check for integrity

1 min

Good data includes uncomfortable truths. Check for gaps.

📊

Dataset: PT ridership 2015-2023

Years available: 2015, 2016, 2017, 2018, 2019, 2021, 2022, 2023

What year is missing?

Missing year:

Why it matters: Omitting 2020 hides the pandemic crash, which is cherry-picking.
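The gap check above can also be automated. This is a minimal Python sketch; the function name `missing_years` is a hypothetical helper, and it assumes the dataset should contain every year between its first and last entry.

```python
def missing_years(available: list[int]) -> list[int]:
    """Return every year between the first and last available year that is absent."""
    present = set(available)
    return [y for y in range(min(available), max(available) + 1) if y not in present]


years_available = [2015, 2016, 2017, 2018, 2019, 2021, 2022, 2023]
gaps = missing_years(years_available)
print(gaps)
```

Running a check like this before charting catches a dropped year that is easy to miss when scanning a table by eye.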

3

Write a source citation

30 sec

MobiDaS R3 (Transparency): Every data point needs a traceable source.

Citation format

[Metric]: [Value] ([Source], [Year])

Example

Suburban rail ridership: 1.2B journeys (Destatis, 2023)

Write a citation for this fact: "EV registrations reached 500,000 in 2023"
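The citation format above can be expressed as a small formatting helper. This is a minimal Python sketch; `format_citation` is a hypothetical function name, and the values passed in are taken from the example in this step.

```python
def format_citation(metric: str, value: str, source: str, year: int) -> str:
    """Build a citation in the form [Metric]: [Value] ([Source], [Year])."""
    for label, field in (("metric", metric), ("value", value), ("source", source)):
        # A citation with a blank part is not traceable, so reject it early.
        if not field.strip():
            raise ValueError(f"{label} must not be empty")
    return f"{metric}: {value} ({source}, {year})"


citation = format_citation("Suburban rail ridership", "1.2B journeys", "Destatis", 2023)
print(citation)
```

Rejecting an empty source at this point enforces the transparency rule before the number ever reaches a chart.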

3

Applied Task

~3 min

1

Identify your data needs

1 min

Based on your Module 1 (M1) question, what data would answer it?

Your question

(Your M1 question will appear here)

What metric would directly answer this question?

2

Name your source

1 min

German mobility data sources: Destatis, KBA, UBA, VDV, Mobilithek.

Select the most likely source for your data:

Destatis: Official statistics (ridership, modal split)
KBA: Vehicle registrations (cars, EVs)
UBA: Environmental data (emissions, air quality)
VDV: Public transport statistics

3

Write your data selection

1 min

Combine: metric + time range + source + why it answers your question.

Data selection template

I will use [metric] from [source] ([years]) because it shows [how it answers question].
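The template can be filled programmatically, with the time range checked against the module's integrity rule. This is a minimal Python sketch; `write_data_selection` and its parameters are hypothetical names, the sample values are borrowed from the guided practice for illustration, and the 2020 check is an assumption based on the rule "Include the 2020 pandemic drop."

```python
TEMPLATE = "I will use {metric} from {source} ({first_year}-{last_year}) because it shows {reason}."


def write_data_selection(metric: str, source: str, first_year: int,
                         last_year: int, reason: str) -> str:
    """Fill the data selection template: metric + time range + source + reason."""
    if not first_year <= 2020 <= last_year:
        # Assumed check: the time range must cover the 2020 pandemic drop.
        raise ValueError("time range must include 2020")
    return TEMPLATE.format(metric=metric, source=source, first_year=first_year,
                           last_year=last_year, reason=reason)


statement = write_data_selection(
    metric="suburban rail ridership",
    source="Destatis/VDV",
    first_year=2019,
    last_year=2023,
    reason="whether ridership has recovered since the 2020 drop",
)
print(statement)
```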

Write your data selection:


4

Quick Refinement

~2 min

1

Check your selection

30 sec

Your data selection must pass the 2-layer filter.

Layer 1: Accessibility

✓ Can be downloaded (CSV, Excel)
✓ Source is named (official statistics)

Layer 2: Story Fit

✓ Directly answers my question
✓ Includes 2020 crisis data

2

Add integrity note

30 sec

Note any limitations or uncomfortable data points you'll include.

What uncomfortable truth will you include? (e.g., pandemic drop)

3

Commit your data selection

30 sec

Your data selection will guide your visualization in Module 3.

Final data selection:
