
Long (“thin”) format data

It is what it says: a dataset that is long and thin!

Details #

So what? Well, often in our fields we have repeated, nested data, e.g. multiple completions of measures by the same person over time. You can arrange this in some database or statistical computer package like this:

| ID | Item01_1 | Item02_1 | Item03_1 | Item04_1 | Item05_1 | Item01_2 | Item02_2 | Item03_2 | Item04_2 | Item05_2 |
|----|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| 1  | 0.60     | -1.70    | -1.30    | 2.20     | 0.10     | 0.70     | -0.90    | 0.30     | -3.40    | -1.10    |

That’s “wide” format data: the repeated occasions are indicated by the “_1” or “_2” at the end of the variable (column) names. However, you could also arrange the same data in a “longer” format, like this.

| ID | occasion | Item01 | Item02 | Item03 | Item04 | Item05 |
|----|----------|--------|--------|--------|--------|--------|
| 1  | 1        | 0.60   | -1.70  | -1.30  | 2.20   | 0.10   |
| 1  | 2        | 0.70   | -0.90  | 0.30   | -3.40  | -1.10  |
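For anyone who wants to see that move from wide to the longer layout spelled out, here is a minimal sketch in plain Python. It assumes the wide column names always follow the `ItemNN_occasion` pattern used in the example; the function name `wide_to_mixed` is just an illustrative choice, not something from any particular package.

```python
import re

# One wide-format row: the occasion is encoded as the "_1" / "_2" suffix
# of each item's column name.
wide_row = {
    "ID": 1,
    "Item01_1": 0.60, "Item02_1": -1.70, "Item03_1": -1.30,
    "Item04_1": 2.20, "Item05_1": 0.10,
    "Item01_2": 0.70, "Item02_2": -0.90, "Item03_2": 0.30,
    "Item04_2": -3.40, "Item05_2": -1.10,
}


def wide_to_mixed(row):
    """Return one dict per occasion, keeping each item as its own column."""
    by_occasion = {}
    for name, value in row.items():
        match = re.fullmatch(r"(Item\d+)_(\d+)", name)
        if match is None:
            continue  # not an item column (here, only "ID")
        item = match.group(1)
        occasion = int(match.group(2))
        # Create the row for this occasion the first time we meet it.
        new_row = by_occasion.setdefault(
            occasion, {"ID": row["ID"], "occasion": occasion}
        )
        new_row[item] = value
    return [by_occasion[occasion] for occasion in sorted(by_occasion)]


mixed_rows = wide_to_mixed(wide_row)
for mixed_row in mixed_rows:
    print(mixed_row)
```

The occasion number moves out of the column names and into a variable of its own, so a third or fourth occasion would simply become another row.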

This mixed layout is how I generally handle questionnaire data: the occasion has its own variable and each of the items also has its own variable, giving a mix of long and wide. You could also format it as truly long, what I think of as “ultra long”, like this.

| ID | occasion | item   | score |
|----|----------|--------|-------|
| 1  | 1        | Item01 | 0.60  |
| 1  | 1        | Item02 | -1.70 |
| 1  | 1        | Item03 | -1.30 |
| 1  | 1        | Item04 | 2.20  |
| 1  | 1        | Item05 | 0.10  |
| 1  | 2        | Item01 | 0.70  |
| 1  | 2        | Item02 | -0.90 |
| 1  | 2        | Item03 | 0.30  |
| 1  | 2        | Item04 | -3.40 |
| 1  | 2        | Item05 | -1.10 |
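Going from the mixed layout to this “ultra long” one can be sketched in a few lines of plain Python. This assumes that every column other than `ID` and `occasion` is an item; `ID_COLUMNS` and `mixed_to_long` are hypothetical names chosen for the illustration.

```python
# The mixed layout: one row per person per occasion, one column per item.
mixed_rows = [
    {"ID": 1, "occasion": 1, "Item01": 0.60, "Item02": -1.70,
     "Item03": -1.30, "Item04": 2.20, "Item05": 0.10},
    {"ID": 1, "occasion": 2, "Item01": 0.70, "Item02": -0.90,
     "Item03": 0.30, "Item04": -3.40, "Item05": -1.10},
]

# Columns that identify a row; everything else is treated as an item score.
ID_COLUMNS = ("ID", "occasion")


def mixed_to_long(rows):
    """Return one dict per person, occasion and item."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name in ID_COLUMNS:
                continue
            long_rows.append({
                "ID": row["ID"],
                "occasion": row["occasion"],
                "item": name,
                "score": value,
            })
    return long_rows


long_rows = mixed_to_long(mixed_rows)
for long_row in long_rows:
    print(long_row)
```

Two rows of five items become ten rows with a single score each, and the column names never change however many items or occasions there are.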

All three layouts hold exactly the same data, but the wide (“fat”) format is generally a bad way to handle things. Either it fixes the maximum number of occasions (in this little example there cannot be more than two) or else you often end up with a very wide layout with lots of empty cells to the right.

The mixed format is good where you might be doing a lot of analyses across the items, say internal reliability or other psychometric work. However, the truly long format is generally how database programmers will format things, as it makes it possible to do any analyses you want, and they (and increasingly I too!) are very comfortable swinging data around from one format to another. Some database software will organise things in that truly long format internally even if it doesn’t show it that way.
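As a small illustration of that swinging around, this sketch starts from the truly long data, summarises it by occasion, and pivots it back to the mixed layout. It uses only the Python standard library; the tuple layout and the function names are assumptions made for the example, not how any particular database or package does it.

```python
from collections import defaultdict
from statistics import mean

# Truly long data as (ID, occasion, item, score) tuples.
long_rows = [
    (1, 1, "Item01", 0.60), (1, 1, "Item02", -1.70), (1, 1, "Item03", -1.30),
    (1, 1, "Item04", 2.20), (1, 1, "Item05", 0.10),
    (1, 2, "Item01", 0.70), (1, 2, "Item02", -0.90), (1, 2, "Item03", 0.30),
    (1, 2, "Item04", -3.40), (1, 2, "Item05", -1.10),
]


def mean_by_occasion(rows):
    """Mean item score for each (ID, occasion) pair."""
    scores = defaultdict(list)
    for person_id, occasion, _item, score in rows:
        scores[(person_id, occasion)].append(score)
    return {key: mean(values) for key, values in scores.items()}


def long_to_mixed(rows):
    """Pivot back to one row per (ID, occasion) with one column per item."""
    grouped = defaultdict(dict)
    for person_id, occasion, item, score in rows:
        grouped[(person_id, occasion)][item] = score
    return [
        {"ID": person_id, "occasion": occasion, **items}
        for (person_id, occasion), items in sorted(grouped.items())
    ]


occasion_means = mean_by_occasion(long_rows)
mixed_rows = long_to_mixed(long_rows)
print(occasion_means)
print(mixed_rows)
```

Both functions group on the same (ID, occasion) key, which is why the long layout makes it easy to regroup by person, by occasion or by item without touching the column structure.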

Try also #

Chapters #

Not covered in the OMbook.

Online resources #

None specifically, though pretty much all my Shiny apps that handle data will use mixed or truly long format data.

Dates #

First created 22.v.25.
