Dataset and database

See sample to read why I think we should be using “dataset” instead of “sample”.

A dataset is what it says: any set of data. The term tends to be used for “rectangular data”, i.e. data in which every row has the same number of columns and where the columns are usually “variables”. However, it can also be used for “ragged data”, where there is no neat rectangular shape.
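To make the rectangular versus ragged distinction concrete, here is a minimal sketch in plain Python. The client labels, variable names and scores are all invented for illustration, and `is_rectangular` is just a hypothetical helper, not something from any particular package.

```python
# Invented illustration data: not real clients or real scores.

# Rectangular: every row has the same columns (variables).
rectangular = [
    {"client": "A", "age": 34, "score": 1.8},
    {"client": "B", "age": 51, "score": 2.3},
    {"client": "C", "age": 27, "score": 0.9},
]

# Ragged: each client has a different number of session scores,
# so there is no neat rectangle without padding.
ragged = {
    "A": [1.8, 1.5, 1.1],
    "B": [2.3],
    "C": [0.9, 1.0],
}


def is_rectangular(rows):
    """Return True if every row has exactly the same set of columns."""
    if not rows:
        return True
    columns = set(rows[0])
    return all(set(row) == columns for row in rows)


print(is_rectangular(rectangular))

# Number of scores per client in the ragged data.
lengths = {client: len(scores) for client, scores in ragged.items()}
print(lengths)
```

The ragged version could be forced into a rectangle by padding the shorter lists with missing values, but that is a choice about storage rather than a property of the data.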

“Database” is generally used for more complex structures of data with multiple rectangular sets of data (here usually called “tables” or “data tables”).
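As a rough sketch of what “multiple rectangular sets of data” can look like, the following uses Python's built-in `sqlite3` module to create a tiny in-memory database with two linked tables. The table names, column names and values are assumptions made up for this example; a real service database would be far larger and differently designed.

```python
import sqlite3

# In-memory database: nothing is written to disk.
con = sqlite3.connect(":memory:")

# Two rectangular tables, linked by client_id (all names hypothetical).
con.executescript(
    """
    CREATE TABLE clients (
        client_id TEXT PRIMARY KEY,
        age       INTEGER
    );
    CREATE TABLE sessions (
        client_id  TEXT REFERENCES clients(client_id),
        session_no INTEGER,
        score      REAL
    );
    """
)

# Invented illustration data.
con.executemany("INSERT INTO clients VALUES (?, ?)", [("A", 34), ("B", 51)])
con.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("A", 1, 1.8), ("A", 2, 1.5), ("B", 1, 2.3)],
)

# Combine the two tables: one row per client with a count of sessions.
rows = con.execute(
    """
    SELECT c.client_id, c.age, COUNT(s.session_no) AS n_sessions
    FROM clients AS c
    LEFT JOIN sessions AS s ON s.client_id = c.client_id
    GROUP BY c.client_id, c.age
    ORDER BY c.client_id
    """
).fetchall()
print(rows)

con.close()
```

Each table is rectangular on its own; it is the link between them (here `client_id`) that makes this a database rather than a single dataset.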

As our routine change data are rarely nice samples from defined populations, obtained through clear sampling frames, it can be good discipline to refer to them as datasets rather than as samples.

Details

In the simplest dataset the rows are typically individuals, each row holding that individual’s values on the variables in the columns. Alternatively the rows might be occasions, so that any individual can have multiple rows, one per occasion. That format is generally called “long format”, in contrast with “wide format”, where typically the different occasions’ data are in separate columns. The same distinction between wide and long format is used for tables within database structures, and long format data is generally more efficient in terms of storage requirements and processing speed, particularly when individuals differ in how many occasions they have.

Through the 21st Century, therapy research and routine data collection have moved from thinking in terms of rectangular data to complexly structured databases, which generally reflect the realities of therapy delivery much more accurately and appropriately. However, getting our heads around complex databases and data structures remains a challenge for most of us!
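For anyone who finds the long versus wide distinction easier to see than to read, here is a small sketch in plain Python that reshapes long format rows into wide format. The data are invented, and the function name `long_to_wide` and the `score_1`, `score_2` … column naming are assumptions for this illustration only.

```python
# Invented long format data: one row per client per occasion.
long_rows = [
    {"client": "A", "occasion": 1, "score": 1.8},
    {"client": "A", "occasion": 2, "score": 1.5},
    {"client": "A", "occasion": 3, "score": 1.1},
    {"client": "B", "occasion": 1, "score": 2.3},
]


def long_to_wide(rows):
    """Reshape long rows into one row per client, one column per occasion."""
    occasions = sorted({row["occasion"] for row in rows})
    wide = {}
    for row in rows:
        # Start each client's wide row with an empty (None) cell per occasion.
        record = wide.setdefault(
            row["client"],
            {"client": row["client"], **{f"score_{o}": None for o in occasions}},
        )
        record[f"score_{row['occasion']}"] = row["score"]
    return list(wide.values())


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Client B attended only one occasion, so the wide row for B has to carry empty cells for occasions 2 and 3, whereas the long format simply has no rows for them. That is one reason long format often copes better with uneven numbers of occasions.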

Try also …

RDBMS: Relational Database Management System

Chapters

Chapter 8: the service example assumes, as would be the case for such UK services, that service data is being stored in a database, whereas the data in the examples for individual practitioners in Chapter 7 could be stored in simple rectangular datasets.

Dates

Updated 10.iv.24.
