What Makes Data Information

To paraphrase Peter Drucker, a database is not information, but the raw material from which information can be drawn. He says that to become information, data must be organized for a task, directed toward specific performance and applied to a decision. Let’s look at each of these criteria in greater detail.

Organized for a Task

For the opposite of this criterion, look up one of the “spurious correlations” websites, where you can compare the rates of specific causes of death to economic statistics, environmental variables and other unrelated categories. One lesson from such sites is that correlation does not imply causation: you can sometimes get an amazingly high R-squared for trends that have nothing to do with each other at all. And while you can generate charts and graphs from these variables, the underlying databases aren’t organized so that you can compare all the unusual ways someone can die, as an epidemiologist or a Darwin Awards committee member would find useful.
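The high R-squared effect is easy to reproduce. The sketch below is only an illustration: the two series are invented numbers (not real statistics) that both happen to rise over time, and the helper name `pearson_r` is hypothetical. For a simple two-variable comparison, R-squared is the square of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented, illustrative numbers: two unrelated quantities that both trend upward.
ice_cream_sales = [102, 110, 118, 131, 137, 149, 160, 168]
software_bugs_filed = [40, 44, 47, 53, 55, 61, 64, 69]

r = pearson_r(ice_cream_sales, software_bugs_filed)
r_squared = r ** 2
print(f"R-squared: {r_squared:.3f}")
```

Both series simply grow over time, so the fit looks excellent even though neither quantity has anything to do with the other.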
In the real world, we organize the collection of data so that it is more readily turned into useful information. Time sheets, for example, include the person’s name, employee number, charge numbers and time worked for each task. The data is organized so that it can be correlated and combined for specific goals like timekeeping, paycheck generation and activity-based costing.
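As a minimal sketch of what "organized for a task" can look like, the time sheet fields named above can be modeled as one record per task. The class name, field names, sample values and the `total_hours_by` helper are all assumptions made for illustration, not a prescribed schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimesheetEntry:
    # Hypothetical field names mirroring the time sheet described in the text.
    employee_name: str
    employee_number: str
    charge_number: str
    hours: float


# Invented sample entries.
entries = [
    TimesheetEntry("A. Rivera", "E100", "CHG-01", 6.0),
    TimesheetEntry("A. Rivera", "E100", "CHG-02", 2.0),
    TimesheetEntry("B. Chen", "E200", "CHG-01", 8.0),
]


def total_hours_by(entries, key):
    """Sum hours grouped by the named field, e.g. 'employee_number'."""
    totals = defaultdict(float)
    for entry in entries:
        totals[getattr(entry, key)] += entry.hours
    return dict(totals)


# The same records serve timekeeping/payroll (per employee) and costing (per charge number).
hours_per_employee = total_hours_by(entries, "employee_number")
hours_per_charge = total_hours_by(entries, "charge_number")
print(hours_per_employee)
print(hours_per_charge)
```

Because every entry carries both an employee number and a charge number, the same data can be combined one way for paychecks and another way for costing without being collected twice.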

Directed Toward Specific Performance

You can collect data on how the plants in your yard grow, the number of blue cars you see on a road trip or the number of times someone says a particular word in a speech. However, data only becomes information when it is directed toward a particular task.
You collect data on time worked, charge numbers and employee identifiers and compare it to Human Resources data on pay rates and income tax brackets to generate correct paychecks.
Part defect data, such as the part number, the location and type of defect, and sometimes the conditions in which the defect occurred, is collected with the intent of tracking or improving product quality.
Purchasing data, such as the item ordered, the amount paid and the supplier, is collected so that the company can track expenses.
On a simpler scale, part databases track the dimensions and other specifications of a part, often along with the suppliers who provide the product to the company and the material specifications, so that people can figure out what they have and where it can be used.

Applied to a Decision

The reason to collect data is to aid in making a decision. Tracking time card data and production information leads to decisions on reassigning personnel to eliminate overtime and increase production in understaffed areas. Tracking part defects ideally leads to identification of the root causes of those defects and their elimination. Recording information on purchases and the costs involved should result in better planning of purchases so that the company never runs short and lot size is optimized to minimize cost per item. Tracking purchases and costs of all types allows you to set a realistic budget for next year and identify areas where spending might be trimmed or constrained so the company doesn’t end up consuming its profit margin.
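To make the defect example concrete, here is a small sketch of turning defect records into an input for a decision: tallying defects by type to see where a root-cause investigation might start. The record fields, part numbers and defect types are invented for illustration.

```python
from collections import Counter

# Invented defect records; field names are assumptions, not a standard schema.
defects = [
    {"part_number": "P-1001", "location": "Line 2", "defect_type": "scratch"},
    {"part_number": "P-1001", "location": "Line 2", "defect_type": "scratch"},
    {"part_number": "P-2040", "location": "Line 1", "defect_type": "misaligned hole"},
    {"part_number": "P-1001", "location": "Line 3", "defect_type": "scratch"},
    {"part_number": "P-3300", "location": "Line 1", "defect_type": "burr"},
]

# Count how often each defect type occurs.
defect_counts = Counter(d["defect_type"] for d in defects)

# The most frequent type is a candidate for the first root-cause investigation.
most_common_type, count = defect_counts.most_common(1)[0]
print(f"Investigate first: {most_common_type} ({count} occurrences)")
```

The tally itself is not the decision; it only narrows down where to look, which is the point of collecting the defect data in the first place.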

Takeaways from Drucker’s Definition of Information

If someone says, “Let’s track X!”, immediately ask how that data affects existing decisions or what decision they intend to make based on the information. Data collected for no reason is a type of waste.
Extra rows on the spreadsheet and extra tables in the database turn the analysis into a far more complicated version of the math word problem padded with irrelevant information. And collecting data in the hope that it will someday be useful wastes people’s time when they try to figure out how these variables affect other variables while making a decision.
Design your data collection to make it as easy as possible to collect only the information you need, and include all the data necessary for it to become information.
Know what information you are gathering and why so that you can tailor your reports to only include the information you need to make the decision. This will speed up decision making and prevent information overload.