Understand what counts as data, why organisations collect it, and how structured versus unstructured formats influence analysis.
Data are recorded facts, measurements, or observations collected to answer a question. They can be numbers, text, audio, video, or any other digital artefact.
People, organisations, experiments, and devices all generate data: exam scores, website clicks, lab sensor readings, patient vitals.
Recording why data were collected is as important as the raw values. Context prevents misinterpretation.
| Source type | Description | Examples | Advantages | Challenges |
|---|---|---|---|---|
| Primary (collected) | Data gathered firsthand for a specific purpose. | Surveys, experiments, observations. | High control over quality and definitions. | Expensive and time-consuming. |
| Secondary (published) | Data compiled by someone else and shared. | Government portals, research articles, reports. | Fast access, often large coverage. | Definitions may differ; documentation varies. |
Structured data are organised into tables with rows and columns, which makes them easy to search and analyse using SQL, spreadsheets, and statistical software.
Example: admissions dataset with columns for student ID, programme, entrance score, and admission status.
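To make the admissions example concrete, here is a minimal sketch of how a table with those four columns could be queried with SQL from Python. The rows, the table name `admissions`, the column names, and the helper `average_score_by_programme` are all invented for illustration; they are not taken from a real dataset.

```python
import sqlite3

# Hypothetical admissions records, invented for illustration:
# (student_id, programme, entrance_score, admission_status)
ROWS = [
    ("S001", "Data Science", 82.5, "admitted"),
    ("S002", "Data Science", 64.0, "rejected"),
    ("S003", "Statistics", 91.0, "admitted"),
    ("S004", "Statistics", 77.5, "waitlisted"),
]


def average_score_by_programme(rows):
    """Load rows into an in-memory SQLite table and return the mean score per programme."""
    conn = sqlite3.connect(":memory:")
    try:
        # Every row has the same named, typed columns: this is what "structured" means.
        conn.execute(
            "CREATE TABLE admissions ("
            "student_id TEXT PRIMARY KEY, programme TEXT, "
            "entrance_score REAL, admission_status TEXT)"
        )
        conn.executemany("INSERT INTO admissions VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT programme, AVG(entrance_score) FROM admissions "
            "GROUP BY programme ORDER BY programme"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


averages = average_score_by_programme(ROWS)
print(averages)
```

Because every record shares the same columns, a single `GROUP BY` query is enough to summarise the table, with no preprocessing step.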
Unstructured data are free-form content such as text, images, audio, and video. They require preprocessing or feature extraction before analysis.
Example: support ticket descriptions or customer feedback voice recordings.
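As a rough illustration of feature extraction, the sketch below turns one support ticket description into word counts, one of the simplest ways to give free text a tabular shape. The ticket text, the short stop-word list, and the helper `word_counts` are assumptions made for this example; real projects usually rely on a fuller stop-word list and a dedicated text-processing library.

```python
import re
from collections import Counter

# Hypothetical support ticket text, invented for illustration.
TICKET = "Login fails after password reset. Password reset email never arrives."

# A tiny, assumed stop-word list; it is deliberately incomplete.
STOP_WORDS = {"after", "never", "the", "a", "an"}


def word_counts(text):
    """Lowercase the text, split it into alphabetic tokens, and count the non-stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)


features = word_counts(TICKET)
print(features.most_common(3))
```

Each distinct word becomes a column and each count a value, so the ticket can then sit alongside other tickets in an ordinary table.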
Many modern projects combine both types: an e-commerce platform may have structured transaction tables plus unstructured product reviews.
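The e-commerce case can be sketched in a few lines: structured transactions are summed per product, while unstructured reviews are first reduced to a simple feature (whether a keyword appears) and then joined on the product ID. All records, the keyword `"great"`, and the helper `summarise` are hypothetical and chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical structured transactions: (order_id, product_id, amount)
TRANSACTIONS = [
    (1, "P10", 20.0),
    (2, "P10", 20.0),
    (3, "P22", 35.0),
]

# Hypothetical unstructured reviews: (product_id, free text)
REVIEWS = [
    ("P10", "Great value, arrived quickly."),
    ("P22", "Stopped working after a week. Disappointed."),
    ("P10", "Great quality."),
]


def summarise(transactions, reviews, keyword="great"):
    """Return revenue and the number of reviews mentioning `keyword`, per product."""
    revenue = defaultdict(float)
    for _order_id, product_id, amount in transactions:
        revenue[product_id] += amount

    # Reduce each free-text review to one feature: does it mention the keyword?
    mentions = defaultdict(int)
    for product_id, text in reviews:
        if keyword in text.lower():
            mentions[product_id] += 1

    # Join the two sources on product_id; products with no matching review get 0.
    return {
        product_id: {
            "revenue": revenue[product_id],
            "keyword_reviews": mentions[product_id],
        }
        for product_id in sorted(revenue)
    }


summary = summarise(TRANSACTIONS, REVIEWS)
print(summary)
```

A keyword match is a crude stand-in for real text analysis, but it shows the pattern: unstructured content has to be converted into features before it can be combined with structured tables.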