Statistics is the art and science of learning from data: we collect, summarise, analyse, interpret and finally communicate findings so that decisions can be made even when uncertainty is present.
Rigorous probability theory provides the science. Choosing the right model, summarising the story concisely, and convincing stakeholders require judgement — the art.
In its earliest form, statistics meant "state arithmetic": governments kept careful tallies of births, deaths, and harvests.
In the early twentieth century, Fisher, Pearson, and Gosset formalised experimental design, sampling distributions, and hypothesis tests; statistics became a toolkit for inference.
Later, computers enabled complex models, resampling, and simulation; data volumes grew, and so did the demand for automation.
Today, statistics underpins machine learning workflows: data wrangling, exploratory analysis, feature engineering, and uncertainty quantification.
Keep this historical arc in mind: modern buzzwords still rest on the same principles of careful data collection and valid inference.
Descriptive statistics uses tables, charts, and summary numbers to describe the data we have collected.
Example: summarising daily COVID-19 cases in Chennai during July.
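As a minimal sketch of descriptive summaries, the snippet below condenses a week of daily case counts into a handful of summary numbers using only the standard library. The figures are made up for illustration, not real Chennai data.

```python
import statistics

# Hypothetical daily case counts for one week (illustrative numbers only)
daily_cases = [412, 389, 455, 430, 401, 478, 420]

# A compact descriptive summary: centre, spread, and range
summary = {
    "count": len(daily_cases),
    "mean": statistics.mean(daily_cases),
    "median": statistics.median(daily_cases),
    "stdev": statistics.stdev(daily_cases),   # sample standard deviation
    "min": min(daily_cases),
    "max": max(daily_cases),
}
print(summary)
```

Reporting the median alongside the mean is a deliberate choice: daily counts are often skewed, and the median is robust to occasional spikes.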
Inferential statistics draws conclusions about a wider population by analysing a sample.
Example: using a sample of customer ratings to estimate the true satisfaction level of all customers.
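The customer-ratings example can be sketched as a simple interval estimate. The ratings below are invented for illustration; the code computes a normal-approximation 95% confidence interval for the population mean, which quantifies the uncertainty in generalising from the sample.

```python
import math
import statistics

# Hypothetical sample of customer ratings on a 1-5 scale (illustrative data)
ratings = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5, 4, 4]

n = len(ratings)
sample_mean = statistics.mean(ratings)
sample_sd = statistics.stdev(ratings)

# Normal-approximation 95% confidence interval for the true mean rating
# (z = 1.96; a t-based interval would be slightly wider at n = 12)
margin = 1.96 * sample_sd / math.sqrt(n)
low, high = sample_mean - margin, sample_mean + margin
print(f"mean = {sample_mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval, not the point estimate alone, is the inferential statement: it says how far the true satisfaction level could plausibly lie from the sample mean.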
Neither branch stands alone. In practice we start with descriptive statistics to understand the sample, then deploy inferential methods to generalise responsibly.
Within data science pipelines, statistics keeps us honest about what the data can and cannot support.
If you are ever unsure which algorithm to run next, pause and revisit the statistical questions: What are we measuring? Whom does the sample represent? What uncertainty matters?