I started doing Data Science in Excel, and honestly, I liked it (at first). Constructing a complex nested formula felt like solving a competitive programming puzzle. You get instant feedback, and there is a certain satisfaction in watching the numbers align.
But as the datasets grew, I hit a wall. The problem with Excel isn’t that it’s slow; it’s that it lacks abstraction. It forces you to mix your data with your logic. A formula in cell C5 depends on B5, but if you sort the rows, or if a new API endpoint changes the schema, the logic breaks silently. There are no invariants. The state is hidden behind the grid.
I moved to R not just because I wanted to be a “Data Scientist”, but because it scales. In R, data processing is a pipeline: a directed acyclic graph. You pull raw logs from an API, pass them through a series of transformations, and output a result.
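A minimal sketch of that raw-logs → transformations → result shape, using only base R (R ≥ 4.1 for the native pipe). The log fields and thresholds here are hypothetical, stand-ins for whatever your API actually returns:

```r
# Pretend these rows came from an API; in practice you'd parse JSON first.
logs <- data.frame(
  user = c("a", "b", "a", "c"),
  ms   = c(120, 340, 95, 310)
)

# Each step is a pure function of the previous step's output: a small DAG.
result <- logs |>
  subset(ms < 300) |>                      # drop slow requests (hypothetical cutoff)
  transform(sec = ms / 1000) |>            # derive a new column from an existing one
  (\(d) aggregate(sec ~ user, d, mean))()  # summarise latency per user

result
```

Because each stage only depends on the value flowing into it, reordering the input rows cannot silently break the logic the way sorting breaks a cell reference.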
The beauty of the CRAN ecosystem is that it handles the I/O (connecting to databases, parsing JSON) so you can focus on the data processing itself. You aren’t dragging a formula down a column; you are defining a vector operation that applies globally.
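The contrast is concrete. In a spreadsheet you would write `=B2*C2` and drag it down every row, hoping nothing shifts underneath. In R, the same logic is one expression over whole columns (the column names below are made up for illustration):

```r
# Two columns of data; in a spreadsheet these would be columns B and C.
prices   <- c(9.99, 24.50, 3.75)
quantity <- c(2, 1, 10)

# One vectorised expression replaces a dragged formula: it applies
# element-wise to every row, however many rows there are.
revenue <- prices * quantity
total   <- sum(revenue)
```

Add ten rows or ten million and the expression does not change, which is exactly the abstraction the grid lacks.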
This shift from “spreadsheet” to “script” is a shift from “arithmetic” to “calculus”. You stop worrying about cells and start thinking about the system. The code becomes the single source of truth, and the report is just a deterministic artifact of that code.