The distinction sounds obvious when stated directly. A model that predicts accurately does not necessarily tell you why something happened, or what would happen if you changed something. A model trained to predict which students will fail a course can be very accurate without revealing anything about what caused those students to struggle, or what intervention would help.
This came into focus during my MSc thesis work. The goal was to build a classifier that could detect student confusion and frustration in online discussion forum posts – not to measure sentiment in general, but to identify the specific signal that indicated a student needed a response from an instructor. An SVM classifier with a non-linear Gaussian kernel, combined with part-of-speech (POS) frequency counts and a custom course-content dictionary, achieved an F1 score of 0.79 and an accuracy of 0.83. Inter-rater reliability testing against experienced college instructors put agreement between 74% and 91%, depending on the instructor.
By the standards of the task, those are useful numbers. The classifier does what it is supposed to do. What it cannot tell you is why certain course content consistently generates confused posts, or whether changing the content would reduce confusion, or whether the confusion is caused by the content at all rather than by something else happening in the course at the same time. Prediction and explanation are not the same problem, and a classifier trained to do one does not automatically do the other.
This distinction matters more in some domains than others. In marketing measurement, which is where much of my applied work sits, it matters a great deal. Knowing that a channel correlates with conversions is not the same as knowing that the channel caused them. A customer who sees a display ad and converts through paid search three days later may have converted anyway. The correlation is real. The causal claim requires more work.
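To make the display-ad example concrete, here is a small simulation built on made-up assumptions. A hypothetical "intent" variable (whether a customer is already in the market) drives both who sees the ad and who converts, and the ad itself has no effect by construction. The variable names and probabilities are illustrative only. They are not drawn from CampaignCheck data or from any real campaign.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate (intent, exposed, converted) triples for n hypothetical customers.

    Assumed, illustrative data-generating process (not real data):
    - intent: 30% of customers are already in the market
    - exposed: the ad is targeted, so in-market customers see it far more often
    - converted: depends on intent only; the ad has zero causal effect
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        intent = int(rng.random() < 0.3)
        exposed = int(rng.random() < (0.8 if intent else 0.2))
        converted = int(rng.random() < (0.30 if intent else 0.02))  # no exposure term
        rows.append((intent, exposed, converted))
    return rows


def conversion_rate(rows, exposed, intent=None):
    """Share of customers who converted, given exposure (and optionally intent)."""
    outcomes = [y for z, x, y in rows if x == exposed and (intent is None or z == intent)]
    return sum(outcomes) / len(outcomes)


def naive_effect(rows):
    """Raw difference in conversion rates between exposed and unexposed customers."""
    return conversion_rate(rows, 1) - conversion_rate(rows, 0)


def adjusted_effect(rows):
    """Compare exposed vs unexposed within each intent level, then weight each
    within-level difference by how common that intent level is overall."""
    n = len(rows)
    total = 0.0
    for z in (0, 1):
        weight = sum(1 for r in rows if r[0] == z) / n
        total += weight * (conversion_rate(rows, 1, z) - conversion_rate(rows, 0, z))
    return total


rows = simulate()
print(f"naive difference:    {naive_effect(rows):+.3f}")
print(f"adjusted difference: {adjusted_effect(rows):+.3f}")
```

Under these assumptions the naive comparison shows exposed customers converting roughly 15 percentage points more often, even though the ad does nothing. The adjusted comparison lands close to zero. That recovery only works because intent is the sole confounder here and it is fully observed. Real campaign data rarely offers either guarantee.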
Causal inference provides the framework for doing that work. The tools (potential outcomes, directed acyclic graphs, do-calculus) are not new, but their application to marketing measurement is underdeveloped, in part because the data requirements are non-trivial and in part because the outputs are less immediately legible than a coefficient in a regression.
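For a compact picture of how these tools fit together, take X as exposure to a channel, Y as conversion, and Y(1), Y(0) as the potential outcomes with and without exposure. Z is a hypothetical set of pre-exposure variables, assumed to satisfy the backdoor criterion in the DAG relating X and Y. The notation is standard textbook notation, not a result of this research. Under that assumption, and provided every level of Z contains both exposed and unexposed customers, the average causal effect can be identified from observational data:

```latex
\begin{aligned}
\mathrm{ATE} &= \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] \\
             &= \sum_{z} \big( \mathbb{E}[Y \mid X = 1, Z = z] - \mathbb{E}[Y \mid X = 0, Z = z] \big)\, P(Z = z)
\end{aligned}
```

The first line is the potential-outcomes definition. The second line is the quantity that the DAG and do-calculus license you to compute, but only if the assumption about Z holds. This is also where the data requirements come from: every confounder in Z has to be measured, and each stratum needs enough exposed and unexposed customers to estimate both conditional means.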
The research thread I am pursuing sits at this intersection: causal attribution models for digital marketing, using the kind of server-side event data that a tool like CampaignCheck generates. The practical and the theoretical are, in this case, the same problem approached from different directions.
More on this as the work develops.