- Rethinking Evidence: Can big data help design better policies? - March 22, 2016
There is little debate, in practice or in academia, that the public policy domain is complex. In trying to navigate this complexity, the field has attracted widespread attention in the form of approaches and analytical frameworks. Most of these are grounded in rational choice theory, where maximising aggregate utility across citizens is key. To make the most beneficial decisions, it is desirable to have knowledge of all possible alternatives and constraints.
Evidence-based policy-making, the latest wave in policy studies, certainly points us in the right direction, as it encourages the use of knowledge in making policy, programme and project decisions. However, as we have learnt, it is possible to have poor evidence-based policies built on incomplete, biased, and unsound evidence.
Randomized Controlled Trials, the Gold Standard?
The most common approach to producing so-called “rigorous evidence” has been randomized controlled trials (RCTs), which are considered the gold standard for assessing the efficiency and effectiveness of medical interventions through clinical trials. In addition to RCTs, other designs such as quasi-experiments and longitudinal studies, as well as theory-based approaches, are used to support causal inference.
RCTs involve randomly assigning the new treatment to trial participants and observing the clinical outcomes over a period of time. The “evidence” from such a trial is whether the clinical outcomes of the group randomly assigned to the new treatment are significantly different from those of the group that did not receive the treatment. Similarly, policy-makers apply RCT methodology by randomly rolling out a “new” policy intervention and observing whether this leads to significant differences in outcomes.
Central to RCT methodology is a large sample observed over a long period of time. This is because the design relies on randomization to statistically control for confounding factors that could be confused with the intervention under observation. For example, to evaluate the effectiveness of a tutoring programme on the performance of first-year university students, we rely on randomized enrolment in the programme to control for socioeconomic status, IQ and other relevant factors.
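The core of the RCT comparison can be sketched in a few lines. This is a minimal, purely illustrative simulation with made-up numbers: outcomes are drawn from a hypothetical baseline distribution (which folds the confounders into noise), a fixed treatment effect is added for the randomly assigned group, and the difference in group means is the “evidence”.

```python
import random
import statistics

def run_rct(n=1000, effect=5.0, seed=42):
    """Simulate a simple two-arm trial with random assignment.

    Hypothetical parameters: baseline outcomes ~ N(50, 10),
    an additive treatment effect, and a 50/50 coin-flip assignment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)   # confounders folded into noise
        if rng.random() < 0.5:         # randomized assignment
            treated.append(baseline + effect)
        else:
            control.append(baseline)
    # The "evidence": difference in mean outcomes between the arms
    return statistics.mean(treated) - statistics.mean(control)

diff = run_rct()
```

Note that the estimated difference only approaches the true effect with a large sample, which is precisely why RCTs are costly: halving the standard error requires roughly quadrupling the sample size.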
Is there a better way?
We might ask whether we should be investing in such large studies for policy-making purposes. Gathering significant evidence for a policy hypothesis is time-consuming. Is there a more intelligent way to formulate policy, one in which the time and financial cost of obtaining the required evidence is less prohibitive and the evidence itself less incomplete? And since development is highly dynamic, how can we make sure the evidence is still relevant by the time the study ends?
One possibility: Borrow from Artificial Intelligence
In the last decade, Machine Learning has revolutionised the way we think about and use data. It is a subfield of Artificial Intelligence that uses computer science and statistical methods to learn patterns within datasets without being pre-programmed with where to look. Well-known technology companies like Google have become synonymous with the field: Google uses pattern recognition in large datasets to autocomplete search strings and to translate between multiple languages, among many other applications.
How can these ideas that have worked so well for Google and others be extended to policy making and the public sector?
Imagine policy-makers and governments in real time collecting data, including meta-data, on the behaviour of the social systems over which they preside – just like technology companies do with their user interactions. The collection and maintenance of such datasets lead to the much talked about realm of “Big Data”. Machine Learning techniques can then be used to harness insights from these datasets.
The power of “Big Data”
While experts can come up with new policy hypotheses based on previous experience, using “Big Data” and Machine Learning has the added advantage of identifying not only the common hypotheses (those that can be backed by established theory) but, more importantly, unexpected hypotheses that may be counterintuitive with respect to current thinking. Given the statistical foundations of Machine Learning algorithms, the relative strengths of different hypotheses can be evaluated and used to prioritise interventions.
We propose the following framework for embedding “Big Data” principles into policy formulation:
We now demonstrate how we can use the above framework in a hypothetical education policy formulation process. Say we want to design an intervention that helps increase the number of high school learners who obtain university entrance in a particular district.
Data collection priorities
We begin by setting up a database to collect features for every learner in the district. These can include demographics, socioeconomic indicators and performance indicators. More importantly, they should include meta-data – for example, how long learners travel to school, specific answers to standardised tests and surveys, attendance records, durations of lessons for specific subjects, and so on.
Notably, this differs from most current approaches, which provide aggregated records for each school. Compliance with information protection laws like the Protection of Personal Information Act (PoPI) is essential here; typically such a database would be anonymized. This real-time data can also be used in the monitoring phase while the policy is in place, as well as for business as usual (BAU) decision support.
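To make the idea concrete, an anonymized per-learner record might look something like the sketch below. All field names are hypothetical and illustrative only; a real schema would be designed with educators and data-protection officers.

```python
from dataclasses import dataclass, field

@dataclass
class LearnerRecord:
    """Hypothetical anonymized per-learner record (illustrative fields only)."""
    learner_hash: str            # salted hash, never a raw identity number (PoPI)
    travel_minutes: int          # daily travel time to school (meta-data)
    attendance_rate: float       # fraction of school days attended
    test_answers: dict = field(default_factory=dict)  # item-level test responses
    obtained_entrance: bool = False                   # target outcome

# One made-up record for illustration
record = LearnerRecord("a1b2c3", travel_minutes=45, attendance_rate=0.93,
                       test_answers={"math_q1": 1}, obtained_entrance=False)
```

Keeping the record at learner level, rather than aggregating per school, is what later allows patterns such as the travel-time relationship discussed below to surface at all.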
Model Building and Inference
We then proceed to model the target outcome (obtaining university entrance) based on the data features collected. Machine Learning algorithms such as Self-Organising Maps and K-Means can be used to identify groups of features that tend to be associated with learners who obtain university entrance.
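As a sketch of the clustering step, here is a minimal from-scratch K-Means on made-up two-feature learner records (travel time in minutes, attendance rate). In practice one would use a mature library and far richer features; this only illustrates how unsupervised grouping can surface structure without a pre-specified hypothesis.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: partition feature vectors into k clusters."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance)
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(vals) / len(members)
                                     for vals in zip(*members))
    return centroids, clusters

# Hypothetical learner records: (travel_time_minutes, attendance_rate)
data = [(15, 0.95), (20, 0.92), (18, 0.97), (70, 0.70), (80, 0.65), (75, 0.72)]
centroids, clusters = kmeans(data, k=2)
```

On this toy data the two clusters separate short-travel from long-travel learners, which is exactly the kind of pattern a policy-maker would then interrogate.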
Say, for example, we find that learners with shorter travel times to school tend to obtain university entrance more often than those who travel longer. It is now up to policy-makers to interrogate the implications of this result and translate it into an intervention. The interrogation might reveal that learners who travel longer are often too tired to study after school. This could lead to interventions such as after-school study sessions held while learners are still at school, or even a completely new learner-school allocation framework.
Implications for African research and evaluation
In both academic and evaluation practice circles there has been a growing debate about the need for more Afrocentric research and evaluation. The sentiment is that current theory has very strong Western foundations which cannot necessarily be translated into an African context.
We believe that a data-driven approach aids the move towards the Afrocentric ideal, as learning patterns from data reduces reliance on the individual’s theoretical preconceptions. This will result in more innovative and tailored policy designs that directly respond to the situation on the ground, through the use of accurate, current and reliable African data.
“Big data” and politics: Collaboration is key
Big data has enormous potential to transform the practice of policy-making and to create real societal change. Just as it has done for technology giants like Google, big data can make policy-making more agile, responsive and innovative, resulting in interventions with meaningful impact.
Notwithstanding this, policy-making is still a highly political, decision-centric process. Therefore analysis by decision-makers, the political economy and power relations should not be taken out of the study and practice of development.
As policy-makers, institutions through which policy is implemented, and evaluators, we should always make sure that our efforts are on track to enable social transformation which, just like “Big Data”, asks us to be adaptive, agile, and operate in real time.