In this week’s assignment, we walked through a sample of Esri’s Spatial Data Science MOOC.
We created models to predict voter turnout based on 2016 election data. Each time we ran the model, we included more predictive variables, used more targeted predictive variables, or looked at finer granularity.
We had two questions to answer in our reflection on the exercise.
Yes, it did. I had a sense of how decision trees could be used for categorical value prediction before this exercise. However, it was really enlightening to see the details of a forest-based method in action. Did it take a while for my computer to process the higher numbers? Yes. Was it worth going through multiple iterations to look at the changes in details? Also yes. This was a good real-world example of how predictive models are built: trial and error and constant refitting. I appreciated that the lesson built on itself. It was very funny to see “owns a selfie stick” be a better predictive variable than some of the more traditional measures.
Predictive modeling is trying to answer “what is the best prediction we can make, within a certain level of confidence, with the information (variables) we have?” The charts/visualizations and message information were key to interpreting the results and deciding on the best fit.
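To make the refitting idea concrete, here is a minimal sketch in Python. It is not the Esri tool from the exercise and not the 2016 election data: it uses a bagged ensemble of one-split trees (stumps) as a much simplified stand-in for a forest-based method, and the data is synthetic. The variable names `var_a` and `var_b`, the made-up target formula, and all helper functions are assumptions for illustration only. It fits the model twice, once with one variable and once with an added, more informative variable, and compares the error on held-out rows.

```python
import random


def fit_stump(rows, targets, features):
    """Return (feature, threshold, left_mean, right_mean) for the single
    split with the lowest squared error on this sample."""
    overall = sum(targets) / len(targets)
    # Fallback if no valid split exists: always predict the overall mean.
    best = (float("inf"), features[0], float("inf"), overall, overall)
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((y - lm) ** 2 for y in left)
                   + sum((y - rm) ** 2 for y in right))
            if sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:]


def fit_forest(rows, targets, features, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps."""
    return sum(lm if row[f] <= t else rm for f, t, lm, rm in forest) / len(forest)


def mse(forest, rows, targets):
    """Mean squared error of the ensemble on the given rows."""
    return sum((predict(forest, r) - y) ** 2
               for r, y in zip(rows, targets)) / len(rows)


def make_data(n, seed):
    """Synthetic rows: the target depends weakly on var_a, strongly on var_b."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        rows.append({"var_a": a, "var_b": b})
        targets.append(0.3 + 0.1 * a + (0.5 if b > 0.5 else 0.0)
                       + rng.gauss(0, 0.02))
    return rows, targets


train_rows, train_y = make_data(80, seed=1)
test_rows, test_y = make_data(40, seed=2)

# Refit with a growing variable list and compare held-out error each time.
results = {}
for features in (["var_a"], ["var_a", "var_b"]):
    forest = fit_forest(train_rows, train_y, features)
    results[tuple(features)] = mse(forest, test_rows, test_y)
    print(features, "held-out MSE:", round(results[tuple(features)], 4))
```

The error is measured on rows the model never saw during fitting, which is what makes the comparison between runs meaningful. A real forest-based tool grows deeper trees and also randomizes which variables each tree may use, so this sketch only shows the shape of the refit-and-compare loop.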
I can see where the choice of variables, weights, number of branches, and number of iterations would change based on the problem statement. For example, if a larger company with ample funding were putting up targeted voting ads, it would probably accept a lower prediction significance for “where would the most people of a certain population see it” than a small “get out the vote” NGO attempting to target the areas of largest impact with limited resources.
I can imagine using this tool on an exploratory basis in a job. However, I think I need more experience with predictive modeling before using it with certainty in a problem-solving context. To inform the direction research should go? Yes. As the basis of a decision? Not yet.
To expand on the voting issue, it might be interesting to run forest-based classification on gerrymandered versus non-gerrymandered counties, to see whether the impact of inequitable county lines can be quantified and how the predictions differ. Another example of a problem this prediction model could help solve is in public health, such as predicting increases in illnesses or incidents in an area (heart attacks, car accidents, virus outbreaks) and then allocating resources accordingly. This model is good for telling us where something is likely to happen; then we respond to that information.