In 2014, during my computer science master's, I had to write a 250-word “essay” for a scholarship application answering the question: “What do you think will be the future impact of data-driven computing on society?”

I re-read my response recently and found it interesting in light of today's context. Note that the response was relatively controversial at the time and was the only selected essay with a negative sentiment.

As much as the analysis of larger and larger data sets has been touted as the future of computing, data-driven computing has the potential to have adverse effects if the public does not become more data-literate. Arguably, the excitement about and faith in data-driven computing stem from a general feeling that data (in all of its many forms) represents some Truth, and the “bigger” the data becomes, the closer we come to this Truth. Anyone who has collected or analyzed data firsthand understands this isn’t the case: surveys are designed poorly, data is mislabeled, and (perhaps worst of all) incorrect conclusions are drawn. These are just some easily generalizable examples.

The public is capable of being misled by what they believe to be intelligent systems that analyze masses of data. Consider a search engine. We trust that the most truthful, relevant articles appear at the top of the results. Although this assumption can probably be safely made most of the time, it exemplifies the ease with which we relinquish our own abilities to judge quality and accuracy (Truth) to a data-driven application. We must, therefore, understand the limits of data-driven computing when we use these applications. Although I do not have a solution, nor could one be written here within the word limit, it’s clear that in order for data-driven applications to proliferate safely in society, the general public must expand its relatively limited understanding of how data is both gathered and analyzed.

