Scientific analysis of the societal discourse on AI-based social assessment

In contemporary societies, social assessment is widely undertaken by AI-based computer systems. In most cases decision making is not entrusted completely to algorithms (ADM: algorithmic decision-making); rather, it often remains opaque where and how algorithms come into play (Barocas et al. 2017, Silva and Kenney 2018). Applications can be found in medicine, human resource management and market research, policing and criminal justice, as well as governance and the social welfare system. Occasional applications also appear in other domains, such as predicting rising stars in the game of cricket (Ahmad et al. 2017). In medicine, AI systems are used for risk assessment of diseases, for instance mortality through cancer or suicide (Alaa et al. 2018), but also as decision support systems for doctors (Chan et al. 2013, Gross et al. 2013) and patients (Rubrichi et al. 2014, Lampos et al. 2015), with applications ranging from psychiatry (Selvi and Pratp 2016) to internal medicine (Farooq and Hussain 2016). In human resource management, algorithms support application processes, whereas applications in market research range from credit scoring (Umin 2019, Wei et al. 2019) to personality assessment (Jimenez et al. 2017). In the domain of policing, the approach of predictive policing has gained public prominence in recent years (Perry et al. 2013, Haskell 2014). Algorithms are also applied to inform sentencing decisions by forecasting criminal behavior (Berk and Bleich 2013), to predict the likelihood of recidivism (Zeng et al. 2017, Dressel and Farid 2018), and in border surveillance (Barrett 2017). Governance is supported by algorithmic procedures in public administration (Pacini 2019), defense (De Spiegeleire et al. 2017), and the social welfare system (Niklas et al. 2015). Research is likewise widespread in many domains, ranging from the development of algorithms and the ethical assessment of AI use (Dignum 2018, Mantelero 2018, Richterich 2018) to science and technology studies of its social implications (Fourcade and Healy 2017).

Sociology of AI-based assessment technology

From a sociological perspective, technology is to be analyzed as a reflection of the social value system: “Algorithms do not make judgments; they are the products and the tools of human judgments” (Burk 2019). AI-based social assessment can be regarded as a continuation of the concept of bureaucratic governance (Peeters and Schuilenburg 2018) as described by Max Weber (1922), which is characteristic of the process of (multiple) modernization (Eisenstadt 2000). The technology of AI-based assessment systems fulfils the characteristics of bureaucratic governance: division of labour between technical experts, rule-based operations, and management of information (Muellerleile and Robertson 2018). At first sight, the argument that AI-based technologies reinforce bureaucratic governance seems counterintuitive: at the onset of the rise of the internet and modern information technologies, these technologies were often perceived as emancipatory developments (Benkler 2006). A free flow of information seemed to foster openness and transparency, creating the vision of bottom-up network societies and flattening hierarchies (Kreiss et al. 2011), thereby weakening hierarchical, bureaucratic structures. However, with or without AI-based technology, social assessment remains a control instrument. Assessing individuals implies surveillance and, subsequently, assigning individuals to categorical schemes. Such an assignment produces a social sorting (Lyon 2003) of a population into predefined categories. This holds for government as well as for private companies, for instance in marketing. Furthermore, assessing risk scores ranging from recidivism to mortality involves the calculation of, typically probabilistic, numbers. Governance by numbers is a core element of the bureaucratic transformation of governance (Porter 1995, Hacking 1990). According to Weber, the effects of bureaucratic governance are

  • Efficiency: appropriateness of means and ends in relation to organizational goals
  • Objectivity: procedural neutrality
  • Rationality:  Weber’s concept of rationality can be differentiated into practical, theoretical, substantive, and formal rationality (Kalberg 1980)

Discrimination

At first sight, computers seem to be efficient, objective, and in line with the standards of formal rationality: computers undertake complex calculations far more efficiently than humans, and algorithmic procedures objectively follow clearly defined rules that are characteristic of formal rationality. However, even the concept of “raw data” to be processed by computers is illusory (Gitelman 2013). Thus, AI-based assessment remains in danger of pitfalls that undermine bureaucratic rationality. For detecting sources of algorithmic bias, a scheme has been developed by Danks and London (2017) and extended by Silva and Kenney (2018). It provides a kind of algorithmic “value chain” (Silva and Kenney 2018) of the different stages of the development and application of algorithms.

Following this model, sources of bias can be identified at every stage along this value chain. At the stage of the input data, Danks and London (2017) differentiate between training data bias and algorithmic focus bias. Sources of bias at the stage of the algorithm itself are denoted as algorithmic processing bias. At the output stage (the transfer context), several sources of bias are distinguished: misinterpretation bias, automation bias, and non-transparency bias. Finally, at the stage of the users, consumer bias and feedback loop bias are identified. It is important to note that every stage has to be taken into account with equal weight. While algorithms may produce incorrect results in comparison with some empirical data, bias is not something purely technical in the software development process. Algorithms produce bias when their application goes along with a violation of values. Values, however, are an inherently social concept, as values are guidelines for social conduct, defining what is socially accepted as good or desirable. For this reason, bias at all stages of the algorithmic value chain implies discrimination generated or enforced by AI-based social assessment systems. While discrimination is certainly ethically illegitimate, from a sociological perspective it also provides a challenge for the objectivity of algorithmic bureaucracy, as the neutrality of decision making is called into question. A description of the potential biases in the algorithmic value chain can be found here.
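
The bias taxonomy described in this section can be summarized compactly. The following is a minimal sketch (the data structure and helper function are illustrative and not part of the cited schemes) that maps each stage of the algorithmic value chain to the bias types named above.

```python
# Illustrative summary of the bias taxonomy of Danks and London (2017),
# as extended by Silva and Kenney (2018), keyed by value-chain stage.
# The structure and function names are hypothetical; only the bias labels
# follow the wording of the text above.
ALGORITHMIC_VALUE_CHAIN_BIASES = {
    "input data": ["training data bias", "algorithmic focus bias"],
    "algorithm": ["algorithmic processing bias"],
    "output (transfer context)": ["misinterpretation bias", "automation bias",
                                   "non-transparency bias"],
    "users": ["consumer bias", "feedback loop bias"],
}

def biases_at(stage: str) -> list[str]:
    """Return the bias types associated with a given stage of the value chain."""
    return ALGORITHMIC_VALUE_CHAIN_BIASES.get(stage, [])

if __name__ == "__main__":
    for stage, biases in ALGORITHMIC_VALUE_CHAIN_BIASES.items():
        print(f"{stage}: {', '.join(biases)}")
```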

Unfairness

The critical examination of assessment software mostly concentrates on the issue of discrimination. While discrimination is certainly unfair, the concept of fairness or unfairness of AI-based social assessment is broader; discrimination does not equal unfairness. Besides a normative definition of fairness, a descriptive approach to what humans perceive as fair reveals a multidimensional and context-dependent concept. An empirical investigation of whether the features used by the software COMPAS to calculate recidivism scores are perceived as fair revealed at least eight distinct elements. These eight dimensions might not be a complete list, but they are sufficient to explain the fairness judgements in the survey (Grgić-Hlača et al. 2018). The survey shows that the dimensions are not all perceived as equally important, but that respondents with different demographic characteristics and value orientations nevertheless agree on them. Software can violate each of these dimensions separately, and they need to be taken into account in the development of fair assessment software. As with the issue of discrimination, a violation of fairness principles is not only ethically illegitimate but also a challenge for bureaucratic governance by algorithms: unfair algorithmic assessment violates the principle of procedural neutrality that is required for objective decision making as one of the core elements of bureaucratic governance. A description of the dimensions of the concept of fairness can be found here.

Accuracy

Algorithms provide exact calculations without errors. Thus, algorithmic decision making implies the promise of higher accuracy than assessment and decision making by humans. Big data and algorithms are perceived as having high potential in psychology (Adjerid and Kelley 2018), and computer-based personality judgments are seemingly more accurate than those made by humans (Youyou et al. 2015). This should increase the efficiency of bureaucratic assessment procedures. However, this claim is rarely tested. The quality of algorithmic decision making depends on the multiple stages of the algorithmic value chain (Danks and London 2017). A recent examination of the accuracy of the software COMPAS for assessing the risk of recidivism of offenders shows that the predictions of the software are only of modest quality (Dressel and Farid 2018). This software calculates the risk of recidivism of a criminal offender within the next two years based on 137 features of an individual and his or her criminal record. However, the neutrality of the software has been questioned (Angwin et al. 2016), causing a public debate in the USA on algorithmic fairness and discrimination (Flores et al. 2016, Kleinberg et al. 2016). For this reason, the more fundamental question of how accurate the predictions made by the COMPAS algorithm actually are has been investigated. The study compares the false positives and false negatives (i.e. wrongly predicting recidivism and wrongly predicting no recidivism) of the software with assessments made by humans. The test persons for the study had no prior experience in criminal justice. With information on only seven features given to the humans, no statistically significant differences could be found between the accuracy of the predictions made by the software and those made by non-expert humans. This result questions the assumption of increased accuracy of decision making, and thus of increased efficiency of bureaucratic governance, through the use of algorithms.
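
To make the comparison concrete, the following is a minimal sketch (with made-up toy data rather than actual COMPAS or survey results) of how accuracy, false positive rate, and false negative rate can be computed for binary recidivism predictions against observed outcomes.

```python
# Illustrative computation of accuracy, false positive rate (wrongly predicting
# recidivism), and false negative rate (wrongly predicting no recidivism).
# The data below are invented for demonstration only.
from typing import Sequence

def confusion_rates(predicted: Sequence[int], observed: Sequence[int]) -> dict:
    """1 = recidivism predicted/observed, 0 = no recidivism."""
    tp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 1)
    tn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    fn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 1)
    return {
        "accuracy": (tp + tn) / len(observed),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
    }

if __name__ == "__main__":
    observed = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]   # toy ground truth
    software = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]   # toy algorithmic predictions
    humans   = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]   # toy non-expert human predictions
    print("software:", confusion_rates(software, observed))
    print("humans:  ", confusion_rates(humans, observed))
```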

Accounting for flexibility

In other respects, the potential of bureaucratic governance by algorithmic decision making is challenged by the fact that law and administrative regulation are situated in a dynamic, ever-changing social environment and need to balance competing demands. As rules are static, this raises a problem for rule-based decision making: fixed rules cannot take into account social dynamics such as value change. In the domain of law, this is reflected in the fact that law is not static but is subject to legal interpretation by human judges. This human element is essential for the smooth operation of bureaucratic governance. Max Weber (1922) already outlined that formal rules are too rigid to fairly accommodate unforeseen circumstances (Burk 2019). For this reason, the efficient application of administrative procedures is a craft that requires a certain context-specific flexibility from administrative professionals, which in turn implies a certain degree of ex ante uncertainty. In order to operate, bureaucracy cannot be perfect.

However, flexible adaptation to varying circumstances provides a challenge for rules codified in algorithms once the rules are hardwired in the system (Casey and Niblett 2017). While it could be argued that machine learning might detect patterns, for instance in court decisions, which interpret and thereby develop the codified law, this comes at the cost of freezing the standard of the time the implementation was encoded, so that the further adaptation of legal interpretations, for instance due to changing social values, comes to a halt. In consequence, if legal governance and administration become increasingly reliant on data collection and algorithmic data processing (Sag 2017), the danger exists that this triggers an unintended and unexpected change of social practices and their guiding values (Burk 2019). For instance, analysing the example of the legal balancing standard of “fair use” in the handling of copyright and copyright exceptions on platforms such as YouTube or Facebook, Burk (2019) finds that fair use is not a static concept and that the algorithmic implementation of automated decision making is already changing legal standards and social practices. Even if the challenges of discrimination, fairness, and accuracy can be met, it remains necessary that, in reality, the ideal of bureaucratic governance is only ever approximated. Thus, the question remains whether AI-based social assessment can account for this vague approximation of bureaucracy that accommodates the dynamics of social life.

Ethical challenges

Whereas flexibility provides a challenge for bureaucratic governance, bias, discrimination, and the fairness or unfairness of algorithmic decision-making raise ethical concerns. Autonomous decision making by machines leads to the problem of accountability for such possible bias or discrimination (Pasquale 2015). Leaving important aspects of individual lives to the rule of AI systems risks “paving the way to a new feudal order” (Citron and Pasquale 2014: 19). For this reason, attempts at regulating AI use have become a major agenda of industrial and governmental entities in recent years (Delvaux 2016, National Science and Technology Council Committee on Technology 2016, IEEE 2016, Hern 2016). However, AI governance is in need of ethical guidelines. For developing such guidelines, the full life cycle of algorithmic systems has to be taken into account. Dignum (2018) distinguishes three dimensions in which ethical reasoning is relevant: ethics by design addresses the integration of ethical reasoning in artificial autonomous systems; ethics in design addresses methods for analysing the ethical implications of AI systems; and ethics for design concerns the codes of conduct that ensure the integrity of developers and users of AI systems.

In the context of value-sensitive design (Aldewereld et al. 2014) and evaluation (Diakopoulos 2015) of autonomous systems, Rahwan (2018) argues for the need for a new algorithmic social contract to prevent the danger of losing democratic control over algorithmic decision making, extending the concept of human-in-the-loop (HITL) (Sheridan 2006). HITL instantiates human supervisory control of AI systems for identifying misbehaviour and establishing an accountable entity. Due to their wider scope, in the case of AI-based social assessment systems human supervision additionally has to account for trade-offs between different values and for how to agree on the distribution of costs and benefits among different stakeholders in a new social contract. This is what Rahwan (2018) denotes as society-in-the-loop. For initiating public feedback on regulation and legislation, an articulation of and negotiation between (different) values is needed, as well as monitoring of compliance. In the context of ethics by design, on the other hand, it has been suggested to stop a system before it can become destructive (Orseau and Armstrong 2016) or to develop a moral Turing test (Wallach and Allen 2008). Instead, Arnold and Scheutz (2018) suggest that AI systems should be tested in advance in a simulated environment that contains an ethical scenario-generating mechanism. The objective of AI-FORA can be described as realizing the proposals made by Rahwan (2018) and Arnold and Scheutz (2018): building a co-creation lab for initiating public feedback on the regulation of AI-based social assessment systems, informed by a scenario-generating simulation.
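
As an illustration only (the names, scores, and review policy below are hypothetical and not part of Rahwan's or Sheridan's proposals), the following sketch shows the basic control-flow idea behind human-in-the-loop supervision: an algorithmic assessment is either passed on directly or routed to a human supervisor who can confirm or override it.

```python
# Minimal sketch of human-in-the-loop (HITL) supervisory control.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Assessment:
    subject_id: str
    risk_score: float   # probabilistic score produced by the algorithm
    decision: str       # provisional algorithmic decision

def supervise(assessment: Assessment,
              needs_review: Callable[[Assessment], bool],
              human_review: Callable[[Assessment], str]) -> str:
    """Pass an assessment on directly, or route it to a human supervisor."""
    if needs_review(assessment):
        # The human supervisor is the accountable entity: confirm or override.
        return human_review(assessment)
    return assessment.decision

def example_policy(a: Assessment) -> bool:
    # Hypothetical policy: scores in the uncertain middle range require review.
    return 0.4 <= a.risk_score <= 0.6

def example_reviewer(a: Assessment) -> str:
    # Stub standing in for an actual human review step.
    print(f"flagged for human review: {a.subject_id} (score {a.risk_score:.2f})")
    return "deferred pending review"

if __name__ == "__main__":
    print(supervise(Assessment("case-001", 0.85, "high risk"), example_policy, example_reviewer))
    print(supervise(Assessment("case-002", 0.55, "high risk"), example_policy, example_reviewer))
```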

References    

Adjerid, I., Kelley, K. (2018). Big data in psychology: A framework for research advancement. American psychologist 73(7): 899-913.

Ahmad, H., Daud, A., Wang, L., Hong, H., Dawood, H., Yang, Y. (2017). Prediction of Rising Stars in the Game of Cricket. IEEE Access 5: 4104 – 4124.

Alaa, A., Yoon, J., Hu, S., van der Schaar, M. (2018). Personalized Risk Scoring for Critical Care Prognosis Using Mixtures of Gaussian Processes. IEEE Transactions on biomedical engineering 65(1): 207-218.

Aldewereld, H., Dignum, V., Tan, Y.-H. (2014). Design for values in software development. In J. van den Hoven, P. E. Vermaas, & I. van de Poel (Eds.), Handbook of ethics, values, and technological design (pp. 1-12). Dordrecht: Springer.

Angwin, J., Larson, J., Mattu, S., Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, 23 May 2016; www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Arnold, T., Scheutz, M. (2018). The big red button is too late: an alternative model for the ethical evaluation of AI systems. Ethics and Information Technology 20(1): 59-69.

Barocas, S., Bradley, E., Honavar, V., Provost, F. (2017). Big Data, Data Science, and Civil Rights. arXiv preprint arXiv:1706.03102

Barrett, L. (2017). Reasonably Suspicious Algorithms: Predictive Policing at the United States Border. NYU Rev. L. and Soc. Change 41, no. 3: 327-365.

Benkler, Y. (2006). The wealth of networks: How social production transforms markets. New Haven: Yale University Press.

Berk, R.A. & Bleich, J. (2013). Statistical Procedures for Forecasting Criminal Behavior. Criminology & Public Policy, 12, 513-544.

Burk, D. (2019). Algorithmic fair use. The University of Chicago Law Review, Vol. 86, No. 2, Symposium: Personalized Law (March 2019), pp. 283-308.

Casey, A., Niblett, A. (2017). The death of rules and standards. Indiana Law Journal 92(4): 1401-1447.

Chan, Y., et al. (2013). TEMPTING system: A hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries. Journal of Biomedical Informatics 46(Supplement S): S54-S62.

Citron, D. K., Pasquale, F. A. (2014). The scored society: due process for automated predictions. Washington Law Review, 89, 1–33

Danks, D., London, A. (2017). Algorithmic Bias in Autonomous Systems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 4691-4697. IJCAI. https://doi.org/10.24963/ijcai.2017/654.

De Spiegeleire et al. (2017). Defense – yesterday, today and tomorrow. In De Spiegeleire et al. (ed.) Artificial intelligence and the future of defence: strategic implications for small and medium sized force providers. Hague Centre for strategic studies. pp. 60-98.

Delvaux, M. (2016). Motion for a European Parliament resolution: with recommendations to the commission on civil law rules on robotics. Technical Report (2015/2103(INL)), European Commission.

Diakopoulos, N. (2015). Algorithmic accountability: Journalistic investigation of computational power structures. Digital Journalism 3(3): 398–415.

Dignum, V. (2018). Ethics in artificial intelligence: introduction to the special issue. Ethics and Information Technology 20:1–3.

Dressel, J., Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science advances 4: 1-5.

Eisenstadt, S. (2000). Multiple modernities, Daedalus 129(1): 1 – 30.

Ettensberger, F. (forthcoming). Comparing Supervised Learning Algorithms and Artificial Neural Networks for Conflict Prediction: Performance and Applicability of Deep Learning in the field. Quality & Quantity (online first).

Farooq, K., Hussain, A. (2016). A novel ontology and machine learning driven hybrid cardiovascular clinical prognosis as a complex adaptive clinical system. Complex adaptive Systems modelling 4(12).

Flores, A. W., Bechtel, K., Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: A rejoinder to “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.” Federal Probation 80(2): 38-46.

Fourcade, M., Healy, K. (2017). Categories all the way down. Historical Social Research / Historische Sozialforschung, Vol. 42, No. 1 (159), Markets and Classifications. Categorizations and Valuations as Social Processes Structuring Markets: 286-296.

Gitelman L. (ed.)(2013). Raw data is an oxymoron. Cambridge, MA: MIT Press.

Grgić-Hlača, N. et al. (2018). Human perceptions of fairness in algorithmic decision making: a case study of criminal risk prediction. In Proceedings of the 27th World Wide Web Conference (WWW 2018), Lyon, France, 23-27 April 2018: 903-912.

Gross, D., et al. (2013). Development of a Computer-Based Clinical Decision Support Tool for Selecting Appropriate Rehabilitation Interventions for Injured Workers. Journal of Occupational Rehabilitation 23(4): 597-609.

Hacking, I. (1990). The taming of chance. Cambridge: Cambridge University Press.

Haskell, E. (2014). A social interaction model for criminal Hotspots. In Squazzoni et al. (eds.) ECMS 2014: 745-751.

Hern, A. (2016). ‘partnership on artificial intelligence’ formed by Google, Facebook, Amazon, IBM, Microsoft and Apple. Technical report, The Guardian.           

IEEE (2016). Ethically aligned design. Technical report, The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems.

Jimenez, F., Jodar, R., del Pilar Martin, M., Sanchez, G., Sciavicco, G. (2017). Unsupervised feature selection for interpretable classification in behavioral assessment of children. Expert Systems 34(4).

Kalberg, S. (1980). Max Weber’s Types of Rationality: Cornerstones for the Analysis of Rationalization Processes in History. The American journal of sociology 85(5): 1145 – 1179.

Kleinberg, J., Mullainathan, S., Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores; https://arxiv.org/abs/1609.05807v2 (2016).

Kreiss, D., Finn, M., Turner, F. (2011). The limits of peer production: Some reminders from Max Weber for the network society. New media & society 13(2): 243 – 259.

Kuang, C. (2017). Can A.I. be Taught to Explain Itself? The New York Times Magazine, 21 November 2017. https://www.nytimes.com/2017/11/21/magazine/can-ai-be-taught-to-explain-itself.html.

Lampos, V., Yom-Tov, E., Pebody, R., Cox, I. (2015). Assessing the impact of a health intervention via user-generated Internet content. Data Mining and Knowledge Discovery 29(5): 1434-1457.

Lyon, D. (ed.)(2003). Surveillance as social sorting: privacy, risk, and digital discrimination. London: Routledge.

Mantelero, A. (2018). AI and Big Data: A blueprint for a human rights, social and ethical impact assessment. Computer law & security review 34(4): 754-772.

Muellerleile, C., Robertson, S. (2018). Digital Weberianism: bureaucracy, information, and the techno-rationality of neoliberal capitalism. Indiana Journal of Global Legal Studies 25(1): 187-216.

National Science and Technology Council Committee on Technology. (2016). Preparing for the future of artificial intelligence. Technical report, Executive Office of the President.

Niklas, J., Sztandar-Sztanderska, K., Szymielewicz, K. (2015). Profiling the Unemployed in Poland: Social and Political Implications of Algorithmic Decision Making. https://panoptykon.org/sites/default/files/leadimagebiblioteka/panoptykon_profiling_report_final.pdf.

Orseau, L., Armstrong, S. (2016). Safely interruptible agents. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence.

Pacini, F. (2019). A modest proposal: The virtual politician. AI as provocation to reflect on representative democracy. Biolaw Journal 1(1): 101-113.

Pasquale, F. (2015). The black box society: The secret algorithms that control money and information. Cambridge: Harvard University Press.

Peeters, R., Schuilenburg, M. (2018). Machine Justice: Governing security through the bureaucracy of algorithms. Information polity 23(3): 267-280.

Perry, W. L., et al. (2013). Predictive policing: The role of crime forecasting in law enforcement operations. Santa Monica: RAND.

Porter, T. (1995). Trust in numbers. The pursuit of objectivity in science and public life. Princeton: Princeton University Press.

Rahwan, I. (2018). Society-in-the-loop: programming the algorithmic social contract. Ethics and Information Technology 20(1): 5-14.

Richterich, A. (2018). Big data: Ethical debates. In Richterich, A. (ed.), The big data agenda: Data ethics and critical data studies. London: University of Westminster Press, pp. 33-52.

Rubrichi, S., Battistotti, A., Quaglini, S. (2014). Patients’ involvement in e-health services quality assessment: A system for the automatic interpretation of SMS-based patients’ feedback. Journal of Biomedical Informatics 51: 41-48.

Sag, M. (2017). Internet Safe Harbors and the Transformation of Copyright Law. Notre Dame Law Review 93: 499-564.

Selvi, K., Pratp, K. (2016). Possibilistic LVQ neural network – an application to childhood autism grading. Neural Network World 26(3): 253-269.

Sheridan, T. B. (2006). Supervisory control. Handbook of Human Factors and Ergonomics (3rd ed., pp. 1025–1052). Hoboken: Wiley

Silva, S., Kenney, M. (2018). Algorithms, Platforms, and Ethnic Bias: An Integrative Essay. Phylon 55(1 & 2): 9-37.

Umin, S. (2017). Artificial Intelligence and the Possibilities of Legal Paradigm Transformation: Impacts and Challenges on the Legislative Practice Governance. Korean journal of law & society 56: 351-385.

Wallach, W., Allen, C. (2008). Moral machines: Teaching robots right from wrong. Oxford: Oxford University Press.

Weber, M. (1922). Wirtschaft und Gesellschaft. Tübingen: Mohr.

Wei, Y., et al. (2019). Credit Scoring with Social Network Data. Marketing Science 35(2): 234-258.

Youyou, W., Kosinski, M., Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences 112(4): 1036-1040.

Zeng, J., Ustun, B., Rudin, C. (2017). Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A 180(3): 689-722.