Monoculture (computer science)

In computer science, a monoculture is a community of computers that all run identical software. All the computer systems in the community thus have the same vulnerabilities, and, like agricultural monocultures, are subject to catastrophic failure in the event of a successful attack.[1]

Overview

With the global trend of increased usage and reliance on computerized systems, some vendors supply solutions that are used throughout the industry (such as Microsoft Windows) - this forms algorithmic monocultures. Monocultures form naturally since they utilize economies of scale, it is cheaper to manufacture and distribute a single solution. Furthermore, by being used by a large community bugs are discovered relativity fast.

Like agricultural monocultures, algorithmic monocultures are not diverse, thus susceptible to correlated failures - a failure of many parts participating in the monoculture. In complete non-monocultures, where the outcome of all components are mutually independent thus un-correlated, the chance of catastrophic event (failure of all the parts in the monoculture) is the multiplication of each component failure probability (exponentially decreasing).

On the other end, perfect monocultures are completely correlated, thus have a single point of failure. This means that the chance of a catastrophic event is constant - the failure probably of the single component.

Examples

Since operating systems are used in almost every workstation they form monocultures. For example Dan Geer has argued that Microsoft is a monoculture, since a majority of the overall number of workstations connected to the Internet are running versions of the Microsoft Windows operating system, many of which are vulnerable to the same attacks.

Large monocultures can also arise from software libraries, for example the Log4Shell exploit in the popular Log4j library estimated to affect hundreds of millions of devices.[2]

Individual level concerns

The concept is significant when discussing computer security and viruses, the main threat is exposure to security vulnerabilities. Since monocultures are not diverse, any vulnerability found exists in all the individual members of the monoculture increasing the risk of exploitation.[3] An example to that is exploit Wednesday in which after Windows security patches are released there is an increase exploitation events on not updated machines.

Clifford Stoll wrote in 1989 after dealing with the Morris worm:[4]

A computer virus is specialized: a virus that works on an IBM PC cannot do anything to a Macintosh or a Unix computer. Similarly, the Arpanet virus could only strike at systems running Berkeley Unix. Computers running other operating systems—like AT&T Unix, VMS, or DOS—were totally immune.

Diversity, then, works against viruses. If all the systems on the Arpanet ran Berkeley Unix, the virus would have disabled all fifty thousand of them. Instead, it infected only a couple thousand. Biological viruses are just as specialized: we can't catch the flu from dogs.

Bureaucrats and managers will forever urge us to standardize on a single type of system: "Let's only use Sun workstations" or "Only buy IBM systems." Yet somehow our communities of computers are a diverse population—with Data General machines sitting next to Digital Vaxes; IBMs connected to Sonys. Like our neighborhoods, electronic communities thrive through diversity.

Another main concern is increased spread of algorithmic bias. In the light of increased usage of machine learning there is a growing awareness of the biases introduced by algorithms. The nature of monocultures exacerbate this problem since it makes the bias systemic and spreading unfair decisions.

Social level concerns

Monocultures may lead to Braess's like paradoxes in which introducing a "better option" (such as a more accurate algorithm) leads to suboptimal monocultural convergence - a monoculture whose correlated nature results in degraded overall quality of the decisions. Since monocultures form in areas of high-stakes decisions such as credit scoring and automated hiring, it is important to achieve optimal decision making.

This scenario can be studied through the lens of mechanism design, in which agents are choosing between a set of algorithms, some of which return correlated outputs. The overall impact of the decision making is measured by social welfare.

Suboptimal monocultures convergence in automated hiring

This section demonstrates the concern of suboptimal monoculture convergence using automated hiring as a case study. Hiring is the process of ranking a group of candidates and hiring the top-valued. In recent years automated hiring (automatically ranking candidates based on their interaction with an AI powered system) became popular.

As shown by Kleinberg,[5] under some assumptions, suboptimal automated hiring monocultures naturally form, namely, choosing the correlated algorithm is a dominant strategy, thus converging to monoculture that leads suboptimal social welfare.

Framework

In this scenario we will consider two firms and a group of candidate with hidden utilities of . For hiring process - each firm will produce a noisy-ranking of the candidates, then each firm (in a random order) hires the first available candidate in their ranking. Each firm can choose to use either an independent human rankers or use a common algorithmic ranking.

The ranking algorithm is modeled as a noisy distribution above permutations of parametrized by an accuracy parameter .

In order for to make sense it should satisfy these conditions:

  1. Differentiability: The probability of each permutation is continues and differentiable in
  2. Asymptotic optimality: For the true ranking :
  3. Monotonicity: The expected utility of the top-ranked candidate gets better as increases, even if any subset of is removed.

These conditions state that a firm should always prefer higher values of , even if it is not first in the selection order.

Both the algorithmic and human ranking methods are of the form of and differ by the accuracy parameters . The algorithmic ranking output is corotated - it always outputs the same permutation. In contrast, a human ranked premutation is drawn from independently for each of firms.

For strategies of the first and second firm, Social welfare is defied as the sum of utilities of the hired candidates.

Conditions to suboptimal convergence

The Braess's like paradox in this framework is suboptimal monocultures converges. That is, using the algorithmic ranking is dominant strategy thus converging toward monoculture yet it yields suboptimal welfare (welfare in a world without algorithmic ranking is higher).

The main theorem proved by Kleinberg[5] of this model is that for any and any noisy ranking family that satisfy these conditions:

  1. Preference for the first position: For all if then .
  2. Preference for weaker competition: For all .

there exists a such that both firms prefer using the sherd algorithmic ranking even though the social welfare is higher when both use the human evaluators. In other words - regardless of the accuracy of the human rankers there exists a more accurate algorithm whose introduction leads to suboptimal monoculture convergence.

The implications of this theorem is that under these conditions, firms will choose to use the algorithmic ranking even though that the correlated nature of algorithmic monocultures degrades total social welfare. Even though algorithmic rankings are more accurate.

The first condition on (Preference for the first position) is equivalent to a preference of firms to have independent ranking (in our setting - non algorithmic). This means that a firm should prefers independent ranking methods given all else is equal.

The intuition behind preference for weaker competition is that when a candidate is removed (hired by a different firm), the best remaining candidate is better in expectation when the removed candidate is chosen based on a less accurate ranking. Thus, a firm should always prefer that its competitors would be less accurate.

These conditions are met for that is the Mallows Model distributions and some types of random utility models (Gaussian or Laplacian noise).

See also

References

  1. ^ Goth, G. (2003). "Addressing the monoculture". IEEE Security & Privacy. 1 (6): 8–10. doi:10.1109/msecp.2003.1253561. ISSN 1540-7993. S2CID 16965084.
  2. ^ Murphy, Hannah (2021-12-14). "Hackers launch more than 1.2m attacks through Log4J flaw". Financial Times. Retrieved 2021-12-17.
  3. ^ Stamp, Mark (2004). "Risks of monoculture". Communications of the ACM. 47 (3): 120. doi:10.1145/971617.971650. ISSN 0001-0782. S2CID 16746625.
  4. ^ Stoll, Clifford (1989). The Cuckoo's Egg. Doubleday. pp. 320–321. ISBN 978-0-307-81942-0.
  5. ^ a b Kleinberg, Jon (2021). "Algorithmic monoculture and social welfare". Proceedings of the National Academy of Sciences. 118 (22). arXiv:2101.05853. Bibcode:2021PNAS..11818340K. doi:10.1073/pnas.2018340118. PMC 8179131. PMID 34035166.