
Web Topic 8.10
Signal Detection Theory and Signal Effectiveness


The major problem with using receiver responses as an index of signal effectiveness is that responses confound the effects of the amount of information provided by a signal, the receiver’s estimates of prior probabilities, and the relative payoffs of alternative actions. A female faced with choosing between two displaying males may fail to discriminate between their displays either (a) because the differences are too small for her to detect, or (b) because it does not pay for her to expend the effort to compare them.

Signal detection theory provides tools for separating the roles of the amount of information in signals from the value of that information. It allows one to compute an index, called receiver sensitivity and denoted by d', that can be used as another measure of the effectiveness of a signal set. The following discussion assumes that the reader is familiar with the general approach of signal detection theory as summarized in Web Topic 8.4.

Logic of method

Consider a hypothetical example in which females seek to identify a healthy male instead of a sick male as a mate. All males sing songs and song speed varies continuously among males. However, the distribution of song speeds for healthy males has a higher mean value than that for sick males. Let male song speed be denoted by w. The task for each female is to define a “red line” at some critical value wc such that any male whose song speed exceeds wc will be considered an acceptable mate and any male with a slower speed will be rejected. The optimal value of wc will depend upon a given female’s estimates of the prior probabilities of sick and healthy males, and the relative payoffs to her of correct versus wrong decisions. Thus different females, or the same female at different times, may set different values of wc.

If the distributions of song speed for sick and healthy males overlap at all, a female applying her particular wc will make some correct choices and some errors. Let Phit denote the fraction of the time that a given female correctly selects a healthy male for a mate, and Pfalse alarm the fraction of the time that she mistakenly selects a sick male for a mate because his song speed is greater than wc.

Now consider three different populations of males that vary in the degree to which the distributions of song speed for sick and healthy males overlap. In each example, a graph on the left plots song speed (w) on the horizontal axis and, on the vertical axis, the probability that a given type of male (sick or healthy) will sing at that speed. In all cases, we shall assume the distributions are roughly bell-shaped with the same variances. The mean song speed for each distribution is the value of w directly beneath the peak of its bell curve. Consider first a population (A) in which there is little or no difference in the mean values of sick and healthy male song speeds: the two distributions are completely overlapping. This is shown on the left graph below:

Suppose we select pairs of healthy and sick males at random from this population and record their songs. We then play the two songs back to a test female from that population and see which speaker she approaches. We do this multiple times with different pairs of randomly sampled males to get an estimate of how often she correctly selects the healthy males (Phit) and how often she incorrectly selects sick ones (Pfalse alarm). The values of these two measures will depend on that female’s red line value of wc. We plot the two values on the graph on the right and label it wc1. We then select a second test female who, because she is of a different age, is in a different nutritional condition, or has different prior probabilities, is likely to have a different red line value of wc, and repeat our experiment. We then plot her values of Phit and Pfalse alarm on the graph and label them. Adding values for more females allows us to see the relationship between Phit and Pfalse alarm as the relative payoffs to females of right versus wrong decisions change. The graph on the right is called a receiver operating characteristic (ROC) graph.

If the distributions of song speed for sick and healthy males are completely overlapping, it will be impossible for females to make accurate discriminations between them using song speed: there is no correlation between song speed and health, so attending to song speed provides no information to females. In this case, the ROC graph is a straight line as shown in this example: for every value of wc, Phit equals Pfalse alarm. Increasing one results in an equal increase in the other.

Next, consider a population (B) in which song speed is somewhat correlated with male health. This implies that the two distributions are not entirely overlapping, and there is thus a non-zero difference between their means. Let us call that difference d'.

If we now undertake playbacks of sick and healthy male songs to a series of females, we will get the plot of Phit versus Pfalse alarm shown on the right. Now, the ROC relationship between Phit and Pfalse alarm bends up towards the upper left corner of the graph. This means that for all values of wc, the cost to a female in terms of numbers of false alarms is much lower for every correct choice than was the case in population A. This is because there is now a significant correlation between male song speed and male health, and the information provided by songs reduces errors in female decisions.

In population (C), the correlation between male song speed and male health is even stronger than in population (B). The difference between distribution means, d', is a much larger number and the curvature of the ROC plot towards the upper left corner of the graph is even stronger:

These examples suggest that one should be able to estimate the difference between the distribution means, d', by estimating the degree to which the curvature in the ROC plots deviates from the straight line expected when there is no correlation between signal and condition. And surprisingly, this measure of the amount of information can be obtained using receiver responses. Even more surprising is the observation that if both distributions are bell-shaped and have similar variances, any pair of Phit and Pfalse alarm values will fall on only one possible ROC curve corresponding to only one d' value. This means that we could estimate d' from examining the Phit and Pfalse alarm values of only a single female.
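The three populations can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes that song speed is already expressed in standard-deviation units, that sick males follow Normal(0, 1) and healthy males follow Normal(d', 1), and it uses hypothetical separations of 0, 1.0, and 2.5 for populations A, B, and C. The function name `roc_points` and the list of red-line values are likewise invented for the example.

```python
from statistics import NormalDist

def roc_points(d_prime, criteria):
    """Return (P_false_alarm, P_hit) pairs, one per red-line value w_c.

    Assumption: sick males ~ Normal(0, 1), healthy males ~ Normal(d_prime, 1).
    """
    sick = NormalDist(mu=0.0, sigma=1.0)
    healthy = NormalDist(mu=d_prime, sigma=1.0)
    points = []
    for w_c in criteria:
        p_false_alarm = 1.0 - sick.cdf(w_c)  # sick male sings faster than w_c
        p_hit = 1.0 - healthy.cdf(w_c)       # healthy male sings faster than w_c
        points.append((p_false_alarm, p_hit))
    return points

# Hypothetical red lines, standing in for females with different payoffs or priors.
criteria = [-1.0, 0.0, 1.0, 2.0]

# Hypothetical d' values for populations A, B, and C.
for label, d_prime in [("A", 0.0), ("B", 1.0), ("C", 2.5)]:
    rounded = [(round(fa, 3), round(hit, 3)) for fa, hit in roc_points(d_prime, criteria)]
    print(label, rounded)
```

With a separation of zero, every point has equal hit and false alarm rates and so lies on the diagonal. As the separation grows, the hit rate at any given false alarm rate rises and the points move toward the upper left corner.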

Standard units

The major point of measures of signal effectiveness is to be able to compare one signal set to another, or perhaps obtain an average value for how effective most threat signals or most alarm signals are. Clearly, one cannot compare d' values if the units for one signal set are songs/second and those for another are brightness of red plumage coloration. As long as the relevant distributions are bell-shaped (Gaussian) or can be made so with appropriate transformations, one can convert the w values in any distribution plot into z scores. This is a scaling widely used in statistics and computed as follows. If the mean of a normal distribution is μ, and its standard deviation is σ (where σ = √variance), then the z score for w is

$$z(w) = \frac{w - \mu}{\sigma}$$

We can thus replot any original probability distribution of w values as a probability distribution of z(w) values. This distribution will have its maximum when z(w) = 0 (i.e., when w = μ), and all z(w) values to the left of this peak will be negative (i.e., w < μ), and all z(w) values to the right of the peak will be positive (w > μ). The difference between the means of two z-scaled distributions, d', will then be given as a multiple of their common standard deviation (if it is the same for both), or as a multiple of their average standard deviation (if they are different). Because d' is measured in standard deviation units, decreasing the average standard deviation of the distributions is equivalent to increasing the distance between their means: either reduces overlap between the distributions, and thus reduces errors.

Let the means for the two probability distributions be μ1 for healthy males and μ2 for sick males. We wish to convert the w axis for each distribution into z(w) values. For the first distribution,

$$z_1(w) = \frac{w - \mu_1}{\sigma}$$

and for the second distribution and the same w,

$$z_2(w) = \frac{w - \mu_2}{\sigma}$$

We note that

$$z_2(w) - z_1(w) = \frac{\mu_1 - \mu_2}{\sigma} = d'$$

which is the measure we seek. We can thus estimate d' if we can estimate z1(w) and z2(w) from observations of a female’s decisions.
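A small numerical sketch may make the unit-free property concrete. All numbers below are hypothetical (a song-speed population in songs/second and a plumage population in arbitrary brightness units), and the helper names `z_score` and `d_prime_from_distributions` are invented for the illustration.

```python
def z_score(w, mu, sigma):
    """Standardize a raw signal value w against a distribution's mean and SD."""
    return (w - mu) / sigma

def d_prime_from_distributions(mu_healthy, mu_sick, sigma):
    """Separation of the two means in units of their common standard deviation."""
    return (mu_healthy - mu_sick) / sigma

# Hypothetical song-speed population (songs/second).
mu_healthy, mu_sick, sigma = 6.0, 4.0, 2.0

# The difference between the sick-male and healthy-male z scores of the same w
# does not depend on which w is chosen; it is always d'.
differences = [
    z_score(w, mu_sick, sigma) - z_score(w, mu_healthy, sigma)
    for w in (3.0, 5.0, 7.5)
]
song_d_prime = d_prime_from_distributions(mu_healthy, mu_sick, sigma)

# Hypothetical plumage population (arbitrary brightness units).
plumage_d_prime = d_prime_from_distributions(0.75, 0.5, 0.25)

# Halving the common standard deviation doubles d' without moving the means.
narrow_song_d_prime = d_prime_from_distributions(mu_healthy, mu_sick, sigma / 2)

print(differences, song_d_prime, plumage_d_prime, narrow_song_d_prime)
```

Although the two hypothetical signal sets are measured in unrelated units, both yield the same d', so they can be compared directly.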

Applying the method

Suppose we perform our playback experiments on a female using songs of sick and healthy males from the same population. We now have values for Phit and Pfalse alarm for that female. Most statistics texts have tables in the back listing the area below a normal probability curve to the right or the left of some cutoff value of a z score. Usually the reader has a z value and wants to know the corresponding probability. In our case, we know the probability, but would like to know the corresponding z score. We thus locate the measured probabilities Phit and Pfalse alarm in this table, and then find the corresponding values zhit and zfalse alarm respectively. Since these z scores are based upon the same w, in this case that female’s wc, we can use their difference to compute d' = zhit – zfalse alarm. To provide a feeling for the scale of this measure, a receiver that correctly identifies both sick and healthy males 50% of the time (i.e., at chance) has a d' = 0, one that is accurate 70% of the time has a d' = 1.05, one that is accurate 90% of the time has a d' = 2.56, and one that is accurate 99% of the time has a d' = 4.65.
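The table lookup can be replaced by an inverse normal function. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library (Python 3.8 or later) in place of the printed table; the function name `d_prime` is an assumption made for the example, and the calculation presumes normal distributions with equal variances.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def d_prime(p_hit, p_false_alarm):
    """Receiver sensitivity from one pair of hit and false alarm rates.

    Both rates must lie strictly between 0 and 1; inv_cdf raises an
    error for 0 or 1, where the z score would be infinite.
    """
    z_hit = STANDARD_NORMAL.inv_cdf(p_hit)
    z_false_alarm = STANDARD_NORMAL.inv_cdf(p_false_alarm)
    return z_hit - z_false_alarm

# A receiver that is equally accurate on both male types has
# P_false_alarm = 1 - P_hit.
for accuracy in (0.5, 0.7, 0.9, 0.99):
    print(accuracy, round(d_prime(accuracy, 1.0 - accuracy), 2))
```

In practice, observed rates of exactly 0 or 1 need some adjustment before this calculation can be applied; how to do so is outside the scope of this sketch.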

Additional measures from signal detection theory

An additional parameter of signal detection theory that can be extracted from Phit and Pfalse alarm data is bias: this is the degree to which a female is conservative about accepting males, and thus avoids false alarm errors at the expense of having more miss errors. It thus reflects the value of information independently of the amount of information. The simplest measure of bias is the criterion index c: it can be computed as c = – 0.5 (zhit + zfalse alarm). A female that has no bias accepts equal numbers of false alarms and miss errors (i.e., Pfalse alarm = 1 – Phit), and her bias is c = 0. When females avoid false alarms, c > 0, and when they avoid misses, c < 0. For any observed combination of Phit and Pfalse alarm, c depends upon the distance between that point and the diagonal running from the top left to the lower right corner of the ROC plot.
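The sign behavior of c can be checked with a short sketch. The three females below are hypothetical, the function name `criterion_c` is invented for the illustration, and the z scores come from `statistics.NormalDist().inv_cdf` rather than a printed table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def criterion_c(p_hit, p_false_alarm):
    """Criterion index c = -0.5 * (z_hit + z_false_alarm)."""
    z_hit = STANDARD_NORMAL.inv_cdf(p_hit)
    z_false_alarm = STANDARD_NORMAL.inv_cdf(p_false_alarm)
    return -0.5 * (z_hit + z_false_alarm)

# Hypothetical females, given as (P_hit, P_false_alarm).
unbiased = criterion_c(0.7, 0.3)      # misses (0.3) equal false alarms (0.3)
conservative = criterion_c(0.5, 0.1)  # few false alarms, many misses
liberal = criterion_c(0.9, 0.5)       # few misses, many false alarms

print(round(unbiased, 3), round(conservative, 3), round(liberal, 3))
```

The unbiased female yields a value of c at zero (up to rounding), the conservative female a positive value, and the liberal female a negative value.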

It is also possible to estimate the likelihood ratio parameter β, which is the ratio of the likelihood of observing song speed wc from a healthy male to the likelihood of observing it from a sick male (see Web Topic 8.4 for derivation). It can be computed using ln(β) = c d' if the female is making optimal decisions. Using hit rates and false alarm rates, we can rewrite this as ln(β) = – 0.5 [(zhit)² – (zfalse alarm)²].
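As a consistency check, ln(β) computed from rates can be compared with the likelihood ratio evaluated directly at the red line. The product c d' expands to – 0.5 times the difference between the squared z scores, which is the form used below. The sketch assumes sick males follow Normal(0, 1) and healthy males follow Normal(1.5, 1), with a hypothetical red line at 1.2; the function name `ln_beta` is invented for the example.

```python
from math import log
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def ln_beta(p_hit, p_false_alarm):
    """ln(beta) = c * d' = -0.5 * (z_hit**2 - z_false_alarm**2)."""
    z_hit = STANDARD_NORMAL.inv_cdf(p_hit)
    z_false_alarm = STANDARD_NORMAL.inv_cdf(p_false_alarm)
    return -0.5 * (z_hit ** 2 - z_false_alarm ** 2)

# Hypothetical population and red line, in standard-deviation units.
sick = NormalDist(mu=0.0, sigma=1.0)
healthy = NormalDist(mu=1.5, sigma=1.0)
w_c = 1.2

# Rates this female would show in the long run.
p_hit = 1.0 - healthy.cdf(w_c)
p_false_alarm = 1.0 - sick.cdf(w_c)

from_rates = ln_beta(p_hit, p_false_alarm)
from_densities = log(healthy.pdf(w_c) / sick.pdf(w_c))

print(round(from_rates, 4), round(from_densities, 4))
```

The two values agree up to floating-point error, which is what the equal-variance normal model predicts.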

Non-normal distributions or unequal variances

If we know that the distributions of w for healthy and sick males are normally distributed with equal variances, we saw that we do not have to compute an entire ROC curve to obtain estimates of d', c, and β: instead, one pair of hit and false alarm rates will do. However, distributions may not be normal or may not have equal variances. The only way to detect this is to plot the ROC curve by obtaining data from multiple females or by manipulating one female's prior probabilities or payoff values. We can still compute a single d', c, and β in such a situation; however, the analysis is more complicated than that given here. See Macmillan and Creelman (1991) for details.
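The following sketch illustrates why a single pair of rates is not enough when variances differ. It applies the single-point formula d' = zhit – zfalse alarm at several hypothetical red lines, first to a population with equal variances and then to one in which healthy males are twice as variable as sick males. All distribution parameters and the function name `single_point_d_prime` are assumptions made for the illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def single_point_d_prime(healthy, sick, w_c):
    """Apply d' = z_hit - z_false_alarm at one red-line value w_c."""
    p_hit = 1.0 - healthy.cdf(w_c)
    p_false_alarm = 1.0 - sick.cdf(w_c)
    return STANDARD_NORMAL.inv_cdf(p_hit) - STANDARD_NORMAL.inv_cdf(p_false_alarm)

criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical red lines
sick = NormalDist(mu=0.0, sigma=1.0)

# Equal variances: every red line gives the same estimate.
equal_var = [single_point_d_prime(NormalDist(1.5, 1.0), sick, w) for w in criteria]

# Unequal variances: the estimate drifts with the red line.
unequal_var = [single_point_d_prime(NormalDist(1.5, 2.0), sick, w) for w in criteria]

print([round(d, 3) for d in equal_var])
print([round(d, 3) for d in unequal_var])
```

Under equal variances, all red lines return the same estimate. Under unequal variances, the estimate changes with the red line, so data from several females (or several conditions) are needed to detect the problem.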

Further reading

Macmillan, N. A. and C. D. Creelman. 2004. Detection Theory: A User’s Guide. 2nd Edition. New York: Cambridge University Press.

Wiley, R. H. 1994. Errors, exaggeration, and deception in animal communication. In Behavioral Mechanisms in Evolutionary Ecology (L. A. Real, ed.). Chicago: University of Chicago Press. pp. 157–189.