Why GLM?
One quantitative description of inspection capability is the cracksize, a,
which can be detected with at least 90% probability, established with 95% confidence. This
is equivalent to finding a such that the 95% lower confidence bound on Pr(detect | a)
is 0.9. The NDE community calls this cracksize a90/95.
Unfortunately this conveys little about the relationship between size and detectability
since infinitely many POD vs. cracksize curves can share the same a90/95
depending on appropriate combinations of capability (mean
cracksize having 90% POD) and experimental uncertainty. One method used to
establish a90/95 is based on "randomly" selecting 29
specimens, all with the same cracksize, and observing 29 successes in 29 inspections. (No
provision is made for fewer than 29 successes other than a re-test.) Although the maximum
likelihood estimate of the underlying POD would be 1, the conventional
interpretation is POD = 0.9 with 95% confidence, based on simple binomial
calculations.
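The binomial arithmetic behind "29 of 29" can be checked directly. The sketch below is illustrative only, and the function name min_specimens is an assumption of this example rather than part of the method described. If the true POD were exactly 0.9, the chance of 29 independent hits in 29 tries is 0.9^29, roughly 0.047, which falls just under 0.05. With 28 specimens the corresponding value is still above 0.05.

```python
def min_specimens(pod: float = 0.90, confidence: float = 0.95) -> int:
    """Smallest n for which n hits in n tries has probability
    no greater than 1 - confidence when the true POD equals `pod`."""
    n = 1
    while pod ** n > 1.0 - confidence:
        n += 1
    return n

# Probability of a perfect score if the true POD were only 0.9
p_all_hits = 0.9 ** 29
print(f"P(29 hits in 29 | POD = 0.9) = {p_all_hits:.4f}")
print(f"smallest all-hit sample size: {min_specimens()}")
```

Note that the calculation uses only the count of hits at a single cracksize, so it cannot say how POD changes with size.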
The "29 of 29" method is based on untenable statistical underpinnings, yet enjoys widespread acceptance
in the NDE community largely because it is easy to implement. Better methods have been
suggested (Annis et al., 1989) based on inspections of cracks with different sizes and a
GLM modeling procedure. Wide acceptance of these methods has been slow, owing to their
requirement for specialized software. GLM procedures are available in sophisticated
statistics software packages but these are expensive and largely inaccessible to
nonstatisticians. The method described here removes this impediment.
Link Functions:
To begin, define probability of detection, $p_i = POD(a_i)$,
as a function linked to the ith cracksize, $a_i$. Common link
functions for binary data which map $(-\infty < x < \infty)$ into $(0 < y < 1)$
include the probit or inverse Normal function, the logit, logistic or log-odds function,
and the complementary log-log function, often called Weibull by engineers. These are:
$$\text{probit:}\quad f(x) = g(y) = F^{-1}(p)$$
$$\text{logit:}\quad f(x) = g(y) = \ln\{\, p/(1-p) \,\}$$
$$\text{complementary log-log:}\quad f(x) = g(y) = \ln\{\, -\ln(1-p) \,\}$$
where f(x) is any polynomial sum, linear in the
parameters, and F()
is the standard normal cdf. Notice that when g(y) = y, the problem
reduces to an ordinary linear model, y = f(x). Since f(x) = g(y),
then $y = g^{-1}(f(x))$. We will refer to $g^{-1}()$
as the link, and, using the probability of crack detection example, as POD(a).
$$\text{probit(3) link:}\quad POD(a) = 1 - F(\{\log(a) - L\}/S)$$
$$\text{logit link:}\quad POD(a) = \exp\{L0 + S0\,\log(a)\} \,/\, [\,1 + \exp\{L0 + S0\,\log(a)\}\,]$$
$$\text{complementary log-log link:}\quad POD(a) = 1 - \exp\{ -\exp\{L0 + S0\,\log(a)\} \}$$
where L and S are model location and
scale parameters. (Note that F() is NOT a distribution of cracksizes, even though it has the same
mathematical form.) A comparison of the properties of these transforms can be found in
McCullagh and Nelder (1989).
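As a minimal sketch, the three transforms and their inverses can be written with the Python standard library alone. The function names are this example's own, and eta stands for the value of f(x); no particular parameterization of f(x) is assumed. The round trip at the end checks that each inverse undoes its transform.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

# g(p): maps a probability in (0, 1) onto the whole real line
def probit(p):
    return _STD_NORMAL.inv_cdf(p)

def logit(p):
    return math.log(p / (1.0 - p))

def cloglog(p):
    return math.log(-math.log(1.0 - p))

# g^-1(eta): maps eta = f(x) back into (0, 1); the text calls this the link
def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

def inv_logit(eta):
    # algebraically equal to exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

pairs = [(probit, inv_probit), (logit, inv_logit), (cloglog, inv_cloglog)]
for g, g_inv in pairs:
    for p in (0.1, 0.5, 0.9):
        # each inverse should recover the starting probability
        print(g.__name__, p, round(g_inv(g(p)), 6))
```

The probit and logit are symmetric about p = 0.5 (both return 0 there), while the complementary log-log is not.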
The Likelihood Function:
For a given link function, the likelihood of L and
S, based on the result, $y_i$, of inspecting crack $a_i$ is

$$l_i(L, S \mid a_i, y_i) = p_i^{\,y_i} \, (1 - p_i)^{(1 - y_i)} \qquad \text{(equation 1)}$$

which reduces to $p_i$ when $y_i$ is
1 and $(1 - p_i)$ when $y_i$ is 0. This is the key
relationship on which the spreadsheet implementation is built.
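Equation 1 is small enough to check by hand. The sketch below is an illustration only; point_likelihood is a name chosen for this example, and the POD value of 0.8 is made up.

```python
def point_likelihood(p_i: float, y_i: int) -> float:
    """Equation 1: p_i**y_i * (1 - p_i)**(1 - y_i), with y_i = 1 for a hit
    and y_i = 0 for a miss."""
    return (p_i ** y_i) * ((1.0 - p_i) ** (1 - y_i))

# With a model POD of 0.8 for this crack:
hit_value = point_likelihood(0.8, 1)   # a hit contributes p_i
miss_value = point_likelihood(0.8, 0)  # a miss contributes 1 - p_i
print(hit_value, miss_value)
```

Because the exponent is either 0 or 1, one formula serves every row of hit/miss data without a branch.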
A textbook development might proceed to describe the aggregate
likelihood for inspecting N independent cracks of different sizes as
$$L = \prod_{i=1}^{n} p_i \times \prod_{j=1}^{N-n} (1 - p_j)$$
where n is the number of hits
(ones) and N-n is the number of misses (zeros); $p_i$ is the POD
given by the model for the ith hit and $(1 - p_j)$ is the probability
of a miss given by the model for the jth miss. The observation would then be made that this repeated product
would prove computationally onerous, a difficulty greatly simplified by taking the
logarithm, thus transforming the series of products into one of sums:
$$\ln(L) = \sum_{i=1}^{n} \ln(p_i) + \sum_{j=1}^{N-n} \ln(1 - p_j) \qquad \text{(equation 2)}$$
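One practical reason to prefer the sum of logarithms (an observation of this example, not a claim made in the text) is that a long product of probabilities underflows floating-point arithmetic, while the sum stays well behaved. The sketch below uses made-up data: 5000 inspections with a model POD of 0.7 and alternating hits and misses. The function names are hypothetical.

```python
import math

def likelihood_product(pods, hits):
    """Aggregate likelihood as a running product of p_i or (1 - p_i)."""
    total = 1.0
    for p, y in zip(pods, hits):
        total *= p if y == 1 else (1.0 - p)
    return total

def log_likelihood(pods, hits):
    """Equation 2: the same quantity on the log scale, as a running sum."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(pods, hits))

# Hypothetical data, for illustration only
pods = [0.7] * 5000
hits = [1, 0] * 2500

product_value = likelihood_product(pods, hits)  # too small for a double
log_value = log_likelihood(pods, hits)          # an ordinary negative number
print(product_value, log_value)

# For a short series the two forms agree
small = likelihood_product(pods[:6], hits[:6])
print(small, math.exp(log_likelihood(pods[:6], hits[:6])))
```

Since the logarithm is monotonic, the parameter values that maximize equation 2 also maximize the product.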
Finally, the model parameters would
be estimated so as to maximize this likelihood by differentiating equation 2 with respect
to the model parameters, which enter through the link, equating these derivatives to zero,
and solving the resulting equations simultaneously. Fortunately, the P/C spreadsheet can
streamline this tedious arithmetic by simply using the logarithms of the individual
likelihoods (equation 1) and the built-in SOLVER algorithm.
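The same idea can be sketched outside a spreadsheet. The example below is a sketch under stated assumptions, not the spreadsheet implementation: it uses the logit link, swaps in a Newton-Raphson search with step halving in place of SOLVER, and runs on made-up hit/miss data. The names fit_logit_pod, pod_logit, l0 and s0 are this example's own (l0 and s0 play the roles of L0 and S0 in the logit link).

```python
import math

def pod_logit(a, l0, s0):
    """Logit-link POD(a) = exp(eta) / (1 + exp(eta)), eta = l0 + s0*log(a)."""
    eta = l0 + s0 * math.log(a)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def log_likelihood(params, sizes, hits):
    """Equation 2, accumulated row by row from the logs of equation 1."""
    l0, s0 = params
    total = 0.0
    for a, y in zip(sizes, hits):
        p = pod_logit(a, l0, s0)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logit_pod(sizes, hits, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood over (l0, s0) by Newton-Raphson.
    Assumes at least two distinct sizes and hits that overlap misses."""
    l0, s0 = 0.0, 0.0
    best = log_likelihood((l0, s0), sizes, hits)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for a, y in zip(sizes, hits):
            x = math.log(a)
            p = pod_logit(a, l0, s0)
            w = p * (1.0 - p)
            g0 += y - p            # derivative with respect to l0
            g1 += (y - p) * x      # derivative with respect to s0
            h00 += w               # negative second derivatives
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        trial = (l0 + d0, s0 + d1)
        value = log_likelihood(trial, sizes, hits)
        while value < best and step > 1e-8:  # halve the step if it overshoots
            step /= 2.0
            trial = (l0 + step * d0, s0 + step * d1)
            value = log_likelihood(trial, sizes, hits)
        l0, s0 = trial
        improvement = value - best
        best = value
        if improvement < tol:
            break
    return l0, s0, best

# Hypothetical inspection results: crack sizes and hit (1) / miss (0)
sizes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
hits = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

l0, s0, max_loglik = fit_logit_pod(sizes, hits)
print(f"l0 = {l0:.3f}, s0 = {s0:.3f}, ln(L) = {max_loglik:.3f}")
```

At the maximum the derivatives of equation 2 are zero, which is the condition the textbook approach solves for analytically. Any general-purpose optimizer that adjusts the two parameters to maximize the summed log-likelihood should land on the same values.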