Glossary

A

Aleatory Uncertainty – uncertainty inherent in the process itself (i.e. ‘inherent uncertainty’). Effectively irreducible, in contrast to epistemic uncertainty.

B

Bayesian Belief Network (also Belief Network or Bayesian Network) – graphical representation of a probabilistic dependency model. It consists of a set of nodes representing variables, interconnected by arcs representing causal relationships between the variables.
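
A minimal two-node illustration (Rain → WetGrass) with made-up probabilities, showing how beliefs propagate through the arcs; the variable names and numbers are illustrative, not from the text:

```python
# Two-node belief network: Rain -> WetGrass, with illustrative probabilities.
p_rain = 0.2                  # Pr{Rain}
p_wet_given_rain = 0.9        # Pr{WetGrass | Rain}
p_wet_given_dry = 0.1         # Pr{WetGrass | not Rain}

# Marginal probability of wet grass via the Law of Total Probability.
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)   # 0.26

# Belief updating: Pr{Rain | WetGrass} by Bayes' theorem.
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
```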

Behavioural Model – a model that satisfies the limits of acceptability specified by the user. The limits of acceptability, or behavioural threshold, are often subjectively defined and represent the model performance required to provide an adequate representation of the system being modelled. Typically, behavioural models are selected from an ensemble of model simulations by comparing model output with observational data – using one or more performance metrics and assessing whether the model satisfies the defined levels of acceptability. Behavioural models are then retained for further analysis.

Boolean Logic – the classical logical calculus in which Truth(X) is restricted to 0 (FALSE) and 1 (TRUE).

Boundary/Initial Conditions – the terms boundary and initial may be used interchangeably to refer to the initial values of elements that may change inside the model. Boundary conditions may also refer to values along or across the boundaries of the model.​

C

Calibration – learning the parameters of a model from observations of the system that it represents. There are a number of approaches, but usually they represent variations on probabilistic inversion, with more or less sophisticated accounting for structural uncertainty. Probabilistic inversion uses Bayes’s Theorem to compute the posterior distribution Pr(theta* | z), where theta* is the ‘correct’ value of the model parameters, and z are the observations. This distribution may be summarised, e.g. by its modal value, if a plug-in estimate of the parameters is required. One special case of calibration is to minimise the sum of squared differences between the model output and the observations. This makes some usually highly naive assumptions about the model’s structural uncertainty and the measurement errors.
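The least-squares special case mentioned above can be sketched in a few lines; the linear model, the ‘true’ parameters and the noise level here are all hypothetical:

```python
import numpy as np

# Hypothetical toy model: y = a*x + b, with unknown parameters theta = (a, b).
def model(theta, x):
    a, b = theta
    return a * x + b

# Synthetic observations z generated from 'true' parameters (2.0, 1.0)
# plus measurement noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
z = model((2.0, 1.0), x) + rng.normal(scale=0.1, size=x.size)

# Least-squares calibration: minimise the sum of squared differences
# between model output and observations. For a linear model this reduces
# to an ordinary least-squares solve.
A = np.column_stack([x, np.ones_like(x)])
theta_hat, *_ = np.linalg.lstsq(A, z, rcond=None)
```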

Cat Model – catastrophe model, showing the probability of exceedance in a specified period, such as one year (vertical axis) against the loss (horizontal axis). Used by actuaries to advise on decisions regarding levels of risk transference (annual average loss is used for pricing, end/tail of the curve is used to assess risk). The area under the cat curve is equal to the expected loss, or risk.

Confidence Interval – used in the context of frequentist inference, a confidence interval is a range of values, calculated from the sample of observations by a procedure that, over repeated sampling, contains the true parameter value with a particular frequency (usually set to be 90%, 95% or 99%). See credible interval.

Credible Interval – used in the context of Bayesian inference, the credible interval is a range of values of a posterior distribution which is such that the density at any point inside the interval is greater than the density at any point outside, and the area under the curve for that interval is equal to a particular probability (usually set to be 90%, 95% or 99%). A crucial difference from confidence intervals used in the frequentist approach is that the credible interval specifies the range within which the parameters lie with a certain probability. See confidence interval.
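A sketch of computing a credible interval from posterior samples; for simplicity this uses the equal-tailed interval (the 2.5% and 97.5% quantiles), which coincides with the highest-density interval described above only for symmetric unimodal posteriors. The posterior here is a stand-in normal, not from any real analysis:

```python
import numpy as np

# Stand-in for samples from a posterior distribution.
rng = np.random.default_rng(1)
posterior_samples = rng.normal(loc=5.0, scale=2.0, size=100_000)

# Equal-tailed 95% credible interval: the 2.5% and 97.5% quantiles.
lo, hi = np.quantile(posterior_samples, [0.025, 0.975])
```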

D

Deep Uncertainty – uncertainty that cannot be described adequately in statistical terms or by a set of scenarios because different descriptions of the mechanisms and functional relationships being studied are available and there is little scientific basis for placing believable probabilities upon them (see Walker, Marchau and Swanson (2010). Addressing deep uncertainty using adaptive policies: Introduction to section 2, Technological Forecasting & Social Change, 77, 917–923).

Deterministic – contains no randomness; the opposite of stochastic.

Deterministic Model – a model in which for a given input, the output is always the same.  See stochastic model.

Distribution Function – function giving, for each value x, the probability that a random variable takes a value less than or equal to x, i.e. F(x) = Pr{X <= x}.

E

Emerging Events – unforeseen events.

Epistemic Uncertainty – uncertainty due to incomplete knowledge. In principle reducible, in contrast to aleatory uncertainty.

Estimator – any quantity that provides estimates of the value of an unknown quantity in the population from a sample of the same population.

Exposure – refers to the inventory of elements in an area in which hazard events may occur. If population and economic resources were not located in (exposed to) potentially dangerous settings, no problem of disaster risk would exist.  It is used in the insurance industry to indicate the presence and size of insurance contracts for damage to property. It is the possibility of loss based on degree of vulnerability. Loss (at a location) is exposure × vulnerability.

It is possible to be exposed but not vulnerable, for example by living in a floodplain but having sufficient means to modify building structure and behaviour to mitigate potential loss. However to be vulnerable to an extreme event, it is necessary to also be exposed.

Exposure Data – input into Cat Model, such as: site locations (a.k.a. geocoding data); physical characteristics of structures (e.g. construction, occupancy, year built, number of stories); financial terms of insurance coverage (e.g. coverage value, limit, deductible).

F

Frequentist vs Bayesian – the frequentist definition sees probability as the long-run expected frequency of occurrence of an event. So, for example, the probability of occurrence of an event A is defined as P(A) = n/N, where n is the number of times event A occurs in N opportunities. The Bayesian view of probability is related to the degree of belief, and it is a measure of the plausibility of an event given incomplete knowledge. To evaluate the probability of a hypothesis from a Bayesian viewpoint, it is necessary to specify some prior probability, which is then updated in the light of the observed data through Bayes’ theorem.
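The contrast can be made concrete with a coin-flipping sketch: the frequentist estimate is the relative frequency n/N, while a Bayesian updates a Beta prior to a posterior via the standard conjugate result (the prior and data here are illustrative):

```python
# Bayesian updating with a conjugate Beta prior for a coin's head-probability p.
# Prior Beta(1, 1) (uniform); observe 7 heads in 10 flips.
# The posterior is then Beta(1 + 7, 1 + 3) = Beta(8, 4).
prior_a, prior_b = 1.0, 1.0
heads, tails = 7, 3
post_a, post_b = prior_a + heads, prior_b + tails

# Posterior mean of p under Beta(a, b) is a / (a + b).
posterior_mean = post_a / (post_a + post_b)   # 8/12

# Frequentist point estimate: the long-run relative frequency n / N.
freq_estimate = heads / (heads + tails)       # 7/10
```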

Fuzzy Set Theory – an uncertainty calculus in which Truth(X) is not restricted to 0 (FALSE) and 1 (TRUE). The primitive Boolean Logical operations of AND, OR, and NOT are retained. For example, in Boolean Logic Truth(X AND Y) = min{ Truth(X), Truth(Y) }. This still holds in Fuzzy Logic, except that the result may lie in the range [0, 1], rather than in the set {0, 1}.
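The min/max/complement operators described above are a one-liner each; with crisp 0/1 truth values they reduce to the Boolean operations:

```python
# Fuzzy operators on truth values in [0, 1], in the min/max form above.
def fuzzy_and(x, y):
    return min(x, y)

def fuzzy_or(x, y):
    return max(x, y)

def fuzzy_not(x):
    return 1.0 - x

# With crisp truth values 0 and 1 these reduce to Boolean AND, OR, NOT;
# with graded truth values the result is graded too.
truth = fuzzy_and(0.8, 0.6)   # 0.6
```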

G

H

Hazard – source of potential harm or damage (harm: to living things, damage: to property). See loss.

Hazard Event – a single occurrence of a hazard; together the collection of hazard events makes up the hazard outcome.

Hazard Footprint – the manifestation of a hazard event in space and time. For example, an earthquake at a fault creates seismic waves that travel through the surrounding area, and so the displacement of the surface of the ground at a given location will depend on the location and the elapsed time since the event.

Hazard Function – confusingly, the event rate of a stochastic process at time t, conditional on survival until at least time t. This ‘hazard’ has nothing to do with natural hazards.

Hazard Outcome – the set of hazard events over some location and time interval. The hazard outcome is a realisation of the stochastic process that describes a hazard’s aleatory uncertainty.

Hypothesis Testing – statistical method for testing a hypothesis about a parameter in a population using data measured in a sample.

I

Imprint Operator – a neologism (newly-coined word) to describe the subject-specific experience of a hazard event (or series of events). See Loss.

Inference – body of statistical techniques that deal with the information from a sample in order to draw conclusions about the whole population.

Inherent Uncertainty – see aleatory uncertainty.

Initial/Boundary Conditions – the terms boundary and initial may be used interchangeably to refer to the initial values of elements that may change inside the model. Boundary conditions may also refer to values along or across the boundaries of the model.​

Input Forcing/Variable – the data used to provide the input conditions required to drive a model. For example, the input forcing for a rainfall-runoff model would be the rainfall.

Irreducible Uncertainty – see aleatory uncertainty.

J

K

L

Likelihood Function – function indicating how likely a particular population is to produce an observed sample. Since the logarithm is a monotonically increasing function, maximising the likelihood is equivalent to maximising its logarithm, called the log-likelihood, which is usually of simpler form.
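A sketch of maximising a log-likelihood over a grid, for a normal model with known unit variance; maximising over the mean recovers (approximately) the sample mean. The data and grid are illustrative:

```python
import numpy as np

# Synthetic sample from a normal population with mean 3, sd 1.
rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.0, size=200)

def log_likelihood(mu, data):
    # Normal log-likelihood with sigma = 1, up to an additive constant.
    return -0.5 * np.sum((data - mu) ** 2)

# Evaluate on a grid of candidate means; the maximiser approximates the MLE,
# which for this model is exactly the sample mean.
grid = np.linspace(0.0, 6.0, 601)
mle = grid[np.argmax([log_likelihood(m, data) for m in grid])]
```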

Loss – generic term for quantification of the negative impact of a hazard outcome on an entity. Thus ‘loss of life’, ‘loss of environmental services’, ‘loss of property’, ‘loss of structural integrity’, ‘loss of money’. Risk is defined as expected loss.

M

Markov Chain Monte Carlo – a class of computer algorithms that generates a Markov Chain (e.g. a random walk) in which the target distribution is the stationary distribution. Samples from the Markov Chain represent dependent realisations of the target distribution. See also Monte Carlo Method.
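A minimal sketch of one such algorithm (a random-walk Metropolis sampler), here targeting a standard normal density for illustration; note the successive states are dependent, as described above:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    # Log of the N(0, 1) density, up to an additive constant.
    return -0.5 * x * x

x = 0.0
chain = []
for _ in range(50_000):
    proposal = x + rng.normal(scale=1.0)   # random-walk proposal
    # Accept with probability min(1, target(proposal)/target(x)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    chain.append(x)                        # keep current state either way

chain = np.array(chain)
# chain.mean() and chain.std() approximate the target's mean 0 and sd 1.
```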

Mean – arithmetic average of all values.

Model Limitations – accounting for limitations in the model when the model is used to predict the underlying system. All models are abstractions, and hence the parameters in the model do not necessarily relate directly to observable quantities in the underlying system. Moreover, often the model solution can only be found approximately (e.g. for models expressed as differential equations). Thus models all have parametric uncertainty. Additionally, even were the correct value of the parameters for the model to be known, then the model evaluated at those parameter values would not necessarily exactly represent the system. Thus all models also have structural uncertainty. These two sources of uncertainty can be represented probabilistically. Denote the model parameters as theta, and the ‘correct’ value of these parameters as theta*. Then the joint distribution over the system values Y and the correct model parameters theta* can be written in factorised form Pr(Y, theta*) = Pr(Y | theta*) Pr(theta*). The first term represents structural uncertainty, and the second represents parametric uncertainty. See also calibration.

Monte Carlo Method – a class of computer algorithms that repeats a calculation over a collection of realisations sampled independently from the same underlying distribution. The output is a collection of independent values used to proxy a distribution. The consistency of estimators (statistics used to estimate unknown parameters) based on the output of the Monte Carlo method is justified by the Weak Law of Large Numbers, which states that the sample average converges in probability towards the expected value. See also Markov Chain Monte Carlo, stochastic modelling, probabilistic modelling.
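A minimal sketch: estimating E[g(X)] for X ~ Uniform(0, 1) with g(x) = x², whose true value is 1/3. By the Weak Law of Large Numbers cited above, the sample average converges in probability to this expectation:

```python
import numpy as np

# Monte Carlo estimate of E[X^2] for X ~ Uniform(0, 1); true value is 1/3.
rng = np.random.default_rng(4)
samples = rng.uniform(size=1_000_000)   # independent realisations
estimate = (samples ** 2).mean()        # sample average as estimator
```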

N

Non-Stationary Process – a process that is not a stationary process. Examples of non-stationary processes are random walks with or without a drift (a slow steady change) and processes with deterministic trends (a constant rate of change, positive or negative, in time and/or space).

O

Ontological Uncertainty – uncertainty in the very definition of the quantities of interest (for instance, the meaning of the variables in a model). It may arise from mismatches in perception of the same problem by different stakeholders or when the definition of variables implies subjective values. It may be considered a special type of epistemic uncertainty in those cases where it might be resolved by clarifying the conceptualization of the problem.

Output – data or predictions that are calculated by a model and are a function of the model input, structure and parameters.

P

Parameter – a notable characteristic that defines a system and helps determine its behaviour. Parameters calculated from the data are only estimates of the true value, thus uncertainty in the parameter values is one of the major sources of uncertainty in the model outputs. Parameters may have a physical explanation, and may be determined by measurement. But often they do not and are simply used to account for unmeasurable or poorly understood processes. Sensitivity analysis or Monte Carlo sampling is often required to explore which parameters improve the representation of the system being modelled and which can be ignored as having little effect on the outcome.

Parametric Uncertainty – in models of complex processes, uncertainty about the model parameters.  See structural uncertainty, model limitations.

PDF – probability density function. For absolutely continuous random quantities, the function f(x) for which Pr{x < X <= x + dx} = f(x) dx. Found by differentiating the distribution function.

Population – complete set of objects of a similar nature which is of interest.

Posterior Distribution – in Bayesian inference, it is a conditional probability distribution summarizing the state of knowledge about unknown quantities (such as parameters, missing data, latent variables and models) given the observed data.

Prior Distribution – in Bayesian inference, it is the information about an unknown parameter that is combined with the probability distribution of the data (likelihood) to provide the posterior distribution which is used for future inference of the parameter.

Probabilistic Modelling – see stochastic modelling.

Probability – a formal framework for describing uncertainty. The probability calculus describes the rules by which probabilities are manipulated. These rules are accepted by all adherents (and are consistent with Boolean Logic), even though there are several radically different interpretations of probability itself. The most common ones are: classical probability (deals in equally-likely events), frequentist probability (deals in collections of similarly-distributed events), Bayesian probability (uses probability to quantify degree of belief), and logical probability (like Bayesian probability, except it makes the additional assertion that degrees of belief need not be subjective). There is a long-running debate about the propriety of Bayesian probability in science. Partly this is due to de Finetti’s inflammatory use of the word “subjective” (which upsets scientists who would like to be “objective”). Partly it is due to the requirement that all uncertainties must be quantified as probability distributions, which seems unreasonably hard in many situations.

Probability Calculus – the basic rules for assessing probabilities:

The starting-point is a finite set of primitive (basic) propositions, Omega. The probability calculus gives the rules that any probability assessment on subsets of Omega must satisfy in order to be logically consistent. These are: (1) Pr{A} >= 0 for all subsets A; (2) Pr{Omega} = 1; (3) if A and B are disjoint (have no element in common), then Pr{A OR B} = Pr{A} + Pr{B}. That’s it (there is an extension to infinite Omega). From this it is possible to infer rules such as Pr{not A} = 1 – Pr{A} and Pr{A OR B} = Pr{A} + Pr{B} – Pr{A, B}, where Pr{A, B} is the probability of A AND B. A more complicated but very useful rule is the Law of Total Probability. If P = {P_1, …, P_n} is a partition of Omega — i.e. P_i and P_j are disjoint if i and j differ, and the union of all the P_i is equal to Omega — then Pr{A} = sum_i Pr{A, P_i}.

In the Bayesian approach, the quantity Pr{A} can be operationalised (defined in terms of measurable quantities), for practical applications, as the value to a person of a bet that pays $1 if A turns out to be TRUE and $0 if A turns out to be FALSE. Other probability approaches require additional restrictions that limit the domain of probabilities and would, generally, exclude probability from being an everyday quantity.

Conditional probability is defined as Pr{A | B} = Pr{A, B} / Pr{B}, where Pr{A, B} is the probability of A AND B. Note that this is a definition, although, according to the Bayesian approach, it follows as a theorem from more primitive operational concepts. Formally, we should always adhere to the meaning inherent in the definition. In practice, however, it is convenient and not misleading to think of Pr{A | B} as “the probability of A, supposing B”. Combining conditional probabilities and the Law of Total Probability gives Pr{A} = sum_i Pr{A | P_i} Pr{P_i}.
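The rules above can be checked mechanically on a small finite Omega, here a fair six-sided die (an illustrative choice):

```python
from fractions import Fraction

# Omega = {1, ..., 6}, each outcome with probability 1/6.
p = {i: Fraction(1, 6) for i in range(1, 7)}

def pr(event):
    # Probability of a subset of Omega: sum of its outcome probabilities.
    return sum(p[w] for w in event)

A = {2, 4, 6}   # 'even'
B = {4, 5, 6}   # 'greater than 3'

# Pr{A OR B} = Pr{A} + Pr{B} - Pr{A, B}
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)

# Conditional probability: Pr{A | B} = Pr{A, B} / Pr{B}
pr_A_given_B = pr(A & B) / pr(B)

# Law of Total Probability over the partition {B, not B}:
not_B = set(p) - B
assert pr(A) == pr_A_given_B * pr(B) + (pr(A & not_B) / pr(not_B)) * pr(not_B)
```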

Probability Mass Function – for discrete random quantities, the function f(x) = Pr{X = x}; i.e. a function that gives the probability that a discrete random variable is exactly equal to some value; see PDF for continuous random quantities.

Probability Notation – very tricky area. The following conventions are widespread:

Majuscule (upper case) roman letters indicate uncertain quantities (scalar or vector), while minuscule (lower case) letters indicate particular values. Hence X is an uncertain quantity, while x is one particular value that X might take. Late-alphabet characters (U-Z) are typically numerical values, while early alphabet characters (A-E) are propositions, that can take only the values 0 (FALSE) and 1 (TRUE). Where this majuscule/minuscule convention is not used, some other convention, like ornamentation (e.g. using tildes ~) is often used. This is because it is crucial to distinguish between an uncertain quantity and a particular value it might take, but very confusing to use two unrelated symbols.

Pr{A} represents the probability of the proposition A. F() represents a Distribution Function. Hence F(x) = Pr{X <= x}. If X is absolutely continuous, then f() represents the PDF. If X is discrete, f() represents the Probability Mass Function. The functions F() and f() are typically distinguished by their arguments, so that F(x) is the distribution function of X, and F(y) the distribution function of Y. When it is necessary to be more specific, a subscript is used after the ‘F’ or ‘f’, e.g. F_Y(y). Where F() or f() depends on parameters (like the mean and variance of the scalar normal distribution), these are given after a semi-colon, e.g. F(x; mu, sigma^2).

Typically, E{X} is the expected value of X (scalar or vector), Var{X} is the variance (scalar or matrix), and Cov{X, Y} is the covariance between X and Y (scalar or matrix). It is useful to remember that Pr{A} = E{1[A]}, where 1[A] is the indicator function (i.e. 1[A] = 1 if A is TRUE, and 0 if A is FALSE). On this basis, expectation is a more primitive concept than probability.

The operation of summing over probability-weighted instances of a random variable is denoted using the integral operator, with respect to the measure dF(). Hence the expected value of X is denoted E{X} = int x dF(x). Where X is absolutely continuous, dF(x) = f(x) dx. Where X is discrete, the integral becomes the sum, E{X} = sum_i x_i f(x_i). This integral is known as the Stieltjes Integral. Its main purpose, as far as we are concerned, is to abstract from the issue of whether X is absolutely continuous, or discrete, or something else.

Conditional probabilities are indicated with a vertical bar. Thus Pr{A | B} is read “the probability of A conditional upon B”. Pr{X <= x | Y = y} might also be written as F(x | y).

Q

R

Random Walk – a series of sequential movements in which the direction and size of each move is randomly determined.
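A minimal one-dimensional sketch: each step is +1 or -1 with equal probability, and the position is the cumulative sum of the steps:

```python
import numpy as np

# Simulate a simple 1-D random walk of 1000 steps.
rng = np.random.default_rng(5)
steps = rng.choice([-1, 1], size=1000)   # each move randomly +1 or -1
path = np.cumsum(steps)                  # position after each move
```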

Recurrence Interval – see return period.

Resilience – the ability of a system or group to return to its former state after being changed or disturbed. Communities can improve their resilience by preparing for certain events, e.g. community evacuation plans for severe flooding or volcanic eruption.

Return Period – defined to be the time-interval divided by the probability of exceedance of some specified threshold during that time interval. In stationary processes, one can take the time-interval to be one year (typically), and then the return period for threshold v is the expected number of years until an event of size v or greater occurs. For non-stationary stochastic processes, the concept can be rather confusing.
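The definition above for a stationary process is a one-line computation; e.g. an annual exceedance probability of 1% gives a 100-year return period:

```python
# Return period = time interval / probability of exceedance in that interval.
def return_period(exceedance_prob, interval_years=1.0):
    return interval_years / exceedance_prob

rp = return_period(0.01)   # 100-year return period
```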

Risk – has a multitude of definitions. Commonly used measures of risk include expected loss (area under the exceedance probability curve), see loss. Risk = hazard × vulnerability.

Risk Assessment – the process of quantifying uncertainty and risk, and of describing qualitatively the additional sources of uncertainty that were not taken account of explicitly.

Risk Management – the decision-based activity of managing and communicating risk, comprising: early warning systems, response, recovery, and mitigation. Takes risk assessment as an input, along with other factors such as the public perception of risk.

S

Sample – portion of the elements of a population.

Sensitivity Analysis – in a narrow (local) sense, it is the study of how the output of a numerical model changes when one or more of its input factors are varied. In a broader (global) sense, the study of how different input factors contribute to the global variability (uncertainty) of the model output. In other words, local SA focuses on the behaviour of the model output when introducing some variations in a specific setting of the input factors, while global SA considers the entire output distribution (or “response surface”) that is obtained by varying all the input factors within their feasible ranges.
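A sketch of local (one-at-a-time) sensitivity analysis by finite differences around a nominal input setting; the toy model and values are illustrative:

```python
# Toy model with two input factors; partial derivatives are 2*x1 and 3.
def model(x1, x2):
    return x1 ** 2 + 3.0 * x2

def local_sensitivity(f, nominal, h=1e-6):
    # Perturb each input factor in turn and approximate the local
    # sensitivity by a forward finite difference.
    base = f(*nominal)
    sens = []
    for i in range(len(nominal)):
        perturbed = list(nominal)
        perturbed[i] += h
        sens.append((f(*perturbed) - base) / h)
    return sens

s1, s2 = local_sensitivity(model, (2.0, 1.0))   # analytically: 4 and 3
```

A global analysis would instead vary all factors at once over their feasible ranges (e.g. by Monte Carlo sampling) and apportion the output variance among them.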

State Variables – describe the ‘state’ of a dynamic system at a given time in a way that determines its future behaviour.

Stationary Process – a stochastic process for which the distribution function is invariant to arbitrary shifts in the temporal and/or spatial index. The strong form of stationarity (making reference to the whole distribution function, not just the first two moments); see weakly stationary process.

Stochastic – containing some randomness; the opposite of deterministic.

Stochastic Model – a model (thought of as a function) that has the potential to return a different set of outputs when run at the same set of inputs. See deterministic model. See also Monte Carlo Method.

Structural Uncertainty – in models of complex processes, the uncertainty that remains even were the parameters to be correctly chosen.  See parametric uncertainty, model limitations.

T

Time Series – a collection of observations made sequentially through time, typically at uniformly spaced intervals.

Transfer Function – describes the linear relationship between the input and output of a system. Can be defined in either discrete or continuous time. The order of the transfer function is given by the order of the denominator.

U

Uncertainty – imperfect knowledge. See aleatory uncertainty, epistemic uncertainty. Typically quantified as probability.

[Uncertainty and risk have many definitions and so must be clearly defined in each context. For us, uncertainty is the probability of a hazard of a specific magnitude (always with reference to a region and a time-interval), and risk is the product of uncertainty and the resulting harm/damage. Thus it makes sense to talk of, e.g., “the uncertainty of a magnitude 6 earthquake (in the Abruzzo region in the next ten years)”, and of the risk of such an earthquake, expressed (variously, depending on your point of view) in terms of mortality, morbidity, loss of environmental services, cost of damage to property.]

Uncertainty Analysis – provides a quantitative description of the uncertainty in the output of a numerical model. To some extent overlapping with global sensitivity analysis, some authors make the distinction that uncertainty analysis aims at quantifying the uncertainty in the model output, global sensitivity analysis at apportioning such uncertainty to the various input factors of the model.

V

Validation – the process of checking that a model that has been calibrated for a specific purpose is indeed representative of the system within the limits of acceptability, which may be dependent on the purpose of the model. Validation is often carried out using split samples – a time-series is split into two; part is used to calibrate the model and the other part to validate it. Few environmental models can be truly validated due to the approximations inherent in the modelling process; however, few can be totally rejected either.

Vulnerability – refers to the propensity of exposed elements such as human beings, their livelihoods, and assets to suffer adverse effects when impacted by hazard events. See vulnerability operator.

It is possible to be exposed but not vulnerable, for example by living in a floodplain but having sufficient means to modify building structure and behaviour to mitigate potential loss. However to be vulnerable to an extreme event, it is necessary to also be exposed.

Vulnerability Function – see vulnerability operator.

Vulnerability Operator – the impact of a hazard event, expressed as a function of the hazard footprint of that event; e.g. of the maximum hazard over the duration of the event.

W

Weakly Stationary Process – a stochastic process X(t) with constant mean, E{X(t)} = mu for all t, and a covariance that depends only on the separation (temporal and/or spatial), Cov{X(t), X(t’)} = kappa(|t – t’|) for all t and t’.

X

Y

Z