{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "93f97204",
   "metadata": {},
   "source": [
    "# Fault Tree Uncertainties"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d32597b",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This lecture puts elementary tools to work to approximate probability distributions of the annual failure rates of a system consisting of\n",
    "a number of critical parts.\n",
    "\n",
    "We’ll use log normal distributions to approximate probability distributions of critical  component parts.\n",
    "\n",
    "To  approximate the probability distribution of the **sum** of $ n $ log normal probability distributions that describes the failure rate of the\n",
    "entire system, we’ll compute the convolution of those $ n $ log normal probability distributions.\n",
    "\n",
    "We’ll use the following concepts and tools:\n",
    "\n",
    "- log normal distributions  \n",
    "- the convolution theorem that describes the probability distribution of the sum independent random variables  \n",
    "- fault tree analysis for approximating a failure rate of a multi-component system  \n",
    "- a hierarchical probability model for describing uncertain probabilities  \n",
    "- Fourier transforms and inverse Fourier tranforms as efficient ways of computing convolutions of sequences  \n",
    "\n",
    "\n",
    "For more about Fourier transforms see this quantecon lecture [Circulant Matrices](https://python.quantecon.org/eig_circulant.html)\n",
    "as well as these  lecture  [Covariance Stationary Processes](https://python-advanced.quantecon.org/arma.html) and [Estimation of Spectra](https://python-advanced.quantecon.org/estspec.html).\n",
    "\n",
    "El-Shanawany, Ardron,  and Walker [[El-Shanawany *et al.*, 2018](https://python.quantecon.org/zreferences.html#id23)] and Greenfield and Sargent [[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)]  used some of the methods described here  to approximate probabilities of failures of safety systems in nuclear facilities.\n",
    "\n",
    "These methods respond to some of the recommendations made by Apostolakis  [[Apostolakis, 1990](https://python.quantecon.org/zreferences.html#id21)] for constructing procedures for quantifying\n",
    "uncertainty about the reliability of a safety system.\n",
    "\n",
    "We’ll start by bringing in some Python machinery."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "302a682c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "!pip install tabulate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71f58bc0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from scipy.signal import fftconvolve\n",
    "from tabulate import tabulate\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ed58342",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "np.set_printoptions(precision=3, suppress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7595174",
   "metadata": {},
   "source": [
    "## Log normal distribution\n",
    "\n",
    "If a random variable $ x $ follows a normal distribution with mean $ \\mu $ and variance $ \\sigma^2 $,\n",
    "then the natural logarithm of $ x $, say $ y = \\log(x) $, follows a **log normal distribution** with parameters $ \\mu, \\sigma^2 $.\n",
    "\n",
    "Notice that we said **parameters** and not **mean and variance** $ \\mu,\\sigma^2 $.\n",
    "\n",
    "- $ \\mu $ and $ \\sigma^2 $ are the mean and variance of $ x = \\exp (y) $  \n",
    "- they are **not** the mean and variance of $ y $  \n",
    "- instead, the  mean of $ y $ is $ e ^{\\mu + \\frac{1}{2} \\sigma^2} $ and the variance of $ y $ is $ (e^{\\sigma^2} - 1) e^{2 \\mu + \\sigma^2} $  \n",
    "\n",
    "\n",
    "A log normal  random variable $ y $ is nonnegative.\n",
    "\n",
    "The density for a log normal random variate $ y $ is\n",
    "\n",
    "$$\n",
    "f(y) = \\frac{1}{y \\sigma \\sqrt{2 \\pi}} \\exp \\left(  \\frac{- (\\log y - \\mu)^2 }{2 \\sigma^2} \\right)\n",
    "$$\n",
    "\n",
    "for $ y \\geq 0 $.\n",
    "\n",
    "Important features of a log normal random variable are\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    " \\textrm{mean:} & \\quad e ^{\\mu + \\frac{1}{2} \\sigma^2} \\cr\n",
    " \\textrm{variance:}  & \\quad (e^{\\sigma^2} - 1) e^{2 \\mu + \\sigma^2} \\cr\n",
    "  \\textrm{median:} & \\quad e^\\mu \\cr\n",
    " \\textrm{mode:} & \\quad e^{\\mu - \\sigma^2} \\cr\n",
    " \\textrm{.95 quantile:} & \\quad e^{\\mu + 1.645 \\sigma} \\cr\n",
    " \\textrm{.95-.05 quantile ratio:}  & \\quad e^{1.645 \\sigma} \\cr\n",
    " \\end{aligned}\n",
    "$$\n",
    "\n",
    "Recall the following *stability* property of two independent normally distributed random variables:\n",
    "\n",
    "If $ x_1 $ is normal with mean $ \\mu_1 $ and variance $ \\sigma_1^2 $ and $ x_2 $ is independent of $ x_1 $ and normal with mean $ \\mu_2 $ and variance $ \\sigma_2^2 $, then $ x_1 + x_2 $ is normally distributed with\n",
    "mean $ \\mu_1 + \\mu_2 $ and variance $ \\sigma_1^2 + \\sigma_2^2 $.\n",
    "\n",
    "Independent log normal distributions have a different *stability* property.\n",
    "\n",
    "The **product** of  independent log normal random variables is also log normal.\n",
    "\n",
    "In particular, if $ y_1 $ is log normal with parameters $ (\\mu_1, \\sigma_1^2) $ and\n",
    "$ y_2 $ is log normal with parameters $ (\\mu_2, \\sigma_2^2) $, then the product $ y_1 y_2 $ is log normal\n",
    "with parameters $ (\\mu_1 + \\mu_2, \\sigma_1^2 + \\sigma_2^2) $.\n",
    "\n",
    ">**Note**\n",
    ">\n",
    ">While the product of two log normal distributions is log normal, the **sum** of two log normal distributions is **not** log normal.\n",
    "\n",
    "This observation sets the stage for challenge that confronts us in this lecture, namely, to approximate probability distributions of **sums** of independent log normal random variables.\n",
    "\n",
    "To compute the probability distribution of the sum of two log normal distributions, we can use the following convolution property of a probability distribution that is a sum of independent random variables."
   ]
  },
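  {
   "cell_type": "markdown",
   "id": "3c91afde",
   "metadata": {},
   "source": [
    "Before moving on, here is a quick numerical sanity check of the formulas above (our addition; the parameter values are made up for illustration): we draw a large sample of $ y = \\exp(x) $ and compare sample statistics with the theoretical mean, median, and .95 quantile, and we confirm that the log of a product of two independent log normal variables is normal with summed parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b2e9d17",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# A quick sanity check of the log normal formulas above (illustrative only)\n",
    "rng = np.random.default_rng(0)\n",
    "μ, σ = 0.5, 0.8\n",
    "\n",
    "y = np.exp(rng.normal(μ, σ, 1_000_000))  # y = exp(x) is log normal\n",
    "\n",
    "print(\"mean:         \", y.mean(), \" vs theory \", np.exp(μ + σ**2/2))\n",
    "print(\"median:       \", np.median(y), \" vs theory \", np.exp(μ))\n",
    "print(\".95 quantile: \", np.quantile(y, .95), \" vs theory \", np.exp(μ + 1.645*σ))\n",
    "\n",
    "# the product of independent log normals is log normal with added parameters\n",
    "y2 = np.exp(rng.normal(1.0, 0.6, 1_000_000))\n",
    "print(\"log(y*y2) mean:\", np.log(y*y2).mean(), \" vs \", μ + 1.0)\n",
    "print(\"log(y*y2) std: \", np.log(y*y2).std(), \" vs \", np.sqrt(σ**2 + 0.6**2))"
   ]
  },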
  {
   "cell_type": "markdown",
   "id": "0137fed1",
   "metadata": {},
   "source": [
    "## The Convolution Property\n",
    "\n",
    "Let $ x $ be a random variable with probability density $ f(x) $, where $ x \\in {\\bf R} $.\n",
    "\n",
    "Let $ y $ be a random variable with probability density $ g(y) $, where $ y \\in {\\bf R} $.\n",
    "\n",
    "Let $ x $ and $ y $ be independent random variables and let $ z = x + y \\in {\\bf R} $.\n",
    "\n",
    "Then the probability distribution of $ z $ is\n",
    "\n",
    "$$\n",
    "h(z) = (f * g)(z) \\equiv \\int_{-\\infty}^\\infty f (z) g(z - \\tau) d \\tau\n",
    "$$\n",
    "\n",
    "where  $ (f*g) $ denotes the **convolution** of the two functions $ f $ and $ g $.\n",
    "\n",
    "If the random variables are both nonnegative, then the above formula specializes to\n",
    "\n",
    "$$\n",
    "h(z) = (f * g)(z) \\equiv \\int_{0}^\\infty f (z) g(z - \\tau) d \\tau\n",
    "$$\n",
    "\n",
    "Below, we’ll use a discretized version of the preceding formula.\n",
    "\n",
    "In particular, we’ll replace both $ f $ and $ g $ with discretized counterparts, normalized to sum to $ 1 $ so that they are probability distributions.\n",
    "\n",
    "- by **discretized** we mean an equally spaced sampled version  \n",
    "\n",
    "\n",
    "Then we’ll use the following version of the above formula\n",
    "\n",
    "$$\n",
    "h_n = (f*g)_n = \\sum_{m=0}^\\infty f_m g_{n-m} , n \\geq 0\n",
    "$$\n",
    "\n",
    "to compute a discretized version of the probability distribution of the sum of two random variables,\n",
    "one with probability mass function $ f $, the other with probability mass function $ g $.\n",
    "\n",
    "Before applying the convolution property to sums of log normal distributions, let’s practice on some simple discrete distributions.\n",
    "\n",
    "To take one  example, let’s consider the following two probability distributions\n",
    "\n",
    "$$\n",
    "f_j = \\textrm{Prob} (X = j), j = 0, 1\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "g_j = \\textrm{Prob} (Y = j ) , j = 0, 1, 2, 3\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "h_j = \\textrm{Prob} (Z \\equiv X + Y = j) , j=0, 1, 2, 3, 4\n",
    "$$\n",
    "\n",
    "The convolution property tells us that\n",
    "\n",
    "$$\n",
    "h = f* g = g* f\n",
    "$$\n",
    "\n",
    "Let’s compute  an example using the `numpy.convolve` and `scipy.signal.fftconvolve`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "773c8fad",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "f = [.75, .25]\n",
    "g = [0., .6,  0., .4]\n",
    "h = np.convolve(f,g)\n",
    "hf = fftconvolve(f,g)\n",
    "\n",
    "print(\"f = \", f,  \", np.sum(f) = \", np.sum(f))\n",
    "print(\"g = \", g, \", np.sum(g) = \", np.sum(g))\n",
    "print(\"h = \", h, \", np.sum(h) = \", np.sum(h))\n",
    "print(\"hf = \", hf, \",np.sum(hf) = \", np.sum(hf))"
   ]
  },
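  {
   "cell_type": "markdown",
   "id": "7d4a1b9e",
   "metadata": {},
   "source": [
    "As a cross-check on these built-in routines, here is a minimal direct implementation of the convolution sum $ h_n = \\sum_{m} f_m g_{n-m} $; it is a sketch for exposition only, since the double loop is far slower than the library routines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f8c6a3b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Direct evaluation of the convolution sum -- for exposition only\n",
    "def convolve_direct(f, g):\n",
    "    h = np.zeros(len(f) + len(g) - 1)\n",
    "    for n in range(h.size):\n",
    "        for m in range(len(f)):\n",
    "            if 0 <= n - m < len(g):  # treat out-of-range terms as zero\n",
    "                h[n] += f[m] * g[n - m]\n",
    "    return h\n",
    "\n",
    "print(\"direct h = \", convolve_direct(f, g))\n",
    "print(\"numpy  h = \", np.convolve(f, g))"
   ]
  },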
  {
   "cell_type": "markdown",
   "id": "0ea0c908",
   "metadata": {},
   "source": [
    "A little later we’ll explain some advantages that come from using `scipy.signal.ftconvolve` rather than `numpy.convolve`.numpy program convolve.\n",
    "\n",
    "They provide the same answers but `scipy.signal.ftconvolve` is much faster.\n",
    "\n",
    "That’s why we rely on it later in this lecture."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96ad4823",
   "metadata": {},
   "source": [
    "## Approximating Distributions\n",
    "\n",
    "We’ll construct an example to verify that  discretized distributions can do a good job of approximating  samples drawn from underlying\n",
    "continuous distributions.\n",
    "\n",
    "We’ll start by generating samples of size 25000 of three independent  log normal random variates as well as pairwise and triple-wise sums.\n",
    "\n",
    "Then we’ll plot  histograms and compare them with convolutions of appropriate discretized log normal distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "638a4ff2",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "## create sums of two and three log normal random variates ssum2 = s1 + s2 and ssum3 = s1 + s2 + s3\n",
    "\n",
    "\n",
    "mu1, sigma1 = 5., 1. # mean and standard deviation\n",
    "s1 = np.random.lognormal(mu1, sigma1, 25000)\n",
    "\n",
    "mu2, sigma2 = 5., 1. # mean and standard deviation\n",
    "s2 = np.random.lognormal(mu2, sigma2, 25000)\n",
    "\n",
    "mu3, sigma3 = 5., 1. # mean and standard deviation\n",
    "s3 = np.random.lognormal(mu3, sigma3, 25000)\n",
    "\n",
    "ssum2 = s1 + s2\n",
    "\n",
    "ssum3 = s1 + s2 + s3\n",
    "\n",
    "count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9de3ca2f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ffdeb12",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4023d415",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "samp_mean2 = np.mean(s2)\n",
    "pop_mean2 = np.exp(mu2+ (sigma2**2)/2)\n",
    "\n",
    "pop_mean2, samp_mean2, mu2, sigma2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bca321ad",
   "metadata": {},
   "source": [
    "Here are helper functions that create a discretized version of a log normal\n",
    "probability density function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f909cc1c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def p_log_normal(x,μ,σ):\n",
    "    p = 1 / (σ*x*np.sqrt(2*np.pi)) * np.exp(-1/2*((np.log(x) - μ)/σ)**2)\n",
    "    return p\n",
    "\n",
    "def pdf_seq(μ,σ,I,m):\n",
    "    x = np.arange(1e-7,I,m)\n",
    "    p_array = p_log_normal(x,μ,σ)\n",
    "    p_array_norm = p_array/np.sum(p_array)\n",
    "    return p_array,p_array_norm,x"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5e56bec",
   "metadata": {},
   "source": [
    "Now we shall set a grid length $ I $ and a grid increment size $ m =1 $ for our discretizations.\n",
    "\n",
    ">**Note**\n",
    ">\n",
    ">We set $ I $ equal to a power of two because we want to be free to use a Fast Fourier Transform\n",
    "to compute a convolution of two sequences (discrete distributions).\n",
    "\n",
    "We recommend experimenting with different values of the power $ p $ of 2.\n",
    "\n",
    "Setting it to 15 rather than 12, for example, improves how well the discretized probability mass function approximates the original continuous probability density function being studied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c43a5c5",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "p=15\n",
    "I = 2**p # Truncation value\n",
    "m = .1 # increment size"
   ]
  },
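  {
   "cell_type": "markdown",
   "id": "9a5d0e4c",
   "metadata": {},
   "source": [
    "As a quick check on these settings (our addition), let’s confirm that the grid has $ I/m $ points and that the normalized discretized mass function sums to $ 1 $, as claimed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4e7c2d8",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Check the discretization: grid size and normalization\n",
    "_, p_check, x_check = pdf_seq(mu1, sigma1, I, m)\n",
    "print(\"number of grid points = \", x_check.size)\n",
    "print(\"sum of normalized masses = \", np.sum(p_check))"
   ]
  },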
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8421fa46",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "## Cell to check -- note what happens when don't normalize!\n",
    "## things match up without adjustment. Compare with above\n",
    "\n",
    "p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)\n",
    "## compute number of points to evaluate the probability mass function\n",
    "NT = x.size\n",
    "\n",
    "plt.figure(figsize = (8,8))\n",
    "plt.subplot(2,1,1)\n",
    "plt.plot(x[:int(NT)],p1[:int(NT)],label = '')\n",
    "plt.xlim(0,2500)\n",
    "count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37b127d1",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Compute mean from discretized pdf and compare with the theoretical value\n",
    "\n",
    "mean= np.sum(np.multiply(x[:NT],p1_norm[:NT]))\n",
    "meantheory = np.exp(mu1+.5*sigma1**2)\n",
    "mean, meantheory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a23b7777",
   "metadata": {},
   "source": [
    "## Convolving Probability Mass Functions\n",
    "\n",
    "Now let’s use the convolution theorem to compute the probability distribution of a sum of the two log normal random variables we have parameterized above.\n",
    "\n",
    "We’ll also compute the probability of a sum of three log normal distributions constructed above.\n",
    "\n",
    "Before we do these things, we shall explain our choice of Python algorithm to compute a convolution\n",
    "of two sequences.\n",
    "\n",
    "Because the sequences that we convolve are long, we use the `scipy.signal.fftconvolve` function\n",
    "rather than the numpy.convove function.\n",
    "\n",
    "These two functions give virtually equivalent answers but for long sequences `scipy.signal.fftconvolve`\n",
    "is much faster.\n",
    "\n",
    "The program `scipy.signal.fftconvolve` uses fast Fourier transforms and their inverses to calculate convolutions.\n",
    "\n",
    "Let’s define the Fourier transform and the inverse Fourier transform.\n",
    "\n",
    "The **Fourier transform** of a sequence $ \\{x_t\\}_{t=0}^{T-1} $ is  a sequence of complex numbers\n",
    "$ \\{x(\\omega_j)\\}_{j=0}^{T-1} $ given by\n",
    "\n",
    "\n",
    "<a id='equation-eq-ft1'></a>\n",
    "$$\n",
    "x(\\omega_j) = \\sum_{t=0}^{T-1} x_t \\exp(- i \\omega_j t) \\tag{13.1}\n",
    "$$\n",
    "\n",
    "where $ \\omega_j = \\frac{2 \\pi j}{T} $ for $ j=0, 1, \\ldots, T-1 $.\n",
    "\n",
    "The **inverse Fourier transform** of the sequence $ \\{x(\\omega_j)\\}_{j=0}^{T-1} $ is\n",
    "\n",
    "\n",
    "<a id='equation-eq-ift1'></a>\n",
    "$$\n",
    "x_t = T^{-1} \\sum_{j=0}^{T-1} x(\\omega_j) \\exp (i \\omega_j t) \\tag{13.2}\n",
    "$$\n",
    "\n",
    "The sequences $ \\{x_t\\}_{t=0}^{T-1} $ and $ \\{x(\\omega_j)\\}_{j=0}^{T-1} $ contain the same information.\n",
    "\n",
    "The pair of equations [(13.1)](#equation-eq-ft1) and [(13.2)](#equation-eq-ift1) tell how to recover one series from its Fourier partner.\n",
    "\n",
    "The program `scipy.signal.fftconvolve` deploys  the theorem that  a convolution of two sequences $ \\{f_k\\}, \\{g_k\\} $ can be computed in the following way:\n",
    "\n",
    "- Compute Fourier transforms $ F(\\omega), G(\\omega) $ of the $ \\{f_k\\} $ and $ \\{g_k\\} $ sequences, respectively  \n",
    "- Form the product $ H (\\omega) = F(\\omega) G (\\omega) $  \n",
    "- The convolution of $ f * g $ is the inverse Fourier transform of $ H(\\omega) $  \n",
    "\n",
    "\n",
    "The **fast Fourier transform** and the associated **inverse fast Fourier transform** execute these\n",
    "calculations very quickly.\n",
    "\n",
    "This is the algorithm that   `scipy.signal.fftconvolve` uses.\n",
    "\n",
    "Let’s do a warmup calculation that compares the times taken by `numpy.convove` and `scipy.signal.fftconvolve`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82b117cc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "\n",
    "p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)\n",
    "p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)\n",
    "p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)\n",
    "\n",
    "tic = time.perf_counter()\n",
    "\n",
    "c1 = np.convolve(p1_norm,p2_norm)\n",
    "c2 = np.convolve(c1,p3_norm)\n",
    "\n",
    "\n",
    "toc = time.perf_counter()\n",
    "\n",
    "tdiff1 = toc - tic\n",
    "\n",
    "tic = time.perf_counter()\n",
    "\n",
    "c1f = fftconvolve(p1_norm,p2_norm)\n",
    "c2f = fftconvolve(c1f,p3_norm)\n",
    "toc = time.perf_counter()\n",
    "\n",
    "toc = time.perf_counter()\n",
    "\n",
    "tdiff2 = toc - tic\n",
    "\n",
    "print(\"time with np.convolve = \", tdiff1,  \"; time with fftconvolve = \",  tdiff2)"
   ]
  },
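  {
   "cell_type": "markdown",
   "id": "c6f1a8b2",
   "metadata": {},
   "source": [
    "To make the three-step algorithm concrete, here is a minimal sketch (our addition) that carries it out by hand with `numpy.fft`: zero-pad both sequences to the full output length, multiply their Fourier transforms, and invert the product. It should agree with `fftconvolve` up to floating point error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2d9b7f4",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Convolution theorem by hand: pad, transform, multiply, invert\n",
    "n_out = len(p1_norm) + len(p2_norm) - 1\n",
    "F = np.fft.rfft(p1_norm, n_out)  # Fourier transforms, zero-padded to n_out\n",
    "G = np.fft.rfft(p2_norm, n_out)\n",
    "h = np.fft.irfft(F * G, n_out)   # inverse transform of the pointwise product\n",
    "\n",
    "print(\"max abs deviation from fftconvolve = \",\n",
    "      np.max(np.abs(h - fftconvolve(p1_norm, p2_norm))))"
   ]
  },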
  {
   "cell_type": "markdown",
   "id": "925fec41",
   "metadata": {},
   "source": [
    "The fast Fourier transform is two orders of magnitude faster than `numpy.convolve`\n",
    "\n",
    "Now let’s plot our computed probability mass function approximation  for the sum of two log normal random variables against the histogram of the sample that we formed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a8eae46",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "NT= np.size(x)\n",
    "\n",
    "plt.figure(figsize = (8,8))\n",
    "plt.subplot(2,1,1)\n",
    "plt.plot(x[:int(NT)],c1f[:int(NT)]/m,label = '')\n",
    "plt.xlim(0,5000)\n",
    "\n",
    "count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')\n",
    "# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8d9da98",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "NT= np.size(x)\n",
    "plt.figure(figsize = (8,8))\n",
    "plt.subplot(2,1,1)\n",
    "plt.plot(x[:int(NT)],c2f[:int(NT)]/m,label = '')\n",
    "plt.xlim(0,5000)\n",
    "\n",
    "count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')\n",
    "# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9eebe407",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "## Let's compute the mean of the discretized pdf\n",
    "mean= np.sum(np.multiply(x[:NT],c1f[:NT]))\n",
    "# meantheory = np.exp(mu1+.5*sigma1**2)\n",
    "mean, 2*meantheory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f329829",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "## Let's compute the mean of the discretized pdf\n",
    "mean= np.sum(np.multiply(x[:NT],c2f[:NT]))\n",
    "# meantheory = np.exp(mu1+.5*sigma1**2)\n",
    "mean, 3*meantheory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "517d12cd",
   "metadata": {},
   "source": [
    "## Failure Tree Analysis\n",
    "\n",
    "We shall soon apply the convolution theorem to compute the probability of a **top event** in a failure tree analysis.\n",
    "\n",
    "Before applying the convolution theorem, we first describe the model that connects constituent events to the **top** end whose\n",
    "failure rate we seek to quantify.\n",
    "\n",
    "The model is an example of the widely used  **failure tree analysis** described by  El-Shanawany, Ardron,  and Walker [[El-Shanawany *et al.*, 2018](https://python.quantecon.org/zreferences.html#id23)].\n",
    "\n",
    "To construct the statistical model, we repeatedly use  what is called the **rare event approximation**.\n",
    "\n",
    "We want to compute the probabilty of an event $ A \\cup B $.\n",
    "\n",
    "- the union $ A \\cup B $ is the event that $ A $ OR $ B $ occurs  \n",
    "\n",
    "\n",
    "A law of probability tells us that  $ A $ OR $ B $ occurs with probability\n",
    "\n",
    "$$\n",
    "P(A \\cup B) = P(A) + P(B) - P(A \\cap B)\n",
    "$$\n",
    "\n",
    "where the intersection $ A \\cap B $ is the event that $ A $ **AND** $ B $ both occur and the union $ A \\cup B $ is\n",
    "the event that $ A $ **OR** $ B $ occurs.\n",
    "\n",
    "If $ A $ and $ B $ are independent, then\n",
    "\n",
    "$$\n",
    "P(A \\cap B) = P(A) P(B)\n",
    "$$\n",
    "\n",
    "If $ P(A) $ and $ P(B) $ are both small, then $ P(A) P(B) $ is even smaller.\n",
    "\n",
    "The **rare event approximation** is\n",
    "\n",
    "$$\n",
    "P(A \\cup B) \\approx P(A) + P(B)\n",
    "$$\n",
    "\n",
    "This approximation is widely used in evaluating system failures."
   ]
  },
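  {
   "cell_type": "markdown",
   "id": "d8a3c5e1",
   "metadata": {},
   "source": [
    "To see how small the approximation error is, here is a small numerical illustration with two hypothetical independent events whose failure probabilities are made-up numbers of the order typical in this setting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3b6d9a7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Rare event approximation for two hypothetical independent events\n",
    "P_A, P_B = 1e-4, 2e-4            # made-up small failure probabilities\n",
    "\n",
    "exact = P_A + P_B - P_A * P_B    # P(A or B) under independence\n",
    "approx = P_A + P_B               # rare event approximation\n",
    "\n",
    "print(\"exact  = \", exact)\n",
    "print(\"approx = \", approx)\n",
    "print(\"relative error = \", (approx - exact)/exact)"
   ]
  },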
  {
   "cell_type": "markdown",
   "id": "72dc9f35",
   "metadata": {},
   "source": [
    "## Application\n",
    "\n",
    "A system has been designed with the feature a system  failure occurs when **any** of  $ n $ critical  components  fails.\n",
    "\n",
    "The failure probability $ P(A_i) $  of each event $ A_i $  is small.\n",
    "\n",
    "We assume that failures of the components are statistically independent random variables.\n",
    "\n",
    "We repeatedly apply a **rare event approximation** to obtain the following formula for the problem\n",
    "of a system failure:\n",
    "\n",
    "$$\n",
    "P(F) \\approx P(A_1) + P (A_2) + \\cdots + P (A_n)\n",
    "$$\n",
    "\n",
    "or\n",
    "\n",
    "\n",
    "<a id='equation-eq-probtop'></a>\n",
    "$$\n",
    "P(F) \\approx \\sum_{i=1}^n P (A_i) \\tag{13.3}\n",
    "$$\n",
    "\n",
    "Probabilities for each event are recorded as failure rates per year."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88d2c414",
   "metadata": {},
   "source": [
    "## Failure Rates Unknown\n",
    "\n",
    "Now we come to the problem that really interests us, following  [[El-Shanawany *et al.*, 2018](https://python.quantecon.org/zreferences.html#id23)] and Greenfield and Sargent\n",
    "[[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)]  in the spirit of Apostolakis  [[Apostolakis, 1990](https://python.quantecon.org/zreferences.html#id21)].\n",
    "\n",
    "The constituent probabilities or failure rates $ P(A_i) $ are not known a priori and have to be estimated.\n",
    "\n",
    "We address this problem by specifying **probabilities of probabilities** that  capture one  notion of not knowing the constituent probabilities that are inputs into a failure tree analysis.\n",
    "\n",
    "Thus, we assume that a system analyst is uncertain about  the failure rates $ P(A_i), i =1, \\ldots, n $ for components of a system.\n",
    "\n",
    "The analyst copes with this situation by regarding the systems failure probability $ P(F) $ and each of the component probabilities $ P(A_i) $ as  random variables.\n",
    "\n",
    "- dispersions of the probability distribution of $ P(A_i) $ characterizes the analyst’s uncertainty about the failure probability $ P(A_i) $  \n",
    "- the dispersion of the implied probability distribution of $ P(F) $ characterizes his uncertainty about the probability of a system’s failure.  \n",
    "\n",
    "\n",
    "This leads to what is sometimes called a **hierarchical** model in which the analyst has  probabilities about the probabilities $ P(A_i) $.\n",
    "\n",
    "The analyst formalizes his uncertainty by assuming that\n",
    "\n",
    "- the failure probability $ P(A_i) $ is itself a log normal random variable with parameters $ (\\mu_i, \\sigma_i) $.  \n",
    "- failure rates $ P(A_i) $ and $ P(A_j) $ are statistically independent for all pairs with $ i \\neq j $.  \n",
    "\n",
    "\n",
    "The analyst  calibrates the parameters  $ (\\mu_i, \\sigma_i) $ for the failure events $ i = 1, \\ldots, n $ by reading reliability studies in engineering papers that have studied historical failure rates of components that are as similar as possible to the components being used in the system under study.\n",
    "\n",
    "The analyst assumes that such  information about the observed dispersion of annual failure rates, or times to failure, can inform him of what to expect about parts’ performances in his system.\n",
    "\n",
    "The analyst  assumes that the random variables $ P(A_i) $   are  statistically mutually independent.\n",
    "\n",
    "The analyst wants to approximate a probability mass function and cumulative distribution function\n",
    "of the systems failure probability $ P(F) $.\n",
    "\n",
    "- We say probability mass function because of how we discretize each random variable, as described earlier.  \n",
    "\n",
    "\n",
    "The analyst calculates the probability mass function for the **top event** $ F $, i.e., a **system failure**,  by repeatedly applying the convolution theorem to compute the probability distribution of a sum of independent log normal random variables, as described in equation\n",
    "[(13.3)](#equation-eq-probtop)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f7cd2ab",
   "metadata": {},
   "source": [
    "## Waste Hoist Failure Rate\n",
    "\n",
    "We’ll take close to a real world example by assuming that $ n = 14 $.\n",
    "\n",
    "The example estimates the annual failure rate of a critical  hoist at a nuclear waste facility.\n",
    "\n",
    "A regulatory agency wants the sytem to be designed in a way that makes the failure rate of the top event small with high probability.\n",
    "\n",
    "This example is Design Option B-2 (Case I) described in Table 10 on page 27 of [[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)].\n",
    "\n",
    "The table describes parameters $ \\mu_i, \\sigma_i $ for  fourteen log normal random variables that consist of  **seven pairs** of random variables that are identically and independently distributed.\n",
    "\n",
    "- Within a pair, parameters $ \\mu_i, \\sigma_i $ are the same  \n",
    "- As described in table 10 of [[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)]  p. 27, parameters of log normal distributions for  the seven unique probabilities $ P(A_i) $ have been calibrated to be the values in the following Python code:  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7be4e0e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "mu1, sigma1 = 4.28, 1.1947\n",
    "mu2, sigma2 = 3.39, 1.1947\n",
    "mu3, sigma3 = 2.795, 1.1947\n",
    "mu4, sigma4 = 2.717, 1.1947\n",
    "mu5, sigma5 = 2.717, 1.1947\n",
    "mu6, sigma6 = 1.444, 1.4632\n",
    "mu7, sigma7 = -.040, 1.4632"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "564d953c",
   "metadata": {},
   "source": [
    ">**Note**\n",
    ">\n",
    ">Because the failure rates are all very small,  log normal distributions with the\n",
    "above parameter values actually describe $ P(A_i) $ times $ 10^{-09} $.\n",
    "\n",
    "So the probabilities that we’ll put on the $ x $ axis of the probability mass function and associated cumulative distribution function should be multiplied by $ 10^{-09} $\n",
    "\n",
    "To extract a table that summarizes computed quantiles, we’ll use a helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdca7822",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def find_nearest(array, value):\n",
    "    array = np.asarray(array)\n",
    "    idx = (np.abs(array - value)).argmin()\n",
    "    return idx"
   ]
  },
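  {
   "cell_type": "markdown",
   "id": "a7c4e1f9",
   "metadata": {},
   "source": [
    "For instance, on a small made-up grid:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e5a8d2c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Quick illustration of find_nearest on a made-up grid\n",
    "grid = np.array([0., .25, .5, .75, 1.])\n",
    "find_nearest(grid, .6)  # .6 is closest to grid[2] = .5"
   ]
  },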
  {
   "cell_type": "markdown",
   "id": "5d809a0d",
   "metadata": {},
   "source": [
    "We compute the required thirteen convolutions in the following code.\n",
    "\n",
    "(Please feel free to try different values of the power parameter $ p $ that we use to set the number of points in our grid for constructing\n",
    "the probability mass functions that discretize the continuous log normal distributions.)\n",
    "\n",
    "We’ll plot a counterpart to the cumulative distribution function (CDF) in  figure 5 on page 29 of [[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)]\n",
    "and we’ll also present a counterpart to their Table 11 on page 28."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08062742",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "p=15\n",
    "I = 2**p # Truncation value\n",
    "m =  .05 # increment size\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)\n",
    "p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)\n",
    "p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)\n",
    "p4,p4_norm,x = pdf_seq(mu4,sigma4,I,m)\n",
    "p5,p5_norm,x = pdf_seq(mu5,sigma5,I,m)\n",
    "p6,p6_norm,x = pdf_seq(mu6,sigma6,I,m)\n",
    "p7,p7_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p8,p8_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p9,p9_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p10,p10_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p11,p11_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p12,p12_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p13,p13_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "p14,p14_norm,x = pdf_seq(mu7,sigma7,I,m)\n",
    "\n",
    "tic = time.perf_counter()\n",
    "\n",
    "c1 = fftconvolve(p1_norm,p2_norm)\n",
    "c2 = fftconvolve(c1,p3_norm)\n",
    "c3 = fftconvolve(c2,p4_norm)\n",
    "c4 = fftconvolve(c3,p5_norm)\n",
    "c5 = fftconvolve(c4,p6_norm)\n",
    "c6 = fftconvolve(c5,p7_norm)\n",
    "c7 = fftconvolve(c6,p8_norm)\n",
    "c8 = fftconvolve(c7,p9_norm)\n",
    "c9 = fftconvolve(c8,p10_norm)\n",
    "c10 = fftconvolve(c9,p11_norm)\n",
    "c11 = fftconvolve(c10,p12_norm)\n",
    "c12 = fftconvolve(c11,p13_norm)\n",
    "c13 = fftconvolve(c12,p14_norm)\n",
    "\n",
    "toc = time.perf_counter()\n",
    "\n",
    "tdiff13 = toc - tic\n",
    "\n",
    "print(\"time for 13 convolutions = \", tdiff13)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34954f2c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "d13 = np.cumsum(c13)\n",
    "Nx=int(1400)\n",
    "plt.figure()\n",
    "plt.plot(x[0:int(Nx/m)],d13[0:int(Nx/m)])  # show Yad this -- I multiplied by m -- step size\n",
    "plt.hlines(0.5,min(x),Nx,linestyles='dotted',colors = {'black'})\n",
    "plt.hlines(0.9,min(x),Nx,linestyles='dotted',colors = {'black'})\n",
    "plt.hlines(0.95,min(x),Nx,linestyles='dotted',colors = {'black'})\n",
    "plt.hlines(0.1,min(x),Nx,linestyles='dotted',colors = {'black'})\n",
    "plt.hlines(0.05,min(x),Nx,linestyles='dotted',colors = {'black'})\n",
    "plt.ylim(0,1)\n",
    "plt.xlim(0,Nx)\n",
    "plt.xlabel(\"$x10^{-9}$\",loc = \"right\")\n",
    "plt.show()\n",
    "\n",
    "x_1 = x[find_nearest(d13,0.01)]\n",
    "x_5 = x[find_nearest(d13,0.05)]\n",
    "x_10 = x[find_nearest(d13,0.1)]\n",
    "x_50 = x[find_nearest(d13,0.50)]\n",
    "x_66 = x[find_nearest(d13,0.665)]\n",
    "x_85 = x[find_nearest(d13,0.85)]\n",
    "x_90 = x[find_nearest(d13,0.90)]\n",
    "x_95 = x[find_nearest(d13,0.95)]\n",
    "x_99 = x[find_nearest(d13,0.99)]\n",
    "x_9978 = x[find_nearest(d13,0.9978)]\n",
    "\n",
    "print(tabulate([\n",
    "    ['1%',f\"{x_1}\"],\n",
    "    ['5%',f\"{x_5}\"],\n",
    "    ['10%',f\"{x_10}\"],\n",
    "    ['50%',f\"{x_50}\"],\n",
    "    ['66.5%',f\"{x_66}\"],\n",
    "    ['85%',f\"{x_85}\"],\n",
    "    ['90%',f\"{x_90}\"],\n",
    "    ['95%',f\"{x_95}\"],\n",
    "    ['99%',f\"{x_99}\"],\n",
    "    ['99.78%',f\"{x_9978}\"]],\n",
    "    headers = ['Percentile', 'x * 1e-9']))"
   ]
  },
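  {
   "cell_type": "markdown",
   "id": "6b9f2c8d",
   "metadata": {},
   "source": [
    "Before comparing with Greenfield and Sargent’s published numbers, here is an independent Monte Carlo sanity check (our addition, not part of the original analysis): draw the fourteen component failure rates directly from the log normal distributions parameterized above, sum them, and read off empirical quantiles. These should land close to the convolution-based table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d7e3a1b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Monte Carlo cross-check of the convolution-based quantiles\n",
    "rng = np.random.default_rng(1234)\n",
    "N = 1_000_000\n",
    "\n",
    "# mirror the fourteen (mu, sigma) pairs used in the convolutions above\n",
    "params = [(mu1, sigma1), (mu2, sigma2), (mu3, sigma3), (mu4, sigma4),\n",
    "          (mu5, sigma5), (mu6, sigma6)] + [(mu7, sigma7)] * 8\n",
    "\n",
    "total = sum(rng.lognormal(mu, sigma, N) for mu, sigma in params)\n",
    "\n",
    "for q in [.01, .05, .1, .5, .665, .85, .9, .95, .99, .9978]:\n",
    "    print(f\"{q:>7.2%} quantile: {np.quantile(total, q):9.1f}\")"
   ]
  },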
  {
   "cell_type": "markdown",
   "id": "231d5ea6",
   "metadata": {},
   "source": [
    "The above table agrees closely with column 2 of  Table 11 on p. 28 of  of [[Greenfield and Sargent, 1993](https://python.quantecon.org/zreferences.html#id22)].\n",
    "\n",
    "Discrepancies are probably due to slight differences in the number of digits retained in inputting $ \\mu_i, \\sigma_i, i = 1, \\ldots, 14 $\n",
    "and in the number of points deployed in the discretizations."
   ]
  }
 ],
 "metadata": {
  "date": 1751374720.0846407,
  "filename": "hoist_failure.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Fault Tree Uncertainties"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}