# Data Compression

A large number of stochastic realisations of random fields requires a large amount of memory and powerful computational resources. To decrease memory requirements and computing time we offer to use a low-rank approximation for all realisations of the solution . This low-rank approximation allows us an effective postprocessing (computation of the mean value, variance, exceedance probability) with drastically reduced memory requirements (see Table 4). For each new realisation only the cor­responding low-rank update will be computed (see, e. g. ). This can be practical when, e. g. the results of many thousands Monte Carlo simulations should be com­puted and stored. Let v; є Rn, i = 1..Z, be the solution vector (already centred), where Z is a number of stochastic realisations of the solution. Let us build from all these vectors the matrix W = (vi,…, vZ) є RnxZ and consider the factorisation

W = ABT, where A є Rnxk and B є RZxk. (15)

We say that matrix W is a rank-k matrix if the representation in Eq. 15 is given. We denote the class of all rank-k matrices for which factors A and BT in Eq. 15 exist by R(k, n, Z).If W є R(k, n, Z) we say that W has a low-rank representation. The first aim is to compute a rank-k approximation Wk of W, such that

||W – Wk|| < e, k < min{n, Z}.

The second aim is to compute an update for the approximation Wk with a linear complexity for every new coming vector vZ+1. Below in Section 5.1 we present the algorithm which performs this.

To get the reduced singular value decomposition we omit all singular values, which are smaller than a given level e or, alternative variant, we leave a fixed number of largest singular values. After truncation we speak about reduced singular value decomposition (denoted by rSVD) Wk = UkZkVkT, where Uk є Rnxk contains the first k columns of U, Vk є RZxk contains the first k columns of V and Zk є Rkxk contains the k-biggest singular values of Z.

The computation of such basic statistics as the mean value, the variance, the ex­ceedance probability can be done with a linear complexity. The following examples illustrate computation of the mean value and the variance.

Let us take A := UkZk and BT := VT є RkxZ. Denote the j-th row of matrix A by aj є Rk and the i-th column of matrix BT by Ь; є Rk. It is evident, that if W is given explicitly, one can compute the mean value and the variance just keeping in memory 2 vectors – the mean (variance) and the current value. Below we show how to compute the mean and the variance if only A and B are given.

1. One can compute the mean solution v є R" as follows

1 z 1 z –

v= – XAb<=Ab. (Д6)

Z i=1 Z i=1

 rank к 5 20 max* p (x) — pк (.r) 1.7e-06 4.2e-10 maxj 1 var(p) (x) – var (p )* (x) 6.7e-05 2.3e-08
 Table 2 Rank-k approximation errors of the mean and of the variance of density in Case 1.

The computational complexity is O(k(Z + n)), in contrast to O(nZ)) for usual dense data format. As a demonstration we compute the mean.

2. One can compute the variance of the solution var(v) є Rn by the computing the covariance matrix and taking its diagonal. First, one computes the centred matrix

WC:=W —vlLr, where v = W • 1L/Z, and 11 = (1,…, 1)г є Rz. (17)

Computing Wc costs O(k2(n + Z)) (addition and truncation of rank-k matrices). By definition, the covariance matrix is C = . The reduced singular    value decomposition of Wc is (Wc)k = UkZkVl, Uk є Rnxk, Ek є Rkxk and Vk є RZxk can be computed via the QR algorithm [5, 13]. Now, the covariance matrix can be written like

The variance of the solution vector (i. e. the diagonal of the covariance matrix C can be computed with the complexity O(k2(Z + n)).

Table 2 demonstrates the rank-5 and rank-20 approximations of the mean and of the variance of density. One can see that both rank-k approximation errors are very small, much smaller than e. g. the discretisation error or Monte Carlo error (by com­puting the mean value).

Lemma 0.1. Let ||W —W*||2 < є, апсіщ be a rank-k approximation of the mean її. Піепа)\й-щ\ <ф., b) j| Wc — (W с) л-11 < є, r)||C-C,|| <^e2.

Proof: Since її = 11 and її* = ^W*1L, then

] ] P

|u-u*.||2 = — ||(W-W*)U||2 <-||(W-W*)||2-||1L||2 < -7=

Ns Ns N

Let I є RNs xNs be the identity matrix, then

1 T

||Wc-(Wc)*||2< ||W-W*||2.||I – — – І-11г||2<є, and

Ns *lb < ^jllWcWf – (Wc)*(Wc)[ |b Ns 1

2.1 Concatenation of Two Low-Rank Matrices

Let A and B such that Wk = ABT be given. Suppose also that matrix W Є Rnxm contains new m solution vectors. For a small m, computing the factors C Є Rnxk and D Є Rmxk such that W « CDT is not expensive. Now our purpose is to compute with a linear complexity the rank-k approximation of Wnew := [WW’ ] Є Rnx (Z+m). To do this, we build two concatenated matrices Anew := [AC] Є Rnx2k and Bnew = blockdiag[BT DT] Є R2kx(Z+m). Note that the difficulty now is that matrices Anew and Bnew have rank 2k. The rank k approximation of the new matrix Wnew is done with a linear complexity O((n + Z)k2 + k3) (for details see ).