Created on Tue Nov 3 16:21:35 2020

@author: Raphael Dutrieux

## Functions

### censored_lstsq

def censored_lstsq(
B,
M,
A
)

Solves least squares problem subject to missing data, compatible with numpy, dask and xarray.

Code inspired from http://alexhwilliams.info/itsneuronalblog/2018/02/26/censored-lstsq

Parameters
----------
B : (N, ...) numpy.ndarray or xarray.DataArray
M : (N, ...) boolean numpy.ndarray or xarray.DataArray
Mask giving the missing data (when False the data is not used in computation).
It must have the same size/chunks as B.
A : (N, P) numpy.array

Returns
-------
X : (P, ...) numpy.ndarray that minimizes norm(M*(AX - B))

Notes
-----
If B is two-dimensional (e.g. (N, K)), the least-squares solution is calculated for each of column
of B, returning X with dimension (P, K)

If B is more than 2D, the computation is made with a reduced to 2D keeping the first dimension to its original size. Same for M.

It should be used with xr.map_blocks (only full A is needed, B et M can be processed by blocks).

It uses a broadcasted solve for speed.

Examples
--------

# with numpy array

# with xarray
import xarray as xr
import numpy as np
xysize = 1098
tsize = 10
chunksxy = 100
chunks3D = {'x':chunksxy, 'y':chunksxy, 'Time':10}
chunks2D = {'x':chunksxy, 'y':chunksxy}

B = xr.DataArray(np.arange(np.prod([tsize,xysize,xysize])).reshape((tsize,xysize,xysize)), dims=('Time', "x", "y"),
coords=[('Time', np.arange(tsize)), ('x', np.arange(xysize)), ('y', np.arange(xysize))]).chunk(chunks=chunks3D)

M = xr.DataArray(np.random.rand(tsize,xysize,xysize), dims=('Time', "x", "y"),
coords=[('Time', np.arange(tsize)), ('x', np.arange(xysize)), ('y', np.arange(xysize))]).chunk(chunks=chunks3D)>0.05

A = np.random.rand(tsize, 5)

# The following are 3 different ways for the same result
## xarray
res = xr.map_blocks(censored_lstsq, B, args=[M], kwargs={'A':A})

res = da.array.blockwise(censored_lstsq, 'kmn', B.data, 'tmn', M.data, 'tmn', new_axes={'k':5}, dtype=A.dtype, A=A, meta=np.ndarray(()))
res = xr.DataArray(res, dims=['coeff', B.dims[1], B.dims[2]],
coords=[('coeff', np.arange(A.shape[1])), B.coords[B.dims[1]],
B.coords[B.dims[2]]])

res = da.array.map_blocks(censored_lstsq, B.data, M.data, A=A,
chunks=((A.shape[1],), B.chunks[1], B.chunks[2]),
meta=np.ndarray(()), dtype=np.float_)
res = xr.DataArray(res, dims=['coeff', B.dims[1], B.dims[2]],
coords=[('coeff', np.arange(A.shape[1])), B.coords[B.dims[1]],
B.coords[B.dims[2]]])


### correct_vi_date

def correct_vi_date(
vegetation_index,
large_scale_model,
date,
correction_vi
)

Corrects single date vegetation index using large scale vegetation index median value previously computed.
The difference between the prediction of the model and the large scale median value is used as a correction term for the vegetation index.
This is meant to take into account changes affecting all pixels equally which are not linked with a dieback.

Parameters
----------
vegetation_index : xarray DataArray
DataArray containing uncorrected vegetation index values
Binary array containing True if pixels are inside the region of interest.
large_scale_model : xarray (coeff: 5)
Array containing the five coefficients of the large scale median vegetation index model
date : str
Date in the format "YYYY-MM-DD"
correction_vi : xarray (Time)
Array containing the correction terms for each date which were added to the vegetation index for its correction

Returns
-------
vegetation_index : xarray DataArray
DataArray containing corrected vegetation index values
correction_vi : xarray (Time)
Array containing the correction terms for each date, with added correction term of the date used in the function.


### get_detection_dates

def get_detection_dates(
min_last_date_training,
nb_min_date=10
)

Returns the index of the last date which will be used for training from the masks.

Parameters
----------
Stack of masks with dimensions (Time,x,y)
min_last_date_training : str
Earliest date at which the training ends and the detection begins
nb_min_date : int, optional
Minimum number of dates used to train the model. The default is 10.

Returns
-------
detection_dates : xarray.DataArray (Time,x,y)
Boolean array, True at dates where the pixel is used for detection, False when used for training
first_detection_date_index : xarray.DataArray (x,y)
Array containing the index of the last date which will be used for training, or 0 if there isn't enough valid data.


### model_vi

def model_vi(
stack_vi,
one_dim=False
)

Models periodic vegetation index for each pixel.

Parameters
----------
stack_vi : array (Time,x,y)
Array containing vegetation index data, must contain coordinates 'Time' with format '%Y-%m-%d'
one_dim : bool
Has to be True if data dimension is only temporal, and have no spatial dimensions

Returns
-------
coeff_model : array (5,x,y)
Array containing the five coefficients of the vegetation index model for each pixel


### model_vi_correction

def model_vi_correction(
stack_vi,
dict_paths
)

Corrects stacked vegetation index using large scale vegetation index median value.
A periodic model is fitted on the median vegetation index of unmasked pixels within the forest mask. The difference between the prediction of this model and these large scale median values is used as a correction term to correct the vegetation index.
This is meant to take into account changes affecting all pixels equally which are not linked with a dieback.

Parameters
----------
stack_vi : array (Time,x,y)
array containing uncorrected vegetation index data
dict_paths : dict
Dictionnary containing vegetation index path for each date, and forest_mask key linking to the path of the pixels of interest.

Returns
-------
stack_vi : array (Time,x,y)
Array containing corrected vegetation index data
large_scale_model : xarray (coeff: 5)
Array containing the five coefficients of the large scale median vegetation index model
correction_vi : xarray (Time)
Array containing the correction terms for each date which were added to the vegetation index for its correction


### prediction_vegetation_index

def prediction_vegetation_index(
coeff_model,
date_list
)

Predicts the vegetation index from the model coefficients and the date

Parameters
----------
coeff_model : array (5,x,y)
Array containing the five coefficients of the vegetation index model for each pixel
date : str
Date in the format "YYYY-MM-DD"

Returns
-------
predicted_vi : array (x,y)
Array containing predicted vegetation index from the model