Module fordead.import_data

Created on Mon Nov 2 09:42:31 2020

@author: Raphael Dutrieux

Functions

clip_xarray

def clip_xarray(
    array,
    extent
)
Clips xarray with x,y coordinates to an extent.

Parameters
----------
array : xarray DataArray
    DataArray with x and y coordinates
extent : list or 1D array, optional
    Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.

Returns
-------
xarray DataArray
    DataArray clipped to the given extent
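
Clipping to an extent amounts to keeping the coordinates that fall inside the bounding box. The following is a minimal, library-free sketch of that selection on plain coordinate lists. `indices_within_extent` is a hypothetical helper, not part of fordead, and treating the bounds as inclusive is an assumption.

```python
def indices_within_extent(x_coords, y_coords, extent=None):
    """Return (x_idx, y_idx) of coordinates inside [xmin, ymin, xmax, ymax].

    Hypothetical helper for illustration; bounds are assumed inclusive.
    """
    if extent is None:
        # No cropping: keep every coordinate.
        return list(range(len(x_coords))), list(range(len(y_coords)))
    xmin, ymin, xmax, ymax = extent
    x_idx = [i for i, x in enumerate(x_coords) if xmin <= x <= xmax]
    y_idx = [j for j, y in enumerate(y_coords) if ymin <= y <= ymax]
    return x_idx, y_idx


# Pixel centres of a small 10 m grid (y decreases, as in most rasters).
x_coords = [5.0, 15.0, 25.0, 35.0]
y_coords = [35.0, 25.0, 15.0, 5.0]
x_idx, y_idx = indices_within_extent(x_coords, y_coords, [10, 10, 30, 30])
```

With an xarray DataArray, the same selection would typically be expressed with coordinate-based slicing rather than explicit index lists.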

get_band_paths

def get_band_paths(
    dict_sen_paths
)
Retrieves paths to each SENTINEL band for each date from the paths of the directories containing these bands for each date.

Parameters
----------
dict_sen_paths : dict
    Dictionary where keys are dates and values are the paths of the directory containing a file for each SENTINEL band.

Returns
-------
DictSentinelPaths : dict
    Dictionary with the same keys as dict_sen_paths, but where the paths to directories are replaced with another dictionary where keys are the names of the bands and values are their paths.
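
To make the nested structure concrete, here is a minimal sketch that builds such a dictionary from one directory per date. `band_paths_from_dirs` is a hypothetical stand-in, and the rule used to derive the band name (last "_"-separated token of the file stem) is an assumption for illustration, not the library's documented behaviour.

```python
from pathlib import Path
import tempfile


def band_paths_from_dirs(dict_sen_paths):
    """Map each date to {band_name: file_path}.

    Assumption: the band name is the last "_"-separated token of the
    file stem (e.g. "..._B11.tif" -> "B11"). Real products may differ.
    """
    result = {}
    for date, directory in dict_sen_paths.items():
        result[date] = {
            path.stem.split("_")[-1]: path
            for path in sorted(Path(directory).glob("*.tif"))
        }
    return result


# Build a throwaway directory with two fake band files to try it out.
with tempfile.TemporaryDirectory() as tmp:
    date_dir = Path(tmp) / "2020-11-02"
    date_dir.mkdir()
    for band in ("B4", "B11"):
        (date_dir / f"tile_2020-11-02_{band}.tif").touch()
    paths = band_paths_from_dirs({"2020-11-02": date_dir})
    band_names = sorted(paths["2020-11-02"])
```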

get_cloudiness

def get_cloudiness(
    path_cloudiness,
    dict_path_bands,
    sentinel_source: str
)
Imports, computes and stores cloudiness for all dates

Parameters
----------
path_cloudiness : str
    Path where the TileInfo object storing cloudiness information for each date is saved and imported from.
dict_path_bands : dict
    Dictionary where keys are dates and values are another dictionary where keys are bands and values are their paths (dict_path_bands["YYYY-MM-DD"]["Mask"] -> Path to the mask).
sentinel_source : str
    'theia', 'scihub' or 'peps'

Returns
-------
dict
    Dictionary where keys are dates and values are the cloudiness percentages.

get_date_cloudiness_perc

def get_date_cloudiness_perc(
    date_paths,
    sentinel_source: str
)
Computes the cloudiness percentage of a Sentinel-2 date from the source mask (CLM for THEIA, SCL for Scihub and PEPS).
A 20m resolution band is necessary for THEIA data to determine swath cover. B11 is used but could be replaced with another 20m band.
For THEIA, all pixels with a value other than 0 in the mask are considered cloudy.
For Scihub and PEPS, all pixels with a value other than 4 or 5 in the mask are considered cloudy.

Parameters
----------
date_paths : dict
    Dictionary where keys are bands and values are their paths.
sentinel_source : str
    'theia', 'scihub' or 'peps'

Returns
-------
float
    Cloudiness percentage
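
The two masking rules above can be sketched on a flat sequence of mask values. `cloudiness_percentage` is a hypothetical helper: it skips the swath-cover step mentioned for THEIA (every pixel passed in is assumed to be inside the swath) and returns a value on a 0 to 100 scale, which is also an assumption.

```python
def cloudiness_percentage(mask_values, sentinel_source):
    """Percentage of cloudy pixels in a flat sequence of mask values.

    Sketch only: the swath-cover step is ignored, so every value is
    assumed to belong to a pixel inside the swath.
    """
    if sentinel_source == "theia":
        # THEIA CLM mask: anything other than 0 is cloudy.
        cloudy = [value != 0 for value in mask_values]
    elif sentinel_source in ("scihub", "peps"):
        # SCL mask: anything other than classes 4 and 5 is cloudy.
        cloudy = [value not in (4, 5) for value in mask_values]
    else:
        raise ValueError(f"Unknown sentinel_source: {sentinel_source!r}")
    return 100.0 * sum(cloudy) / len(cloudy)


theia_perc = cloudiness_percentage([0, 0, 2, 8], "theia")
scl_perc = cloudiness_percentage([4, 5, 9, 4], "peps")
```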

get_raster_metadata

def get_raster_metadata(
    raster_path=None,
    raster=None,
    extent_shape_path=None
)
From a raster path or a raster, extracts all metadata and returns it in a dictionary. If extent_shape_path is given, the metadata from the raster clipped with the shape is returned.

Parameters
----------
raster_path : str, optional
    path of a raster. The default is None.
raster : xarray DataArray, optional
    xarray DataArray opened with rioxarray.open_rasterio. The default is None.
extent_shape_path : str, optional
    Path to a shapefile with a single polygon. The default is None.

Returns
-------
raster_meta : dict
    Dictionary containing all metadata (dims, coords, attrs, sizes, shape, extent).

import_binary_raster

def import_binary_raster(
    raster_path,
    chunks=None
)
Imports forest mask

Parameters
----------
raster_path : str
    Path of the forest mask binary raster.
chunks : int, optional
    Chunks for import as dask array. If None, data is imported as xarray. The default is None.

Returns
-------
xarray DataArray
    Binary array containing True if pixels are inside the region of interest.

import_coeff_model

def import_coeff_model(
    path,
    chunks=None
)
Imports array containing the coefficients to the model for vegetation index prediction. The array has a "coeff" dimension containing each coefficient.

Parameters
----------
path : str
    Path of the file.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
coeff_model : xarray DataArray of dask array
    Array containing the coefficients to the model for vegetation index prediction.

import_dieback_data

def import_dieback_data(
    dict_paths,
    chunks=None
)
Imports data relating to dieback detection

Parameters
----------
dict_paths : dict
    Dictionary containing the keys "state_dieback", "first_date_dieback", "first_date_unconfirmed_dieback", and "count_dieback" whose values are the paths to the corresponding dieback data file.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
dieback_data : xarray DataSet or dask DataSet
    DataSet containing four DataArrays, "state" containing the state of the pixel after computations, 
    "first_date" containing the index of the date of the first anomaly when confirmed, 
    "first_date_unconfirmed" containing the date of pixel change, first anomaly if pixel is not detected as dieback, first non-anomaly if pixel is detected as dieback, 
    "count" containing the number of successive anomalies if "state" is True, or conversely the number of successive dates without anomalies. 

import_first_detection_date_index

def import_first_detection_date_index(
    path,
    chunks=None
)
Imports array containing the index of the first date used for detection instead of training

Parameters
----------
path : str
    Path of the file.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
first_detection_date_index : xarray DataArray of dask array
    Array containing the index of the first date used for detection instead of training

import_masked_vi

def import_masked_vi(
    dict_paths,
    date,
    chunks=None
)
Imports masked vegetation index

Parameters
----------
dict_paths : dict
    Dictionary where key "VegetationIndex" returns a dictionary where keys are SENTINEL dates and values are paths to the files containing the values of the vegetation index for the SENTINEL date, and key "Masks" returns the equivalent for the mask files.
date : str
    Date in the format "YYYY-MM-DD"
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
vegetation_index : xarray DataArray
    DataArray containing vegetation index values
mask : xarray DataArray
    DataArray containing mask values.

import_resampled_sen_stack

def import_resampled_sen_stack(
    band_paths,
    list_bands,
    interpolation_order=0,
    extent=None
)
Imports and resamples the bands as an xarray

Parameters
----------
band_paths : dict
    Dictionary where keys are bands and values are their paths
list_bands : list
    List of bands to be imported
interpolation_order : int, optional
    Order of interpolation as used in scipy's ndimage.zoom (0 = nearest neighbour, 1 = linear, 2 = quadratic, 3 = cubic). The default is 0.
extent : list or 1D array, optional
    Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.

Returns
-------
concatenated_stack_bands : xarray
    3D xarray with dimensions x,y and band
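
As an illustration of what the default interpolation order does when a coarser band is brought to a finer grid, the sketch below upsamples a small 2D band by pixel repetition, the result one would expect from nearest-neighbour resampling with an integer factor. It is a plain-Python stand-in, not scipy's implementation, and `upsample_nearest` is a hypothetical name.

```python
def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [
        # Repeat each value `factor` times along the row ...
        [value for value in row for _ in range(factor)]
        for row in band
        # ... and repeat each row `factor` times along the column.
        for _ in range(factor)
    ]


# A 2x2 band at 20 m becomes a 4x4 band at 10 m.
band_20m = [[1, 2],
            [3, 4]]
band_10m = upsample_nearest(band_20m, 2)
```

Higher orders interpolate between neighbouring pixels instead of repeating them, which smooths the result but creates values that were not in the original band.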

import_soil_data

def import_soil_data(
    dict_paths,
    chunks=None
)
Imports data relating to soil detection

Parameters
----------
dict_paths : dict
    Dictionary containing the keys "state_soil", "first_date_soil" and "count_soil" whose values are the paths to the corresponding soil detection data file.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
soil_data : xarray DataSet or dask DataSet
    DataSet containing three DataArrays, "state" containing the state of the pixel after computations (True for soil), "first_date" containing the index of the date of the first soil anomaly, "count" containing the number of successive soil anomalies.

import_stacked_anomalies

def import_stacked_anomalies(
    paths_anomalies,
    chunks=None
)
Imports all stacked anomalies

Parameters
----------
paths_anomalies : dict
    Dictionary where keys are dates in the format "YYYY-MM-DD" and values are the paths to the raster file containing anomaly data.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
stack_anomalies : xarray DataArray
    3D binary DataArray with value True where there are anomalies, with Time coordinates.

import_stackedmaskedVI

def import_stackedmaskedVI(
    tuile,
    min_date=None,
    max_date=None,
    chunks=None
)
Imports 3D arrays of the vegetation index series and masks

Parameters
----------
tuile : Object of class TileInfo
    Object containing paths of vegetation index and masks for each date
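min_date : str, optional
    Date in the format "YYYY-MM-DD" used as the lower bound of the imported dates. If None, no lower bound is applied. The default is None.
max_date : str, optional
    Date in the format "YYYY-MM-DD". Only dates before max_date are imported. If None, all dates are imported. The default is None.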
chunks : int, optional
    Chunks for import as dask array. If None, data is imported as xarray. The default is None.

Returns
-------
stack_vi : xarray.DataArray or dask array
    DataArray containing vegetation index value with dimension Time, x and y
stack_masks : xarray.DataArray or dask array
    DataArray containing mask value with dimension Time, x and y
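
Because dates are stored as "YYYY-MM-DD" strings, they sort chronologically and can be filtered with plain string comparison. The sketch below shows that idea with a hypothetical `filter_dates` helper. Treating both bounds as inclusive is an assumption, as the text above does not specify it.

```python
def filter_dates(dates, min_date=None, max_date=None):
    """Keep ISO-formatted dates within the optional bounds.

    Assumption: both bounds are inclusive.
    """
    return [
        date for date in dates
        if (min_date is None or date >= min_date)
        and (max_date is None or date <= max_date)
    ]


dates = ["2019-06-01", "2019-12-15", "2020-03-10"]
selected = filter_dates(dates, max_date="2019-12-31")
```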

import_stress_data

def import_stress_data(
    dict_paths,
    chunks=None
)
Imports data relating to stress periods

Parameters
----------
dict_paths : dict
    Dictionary containing the keys "dates_stress", "cum_diff_stress", "nb_dates_stress" and "nb_periods_stress" whose values are the paths to the corresponding stress data file.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
stress_data : xarray DataSet or dask DataSet
    DataSet containing four DataArrays, "date" containing the date index of each pixel state change, "nb_periods" containing the total number of stress periods detected for each pixel, "cum_diff" containing for each stress period the sum of the difference between the vegetation index and its prediction, multiplied by the weight if stress_index_mode is "weighted_mean", and "nb_dates" containing the number of valid dates of each stress period.

import_stress_index

def import_stress_index(
    path,
    chunks=None
)
Imports the stress index of all stress periods

Parameters
----------
path : str
    Path to the stress index raster stack.
chunks : int, optional
    Chunk size for import as dask array. The default is None.

Returns
-------
stress_index : xarray DataSet or dask DataSet (x,y,period)
    DataSet containing the value of the stress index for each pixel and each stress period.

initialize_dieback_data

def initialize_dieback_data(
    shape,
    coords
)
Initializes data relating to dieback detection

Parameters
----------
shape : tuple
    Tuple with sizes for the resulting array 
coords : Coordinates attribute of xarray DataArray
    Coordinates y and x

Returns
-------
dieback_data : xarray DataSet or dask DataSet
    DataSet containing four DataArrays, "state" containing the state of the pixel after computations, 
    "first_date" containing the index of the date of the first anomaly then confirmed, 
    "first_date_unconfirmed" containing the date of pixel change, first anomaly if pixel is not detected as dieback, first non-anomaly if pixel is detected as dieback, 
    "count" containing the number of successive anomalies if "state" is True, or conversely the number of successive dates without anomalies. 
    For all four arrays, all pixels are initialized at zero.

initialize_soil_data

def initialize_soil_data(
    shape,
    coords
)
Initializes data relating to soil detection

Parameters
----------
shape : tuple
    Tuple with sizes for the resulting array 
coords : Coordinates attribute of xarray DataArray
    Coordinates y and x

Returns
-------
soil_data : xarray DataSet or dask DataSet
    DataSet containing three DataArrays, "state" containing the state of the pixel after computations, "first_date" containing the index of the date of the first soil anomaly, "count" containing the number of successive soil anomalies.
    For all three arrays, all pixels are initialized at zero.

initialize_stress_data

def initialize_stress_data(
    shape,
    coords,
    max_nb_stress_periods
)
Initializes data relating to stress periods

Parameters
----------
shape : tuple
    Tuple with sizes for the resulting array 
coords : Coordinates attribute of xarray DataArray
    Coordinates y and x
max_nb_stress_periods : int
    Maximum number of stress periods, used to set the number of bands in the DataArrays. "date" will contain max_nb_stress_periods*2+1 bands, "nb_periods" only one, and "cum_diff" and "nb_dates" will contain max_nb_stress_periods+1 bands.

Returns
-------
stress_data : xarray DataSet or dask DataSet
    DataSet containing four DataArrays, 
    "date" containing the date index of each pixel state change, 
    "nb_periods" containing the total number of stress periods detected for each pixel, 
    "cum_diff" containing for each stress period the sum of
    the difference between the vegetation index and its prediction, 
    multiplied by the weight if stress_index_mode is "weighted_mean", 
    and "nb_dates" containing the number of valid dates of each stress period.
    For all four arrays, all pixels are initialized at zero.
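
The band counts given for max_nb_stress_periods can be worked through with a small sketch. `stress_data_shapes` is a hypothetical helper that only computes the shapes. Placing the band dimension last is an assumption, and the real layout may differ.

```python
def stress_data_shapes(shape, max_nb_stress_periods):
    """Return the array shape expected for each stress DataArray.

    Assumption: the band dimension comes last.
    """
    n = max_nb_stress_periods
    return {
        "date": shape + (2 * n + 1,),   # one band per state change
        "nb_periods": shape,            # a single band
        "cum_diff": shape + (n + 1,),   # one band per stress period
        "nb_dates": shape + (n + 1,),   # one band per stress period
    }


shapes = stress_data_shapes((100, 200), max_nb_stress_periods=5)
```

With five stress periods allowed, "date" holds 11 bands while "cum_diff" and "nb_dates" hold 6 each.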

retrieve_date_from_string

def retrieve_date_from_string(
    string
)
From a string containing a date in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY, retrieves the date in the format YYYY-MM-DD.
Works only for 20th and 21st centuries (years beginning with 19 or 20)

Parameters
----------
string : str
    String containing a date

Returns
-------
formatted_date : str
    Date in the format YYYY-MM-DD
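
One way to recognise the six formats listed above is with two regular expressions, one year-first and one day-first. This is a minimal sketch and not the library's implementation. `find_date` is a hypothetical name, and trying year-first patterns before day-first ones is an assumption about precedence.

```python
import re

# Year-first and day-first patterns; years are limited to 19xx and 20xx.
_YMD = re.compile(
    r"((?:19|20)\d{2})[-_]?(0[1-9]|1[0-2])[-_]?(0[1-9]|[12]\d|3[01])"
)
_DMY = re.compile(
    r"(0[1-9]|[12]\d|3[01])[-_]?(0[1-9]|1[0-2])[-_]?((?:19|20)\d{2})"
)


def find_date(string):
    """Return the first date found as YYYY-MM-DD, or None if there is none."""
    match = _YMD.search(string)
    if match:
        year, month, day = match.groups()
        return f"{year}-{month}-{day}"
    match = _DMY.search(string)
    if match:
        day, month, year = match.groups()
        return f"{year}-{month}-{day}"
    return None


from_compact = find_date("SENTINEL2A_20180915-105702_L2A")
from_day_first = find_date("VI_15-09-2018.tif")
```

Strings such as DDMMYYYY can be ambiguous with YYYYMMDD in rare cases, so a real implementation needs a deliberate precedence rule.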

Classes

Sat_catalog_theia

class Sat_catalog_theia(
    input_directory
)

Methods

build_catalog
def build_catalog(
    self,
    key
)
Parameters
----------
path_dir : str
    Directory containing files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY

Returns
-------
dict_datepaths : dict
    Dictionary linking formatted dates with the paths of the files from which the dates were extracted
read
def read(
    self,
    list_bands,
    interpolation_order=0,
    extent=None
)
Imports and resamples the bands as an xarray

Parameters
----------
band_paths : dict
    Dictionary where keys are bands and values are their paths
list_bands : list
    List of bands to be imported
interpolation_order : int, optional
    Order of interpolation as used in scipy's ndimage.zoom (0 = nearest neighbour, 1 = linear, 2 = quadratic, 3 = cubic). The default is 0.
extent : list or 1D array, optional
    Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.

Returns
-------
concatenated_stack_bands : xarray
    3D xarray with dimensions x,y and band

TileInfo

class TileInfo(
    data_directory
)

Methods

add_dirpath
def add_dirpath(
    self,
    key,
    path
)
Adds the path to a directory to the TileInfo object and creates parent directories if they don't already exist.
The path can then be retrieved with self.paths[key] where self is the TileInfo object name.

Parameters
----------
key : str
    Key for the paths dictionary.
path : str
    Path to add to the dictionary.
add_parameters
def add_parameters(
    self,
    parameters
)
Adds the attribute 'parameters' to the TileInfo object, which contains a dictionary of parameters and their values.
If the attribute parameters already exists, checks for conflicts then updates the parameters.
In case of conflicts, meaning a parameter was unknown or changed, the parameter 'Overwrite' is set to True and can be used to know when to delete previous computation results.

Parameters
----------
parameters : dict
    Dictionary containing parameters and their values
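
The conflict rule can be sketched on plain dictionaries. `merge_parameters` is a hypothetical function, the parameter names are made up for illustration, and how the library handles the very first call (when no parameters are stored yet) is not covered here.

```python
def merge_parameters(stored, new):
    """Return the updated parameters with an "Overwrite" flag.

    Sketch of the rule described above: a parameter that is unknown
    or whose value changed counts as a conflict.
    """
    conflict = any(
        key not in stored or stored[key] != value
        for key, value in new.items()
    )
    merged = {**stored, **new}
    merged["Overwrite"] = conflict
    return merged


# Hypothetical parameter names, used only for this example.
stored = {"threshold_anomaly": 0.16, "vi": "CRSWIR"}
same = merge_parameters(stored, {"vi": "CRSWIR"})
changed = merge_parameters(stored, {"vi": "NDVI"})
```

Downstream steps can then check the flag to decide whether earlier results must be deleted and recomputed.
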
add_path
def add_path(
    self,
    key,
    path
)
Adds a path to the TileInfo object and creates parent directories if they don't already exist.
The path can then be retrieved with self.paths[key] where self is the TileInfo object name.

Parameters
----------
key : str
    Key for the paths dictionary.
path : str
    Path to add to the dictionary.
delete_attributes
def delete_attributes(
    self,
    *attrs
)
Using keys to attributes of the TileInfo object, deletes those attributes if they exist.

Parameters
----------
attrs : str
    Names of the attributes of the object to delete

Returns
-------
None.
delete_dirs
def delete_dirs(
    self,
    *key_paths
)
Using keys to paths (usually added through add_path or add_dirpath), deletes directory containing the file, or the directory if the path already links to a directory. 

Parameters
----------
key_paths : str
    Keys in the dictionary containing paths

Returns
-------
None.
delete_files
def delete_files(
    self,
    *key_paths
)
Using keys to paths (usually added through add_path), deletes the corresponding files.

Parameters
----------
key_paths : str
    Keys in the dictionary containing paths

Returns
-------
None.
getdict_datepaths
def getdict_datepaths(
    self,
    key,
    path_dir
)
Parameters
----------
path_dir : str
    Directory containing files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY

Returns
-------
dict_datepaths : dict
    Dictionary linking formatted dates with the paths of the files from which the dates were extracted
getdict_paths
def getdict_paths(
    self,
    path_vi,
    path_masks
)
Adds paths to vegetation index files and mask files to TileInfo object, along with the list of dates.

Parameters
----------
path_vi : str
    Directory containing vegetation index files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
path_masks : str
    Directory containing mask files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
import_info
def import_info(
    self,
    path=None
)
Imports TileInfo object in the data_directory, or the one at path if the parameter is given
If no TileInfo object exists, the object remains unchanged

Parameters
----------
path : str, optional
    Path to a TileInfo object to be imported. The default is None.

Returns
-------
TileInfo object
    Imported TileInfo object if one exists already, or current TileInfo object if not.
print_info
def print_info(
    self
)
Prints parameters, dates used, and last computed date for anomalies.
save_info
def save_info(
    self,
    path=None
)
Saves the TileInfo object in its data_directory by default or in specified location
search_new_dates
def search_new_dates(
    self
)
Checks if there are new dates in vegetation index and mask directories, adds the paths and list of dates to the TileInfo object.