Module fordead.import_data
Created on Mon Nov 2 09:42:31 2020
@author: Raphael Dutrieux
Functions
clip_xarray
def clip_xarray(
array,
extent
)
Clips xarray with x,y coordinates to an extent.
Parameters
----------
array : xarray DataArray
DataArray with x and y coordinates
extent : list or 1D array, optional
Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.
Returns
-------
xarray DataArray
DataArray clipped to the given extent
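The extent convention [xmin, ymin, xmax, ymax] can be illustrated without xarray. The sketch below is not the module's implementation: the helper name indices_within_extent is hypothetical, it works on plain coordinate lists, and it assumes the bounds are inclusive.

```python
def indices_within_extent(x_coords, y_coords, extent=None):
    """Return the x and y indices whose coordinates fall inside extent.

    extent is [xmin, ymin, xmax, ymax]; None means no cropping.
    Inclusive bounds are an assumption made for this sketch.
    """
    if extent is None:
        return list(range(len(x_coords))), list(range(len(y_coords)))
    xmin, ymin, xmax, ymax = extent
    x_idx = [i for i, x in enumerate(x_coords) if xmin <= x <= xmax]
    y_idx = [j for j, y in enumerate(y_coords) if ymin <= y <= ymax]
    return x_idx, y_idx


# Raster y coordinates usually decrease from top to bottom.
x = [5.0, 15.0, 25.0, 35.0]
y = [35.0, 25.0, 15.0, 5.0]
x_idx, y_idx = indices_within_extent(x, y, extent=[10, 10, 30, 30])
```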
get_band_paths
def get_band_paths(
dict_sen_paths
)
Retrieves paths to each SENTINEL band for each date from the paths of the directories containing these bands for each date.
Parameters
----------
dict_sen_paths : dict
Dictionary where keys are dates and values are the paths of the directories containing a file for each SENTINEL band
Returns
-------
DictSentinelPaths : dict
Dictionary with the same keys as dict_sen_paths, but where each directory path is replaced with another dictionary whose keys are the band names and whose values are the band paths.
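The change of shape, from date -> directory to date -> band -> path, can be sketched as follows. This is an illustration only: the helper name band_paths_from_dirs, the ".tif" extension and the rule that the band name is the last underscore-separated token of the file name are all assumptions, not the module's actual naming logic.

```python
import tempfile
from pathlib import Path


def band_paths_from_dirs(dict_sen_paths):
    """Replace each directory path with a {band name: file path} dictionary."""
    result = {}
    for date, directory in dict_sen_paths.items():
        bands = {}
        for file in sorted(Path(directory).glob("*.tif")):
            # Hypothetical naming rule: band name is the last "_" token.
            band_name = file.stem.split("_")[-1]
            bands[band_name] = file
        result[date] = bands
    return result


# Build a throwaway directory with two empty band files to try it out.
with tempfile.TemporaryDirectory() as tmp:
    date_dir = Path(tmp) / "2020-11-02"
    date_dir.mkdir()
    for band in ("B8A", "B11"):
        (date_dir / f"SENTINEL2_20201102_{band}.tif").touch()
    nested = band_paths_from_dirs({"2020-11-02": date_dir})
    band_names = sorted(nested["2020-11-02"])
    b11_name = nested["2020-11-02"]["B11"].name
```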
get_cloudiness
def get_cloudiness(
path_cloudiness,
dict_path_bands,
sentinel_source: str
)
Imports, computes and stores cloudiness for all dates
Parameters
----------
path_cloudiness : str
Path where the TileInfo object storing cloudiness information for each date is saved and imported from.
dict_path_bands : dict
Dictionary where keys are dates and values are another dictionary where keys are bands and values are their paths (dict_path_bands["YYYY-MM-DD"]["Mask"] -> path to the mask)
sentinel_source : str
'theia', 'scihub' or 'peps'
Returns
-------
dict
Dictionary where keys are dates and values are the cloudiness percentages
get_date_cloudiness_perc
def get_date_cloudiness_perc(
date_paths,
sentinel_source: str
)
Computes the cloudiness percentage of a Sentinel-2 date from the source mask (CLM for THEIA, SCL for Scihub and PEPS).
A 20m resolution band is necessary for THEIA data to determine swath cover. B11 is used but could be replaced with another 20m band.
For THEIA, all pixels with a mask value different from 0 are considered cloudy.
For Scihub and PEPS, all pixels with a mask value other than 4 or 5 are considered cloudy.
Parameters
----------
date_paths : dict
Dictionary where keys are bands and values are their paths
sentinel_source : str
'theia', 'scihub' or 'peps'
Returns
-------
float
Cloudiness percentage
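The two masking rules above can be sketched on a flat list of mask values. This is a simplified illustration, not the module's code: the helper name cloudiness_fraction is hypothetical, the swath-cover step based on B11 is left out, and the result is a fraction between 0 and 1 (the scale the module itself returns is not stated here).

```python
def cloudiness_fraction(mask_values, sentinel_source):
    """Share of cloudy pixels in a flat sequence of mask values.

    THEIA: any value different from 0 is cloudy.
    Scihub / PEPS: any value other than 4 or 5 is cloudy.
    Swath cover is ignored in this sketch.
    """
    source = sentinel_source.lower()
    if source == "theia":
        cloudy = [value != 0 for value in mask_values]
    elif source in ("scihub", "peps"):
        cloudy = [value not in (4, 5) for value in mask_values]
    else:
        raise ValueError(f"Unknown sentinel_source: {sentinel_source!r}")
    return sum(cloudy) / len(cloudy)


theia_share = cloudiness_fraction([0, 0, 1, 2], "theia")
peps_share = cloudiness_fraction([4, 5, 8, 9], "peps")
```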
get_raster_metadata
def get_raster_metadata(
raster_path=None,
raster=None,
extent_shape_path=None
)
From a raster path or a raster, extracts all metadata and returns it in a dictionary. If extent_shape_path is given, the metadata of the raster clipped with the shape is returned.
Parameters
----------
raster_path : str, optional
path of a raster. The default is None.
raster : xarray DataArray, optional
xarray DataArray opened with rioxarray.open_rasterio. The default is None.
extent_shape_path : str, optional
Path to a shapefile with a single polygon. The default is None.
Returns
-------
raster_meta : dict
Dictionary containing all metadata (dims, coords, attrs, sizes, shape, extent).
import_binary_raster
def import_binary_raster(
raster_path,
chunks=None
)
Imports forest mask
Parameters
----------
raster_path : str
Path of the forest mask binary raster.
chunks : int, optional
Chunks for import as dask array. If None, data is imported as xarray. The default is None.
Returns
-------
xarray DataArray
Binary array containing True if pixels are inside the region of interest.
import_coeff_model
def import_coeff_model(
path,
chunks=None
)
Imports array containing the coefficients to the model for vegetation index prediction. The array has a "coeff" dimension containing each coefficient.
Parameters
----------
path : str
Path of the file.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
coeff_model : xarray DataArray of dask array
Array containing the coefficients to the model for vegetation index prediction.
import_dieback_data
def import_dieback_data(
dict_paths,
chunks=None
)
Imports data relating to dieback detection
Parameters
----------
dict_paths : dict
Dictionary containing the keys "state_dieback", "first_date_dieback", "first_date_unconfirmed_dieback", and "count_dieback" whose values are the paths to the corresponding dieback data files.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
dieback_data : xarray DataSet or dask DataSet
DataSet containing four DataArrays, "state" containing the state of the pixel after computations,
"first_date" containing the index of the date of the first anomaly when confirmed,
"first_date_unconfirmed" containing the date of pixel change, first anomaly if pixel is not detected as dieback, first non-anomaly if pixel is detected as dieback,
"count" containing the number of successive anomalies if "state" is True, or conversely the number of successive dates without anomalies.
import_first_detection_date_index
def import_first_detection_date_index(
path,
chunks=None
)
Imports array containing the index of the first date used for detection instead of training
Parameters
----------
path : str
Path of the file.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
first_detection_date_index : xarray DataArray of dask array
Array containing the index of the first date used for detection instead of training
import_masked_vi
def import_masked_vi(
dict_paths,
date,
chunks=None
)
Imports masked vegetation index
Parameters
----------
dict_paths : dict
Dictionary where key "VegetationIndex" returns a dictionary where keys are SENTINEL dates and values are paths to the files containing the values of the vegetation index for each SENTINEL date, and key "Masks" returns the equivalent for the mask files.
date : str
Date in the format "YYYY-MM-DD"
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
vegetation_index : xarray DataArray
DataArray containing vegetation index values
mask : xarray DataArray
DataArray containing mask values.
import_resampled_sen_stack
def import_resampled_sen_stack(
band_paths,
list_bands,
interpolation_order=0,
extent=None
)
Imports and resamples the bands as an xarray
Parameters
----------
band_paths : dict
Dictionary where keys are bands and values are their paths
list_bands : list
List of bands to be imported
interpolation_order : int, optional
Order of interpolation as used in scipy's ndimage.zoom (0 = nearest neighbour, 1 = linear, 2 = quadratic, 3 = cubic). The default is 0.
extent : list or 1D array, optional
Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.
Returns
-------
concatenated_stack_bands : xarray
3D xarray with dimensions x,y and band
import_soil_data
def import_soil_data(
dict_paths,
chunks=None
)
Imports data relating to soil detection
Parameters
----------
dict_paths : dict
Dictionary containing the keys "state_soil", "first_date_soil" and "count_soil" whose values are the paths to the corresponding soil detection data files.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
soil_data : xarray DataSet or dask DataSet
DataSet containing three DataArrays, "state" containing the state of the pixel after computations (True for soil), "first_date" containing the index of the date of the first soil anomaly, "count" containing the number of successive soil anomalies.
import_stacked_anomalies
def import_stacked_anomalies(
paths_anomalies,
chunks=None
)
Imports all stacked anomalies
Parameters
----------
paths_anomalies : dict
Dictionary where keys are dates in the format "YYYY-MM-DD" and values are the paths to the raster files containing anomaly data.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
stack_anomalies : xarray DataArray
3D binary DataArray with value True where there are anomalies, with Time coordinates.
import_stackedmaskedVI
def import_stackedmaskedVI(
tuile,
min_date=None,
max_date=None,
chunks=None
)
Imports 3D arrays of the vegetation index series and masks
Parameters
----------
tuile : Object of class TileInfo
Object containing paths of vegetation index and masks for each date
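min_date : str, optional
Date in the format "YYYY-MM-DD". Only dates after min_date are imported. If None, no lower limit is applied. The default is None.
max_date : str, optional
Date in the format "YYYY-MM-DD". Only dates before max_date are imported. If None, all dates are imported. The default is None.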
chunks : int, optional
Chunks for import as dask array. If None, data is imported as xarray. The default is None.
Returns
-------
stack_vi : xarray.DataArray or dask array
DataArray containing vegetation index value with dimension Time, x and y
stack_masks : xarray.DataArray or dask array
DataArray containing mask value with dimension Time, x and y
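Because dates are stored as "YYYY-MM-DD" strings, they sort and compare chronologically as plain strings, which makes the date filtering easy to sketch. The helper name select_dates is hypothetical, and treating both bounds as inclusive is an assumption; the module may handle the limits differently.

```python
def select_dates(dates, min_date=None, max_date=None):
    """Keep the dates between min_date and max_date, in chronological order.

    ISO "YYYY-MM-DD" strings compare chronologically, so no parsing is needed.
    Both bounds are inclusive here, which is an assumption of this sketch.
    """
    selected = []
    for date in sorted(dates):
        if min_date is not None and date < min_date:
            continue
        if max_date is not None and date > max_date:
            continue
        selected.append(date)
    return selected


all_dates = ["2020-01-05", "2019-06-01", "2021-03-10"]
before_2021 = select_dates(all_dates, max_date="2020-12-31")
```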
import_stress_data
def import_stress_data(
dict_paths,
chunks=None
)
Imports data relating to stress periods
Parameters
----------
dict_paths : dict
Dictionary containing the keys "dates_stress", "cum_diff_stress", "nb_dates_stress" and "nb_periods_stress" whose values are the paths to the corresponding stress data files.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
stress_data : xarray DataSet or dask DataSet
DataSet containing four DataArrays, "date" containing the date index of each pixel state change, "nb_periods" containing the total number of stress periods detected for each pixel, "cum_diff" containing for each stress period the sum of the difference between the vegetation index and its prediction, multiplied by the weight if stress_index_mode is "weighted_mean", and "nb_dates" containing the number of valid dates of each stress period.
import_stress_index
def import_stress_index(
path,
chunks=None
)
Imports the stress index of all stress periods
Parameters
----------
path : str
Path to the stress index raster stack.
chunks : int, optional
Chunk size for import as dask array. The default is None.
Returns
-------
stress_index : xarray DataSet or dask DataSet (x,y,period)
DataSet containing the value of the stress index for each pixel and each stress period.
initialize_dieback_data
def initialize_dieback_data(
shape,
coords
)
Initializes data relating to dieback detection
Parameters
----------
shape : tuple
Tuple with sizes for the resulting array
coords : Coordinates attribute of xarray DataArray
Coordinates y and x
Returns
-------
dieback_data : xarray DataSet or dask DataSet
DataSet containing four DataArrays, "state" containing the state of the pixel after computations,
"first_date" containing the index of the date of the first anomaly then confirmed,
"first_date_unconfirmed" containing the date of pixel change, first anomaly if pixel is not detected as dieback, first non-anomaly if pixel is detected as dieback,
"count" containing the number of successive anomalies if "state" is True, or conversely the number of successive dates without anomalies.
For all four arrays, all pixels are initialized to zero.
initialize_soil_data
def initialize_soil_data(
shape,
coords
)
Initializes data relating to soil detection
Parameters
----------
shape : tuple
Tuple with sizes for the resulting array
coords : Coordinates attribute of xarray DataArray
Coordinates y and x
Returns
-------
soil_data : xarray DataSet or dask DataSet
DataSet containing three DataArrays, "state" containing the state of the pixel after computations, "first_date" containing the index of the date of the first soil anomaly, "count" containing the number of successive soil anomalies
For all three arrays, all pixels are initialized to zero.
initialize_stress_data
def initialize_stress_data(
shape,
coords,
max_nb_stress_periods
)
Initializes data relating to stress periods
Parameters
----------
shape : tuple
Tuple with sizes for the resulting array
coords : Coordinates attribute of xarray DataArray
Coordinates y and x
max_nb_stress_periods : int
Maximum number of stress periods, used to set the number of bands in the DataArrays. "date" will contain max_nb_stress_periods*2+1 bands, "nb_periods" only one, and "cum_diff" and "nb_dates" will contain max_nb_stress_periods+1 bands.
Returns
-------
stress_data : xarray DataSet or dask DataSet
DataSet containing four DataArrays,
"date" containing the date index of each pixel state change,
"nb_periods" containing the total number of stress periods detected for each pixel,
"cum_diff" containing for each stress period the sum of
the difference between the vegetation index and its prediction,
multiplied by the weight if stress_index_mode is "weighted_mean",
and "nb_dates" containing the number of valid dates of each stress period.
For all four arrays, all pixels are initialized to zero.
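The band counts given for max_nb_stress_periods can be checked with a few lines of arithmetic. The helper name stress_band_counts is hypothetical and only restates the formulas from the parameter description; it does not build any array.

```python
def stress_band_counts(max_nb_stress_periods):
    """Number of bands of each stress DataArray, per the description above."""
    return {
        "date": max_nb_stress_periods * 2 + 1,
        "nb_periods": 1,
        "cum_diff": max_nb_stress_periods + 1,
        "nb_dates": max_nb_stress_periods + 1,
    }


counts = stress_band_counts(5)
```

With max_nb_stress_periods = 5, "date" has 11 bands, "nb_periods" has 1, and "cum_diff" and "nb_dates" have 6 each.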
retrieve_date_from_string
def retrieve_date_from_string(
string
)
From a string containing a date in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY, retrieves the date in the format YYYY-MM-DD.
Works only for 20th and 21st centuries (years beginning with 19 or 20)
Parameters
----------
string : str
String containing a date
Returns
-------
formatted_date : str
Date in the format YYYY-MM-DD
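One way to recognise the six listed formats is a pair of regular expressions, one for year-first and one for day-first dates. This is a sketch under assumptions, not the module's implementation: the name extract_date is hypothetical, only the first match is used, and an eight-digit run is read year first unless that gives an impossible month or day.

```python
import re

# Year first: YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD. Day first: DD-MM-YYYY, DD_MM_YYYY, DDMMYYYY.
# Years are restricted to 19xx and 20xx; \2 forces the same separator twice.
_YEAR_FIRST = re.compile(r"(?<!\d)((?:19|20)\d{2})([-_]?)(\d{2})\2(\d{2})(?!\d)")
_DAY_FIRST = re.compile(r"(?<!\d)(\d{2})([-_]?)(\d{2})\2((?:19|20)\d{2})(?!\d)")


def _plausible(month, day):
    """Loose range check; it does not know month lengths."""
    return 1 <= int(month) <= 12 and 1 <= int(day) <= 31


def extract_date(string):
    """Return the first date found in string as YYYY-MM-DD, or None."""
    match = _YEAR_FIRST.search(string)
    if match:
        year, _, month, day = match.groups()
        if _plausible(month, day):
            return f"{year}-{month}-{day}"
    match = _DAY_FIRST.search(string)
    if match:
        day, _, month, year = match.groups()
        if _plausible(month, day):
            return f"{year}-{month}-{day}"
    return None


iso_name = extract_date("VegetationIndex_2020-11-02.tif")
day_first_name = extract_date("mask_02_11_2020.tif")
compact_name = extract_date("SENTINEL2A_20201102-104857")
```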
Classes
Sat_catalog_theia
class Sat_catalog_theia(
input_directory
)
Methods
build_catalog
def build_catalog(
self,
key
)
Parameters
----------
path_dir : str
Directory containing files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
Returns
-------
dict_datepaths : dict
Dictionary linking formatted dates with the paths of the files from which the dates were extracted
read
def read(
self,
list_bands,
interpolation_order=0,
extent=None
)
Imports and resamples the bands as an xarray
Parameters
----------
band_paths : dict
Dictionary where keys are bands and values are their paths
list_bands : list
List of bands to be imported
interpolation_order : int, optional
Order of interpolation as used in scipy's ndimage.zoom (0 = nearest neighbour, 1 = linear, 2 = quadratic, 3 = cubic). The default is 0.
extent : list or 1D array, optional
Extent used for cropping [xmin,ymin, xmax,ymax]. If None, there is no cropping. The default is None.
Returns
-------
concatenated_stack_bands : xarray
3D xarray with dimensions x,y and band
TileInfo
class TileInfo(
data_directory
)
Methods
add_dirpath
def add_dirpath(
self,
key,
path
)
Adds path to a directory to TileInfo object and creates parent directories if they don't exist already
Path can then be retrieved with self.paths[key] where self is the TileInfo object name.
Parameters
----------
key : str
Key for the paths dictionary.
path : str
Path to add to the dictionary.
add_parameters
def add_parameters(
self,
parameters
)
Adds attribute 'parameters' to the TileInfo object, which contains a dictionary of parameters and their values.
If the attribute 'parameters' already exists, checks for conflicts, then updates the parameters.
In case of conflict, meaning a parameter was unknown or has changed, the parameter 'Overwrite' is set to True and can be used to know when to delete previous computation results.
Parameters
----------
parameters : dict
Dictionary containing parameters and their values
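The conflict rule described above can be sketched on plain dictionaries. The function merge_parameters and the parameter names in the example are hypothetical, and setting 'Overwrite' to False when nothing changed is an assumption; the method itself works on the TileInfo attribute.

```python
def merge_parameters(stored, new):
    """Merge new parameters into stored ones and flag conflicts.

    A conflict is a parameter that was unknown or whose value changed.
    'Overwrite' = False when there is no conflict is an assumption.
    """
    conflict = any(
        key not in stored or stored[key] != value
        for key, value in new.items()
    )
    return {**stored, **new, "Overwrite": conflict}


# Hypothetical parameter names, used only for illustration.
stored = {"threshold": 0.16, "index": "A"}
unchanged = merge_parameters(stored, {"index": "A"})
changed = merge_parameters(stored, {"index": "B"})
```

A caller could then test the 'Overwrite' entry to decide whether earlier results must be deleted before recomputing.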
add_path
def add_path(
self,
key,
path
)
Adds path to TileInfo object and creates parent directories if they don't exist already
Path can then be retrieved with self.paths[key] where self is the TileInfo object name.
Parameters
----------
key : str
Key for the paths dictionary.
path : str
Path to add to the dictionary.
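The behaviour described for add_path, registering a path under a key and creating its missing parent directories, can be sketched with pathlib. The class PathRegistry is a hypothetical stand-in for TileInfo, not its actual code, and the key and file names are made up for the example.

```python
import tempfile
from pathlib import Path


class PathRegistry:
    """Hypothetical stand-in for the 'paths' part of a TileInfo object."""

    def __init__(self):
        self.paths = {}

    def add_path(self, key, path):
        path = Path(path)
        # Create missing parent directories; the file itself is not created.
        path.parent.mkdir(parents=True, exist_ok=True)
        self.paths[key] = path


with tempfile.TemporaryDirectory() as tmp:
    registry = PathRegistry()
    registry.add_path("state_soil", Path(tmp) / "DataSoil" / "state_soil.tif")
    parent_created = registry.paths["state_soil"].parent.is_dir()
    file_created = registry.paths["state_soil"].exists()
```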
delete_attributes
def delete_attributes(
self,
*attrs
)
Using keys to attributes of the TileInfo object, deletes those attributes if they exist.
Parameters
----------
attrs : str
Key of attributes of the object
Returns
-------
None.
delete_dirs
def delete_dirs(
self,
*key_paths
)
Using keys to paths (usually added through add_path or add_dirpath), deletes directory containing the file, or the directory if the path already links to a directory.
Parameters
----------
key_paths : str
Key in the dictionary containing paths
Returns
-------
None.
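The rule "delete the directory containing the file, or the directory itself if the path is a directory" can be sketched as a standalone function. This is an illustration with a hypothetical signature (the paths dictionary is passed explicitly instead of being read from the TileInfo object), and the keys and file names are made up.

```python
import shutil
import tempfile
from pathlib import Path


def delete_dirs(paths, *key_paths):
    """Delete the directory behind each key: the path itself if it is a
    directory, otherwise the directory containing the file."""
    for key in key_paths:
        path = Path(paths[key])
        target = path if path.is_dir() else path.parent
        if target.exists():
            shutil.rmtree(target)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    file_path = root / "results" / "anomalies" / "2020-11-02.tif"
    file_path.parent.mkdir(parents=True)
    file_path.touch()
    dir_path = root / "masks"
    dir_path.mkdir()

    delete_dirs({"anomaly_file": file_path, "mask_dir": dir_path},
                "anomaly_file", "mask_dir")

    file_dir_removed = not file_path.parent.exists()
    grandparent_kept = (root / "results").exists()
    dir_removed = not dir_path.exists()
```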
delete_files
def delete_files(
self,
*key_paths
)
Using keys to paths (usually added through add_path), deletes file
Parameters
----------
key_paths : str
Key in the dictionary containing paths
Returns
-------
None.
getdict_datepaths
def getdict_datepaths(
self,
key,
path_dir
)
Parameters
----------
path_dir : str
Directory containing files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
Returns
-------
dict_datepaths : dict
Dictionary linking formatted dates with the paths of the files from which the dates were extracted
getdict_paths
def getdict_paths(
self,
path_vi,
path_masks
)
Adds paths to vegetation index files and mask files to TileInfo object, along with the list of dates.
Parameters
----------
path_vi : str
Directory containing vegetation index files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
path_masks : str
Directory containing mask files with filenames containing dates in the format YYYY-MM-DD, YYYY_MM_DD, YYYYMMDD, DD-MM-YYYY, DD_MM_YYYY or DDMMYYYY
import_info
def import_info(
self,
path=None
)
Imports TileInfo object in the data_directory, or the one at path if the parameter is given
If no TileInfo object exists, the object remains unchanged
Parameters
----------
path : str, optional
Path to a TileInfo object to be imported. The default is None.
Returns
-------
TileInfo object
Imported TileInfo object if one exists already, or current TileInfo object if not.
print_info
def print_info(
self
)
Prints parameters, dates used, and last computed date for anomalies.
save_info
def save_info(
self,
path=None
)
Saves the TileInfo object in its data_directory by default or in specified location
search_new_dates
def search_new_dates(
self
)
Checks if there are new dates in vegetation index and mask directories, adds the paths and list of dates to the TileInfo object.