ldcpy package

Submodules

ldcpy.metrics module

ldcpy.plot module

class ldcpy.plot.calcsPlot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, approx_lat=None, approx_lon=None, lev=0, color='coolwarm', standardized_err=False, quantile=None, contour_levs=24, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True, basic_plot=False, cmax=None, cmin=None)

Bases: object

This class contains code to plot calcs in an xarray Dataset that has either ‘lat’ and ‘lon’ dimensions, or a ‘time’ dimension.

get_calc_label(calc, data, data_type)

get_calcs(da, data_type)

get_plot_data(raw_data_1, raw_data_2=None)

get_title(calc_name, c_name=None)

hist_plot(plot_data, title)

periodogram_plot(plot_data, title)

plot_1d(plot_data, title)

spatial_plot(da_sets, titles, data_type)

time_series_plot(da_sets, titles, time_dim)

Time series plot.

verify_plot_parameters()

ldcpy.plot.plot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, lat=None, lon=None, lev=0, color='coolwarm', quantile=None, start=None, end=None, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=None, basic_plot=False, cmax=None, cmin=None)

Plot the data in an xarray Dataset.

Parameters:

ds (xarray.Dataset) – The input dataset
varname (str) – The name of the variable to be plotted
calc (str) –
The name of the calc to be plotted (must match a property name of the Datasetcalcs class; for more information about the available calcs, see ldcpy.Datasetcalcs). Acceptable values include:
- ns_con_var
- ew_con_var
- mean
- std
- variance
- prob_positive
- prob_negative
- odds_positive
- zscore
- mean_abs
- mean_squared
- rms
- sum
- sum_squared
- quantile
- lag1
- standardized_mean
- ann_harmonic_ratio
- pooled_variance_ratio
sets (list of str) – The labels of the sets in the collection to gather calcs from
group_by (str, optional) –
How to group the data in time-series plots. Valid groupings:
- time.day
- time.dayofyear
- time.month
- time.year
scale (str, optional) –
The y-axis scale for time-series plots (default 'linear'). Valid options:
- linear
- log
calc_type (str, optional) –
The type of operation to perform on the calcs (default 'raw'). Valid options:
- raw: the unaltered calc values
- diff: the difference between the calc values in the first set and each other set
- ratio: the ratio of the calc values in each subsequent set (2nd, 3rd, 4th, ...) to those in the first set
- calc_of_diff: the calc value computed on the difference between the first set and every other set
plot_type (str, optional) –
The type of plot to create (default 'spatial'). Valid options:
- spatial: a map of the world with values at each lat and lon point (takes the mean across the time dimension)
- time_series: a time-series plot of the data (computed by taking the mean across the lat and lon dimensions)
- histogram: A histogram of the time-series data
transform (str, optional) –
The data transformation (default 'none'). Valid options:
- none
- log
subset (str, optional) –
The subset of the data to gather calcs on (default None). Valid options:
- first5: the first 5 days of data
- DJF: data from the months December, January, February
- MAM: data from the months March, April, May
- JJA: data from the months June, July, August
- SON: data from the months September, October, November
lat (float, optional) – The latitude of the data to gather calcs on (default None).
lon (float, optional) – The longitude of the data to gather calcs on (default None).
lev (float, optional) – The level of the data to gather calcs on, used when plotting from a 3D dataset (default 0).
color (str, optional) – The color scheme for spatial plots (default 'coolwarm'). See https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html for more options.
quantile (float, optional) – A value between 0 and 1, required if calc="quantile", giving the desired quantile to gather (default 0.5).
start (int, optional) – A value between 0 and the number of time slices indicating the start time of a subset (default None).
end (int, optional) – A value between 0 and the number of time slices indicating the end time of a subset (default None).
calc_ssim (bool, optional) – Whether or not to calculate the SSIM (structural similarity index) between two plots; only applies to plot_type='spatial' (default False).
short_title (bool, optional) – If True, use a shortened title in the plot output (default False).
axes_symmetric (bool, optional) – Whether or not to make the colorbar axes symmetric about zero in a spatial plot (default False).
legend_loc (str, optional) – The location of the legend in a time-series plot in single-column format (plot_type="time_series", vert_plot=True) (default "upper right").
vert_plot (bool, optional) – If True, forces plots into a single-column format and enlarges text (default False).
tex_format (bool, optional) – Whether to interpret all plot output strings as LaTeX formatting (default False).
legend_offset (2-tuple, optional) – The x- and y-offset of the legend. Moves the corner of the legend specified by legend_loc to the given location, where (0, 0) is the bottom-left corner of the plot and (1, 1) is the top-right corner. Only affects time-series, histogram, and periodogram plots.
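To make the calc_type and plot_type options concrete, here is a minimal NumPy sketch. It is not ldcpy's implementation: it assumes each set is a plain array shaped (time, lat, lon), uses an unweighted mean as the calc, and treats "diff" and "calc_of_diff" as first set minus other set (the sign convention is an assumption). The helper names reduce_for_plot and apply_calc_type are hypothetical.

```python
import numpy as np


def reduce_for_plot(data, plot_type):
    """Collapse a (time, lat, lon) array as each plot_type is described."""
    if plot_type == "spatial":
        # One value per grid point: average over the time axis.
        return data.mean(axis=0)
    if plot_type == "time_series":
        # One value per time step: unweighted average over lat and lon.
        return data.mean(axis=(1, 2))
    raise ValueError(f"unsupported plot_type: {plot_type!r}")


def apply_calc_type(sets, calc_type, plot_type="spatial"):
    """Apply calc_type to a list of arrays, using a mean as the calc.

    Assumption for this sketch: "diff" and "calc_of_diff" use first minus other.
    """
    first, others = sets[0], sets[1:]
    if calc_type == "raw":
        return [reduce_for_plot(s, plot_type) for s in sets]
    if calc_type == "diff":
        base = reduce_for_plot(first, plot_type)
        return [base - reduce_for_plot(s, plot_type) for s in others]
    if calc_type == "ratio":
        # 2nd, 3rd, ... sets divided by the 1st set.
        base = reduce_for_plot(first, plot_type)
        return [reduce_for_plot(s, plot_type) / base for s in others]
    if calc_type == "calc_of_diff":
        # Compute the calc on the pointwise difference of the raw data.
        return [reduce_for_plot(first - s, plot_type) for s in others]
    raise ValueError(f"unsupported calc_type: {calc_type!r}")


rng = np.random.default_rng(0)
orig = rng.normal(280.0, 5.0, size=(4, 3, 6))  # (time, lat, lon)
compressed = orig + rng.normal(0.0, 0.01, size=orig.shape)

diff = apply_calc_type([orig, compressed], "diff", "time_series")[0]
calc_of_diff = apply_calc_type([orig, compressed], "calc_of_diff", "time_series")[0]
print(diff.shape)  # (4,)
```

Because the mean is linear, diff and calc_of_diff agree here up to rounding; for a nonlinear calc such as std they generally differ.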

Returns:

out

Return type:

None
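The subset and group_by options both select or bin time steps by calendar fields. The pure-Python sketch below illustrates the documented month sets for DJF, MAM, JJA, and SON and a per-month average in the spirit of group_by="time.month". ldcpy itself works on xarray time coordinates; the helpers subset_by_season and group_by_month are hypothetical and only show the idea.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Month numbers for the seasonal subset options listed above.
SEASON_MONTHS = {
    "DJF": {12, 1, 2},
    "MAM": {3, 4, 5},
    "JJA": {6, 7, 8},
    "SON": {9, 10, 11},
}


def subset_by_season(times, values, subset):
    """Keep only the (time, value) pairs whose month falls in the season."""
    months = SEASON_MONTHS[subset]
    return [(t, v) for t, v in zip(times, values) if t.month in months]


def group_by_month(times, values):
    """Average the values per calendar month, like group_by='time.month'."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.month].append(v)
    return {month: mean(vs) for month, vs in sorted(groups.items())}


# One year of synthetic daily data whose value is the month number.
start = date(2001, 1, 1)
times = [start + timedelta(days=i) for i in range(365)]
values = [float(t.month) for t in times]

djf = subset_by_season(times, values, "DJF")
monthly = group_by_month(times, values)
print(len(djf))  # 90 (31 + 28 + 31 days)
```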

ldcpy.plot.tex_escape(text)

Parameters: text – a plain text message
Returns: the message escaped to appear correctly in LaTeX
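LaTeX escaping is usually done in a single regular-expression pass so that backslashes inserted for one character are not escaped again. A minimal sketch follows; the character table is a common choice, not necessarily the one ldcpy.plot.tex_escape uses, and tex_escape_sketch is a hypothetical name.

```python
import re

# LaTeX special characters mapped to their escaped forms.
_TEX_REPLACEMENTS = {
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\^{}",
    "\\": r"\textbackslash{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
}
# One alternation of all special characters, each escaped for the regex engine.
_TEX_PATTERN = re.compile("|".join(re.escape(ch) for ch in _TEX_REPLACEMENTS))


def tex_escape_sketch(text):
    """Escape LaTeX special characters in a single pass over the text."""
    return _TEX_PATTERN.sub(lambda m: _TEX_REPLACEMENTS[m.group()], text)


print(tex_escape_sketch("50% of TS_anom & co"))  # 50\% of TS\_anom \& co
```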

ldcpy.util module

ldcpy.util.check_metrics(ds, varname, set1, set2, ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995, **calcs_kwargs)

Check the K-S, Pearson correlation, spatial relative error, and data SSIM calcs.

Parameters:

ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension
varname (str) – The variable of interest in the dataset
set1 (str) – The collection label of the “control” data
set2 (str) – The collection label of the (1st) data to compare
ks_tol (float, optional) – The p-value threshold (significance level) for the K-S test (default = .05)
pcc_tol (float, optional) – The threshold for the Pearson correlation coefficient test (default = .99999)
spre_tol (float, optional) – The percentage threshold for failing grid points in the spatial relative error test (default = 5.0).
ssim_tol (float, optional) – The threshold for the data SSIM test (default = .995)
**calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns:

out

Return type:

Number of failing calcs

Notes

Check the K-S, Pearson Correlation, and Spatial Relative Error calcs from:

A. H. Baker, H. Xu, D. M. Hammerling, S. Li, and J. Clyne, “Toward a Multi-method Approach: Lossy Data Compression for Climate Simulation Data”, in J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, Lecture Notes in Computer Science 10524, pp. 30–42, 2017 (doi:10.1007/978-3-319-67630-2_3).

Check the data SSIM, a modification of the SSIM calc from:

A.H. Baker, D.M. Hammerling, and T.L. Turton. “Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data”, Computer Graphics Forum 38(3), June 2019, pp. 517-528 (doi:10.1111/cgf.13707).

Default tolerances for the tests are:

- K-S: fail if p-value < .05 (significance level)
- Pearson correlation coefficient: fail if coefficient < .99999
- Spatial relative error: fail if > 5% of grid points fail relative error
- Data SSIM: fail if data SSIM < .995
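Given already-computed statistics, the default decision rules reduce to four comparisons. The sketch below encodes only those thresholds; computing the statistics themselves (including the per-point relative-error tolerance behind the spatial test) is left to ldcpy, and count_failures is a hypothetical helper.

```python
def count_failures(ks_pvalue, pcc, pct_failing_points, data_ssim,
                   ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995):
    """Return (number of failed tests, names of failed tests)."""
    checks = {
        # Low p-value: the two distributions differ significantly.
        "K-S": ks_pvalue < ks_tol,
        "Pearson correlation": pcc < pcc_tol,
        # pct_failing_points is a percentage of grid points (0 to 100).
        "Spatial relative error": pct_failing_points > spre_tol,
        "Data SSIM": data_ssim < ssim_tol,
    }
    failed = [name for name, fails in checks.items() if fails]
    return len(failed), failed


print(count_failures(0.2, 0.999995, 1.3, 0.998))  # (0, [])
```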

ldcpy.util.collect_datasets(data_type, varnames, list_of_ds, labels, coords_ds=None, file_sizes=None, **kwargs)

Concatenate several different xarray datasets across a new "collection" dimension, which can be accessed with the specified labels. The result is stored in an xarray dataset that can be passed to the ldcpy plot functions. Call this or open_datasets() before plotting.

Parameters:

data_type (string) – Current data types: cam-fv, pop, wrf
varnames (list) – The variable(s) of interest to combine across input files (usually just one)
list_of_ds (list) – The xarray datasets to be concatenated into a collection
labels (list) – The respective label to access data from each dataset (also used in plotting fcns)
coords_ds (xarray dataset, optional) – An additional dataset that contains lat/lon coords (common for WRF data)
file_sizes (list, optional) – Sizes of the files that each dataset corresponds to (used when printing the compare_stats table)
**kwargs (optional) – Additional arguments passed on to xarray.concat(). A list of available arguments can be found here: https://xarray-test.readthedocs.io/en/latest/generated/xarray.concat.html

Returns:

out – a collection containing all the data from the list of datasets

Return type:

xarray.Dataset

Notes

- WRF data must be postprocessed with xWRF before passing to ldcpy (e.g., ds = xr.open_dataset(wrf_file, engine="netcdf4").xwrf.postprocess()).
- For now, lat/lon info must be in the same file!
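Conceptually, the collection is a stack of same-shaped datasets along a new leading dimension, with each slice looked up by its label (in xarray terms, roughly xarray.concat() along a "collection" dimension plus a label coordinate). The NumPy sketch below shows only that idea; it skips the metadata and coordinate handling that ldcpy performs, and stack_collection is a hypothetical name.

```python
import numpy as np


def stack_collection(arrays, labels):
    """Stack same-shaped arrays along a new leading 'collection' axis."""
    if len(arrays) != len(labels):
        raise ValueError("need exactly one label per dataset")
    stacked = np.stack(arrays, axis=0)
    # Map each label to its position along the collection axis.
    index = {label: i for i, label in enumerate(labels)}
    return stacked, index


orig = np.zeros((2, 3))
zfp = np.ones((2, 3))
collection, index = stack_collection([orig, zfp], ["orig", "zfp"])
print(collection.shape)  # (2, 2, 3)
print(collection[index["zfp"]].sum())  # 6.0
```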

ldcpy.util.combine_datasets(ds_list)

ldcpy.util.compare_stats(ds, varname: str, sets, significant_digits: int = 5, include_ssim: bool = False, weighted: bool = True, **calcs_kwargs)

Print error summary statistics for multiple DataArrays (each should be a single time slice).

Parameters:

ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension
varname (str) – The variable of interest in the dataset
sets (list of str) – The labels of the collection to compare (all will be compared to the first set)
significant_digits (int, optional) – The number of significant digits to use when printing stats (default 5)
include_ssim (bool, optional) – Whether or not to compute the image SSIM, which is slow for 3D variables (default False)
weighted (bool, optional) – Whether or not to weight the means (default = True)
**calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns:

out

Return type:

None
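Two of the parameters above, weighted and significant_digits, can be illustrated in isolation. The sketch below assumes cosine-of-latitude area weights (a common choice; ldcpy may take its weights from the dataset instead) and reports a few error statistics chosen for illustration, not the exact set compare_stats prints. The helpers fmt and error_summary are hypothetical.

```python
import numpy as np


def fmt(value, significant_digits):
    """Format a number with the requested count of significant digits."""
    return f"{value:.{significant_digits}g}"


def error_summary(orig, other, lat, weighted=True, significant_digits=5):
    """Return a few formatted error statistics for two (lat, lon) fields."""
    diff = other - orig
    if weighted:
        # Assumed cos(lat) area weights, broadcast across longitude.
        w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], diff.shape)
    else:
        w = np.ones_like(diff)
    stats = {
        "max abs diff": np.max(np.abs(diff)),
        "mean abs diff": np.average(np.abs(diff), weights=w),
        "rms diff": np.sqrt(np.average(diff**2, weights=w)),
        "pearson r": np.corrcoef(orig.ravel(), other.ravel())[0, 1],
    }
    return {name: fmt(val, significant_digits) for name, val in stats.items()}


lat = np.array([-45.0, 0.0, 45.0])
orig = np.arange(12.0).reshape(3, 4)
other = orig + 0.5
print(error_summary(orig, other, lat))
```

Passing weighted=False in this sketch switches to plain unweighted averages, which mirrors what the weighted flag controls.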

ldcpy.util.open_datasets(data_type, varnames, list_of_files, labels, weights=True, **kwargs)

Open several different netCDF files, concatenate across a new ‘collection’ dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions.

Parameters:

data_type (string) – Current data types: cam-fv, pop
varnames (list) – The variable(s) of interest to combine across input files (usually just one)
list_of_files (list) – The file paths for the netCDF file(s) to be opened
labels (list) – The respective label to access data from each netCDF file (also used in plotting fcns)
**kwargs (optional) – Additional arguments passed on to xarray.open_mfdataset(). A list of available arguments can be found here: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html

Returns:

out – a collection containing all the data from the list of files

Return type:

xarray.Dataset

Notes

WRF netCDF data must be postprocessed with xWRF (e.g., ds = xr.open_dataset(wrf_file, engine="netcdf4").xwrf.postprocess()), so use collect_datasets() for WRF data instead.

ldcpy.util.save_metrics(full_ds, varname, set1, set2, time=0, lev=0, location='names.csv')

full_ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a 'collection' dimension
varname (str) – The variable of interest in the dataset
set1 (str) – The collection label of the "control" data
set2 (str) – The collection label of the (1st) data to compare
time (int, optional) – The time index used (default 0)
lev (int, optional) – The level index of interest in a 3D dataset (default 0)

Returns: out
Return type: Number of failing metrics

ldcpy.util.subset_data(ds, subset=None, lat=None, lon=None, lev=None, start=None, end=None, time_dim_name=None, vertical_dim_name=None, lat_coord_name=None, lon_coord_name=None)

Get a subset of the given DataArray; returns a DataArray.
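For a single-point time series, lat and lon presumably select the closest grid point (the calcsPlot arguments approx_lat and approx_lon suggest an approximate match), while start and end bound the time indices. A NumPy sketch of that selection follows, under stated assumptions: a plain (time, lat, lon) array, end treated as exclusive like a Python slice, and no longitude wrap-around. nearest_index and subset_point_series are hypothetical helpers, not ldcpy's code.

```python
import numpy as np


def nearest_index(coord_values, target):
    """Index of the coordinate value closest to target (no wrap-around)."""
    return int(np.argmin(np.abs(np.asarray(coord_values) - target)))


def subset_point_series(data, lats, lons, lat=None, lon=None, start=None, end=None):
    """Slice a (time, lat, lon) array by time indices and nearest grid point."""
    out = data[start:end]  # None bounds keep the full time range; end is exclusive
    if lat is not None and lon is not None:
        out = out[:, nearest_index(lats, lat), nearest_index(lons, lon)]
    return out


lats = np.linspace(-90.0, 90.0, 19)  # 10-degree spacing
lons = np.arange(0.0, 360.0, 10.0)
data = np.arange(5 * 19 * 36, dtype=float).reshape(5, 19, 36)

series = subset_point_series(data, lats, lons, lat=41.0, lon=254.0, start=1, end=4)
print(series.shape)  # (3,)
```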

ldcpy.util.var_and_wt_coords(varname, ds_col)

Module contents

Top-level module for ldcpy.