API Reference

This page provides an auto-generated summary of ldcpy’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

ldcpy Util (ldcpy.util)

ldcpy.util.check_metrics(ds, varname, set1, set2, ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995, **calcs_kwargs)

Check the K-S, Pearson correlation, spatial relative error, and data SSIM calcs

Parameters
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • set1 (str) – The collection label of the “control” data

  • set2 (str) – The collection label of the (1st) data to compare

  • ks_tol (float, optional) – The p-value threshold (significance level) for the K-S test (default = .05)

  • pcc_tol (float, optional) – The Pearson correlation coefficient threshold; the test fails if the coefficient is below this value (default = .99999)

  • spre_tol (float, optional) – The percentage threshold for failing grid points in the spatial relative error test (default = 5.0).

  • ssim_tol (float, optional) – The threshold for the data SSIM test (default = .995)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns

out – The number of failing calcs

Return type

int

Notes

Check the K-S, Pearson Correlation, and Spatial Relative Error calcs from:

A. H. Baker, H. Xu, D. M. Hammerling, S. Li, and J. Clyne, “Toward a Multi-method Approach: Lossy Data Compression for Climate Simulation Data”, in J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, Lecture Notes in Computer Science 10524, pp. 30–42, 2017 (doi:10.1007/978-3-319-67630-2_3).

Check the Data SSIM, which is a modification of the SSIM calc from:

A.H. Baker, D.M. Hammerling, and T.L. Turton. “Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data”, Computer Graphics Forum 38(3), June 2019, pp. 517-528 (doi:10.1111/cgf.13707).

  • K-S: fail if p-value < .05 (significance level)

  • Pearson correlation coefficient: fail if coefficient < .99999

  • Spatial relative error: fail if > 5% of grid points fail the relative error test

  • Data SSIM: fail if Data SSIM < .995
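The pass/fail rules listed above can be sketched in plain Python. This is not ldcpy's code: `count_failures` is a hypothetical helper that assumes the four statistics (K-S p-value, Pearson correlation coefficient, percentage of grid points failing the relative error test, and data SSIM) have already been computed, and it applies the default thresholds from the check_metrics signature.

```python
# Hypothetical helper (not part of ldcpy): count failing tests using the
# default thresholds of check_metrics. The four statistics are assumed to be
# precomputed, e.g. by a Diffcalcs instance.
def count_failures(ks_p_value, pcc, spre_pct, data_ssim,
                   ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995):
    """Return the number of failing tests under the documented criteria."""
    failures = 0
    if ks_p_value < ks_tol:   # K-S: distributions differ significantly
        failures += 1
    if pcc < pcc_tol:         # correlation is too low
        failures += 1
    if spre_pct > spre_tol:   # too many grid points exceed the relative error tolerance
        failures += 1
    if data_ssim < ssim_tol:  # structural similarity is too low
        failures += 1
    return failures


# Only the Pearson correlation test fails here (0.9999 < 0.99999).
print(count_failures(ks_p_value=0.30, pcc=0.9999, spre_pct=1.2, data_ssim=0.999))  # prints 1
```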

ldcpy.util.collect_datasets(data_type, varnames, list_of_ds, labels, **kwargs)

Concatenate several different xarray datasets across a new “collection” dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions (call this OR open_datasets() before plotting).

Parameters
  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_ds (list) – The datasets to be concatenated into a collection

  • labels (list) – The respective label to access data from each dataset (also used in plotting fcns)

  • **kwargs (optional) – Additional arguments passed on to xarray.concat(). A list of available arguments can be found here: https://xarray-test.readthedocs.io/en/latest/generated/xarray.concat.html

Returns

out – a collection containing all the data from the list of datasets

Return type

xarray.Dataset

ldcpy.util.compare_stats(ds, varname: str, sets, significant_digits: int = 5, include_ssim: bool = False, weighted: bool = True, **calcs_kwargs)

Print error summary statistics for multiple DataArrays (the data should be a single time slice)

Parameters
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • sets (list of str) – The labels of the collection to compare (all will be compared to the first set)

  • significant_digits (int, optional) – The number of significant digits to use when printing stats (default 5)

  • include_ssim (bool, optional) – Whether or not to compute the image SSIM, which is slow for 3D variables (default: False)

  • weighted (bool, optional) – Whether or not to weight the means (default = True)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns

out

Return type

None

ldcpy.util.open_datasets(data_type, varnames, list_of_files, labels, **kwargs)

Open several different netCDF files, concatenate across a new ‘collection’ dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions.

Parameters
  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_files (list) – The file paths for the netCDF file(s) to be opened

  • labels (list) – The respective label to access data from each netCDF file (also used in plotting fcns)

  • **kwargs (optional) – Additional arguments passed on to xarray.open_mfdataset(). A list of available arguments can be found here: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html

Returns

out – a collection containing all the data from the list of files

Return type

xarray.Dataset

ldcpy.util.subset_data(ds, subset=None, lat=None, lon=None, lev=None, start=None, end=None, time_dim_name='time', vertical_dim_name=None, lat_coord_name=None, lon_coord_name=None)

Get a subset of the given DataArray; returns a DataArray.

ldcpy Plot (ldcpy.plot)

class ldcpy.plot.calcsPlot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, approx_lat=None, approx_lon=None, lev=0, color='coolwarm', standardized_err=False, quantile=None, calc_ssim=False, contour_levs=24, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True)

This class contains code to plot calcs in an xarray Dataset that has either ‘lat’ and ‘lon’ dimensions, or a ‘time’ dimension.

time_series_plot(da_sets, titles)

Create a time-series plot.

ldcpy.plot.plot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, lat=None, lon=None, lev=0, color='coolwarm', quantile=None, start=None, end=None, calc_ssim=False, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True)

Plots the data given an xarray dataset

Parameters
  • ds (xarray.Dataset) – The input dataset

  • varname (str) – The name of the variable to be plotted

  • calc (str) –

    The name of the calc to be plotted (must match a property name in the Datasetcalcs class in ldcpy.calcs; for more information about the available calcs, see ldcpy.Datasetcalcs). Acceptable values include:

    • ns_con_var

    • ew_con_var

    • mean

    • std

    • variance

    • prob_positive

    • prob_negative

    • odds_positive

    • zscore

    • mean_abs

    • mean_squared

    • rms

    • sum

    • sum_squared

    • corr_lag1

    • quantile

    • lag1

    • standardized_mean

    • ann_harmonic_ratio

    • pooled_variance_ratio

  • sets (list <str>) – The labels of the dataset to gather calcs from

  • group_by (str) –

    How to group the data in time-series plots. Valid groupings:

    • time.day

    • time.dayofyear

    • time.month

    • time.year

  • scale (str, optional) –

    The y-axis scale for time-series plots (default “linear”). Valid options:

    • linear

    • log

  • calc_type (str, optional) –

    The type of operation to be performed on the calcs. (default ‘raw’) Valid options:

    • raw: the unaltered calc values

    • diff: the difference between the calc values in the first set and every other set

    • ratio: the ratio of the calc values in each subsequent set (2nd, 3rd, 4th, …) to those in the 1st set

    • calc_of_diff: the calc value computed on the difference between the first set and every other set

  • plot_type (str, optional) –

    The type of plot to be created. (default ‘spatial’) Valid options:

    • spatial: a plot of the world with values at each lat and lon point (takes the mean across the time dimension)

    • time-series: A time-series plot of the data (computed by taking the mean across the lat and lon dimensions)

    • histogram: A histogram of the time-series data

  • transform (str, optional) –

    The data transformation (default ‘none’). Valid options:

    • none

    • log

  • subset (str, optional) –

    The subset of the data to gather calcs on (default None). Valid options:

    • first5: the first 5 days of data

    • DJF: data from the months December, January, February

    • MAM: data from the months March, April, May

    • JJA: data from the months June, July, August

    • SON: data from the months September, October, November

  • lat (float, optional) – The latitude of the data to gather calcs on (default None).

  • lon (float, optional) – The longitude of the data to gather calcs on (default None).

  • lev (float, optional) – The level of the data to gather calcs on (used if plotting from a 3d data set), (default 0).

  • color (str, optional) – The color scheme for spatial plots (default ‘coolwarm’). See https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html for more options.

  • quantile (float, optional) – A value between 0 and 1 required if calc=“quantile”, corresponding to the desired quantile to gather (default 0.5).

  • start (int, optional) – A value between 0 and the number of time slices indicating the start time of a subset, (default None).

  • end (int, optional) – A value between 0 and the number of time slices indicating the end time of a subset, (default None)

  • calc_ssim (bool, optional) – Whether or not to calculate the ssim (structural similarity index) between two plots (only applies to plot_type = ‘spatial’), (default False)

  • short_title (bool, optional) – If True, use a shortened title in the plot output (default False).

  • axes_symmetric (bool, optional) – Whether or not to make the colorbar axes symmetric about zero (used in a spatial plot) (default False)

  • legend_loc (str, optional) – The location to put the legend in a time-series plot in single-column format (plot_type = “time_series”, vert_plot=True) (default “upper right”)

  • vert_plot (bool, optional) – If true, forces plots into a single column format and enlarges text. (default False)

  • tex_format (bool, optional) – Whether to interpret all plot output strings as latex formatting (default False)

  • legend_offset (2-tuple, optional) – The x- and y-offset of the legend. Moves the corner of the legend specified by legend_loc to the given location, where (0,0) is the bottom left corner of the plot and (1,1) is the top right corner. Only affects time-series, histogram, and periodogram plots.

Returns

out

Return type

None
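The calc_type options are easiest to tell apart with a nonlinear calc such as std. The sketch below uses plain Python lists as stand-ins for two collection members and does not call ldcpy; the sign convention for diff (other set minus first set) is an assumption, since the description above does not fix it.

```python
import statistics

# Hypothetical illustration of the calc_type options, using std as the calc.
# Plain lists stand in for the data of two collection members.
first = [1.0, 2.0, 3.0, 4.0]   # control ("first") set
other = [1.1, 1.9, 3.2, 3.9]   # e.g. a compressed set

raw_first = statistics.stdev(first)   # raw: calc on each set separately
raw_other = statistics.stdev(other)

# diff: difference of the calc values (assumed sign: other - first)
diff = raw_other - raw_first
# ratio: calc value of the other set divided by that of the first set
ratio = raw_other / raw_first
# calc_of_diff: the calc applied to the pointwise difference between the sets
calc_of_diff = statistics.stdev([o - f for o, f in zip(other, first)])

print(f"raw: {raw_first:.4f}, {raw_other:.4f}")
print(f"diff={diff:.4f}, ratio={ratio:.4f}, calc_of_diff={calc_of_diff:.4f}")
```

For a linear calc such as mean, diff and calc_of_diff coincide; for std they differ, because the std of the difference measures how much the pointwise errors vary rather than how much the overall spread changed.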

ldcpy.plot.tex_escape(text)

Parameters

text – a plain text message

Returns

the message escaped to appear correctly in LaTeX
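One way to escape text for LaTeX is a single-pass substitution of each special character. The following is a hypothetical sketch (`tex_escape_sketch` and its mapping table are illustrative, not necessarily what ldcpy's tex_escape does):

```python
import re

# Hypothetical mapping of LaTeX special characters to literal-rendering
# sequences; not necessarily the table used by ldcpy's tex_escape.
_TEX_ESCAPES = {
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\^{}",
    "\\": r"\textbackslash{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
}
# Match any one special character.
_TEX_PATTERN = re.compile("|".join(re.escape(ch) for ch in _TEX_ESCAPES))


def tex_escape_sketch(text):
    """Return text with LaTeX special characters escaped in a single pass."""
    return _TEX_PATTERN.sub(lambda m: _TEX_ESCAPES[m.group()], text)


print(tex_escape_sketch("50% of T_max & $cost"))
```

Doing the replacement in one regex pass matters: a chain of str.replace calls would re-escape the backslashes introduced by earlier replacements.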

ldcpy Calcs (ldcpy.calcs)

class ldcpy.calcs.Datasetcalcs(ds: xarray.core.dataarray.DataArray, aggregate_dims: list, time_dim_name: str = 'time', lat_dim_name: Optional[str] = None, lon_dim_name: Optional[str] = None, vert_dim_name: Optional[str] = None, lat_coord_name: Optional[str] = None, lon_coord_name: Optional[str] = None, q: float = 0.5, spre_tol: float = 0.0001, weighted=True)

This class contains calcs for each point of a dataset after aggregating across one or more dimensions, and a method to access these calcs. Expects a DataArray.

get_calc(name: str, q: Optional[int] = 0.5, grouping: Optional[str] = None, ddof=1)

Gets a calc aggregated across one or more dimensions of the dataset

Parameters
  • name (str) – The name of the calc (must be identical to a property name)

  • q (float, optional) – The quantile to compute when name is “quantile”, a value between 0 and 1 (default 0.5)

Returns

out – A DataArray of the same size and dimensions as the original DataArray, minus those dimensions that were aggregated across.

Return type

xarray.DataArray

get_single_calc(name: str)

Gets a calc consisting of a single float value

Parameters

name (str) – the name of the calc (must be identical to a property name)

Returns

out – The calc value

Return type

float

property annual_harmonic_relative_ratio: xarray.core.dataarray.DataArray

The annual harmonic relative to the average periodogram value in a neighborhood of 50 frequencies around the annual frequency. NOTE: This assumes the values along the “time” dimension are equally spaced. NOTE: This calc returns a lat-lon array regardless of aggregate dimensions, so it can only be used in a spatial plot.

property annual_harmonic_relative_ratio_pct_sig: numpy.ndarray

The percentage of points past the significance cutoff (p value <= 0.01) for the annual harmonic relative to the average periodogram value in a neighborhood of 50 frequencies around the annual frequency

property ew_con_var: xarray.core.dataarray.DataArray

The East-West Contrast Variance averaged along the aggregate dimensions

property lag1: xarray.core.dataarray.DataArray

The deseasonalized lag-1 autocorrelation value by day of year. NOTE: This calc returns an array with the same spatial dimensions as the data set regardless of aggregate dimensions, so it can only be plotted in a spatial plot.
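A minimal sketch of a deseasonalized lag-1 autocorrelation at a single grid point, in plain Python. The deseasonalization step is an assumption, not taken from ldcpy: the series is taken to be evenly spaced, and the seasonal cycle is removed by subtracting the mean of each phase of a fixed-length period (for daily data, day of year with period=365). `deseasonalize` and `lag1_autocorrelation` are hypothetical helpers.

```python
# Hypothetical helpers (not ldcpy's implementation).
def deseasonalize(series, period):
    """Subtract the mean of each phase (index modulo period) from the series."""
    sums = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(series):
        sums[i % period] += x
        counts[i % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [x - means[i % period] for i, x in enumerate(series)]


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: sum of lagged products over total variance."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


print(deseasonalize([1.0, 2.0, 3.0, 4.0], period=2))   # [-1.0, -1.0, 1.0, 1.0]
print(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0]))  # 0.4
```

In practice the deseasonalized series would be passed to lag1_autocorrelation; the two calls are shown separately here so each step is easy to check by hand.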

property lag1_first_difference: xarray.core.dataarray.DataArray

The deseasonalized lag-1 autocorrelation value of the first difference of the data by day of year. NOTE: This calc returns an array with the same spatial dimensions as the data set regardless of aggregate dimensions, so it can only be plotted in a spatial plot.

property mae_day_max: xarray.core.dataarray.DataArray

The day of maximum mean absolute value at the point. NOTE: only available in spatial and spatial comparison plots

property mean: xarray.core.dataarray.DataArray

The mean along the aggregate dimensions

property mean_abs: xarray.core.dataarray.DataArray

The mean of the absolute errors along the aggregate dimensions

property mean_squared: xarray.core.dataarray.DataArray

The square of the mean along the aggregate dimensions

property ns_con_var: xarray.core.dataarray.DataArray

The North-South Contrast Variance averaged along the aggregate dimensions
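A rough sketch of the east-west and north-south contrast variances for a single 2D time slice, written in plain Python with rows as latitudes and columns as longitudes. The definitions here are assumptions, not taken from this reference: contrast variance is the mean squared difference between adjacent grid points, longitude wraps around (the last column neighbors the first), and latitude does not wrap. The function names are hypothetical.

```python
# Hypothetical sketch; grid is a list of rows (latitudes) of equal length
# (longitudes).
def ew_contrast_variance(grid):
    """Mean squared difference between east-west neighbors (periodic in longitude)."""
    total, count = 0.0, 0
    for row in grid:
        n = len(row)
        for j in range(n):
            diff = row[(j + 1) % n] - row[j]  # wraps from last to first longitude
            total += diff * diff
            count += 1
    return total / count


def ns_contrast_variance(grid):
    """Mean squared difference between north-south neighbors (no wraparound)."""
    total, count = 0.0, 0
    for i in range(len(grid) - 1):
        for a, b in zip(grid[i], grid[i + 1]):
            total += (b - a) ** 2
            count += 1
    return total / count


grid = [[0.0, 1.0, 0.0, 1.0],
        [2.0, 3.0, 2.0, 3.0]]
print(ew_contrast_variance(grid), ns_contrast_variance(grid))
```

Averaging these per-slice values over time would correspond to the "averaged along the aggregate dimensions" wording above when time is the aggregate dimension.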

property num_negative: xarray.core.dataarray.DataArray

The probability that a point is negative

property num_positive: xarray.core.dataarray.DataArray

The probability that a point is positive

property num_zero: xarray.core.dataarray.DataArray

The probability that a point is zero

property odds_positive: xarray.core.dataarray.DataArray

The odds that a point is positive = prob_positive/(1-prob_positive)
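The odds formula above can be checked on a small example. This sketch assumes unweighted counts over the values at one point along the aggregate dimensions, with a plain list standing in for the data; `prob_positive` and `odds_positive` here are illustrative functions, not the ldcpy properties themselves.

```python
# Hypothetical sketch of prob_positive and odds_positive for the values at one
# point along the aggregate dimensions (unweighted counts assumed).
def prob_positive(values):
    """Fraction of values that are strictly positive."""
    return sum(1 for v in values if v > 0) / len(values)


def odds_positive(values):
    """odds = p / (1 - p); undefined (ZeroDivisionError) if every value is positive."""
    p = prob_positive(values)
    return p / (1 - p)


values = [2.0, -1.0, 0.5, 3.0]
print(prob_positive(values), odds_positive(values))  # 0.75 3.0
```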

property pooled_variance: xarray.core.dataarray.DataArray

The overall variance of the dataset

property pooled_variance_ratio: xarray.core.dataarray.DataArray

The pooled variance along the aggregate dimensions

property prob_negative: xarray.core.dataarray.DataArray

The probability that a point is negative

property prob_positive: xarray.core.dataarray.DataArray

The probability that a point is positive

property root_mean_squared: xarray.core.dataarray.DataArray

The square root of the mean of the squared values along the aggregate dimensions

property standardized_mean: xarray.core.dataarray.DataArray

The mean at each point along the aggregate dimensions divided by the standard deviation. NOTE: will always be 0 if aggregating over all dimensions.

property std: xarray.core.dataarray.DataArray

The standard deviation along the aggregate dimensions

property variance: xarray.core.dataarray.DataArray

The variance along the aggregate dimensions

property zscore: xarray.core.dataarray.DataArray

The z-score of a point averaged along the aggregate dimensions under the null hypothesis that the true mean is zero. NOTE: currently assumes we are aggregating along the time dimension so is only suitable for a spatial plot.
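A minimal sketch of a z-score for one grid point under the null hypothesis that the true mean is zero. The formula is an assumption, not confirmed by this reference: the usual one-sample statistic z = mean / (s / sqrt(n)), with s the sample standard deviation along time. `zscore` here is a hypothetical stand-alone function.

```python
import math
import statistics


# Hypothetical sketch: one-sample z-score against a true mean of zero,
# assuming z = sample mean / (sample std / sqrt(n)).
def zscore(values):
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


print(zscore([1.0, 2.0, 3.0, 4.0]))  # approximately 3.873 (the square root of 15)
```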

property zscore_cutoff: numpy.ndarray

The Z-Score cutoff for a point to be considered significant

property zscore_percent_significant: numpy.ndarray

The percent of points where the zscore is considered significant

class ldcpy.calcs.Diffcalcs(ds1: xarray.core.dataarray.DataArray, ds2: xarray.core.dataarray.DataArray, aggregate_dims: Optional[list] = None, **calcs_kwargs)

This class contains calcs on the overall dataset that require more than one input dataset to compute

get_diff_calc(name: str)

Gets a calc on the dataset that requires more than one input dataset

Parameters

name (str) – The name of the calc (must be identical to a property name)

Returns

out

Return type

float

property covariance: xarray.core.dataarray.DataArray

The covariance between the two datasets

property ks_p_value

The Kolmogorov-Smirnov p-value

property max_spatial_rel_error

We compute the relative error at each grid point and return the maximum.

property normalized_max_pointwise_error

The absolute value of the maximum pointwise difference, normalized by the range of values for the first set

property normalized_root_mean_squared

The root mean squared difference between the two datasets, normalized by the range of values for the first set
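Both normalized error measures divide by the range (max minus min) of the first set. The sketch below works on flattened plain-Python lists and assumes that normalized_root_mean_squared is the root mean squared difference between the two sets; the function names mirror the property names but are stand-alone illustrations, not ldcpy code.

```python
import math


# Hypothetical stand-alone versions of the two normalized error measures.
def normalized_max_pointwise_error(first, second):
    """Largest absolute pointwise difference divided by the range of first."""
    value_range = max(first) - min(first)
    return max(abs(a - b) for a, b in zip(first, second)) / value_range


def normalized_root_mean_squared(first, second):
    """Root mean squared difference divided by the range of first (assumed definition)."""
    value_range = max(first) - min(first)
    mse = sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)
    return math.sqrt(mse) / value_range


first = [0.0, 10.0, 5.0, 5.0]
second = [2.0, 8.0, 7.0, 3.0]
print(normalized_max_pointwise_error(first, second), normalized_root_mean_squared(first, second))
```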

property pearson_correlation_coefficient

The Pearson correlation coefficient between the two datasets
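For reference, the Pearson correlation coefficient on flattened data is the covariance divided by the product of the standard deviations. This sketch is unweighted and uses plain lists; any weighting ldcpy may apply is not modeled, and `pearson` is a hypothetical helper.

```python
import math


# Hypothetical unweighted Pearson correlation on flattened data.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# A slightly perturbed linear relationship gives a value just below 1, which
# check_metrics would compare against pcc_tol (default .99999).
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.1]))
```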

property spatial_rel_error

At each grid point, we compute the relative error. Then we report the percentage of grid points whose relative error is above the specified tolerance (1e-4 by default).
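The two relative-error measures (spatial_rel_error and max_spatial_rel_error) can be sketched on flattened lists. Two assumptions are made here that this reference does not state: the relative error at a point is |first - second| / |first|, and points where the first set is zero are skipped. The helper names are hypothetical.

```python
# Hypothetical sketch of the relative-error measures on flattened data.
def relative_errors(first, second):
    """Pointwise |first - second| / |first|, skipping points where first is zero."""
    return [abs(a - b) / abs(a) for a, b in zip(first, second) if a != 0]


def spatial_rel_error(first, second, tol=1e-4):
    """Percentage of grid points whose relative error exceeds tol."""
    errs = relative_errors(first, second)
    return 100.0 * sum(1 for e in errs if e > tol) / len(errs)


def max_spatial_rel_error(first, second):
    """Largest pointwise relative error."""
    return max(relative_errors(first, second))


first = [1.0, 2.0, 4.0, 8.0]
second = [1.0, 2.001, 4.0, 8.0]
print(spatial_rel_error(first, second), max_spatial_rel_error(first, second))
```

The percentage returned by spatial_rel_error is what check_metrics compares against spre_tol (default 5.0).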

property ssim_value

We compute the SSIM (structural similarity index) on the visualization of the spatial data.

property ssim_value_fp_fast

Faster implementation than ssim_value_fp_orig

property ssim_value_fp_fast2

Faster implementation than ssim_value_fp_orig. Not recommended: use one of the other SSIM versions instead of this one.

property ssim_value_fp_old

Mimics what zchecker does: the SSIM on the floating-point data with the original constants and no scaling. This will return NaN on POP data.

property ssim_value_fp_orig

We compute the SSIM (structural similarity index) on the spatial data - using the data itself (we do not create an image).

Here we scale the data to [0, 1] and then quantize it to 256 bins.
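The scaling and quantization step can be sketched as a linear map to [0, 1] followed by binning into 256 levels. The binning rule here (round scaled * 255 to the nearest integer) and the handling of constant input are assumptions; ldcpy's exact binning may differ. `scale_and_quantize` is a hypothetical helper.

```python
# Hypothetical sketch: map values linearly to [0, 1] using their own min and
# max, then quantize to integer bins 0..bins-1 by rounding to the nearest bin.
def scale_and_quantize(values, bins=256):
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant field: assumed to map everything to bin 0
        return [0] * len(values)
    scaled = [(v - lo) / span for v in values]
    return [int(s * (bins - 1) + 0.5) for s in scaled]


print(scale_and_quantize([0.0, 0.5, 1.0]))
```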