API Reference

This page provides an auto-generated summary of ldcpy’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

ldcpy Util (ldcpy.util)

ldcpy.util.check_metrics(ds, varname, set1, set2, ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995, **calcs_kwargs)[source]

Check the K-S, Pearson correlation, spatial relative error, and Data SSIM calcs against the given tolerances

Parameters
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • set1 (str) – The collection label of the “control” data

  • set2 (str) – The collection label of the (1st) data to compare

  • ks_tol (float, optional) – The p-value threshold (significance level) for the K-S test (default = .05)

  • pcc_tol (float, optional) – The threshold for the Pearson correlation coefficient (default = .99999)

  • spre_tol (float, optional) – The percentage threshold for failing grid points in the spatial relative error test (default = 5.0).

  • ssim_tol (float, optional) – The threshold for the data ssim test (default = .995)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns

out – Number of failing calcs

Return type

int

Notes

Check the K-S, Pearson Correlation, and Spatial Relative Error calcs from:

A. H. Baker, H. Xu, D. M. Hammerling, S. Li, and J. Clyne, “Toward a Multi-method Approach: Lossy Data Compression for Climate Simulation Data”, in J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, Lecture Notes in Computer Science 10524, pp. 30–42, 2017 (doi:10.1007/978-3-319-67630-2_3).

Check the Data SSIM, which is a modification of SSIM calc from:

A.H. Baker, D.M. Hammerling, and T.L. Turton. “Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data”, Computer Graphics Forum 38(3), June 2019, pp. 517-528 (doi:10.1111/cgf.13707).

Default tolerances for the tests are:

  • K-S: fail if p-value < .05 (significance level)

  • Pearson correlation coefficient: fail if coefficient < .99999

  • Spatial relative error: fail if > 5% of grid points fail relative error

  • Data SSIM: fail if Data SSIM < .995
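The pass/fail rule implied by these default tolerances can be sketched in plain Python. This is an illustration only, not ldcpy's implementation: the helper name count_failing_calcs and its first four argument names are hypothetical, and the four input values are assumed to have been computed already.

```python
def count_failing_calcs(ks_p_value, pcc, spre_percent, data_ssim,
                        ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995):
    """Count how many of the four tests fail (hypothetical helper, not part of ldcpy)."""
    failures = 0
    if ks_p_value < ks_tol:        # K-S: p-value below the significance level
        failures += 1
    if pcc < pcc_tol:              # Pearson correlation coefficient too low
        failures += 1
    if spre_percent > spre_tol:    # too many grid points fail the relative error test
        failures += 1
    if data_ssim < ssim_tol:       # Data SSIM too low
        failures += 1
    return failures


print(count_failing_calcs(0.40, 0.999999, 1.2, 0.9991))  # all four tests pass
print(count_failing_calcs(0.01, 0.99, 12.0, 0.90))       # all four tests fail
```

Passing different tolerance values changes only the thresholds, not the direction of each comparison.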

ldcpy.util.collect_datasets(data_type, varnames, list_of_ds, labels, **kwargs)[source]

Concatenate several different xarray datasets across a new “collection” dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions. (Call this OR open_datasets() before plotting.)

Parameters
  • data_type (string) – Current data types: cam-fv, pop

  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_ds (list) – The datasets to be concatenated into a collection

  • labels (list) – The respective label to access data from each dataset (also used in plotting fcns)

  • **kwargs (optional) – Additional arguments passed on to xarray.concat(). A list of available arguments can be found here: https://xarray-test.readthedocs.io/en/latest/generated/xarray.concat.html

Returns

out – a collection containing all the data from the list of datasets

Return type

xarray.Dataset

ldcpy.util.compare_stats(ds, varname: str, sets, significant_digits: int = 5, include_ssim: bool = False, weighted: bool = True, **calcs_kwargs)[source]

Print error summary statistics for multiple DataArrays (should just be a single time slice)

Parameters
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • sets (list of str) – The labels of the collection to compare (all will be compared to the first set)

  • significant_digits (int, optional) – The number of significant digits to use when printing stats (default 5)

  • include_ssim (bool, optional) – Whether or not to compute the image ssim - slow for 3D vars (default: False)

  • weighted (bool, optional) – Whether or not to weight the means (default = True)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns

out

Return type

None

ldcpy.util.open_datasets(data_type, varnames, list_of_files, labels, weights=True, **kwargs)[source]

Open several different netCDF files, concatenate across a new ‘collection’ dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions.

Parameters
  • data_type (string) – Current data types: cam-fv, pop

  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_files (list) – The file paths for the netCDF file(s) to be opened

  • labels (list) – The respective label to access data from each netCDF file (also used in plotting fcns)

  • **kwargs (optional) – Additional arguments passed on to xarray.open_mfdataset(). A list of available arguments can be found here: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html

Returns

out – a collection containing all the data from the list of files

Return type

xarray.Dataset

ldcpy.util.save_metrics(full_ds, varname, set1, set2, time=0, lev=0, location='names.csv')[source]

Parameters
  • full_ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • set1 (str) – The collection label of the “control” data

  • set2 (str) – The collection label of the (1st) data to compare

  • time (int, optional) – The time index used (default = 0)

  • lev (int, optional) – The level index of interest in a 3D dataset (default 0)

Returns

out

Return type

Number of failing metrics

ldcpy.util.subset_data(ds, subset=None, lat=None, lon=None, lev=None, start=None, end=None, time_dim_name='time', vertical_dim_name=None, lat_coord_name=None, lon_coord_name=None)[source]

Get a subset of the given DataArray. Returns a DataArray.

ldcpy Plot (ldcpy.plot)

class ldcpy.plot.calcsPlot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, approx_lat=None, approx_lon=None, lev=0, color='coolwarm', standardized_err=False, quantile=None, contour_levs=24, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True, basic_plot=False, cmax=None, cmin=None)[source]

This class contains code to plot calcs in an xarray Dataset that has either ‘lat’ and ‘lon’ dimensions, or a ‘time’ dimension.

time_series_plot(da_sets, titles)[source]

time series plot

ldcpy.plot.plot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, lat=None, lon=None, lev=0, color='coolwarm', quantile=None, start=None, end=None, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True, basic_plot=False, cmax=None, cmin=None)[source]

Plots the data given an xarray dataset

Parameters
  • ds (xarray.Dataset) – The input dataset

  • varname (str) – The name of the variable to be plotted

  • calc (str) –

    The name of the calc to be plotted. It must match a property name in the Datasetcalcs class in ldcpy.calcs; for more information about the available calcs, see ldcpy.calcs.Datasetcalcs. Acceptable values include:

    • ns_con_var

    • ew_con_var

    • mean

    • std

    • variance

    • prob_positive

    • prob_negative

    • odds_positive

    • zscore

    • mean_abs

    • mean_squared

    • rms

    • sum

    • sum_squared

    • corr_lag1

    • quantile

    • lag1

    • standardized_mean

    • ann_harmonic_ratio

    • pooled_variance_ratio

  • sets (list <str>) – The labels of the dataset to gather calcs from

  • group_by (str) –

    how to group the data in time series plots. Valid groupings:

    • time.day

    • time.dayofyear

    • time.month

    • time.year

  • scale (str, optional) –

    time-series y-axis plot transformation. (default “linear”) Valid options:

    • linear

    • log

  • calc_type (str, optional) –

    The type of operation to be performed on the calcs. (default ‘raw’) Valid options:

    • raw: the unaltered calc values

    • diff: the difference between the calc values in the first set and every other set

    • ratio: the ratio of the calc values in (2nd, 3rd, 4th… sets/1st set)

    • calc_of_diff: the calc value computed on the difference between the first set and every other set

  • plot_type (str , optional) –

    The type of plot to be created. (default ‘spatial’) Valid options:

    • spatial: a plot of the world with values at each lat and lon point (takes the mean across the time dimension)

    • time-series: A time-series plot of the data (computed by taking the mean across the lat and lon dimensions)

    • histogram: A histogram of the time-series data

  • transform (str, optional) –

    data transformation. (default ‘none’) Valid options:

    • none

    • log

  • subset (str, optional) –

    subset of the data to gather calcs on (default None). Valid options:

    • first5: the first 5 days of data

    • DJF: data from the months December, January, February

    • MAM: data from the months March, April, May

    • JJA: data from the months June, July, August

    • SON: data from the months September, October, November

  • lat (float, optional) – The latitude of the data to gather calcs on (default None).

  • lon (float , optional) – The longitude of the data to gather calcs on (default None).

  • lev (float, optional) – The level of the data to gather calcs on (used if plotting from a 3d data set), (default 0).

  • color (str, optional) – The color scheme for spatial plots, (default ‘coolwarm’). see https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html for more options

  • quantile (float, optional) – A value between 0 and 1 required if calc=”quantile”, corresponding to the desired quantile to gather, (default 0.5).

  • start (int, optional) – A value between 0 and the number of time slices indicating the start time of a subset, (default None).

  • end (int, optional) – A value between 0 and the number of time slices indicating the end time of a subset, (default None)

  • calc_ssim (bool, optional) – Whether or not to calculate the ssim (structural similarity index) between two plots (only applies to plot_type = ‘spatial’), (default False)

  • short_title (bool, optional) – If True, use a shortened title in the plot output (default False).

  • axes_symmetric (bool, optional) – Whether or not to make the colorbar axes symmetric about zero (used in a spatial plot) (default False)

  • legend_loc (str, optional) – The location to put the legend in a time-series plot in single-column format (plot_type = “time_series”, vert_plot=True) (default “upper right”)

  • vert_plot (bool, optional) – If true, forces plots into a single column format and enlarges text. (default False)

  • tex_format (bool, optional) – Whether to interpret all plot output strings as latex formatting (default False)

  • legend_offset (2-tuple, optional) – The x- and y- offset of the legend. Moves the corner of the legend specified by legend_loc to the specified location (where (0,0) is the bottom left corner of the plot and (1,1) is the top right corner). Only affects time-series, histogram, and periodogram plots.

Returns

out

Return type

None
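The four calc_type options listed above for ldcpy.plot.plot can be sketched on plain Python lists. This is a hedged illustration, not ldcpy code: apply_calc_type is a hypothetical helper, the built-in max stands in for a calc, and the sign convention for diff (first set minus each other set) is an assumption.

```python
def apply_calc_type(calc, sets, calc_type="raw"):
    """Apply `calc` to each set according to `calc_type` (hypothetical helper).

    `sets` is a list of equal-length lists; the first one is the control.
    """
    first, others = sets[0], sets[1:]
    if calc_type == "raw":
        # The unaltered calc value of every set.
        return [calc(s) for s in sets]
    if calc_type == "diff":
        # Assumed direction: calc of the first set minus calc of each other set.
        return [calc(first) - calc(s) for s in others]
    if calc_type == "ratio":
        # Calc of each other set divided by calc of the first set.
        return [calc(s) / calc(first) for s in others]
    if calc_type == "calc_of_diff":
        # Calc computed on the pointwise difference (first minus other).
        return [calc([a - b for a, b in zip(first, s)]) for s in others]
    raise ValueError(f"unknown calc_type: {calc_type}")


control = [2.0, 4.0, 6.0]
compressed = [1.0, 5.0, 4.0]
for kind in ("raw", "diff", "ratio", "calc_of_diff"):
    print(kind, apply_calc_type(max, [control, compressed], kind))
```

Note that diff and calc_of_diff generally give different results: the first subtracts two calc values, the second applies the calc to the differences.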

ldcpy.plot.tex_escape(text)[source]
Parameters

text – a plain text message

Returns

the message escaped to appear correctly in LaTeX
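As a rough illustration of what escaping text for LaTeX involves, the sketch below replaces each LaTeX special character in a single pass. It is not the body of tex_escape; the function name escape_latex and the replacement table are assumptions chosen for this example.

```python
import re

# LaTeX special characters and one common way to escape each (assumed table).
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\^{}",
}
_PATTERN = re.compile("|".join(re.escape(ch) for ch in _LATEX_SPECIALS))


def escape_latex(text):
    """Escape LaTeX special characters in plain text (hypothetical helper)."""
    # A single regex pass ensures inserted backslashes are not escaped again.
    return _PATTERN.sub(lambda match: _LATEX_SPECIALS[match.group()], text)


print(escape_latex("50% of A_B & C"))
```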

ldcpy Calcs (ldcpy.calcs)

class ldcpy.calcs.Datasetcalcs(ds: DataArray, data_type: str, aggregate_dims: list, time_dim_name: str = 'time', lat_dim_name: Optional[str] = None, lon_dim_name: Optional[str] = None, vert_dim_name: Optional[str] = None, lat_coord_name: Optional[str] = None, lon_coord_name: Optional[str] = None, q: float = 0.5, weighted=True)[source]

This class contains calcs for each point of a dataset after aggregating across one or more dimensions, and a method to access these calcs. Expects a DataArray.

get_calc(name: str, q: Optional[int] = 0.5, grouping: Optional[str] = None, ddof=1)[source]

Gets a calc aggregated across one or more dimensions of the dataset

Parameters
  • name (str) – The name of the calc (must be identical to a property name)

  • q (float, optional) – The quantile to compute when name is “quantile” (default 0.5)

Returns

out – A DataArray of the same size and dimensions as the original DataArray, minus those dimensions that were aggregated across.

Return type

xarray.DataArray

get_single_calc(name: str)[source]

Gets a calc consisting of a single float value

Parameters

name (str) – the name of the calc (must be identical to a property name)

Returns

out – The calc value

Return type

float

property annual_harmonic_relative_ratio: DataArray

The annual harmonic relative to the average periodogram value in a neighborhood of 50 frequencies around the annual frequency. NOTE: This assumes the values along the “time” dimension are equally spaced. NOTE: This calc returns a lat-lon array regardless of aggregate dimensions, so it can only be used in a spatial plot.

property annual_harmonic_relative_ratio_pct_sig: ndarray

The percentage of points past the significance cutoff (p value <= 0.01) for the annual harmonic relative to the average periodogram value in a neighborhood of 50 frequencies around the annual frequency

property cdf: DataArray

The empirical CDF of the dataset.
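For readers unfamiliar with the term, an empirical CDF assigns to each sorted value the fraction of data points less than or equal to it. The following is a minimal sketch on a plain list, assuming no repeated values; empirical_cdf is a hypothetical helper and does not reflect how the property is computed in ldcpy.

```python
def empirical_cdf(values):
    """Return (sorted_values, cumulative_fractions) for a sequence of numbers."""
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (1-based) has cumulative fraction i / n.
    return xs, [(i + 1) / n for i in range(n)]


xs, fractions = empirical_cdf([3.0, 1.0, 2.0, 4.0])
print(xs)         # [1.0, 2.0, 3.0, 4.0]
print(fractions)  # [0.25, 0.5, 0.75, 1.0]
```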

property entropy: DataArray

An estimate for the entropy of the data (using gzip). Lower is better (1.0 means random, i.e. no compression possible).
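One way to read a gzip-based entropy estimate is as a compression ratio: compressed size divided by raw size. The sketch below assumes that interpretation; compression_ratio is a hypothetical helper, and packing the values as 8-byte doubles is a choice made for this example, not a description of ldcpy's code.

```python
import gzip
import random
import struct


def compression_ratio(values):
    """Compressed size divided by raw size for a list of floats (hypothetical helper)."""
    raw = struct.pack(f"{len(values)}d", *values)  # 8 bytes per value
    return len(gzip.compress(raw)) / len(raw)


rng = random.Random(0)  # seeded so the example is repeatable
constant = [1.0] * 1000
noisy = [rng.random() for _ in range(1000)]

print(compression_ratio(constant))  # close to 0: highly compressible
print(compression_ratio(noisy))     # much closer to 1: little structure to exploit
```

Because gzip adds a small header, the ratio for incompressible data can slightly exceed 1.0.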

property ew_con_var: DataArray

The East-West Contrast Variance averaged along the aggregate dimensions

property lag1: DataArray

The deseasonalized lag-1 autocorrelation value by day of year. NOTE: This calc returns an array of spatial values regardless of aggregate dimensions, so it can only be plotted in a spatial plot.

property lag1_first_difference: DataArray

The deseasonalized lag-1 autocorrelation value of the first difference of the data by day of year. NOTE: This calc returns an array of spatial values regardless of aggregate dimensions, so it can only be plotted in a spatial plot.

property lat_autocorr: DataArray

Autocorrelation: the correlation of a variable with itself shifted in the latitude dimension

property lev_autocorr: DataArray

Autocorrelation: the correlation of a variable with itself shifted in the vertical dimension

property lon_autocorr: DataArray

Autocorrelation: the correlation of a variable with itself shifted in the longitude dimension
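The idea behind the three autocorrelation properties, correlating a variable with a shifted copy of itself, can be sketched in one dimension. This is a simplified illustration: lag_autocorr and pearson are hypothetical helpers, the shift is one index with no wraparound, and ldcpy's handling of grid edges and weights is not represented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_autocorr(series, lag=1):
    """Correlate a series with itself shifted by `lag` positions (no wraparound)."""
    return pearson(series[:-lag], series[lag:])


print(lag_autocorr([1.0, 2.0, 3.0, 4.0, 5.0]))    # smooth trend: strongly positive
print(lag_autocorr([1.0, -1.0, 1.0, -1.0, 1.0]))  # alternating values: strongly negative
```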

property mae_day_max: DataArray

The day of maximum mean absolute value at the point. NOTE: only available in spatial and spatial comparison plots

property mean: DataArray

The mean along the aggregate dimensions

property mean_abs: DataArray

The mean of the absolute errors along the aggregate dimensions

property mean_squared: DataArray

The square of the mean along the aggregate dimensions

property most_repeated: DataArray

Most repeated value in dataset

property most_repeated_percent: DataArray

The percentage of values in the dataset that are equal to the most repeated value
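A small sketch of finding the most repeated value and how often it occurs, using only the standard library. The helper name most_repeated_with_percent is hypothetical, and reporting the frequency on a 0 to 100 scale is an assumption based on the property name most_repeated_percent.

```python
from collections import Counter


def most_repeated_with_percent(values):
    """Return (value, percent) for the most frequent value (hypothetical helper)."""
    value, count = Counter(values).most_common(1)[0]
    return value, 100.0 * count / len(values)


print(most_repeated_with_percent([0.0, 0.0, 1.5, 0.0, 2.5]))  # (0.0, 60.0)
```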

property n_s_first_differences: DataArray

First differences along the north-south direction

property ns_con_var: DataArray

The North-South Contrast Variance averaged along the aggregate dimensions

property num_negative: DataArray

The number of points that are negative

property num_positive: DataArray

The number of points that are positive

property num_zero: DataArray

The number of points that are zero

property odds_positive: DataArray

The odds that a point is positive = prob_positive/(1-prob_positive)
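The relationship between prob_positive and odds_positive stated above can be checked with a short example. The helper positive_prob_and_odds is hypothetical and works on a plain list rather than a DataArray; returning infinity when every value is positive is a choice made for this sketch.

```python
import math


def positive_prob_and_odds(values):
    """Return (prob_positive, odds_positive) for a list of numbers (hypothetical helper)."""
    prob = sum(1 for v in values if v > 0) / len(values)
    # odds = prob / (1 - prob); undefined when prob == 1, reported here as infinity.
    odds = math.inf if prob == 1 else prob / (1 - prob)
    return prob, odds


print(positive_prob_and_odds([1.0, -2.0, 3.0, 4.0]))  # (0.75, 3.0)
```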

property percent_unique: DataArray

Percentage of unique values in the dataset

property pooled_variance: DataArray

The overall variance of the dataset

property pooled_variance_ratio: DataArray

The pooled variance along the aggregate dimensions

property prob_negative: DataArray

The probability that a point is negative

property prob_positive: DataArray

The probability that a point is positive

property range: DataArray

The range of the dataset

property root_mean_squared: DataArray

The root mean square along the aggregate dimensions

property standardized_mean: DataArray

The mean at each point along the aggregate dimensions divided by the standard deviation. NOTE: will always be 0 if aggregating over all dimensions

property std: DataArray

The standard deviation along the aggregate dimensions

property variance: DataArray

The variance along the aggregate dimensions

property w_e_derivative: DataArray

Derivative of dataset from west-east

property w_e_first_differences: DataArray

First differences along the west-east direction
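First differences along the west-east direction amount to subtracting neighboring values along each row of constant latitude. The sketch below assumes a nested list indexed as grid[lat][lon], takes the eastern value minus the western one, and does not wrap around at the last longitude; all three are assumptions of this example, and the helper name is hypothetical.

```python
def west_east_first_differences(grid):
    """First differences along each row; grid[i][j] is latitude i, longitude j."""
    return [
        [row[j + 1] - row[j] for j in range(len(row) - 1)]
        for row in grid
    ]


grid = [
    [1.0, 4.0, 9.0],
    [2.0, 2.0, 5.0],
]
print(west_east_first_differences(grid))  # [[3.0, 5.0], [0.0, 3.0]]
```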

property zscore: DataArray

The z-score of a point averaged along the aggregate dimensions under the null hypothesis that the true mean is zero. NOTE: currently assumes we are aggregating along the time dimension so is only suitable for a spatial plot.
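For a single grid point, a z-score under the null hypothesis of zero mean is commonly computed as the sample mean divided by its standard error. The sketch below uses that textbook form on a plain list of time samples; zscore_zero_mean is a hypothetical helper, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption rather than a statement about ldcpy's internals.

```python
import math
from statistics import mean, stdev


def zscore_zero_mean(samples):
    """z = mean / (std / sqrt(n)) for one point's time samples (hypothetical helper)."""
    n = len(samples)
    standard_error = stdev(samples) / math.sqrt(n)  # sample std, n - 1 denominator
    return mean(samples) / standard_error


print(zscore_zero_mean([1.0, 2.0, 3.0]))
```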

property zscore_cutoff: ndarray

The Z-Score cutoff for a point to be considered significant

property zscore_percent_significant: ndarray

The percent of points where the zscore is considered significant

class ldcpy.calcs.Diffcalcs(ds1: DataArray, ds2: DataArray, data_type: str, aggregate_dims: Optional[list] = None, spre_tol: float = 0.0001, k1: float = 0.01, k2: float = 0.03, **calcs_kwargs)[source]

This class contains calcs on the overall dataset that require more than one input dataset to compute

get_diff_calc(name: str, color: Optional[str] = 'coolwarm')[source]

Gets a calc on the dataset that requires more than one input dataset

Parameters

name (str) – The name of the calc (must be identical to a property name)

Returns

out

Return type

float

property covariance: DataArray

The covariance between the two datasets

property ks_p_value

The Kolmogorov-Smirnov p-value

property max_spatial_rel_error

We compute the relative error at each grid point and return the maximum.

property normalized_max_pointwise_error

The absolute value of the maximum pointwise difference, normalized by the range of values for the first set
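Read literally, this description corresponds to the largest absolute pointwise difference divided by the range (max minus min) of the first set. A minimal sketch on plain lists, with a hypothetical helper name and no handling of a zero range:

```python
def normalized_max_pointwise_error(first, second):
    """Largest |first - second| divided by the range of `first` (hypothetical helper)."""
    max_diff = max(abs(a - b) for a, b in zip(first, second))
    value_range = max(first) - min(first)
    return max_diff / value_range


print(normalized_max_pointwise_error([0.0, 5.0, 10.0], [0.0, 6.0, 10.0]))  # 0.1
```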

property normalized_root_mean_squared

The root mean squared error along the aggregate dimensions, normalized by the range of values for the first set

property pearson_correlation_coefficient

The Pearson correlation coefficient between the two datasets

property spatial_rel_error

At each grid point, we compute the relative error. Then we report the percentage of grid points whose relative error is above the specified tolerance (1e-4 by default).
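The procedure described here can be sketched on flat lists of grid values. The details are assumptions of this example rather than ldcpy's exact behavior: the relative error is taken relative to the first (control) value, grid points where the control value is zero are skipped, and the helper name is hypothetical.

```python
def spatial_rel_error_percent(control, test, tol=1e-4):
    """Percentage of grid points whose relative error exceeds `tol` (hypothetical helper)."""
    # Assumption: skip points where the control value is zero to avoid dividing by zero.
    pairs = [(c, t) for c, t in zip(control, test) if c != 0]
    failing = sum(1 for c, t in pairs if abs((c - t) / c) > tol)
    return 100.0 * failing / len(pairs)


control = [1.0, 2.0, 4.0, 5.0]
test = [1.0, 2.1, 4.0, 5.0000001]
print(spatial_rel_error_percent(control, test))  # 25.0: only the second point exceeds 1e-4
```

In check_metrics, the resulting percentage is what gets compared against spre_tol (5.0 by default).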

property ssim_value

We compute the SSIM (structural similarity index) on the visualization of the spatial data. This creates two plots and uses the standard SSIM.

property ssim_value_fp_fast

Faster implementation than ssim_value_fp_slow (this is the default DSSIM option).

property ssim_value_fp_slow

We compute the SSIM (structural similarity index) on the spatial data, using the data itself (we do not create an image). This is the slower non-matrix implementation, which is good for experimenting but not intended for use in practice.