ldcpy package

Submodules

ldcpy.metrics module

ldcpy.plot module

class ldcpy.plot.calcsPlot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, approx_lat=None, approx_lon=None, lev=0, color='coolwarm', standardized_err=False, quantile=None, contour_levs=24, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=True, basic_plot=False, cmax=None, cmin=None)[source]

Bases: object

This class contains code to plot calcs in an xarray Dataset that has either ‘lat’ and ‘lon’ dimensions, or a ‘time’ dimension.

get_calc_label(calc, data, data_type)[source]
get_calcs(da, data_type)[source]
get_plot_data(raw_data_1, raw_data_2=None)[source]
get_title(calc_name, c_name=None)[source]
hist_plot(plot_data, title)[source]
periodogram_plot(plot_data, title)[source]
plot_1d(plot_data, title)[source]
spatial_plot(da_sets, titles, data_type)[source]
time_series_plot(da_sets, titles, time_dim)[source]

time series plot

verify_plot_parameters()[source]
ldcpy.plot.plot(ds, varname, calc, sets, group_by=None, scale='linear', calc_type='raw', plot_type='spatial', transform='none', subset=None, lat=None, lon=None, lev=0, color='coolwarm', quantile=None, start=None, end=None, short_title=False, axes_symmetric=False, legend_loc='upper right', vert_plot=False, tex_format=False, legend_offset=None, weighted=None, basic_plot=False, cmax=None, cmin=None)[source]

Plots the data given an xarray dataset

Parameters:
  • ds (xarray.Dataset) – The input dataset

  • varname (str) – The name of the variable to be plotted

  • calc (str) –

    The name of the calc to be plotted. It must match a property name in the Datasetcalcs class in ldcpy.plot; for more information about the available calcs, see ldcpy.Datasetcalcs. Acceptable values include:

    • ns_con_var

    • ew_con_var

    • mean

    • std

    • variance

    • prob_positive

    • prob_negative

    • odds_positive

    • zscore

    • mean_abs

    • mean_squared

    • rms

    • sum

    • sum_squared

    • quantile

    • lag1

    • standardized_mean

    • ann_harmonic_ratio

    • pooled_variance_ratio

  • sets (list <str>) – The labels of the dataset to gather calcs from

  • group_by (str) –

    how to group the data in time series plots. Valid groupings:

    • time.day

    • time.dayofyear

    • time.month

    • time.year

  • scale (str, optional) –

    time-series y-axis plot transformation. (default “linear”) Valid options:

    • linear

    • log

  • calc_type (str, optional) –

    The type of operation to be performed on the calcs. (default ‘raw’) Valid options:

    • raw: the unaltered calc values

    • diff: the difference between the calc values in the first set and every other set

    • ratio: the ratio of the calc values in (2nd, 3rd, 4th… sets/1st set)

    • calc_of_diff: the calc value computed on the difference between the first set and every other set

  • plot_type (str, optional) –

    The type of plot to be created. (default ‘spatial’) Valid options:

    • spatial: a plot of the world with values at each lat and lon point (takes the mean across the time dimension)

    • time-series: A time-series plot of the data (computed by taking the mean across the lat and lon dimensions)

    • histogram: A histogram of the time-series data

  • transform (str, optional) –

    data transformation. (default ‘none’) Valid options:

    • none

    • log

  • subset (str, optional) –

    subset of the data to gather calcs on (default None). Valid options:

    • first5: the first 5 days of data

    • DJF: data from the months December, January, February

    • MAM: data from the months March, April, May

    • JJA: data from the months June, July, August

    • SON: data from the months September, October, November

  • lat (float, optional) – The latitude of the data to gather calcs on (default None).

  • lon (float, optional) – The longitude of the data to gather calcs on (default None).

  • lev (float, optional) – The level of the data to gather calcs on (used if plotting from a 3d data set), (default 0).

  • color (str, optional) – The color scheme for spatial plots (default ‘coolwarm’). See https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html for more options.

  • quantile (float, optional) – A value between 0 and 1 required if calc="quantile", corresponding to the desired quantile to gather (default 0.5).

  • start (int, optional) – A value between 0 and the number of time slices indicating the start time of a subset, (default None).

  • end (int, optional) – A value between 0 and the number of time slices indicating the end time of a subset, (default None)

  • calc_ssim (bool, optional) – Whether or not to calculate the ssim (structural similarity index) between two plots (only applies to plot_type = ‘spatial’), (default False)

  • short_title (bool, optional) – If True, use a shortened title in the plot output (default False).

  • axes_symmetric (bool, optional) – Whether or not to make the colorbar axes symmetric about zero (used in a spatial plot) (default False)

  • legend_loc (str, optional) – The location to put the legend in a time-series plot in single-column format (plot_type = “time_series”, vert_plot=True) (default “upper right”)

  • vert_plot (bool, optional) – If true, forces plots into a single column format and enlarges text. (default False)

  • tex_format (bool, optional) – Whether to interpret all plot output strings as latex formatting (default False)

  • legend_offset (2-tuple, optional) – The x- and y-offset of the legend. Moves the corner of the legend specified by legend_loc to the specified location (where (0,0) is the bottom left corner of the plot and (1,1) is the top right corner). Only affects time-series, histogram, and periodogram plots.

Returns:

out

Return type:

None
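The four calc_type options can be illustrated with a small pure-Python sketch. This is not ldcpy's implementation: the helper names apply_calc_type and mean_squared are hypothetical, the sets are plain lists rather than xarray data, and the sign convention for diff (first set minus each other set) is an assumption.

```python
def mean_squared(values):
    """A stand-in calc: the mean of the squared values."""
    return sum(v * v for v in values) / len(values)


def apply_calc_type(calc, sets, calc_type="raw"):
    """Combine a calc across sets; sets[0] plays the role of the first set."""
    first, others = sets[0], sets[1:]
    if calc_type == "raw":
        # The unaltered calc value of every set.
        return [calc(s) for s in sets]
    if calc_type == "diff":
        # Assumed direction: calc(first set) minus calc(other set).
        return [calc(first) - calc(s) for s in others]
    if calc_type == "ratio":
        # calc(2nd, 3rd, ... set) divided by calc(1st set).
        return [calc(s) / calc(first) for s in others]
    if calc_type == "calc_of_diff":
        # The calc applied to the element-wise difference of the data.
        return [calc([a - b for a, b in zip(first, s)]) for s in others]
    raise ValueError(f"unknown calc_type: {calc_type!r}")


orig = [2.0, 4.0]
lossy = [1.0, 4.0]
for kind in ("raw", "diff", "ratio", "calc_of_diff"):
    print(kind, apply_calc_type(mean_squared, [orig, lossy], kind))
```

Note that diff and calc_of_diff generally give different results for a nonlinear calc such as mean_squared, which is why both options exist.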

ldcpy.plot.tex_escape(text)[source]
Parameters:

text – a plain text message

Returns:

the message escaped to appear correctly in LaTeX
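As an illustration of what escaping for LaTeX involves, here is a minimal sketch. It is not the source of ldcpy.plot.tex_escape: the function name escape_for_latex and the exact replacement table are assumptions chosen for the example.

```python
import re

# Characters with special meaning in LaTeX and an escaped form for each.
# This table is an assumption made for illustration.
_TEX_SPECIALS = {
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\^{}",
    "\\": r"\textbackslash{}",
}

# One alternation over all special characters, so the text is scanned once
# and the backslashes and braces we insert are never escaped a second time.
_PATTERN = re.compile("|".join(re.escape(ch) for ch in _TEX_SPECIALS))


def escape_for_latex(text):
    """Return `text` with LaTeX special characters replaced by escapes."""
    return _PATTERN.sub(lambda match: _TEX_SPECIALS[match.group()], text)


print(escape_for_latex("50% of a_b & more"))
```

Doing the substitution in a single pass matters: replacing characters one at a time would re-escape the braces introduced by earlier replacements.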

ldcpy.util module

ldcpy.util.check_metrics(ds, varname, set1, set2, ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995, **calcs_kwargs)[source]

Check the K-S, Pearson Correlation, and Spatial Relative Error calcs

Parameters:
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • set1 (str) – The collection label of the “control” data

  • set2 (str) – The collection label of the (1st) data to compare

  • ks_tol (float, optional) – The p-value threshold (significance level) for the K-S test (default = .05)

  • pcc_tol (float, optional) – The threshold for the Pearson correlation coefficient (default = .99999)

  • spre_tol (float, optional) – The percentage threshold for failing grid points in the spatial relative error test (default = 5.0).

  • ssim_tol (float, optional) – The threshold for the data ssim test (default = .995)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns:

out

Return type:

Number of failing calcs

Notes

Check the K-S, Pearson Correlation, and Spatial Relative Error calcs from:

A. H. Baker, H. Xu, D. M. Hammerling, S. Li, and J. Clyne, “Toward a Multi-method Approach: Lossy Data Compression for Climate Simulation Data”, in J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, Lecture Notes in Computer Science 10524, pp. 30–42, 2017 (doi:10.1007/978-3-319-67630-2_3).

Check the Data SSIM, which is a modification of SSIM calc from:

A.H. Baker, D.M. Hammerling, and T.L. Turton. “Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data”, Computer Graphics Forum 38(3), June 2019, pp. 517-528 (doi:10.1111/cgf.13707).

Default tolerances for the tests are:

  • K-S: fail if p-value < .05 (significance level)

  • Pearson correlation coefficient: fail if coefficient < .99999

  • Spatial relative error: fail if > 5% of grid points fail relative error

  • Data SSIM: fail if Data SSIM < .995
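The pass/fail rules above can be sketched as a small decision function. This is a hedged illustration only: count_failures is a hypothetical helper that takes already-computed test values, whereas check_metrics computes them from the dataset itself.

```python
def count_failures(ks_p_value, pcc, spre_percent, data_ssim,
                   ks_tol=0.05, pcc_tol=0.99999, spre_tol=5.0, ssim_tol=0.995):
    """Apply the documented default tolerances to precomputed test values.

    Returns the number of failing tests and their names.
    """
    failed = {
        # K-S: fail if the p-value is below the significance level.
        "K-S": ks_p_value < ks_tol,
        # Pearson: fail if the coefficient is below the threshold.
        "Pearson correlation": pcc < pcc_tol,
        # Spatial relative error: fail if too many grid points fail (percent).
        "Spatial relative error": spre_percent > spre_tol,
        # Data SSIM: fail if the value is below the threshold.
        "Data SSIM": data_ssim < ssim_tol,
    }
    names = [name for name, is_bad in failed.items() if is_bad]
    return len(names), names


print(count_failures(ks_p_value=0.01, pcc=0.9999, spre_percent=6.0, data_ssim=0.999))
```

The comparison directions differ by test: three tests fail when the value is too low, while the spatial relative error fails when the percentage is too high.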

ldcpy.util.collect_datasets(data_type, varnames, list_of_ds, labels, coords_ds=None, file_sizes=None, **kwargs)[source]

Concatenate several different xarray datasets across a new “collection” dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions. (Call this OR open_datasets() before plotting.)

Parameters:
  • data_type (string) – Current data types: cam-fv, pop, wrf

  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_ds (list) – The xarray datasets to be concatenated into a collection

  • labels (list) – The respective label to access data from each dataset (also used in plotting fcns)

  • coords_ds (xarray dataset) – (optional) Specify an additional file that contains lat/lon coords (common for WRF data)

  • file_sizes (list) – (optional) Sizes of the files that each dataset corresponds to (used to print in the compare_stats table)

  • **kwargs – (optional) Additional arguments passed on to xarray.concat(). A list of available arguments can be found here: https://xarray-test.readthedocs.io/en/latest/generated/xarray.concat.html

Returns:

out – a collection containing all the data from the list datasets

Return type:

xarray.Dataset

Notes

  • WRF data must be postprocessed with xWRF before passing to ldcpy (e.g., ds = xr.open_dataset(wrf_file, engine="netcdf4").xwrf.postprocess()).

  • For now, lat/lon info must be in the same file!

ldcpy.util.combine_datasets(ds_list)[source]
ldcpy.util.compare_stats(ds, varname: str, sets, significant_digits: int = 5, include_ssim: bool = False, weighted: bool = True, **calcs_kwargs)[source]

Print error summary statistics for multiple DataArrays (should just be a single time slice)

Parameters:
  • ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • sets (list of str) – The labels of the collection to compare (all will be compared to the first set)

  • significant_digits (int, optional) – The number of significant digits to use when printing stats (default 5)

  • include_ssim (bool, optional) – Whether or not to compute the image ssim - slow for 3D vars (default: False)

  • weighted (bool, optional) – Whether or not to weight the means (default = True)

  • **calcs_kwargs – Additional keyword arguments passed through to the Datasetcalcs instance.

Returns:

out

Return type:

None
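To make the weighted and significant_digits options concrete, the sketch below contrasts a weighted and an unweighted mean and formats the result to a fixed number of significant digits. Both helpers are hypothetical, and the cosine-of-latitude weights are an assumption (a common choice for regular lat/lon grids); this page does not say which weights or which formatting ldcpy uses internally.

```python
import math


def weighted_mean(values, lats_deg):
    """Mean of `values` weighted by cos(latitude) (assumed weighting)."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def format_stat(value, significant_digits=5):
    """Format a number with the requested count of significant digits."""
    return f"{value:.{significant_digits}g}"


values = [10.0, 40.0]
lats = [0.0, 60.0]  # the point at 60 degrees gets about half the weight
print("unweighted:", format_stat(sum(values) / len(values)))
print("weighted:  ", format_stat(weighted_mean(values, lats)))
```

Grid cells near the poles cover less area than cells near the equator, so an unweighted mean over a regular grid overstates the influence of high-latitude values.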

ldcpy.util.open_datasets(data_type, varnames, list_of_files, labels, weights=True, **kwargs)[source]

Open several different netCDF files, concatenate across a new ‘collection’ dimension, which can be accessed with the specified labels. Stores them in an xarray dataset which can be passed to the ldcpy plot functions.

Parameters:
  • data_type (string) – Current data types: cam-fv, pop

  • varnames (list) – The variable(s) of interest to combine across input files (usually just one)

  • list_of_files (list) – The file paths for the netCDF file(s) to be opened

  • labels (list) – The respective label to access data from each netCDF file (also used in plotting fcns)

  • **kwargs – (optional) Additional arguments passed on to xarray.open_mfdataset(). A list of available arguments can be found here: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html

Returns:

out – a collection containing all the data from the list of files

Return type:

xarray.Dataset

Notes

WRF netCDF data must be postprocessed with xWRF, e.g. ds = xr.open_dataset(wrf_file, engine="netcdf4").xwrf.postprocess(), so use collect_datasets() instead of this function for WRF data.
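A minimal usage sketch that ties open_datasets to the plotting function follows. It uses only the signatures shown on this page; the file paths, the variable name "TS", and the labels "orig" and "lossy" are hypothetical placeholders, and the import paths are taken from the module names listed here.

```python
def plot_mean_difference(orig_path, lossy_path, varname="TS"):
    """Open two CAM-FV files as one collection and plot the difference of means.

    The imports are done lazily so that this sketch can be loaded even where
    ldcpy is not installed.
    """
    from ldcpy.plot import plot
    from ldcpy.util import open_datasets

    labels = ["orig", "lossy"]  # hypothetical collection labels

    # Concatenate both files across the new 'collection' dimension.
    ds = open_datasets("cam-fv", [varname], [orig_path, lossy_path], labels)

    # Spatial plot of mean(orig) compared with mean(lossy) as a difference.
    plot(ds, varname, calc="mean", sets=labels,
         calc_type="diff", plot_type="spatial")
    return ds


# Example call (paths are placeholders, not real files):
# ds = plot_mean_difference("orig.TS.nc", "lossy.TS.nc")
```

The returned dataset can then be passed to compare_stats or check_metrics with the same labels.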

ldcpy.util.save_metrics(full_ds, varname, set1, set2, time=0, lev=0, location='names.csv')[source]
Parameters:
  • full_ds (xarray.Dataset) – An xarray dataset containing multiple netCDF files concatenated across a ‘collection’ dimension

  • varname (str) – The variable of interest in the dataset

  • set1 (str) – The collection label of the “control” data

  • set2 (str) – The collection label of the (1st) data to compare

  • time (int, optional) – The time index used (default = 0)

  • lev (int, optional) – The level index of interest in a 3D dataset (default 0)

Returns:

out

Return type:

Number of failing metrics

ldcpy.util.subset_data(ds, subset=None, lat=None, lon=None, lev=None, start=None, end=None, time_dim_name=None, vertical_dim_name=None, lat_coord_name=None, lon_coord_name=None)[source]

Get a subset of the given DataArray. Returns a DataArray.
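The subset, start, and end options documented for ldcpy.plot.plot suggest how time subsetting works. The sketch below models it on a plain list of dates. It is an illustration under stated assumptions, not the library's code: select_time_indices is a hypothetical helper, start/end are treated like a Python slice (end exclusive), and first5 assumes one time slice per day.

```python
from datetime import date, timedelta

# Months belonging to each seasonal subset named in the documentation.
SEASON_MONTHS = {
    "DJF": {12, 1, 2},
    "MAM": {3, 4, 5},
    "JJA": {6, 7, 8},
    "SON": {9, 10, 11},
}


def select_time_indices(dates, subset=None, start=None, end=None):
    """Return the positions in `dates` kept by the requested subset."""
    indices = list(range(len(dates)))
    # Assumption: start/end act like a slice over time slices, end exclusive.
    indices = indices[start:end]
    if subset == "first5":
        # Assumption: daily data, so the first 5 days are the first 5 slices.
        indices = indices[:5]
    elif subset in SEASON_MONTHS:
        months = SEASON_MONTHS[subset]
        indices = [i for i in indices if dates[i].month in months]
    elif subset is not None:
        raise ValueError(f"unknown subset: {subset!r}")
    return indices


monthly = [date(2020, month, 1) for month in range(1, 13)]
daily = [date(2020, 1, 1) + timedelta(days=i) for i in range(10)]
print(select_time_indices(monthly, subset="DJF"))
print(select_time_indices(daily, subset="first5"))
```

Seasonal subsets are not contiguous in time (DJF wraps around the year boundary), so they are expressed as a month filter rather than an index range.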

ldcpy.util.var_and_wt_coords(varname, ds_col)[source]

Module contents

Top-level module for ldcpy.