Data Parsing Functions

gcmprocpy provides a range of functions for data extraction and manipulation. Below are the key data parsing routines along with their detailed parameters and usage examples.

Data Exploration

Listing Dimensions

This function reads all the datasets and returns the unique dimensions present.

gcmprocpy.data_parse.dim_list(datasets)[source]

Retrieves a sorted list of unique dimension names across all datasets.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique dimension names across all datasets.

Return type:

list

Example:

Load datasets and list unique dimensions.

datasets = gy.load_datasets(directory, dataset_filter)
dims = gy.dim_list(datasets)
print(dims)
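
To make the expected input shape concrete, the following is a minimal sketch of how unique dimension names could be gathered from a list of (dataset, filename) tuples. It is not gcmprocpy's implementation: FakeDataset and collect_dims are hypothetical stand-ins, and the filenames and sizes are made up, so that the snippet runs without xarray or model output files.

```python
class FakeDataset:
    """Hypothetical stand-in for an xarray dataset; it only exposes `dims`."""

    def __init__(self, dims):
        self.dims = dims  # mapping of dimension name -> size


def collect_dims(datasets):
    """Return sorted unique dimension names from (dataset, filename) tuples."""
    names = set()
    for ds, _filename in datasets:
        names.update(ds.dims)  # iterating a mapping yields its keys
    return sorted(names)


# Made-up inputs: two files that share most dimensions but differ in the vertical one.
datasets = [
    (FakeDataset({"time": 24, "lev": 57, "lat": 36, "lon": 72}), "run_a.nc"),
    (FakeDataset({"time": 24, "ilev": 58, "lat": 36, "lon": 72}), "run_b.nc"),
]
dims = collect_dims(datasets)
print(dims)  # ['ilev', 'lat', 'lev', 'lon', 'time']
```

The set removes names shared between files, and sorting gives a stable order regardless of which file is read first.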

Listing Variables

This function reads all the datasets and returns the variables listed in them.

gcmprocpy.data_parse.var_list(datasets)[source]

Reads all the datasets and returns the variables listed in them.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of variable entries in the datasets.

Return type:

list

Example:

Load datasets and list unique variables.

datasets = gy.load_datasets(directory, dataset_filter)
vars = gy.var_list(datasets)
print(vars)

Listing Timestamps

This function compiles and returns a list of all timestamps present in the provided datasets.

gcmprocpy.data_parse.time_list(datasets)[source]

Compiles and returns a list of all timestamps present in the provided datasets. This function is particularly useful for aggregating time data from multiple sources.

Parameters:

datasets (list of tuples) – Each tuple in the list contains an xarray dataset and its corresponding filename. The function will iterate through each dataset to gather timestamps.

Returns:

A list containing all the datetime64 timestamps found in the datasets.

Return type:

list of np.datetime64

Example:

Load datasets and list unique timestamps.

datasets = gy.load_datasets(directory, dataset_filter)
times = gy.time_list(datasets)
print(times)

Listing Levels

This function reads all the datasets and returns the unique lev and ilev entries in sorted order.

gcmprocpy.data_parse.level_list(datasets, log_level=True)[source]

Reads all the datasets and returns the unique lev and ilev entries in sorted order.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

Returns:

A sorted list of unique lev and ilev entries from the datasets.

Return type:

list

Example:

Load datasets and list unique lev and ilev entries.

datasets = gy.load_datasets(directory, dataset_filter)
lev_ilevs = gy.level_list(datasets)
print(lev_ilevs)

Listing Longitudes

This function reads all the datasets and returns the unique longitude (lon) entries in sorted order.

gcmprocpy.data_parse.lon_list(datasets)[source]

Reads all the datasets and returns the unique longitude (lon) entries in sorted order.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique longitude entries from the datasets.

Return type:

list

Example:

Load datasets and list unique longitude entries.

datasets = gy.load_datasets(directory, dataset_filter)
lons = gy.lon_list(datasets)
print(lons)

Listing Latitudes

This function reads all the datasets and returns the unique latitude (lat) entries in sorted order.

gcmprocpy.data_parse.lat_list(datasets)[source]

Reads all the datasets and returns the unique latitude (lat) entries in sorted order.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique latitude entries from the datasets.

Return type:

list

Example:

Load datasets and list unique latitude entries.

datasets = gy.load_datasets(directory, dataset_filter)
lats = gy.lat_list(datasets)
print(lats)

Variable Information

This function provides detailed information about a specific variable in the datasets.

gcmprocpy.data_parse.var_info(datasets, variable_name)[source]

Retrieves the attributes and dimension information of a specified variable from all datasets.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • variable_name (str) – The name of the variable to retrieve attributes for.

Returns:

A dictionary where keys are filenames and values are dictionaries of attributes for the specified variable.

Return type:

dict

Example:

Load datasets and get information about a specific variable.

datasets = gy.load_datasets(directory, dataset_filter)
info = gy.var_info(datasets, 'variable_name')
print(info)

Dimension Information

This function provides detailed information about a specific dimension in the datasets.

gcmprocpy.data_parse.dim_info(datasets, dimension)[source]

Retrieves information about a specified dimension’s size across all datasets.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • dimension (str) – The name of the dimension to retrieve information for.

Returns:

A dictionary where keys are filenames and values are the size of the specified dimension.

If the dimension does not exist in a dataset, the value is None.

Return type:

dict

Example:

Load datasets and get information about a specific dimension.

datasets = gy.load_datasets(directory, dataset_filter)
info = gy.dim_info(datasets, 'dimension_name')
print(info)
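
The shape of the returned dictionary, including the None entry for files that lack the dimension, can be illustrated with a small sketch. This is an assumption-laden stand-in rather than the library's code: dim_sizes is a hypothetical helper, and each "dataset" here is simply a dict of dimension sizes with a made-up filename.

```python
def dim_sizes(datasets, dimension):
    """Map each filename to the size of `dimension`, or None if it is absent."""
    return {filename: dims.get(dimension) for dims, filename in datasets}


# Hypothetical stand-ins: each "dataset" is just a dict of dimension sizes.
datasets = [
    ({"time": 24, "lev": 57}, "run_a.nc"),
    ({"time": 12, "ilev": 58}, "run_b.nc"),
]
info = dim_sizes(datasets, "lev")
print(info)  # {'run_a.nc': 57, 'run_b.nc': None}
```

Callers that loop over the result should check for None before using a size in arithmetic.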

Data Xarrays

Selected Time

This function extracts and processes data for a given variable at a specific time from multiple datasets. It also handles unit conversion and provides additional information if needed for plotting.

gcmprocpy.data_parse.arr_var(datasets, variable_name, time, selected_unit=None, log_level=True, plot_mode=False)[source]

Extracts and processes data for a given variable at a specific time from multiple datasets. It also handles unit conversion and provides additional information if needed for plotting.

Parameters:
  • datasets (list[tuple]) – Each tuple contains an xarray dataset and its filename. The function will search each dataset for the specified time and variable.

  • variable_name (str) – The name of the variable to be extracted.

  • time (Union[np.datetime64, str]) – The specific time for which data is to be extracted.

  • selected_unit (str, optional) – The desired unit for the variable. If None, the original unit is used.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

  • plot_mode (bool, optional) – If True, the function returns additional data useful for plotting.

Returns:

If plot_mode is False, returns only the variable values as a numpy array. If plot_mode is True, returns a tuple containing:

  • numpy.ndarray: The extracted variable values.

  • numpy.ndarray: The corresponding level or ilevel values.

  • str: The unit of the variable after conversion (if applicable).

  • str: The long descriptive name of the variable.

  • numpy.ndarray: Model time array corresponding to the specified time.

  • str: The name of the dataset file from which data is extracted.

Return type:

Union[numpy.ndarray, tuple]

Selected Time, Level

This function extracts data from the dataset based on the specified variable, time, and level (lev/ilev).

gcmprocpy.data_parse.arr_lat_lon(datasets, variable_name, time, selected_lev_ilev=None, selected_unit=None, plot_mode=False)[source]

Extracts data from the dataset based on the specified variable, time, and level (lev/ilev).

Parameters:
  • datasets (xarray.Dataset) – The loaded dataset/s using xarray.

  • variable_name (str) – Name of the variable to extract.

  • time (Union[str, numpy.datetime64]) – Timestamp to filter the data.

  • selected_lev_ilev (Union[float, str], optional) – Level value to filter the data. If ‘mean’, calculates the mean over all levels.

  • selected_unit (str, optional) – Desired unit to convert the data to. If None, uses the original unit.

  • plot_mode (bool, optional) – If True, returns additional information for plotting.

Returns:

If plot_mode is False, returns an xarray object containing the variable values for the specified time and level. If plot_mode is True, returns a tuple containing:

  • xarray.DataArray: Array of variable values for the specified time and level.

  • Union[float, str]: The level value used for data selection.

  • xarray.DataArray: Array of latitude values corresponding to the variable values.

  • xarray.DataArray: Array of longitude values corresponding to the variable values.

  • str: Unit of the variable after conversion (if applicable).

  • str: Long descriptive name of the variable.

  • numpy.ndarray: Array containing Day, Hour, Min of the model run.

  • str: Name of the dataset file from which data is extracted.

Return type:

Union[xarray.DataArray, tuple]
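
With plot_mode=True the result is a positional tuple of eight items, which is easy to unpack in the wrong order. One way to guard against that, sketched below, is to wrap the tuple in a named tuple. The field names are hypothetical labels chosen for this example and follow the order documented above; the tuple passed in is a hand-made placeholder, not real model output.

```python
from collections import namedtuple

# Hypothetical field names, in the order documented for plot_mode=True.
LatLonResult = namedtuple(
    "LatLonResult",
    ["values", "level", "lats", "lons", "unit", "long_name", "mtime", "filename"],
)

# Placeholder standing in for the tuple returned when plot_mode=True.
raw = ([[1.0, 2.0]], 5.0, [0.0], [0.0, 5.0], "K", "Temperature", [1, 0, 0], "run_a.nc")

result = LatLonResult(*raw)
print(result.unit, result.filename)  # K run_a.nc
```

A named tuple still unpacks positionally, so existing code that unpacks the plain tuple keeps working.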

Selected Time, Latitude, Longitude

This function extracts data from the dataset for a given variable name, latitude, longitude, and time.

gcmprocpy.data_parse.arr_lev_var(datasets, variable_name, time, selected_lat, selected_lon, selected_unit=None, log_level=True, plot_mode=False)[source]

Extracts data from the dataset for a given variable name, latitude, longitude, and time.

Parameters:
  • datasets (xarray.Dataset) – The loaded dataset opened using xarray.

  • variable_name (str) – Name of the variable to retrieve.

  • time (str) – Timestamp to filter the data.

  • selected_lat (float) – Latitude value.

  • selected_lon (float) – Longitude value.

  • selected_unit (str, optional) – Desired unit to convert the data to. If None, uses the original unit.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

  • plot_mode (bool, optional) – If True, returns additional information for plotting.

Returns:

If plot_mode is False, returns an xarray object containing the variable values. If plot_mode is True, returns a tuple containing:

  • xarray.DataArray: Array of variable values for the specified time and latitude/longitude.

  • xarray.DataArray: Array of level or ilevel values where data is not NaN.

  • str: Unit of the variable after conversion (if applicable).

  • str: Long descriptive name of the variable.

  • numpy.ndarray: Array containing Day, Hour, Min of the model run.

  • str: Name of the dataset file from which data is extracted.

Return type:

Union[xarray.DataArray, tuple]

Selected Time, Latitude

This function extracts and processes data from the dataset based on a specific variable, time, and latitude.

gcmprocpy.data_parse.arr_lev_lon(datasets, variable_name, time, selected_lat, selected_unit=None, log_level=True, plot_mode=False)[source]

Extracts and processes data from the dataset based on a specific variable, time, and latitude.

Parameters:
  • datasets (xarray.Dataset) – The loaded dataset opened using xarray.

  • variable_name (str) – Name of the variable to extract.

  • time (Union[str, numpy.datetime64]) – Timestamp to filter the data.

  • selected_lat (float) – Latitude value to filter the data.

  • selected_unit (str, optional) – Desired unit to convert the data to. If None, uses the original unit.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

  • plot_mode (bool, optional) – If True, returns additional information for plotting.

Returns:

If plot_mode is False, returns an xarray object containing the variable values for the specified time and latitude. If plot_mode is True, returns a tuple containing:

  • xarray.DataArray: Array of variable values for the specified time and latitude.

  • xarray.DataArray: Array of longitude values corresponding to the variable values.

  • xarray.DataArray: Array of level or ilevel values where data is not NaN.

  • float: The latitude value used for data selection.

  • str: Unit of the variable after conversion (if applicable).

  • str: Long descriptive name of the variable.

  • numpy.ndarray: Array containing Day, Hour, Min of the model run.

  • str: Name of the dataset file from which data is extracted.

Return type:

Union[xarray.DataArray, tuple]

Selected Time, Longitude

This function extracts data from a dataset based on the specified variable name, time, and longitude.

gcmprocpy.data_parse.arr_lev_lat(datasets, variable_name, time, selected_lon, selected_unit=None, log_level=True, plot_mode=False)[source]

Extracts data from a dataset based on the specified variable name, timestamp, and longitude.

Parameters:
  • datasets (xarray.Dataset) – The loaded dataset opened using xarray.

  • variable_name (str) – Name of the variable to extract.

  • time (Union[str, numpy.datetime64]) – Timestamp to filter the data.

  • selected_lon (Union[float, str]) – Longitude to filter the data, or ‘mean’ for averaging over all longitudes.

  • selected_unit (str, optional) – Desired unit to convert the data to. If None, uses the original unit.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

  • plot_mode (bool, optional) – If True, returns additional information for plotting.

Returns:

If plot_mode is False, returns an xarray object containing the variable values for the specified time and longitude. If plot_mode is True, returns a tuple containing:

  • xarray.DataArray: Array of variable values for the specified time and longitude.

  • xarray.DataArray: Array of latitude values corresponding to the variable values.

  • xarray.DataArray: Array of level or ilevel values where data is not NaN.

  • str: Unit of the variable after conversion (if applicable).

  • str: Long descriptive name of the variable.

  • numpy.ndarray: Array containing Day, Hour, Min of the model run.

  • str: Name of the dataset file from which data is extracted.

Return type:

Union[xarray.DataArray, tuple]
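
The selected_lon parameter accepts either a longitude value or the string ‘mean’. The sketch below illustrates that two-way behaviour on a plain nested list. It rests on several assumptions: select_lon is a hypothetical helper, the level dimension is omitted for brevity, and an exact longitude match is used, which may differ from how gcmprocpy resolves the requested longitude.

```python
def select_lon(values, lons, selected_lon):
    """Reduce a [lat][lon] grid to one value per latitude.

    selected_lon == "mean" averages across longitudes; otherwise the column
    whose longitude equals selected_lon is returned (exact match assumed).
    """
    if selected_lon == "mean":
        return [sum(row) / len(row) for row in values]
    col = lons.index(selected_lon)  # raises ValueError if the longitude is absent
    return [row[col] for row in values]


lons = [-180.0, -90.0, 0.0, 90.0]
values = [
    [1.0, 2.0, 3.0, 4.0],  # latitude row 0
    [5.0, 6.0, 7.0, 8.0],  # latitude row 1
]
at_zero = select_lon(values, lons, 0.0)
zonal_mean = select_lon(values, lons, "mean")
print(at_zero)     # [3.0, 7.0]
print(zonal_mean)  # [2.5, 6.5]
```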

Selected Latitude, Longitude Over Time-range

This function extracts and processes data from multiple datasets using data across different levels and times for a given latitude and longitude.

gcmprocpy.data_parse.arr_lev_time(datasets, variable_name, selected_lat, selected_lon, selected_unit=None, log_level=True, plot_mode=False)[source]

This function extracts and processes data from multiple datasets based on specified parameters. It focuses on extracting data across different levels and times for a given latitude and longitude.

Parameters:
  • datasets (list[tuple]) – A list of tuples where each tuple contains an xarray dataset and its filename.

  • variable_name (str) – The name of the variable to be extracted from the dataset.

  • selected_lat (Union[float, str]) – The latitude value or ‘mean’ to average over all latitudes.

  • selected_lon (Union[float, str]) – The longitude value or ‘mean’ to average over all longitudes.

  • selected_unit (str, optional) – The desired unit for the variable. If None, the original unit is used.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

  • plot_mode (bool, optional) – If True, the function returns additional data useful for plotting.

Returns:

If plot_mode is False, returns a numpy array of variable values concatenated across datasets. If plot_mode is True, returns a tuple containing:

  • numpy.ndarray: Concatenated variable values.

  • numpy.ndarray: Corresponding level or ilevel values.

  • list: List of model times.

  • Union[float, str]: The longitude used for data selection.

  • str: The unit of the variable after conversion (if applicable).

  • str: The long descriptive name of the variable.

Return type:

Union[numpy.ndarray, tuple]

Selected Level, Longitude Over Time-range

This function extracts and processes data from the dataset based on the specified variable name, longitude, and level/ilev.

gcmprocpy.data_parse.arr_lat_time(datasets, variable_name, selected_lon, selected_lev_ilev=None, selected_unit=None, plot_mode=False)[source]

Extracts and processes data from the dataset based on the specified variable name, longitude, and level/ilev.

Parameters:
  • datasets (list[tuple]) – Each tuple contains an xarray dataset and its filename.

  • variable_name (str) – The name of the variable to extract.

  • selected_lon (Union[float, str]) – Longitude value or ‘mean’ to average over all longitudes.

  • selected_lev_ilev (Union[float, str, None]) – Level or intermediate level value, ‘mean’ for averaging, or None if not applicable.

  • selected_unit (str, optional) – The desired unit for the variable. If None, the original unit is used.

  • plot_mode (bool, optional) – If True, returns additional data useful for plotting.

Returns:

If plot_mode is False, returns a numpy array of variable values concatenated across datasets. If plot_mode is True, returns a tuple containing:

  • numpy.ndarray: Concatenated variable values.

  • numpy.ndarray: Latitude values corresponding to the variable values.

  • list: List of model times.

  • Union[float, str]: The longitude used for data selection.

  • str: The unit of the variable after conversion (if applicable).

  • str: The long descriptive name of the variable.

  • str: Name of the dataset file from which data is extracted.

Return type:

Union[numpy.ndarray, tuple]

Data manipulation

Level log transformation

This function performs a log transformation on the pressure level array.

gcmprocpy.data_parse.level_log_transform(array, model, log_level)[source]

Applies a logarithmic or exponential transformation to the input array based on the model type and log_level flag.

Parameters:
  • array (numpy.ndarray) – The input array to be transformed.

  • model (str) – The model type, either ‘WACCM-X’ or ‘TIE-GCM’.

  • log_level (bool) – A flag indicating whether to apply a logarithmic transformation (True) or an exponential transformation (False).

Returns:

The transformed array.

Return type:

numpy.ndarray

mTime to Time

This function searches for a specific time in a dataset based on the provided model time (mtime) and returns the corresponding np.datetime64 time value. It iterates through multiple datasets to find a match.

gcmprocpy.data_parse.get_time(datasets, mtime)[source]

Searches for a specific time in a dataset based on the provided model time (mtime) and returns the corresponding np.datetime64 time value. It iterates through multiple datasets to find a match.

Parameters:
  • datasets (list[tuple]) – Each tuple contains an xarray dataset and its filename. The function will search each dataset for the time value.

  • mtime (list[int]) – Model time represented as a list of integers in the format [day, hour, minute].

Returns:

The corresponding datetime value in the dataset for the given mtime. Returns None if no match is found.

Return type:

np.datetime64
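
The lookup described above, which scans each dataset for a [day, hour, minute] match and falls back to None, can be sketched as follows. This is only an illustration under stated assumptions: find_time is a hypothetical helper, each "dataset" is a plain list of (timestamp, mtime) pairs, timestamps are strings instead of np.datetime64 values, and all values are made up.

```python
def find_time(datasets, mtime):
    """Return the first timestamp whose model time equals mtime, else None."""
    for records, _filename in datasets:
        for timestamp, record_mtime in records:
            if list(record_mtime) == list(mtime):
                return timestamp
    return None


# Hypothetical stand-ins: each "dataset" is a list of (timestamp, mtime) pairs.
datasets = [
    ([("2003-03-20T00:00", [79, 0, 0]), ("2003-03-20T01:00", [79, 1, 0])], "run_a.nc"),
    ([("2003-03-21T00:00", [80, 0, 0])], "run_b.nc"),
]
found = find_time(datasets, [80, 0, 0])
missing = find_time(datasets, [81, 0, 0])
print(found, missing)  # 2003-03-21T00:00 None
```

Because a missing match yields None, the result should be checked before it is passed on to a function that expects a timestamp.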

Time to mTime

This function finds and returns the model time (mtime) array that corresponds to a specific time in a dataset. The mtime is an array representing [Day, Hour, Min].

gcmprocpy.data_parse.get_mtime(ds, time)[source]

Finds and returns the model time (mtime) array that corresponds to a specific time in a dataset. The mtime is an array representing [Day, Hour, Min].

Parameters:
  • ds (xarray.Dataset) – The dataset opened using xarray, containing time and mtime data.

  • time (Union[str, numpy.datetime64]) – The timestamp for which the corresponding mtime is to be found.

Returns:

The mtime array containing [Day, Hour, Min] for the given timestamp.

Returns None if no corresponding mtime is found.

Return type:

numpy.ndarray

Average Z height

This function computes the average Z value for a given set of lat, lon, and lev from a dataset.

gcmprocpy.data_parse.calc_avg_ht(datasets, time, selected_lev_ilev)[source]

Compute the average Z value for a given set of latitude, longitude, and level from a dataset.

Parameters:
  • datasets (xarray.Dataset) – The loaded dataset opened using xarray.

  • time (str) – Timestamp to filter the data.

  • selected_lev_ilev (float) – The level for which to retrieve data.

Returns:

The average ZG value for the given conditions.

Return type:

float

Other

Check lev/ilev

This function checks the dimensions of a given variable in a dataset to determine if it includes specific dimensions (‘lev’ or ‘ilev’).

gcmprocpy.data_parse.check_var_dims(ds, variable_name)[source]

Checks the dimensions of a given variable in a dataset to determine if it includes specific dimensions (‘lev’ or ‘ilev’).

Parameters:
  • ds (xarray.Dataset) – The dataset in which the variable’s dimensions are to be checked.

  • variable_name (str) – The name of the variable for which dimensions are being checked.

Returns:

‘lev’ if the variable includes the ‘lev’ dimension, ‘ilev’ if it includes the ‘ilev’ dimension, ‘Variable not found in dataset’ if the variable does not exist in the dataset, and None if neither ‘lev’ nor ‘ilev’ is a dimension of the variable.

Return type:

str
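
The four possible outcomes can be mirrored with a small sketch that works on a plain dict of dimension tuples. It is not the library's implementation: classify_vertical_dim and the variable names are hypothetical, and checking ‘lev’ before ‘ilev’ is an assumption of this example.

```python
def classify_vertical_dim(variables, variable_name):
    """Mirror the documented return values using a dict of dimension tuples."""
    if variable_name not in variables:
        return "Variable not found in dataset"
    dims = variables[variable_name]
    if "lev" in dims:
        return "lev"
    if "ilev" in dims:
        return "ilev"
    return None


# Hypothetical variables and their dimensions.
variables = {
    "var_mid": ("time", "lev", "lat", "lon"),
    "var_int": ("time", "ilev", "lat", "lon"),
    "var_2d": ("time", "lat", "lon"),
}
results = [classify_vertical_dim(variables, name)
           for name in ("var_mid", "var_int", "var_2d", "absent")]
print(results)  # ['lev', 'ilev', None, 'Variable not found in dataset']
```

Since a missing variable is reported as a string rather than None, callers need to compare against ‘lev’ and ‘ilev’ explicitly instead of testing the result for truthiness.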

Min/Max Array

This function finds the minimum and maximum values in a 2D array of variable values.

gcmprocpy.data_parse.min_max(variable_values)[source]

Finds the minimum and maximum values of variable_values, a 2D array.

Parameters:

variable_values (xarray.DataArray) – A 2D array of variable values.

Returns:

  • float: Minimum value of the variable in the array.

  • float: Maximum value of the variable in the array.

Return type:

tuple
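
As a minimal sketch of the (minimum, maximum) return shape, the example below scans a 2D nested list in pure Python. min_max_2d is a hypothetical helper rather than gcmprocpy's function, and it makes no attempt to handle NaN values, which the library may treat differently.

```python
def min_max_2d(rows):
    """Return (minimum, maximum) over every element of a 2D nested list."""
    flat = [value for row in rows for value in row]
    return min(flat), max(flat)


grid = [[3.5, -1.0, 2.0], [0.0, 7.25, 4.0]]
lowest, highest = min_max_2d(grid)
print(lowest, highest)  # -1.0 7.25
```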