Input File Generation

In addition to post-processing model output, gcmprocpy bundles two utilities that build the geophysical forcing / boundary-condition NetCDF files that drive a TIE-GCM or WACCM-X run:

  • gpigen — Geophysical Indices (GPI): daily 10.7 cm solar flux (f107d), its running average (f107a), and the 3-hourly Kp index, from GFZ Potsdam.

  • imfgen — Interplanetary Magnetic Field / solar-wind boundary conditions: bx/by/bz, solar-wind density and velocity, from OMNI 1-minute data or a BCWIND HDF5 file.

Both are available as console commands (gpigen / imfgen) and as a Python API under gcmprocpy.gpigen / gcmprocpy.imfgen. Each generate_* function returns an xarray.Dataset, so the data can be inspected or post-processed before being written to NetCDF.

Both generators take a --model / model= option that selects the target model’s input format: tiegcm (the default, documented below) or waccmx. The waccmx option emits the WACCM-X (CESM) format directly from the same source data: an unlimited time dimension, a YYYYMMDD date plus datesec, and (for GPI) a 3-hourly series with an ap index derived from Kp via the official Kp->ap lookup table. WACCM-X output files carry a WACCMX filename tag (gpi_WACCMX_... / imf_OMNI_WACCMX_...). This produces in one step the files that previously required a separate NCL/ncrcat conversion.

Note

These tools require network access to fetch the source data (the GFZ API for gpigen; CDAWeb / OMNI for imfgen), or a local copy of the source files. gpigen depends on requests; imfgen depends on h5py (BCWIND) and hapiclient (OMNI via CDAWeb). All are installed automatically with gcmprocpy.

GPI (gpigen)

Each output file holds, on a daily grid (ndays):

  • year_day – YYYYDDD integer (4-digit year + 3-digit day of year)

  • f107d — daily 10.7 cm solar flux

  • f107a — running-average 10.7 cm solar flux (default 81-day centered)

  • kp — 3-hourly Kp index, shaped (ndays, 8)

The file is written to --output-dir as <prefix>_<begYYYYDDD>-<endYYYYDDD>.nc.
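
As an illustration of the YYYYDDD convention used for year_day and for the filename bounds, the following minimal sketch encodes and decodes such integers with the standard library. The helper names to_year_day and from_year_day are hypothetical and are not part of gcmprocpy.

```python
from datetime import date, datetime


def to_year_day(d: date) -> int:
    """Encode a date as a YYYYDDD integer (4-digit year + 3-digit day of year)."""
    return d.year * 1000 + d.timetuple().tm_yday


def from_year_day(year_day: int) -> date:
    """Decode a YYYYDDD integer back into a calendar date."""
    return datetime.strptime(f"{year_day:07d}", "%Y%j").date()


print(to_year_day(date(2024, 6, 1)))  # 2024153 (2024 is a leap year)
print(from_year_day(2024153))         # 2024-06-01
```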

Mode: CLI

# Full series 1960-01-01 -> yesterday, 81-day centered avg, JSON API (defaults)
gpigen

# Arbitrary date range
gpigen --start 2024-01-01 --end 2024-06-01

# 27-day trailing average
gpigen --window 27 --trailing --prefix gpi_27avg

# Parse the raw 1932-onward text file instead of the JSON API, and write plots
gpigen --source textfile --plots

# WACCM-X 3-hourly solar-parameters format (date/datesec, f107/f107a/kp/ap)
gpigen --model waccmx --start 2024-01-01 --end 2024-06-01

gpigen

Build TIEGCM GPI NetCDF files from GFZ Potsdam data.

usage: gpigen [-h] [--version] [--start START] [--end END]
              [--source {json,textfile}] [--model {tiegcm,waccmx}]
              [--window WINDOW] [--trailing] [--status STATUS]
              [--output-dir OUTPUT_DIR] [--output OUTPUT] [--prefix PREFIX]
              [--no-write] [--plots] [--plots-dir PLOTS_DIR]
              [--cache-dir CACHE_DIR] [-q]
-h, --help

show this help message and exit

--version

show program’s version number and exit

--start <start>

Start date (YYYY-MM-DD, YYYYDDD, or ISO). Default: 1960-01-01.

--end <end>

End date (inclusive). Default: yesterday.

--source {json,textfile}

Data source. Default: json (GFZ JSON API).

--model {tiegcm,waccmx}

Target model input format. Default: tiegcm. ‘waccmx’ emits the WACCM-X 3-hourly solar-parameters format (date/datesec, f107/f107a/kp/ap, unlimited time, WACCMX filename tag).

--window <window>

Averaging window in days for f107a. Default: 81.

--trailing

Use a trailing average instead of centered.

--status <status>

GFZ ‘status’ query param (json source). Default: def.

--output-dir <output_dir>

Directory for the output .nc file. Default: cwd.

--output <output>

Explicit output path (overrides --output-dir/--prefix).

--prefix <prefix>

Filename prefix: <prefix>_<beg>-<end>.nc. Default: gpi.

--no-write

Build the dataset but do not write a file.

--plots

Also write per-year f107d/f107a PNGs (needs the ‘plot’ extra).

--plots-dir <plots_dir>

Directory for plots. Default: ./plots.

--cache-dir <cache_dir>

Where to store the downloaded text file (textfile source).

-q, --quiet

Suppress progress.

Mode: API

from gcmprocpy import gpigen

ds = gpigen.generate_gpi(
    start="2024-01-01",   # YYYY-MM-DD, YYYYDDD, ISO, or datetime
    end=None,             # default: yesterday
    source="json",        # "json" (GFZ API) or "textfile"
    window=81,            # averaging window in days
    centered=True,        # centered vs trailing
)

path = gpigen.save_gpi(ds, output_dir=".")     # write NetCDF
gpigen.make_plots(ds, output_dir="plots")      # optional per-year PNGs

The top-level entry points generate_gpi, save_gpi and make_plots are also re-exported directly on the gcmprocpy namespace.
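
The start and end arguments accept several date spellings. The sketch below shows one way such strings could be normalised before being passed on; parse_date is a hypothetical helper written for illustration, and gcmprocpy's own parsing may differ in detail.

```python
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Parse YYYYDDD, YYYY-MM-DD, or a full ISO 8601 datetime string."""
    text = text.strip()
    if len(text) == 7 and text.isdigit():
        # Seven digits: 4-digit year followed by 3-digit day of year.
        return datetime.strptime(text, "%Y%j").date()
    # fromisoformat handles both "2024-01-01" and "2024-01-01T12:30:00".
    return datetime.fromisoformat(text).date()


for spelling in ("2024-01-01", "2024001", "2024-01-01T12:30:00"):
    print(spelling, "->", parse_date(spelling))
```

All three spellings in the loop resolve to the same calendar day.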

gcmprocpy.gpigen.core.generate_gpi(start='1960-01-01', end=None, source='json', window=81, centered=True, status='def', cache_dir=None, verbose=False, model='tiegcm')

Generate a GPI xarray.Dataset for [start, end].

Parameters:
  • start, end (date-like or None) – Inclusive bounds (YYYY-MM-DD, YYYYDDD, ISO, or datetime). end=None defaults to yesterday. The output begins at start but, for a centered window, ends window // 2 days before end (a centered average needs future data that does not yet exist).

  • source ({"json", "textfile"}) – GFZ JSON API (default) or the locally-parsed 1932-onward text file.

  • window (int) – Averaging window in days (default 81).

  • centered (bool) – Centered (default) vs trailing average for f107a.

  • status (str) – GFZ status query param for the JSON API (default "def").

  • cache_dir (str or None) – Where to drop the downloaded text file (textfile source only).

  • verbose (bool) – Print progress.

  • model ({"tiegcm", "waccmx"}) – Target model input format (default "tiegcm"). "waccmx" emits the WACCM-X 3-hourly solar-parameters format.

Returns:

The GPI dataset. Use gpigen.save_gpi() to write it to NetCDF.

Return type:

xarray.Dataset
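
To make the difference between centered=True and centered=False concrete, here is a minimal pure-Python sketch of a running mean that only emits a value where the whole window is available. It is not the library's implementation: running_mean is a hypothetical name, an odd window is assumed for the centered case, and how gpigen pads the start of the series is not covered here.

```python
def running_mean(values, window, centered=True):
    """Return (index, mean) pairs for every position with a complete window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        if centered:
            lo, hi = i - half, i + half      # half a window on each side
        else:
            lo, hi = i - window + 1, i       # the window ends at position i
        if lo < 0 or hi >= len(values):
            continue                         # incomplete window: emit nothing
        chunk = values[lo:hi + 1]
        out.append((i, sum(chunk) / len(chunk)))
    return out


series = [float(v) for v in range(10)]
centered = running_mean(series, window=5, centered=True)
trailing = running_mean(series, window=5, centered=False)
print("last centered index:", centered[-1][0])  # 7, i.e. 5 // 2 positions before the end
print("last trailing index:", trailing[-1][0])  # 9, the final position
```

The centered result stops window // 2 positions short of the last sample, which mirrors the truncation described for the end bound above.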

gcmprocpy.gpigen.dataset.save_gpi(ds, output_dir='.', prefix='gpi', path=None)

Write ds to NetCDF and return the path written.

path overrides the auto-generated name; otherwise the file is <output_dir>/<prefix>_<beg>-<end>.nc.

gcmprocpy.gpigen.dataset.build_dataset(year_day, f107d, f107a, kp, window, centered, missing_dates, model='tiegcm')

Build the GPI xarray.Dataset for the target model.

model selects the target model’s input format:

  • "tiegcm" (default): the TIE-GCM format, with year_day/f107d/f107a on ndays and kp on (ndays, nkp=8), plus the TIEGCM global attributes (unchanged).

  • "waccmx": the WACCM-X (CESM) 3-hourly solar-parameters format on a single (unlimited-on-write) time dimension of length ndays*8: date (YYYYMMDD), datesec (0..75600), f107 (= f107d repeated across the 8 daily slots), f107a (repeated), kp (flattened) and ap (derived from Kp via the official lookup table).
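
The WACCM-X layout described above can be sketched for a single day as follows. This is an illustration only: to_three_hourly and kp_to_ap are hypothetical helpers, Kp is assumed to arrive as a decimal in thirds (0.0, 0.333, 0.667, ...), and the table is the standard 28-entry Kp to ap conversion.

```python
from datetime import date

# Standard Kp -> ap conversion, indexed by Kp in thirds (0o, 0+, 1-, ..., 9o).
KP_TO_AP = [0, 2, 3, 4, 5, 6, 7, 9, 12, 15, 18, 22, 27, 32,
            39, 48, 56, 67, 80, 94, 111, 132, 154, 179, 207, 236, 300, 400]


def kp_to_ap(kp: float) -> int:
    """Look up ap for a Kp value given in thirds (e.g. 4.333 means 4+)."""
    return KP_TO_AP[round(kp * 3)]


def to_three_hourly(day: date, f107d: float, f107a: float, kp_row):
    """Expand one daily record into eight 3-hourly records."""
    yyyymmdd = day.year * 10000 + day.month * 100 + day.day
    return [
        {
            "date": yyyymmdd,
            "datesec": slot * 10800,   # 0, 10800, ..., 75600
            "f107": f107d,             # daily value repeated in each slot
            "f107a": f107a,            # running average repeated in each slot
            "kp": kp,
            "ap": kp_to_ap(kp),
        }
        for slot, kp in enumerate(kp_row)
    ]


rows = to_three_hourly(date(2024, 1, 1), 150.0, 140.0, [1.0, 1.333, 2.0, 3.0, 4.333, 2.667, 1.0, 0.0])
for row in rows:
    print(row["date"], row["datesec"], row["kp"], row["ap"])
```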

gcmprocpy.gpigen.dataset.gpi_filename(ds, prefix='gpi')

<prefix>[_WACCMX]_<begYYYYDDD>-<endYYYYDDD>.nc from the dataset’s bounds.

A model='waccmx' dataset gains a WACCMX tag (e.g. gpi_WACCMX_...).
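
The naming pattern can be reproduced with a few lines of string formatting. The real gpi_filename reads the bounds from the dataset; the sketch below takes them as plain YYYYDDD integers instead, and gpi_style_filename is a hypothetical name used only for this example.

```python
def gpi_style_filename(beg_year_day: int, end_year_day: int,
                       prefix: str = "gpi", model: str = "tiegcm") -> str:
    """Build <prefix>[_WACCMX]_<begYYYYDDD>-<endYYYYDDD>.nc from YYYYDDD bounds."""
    tag = "_WACCMX" if model == "waccmx" else ""
    return f"{prefix}{tag}_{beg_year_day:07d}-{end_year_day:07d}.nc"


print(gpi_style_filename(2024001, 2024153))
print(gpi_style_filename(2024001, 2024153, model="waccmx"))
```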

gcmprocpy.gpigen.plotting.make_plots(ds, output_dir='plots')

Write f107d_<year>.png and f107a_<year>.png for each year.

Returns the list of files written. matplotlib is imported lazily, so it remains an optional dependency of the core pipeline.

IMF / Solar-Wind Boundary Conditions (imfgen)

Each output file holds, on a per-minute grid (ndata):

  • bx, by, bz — IMF components (nT)

  • swden — solar-wind proton density (cm⁻³)

  • swvel — solar-wind flow speed (km/s)

  • a 0/1 *Mask quality flag for each channel (bxMask, byMask, bzMask, denMask, velMask; 0 = linearly interpolated)

  • date – YYYYDDD.frac (year, day-of-year, fractional day)

  • timestamp — ISO string YYYY-MM-DDTHH:MM:SS
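
For readers unfamiliar with the YYYYDDD.frac convention, the sketch below derives both time variables from a Python datetime. The helper names are hypothetical, and the fraction is assumed to be the elapsed part of the UTC day.

```python
from datetime import datetime


def to_date_frac(t: datetime) -> float:
    """Encode a datetime as YYYYDDD.frac (year, day of year, fraction of day)."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return t.year * 1000 + t.timetuple().tm_yday + seconds / 86400


def to_timestamp(t: datetime) -> str:
    """Format a datetime as the ISO string YYYY-MM-DDTHH:MM:SS."""
    return t.strftime("%Y-%m-%dT%H:%M:%S")


sample = datetime(2020, 1, 1, 12, 0, 0)
print(to_date_frac(sample))   # 2020001.5 (noon is half a day)
print(to_timestamp(sample))   # 2020-01-01T12:00:00
```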

OMNI access modes

The OMNI source can be fetched two ways (--omni-access / omni_access=):

  • hapi (default) — query CDAWeb’s HAPI server for the OMNI_HRO_1MIN dataset, retrieving only the requested window. Best for short ranges: a few-day request transfers a few days of data instead of whole-year files.

  • asc — download and parse the SPDF omni_min<year>.asc files (over FTP into --cache-dir). This reproduces the legacy per-year output exactly and is preferable for bulk regeneration of the full archive.

Both draw on the same underlying product (the same variables, fill values and 1-minute UTC grid), so the processed output matches.

Mode: CLI

# A specific range — fetched window-only from CDAWeb HAPI (default access).
imfgen --start 2020-01-01 --end 2020-12-31

# Full series 1982-01-01 -> yesterday, 10-min trailing average (defaults).
imfgen

# Reproduce the legacy per-year files from the SPDF ASCII archive.
imfgen --split-years --omni-access asc --cache-dir ./omni_asc --output-dir .

# Convert a BCWIND HDF5 file
imfgen --source bcwind --bcwind-path bcwind.h5

# WACCM-X format (unlimited time; date=YYYYMMDD + datefrac/datesec)
imfgen --model waccmx --start 2024-01-01 --end 2024-12-31

imfgen

Build TIEGCM IMF NetCDF files from OMNI or BCWIND data.

usage: imfgen [-h] [--version] [--source {omni,bcwind}]
              [--model {tiegcm,waccmx}] [--omni-access {hapi,asc}]
              [--start START] [--end END] [--window WINDOW]
              [--cache-dir CACHE_DIR] [--bcwind-path BCWIND_PATH]
              [--no-download] [--output-dir OUTPUT_DIR] [--output OUTPUT]
              [--prefix PREFIX] [--split-years] [--no-write] [-q]
-h, --help

show this help message and exit

--version

show program’s version number and exit

--source {omni,bcwind}

Data source. Default: omni (OMNI 1-minute data).

--model {tiegcm,waccmx}

Target model input format. Default: tiegcm. ‘waccmx’ emits the WACCM-X format (date as YYYYMMDD + datefrac/datesec, unlimited time dim, WACCMX filename tag).

--omni-access {hapi,asc}

How to fetch OMNI data: ‘hapi’ (default) pulls only the requested window from CDAWeb (fast for short ranges); ‘asc’ downloads the full omni_min<year>.asc files (reproduces legacy output exactly).

--start <start>

Start date (YYYY-MM-DD, YYYYDDD, or ISO). omni default: 1982-01-01.

--end <end>

End date (inclusive). omni default: yesterday.

--window <window>

Trailing-average window in minutes (omni). Default: 10.

--cache-dir <cache_dir>

Directory for omni_min<year>.asc files (--omni-access asc). Default: cwd.

--bcwind-path <bcwind_path>

Path to the BCWIND HDF5 file (required for --source bcwind).

--no-download

Do not fetch missing OMNI files over FTP; use local files only (--omni-access asc).

--output-dir <output_dir>

Directory for the output .nc file(s). Default: cwd.

--output <output>

Explicit output path (single-file mode; overrides --output-dir/--prefix).

--prefix <prefix>

Filename prefix: <prefix>_<beg>-<end>.nc. Default: imf_OMNI / imf_bcwind.

--split-years

Write one file per calendar year instead of a single range file.

--no-write

Build the dataset but do not write a file.

-q, --quiet

Suppress progress.

Mode: API

from gcmprocpy import imfgen

# OMNI -> one continuous Dataset for the range
ds = imfgen.generate_imf(
    start="2020-01-01",   # YYYY-MM-DD, YYYYDDD, ISO, or datetime
    end=None,             # default: yesterday
    source="omni",        # "omni" (default) or "bcwind"
    window=10,            # trailing-average window, minutes
    cache_dir="./omni_asc",  # only used when omni_access="asc"
)
path = imfgen.save_imf(ds, output_dir=".")          # write NetCDF

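# Per-year files (each interpolated within its own year), like the originals.
# omni_access="asc" is what makes cache_dir take effect and matches the legacy files.
for ds_year in imfgen.generate_imf_years(
    start="1982-01-01", omni_access="asc", cache_dir="./omni_asc"
):
    imfgen.save_imf(ds_year, output_dir=".")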

# BCWIND HDF5 -> Dataset
ds = imfgen.generate_imf(source="bcwind", bcwind_path="bcwind.h5")

The top-level entry points generate_imf, generate_imf_years and save_imf are also re-exported directly on the gcmprocpy namespace.

gcmprocpy.imfgen.core.generate_imf(start=None, end=None, source='omni', window=10, cache_dir=None, bcwind_path=None, download=True, omni_access='hapi', verbose=False, model='tiegcm')

Generate an IMF xarray.Dataset.

Parameters:
  • start, end (date-like or None) – Inclusive bounds (YYYY-MM-DD, YYYYDDD, ISO, or datetime). For omni they default to 1982-01-01 and yesterday. For bcwind they optionally filter the file (default: the file’s full span).

  • source ({"omni", "bcwind"}) – OMNI 1-minute data (default) or a BCWIND HDF5 file.

  • window (int) – Trailing-average window in minutes for the OMNI pipeline (default 10). Ignored for bcwind (raw pass-through).

  • cache_dir (str or None) – Directory holding / receiving omni_min<year>.asc files (omni_access="asc" only).

  • bcwind_path (str or None) – Path to the BCWIND HDF5 file (required when source="bcwind").

  • download (bool) – Fetch missing OMNI year files over FTP (omni_access="asc" only; default True).

  • omni_access ({"hapi", "asc"}) – How to obtain OMNI data. "hapi" (default) fetches only the requested window from CDAWeb’s HAPI server (no whole-year download) – best for short ranges. "asc" downloads/parses the SPDF omni_min<year>.asc files and reproduces the legacy per-year output exactly. Ignored for bcwind.

  • verbose (bool) – Print progress.

  • model ({"tiegcm", "waccmx"}) – Target model input format (default "tiegcm"). "waccmx" emits the WACCM-X format (unlimited time; date as YYYYMMDD plus datefrac/datesec; a WACCMX filename tag).

Returns:

The IMF dataset. Use imfgen.save_imf() to write it to NetCDF.

Return type:

xarray.Dataset
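
The OMNI pipeline combines gap interpolation (recorded in the *Mask flags) with a trailing average. The sketch below shows the general idea in pure Python. It rests on several assumptions that are not confirmed by this page: the fill value 9999.99 is used purely for illustration, interpolation is applied before averaging, gaps at the ends hold the nearest valid sample, and the window is shortened at the start of the series. The function names are hypothetical.

```python
from bisect import bisect_left

FILL = 9999.99  # assumed fill value, for illustration only


def fill_gaps(values, fill_value=FILL):
    """Linearly interpolate over fill values; mask is 1 for original, 0 for interpolated."""
    good = [i for i, v in enumerate(values) if v != fill_value]
    if not good:
        raise ValueError("no valid samples to interpolate from")
    filled, mask = [], []
    for i, v in enumerate(values):
        if v != fill_value:
            filled.append(float(v))
            mask.append(1)
            continue
        pos = bisect_left(good, i)
        if pos == 0:                       # gap before the first valid sample
            est = float(values[good[0]])
        elif pos == len(good):             # gap after the last valid sample
            est = float(values[good[-1]])
        else:
            lo, hi = good[pos - 1], good[pos]
            weight = (i - lo) / (hi - lo)
            est = values[lo] + weight * (values[hi] - values[lo])
        filled.append(est)
        mask.append(0)
    return filled, mask


def trailing_mean(values, window):
    """Average each sample with up to window - 1 preceding samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


bz, bz_mask = fill_gaps([1.0, FILL, FILL, 4.0])
print(bz_mask)               # [1, 0, 0, 1]
print(trailing_mean(bz, 2))
```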

gcmprocpy.imfgen.core.generate_imf_years(start=None, end=None, window=10, cache_dir=None, download=True, omni_access='hapi', verbose=False, model='tiegcm')

Yield one OMNI Dataset per calendar year in [start, end].

Each year is generated independently (its own within-year interpolation), so the per-year files reproduce the legacy imf_OMNI_YYYY001-YYYYddd.nc files. This is what imfgen --split-years writes. (BCWIND files are a single span and are not split.) For bit-for-bit reproduction of the legacy files use omni_access="asc".
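
Splitting a date range into calendar-year spans, as the per-year mode does conceptually, can be sketched with the standard library alone. split_by_year is a hypothetical helper and only illustrates the range arithmetic, not the data processing.

```python
from datetime import date


def split_by_year(start: date, end: date):
    """Split the inclusive range [start, end] into one (first, last) pair per calendar year."""
    if start > end:
        raise ValueError("start must not be after end")
    spans = []
    for year in range(start.year, end.year + 1):
        first = max(start, date(year, 1, 1))
        last = min(end, date(year, 12, 31))
        spans.append((first, last))
    return spans


for first, last in split_by_year(date(2019, 6, 15), date(2021, 2, 1)):
    print(first, "->", last)
```

Only the first and last spans are clipped to the requested bounds; every year in between covers 1 January to 31 December.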

gcmprocpy.imfgen.dataset.save_imf(ds, output_dir='.', prefix=None, path=None)

Write ds to NetCDF and return the path written.

path overrides the auto-generated <prefix>_<beg>-<end>.nc name. (For per-year output, generate each year with imfgen.generate_imf_years() and call this once per dataset – see imfgen --split-years.)

gcmprocpy.imfgen.dataset.build_dataset(processed, dates, timestamps, source='omni', source_path=None, model='tiegcm', datetimes=None)

Build the IMF xarray.Dataset.

processed maps each channel in CHANNELS to (values, mask). dates is the YYYYDDD.frac float array; timestamps the ISO strings.

model selects the target model’s input format:

  • "tiegcm" (default): the TIE-GCM format on the ndata dimension, with date (YYYYDDD.frac) and the ISO timestamp – reproduced exactly (see the module docstring).

  • "waccmx": the WACCM-X (CESM) format on an (unlimited-on-write) time dimension, with date (YYYYMMDD int), datefrac (the old YYYYDDD.frac) and datesec (seconds of day). Requires datetimes (the source datetime objects); there is no ISO timestamp variable.
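
The two integer time variables of the WACCM-X layout follow directly from each source datetime, which is presumably why datetimes is required in that mode. A minimal sketch, using the hypothetical helper waccmx_date_fields:

```python
from datetime import datetime


def waccmx_date_fields(t: datetime):
    """Return (date, datesec): a YYYYMMDD integer and the seconds elapsed in that day."""
    yyyymmdd = t.year * 10000 + t.month * 100 + t.day
    datesec = t.hour * 3600 + t.minute * 60 + t.second
    return yyyymmdd, datesec


print(waccmx_date_fields(datetime(2024, 3, 1, 6, 0, 0)))  # (20240301, 21600)
```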

gcmprocpy.imfgen.dataset.imf_filename(ds, prefix=None)

<prefix>_<begYYYYDDD>-<endYYYYDDD>.nc from the dataset’s bounds.

For model='waccmx' datasets the default prefix gains a WACCMX tag (e.g. imf_OMNI_WACCMX_...); an explicit prefix is used verbatim.