Code documentation

nctime

Module author: Guillaume Levavasseur <glipsl@ipsl.jussieu.fr>

nctime.nctcck.get_args(args=None)[source]

Returns parsed command-line arguments.

Returns:The parsed command-line arguments
Return type:argparse.Namespace
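
To illustrate the args=None convention, here is a minimal sketch of a get_args-style function built on argparse. The option names (directory, --max-processes) are hypothetical placeholders and do not describe nctime's actual command-line interface.

```python
import argparse


def get_args(args=None):
    """Parse command-line arguments and return an argparse.Namespace.

    The options declared here are hypothetical; they only illustrate the pattern.
    """
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("directory", nargs="+", help="One or more directories to scan")
    parser.add_argument("--max-processes", type=int, default=4)
    # With args=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(args)


# Passing an explicit list makes the function easy to call from tests.
namespace = get_args(["/data/a", "/data/b", "--max-processes", "2"])
```

Passing None lets argparse read the real command line, while passing a list bypasses it.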

Module author: Guillaume Levavasseur <glipsl@ipsl.jussieu.fr>

nctime.nctxck.get_args(args=None)[source]

Returns parsed command-line arguments.

Returns:The parsed command-line arguments
Return type:argparse.Namespace

overlap

platform:Unix
synopsis:Highlight chunked NetCDF files producing overlap in a time series.
nctime.overlap.main.get_overlaps(g, shortest)[source]
Returns all overlapping files (default) as a list of tuples. Each tuple gathers:
  • The higher overlap bound,
  • The date to cut the file in order to resolve the overlap,
  • The corresponding cutting timestep.
Parameters:
  • g (networkx.DiGraph()) – The directed graph
  • shortest (list) – The list of the most consecutive files (from nx.DiGraph().shortest_path)
Returns:

The filenames

Return type:

list

nctime.overlap.main.resolve_overlap(ffp, pattern, from_date=None, to_date=None, cutting_timestep=None, partial=False)[source]

Resolves overlapping files. For a full overlap, the corresponding file is removed. For a partial overlap, the corresponding file is truncated into a new one and the old file is removed.

Parameters:
  • ffp (str) – The file full path
  • pattern (str) – The filename pattern
  • from_date (int) – Overlap starting date
  • to_date (int) – Overlap ending date
  • cutting_timestep (int) – Time step to cut the file
  • partial (boolean) – Resolve partial overlap if True
nctime.overlap.main.extract_dates(ffp)[source]

Extracts the date attributes from a netCDF file:

  • start = the start date of the file sub-period
  • end = the end date of the file sub-period
  • next = the date next to the end date depending on the frequency, the calendar, etc.
  • first_step = the first time axis step
  • last_step = the last time axis step
  • path = the file full path
Parameters:ffp (str) – The file full path to process
nctime.overlap.main.create_nodes(fh)[source]

Creates the node in the corresponding Graph(). There is one directed graph per dataset. Each file is analysed to become a node in its graph. A node is a filename with some attributes:

  • start = the start date of the file sub-period
  • end = the end date of the file sub-period
  • next = the date next to the end date depending on the frequency, the calendar, etc.
  • first_step = the first time axis step
  • last_step = the last time axis step
  • path = the file full path
Parameters:fh (handler.Filename) – The filename handler
nctime.overlap.main.create_edges(gid)[source]

Creates the edges between nodes in the corresponding Graph(). There is one directed graph per dataset.

  • Builds the “START” node with the appropriate edges to the nodes with the earliest start date
  • Builds the “END” node with the appropriate edges from the nodes with the latest end date
  • Builds edges between a node and its “backwards” nodes
Parameters:gid (str) – The graph id
nctime.overlap.main.evaluate_graph(gid)[source]

Evaluates the directed graph, looking for a shortest path between the “START” and “END” nodes. If a shortest path is found, it looks for potential overlaps. Partial and full overlaps are supported. If no shortest path is found, it creates a new node “BREAK” in the path.

Parameters:gid (str) – The graph id
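
The evaluation described above can be sketched with a plain breadth-first search. This is a simplified illustration, not nctime's implementation (which relies on networkx): the file names, the integer dates and the edge rule are all assumptions made for the example.

```python
from collections import deque

# Hypothetical nodes: filename -> (start, end, next), using years as plain integers.
FILES = {
    "tas_2000-2004.nc": (2000, 2004, 2005),
    "tas_2005-2009.nc": (2005, 2009, 2010),
    "tas_2008-2009.nc": (2008, 2009, 2010),  # fully covered by the previous file
}


def build_graph(files):
    """Return an adjacency dict including artificial START and END nodes."""
    first = min(start for start, _, _ in files.values())
    last = max(end for _, end, _ in files.values())
    graph = {"START": [], "END": []}
    for name, (start, end, nxt) in files.items():
        graph[name] = []
        if start == first:
            graph["START"].append(name)
        if end == last:
            graph[name].append("END")
        # Assumed rule: link to any file starting no later than the date
        # following this file's end and finishing after this file.
        for other, (o_start, o_end, _) in files.items():
            if other != name and o_start <= nxt and o_end > end:
                graph[name].append(other)
    return graph


def shortest_path(graph, source="START", target="END"):
    """Breadth-first search; returns the node list or None if the series is broken."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in graph[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(build_graph(FILES))
# Files left out of the shortest path are candidates for a full overlap.
overlapping = [name for name in FILES if name not in path]
```

In this sketch a None result corresponds to the situation where a “BREAK” node would be reported.
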
nctime.overlap.main.format_path(path, partial_overlaps, full_overlaps)[source]

Formats the message to print as a diagnostic. It walks through the evaluated path of the directed graph and prints the filenames with useful information.

Parameters:
  • path (list) – The node path as a result of the directed graph evaluation
  • partial_overlaps (dict) – Dictionary of partial overlaps
  • full_overlaps (list) – List of full overlapping files
Returns:

The formatted diagnostic to print

Return type:

str

nctime.overlap.main.yield_filedef(paths)[source]

Yields all file definitions produced by dr2xml as input for XIOS.

Parameters:paths (list) – The list of filedef paths
Returns:The dr2xml files
Return type:iter
nctime.overlap.main.get_patterns_from_filedef(path)[source]

Parses dr2xml files. Each filename pattern is deserialized as a dictionary of facet: value.

Parameters:path (str) – The path to scan
Returns:The filename deserialized with facets
Return type:dict
nctime.overlap.main.initializer(keys, values)[source]

Initialize process context by setting particular variables as global variables.

Parameters:
  • keys (list) – Argument name list
  • values (list) – Argument value list
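
A minimal sketch of this pattern, assuming the keys and values are two lists of equal length. The worker function and the calendar variable are hypothetical and only show how a global set by the initializer can be read afterwards.

```python
def initializer(keys, values):
    """Expose each (key, value) pair as a module-level global variable."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    globals().update(zip(keys, values))


def worker(filename):
    """Hypothetical worker reading a global that initializer() defined at run time."""
    return "{}:{}".format(calendar, filename)
```

With multiprocessing, such a function is typically passed as Pool(processes=n, initializer=initializer, initargs=(keys, values)), so that every child process runs it once at start-up.
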
nctime.overlap.main.run(args=None)[source]

Main process that:

  • Instantiates processing context,
  • Deduces start, end and next dates from each filename,
  • Builds the DiGraph,
  • Detects the shortest path between dates if it exists,
  • Detects a broken path between dates if it exists,
  • Removes the overlapping files.
Parameters:args (ArgumentParser) – Command-line arguments parser
platform:Unix
synopsis:File handler for time axis.
class nctime.overlap.handler.Filename(ffp)[source]

Handler providing methods to deal with file processing.

get_start_end_dates(pattern, calendar)[source]

Wraps and records get_start_end_dates_from_filename() results.

Parameters:
  • pattern (re object) – The filename pattern as a regex (from the re library).
  • calendar (str) – The NetCDF calendar attribute
Returns:

Start and end dates as number of days since the referenced date

Return type:

float

nc_att_get(attribute, variable=None)[source]

Gets an attribute from the NetCDF file. By default, the attribute is searched in the global attributes. If the attribute key is not found, the closest key name is used instead.

Parameters:
  • attribute (str) – The attribute key to get
  • variable (str) – The variable from which to get the attribute. None means the global attributes.
Returns:

The attribute value

Return type:

str

class nctime.overlap.handler.Graph[source]

Handler providing methods to deal with directed graphs.

platform:Unix
synopsis:Processing context used in this module.
class nctime.overlap.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – Parsed command-line arguments
Returns:The processing context
Return type:ProcessingContext
nctime.overlap.context.yield_xml_from_card(card_path)[source]

Yields XML path from run.card and config.card attributes.

Parameters:card_path (str) – Directory including run.card and config.card
Returns:The XML paths to use
Return type:iter
platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Custom exceptions used in this module.

axis

platform:Unix
synopsis:Rewrite and/or check time axis of MIP NetCDF files.
nctime.axis.main.process(ffp)[source]

Checks the time axis and rewrites it if needed.

Parameters:ffp (str) – The file full path to process
Returns:The file status
Return type:list
nctime.axis.main.initializer(keys, values)[source]

Initialize process context by setting particular variables as global variables.

Parameters:
  • keys (list) – Argument name list
  • values (list) – Argument value list
nctime.axis.main.run(args=None)[source]

Main process that:

  • Instantiates processing context,
  • Defines the referenced time properties,
  • Instantiates threads pools,
  • Prints or logs the time axis diagnostics.
Parameters:args (ArgumentParser) – Command-line arguments parser
platform:Unix
synopsis:File handler for time axis.
class nctime.axis.handler.File(ffp, pattern, ref_units, ref_calendar, input_start_timestamp=None, input_end_timestamp=None)[source]

Handler providing methods to deal with file processing.

Returns:The file handler
Return type:File
build_time_axis()[source]

Rebuilds time axis from date axis, depending on MIP frequency, calendar and instant status.

Returns:The corresponding theoretical time axis
Return type:numpy.array
build_time_bounds()[source]

Rebuilds time boundaries from the start date, depending on MIP frequency, calendar and instant status.

Returns:The corresponding theoretical time boundaries as a [n, 2] array
Return type:numpy.array
check_axis_length(axis)[source]

numpy.arange can unexpectedly include the endpoint in the array when the number of steps is high, because of floating-point rounding and the pre-calculated array length. As a workaround, the length is always checked and the last point is removed if the lengths differ.

Parameters:axis (numpy.array) – The axis to check
Returns:The cut axis
Return type:numpy.array
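
The rounding issue and the workaround can be reproduced with a short sketch. The helper name trim_axis and its explicit expected_length argument are assumptions for the example; the actual method takes only the axis.

```python
import numpy as np


def trim_axis(axis, expected_length):
    """Drop the trailing point when the axis is one step longer than expected."""
    if len(axis) == expected_length + 1:
        return axis[:-1]
    return axis


# Classic floating-point case: three steps are expected between 1 and 1.3,
# but rounding makes numpy.arange compute a length of four.
axis = np.arange(1, 1.3, 0.1)
fixed = trim_axis(axis, expected_length=3)
```

numpy derives the array length from ceil((stop - start) / step), so a quotient that rounds slightly above an integer yields one extra element.
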
nc_var_delete(variable)[source]

Deletes a NetCDF variable using NCO operators. A unique filename is generated to avoid multiprocessing errors. To overwrite the input file, the source file is dumped using the cat shell command to avoid the Python memory limit.

Parameters:variable (str) – The variable to delete
Raises:Error – If the deletion failed
nc_att_delete(variable, attribute)[source]

Deletes a NetCDF dimension attribute using NCO operators. A unique filename is generated to avoid multiprocessing errors. To overwrite the input file, the source file is dumped using the cat shell command to avoid the Python memory limit.

Parameters:
  • attribute (str) – The attribute to delete
  • variable (str) – The variable that has the attribute
Raises:

Error – If the deletion failed

nc_var_overwrite(variable, data)[source]

Rewrite variable to NetCDF file without copy.

Parameters:
  • variable (str) – The variable to replace
  • data (float array) – The data array to overwrite
nc_att_overwrite(attribute, data, variable=None)[source]

Rewrite attribute to NetCDF file without copy.

Parameters:
  • attribute (str) – The attribute to replace
  • data (str) – The string that overwrites the attribute value
  • variable (str) – The variable that has the attribute, default is global attributes
nc_att_get(attribute, variable=None)[source]

Gets an attribute from the NetCDF file. By default, the attribute is searched in the global attributes. If the attribute key is not found, the closest key name is used instead.

Parameters:
  • attribute (str) – The attribute key to get
  • variable (str) – The variable from which to get the attribute. None means the global attributes.
Returns:

The attribute value

Return type:

str

nc_file_rename(new_filename)[source]

Rename a NetCDF file.

Parameters:new_filename (str) – The new filename to apply
platform:Unix
synopsis:Custom exceptions used in this module.
exception nctime.axis.custom_exceptions.NetCDFVariableRemoveFail(variable, path)[source]

Raised when NetCDF variable removal fails.

exception nctime.axis.custom_exceptions.NetCDFAttributeRemoveFail(attribute, path, variable=None)[source]

Raised when NetCDF attribute removal fails.

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Processing context used in this module.
class nctime.axis.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – Parsed command-line arguments
Returns:The processing context
Return type:ProcessingContext
nctime.axis.context.is_simulation_completed(card_path)[source]

Returns True if the simulation is completed.

Parameters:card_path (str) – Directory including run.card
Returns:True if the simulation is completed
Return type:boolean

utils

platform:Unix
synopsis:Useful functions to collect data from directories.
class nctime.utils.collector.Collecting(spinner)[source]

Spinner pending data collection.

next()[source]

Print collector spinner

class nctime.utils.collector.Collector(sources, spinner=False)[source]

Base collector class to yield regular NetCDF files.

Parameters:sources (list) – The list of sources to parse
Returns:The data collector
Return type:iter
first()[source]

Returns the first iterator element.

class nctime.utils.collector.FilterCollection[source]

Regex dictionary with a call method to evaluate a string against several regular expressions. The dictionary values are 2-tuples with the regular expression as a string and a boolean indicating to match (i.e., include) or non-match (i.e., exclude) the corresponding expression.

add(name=None, regex='*', inclusive=True)[source]

Add new filter

platform:Unix
synopsis:Constants used in this package.
platform:Unix
synopsis:Processing context used in this module.
class nctime.utils.context.BaseContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – Parsed command-line arguments
Returns:The processing context
Return type:ProcessingContext
platform:Unix
synopsis:Useful functions to use with this package.
class nctime.utils.misc.ncopen(path, mode='r')[source]

Properly opens a netCDF file

Parameters:path (str) – The netCDF file full path
Returns:The netCDF dataset object
Return type:netCDF4.Dataset
nctime.utils.misc.match(pattern, string, inclusive=True)[source]

Validates a string against a regular expression. Matches only at the beginning of the string. By default, the regex is inclusive.

Parameters:
  • pattern (str) – The regular expression to match
  • string (str) – The string to test
  • inclusive (boolean) – False if negative matching (i.e., exclude the regex)
Returns:

True if it matches

Return type:

boolean
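
A minimal sketch of the described behaviour, assuming that "only at the beginning of the string" maps to re.match and that inclusive=False simply inverts the result:

```python
import re


def match(pattern, string, inclusive=True):
    """Return True if the string satisfies the (inclusive or exclusive) regex."""
    # re.match anchors the search at the start of the string only.
    found = re.match(pattern, string) is not None
    return found if inclusive else not found


included = match(r"tas_", "tas_day_2000.nc")
excluded = match(r"tas_", "tas_day_2000.nc", inclusive=False)
not_at_start = match(r"day", "tas_day_2000.nc")
```

The last call shows the anchoring: "day" occurs in the string, but not at its beginning, so the inclusive match fails.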

nctime.utils.misc.get_project(ffp)[source]

Get project identifier from netCDF file.

Parameters:ffp (str) – The file full path
Returns:The lower-case project id
Return type:str
class nctime.utils.misc.ProcessContext(args)[source]

Encapsulates the processing context/information for child process.

Parameters:args (dict) – Dictionary of argument to pass to child process
Returns:The processing context
Return type:ProcessContext
platform:Unix
synopsis:Useful functions to use with this package.
class nctime.utils.custom_print.COLOR(color=None)[source]

Defines a color object for print statements. Default is no color (i.e., restore original color).

class nctime.utils.custom_print.COLORS[source]

String colors for print statements

class nctime.utils.custom_print._TAGS[source]

Tag strings for print statements. These are evaluated as properties, in order to defer until after enable_colors or disable_colors has been called during initialisation.

class nctime.utils.custom_print.Print[source]

Class to manage and dispatch print statement depending on log and debug mode.

platform:Unix
synopsis:Class and methods used to parse command-line arguments.
class nctime.utils.parser.MultilineFormatter(prog, default_columns=120)[source]

Custom formatter class for argument parser to use with the Python argparse module.

class nctime.utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Custom action class for argument parser to use with the Python argparse module.

static directory_checker(path)[source]

Checks if the supplied directory exists. The path is normalized without trailing slash.

Parameters:path (str) – The path list to check
Returns:The same path list
Return type:str
Raises:Error – If one of the directories does not exist
class nctime.utils.parser.InputChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Checks if the supplied input exists.

class nctime.utils.parser.CodeChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Checks if the supplied input exists.

class nctime.utils.parser.TimestampChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Checks if the supplied timestamp is valid.

class nctime.utils.parser.CalendarChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Checks if the supplied calendar is valid.

class nctime.utils.parser.TimeUnitsChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Checks if the supplied time units have a valid format.

nctime.utils.parser.regex_validator(string)[source]

Validates a Python regular expression syntax.

Parameters:string (str) – The string to check
Returns:The Python regex
Return type:re.compile
Raises:Error – If invalid regular expression
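
A hedged sketch of such a validator, usable as the type= callable of an argparse option. Raising argparse.ArgumentTypeError is an assumption made here; the documentation above only states that an error is raised.

```python
import argparse
import re


def regex_validator(string):
    """Compile the string as a Python regex or report a parser-friendly error."""
    try:
        return re.compile(string)
    except re.error as error:
        # ArgumentTypeError lets argparse print a clean usage message.
        raise argparse.ArgumentTypeError(
            "Bad regex syntax: {} ({})".format(string, error)
        )


compiled = regex_validator(r"^tas_.*\.nc$")
```

An unbalanced pattern such as "(" cannot be compiled and would trigger the error branch.
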
nctime.utils.parser.positive_only(value)[source]

Validates a positive number.

Parameters:value (str) – The value submitted
Returns:
nctime.utils.parser.processes_validator(value)[source]

Validates the max processes number.

Parameters:value (str) – The max processes number submitted
Returns:
nctime.utils.parser.inc_converter(string)[source]

Checks the increment value syntax.

Parameters:string (str) – The string to check
Returns:The key/value tuple
Return type:list
Raises:Error – If invalid pair syntax
nctime.utils.parser.table_inc_converter(pair)[source]

Checks the key value syntax.

Parameters:pair (str) – The key/value pair to check
Returns:The key/value pair
Return type:list
Raises:Error – If invalid pair syntax
platform:Unix
synopsis:Custom exceptions used in this package.
exception nctime.utils.custom_exceptions.InvalidNetCDFFile(path)[source]

Raised when the file is not a NetCDF file.

exception nctime.utils.custom_exceptions.RenamingNetCDFFailed(src, dst, exists=False)[source]

Raised when NetCDF renaming failed.

exception nctime.utils.custom_exceptions.NoNetCDFAttribute(attribute, path, variable=None)[source]

Raised when a NetCDF attribute is missing.

exception nctime.utils.custom_exceptions.NoNetCDFVariable(variable, path)[source]

Raised when a NetCDF variable is missing.

exception nctime.utils.custom_exceptions.NetCDFTimeStepNotFound(value, path)[source]

Raised when a NetCDF time index is not found.

exception nctime.utils.custom_exceptions.EmptyTimeAxis(path)[source]

Raised when a NetCDF time axis is empty.

exception nctime.utils.custom_exceptions.InvalidFrequency(frequency)[source]

Raised when frequency is unknown.

exception nctime.utils.custom_exceptions.InvalidTable(frequency)[source]

Raised when table is unknown.

exception nctime.utils.custom_exceptions.InvalidClimatologyFrequency(frequency)[source]

Raised when climatology frequency is unknown.

exception nctime.utils.custom_exceptions.InvalidUnits(units)[source]

Raised when time units are unknown.

exception nctime.utils.custom_exceptions.NoFileFound(paths)[source]

Raised when no file is found.

exception nctime.utils.custom_exceptions.NoRunCardFound(path)[source]

Raised when no run.card is found.

exception nctime.utils.custom_exceptions.NoConfigCardFound(path)[source]

Raised when no config.card is found.

platform:Unix
synopsis:Methods used to deal with NetCDF file time axis.
class nctime.utils.time.TimeInit(ref, tunits_default=None)[source]

Encapsulates the time properties from the first file into the processing context. These properties have to be used as the reference for all files in the directory.

  • The calendar, the frequency and the realm are read from the NetCDF global attributes and used to detect an instantaneous time axis,
  • The NetCDF time units attribute has to remain unchanged, in accordance with the CF convention and archive designs.
Parameters:
  • ref (str) – The reference file full path
  • tunits_default (str) – The default time units if exists
Raises:
  • Error – If NetCDF time units attribute is missing
  • Error – If NetCDF frequency attribute is missing
  • Error – If NetCDF realm attribute is missing
  • Error – If NetCDF calendar attribute is missing
nctime.utils.time.control_time_units(tunits, tunits_default=None)[source]

Controls the time units format as at least “days since YYYY-MM-DD”. The time units can be forced within configuration file using the time_units_default option.

Parameters:
  • tunits (str) – The NetCDF time units string from file
  • tunits_default (str) – The default time units that should be used
Returns:

The appropriate time units string formatted and controlled depending on the project

Return type:

str

nctime.utils.time.convert_time_units(tunits, table, frequency)[source]

Converts default time units from file into time units using the MIP frequency. As an example, for a 3-hourly file, the time units “days since YYYY-MM-DD” become “hours since YYYY-MM-DD”.

Parameters:
  • tunits (str) – The NetCDF time units string from file
  • frequency (str) – The time frequency
  • table (str) – The MIP table
Returns:

The converted time units string

Return type:

str
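
The 3-hourly example above can be sketched as a simple string substitution. The frequency-to-units table below is hypothetical, and the sketch ignores the MIP table argument; the real mapping depends on both the table and the frequency.

```python
# Hypothetical mapping, for illustration only.
FREQUENCY_UNITS = {"3hr": "hours", "6hr": "hours", "day": "days"}


def convert_time_units(tunits, frequency):
    """Swap the leading unit of a '<unit> since <date>' string."""
    # Keep the reference date untouched; only the unit keyword changes.
    _, reference = tunits.split(" since ", 1)
    return "{} since {}".format(FREQUENCY_UNITS[frequency], reference)


converted = convert_time_units("days since 1850-01-01", "3hr")
```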

nctime.utils.time.untruncated_timestamp(timestamp)[source]

Returns proper digits for yearly and monthly truncated timestamps. The dates from the filename are filled with the 0 digit to reach 14 digits. Consequently, yearly dates start on January 1st and monthly dates start on the first day of the month.

Parameters:timestamp (str) – A date string from a filename
Returns:The filled timestamp
Return type:str
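
One plausible reading of this behaviour, given as a sketch rather than the actual implementation: the timestamp is right-padded with zeros to 14 digits (YYYYMMDDHHMMSS), and a zero month or day is then replaced by 01 so that the result is a valid date.

```python
def untruncated_timestamp(timestamp):
    """Pad a truncated timestamp to the 14-digit YYYYMMDDHHMMSS form."""
    padded = timestamp.ljust(14, "0")
    year, month, day, clock = padded[:4], padded[4:6], padded[6:8], padded[8:]
    # A month or day of "00" comes from padding: move it to the first valid value.
    if month == "00":
        month = "01"
    if day == "00":
        day = "01"
    return year + month + day + clock


yearly = untruncated_timestamp("2000")
monthly = untruncated_timestamp("200006")
```
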
nctime.utils.time.truncated_timestamp(date, length)[source]

Returns proper digits depending on datetime object.

Parameters:
  • date (datetime.datetime) – Datetime or phony datetime object
  • length (int) – The timestamp length expected
Returns:

The corresponding timestamp

Return type:

str

nctime.utils.time.num2date(num_axis, units, calendar)[source]

A wrapper around netCDF4.num2date that handles “years since” and “months since” units. For any other time units, it calls the usual netcdftime.num2date.

Parameters:
  • num_axis (numpy.array) – The numerical time axis following units
  • units (str) – The proper time units
  • calendar (str) – The NetCDF calendar attribute
Returns:

The corresponding date axis

Return type:

array

nctime.utils.time.date2num(date_axis, units, calendar)[source]

A wrapper around netCDF4.date2num that handles “years since” and “months since” units. For any other time units, it calls the usual netcdftime.date2num.

Parameters:
  • date_axis (numpy.array) – The date axis following units
  • units (str) – The proper time units
  • calendar (str) – The NetCDF calendar attribute
Returns:

The corresponding numerical time axis

Return type:

array

nctime.utils.time.add_month(date, months_to_add)[source]

Finds the next month from date.

Parameters:
  • date (netcdftime.datetime) – Accepts datetime or phony datetime from netCDF4.num2date.
  • months_to_add (int) – The number of months to add to the date
Returns:

The final date

Return type:

netcdftime.datetime
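
The month arithmetic can be sketched as follows. The standard datetime.datetime is used here as a stand-in for netcdftime.datetime, which is an assumption; the sketch also does not handle days that do not exist in the target month (for example the 31st).

```python
from datetime import datetime


def add_month(date, months_to_add):
    """Shift a date by a number of months, carrying over into the year."""
    # Work with a zero-based month index so that floor division gives the year carry.
    total = date.month - 1 + months_to_add
    year = date.year + total // 12
    month = total % 12 + 1
    return date.replace(year=year, month=month)


shifted = add_month(datetime(2000, 11, 1), 3)
```

Because Python's floor division and modulo round towards negative infinity, the same formula also works for negative offsets.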

nctime.utils.time.add_year(date, years_to_add)[source]

Finds the next year from date.

Parameters:
  • date (netcdftime.datetime) – Accepts datetime or phony datetime from netCDF4.num2date.
  • years_to_add (int) – The number of years to add to the date
Returns:

The final date

Return type:

netcdftime.datetime

nctime.utils.time.get_start_end_dates_from_filename(filename, pattern, table, frequency, calendar, start=None, end=None)[source]

Returns datetime objects for start and end dates from the filename. To rebuild a proper time axis, the dates from filename are expected to set the first time boundary and not the middle of the time interval.

Parameters:
  • filename (str) – The filename
  • pattern (re object) – The filename pattern as a regex (from the re library).
  • table (str) – The MIP table
  • frequency (str) – The time frequency
  • calendar (str) – The NetCDF calendar attribute
  • start (str) – The timestamp to consider as start instead of filename timestamps
  • end (str) – The timestamp to consider as end instead of filename timestamps
Returns:

Start and end dates from the filename

Return type:

netcdftime.datetime

nctime.utils.time.get_last_timestep(ffp)[source]

Returns the last time step from the time axis of a NetCDF file.

Parameters:ffp (str) – The file full path
Returns:The last timestep
Return type:int

nctime.utils.time.get_next_timestep(ffp, current_timestep)[source]

Returns next time step from time axis given the current one.

Parameters:
  • ffp (str) – The file full path
  • current_timestep (int) – The current_timestep
Returns:

The next timestep

Return type:

int

nctime.utils.time.trunc(array, ndecimals)[source]

Truncates each item of a NumPy array to ndecimals decimal places.

Parameters:
  • array (numpy.array) – The array to truncate
  • ndecimals (int) – Number of decimals to keep
Returns:

The truncated array

Return type:

numpy.array
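
A minimal sketch of decimal truncation with NumPy, assuming that truncation means discarding the extra digits (towards zero) rather than rounding:

```python
import numpy as np


def trunc(array, ndecimals):
    """Cut each value after ndecimals decimal places, without rounding."""
    factor = 10 ** ndecimals
    # np.trunc drops the fractional part towards zero.
    return np.trunc(array * factor) / factor


truncated = trunc(np.array([1.23987, -1.23987]), 2)
```

Unlike numpy.round, which would give 1.24 for the first value, truncation keeps 1.23.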

nctime.utils.time.time_inc(table, frequency)[source]

Returns the time increment and time units depending on the MIP frequency and table.

Parameters:
  • table (str) – The MIP table
  • frequency (str) – The MIP frequency
Returns:

The corresponding time value and units

Return type:

list

nctime.utils.time.dates2int(dates)[source]

Converts (a list of) dates to integers.

Parameters:dates (list) – A list of datetime or phony datetime objects
Returns:The corresponding formatted integers
Return type:list or int
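
A sketch of one way to perform this conversion, assuming the integer uses the 14-digit YYYYMMDDHHMMSS layout (the exact format is not stated above). The helper date2int is hypothetical. Plain attribute access is used instead of strftime so that any datetime-like object works.

```python
from datetime import datetime


def date2int(date):
    """Format one datetime-like object as a YYYYMMDDHHMMSS integer."""
    return int("{:04d}{:02d}{:02d}{:02d}{:02d}{:02d}".format(
        date.year, date.month, date.day, date.hour, date.minute, date.second))


def dates2int(dates):
    """Accept either a single date or a list of dates."""
    if isinstance(dates, (list, tuple)):
        return [date2int(date) for date in dates]
    return date2int(dates)


single = dates2int(datetime(1850, 1, 2, 3, 4, 5))
several = dates2int([datetime(2000, 1, 1), datetime(2000, 12, 31, 21)])
```
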
nctime.utils.time.dates2str(dates, iso_format=True)[source]

Converts (a list of) dates in format: %Y%m%d %H:%M:%S.

Parameters:
  • dates (netcdftime.datetime/list) – A list of datetime or phony datetime objects
  • iso_format (boolean) – ISO format date if True
Returns:

The corresponding formatted strings

Return type:

list or str

nctime.utils.time.date2str(date, iso_format=True)[source]

Converts date in format: %Y%m%d %H:%M:%S.

Parameters:
  • date (netcdftime.datetime) – A datetime or phony datetime object
  • iso_format (boolean) – ISO format date if True
Returns:

The corresponding formatted string

Return type:

str

nctime.utils.time.str2dates(strings, iso_format=True)[source]

Converts (a list of) strings in format: %Y%m%d %H:%M:%S into datetime objects.

Parameters:
  • strings (string/list) – A list of strings to convert
  • iso_format (boolean) – ISO format date if True
Returns:

A list of datetime or phony datetime objects

Return type:

list or str

nctime.utils.time.str2date(string, iso_format=True)[source]

Converts a string in format: %Y%m%d %H:%M:%S into a datetime object.

Parameters:
  • string (str) – The string to format
  • iso_format (boolean) – ISO format date if True
Returns:

A datetime or phony datetime object

Return type:

netcdftime.datetime

Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>