Dataset

class myplotspec.Dataset.Dataset(infile, address=None, dataset_cache=None, **kwargs)

Bases: object

Represents a dataset loaded from one or more text or hdf5 files.

Initializes dataset.

Parameters:
  • infile (str) – Path to input file; may contain environment variables
  • address (str) – Address within hdf5 file from which to load dataset (hdf5 only)
  • slice (slice) – Slice to load from hdf5 dataset (hdf5 only)
  • dataframe_kw (dict) – Keyword arguments passed to pandas.DataFrame(...) (hdf5 only)
  • read_csv_kw (dict) – Keyword arguments passed to pandas.read_csv(...) (text only)
  • verbose (int) – Level of verbose output
  • debug (int) – Level of debug output
  • kwargs (dict) – Additional keyword arguments

classmethod get_cache_key(infile=None, **kwargs)

Generates a tuple of arguments to be used as the key for the dataset cache.
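
A minimal sketch of one way such a key could be built, assuming it combines infile with the remaining keyword arguments. The name make_cache_key and the conversion of dict and list values into tuples are hypothetical, not taken from myplotspec; sorting the keyword arguments makes the key independent of the order in which they were passed.

```python
def make_cache_key(infile=None, **kwargs):
    """Hypothetical stand-in for Dataset.get_cache_key.

    Returns a hashable tuple so that equivalent calls map to the same
    cache entry, regardless of keyword-argument order.
    """
    def freeze(value):
        # dicts and lists are unhashable; convert them to tuples recursively
        if isinstance(value, dict):
            return tuple(sorted((k, freeze(v)) for k, v in value.items()))
        if isinstance(value, (list, tuple)):
            return tuple(freeze(v) for v in value)
        return value

    # Sort by argument name so that call order does not change the key
    return (infile,) + tuple(sorted((k, freeze(v)) for k, v in kwargs.items()))


key_a = make_cache_key("data.txt", read_csv_kw={"sep": " "}, verbose=1)
key_b = make_cache_key("data.txt", verbose=1, read_csv_kw={"sep": " "})
cache = {key_a: "loaded dataset"}
found = cache[key_b]  # same entry, since key_a == key_b
```

A real implementation might leave out arguments such as verbose that do not affect the loaded data; whether myplotspec does so is not stated here.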

static get_cache_message(cache_key)

Generates the message to be used when reloading a previously-loaded dataset.

Parameters: cache_key (tuple) – Key with which the dataset object is stored in the dataset cache
Returns: str – Message to be used when reloading a previously-loaded dataset

static add_shared_args(parser, **kwargs)

Adds command-line arguments shared by all subclasses.

Parameters:
  • parser (ArgumentParser) – Nascent argument parser to which to add arguments
  • kwargs (dict) – Additional keyword arguments
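
A sketch of what adding shared arguments to an argparse.ArgumentParser might look like. The option names -v/--verbose and -d/--debug are hypothetical, chosen only because the constructor accepts verbose and debug levels; the actual shared arguments are not listed in this reference.

```python
import argparse


def add_shared_args(parser, **kwargs):
    """Hypothetical sketch of adding arguments shared by all subclasses.

    The option names below are illustrative, not taken from myplotspec.
    """
    # A separate argument group keeps shared options together in --help output
    group = parser.add_argument_group("shared arguments")
    # Repeatable flags: -v -v (or -vv) yields verbose == 2
    group.add_argument("-v", "--verbose", action="count", default=0,
                       help="enable verbose output; may be repeated")
    group.add_argument("-d", "--debug", action="count", default=0,
                       help="enable debug output; may be repeated")
    return parser


parser = add_shared_args(argparse.ArgumentParser(description="subclass tool"))
args = parser.parse_args(["-vv", "-d"])
```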

static process_infiles(**kwargs)

Processes a list of infiles, expanding environment variables and wildcards.

Parameters: infile[s] (str, list) – Path(s) to infile(s); may contain environment variables and wildcards
Returns: list – Paths to infiles with environment variables and wildcards expanded
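
A minimal sketch of the expansion step using os.path.expandvars and glob.glob from the standard library. The function below is hypothetical; dropping patterns that match nothing and ignoring the hdf5 path:address form are assumptions of the sketch, not documented behavior.

```python
import glob
import os
import tempfile


def process_infiles(infiles):
    """Hypothetical sketch: expand environment variables and wildcards.

    Accepts a single path or a list of paths and returns a flat list of
    matching paths. Patterns that match nothing are dropped.
    """
    if isinstance(infiles, str):
        infiles = [infiles]
    processed = []
    for infile in infiles:
        # Substitute $VAR / ${VAR}; unknown variables are left unchanged
        expanded = os.path.expandvars(infile)
        # Expand wildcards; sorted() gives a deterministic order
        processed.extend(sorted(glob.glob(expanded)))
    return processed


# Usage: create three files, then match only the *.txt ones via $DATA_DIR
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.dat"):
        open(os.path.join(tmp, name), "w").close()
    os.environ["DATA_DIR"] = tmp
    matched = process_infiles(["$DATA_DIR/*.txt"])
    names = [os.path.basename(path) for path in matched]
```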

read(**kwargs)

Reads data from one or more infiles into a DataFrame.

If more than one infile is provided, the resulting DataFrame will consist of their merged data.

If an infile is an hdf5 file path, optionally followed by an address within the file in the form /path/to/file.h5:/address/within/file, the corresponding DataFrame will be assembled as follows:
  • values will be loaded from /address/within/file/values
  • the index will be loaded from /address/within/file/index
  • column names will be loaded from the ‘columns’ attribute of /address/within/file, if present
  • the index name will be loaded from the ‘index_name’ attribute of /address/within/file, if present

Additional arguments provided in dataframe_kw will be passed to DataFrame.
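
A hypothetical helper that splits an infile of the form /path/to/file.h5:/address/within/file into the file path and the address, then derives the locations of the values and index datasets. Recognizing hdf5 files by a .h5 or .hdf5 extension is an assumption; this reference does not say how hdf5 paths are detected.

```python
import re

# Assumed pattern: a path ending in .h5 or .hdf5, then ':' and an absolute address
_HDF5_ADDRESS = re.compile(r"^(?P<path>.+\.(?:h5|hdf5)):(?P<address>/.*)$")


def split_hdf5_address(infile, default_address=None):
    """Hypothetical helper: split 'file.h5:/address' into (path, address).

    Returns (infile, default_address) when no address is present.
    """
    match = _HDF5_ADDRESS.match(infile)
    if match is None:
        return infile, default_address
    return match.group("path"), match.group("address")


path, address = split_hdf5_address("/path/to/file.h5:/address/within/file")
values_address = address + "/values"  # "/address/within/file/values"
index_address = address + "/index"    # "/address/within/file/index"
```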

If an infile is the path to a text file, the corresponding DataFrame will be loaded using read_csv, including additional arguments provided in read_csv_kw.

After the DataFrame is generated from the infiles, its index may be set by loading a list of residue names and numbers, in the form XAA:#, from indexfile. This is useful when loading data from files that do not specify residue names.
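
A sketch of parsing an index file, assuming entries of the form XAA:# separated by whitespace, with one or more entries per line. The function name read_index_file and the (name, number) return type are hypothetical.

```python
def read_index_file(lines):
    """Hypothetical sketch: parse residue labels of the form XAA:#.

    Accepts an iterable of lines (e.g. an open file) and returns a list
    of (name, number) pairs in file order.
    """
    index = []
    for line in lines:
        for entry in line.split():
            name, sep, number = entry.partition(":")
            # Reject entries without a ':' or without a numeric residue number
            if not sep or not number.isdigit():
                raise ValueError(f"expected XAA:#, got {entry!r}")
            index.append((name, int(number)))
    return index


residues = read_index_file(["ALA:1 GLY:2\n", "SER:3\n"])
```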

Parameters:
  • infile[s] (str) – Path(s) to input file(s); may contain environment variables and wildcards
  • dataframe_kw (dict) – Keyword arguments passed to DataFrame (hdf5 only)
  • read_csv_kw (dict) – Keyword arguments passed to read_csv (text only)
  • indexfile (str) – Path to index file; may contain environment variables
  • verbose (int) – Level of verbose output
  • kwargs (dict) – Additional keyword arguments
Returns: DataFrame – Sequence DataFrame

write(outfile, **kwargs)

Writes the DataFrame to a text or hdf5 file.

If outfile is an hdf5 file path, optionally followed by an address within the file in the form /path/to/file.h5:/address/within/file, the DataFrame will be written as follows:
  • values will be written to /address/within/file/values
  • the index will be written to /address/within/file/index
  • column names will be written to the ‘columns’ attribute of /address/within/file
  • the index name will be written to the ‘index.name’ attribute of /address/within/file

If outfile is the path to a text file, the DataFrame will be written using to_string, which receives any additional arguments provided in read_csv_kw.

Parameters:
  • outfile (str) – Path to output file; may be path to text file or path to hdf5 file in the form ‘/path/to/hdf5/file.h5:/address/within/hdf5/file’; may contain environment variables
  • hdf5_kw (dict) – Keyword arguments passed to create_dataset (hdf5 only)
  • read_csv_kw (dict) – Keyword arguments passed to to_string (text only)
  • verbose (int) – Level of verbose output
  • kwargs (dict) – Additional keyword arguments

load_dataset(cls=None, **kwargs)

Loads a dataset, or reloads a previously-loaded dataset from a cache.
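
A minimal sketch of the cache lookup, written as a standalone function around a hypothetical ToyDataset class whose get_cache_key and get_cache_message loosely mirror the methods documented above. Using a plain dict as the cache and passing dataset_cache explicitly to load_dataset are assumptions; keyword argument values must be hashable for this simplified key.

```python
class ToyDataset:
    """Hypothetical minimal stand-in for a Dataset subclass."""

    def __init__(self, infile, **kwargs):
        self.infile = infile
        self.kwargs = kwargs

    @classmethod
    def get_cache_key(cls, infile=None, **kwargs):
        # Key on the class, the infile, and the sorted keyword arguments
        return (cls, infile) + tuple(sorted(kwargs.items()))

    @staticmethod
    def get_cache_message(cache_key):
        return f"previously loaded from {cache_key[1]!r}"


def load_dataset(cls=None, dataset_cache=None, verbose=0, **kwargs):
    """Hypothetical sketch: reuse a cached dataset or construct a new one."""
    if cls is None:
        cls = ToyDataset
    if dataset_cache is None:
        # No cache supplied: always construct a fresh object
        return cls(**kwargs)
    cache_key = cls.get_cache_key(**kwargs)
    if cache_key in dataset_cache:
        if verbose >= 1:
            print(cls.get_cache_message(cache_key))
        return dataset_cache[cache_key]
    dataset = cls(**kwargs)
    dataset_cache[cache_key] = dataset
    return dataset


cache = {}
first = load_dataset(infile="data.txt", dataset_cache=cache)
# Second call hits the cache, prints the reload message, and returns the same object
second = load_dataset(infile="data.txt", dataset_cache=cache, verbose=1)
```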