Dataset
-
class
myplotspec.Dataset.Dataset(infile, address=None, dataset_cache=None, **kwargs)
Bases: object
Represents data.
Initializes dataset.
Parameters: - infile (str) – Path to input file, may contain environment variables
- address (str) – Address within hdf5 file from which to load dataset (hdf5 only)
- slice (slice) – Slice to load from hdf5 dataset (hdf5 only)
- dataframe_kw (dict) – Keyword arguments passed to pandas.DataFrame(...) (hdf5 only)
- read_csv_kw (dict) – Keyword arguments passed to pandas.read_csv(...) (text only)
- verbose (int) – Level of verbose output
- debug (int) – Level of debug output
- kwargs (dict) – Additional keyword arguments
-
classmethod
get_cache_key(infile=None, **kwargs)
Generates tuple of arguments to be used as key for dataset cache.
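As an illustration of the idea, a cache key can be built by collecting the arguments that determine what gets loaded into a hashable tuple. The sketch below is not myplotspec's implementation: the helper name `make_cache_key`, the stand-in class `ExampleDataset`, and the choice of which arguments enter the key are all assumptions made for the example.

```python
import os


def make_cache_key(cls, infile=None, **kwargs):
    """Hypothetical helper: builds a hashable key from loading arguments."""
    if infile is None:
        return None  # nothing to load, so nothing to cache
    # Expand environment variables so equivalent paths map to the same key
    path = os.path.expandvars(infile)
    # Assumption: only these arguments influence what is loaded
    address = kwargs.get("address")
    # Dictionaries are not hashable; freeze them into sorted tuples
    # (the dictionary values themselves are assumed to be hashable)
    read_csv_kw = tuple(sorted(kwargs.get("read_csv_kw", {}).items()))
    return (cls, path, address, read_csv_kw)


class ExampleDataset(object):
    """Stand-in class used only for this illustration."""


key_1 = make_cache_key(ExampleDataset, "data.dat",
                       read_csv_kw={"sep": ",", "header": 0})
key_2 = make_cache_key(ExampleDataset, "data.dat",
                       read_csv_kw={"header": 0, "sep": ","})
print(key_1 == key_2)  # the key does not depend on dictionary ordering
```

Including the class in the key keeps different dataset classes that read the same file from colliding in a shared cache.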
-
static
get_cache_message(cache_key)
Generates message to be used when reloading previously-loaded dataset.
Parameters: cache_key (tuple) – key with which dataset object is stored in dataset cache
Returns: str – message to be used when reloading previously-loaded dataset
Adds command line arguments shared by all subclasses.
Parameters: - parser (ArgumentParser) – Nascent argument parser to which to add arguments
- kwargs (dict) – Additional keyword arguments
-
static
process_infiles(**kwargs)
Processes a list of infiles, expanding environment variables and wildcards.
Parameters: infile{s} (str, list) – Paths to infile(s), may contain environment variables and wildcards
Returns: list – Paths to infiles with environment variables and wildcards expanded
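The expansion described above can be sketched with the standard library alone. This is a minimal illustration, not the library's code: the helper name `expand_infiles`, the environment variable `EXAMPLE_DATA`, and the decision to keep patterns that match nothing are assumptions.

```python
import glob
import os
import tempfile


def expand_infiles(infiles):
    """Hypothetical helper: expands environment variables and wildcards."""
    if isinstance(infiles, str):
        infiles = [infiles]  # accept a single path or a list of paths
    expanded = []
    for infile in infiles:
        pattern = os.path.expandvars(infile)
        matches = sorted(glob.glob(pattern))
        # Assumption: a pattern that matches nothing is kept unchanged so
        # that a later step can report the missing file
        expanded.extend(matches if matches else [pattern])
    return expanded


# Usage: create two empty files, then find them through a variable + wildcard
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.dat", "b.dat"):
        open(os.path.join(tmp, name), "w").close()
    os.environ["EXAMPLE_DATA"] = tmp
    found = expand_infiles("$EXAMPLE_DATA/*.dat")

print([os.path.basename(path) for path in found])
```

Sorting the matches makes the result independent of the order in which the filesystem lists the files.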
-
read(**kwargs)
Reads data from one or more infiles into a DataFrame.
If more than one infile is provided, the resulting DataFrame will consist of their merged data.
If an infile is an hdf5 file path and (optionally) address within the file in the form /path/to/file.h5:/address/within/file, the corresponding DataFrame’s values will be loaded from /address/within/file/values, its index will be loaded from /address/within/file/index, its column names will be loaded from the ‘columns’ attribute of /address/within/file if present, and its index name will be loaded from the ‘index_name’ attribute of /address/within/file if present. Additional arguments provided in dataframe_kw will be passed to DataFrame.
If an infile is the path to a text file, the corresponding DataFrame will be loaded using read_csv, including additional arguments provided in read_csv_kw.
After generating the DataFrame from infiles, the index may be set by loading a list of residue names and numbers in the form XAA:# from indexfile. This is useful when loading data from files that do not specify residue names.
Parameters: - infile[s] (str) – Path(s) to input file(s); may contain environment variables and wildcards
- dataframe_kw (dict) – Keyword arguments passed to DataFrame (hdf5 only)
- read_csv_kw (dict) – Keyword arguments passed to read_csv (text only)
- indexfile (str) – Path to index file; may contain environment variables
- verbose (int) – Level of verbose output
- kwargs (dict) – Additional keyword arguments
Returns: DataFrame – Sequence DataFrame
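The combined hdf5 path and address form used by read can be illustrated with a small parsing sketch. The helper names `split_hdf5_spec` and `dataset_addresses` are hypothetical, the default address of "/" is an assumption (the documentation above does not state what is used when the address is omitted), and the split assumes POSIX-style paths without a colon in the file path.

```python
def split_hdf5_spec(spec, default_address="/"):
    """Hypothetical helper: splits 'file.h5:/address' into (path, address)."""
    # Assumption: the file path itself contains no colon (POSIX-style paths)
    path, separator, address = spec.partition(":")
    if not separator or not address:
        address = default_address  # assumed fallback when no address is given
    return path, address


def dataset_addresses(address):
    """Returns where values and index live under an address, per the text."""
    base = address.rstrip("/")
    return {"values": base + "/values", "index": base + "/index"}


path, address = split_hdf5_spec("/path/to/file.h5:/address/within/file")
print(path)
print(address)
print(dataset_addresses(address))
```

The 'columns' and 'index_name' entries are attributes of the address itself rather than separate datasets, so they are not part of the returned mapping.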
-
write(outfile, **kwargs)
Writes DataFrame to text or hdf5.
If outfile is an hdf5 file path and (optionally) address within the file in the form /path/to/file.h5:/address/within/file, DataFrame’s values will be written to /address/within/file/values, index will be written to /address/within/file/index, column names will be written to the ‘columns’ attribute of /address/within/file, and index name will be written to the ‘index.name’ attribute of /address/within/file.
If outfile is the path to a text file, DataFrame will be written using to_string, including additional arguments provided in read_csv_kw.
Parameters: - outfile (str) – Path to output file; may be path to text file or path to hdf5 file in the form ‘/path/to/hdf5/file.h5:/address/within/hdf5/file’; may contain environment variables
- hdf5_kw (dict) – Keyword arguments passed to create_dataset (hdf5 only)
- read_csv_kw (dict) – Keyword arguments passed to to_string (text only)
- verbose (int) – Level of verbose output
- kwargs (dict) – Additional keyword arguments
-
load_dataset(cls=None, **kwargs)
Loads a dataset, or reloads a previously-loaded dataset from a cache.
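The load-or-reload behaviour can be pictured as a dictionary lookup that falls back to constructing the object. The following is a simplified sketch rather than the actual method: `load_cached`, `CountingDataset`, and the two-element key are hypothetical, and a real key would likely include more of the loading arguments.

```python
def load_cached(cls, cache, **kwargs):
    """Hypothetical helper: loads an object once and reuses it afterwards."""
    key = (cls, kwargs.get("infile"))  # assumption: deliberately simple key
    if key in cache:
        return cache[key]  # previously loaded; skip reading the file again
    dataset = cls(**kwargs)
    cache[key] = dataset
    return dataset


class CountingDataset(object):
    """Stand-in dataset that counts how many times it is constructed."""
    loads = 0

    def __init__(self, infile=None, **kwargs):
        type(self).loads += 1
        self.infile = infile


cache = {}
first = load_cached(CountingDataset, cache, infile="data.dat")
second = load_cached(CountingDataset, cache, infile="data.dat")
print(first is second)        # the same object is returned both times
print(CountingDataset.loads)  # the constructor ran only once
```

Because the cached object is shared, callers that modify it in place affect every later user of the same key.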