km3pipe.dataclasses
Dataclasses for internal use. Heavily based on Numpy arrays.
Module Contents
Classes
Table | 2D generic Table with grouping index.
Functions
has_structured_dt(arr) | Check if the array representation has a structured dtype.
| Check if the dtype is structured.
inflate_dtype(arr, names) | Create structured dtype from a 2d ndarray with unstructured dtype.
- km3pipe.dataclasses.has_structured_dt(arr)
Check if the array representation has a structured dtype.
- km3pipe.dataclasses.inflate_dtype(arr, names)
Create structured dtype from a 2d ndarray with unstructured dtype.
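The two helpers above can be illustrated with plain NumPy. The following is a minimal sketch, not km3pipe's implementation: it assumes that "structured" means the dtype has named fields, and the names `looks_structured` and `make_structured_dtype` are hypothetical stand-ins introduced only for this example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def looks_structured(arr):
    """Hypothetical stand-in: True if the array's dtype has named fields."""
    return np.asarray(arr).dtype.names is not None


def make_structured_dtype(arr, names):
    """Hypothetical stand-in: one field per column, all sharing arr's dtype."""
    arr = np.asarray(arr)
    if arr.ndim != 2 or arr.shape[1] != len(names):
        raise ValueError("need a 2D array with one column per name")
    return np.dtype([(name, arr.dtype) for name in names])


flat = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 rows, 2 columns
dt = make_structured_dtype(flat, ["a", "b"])

# Reinterpret the 2D array as a 1D structured array with fields "a" and "b".
structured = rfn.unstructured_to_structured(flat, dtype=dt)
```

Here `flat` has no field names, while `structured` exposes each former column by name (`structured["a"]`, `structured["b"]`).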
- class km3pipe.dataclasses.Table(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)
2D generic Table with grouping index.
This is a np.recarray subclass with some metadata and helper methods.
You can initialize it directly from a structured numpy array, a pandas DataFrame, a dictionary of (columnar) arrays; or, initialize it from a list of rows/list of columns using the appropriate factory.
This class adds the following to np.recarray:
- Parameters:
- data: array-like or dict(array-like)
numpy array with structured/flat dtype, or dict of arrays.
- h5loc: str
Location in HDF5 file where to store the data. [default: ‘/misc’]
- h5singleton: bool
Tables defined as h5singletons are only written once to an HDF5 file. This is used for headers for example (default=False).
- dtype: numpy dtype
Datatype of the array. If not specified and data is an unstructured array, names must be specified. [default: None]
- Attributes:
- h5loc: str
HDF5 group to write into. (default=‘/misc’)
- split_h5: bool
Split the array into separate arrays, column-wise, when saving to hdf5? (default=False)
- name: str
Human-readable name, e.g. ‘Hits’
- h5singleton: bool
Tables defined as h5singletons are only written once to an HDF5 file. This is used for headers for example (default=False).
Methods
from_dict(arr_dict, dtype=None, **kwargs)
Create a Table from a dict of arrays (similar to pandas).
from_template(data, template, **kwargs)
Create an array from a dict of arrays with a predefined dtype.
sorted(by)
Sort the table by one of its columns.
append_columns(colnames, values)
Append new columns to the table.
to_dataframe()
Return as pandas dataframe.
from_dataframe(df, **kwargs)
Instantiate from a dataframe.
from_rows(list_of_rows, **kwargs)
Instantiate from an array-like with shape (n_rows, n_columns).
from_columns(list_of_columns, **kwargs)
Instantiate from an array-like with shape (n_columns, n_rows).
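The difference between the row-wise and column-wise factories comes down to the orientation of the input. As a rough illustration using plain `np.recarray` constructors (not the km3pipe factories themselves; the field names and values are made up for the example):

```python
import numpy as np

names = ["pos_x", "time"]

# Row-wise input: shape (n_rows, n_columns), one inner list per row.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
from_rows = np.rec.fromrecords([tuple(r) for r in rows], names=names)

# Column-wise input: shape (n_columns, n_rows), one inner list per column.
columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
from_columns = np.rec.fromarrays(columns, names=names)
```

Both calls describe the same three records; only the layout of the input differs.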
- classmethod from_dict(arr_dict, dtype=None, fillna=False, **kwargs)
Generate a table from a dictionary of arrays.
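To make the role of the optional dtype concrete, here is a sketch of the same idea with plain NumPy: a dictionary of columns becomes a record array, either with inferred field types or with an explicit dtype. This is an analogy under stated assumptions, not km3pipe's code path; the column names and the chosen dtypes are illustrative, and the fillna option is not covered.

```python
import numpy as np

arr_dict = {
    "channel_id": np.array([0, 1, 2]),
    "time": np.array([10.5, 11.0, 12.25]),
}
names = list(arr_dict.keys())
columns = list(arr_dict.values())

# Default: each field inherits the dtype of its source array.
inferred = np.rec.fromarrays(columns, names=names)

# Explicit dtype: field names and types are fixed up front and the
# column data is cast on assignment.
explicit_dtype = np.dtype([("channel_id", "u1"), ("time", "f4")])
casted = np.rec.fromarrays(columns, dtype=explicit_dtype)
```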
- classmethod from_template(data, template)
Create a table from a predefined datatype.
See the templates_avail property for available names.
- Parameters:
- data
Data in a format that __init__ understands.
- template: str or dict
Name of the dtype template to use from kp.dataclasses_templates, or a dict containing the required attributes (see the other templates for reference).
- append_columns(colnames, values, **kwargs)
Append new columns to the table.
When appending a single column, values can be a scalar or an array of either length 1 or the same length as this array (the one it’s appended to). In case of multiple columns, values must have the shape list(arrays), and the dimension of each array has to match the length of this array.
See the docs for numpy.lib.recfunctions.append_fields for an explanation of the remaining options.
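Since the remaining options are those of numpy.lib.recfunctions.append_fields, a short sketch of that function on a plain structured array may help. It shows the single-column and multi-column call shapes described above; the field names are illustrative. The scalar and length-1 handling described above is not reproduced here, so the constant column is expanded by hand with `np.full`.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

base = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("id", "i4"), ("time", "f8")],
)

# Single new column: one name, one array of the same length as `base`.
one_more = rfn.append_fields(
    base, "charge", np.array([7.0, 8.0, 9.0]), usemask=False
)

# Constant column: expand the scalar to full length explicitly.
flagged = rfn.append_fields(
    base, "flag", np.full(len(base), 1.0), usemask=False
)

# Several new columns: a list of names and a matching list of arrays.
two_more = rfn.append_fields(
    base,
    ["pos_x", "pos_y"],
    [np.array([0.0, 1.0, 2.0]), np.array([5.0, 6.0, 7.0])],
    usemask=False,
)
```

`usemask=False` asks for a regular ndarray instead of a masked array; in each case `base` itself is left unchanged and a new array is returned.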
- drop_columns(colnames, **kwargs)
Drop columns from the table.
See the docs for numpy.lib.recfunctions.drop_fields for an explanation of the remaining options.
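For orientation, this is how the referenced numpy.lib.recfunctions.drop_fields behaves on a plain structured array. It is a sketch of the NumPy function only, with illustrative field names, and makes no claim about how drop_columns wraps it.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

base = np.array(
    [(1, 0.5, 7.0), (2, 1.5, 8.0)],
    dtype=[("id", "i4"), ("time", "f8"), ("charge", "f8")],
)

# Drop a single field by name.
slim = rfn.drop_fields(base, "charge", usemask=False)

# Drop several fields at once by passing a list of names.
slimmer = rfn.drop_fields(base, ["time", "charge"], usemask=False)
```

As with append_fields, the input array is not modified; each call returns a new array with the reduced set of fields.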