km3pipe.dataclasses

Dataclasses for internal use. Heavily based on NumPy arrays.

Module Contents

Classes

Table

2D generic Table with grouping index.

Functions

has_structured_dt(arr)

Check if the array representation has a structured dtype.

is_structured(dt)

Check if the dtype is structured.

inflate_dtype(arr, names)

Create structured dtype from a 2d ndarray with unstructured dtype.

km3pipe.dataclasses.has_structured_dt(arr)

Check if the array representation has a structured dtype.

km3pipe.dataclasses.is_structured(dt)

Check if the dtype is structured.

km3pipe.dataclasses.inflate_dtype(arr, names)

Create structured dtype from a 2d ndarray with unstructured dtype.
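To make the structured/unstructured distinction concrete, the sketch below uses plain NumPy. The names `is_structured_sketch` and `inflate_dtype_sketch` are hypothetical stand-ins, and the choice to give every column the array's original dtype is an assumption made for illustration; this is not the km3pipe implementation.

```python
import numpy as np


def is_structured_sketch(dt):
    """Return True if the dtype has named fields (hypothetical helper)."""
    return np.dtype(dt).names is not None


def inflate_dtype_sketch(arr, names):
    """Build a structured dtype that names each column of a 2D array.

    Assumption: every column keeps the dtype of the unstructured array.
    """
    arr = np.asanyarray(arr)
    if arr.ndim != 2 or arr.shape[1] != len(names):
        raise ValueError("expected a 2D array with one column per name")
    return np.dtype([(name, arr.dtype) for name in names])


flat = np.zeros((3, 2), dtype="f8")          # unstructured 2D array
structured_dt = inflate_dtype_sketch(flat, ["time", "tot"])
```

A plain `float64` dtype has no field names, so the first check fails for `flat.dtype` and succeeds for `structured_dt`.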

class km3pipe.dataclasses.Table(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

2D generic Table with grouping index.

This is a np.recarray subclass with some metadata and helper methods.

You can initialize it directly from a structured NumPy array, a pandas DataFrame, or a dictionary of (columnar) arrays. To initialize it from a list of rows or a list of columns, use the appropriate factory method (from_rows or from_columns).

This class adds the following parameters, attributes, and methods to np.recarray:

Parameters:
data: array-like or dict(array-like)

NumPy array with a structured or flat dtype, or a dict of arrays.

h5loc: str

Location in HDF5 file where to store the data. [default: ‘/misc’]

h5singleton: bool

Tables defined as h5singletons are only written once to an HDF5 file. This is used for headers for example (default=False).

dtype: numpy dtype

Data type of the array. If not specified and data is an unstructured array, names needs to be specified. [default: None]

Attributes:
h5loc: str

HDF5 group where to write into. (default=‘/misc’)

split_h5: bool

Split the array into separate arrays, column-wise, when saving to hdf5? (default=False)

name: str

Human-readable name, e.g. ‘Hits’

h5singleton: bool

Tables defined as h5singletons are only written once to an HDF5 file. This is used for headers for example (default=False).

Methods

from_dict(arr_dict, dtype=None, **kwargs)

Create a Table from a dict of arrays (similar to pandas).

from_template(data, template, **kwargs)

Create an array from a dict of arrays with a predefined dtype.

sorted(by)

Sort the table by one of its columns.

append_columns(colnames, values)

Append new columns to the table.

to_dataframe()

Return as pandas dataframe.

from_dataframe(df, **kwargs)

Instantiate from a dataframe.

from_rows(list_of_rows, **kwargs)

Instantiate from an array-like with shape (n_rows, n_columns).

from_columns(list_of_columns, **kwargs)

Instantiate from an array-like with shape (n_columns, n_rows).

property templates_avail
property pos
property dir
property phi
property theta
property zenith
property azimuth
property triggered_rows
classmethod from_dict(arr_dict, dtype=None, fillna=False, **kwargs)

Generate a table from a dictionary of arrays.
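As a rough illustration of the columnar layout this factory expects, the sketch below builds a plain `np.recarray` from a dict of equally long arrays. It uses only NumPy (`np.rec.fromarrays`), not `Table.from_dict` itself, and the column names `dom_id` and `time` are made up for the example.

```python
import numpy as np

# One array per column; all columns must have the same length.
columns = {
    "dom_id": np.array([1, 2, 3]),
    "time": np.array([10.0, 12.5, 11.0]),
}

# Field names come from the dict keys, field dtypes from the arrays.
records = np.rec.fromarrays(list(columns.values()), names=list(columns.keys()))

# Columns are reachable by key or by attribute, as with any recarray.
first_time = records["time"][0]
all_doms = records.dom_id
```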

classmethod from_columns(column_list, dtype=None, colnames=None, **kwargs)
classmethod from_rows(row_list, dtype=None, colnames=None, **kwargs)
classmethod from_template(data, template)
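The two factories differ only in the orientation of their input: rows have shape (n_rows, n_columns), columns have shape (n_columns, n_rows). The NumPy-only sketch below shows that the same record array results from either orientation once the row input is transposed. It does not call the km3pipe factories, and the column names are hypothetical.

```python
import numpy as np

colnames = ["a", "b"]

rows = np.array([[1, 10], [2, 20], [3, 30]])   # shape (n_rows, n_columns)
cols = np.array([[1, 2, 3], [10, 20, 30]])     # shape (n_columns, n_rows)

# Column-oriented input maps directly onto the fields.
from_cols = np.rec.fromarrays(list(cols), names=colnames)

# Row-oriented input is transposed first, so each field gets one column.
from_rows = np.rec.fromarrays(list(rows.T), names=colnames)
```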

Create a table from a predefined datatype.

See the templates_avail property for available names.

Parameters:
data

Data in a format that the __init__ understands.

template: str or dict

Name of the dtype template to use from kp.dataclasses_templates or a dict containing the required attributes (see the other templates for reference).

append_columns(colnames, values, **kwargs)

Append new columns to the table.

When appending a single column, values can be a scalar or an array of either length 1 or the same length as this table (the one it is appended to). When appending multiple columns, values must be a list of arrays, one per column, and the length of each array has to match the length of this table.

See the docs for numpy.lib.recfunctions.append_fields for an explanation of the remaining options.
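Since the remaining options are those of `numpy.lib.recfunctions.append_fields`, a short sketch of that function may help. It operates on a plain recarray rather than a `Table`, the column names are invented, and the scalar is expanded by hand with `np.full` because `append_fields` expects one full-length array per new field.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

base = np.rec.fromarrays(
    [[1, 2, 3], [0.5, 1.5, 2.5]], names=["dom_id", "time"]
)

# Single column: expand a scalar to the length of the array first.
tot = np.full(len(base), 7)
one_more = rfn.append_fields(base, "tot", tot, usemask=False, asrecarray=True)

# Multiple columns: a list of names and a matching list of arrays.
flags = np.array([True, False, True])
two_more = rfn.append_fields(
    base, ["tot", "flag"], [tot, flags], usemask=False, asrecarray=True
)
```

Passing `usemask=False` returns a regular array instead of a masked one, and `asrecarray=True` keeps attribute access to the columns.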

drop_columns(colnames, **kwargs)

Drop columns from the table.

See the docs for numpy.lib.recfunctions.drop_fields for an explanation of the remaining options.
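For reference, this is how `numpy.lib.recfunctions.drop_fields` behaves on a plain recarray. The sketch does not go through `Table.drop_columns`, and the column names are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

table = np.rec.fromarrays(
    [[1, 2], [0.5, 1.5], [7, 8]], names=["dom_id", "time", "tot"]
)

# Returns a new array without the listed fields; the input is not modified.
slim = rfn.drop_fields(table, ["tot"], usemask=False, asrecarray=True)
```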

sorted(by, **kwargs)

Sort array by a column.

Parameters:
by: str

Name of the column to sort by (e.g. ‘time’).
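Sorting a record array by one column can be sketched in plain NumPy as an argsort on that column followed by fancy indexing. Whether `Table.sorted` works this way internally is an assumption; the example only shows the expected effect on a recarray with made-up columns.

```python
import numpy as np

hits = np.rec.fromarrays(
    [[3, 1, 2], [30.0, 10.0, 20.0]], names=["dom_id", "time"]
)

# Indices that would sort the 'time' column; a stable sort keeps ties in order.
order = np.argsort(hits["time"], kind="stable")

# Fancy indexing returns a new, reordered array and leaves 'hits' untouched.
sorted_hits = hits[order]
```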

to_dataframe()
classmethod from_dataframe(df, **kwargs)
classmethod merge(tables, fillna=False)

Merge a list of tables.