.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/plot_basic_analysis.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_basic_analysis.py: ====================== Basic Analysis Example ====================== .. GENERATED FROM PYTHON SOURCE LINES 9-18 .. code-block:: Python # Authors: Tamás Gál , Moritz Lotze # License: BSD-3 # Date: 2017-10-10 # Status: Under construction... # # sphinx_gallery_thumbnail_number = 5 .. GENERATED FROM PYTHON SOURCE LINES 19-23 Preparation ----------- The very first thing we do is importing our libraries and setting up the Jupyter Notebook environment. .. GENERATED FROM PYTHON SOURCE LINES 23-31 .. code-block:: Python import matplotlib.pyplot as plt # our plotting module import pandas as pd # the main HDF5 reader import numpy as np # must have import km3pipe as kp # some KM3NeT related helper functions import seaborn as sns # beautiful statistical plots! from km3net_testdata import data_path .. GENERATED FROM PYTHON SOURCE LINES 32-33 this is just to make our plots a bit "nicer", you can skip it .. GENERATED FROM PYTHON SOURCE LINES 33-37 .. code-block:: Python import km3pipe.style km3pipe.style.use("km3pipe") .. GENERATED FROM PYTHON SOURCE LINES 38-47 Accessing the Data File(s) -------------------------- In the following, we will work with one random simulation file with reconstruction information from JGandalf which has been converted from ROOT to HDF5 using the ``h5extract`` command line tool provided by ``KM3Pipe``. You can find the documentation here: https://km3py.pages.km3net.de/km3pipe/cmd.html#h5extract .. GENERATED FROM PYTHON SOURCE LINES 49-53 Note for Lyon Users ~~~~~~~~~~~~~~~~~~~ If you are working on the Lyon cluster, you just need to load the Python module with ``module load python`` and you are all set. .. GENERATED FROM PYTHON SOURCE LINES 55-69 Converting from ROOT to HDF5 (if needed) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Choose a file (take e.g. one from /in2p3/km3net/mc/...), load the appropriate Jpp/Aanet version and convert it via:: h5extract /path/to/a/reconstructed/file.root You can toggle a few options to include or exclude specific information. By default, everything will be extracted but you might want to skip Example the hit information. Have a look at ``h5extract -h``. You might also just pick some of the already converted files from HPSS/iRODS! .. GENERATED FROM PYTHON SOURCE LINES 71-73 First Look at the Data ---------------------- .. GENERATED FROM PYTHON SOURCE LINES 73-76 .. code-block:: Python filepath = data_path("hdf5/basic_analysis_sample.h5") .. GENERATED FROM PYTHON SOURCE LINES 77-85 We can have a quick look at the file with the ``ptdump`` command in the terminal:: ptdump filename.h5 For further information, check out the documentation of the KM3NeT HDF5 format definition: http://km3pipe.readthedocs.io/en/latest/hdf5.html .. GENERATED FROM PYTHON SOURCE LINES 87-91 The ``/event_info`` table contains general information about each event. The data is a simple 2D table and each event is represented by a single row. Let's have a look at the first few rows: .. GENERATED FROM PYTHON SOURCE LINES 91-94 .. code-block:: Python event_info = pd.read_hdf(filepath, "/event_info") print(event_info.head(5)) .. rst-class:: sphx-glr-script-out .. code-block:: none det_id frame_index livetime_sec ... weight_w3 run_id event_id 0 -1 5 0 ... 0.07448 1 0 1 -1 8 0 ... 0.13710 1 1 2 -1 13 0 ... 0.11890 1 2 3 -1 15 0 ... 0.29150 1 3 4 -1 18 0 ... 0.10220 1 4 [5 rows x 17 columns] .. GENERATED FROM PYTHON SOURCE LINES 95-97 You can easily inspect the columns/fields of a ``Pandas.Dataframe`` with the ``.dtypes`` attribute: .. GENERATED FROM PYTHON SOURCE LINES 97-100 .. code-block:: Python print(event_info.dtypes) .. rst-class:: sphx-glr-script-out .. code-block:: none det_id int32 frame_index uint32 livetime_sec uint64 mc_id int32 mc_t float64 n_events_gen uint64 n_files_gen uint64 overlays uint32 trigger_counter uint64 trigger_mask uint64 utc_nanoseconds uint64 utc_seconds uint64 weight_w1 float64 weight_w2 float64 weight_w3 float64 run_id uint64 event_id uint32 dtype: object .. GENERATED FROM PYTHON SOURCE LINES 101-104 And access the data either by the property syntax (if it's a valid Python identifier) or the dictionary syntax, for example to access the neutrino weights: .. GENERATED FROM PYTHON SOURCE LINES 104-109 .. code-block:: Python print(event_info.weight_w2) # property syntax print(event_info["weight_w2"]) # dictionary syntax .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1.396000e+09 1 8.907000e+09 2 5.709000e+09 3 8.747000e+10 4 3.571000e+09 ... 3479 7.968000e+11 3480 1.736000e+09 3481 4.861000e+09 3482 7.043000e+10 3483 7.744000e+12 Name: weight_w2, Length: 3484, dtype: float64 0 1.396000e+09 1 8.907000e+09 2 5.709000e+09 3 8.747000e+10 4 3.571000e+09 ... 3479 7.968000e+11 3480 1.736000e+09 3481 4.861000e+09 3482 7.043000e+10 3483 7.744000e+12 Name: weight_w2, Length: 3484, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 110-111 Next, we will read out the MC tracks which are stored under ``/mc_tracks``. .. GENERATED FROM PYTHON SOURCE LINES 111-114 .. code-block:: Python tracks = pd.read_hdf(filepath, "/mc_tracks") .. GENERATED FROM PYTHON SOURCE LINES 115-117 It has a similar structure, but now you can have multiple rows which belong to an event. The ``event_id`` column holds the ID of the corresponding event. .. GENERATED FROM PYTHON SOURCE LINES 117-120 .. code-block:: Python print(tracks.head(10)) .. rst-class:: sphx-glr-script-out .. code-block:: none bjorkeny dir_x dir_y dir_z ... pos_z time type event_id 0 0.057346 -0.616448 -0.781017 -0.100017 ... -71.802 0 -14 0 1 0.000000 0.488756 -0.535017 -0.689111 ... -71.802 0 22 0 2 0.000000 -0.656758 -0.746625 -0.105925 ... -71.802 0 -13 0 3 0.000000 0.412029 -0.878991 -0.240015 ... -71.802 0 2112 0 4 0.000000 -0.664951 -0.468928 0.581332 ... -71.802 0 -211 0 5 0.437484 0.113983 0.914457 0.388298 ... 30.360 0 -14 1 6 0.000000 -0.345462 0.923065 -0.169138 ... 30.360 0 22 1 7 0.000000 0.381285 0.828365 0.410406 ... 30.360 0 -13 1 8 0.000000 -0.191181 0.907296 0.374518 ... 30.360 0 3122 1 9 0.000000 -0.244006 0.922082 0.300377 ... 30.360 0 321 1 [10 rows x 15 columns] .. GENERATED FROM PYTHON SOURCE LINES 121-124 We now are accessing the first track for each event by grouping via ``event_id`` and calling the ``first()`` method of the ``Pandas.DataFrame`` object. .. GENERATED FROM PYTHON SOURCE LINES 124-127 .. code-block:: Python primaries = tracks.groupby("event_id").first() .. GENERATED FROM PYTHON SOURCE LINES 128-129 Here are the first 5 primaries: .. GENERATED FROM PYTHON SOURCE LINES 129-131 .. code-block:: Python print(primaries.head(5)) .. rst-class:: sphx-glr-script-out .. code-block:: none bjorkeny dir_x dir_y dir_z ... pos_y pos_z time type event_id ... 0 0.057346 -0.616448 -0.781017 -0.100017 ... 67.589 -71.802 0 -14 1 0.437484 0.113983 0.914457 0.388298 ... -109.844 30.360 0 -14 2 0.549859 -0.186416 -0.385939 -0.903493 ... 101.459 -30.985 0 14 3 0.056390 -0.371672 0.550002 -0.747902 ... 15.056 24.474 0 -14 4 0.049141 -0.124809 -0.979083 0.160683 ... 88.981 -65.848 0 14 [5 rows x 14 columns] .. GENERATED FROM PYTHON SOURCE LINES 132-134 Creating some Fancy Graphs -------------------------- .. GENERATED FROM PYTHON SOURCE LINES 137-142 .. code-block:: Python plt.hist(primaries.energy, bins=100, log=True) plt.xlabel("energy [GeV]") plt.ylabel("number of events") plt.title("Energy Distribution") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_001.png :alt: Energy Distribution :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'Energy Distribution') .. GENERATED FROM PYTHON SOURCE LINES 144-149 .. code-block:: Python primaries.bjorkeny.hist(bins=100) plt.xlabel("bjorken-y") plt.ylabel("number of events") plt.title("bjorken-y Distribution") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_002.png :alt: bjorken-y Distribution :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'bjorken-y Distribution') .. GENERATED FROM PYTHON SOURCE LINES 151-159 .. code-block:: Python zeniths = kp.math.zenith(primaries.filter(regex="^dir_.?$")) primaries["zenith"] = zeniths plt.hist(np.cos(primaries.zenith), bins=21, histtype="step", linewidth=2) plt.xlabel(r"cos($\theta$)") plt.ylabel("number of events") plt.title("Zenith Distribution") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_003.png :alt: Zenith Distribution :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_003.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'Zenith Distribution') .. GENERATED FROM PYTHON SOURCE LINES 160-162 Starting positions of primaries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 163-169 .. code-block:: Python plt.hist2d(primaries.pos_x, primaries.pos_y, bins=100, cmap="viridis") plt.xlabel("x [m]") plt.ylabel("y [m]") plt.title("2D Plane") plt.colorbar() .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_004.png :alt: 2D Plane :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_004.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 170-172 If you have seaborn installed (`pip install seaborn`), you can easily create nice jointplots: .. GENERATED FROM PYTHON SOURCE LINES 173-187 .. code-block:: Python try: import seaborn as sns # noqa km3pipe.style.use("km3pipe") # reset matplotlib style except: print("No seaborn found, skipping example.") else: g = sns.jointplot(x="pos_x", y="pos_y", data=primaries, kind="hex") g.set_axis_labels("x [m]", "y[m]") plt.subplots_adjust(right=0.90) # make room for the colorbar plt.title("2D Plane") plt.colorbar() plt.legend() .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_005.png :alt: 2D Plane :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_005.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument. .. GENERATED FROM PYTHON SOURCE LINES 189-199 .. code-block:: Python from mpl_toolkits.mplot3d import Axes3D # noqa fig = plt.figure() ax = fig.add_subplot(111, projection="3d") ax.scatter3D(primaries.pos_x, primaries.pos_y, primaries.pos_z, s=3) ax.set_xlabel("x [m]", labelpad=10) ax.set_ylabel("y [m]", labelpad=10) ax.set_zlabel("z [m]", labelpad=10) ax.set_title("3D Plane") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_006.png :alt: 3D Plane :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_006.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, '3D Plane') .. GENERATED FROM PYTHON SOURCE LINES 201-204 .. code-block:: Python gandalfs = pd.read_hdf(filepath, "/reco/gandalf") print(gandalfs.head(5)) .. rst-class:: sphx-glr-script-out .. code-block:: none beta0 beta1 chi2 ... type upgoing_vs_downgoing event_id 0 0.016788 0.011857 -53.119816 ... 0.0 -0.274836 0 1 0.007835 0.005533 -32.504874 ... 0.0 3.907941 1 2 0.012057 0.008456 -81.195134 ... 0.0 -0.385038 2 3 0.007858 0.005554 -200.985734 ... 0.0 -0.809872 3 4 0.011166 0.007366 -89.451264 ... 0.0 -0.167897 4 [5 rows x 83 columns] .. GENERATED FROM PYTHON SOURCE LINES 206-208 .. code-block:: Python gandalfs.columns .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['beta0', 'beta1', 'chi2', 'dir_x', 'dir_y', 'dir_z', 'jenergy_chi2', 'jenergy_energy', 'jstart_length', 'jstart_npe_mip', 'jstart_npe_mip_total', 'lambda', 'n_hits', 'n_iter', 'pos_x', 'pos_y', 'pos_z', 'rec_stage', 'rec_type', 'spread_beta0_iqr', 'spread_beta0_mad', 'spread_beta0_mean', 'spread_beta0_median', 'spread_beta0_std', 'spread_beta1_iqr', 'spread_beta1_mad', 'spread_beta1_mean', 'spread_beta1_median', 'spread_beta1_std', 'spread_chi2_iqr', 'spread_chi2_mad', 'spread_chi2_mean', 'spread_chi2_median', 'spread_chi2_std', 'spread_dir_x_iqr', 'spread_dir_x_mad', 'spread_dir_x_mean', 'spread_dir_x_median', 'spread_dir_x_std', 'spread_dir_y_iqr', 'spread_dir_y_mad', 'spread_dir_y_mean', 'spread_dir_y_median', 'spread_dir_y_std', 'spread_dir_z_iqr', 'spread_dir_z_mad', 'spread_dir_z_mean', 'spread_dir_z_median', 'spread_dir_z_std', 'spread_jenergy_chi2_iqr', 'spread_jenergy_chi2_mad', 'spread_jenergy_chi2_mean', 'spread_jenergy_chi2_median', 'spread_jenergy_chi2_std', 'spread_jenergy_energy_iqr', 'spread_jenergy_energy_mad', 'spread_jenergy_energy_mean', 'spread_jenergy_energy_median', 'spread_jenergy_energy_std', 'spread_n_hits_iqr', 'spread_n_hits_mad', 'spread_n_hits_mean', 'spread_n_hits_median', 'spread_n_hits_std', 'spread_pos_x_iqr', 'spread_pos_x_mad', 'spread_pos_x_mean', 'spread_pos_x_median', 'spread_pos_x_std', 'spread_pos_y_iqr', 'spread_pos_y_mad', 'spread_pos_y_mean', 'spread_pos_y_median', 'spread_pos_y_std', 'spread_pos_z_iqr', 'spread_pos_z_mad', 'spread_pos_z_mean', 'spread_pos_z_median', 'spread_pos_z_std', 'time', 'type', 'upgoing_vs_downgoing', 'event_id'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 209-213 plt.hist(gandalfs['lambda'], bins=50, log=True) plt.xlabel('lambda parameter') plt.ylabel('count') plt.title('Lambda Distribution of Reconstructed Events') .. GENERATED FROM PYTHON SOURCE LINES 217-224 .. code-block:: Python gandalfs["zenith"] = kp.math.zenith(gandalfs.filter(regex="^dir_.?$")) plt.hist((gandalfs.zenith - primaries.zenith).dropna(), bins=100) plt.xlabel(r"true zenith - reconstructed zenith [rad]") plt.ylabel("count") plt.title("Zenith Reconstruction Difference") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_007.png :alt: Zenith Reconstruction Difference :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_007.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none /builds/km3py/km3pipe/src/km3pipe/math.py:268: RuntimeWarning: invalid value encountered in divide unit = vector / np.linalg.norm(vector, axis=1, **kwargs)[:, None] Text(0.5, 1.0, 'Zenith Reconstruction Difference') .. GENERATED FROM PYTHON SOURCE LINES 226-233 .. code-block:: Python l = 0.2 lambda_cut = gandalfs["lambda"] < l plt.hist((gandalfs.zenith - primaries.zenith)[lambda_cut].dropna(), bins=100) plt.xlabel(r"true zenith - reconstructed zenith [rad]") plt.ylabel("count") plt.title("Zenith Reconstruction Difference for lambda < {}".format(l)) .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_008.png :alt: Zenith Reconstruction Difference for lambda < 0.2 :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_008.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'Zenith Reconstruction Difference for lambda < 0.2') .. GENERATED FROM PYTHON SOURCE LINES 234-236 Combined zenith reco plot for different lambda cuts ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 236-251 .. code-block:: Python fig, ax = plt.subplots() for l in [100, 5, 2, 1, 0.1]: l_cut = gandalfs["lambda"] < l ax.hist( (primaries.zenith - gandalfs.zenith)[l_cut].dropna(), bins=100, label=r"$\lambda$ = {}".format(l), alpha=0.7, ) plt.xlabel(r"true zenith - reconstructed zenith [rad]") plt.ylabel("count") plt.legend() plt.title("Zenith Reconstruction Difference for some Lambda Cuts") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_009.png :alt: Zenith Reconstruction Difference for some Lambda Cuts :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_009.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'Zenith Reconstruction Difference for some Lambda Cuts') .. GENERATED FROM PYTHON SOURCE LINES 252-261 Fitting Angular resolutions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's fit some distributions: gaussian + lorentz (aka norm + cauchy) Fitting the gaussian to the whole range is a very bad fit, so we make a second gaussian fit only to +- 10 degree. Conversely, the Cauchy (lorentz) distribution is a near perfect fit (note that ``2 gamma = FWHM``). .. GENERATED FROM PYTHON SOURCE LINES 261-305 .. code-block:: Python from scipy.stats import cauchy, norm # noqa residuals = gandalfs.zenith - primaries.zenith cut = (gandalfs["lambda"] < l) & (np.abs(residuals) < 2 * np.pi) residuals = residuals[cut] event_info[cut] # convert rad -> deg residuals = residuals * 180 / np.pi pi = 180 # x axis for plotting x = np.linspace(-pi, pi, 1000) c_loc, c_gamma = cauchy.fit(residuals) fwhm = 2 * c_gamma g_mu_bad, g_sigma_bad = norm.fit(residuals) g_mu, g_sigma = norm.fit(residuals[np.abs(residuals) < 10]) plt.hist(residuals, bins="auto", label="Histogram", density=True, alpha=0.7) plt.plot( x, cauchy(c_loc, c_gamma).pdf(x), label="Lorentz: FWHM $=${:.3f}".format(fwhm), linewidth=2, ) plt.plot( x, norm(g_mu_bad, g_sigma_bad).pdf(x), label="Unrestricted Gauss: $\sigma =$ {:.3f}".format(g_sigma_bad), linewidth=2, ) plt.plot( x, norm(g_mu, g_sigma).pdf(x), label="+- 10 deg Gauss: $\sigma =$ {:.3f}".format(g_sigma), linewidth=2, ) plt.xlim(-pi / 4, pi / 4) plt.xlabel("Zenith residuals / deg") plt.legend() .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_010.png :alt: plot basic analysis :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_010.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 306-309 We can also look at the median resolution without doing any fits. In textbooks, this metric is also called Median Absolute Deviation. .. GENERATED FROM PYTHON SOURCE LINES 309-320 .. code-block:: Python resid_median = np.median(residuals) residuals_shifted_by_median = residuals - resid_median absolute_deviation = np.abs(residuals_shifted_by_median) resid_mad = np.median(absolute_deviation) plt.hist(np.abs(residuals), alpha=0.7, bins="auto", label="Absolute residuals") plt.axvline(resid_mad, label="MAD: {:.2f}".format(resid_mad), linewidth=3) plt.title("Average resolution: {:.3f} degree".format(resid_mad)) plt.legend() plt.xlabel("Absolute zenith residuals / deg") .. image-sg:: /auto_examples/images/sphx_glr_plot_basic_analysis_011.png :alt: Average resolution: 3.963 degree :srcset: /auto_examples/images/sphx_glr_plot_basic_analysis_011.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 4.833333333333329, 'Absolute zenith residuals / deg') .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 2.518 seconds) .. _sphx_glr_download_auto_examples_plot_basic_analysis.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_basic_analysis.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_basic_analysis.py ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_