# Investigate content of QA pawianHists.root file
 
```{autolink-concat}
```

This notebook shows how to investigate the content of a `pawianHists.root` file, which is the output of the QA step in Pawian. We make use of the {mod}`pawian.qa` module of Pawian Tools.

## Draw contained histograms

A `pawianHists.root` file contains histograms of several PWA distributions: one for data, one for the fit, and one for Monte Carlo. The {class}`.PawianHists` class offers a few methods to quickly plot these contained histograms.

First, open a `pawianHists.root` file as a {obj}`.PawianHists` object. In this example, we use the ROOT file that is provided in the tests of the {mod}`pawian.qa` module.

```python
from os.path import dirname, realpath

import pawian
from pawian.qa import PawianHists

sample_dir = f"{dirname(realpath(pawian.__file__))}/samples"
filename = f"{sample_dir}/pawianHists_ROOT6_DDpi.root"
pawian_hists = PawianHists(filename)
```

Now it is quite easy to see which histograms are contained in the histogram file.

```python
from pprint import pprint

histogram_names = pawian_hists.histogram_names
print(f"Number of histograms: {len(histogram_names)}\n")
pprint(histogram_names)
```

As you can see, the names are prefixed with `Fit`, `Data`, or `MC`. The {attr}`.unique_histogram_names` property lists the distinct histogram types, without these prefixes.

```python
unique_names = pawian_hists.unique_histogram_names
print(f"Number of different histogram types: {len(unique_names)}\n")
pprint(unique_names)
```
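
To illustrate the idea behind grouping by prefix, here is a minimal sketch that strips a `Data`, `Fit`, or `MC` prefix from each name and keeps the distinct remainders. The helpers `strip_prefix` and `unique_types` are hypothetical and not how {mod}`pawian.qa` necessarily implements {attr}`.unique_histogram_names`; the histogram type `Example` is made up.

```python
from typing import Iterable, List

# Prefixes inferred from names such as "DatapipDm" and "FitpipDm" (assumption)
PREFIXES = ("Data", "Fit", "MC")


def strip_prefix(name: str) -> str:
    """Return the histogram type of a name, e.g. 'DatapipDm' -> 'pipDm'."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name  # no known prefix: keep the name unchanged


def unique_types(names: Iterable[str]) -> List[str]:
    """Collect the distinct histogram types, keeping first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(strip_prefix(name) for name in names))


# Illustrative names; "Example" is a hypothetical histogram type
names = ["DatapipDm", "FitpipDm", "MCpipDm", "DataExample", "FitExample"]
print(unique_types(names))  # ['pipDm', 'Example']
```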

Now, let's have a quick look at one of these histograms:

```python
import matplotlib.pyplot as plt

pawian_hists.draw_histogram("DatapipDm")
plt.show()
```

This is quite ugly and, frankly speaking, not very interesting, because what we actually want is to **assess the quality** (QA) of our fit. Of course, you could plot the histogram of the fit in the same figure, but that still needs some polishing to look nice. And you have to take care to use the correct names...

```python
pawian_hists.draw_histogram("DatapipDm")
pawian_hists.draw_histogram("FitpipDm")
plt.show()
```

There is an easier way. For a comparison plot, use the {meth}`.draw_combined_histogram` method. It works in the same way as {meth}`.draw_histogram`, but it takes one of the {attr}`.unique_histogram_names` (such as `"pipDm"`) instead of a full histogram name.

Note that this method also accepts keyword arguments of {func}`matplotlib.pyplot.hist`, such as `density=True` to normalize the histograms. Each histogram is drawn with a label, so that we can generate a legend as well.
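
To make the effect of `density=True` concrete, here is a minimal standard-library sketch of the normalization that {func}`matplotlib.pyplot.hist` applies: each bin count is divided by the total number of entries times the bin width, so that the histogram encloses an area of one. This is what makes data and fit histograms with different numbers of entries comparable. The `histogram` helper and the sample values are illustrative only.

```python
def histogram(values, bins, low, high, density=False):
    """Bin values into equal-width bins on [low, high]."""
    width = (high - low) / bins
    counts = [0.0] * bins
    for value in values:
        if low <= value <= high:
            # min() keeps value == high (and round-off) in the last bin
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    if density:
        # Divide by (number of entries * bin width) so the area equals 1
        total = sum(counts)
        counts = [count / (total * width) for count in counts]
    return counts, width


values = [0.1, 0.2, 0.6, 0.7, 0.8]  # made-up sample
raw, width = histogram(values, bins=2, low=0.0, high=1.0)
normalized, _ = histogram(values, bins=2, low=0.0, high=1.0, density=True)
print(raw)  # [2.0, 3.0]
print(normalized)  # [0.8, 1.2]
print(round(sum(c * width for c in normalized), 12))  # 1.0
```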

Another thing to note: this time we draw the histogram on an {obj}`~matplotlib.axes.Axes` object (created with {meth}`~matplotlib.figure.Figure.add_subplot`) instead of through the default {mod}`~matplotlib.pyplot` interface, so that we can modify the figure afterwards.

We also applied a little trick here: we used {func}`~pawian.latex.convert` from the {mod}`pawian.latex` module to convert the histogram name to a LaTeX string.

```python
from pawian.latex import convert

histogram_name = "pipDm"
ax = plt.figure(tight_layout=True, figsize=(6, 4), dpi=120).add_subplot()
pawian_hists.draw_combined_histogram(
    histogram_name, plot_on=ax, density=True, alpha=0.5
)
ax.set_ylim(bottom=0)
ax.legend()
# Closing parenthesis inside the math environment
ax.set_xlabel(f"$M({convert(histogram_name)})$")
ax.set_ylabel("normalized counts")  # density=True normalizes the histograms
plt.show()
```
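
The implementation of {func}`~pawian.latex.convert` is not shown in this notebook. As a rough idea of how such a conversion can work, the sketch below replaces particle tokens in a histogram name with LaTeX using a regular expression. The `to_latex` function and its token table are hypothetical, not Pawian's code, and the real function may recognize other names or produce different LaTeX.

```python
import re

# Hypothetical token table; pawian.latex.convert may differ
TOKENS = {
    "pip": r"\pi^+",
    "pim": r"\pi^-",
    "Dp": "D^+",
    "Dm": "D^-",
}
# Longest tokens first, so a longer name wins over a shorter one that is its prefix
PATTERN = re.compile(
    "|".join(re.escape(token) for token in sorted(TOKENS, key=len, reverse=True))
)


def to_latex(name):
    """Replace every known particle token in a histogram name by LaTeX."""
    return PATTERN.sub(lambda match: TOKENS[match.group(0)], name)


label = f"$M({to_latex('pipDm')})$"
print(label)  # $M(\pi^+D^-)$
```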

Finally, you may want to generate an overview of all histograms right away. This can be done with the {meth}`.draw_all_histograms` method. It takes some time to draw them all, but it's worth it!

By the way, notice how the Monte Carlo samples have been hidden by using `mc=False`. You can do the same with `data=False` and/or `fit=False`.

This method again takes arguments from {func}`matplotlib.pyplot.hist`, so feel free to play around and modify the figures.

```python
fig = plt.figure(tight_layout=True, figsize=(15, 14))
pawian_hists.draw_all_histograms(
    mc=False, plot_on=fig, legend="upper right", alpha=0.5, density=True
)
plt.show()
```

## Plot vector distributions

A `pawianHists.root` file also contains trees with the original Lorentz vectors of the data and of the phase-space sample. The weights of the phase-space sample represent the intensity of the fit result, so you can use them to draw the fit distribution.

The {class}`.PawianHists` class wraps its Lorentz vectors in a {class}`~pandas.DataFrame`, and you can access its members with the DataFrame accessors provided by the {mod}`pawian.data` module.

```python
print(pawian_hists.data.pwa.mass.mean())
print("\nAverage data weight:", pawian_hists.data.pwa.weights.mean())
print("Number of data events:", len(pawian_hists.data))
print("Number of MC events:", len(pawian_hists.fit))
```
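
The `pwa.mass` accessor is assumed here to return the invariant mass of each Lorentz vector. Assuming the vectors are stored as `(px, py, pz, E)` in natural units (c = 1), the per-event calculation reduces to `m = sqrt(E^2 - |p|^2)`. The sketch below uses a hypothetical `invariant_mass` helper, which is not part of {mod}`pawian.data`.

```python
import math


def invariant_mass(px, py, pz, energy):
    """Invariant mass m = sqrt(E^2 - |p|^2) in natural units (c = 1)."""
    mass_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Clip small negative values caused by floating-point round-off
    return math.sqrt(max(mass_squared, 0.0))


print(invariant_mass(0.0, 0.0, 3.0, 5.0))  # 4.0
```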

What we are mainly interested in is comparing the fit distributions with those of the data. This is the same as with the histograms above, but now we can choose an arbitrary binning using the functionality offered by {class}`pandas.DataFrame`.

```python
data = pawian_hists.data
fit = pawian_hists.fit
# Sum the pi+ and D- four-momenta to obtain the pi+ D- system
piDm_data = data["pi+"] + data["D-"]
piDm_fit = fit["pi+"] + fit["D-"]

plot_options = {"bins": 150, "density": True, "alpha": 0.5}
# Weight each event by its intensity; density=True normalizes both histograms
piDm_data.pwa.mass.hist(**plot_options, weights=data.pwa.intensities)
piDm_fit.pwa.mass.hist(**plot_options, weights=fit.pwa.intensities);
```
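
In the cell above, adding `data["pi+"]` and `data["D-"]` is assumed to sum the four-momenta of the two particles event by event, so that the mass of the sum is the two-particle invariant mass. Passing the intensities as `weights` lets each phase-space event contribute its fit intensity, and `density=True` normalizes the weighted histogram to unit area. The following standard-library sketch mimics these steps on made-up numbers; `add`, `mass`, and `weighted_density` are hypothetical helpers, not Pawian functions.

```python
import math


def add(p1, p2):
    """Add two four-vectors (px, py, pz, E) component-wise."""
    return tuple(a + b for a, b in zip(p1, p2))


def mass(p):
    """Invariant mass sqrt(E^2 - |p|^2) of a four-vector (px, py, pz, E)."""
    px, py, pz, energy = p
    return math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))


def weighted_density(values, weights, bins, low, high):
    """Weighted histogram on [low, high], normalized to unit area."""
    width = (high - low) / bins
    sums = [0.0] * bins
    for value, weight in zip(values, weights):
        if low <= value <= high:
            index = min(int((value - low) / width), bins - 1)
            sums[index] += weight  # each event contributes its weight
    total = sum(sums)
    return [s / (total * width) for s in sums]


# Two made-up events, each with two four-vectors (not physical values)
events = [
    ((0.0, 0.0, 3.0, 5.0), (0.0, 0.0, -3.0, 5.0)),
    ((3.0, 0.0, 0.0, 5.0), (0.0, 0.0, 0.0, 1.0)),
]
weights = [0.5, 1.5]  # stand-ins for per-event fit intensities
masses = [mass(add(p1, p2)) for p1, p2 in events]
print([round(m, 3) for m in masses])  # [10.0, 5.196]
print(weighted_density(masses, weights, bins=2, low=4.0, high=12.0))  # [0.1875, 0.0625]
```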