HistogramData is a new Mantid module dealing with histograms. This is not a replacement for Histogram1D or ISpectrum, but rather deals with the histogram part of the data stored in those classes. This document is intended for Mantid developers. It informs about the transition to the new way of dealing with histograms, and explains the new interfaces.
Histogram data in Mantid is stored in a number of different ways that reflect the way it was produced, or the way algorithms deal with the data:
The new histogram type described below is designed to follow the needs of the current way Mantid deals with histograms (the following description is mainly for x-data, but the same applies to y-data):
Mantid developers are all familiar with these two facts:
Algorithms that “do not care” access the data and simply work with it. All other algorithms attempt to determine whether the x-data corresponds bin edges or points by comparing its length to that of the y-data (or alternatively use MatrixWorkspace::isHistogramData()). Based on the result they may convert, e.g., bin edges to points.
The copy-on-write mechanism is reflected in the interface of ISpectrum and MatrixWorkspace:
1 2 3 4 5 6 7 8 9 10 11 | using MantidVec = std::vector<double>;
using MantidVecPtr = Kernel::cow_ptr<std::vector<double>>;
// Current ISpectrum interface:
void setX(const MantidVec &X);
void setX(const MantidVecPtr &X);
void setX(const MantidVecPtr::ptr_type &X);
MantidVec &dataX();
const MantidVec &dataX() const;
const MantidVec &readX() const;
MantidVecPtr ptrX() const; // renamed to refX() in MatrixWorkspace
|
The shortcomings of the current implementation are mostly obvious and can also be found in the design document.
In its final form, we will be able to do things like the following (things not implemented yet are marked with an asterisk (*)):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | BinEdges edges{1.0, 2.0, 4.0, 8.0};
Counts counts1{4, 100, 4};
Counts counts2{0, 100, 0};
Histogram histogram1(edges, counts1);
Histogram histogram2(edges, counts2);
// x-data in histogram1 and histogram2 is shared
// Uncertainties are auto-generated, unless specified explicitly
auto errors = histogram1.countStandardDeviations();
errors[0]; // 2.0
errors[1]; // 10.0
errors[2]; // 2.0
// Arithmetics with histograms (*)
histogram1 += histogram2; // Checks size, throws if mismatched!
auto counts = histogram1.counts();
counts[0]; // 4.0
counts[1]; // 200.0
counts[2]; // 4.0
// Deals with errors as well!
errors = histogram1.countStandardDeviations();
errors[0]; // 2.0
errors[1]; // sqrt(200.0)
errors[2]; // 2.0
// Need bin centers (point data) instead of bin edges?
auto points = histogram.points();
// Need variance instead of standard deviation?
auto variances = histogram.countVariances();
// Need frequencies (distribution data) instead of counts?
auto frequencies = histogram.frequencies();
auto variances = histogram.frequencyVariances();
// Type-safe operations
CountStandardDeviations sigmas{0.1, 0.1};
histogram.setCountVariances(sigmas); // Ok, squares internally
sigmas += CountVariances{0.01, 0.01}; // Ok, takes sqrt before adding (*)
sigmas[0]; // 0.2
sigmas[1]; // 0.2
|
Further planned features:
Any feedback on additional capabilities of the new data types is highly appreciated. I will happily consider adding more features as long as they fit the overall design. Please get in contact with me!
Basic changes have been merged (soon after the 3.7 release). We will then work on reducing the use of the old interface (readX(), dataX(), readY(), ...) as much as possible. After that, more features will follow.
We also want to expose most parts of the HistogramData module to Python, but no schedule has been decided yet. Parts of the old interface will be kept alive for now, in particular to maintain support for the old Python interface.
MatrixWorkspace thus has a number of new public methods (details follow below):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | class MatrixWorkspace {
public:
// Note return by value (see below)
Histogram histogram(const size_t index) const;
template <typename... T> void setHistogram(const size_t index, T &&... data) &;
// Note return by value (see below)
BinEdges binEdges(const size_t index) const;
Points points(const size_t index) const;
template <typename... T> void setBinEdges(const size_t index, T &&... data) &;
template <typename... T> void setPoints(const size_t index, T &&... data) &;
// Note return by value (see below)
Counts counts(const size_t index) const;
CountVariances countVariances(const size_t index) const;
CountStandardDeviations countStandardDeviations(const size_t index) const;
Frequencies frequencies(const size_t index) const;
FrequencyVariances frequencyVariances(const size_t index) const;
FrequencyStandardDeviations frequencyStandardDeviations(const size_t index) const;
template <typename... T> void setCounts(const size_t index, T &&... data) & ;
template <typename... T> void setCountVariances(const size_t index, T &&... data) & ;
template <typename... T> void setCountStandardDeviations(const size_t index, T &&... data) & ;
template <typename... T> void setFrequencies(const size_t index, T &&... data) & ;
template <typename... T> void setFrequencyVariances(const size_t index, T &&... data) & ;
template <typename... T> void setFrequencyStandardDeviations(const size_t index, T &&... data) & ;
const HistogramX &x(const size_t index) const;
const HistogramY &y(const size_t index) const;
const HistogramE &e(const size_t index) const;
HistogramX &mutableX(const size_t index) &;
HistogramY &mutableY(const size_t index) &;
HistogramE &mutableE(const size_t index) &;
Kernel::cow_ptr<HistogramX> sharedX(const size_t index) const;
Kernel::cow_ptr<HistogramY> sharedY(const size_t index) const;
Kernel::cow_ptr<HistogramE> sharedE(const size_t index) const;
void setSharedX(const size_t index, const Kernel::cow_ptr<HistogramX> &x) &;
void setSharedY(const size_t index, const Kernel::cow_ptr<HistogramY> &y) &;
void setSharedE(const size_t index, const Kernel::cow_ptr<HistogramE> &e) &;
};
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | class Histogram {
public:
// ...
// Replacement for readX() and dataX() const
const HistogramX &x() const;
const HistogramY &y() const;
const HistogramE &e() const;
// Replacement for dataX()
HistogramX &mutableX() &;
HistogramY &mutableY() &;
HistogramE &mutableE() &;
// Replacement for refX()
Kernel::cow_ptr<HistogramX> sharedX() const;
Kernel::cow_ptr<HistogramY> sharedY() const;
Kernel::cow_ptr<HistogramE> sharedE() const;
// Replacement for setX()
void setSharedX(const Kernel::cow_ptr<HistogramX> &x) &;
void setSharedY(const Kernel::cow_ptr<HistogramY> &y) &;
void setSharedE(const Kernel::cow_ptr<HistogramE> &e) &;
};
|
Note that there is also Dx-data, but it is not widely used and thus omitted from this documentation. The interface for Dx is mostly equivalent to that for E.
For algorithms that work with bin edges, Histogram provides an interface for accessing and modifying the x-data as if it were stored as bin edges:
1 2 3 4 5 6 7 | class Histogram {
public:
// Returns by value!
BinEdges binEdges() const;
// Accepts any arguments that can be used to construct BinEdges
template <typename... T> void setBinEdges(T &&... data) &;
};
|
If the histogram stores point data, Histogram::binEdges() will automatically compute the bin edges from the points.
BinEdges contains a cow_ptr to HistogramX. If the histogram stores bin edges, the BinEdges object returned by Histogram::binEdges() references the same HistogramX, i.e., there is no expensive copy involved.
Setting the same BinEdges object on several histograms will share the underlying data.
Histogram::setBinEdges() includes a size check and throws if the histogram is incompatible with the size defined by the method arguments.
For algorithms that work with points, Histogram provides an interface for accessing and modifying the x-data as if it were stored as points:
1 2 3 4 5 6 7 | class Histogram {
public:
// Returns by value!
Points points() const;
// Accepts any arguments that can be used to construct Points
template <typename... T> void setPoints(T && ... data) &;
};
|
If the histogram stores bin edges, Histogram::points() will automatically compute the points from the bin edges.
Points contains a cow_ptr to HistogramX. If the histogram stores points, the Points object returned by Histogram::points() references the same HistogramX, i.e., there is no expensive copy involved.
Setting the same Points object on several histograms will share the underlying data.
Histogram::setPoints() includes a size check and throws if the histogram is incompatible with the size defined by the method arguments.
For algorithms that work with counts, Histogram provides an interface for accessing and modifying the y-data as if it were stored as counts:
1 2 3 4 5 6 7 | class Histogram {
public:
// Returns by value!
Counts counts() const;
// Accepts any arguments that can be used to construct Counts
template <typename... T> void setCounts(T &&... data) &;
};
|
Currently the histogram stores counts directly. If this were ever not the case, Histogram::counts() will automatically compute the counts from the frequencies.
Counts contains a cow_ptr to HistogramY. If the histogram stores counts (as in the current implementation), the Counts object returned by Histogram::counts() references the same HistogramY, i.e., there is no expensive copy involved.
Setting the same Counts object on several histograms will share the underlying data.
Histogram::setCounts() includes a size check and throws if the histogram is incompatible with the size defined by the method arguments.
For algorithms that work with frequencies (defined as counts divided by the bin width), Histogram provides an interface for accessing and modifying the y-data as if it were stored as frequencies:
1 2 3 4 5 6 7 | class Histogram {
public:
// Returns by value!
Frequencies frequencies() const;
// Accepts any arguments that can be used to construct Frequencies
template <typename... T> void setFrequencies(T &&... data) &;
};
|
Currently the histogram stores counts. Histogram::counts() will automatically compute the frequencies from the counts.
Frequencies contains a cow_ptr to HistogramY.
Setting the same Frequencies object on several histograms will not share the underlying data since a conversion is required. This is in contrast to BinEdges and Points where the internal storage mode is changed when setters are used. This is currently not the case for Counts and Frequencies, i.e., y-data is always stored as counts.
Histogram::setFrequencies() includes a size check and throws if the histogram is incompatible with the size defined by the method arguments.
For algorithms that work with counts or frequencies, Histogram provides an interface for accessing and modifying the e-data as if it were stored as variances or standard deviations of counts or frequencies:
1 2 3 4 5 6 7 8 9 10 11 12 13 | class Histogram {
public:
// Return by value!
CountVariances countVariances() const;
CountStandardDeviations countStandardDeviations() const;
FrequencyVariances frequencyVariances() const;
FrequencyStandardDeviations frequencyStandardDeviations() const;
// Accept any arguments that can be used to construct the respectivy object
template <typename... T> void setCountVariances(T &&... data) &;
template <typename... T> void setCountStandardDeviations(T &&... data) &;
template <typename... T> void setFrequencyVariances(T &&... data) &;
template <typename... T> void setFrequencyStandardDeviations(T &&... data) &;
};
|
Currently the histogram stores the standard deviations of the counts. When accessing the uncertainties via any of the other 3 types the above interface methods, Histogram will automatically compute the requested type from the standard deviations of the counts.
Each of the 4 types for uncertainties contains a cow_ptr to HistogramE. In the current implementation the CountStandardDeviations object returned by Histogram::countStandardDeviations() references the same HistogramE as stored in the histogram, i.e., there is no expensive copy involved.
Setting the same CountStandardDeviations object on several histograms will share the underlying data.
Setting any of the other 3 uncertantity objects on several histograms will not share the underlying data, since a conversion needs to take place.
All Histogram setters for uncertainties includes a size check and throw if the histogram is incompatible with the size defined by the method arguments.
All new classes and functions described here are part of the module HistogramData. The following code examples assume using namespace HistogramData;.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | /////////////////////////////////////////////////////
// Construct like std::vector<double>:
/////////////////////////////////////////////////////
Counts counts(length); // initialized to 0.0
Counts counts(length, 42.0);
Counts counts{0.1, 0.2, 0.4, 0.8};
std::vector<double> data(...);
Counts counts(data);
Counts counts(data.begin() + 1, data.end());
Counts counts(std::move(data));
/////////////////////////////////////////////////////
// Iterators:
/////////////////////////////////////////////////////
BinEdges edges = {1.0, 2.0, 4.0};
if(edges.cbegin() != edges.cend())
*(edges.begin()) += 0.1;
// Range-based for works thanks to iterators:
for (auto &edge : edges)
edge += 0.1;
/////////////////////////////////////////////////////
// Index operator:
/////////////////////////////////////////////////////
BinEdges edges = {1.0, 2.0, 4.0};
edges[0]; // 1.0
edges[1]; // 2.0
edges[2]; // 4.0
// Only const! This is not possible:
edges[0] += 0.1; // DOES NOT COMPILE
// REASON: BinEdges contains a copy-on-write pointer to data, dereferencing in
// tight loop is expensive, so interface prevents things like this:
for (size_t i = 0; i < edges.size(); ++i)
edges[i] += 0.1; // does not compile
// If you need write access via index, use:
auto x = edges.mutableData(); // works similar to current dataX()
for (size_t i = 0; i < x.size(); ++i)
x[i] += 0.1*i;
// Better (for simple cases):
edges += 0.1;
|
1 2 3 4 5 6 7 8 9 10 11 | // Works identically to BinEdges
Points points{0.1, 0.2, 0.4};
// ...
// Type safe!
BinEdges edges(...);
points = edges; // DOES NOT COMPILE
// Can convert
points = Points(edges); // Points are defined as mid-points between edges
edges = BinEdges(points); // Note that this is lossy, see ConvertToHistogram
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | /////////////////////////////////////////////////////
// Construct Histogram:
/////////////////////////////////////////////////////
Histogram histogram(BinEdges{0.1, 0.2, 0.4}, Counts(2, 1000));
histogram.xMode(); // returns Histogram::XMode::BinEdges
/////////////////////////////////////////////////////
// Assignment:
/////////////////////////////////////////////////////
histogram2 = histogram1; // Data is automatically shared
/////////////////////////////////////////////////////
// Basic access:
/////////////////////////////////////////////////////
auto edges = histogram.binEdges(); // size 3, references Histogram::x()
auto points = histogram.points(); // size 2, computed on the fly
points[0]; // 0.15
points[1]; // 0.3
const auto &x = histogram.x(); // size 3
auto &x = histogram.mutableX(); // size 3
/////////////////////////////////////////////////////
// Modify bin edges:
/////////////////////////////////////////////////////
auto edges = histogram.binEdges();
edges[1] += 0.1;
histogram.setBinEdges(edges);
/////////////////////////////////////////////////////
// Outlook (not implemented yet):
/////////////////////////////////////////////////////
histogram2 += histogram1; // Checks for compatible x, adds y and e
/////////////////////////////////////////////////////
// Side remark -- bin edges and points:
/////////////////////////////////////////////////////
Histogram histogram(BinEdges{0.1, 0.2, 0.4});
histogram.xMode(); // returns Histogram::XMode::BinEdges
auto edges = histogram.binEdges(); // size 3, references Histogram::x()
auto points = histogram.points(); // size 2, computed on the fly
// XMode::BinEdges, size 3 is compatible with Points of size 2, so:
histogram.setPoints(points); // size check passes
histogram.xMode(); // returns Histogram::XMode::Points
edges = histogram.binEdges(); // size 3, computed on the fly
points = histogram.points(); // size 2, references Histogram::x()
// Note that edges is now different from its initial values (same
// behavior as ConvertToPointData followed by ConvertToHistogram).
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | // Setting Histograms
outputWS->setHistogram(i, inputWS->histogram(i));
outputWS->setHistogram(i, eventWS->histogram(i)); // ok, histogram data computed based on events
outputWS->setHistogram(i, BinEdges{0.1, 0.2, 0.4}, Counts(2, 1000.0));
eventWS->setHistogram(i, Points{0.1, 0.2, 0.4}); // throws, EventWorkspace needs bin edges
eventWS->setHistogram(i, BinEdges{0.1, 0.2, 0.4}, Counts(2, 1000.0)); // throws, cannot have Y data
eventWS->setHistogram(i, BinEdges{0.1, 0.2, 0.4}); // ok
// Setting BinEdges , Counts, ...
outputWS->setCounts(i, 2, 1000.0);
outputWS->setCounts(i, data.begin() + 42, data.end());
// Preserve sharing
outputWS->setSharedY(i, inputWS->sharedY(i));
outputWS->setCounts(i, inputWS->counts(i)); // also shares, 'Counts' wraps a cow_ptr
outputWS->setBinEdges(i, inputWS->binEdges(i)); // shares if input storage mode is 'XMode::BinEdges'
|
For compatibility reasons an interface to the internal data, equivalent to the old interface, is still available. Using it is discouraged, since it cannot enforce size checks!
1 2 3 4 5 6 7 8 9 | class Histogram {
public:
MantidVec &dataX();
const MantidVec &dataX() const;
const MantidVec &readX() const;
// Pointer access is slightly modified, holding a HistogramX:
void setX(const Kernel::cow_ptr<HistogramX> &X);
Kernel::cow_ptr<HistogramX> ptrX() const;
};
|
In principle, Histogram removes the need for conversions between storage types of Y and E data, i.e., the algorithms ConvertToDistribution and ConvertFromDistribution, and manual conversions between standard deviations and variances.
Storing the uncertainties as standard deviations vs. storing them as variances suffers from a very similar problem.
Both options have shortcomings and I have currently not made up my mind about the best solution. In any case this will be a major change and is thus not part of the initial introduction of the HistogramData module. I would be happy about feedback or other ideas.
There are two issues you might encounter when implementing new algorithms or when running existing scripts that are not part of our automated testing:
Category: Concepts