The APyT mass spectrum fitting module

This module enables semi-automatic fitting of high-quality mass spectra previously processed using the APyT mass spectrum alignment module. It leverages the isotope database from the periodic table package to support chemically meaningful deconvolution of complex spectra.

The core functionality involves modeling the shape of individual isotope peaks, grouping them by chemical species and charge state, and fitting their contributions to the measured spectrum using constrained optimization.

General peak shape description

The general peak shape \(p(x)\) is modeled as the product of two components:

\[p(x) = a(x) \, d(x),\]

where:

  • \(a(x)\) is the activation function, describing the rising edge of the peak.

  • \(d(x)\) is the decay function, modeling the asymmetric trailing tail.

Decay function

The decay component is defined as a simple exponential function:

\[d(x) = \exp(-x).\]

Activation function

The onset of the peak is described by an error function:

\[a(x) = \frac 1 2 \left( \operatorname{erf}\left(\frac{\sqrt \pi}{2} x\right) + 1 \right).\]

The prefactor \(\sqrt\pi / 2\) in the argument of the error function ensures that the peak maximum is located at \(x = 0\), i.e., \(p'(0) = 0\).
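The condition \(p'(0) = 0\) follows from \(a(0) = a'(0) = \frac 1 2\), \(d(0) = 1\), and \(d'(0) = -1\), so that \(p'(0) = a'(0) \, d(0) + a(0) \, d'(0) = 0\). The following minimal sketch evaluates the peak shape with the Python standard library and checks this numerically. The names activation, decay, and peak are chosen for illustration only and are not the module's internal names.

```python
import math


def activation(x):
    """Error-function onset a(x) of the peak."""
    return 0.5 * (math.erf(0.5 * math.sqrt(math.pi) * x) + 1.0)


def decay(x):
    """Single exponential decay d(x)."""
    return math.exp(-x)


def peak(x):
    """General peak shape p(x) = a(x) * d(x)."""
    return activation(x) * decay(x)


# Central finite difference approximating p'(0).
h = 1e-5
slope_at_zero = (peak(h) - peak(-h)) / (2.0 * h)
print(f"p(0) = {peak(0.0):.3f}, p'(0) ~ {slope_at_zero:.2e}")
```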

Implementation notes

Fitting a complete mass spectrum requires summing over all relevant peaks, each corresponding to a specific combination of element, isotope, and charge state. Because the natural isotope abundances constrain the relative intensities within each isotope group, the number of independent fitting parameters reduces to one intensity per group.
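The effect of the abundance constraint can be sketched for a single isotope group. The example below is an illustration under stated assumptions, not the module's implementation: the function group_model, the parametrization via (mc - position) / width, and the approximate chlorine isotope values are all introduced here for demonstration.

```python
import math


def peak(x):
    """Peak shape p(x) = a(x) * d(x) with error-function onset."""
    onset = 0.5 * (math.erf(0.5 * math.sqrt(math.pi) * x) + 1.0)
    return onset * math.exp(-x)


# Approximate values for singly charged chlorine: (mass-to-charge, abundance).
CL_ISOTOPES = [(34.969, 0.7576), (36.966, 0.2424)]


def group_model(mc, intensity, width, isotopes=CL_ISOTOPES):
    """Sum all isotope peaks of one group using a single free intensity.

    The abundances fix the relative peak heights, so `intensity` is the
    only amplitude parameter of the group (hypothetical parametrization).
    """
    return sum(
        intensity * abundance * peak((mc - position) / width)
        for position, abundance in isotopes
    )


# Height ratio of the two peak maxima for a narrow peak width.
ratio = group_model(36.966, 1000.0, 0.01) / group_model(34.969, 1000.0, 0.01)
print(f"height ratio 37Cl/35Cl: {ratio:.4f}")
```

For well-separated peaks, the height ratio equals the abundance ratio regardless of the chosen intensity, which is why a single intensity per group suffices.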

To allow for more flexible peak shapes, the decay component may also be modeled as a sum of two exponentials (a “double exponential decay”). The desired peak model is selected by passing one of the following identifiers:

  • error-expDecay: Default single exponential decay with error-function onset.

  • error-doubleExpDecay: Double exponential decay with error-function onset.
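The exact form of the double exponential decay is not specified here. The sketch below therefore assumes a weighted sum of a fast and a slow exponential, with the hypothetical parameters fraction and tau, and shows how a string identifier could select between the two models. It is not the module's internal _peak_generic() function.

```python
import math


def activation(x):
    """Error-function onset shared by both peak models."""
    return 0.5 * (math.erf(0.5 * math.sqrt(math.pi) * x) + 1.0)


def peak(x, function="error-expDecay", fraction=0.0, tau=1.0):
    """Evaluate the peak shape selected by the identifier `function`.

    ASSUMPTION: the double exponential decay is the weighted sum
    (1 - fraction) * exp(-x) + fraction * exp(-x / tau).
    """
    if function == "error-expDecay":
        decay = math.exp(-x)
    elif function == "error-doubleExpDecay":
        decay = (1.0 - fraction) * math.exp(-x) + fraction * math.exp(-x / tau)
    else:
        raise ValueError(f"unknown peak model: {function!r}")
    return activation(x) * decay


# Compare both models far out in the tail of the peak.
single = peak(5.0)
double = peak(5.0, "error-doubleExpDecay", fraction=0.2, tau=5.0)
print(f"single: {single:.4e}, double: {double:.4e}")
```

With a slow second component (tau > 1), the tail of the peak is raised relative to the single exponential model, which is the added flexibility the double exponential decay is meant to provide.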

If custom peak shapes are needed, the internal function _peak_generic() must be extended (note: this function is not part of the public API).

Data structure overview

This module primarily uses Python dictionaries for I/O interfaces, detailed below.

Element dictionary

A dictionary where each key represents an element or molecule, and the value is a tuple containing:

  • A tuple of occurring charge states.

  • The nominal atomic or molecular volume (in nm³), used for subsequent reconstruction.
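An element dictionary following this description might look as shown below. The volumes are rough illustrative numbers estimated from the crystal structures of iron and aluminum, not values taken from the module.

```python
# Keys: element or molecule; values: (charge states, volume in nm^3).
# The volumes are rough illustrative numbers, not reference data.
element_dict = {
    "Fe": ((1, 2), 0.0118),
    "Al": ((1, 2, 3), 0.0166),
}

for name, (charge_states, volume) in element_dict.items():
    print(f"{name}: charge states {charge_states}, volume {volume} nm^3")
```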

Peak dictionary

A dictionary summarizing the properties of an individual peak (i.e. element/charge state/abundance combination for a specific isotope):

  • element: Chemical symbol or molecular identifier.

  • charge: Charge state of the ion.

  • mass_charge_ratio: Mass-to-charge ratio (amu/e).

  • abundance: Isotopic abundance (as a decimal fraction).

  • is_max: Boolean indicating if this is the most abundant isotope.

  • volume: Atomic/molecular volume (nm³).
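The following sketch builds peak dictionaries with these keys from a small hard-coded isotope table. The helper build_peaks, the table ISOTOPES, and the placeholder volume are assumptions made for illustration; the module itself obtains isotope data from the periodic table package.

```python
# Hypothetical minimal isotope table: element -> [(isotopic mass, abundance)].
ISOTOPES = {"Cl": [(34.969, 0.7576), (36.966, 0.2424)]}


def build_peaks(element, charge_states, volume, isotopes=ISOTOPES):
    """Create one peak dictionary per isotope and charge state."""
    max_abundance = max(abundance for _, abundance in isotopes[element])
    return [
        {
            "element": element,
            "charge": charge,
            "mass_charge_ratio": mass / charge,
            "abundance": abundance,
            "is_max": abundance == max_abundance,
            "volume": volume,
        }
        for charge in charge_states
        for mass, abundance in isotopes[element]
    ]


# The volume of 0.02 nm^3 is a placeholder, not a reference value.
peaks = build_peaks("Cl", (1, 2), 0.02)
for entry in peaks:
    print(entry)
```

Each charge state contributes one entry with is_max set to True, so two isotopes and two charge states yield four peak dictionaries in total.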

Element count dictionary

A dictionary summarizing fitted results for each element/charge state combination, with:

  • element: Chemical symbol or molecular identifier.

  • charge: Charge state of the ion.

  • count: Number of determined counts for this element/charge state combination.

  • fraction: Relative fraction of counts (excluding background).
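As a small illustration of this structure, the sketch below derives the fraction entry from given counts. The helper add_fractions and the count values are made up for demonstration.

```python
def add_fractions(raw_counts):
    """Turn (element, charge, count) tuples into element count dictionaries."""
    total = sum(count for _, _, count in raw_counts)
    return [
        {"element": e, "charge": c, "count": n, "fraction": n / total}
        for e, c, n in raw_counts
    ]


# Made-up counts; the background is not part of the total.
counts_list = add_fractions([("Fe", 1, 600), ("Fe", 2, 300), ("Al", 1, 100)])
for entry in counts_list:
    print(entry)
```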

List of functions

This module provides the following functions for spectrum fitting and interpretation:

apyt.spectrum.fit.check_compatibility(version)

Check compatibility of given version against current module version.

Raises:

Exception – An exception is raised in case of incompatibility.

apyt.spectrum.fit.counts(peaks_list, function, params, data_range, bin_width, ignore_list=[], verbose=False)

Get counts for all elements.

This function loops through all peaks in peaks_list with the is_max key set to True and returns the counts for each element and charge state combination.

Parameters:
  • peaks_list (list of dicts) – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

  • function (str) – The string identifying the general peak shape function, as described in implementation notes.

  • params (dict) – The dictionary with the fit parameter names as keys, and best-fit values as values, as described in the best_values ModelResult attribute of the LMFIT module.

  • data_range (tuple) – The covered data range, i.e. the minimum and maximum of the mass spectrum. This interval is required for analytical integration of the background counts.

  • bin_width (float) – The histogram bin width.

Keyword Arguments:
  • ignore_list (list of str) – The list of elements/molecules to ignore when calculating compositions. Defaults to empty list.

  • verbose (bool) – Whether to print the content of all element count dictionaries. Defaults to False.

Returns:

  • counts_list (list of dicts) – The list of all element count dictionaries, as described in element count dictionary.

  • total_counts (int) – The total number of counts (without background).

  • background (int) – The number of background counts.
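The role of bin_width can be illustrated with a single peak. For the default peak shape, integration by parts gives \(\int_{-\infty}^{\infty} p(x) \, \mathrm dx = \exp(1/\pi)\). Assuming, purely for illustration, that a fitted peak is described by intensity * p((x - x0) / width) in counts per bin, its total counts are the integrated area divided by the bin width. The helpers peak_area and peak_counts are hypothetical and do not reflect how the module computes its results internally.

```python
import math


def peak(x):
    """Peak shape p(x) = a(x) * d(x) with error-function onset."""
    onset = 0.5 * (math.erf(0.5 * math.sqrt(math.pi) * x) + 1.0)
    return onset * math.exp(-x)


def peak_area(lower=-15.0, upper=40.0, steps=55_000):
    """Trapezoidal estimate of the integral of p(x)."""
    h = (upper - lower) / steps
    inner = sum(peak(lower + i * h) for i in range(1, steps))
    return h * (0.5 * (peak(lower) + peak(upper)) + inner)


def peak_counts(intensity, width, bin_width):
    """Counts of a peak intensity * p((x - x0) / width) given in counts per bin.

    ASSUMED parametrization; uses the analytical area exp(1 / pi) of p(x).
    """
    return intensity * width * math.exp(1.0 / math.pi) / bin_width


print(f"numerical area: {peak_area():.6f}")
print(f"analytical area: {math.exp(1.0 / math.pi):.6f}")
```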

apyt.spectrum.fit.enable_debug(is_dbg)

Enable or disable debug output.

Parameters:

is_dbg (bool) – Whether to enable or disable debug output.

apyt.spectrum.fit.fit(spectrum, peaks_list, function, verbose=False, **kwargs)

Fit mass spectrum.

This function internally uses the LMFIT module to fit the complete mass spectrum, where the peak positions are provided by the peaks_list argument.

Parameters:
  • spectrum (ndarray, shape (n, 2)) – The mass spectrum histogram data.

  • peaks_list (list of dicts) – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

  • function (str) – The string identifying the general peak shape function, as described in implementation notes.

Keyword Arguments:
  • peak_scale (list) – A list of regular expressions selecting the peaks to which an individual width scaling is applied. Each regular expression defines one unique scale parameter, which acts as a scaling factor on all peaks matching that expression.

  • peak_shift (list) – A list of regular expressions selecting the peaks to which an additional shift is applied. Each regular expression defines one unique shift parameter, an absolute shift that scales with the square root of the peak position and is applied to all peaks matching that expression. A systematic shift of pure oxygen was first discovered in the vanadium pentoxide measurements of Simone Bauder in 2024.

  • scale_width (bool) – Whether to use a varying parameter for the peak width scaling. Theoretically, the peak width on the mass-to-charge scale is expected to be proportional to the square root of the peak position, but it may behave differently in practice. The parameter is implemented as an exponent. Defaults to False, i.e. square-root behavior is assumed, which corresponds to a fixed exponent of 0.5.

  • verbose (bool) – Whether to print the fit results and statistics. Defaults to False.

Returns:

result – The result of the fit, as described by ModelResult of the LMFIT module.

Return type:

ModelResult
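Two of the keyword arguments above can be sketched in isolation: the power-law width scaling behind scale_width and the selection of peaks by regular expressions as used by peak_scale and peak_shift. The helpers peak_width and scale_index are hypothetical, and it is an assumption of this sketch that each peak is identified by its element name and that the whole name must match.

```python
import re


def peak_width(position, base_width, exponent=0.5):
    """Width proportional to position**exponent (0.5 = square-root law)."""
    return base_width * position**exponent


def scale_index(label, patterns):
    """Return the index of the first regular expression matching `label`.

    Each pattern stands for one unique scale (or shift) parameter; None
    means that no individual parameter applies to this peak.
    """
    for index, pattern in enumerate(patterns):
        if re.fullmatch(pattern, label):
            return index
    return None


patterns = ["O", "Fe.*"]
for label in ("O", "FeO", "Al"):
    print(label, "->", scale_index(label, patterns))
print("width at m/q = 16:", peak_width(16.0, 0.01))
```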

apyt.spectrum.fit.map_ids(mc_ratio, r, x, peaks_list, function, params, group_charge=True, verbose=False)

Map mass-to-charge ratios to chemical IDs.

This function calculates a probability vector which contains the probabilities of finding a specific element (with individual charge state if requested) associated with the given peaks in peaks_list (including background) at every position x in the mass-to-charge spectrum. These probabilities are then used for peak deconvolution and background subtraction to assign the corresponding chemical IDs. The atomic volumes of the events are also mapped and returned.

Parameters:
  • mc_ratio (ndarray, shape (n,)) – The mass-to-charge ratios of the n events.

  • r (ndarray, shape (n,)) – The n random numbers (between 0.0 and 1.0) required for mapping the probability vector to an actual chemical ID.

  • x (ndarray, shape (m,)) – The positions at which the probability vectors are calculated. The elements of the array must be equidistant and would typically represent the histogram bin centers.

  • peaks_list (list of dicts) – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

  • function (str) – The string identifying the general peak shape function, as described in implementation notes.

  • params (dict) – The dictionary with the fit parameter names as keys, and best-fit values as values, as described in the best_values ModelResult attribute of the LMFIT module.

  • group_charge (bool) – Whether to group all charge states of an individual element. Defaults to True.

  • verbose (bool) – Whether to print a list of all chemical IDs and their fractions in relation to the total counts (with background). Defaults to False.

Returns:

  • ids (ndarray, shape (n,)) – The chemical IDs of the n events.

  • Ω (ndarray, shape (n,)) – The atomic volumes of the n events.
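How a random number selects a chemical ID from a probability vector can be sketched with a cumulative distribution. This is a generic illustration of the sampling step; the helper pick_id, the ordering of the vector, and the probability values are assumptions and not taken from the module.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_id(probabilities, r):
    """Map a random number r between 0.0 and 1.0 to an index.

    The probabilities are accumulated into a cumulative distribution and
    the first entry exceeding r selects the chemical ID.
    """
    cumulative = list(accumulate(probabilities))
    index = bisect_right(cumulative, r)
    return min(index, len(probabilities) - 1)  # guard against round-off


# Hypothetical probability vector at one position: background, Fe, Al.
probabilities = [0.05, 0.80, 0.15]
ids = [pick_id(probabilities, r) for r in (0.01, 0.50, 0.90)]
print(ids)
```

Because each event draws its own random number, overlapping peaks and the background are separated statistically rather than by fixed mass-to-charge windows.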

apyt.spectrum.fit.peaks_list(element_dict, mode='mass', mass_decimals=3, verbose=False)

Get list of all peaks for specified elements and charge states.

Parameters:

element_dict (dict) – The dictionary containing the elements and their charge states, as described in element dictionary.

Keyword Arguments:
  • mode (str) – Which mode to use for the calculation of the mass-to-charge ratios. In mass mode (recommended), the actual isotopic masses are used, while in isotope mode, the mass numbers of the isotopes are used. Defaults to mass.

  • mass_decimals (int) – The number of decimal places retained for the mass-to-charge ratios in mass mode. Limiting the precision reduces the number of distinct molecular isotopes, because isotopes whose masses agree to within the resulting precision are grouped into a single peak. This value should be chosen based on the resolution of the mass spectrum. Defaults to 3, i.e. group isotopes whose masses do not differ by more than 0.001.

  • verbose (bool) – Whether to print the content of all determined peak dictionaries. Defaults to False.

Returns:

peaks_list – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

Return type:

list of dicts
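The effect of mass_decimals can be sketched as follows, assuming that the grouping is realized by rounding the masses and adding the abundances of coinciding entries. The helper group_by_rounding and the isotope values are made up for illustration.

```python
from collections import defaultdict


def group_by_rounding(isotopes, mass_decimals=3):
    """Merge isotopes whose rounded masses coincide, adding their abundances."""
    grouped = defaultdict(float)
    for mass, abundance in isotopes:
        grouped[round(mass, mass_decimals)] += abundance
    return sorted(grouped.items())


# Made-up molecular isotopes: (mass, abundance).
molecular_isotopes = [(50.0012, 0.3), (50.0014, 0.2), (51.0020, 0.5)]
print(group_by_rounding(molecular_isotopes, mass_decimals=3))
print(group_by_rounding(molecular_isotopes, mass_decimals=4))
```

With three decimal places, the first two entries collapse into one peak, while four decimal places keep all three separate.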

apyt.spectrum.fit.spectrum(x, peaks_list, function, params, elements_list=None)

Calculate spectrum for specified list of elements.

Parameters:
  • x (ndarray or scalar) – The position(s) where to evaluate the mass spectrum.

  • peaks_list (list of dicts) – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

  • function (str) – The string identifying the general peak shape function, as described in implementation notes.

  • params (dict) – The dictionary with the fit parameter names as keys, and best-fit values as values, as described in the best_values ModelResult attribute of the LMFIT module.

Keyword Arguments:

elements_list (list of str) – The list specifying for which elements the mass spectrum should be evaluated. Defaults to None, in which case all elements occurring in peaks_list are used. Note that the background is not included.

Returns:

y – The cumulative spectrum value at position x for the list of provided elements. This is a scalar if x is a scalar.

Return type:

ndarray or scalar

apyt.spectrum.fit.split_molecules(ids, xyz, peaks_list, group_charge=True, shuffle=False, verbose=False)

Split molecular events into individual atoms.

This function examines all events for potential molecules and decomposes them into their constituent atoms. The chemical IDs are remapped to unique elements in alphabetical order, and the three-dimensional coordinates of each molecule are assigned to all its individual atoms. The optional argument shuffle allows for random reordering of the atoms within a molecule.

Parameters:
  • ids (ndarray, shape (n,)) – The chemical IDs of the n events.

  • xyz (ndarray, shape (n, 3)) – The three-dimensional reconstructed positions of the n events.

  • peaks_list (list of dicts) – The list of all occurring peaks in the mass spectrum, as described in peak dictionary.

  • group_charge (bool) – Whether to group all charge states of an individual element. Defaults to True.

  • shuffle (bool) – Whether to randomly shuffle the order of the atoms after splitting. Defaults to False.

  • verbose (bool) – Whether to print a list of all chemical IDs and their fractions in relation to the total counts (with background). Defaults to False.

Returns:

  • ids_split (ndarray, shape (m,)) – The chemical IDs of the m events after splitting.

  • xyz_split (ndarray, shape (m, 3)) – The three-dimensional reconstructed positions of the m events after splitting.
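The splitting step can be sketched with plain Python lists. The helper split_events and its composition argument, which maps each event ID to the element symbols of its constituent atoms, are hypothetical; the module derives this information from peaks_list and operates on arrays.

```python
import random


def split_events(ids, xyz, composition, shuffle=False, seed=None):
    """Replace each event by its constituent atoms at the same position.

    `composition` maps an event ID to a list of element symbols, e.g.
    {0: ["Fe"], 1: ["Fe", "O"]}; this mapping is a hypothetical input.
    """
    # Remap to unique elements in alphabetical order.
    elements = sorted({e for atoms in composition.values() for e in atoms})
    new_id = {element: index for index, element in enumerate(elements)}
    rng = random.Random(seed)
    ids_split, xyz_split = [], []
    for event_id, position in zip(ids, xyz):
        atoms = list(composition[event_id])
        if shuffle:
            rng.shuffle(atoms)  # random order of atoms within a molecule
        for atom in atoms:
            ids_split.append(new_id[atom])
            xyz_split.append(position)
    return ids_split, xyz_split, elements


composition = {0: ["Fe"], 1: ["Fe", "O"]}
ids = [0, 1, 1]
xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ids_split, xyz_split, elements = split_events(ids, xyz, composition)
print(elements, ids_split)
```

Three events, two of them FeO molecules, become five atoms; both atoms of a molecule share the coordinates of the original event.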

apyt.spectrum.fit.version()

Get version of this module.

Returns:

version – The version string of this module.

Return type:

str