PyMassSpec
Python Toolkit for Mass Spectrometry
PyMassSpec is a Python package for processing gas chromatography-mass spectrometry data. PyMassSpec provides a framework and a set of components for rapid development and testing of methods for processing of chromatography–mass spectrometry data. PyMassSpec can be used interactively through the Python shell, in a Jupyter Notebook, or the functions can be collected into scripts when it is preferable to perform data processing in the batch mode.
Forked from the original PyMS Repository: https://github.com/ma-bio21/pyms. Originally by Andrew Isaac, Sean O’Callaghan and Vladimir Likić. The original publication can be found here: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-115
The original project seems to have been abandoned as there has been no activity since 2017.
The PyMassSpec project
The directory structure of PyMassSpec is as follows:
/
├── pyms: The PyMassSpec code
│
├── pyms-data: Example GC-MS data files
│
├── pyms-demo: Examples of how to use PyMassSpec
│
├── tests: pytest tests
│
└── doc-source: Sphinx source for documentation
Installation
python3 -m pip install PyMassSpec --user
First add the required channels
conda config --add channels https://conda.anaconda.org/bioconda
conda config --add channels https://conda.anaconda.org/conda-forge
conda config --add channels https://conda.anaconda.org/domdfcoding
Then install
conda install PyMassSpec
python3 -m pip install git+https://github.com/PyMassSpec/PyMassSpec@master --user
Usage
A tutorial illustrating various PyMassSpec features in detail is provided in subsequent chapters of this User Guide. The commands executed interactively are grouped together by example, and can be found here.
The data used in the PyMassSpec documentation and examples is available here.
In the “Demos and Examples” section there is a page corresponding to each example, coded with the chapter number (i.e. “pyms-demo/20a/” corresponds to Example 20a, from Chapter 2).
Each example has a script named ‘proc.py’ which contains the commands given in the example. These scripts can be run with the following command:
$ python3 proc.py
Example processing GC-MS data
Download the file gc01_0812_066.jdx and save it in the folder data. This file contains GC-MS data in the JCAMP-DX format.
First the raw data is loaded:
>>> from pyms.GCMS.IO.JCAMP import JCAMP_reader
>>> jcamp_file = "data/gc01_0812_066.jdx"
>>> data = JCAMP_reader(jcamp_file)
-> Reading JCAMP file 'data/gc01_0812_066.jdx'
>>> data
<pyms.GCMS.Class.GCMS_data at 0x7f3ec77da0b8>
The intensity matrix object is then built by binning the data:
>>> from pyms.IntensityMatrix import build_intensity_matrix_i
>>> im = build_intensity_matrix_i(data)
In this example, we show how to obtain the dimensions of the newly created intensity matrix, then loop over all ion chromatograms, and for each ion chromatogram apply the Savitzky-Golay noise filter and top-hat baseline correction:
>>> n_scan, n_mz = im.size
>>> from pyms.Noise.SavitzkyGolay import savitzky_golay
>>> from pyms.TopHat import tophat
>>> for ii in range(n_mz):
...     print("working on IC", ii)
...     ic = im.get_ic_at_index(ii)
...     ic1 = savitzky_golay(ic)
...     ic_smooth = savitzky_golay(ic1)
...     ic_base = tophat(ic_smooth, struct="1.5m")
...     im.set_ic_at_index(ii, ic_base)
The resulting noise and baseline corrected ion chromatogram is saved back into the intensity matrix.
Further examples can be found in the documentation.
License
PyMassSpec is Free and Open Source software released under the GNU General Public License version 2.
Issues
If you encounter any problems, please file an issue along with a detailed description.
GC-MS Raw Data Model
Introduction
PyMassSpec can read gas chromatography-mass spectrometry (GC-MS) data stored in Analytical Data Interchange for Mass Spectrometry (ANDI-MS) [1] and Joint Committee on Atomic and Molecular Physical Data (JCAMP-DX) [2] formats. The information contained in the data files can vary significantly depending on the instrument, vendor’s software, or conversion utility. PyMassSpec makes the following assumptions about the information contained in the data file:
The data contain the m/z and intensity value pairs across a scan.
Each scan has a retention time.
Internally, PyMassSpec stores the raw data from ANDI files or JCAMP files as a GCMS_data object.
Example: Reading JCAMP GC-MS data
The PyMassSpec module pyms.GCMS.IO.JCAMP provides capabilities to read the raw GC-MS data stored in the JCAMP-DX format.
First, set up the paths to the data file and the output directory, then import JCAMP_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
Read the raw JCAMP-DX data.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
data
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
<GCMS_data(305.582 - 4007.722 seconds, time step 0.3753183292781833, 9865 scans)>
A GCMS_data Object
The object data (from the example above) stores the raw data as a pyms.GCMS.Class.GCMS_data object. Within the GCMS_data object, raw data are stored as a list of pyms.Spectrum.Scan objects and a list of retention times. There are several methods available to access data and attributes of the GCMS_data and Scan objects.
The GCMS_data object’s methods relate to the raw data. The main properties relate to the masses, retention times and scans. For example, the minimum and maximum mass from all of the raw data can be returned by the following:
In [3]:
data.min_mass
50.0
In [4]:
data.max_mass
599.9
A list of the first 10 retention times can be returned with:
In [5]:
data.time_list[:10]
[305.582,
305.958,
306.333,
306.708,
307.084,
307.459,
307.834,
308.21,
308.585,
308.96]
The index of a specific retention time (in seconds) can be returned with:
In [6]:
data.get_index_at_time(400.0)
252
Note that this returns the index of the retention time in the data closest to the given retention time of 400.0 seconds.
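The nearest-time lookup described above can be illustrated without PyMassSpec. The following is a minimal sketch, assuming the retention times are sorted in ascending order; the function name nearest_time_index is hypothetical and is not part of the PyMassSpec API, and the way PyMassSpec resolves exact ties is not confirmed by this guide.

```python
from bisect import bisect_left


def nearest_time_index(time_list, target):
    """Return the index of the entry in a sorted time_list closest to target."""
    if not time_list:
        raise ValueError("time_list is empty")
    pos = bisect_left(time_list, target)
    if pos == 0:
        return 0
    if pos == len(time_list):
        return len(time_list) - 1
    before, after = time_list[pos - 1], time_list[pos]
    # On an exact tie this sketch prefers the earlier scan (an assumption).
    return pos - 1 if target - before <= after - target else pos


# First five retention times from the listing above.
times = [305.582, 305.958, 306.333, 306.708, 307.084]
print(nearest_time_index(times, 306.0))  # prints 1 (305.958 is closest)
```

A binary search keeps the lookup fast even for the 9865 scans in this data set, and targets outside the recorded range simply map to the first or last scan.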
The GCMS_data.tic attribute returns a total ion chromatogram (TIC) of the data as an IonChromatogram object:
In [7]:
data.tic
<pyms.IonChromatogram.IonChromatogram at 0x7f6b22ff9d68>
The IonChromatogram object is explained in a later example.
A Scan Object
A pyms.Spectrum.Scan object contains a list of masses and a corresponding list of intensity values from a single mass-spectrum scan in the raw data. Typically only non-zero (or non-threshold) intensities and corresponding masses are stored in the raw data.
A list of the first 10 pyms.Spectrum.Scan objects can be returned with:
In [8]:
scans = data.scan_list
scans[:10]
[<pyms.Spectrum.Scan at 0x7f6b4117a518>,
<pyms.Spectrum.Scan at 0x7f6b22ff9400>,
<pyms.Spectrum.Scan at 0x7f6b22ff9dd8>,
<pyms.Spectrum.Scan at 0x7f6b22ff9e80>,
<pyms.Spectrum.Scan at 0x7f6b22ff9f28>,
<pyms.Spectrum.Scan at 0x7f6b22ff9fd0>,
<pyms.Spectrum.Scan at 0x7f6b22ff9e48>,
<pyms.Spectrum.Scan at 0x7f6b22ff9668>,
<pyms.Spectrum.Scan at 0x7f6b22ff9d30>,
<pyms.Spectrum.Scan at 0x7f6b22ff9cf8>]
A list of the first 10 masses in a scan (e.g. the 1st scan) is returned with:
In [9]:
scans[0].mass_list[:10]
[50.1, 51.1, 53.1, 54.2, 55.1, 56.2, 57.2, 58.2, 59.1, 60.1]
A list of the first 10 corresponding intensities in a scan is returned with:
In [10]:
scans[0].intensity_list[:10]
[22128.0,
10221.0,
31400.0,
27352.0,
65688.0,
55416.0,
75192.0,
112688.0,
152256.0,
21896.0]
The minimum and maximum mass in an individual scan (e.g. the 1st scan) are returned with:
In [11]:
scans[0].min_mass
50.1
In [12]:
scans[0].max_mass
599.4
Exporting data and obtaining information about a data set
Often it is of interest to find out some basic information about the data set, e.g. the number of scans, the retention time range, the m/z range and so on. The GCMS_data class provides a method info() that can be used for this purpose.
In [13]:
data.info()
Data retention time range: 5.093 min -- 66.795 min
Time step: 0.375 s (std=0.000 s)
Number of scans: 9865
Minimum m/z measured: 50.000
Maximum m/z measured: 599.900
Mean number of m/z values per scan: 56
Median number of m/z values per scan: 40
The entire raw data of a GCMS_data object can be exported to a file with the method write():
In [14]:
data.write(output_directory / "data")
-> Writing intensities to '/home/vagrant/PyMassSpec/pyms-demo/jupyter/output/data.I.csv'
-> Writing m/z values to '/home/vagrant/PyMassSpec/pyms-demo/jupyter/output/data.mz.csv'
This method takes the filename (“output/data”, in this example) and writes two CSV files. One has the extension “.I.csv” and contains the intensities (“output/data.I.csv” in this example), and the other has the extension “.mz.csv” and contains the corresponding table of m/z values (“output/data.mz.csv” in this example). In general, these are not two-dimensional matrices, because different scans may have different numbers of m/z values recorded.
Note
This example is in pyms-demo/jupyter/reading_jcamp.ipynb. There is also an example in that directory for reading ANDI-MS files.
Example: Comparing two GC-MS data sets
Occasionally it is useful to compare two data sets. For example, one may want to check the consistency between the data set exported in netCDF format from the manufacturer’s software, and the JCAMP format exported from a third party software.
First, set up the paths to the data files and the output directory, then import JCAMP_reader and ANDI_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.GCMS.IO.ANDI import ANDI_reader
Then the raw data is read as before.
In [2]:
andi_file = data_directory / "gc01_0812_066.cdf"
data1 = ANDI_reader(andi_file)
data1
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.cdf'
<GCMS_data(305.582 - 4007.721 seconds, time step 0.37531822789943226, 9865 scans)>
In [3]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data2 = JCAMP_reader(jcamp_file)
data2
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
<GCMS_data(305.582 - 4007.722 seconds, time step 0.3753183292781833, 9865 scans)>
To compare the two data sets, use the function diff():
In [4]:
from pyms.GCMS.Function import diff
diff(data1, data2)
Data sets have the same number of time points.
Time RMSD: 3.54e-04
Checking for consistency in scan lengths ...OK
Calculating maximum RMSD for m/z values and intensities ...
Max m/z RMSD: 1.03e-05
Max intensity RMSD: 0.00e+00
If the data cannot be compared, for example because of a different number of scans, or an inconsistent number of m/z values between two scans, diff() will report the difference. For example:
In [5]:
data2.trim(begin=1000, end=2000)
Trimming data to between 1000 and 2001 scans
In [6]:
diff(data1, data2)
The number of retention time points differ.
First data set: 9865 time points
Second data set: 1002 time points
Data sets are different.
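The RMSD figures reported by diff() can be read as root-mean-square deviations between paired values. The sketch below shows that calculation for two equal-length lists. It is an illustration under the assumption that the standard RMSD definition applies; the helper name rmsd is hypothetical, and the exact way diff() aggregates values internally is not confirmed here.

```python
import math


def rmsd(values_a, values_b):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(values_a) != len(values_b):
        raise ValueError("sequences must have the same length")
    if not values_a:
        raise ValueError("sequences are empty")
    squared = [(a - b) ** 2 for a, b in zip(values_a, values_b)]
    return math.sqrt(sum(squared) / len(squared))


# Two short, made-up retention time lists that differ in the last entry.
times_a = [305.582, 305.958, 306.333]
times_b = [305.582, 305.958, 306.334]
print(f"Time RMSD: {rmsd(times_a, times_b):.2e}")
```

Raising an error when the lengths differ mirrors the behaviour shown above, where data sets with different numbers of time points are reported as different rather than compared.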
Note
This example is in pyms-demo/jupyter/comparing_datasets.ipynb.
Footnotes
GC-MS data derived objects
In the raw GC-MS data, consecutive scans do not necessarily contain the same mass per charge (mass) values. For data processing, it is often necessary to convert the data to a matrix with a set number of masses and scans. In PyMassSpec the resulting object is called an intensity matrix. In this chapter the methods for converting the raw GC-MS data to an intensity matrix object are illustrated.
IntensityMatrix Object
The general scheme for converting raw mass values is to bin intensity values based on the interval the corresponding mass belongs to. The general procedure is as follows:
Set the interval between bins, lower and upper bin boundaries.
Calculate the number of bins to cover the range of all masses.
Centre the first bin at the minimum mass found for all the raw data.
Sum intensities whose masses are in a given bin.
A mass, \(m\), is considered to belong to a bin when \(c - l \le m < c + u\), where \(c\) is the centre of the bin, \(l\) is the lower boundary and \(u\) is the upper boundary of the bin. The default bin interval is one with a lower and upper boundary of \(\pm0.5\).
A function to bin masses to the nearest integer is also available. The default bin interval is one with a lower boundary of \(-0.3\) and upper boundary of \(+0.7\) (as per the NIST library).
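The binning procedure above can be sketched in a few lines of plain Python. This is a hedged illustration rather than PyMassSpec's implementation: the function name bin_scan is hypothetical, the sketch assumes that the lower and upper boundaries add up to the bin interval, and the bin-count formula was chosen because it reproduces the 551 and 1101 bins reported later in this chapter.

```python
import math


def bin_scan(masses, intensities, min_mass, max_mass,
             interval=1.0, lower=0.5, upper=0.5):
    """Sum the intensities of one scan into bins centred from min_mass upwards.

    A mass m falls into the bin with centre c when c - lower <= m < c + upper.
    Assumes lower + upper == interval, so the bins tile the mass axis.
    """
    left_edge = min_mass - lower
    n_bins = int((max_mass + upper - left_edge) / interval) + 1
    centres = [min_mass + i * interval for i in range(n_bins)]
    binned = [0.0] * n_bins
    for mass, intensity in zip(masses, intensities):
        index = math.floor((mass - left_edge) / interval)
        if 0 <= index < n_bins:
            binned[index] += intensity
    return centres, binned


centres, binned = bin_scan([50.1, 50.4, 51.1, 53.1],
                           [10.0, 20.0, 30.0, 40.0],
                           min_mass=50.0, max_mass=53.1)
print(centres)  # [50.0, 51.0, 52.0, 53.0, 54.0]
print(binned)   # [30.0, 30.0, 0.0, 40.0, 0.0]
```

The two masses 50.1 and 50.4 land in the same bin and their intensities are summed, while the bin centred at 52 stays empty because no mass was recorded there.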
Discussion of Binning Boundaries
For any chemical element \(X\), let \(w(X)\) be the atomic weight of \(X\), and
\(\delta(X) = \frac{w(X) - \{w(X)\}}{w(X)}\)
where \(\{a\}\) is the integer value of \(a\) (rounded to the nearest integer).
For example, for hydrogen \(\delta(^1\rm{H}) = \frac{1.007825032 - 1}{1.007825032} = 0.0076\). Similarly \(\delta(^{12}\rm{C}) = 0\), \(\delta(^{14}\rm{N}) = 0.00022\), \(\delta(^{16}\rm{O}) = -0.00032\), etc.
Let also \(\Delta(X) = w(X) - \{w(X)\}\). Then \(-0.023 <\Delta(^{31}\rm{P}), \Delta(^{28}\rm{Si}) < 0\).
Let a compound undergo GC-MS and let Y be one of its fragments. If Y consists of \(k_{1}\) atoms of type \(X_{1}\), \(k_{2}\) atoms of type \(X_{2}\), ..., \(k_{r}\) atoms of type \(X_{r}\), then \(\Delta(Y) = k_{1}*\Delta(X_{1}) + k_{2}*\Delta(X_{2}) + ....+ k_{r}*\Delta(X_{r})\).
The fragment will usually not contain more than 2 or 3 P or Si atoms and if its molecular weight is less than 550 it may not contain more than 35 O atoms, so \(\Delta(Y) \geq -0.023*5 - 0.00051*35 = -0.133\).
On the other hand, if Y contains \(k\) H atoms and \(m\) N atoms, then \(\Delta(Y) \leq k*0.00783 + m*0.00051\). Since at least one carbon (or heavier) atom is needed for every two hydrogen atoms, a fragment can contain no more than 80 hydrogen atoms. Therefore in this case (i.e. H and C atoms only) \(\Delta(Y) \leq 80*0.00783 = 0.63\). If carbon is replaced by any heavier atom, at least 2 hydrogen atoms will be eliminated and \(\Delta(Y)\) will become even smaller.
If the molecular weight of \(Y\) does not exceed 550 (typically the largest mass scanned for in a GC-MS setup) then \(\mathbf{-0.133 \leq \Delta(Y) \leq 0.63}\). This means that if we set our binning boundaries to \((-0.3, 0.7)\) or \((-0.2, 0.8)\) the opportunity for having a fragment whose molecular weight is very close to the boundary is minimised.
Since the resolution of MS is at least 0.1 dalton, we may assume that its error does not exceed 0.05, and MS accuracy will not cause additional problems.
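The upper bound for a pure hydrocarbon fragment can be checked numerically. The sketch below computes \(\Delta(Y)\) as the sum of \(k_i * \Delta(X_i)\) for a fragment composition. The helper name mass_defect and the example composition are hypothetical; the table holds only the hydrogen mass quoted above and the carbon-12 mass (exactly 12 by definition), and other elements would have to be added with their own reference masses.

```python
# Monoisotopic masses in daltons. The hydrogen value is the one quoted in the
# text; carbon-12 is exactly 12 by definition.
MONOISOTOPIC_MASS = {"H": 1.007825032, "C": 12.0}


def mass_defect(composition):
    """Return Delta(Y), the sum of k_i * Delta(X_i), for {element: count}."""
    total = 0.0
    for element, count in composition.items():
        exact = MONOISOTOPIC_MASS[element]
        # Delta(X) is the exact mass minus the nearest integer mass.
        total += count * (exact - round(exact))
    return total


# A hypothetical hydrocarbon fragment with the maximum of 80 hydrogen atoms.
print(round(mass_defect({"C": 39, "H": 80}), 2))  # prints 0.63
```

Carbon contributes nothing to the sum, so the result depends only on the hydrogen count, which is why the 80-hydrogen limit fixes the upper bound.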
Example: Building an Intensity Matrix
First, set up the paths to the data files and the output directory, then import JCAMP_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
Read the raw data files.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
data
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
<GCMS_data(305.582 - 4007.722 seconds, time step 0.3753183292781833, 9865 scans)>
Then the data can be converted to an IntensityMatrix using the function build_intensity_matrix() from pyms.IntensityMatrix.
The default operation of build_intensity_matrix() is to use a bin interval of one and treat the masses as floating point numbers. The default intensity matrix can be built as follows:
In [3]:
from pyms.IntensityMatrix import build_intensity_matrix
im = build_intensity_matrix(data)
im
<pyms.IntensityMatrix.IntensityMatrix at 0x7f31d8b12860>
The size as the number of scans and the number of bins can be returned with:
In [4]:
im.size
(9865, 551)
There are 9865 scans and 551 bins in this example.
The raw masses have been binned into new mass units based on the minimum mass in the raw data and the bin size. A list of the first ten new masses can be obtained as follows:
In [5]:
im.mass_list[:10]
[50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0]
The attributes im.min_mass and im.max_mass return the minimum and maximum mass:
In [6]:
im.min_mass
50.0
In [7]:
im.max_mass
600.0
It is also possible to search for a particular mass, by finding the index of the binned mass closest to the desired mass. For example, the index of the closest binned mass to a mass of 73.3 \(m/z\) can be found by using the method im.get_index_of_mass():
In [8]:
index = im.get_index_of_mass(73.3)
index
23
The value of the closest mass can be returned by the method im.get_mass_at_index():
In [9]:
im.get_mass_at_index(index)
73.0
A mass of 73.0 is returned in this example.
Build intensity matrix parameters
The bin interval can be set to values other than one, and binning boundaries can also be adjusted. In the example below, to fit the 0.5 bin interval, the upper and lower boundaries are set to ± 0.25.
In [10]:
im = build_intensity_matrix(data, 0.5, 0.25, 0.25)
im
<pyms.IntensityMatrix.IntensityMatrix at 0x7f31d8b8d710>
The size of the intensity matrix will reflect the change in the number of bins:
In [11]:
im.size
(9865, 1101)
In [12]:
im.mass_list[:10]
[50.0, 50.5, 51.0, 51.5, 52.0, 52.5, 53.0, 53.5, 54.0, 54.5]
In this example there are 9865 scans (as before), but 1101 bins.
The index and binned mass of the mass closest to 73.3 should also reflect the different binning.
In [13]:
index = im.get_index_of_mass(73.3)
index
47
In [14]:
im.get_mass_at_index(index)
73.5
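The indices returned in both lookups are consistent with a simple nearest-bin calculation on evenly spaced bin centres. The sketch below reproduces them. The function names are hypothetical and this is not presented as how im.get_index_of_mass() is implemented; it only assumes that bin centres start at the minimum mass and are spaced by the bin interval.

```python
def nearest_bin_index(mass, min_mass, interval, n_bins):
    """Index of the evenly spaced bin centre closest to mass, clamped to range."""
    index = round((mass - min_mass) / interval)
    return max(0, min(n_bins - 1, index))


def bin_centre(index, min_mass, interval):
    """Mass at the centre of the bin with the given index."""
    return min_mass + index * interval


# Bin interval of one, 551 bins starting at 50.0.
print(nearest_bin_index(73.3, 50.0, 1.0, 551))   # prints 23
# Bin interval of 0.5, 1101 bins starting at 50.0.
index = nearest_bin_index(73.3, 50.0, 0.5, 1101)
print(index, bin_centre(index, 50.0, 0.5))       # prints 47 73.5
```

Note that Python's round() sends exact halves to the nearest even integer, so a mass lying exactly midway between two bin centres may resolve differently here than in PyMassSpec.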
Build integer mass intensity matrix
It is also possible to build an intensity matrix with integer masses and a bin interval of one using build_intensity_matrix_i(). The default range for the binning is -0.3 and +0.7 mass units. The function is imported from pyms.IntensityMatrix:
In [15]:
from pyms.IntensityMatrix import build_intensity_matrix_i
im = build_intensity_matrix_i(data)
im
<pyms.IntensityMatrix.IntensityMatrix at 0x7f31d8b121d0>
In [16]:
im.size
(9865, 551)
In [17]:
im.mass_list[:10]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
The masses are now integers.
In [18]:
index = im.get_index_of_mass(73.3)
index
23
In [19]:
im.get_mass_at_index(index)
73
The lower and upper bounds can be adjusted with build_intensity_matrix_i(data, lower, upper).
Note
This example is in pyms-demo/jupyter/IntensityMatrix.ipynb.
Example: MassSpectrum Objects
First, set up the paths to the data files and the output directory, then import JCAMP_reader and build_intensity_matrix.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
Read the raw data files and create the IntensityMatrix.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
A MassSpectrum object contains two attributes, mass_list and intensity_list, a list of mass values and corresponding intensities, respectively. A MassSpectrum is returned by the IntensityMatrix method get_ms_at_index(index).
For example, the properties of the first MassSpectrum object can be obtained as follows:
In [3]:
ms = im.get_ms_at_index(0)
ms
<pyms.Spectrum.MassSpectrum at 0x7ff678cfe080>
In [4]:
len(ms)
551
In [5]:
len(ms.mass_list)
551
In [6]:
len(ms.intensity_list)
551
The length of all attributes should be the same.
Note
This example is in pyms-demo/jupyter/MassSpectrum.ipynb.
Example: IonChromatogram Objects
First, set up the paths to the data files and the output directory, then import JCAMP_reader and build_intensity_matrix.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
Read the raw data files and create the IntensityMatrix.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
An IonChromatogram object is a one dimensional vector containing mass intensities as a function of retention time. This can be either \(m/z\) channel intensities (for example, the ion chromatogram at 73 \(m/z\)), or cumulative intensities over all measured \(m/z\) (TIC).
An IonChromatogram object for the TIC can be obtained as follows:
In [3]:
data.tic
<pyms.IonChromatogram.IonChromatogram at 0x7f698cbb9e80>
The IonChromatogram at index 0 can be obtained with:
In [4]:
im.get_ic_at_index(0)
<pyms.IonChromatogram.IonChromatogram at 0x7f69ac4e9198>
The IonChromatogram for the closest mass to 73 can be obtained with:
In [5]:
im.get_ic_at_mass(73)
<pyms.IonChromatogram.IonChromatogram at 0x7f69ac4e95f8>
An ion chromatogram object has a method is_tic() which returns True if the ion chromatogram is a TIC, False otherwise.
In [6]:
data.tic.is_tic()
True
In [7]:
im.get_ic_at_mass(73).is_tic()
False
Note
This example is in pyms-demo/jupyter/IonChromatogram.ipynb.
Writing IonChromatogram object to a file
Note
This example is in pyms-demo/31
The method write() of an IonChromatogram object allows the ion chromatogram to be saved to a file:
>>> tic.write("output/tic.dat", minutes=True)
>>> im.get_ic_at_mass(73).write("output/ic.dat", minutes=True)
The flag minutes=True indicates that retention time will be saved in minutes. The ion chromatogram object saved with the write() method is a plain ASCII file which contains a pair of (retention time, intensity) per line.
$ head tic.dat
5.0930 2.222021e+07
5.0993 2.212489e+07
5.1056 2.208650e+07
5.1118 2.208815e+07
5.1181 2.200635e+07
5.1243 2.200326e+07
5.1306 2.202363e+07
5.1368 2.198357e+07
5.1431 2.197408e+07
5.1493 2.193351e+07
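Because the file is plain whitespace-separated text, it can be read back with the standard library alone. The following is a minimal sketch, assuming the two-column layout shown above; the function name read_ic_lines is hypothetical and is not part of PyMassSpec.

```python
def read_ic_lines(lines):
    """Parse '(retention time, intensity)' pairs from an iterable of text lines."""
    times, intensities = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        times.append(float(fields[0]))
        intensities.append(float(fields[1]))
    return times, intensities


# The first two lines of the listing above, used in place of an open file.
sample = ["5.0930 2.222021e+07", "5.0993 2.212489e+07"]
times, intensities = read_ic_lines(sample)
print(times)           # [5.093, 5.0993]
print(intensities[0])  # 22220210.0
```

An open file object can be passed in place of the list, since iterating over a text file also yields one line at a time.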
Saving data
Note
This example is in pyms-demo/32
A matrix of intensity values can be saved to a file with the function save_data() from pyms.Utils.IO. A matrix of intensity values can be returned from an IntensityMatrix with the attribute intensity_array.
For example,
>>> from pyms.Utils.IO import save_data
>>> mat = im.intensity_array
>>> mat
array([[22128., 0., 10221., ..., 0., 470., 0.],
[22040., 0., 10335., ..., 408., 0., 404.],
[21320., 0., 10133., ..., 492., 0., 422.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
>>> save_data("output/im.dat", mat)
It is also possible to save the list of masses (from im.mass_list) and the list of retention times (from im.time_list) using the save_data() function.
For convenience, the intensity values, mass list and time list can be saved with the method export_ascii().
For example,
>>> im.export_ascii("output/data")
will create data.im.dat, data.rt.dat and data.mz.dat, where these are the intensity matrix, retention time vector, and \(m/z\) vector. By default the data is saved as space separated data with a “.dat” extension. It is also possible to save the data as comma separated data with a “.csv” extension with the command:
>>> im.export_ascii("output/data", "csv")
Additionally, the entire IntensityMatrix can be exported to LECO CSV format.
>>> im.export_leco_csv("output/data_leco.csv")
This facility is useful for import into other analytical software packages. The format has a header line specifying the column heading information as:
scan, retention time, mass1, mass2, ...
and then each row as the intensity data.
Importing ASCII data
Note
This example is in pyms-demo/32
The LECO CSV format data can be imported directly into an IntensityMatrix object. The data must follow the format outlined above. For example, the file saved above can be read and compared to the original:
>>> from pyms.IntensityMatrix import IntensityMatrix
>>> iim = IntensityMatrix([0],[0],[[0]])
>>> iim.import_leco_csv("output/data_leco.csv")
>>> im.size
>>> iim.size
The line IntensityMatrix([0],[0],[[0]]) is required to create an empty IntensityMatrix object.
Data Filtering
Introduction
In this chapter filtering techniques that allow pre-processing of GC-MS data for analysis and comparison to other pre-processed GC-MS data are covered.
Time strings
Before considering the filtering techniques, the mechanism for representing retention times is outlined here.
A time string is the specification of a time interval, and takes the format NUMBERs or NUMBERm for a time interval in seconds or minutes. For example, these are valid time strings: 10s (10 seconds) and 0.2m (0.2 minutes).
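The time string format can be made concrete with a small parser. This is a sketch of the format as described above, not PyMassSpec's own parsing code; the function name time_str_to_seconds is hypothetical.

```python
def time_str_to_seconds(time_str):
    """Convert a time string such as '10s' or '0.2m' to seconds."""
    text = time_str.strip()
    if len(text) < 2 or text[-1] not in ("s", "m"):
        raise ValueError(f"not a valid time string: {time_str!r}")
    value = float(text[:-1])  # raises ValueError if the number is malformed
    return value * 60.0 if text[-1] == "m" else value


print(time_str_to_seconds("10s"))   # prints 10.0
print(time_str_to_seconds("1.5m"))  # prints 90.0
```

Converting everything to seconds up front means the rest of a processing script only has to deal with one unit, whichever form the user typed.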
Example: IntensityMatrix Resizing
Once an IntensityMatrix has been constructed from the raw GC-MS data, the entries of the matrix can be modified. These modifications can operate on the entire matrix, or individual masses or scans.
First, set up the paths to the data files and the output directory, then import JCAMP_reader and build_intensity_matrix.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
Read the raw data files and create the IntensityMatrix.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Retention time range
A basic operation on the GC-MS data is to select a specific time range for processing. In PyMassSpec, any data outside the chosen time range is discarded. The trim() method operates on the raw data, so any subsequent processing only refers to the trimmed data.
The data can be trimmed to specific scans:
In [3]:
data.trim(1000, 2000)
data.info()
Trimming data to between 1000 and 2001 scans
Data retention time range: 11.342 min -- 17.604 min
Time step: 0.375 s (std=0.000 s)
Number of scans: 1002
Minimum m/z measured: 50.100
Maximum m/z measured: 467.100
Mean number of m/z values per scan: 57
Median number of m/z values per scan: 44
or specific retention times (in seconds or minutes):
In [4]:
data.trim("700s", "15m")
data.info()
Trimming data to between 54 and 587 scans
Data retention time range: 11.674 min -- 15.008 min
Time step: 0.375 s (std=0.000 s)
Number of scans: 534
Minimum m/z measured: 50.100
Maximum m/z measured: 395.200
Mean number of m/z values per scan: 59
Median number of m/z values per scan: 47
Mass Spectrum range and entries
An IntensityMatrix object has a set mass range and interval that is derived from the data at the time of building the intensity matrix. The range of mass values can be cropped. This is done, primarily, to ensure that the range of masses used is consistent when comparing samples.
The mass range of the intensity matrix can be “cropped” to a new (smaller) range as follows:
In [5]:
im.crop_mass(60, 400)
im.min_mass
60.0
In [6]:
im.max_mass
400.0
It is also possible to set all intensities for a given mass to zero. This is useful for ignoring masses associated with sample preparation. The mass can be “nulled” with:
In [7]:
im.null_mass(73)
sum(im.get_ic_at_mass(73).intensity_array)
0.0
As expected, the sum of the intensity array is 0.
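What cropping and nulling do to the underlying data can be illustrated on a small list-of-lists matrix, with one row per scan and one column per mass. This is a sketch under that layout assumption; the free functions crop_mass and null_mass below are hypothetical stand-ins and do not reproduce the IntensityMatrix methods of the same names.

```python
def crop_mass(mass_list, rows, mass_min, mass_max):
    """Keep only the columns whose mass lies within [mass_min, mass_max]."""
    keep = [i for i, mass in enumerate(mass_list) if mass_min <= mass <= mass_max]
    new_masses = [mass_list[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_masses, new_rows


def null_mass(mass_list, rows, mass):
    """Set every intensity in the column closest to mass to zero, in place."""
    column = min(range(len(mass_list)), key=lambda i: abs(mass_list[i] - mass))
    for row in rows:
        row[column] = 0.0


masses = [59.0, 60.0, 73.0, 400.0, 401.0]
rows = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0]]

masses, rows = crop_mass(masses, rows, 60, 400)
null_mass(masses, rows, 73)
print(masses)  # [60.0, 73.0, 400.0]
print(rows)    # [[2.0, 0.0, 4.0], [7.0, 0.0, 9.0]]
```

Cropping changes the shape of the matrix, whereas nulling keeps the shape and only overwrites one column, which is why a nulled mass can still be looked up afterwards.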
Note
This example is in pyms-demo/jupyter/IntensityMatrix_Resizing.ipynb.
Noise smoothing
The purpose of noise smoothing is to remove high-frequency noise from data, and thereby increase the contribution of the signal relative to the contribution of the noise.
First, set up the paths to the data files and the output directory, then import JCAMP_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
Read the raw data files and extract the TIC.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
tic = data.tic
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Window averaging
A simple approach to noise smoothing is moving average window smoothing. In this approach the window of a fixed size (\(2N+1\) points) is moved across the ion chromatogram, and the intensity value at each point is replaced with the mean intensity calculated over the window size. The example below illustrates smoothing of TIC by window averaging.
To apply mean window smoothing with a 5-point window:
In [3]:
from pyms.Noise.Window import window_smooth
tic1 = window_smooth(tic, window=5)
To apply median window smoothing with a 5-point window:
In [4]:
tic2 = window_smooth(tic, window=5, use_median=True)
To apply mean window smoothing, but specifying the window as a time string (in this example, 7 seconds):
In [5]:
tic3 = window_smooth(tic, window='7s')
Write the original TIC and the smoothed TICs to disk:
In [6]:
tic.write(output_directory / "noise_smoothing_tic.dat",minutes=True)
tic1.write(output_directory / "noise_smoothing_tic1.dat",minutes=True)
tic2.write(output_directory / "noise_smoothing_tic2.dat",minutes=True)
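To make the windowing idea concrete, the sketch below applies a mean or median window of \(2N+1\) points to a plain list of intensities. It is an illustration only: the function name window_smooth_sketch is hypothetical, and shrinking the window at the two ends of the chromatogram is an assumption of this sketch rather than documented PyMassSpec behaviour.

```python
from statistics import mean, median


def window_smooth_sketch(values, window=3, use_median=False):
    """Replace each point with the mean (or median) of a centred window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of points")
    half = window // 2  # this is N for a window of 2N+1 points
    reducer = median if use_median else mean
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)                 # window is truncated at the edges
        stop = min(len(values), i + half + 1)
        smoothed.append(reducer(values[start:stop]))
    return smoothed


print(window_smooth_sketch([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
# [1.5, 2.0, 3.0, 4.0, 4.5]
print(window_smooth_sketch([1.0, 2.0, 9.0, 2.0, 1.0], window=3, use_median=True))
# [1.5, 2.0, 2.0, 2.0, 1.5]
```

The second call shows why the median variant can be preferable for spiky noise: the single outlier of 9.0 is removed entirely, whereas a mean window would spread it over its neighbours.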
Window Averaging on Intensity Matrix
In the previous section, window averaging was applied to an Ion Chromatogram object (in that case a TIC). Where filtering is to be performed on all Ion Chromatograms, the window_smooth_im() function may be used instead.
The use of this function is identical to the Ion Chromatogram window_smooth() function, except that an Intensity Matrix is passed to it.
For example, to perform window smoothing on an IntensityMatrix object with a 5 point window and mean window smoothing:
In [7]:
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Noise.Window import window_smooth_im
im = build_intensity_matrix(data)
im_smooth1 = window_smooth_im(im, window=5, use_median=False)
Write the IC for mass 73 to disk for both the original and smoothed IntensityMatrix:
In [8]:
ic = im.get_ic_at_mass(73)
ic_smooth1 = im_smooth1.get_ic_at_mass(73)
ic.write(output_directory / "noise_smoothing_ic.dat", minutes=True)
ic_smooth1.write(output_directory / "noise_smoothing_ic_smooth1.dat", minutes=True)
Savitzky–Golay noise filter
A more sophisticated noise filter is the Savitzky-Golay filter. Given the data loaded as above, this filter can be applied as follows:
In [9]:
from pyms.Noise.SavitzkyGolay import savitzky_golay
tic4 = savitzky_golay(tic)
Write the smoothed TIC to disk:
In [10]:
tic4.write(output_directory / "noise_smoothing_tic4.dat",minutes=True)
In this example the default parameters were used.
The savitzky_golay() function described above acts on a single IonChromatogram. Where it is desired to perform Savitzky-Golay filtering on the whole IntensityMatrix the function savitzky_golay_im() may be used as follows:
In [11]:
from pyms.Noise.SavitzkyGolay import savitzky_golay_im
im_smooth2 = savitzky_golay_im(im)
Write the IC for mass 73 in the smoothed IntensityMatrix to disk:
In [12]:
ic_smooth2 = im_smooth2.get_ic_at_mass(73)
ic_smooth2.write(output_directory / "noise_smoothing_ic_smooth2.dat", minutes=True)
Note
This example is in pyms-demo/jupyter/NoiseSmoothing.ipynb
.
Baseline Correction
Baseline distortion originating from instrument imperfections and
experimental setup is often observed in mass spectrometry data, and
off-line baseline correction is often an important step in data
pre-processing. There are many approaches for baseline correction. One
advanced approach is based on the top-hat transform developed in mathematical morphology 1, and used extensively in digital image
processing for tasks such as image enhancement. Top-hat baseline
correction was previously applied in proteomics-based mass spectrometry 2.
PyMassSpec currently implements only the top-hat baseline corrector, using the
SciPy package ndimage.
Application of the top-hat baseline corrector requires the size of the
structural element to be specified. The structural element needs to be
larger than the features one wants to retain in the spectrum after the
top-hat transform. In the example below, the top-hat baseline corrector
is applied to the TIC of the data set gc01_0812_066.cdf
, with the
structural element of 1.5 minutes:
In this example the TIC is smoothed before the baseline is corrected. The purpose of noise smoothing is to remove high-frequency noise from the data, thereby increasing the contribution of the signal relative to that of the noise.
First, set up the paths to the data files and the output directory, then import ANDI_reader, savitzky_golay and tophat.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
Read the raw data files and extract the TIC.
In [2]:
andi_file = data_directory / "gc01_0812_066.cdf"
data = ANDI_reader(andi_file)
tic = data.tic
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.cdf'
Perform Savitzky-Golay smoothing
In [3]:
tic1 = savitzky_golay(tic)
Perform Tophat baseline correction
In [4]:
tic2 = tophat(tic1, struct="1.5m")
Save the output to disk
In [5]:
tic.write(output_directory / "baseline_correction_tic.dat",minutes=True)
tic1.write(output_directory / "baseline_correction_tic_smooth.dat",minutes=True)
tic2.write(output_directory / "baseline_correction_tic_smooth_bc.dat",minutes=True)
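The top-hat transform itself is short enough to sketch in plain Python: the baseline is estimated by a morphological opening (an erosion followed by a dilation) and then subtracted from the signal. The code below is an illustration only. PyMassSpec delegates this work to SciPy's ndimage, whereas here the structural element is given as an odd number of points rather than a time string such as "1.5m", and the function names are hypothetical.

```python
def _sliding(values, size, reduce):
    """Apply reduce (min or max) over a centred window of `size` points."""
    half = size // 2
    return [
        reduce(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def tophat_sketch(values, size):
    """White top-hat: the signal minus its morphological opening."""
    eroded = _sliding(values, size, min)    # erosion removes narrow features
    opened = _sliding(eroded, size, max)    # dilation restores the baseline level
    return [v - o for v, o in zip(values, opened)]


# A three-point-wide peak sitting on a constant baseline of 5.
signal = [5, 5, 5, 9, 15, 9, 5, 5, 5]

wide = tophat_sketch(signal, size=5)     # element wider than the peak
narrow = tophat_sketch(signal, size=3)   # element too narrow for the peak
```

With the five-point element the baseline is removed and the peak is kept intact; with the three-point element part of the peak is treated as baseline and lost, which is why the structural element needs to be larger than the features to be retained.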
Tophat Baseline correction on an Intensity Matrix object
The tophat()
function acts on a single IonChromatogram
. To
perform baseline correction on an IntensityMatrix
object (i.e. on
all Ion Chromatograms
) the tophat_im()
function may be used.
Using the same value for struct
as above, tophat_im()
is used as
follows:
In [6]:
from pyms.TopHat import tophat_im
from pyms.IntensityMatrix import build_intensity_matrix
im = build_intensity_matrix(data)
im_base_corr = tophat_im(im, struct="1.5m")
Write the IC at index 73 to disk for both the original and baseline-corrected
IntensityMatrix:
In [7]:
ic = im.get_ic_at_index(73)
ic_base_corr = im_base_corr.get_ic_at_index(73)
ic.write(output_directory/"baseline_correction_ic.dat",minutes=True)
ic_base_corr.write(output_directory/"baseline_correction_ic_base_corr.dat",minutes=True)
Note
This example is in pyms-demo/jupyter/BaselineCorrection.ipynb
.
Pre-processing the IntensityMatrix
Noise smoothing and baseline correction can be applied to each
IonChromatogram
in an IntensityMatrix
.
First, set up the paths to the data files and the output directory, then import the required functions.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
Read the raw data files and build the IntensityMatrix
:
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Perform Savitzky-Golay smoothing and Tophat baseline correction
In [3]:
n_scan, n_mz = im.size
for ii in range(n_mz):
    # print("Working on IC#", ii+1)
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
Alternatively, the filtering may be performed on the IntensityMatrix
without using a for loop, as outlined in previous examples. However,
filtering by IonChromatogram in a for loop as described here is
much faster.
The resulting IntensityMatrix object can be “dumped” to a file for later
retrieval. There are general purpose object file handling methods in
pyms.Utils.IO. For example:
>>> from pyms.Utils.IO import dump_object
>>> dump_object(im, "output/im-proc.dump")
Note
This example is in pyms-demo/jupyter/IntensityMatrix_Preprocessing.ipynb
.
Peak detection and representation
Table of Contents
Example: Peak Objects
Fundamental to GC-MS analysis is the identification of individual
components of the sample mix. The basic component unit is represented as
a signal peak. In PyMassSpec a signal peak is represented as a Peak
object. PyMassSpec provides functions to detect and create peaks
(discussed at the end of the chapter).
A peak object stores a minimal set of information about a signal peak, namely the retention time at which the peak apex occurs and the mass spectrum at the apex. Additional information, such as peak width, TIC and individual ion areas, can be derived from the GC-MS data and added to the Peak object.
Creating a Peak Object
A peak object can be created for a scan at a given retention time by
providing the retention time (in minutes or seconds) and the
MassSpectrum
object of the scan.
First, set up the paths to the data files and the output directory, then import JCAMP_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
Read the raw data files.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Build the IntensityMatrix
.
In [3]:
from pyms.IntensityMatrix import build_intensity_matrix_i
im = build_intensity_matrix_i(data)
Extract the MassSpectrum
at 31.17 minutes in this example.
In [4]:
index = im.get_index_at_time(31.17*60.0)
ms = im.get_ms_at_index(index)
Create a Peak
object for the given retention time.
In [5]:
from pyms.Peak.Class import Peak
peak = Peak(31.17, ms, minutes=True)
By default the retention time is assumed to be in seconds. The parameter
minutes
can be set to True
if the retention time is given in
minutes. Internally, PyMassSpec stores retention times in seconds, so
the minutes
parameter ensures the input and output of the retention
time are in the same units.
Peak Object properties
The retention time of the peak, in seconds, can be returned with
pyms.Peak.Class.Peak.rt
. The mass spectrum can be returned with
pyms.Peak.Class.Peak.mass_spectrum
.
The Peak object constructs a unique identification (UID) based on
the spectrum and retention time. This helps in managing lists of peaks
(covered in the next chapter). The UID can be returned with
pyms.Peak.Class.Peak.UID. The UID consists of the masses of the
two most abundant ions in the spectrum, the ratio of the abundances of
the two ions, and the retention time. In the output below the retention
time appears in seconds (1870.20), although the Peak object was created
with minutes=True. The format is:
Mass1-Mass2-Ratio-RT
For example:
In [6]:
peak.rt
1870.2
In [7]:
peak.UID
'319-73-74-1870.20'
In [8]:
index = im.get_index_of_mass(73.3)
index
23
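The Mass1-Mass2-Ratio-RT layout of the UID can be reproduced with a few lines of plain Python. This is a sketch under assumptions, not the library's code: the helper name make_uid_sketch is hypothetical, and expressing the ratio as a whole-number percentage of the second most abundant ion relative to the most abundant one is inferred from the example output '319-73-74-1870.20' rather than taken from the PyMassSpec source.

```python
def make_uid_sketch(masses, intensities, rt_seconds):
    """Build a 'Mass1-Mass2-Ratio-RT' string from a spectrum and a retention time."""
    # Rank the ions from most to least abundant.
    ranked = sorted(zip(intensities, masses), reverse=True)
    (top_intensity, top_mass), (second_intensity, second_mass) = ranked[0], ranked[1]
    # Assumption: the ratio is stored as an integer percentage.
    ratio = int(100 * second_intensity / top_intensity)
    return f"{top_mass}-{second_mass}-{ratio}-{rt_seconds:.2f}"


# Toy spectrum chosen so that the result matches the UID shown above.
uid = make_uid_sketch(
    masses=[73, 147, 319],
    intensities=[7400.0, 1200.0, 10000.0],
    rt_seconds=1870.2,
)
```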
Modifying a Peak Object
The Peak object has methods for modifying the mass spectrum. The mass
range can be cropped to a smaller range with crop_mass()
, and the
intensity values for a single ion can be set to zero with
null_mass()
. For example, the mass range can be set from 60 to 450
\(m/z\), and the ions related to sample preparation can be ignored by
setting their intensities to zero as follows:
In [9]:
peak.crop_mass(60, 450)
peak.null_mass(73)
peak.null_mass(147)
The UID is automatically updated to reflect the changes:
In [10]:
peak.UID
'319-205-54-1870.20'
It is also possible to change the peak mass spectrum by setting the
attribute pyms.Peak.Class.Peak.mass_spectrum
.
Note
This example is in pyms-demo/jupyter/Peak.ipynb
.
Peak Detection
The general use of Peak objects is to extract them from the GC-MS data and build a list of peaks. In PyMassSpec, the function for peak detection is based on the method of Biller and Biemann (1974) 1.
The basic process is to find, for a given scan, all ions that maximise within a pre-set window of scans.
The ions that maximise at a given scan are taken to belong to the same peak.
The function is BillerBiemann(), in pyms.BillerBiemann.
The function has parameters for the window width used to detect the local maxima (points), and the number of scans across which neighbouring apexing ions are combined and considered as belonging to the same peak (scans).
The number of neighbouring scans to combine is related to the likelihood of detecting a peak apex at a single scan or several neighbouring scans.
This is more likely when there are many scans across the peak.
It is also possible, however, when there are very few scans across the peak.
The scans are combined by taking all apexing ions to have occurred at the scan that had the greatest TIC prior to combining scans.
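The first step of this process, finding the ions that maximise at each scan, can be sketched on plain Python lists. The code is only an illustration of the idea, not the BillerBiemann() implementation: the function names are hypothetical, ties and zero intensities are ignored by assumption, and the combining of neighbouring scans is left out.

```python
def local_maxima_sketch(ic, points=3):
    """Return the scan indices at which one ion chromatogram has a local maximum."""
    half = points // 2
    apexes = []
    for i in range(half, len(ic) - half):
        window = ic[i - half: i + half + 1]
        # Assumption: an apex must be the unique, non-zero maximum of its window.
        if ic[i] > 0 and ic[i] == max(window) and window.count(ic[i]) == 1:
            apexes.append(i)
    return apexes


def apexing_ions_by_scan(ion_chromatograms, points=3):
    """Group the masses by the scan at which they apex."""
    by_scan = {}
    for mass, ic in ion_chromatograms.items():
        for scan in local_maxima_sketch(ic, points):
            by_scan.setdefault(scan, []).append(mass)
    return by_scan


# Three toy ion chromatograms, keyed by mass, over eight scans.
ics = {
    73: [0, 1, 5, 9, 4, 1, 0, 0],
    147: [0, 2, 6, 8, 3, 1, 0, 0],
    205: [0, 0, 1, 2, 3, 7, 2, 0],
}
candidate_peaks = apexing_ions_by_scan(ics, points=3)
```

Here masses 73 and 147 apex together at scan 3 and would form one candidate peak, while mass 205 apexes alone at scan 5.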
Example: Peak Detection
First, set up the paths to the data files and the output directory, then import JCAMP_reader and build_intensity_matrix.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
Read the raw data file and build the IntensityMatrix
.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Preprocess the data (Savitzky-Golay smoothing and Tophat baseline correction).
In [3]:
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
Now the Biller and Biemann based technique can be applied to detect peaks.
In [4]:
from pyms.BillerBiemann import BillerBiemann
peak_list = BillerBiemann(im)
peak_list[:10]
[<pyms.Peak.Class.Peak at 0x7f8deca882b0>,
<pyms.Peak.Class.Peak at 0x7f8deca88550>,
<pyms.Peak.Class.Peak at 0x7f8dce908160>,
<pyms.Peak.Class.Peak at 0x7f8df3bc5c50>,
<pyms.Peak.Class.Peak at 0x7f8dc3a043c8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04588>,
<pyms.Peak.Class.Peak at 0x7f8dc3a045f8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04668>,
<pyms.Peak.Class.Peak at 0x7f8dc3a046d8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04748>]
In [5]:
len(peak_list)
9845
Note that this is nearly as many peaks as there are scans in the data (9865 scans). This is due to noise and the simplicity of the technique.
The number of detected peaks can be constrained by the selection of better parameters. Parameters can be determined by counting the number of points across a peak, and examining where peaks are found. For example, the peak list can be found with the parameters of a window of 9 points and by combining 2 neighbouring scans if they apex next to each other:
In [6]:
peak_list = BillerBiemann(im, points=9, scans=2)
peak_list[:10]
[<pyms.Peak.Class.Peak at 0x7f8dae545be0>,
<pyms.Peak.Class.Peak at 0x7f8dae545c18>,
<pyms.Peak.Class.Peak at 0x7f8dae545c88>,
<pyms.Peak.Class.Peak at 0x7f8dae545cf8>,
<pyms.Peak.Class.Peak at 0x7f8dae545d68>,
<pyms.Peak.Class.Peak at 0x7f8dae545dd8>,
<pyms.Peak.Class.Peak at 0x7f8dae545e48>,
<pyms.Peak.Class.Peak at 0x7f8dae545eb8>,
<pyms.Peak.Class.Peak at 0x7f8dae545f28>,
<pyms.Peak.Class.Peak at 0x7f8dae545f98>]
In [7]:
len(peak_list)
3695
The number of detected peaks has been reduced, but there are still many more than would be expected from the sample. Functions to filter the peak list are covered in the next example.
Example: Peak List Filtering
There are two functions to filter the list of Peak objects.
The first, rel_threshold(), modifies the mass spectrum stored in each
peak so any intensity that is less than a given percentage of the
maximum intensity for the peak is removed.
The second, num_ions_threshold(), removes any peak that has fewer than
a given number of ions above a given threshold.
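The logic of the two filters can be sketched on a spectrum stored as a plain dictionary mapping mass to intensity. The helper names below are hypothetical and the details are assumptions made for illustration: here "removed" means that the entry is dropped from the dictionary, and an ion counts towards the threshold when its intensity is at least the cutoff. The real functions operate on Peak objects.

```python
def rel_threshold_sketch(spectrum, percent=2):
    """Drop ions weaker than `percent` of the most intense ion."""
    cutoff = max(spectrum.values()) * percent / 100
    return {mass: inten for mass, inten in spectrum.items() if inten >= cutoff}


def has_enough_ions(spectrum, n, cutoff):
    """True if at least `n` ions reach the absolute intensity `cutoff`."""
    return sum(1 for inten in spectrum.values() if inten >= cutoff) >= n


spectrum = {73: 10000, 147: 150, 205: 4000, 319: 12000}

# 2% of the base peak (12000) is 240, so the ion at mass 147 is dropped.
trimmed = rel_threshold_sketch(spectrum, percent=2)

keep_low_cutoff = has_enough_ions(trimmed, n=3, cutoff=3000)
keep_high_cutoff = has_enough_ions(trimmed, n=3, cutoff=5000)
```

The same peak is kept with a cutoff of 3000 but rejected with a cutoff of 5000, which shows how strongly the outcome depends on the chosen cutoff.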
Once the peak list has been constructed, the filters can be applied as follows:
In [8]:
from pyms.BillerBiemann import rel_threshold, num_ions_threshold
pl = rel_threshold(peak_list, percent=2)
pl[:10]
[<pyms.Peak.Class.Peak at 0x7f8dc3a045f8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04630>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04748>,
<pyms.Peak.Class.Peak at 0x7f8dc3a047b8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a048d0>,
<pyms.Peak.Class.Peak at 0x7f8dc3a049e8>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04a20>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04b38>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04b70>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04c88>]
In [9]:
new_peak_list = num_ions_threshold(pl, n=3, cutoff=10000)
new_peak_list[:10]
[<pyms.Peak.Class.Peak at 0x7f8deca8e128>,
<pyms.Peak.Class.Peak at 0x7f8deca8e1d0>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04780>,
<pyms.Peak.Class.Peak at 0x7f8dc3a04550>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf3c8>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf048>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf4a8>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf550>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf5f8>,
<pyms.Peak.Class.Peak at 0x7f8dbb3cf6a0>]
In [10]:
len(new_peak_list)
146
The number of detected peaks is now closer to what would be expected in the test sample.
Note
This example is in pyms-demo/jupyter/Peak_Detection.ipynb
.
Noise analysis for peak filtering
In the previous example the cutoff parameter for peak filtering was set by the user. This can work well for individual data files, but can cause problems when applied to large experiments with many individual data files. Where experimental conditions have changed slightly between experimental runs, the ion intensity over the GC-MS run may also change. This means that an inflexible cutoff value can work for some data files, while excluding too many, or including too many peaks in other files.
An alternative to manually setting the value for cutoff is to use the
window_analyzer()
function. This function examines a Total Ion
Chromatogram (TIC) and computes a value for the median absolute
deviation in troughs between peaks. This gives an approximate threshold
value above which false peaks from noise should be filtered out.
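The idea of estimating noise from the quietest part of a chromatogram can be illustrated with the standard library alone. The sketch below is not how window_analyzer() is implemented: splitting the signal into consecutive fixed-size windows and taking the smallest median absolute deviation (MAD) is an assumption, and the function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence of intensities."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def quietest_window_mad(signal, window):
    """Smallest MAD over consecutive, non-overlapping windows of the signal."""
    starts = range(0, len(signal) - window + 1, window)
    return min(mad(signal[s: s + window]) for s in starts)


# Toy TIC: low-level noise, one large peak, then noise again.
toy_tic = [10, 12, 11, 10, 500, 900, 450, 12, 11, 10, 12, 11]
noise_estimate = quietest_window_mad(toy_tic, window=4)
```

The window that contains the peak has a large MAD, so the minimum is taken from a trough and reflects the noise rather than the signal.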
First, build the Peak list as before
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
peak_list = BillerBiemann(im, points=9, scans=2)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Compute the noise value.
In [2]:
from pyms.Noise.Analysis import window_analyzer
tic = data.tic
noise_level = window_analyzer(tic)
noise_level
432.1719792438844
Filter the Peak List using this noise value as the cutoff.
In [3]:
from pyms.BillerBiemann import num_ions_threshold
filtered_peak_list = num_ions_threshold(peak_list, n=3, cutoff=noise_level)
filtered_peak_list[:10]
[<pyms.Peak.Class.Peak at 0x7f4a9864f128>,
<pyms.Peak.Class.Peak at 0x7f4a9864f2e8>,
<pyms.Peak.Class.Peak at 0x7f4a9864f320>,
<pyms.Peak.Class.Peak at 0x7f4a9864f3c8>,
<pyms.Peak.Class.Peak at 0x7f4a9864f518>,
<pyms.Peak.Class.Peak at 0x7f4a9864f4a8>,
<pyms.Peak.Class.Peak at 0x7f4a9864f6a0>,
<pyms.Peak.Class.Peak at 0x7f4a9864f748>,
<pyms.Peak.Class.Peak at 0x7f4a9864f7f0>,
<pyms.Peak.Class.Peak at 0x7f4a9864f898>]
In [4]:
len(filtered_peak_list)
612
Note
This example is in pyms-demo/jupyter/Peak_Filtering_Noise_Analysis.ipynb
.
Peak Area Estimation
The Peak
object does not contain any information about the width or
area of the peak when it is first created. This information can be added
after the instantiation of a Peak object. The area of the peak can be
set with the attribute area
.
The total peak area can be obtained by the peak_sum_area()
function
in pyms.Peak.Function
. The function determines the total area as the
sum of the ion intensities for all masses that apex at the given peak.
To calculate the peak area of a single mass, the intensities are added
from the apex of the mass peak outwards.
Edge values are added until either of the following conditions is met:
the added intensity adds less than 0.5% to the accumulated area; or
the added intensity starts increasing (i.e. when the ion is common to co-eluting compounds).
To avoid noise effects, the edge value is taken at the midpoint of three consecutive edge values.
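The stopping rules for a single mass can be sketched as follows. This is a simplified illustration rather than the code in pyms.Peak.Function: the function name is hypothetical, intensities are summed without any scaling by scan time, and the three-point midpoint smoothing of the edge value is omitted.

```python
def ion_area_sketch(ic, apex, tol_percent=0.5):
    """Sum intensities outwards from the apex until a stopping rule is hit."""
    area = float(ic[apex])
    for step in (-1, 1):  # walk to the left of the apex, then to the right
        previous = ic[apex]
        i = apex + step
        while 0 <= i < len(ic):
            value = ic[i]
            if value > previous:
                # Intensity starts increasing: likely a co-eluting compound.
                break
            if value < area * tol_percent / 100:
                # The point would add less than 0.5% to the accumulated area.
                break
            area += value
            previous = value
            i += step
    return area


# Toy ion chromatogram: the peak apexes at index 4, a second peak starts at index 7.
toy_ic = [0, 1, 20, 60, 100, 50, 10, 30, 80]
area = ion_area_sketch(toy_ic, apex=4)
```

On the left side the walk stops at the zero-intensity scan, and on the right side it stops as soon as the intensity rises again, so the neighbouring peak does not contribute to the area.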
First, build the Peak list as before
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
im = build_intensity_matrix(data)
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
peak_list = BillerBiemann(im, points=9, scans=2)
from pyms.Noise.Analysis import window_analyzer
tic = data.tic
noise_level = window_analyzer(tic)
from pyms.BillerBiemann import num_ions_threshold
filtered_peak_list = num_ions_threshold(peak_list, n=3, cutoff=noise_level)
filtered_peak_list[:10]
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
[<pyms.Peak.Class.Peak at 0x7fa8eae80198>,
<pyms.Peak.Class.Peak at 0x7fa8eae80208>,
<pyms.Peak.Class.Peak at 0x7fa8eae802b0>,
<pyms.Peak.Class.Peak at 0x7fa8eae80358>,
<pyms.Peak.Class.Peak at 0x7fa8eae80400>,
<pyms.Peak.Class.Peak at 0x7fa8eae804a8>,
<pyms.Peak.Class.Peak at 0x7fa8eae80550>,
<pyms.Peak.Class.Peak at 0x7fa8eae805f8>,
<pyms.Peak.Class.Peak at 0x7fa8eae806a0>,
<pyms.Peak.Class.Peak at 0x7fa8eae80748>]
Given a list of peaks, areas can be determined and added as follows:
In [2]:
from pyms.Peak.Function import peak_sum_area
for peak in peak_list:
    area = peak_sum_area(im, peak)
    peak.area = area
Note
This example is in pyms-demo/jupyter/Peak_Area_Estimation.ipynb
.
Individual Ion Areas
Note
This example is in pyms-demo/56
While the previous approach uses the sum of all areas in the peak to estimate the peak area, the user may also choose to record the area of each individual ion in each peak.
This can be useful when the intention is to later perform quantitation based on the area of a single characteristic ion for a particular compound. It is also essential if using the Common Ion Algorithm for quantitation, outlined in the section Common Ion Area Quantitation.
To set the area of each ion for each peak, the following code is used:
>>> from pyms.Peak.Function import peak_top_ion_areas
>>> for peak in peak_list:
...     area_dict = peak_top_ion_areas(intensity_matrix, peak)
...     peak.ion_areas = area_dict
...
This will set the areas of the 5 most abundant ions in each peak.
If it is desired to record more than the top five ions, the argument num_ions=x
should be supplied, where x
is the number of most abundant ions to be recorded.
For example:
...     area_dict = peak_top_ion_areas(intensity_matrix, peak, num_ions=10)
will record the 10 most abundant ions for each peak.
The individual ion areas can be set instead of, or in addition to the total area for each peak.
Reading the area of a single ion in a peak
If the individual ion areas have been set for a peak, it is possible to read the area of an individual ion for the peak. For example:
>>> peak.get_ion_area(101)
will return the area of the \(m/z\) value 101 for the peak.
If the area of that ion has not been set (i.e. it was not one of the most abundant ions), the function will return None
.
References
- 1
Biller JE and Biemann K. Reconstructed mass spectra, a novel approach for the utilization of gas chromatograph–mass spectrometer data. Anal. Lett., 7:515–528, 1974
Peak alignment by dynamic programming
PyMassSpec provides functions to align GC-MS peaks by dynamic programming 1. Peak alignment by dynamic programming uses both the peak apex retention time and the mass spectrum. This information is determined from the raw GC-MS data by applying a series of processing steps to produce data that can then be aligned and used for statistical analysis. The details are described in this chapter.
Preparation of multiple experiments for peak alignment by dynamic programming
Example: Creating an Experiment
Before aligning peaks from multiple experiments, the peak objects need
to be created and encapsulated into Experiment
objects. During this
process it is often useful to pre-process the peaks in some way, for
example to null certain m/z channels and/or to select a certain
retention time range.
The procedure starts the same as in the previous examples, namely:
read a file,
bin the data into fixed mass values,
smooth the data,
remove the baseline,
deconvolute peaks,
filter the peaks,
set the mass range,
remove uninformative ions, and
estimate peak areas.
First, set up the paths to the data files and the output directory, then import ANDI_reader and build_intensity_matrix_i.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
Read the raw data file and build the IntensityMatrix
.
In [2]:
andi_file = data_directory / "a0806_077.cdf"
data = ANDI_reader(andi_file)
im = build_intensity_matrix_i(data)
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/a0806_077.cdf'
Preprocess the data (Savitzky-Golay smoothing and Tophat baseline correction)
In [3]:
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic1 = savitzky_golay(ic)
    ic_smooth = savitzky_golay(ic1) # Why the second pass here?
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
Now the Biller and Biemann based technique can be applied to detect peaks.
In [4]:
from pyms.BillerBiemann import BillerBiemann
pl = BillerBiemann(im, points=9, scans=2)
len(pl)
1191
Trim the peak list by relative intensity
In [5]:
from pyms.BillerBiemann import rel_threshold, num_ions_threshold
apl = rel_threshold(pl, percent=2)
len(apl)
1191
Trim the peak list by noise threshold
In [6]:
peak_list = num_ions_threshold(apl, n=3, cutoff=3000)
len(peak_list)
225
Set the mass range, remove unwanted ions and estimate the peak area
In [7]:
from pyms.Peak.Function import peak_sum_area
for peak in peak_list:
    peak.crop_mass(51, 540)
    peak.null_mass(73)
    peak.null_mass(147)
    area = peak_sum_area(im, peak)
    peak.area = area
Create an Experiment
.
In [8]:
from pyms.Experiment import Experiment
expr = Experiment("a0806_077", peak_list)
Set the time range for all Experiments
In [9]:
expr.sele_rt_range(["6.5m", "21m"])
Save the experiment to disk.
In [10]:
expr.dump(output_directory / "experiments" / "a0806_077.expr")
Note
This example is in pyms-demo/jupyter/Experiment.ipynb
.
Example: Creating Multiple Experiments
In this example, three GC-MS experiments are prepared for peak alignment. The
experiments are named a0806_077
, a0806_078
, a0806_079
, and
represent separate GC-MS sample runs from the same biological sample.
The procedure is the same as for the previous example, but is repeated three times.
First, set up the paths to the data files and the output directory, then import the required functions.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.BillerBiemann import BillerBiemann, num_ions_threshold, rel_threshold
from pyms.Experiment import Experiment
from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.Peak.Function import peak_sum_area, peak_top_ion_areas
from pyms.TopHat import tophat
Define the data files to process
In [2]:
expr_codes = ["a0806_077", "a0806_078", "a0806_079"]
# expr_codes = ["a0806_140", "a0806_141", "a0806_142"]
Loop over the experiments and perform the processing.
In [3]:
for expr_code in expr_codes:
    print(f" -> Processing experiment '{expr_code}'")
    andi_file = data_directory / f"{expr_code}.cdf"
    data = ANDI_reader(andi_file)
    im = build_intensity_matrix_i(data)
    n_scan, n_mz = im.size
    # Preprocess the data (Savitzky-Golay smoothing and Tophat baseline detection)
    for ii in range(n_mz):
        ic = im.get_ic_at_index(ii)
        ic1 = savitzky_golay(ic)
        ic_smooth = savitzky_golay(ic1) # Why the second pass here?
        ic_bc = tophat(ic_smooth, struct="1.5m")
        im.set_ic_at_index(ii, ic_bc)
    # Peak detection
    pl = BillerBiemann(im, points=9, scans=2)
    # Trim the peak list by relative intensity
    apl = rel_threshold(pl, percent=2)
    # Trim the peak list by noise threshold
    peak_list = num_ions_threshold(apl, n=3, cutoff=3000)
    print("\t -> Number of Peaks found:", len(peak_list))
    print("\t -> Executing peak post-processing and quantification...")
    # Set the mass range, remove unwanted ions and estimate the peak area
    # For peak alignment, all experiments must have the same mass range
    for peak in peak_list:
        peak.crop_mass(51, 540)
        peak.null_mass(73)
        peak.null_mass(147)
        area = peak_sum_area(im, peak)
        peak.area = area
        area_dict = peak_top_ion_areas(im, peak)
        peak.ion_areas = area_dict
    # Create an Experiment
    expr = Experiment(expr_code, peak_list)
    # Use the same retention time range for all experiments
    lo_rt_limit = "6.5m"
    hi_rt_limit = "21m"
    print(f"\t -> Selecting retention time range between '{lo_rt_limit}' and '{hi_rt_limit}'")
    expr.sele_rt_range([lo_rt_limit, hi_rt_limit])
    # Save the experiment to disk.
    output_file = output_directory / "experiments" / f"{expr_code}.expr"
    print(f"\t -> Saving the result as '{output_file}'")
    expr.dump(output_file)
-> Processing experiment 'a0806_077'
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/a0806_077.cdf'
-> Number of Peaks found: 225
-> Executing peak post-processing and quantification...
-> Selecting retention time range between '6.5m' and '21m'
-> Saving the result as '/home/vagrant/PyMassSpec/pyms-demo/jupyter/output/experiments/a0806_077.expr'
-> Processing experiment 'a0806_078'
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/a0806_078.cdf'
-> Number of Peaks found: 238
-> Executing peak post-processing and quantification...
-> Selecting retention time range between '6.5m' and '21m'
-> Saving the result as '/home/vagrant/PyMassSpec/pyms-demo/jupyter/output/experiments/a0806_078.expr'
-> Processing experiment 'a0806_079'
-> Reading netCDF file '/home/vagrant/PyMassSpec/pyms-data/a0806_079.cdf'
-> Number of Peaks found: 268
-> Executing peak post-processing and quantification...
-> Selecting retention time range between '6.5m' and '21m'
-> Saving the result as '/home/vagrant/PyMassSpec/pyms-demo/jupyter/output/experiments/a0806_079.expr'
The previous set of data all belong to the same experimental condition. That is, they represent one group and any comparison between the data is a within group comparison. For the original experiment, another set of GC-MS data was collected for a different experimental condition. This group must also be stored as a set of experiments, and can be used for between group comparison.
The second set of data files are named a0806_140
, a0806_141
, and
a0806_142
, and are processed and stored as above.
In the example notebook, you can uncomment the line in code cell 2 and run the notebook again to process the second set of data files.
Note
This example is in pyms-demo/jupyter/Multiple_Experiments.ipynb
.
Dynamic Programming Alignment
Example: Within-state alignment of peak lists from multiple experiments
In this example the experiments a0806_077
, a0806_078
, and
a0806_079
prepared in the previous example will be aligned, and
therefore the notebook Multiple_Experiments.ipynb
must be run first
to create the files a0806_077.expr
, a0806_078.expr
,
a0806_079.expr
. These files contain the post-processed peak lists
from the three experiments.
First, determine the directory to the experiment files and import the required functions.
In [1]:
import pathlib
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.DPA.PairwiseAlignment import PairwiseAlignment, align_with_tree
from pyms.DPA.Alignment import exprl2alignment
from pyms.Experiment import load_expr
Define the input experiments list.
In [2]:
exprA_codes = ["a0806_077", "a0806_078", "a0806_079"]
Read the experiment files from disk and create a list of the loaded
Experiment
objects.
In [3]:
expr_list = []
for expr_code in exprA_codes:
    file_name = output_directory / "experiments" / f"{expr_code}.expr"
    expr = load_expr(file_name)
    expr_list.append(expr)
Define the within-state alignment parameters.
In [4]:
Dw = 2.5 # rt modulation [s]
Gw = 0.30 # gap penalty
Convert each Experiment object into an Alignment
object with the function exprl2alignment().
In [5]:
F1 = exprl2alignment(expr_list)
In this example, there is only one experimental condition so the
alignment object is only for within group alignment (this special case
is called 1-alignment). The variable F1
is a Python list containing
three alignment objects.
Perform pairwise alignment. The class pyms.DPA.PairwiseAlignment.PairwiseAlignment calculates the similarity between all peaks in one sample and those of another sample. This is done for all possible pairwise alignments (2-alignments).
In [6]:
T1 = PairwiseAlignment(F1, Dw, Gw)
Calculating pairwise alignments for 3 alignments (D=2.50, gap=0.30)
-> 2 pairs remaining
-> 1 pairs remaining
-> 0 pairs remaining
-> Clustering 6 pairwise alignments.Done
The parameters for the alignment by dynamic programming are: Dw
, the
retention time modulation in seconds; and Gw
, the gap penalty. These
parameters are explained in detail in 1.
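The roles of the two parameters can be illustrated with a small dynamic programming sketch. It is not the PyMassSpec implementation. The similarity function (cosine similarity of the mass spectra multiplied by a Gaussian in the retention time difference with width D), the match cost of one minus the similarity, and the names similarity_sketch and align_sketch are all assumptions chosen for illustration; the scoring actually used is described in reference 1.

```python
import math


def similarity_sketch(peak_a, peak_b, D):
    """Spectral cosine similarity, down-weighted by the retention time difference."""
    (rt_a, ms_a), (rt_b, ms_b) = peak_a, peak_b
    dot = sum(x * y for x, y in zip(ms_a, ms_b))
    norm = math.sqrt(sum(x * x for x in ms_a)) * math.sqrt(sum(y * y for y in ms_b))
    cosine = dot / norm if norm else 0.0
    return cosine * math.exp(-((rt_a - rt_b) ** 2) / (2 * D ** 2))


def align_sketch(peaks_a, peaks_b, D=2.5, gap=0.30):
    """Return the index pairs matched by a minimum-cost global alignment."""
    n, m = len(peaks_a), len(peaks_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + 1 - similarity_sketch(peaks_a[i - 1], peaks_b[j - 1], D)
            cost[i][j] = min(match, cost[i - 1][j] + gap, cost[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = cost[i - 1][j - 1] + 1 - similarity_sketch(peaks_a[i - 1], peaks_b[j - 1], D)
        if math.isclose(cost[i][j], match):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(cost[i][j], cost[i - 1][j] + gap):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Each peak is (retention time in seconds, toy three-channel mass spectrum).
run_a = [(100.0, [1, 0, 0]), (200.0, [0, 1, 0]), (300.0, [0, 0, 1])]
run_b = [(100.5, [1, 0, 0]), (300.4, [0, 0, 1])]
matched = align_sketch(run_a, run_b, D=2.5, gap=0.30)
```

In this sketch each unmatched peak costs the gap penalty, so two peaks are paired only when doing so is cheaper than leaving both unmatched. The middle peak of run_a has no counterpart in run_b and is left as a gap.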
The output of PairwiseAlignment
(T1
) is an object which contains
the dendrogram tree that maps the similarity relationship between the
input 1-alignments, and also 1-alignments themselves.
The function align_with_tree()
then takes the object T1
and
aligns the individual alignment objects according to the guide tree.
In [7]:
A1 = align_with_tree(T1, min_peaks=2)
Aligning 3 items with guide tree (D=2.50, gap=0.30)
-> 1 item(s) remaining
-> 0 item(s) remaining
In this example, the individual alignments are three 1-alignments, and
the function align_with_tree()
first creates a 2-alignment from the
two most similar 1-alignments and then adds the third 1-alignment to
this to create a 3-alignment.
The parameter min_peaks=2
specifies that any peak column of the data
matrix that has fewer than two peaks in the final alignment will be
dropped. This is useful to clean up the data matrix of accidental peaks
that are not truly observed over the set of replicates.
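The effect of min_peaks can be pictured with a small stand-alone sketch. It assumes the alignment is held as a list of rows (one per experiment) with None where no peak was aligned; this layout and the helper name drop_sparse_positions are illustrative assumptions, not the internal representation used by PyMassSpec.

```python
def drop_sparse_positions(matrix, min_peaks):
    """Keep only the peak positions (columns) holding at least ``min_peaks`` peaks.

    ``matrix`` is a list of rows, one per experiment; ``None`` marks a
    position where no peak was aligned for that experiment (assumed layout).
    """
    n_positions = len(matrix[0]) if matrix else 0
    keep = [
        col for col in range(n_positions)
        if sum(row[col] is not None for row in matrix) >= min_peaks
    ]
    return [[row[col] for col in keep] for row in matrix]


# Three replicates, four peak positions (the areas are made-up numbers).
areas = [
    [1200.0, None, 560.0, 75.0],
    [1180.0, None, 610.0, None],
    [1250.0, 90.0, None, None],
]
filtered = drop_sparse_positions(areas, min_peaks=2)
print(filtered)
```

Positions two and four each contain a single peak, so they are dropped; the remaining positions are seen in at least two of the three replicates.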
Finally, the resulting 3-alignment is saved by writing alignment tables containing peak retention times (a_rt.csv) and the corresponding peak areas (a_area.csv). These are plain ASCII files in CSV format.
In [8]:
A1.write_csv(
output_directory / "within_state_alignment" / 'a_rt.csv',
output_directory / "within_state_alignment" / 'a_area.csv',
)
The file a_area.csv contains the data matrix where the corresponding peaks are aligned in the columns and each row corresponds to an experiment. The file a_rt.csv is useful for manually inspecting the alignment.
Example: Between-state alignment of peak lists from multiple experiments
In the previous example the lists of peaks were aligned within a single experiment with multiple replicates (“within-state alignment”). In practice, it is of more interest to compare two experimental states.
In a typical experimental setup there can be multiple replicate experiments on each experimental state or condition. To analyze the results of such an experiment statistically, the lists of peaks need to be aligned within each experimental state and also between the states. The result of such an alignment would be the data matrix of integrated peak areas. The data matrix contains a row for each sample and the number of columns is determined by the number of unique peaks (metabolites) detected in all the experiments.
In principle, all experiments could be aligned across conditions and replicates in a single process. However, a more robust approach is to first align experiments within each set of replicates (within-state alignment), and then to align the resulting alignments (between-state alignment) [1].
This example demonstrates how the peak lists from two cell states are aligned.
- Cell state A, consisting of three aligned experiments (a0806_077, a0806_078, and a0806_079), and
- Cell state B, consisting of three aligned experiments (a0806_140, a0806_141, and a0806_142).
These experiments were created in the notebook Multiple_Experiments.ipynb.
First, perform within-state alignment for cell state B.
In [9]:
exprB_codes = ["a0806_140", "a0806_141", "a0806_142"]
expr_list = []
for expr_code in exprB_codes:
    file_name = output_directory / "experiments" / f"{expr_code}.expr"
    expr = load_expr(file_name)
    expr_list.append(expr)
F2 = exprl2alignment(expr_list)
T2 = PairwiseAlignment(F2, Dw, Gw)
A2 = align_with_tree(T2, min_peaks=2)
A2.write_csv(
output_directory / "within_state_alignment" / 'b_rt.csv',
output_directory / "within_state_alignment" / 'b_area.csv',
)
Calculating pairwise alignments for 3 alignments (D=2.50, gap=0.30)
-> 2 pairs remaining
-> 1 pairs remaining
-> 0 pairs remaining
-> Clustering 6 pairwise alignments.Done
Aligning 3 items with guide tree (D=2.50, gap=0.30)
-> 1 item(s) remaining
-> 0 item(s) remaining
A1 and A2 are the results of the within-group alignments for cell states A and B, respectively. The between-state alignment can be performed with the following alignment commands:
In [10]:
# Define the between-state alignment parameters.
Db = 10.0 # rt modulation
Gb = 0.30 # gap penalty
T9 = PairwiseAlignment([A1,A2], Db, Gb)
A9 = align_with_tree(T9)
A9.write_csv(
output_directory / "between_state_alignment" / 'rt.csv',
output_directory / "between_state_alignment" / 'area.csv')
Calculating pairwise alignments for 2 alignments (D=10.00, gap=0.30)
-> 0 pairs remaining
-> Clustering 2 pairwise alignments.Done
Aligning 2 items with guide tree (D=10.00, gap=0.30)
-> 0 item(s) remaining
Store the aligned peaks to disk.
In [11]:
from pyms.Peak.List.IO import store_peaks
aligned_peaks = A9.aligned_peaks()
store_peaks(aligned_peaks, output_directory / "between_state_alignment" / 'peaks.bin')
In this example the retention time tolerance for between-state alignment is greater than that for within-state alignment, as we expect less fidelity in retention times between states than between replicates. The same functions are used for within-state and between-state alignment. The result of the alignment is saved to file as the area and retention time matrices (described above).
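To get a feel for what a larger retention time tolerance means, the sketch below compares how strongly a 5 second shift is penalised for the two values used above. It assumes a Gaussian weighting of the form exp(-dt^2 / (2 D^2)); that functional form, the function name rt_weight and the 5 second shift are assumptions made for illustration and are not taken from the PyMassSpec source.

```python
import math


def rt_weight(rt_a, rt_b, modulation):
    """Gaussian down-weighting of a peak pair by retention time difference.

    ASSUMPTION: weight = exp(-(dt^2) / (2 * D^2)). This is an illustrative
    form only, not a statement of how PyMassSpec scores peak pairs.
    """
    delta = rt_a - rt_b
    return math.exp(-(delta ** 2) / (2.0 * modulation ** 2))


Dw, Db = 2.5, 10.0  # within-state and between-state values used above
shift = 5.0         # hypothetical retention time difference, in seconds

within = rt_weight(100.0, 100.0 + shift, Dw)
between = rt_weight(100.0, 100.0 + shift, Db)
print(f"weight with D={Dw}: {within:.3f}")
print(f"weight with D={Db}: {between:.3f}")
```

Under this assumption the same shift is penalised far less with the between-state value, which is the intended effect of loosening the tolerance.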
Note
These examples are in pyms-demo/jupyter/DPA.ipynb
.
Common Ion Area Quantitation
Note
This example is in pyms-demo/64
The area.csv
file produced in the preceding section lists the total area of each peak in the alignment.
The total area is the sum of the areas of each of the individual ions in the peak.
While this approach produces broadly accurate results, it can result in errors where neighbouring peaks or unfiltered noise add to the peak in some way.
One alternative to this approach is to pick a single ion which is common to a particular peak (compound), and to report only the area of this ion for each occurrence of that peak in the alignment.
Using the method common_ion()
of the class Alignment
, PyMassSpec
can select an ion for each aligned peak which is both abundant and occurs most often for that peak.
We call this the ‘Common Ion Algorithm’ (CIA).
To use this method it is essential that the individual ion areas have been set (see section Individual Ion Areas).
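The idea of an ion that is "both abundant and occurs most often" can be sketched without PyMassSpec. The version below assumes each aligned peak is represented by one {m/z: intensity} dict per experiment, counts how often each ion appears among the top_n most intense ions, and breaks ties by total intensity. The data layout, the tie-breaking rule and the name pick_common_ion are all assumptions; the library's own selection logic may differ.

```python
from collections import Counter


def pick_common_ion(spectra, top_n=5):
    """Pick the ion that appears most often among each peak's top ions.

    ``spectra`` holds one {m/z: intensity} dict per experiment for the same
    aligned peak. Ties are broken by total intensity (an assumption).
    """
    counts = Counter()
    totals = Counter()
    for spectrum in spectra:
        # The top_n most intense ions of this experiment's peak.
        top_ions = sorted(spectrum, key=spectrum.get, reverse=True)[:top_n]
        for mz in top_ions:
            counts[mz] += 1
            totals[mz] += spectrum[mz]
    return max(counts, key=lambda mz: (counts[mz], totals[mz]))


aligned_peak = [
    {73: 900.0, 147: 400.0, 218: 350.0},
    {73: 850.0, 147: 500.0, 159: 120.0},
    {73: 40.0, 147: 450.0, 218: 300.0},
]
common = pick_common_ion(aligned_peak, top_n=2)
print(common)
```

Here m/z 73 is the most intense ion in two experiments but falls out of the top two in the third, so m/z 147, which is present in all three, is chosen.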
Using the Common Ion Algorithm
When using the CIA for area quantitation, a different method of the class Alignment is used to write the area matrix: write_common_ion_csv().
This requires a list of the common ions for each peak in the alignment.
This list is generated using the Alignment class method common_ion().
Continuing from the previous example, the following invokes common ion filtering on the previously created alignment object ‘A9’:
>>> common_ion_list = A9.common_ion()
The variable ‘common_ion_list’ is a list of the common ion for each peak in the alignment. This list is the same length as the alignment. To write peak areas using common ion quantitation:
>>> A9.write_common_ion_csv('output/area_common_ion.csv',common_ion_list)
The Display Module
PyMassSpec has graphical capabilities to display information such as
IonChromatogram
objects (ICs),
Total Ion Chromatograms (TICs), and detected lists of Peaks.
Example: Displaying a TIC
First, set up the paths to the datafiles and the output directory, then import JCAMP_reader.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
Read the raw data files and extract the TIC
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
tic = data.tic
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Import matplotlib and the plot_ic()
function, create a subplot, and
plot the TIC:
In [3]:
import matplotlib.pyplot as plt
from pyms.Display import plot_ic
%matplotlib inline
# Change to ``notebook`` for an interactive view
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
# Plot the TIC
plot_ic(ax, tic, label="TIC")
# Set the title
ax.set_title("TIC for gc01_0812_066")
# Add the legend
plt.legend()
plt.show()

In addition to the TIC, other arguments may be passed to plot_ic()
.
These can adjust the line colour or the text of the legend entry. See
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html
for a full list of the possible arguments.
An IonChromatogram
can be plotted in the same manner as the TIC in
the example above.
When not running in Jupyter Notebook, the plot may appear in a separate window looking like this:

Graphics window displayed by the script 70a/proc.py
Note
This example is in pyms-demo/jupyter/Displaying_TIC.ipynb
and pyms-demo/70a/proc.py
.
Example: Displaying Multiple IonChromatogram Objects
Multiple IonChromatogram
objects can be plotted on the same figure.
To start, load a datafile and create an IntensityMatrix
as before.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
tic = data.tic
im = build_intensity_matrix_i(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Extract the desired IonChromatograms from the IntensityMatrix
.
In [2]:
ic73 = im.get_ic_at_mass(73)
ic147 = im.get_ic_at_mass(147)
Import matplotlib and the plot_ic()
function, create a subplot, and
plot the ICs on the chart:
In [3]:
import matplotlib.pyplot as plt
from pyms.Display import plot_ic
%matplotlib inline
# Change to ``notebook`` for an interactive view
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
# Plot the ICs
plot_ic(ax, tic, label="TIC")
plot_ic(ax, ic73, label="m/z 73")
plot_ic(ax, ic147, label="m/z 147")
# Set the title
ax.set_title("TIC and ICs for m/z = 73 & 147")
# Add the legend
plt.legend()
plt.show()

When not running in Jupyter Notebook, the plot may appear in a separate window looking like this:

Graphics window displayed by the script 70b/proc.py
Note
This example is in pyms-demo/jupyter/Displaying_Multiple_IC.ipynb
and pyms-demo/70b/proc.py
.
Example: Displaying a Mass Spectrum
The pyms Display module can also be used to display individual mass spectra.
To start, load a datafile and create an IntensityMatrix
as before.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
tic = data.tic
im = build_intensity_matrix_i(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Extract the desired MassSpectrum
from the IntensityMatrix
.
In [2]:
ms = im.get_ms_at_index(1024)
Import matplotlib and the plot_mass_spec() function, create a subplot, and plot the spectrum on the chart:
In [3]:
import matplotlib.pyplot as plt
from pyms.Display import plot_mass_spec
%matplotlib inline
# Change to ``notebook`` for an interactive view
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
# Plot the spectrum
plot_mass_spec(ax, ms)
# Set the title
ax.set_title("Mass Spectrum at index 1024")
# Reduce the x-axis range to better visualise the data
ax.set_xlim(50, 350)
plt.show()

When not running in Jupyter Notebook, the spectrum may appear in a separate window looking like this:

Graphics window displayed by the script 70c/proc.py
Note
This example is in pyms-demo/jupyter/Displaying_Mass_Spec.ipynb
and pyms-demo/70c/proc.py
.
Example: Displaying Detected Peaks
The pyms.Display module also allows detected peaks to be marked on a TIC plot.
First, set up the paths to the datafiles and the output directory, then import JCAMP_reader and build_intensity_matrix.
In [1]:
import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
Read the raw data files, extract the TIC and build the
IntensityMatrix
.
In [2]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
data.trim("500s", "2000s")
tic = data.tic
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Trimming data to between 520 and 4517 scans
Perform pre-filtering and peak detection. For more information on detecting peaks see “Peak detection and representation” (chapter06.html).
In [3]:
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
# Detect Peaks
peak_list = BillerBiemann(im, points=9, scans=2)
print("Number of peaks found: ", len(peak_list))
# Filter the peak list, first by removing all intensities in a peak less than a
# given relative threshold, then by removing all peaks that have less than a
# given number of ions above a given value
pl = rel_threshold(peak_list, percent=2)
new_peak_list = num_ions_threshold(pl, n=3, cutoff=10000)
print("Number of filtered peaks: ", len(new_peak_list))
Number of peaks found: 1467
Number of filtered peaks: 72
Get Ion Chromatograms for 4 separate m/z channels.
In [4]:
ic191 = im.get_ic_at_mass(191)
ic73 = im.get_ic_at_mass(73)
ic57 = im.get_ic_at_mass(57)
ic55 = im.get_ic_at_mass(55)
Import matplotlib, and the plot_ic()
and plot_peaks()
functions.
In [5]:
import matplotlib.pyplot as plt
from pyms.Display import plot_ic, plot_peaks
Create a subplot, and plot the TIC, the ICs and the detected peaks.
In [6]:
%matplotlib inline
# Change to ``notebook`` for an interactive view
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
# Plot the ICs
plot_ic(ax, tic, label="TIC")
plot_ic(ax, ic191, label="m/z 191")
plot_ic(ax, ic73, label="m/z 73")
plot_ic(ax, ic57, label="m/z 57")
plot_ic(ax, ic55, label="m/z 55")
# Plot the peaks
plot_peaks(ax, new_peak_list)
# Set the title
ax.set_title('TIC, ICs, and PyMS Detected Peaks')
# Add the legend
plt.legend()
plt.show()

The function plot_peaks()
adds the PyMassSpec detected peaks to the
figure.
The function store_peaks()
in proc_save_peaks.py
stores the peaks, while
load_peaks()
in proc.py
loads them for the Display class to use.
When not running in Jupyter Notebook, the plot may appear in a separate window looking like this:

Graphics window displayed by the script 71/proc.py
Note
This example is in pyms-demo/jupyter/Displaying_Detected_Peaks.ipynb
and pyms-demo/71/proc.py
.
Example: User Interaction With The Plot Window
The class pyms.Display.ClickEventHandler
allows for additional interaction with
the plot on top of that provided by matplotlib
.
Note: This may not work in Jupyter Notebook
To use the class, first import and process the data as before:
In [1]:
import pathlib
import matplotlib.pyplot as plt
from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Display import plot_ic, plot_peaks
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold
In [2]:
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location
output_directory = pathlib.Path(".").resolve() / "output"
In [3]:
jcamp_file = data_directory / "gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)
data.trim("500s", "2000s")
tic = data.tic
im = build_intensity_matrix(data)
-> Reading JCAMP file '/home/vagrant/PyMassSpec/pyms-data/gc01_0812_066.jdx'
Trimming data to between 520 and 4517 scans
In [4]:
n_scan, n_mz = im.size
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)
In [5]:
peak_list = BillerBiemann(im, points=9, scans=2)
pl = rel_threshold(peak_list, percent=2)
new_peak_list = num_ions_threshold(pl, n=3, cutoff=10000)
print("Number of filtered peaks: ", len(new_peak_list))
Number of filtered peaks: 72
Creating the plot proceeds much as before, except that
pyms.Display.ClickEventHandler
must be called before
plt.show()
.
You should also assign this to a variable to prevent it being garbage collected.
In [6]:
from pyms.Display import ClickEventHandler
%matplotlib inline
# Change to ``notebook`` for an interactive view
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
# Plot the TIC
plot_ic(ax, tic, label="TIC")
# Plot the peaks
plot_peaks(ax, new_peak_list)
# Set the title
ax.set_title('TIC for gc01_0812_066 with Detected Peaks')
# Set up the ClickEventHandler
handler = ClickEventHandler(new_peak_list)
# Add the legend
plt.legend()
plt.show()

Clicking on a Peak causes a list of the 5 highest intensity ions at that Peak to be written to the terminal in order. The output should look similar to this:
RT: 1031.823
Mass Intensity
158.0 2206317.857142857
73.0 628007.1428571426
218.0 492717.04761904746
159.0 316150.4285714285
147.0 196663.95238095228
If there is no Peak close to the point on the chart that was clicked, the following will be shown in the terminal:
No Peak at this point
The pyms.Display.ClickEventHandler
class can be configured with a different
tolerance, in seconds, when clicking on a Peak, and to display a
different number of top n ions when a Peak is clicked.
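The behaviour described above (find the peak under the click, then list its most intense ions) can be sketched in plain Python. The sketch assumes peaks are available as (retention time, {m/z: intensity}) tuples and uses a tolerance of 0.5 s; the data layout, that default and the name describe_click are hypothetical and do not reflect how ClickEventHandler is implemented.

```python
import heapq


def describe_click(clicked_rt, peaks, tolerance=0.5, n_intensities=5):
    """Report the top ions of the peak nearest to ``clicked_rt``.

    ``peaks`` is a list of (rt, {m/z: intensity}) tuples; both the data
    layout and the default tolerance are assumptions for this sketch.
    Returns None when no peak lies within ``tolerance`` of the click.
    """
    if not peaks:
        return None
    rt, spectrum = min(peaks, key=lambda peak: abs(peak[0] - clicked_rt))
    if abs(rt - clicked_rt) > tolerance:
        return None
    # The n most intense (m/z, intensity) pairs, highest intensity first.
    top = heapq.nlargest(n_intensities, spectrum.items(), key=lambda item: item[1])
    return rt, top


peaks = [
    (1031.823, {158.0: 2206317.9, 73.0: 628007.1, 218.0: 492717.0,
                159.0: 316150.4, 147.0: 196664.0, 100.0: 5000.0}),
    (1250.100, {73.0: 1000.0, 147.0: 800.0}),
]
hit = describe_click(1031.9, peaks)
miss = describe_click(1100.0, peaks)
print(hit)
print(miss if miss is not None else "No Peak at this point")
```

Widening the tolerance makes peaks easier to hit with the mouse, at the cost of ambiguity when peaks are closely spaced.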
In addition, clicking the right mouse button on a Peak displays the mass spectrum at the peak in a new window.

The mass spectrum displayed by PyMassSpec when a peak in the graphics window is right clicked
To zoom in on a portion of the plot, select the zoom button in the plot window's toolbar, then hold down the left mouse button while dragging a rectangle over the area of interest. To return to the original view, click on the home button.
The pan button allows panning across the zoomed plot.
Note
This example is in pyms-demo/jupyter/Display_User_Interaction.ipynb
and pyms-demo/72/proc.py
.
pyms.Base
Base for PyMassSpec classes.
Classes:
pymsBaseClass: Base class.
class pymsBaseClass
Bases: object
Base class.
Methods:
dump(file_name[, protocol]): Dumps an object to a file through pickle.dump().
dump(file_name, protocol=3)
Dumps an object to a file through pickle.dump().
-
pyms.BillerBiemann
Functions to perform Biller and Biemann deconvolution.
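Functions:
BillerBiemann(im[, points, scans]): Deconvolution based on the algorithm of Biller and Biemann (1974).
get_maxima_indices(ion_intensities[, points]): Returns the scan indices for the apexes of the ion.
get_maxima_list(ic[, points]): List of retention time and intensity of local maxima for ion.
get_maxima_list_reduced(ic, mp_rt[, points, window]): List of retention time and intensity of local maxima for ion.
get_maxima_matrix(im[, points, scans]): Constructs a matrix containing only data for scans in which particular ions apexed.
num_ions_threshold(pl, n, cutoff[, copy_peaks]): Remove Peaks where there are fewer than n ions with intensities above the given threshold.
rel_threshold(pl[, percent, copy_peaks]): Remove ions with relative intensities less than the given relative percentage of the maximum intensity.
sum_maxima(im[, points, scans]): Reconstruct the TIC as sum of maxima.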
-
BillerBiemann(im, points=3, scans=1)
Deconvolution based on the algorithm of Biller and Biemann (1974).
- Parameters
im (BaseIntensityMatrix)
points (int) – Number of scans over which to consider a maximum to be a peak. Default 3.
scans (int) – Number of scans to combine peaks from to compensate for spectra skewing. Default 1.
- Return type
- Returns
List of detected peaks
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
get_maxima_indices(ion_intensities, points=3)
Returns the scan indices for the apexes of the ion.
- Parameters
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
Example:
>>> # A trivial set of data with two clear peaks
>>> data = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
>>> get_maxima_indices(data)
[4, 13]
>>> # Wider window (more points)
>>> get_maxima_indices(data, points=10)
[13]
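The role of the points window can be illustrated with a simplified stand-alone sketch. It treats a scan as an apex when its intensity is strictly greater than every other value in a window of points scans centred on it, and it only considers scans that have a full window on both sides. Plateau handling is deliberately left out, and the function name local_maxima_indices is hypothetical, so this is a sketch of the idea rather than the library's implementation.

```python
def local_maxima_indices(intensities, points=3):
    """Indices that are strict maxima of a centred window of ``points`` scans.

    Simplified sketch: plateaus are ignored and only positions with a full
    window on both sides are examined (both are assumptions).
    """
    half = int(points) // 2
    if half < 1:
        raise ValueError("points must be at least 2")
    window = 2 * half + 1  # force an odd window so it has a centre
    maxima = []
    for start in range(len(intensities) - window + 1):
        centre = start + half
        left = intensities[start:centre]
        right = intensities[centre + 1:start + window]
        if intensities[centre] > max(left) and intensities[centre] > max(right):
            maxima.append(centre)
    return maxima


data = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
narrow = local_maxima_indices(data)
wide = local_maxima_indices(data, points=10)
print(narrow, wide)
```

With the wider window the first apex is too close to the start of the data to have a full window around it, so only the second apex is reported.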
-
get_maxima_list(ic, points=3)
List of retention time and intensity of local maxima for ion.
- Parameters
ic (IonChromatogram)
points (int) – Number of scans over which to consider a maximum to be a peak. Default 3.
- Return type
- Returns
A list of retention time and intensity of local maxima for ion.
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
get_maxima_list_reduced(ic, mp_rt, points=13, window=3)
List of retention time and intensity of local maxima for ion.
Only peaks around a specific retention time are recorded. Created for use with gap filling algorithm.
- Parameters
ic (IonChromatogram)
mp_rt (float) – The retention time of the missing peak
points (int) – Number of scans over which to consider a maximum to be a peak. Default 13.
window (int) – The window around mp_rt where peaks should be recorded. Default 3.
- Return type
- Returns
A list of 2-element tuples containing the retention time and intensity of local maxima for each ion.
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
get_maxima_matrix(im, points=3, scans=1)
Constructs a matrix containing only data for scans in which particular ions apexed.
The data can be optionally consolidated into the scan within a range with the highest total intensity by adjusting the scans parameter. By default this is 1, which does not consolidate the data.
The columns are ion masses and the rows are scans. Get matrix of local maxima for each ion.
- Parameters
im (BaseIntensityMatrix)
points (int) – Number of scans over which to consider a maximum to be a peak. Default 3.
scans (int) – Number of scans to combine peaks from to compensate for spectra skewing. Default 1.
- Return type
- Returns
A matrix giving the intensities of ion masses (columns) for each scan (rows).
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
num_ions_threshold(pl, n, cutoff, copy_peaks=True)
Remove Peaks where there are fewer than n ions with intensities above the given threshold.
- Parameters
- Return type
- Returns
A new list of Peak objects.
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
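As an illustration of this kind of filter, the sketch below keeps only spectra with at least n ions at or above a cutoff. It uses plain {m/z: intensity} dicts in place of Peak objects, and the inclusive comparison and the name keep_peaks_with_enough_ions are assumptions; it is not the library's code.

```python
def keep_peaks_with_enough_ions(spectra, n, cutoff):
    """Keep spectra that have at least ``n`` ions with intensity >= ``cutoff``.

    Each spectrum is a {m/z: intensity} dict standing in for a Peak object.
    Whether the comparison is inclusive is an assumption of this sketch.
    """
    return [
        spectrum for spectrum in spectra
        if sum(intensity >= cutoff for intensity in spectrum.values()) >= n
    ]


spectra = [
    {73: 50000.0, 147: 20000.0, 218: 12000.0},  # three strong ions: kept
    {73: 30000.0, 147: 500.0, 218: 200.0},      # one strong ion: dropped
]
kept = keep_peaks_with_enough_ions(spectra, n=3, cutoff=10000)
print(len(kept), "of", len(spectra), "peaks kept")
```

A new list is returned and the input list is left untouched, in the same spirit as the copy_peaks=True default in the signature above.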
-
rel_threshold(pl, percent=2, copy_peaks=True)
Remove ions with relative intensities less than the given relative percentage of the maximum intensity.
- Parameters
- Return type
- Returns
A new list of Peak objects with threshold ions.
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
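A relative threshold of this kind can be sketched on a single spectrum as follows. The spectrum is a plain {m/z: intensity} dict, ions exactly at the threshold are kept, and weak ions are removed from the returned copy; these choices and the name drop_weak_ions are assumptions, and the library may represent removed ions differently.

```python
def drop_weak_ions(spectrum, percent=2):
    """Return a copy of ``spectrum`` without ions below ``percent`` % of the base peak.

    ``spectrum`` is a {m/z: intensity} dict (assumed layout). Ions exactly
    at the threshold are kept.
    """
    if not spectrum:
        return {}
    threshold = max(spectrum.values()) * percent / 100.0
    return {mz: i for mz, i in spectrum.items() if i >= threshold}


spectrum = {73: 10000.0, 147: 2500.0, 207: 150.0, 281: 200.0}
cleaned = drop_weak_ions(spectrum, percent=2)
print(cleaned)
```

With a base peak of 10000 and percent=2 the threshold is 200, so only m/z 207 is discarded.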
-
sum_maxima(im, points=3, scans=1)
Reconstruct the TIC as sum of maxima.
- Parameters
im (BaseIntensityMatrix)
points (int) – Peak if maxima over ‘points’ number of scans. Default 3.
scans (int) – Number of scans to combine peaks from to compensate for spectra skewing. Default 1.
- Return type
- Returns
The reconstructed TIC.
- Author
Andrew Isaac, Dominic Davis-Foster (type assertions)
pyms.Display
Class to Display Ion Chromatograms and TIC.
Classes:
ClickEventHandler: Class to enable clicking of a chromatogram to view the intensities of the top n most intense ions at that peak, and viewing of the mass spectrum with a right click.
Display: Class to display Ion Chromatograms and Total Ion Chromatograms from pyms.IonChromatogram.IonChromatogram using matplotlib.pyplot.
Functions:
invert_mass_spec: Invert the mass spectrum for display in a head2tail plot.
plot_head2tail: Plots two mass spectra head to tail.
plot_ic: Plots an Ion Chromatogram.
plot_mass_spec: Plots a Mass Spectrum.
plot_peaks: Plots the locations of peaks as found by PyMassSpec.
-
class ClickEventHandler(peak_list, fig=None, ax=None, tolerance=0.005, n_intensities=5)
Bases: object
Class to enable clicking of a chromatogram to view the intensities of the top n most intense ions at that peak, and viewing of the mass spectrum with a right click.
Methods:
get_n_largest(intensity_list): Computes the indices of the largest n ion intensities for writing to console.
onclick(event): Finds the n highest intensity m/z channels for the selected peak.
-
class Display(fig=None, ax=None)
Bases: object
Class to display Ion Chromatograms and Total Ion Chromatograms from pyms.IonChromatogram.IonChromatogram using matplotlib.pyplot.
- Parameters
If fig is not given then fig and ax default to:
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
If only fig is given then ax defaults to:
>>> ax = fig.add_subplot(111)
- Author
Sean O’Callaghan
- Author
Vladimir Likic
- Author
Dominic Davis-Foster
Methods:
do_plotting([plot_label]): Plots TIC and IC(s) if they have been created by plot_tic() or plot_ic().
get_5_largest(intensity_list): Returns the indices of the 5 largest ion intensities.
onclick(event): Finds the 5 highest intensity m/z channels for the selected peak.
plot_ic(ic, **kwargs): Plots an Ion Chromatogram.
plot_mass_spec(mass_spec, **kwargs): Plots a Mass Spectrum.
plot_peaks(peak_list[, label]): Plots the locations of peaks as found by PyMassSpec.
plot_tic(tic[, minutes]): Plots a Total Ion Chromatogram.
save_chart(filepath[, filetypes]): Save the chart to the given path with the given filetypes.
Show the chart on screen.
-
do_plotting(plot_label=None)
Plots TIC and IC(s) if they have been created by plot_tic() or plot_ic().
Also adds detected peaks if they have been added by plot_peaks().
-
onclick(event)
Finds the 5 highest intensity m/z channels for the selected peak. The peak is selected by clicking on it. If a button other than the left one is clicked, a new plot of the mass spectrum is displayed.
- Parameters
event – a mouse click by the user
-
plot_ic(ic, **kwargs)
Plots an Ion Chromatogram.
- Parameters
ic (IonChromatogram) – Ion Chromatograms m/z channels for plotting
- Other Parameters
matplotlib.lines.Line2D properties. Used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
Example:
>>> plot_ic(im.get_ic_at_index(5), label='IC @ Index 5', linewidth=2)
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Return type
-
plot_mass_spec(mass_spec, **kwargs)
Plots a Mass Spectrum.
- Parameters
mass_spec (MassSpectrum) – The mass spectrum at a given time/index
- Other Parameters
matplotlib.lines.Line2D properties. Used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
Example:
>>> plot_mass_spec(im.get_ms_at_index(5), linewidth=2)
>>> ax.set_title(f"Mass spec for peak at time {im.get_time_at_index(5):5.2f}")
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Return type
-
plot_tic(tic, minutes=False, **kwargs)
Plots a Total Ion Chromatogram.
- Parameters
tic (IonChromatogram) – Total Ion Chromatogram.
minutes (bool) – Whether to show the time in minutes. Default False.
- Other Parameters
matplotlib.lines.Line2D properties. Used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
Example:
>>> plot_tic(data.tic, label='TIC', linewidth=2)
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Return type
-
invert_mass_spec(mass_spec, inplace=False)
Invert the mass spectrum for display in a head2tail plot.
- Parameters
mass_spec (MassSpectrum) – The Mass Spectrum to invert
inplace (bool) – Whether the inversion should be applied to the MassSpectrum object given, or to a copy (default behaviour). Default False.
- Return type
- Returns
The inverted mass spectrum
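The copy versus in-place distinction controlled by inplace can be shown with a small stand-in class. The sketch assumes that inverting a spectrum means negating its intensities so that it is drawn below the axis in a head-to-tail plot; that interpretation, the Spectrum dataclass and the invert function are illustrative assumptions, not PyMassSpec's MassSpectrum or its implementation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Spectrum:
    """Minimal stand-in for a mass spectrum (not the PyMassSpec class)."""
    mass_list: List[float]
    intensity_list: List[float]


def invert(spectrum, inplace=False):
    """Negate intensities so the spectrum plots below the axis.

    ASSUMPTION: "inverting" means flipping the sign of each intensity.
    With ``inplace=False`` a copy is modified and the input is untouched.
    """
    target = spectrum if inplace else replace(
        spectrum,
        mass_list=list(spectrum.mass_list),
        intensity_list=list(spectrum.intensity_list),
    )
    target.intensity_list = [-i for i in target.intensity_list]
    return target


original = Spectrum([73.0, 147.0], [100.0, 40.0])
flipped = invert(original)               # works on a copy
print(original.intensity_list, flipped.intensity_list)
```

Working on a copy by default means the same spectrum object can still be plotted the right way up elsewhere in a script.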
-
plot_head2tail(ax, top_mass_spec, bottom_mass_spec, top_spec_kwargs=None, bottom_spec_kwargs=None)
Plots two mass spectra head to tail.
- Parameters
ax (Axes) – The axes to plot the MassSpectra on
top_mass_spec (MassSpectrum) – The Mass Spectrum to plot on top
bottom_mass_spec (MassSpectrum) – The Mass Spectrum to plot on the bottom
top_spec_kwargs (Optional[Dict]) – A dictionary of keyword arguments for the top mass spectrum. Defaults to red with a line width of 0.5
bottom_spec_kwargs (Optional[Dict]) – A dictionary of keyword arguments for the bottom mass spectrum. Defaults to blue with a line width of 0.5
top_spec_kwargs and bottom_spec_kwargs are used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Returns
A tuple of containers with all the bars and optionally errorbars for the top and bottom spectra.
- Return type
tuple of matplotlib.container.BarContainer
-
plot_ic(ax, ic, minutes=False, **kwargs)
Plots an Ion Chromatogram.
- Parameters
ax (Axes) – The axes to plot the IonChromatogram on
ic (IonChromatogram) – Ion Chromatograms m/z channels for plotting
minutes (bool) – Whether the x-axis should be plotted in minutes. Default False (plotted in seconds).
- Other Parameters
matplotlib.lines.Line2D properties. Used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
Example:
>>> plot_ic(ax, im.get_ic_at_index(5), label='IC @ Index 5', linewidth=2)
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Return type
- Returns
A list of Line2D objects representing the plotted data.
-
plot_mass_spec(ax, mass_spec, **kwargs)
Plots a Mass Spectrum.
- Parameters
ax (Axes) – The axes to plot the MassSpectrum on
mass_spec (MassSpectrum) – The mass spectrum to plot
- Other Parameters
matplotlib.lines.Line2D properties. Used to specify properties like a line label (for auto legends), linewidth, antialiasing, marker face color.
Example:
>>> plot_mass_spec(ax, im.get_ms_at_index(5), linewidth=2)
>>> ax.set_title(f"Mass spec for peak at time {im.get_time_at_index(5):5.2f}")
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html for the list of possible kwargs
- Returns
Container with all the bars and optionally errorbars.
- Return type
pyms.DPA
Alignment of peak lists by dynamic programming.
pyms.DPA.Alignment
Classes for peak alignment by dynamic programming.
Classes:
Alignment: Models an alignment of peak lists.
Functions:
exprl2alignment: Converts a list of experiments into a list of alignments.
-
class Alignment(expr)
Bases: object
Models an alignment of peak lists.
- Parameters
expr (Optional[Experiment]) – The experiment to be converted into an alignment object.
- Authors
Woon Wai Keen, Qiao Wang, Vladimir Likic, Dominic Davis-Foster.
Methods:
__len__(): Returns the length of the alignment, defined as the number of peak positions in the alignment.
aligned_peaks([minutes]): Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.
common_ion(): Calculates a common ion among the peaks of an aligned peak.
filter_min_peaks(min_peaks): Filters alignment positions that have fewer peaks than min_peaks.
get_area_alignment([require_all_expr]): Returns a Pandas dataframe containing the peak areas of the aligned peaks.
get_highest_mz_ion(ion_dict): Returns the preferred ion for quantitation.
get_ms_alignment([require_all_expr]): Returns a Pandas dataframe of mass spectra for the aligned peaks.
get_peak_alignment([minutes, require_all_expr]): Returns a Pandas dataframe of aligned retention times.
get_peaks_alignment([require_all_expr]): Returns a Pandas dataframe of Peak objects for the aligned peaks.
write_common_ion_csv(area_file_name, …[, …]): Writes the alignment to CSV files.
write_csv(rt_file_name, area_file_name[, …]): Writes the alignment to CSV files.
write_ion_areas_csv(ms_file_name[, minutes]): Write Ion Areas to CSV File.
Attributes:
List of experiment codes.
-
__len__()
Returns the length of the alignment, defined as the number of peak positions in the alignment.
- Return type
- Authors
Qiao Wang, Vladimir Likic
-
aligned_peaks(minutes=False)
Returns a list of Peak objects where each peak has the combined spectra and average retention time of all peaks that aligned.
-
filter_min_peaks
(min_peaks)[source] Filters alignment positions that have less peaks than
min_peaks
.This function is useful only for within state alignment.
- Parameters
min_peaks (
int
) – Minimum number of peaks required for the alignment position to survive filtering.- Author
Qiao Wang
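The filtering rule can be illustrated with a minimal pure-Python sketch. It assumes, for illustration only, that each alignment position is a list with one entry per experiment and None where no peak was aligned. filter_positions is a hypothetical helper, not part of PyMassSpec, and the retention times are made up.

```python
from typing import List, Optional, Sequence


def filter_positions(
    positions: Sequence[Sequence[Optional[float]]],
    min_peaks: int,
) -> List[List[Optional[float]]]:
    """Keep only the positions that hold at least ``min_peaks`` peaks."""
    kept = []
    for position in positions:
        # Count the experiments that actually contributed a peak here.
        n_peaks = sum(peak is not None for peak in position)
        if n_peaks >= min_peaks:
            kept.append(list(position))
    return kept


# Three alignment positions across three experiments (illustrative values).
positions = [
    [531.2, None, 531.4],
    [None, None, 602.9],
    [710.0, 710.3, 710.1],
]
filtered = filter_positions(positions, min_peaks=2)
```

With min_peaks=2 the middle position, which holds a single peak, is dropped.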
-
get_area_alignment
(require_all_expr=True)[source] Returns a Pandas dataframe containing the peak areas of the aligned peaks.
-
static
get_highest_mz_ion
(ion_dict)[source] Returns the preferred ion for quantitation.
Looks at the list of candidate ions, selects those which have highest occurrence, and selects the heaviest of those.
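A minimal sketch of the selection rule described above, assuming ion_dict maps each candidate m/z value to the number of times it occurs (that layout is an assumption, as is the helper name pick_quant_ion). It is not the library's code, only an illustration of "most frequent first, heaviest on ties".

```python
from typing import Dict


def pick_quant_ion(ion_counts: Dict[float, int]) -> float:
    """Return the most frequent ion, breaking ties by the highest m/z."""
    if not ion_counts:
        raise ValueError("ion_counts must not be empty")
    top_count = max(ion_counts.values())
    # All ions that share the highest occurrence count.
    most_frequent = [mz for mz, count in ion_counts.items() if count == top_count]
    # Of those, prefer the heaviest ion.
    return max(most_frequent)


# Illustrative counts: 147 and 217 tie on occurrence, so 217 is chosen.
chosen = pick_quant_ion({73.0: 3, 147.0: 5, 217.0: 5, 319.0: 2})
```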
-
get_ms_alignment
(require_all_expr=True)[source] Returns a Pandas dataframe of mass spectra for the aligned peaks.
-
get_peak_alignment
(minutes=True, require_all_expr=True)[source] Returns a Pandas dataframe of aligned retention times.
- Parameters
- Authors
Woon Wai Keen, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster
- Return type
-
get_peaks_alignment
(require_all_expr=True)[source] Returns a Pandas dataframe of Peak objects for the aligned peaks.
-
write_common_ion_csv
(area_file_name, top_ion_list, minutes=True)[source] Writes the alignment to CSV files.
This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.
- Parameters
area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.
top_ion_list (Sequence[float]) – A list of the highest intensity common ion along the aligned peaks.
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.
- Authors
Woon Wai Keen, Andrew Isaac, Sean O’Callaghan, Vladimir Likic, Dominic Davis-Foster (pathlib support)
-
write_csv
(rt_file_name, area_file_name, minutes=True)[source] Writes the alignment to CSV files.
This function writes two files: one containing the alignment of peak retention times and the other containing the alignment of peak areas.
- Parameters
rt_file_name (Union[str, Path, PathLike]) – The name for the retention time alignment file.
area_file_name (Union[str, Path, PathLike]) – The name for the areas alignment file.
minutes (bool) – Whether to save retention times in minutes. If False, retention time will be saved in seconds. Default True.
- Authors
Woon Wai Keen, Andrew Isaac, Vladimir Likic, David Kainer, Dominic Davis-Foster (pathlib support)
-
exprl2alignment
(expr_list)[source] Converts a list of experiments into a list of alignments.
- Parameters
expr_list (List[Experiment]) – The list of experiments to be converted into alignment objects.
- Return type
- Returns
A list of alignment objects for the experiments.
- Author
Vladimir Likic
pyms.DPA.PairwiseAlignment
Classes for peak alignment by dynamic programming.
Classes:
|
Models pairwise alignment of alignments. |
Functions:
|
Aligns two alignments. |
|
Aligns a list of alignments using the supplied guide tree. |
|
A helper function for sorting peak positions in an alignment. |
|
Calculates similarity score between two alignments (new method). |
|
Solves optimal path in score matrix based on global sequence alignment. |
|
Merges two alignments with gaps added in from DP traceback. |
|
Calculates the similarity between the two alignment positions. |
|
Calculates the score matrix between two alignments. |
|
Calculates the score matrix between two alignments. |
-
class
PairwiseAlignment
(alignments, D, gap)[source] Bases:
object
Models pairwise alignment of alignments.
-
align_with_tree
(T, min_peaks=1)[source] Aligns a list of alignments using the supplied guide tree.
- Parameters
T (PairwiseAlignment) – The pairwise alignment object.
min_peaks (int) – Default 1.
- Return type
- Returns
The final alignment consisting of aligned input alignments.
- Authors
Woon Wai Keen, Vladimir Likic
-
alignment_compare
(x, y)[source] A helper function for sorting peak positions in an alignment.
- Parameters
x
y
- Return type
-
alignment_similarity
(traces, score_matrix, gap)[source] Calculates similarity score between two alignments (new method).
-
merge_alignments
(A1, A2, traces)[source] Merges two alignments with gaps added in from DP traceback.
-
position_similarity
(pos1, pos2, D)[source] Calculates the similarity between the two alignment positions.
A score of 0 is best and 1 is worst.
pyms.DPA.IO
Functions for writing peak alignment to various file formats.
Functions:
|
Writes the alignment to an excel file, with colouring showing possible mis-alignments. |
|
Creates a csv file with UID, common and qualifying ions and their ratios for mass hunter interpretation. |
|
Write an alignment to an Excel workbook. |
-
write_excel
(alignment, file_name, minutes=True)[source] Writes the alignment to an excel file, with colouring showing possible mis-alignments.
- Parameters
- Author
David Kainer
pyms.DPA.clustering
Provides Pycluster.treecluster regardless of which library provides it.
Functions:
|
Perform hierarchical clustering, and return a Tree object. |
-
treecluster
(data, mask=None, weight=None, transpose=False, method='m', dist='e', distancematrix=None)[source] Perform hierarchical clustering, and return a Tree object.
This function implements the pairwise single, complete, centroid, and average linkage hierarchical clustering methods.
- Keyword arguments:
data: nrows x ncolumns array containing the data values.
mask: nrows x ncolumns array of integers, showing which data are missing. If mask[i][j]==0, then data[i][j] is missing.
weight: the weights to be used when calculating distances.
transpose: if False, rows are clustered; if True, columns are clustered.
dist: specifies the distance function to be used:
- dist == ‘e’: Euclidean distance
- dist == ‘b’: City Block distance
- dist == ‘c’: Pearson correlation
- dist == ‘a’: absolute value of the correlation
- dist == ‘u’: uncentered correlation
- dist == ‘x’: absolute uncentered correlation
- dist == ‘s’: Spearman’s rank correlation
- dist == ‘k’: Kendall’s tau
method: specifies which linkage method is used:
- method == ‘s’: Single pairwise linkage
- method == ‘m’: Complete (maximum) pairwise linkage (default)
- method == ‘c’: Centroid linkage
- method == ‘a’: Average pairwise linkage
distancematrix: The distance matrix between the items. There are three ways in which you can pass a distance matrix:
1. a 2D NumPy array (in which only the left-lower part of the array will be accessed);
2. a 1D NumPy array containing the distances consecutively;
3. a list of rows containing the lower-triangular part of the distance matrix.
Examples are:
>>> from numpy import array
>>> # option 1:
>>> distance = array([[0.0, 1.1, 2.3],
...                   [1.1, 0.0, 4.5],
...                   [2.3, 4.5, 0.0]])
>>> # option 2:
>>> distance = array([1.1, 2.3, 4.5])
>>> # option 3:
>>> distance = [array([]),
...             array([1.1]),
...             array([2.3, 4.5])]
These three correspond to the same distance matrix.
PLEASE NOTE: As the treecluster routine may shuffle the values in the distance matrix as part of the clustering algorithm, be sure to save this array in a different variable before calling treecluster if you need it later.
Either data or distancematrix should be None. If distancematrix is None, the hierarchical clustering solution is calculated from the values stored in the argument data. If data is None, the hierarchical clustering solution is instead calculated from the distance matrix. Pairwise centroid-linkage clustering can be performed only from the data values and not from the distance matrix. Pairwise single-, maximum-, and average-linkage clustering can be calculated from the data values or from the distance matrix.
Return value: treecluster returns a Tree object describing the hierarchical clustering result. See the description of the Tree class for more information.
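To make the relationship between the three distance matrix forms concrete, the following pure-Python sketch derives the lower-triangular forms from a full square matrix. It uses plain lists instead of NumPy arrays so that it runs without extra dependencies, and the helper names are hypothetical. It does not call treecluster.

```python
from typing import List, Sequence


def lower_triangle_rows(full: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the strictly lower-triangular part as a list of rows (form 3)."""
    # Row i keeps the first i entries, so row 0 is empty.
    return [list(row[:i]) for i, row in enumerate(full)]


def lower_triangle_flat(full: Sequence[Sequence[float]]) -> List[float]:
    """Return the lower-triangular distances consecutively (form 2)."""
    return [value for row in lower_triangle_rows(full) for value in row]


# Form 1: the full symmetric matrix from the example above.
full_matrix = [
    [0.0, 1.1, 2.3],
    [1.1, 0.0, 4.5],
    [2.3, 4.5, 0.0],
]
rows_form = lower_triangle_rows(full_matrix)
flat_form = lower_triangle_flat(full_matrix)
```

Because the routine may shuffle the values in the matrix it is given, keeping full_matrix untouched and passing a copy is one way to follow the note above.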
pyms.eic
Class to model a subset of data from an Intensity Matrix.
Classes:
|
Represents an extracted subset of the chromatographic data. |
Functions:
|
Given an intensity matrix and a list of masses, construct a |
-
class
ExtractedIntensityMatrix
(time_list, mass_list, intensity_array)[source] Bases:
BaseIntensityMatrix
Represents an extracted subset of the chromatographic data.
- Parameters
- Authors
Dominic Davis-Foster
New in version 2.3.0.
Methods:
__eq__
(other)Return whether this intensity matrix object is equal to another object.
__len__
()Returns the number of scans in the intensity matrix.
crop_mass
(mass_min, mass_max)Crops mass spectrum.
dump
(file_name[, protocol]) Dumps an object to a file through pickle.dump().
get_ic_at_index
(ix)Returns the ion chromatogram at the specified index.
get_ic_at_mass
([mass])Returns the ion chromatogram for the nearest binned mass to the specified mass.
get_index_at_time
(time)Returns the nearest index corresponding to the given time.
get_index_of_mass
(mass)Returns the index of the nearest binned mass to the given mass.
Returns binned mass at index.
get_ms_at_index
(ix)Returns a mass spectrum for a given scan index.
Returns the spectral intensities for scan index.
Returns time at given index.
Iterate over column indices.
Iterates over row indices.
null_mass
(mass)Ignore given (closest) mass in spectra.
reduce_mass_spectra
([n_intensities])Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
set_ic_at_index
(ix, ic) Sets the intensity of the mass at index ix in each scan to a new value.
Attributes:
Constructs a Base Peak Chromatogram from the data.
Returns an IonChromatogram object representing this EIC.
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns a list of the masses.
Returns the intensity matrix as a list of lists of floats.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
Gets the size of intensity matrix.
Returns a copy of the time list.
-
__eq__
(other) Return whether this intensity matrix object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
property
bpc
Constructs a Base Peak Chromatogram from the data.
This represents the most intense ion – out of those used to create the ExtractedIntensityMatrix – for each scan.
- Authors
Dominic Davis-Foster
New in version 2.3.0.
- Return type
-
crop_mass
(mass_min, mass_max) Crops mass spectrum.
-
dump
(file_name, protocol=3) Dumps an object to a file through pickle.dump().
-
property
eic
Returns an IonChromatogram object representing this EIC.
- Return type
-
get_ic_at_index
(ix) Returns the ion chromatogram at the specified index.
- Parameters
ix (int) – Index of an ion chromatogram in the intensity data matrix.
- Return type
- Returns
Ion chromatogram at given index.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
-
get_ic_at_mass
(mass=None)[source] Returns the ion chromatogram for the nearest binned mass to the specified mass.
If no mass value is given, the function returns the extracted ion chromatogram.
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (float) – Time in seconds
- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns -1 if no index is found.
-
get_index_of_mass
(mass) Returns the index of the nearest binned mass to the given mass.
-
get_mass_at_index
(ix) Returns binned mass at index.
-
get_ms_at_index
(ix) Returns a mass spectrum for a given scan index.
- Parameters
ix (int) – The index of the scan.
- Author
Andrew Isaac
- Return type
-
get_scan_at_index
(ix) Returns the spectral intensities for scan index.
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
property
mass_list
Returns a list of the masses.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
null_mass
(mass) Ignore given (closest) mass in spectra.
- Parameters
mass (float) – Mass value to remove
- Author
Andrew Isaac
-
reduce_mass_spectra
(n_intensities=5) Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
- Parameters
n_intensities (int) – The number of top intensities to keep. Default 5.
- Author
Vladimir Likic
-
set_ic_at_index
(ix, ic) Sets the intensity of the mass at index ix in each scan to a new value.
- Parameters
ix (int) – Index of an ion chromatogram in the intensity data matrix to be set
ic (IonChromatogram) – Ion chromatogram that will be copied at position ix in the data matrix
The length of the ion chromatogram must match the appropriate dimension of the intensity matrix.
- Author
Vladimir Likic
-
property
size
Gets the size of intensity matrix.
-
build_extracted_intensity_matrix
(im, masses, left_bound=0.5, right_bound=0.5)[source] Given an intensity matrix and a list of masses, construct an ExtractedIntensityMatrix for those masses.
The masses can either be:
- single masses (of type float),
- an iterable of masses.
left_bound and right_bound specify a range in which to include values around each mass. For example, a mass of 169 with bounds of 0.3 and 0.7 would include every mass between 168.7 and 169.7 (inclusive on both sides).
Set the bounds to 0 to include only the given masses.
- Parameters
- Return type
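The effect of left_bound and right_bound on a single target mass can be sketched in plain Python. masses_in_window is a hypothetical helper written for this illustration and does not reproduce how PyMassSpec selects columns internally. The test masses are made up and kept away from the window edges to avoid floating-point boundary effects.

```python
from typing import List, Sequence


def masses_in_window(
    mass_list: Sequence[float],
    target: float,
    left_bound: float = 0.5,
    right_bound: float = 0.5,
) -> List[float]:
    """Return the masses within [target - left_bound, target + right_bound]."""
    lower = target - left_bound
    upper = target + right_bound
    # Both ends of the window are inclusive.
    return [mass for mass in mass_list if lower <= mass <= upper]


binned_masses = [168.5, 168.8, 169.0, 169.6, 169.9]
selected = masses_in_window(binned_masses, 169, left_bound=0.3, right_bound=0.7)
exact_only = masses_in_window(binned_masses, 169, left_bound=0, right_bound=0)
```

Setting both bounds to 0 keeps only the mass that matches the target exactly.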
pyms.Experiment
Models a GC-MS experiment, represented by a list of signal peaks.
Classes:
|
Models an experiment. |
Functions:
|
Loads an experiment saved with |
|
Reads the set of experiment files and returns a list of |
-
class
Experiment
(expr_code, peak_list)[source] Bases:
pymsBaseClass
Models an experiment.
- Parameters
- Author
Vladimir Likic, Andrew Isaac, Dominic Davis-Foster (type assertions, properties and pathlib support)
Methods:
__copy__
()Returns a new Experiment object containing a copy of the data in this object.
__deepcopy__
([memodict])Returns a new Experiment object containing a copy of the data in this object.
__eq__
(other)Return whether this Experiment object is equal to another object.
__len__
()Returns the number of peaks in the Experiment.
dump
(file_name[, protocol]) Dumps an object to a file through pickle.dump().
sele_rt_range
(rt_range)Discards all peaks which have the retention time outside the specified range.
Attributes:
Returns the expr_code of the experiment.
Returns the peak list.
-
__copy__
()[source] Returns a new Experiment object containing a copy of the data in this object.
- Return type
-
__deepcopy__
(memodict={})[source] Returns a new Experiment object containing a copy of the data in this object.
- Return type
-
__eq__
(other)[source] Return whether this Experiment object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through pickle.dump().
-
load_expr
(file_name)[source] Loads an experiment saved with pyms.Experiment.Experiment.dump().
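Since dump() is documented as writing through pickle.dump(), the save and load round trip can be sketched with the standard library alone. The helpers dump_object and load_object and the record dictionary are stand-ins invented for this example. A real workflow would call Experiment.dump() and load_expr() instead.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def dump_object(obj: Any, file_name: Union[str, Path], protocol: int = 3) -> None:
    """Serialise ``obj`` to ``file_name`` with the given pickle protocol."""
    with open(file_name, "wb") as fp:
        pickle.dump(obj, fp, protocol)


def load_object(file_name: Union[str, Path]) -> Any:
    """Load a previously pickled object. Only use this on trusted files."""
    with open(file_name, "rb") as fp:
        return pickle.load(fp)


# A made-up stand-in for an experiment: a code plus some retention times.
record = {"expr_code": "demo", "peak_rts": [531.2, 602.9]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "demo.expr"
    dump_object(record, path)
    restored = load_object(path)
```

Unpickling can execute arbitrary code, so files from untrusted sources should not be loaded this way.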
-
read_expr_list
(file_name)[source] Reads the set of experiment files and returns a list of pyms.Experiment.Experiment objects.
pyms.Gapfill
Table of Contents
Gap Filling Routines.
pyms.Gapfill.Class
Provides a class for handling Missing Peaks in an output file (i.e. area.csv).
Classes:
|
Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec. |
|
A collection of MissingPeak objects. |
-
class
MissingPeak
(common_ion, qual_ion_1, qual_ion_2, rt=0.0)[source] Bases:
object
Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec.
- Parameters
- Authors
Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster
Attributes:
Returns the common ion for the peak object across an experiment.
The area of the common ion
The retention time of the apex of the peak
Returns the top (most abundant) ion for the peak object.
Returns the second most abundant ion for the peak object.
Returns the retention time of the peak.
-
property
common_ion
Returns the common ion for the peak object across an experiment.
- Return type
- Returns
Common ion for the peak
- Author
Jairus Bowne
-
property
qual_ion1
Returns the top (most abundant) ion for the peak object.
- Return type
- Returns
Most abundant ion
- Author
Jairus Bowne
-
property
qual_ion2
Returns the second most abundant ion for the peak object.
- Return type
- Returns
Second most abundant ion
- Author
Jairus Bowne
-
class
Sample
(sample_name, matrix_position)[source] Bases:
object
A collection of MissingPeak objects.
- Parameters
- Authors
Sean O’Callaghan, Dominic Davis-Foster (properties)
Methods:
add_missing_peak
(missing_peak)Add a new MissingPeak object to the Sample.
Returns a dictionary containing average_rt : exact_rt pairs.
Attributes:
Returns a list of the MissingPeak objects in the Sample object.
Returns name of the sample.
Returns a dictionary containing rt : area pairs.
-
add_missing_peak
(missing_peak)[source] Add a new MissingPeak object to the Sample.
- Parameters
missing_peak (
MissingPeak
) – The missing peak object to be added.
-
property
missing_peaks
Returns a list of the MissingPeak objects in the Sample object.
- Return type
pyms.Gapfill.Function
Functions to fill missing peak objects.
Classes:
|
Flag to indicate the filetype for |
Functions:
|
Convert a .csv file to a pandas DataFrame. |
|
Integrates raw data around missing peak locations to fill |
|
Finds the |
|
Creates a new |
|
Creates a new rt.csv file, replacing |
-
enum
MissingPeakFiletype
(value)[source] Bases:
enum_tools.custom_enums.IntEnum
Flag to indicate the filetype for pyms.Gapfill.Function.missing_peak_finder().
New in version 2.3.0.
- Member Type
Valid values are as follows:
-
MZML
= <MissingPeakFiletype.MZML: 1>
-
NETCDF
= <MissingPeakFiletype.NETCDF: 2>
-
file2dataframe
(file_name)[source] Convert a .csv file to a pandas DataFrame.
- Parameters
- Authors
Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster (pathlib support)
New in version 2.3.0.
- Return type
-
missing_peak_finder
(sample, file_name, points=3, null_ions=None, crop_ions=None, threshold=1000, rt_window=1, filetype=<MissingPeakFiletype.MZML: 1>)[source] Integrates raw data around missing peak locations to fill NAs in the data matrix.
- Parameters
sample (Sample) – The sample object containing missing peaks.
file_name (str) – Name of the raw data file.
points (int) – Peak finding: a point is a peak if it is a maximum over ‘points’ number of scans. Default 3.
null_ions (Optional[List]) – Ions to be deleted in the matrix. Default [73, 147].
crop_ions (Optional[List]) – Range of ions to be considered. Default [50, 540].
threshold (int) – Minimum intensity of IonChromatogram allowable to fill. Default 1000.
rt_window (float) – Window in seconds around average RT to look for. Default 1.
filetype (MissingPeakFiletype) – Default <MissingPeakFiletype.MZML: 1>.
- Author
Sean O’Callaghan
-
mp_finder
(input_matrix)[source] Finds the 'NA's in the transformed area_ci.csv file and makes pyms.Gapfill.Class.Sample objects with them.
-
write_filled_csv
(sample_list, area_file, filled_area_file)[source] Creates a new area_ci.csv file, replacing NAs with values from the sample_list objects where possible.
pyms.GCMS
Table of Contents
Module to handle raw data.
Class to model GC-MS data.
Classes:
|
Generic object for GC-MS data. |
Data:
-
class
GCMS_data
(time_list, scan_list)[source] Bases: pymsBaseClass, TimeListMixin, MaxMinMassMixin, GetIndexTimeMixin
Generic object for GC-MS data.
Contains the raw data as a list of scans and a list of times.
- Parameters
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster (type assertions and properties)
Methods:
__eq__
(other)Return whether this GCMS_data object is equal to another object.
__len__
()Returns the length of the data object, defined as the number of scans.
__repr__
() Return a string representation of the GCMS_data.
__str__
() Return str(self).
dump
(file_name[, protocol]) Dumps an object to a file through pickle.dump().
get_index_at_time
(time)Returns the nearest index corresponding to the given time.
Returns time at given index.
info
([print_scan_n])Prints some information about the data.
trim
([begin, end])Trims data in the time domain.
write
(file_root)Writes the entire raw data to two CSV files:
write_intensities_stream
(file_name)Loop over all scans and, for each scan, write the intensities to the given file, one intensity per line.
Attributes:
Returns the maximum m/z value in the spectrum.
Returns the maximum retention time for the data in seconds.
Returns the minimum m/z value in the spectrum.
Returns the minimum retention time for the data in seconds.
Return a list of the scan objects.
Returns the total ion chromatogram.
Return a copy of the time list.
Returns the time step of the data.
Returns the standard deviation of the time step of the data.
-
__eq__
(other)[source] Return whether this GCMS_data object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
__len__
()[source] Returns the length of the data object, defined as the number of scans.
- Author
Vladimir Likic
- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through pickle.dump().
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (float) – Time in seconds
- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns -1 if no index is found.
-
get_time_at_index
(ix) Returns time at given index.
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
property
scan_list
Return a list of the scan objects.
-
property
tic
Returns the total ion chromatogram.
- Author
Andrew Isaac
- Return type
-
property
time_step_std
Returns the standard deviation of the time step of the data.
- Return type
-
trim
(begin=None, end=None)[source] Trims data in the time domain.
The arguments begin and end can be either integers (in which case they are taken as the first/last scan number for trimming) or strings, in which case they are treated as time strings and converted to scan numbers.
At present both begin and end must be of the same type, either both scan numbers or time strings.
At least one of begin and end is required.
-
write
(file_root)[source] Writes the entire raw data to two CSV files:
- <file_root>.I.csv, containing the intensities; and
- <file_root>.mz.csv, containing the corresponding m/z values.
In general these are not two-dimensional matrices, because different scans may have different numbers of m/z values recorded.
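The point about scans of differing length can be shown with a short standard-library sketch that writes one CSV row per scan. write_ragged_rows and the intensity values are invented for the illustration. This is not the code used by GCMS_data.write().

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_ragged_rows(rows: Iterable[Sequence[float]], stream: TextIO) -> None:
    """Write one CSV line per scan. Rows may differ in length."""
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow(row)


# Three scans that recorded 3, 1 and 2 intensity values respectively.
intensities = [[120.0, 80.0, 15.0], [300.0], [42.0, 7.0]]

buffer = io.StringIO(newline="")
write_ragged_rows(intensities, buffer)
lines = buffer.getvalue().splitlines()
```

Each line has as many fields as the scan had values, so the file cannot in general be read back as a rectangular matrix.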
pyms.GCMS.Function
Provides conversion and information functions for GC-MS data objects.
Functions:
|
Compares two GCMS_data objects. |
|
Converts the window selection parameter into points based on the time step in an ion chromatogram. |
-
ic_window_points
(ic, window_sele, half_window=False)[source] Converts the window selection parameter into points based on the time step in an ion chromatogram.
- Parameters
ic (IonChromatogram) – Ion chromatogram object relevant for the conversion.
window_sele (Union[int, str]) – The window selection parameter. This can be an integer or time string. If an integer, it is taken as the number of points. If a string, it must be of the form '<NUMBER>s' or '<NUMBER>m', specifying a time in seconds or minutes, respectively.
half_window (bool) – Specifies whether to return half-window. Default False.
- Author
Vladimir Likic
- Return type
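A rough sketch of the conversion, under stated assumptions: the time step is passed in directly rather than read from an IonChromatogram, results are rounded down, and an integer half-window is taken as the integer half of the point count. window_to_points is a hypothetical name and the library may validate its input differently.

```python
import math
from typing import Union


def window_to_points(
    window: Union[int, str],
    time_step: float,
    half_window: bool = False,
) -> int:
    """Convert a point count or a '<NUMBER>s' / '<NUMBER>m' string to points."""
    if isinstance(window, int):
        # An integer is already a number of points.
        return window // 2 if half_window else window

    if len(window) < 2 or window[-1] not in ("s", "m"):
        raise ValueError(f"Unrecognised time string: {window!r}")

    seconds = float(window[:-1])
    if window[-1] == "m":
        seconds *= 60.0
    if half_window:
        seconds /= 2.0
    # Number of whole time steps that fit into the window (assumed rounding).
    return int(math.floor(seconds / time_step))


full_points = window_to_points("3s", time_step=0.5)
minute_points = window_to_points("1m", time_step=0.5)
half_points = window_to_points(7, time_step=0.5, half_window=True)
```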
pyms.GCMS.IO
Input/output functions for GC-MS data files.
pyms.GCMS.IO.ANDI
Functions for reading ANDI-MS data files.
Functions:
|
A reader for ANDI-MS NetCDF files. |
pyms.GCMS.IO.JCAMP
Functions for I/O of data in JCAMP-DX format.
Functions:
|
Generic reader for JCAMP DX files. |
pyms.GCMS.IO.MZML
Functions for reading mzML format data files.
Functions:
|
A reader for mzML files. |
pyms.IntensityMatrix
Class to model Intensity Matrix.
Classes:
|
Enumeration of supported ASCII filetypes for |
|
Base class for intensity matrices of binned raw data. |
|
Intensity matrix of binned raw data. |
Functions:
|
Sets the full intensity matrix with flexible bins. |
|
Sets the full intensity matrix with integer bins. |
|
Imports data in LECO CSV format. |
-
enum
AsciiFiletypes
(value)[source] Bases:
enum_tools.custom_enums.IntEnum
Enumeration of supported ASCII filetypes for export_ascii().
New in version 2.3.0.
- Member Type
Valid values are as follows:
-
ASCII_DAT
= <AsciiFiletypes.ASCII_DAT: 1> Tab-delimited ASCII file
-
ASCII_CSV
= <AsciiFiletypes.ASCII_CSV: 0> Comma-separated values file
-
class
BaseIntensityMatrix
(time_list, mass_list, intensity_array)[source] Bases: pymsBaseClass, TimeListMixin, MassListMixin, IntensityArrayMixin, GetIndexTimeMixin
Base class for intensity matrices of binned raw data.
- Parameters
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions and properties)
Methods:
__eq__
(other)Return whether this intensity matrix object is equal to another object.
__len__
()Returns the number of scans in the intensity matrix.
crop_mass
(mass_min, mass_max)Crops mass spectrum.
dump
(file_name[, protocol]) Dumps an object to a file through pickle.dump().
get_ic_at_index
(ix)Returns the ion chromatogram at the specified index.
get_index_at_time
(time)Returns the nearest index corresponding to the given time.
get_index_of_mass
(mass)Returns the index of the nearest binned mass to the given mass.
Returns binned mass at index.
get_ms_at_index
(ix)Returns a mass spectrum for a given scan index.
Returns the spectral intensities for scan index.
Returns time at given index.
Iterate over column indices.
Iterates over row indices.
null_mass
(mass)Ignore given (closest) mass in spectra.
reduce_mass_spectra
([n_intensities])Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
set_ic_at_index
(ix, ic) Sets the intensity of the mass at index ix in each scan to a new value.
Attributes:
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns a list of the masses.
Returns the intensity matrix as a list of lists of floats.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
Gets the size of intensity matrix.
Returns a copy of the time list.
-
__eq__
(other)[source] Return whether this intensity matrix object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through pickle.dump().
-
get_ic_at_index
(ix)[source] Returns the ion chromatogram at the specified index.
- Parameters
ix (int) – Index of an ion chromatogram in the intensity data matrix.
- Return type
- Returns
Ion chromatogram at given index.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (float) – Time in seconds
- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns -1 if no index is found.
-
get_ms_at_index
(ix)[source] Returns a mass spectrum for a given scan index.
- Parameters
ix (int) – The index of the scan.
- Author
Andrew Isaac
- Return type
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
property
mass_list
Returns a list of the masses.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
null_mass
(mass)[source] Ignore given (closest) mass in spectra.
- Parameters
mass (float) – Mass value to remove
- Author
Andrew Isaac
-
reduce_mass_spectra
(n_intensities=5)[source] Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
- Parameters
n_intensities (int) – The number of top intensities to keep. Default 5.
- Author
Vladimir Likic
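The idea of retaining only the top n_intensities values can be sketched for a single scan as follows. The sketch assumes that discarded intensities are set to zero, which is an assumption about the behaviour rather than something stated here. keep_top_intensities is a hypothetical helper.

```python
from typing import List, Sequence


def keep_top_intensities(scan: Sequence[float], n_intensities: int = 5) -> List[float]:
    """Zero every value in ``scan`` except the ``n_intensities`` largest."""
    n_keep = max(n_intensities, 0)
    # Indices ordered from the most to the least intense value.
    ranked = sorted(range(len(scan)), key=lambda i: scan[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [value if i in keep else 0.0 for i, value in enumerate(scan)]


scan = [5.0, 1.0, 9.0, 3.0, 7.0]
reduced = keep_top_intensities(scan, n_intensities=2)
```

Applying the same function to every scan of a matrix gives a reduced matrix with the original shape.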
-
set_ic_at_index
(ix, ic)[source] Sets the intensity of the mass at index ix in each scan to a new value.
- Parameters
ix (int) – Index of an ion chromatogram in the intensity data matrix to be set
ic (IonChromatogram) – Ion chromatogram that will be copied at position ix in the data matrix
The length of the ion chromatogram must match the appropriate dimension of the intensity matrix.
- Author
Vladimir Likic
-
property
size
Gets the size of intensity matrix.
-
class
IntensityMatrix
(time_list, mass_list, intensity_array)[source] Bases:
BaseIntensityMatrix
Intensity matrix of binned raw data.
- Parameters
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions and properties)
Methods:
__eq__
(other)Return whether this intensity matrix object is equal to another object.
__len__
()Returns the number of scans in the intensity matrix.
crop_mass
(mass_min, mass_max)Crops mass spectrum.
dump
(file_name[, protocol]) Dumps an object to a file through pickle.dump().
export_ascii
(root_name[, fmt])Exports the intensity matrix, retention time vector, and m/z vector to the ascii format.
export_leco_csv
(file_name)Exports data in LECO CSV format.
get_ic_at_index
(ix)Returns the ion chromatogram at the specified index.
get_ic_at_mass
([mass])Returns the ion chromatogram for the nearest binned mass to the specified mass.
get_index_at_time
(time)Returns the nearest index corresponding to the given time.
get_index_of_mass
(mass)Returns the index of the nearest binned mass to the given mass.
Returns binned mass at index.
get_ms_at_index
(ix)Returns a mass spectrum for a given scan index.
Returns the spectral intensities for scan index.
Returns time at given index.
Iterate over local column indices.
Iterates over the local row indices.
null_mass
(mass)Ignore given (closest) mass in spectra.
reduce_mass_spectra
([n_intensities])Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
set_ic_at_index
(ix, ic) Sets the intensity of the mass at index ix in each scan to a new value.
Attributes:
Constructs a Base Peak Chromatogram from the data.
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Gets the local size of intensity matrix.
Returns a list of the masses.
Returns the intensity matrix as a list of lists of floats.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
Gets the size of intensity matrix.
Returns the TIC of the intensity matrix.
Returns a copy of the time list.
-
__eq__
(other) Return whether this intensity matrix object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
property
bpc
Constructs a Base Peak Chromatogram from the data.
This represents the most intense ion for each scan.
- Authors
Dominic Davis-Foster
New in version 2.3.0.
- Return type
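As an illustration of what a base peak chromatogram contains, the sketch below takes an intensity matrix stored as a list of scans (rows) and keeps the largest intensity of each scan. The row-per-scan layout and the function name are assumptions made for the example. The property itself returns an IonChromatogram rather than a list.

```python
from typing import List, Sequence


def base_peak_intensities(intensity_rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the most intense value of each scan, or 0.0 for an empty scan."""
    return [max(row) if row else 0.0 for row in intensity_rows]


# Two scans over three binned masses (illustrative numbers).
matrix = [
    [10.0, 250.0, 30.0],
    [400.0, 5.0, 60.0],
]
bpc_values = base_peak_intensities(matrix)
```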
-
crop_mass
(mass_min, mass_max) Crops mass spectrum.
-
dump
(file_name, protocol=3) Dumps an object to a file through pickle.dump().
-
export_ascii
(root_name, fmt=<AsciiFiletypes.ASCII_DAT: 1>)[source] Exports the intensity matrix, retention time vector, and m/z vector to the ascii format.
By default, export_ascii(“NAME”) will create NAME.im.dat, NAME.rt.dat, and NAME.mz.dat where these are the intensity matrix, retention time vector, and m/z vector in tab delimited format.
If fmt == <AsciiFiletypes.ASCII_CSV>, the files will be in the CSV format, named NAME.im.csv, NAME.rt.csv, and NAME.mz.csv.
- Parameters
root_name (Union[str, Path, PathLike]) – Root name for the output files
fmt (AsciiFiletypes) – Format of the output file, either <AsciiFiletypes.ASCII_DAT> or <AsciiFiletypes.ASCII_CSV>. Default <AsciiFiletypes.ASCII_DAT: 1>.
- Authors
Milica Ng, Andrew Isaac, Vladimir Likic, Dominic Davis-Foster (pathlib support)
-
get_ic_at_index
(ix) Returns the ion chromatogram at the specified index.
- Parameters
ix (
int
) – Index of an ion chromatogram in the intensity data matrix.- Return type
- Returns
Ion chromatogram at given index.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
-
get_ic_at_mass
(mass=None)[source] Returns the ion chromatogram for the nearest binned mass to the specified mass.
If no mass value is given, the function returns the total ion chromatogram.
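The lookup described above can be sketched in plain Python. This is an illustration only, assuming a list-of-lists intensity matrix and a parallel list of binned masses; `nearest_mass_index` and `ic_at_mass` are hypothetical helpers, not PyMassSpec functions.

```python
from typing import List, Optional, Sequence


def nearest_mass_index(mass_list: Sequence[float], mass: float) -> int:
    """Index of the binned mass closest to ``mass`` (first one wins on ties)."""
    return min(range(len(mass_list)), key=lambda i: abs(mass_list[i] - mass))


def ic_at_mass(
    intensity_matrix: Sequence[Sequence[float]],
    mass_list: Sequence[float],
    mass: Optional[float] = None,
) -> List[float]:
    """Return one column of the matrix, or the row sums when no mass is given."""
    if mass is None:
        # No mass requested: fall back to the total ion chromatogram.
        return [sum(scan) for scan in intensity_matrix]
    ix = nearest_mass_index(mass_list, mass)
    return [scan[ix] for scan in intensity_matrix]


masses = [50.0, 51.0, 52.0]
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(ic_at_mass(matrix, masses, 51.2))
print(ic_at_mass(matrix, masses))
```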
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (
float
) – Time in seconds- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns
-1
if no index is found.
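A nearest-index lookup on a sorted time list can be sketched with the standard library. This is a hedged illustration, not the library's implementation: it assumes the time list is sorted in ascending order, and it assumes that "no index is found" corresponds to an empty time list. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def index_at_time(time_list: Sequence[float], time: float) -> int:
    """Return the index whose time is closest to ``time``, or -1 if there is none."""
    if not time_list:
        # Assumption: an empty time list is the "no index found" case.
        return -1
    pos = bisect_left(time_list, time)
    if pos == 0:
        return 0
    if pos == len(time_list):
        return len(time_list) - 1
    before, after = time_list[pos - 1], time_list[pos]
    # Pick whichever neighbour is closer; the earlier one wins on a tie.
    return pos - 1 if time - before <= after - time else pos


times = [0.0, 1.0, 2.0, 3.0]
print(index_at_time(times, 1.4))
print(index_at_time(times, 1.6))
```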
-
get_index_of_mass
(mass) Returns the index of the nearest binned mass to the given mass.
-
get_mass_at_index
(ix) Returns binned mass at index.
-
get_ms_at_index
(ix) Returns a mass spectrum for a given scan index.
- Parameters
ix (
int
) – The index of the scan.- Author
Andrew Isaac
- Return type
-
get_scan_at_index
(ix) Returns the spectral intensities for scan index.
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
property
local_size
Gets the local size of intensity matrix.
- Returns
Number of rows and cols
- Return type
- Author
Luke Hodkinson
-
property
mass_list
Returns a list of the masses.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
null_mass
(mass) Ignore given (closest) mass in spectra.
- Parameters
mass (
float
) – Mass value to remove- Author
Andrew Isaac
-
reduce_mass_spectra
(n_intensities=5) Reduces the mass spectra by retaining the top n_intensities, discarding all other intensities.
- Parameters
n_intensities (
int
) – The number of top intensities to keep. Default5
.- Author
Vladimir Likic
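Retaining only the top n intensities of a single spectrum can be sketched as follows. This is an illustration under stated assumptions: discarded intensities are set to zero (the source only says they are discarded), and `keep_top_n` is a hypothetical helper rather than a PyMassSpec function.

```python
from typing import List, Sequence


def keep_top_n(intensities: Sequence[float], n_intensities: int = 5) -> List[float]:
    """Keep the ``n_intensities`` largest values and zero out the rest."""
    if n_intensities <= 0:
        return [0.0] * len(intensities)
    # Rank the positions by intensity, largest first, and keep the first n.
    ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    keep = set(ranked[:n_intensities])
    return [value if i in keep else 0.0 for i, value in enumerate(intensities)]


spectrum = [1.0, 9.0, 3.0, 7.0, 5.0, 2.0]
reduced = keep_top_n(spectrum, n_intensities=2)
print(reduced)
```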
-
set_ic_at_index
(ix, ic) Sets the intensity of the mass at index ix in each scan to a new value.
- Parameters
ix (int) – Index of an ion chromatogram in the intensity data matrix to be set
ic (IonChromatogram) – Ion chromatogram that will be copied at position ix in the data matrix
The length of the ion chromatogram must match the appropriate dimension of the intensity matrix.
- Author
Vladimir Likic
-
property
size
Gets the size of intensity matrix.
-
property
tic
Returns the TIC of the intensity matrix.
New in version 2.3.0.
- Return type
-
build_intensity_matrix
(data, bin_interval=1, bin_left=0.5, bin_right=0.5, min_mass=None)[source] Sets the full intensity matrix with flexible bins.
The first bin is centered around min_mass, and subsequent bins are offset by bin_interval.
- Parameters
data (GCMS_data) – Raw GCMS data
bin_interval (float) – interval between bin centres. Default 1.
bin_left (float) – left bin boundary offset. Default 0.5.
bin_right (float) – right bin boundary offset. Default 0.5.
min_mass (Optional[float]) – Minimum mass to bin (default minimum mass from data). Default None.
- Return type
- Returns
Binned IntensityMatrix object
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
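The binning scheme described above (first bin centred on min_mass, later bins offset by bin_interval, each bin spanning centre - bin_left to centre + bin_right) can be sketched for a single scan. This is a rough illustration, not PyMassSpec's implementation: it assumes intensities that fall into the same bin are summed, and the helper names and the n_bins argument are hypothetical.

```python
import math
from typing import List, Sequence


def bin_index(mass: float, min_mass: float, bin_interval: float = 1.0, bin_left: float = 0.5) -> int:
    """Index of the bin whose left edge is at or below ``mass``."""
    return math.floor((mass - (min_mass - bin_left)) / bin_interval)


def bin_scan(
    masses: Sequence[float],
    intensities: Sequence[float],
    min_mass: float,
    n_bins: int,
    bin_interval: float = 1.0,
    bin_left: float = 0.5,
    bin_right: float = 0.5,
) -> List[float]:
    """Accumulate one scan's raw (mass, intensity) pairs into fixed bins."""
    binned = [0.0] * n_bins
    for mass, intensity in zip(masses, intensities):
        ix = bin_index(mass, min_mass, bin_interval, bin_left)
        centre = min_mass + ix * bin_interval
        # Skip masses outside the binned range or in a gap between bins.
        if 0 <= ix < n_bins and centre - bin_left <= mass < centre + bin_right:
            binned[ix] += intensity  # assumption: intensities in a bin are summed
    return binned


raw_masses = [50.2, 50.4, 51.1, 52.6]
raw_intensities = [10.0, 5.0, 7.0, 3.0]
binned_scan = bin_scan(raw_masses, raw_intensities, min_mass=50.0, n_bins=4)
print(binned_scan)
```

With the default offsets of 0.5 and an interval of 1, the bins tile the mass axis with no gaps; narrower offsets would leave masses between bins unassigned in this sketch.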
pyms.IonChromatogram
Classes to model a GC-MS Ion Chromatogram.
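Classes:
BasePeakChromatogram(intensity_list, time_list) | Models a base peak chromatogram (BPC).
ExtractedIonChromatogram(intensity_list, time_list, masses) | Models an extracted ion chromatogram (EIC).
IonChromatogram(intensity_list, time_list[, mass]) | Models an ion chromatogram.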
-
class
BasePeakChromatogram
(intensity_list, time_list)[source] Bases:
IonChromatogram
Models a base peak chromatogram (BPC).
An ion chromatogram is a set of intensities as a function of retention time. This can be either m/z channel intensities (for example, ion chromatograms at m/z = 65), or cumulative intensities over all measured m/z. In the latter case the ion chromatogram is a total ion chromatogram (TIC).
- Parameters
- Authors
Lewis Lee, Vladimir Likic, Dominic Davis-Foster (type assertions and properties)
New in version 2.3.0.
Methods:
__copy__
()Returns a new IonChromatogram containing a copy of the data in this object.
__eq__
(other)Return whether this IonChromatogram object is equal to another object.
__len__
()Returns the length of the IonChromatogram object.
__sub__
(other)Subtracts another IC from the current one (in place).
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.get_index_at_time
(time)Returns the nearest index corresponding to the given time.
Returns the intensity at the given index.
Returns time at given index.
is_bpc
()Returns whether the ion chromatogram is a base peak chromatogram (BPC).
is_eic
()Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
is_tic
()Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
write
(file_name[, minutes, formatting])Writes the ion chromatogram to the specified file.
Attributes:
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns the m/z channel of the IC.
Returns the intensity matrix as a list of lists of floats.
Returns a copy of the time list.
Returns the time step.
-
__copy__
() Returns a new IonChromatogram containing a copy of the data in this object.
- Return type
-
__eq__
(other) Return whether this IonChromatogram object is equal to another object.
-
__len__
() Returns the length of the IonChromatogram object.
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
__sub__
(other) Subtracts another IC from the current one (in place).
- Parameters
other (
IonChromatogram
) – Another IC- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (
float
) – Time in seconds- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns
-1
if no index is found.
-
get_intensity_at_index
(ix) Returns the intensity at the given index.
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
static
is_bpc
()[source] Returns whether the ion chromatogram is a base peak chromatogram (BPC).
- Return type
-
static
is_eic
() Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
New in version 2.3.0.
- Return type
-
is_tic
() Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
property
mass
Returns the m/z channel of the IC.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
time_list
Returns a copy of the time list.
-
write
(file_name, minutes=False, formatting=True) Writes the ion chromatogram to the specified file.
- Parameters
- Authors
Lewis Lee, Vladimir Likic, Dominic Davis-Foster (pathlib support)
-
class
ExtractedIonChromatogram
(intensity_list, time_list, masses)[source] Bases:
IonChromatogram
Models an extracted ion chromatogram (EIC).
An ion chromatogram is a set of intensities as a function of retention time. This can be either m/z channel intensities (for example, ion chromatograms at m/z = 65), or cumulative intensities over all measured m/z. In the latter case the ion chromatogram is a total ion chromatogram (TIC).
- Parameters
- Authors
Lewis Lee, Vladimir Likic, Dominic Davis-Foster (type assertions and properties)
New in version 2.3.0.
Methods:
__copy__
()Returns a new IonChromatogram containing a copy of the data in this object.
__eq__
(other)Return whether this IonChromatogram object is equal to another object.
__len__
()Returns the length of the IonChromatogram object.
__sub__
(other)Subtracts another IC from the current one (in place).
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.get_index_at_time
(time)Returns the nearest index corresponding to the given time.
Returns the intensity at the given index.
Returns time at given index.
is_bpc
()Returns whether the ion chromatogram is a base peak chromatogram (BPC).
is_eic
()Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
is_tic
()Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
write
(file_name[, minutes, formatting])Writes the ion chromatogram to the specified file.
Attributes:
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns the m/z channel of the IC.
List of extracted masses in the EIC.
Returns the intensity matrix as a list of lists of floats.
Returns a copy of the time list.
Returns the time step.
-
__copy__
() Returns a new IonChromatogram containing a copy of the data in this object.
- Return type
-
__eq__
(other) Return whether this IonChromatogram object is equal to another object.
-
__len__
() Returns the length of the IonChromatogram object.
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
__sub__
(other) Subtracts another IC from the current one (in place).
- Parameters
other (
IonChromatogram
) – Another IC- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (
float
) – Time in seconds- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns
-1
if no index is found.
-
get_intensity_at_index
(ix) Returns the intensity at the given index.
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
static
is_bpc
() Returns whether the ion chromatogram is a base peak chromatogram (BPC).
New in version 2.3.0.
- Return type
-
static
is_eic
()[source] Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
- Return type
-
is_tic
() Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
property
mass
Returns the m/z channel of the IC.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
time_list
Returns a copy of the time list.
-
write
(file_name, minutes=False, formatting=True) Writes the ion chromatogram to the specified file.
- Parameters
- Authors
Lewis Lee, Vladimir Likic, Dominic Davis-Foster (pathlib support)
-
class
IonChromatogram
(intensity_list, time_list, mass=None)[source] Bases:
pymsBaseClass
,TimeListMixin
,IntensityArrayMixin
,GetIndexTimeMixin
Models an ion chromatogram.
An ion chromatogram is a set of intensities as a function of retention time. This can be either m/z channel intensities (for example, ion chromatograms at m/z = 65), or cumulative intensities over all measured m/z. In the latter case the ion chromatogram is a total ion chromatogram (TIC).
The nature of an IonChromatogram object can be revealed by inspecting the value of the attribute 'mass'. This is set to the m/z value of the ion chromatogram, or to None for TIC.
- Parameters
- Authors
Lewis Lee, Vladimir Likic, Dominic Davis-Foster (type assertions and properties)
Changed in version 2.3.0: The ia parameter was renamed to intensity_list.
Methods:
__copy__
()Returns a new IonChromatogram containing a copy of the data in this object.
__eq__
(other)Return whether this IonChromatogram object is equal to another object.
__len__
()Returns the length of the IonChromatogram object.
__sub__
(other)Subtracts another IC from the current one (in place).
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.get_index_at_time
(time)Returns the nearest index corresponding to the given time.
Returns the intensity at the given index.
Returns time at given index.
is_bpc
()Returns whether the ion chromatogram is a base peak chromatogram (BPC).
is_eic
()Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
is_tic
()Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
write
(file_name[, minutes, formatting])Writes the ion chromatogram to the specified file.
Attributes:
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns the m/z channel of the IC.
Returns the intensity matrix as a list of lists of floats.
Returns a copy of the time list.
Returns the time step.
-
__copy__
()[source] Returns a new IonChromatogram containing a copy of the data in this object.
- Return type
-
__len__
()[source] Returns the length of the IonChromatogram object.
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
__sub__
(other)[source] Subtracts another IC from the current one (in place).
- Parameters
other (
IonChromatogram
) – Another IC- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
get_index_at_time
(time) Returns the nearest index corresponding to the given time.
- Parameters
time (
float
) – Time in seconds- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns
-1
if no index is found.
-
get_time_at_index
(ix) Returns time at given index.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
static
is_bpc
()[source] Returns whether the ion chromatogram is a base peak chromatogram (BPC).
New in version 2.3.0.
- Return type
-
static
is_eic
()[source] Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
New in version 2.3.0.
- Return type
-
is_tic
()[source] Returns whether the ion chromatogram is a total ion chromatogram (TIC) or extracted ion chromatogram (EIC).
- Authors
Lewis Lee, Vladimir Likic
- Return type
-
property
mass
Returns the m/z channel of the IC.
-
property
matrix_list
Returns the intensity matrix as a list of lists of floats.
- Return type
- Returns
Matrix of intensity values
- Author
Andrew Isaac
-
property
time_list
Returns a copy of the time list.
pyms.json
Custom JSON Encoder to support PyMassSpec classes.
Classes:
PyMassSpecEncoder(*args, **kwargs) | Custom JSON Encoder to support PyMassSpec classes.
-
class
PyMassSpecEncoder
(*args, **kwargs)[source] Bases:
JSONEncoder
Custom JSON Encoder to support PyMassSpec classes.
Methods:
default
(o)Implement this method in a subclass such that it returns a serializable object for
o
, or calls the base implementation (to raise aTypeError
).encode
(o)Return a JSON string representation of a Python data structure.
iterencode
(o[, _one_shot])Encode the given object and yield each string representation as available.
-
default
(o)[source] Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
-
encode
(o) Return a JSON string representation of a Python data structure.
    >>> from json.encoder import JSONEncoder
    >>> JSONEncoder().encode({"foo": ["bar", "baz"]})
    '{"foo": ["bar", "baz"]}'
- Return type
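The pattern behind a custom encoder such as PyMassSpecEncoder can be sketched with the standard library alone. In the sketch below, `Spectrum` and `SpectrumEncoder` are hypothetical stand-ins, and the dictionary layout is an assumption made for illustration; it does not reproduce how PyMassSpecEncoder actually serialises PyMassSpec objects.

```python
import json
from typing import Any, Sequence


class Spectrum:
    """Hypothetical stand-in for a class that json cannot serialise natively."""

    def __init__(self, mass_list: Sequence[float], intensity_list: Sequence[float]):
        self.mass_list = list(mass_list)
        self.intensity_list = list(intensity_list)


class SpectrumEncoder(json.JSONEncoder):
    """Encoder that converts Spectrum objects to plain dictionaries."""

    def default(self, o: Any) -> Any:
        if isinstance(o, Spectrum):
            return {"mass_list": o.mass_list, "intensity_list": o.intensity_list}
        # Anything else is passed to the base class, which raises TypeError.
        return super().default(o)


encoded = json.dumps(Spectrum([50, 51], [10.0, 20.0]), cls=SpectrumEncoder)
print(encoded)
```

Passing the encoder through the `cls` argument of `json.dumps` keeps the serialisation logic out of the data classes themselves.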
-
pyms.Mixins
Mixins for PyMassSpec Classes.
Classes:
Mixin class for retention time attributes and methods. |
|
Mixin class for |
|
Mixin class to add the |
|
Mixin class to add the |
|
Mixin class to add the |
-
class
GetIndexTimeMixin
[source] Bases:
object
Mixin class for retention time attributes and methods.
Methods:
get_index_at_time
(time)Returns the nearest index corresponding to the given time.
Returns time at given index.
-
get_index_at_time
(time)[source] Returns the nearest index corresponding to the given time.
- Parameters
time (
float
) – Time in seconds- Return type
- Returns
Nearest index corresponding to given time
- Authors
Lewis Lee, Tim Erwin, Vladimir Likic
Changed in version 2.3.0: Now returns
-1
if no index is found.
-
-
class
IntensityArrayMixin
[source] Bases:
object
Mixin class for
intensity_array
attribute.Attributes:
Returns a copy of the intensity array.
Returns a copy of the intensity array as a list of lists of floats.
Returns a copy of the intensity matrix.
Returns the intensity matrix as a list of lists of floats.
-
property
intensity_array
Returns a copy of the intensity array.
- Return type
- Returns
Matrix of intensity values.
- Authors
Andrew Isaac, Lewis Lee
-
property
intensity_array_list
Returns a copy of the intensity array as a list of lists of floats.
-
property
intensity_matrix
Returns a copy of the intensity matrix.
- Return type
- Returns
Matrix of intensity values.
- Author
Andrew Isaac
-
property
-
class
MassListMixin
[source] Bases:
MaxMinMassMixin
Mixin class to add the mass_list property, which returns a copy of the internal _mass_list attribute.
Attributes:
Returns a list of the masses.
-
class
MaxMinMassMixin
[source] Bases:
object
Mixin class to add the min_mass and max_mass properties, which provide read-only access to the internal _min_mass and _max_mass attributes.
Attributes:
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
pyms.Spectrum
Classes to model Mass Spectra and Scans.
Classes:
|
Represents a composite mass spectrum. |
|
Models a binned mass spectrum. |
|
Generic object for a single Scan’s raw data. |
Data:
Invariant |
|
Invariant |
|
Invariant |
Functions:
|
Convert the given numpy array to a numeric data type. |
|
Normalize the intensities in the given Mass Spectrum to values between |
-
class
CompositeMassSpectrum
(mass_list, intensity_list)[source] Bases:
MassSpectrum
Represents a composite mass spectrum.
- Parameters
- Author
Dominic Davis-Foster
Methods:
__copy__
()Returns a copy of the object.
__eq__
(other)Return whether this object is equal to another object.
__len__
()Returns the length of the object.
crop
([min_mz, max_mz, inplace])Crop the Mass Spectrum between the given mz values.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.from_dict
(dictionary)Create a
Scan
from a dictionary.from_jcamp
(file_name)Create a MassSpectrum from a JCAMP-DX file.
from_mz_int_pairs
(mz_int_pairs)Construct a MassSpectrum from a list of (m/z, intensity) tuples.
from_spectra
(spectra)Construct a
CompositeMassSpectrum
from multipleMassSpectrum
objects.get_intensity_for_mass
(mass)Returns the intensity for the given mass.
get_mass_for_intensity
(intensity)Returns the mass for the given intensity.
icrop
([min_index, max_index, inplace])Crop the Mass Spectrum between the given indices.
Iterate over the peaks in the mass spectrum.
Returns the indices of the
n
largest peaks in the Mass Spectrum.Attributes:
Returns a copy of the intensity list.
Returns a list of the masses.
Returns the intensity list.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
The number of mass spectra combined to create this composite spectrum.
-
__eq__
(other) Return whether this object is equal to another object.
-
__len__
() Returns the length of the object.
- Authors
Andrew Isaac, Qiao Wang, Vladimir Likic
- Return type
-
crop
(min_mz=None, max_mz=None, inplace=False) Crop the Mass Spectrum between the given mz values.
- Parameters
- Return type
- Returns
The cropped Mass Spectrum
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
classmethod
from_dict
(dictionary) Create a Scan from a dictionary.
The dictionary's keys must match the arguments taken by the class's constructor.
-
classmethod
from_jcamp
(file_name) Create a MassSpectrum from a JCAMP-DX file.
-
classmethod
from_mz_int_pairs
(mz_int_pairs) Construct a MassSpectrum from a list of (m/z, intensity) tuples.
-
classmethod
from_spectra
(spectra)[source] Construct a CompositeMassSpectrum from multiple MassSpectrum objects.
If no MassSpectrum objects are given, an empty CompositeMassSpectrum is returned.
- Parameters
spectra (Iterable[MassSpectrum])
- Return type
-
get_intensity_for_mass
(mass) Returns the intensity for the given mass.
-
get_mass_for_intensity
(intensity) Returns the mass for the given intensity. If more than one mass has the given intensity, the first mass is returned.
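The "first match wins" rule can be shown with two parallel lists built from (m/z, intensity) pairs. This is a minimal sketch with hypothetical helper names; returning None when no mass has the requested intensity is an assumption, since the source does not say what happens in that case.

```python
from typing import List, Optional, Sequence, Tuple


def split_pairs(mz_int_pairs: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Split (m/z, intensity) tuples into parallel mass and intensity lists."""
    mass_list = [mz for mz, _ in mz_int_pairs]
    intensity_list = [intensity for _, intensity in mz_int_pairs]
    return mass_list, intensity_list


def mass_for_intensity(
    mass_list: Sequence[float],
    intensity_list: Sequence[float],
    intensity: float,
) -> Optional[float]:
    """Return the first mass whose intensity equals ``intensity``."""
    for mass, value in zip(mass_list, intensity_list):
        if value == intensity:
            return mass
    return None  # assumption: no match is reported as None


pair_masses, pair_intensities = split_pairs([(50, 10.0), (51, 30.0), (52, 30.0)])
print(mass_for_intensity(pair_masses, pair_intensities, 30.0))
```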
-
icrop
(min_index=0, max_index=- 1, inplace=False) Crop the Mass Spectrum between the given indices.
- Parameters
- Return type
- Returns
The cropped Mass Spectrum
-
property
intensity_list
Returns a copy of the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
iter_peaks
() Iterate over the peaks in the mass spectrum.
-
property
mass_list
Returns a list of the masses.
-
property
mass_spec
Returns the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
n_largest_peaks
(n) Returns the indices of the
n
largest peaks in the Mass Spectrum.
-
class
MassSpectrum
(mass_list, intensity_list)[source] Bases:
Scan
Models a binned mass spectrum.
- Parameters
- Authors
Andrew Isaac, Qiao Wang, Vladimir Likic, Dominic Davis-Foster
Methods:
__copy__
()Returns a copy of the object.
__eq__
(other)Return whether this object is equal to another object.
__len__
()Returns the length of the object.
crop
([min_mz, max_mz, inplace])Crop the Mass Spectrum between the given mz values.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.from_dict
(dictionary)Create a
Scan
from a dictionary.from_jcamp
(file_name)Create a MassSpectrum from a JCAMP-DX file.
from_mz_int_pairs
(mz_int_pairs)Construct a MassSpectrum from a list of (m/z, intensity) tuples.
get_intensity_for_mass
(mass)Returns the intensity for the given mass.
get_mass_for_intensity
(intensity)Returns the mass for the given intensity.
icrop
([min_index, max_index, inplace])Crop the Mass Spectrum between the given indices.
Iterate over the peaks in the mass spectrum.
Returns the indices of the
n
largest peaks in the Mass Spectrum.Attributes:
Returns a copy of the intensity list.
Returns a list of the masses.
Returns the intensity list.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
-
__eq__
(other) Return whether this object is equal to another object.
-
__len__
() Returns the length of the object.
- Authors
Andrew Isaac, Qiao Wang, Vladimir Likic
- Return type
-
crop
(min_mz=None, max_mz=None, inplace=False)[source] Crop the Mass Spectrum between the given mz values.
- Parameters
- Return type
- Returns
The cropped Mass Spectrum
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
classmethod
from_dict
(dictionary) Create a Scan from a dictionary.
The dictionary's keys must match the arguments taken by the class's constructor.
-
classmethod
from_mz_int_pairs
(mz_int_pairs)[source] Construct a MassSpectrum from a list of (m/z, intensity) tuples.
-
get_mass_for_intensity
(intensity)[source] Returns the mass for the given intensity. If more than one mass has the given intensity, the first mass is returned.
-
icrop
(min_index=0, max_index=- 1, inplace=False)[source] Crop the Mass Spectrum between the given indices.
- Parameters
- Return type
- Returns
The cropped Mass Spectrum
-
property
intensity_list
Returns a copy of the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
iter_peaks
() Iterate over the peaks in the mass spectrum.
-
property
mass_list
Returns a list of the masses.
-
property
mass_spec
Returns the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
property
min_mass
Returns the minimum m/z value in the spectrum.
-
class
Scan
(mass_list, intensity_list)[source] Bases:
pymsBaseClass
,MassListMixin
Generic object for a single Scan’s raw data.
- Parameters
- Authors
Andrew Isaac, Qiao Wang, Vladimir Likic, Dominic Davis-Foster
Methods:
__copy__
()Returns a copy of the object.
__eq__
(other)Return whether this object is equal to another object.
__len__
()Returns the length of the object.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.from_dict
(dictionary)Create a
Scan
from a dictionary.Iterate over the peaks in the mass spectrum.
Attributes:
Returns a copy of the intensity list.
Returns a list of the masses.
Returns the intensity list.
Returns the maximum m/z value in the spectrum.
Returns the minimum m/z value in the spectrum.
-
__len__
()[source] Returns the length of the object.
- Authors
Andrew Isaac, Qiao Wang, Vladimir Likic
- Return type
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
classmethod
from_dict
(dictionary)[source] Create a Scan from a dictionary.
The dictionary's keys must match the arguments taken by the class's constructor.
-
property
intensity_list
Returns a copy of the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
property
mass_list
Returns a list of the masses.
-
property
mass_spec
Returns the intensity list.
- Authors
Qiao Wang, Andrew Isaac, Vladimir Likic
- Return type
-
property
max_mass
Returns the maximum m/z value in the spectrum.
-
_C
= TypeVar(_C, bound=CompositeMassSpectrum) Type: TypeVar
Invariant TypeVar bound to pyms.Spectrum.CompositeMassSpectrum.
-
_M
= TypeVar(_M, bound=MassSpectrum) Type: TypeVar
Invariant TypeVar bound to pyms.Spectrum.MassSpectrum.
-
_S
= TypeVar(_S, bound=Scan) Type: TypeVar
Invariant TypeVar bound to pyms.Spectrum.Scan.
-
array_as_numeric
(array)[source] Convert the given numpy array to a numeric data type.
If the data in the array is already in a numeric data type no changes will be made.
If array is a Python Sequence then it will first be converted to a numpy array.
-
normalize_mass_spec
(mass_spec, relative_to=None, inplace=False, max_intensity=100)[source] Normalize the intensities in the given Mass Spectrum to values between 0 and max_intensity, which by default is 100.0.
- Parameters
mass_spec (MassSpectrum) – The Mass Spectrum to normalize
relative_to (Optional[float]) – The largest intensity in the original data set. If not None the intensities are computed relative to this value. If None the value is calculated from the mass spectrum. This can be useful when normalizing several mass spectra to each other. Default None.
inplace (bool) – Whether the normalization should be applied to the MassSpectrum object given, or to a copy (default behaviour). Default False.
max_intensity (float) – The maximum intensity in the normalized spectrum. If omitted the range 0-100.0 is used. If an integer the normalized intensities will be integers. Default 100.
- Return type
- Returns
The normalized mass spectrum
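The scaling described by the parameters above can be sketched on a plain list of intensities. This is a hedged illustration rather than the library's code: `normalize_intensities` is a hypothetical helper, and rounding to the nearest integer when max_intensity is an int is an assumption about how integer output is produced.

```python
from typing import List, Optional, Sequence, Union


def normalize_intensities(
    intensities: Sequence[float],
    relative_to: Optional[float] = None,
    max_intensity: Union[int, float] = 100,
) -> List[Union[int, float]]:
    """Scale intensities so that ``relative_to`` maps to ``max_intensity``."""
    if relative_to is None:
        # No reference given: use the largest intensity in this spectrum.
        relative_to = max(intensities)
    scaled = [(value / relative_to) * max_intensity for value in intensities]
    if isinstance(max_intensity, int):
        # Assumption: integer output is obtained by rounding.
        return [int(round(value)) for value in scaled]
    return scaled


raw = [5.0, 10.0, 20.0]
print(normalize_intensities(raw, max_intensity=100.0))
print(normalize_intensities(raw))
# Normalizing against a shared reference keeps several spectra comparable.
print(normalize_intensities([5.0, 10.0], relative_to=20.0, max_intensity=100.0))
```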
pyms.Noise
Table of Contents
Noise processing functions.
pyms.Noise.Analysis
Noise analysis functions.
Functions:
window_analyzer(ic[, window, n_windows, rand_seed]) | A simple estimator of the signal noise based on randomly placed windows and median absolute deviation.
-
window_analyzer
(ic, window=256, n_windows=1024, rand_seed=None)[source] A simple estimator of the signal noise based on randomly placed windows and median absolute deviation.
The noise value is estimated by repeatedly picking random windows (of a specified width) and calculating the median absolute deviation (MAD) of each. The noise estimate is given by the minimum MAD.
- Parameters
- Return type
- Returns
The noise estimate.
- Author
Vladimir Likic
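The estimator described above can be sketched with the standard library. This is an illustration of the technique on a plain list of intensities, not the PyMassSpec implementation; the helper names are hypothetical, and clamping the window to the signal length is an assumption.

```python
import random
import statistics
from typing import Optional, Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of the absolute deviations from the median."""
    centre = statistics.median(values)
    return statistics.median([abs(value - centre) for value in values])


def estimate_noise(
    intensities: Sequence[float],
    window: int = 256,
    n_windows: int = 1024,
    rand_seed: Optional[int] = None,
) -> float:
    """Minimum MAD over randomly placed windows of the given width."""
    if not intensities:
        raise ValueError("intensities must not be empty")
    rng = random.Random(rand_seed)
    width = min(window, len(intensities))  # assumption: clamp to signal length
    last_start = len(intensities) - width
    mads = []
    for _ in range(n_windows):
        start = rng.randint(0, last_start)
        mads.append(median_absolute_deviation(intensities[start:start + width]))
    return min(mads)


baseline = [10.0, 12.0] * 50
signal = baseline + [500.0, 900.0, 500.0] + baseline
noise = estimate_noise(signal, window=10, n_windows=200, rand_seed=1)
print(noise)
```

Taking the minimum MAD favours windows that contain only baseline, so peaks in the signal have little influence on the estimate.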
pyms.Noise.SavitzkyGolay
Savitzky-Golay noise filter.
Functions:
savitzky_golay(ic[, window, degree]) | Applies Savitzky-Golay filter on an ion chromatogram.
savitzky_golay_im(im[, window, degree]) | Applies Savitzky-Golay filter on Intensity Matrix.
-
savitzky_golay
(ic, window=7, degree=2)[source] Applies Savitzky-Golay filter on an ion chromatogram.
- Parameters
ic (IonChromatogram) – The input ion chromatogram.
window (Union[int, str]) – The window selection parameter. This can be an integer or time string. If an integer, taken as the number of points. If a string, must be in the form '<NUMBER>s' or '<NUMBER>m', specifying a time in seconds or minutes, respectively. Default 7.
degree (int) – Degree of the fitting polynomial for the Savitzky-Golay filter. Default 2.
- Return type
- Returns
Smoothed ion chromatogram.
- Authors
Uwe Schmitt, Vladimir Likic, Dominic Davis-Foster
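For the default settings (window of 7 points, degree 2), Savitzky-Golay smoothing reduces to a fixed convolution with the well-known coefficients (-2, 3, 6, 7, 6, 3, -2) / 21. The sketch below applies those coefficients to a plain list; it is not the library's implementation, the function name is hypothetical, and leaving the first and last three points unchanged is an assumption about edge handling.

```python
from typing import List, Sequence

# Standard Savitzky-Golay smoothing coefficients for a 7-point quadratic fit.
SG_7_QUADRATIC = (-2, 3, 6, 7, 6, 3, -2)
SG_7_NORM = 21


def savgol_7_quadratic(intensities: Sequence[float]) -> List[float]:
    """Smooth with a 7-point, degree-2 Savitzky-Golay filter."""
    half = len(SG_7_QUADRATIC) // 2
    smoothed = list(intensities)  # assumption: edge points are left as they are
    for i in range(half, len(intensities) - half):
        window = intensities[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_7_QUADRATIC, window)) / SG_7_NORM
    return smoothed


spike = [0.0, 0.0, 0.0, 0.0, 21.0, 0.0, 0.0, 0.0, 0.0]
smoothed_spike = savgol_7_quadratic(spike)
print(smoothed_spike)
```

Because the filter fits a quadratic in each window, a signal that is itself quadratic passes through unchanged, while an isolated spike is spread out and reduced.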
-
savitzky_golay_im
(im, window=7, degree=2)[source] Applies Savitzky-Golay filter on Intensity Matrix.
Simply wraps around the Savitzky Golay function above.
- Parameters
im (BaseIntensityMatrix)
window (Union[int, str]) – The window selection parameter. Default 7.
degree (int) – Degree of the fitting polynomial for the Savitzky-Golay filter. Default 2.
- Returns
Smoothed IntensityMatrix.
- Return type
- Authors
Sean O’Callaghan, Vladimir Likic, Dominic Davis-Foster
pyms.Noise.Window
Moving window noise filter.
Functions:
|
Applies window smoothing on ion chromatogram. |
|
Applies window smoothing on Intensity Matrix. |
-
window_smooth
(ic, window=3, use_median=False)[source] Applies window smoothing on ion chromatogram.
- Parameters
ic (IonChromatogram)
window (Union[int, str]) – The window selection parameter. This can be an integer or time string. If an integer, taken as the number of points. If a string, must be in the form '<NUMBER>s' or '<NUMBER>m', specifying a time in seconds or minutes, respectively. Default 3.
use_median (bool) – Whether to use median window smoothing instead of mean window smoothing. Default False.
- Return type
- Returns
Smoothed ion chromatogram
- Authors
Vladimir Likic, Dominic Davis-Foster (type assertions)
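As an illustration of what moving window smoothing does, here is a small self-contained sketch on a plain list of intensities. The function name moving_window is hypothetical, and truncating the window at the ends of the list is an assumption; PyMassSpec's own edge handling may differ.

```python
from statistics import mean, median
from typing import List, Sequence


def moving_window(values: Sequence[float], window: int = 3,
                  use_median: bool = False) -> List[float]:
    """Replace each point with the mean (or median) of its neighbourhood.

    Assumption: near the ends the window is truncated to the available points.
    """
    half = window // 2
    reducer = median if use_median else mean
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(float(reducer(values[lo:hi])))
    return out


noisy = [1.0, 1.0, 10.0, 1.0, 1.0]
mean_smoothed = moving_window(noisy)                     # spike is spread out
median_smoothed = moving_window(noisy, use_median=True)  # spike is removed
```

The comparison shows why the median variant is often preferred for spike-like noise: the mean spreads the spike over its neighbours, while the median discards it.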
pyms.Peak
Table of Contents
Functions for modelling signal peaks.
pyms.Peak.Class
Provides a class to model signal peak.
Classes:
|
Models a signal peak. |
|
Subclass of |
|
Subclass of |
-
class
AbstractPeak
(rt=0.0, minutes=False, outlier=False)[source] Bases:
pymsBaseClass
Models a signal peak.
- Parameters
- Authors
Vladimir Likic, Andrew Isaac, Dominic Davis-Foster (type assertions and properties), David Kainer (outlier flag)
New in version 2.3.0.
Attributes:
Return the unique peak ID (UID), either:
The area under the peak.
The peak boundaries in points.
Returns a copy of the ion areas dict.
The retention time of the peak, in seconds.
Methods:
__eq__
(other)Return whether this Peak object is equal to another object.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.get_ion_area
(ion)Returns the area of a single ion chromatogram under the peak.
make_UID
()Create a unique peak ID (UID).
set_bounds
(left, apex, right)Sets peak boundaries in points.
set_ion_area
(ion, area)Sets the area for a single ion.
-
property
UID
Return the unique peak ID (UID), either:
Integer masses of top two intensities and their ratio (as
Mass1-Mass2-Ratio*100
); orthe single mass as an integer and the retention time.
- Return type
- Returns
UID string
- Author
Andrew Isaac
-
__eq__
(other)[source] Return whether this Peak object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
property
bounds
The peak boundaries in points.
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
property
ion_areas
Returns a copy of the ion areas dict.
- Return type
- Returns
The dictionary of
ion: ion area
pairs
-
make_UID
()[source] Create a unique peak ID (UID).
The UID comprises the retention time of the peak to two decimal places. Subclasses may define a more unique ID.
- Author
Andrew Isaac
-
class
ICPeak
(rt=0.0, mass=None, minutes=False, outlier=False)[source] Bases:
AbstractPeak
Subclass of AbstractPeak representing a peak in an ion chromatogram for a single mass.
- Parameters
- Authors
Vladimir Likic, Andrew Isaac, Dominic Davis-Foster (type assertions and properties), David Kainer (outlier flag)
New in version 2.3.0.
Attributes:
Return the unique peak ID (UID), either:
The area under the peak.
The peak boundaries in points.
The mass for a single ion chromatogram peak.
Returns a copy of the ion areas dict.
The retention time of the peak, in seconds.
Methods:
__eq__
(other)Return whether this Peak object is equal to another object.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.get_ion_area
(ion)Returns the area of a single ion chromatogram under the peak.
make_UID
()Create a unique peak ID (UID):
set_bounds
(left, apex, right)Sets peak boundaries in points.
set_ion_area
(ion, area)Sets the area for a single ion.
-
property
UID
Return the unique peak ID (UID), either:
Integer masses of top two intensities and their ratio (as
Mass1-Mass2-Ratio*100
); orthe single mass as an integer and the retention time.
- Return type
- Returns
UID string
- Author
Andrew Isaac
-
__eq__
(other)[source] Return whether this Peak object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
property
bounds
The peak boundaries in points.
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
get_ion_area
(ion) Returns the area of a single ion chromatogram under the peak.
-
property
ic_mass
The mass for a single ion chromatogram peak.
-
property
ion_areas
Returns a copy of the ion areas dict.
- Return type
- Returns
The dictionary of
ion: ion area
pairs
-
make_UID
()[source] Create a unique peak ID (UID):
the single mass as an integer and the retention time.
- Author
Andrew Isaac
-
set_bounds
(left, apex, right) Sets peak boundaries in points.
-
class
Peak
(rt: float, ms: Optional[pyms.Spectrum.MassSpectrum], minutes: bool = ..., outlier: bool = ...)[source] -
class
Peak
(rt: float, ms: float, minutes: bool = ..., outlier: bool = ...) Bases:
AbstractPeak
Subclass of AbstractPeak representing a peak in a mass spectrum.
- Parameters
- Authors
Vladimir Likic, Andrew Isaac, Dominic Davis-Foster (type assertions and properties), David Kainer (outlier flag)
Changed in version 2.3.0: Functionality related to single ion peaks has moved to the ICPeak class. The two classes share a common base class, AbstractPeak, which can be used in type checks for functions that accept either type of peak.
Changed in version 2.3.0: If the ms argument is unset an empty mass spectrum is used, rather than None as in previous versions.
Attributes:
Return the unique peak ID (UID), either:
The area under the peak.
The peak boundaries in points.
Returns a copy of the ion areas dict.
The mass spectrum at the apex of the peak.
The retention time of the peak, in seconds.
Methods:
__eq__
(other)Return whether this Peak object is equal to another object.
crop_mass
(mass_min, mass_max)Crops mass spectrum.
dump
(file_name[, protocol])Dumps an object to a file through
pickle.dump()
.find_mass_spectrum
(data[, from_bounds])get_int_of_ion
(ion)Returns the intensity of a given ion in this peak.
get_ion_area
(ion)Returns the area of a single ion chromatogram under the peak.
Returns the m/z value with the third highest intensity.
make_UID
()Create a unique peak ID (UID):
null_mass
(mass)Ignore given mass in spectra.
set_bounds
(left, apex, right)Sets peak boundaries in points.
set_ion_area
(ion, area)Sets the area for a single ion.
top_ions
([num_ions])Computes the highest #num_ions intensity ions.
-
property
UID
Return the unique peak ID (UID), either:
Integer masses of top two intensities and their ratio (as
Mass1-Mass2-Ratio*100
); orthe single mass as an integer and the retention time.
- Return type
- Returns
UID string
- Author
Andrew Isaac
-
__eq__
(other)[source] Return whether this Peak object is equal to another object.
- Parameters
other – The other object to test equality with.
- Return type
-
property
bounds
The peak boundaries in points.
-
dump
(file_name, protocol=3) Dumps an object to a file through
pickle.dump()
.
-
find_mass_spectrum
(data, from_bounds=False)[source] Sets the peak’s mass spectrum from the data.
Clears the single ion chromatogram mass.
- Parameters
data (
BaseIntensityMatrix
)from_bounds (
float
) – Whether to use the attributepyms.Peak.Class.Peak.pt_bounds
or to find the peak apex from the peak retention time. DefaultFalse
.
-
get_ion_area
(ion) Returns the area of a single ion chromatogram under the peak.
-
property
ion_areas
Returns a copy of the ion areas dict.
- Return type
- Returns
The dictionary of
ion: ion area
pairs
-
make_UID
()[source] Create a unique peak ID (UID):
Integer masses of top two intensities and their ratio (as Mass1-Mass2-Ratio*100).
- Author
Andrew Isaac
-
property
mass_spectrum
The mass spectrum at the apex of the peak.
- Return type
-
null_mass
(mass)[source] Ignore given mass in spectra.
- Parameters
mass (
float
) – Mass value to remove- Author
Andrew Isaac
-
set_bounds
(left, apex, right) Sets peak boundaries in points.
-
set_ion_area
(ion, area) Sets the area for a single ion.
pyms.Peak.Function
Functions related to Peak modification.
Functions:
|
Find bound of peak by summing intensities until change in sum is less than |
|
Find bounds of peak by summing intensities until change in sum is less than |
|
Calculates the median of the left and right bounds found for each apexing peak mass. |
|
Approximate the peak bounds (left and right offsets from apex). |
|
Calculate the sum of the raw ion areas based on detected boundaries. |
|
Calculate and return the ion areas of the five most abundant ions in the peak. |
|
Computes the highest 5 intensity ions. |
|
Computes the highest #num_ions intensity ions. |
-
half_area
(ia, max_bound=0, tol=0.5)[source] Find bound of peak by summing intensities until change in sum is less than
tol
percent of the current area.- Parameters
- Return type
- Returns
Half peak area, boundary offset, shared (True if shared ion).
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
ion_area
(ia, apex, max_bound=0, tol=0.5)[source] Find bounds of peak by summing intensities until change in sum is less than
tol
percent of the current area.- Parameters
- Return type
- Returns
Area, left and right boundary offset, shared left, shared right.
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
median_bounds
(im, peak, shared=True)[source] Calculates the median of the left and right bounds found for each apexing peak mass.
- Parameters
im (BaseIntensityMatrix) – The originating IntensityMatrix object.
peak (Peak)
shared (bool) – Include ions shared with neighbouring peak. Default True.
- Return type
- Returns
Median left and right boundary offset in points.
- Authors
Andrew Isaac, Dominic Davis-Foster
-
peak_pt_bounds
(im, peak)[source] Approximate the peak bounds (left and right offsets from apex).
- Parameters
im (
BaseIntensityMatrix
) – The originating IntensityMatrix objectpeak (
Peak
)
- Return type
- Returns
Sum of peak apex ions in detected bounds
- Authors
Andrew Isaac, Sean O’Callaghan, Dominic Davis-Foster
-
peak_sum_area
(im, peak, single_ion=False, max_bound=0)[source] Calculate the sum of the raw ion areas based on detected boundaries.
- Parameters
im (
BaseIntensityMatrix
) – The originating IntensityMatrix object.peak (
Peak
)single_ion (
bool
) – whether single ion areas should be returned. DefaultFalse
.max_bound (
int
) – Optional value to limit size of detected bound. Default0
.
- Return type
- Returns
Sum of peak apex ions in detected bounds.
- Overloads
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
peak_top_ion_areas
(im, peak, n_top_ions=5, max_bound=0)[source] Calculate and return the ion areas of the n_top_ions most abundant ions in the peak (five by default).
- Parameters
im (
IntensityMatrix
) – The originating IntensityMatrix object.peak (
Peak
)n_top_ions (
int
) – Number of top ions to return areas for. Default5
.max_bound (
int
) – Optional value to limit size of detected bound. Default0
.
- Return type
- Returns
Dictionary of
ion : ion_area pairs
.- Authors
Sean O’Callaghan, Dominic Davis-Foster (type assertions)
-
top_ions_v1
(peak, num_ions=5)[source] Computes the highest 5 intensity ions.
- Parameters
- Return type
- Returns
A list of the top 5 highest intensity ions
- Authors
Sean O’Callaghan, Dominic Davis-Foster (type assertions)
Deprecated since version 2.0.0: This will be removed in 2.4.0. Use
pyms.Peak.Function.top_ions_v2()
instead
-
top_ions_v2
(peak, num_ions=5)[source] Computes the highest #num_ions intensity ions.
- Parameters
- Return type
- Returns
A list of the num_ions highest intensity ions
- Authors
Sean O’Callaghan, Dominic Davis-Foster (type assertions)
Deprecated since version 2.1.2: This will be removed in 2.5.0. Use
pyms.Peak.Class.Peak.top_ions()
instead
pyms.Peak.List
Functions for modelling peak lists.
pyms.Peak.List.Function
Functions related to Peak modification.
Functions:
|
Create a peak that consists of a composite spectrum from all spectra in the list of peaks. |
|
Gets the best matching Retention Time and spectra from ‘data’ for each peak in the peak list. |
|
Returns whether |
|
Selects peaks from a retention time range. |
-
composite_peak
(peak_list, ignore_outliers=False)[source] Create a peak that consists of a composite spectrum from all spectra in the list of peaks.
-
fill_peaks
(data, peak_list, D, minutes=False)[source] Gets the best matching Retention Time and spectra from ‘data’ for each peak in the peak list.
- Parameters
data (
BaseIntensityMatrix
) – A data IntensityMatrix that has the same mass range as the peaks in the peak listD (
float
) – Peak width standard deviation in seconds. Determines search window width.minutes (
bool
) – Return retention time as minutes. DefaultFalse
.
- Return type
- Returns
List of Peak Objects
- Authors
Andrew Isaac, Dominic Davis-Foster (type assertions)
-
is_peak_list
(peaks)[source] Returns whether
peaks
is a valid peak list.- Author
Dominic Davis-Foster
- Return type
pyms.Peak.List.IO
Functions related to storing and loading a list of Peak objects.
Functions:
|
Loads the peak_list stored with |
|
Store the list of peak objects. |
-
load_peaks
(file_name)[source] Loads the peak_list stored with
store_peaks()
.
pyms.Simulator
Table of Contents
Provides functions for simulation of GCMS data.
Functions:
|
Adds noise to an IntensityMatrix object. |
|
Adds noise drawn from a normal distribution with constant scale to an ion chromatogram. |
|
Adds noise to an IntensityMatrix object. |
|
Adds noise to an ic. |
|
Returns a simulated ion chromatogram of a pure component. |
|
Calculates a point on a gaussian density function. |
|
Simulator of GCMS data. |
-
add_gaussc_noise
(im, scale)[source] Adds noise to an IntensityMatrix object.
- Parameters
im (
BaseIntensityMatrix
) – the intensity matrix objectscale (
float
) – the scale of the normal distribution from which the noise is drawn
- Author
Sean O’Callaghan
-
add_gaussc_noise_ic
(ic, scale)[source] Adds noise drawn from a normal distribution with constant scale to an ion chromatogram.
- Parameters
ic (
IonChromatogram
) – The ion Chromatogram.scale (
float
) – The scale of the normal distribution.
- Author
Sean O’Callaghan
-
add_gaussv_noise
(im, scale, cutoff, prop)[source] Adds noise to an IntensityMatrix object.
- Parameters
im (BaseIntensityMatrix) – The intensity matrix object.
scale (int) – The scale of the normal distribution from which the noise is drawn.
cutoff (int) – The level below which the intensity of the ic at that point has no effect on the scale of the noise distribution.
prop (float) – For intensity values above the cutoff, the scale is multiplied by the ic value multiplied by prop.
- Author
Sean O’Callaghan
-
add_gaussv_noise_ic
(ic, scale, cutoff, prop)[source] Adds noise to an ic. The noise value is drawn from a normal distribution, the scale of this distribution depends on the value of the ic at the point where the noise is being added
- Parameters
ic (IonChromatogram) – The IonChromatogram.
cutoff (int) – The level below which the intensity of the ic at that point has no effect on the scale of the noise distribution.
scale (int) – The scale of the normal distribution for ic values below the cutoff; it is modified for values above the cutoff.
prop (float) – For ic values above the cutoff, the scale is multiplied by the ic value multiplied by prop.
- Author
Sean O’Callaghan
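The parameter descriptions above can be read as the following rule: below the cutoff the noise scale is constant, and above it the scale grows with the intensity. This is a minimal sketch of that reading on a plain list, using only the standard library. The name add_variable_noise and the rng argument are hypothetical, and the exact formula used inside PyMassSpec is not confirmed here.

```python
import random
from typing import List, Optional, Sequence


def add_variable_noise(intensities: Sequence[float], scale: float,
                       cutoff: float, prop: float,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise whose width depends on the intensity.

    Assumed rule: sigma = scale for values at or below the cutoff,
    sigma = scale * value * prop for values above it.
    """
    rng = rng or random.Random()
    noisy = []
    for value in intensities:
        sigma = scale * value * prop if value > cutoff else scale
        noisy.append(value + rng.gauss(0.0, sigma))
    return noisy


signal = [100.0, 500.0, 20000.0, 500.0, 100.0]
noisy_signal = add_variable_noise(signal, scale=5.0, cutoff=1000.0, prop=0.01,
                                  rng=random.Random(0))
```

Passing a seeded random.Random makes the result reproducible, which is convenient when simulated data is used in tests.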
-
chromatogram
(n_scan, x_zero, sigma, peak_scale)[source] Returns a simulated ion chromatogram of a pure component.
The ion chromatogram contains a single gaussian peak.
-
gaussian
(point, mean, sigma, scale)[source] Calculates a point on a gaussian density function.
f = s*exp(-((x-x0)^2)/(2*w^2))
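The formula can be written directly in Python. In this sketch the mapping of symbols to the documented arguments (x = point, x0 = mean, w = sigma, s = scale) is inferred from the signature order, and the names gaussian_point and simulated are hypothetical.

```python
import math


def gaussian_point(point: float, mean: float, sigma: float, scale: float) -> float:
    """Evaluate f = s * exp(-((x - x0)^2) / (2 * w^2)) at a single point."""
    return scale * math.exp(-((point - mean) ** 2) / (2.0 * sigma ** 2))


# A single simulated peak over 11 scans, centred on scan 5.
simulated = [gaussian_point(x, mean=5.0, sigma=2.0, scale=10.0) for x in range(11)]
```

The value at the centre equals the scale, and the curve is symmetric about the mean.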
pyms.TopHat
Top-hat baseline corrector.
Functions:
|
Top-hat baseline correction on Ion Chromatogram. |
|
Top-hat baseline correction on Intensity Matrix. |
-
tophat
(ic, struct=None)[source] Top-hat baseline correction on Ion Chromatogram.
- Parameters
- Return type
- Returns
Top-hat corrected ion chromatogram.
- Authors
Woon Wai Keen, Vladimir Likic, Dominic Davis-Foster (type assertions)
-
tophat_im
(im, struct=None)[source] Top-hat baseline correction on Intensity Matrix.
Wraps around the TopHat function above.
- Parameters
im (
BaseIntensityMatrix
) – The input Intensity Matrix.struct (
Optional
[str
]) – Top-hat structural element as time string. The structural element needs to be larger than the features one wants to retain in the spectrum after the top-hat transform. DefaultNone
.
- Return type
- Returns
Top-hat corrected IntensityMatrix.
- Author
Sean O’Callaghan
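To show why the structural element must be larger than the features to be retained, here is a pure-Python sketch of a one-dimensional white top-hat transform (the signal minus its morphological opening). It is an illustration only: the names are hypothetical, the element size is given in points rather than as a time string, and truncating the window at the edges is an assumption.

```python
from typing import Callable, List, Sequence


def _sliding(values: Sequence[float], size: int,
             func: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply func (min or max) over a centred window, truncated at the edges."""
    half = size // 2
    n = len(values)
    return [func(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def white_tophat(values: Sequence[float], size: int) -> List[float]:
    """Subtract the morphological opening (erosion, then dilation) from the signal."""
    eroded = _sliding(values, size, min)
    opened = _sliding(eroded, size, max)
    return [v - o for v, o in zip(values, opened)]


# A flat baseline of 5 with a one-point peak of height 4 on top.
narrow = white_tophat([5, 5, 5, 9, 5, 5, 5], size=3)
# A five-point plateau is wider than the 3-point element, so it is treated as baseline.
wide = white_tophat([5, 9, 9, 9, 9, 9, 5], size=3)
```

The narrow peak survives with the baseline removed, while the wide plateau is absorbed into the estimated baseline and disappears.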
pyms.Utils
Table of Contents
Utility functions for PyMassSpec wide use.
pyms.Utils.IO
General I/O functions.
Functions:
|
Dumps an object to a file through |
|
Returns lines from a file, as a list. |
|
Loads an object previously dumped with |
|
Convert string filename into pathlib.Path object and create parent directories if required. |
|
Saves a list of numbers or a list of lists of numbers to a file with specific formatting. |
-
dump_object
(obj, file_name)[source] Dumps an object to a file through
pickle.dump()
.
-
file_lines
(file_name, strip=False)[source] Returns lines from a file, as a list.
- Parameters
- Return type
- Returns
A list of lines
- Authors
Vladimir Likic, Dominic Davis-Foster (pathlib support)
-
load_object
(file_name)[source] Loads an object previously dumped with
dump_object()
.
-
prepare_filepath
(file_name, mkdirs=True)[source] Convert string filename into pathlib.Path object and create parent directories if required.
-
save_data
(file_name, data, format_str='%.6f', prepend='', sep=' ', compressed=False)[source] Saves a list of numbers or a list of lists of numbers to a file with specific formatting.
- Parameters
data (
Union
[List
[float
],List
[List
[float
]]]) – A list of numbers, or a list of listsformat_str (
str
) – A format string for individual entries. Default'%.6f'
.prepend (
str
) – A string, printed before each row. Default''
.sep (
str
) – A string, printed after each number. Default'␣'
.compressed (
bool
) – IfTrue
, the output will be gzipped. DefaultFalse
.
- Authors
Vladimir Likic, Dominic Davis-Foster (pathlib support)
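The formatting parameters can be illustrated without touching the filesystem. The sketch below builds the text that such a function might write. The name format_rows is hypothetical, and placing sep between numbers (rather than after every number, including the last) is an assumption.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def format_rows(data: Sequence[Union[Number, Sequence[Number]]],
                format_str: str = "%.6f", prepend: str = "", sep: str = " ") -> str:
    """Format a list of numbers (one per line) or a list of lists (one row per line)."""
    lines: List[str] = []
    for row in data:
        items = row if isinstance(row, (list, tuple)) else [row]
        lines.append(prepend + sep.join(format_str % x for x in items))
    return "\n".join(lines) + "\n"


table_text = format_rows([[1, 2], [3, 4]], format_str="%.1f", prepend=">", sep=",")
column_text = format_rows([1.5, 2.5], format_str="%.2f")
```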
pyms.Utils.Math
Provides mathematical functions.
Functions:
|
Median absolute deviation. |
|
Test if a string, or list of strings, contains a numeric value(s). |
|
Identify outliers using the median absolute deviation (MAD). |
|
Return the sample arithmetic mean of data. |
|
Return the median (middle value) of numeric data. |
|
Identify outliers using the median value. |
|
Identify outliers using a percentile. |
|
Calculates RMSD for the 2 lists. |
|
Return the square root of the sample variance. |
|
Generates a list by using start, stop, and step values. |
-
mad_based_outlier
(data, thresh=3.5)[source] Identify outliers using the median absolute deviation (MAD).
- Parameters
- Author
David Kainer
- Url
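A common formulation of MAD-based outlier detection is the modified z-score, 0.6745 * |x - median| / MAD, compared against a threshold such as 3.5. The sketch below implements that formulation with the standard library. Whether PyMassSpec uses exactly this formula is an assumption, and the name mad_outliers is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(data: Sequence[float], thresh: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score exceeds thresh."""
    med = median(data)
    abs_dev = [abs(x - med) for x in data]
    mad = median(abs_dev)
    if mad == 0:
        # All deviations are (mostly) zero; nothing can be ranked as an outlier.
        return [False] * len(data)
    return [0.6745 * d / mad > thresh for d in abs_dev]


flags = mad_outliers([10, 11, 9, 10, 12, 50])
```

Because both the centre and the spread are medians, a single extreme value such as 50 barely affects the threshold used to detect it.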
-
mean
(data)[source] Return the sample arithmetic mean of data.
>>> mean([1, 2, 3, 4, 4])
2.8
>>> from fractions import Fraction as F
>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
Fraction(13, 21)
>>> from decimal import Decimal as D
>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
Decimal('0.5625')
If
data
is empty, StatisticsError will be raised.
-
median
(data)[source] Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
-
median_outliers
(data, m=2.5)[source] Identify outliers using the median value.
- Parameters
data
m (
float
) – Default2.5
.
- Author
David Kainer
- Author
Benjamin Bannier (https://stackoverflow.com/users/176922/benjamin-bannier)
- Url
http://stackoverflow.com/questions/11686720/is-there-a-numpy-builtin-to-reject-outliers-from-a-list
-
percentile_based_outlier
(data, threshold=95)[source] Identify outliers using a percentile.
- Parameters
- Author
David Kainer
- Url
-
std
(data, xbar=None) Return the square root of the sample variance.
See variance for arguments and other details.
>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
1.0810874155219827
pyms.Utils.Time
Time conversion and related functions.
Functions:
|
Returns whether the argument is a string in the format of a number. |
|
Resolves time string of the form |
|
Converts window selection parameter into points based on the time step in an ion chromatogram |
-
is_str_num
(arg)[source] Returns whether the argument is a string in the format of a number.
The number can be an integer, or alternatively a floating point number in scientific or engineering format.
-
time_str_secs
(time_str)[source] Resolves time string of the form
'<NUMBER>s'
or'<NUMBER>m'
and returns the time in seconds.
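The time string convention used throughout the package ('<NUMBER>s' or '<NUMBER>m') is simple to parse. The following sketch shows one way to do so. The name parse_time_str is hypothetical, and the error handling is an assumption rather than PyMassSpec's behaviour.

```python
def parse_time_str(time_str: str) -> float:
    """Convert '<NUMBER>s' or '<NUMBER>m' into a time in seconds."""
    text = time_str.strip()
    if len(text) < 2:
        raise ValueError(f"invalid time string: {time_str!r}")
    number = float(text[:-1])  # raises ValueError if the number part is malformed
    unit = text[-1].lower()
    if unit == "s":
        return number
    if unit == "m":
        return number * 60.0
    raise ValueError(f"time string must end in 's' or 'm': {time_str!r}")


half_minute = parse_time_str("30s")
ninety_seconds = parse_time_str("1.5m")
```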
-
window_sele_points
(ic, window_sele, half_window=False)[source] Converts window selection parameter into points based on the time step in an ion chromatogram
- Parameters
ic (IonChromatogram) – Ion chromatogram object relevant for the conversion.
window_sele (Union[int, str]) – The window selection parameter. This can be an integer or a time string. If an integer, it is taken as the number of points. If a string, it must be in the form '<NUMBER>s' or '<NUMBER>m', specifying a time in seconds or minutes, respectively.
half_window (bool) – Specifies whether to return half-window. Default False.
- Return type
- Returns
The number of points in the window
- Author
Vladimir Likic
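Once a time window has been resolved to seconds, converting it to a number of points only requires the time step between scans. The sketch below assumes the conversion is a floor division of the (optionally halved) window by the time step. The name seconds_to_points is hypothetical and the rounding rule is an assumption.

```python
import math


def seconds_to_points(seconds: float, time_step: float,
                      half_window: bool = False) -> int:
    """Number of scans covered by a window of the given duration.

    Assumption: the result is rounded down, and half_window halves the
    duration before the conversion.
    """
    if time_step <= 0:
        raise ValueError("time_step must be positive")
    if half_window:
        seconds *= 0.5
    return int(math.floor(seconds / time_step))


full = seconds_to_points(3.0, time_step=0.5)
half = seconds_to_points(60.0, time_step=0.5, half_window=True)
```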
pyms.Utils.Utils
General utility functions.
Functions:
|
Returns whether |
|
Returns whether the object represents a filesystem path. |
|
Returns whether the object is a |
|
Returns whether the object is a |
-
is_sequence_of
(obj, of)[source] Returns whether the object is a
Sequence
, and not a string, of the given type.
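A check of this kind can be expressed with collections.abc.Sequence. The sketch below follows the description above. The name sequence_of_sketch is hypothetical, and the real function may treat edge cases such as empty sequences differently.

```python
from collections.abc import Sequence
from typing import Any, Tuple, Type, Union


def sequence_of_sketch(obj: Any, of: Union[Type, Tuple[Type, ...]]) -> bool:
    """True if obj is a non-string Sequence whose items are all instances of `of`."""
    return (
        isinstance(obj, Sequence)
        and not isinstance(obj, str)
        and all(isinstance(item, of) for item in obj)
    )


ints_ok = sequence_of_sketch([1, 2, 3], int)
string_rejected = sequence_of_sketch("abc", str)
```

Strings are excluded explicitly because a str is itself a Sequence of one-character strings, which would otherwise pass the check.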
-
pyms.Utils.Utils.
signedinteger
numpy.signedinteger
at runtime;int
when type checking.
Changelog
Changes in v2.3.0
All functions, classes and methods now have PEP 484 type hints. Contributed by Chris Davis-Foster in #4.
All modules now implement __all__ to limit the objects imported when using * imports.
Removed the following deprecated functions:
Removed object
Suggested replacement
pyms.Experiment.Experiment.get_expr_code()
pyms.Experiment.Experiment.get_peak_list()
pyms.Experiment.Experiment.store()
pyms.Experiment.store_expr()
pyms.GCMS.Class.GCMS_data.get_scan_list()
pyms.GCMS.Class.GCMS_data.get_tic()
pyms.Gapfill.Class.MissingPeak.get_common_ion()
pyms.Gapfill.Class.MissingPeak.get_common_ion_area()
pyms.Gapfill.Class.MissingPeak.get_exact_rt()
pyms.Gapfill.Class.MissingPeak.get_qual_ion1()
pyms.Gapfill.Class.MissingPeak.get_qual_ion2()
pyms.Gapfill.Class.MissingPeak.get_rt()
pyms.Gapfill.Class.MissingPeak.set_common_ion_area()
pyms.Gapfill.Class.MissingPeak.set_exact_rt()
pyms.Gapfill.Class.Sample.get_missing_peaks()
pyms.Gapfill.Class.Sample.get_mp_rt_area_dict()
pyms.Gapfill.Class.Sample.get_name()
pyms.Gapfill.Function.transposed()
pyms.IonChromatogram.IonChromatogram.get_mass()
pyms.IonChromatogram.IonChromatogram.get_time_step()
pyms.IonChromatogram.IonChromatogram.set_intensity_array()
pyms.Mixins.MaxMinMassMixin.get_max_mass()
pyms.Mixins.MaxMinMassMixin.get_min_mass()
pyms.Mixins.MassListMixin.get_mass_list()
pyms.Mixins.TimeListMixin.get_time_list()
pyms.Mixins.IntensityArrayMixin.get_intensity_array()
pyms.Mixins.IntensityArrayMixin.get_matrix_list()
pyms.Peak.Class.Peak.get_area()
pyms.Peak.Class.Peak.get_ic_mass()
pyms.Peak.Class.Peak.get_ion_areas()
pyms.Peak.Class.Peak.get_mass_spectrum()
pyms.Peak.Class.Peak.get_pt_bounds()
pyms.Peak.Class.Peak.get_rt()
pyms.Peak.Class.Peak.get_UID()
pyms.Peak.Class.Peak.set_area()
pyms.Peak.Class.Peak.set_ic_mass()
pyms.Peak.Class.Peak.set_ion_areas()
pyms.Peak.Class.Peak.set_mass_spectrum()
pyms.Peak.Class.Peak.set_pt_bounds()
pyms.Peak.Class.Peak.pt_bounds
pyms.Utils.Utils.is_positive_int()
pyms.Utils.Utils.is_list_of_dec_nums()
Renamed pyms.Gapfill.Function.file2matrix() to pyms.Gapfill.Function.file2dataframe(). The function now returns a Pandas DataFrame.
Split pyms.IntensityMatrix.IntensityMatrix into two classes: pyms.IntensityMatrix.BaseIntensityMatrix and pyms.IntensityMatrix.IntensityMatrix. This makes subclassing easier.
Split pyms.Peak.Class.Peak into three classes: pyms.Peak.Class.AbstractPeak, pyms.Peak.Class.Peak, pyms.Peak.Class.ICPeak. ICPeak is returned when a mass is passed to the Peak constructor instead of a mass spectrum.
Added the following functions and classes:
Flag to indicate the filetype for
pyms.Gapfill.Function.missing_peak_finder()
.Enumeration of supported ASCII filetypes for
export_ascii()
.Constructs a Base Peak Chromatogram from the data.
Returns whether the ion chromatogram is an extracted ion chromatogram (EIC).
Returns whether the ion chromatogram is a base peak chromatogram (BPC).
Models an extracted ion chromatogram (EIC).
Models a base peak chromatogram (BPC).
Convert the given numpy array to a numeric data type.
Returns whether the object represents a filesystem path.
Returns whether the object is a Sequence, and not a string.
pyms.Utils.Utils.is_sequence_of(obj, of): Returns whether the object is a Sequence, and not a string, of the given type.
Returns whether obj is a numerical value (int, float etc).
Class to model a subset of data from an Intensity Matrix.
The ia parameter of pyms.IonChromatogram.IonChromatogram was renamed to intensity_list.
Changes in v2.2.22-beta2
pyms.Spectrum.Scan and pyms.Spectrum.MassSpectrum can now accept any values for mass and intensity that can be converted to a float or int. This includes strings representing numbers. Previously only int and float values were permitted.
If the mass and intensity values supplied to a pyms.Spectrum.Scan or a pyms.Spectrum.MassSpectrum are float, int, or a data type derived from numpy.number, the data is stored in that type. For other data types, such as strings, decimal.Decimal etc., the data is stored as float.
If the data contains values in mixed types then, in most cases, all values will be converted to float. If you wish to control this behaviour you should construct a numpy.ndarray with the desired type. See https://numpy.org/devdocs/user/basics.types.html for a list of types.
A TypeError is no longer raised when creating a pyms.Spectrum.Scan or a pyms.Spectrum.MassSpectrum with a float, int etc. rather than a sequence. Instead, the value is treated as being the sole element in a list.
Passing a non-numeric string or a list of non-numeric strings to pyms.Spectrum.Scan or pyms.Spectrum.MassSpectrum now raises a ValueError and not a TypeError as in previous versions.
pyms.Peak.Class.Peak.ion_areas() now accepts dictionary keys as float as well as int.
Changes in v2.2.22-beta1
ANDI_reader() and pyms.Spectrum.Scan were modified to allow ANDI-MS files to be read if the data either:
had the m/z data stored from highest m/z to lowest; or
contained 0-length scans.
Introduction
Examples of PyMassSpec use given in the User Guide.
Further examples can be found at https://github.com/domdfcoding/PyMassSpec/tree/master/pyms-demo/jupyter
Chapter 1 – GC-MS Raw Data Model
Chapter 2 – GC-MS data derived objects
32 – Saving IonChromatogram and IntensityMatrix information.
Chapter 5 – Peak alignment by dynamic programming.
PyMassSpec test and example data files
The example data files can be downloaded using the links below:
gc01_0812_066.cdf
– GC-MS data acquired on Agilent 5975C MSD interfaced with Agilent 7890A GC. Data was exported as NetCDF from Agilent ChemStation.
gc01_0812_066.jdx
– GC-MS data acquired on Agilent 5975C MSD interfaced with Agilent 7890A GC. Data was read with GCMS FileTranslatorPro (Scientific Instrument Services, Inc), and exported in JCAMP-DX format
a0806_079.cdf
– GC-MS data of a life-cycle stage of a parasite. Each data file is the output of a separate GC-MS processing run of a sample prepared from the same life-cycle stage.
a0806_142.cdf
– GC-MS data of a different life-cycle stage of the parasite. Each data file is the output of a separate GC-MS processing run of a sample prepared from the same life-cycle stage, but a different stage to the previous samples.
MM-10.0_1_no_processing.cdf
– GC-TOF data acquired on a Leco Pegasus machine using ChromaTOF software.
nist08_test.jca
– NIST formatted test data
Copyright © 2006-2011 Vladimir Likic, Bio21 Molecular Science and Biotechnology Institute, the University of Melbourne, Melbourne, Australia. All rights reserved.
20e
"""proc.py
"""
# TODO: mzML demo; need example mzML file

import pathlib
data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location

from pyms.GCMS.IO.MZML import mzML_reader

# read the raw data
mzml_file = data_directory / ".mzML"
data = mzML_reader(mzml_file)
print(data)

# raw data operations
print("minimum mass found in all data: ", data.min_mass)
print("maximum mass found in all data: ", data.max_mass)

# time
time = data.time_list
print(time)
print("number of retention times: ", len(time))
print("retention time of 1st scan: ", time[0], "sec")
print("index of 400sec in time_list: ", data.get_index_at_time(400.0))

# TIC
tic = data.tic
print(tic)
print("number of scans in TIC: ", len(tic))
print("start time of TIC: ", tic.get_time_at_index(0), "sec")

# raw scans
scans = data.scan_list
print(scans)
print(scans[0].mass_list)
print("1st mass value for 1st scan: ", scans[0].mass_list[0])
print("1st intensity value for 1st scan: ", scans[0].intensity_list[0])

print("minimum mass found in 1st scan: ", scans[0].min_mass)
print("maximum mass found in 1st scan: ", scans[0].max_mass)
32
Saving IonChromatogram and IntensityMatrix information.
"""proc.py
"""

from pyms.GCMS.IO.JCAMP import JCAMP_reader
from pyms.IntensityMatrix import build_intensity_matrix
from pyms.Utils.IO import save_data

# read the raw data as a GCMS_data object
jcamp_file = "data/gc01_0812_066.jdx"
data = JCAMP_reader(jcamp_file)

# IntensityMatrix
# must build intensity matrix before accessing any intensity matrix methods.

# default, float masses with interval (bin interval) of one from min mass
print("default intensity matrix, bin interval = 1, boundary +/- 0.5")
im = build_intensity_matrix(data)

#
# Saving data
#

# save the intensity matrix values to a file
mat = im.intensity_array
print("saving intensity matrix intensity values...")
save_data("output/im.dat", mat)

# Export the entire IntensityMatrix as CSV. This will create
# data.im.csv, data.mz.csv, and data.rt.csv where
# these are the intensity matrix, retention time
# vector, and m/z vector in the CSV format
print("exporting intensity matrix data...")
im.export_ascii("output/data")

# Export the entire IntensityMatrix as LECO CSV. This is
# useful for import into AnalyzerPro
print("exporting intensity matrix data to LECO CSV format...")
im.export_leco_csv("output/data_leco.csv")

#
# Import saved data
#

from pyms.IntensityMatrix import import_leco_csv

# import LECO CSV file
print("importing intensity matrix data from LECO CSV format...")
iim = import_leco_csv("output/data_leco.csv")

# Check size to original
print("Output dimensions:", im.size, " Input dimensions:", iim.size)
55
"""proc.py
"""

from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold
from pyms.Peak.Function import peak_sum_area

# read the raw data as a GCMS_data object
andi_file = "data/gc01_0812_066.cdf"
data = ANDI_reader(andi_file)

im = build_intensity_matrix_i(data)

n_scan, n_mz = im.size

print("Intensity matrix size (scans, masses):", (n_scan, n_mz))

# noise filter and baseline correct
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)

# Use Biller and Biemann technique to find apexing ions at a scan.
peak_list = BillerBiemann(im, points=9, scans=2)

# percentage ratio of ion intensity to max ion intensity
r = 2
# minimum number of ions, n
n = 3
# greater than or equal to threshold, t
t = 10000

# trim by relative intensity
pl = rel_threshold(peak_list, r)

# trim by threshold
new_peak_list = num_ions_threshold(pl, n, t)

print("Number of filtered peaks: ", len(new_peak_list))

# find and set areas
print("Peak areas")
print("UID, RT, height, area")
for peak in new_peak_list:
    rt = peak.rt
    # Only test interesting sub-set from 29.5 to 32.5 minutes
    if rt >= 29.5*60.0 and rt <= 32.5*60.0:
        # determine and set area
        area = peak_sum_area(im, peak)
        peak.area = area

        # print some details
        UID = peak.UID
        # height as sum of the intensities of the apexing ions
        height = sum(peak.mass_spectrum.mass_spec)
        print(UID + f", {rt / 60.0:.2f}, {height:.2f}, {peak.area:.2f}")
56
"""proc.py
"""

from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold
from pyms.Peak.Function import peak_top_ion_areas

# read the raw data as a GCMS_data object
andi_file = "data/gc01_0812_066.cdf"
data = ANDI_reader(andi_file)

im = build_intensity_matrix_i(data)

n_scan, n_mz = im.size
print("Intensity matrix size (scans, masses):", (n_scan, n_mz))

# noise filter and baseline correct
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)

# Use Biller and Biemann technique to find apexing ions at a scan.
peak_list = BillerBiemann(im, points=9, scans=2)

# percentage ratio of ion intensity to max ion intensity
r = 2
# minimum number of ions, n
n = 3
# greater than or equal to threshold, t
t = 10000

# trim by relative intensity
pl = rel_threshold(peak_list, r)

# trim by threshold
new_peak_list = num_ions_threshold(pl, n, t)

print("Number of filtered peaks: ", len(new_peak_list))

# find and set areas
print("Top 5 most abundant ions for each peak ")
for peak in new_peak_list:
    rt = peak.rt
    # Only test interesting sub-set from 29.5 to 32.5 minutes
    if rt >= 29.5*60.0 and rt <= 32.5*60.0:
        # determine and set ion areas, use default num of ions =5
        areas_dict = peak_top_ion_areas(im, peak)
        peak.ion_areas = areas_dict

        area_dict = peak.ion_areas
        # print the top 5 ions for each peak
        print(area_dict.keys())
64
Peak alignment with the “common ion” filtering.
"""proc.py
"""

import os

from pyms.Experiment import load_expr
from pyms.DPA.PairwiseAlignment import PairwiseAlignment, align_with_tree
from pyms.DPA.Alignment import exprl2alignment
#from pyms.Peak.List.IO import store_peaks

# define the input experiments list
exprA_codes = [ "a0806_077", "a0806_078", "a0806_079" ]
exprB_codes = [ "a0806_140", "a0806_141", "a0806_142" ]

# within replicates alignment parameters
Dw = 2.5  # rt modulation [s]
Gw = 0.30 # gap penalty

# do the alignment
print('Aligning expt A')
expr_list = []
expr_dir = "../old demos/61a/output/"
for expr_code in exprA_codes:
    file_name = os.path.join(expr_dir, expr_code + ".expr")
    expr = load_expr(file_name)
    expr_list.append(expr)
F1 = exprl2alignment(expr_list)
T1 = PairwiseAlignment(F1, Dw, Gw)
A1 = align_with_tree(T1, min_peaks=2)
top_ion_list = A1.common_ion()
A1.write_common_ion_csv('output/area2.csv', top_ion_list)

print('Aligning expt B')
expr_list = []
expr_dir = "../old demos/61b/output/"
for expr_code in exprB_codes:
    file_name = os.path.join(expr_dir, expr_code + ".expr")
    expr = load_expr(file_name)
    expr_list.append(expr)
F2 = exprl2alignment(expr_list)
T2 = PairwiseAlignment(F2, Dw, Gw)
A2 = align_with_tree(T2, min_peaks=2)

# between replicates alignment parameters
Db = 10.0 # rt modulation
Gb = 0.30 # gap penalty
top_ion_list = A2.common_ion()
A2.write_common_ion_csv('output/area1.csv', top_ion_list)

print('Aligning input {1,2}')
T9 = PairwiseAlignment([A1,A2], Db, Gb)
A9 = align_with_tree(T9)
top_ion_list = A9.common_ion()
A9.write_common_ion_csv('output/area.csv', top_ion_list)
A1
"""
proc.py

Plot detected peaks using matplotlib
"""

import sys
sys.path.append("../..")

import pathlib

import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt

from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold
from pyms.Display import plot_ic, plot_peaks
from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat

data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location

output_directory = pathlib.Path(".").resolve() / "output"

# Read raw data
andi_file = data_directory / "MM-10.0_1_no_processing.cdf"
data = ANDI_reader(andi_file)

# Build Intensity Matrix
im = build_intensity_matrix_i(data)

# Perform pre-filtering and peak detection.
n_scan, n_mz = im.size

for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)

# Detect Peaks
peak_list = BillerBiemann(im, points=9, scans=2)

print("Number of peaks found: ", len(peak_list))

# Filter the peak list, first by removing all intensities in a peak less than a
# given relative threshold, then by removing all peaks that have less than a
# given number of ions above a given value

# Parameters
# percentage ratio of ion intensity to max ion intensity
percent = 2
# minimum number of ions, n
n = 3
# greater than or equal to threshold, t
cutoff = 10000

# trim by relative intensity
pl = rel_threshold(peak_list, percent)

# trim by threshold
new_peak_list = num_ions_threshold(pl, n, cutoff)

print("Number of filtered peaks: ", len(new_peak_list))

# TIC from raw data
tic = data.tic

# Get Ion Chromatograms for all m/z channels
n_mz = len(im.mass_list)

# Create a subplot
fig, ax = plt.subplots(1, 1)

# Plot the peaks
plot_peaks(ax, new_peak_list, style="lines")
# Note: No idea why, but the dots for the peaks consistently appear 2e7 below the apex of the peak.
# As an alternative, the positions of the peaks can be shown with thin grey lines as in this example.
# The peak positions seem to appear OK in the other examples.
# See pyms-demo/scripts/Displaying_Detected_Peaks.py for a better example

# Plot the TIC
plot_ic(ax, tic, label="TIC")

# Plot the ICs
# for m in range(n_mz):
#     plot_ic(ax, im.get_ic_at_index(m))

# Set the title
ax.set_title('TIC and PyMS Detected Peaks')

# Add the legend
plt.legend()

# Show the plot
plt.show()
A2
"""proc.py

This example demonstrates processing of a GC-TOF (Leco Pegasus-ChromaTOF) generated dataset.
GC-TOF data is made up of nearly 10 scans per second. As a result of this, the peak
detection window in the Biller Biemann algorithm has a higher value as compared to
the value used to process GC-Quad data. Due to the same reason, the value of the
number of scans in the Biller Biemann algorithm has a higher value as compared to
the value used to process GC-Quad data.
"""

import pathlib

from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.SavitzkyGolay import savitzky_golay
from pyms.TopHat import tophat
from pyms.Peak.Function import peak_sum_area
from pyms.Display import Display
from pyms.BillerBiemann import BillerBiemann, rel_threshold, num_ions_threshold

data_directory = pathlib.Path(".").resolve().parent.parent / "pyms-data"
# Change this if the data files are stored in a different location

output_directory = pathlib.Path(".").resolve() / "output"

# from numpy import *

# read raw data
andi_file = data_directory / "MM-10.0_1_no_processing.cdf"
data = ANDI_reader(andi_file)

# Build Intensity Matrix
im = build_intensity_matrix_i(data)

n_scan, n_mz = im.size

# perform necessary pre filtering
for ii in range(n_mz):
    ic = im.get_ic_at_index(ii)
    ic_smooth = savitzky_golay(ic)
    ic_bc = tophat(ic_smooth, struct="1.5m")
    im.set_ic_at_index(ii, ic_bc)

# Detect Peaks
peak_list = BillerBiemann(im, points=15, scans=3)

print("Number of peaks found: ", len(peak_list))

######### Filter peaks###############
# Filter the peak list,
# first by removing all intensities in a peak less than a given relative
# threshold,
# then by removing all peaks that have less than a given number of ions above
# a given value

# Parameters
# percentage ratio of ion intensity to max ion intensity
r = 2
# minimum number of ions, n
n = 2
# greater than or equal to threshold, t
t = 4000

# trim by relative intensity
pl = rel_threshold(peak_list, r)

# trim by threshold
new_peak_list = num_ions_threshold(pl, n, t)

print("Number of filtered peaks: ", len(new_peak_list))
print("Peak areas")
print("UID, RT, height, area")
for peak in new_peak_list:
    rt = peak.rt

    # determine and set area
    area = peak_sum_area(im, peak)
    peak.area = area

    # print some details
    UID = peak.UID
    # height as sum of the intensities of the apexing ions
    height = sum(peak.mass_spectrum.mass_spec.tolist())
    print(UID + f", {rt:.2f}, {height:.2f}, {peak.area:.2f}")

# TIC from raw data
tic = data.tic
# baseline correction for TIC
tic_bc = tophat(tic, struct="1.5m")

# Get Ion Chromatograms for all m/z channels
n_mz = len(im.mass_list)
ic_list = []

for m in range(n_mz):
    ic_list.append(im.get_ic_at_index(m))

# Create a new display object, this time plot the ICs
# and the TIC, as well as the peak list
display = Display()

display.plot_tic(tic_bc, 'TIC BC')
for ic in ic_list:
    display.plot_ic(ic)
display.plot_peaks(new_peak_list, 'Peaks')
display.do_plotting('TIC, and PyMassSpec Detected Peaks')
display.show_chart()
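The two-stage filtering described in the comments above (trim by relative intensity, then drop peaks with too few ions above a cutoff) can be sketched in plain Python. The helpers below are hypothetical stand-ins, not the PyMassSpec functions: a peak is modelled as a dict mapping m/z to intensity, and the sketch assumes that intensities below the relative threshold are set to zero.

```python
from typing import Dict, List

# Hypothetical stand-in for a peak's mass spectrum: m/z -> intensity
Spectrum = Dict[int, float]


def rel_threshold_sketch(spectrum: Spectrum, percent: float = 2.0) -> Spectrum:
    """Zero every intensity below ``percent`` % of the largest intensity."""
    cutoff = max(spectrum.values()) * percent / 100.0
    return {mz: (i if i >= cutoff else 0.0) for mz, i in spectrum.items()}


def num_ions_threshold_sketch(peaks: List[Spectrum], n: int, cutoff: float) -> List[Spectrum]:
    """Keep only peaks with at least ``n`` ions whose intensity is >= ``cutoff``."""
    return [p for p in peaks if sum(1 for i in p.values() if i >= cutoff) >= n]


peaks = [
    {73: 50000.0, 147: 12000.0, 221: 11000.0, 45: 400.0},
    {73: 30000.0, 147: 9000.0, 221: 200.0},
]

# Stage 1: trim by relative intensity (2 % of the largest ion in each peak)
trimmed = [rel_threshold_sketch(p, percent=2) for p in peaks]

# Stage 2: require at least 3 ions at or above 10000
kept = num_ions_threshold_sketch(trimmed, n=3, cutoff=10000)

print(len(kept))  # 1
```

Only the first peak survives: it has three ions at or above 10000, whereas the second has one.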
x10
An example of parallel processing of data. Shows how to loop over all ICs in an intensity matrix, and perform noise smoothing on each IC (in parallel). Please see the User Guide for instructions on how to run this example on multiple CPUs.
"""proc.py
"""

from pyms.GCMS.IO.ANDI import ANDI_reader
from pyms.IntensityMatrix import build_intensity_matrix_i
from pyms.Noise.Window import window_smooth

# read the raw data as a GCMS_data object
andi_file = "data/gc01_0812_066.cdf"
data = ANDI_reader(andi_file)

# build the intensity matrix
im = build_intensity_matrix_i(data)

# get the size of the intensity matrix
n_scan, n_mz = im.size
print("Size of the intensity matrix is (n_scans, n_mz):", n_scan, n_mz)

# loop over all m/z values, fetch the corresponding IC, and perform
# noise smoothing
for ii in im.iter_ic_indices():
    print(ii+1,)
    ic = im.get_ic_at_index(ii)
    ic_smooth = window_smooth(ic, window=7)
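Because each IC is smoothed independently, the loop above maps naturally onto a pool of workers. The sketch below shows that pattern with the standard library only, and is separate from whatever mechanism the User Guide describes. `moving_average` is a hypothetical stand-in for a window smoother (a centred mean over `window` points, shortened at the edges) and `smooth_all` is an illustrative helper; neither is part of PyMassSpec.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def moving_average(ic: Sequence[float], window: int = 7) -> List[float]:
    """Smooth one ion chromatogram with a centred moving average."""
    half = window // 2
    smoothed = []

    for ii in range(len(ic)):
        # Shrink the window at the edges of the chromatogram
        lo = max(0, ii - half)
        hi = min(len(ic), ii + half + 1)
        smoothed.append(sum(ic[lo:hi]) / (hi - lo))

    return smoothed


def smooth_all(ics: Sequence[Sequence[float]], window: int = 7, workers: int = 4) -> List[List[float]]:
    """Smooth every IC using a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda ic: moving_average(ic, window), ics))


ics = [[0.0, 0.0, 9.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
result = smooth_all(ics, window=3)
print(result[0])  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

Threads are used here only to keep the sketch short. For pure-Python arithmetic they will not use several CPUs; swapping in `ProcessPoolExecutor` (with a module-level function in place of the lambda) is the usual standard-library route to real multi-core execution.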
Overview
PyMassSpec uses tox to automate testing and packaging, and pre-commit to maintain code quality.
Install pre-commit with pip and install the git hook:
python -m pip install pre-commit
pre-commit install
Coding style
Yapf is used for code formatting, and isort is used to sort imports.
yapf and isort can be run manually via pre-commit:
pre-commit run yapf -a
pre-commit run isort -a
The complete autoformatting suite can be run with pre-commit:
pre-commit run -a
Automated tests
Tests are run with tox and pytest. To run tests for a specific Python version, such as Python 3.6, run:
tox -e py36
To run tests for all Python versions, simply run:
tox
A series of reference images for test_Display.py are in the “tests/baseline” directory.
If these files need to be regenerated, run the following command:
pytest --mpl-generate-path="tests/baseline" tests/test_Display.py
Build documentation locally
The documentation is powered by Sphinx. A local copy of the documentation can be built with tox:
tox -e docs
Downloading source code
The PyMassSpec source code is available on GitHub, and can be accessed from the following URL: https://github.com/PyMassSpec/PyMassSpec
If you have git installed, you can clone the repository with the following command:
git clone https://github.com/PyMassSpec/PyMassSpec
Cloning into 'PyMassSpec'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 173 (delta 16), reused 17 (delta 6), pack-reused 126
Receiving objects: 100% (173/173), 126.56 KiB | 678.00 KiB/s, done.
Resolving deltas: 100% (66/66), done.

Downloading a ‘zip’ file of the source code
Building from source
The recommended way to build PyMassSpec is to use tox:
tox -e build
The source and wheel distributions will be in the directory dist.
If you wish, you may also use pep517.build or another PEP 517-compatible build tool.
PyMassSpec coding Style Guide
This document provides specific style conventions for PyMassSpec. It should be read in conjunction with PEP 8 “Style Guide for Python Code”, by Guido van Rossum and Barry Warsaw
General
Grouping commands and using newlines
Sort functions and class methods alphabetically, with dunder methods at the top.
Return copy.copy or copy.deepcopy only when this will not impact performance or when it is otherwise absolutely necessary. Alternatively, use numpy.array().tolist().
Organise commands into logical groups, and separate if necessary with newlines to improve readability.
Example:
# -- snip --
if not isinstance(file_name, str):
    raise TypeError("'file_name' must be a string")

try:
    file = CDF(file_name)
    self.__file_name = file_name
    self.__file_handle = file
except CDFError:
    error("Cannot open file '%s'" % file_name)

print(" -> Processing netCDF file '%s'" % (self.__file_name))

self.__set_min_max_mass(file)
self.__set_intensity_list(file)
# -- snip --
In block statements (such as for loops and if statements), do not put blank lines inside a single group of statements. If the block contains more than one group of statements, separate the groups with one blank line.
Examples:
# -- snip --
td_list = []
for ii in range(len(time_list) - 1):
    td = time_list[ii + 1] - time_list[ii]
    td_list.append(td)
# -- snip --
# -- snip ---
if len(time_list) > len(intensity_matrix):
    self.set_scan_index()
    scan_index_list = self.__scan_index_list

    count = 0
    while len(intensity_matrix) < len(time_list):
        count = count + 1
        scan = numpy.repeat([0], max_mass - min_mass + 1)
        intensity_matrix.insert(0, scan)
# -- snip ---
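The guideline in this section on returning copies can be illustrated with a small sketch. `Scan` is a hypothetical class, not part of PyMassSpec; it assumes the stored list is cheap enough to copy that a shallow `copy.copy` on every access is acceptable.

```python
import copy
from typing import List


class Scan:
    """Hypothetical container holding a list of masses."""

    def __init__(self, mass_list: List[float]):
        self._mass_list = list(mass_list)

    @property
    def mass_list(self) -> List[float]:
        # Return a shallow copy so callers cannot mutate the internal list
        return copy.copy(self._mass_list)


scan = Scan([50.0, 51.0])

masses = scan.mass_list
masses.append(99.0)  # only changes the caller's copy

print(scan.mass_list)  # [50.0, 51.0]
```

A shallow copy is enough here because the list holds immutable floats; nested mutable data would need `copy.deepcopy`, at a higher cost.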
File pointers
Use fp for file pointer variables. If simultaneous use of two or more file pointers is required, use fp1, fp2, etc.
Example:
with open("some_file.txt", 'w', encoding="UTF-8") as fp1:
    with open("another.txt", 'w', encoding="UTF-8") as fp2:
        pass
Short Comments
If a comment is short, the period at the end is best omitted. Longer comments or block comments generally consist of one or more paragraphs built out of complete sentences, and each sentence should end with a period.
Imports
Grouping
Group imports as:
Standard library imports
External module imports
Other PyMassSpec subpackage imports
This subpackage imports
Separate each group by a blank line.
Import forms
For standard library modules, always import the entire module name space. i.e.
# stdlib
import os
...
os.path.exists(file_name)
Naming Styles
Variable names
Global variable names should be prefixed with an underscore to prevent their export from the module.
For specific variable names:
Use file_name instead of filename
Use fp for file pointer, i.e. with open(file_name, 'r', encoding="UTF-8") as fp: pass
Module names
Module names should be short, starting with an uppercase letter (i.e. Utils.py).
Class names
Class names use the CapWords convention. Classes for internal use have a leading underscore in addition.
Exception Names
Exceptions should be handled via the function pyms.Utils.Error.error().
Function Names
Function names should be lowercase, with words separated by underscores where suitable to improve readability.
Method Names
Method names should follow the same principles as the function names.
Internal methods and instance variables
Use one leading underscore only for internal methods and instance variables which are not intended to be part of the class’s public interface.
Class-private names
Use two leading underscores to denote class-private names. This includes class-private methods (e.g. __privfunc()).
Note
Python “mangles” these names with the class name: if class Foo has an attribute named __a, it cannot be accessed by Foo.__a. (It could still be accessed by calling Foo._Foo__a.)
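The mangling rule in the note can be checked directly. This is a minimal sketch using the `Foo` and `__a` names from the note; `get_a` and `direct_access_failed` are added for illustration only.

```python
class Foo:
    __a = 1  # class-private: stored on the class as _Foo__a

    def get_a(self) -> int:
        # Inside the class body the short name is mangled automatically
        return self.__a


try:
    Foo.__a  # no mangling outside a class body, so the lookup fails
except AttributeError:
    direct_access_failed = True
else:
    direct_access_failed = False

print(direct_access_failed)  # True
print(Foo._Foo__a)  # 1
print(Foo().get_a())  # 1
```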
Private/public class attributes
Public attributes should have no leading or trailing underscores. Private attributes should have two leading underscores, no trailing underscores. Non-public attributes should have a single leading underscore, no trailing underscores (the difference between private and non-public is that the former will never be useful for a derived class, while the latter might be).
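The practical difference between private and non-public attributes shows up in a derived class. In this sketch `Base` and `Derived` are hypothetical names: the single-underscore attribute is reachable from the subclass, while the double-underscore one is not, because the subclass mangles the name with its own class name.

```python
class Base:
    def __init__(self):
        self.public = "public"
        self._non_public = "non-public"  # may be useful to derived classes
        self.__private = "private"  # stored as _Base__private


class Derived(Base):
    def read_non_public(self) -> str:
        return self._non_public

    def read_private(self) -> str:
        # Mangled to self._Derived__private, which Base never set
        return self.__private


d = Derived()
print(d.read_non_public())  # non-public

try:
    d.read_private()
except AttributeError:
    private_hidden = True
else:
    private_hidden = False

print(private_hidden)  # True
```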
Reminder: Python names with specific meanings
_single_leading_underscore: weak “internal use” indicator (e.g. “from M import *” does not import objects whose name starts with an underscore).
single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g. “Tkinter.Toplevel(master, class_='ClassName')”.
__double_leading_underscore: class-private names as of Python 1.4.
__double_leading_and_trailing_underscore__: “magic” objects or attributes that live in user-controlled namespaces, e.g. __init__, __import__ or __file__.
Docstrings
General
All sub-packages, modules, functions, and classes must have proper Sphinx docstrings
When designating types for :type and :rtype, use the official names from the ‘types’ package i.e. BooleanType, StringType, FileType etc.
All docstrings must start with a single summary sentence concisely describing the function, and this sentence must not be terminated by a period. Additional description may follow in the form of multi-sentenced paragraphs, separated by a blank line from the summary sentence
Leave one blank line above and below the docstring
Separate :summary, :param/:type, :return/:rtype, :author strings with one blank line
Packages
Package docstrings are defined in __init__.py. This example shows the top three lines of pyms/__init__.py:
Example:
"""
The root of the package pyms
"""
Modules
A summary for the module should be written concisely in a single sentence, enclosed above and below with lines containing only """
Example:
"""
Provides general I/O functions
"""
Functions
In all functions the following Sphinx tags must be defined:
:param
:return
:author
Other fields are optional.
The parameter and return types must be specified using type annotations per PEP 484.
Example:
# stdlib
from typing import IO


def open_for_reading(file_name: str) -> IO:
    """
    Opens file for reading, returns file pointer

    :param file_name: Name of the file to be opened for reading

    :return: Pointer to the opened file

    :author: Jake Blues
    """
Classes
The root class docstring must contain the :author field, in addition to :param and :return fields for the __init__ method.
Other fields are optional.
__init__ should have no docstring.
Methods docstrings adhere to rules for Functions. Docstrings are optional for special methods (i.e. __len__(), __del__(), etc).
Class methods. The rules for functions apply, except that the tag :author does not need to be defined (if authors are given in the class docstring).
Examples:
class ChemStation:
    """
    ANDI-MS reader for Agilent ChemStation NetCDF files

    :param file_name: The name of the ANDI-MS file

    :author: Jake Blues
    """

    def __init__(self, file_name: str):
        pass
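A slightly fuller sketch shows how the class and method rules combine. `TimeAxis` and `time_steps` are hypothetical names invented for this illustration (the method body reuses the `td_list` loop from the newline-grouping example), and the author name is the same placeholder used in the examples above.

```python
from typing import List


class TimeAxis:
    """
    Retention time axis of a chromatogram

    :param time_list: Retention times, in seconds

    :author: Jake Blues
    """

    def __init__(self, time_list: List[float]):
        self._time_list = list(time_list)

    def __len__(self) -> int:
        return len(self._time_list)

    def time_steps(self) -> List[float]:
        """
        Returns the differences between consecutive retention times

        :return: Time differences between neighbouring scans
        """

        td_list = []
        for ii in range(len(self._time_list) - 1):
            td = self._time_list[ii + 1] - self._time_list[ii]
            td_list.append(td)

        return td_list


axis = TimeAxis([0.0, 0.5, 1.5])
print(len(axis), axis.time_steps())  # 3 [0.5, 1.0]
```

The class docstring carries `:param` and `:author`, `__init__` and `__len__` have no docstring, and the method omits `:author` because it is given at class level.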
License
PyMassSpec is licensed under the GNU General Public License v2.0
The GNU GPL is the most widely used free software license and has a strong copyleft requirement. When distributing derived works, the source code of the work must be made available under the same license. There are multiple variants of the GNU GPL, each with different requirements.
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
View the Function Index or browse the Source Code.