pyms.Gapfill

Gap Filling Routines.

pyms.Gapfill.Class

Provides a class for handling Missing Peaks in an output file (i.e. area.csv).

Classes:

MissingPeak(common_ion, qual_ion_1, qual_ion_2)

Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec.

Sample(sample_name, matrix_position)

A collection of MissingPeak objects.

class MissingPeak(common_ion, qual_ion_1, qual_ion_2, rt=0.0)[source]

Bases: object

Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec.

Parameters:
  • common_ion (int) – Common ion for the peak across samples in an experiment.

  • qual_ion_1 (int) – The top (most abundant) ion for the peak object

  • qual_ion_2 (int) – The second most abundant ion for the peak object

  • rt (float) – Retention time of the peak. Default 0.0.

Authors:

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster

Attributes:

common_ion

Returns the common ion for the peak object across an experiment.

common_ion_area

The area of the common ion

exact_rt

The retention time of the apex of the peak

qual_ion1

Returns the top (most abundant) ion for the peak object.

qual_ion2

Returns the second most abundant ion for the peak object.

rt

Returns the retention time of the peak.

property common_ion

Returns the common ion for the peak object across an experiment.

Return type:

int

Returns:

Common ion for the peak

Author:

Jairus Bowne

common_ion_area

Type:    Optional[float]

The area of the common ion

exact_rt

Type:    Optional[float]

The retention time of the apex of the peak

property qual_ion1

Returns the top (most abundant) ion for the peak object.

Return type:

int

Returns:

Most abundant ion

Author:

Jairus Bowne

property qual_ion2

Returns the second most abundant ion for the peak object.

Return type:

int

Returns:

Second most abundant ion

Author:

Jairus Bowne

property rt

Returns the retention time of the peak.

Return type:

float
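
The interface described above can be pictured with a small stand-in class. This is an illustrative sketch, not the PyMassSpec source: the name MissingPeakSketch is hypothetical, and the assumption that common_ion_area and exact_rt start out as None is inferred only from their Optional[float] type.

```python
from typing import Optional


class MissingPeakSketch:
    """Hypothetical stand-in mirroring the documented MissingPeak interface."""

    def __init__(self, common_ion: int, qual_ion_1: int, qual_ion_2: int, rt: float = 0.0):
        self._common_ion = common_ion
        self._qual_ion_1 = qual_ion_1
        self._qual_ion_2 = qual_ion_2
        self._rt = rt
        # Assumption: these stay None until the raw data has been searched.
        self.common_ion_area: Optional[float] = None
        self.exact_rt: Optional[float] = None

    @property
    def common_ion(self) -> int:
        return self._common_ion

    @property
    def qual_ion1(self) -> int:
        return self._qual_ion_1

    @property
    def qual_ion2(self) -> int:
        return self._qual_ion_2

    @property
    def rt(self) -> float:
        return self._rt


# The ions and retention time are fixed at construction (read-only properties);
# the area and exact retention time are plain attributes that can be set later.
peak = MissingPeakSketch(common_ion=73, qual_ion_1=147, qual_ion_2=117, rt=531.2)
peak.common_ion_area = 15234.0
peak.exact_rt = 530.9
```

Note that the constructor parameters are spelled qual_ion_1 and qual_ion_2, while the corresponding properties are qual_ion1 and qual_ion2.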

class Sample(sample_name, matrix_position)[source]

Bases: object

A collection of MissingPeak objects.

Parameters:
  • sample_name (str) – The experiment code or name.

  • matrix_position (int) – Position along the x-axis of the matrix where the sample is located.

Authors:

Sean O’Callaghan, Dominic Davis-Foster (properties)

Methods:

add_missing_peak(missing_peak)

Add a new MissingPeak object to the Sample.

get_mp_rt_exact_rt_dict()

Returns a dictionary containing average_rt : exact_rt pairs.

Attributes:

missing_peaks

Returns a list of the MissingPeak objects in the Sample object.

name

Returns name of the sample.

rt_areas

Returns a dictionary containing rt : area pairs.

add_missing_peak(missing_peak)[source]

Add a new MissingPeak object to the Sample.

Parameters:

missing_peak (MissingPeak) – The missing peak object to be added.

get_mp_rt_exact_rt_dict()[source]

Returns a dictionary containing average_rt : exact_rt pairs.

Return type:

Dict[float, Optional[float]]

property missing_peaks

Returns a list of the MissingPeak objects in the Sample object.

Return type:

List[MissingPeak]

property name

Returns name of the sample.

Return type:

str

property rt_areas

Returns a dictionary containing rt : area pairs.

Return type:

Dict[float, Optional[float]]
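
As a rough picture of how the two dictionaries relate to the stored peaks, the following stand-in collects peak-like objects and derives both mappings from them. It is a sketch under stated assumptions, not the PyMassSpec implementation: SampleSketch is a hypothetical name, the peaks are SimpleNamespace placeholders, and it assumes the dictionary keys are each peak's rt attribute (the average retention time).

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


class SampleSketch:
    """Hypothetical stand-in mirroring the documented Sample interface."""

    def __init__(self, sample_name: str, matrix_position: int):
        self._name = sample_name
        self._matrix_position = matrix_position
        self._missing_peaks: List[Any] = []

    def add_missing_peak(self, missing_peak: Any) -> None:
        self._missing_peaks.append(missing_peak)

    @property
    def name(self) -> str:
        return self._name

    @property
    def missing_peaks(self) -> List[Any]:
        return self._missing_peaks

    @property
    def rt_areas(self) -> Dict[float, Optional[float]]:
        # Assumption: keyed by the peak's average retention time.
        return {p.rt: p.common_ion_area for p in self._missing_peaks}

    def get_mp_rt_exact_rt_dict(self) -> Dict[float, Optional[float]]:
        return {p.rt: p.exact_rt for p in self._missing_peaks}


sample = SampleSketch("sample_01", matrix_position=3)
# One peak that has been filled, and one that could not be.
sample.add_missing_peak(SimpleNamespace(rt=531.2, common_ion_area=15234.0, exact_rt=530.9))
sample.add_missing_peak(SimpleNamespace(rt=612.7, common_ion_area=None, exact_rt=None))

areas = sample.rt_areas
exact_rts = sample.get_mp_rt_exact_rt_dict()
```

The None values correspond to the Optional[float] in the documented return types: a peak that has not been filled still appears in both dictionaries.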

pyms.Gapfill.Function

Functions to fill missing peak objects.

Classes:

MissingPeakFiletype(value)

Flag to indicate the filetype for pyms.Gapfill.Function.missing_peak_finder().

Functions:

file2dataframe(file_name)

Convert a .csv file to a pandas DataFrame.

missing_peak_finder(sample, file_name[, ...])

Integrates raw data around missing peak locations to fill NAs in the data matrix.

mp_finder(input_matrix)

Finds the 'NA's in the transformed area_ci.csv file and creates pyms.Gapfill.Class.Sample objects from them.

write_filled_csv(sample_list, area_file, ...)

Creates a new area_ci.csv file, replacing NAs with values from the sample_list objects where possible.

write_filled_rt_csv(sample_list, rt_file, ...)

Creates a new rt.csv file, replacing 'NA's with values from the sample_list objects where possible.

enum MissingPeakFiletype(value)[source]

Bases: IntEnum

Flag to indicate the filetype for pyms.Gapfill.Function.missing_peak_finder().

New in version 2.3.0.

Member Type:

int

Valid values are as follows:

MZML = <MissingPeakFiletype.MZML: 1>
NETCDF = <MissingPeakFiletype.NETCDF: 2>
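
Because the enum derives from IntEnum, its members compare equal to plain integers and can be looked up by value. The sketch below uses a stand-in enum with the same member names and values as listed above; FiletypeSketch and filetype_from_suffix are hypothetical names for illustration, and the mapping from file extensions to members is an assumption, not library behaviour.

```python
from enum import IntEnum


class FiletypeSketch(IntEnum):
    """Stand-in with the same members as the documented enum."""

    MZML = 1
    NETCDF = 2


def filetype_from_suffix(file_name: str) -> FiletypeSketch:
    """Hypothetical helper: choose a flag from the file extension."""
    suffix = file_name.lower().rsplit(".", 1)[-1]
    if suffix == "mzml":
        return FiletypeSketch.MZML
    if suffix in {"cdf", "nc"}:
        return FiletypeSketch.NETCDF
    raise ValueError(f"Unrecognised raw data file: {file_name!r}")


by_value = FiletypeSketch(1)             # lookup by integer value
by_name = FiletypeSketch["NETCDF"]       # lookup by member name
guessed = filetype_from_suffix("run_01.mzML")
```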

file2dataframe(file_name)[source]

Convert a .csv file to a pandas DataFrame.

Parameters:

file_name (Union[str, Path, PathLike]) – CSV file to read.

Authors:

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster (pathlib support)

New in version 2.3.0.

Return type:

DataFrame

missing_peak_finder(sample, file_name, points=3, null_ions=None, crop_ions=None, threshold=1000, rt_window=1, filetype=MissingPeakFiletype.MZML)[source]

Integrates raw data around missing peak locations to fill NAs in the data matrix.

Parameters:
  • sample (Sample) – The sample object containing missing peaks

  • file_name (str) – Name of the raw data file

  • points (int) – Peak-finding window: a scan counts as a peak if it is the maximum over ‘points’ scans. Default 3.

  • null_ions (Optional[List]) – Ions to be deleted from the matrix. If None, [73, 147] is used.

  • crop_ions (Optional[List]) – Range of ions to be considered, as [lower, upper]. If None, [50, 540] is used.

  • threshold (int) – Minimum IonChromatogram intensity required for a peak to be filled. Default 1000.

  • rt_window (float) – Window, in seconds, around the average RT in which to search for the peak. Default 1.

  • filetype (MissingPeakFiletype) – The type of the raw data file. Default <MissingPeakFiletype.MZML: 1>.

Author:

Sean O’Callaghan
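
The roles of rt_window and threshold can be illustrated on a single ion chromatogram held in plain lists. This is a deliberately simplified sketch of the general idea, not the routine's actual algorithm: fill_from_chromatogram is a hypothetical name, the window is assumed to extend rt_window seconds either side of the average RT, and the "area" is assumed to be a simple sum of intensities.

```python
from typing import List, Optional, Tuple


def fill_from_chromatogram(
    times: List[float],
    intensities: List[float],
    average_rt: float,
    rt_window: float = 1.0,
    threshold: float = 1000,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (area, apex_rt) for the window, or (None, None) if nothing qualifies."""
    # Keep only the scans within rt_window seconds of the average RT.
    window = [(t, i) for t, i in zip(times, intensities) if abs(t - average_rt) <= rt_window]
    if not window:
        return None, None

    # The apex is the most intense scan in the window.
    apex_rt, apex_intensity = max(window, key=lambda pair: pair[1])
    if apex_intensity < threshold:
        return None, None  # too weak to be trusted as a peak

    # Assumption: approximate the area by summing the intensities in the window.
    area = float(sum(i for _, i in window))
    return area, apex_rt


times = [529.0, 529.5, 530.0, 530.5, 531.0, 531.5, 532.0]
intensities = [200, 900, 2500, 4000, 2600, 800, 150]

filled = fill_from_chromatogram(times, intensities, average_rt=530.4)
rejected = fill_from_chromatogram(times, intensities, average_rt=530.4, threshold=5000)
```

In this sketch a peak that fails the threshold is returned as (None, None), which matches the Optional[float] values stored on a MissingPeak.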

mp_finder(input_matrix)[source]

Finds the 'NA's in the transformed area_ci.csv file and creates pyms.Gapfill.Class.Sample objects from them.

Parameters:

input_matrix (List) – Data matrix derived from the area_ci.csv file.

Return type:

List[Sample]

Authors:

Jairus Bowne, Sean O’Callaghan
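
The core of this step, locating 'NA' cells and grouping them by sample, can be sketched on a small list-of-lists matrix. The layout used here (a header row with sample names from the third column onward, one row per peak) is a hypothetical one chosen for illustration and may not match the transformed matrix the function expects; find_na_cells is likewise a hypothetical name.

```python
from typing import Dict, List


def find_na_cells(matrix: List[List[str]], first_sample_col: int = 2) -> Dict[str, List[int]]:
    """Map each sample name to the indices of the peak rows where it has 'NA'."""
    header, *rows = matrix
    missing: Dict[str, List[int]] = {name: [] for name in header[first_sample_col:]}
    for row_index, row in enumerate(rows):
        for col, cell in enumerate(row[first_sample_col:], start=first_sample_col):
            if cell == "NA":
                missing[header[col]].append(row_index)
    return missing


# Hypothetical layout: label, average RT, then one column per sample.
matrix = [
    ["UID", "RTavg", "sample_a", "sample_b"],
    ["peak_1", "8.85", "1200.0", "NA"],
    ["peak_2", "9.40", "NA", "NA"],
    ["peak_3", "10.1", "830.5", "910.0"],
]

missing_by_sample = find_na_cells(matrix)
```

Each entry in the result corresponds to what would become one Sample holding one MissingPeak per listed row.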

write_filled_csv(sample_list, area_file, filled_area_file)[source]

Creates a new area_ci.csv file, replacing NAs with values from the sample_list objects where possible.

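Parameters:
  • sample_list – Sample objects holding the values used to replace NAs.

  • area_file – The existing area_ci.csv file from the PyMassSpec output.

  • filled_area_file – The new output file, with NA values replaced where possible.

Authors: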

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster
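
The "where possible" behaviour can be sketched with the standard csv module: an 'NA' cell is replaced only when a value is available for it, and is otherwise left as it is. This is a minimal illustration under assumed conditions, not the library's implementation: fill_na is a hypothetical name, the CSV layout (a label column followed by one column per sample) is invented for the example, and in-memory text is used instead of files.

```python
import csv
import io
from typing import Dict, Optional, Tuple


def fill_na(csv_text: str, replacements: Dict[Tuple[str, str], Optional[float]]) -> str:
    """Replace 'NA' cells using values keyed by (row label, sample name)."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    writer.writerow(header)
    for row in reader:
        for col in range(1, len(row)):
            value = replacements.get((row[0], header[col]))
            # Only overwrite a cell that is 'NA' and has a usable replacement.
            if row[col] == "NA" and value is not None:
                row[col] = f"{value:.1f}"
        writer.writerow(row)
    return out.getvalue()


original = "peak,sample_a,sample_b\npeak_1,1200.0,NA\npeak_2,NA,NA\n"
replacements = {("peak_1", "sample_b"): 1543.2, ("peak_2", "sample_a"): None}

filled_text = fill_na(original, replacements)
```

Here only the peak_1 cell for sample_b is filled; both peak_2 cells remain 'NA' because no area was found for them.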

write_filled_rt_csv(sample_list, rt_file, filled_rt_file)[source]

Creates a new rt.csv file, replacing 'NA's with values from the sample_list objects where possible.

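Parameters:
  • sample_list – Sample objects holding the values used to replace 'NA's.

  • rt_file – The existing rt.csv file from the PyMassSpec output.

  • filled_rt_file – The new output file, with 'NA' values replaced where possible.

Authors: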

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster