pyms.Gapfill

Gap Filling Routines.

pyms.Gapfill.Class

Provides classes for handling missing peaks in an output file (i.e. area.csv).

Classes:

MissingPeak(common_ion, qual_ion_1, qual_ion_2)

Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec.

Sample(sample_name, matrix_position)

A collection of MissingPeak objects.

class MissingPeak(common_ion, qual_ion_1, qual_ion_2, rt=0.0)[source]

Bases: object

Class to encapsulate a peak object identified as missing in the output area matrix from PyMassSpec.

Parameters
  • common_ion (int) – Common ion for the peak across samples in an experiment.

  • qual_ion_1 (int) – The top (most abundant) ion for the peak object

  • qual_ion_2 (int) – The second most abundant ion for the peak object

  • rt (float) – Retention time of the peak. Default 0.0.

Authors

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster

Attributes:

common_ion

Returns the common ion for the peak object across an experiment.

common_ion_area

The area of the common ion

exact_rt

The retention time of the apex of the peak

qual_ion1

Returns the top (most abundant) ion for the peak object.

qual_ion2

Returns the second most abundant ion for the peak object.

rt

Returns the retention time of the peak.

property common_ion

Returns the common ion for the peak object across an experiment.

Return type

int

Returns

Common ion for the peak

Author

Jairus Bowne

common_ion_area

Type:    Optional[float]

The area of the common ion

exact_rt

Type:    Optional[float]

The retention time of the apex of the peak

property qual_ion1

Returns the top (most abundant) ion for the peak object.

Return type

int

Returns

Most abundant ion

Author

Jairus Bowne

property qual_ion2

Returns the second most abundant ion for the peak object.

Return type

int

Returns

Second most abundant ion

Author

Jairus Bowne

property rt

Returns the retention time of the peak.

Return type

float

class Sample(sample_name, matrix_position)[source]

Bases: object

A collection of MissingPeak objects.

Parameters
  • sample_name (str) – the experiment code/name.

  • matrix_position (int) – position along x-axis where sample is located.

Authors

Sean O’Callaghan, Dominic Davis-Foster (properties)

Methods:

add_missing_peak(missing_peak)

Add a new MissingPeak object to the Sample.

get_mp_rt_exact_rt_dict()

Returns a dictionary containing average_rt : exact_rt pairs.

Attributes:

missing_peaks

Returns a list of the MissingPeak objects in the Sample object.

name

Returns name of the sample.

rt_areas

Returns a dictionary containing rt : area pairs.

add_missing_peak(missing_peak)[source]

Add a new MissingPeak object to the Sample.

Parameters

missing_peak (MissingPeak) – The missing peak object to be added.

get_mp_rt_exact_rt_dict()[source]

Returns a dictionary containing average_rt : exact_rt pairs.

Return type

Dict[float, Optional[float]]

property missing_peaks

Returns a list of the MissingPeak objects in the Sample object.

Return type

List[MissingPeak]

property name

Returns name of the sample.

Return type

str

property rt_areas

Returns a dictionary containing rt : area pairs.

Return type

Dict[float, Optional[float]]
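
The relationship between the two classes can be sketched with simplified stand-ins that mirror the documented names and signatures. This is an illustration only, not the PyMassSpec source: the class bodies are assumptions, in particular that rt_areas and get_mp_rt_exact_rt_dict() are keyed by each peak's rt, and that common_ion_area and exact_rt stay None until a gap-filling step sets them.

```python
from typing import Dict, List, Optional


class MissingPeak:
    """Simplified stand-in mirroring the documented MissingPeak interface."""

    def __init__(self, common_ion: int, qual_ion_1: int, qual_ion_2: int, rt: float = 0.0):
        self._common_ion = common_ion
        self._qual_ion_1 = qual_ion_1
        self._qual_ion_2 = qual_ion_2
        self._rt = rt
        # Assumed to be unset until the gap-filling step provides values.
        self.common_ion_area: Optional[float] = None
        self.exact_rt: Optional[float] = None

    @property
    def common_ion(self) -> int:
        return self._common_ion

    @property
    def qual_ion1(self) -> int:
        return self._qual_ion_1

    @property
    def qual_ion2(self) -> int:
        return self._qual_ion_2

    @property
    def rt(self) -> float:
        return self._rt


class Sample:
    """Simplified stand-in: a named collection of MissingPeak objects."""

    def __init__(self, sample_name: str, matrix_position: int):
        self._name = sample_name
        self._matrix_position = matrix_position
        self._missing_peaks: List[MissingPeak] = []

    def add_missing_peak(self, missing_peak: MissingPeak) -> None:
        self._missing_peaks.append(missing_peak)

    @property
    def missing_peaks(self) -> List[MissingPeak]:
        return self._missing_peaks

    @property
    def name(self) -> str:
        return self._name

    @property
    def rt_areas(self) -> Dict[float, Optional[float]]:
        # Assumption: keyed by the peak's (average) retention time.
        return {peak.rt: peak.common_ion_area for peak in self._missing_peaks}

    def get_mp_rt_exact_rt_dict(self) -> Dict[float, Optional[float]]:
        return {peak.rt: peak.exact_rt for peak in self._missing_peaks}


sample = Sample("sample_a", 3)
peak = MissingPeak(common_ion=73, qual_ion_1=147, qual_ion_2=221, rt=412.5)
sample.add_missing_peak(peak)
before = sample.rt_areas  # the area is still None at this point

# Simulate the gap-filling step setting the two optional attributes.
peak.common_ion_area = 15320.0
peak.exact_rt = 412.9
print(sample.rt_areas)
print(sample.get_mp_rt_exact_rt_dict())
```

The Optional[float] values in both dictionaries reflect that a peak may remain unfilled, in which case its entry is None.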

pyms.Gapfill.Function

Functions to fill missing peak objects.

Classes:

MissingPeakFiletype(value)

Flag to indicate the filetype for pyms.Gapfill.Function.missing_peak_finder().

Functions:

file2dataframe(file_name)

Convert a .csv file to a pandas DataFrame.

missing_peak_finder(sample, file_name[, …])

Integrates raw data around missing peak locations to fill NAs in the data matrix.

mp_finder(input_matrix)

Finds the 'NA' entries in the transformed area_ci.csv file and creates pyms.Gapfill.Class.Sample objects from them.

write_filled_csv(sample_list, area_file, …)

Creates a new area_ci.csv file, replacing NAs with values from the sample_list objects where possible.

write_filled_rt_csv(sample_list, rt_file, …)

Creates a new rt.csv file, replacing 'NA's with values from the sample_list objects where possible.

enum MissingPeakFiletype(value)[source]

Bases: enum_tools.custom_enums.IntEnum

Flag to indicate the filetype for pyms.Gapfill.Function.missing_peak_finder().

New in version 2.3.0.

Member Type

int

Valid values are as follows:

MZML = <MissingPeakFiletype.MZML: 1>
NETCDF = <MissingPeakFiletype.NETCDF: 2>
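
As a rough illustration of how an integer-valued flag like this behaves, the sketch below rebuilds the two members with the standard library's enum.IntEnum. That is a stand-in for enum_tools.custom_enums.IntEnum, on the assumption that members compare equal to their integer values in the same way.

```python
from enum import IntEnum


class MissingPeakFiletype(IntEnum):
    """Stand-in with the two documented members and values."""

    MZML = 1
    NETCDF = 2


filetype = MissingPeakFiletype.NETCDF

# IntEnum members compare equal to plain integers.
print(filetype == 2)

# Members can be looked up by value or by name.
print(MissingPeakFiletype(1).name)
print(MissingPeakFiletype["NETCDF"].value)
```
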

file2dataframe(file_name)[source]

Convert a .csv file to a pandas DataFrame.

Parameters

file_name (Union[str, Path, PathLike]) – CSV file to read.

Authors

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster (pathlib support)

New in version 2.3.0.

Return type

DataFrame

missing_peak_finder(sample, file_name, points=3, null_ions=None, crop_ions=None, threshold=1000, rt_window=1, filetype=<MissingPeakFiletype.MZML: 1>)[source]

Integrates raw data around missing peak locations to fill NAs in the data matrix.

Parameters
  • sample (Sample) – The sample object containing missing peaks

  • file_name (str) – Name of the raw data file

  • points (int) – Peak finding: a scan counts as a peak if it is a maximum over ‘points’ scans. Default 3.

  • null_ions (Optional[List]) – Ions to be deleted in the matrix. Default None, which is treated as [73, 147].

  • crop_ions (Optional[List]) – Range of ions to be considered. Default None, which is treated as [50, 540].

  • threshold (int) – Minimum intensity of IonChromatogram allowable to fill. Default 1000.

  • rt_window (float) – Window, in seconds, around the average RT in which to look for the peak. Default 1.

  • filetype (MissingPeakFiletype) – Flag indicating the filetype of the raw data file. Default <MissingPeakFiletype.MZML: 1>.

Author

Sean O’Callaghan
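
To make the roles of points, threshold and rt_window concrete, here is a minimal pure-Python sketch of locating and integrating a peak near an expected retention time. It is not the library's algorithm: find_apex_and_area is a hypothetical helper, the window is assumed to extend rt_window seconds either side of the average RT, and the area is a crude sum of the in-window intensities.

```python
from typing import List, Optional, Tuple


def find_apex_and_area(
    times: List[float],
    intensities: List[float],
    average_rt: float,
    rt_window: float = 1.0,
    points: int = 3,
    threshold: float = 1000,
) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (exact_rt, area), or None if nothing qualifies."""
    half = points // 2
    best: Optional[int] = None
    for i in range(half, len(times) - half):
        # Only consider scans inside the window around the average RT.
        if abs(times[i] - average_rt) > rt_window:
            continue
        neighbourhood = intensities[i - half:i + half + 1]
        # Candidate apex: the maximum over 'points' scans and at least 'threshold'.
        if intensities[i] == max(neighbourhood) and intensities[i] >= threshold:
            if best is None or intensities[i] > intensities[best]:
                best = i
    if best is None:
        return None
    # Crude area: sum of the intensities of all scans inside the window.
    area = sum(
        intensity
        for time, intensity in zip(times, intensities)
        if abs(time - average_rt) <= rt_window
    )
    return times[best], area


times = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]
intensities = [200, 900, 2500, 4800, 2600, 800, 150]

result = find_apex_and_area(times, intensities, average_rt=11.4)
print(result)  # (11.5, 10800)

# Raising the threshold above the apex intensity leaves the gap unfilled.
unfilled = find_apex_and_area(times, intensities, average_rt=11.4, threshold=5000)
print(unfilled)  # None
```

In this sketch the apex time plays the role of exact_rt and the summed intensity plays the role of common_ion_area on a MissingPeak.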

mp_finder(input_matrix)[source]

Finds the 'NA' entries in the transformed area_ci.csv file and creates pyms.Gapfill.Class.Sample objects from them.

Parameters

input_matrix (List) – Data matrix derived from the area_ci.csv file.

Return type

List[Sample]

Authors

Jairus Bowne, Sean O’Callaghan
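
The general idea of scanning a data matrix for 'NA' cells and grouping them by sample can be sketched as follows. The matrix layout (a header row, a label in the first column, sample columns starting at a given index) and the helper name find_missing_cells are assumptions made for this example; the layout actually expected by mp_finder is not described here.

```python
from typing import Dict, List


def find_missing_cells(matrix: List[List[str]], first_sample_col: int) -> Dict[str, List[str]]:
    """Hypothetical helper: map each sample name to the row labels whose cell is 'NA'."""
    header = matrix[0]
    sample_names = header[first_sample_col:]
    missing: Dict[str, List[str]] = {name: [] for name in sample_names}
    for row in matrix[1:]:
        row_label = row[0]
        for name, cell in zip(sample_names, row[first_sample_col:]):
            if cell == "NA":
                missing[name].append(row_label)
    return missing


# Assumed layout: peak label, retention time, then one column per sample.
matrix = [
    ["peak", "rt", "sample_a", "sample_b"],
    ["p1", "412.5", "15320", "NA"],
    ["p2", "530.1", "NA", "NA"],
    ["p3", "601.7", "8800", "9100"],
]

missing = find_missing_cells(matrix, first_sample_col=2)
print(missing)  # {'sample_a': ['p2'], 'sample_b': ['p1', 'p2']}
```
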

write_filled_csv(sample_list, area_file, filled_area_file)[source]

Creates a new area_ci.csv file, replacing NAs with values from the sample_list objects where possible.

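Parameters
  • sample_list – The Sample objects supplying the filled values.

  • area_file – The existing area_ci.csv file to read.

  • filled_area_file – The new file to write, with NAs replaced where possible.

Authors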

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster
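
Replacing NAs "where possible" can be illustrated with the standard csv module working on in-memory text. This is a sketch under assumptions, not the library's implementation: fill_na_cells is a hypothetical helper, the CSV layout is invented, and the filled values are supplied as a plain dictionary keyed by (row label, column name) rather than taken from Sample objects.

```python
import csv
import io
from typing import Dict, Tuple


def fill_na_cells(source_csv: str, filled: Dict[Tuple[str, str], float]) -> str:
    """Hypothetical helper: return CSV text with 'NA' cells replaced where a value exists."""
    reader = csv.reader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    writer.writerow(header)
    for row in reader:
        new_row = []
        for column_name, cell in zip(header, row):
            key = (row[0], column_name)
            if cell == "NA" and key in filled:
                new_row.append(str(filled[key]))
            else:
                # Cells with no replacement value are left untouched.
                new_row.append(cell)
        writer.writerow(new_row)
    return out.getvalue()


source = "peak,sample_a,sample_b\np1,15320,NA\np2,NA,NA\n"
result = fill_na_cells(source, {("p1", "sample_b"): 14980.0})
print(result)
```

Only the one cell with a supplied value changes; the two NAs in row p2 are written out unchanged.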

write_filled_rt_csv(sample_list, rt_file, filled_rt_file)[source]

Creates a new rt.csv file, replacing 'NA's with values from the sample_list objects where possible.

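Parameters
  • sample_list – The Sample objects supplying the filled values.

  • rt_file – The existing rt.csv file to read.

  • filled_rt_file – The new file to write, with 'NA's replaced where possible.

Authors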

Jairus Bowne, Sean O’Callaghan, Dominic Davis-Foster