Pre-processing#

  • Proper pre-processing is a prerequisite for stability, reproducibility, performance and interpretability.

    • Stability: Reducing the impact of noise and outliers and transforming to a common distribution.

    • Reproducibility: Following common pre-processing practices so that results remain valid across users, institutions, etc.

    • Performance: Simplifying the job of the machine learning model by removing trivial structure it would otherwise have to learn (e.g., means, scales, etc.).

    • Interpretability: Being able to compare and recognise patterns across datasets, methods and projects, and to trust the observed plots, statistics, etc.

Previously introduced pre-processing#

  • Simple statistics: mean, trimmed mean, median, standard deviation, Median Absolute Deviation, covariance, and Minimum Covariance Determinant.
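The robust location and scale estimators above can be compared on a small example. The data below is illustrative, chosen so that a single outlier shows the difference between classical and robust statistics:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # last value is an outlier

mean = x.mean()                          # pulled strongly towards the outlier
trimmed = stats.trim_mean(x, 0.2)        # drops 20% of points from each tail
median = np.median(x)                    # robust location estimate
mad = stats.median_abs_deviation(x)      # robust scale estimate

print(mean, trimmed, median, mad)        # 22.0 3.0 3.0 1.0
```

The mean (22.0) is dominated by the outlier, while the trimmed mean, median and MAD all reflect the bulk of the data.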

  • Series smoothing by various moving window techniques (mean, median, Gaussian, Savitzky-Golay), and second derivative constraints (Whittaker).
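The moving-window smoothers listed above can all be sketched with pandas and SciPy; the signal and window sizes below are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
clean = np.sin(t)
noisy = clean + rng.normal(0, 0.3, t.size)
s = pd.Series(noisy)

# centred moving windows of width 11
mean_smooth = s.rolling(window=11, center=True).mean()
median_smooth = s.rolling(window=11, center=True).median()
gauss_smooth = s.rolling(window=11, center=True, win_type="gaussian").mean(std=2.0)

# Savitzky-Golay: fit a local polynomial (order 2) in each window
sg_smooth = savgol_filter(noisy, window_length=11, polyorder=2)
```

Each smoother trades noise suppression against distortion of the underlying signal; Savitzky-Golay preserves local peaks better than a plain moving mean.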

  • Decomposition: Fast Fourier Transform, Discrete Cosine Transform, Seasonal-Trend decomposition using LOESS smoothing (STL), and detrending by subtracting a smoothed signal.
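Detrending by subtracting a smoothed signal can be sketched as follows; the synthetic trend and seasonal period are illustrative assumptions:

```python
import numpy as np
import pandas as pd

t = np.arange(300)
trend = 0.05 * t                          # slow linear drift
seasonal = np.sin(2 * np.pi * t / 30)     # period-30 oscillation
y = pd.Series(trend + seasonal)

# a wide centred moving mean estimates the trend (window spans ~1 period)
smooth = y.rolling(window=31, center=True, min_periods=1).mean()

# subtracting the smoothed series leaves mostly the seasonal component
detrended = y - smooth
```

Choosing the window close to a multiple of the seasonal period makes the trend estimate largely insensitive to the oscillation, which is the same idea STL applies with LOESS instead of a moving mean.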

Further pre-processing#

  • Imputation: Replacing missing (or outlying) values with an estimate based on training data or neighbouring observations.
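Two simple imputation strategies can be sketched with pandas; the series below is illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])

# constant estimate from the observed (training) values
mean_imputed = s.fillna(s.mean())

# estimate from neighbouring points in the series
interp = s.interpolate(method="linear")

print(interp.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Mean imputation ignores local structure, while interpolation assumes the series varies smoothly between observations; which is appropriate depends on the data.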

  • Time formatting: Representing measurements in a common time format and time zone.
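Conversion to a common time zone can be sketched with pandas; the timestamps and zone names below are illustrative assumptions:

```python
import pandas as pd

# timestamps recorded in local wall-clock time, without zone information
local = pd.to_datetime(["2023-06-01 12:00", "2023-06-01 13:00"])

# attach the local zone, then convert to a common reference (UTC)
oslo = local.tz_localize("Europe/Oslo")
utc = oslo.tz_convert("UTC")

print(utc[0])  # 2023-06-01 10:00:00+00:00 (Oslo is UTC+2 in June)
```

Storing everything as UTC avoids ambiguity around daylight-saving transitions when series from different sources are combined.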

  • Synchronisation: Making the measurements in two or more time series correspond (time points and resolution), e.g., by removal of non-overlapping regions, time accumulation or interpolation.
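Synchronising two series onto a common grid over their overlapping region can be sketched as follows; the time stamps, values and 1-minute target grid are illustrative assumptions:

```python
import pandas as pd

# two series sampled on different, misaligned grids
a = pd.Series([0.0, 1.0, 2.0, 3.0],
              index=pd.date_range("2023-01-01 00:00", periods=4, freq="2min"))
b = pd.Series([10.0, 11.0, 12.0],
              index=pd.date_range("2023-01-01 00:01", periods=3, freq="3min"))

# keep only the overlapping interval and define a common 1-minute grid
start = max(a.index[0], b.index[0])
end = min(a.index[-1], b.index[-1])
grid = pd.date_range(start, end, freq="1min")

# interpolate each series onto the common grid
a_sync = a.reindex(a.index.union(grid)).interpolate(method="time").reindex(grid)
b_sync = b.reindex(b.index.union(grid)).interpolate(method="time").reindex(grid)
```

After this step the two series share time points and resolution, so they can be compared or modelled jointly point by point.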

  • Encoding: Transforming time series into variograms.
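One common variogram encoding is the empirical semivariogram, which summarises a series by the average squared difference between points at each lag. This is a minimal sketch, with an illustrative alternating series whose variogram is easy to verify by hand:

```python
import numpy as np

def empirical_variogram(x, max_lag):
    """Semivariogram gamma(h) = 0.5 * mean((x[t+h] - x[t])^2) for h = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    return np.array([0.5 * np.mean((x[h:] - x[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

# alternating series: lag-1 differences are +/-1, lag-2 differences are 0
gamma = empirical_variogram([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], max_lag=2)
print(gamma)  # [0.5 0. ]
```

The resulting fixed-length vector of gamma values can be used as a feature representation of the series, independent of its original length.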