Pre-processing#
Proper pre-processing is a prerequisite for stability, reproducibility, performance, and interpretability.
Stability: Reducing the impact of noise and outliers and transforming data to a common distribution.
Reproducibility: Following common pre-processing practices so that results are valid across users, institutions, etc.
Performance: Simplifying the machine learning task by removing trivial variation the model would otherwise have to learn (e.g., means, scales, etc.).
Interpretability: Being able to compare and recognise results across datasets, methods, and projects, and to trust the observed plots, statistics, etc.
Previously introduced pre-processing#
Simple statistics: mean, trimmed mean, median, standard deviation, Median Absolute Deviation, covariance, and Minimum Covariance Determinant.
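A minimal NumPy sketch of the univariate robust statistics above (the data values are made up for illustration; the Minimum Covariance Determinant is multivariate and is available elsewhere, e.g. `sklearn.covariance.MinCovDet`):

```python
import numpy as np

x = np.array([2.1, 2.4, 2.2, 2.3, 9.0])  # last value is an outlier

mean = x.mean()
median = np.median(x)
std = x.std(ddof=1)

# Trimmed mean: drop the lowest and highest 20% of values before averaging
trim = 0.2
k = int(trim * len(x))
trimmed_mean = np.sort(x)[k:len(x) - k].mean()

# Median Absolute Deviation, scaled by 1.4826 to be
# consistent with the standard deviation for Gaussian data
mad = 1.4826 * np.median(np.abs(x - median))
```

Note how the outlier drags the mean (3.6) far from the median (2.3), while the trimmed mean and MAD remain close to the bulk of the data.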
Series smoothing by various moving window techniques (mean, median, Gaussian, Savitzky-Golay), and second derivative constraints (Whittaker).
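The moving-window smoothers can be sketched with NumPy and SciPy (the noisy sine and all window sizes are arbitrary choices for illustration):

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100)
noisy = np.sin(t) + rng.normal(scale=0.2, size=t.size)

# Moving mean with a 7-point window ('same' keeps the series length)
window = 7
moving_mean = np.convolve(noisy, np.ones(window) / window, mode="same")

# Moving median: more robust to isolated spikes
moving_median = medfilt(noisy, kernel_size=7)

# Savitzky-Golay: local polynomial fit, preserves peak shapes better
sg = savgol_filter(noisy, window_length=11, polyorder=3)
```

All three return a series of the same length as the input; the smoothed series lies closer to the underlying sine than the noisy one.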
Decomposition: Fast Fourier Transform, Discrete Cosine Transform, Seasonal-Trend decomposition using LOESS (STL), and detrending by subtracting a smoothed signal.
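A small sketch combining detrending and the FFT: a least-squares line is subtracted (a simple stand-in for subtracting a smoothed signal), after which the dominant frequency reveals the seasonal period. The synthetic monthly-style series is invented for illustration:

```python
import numpy as np

# Synthetic series: linear trend plus a 12-sample seasonal cycle
n = 120
t = np.arange(n)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)

# Detrend by subtracting a least-squares line
trend = np.polyval(np.polyfit(t, series, 1), t)
detrended = series - trend

# FFT of the detrended series: the strongest non-zero
# frequency corresponds to the seasonal period
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(n, d=1.0)
period = 1.0 / freqs[np.argmax(spectrum[1:]) + 1]
```

Without detrending, the ramp's low-frequency energy would dominate the spectrum and hide the seasonal peak, which is why decomposition steps are usually combined.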
Further pre-processing#
Imputation: Replacing missing (or outlying) values with an estimate/guesstimate based on training data or neighbouring observations.
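Two simple imputation strategies sketched in NumPy, on an invented series with two gaps: filling with the mean of the observed values ("training data") versus linear interpolation from neighbours:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0])

# Mean imputation: every gap gets the mean of the observed values
mean_filled = np.where(np.isnan(x), np.nanmean(x), x)

# Neighbour-based imputation: linear interpolation across the gaps
idx = np.arange(x.size)
obs = ~np.isnan(x)
interp_filled = x.copy()
interp_filled[~obs] = np.interp(idx[~obs], idx[obs], x[obs])
```

Interpolation respects the local trend (the gaps become 3.0 and 6.0), whereas mean imputation inserts the same global value (3.8) everywhere, flattening the series.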
Time formatting: Representing measurements in a common time format and time zone.
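With pandas this amounts to localising timestamps to their source time zone and converting to a common reference such as UTC (the timestamps and the Oslo zone are arbitrary examples):

```python
import pandas as pd

# Timestamps logged in local wall-clock time
local = pd.to_datetime(["2024-06-01 12:00", "2024-06-01 13:30"])

# Attach the source time zone, then convert to a common reference (UTC)
oslo = local.tz_localize("Europe/Oslo")
utc = oslo.tz_convert("UTC")
```

In June, Oslo is at UTC+2, so 12:00 local becomes 10:00 UTC; storing everything in UTC avoids daylight-saving ambiguities when series from different sites are combined.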
Synchronisation: Making the measurements in two or more time series correspond in time points and resolution, e.g., by removing non-overlapping regions, accumulating over time, or interpolating.
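A pandas sketch of synchronisation, with two invented sensor series at different sampling rates and partly overlapping windows: resample to a common grid, then drop the non-overlapping region:

```python
import numpy as np
import pandas as pd

# Two sensors: one every 10 minutes, one every 5 minutes, offset starts
a = pd.Series(np.arange(6, dtype=float),
              index=pd.date_range("2024-01-01 00:00", periods=6, freq="10min"))
b = pd.Series(np.arange(0, 30, 5, dtype=float),
              index=pd.date_range("2024-01-01 00:15", periods=6, freq="5min"))

# Accumulate b onto a common 10-minute grid (mean within each bin),
# interpolating any bins left empty
b10 = b.resample("10min").mean().interpolate()

# Keep only the region where both series have values
aligned = pd.concat({"a": a, "b": b10}, axis=1).dropna()
```

The result is a single table on one time grid, restricted to the four time points both sensors cover, which is the form most downstream models expect.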
Encoding: Transforming time series into variograms.
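One common form of this encoding is the empirical (semi)variogram, \(\gamma(h) = \tfrac{1}{2}\,\mathrm{mean}\big((x_{t+h} - x_t)^2\big)\), computed over a range of lags. A sketch on an invented random walk, where the semivariance grows with the lag:

```python
import numpy as np

def empirical_variogram(x, max_lag):
    """Semivariance gamma(h) = 0.5 * mean((x[t+h] - x[t])^2) for h = 1..max_lag."""
    return np.array([0.5 * np.mean((x[h:] - x[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=500))  # random walk: strongly autocorrelated
gamma = empirical_variogram(walk, max_lag=20)
```

The variogram summarises how quickly the series decorrelates with time distance, turning a raw series into a fixed-length feature vector that can be compared across series.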