Zephyr Scientific Library (zscilib)
Statistics-related functions.
Files | |
file | statistics.h |
API header file for statistics in zscilib. | |
Data Structures | |
struct | zsl_sta_linreg |
Simple linear regression coefficients. | |
Functions | |
int | zsl_sta_mean (struct zsl_vec *v, zsl_real_t *m) |
Computes the arithmetic mean (average) of a vector. | |
int | zsl_sta_trim_mean (struct zsl_vec *v, zsl_real_t p, zsl_real_t *m) |
Computes the trimmed arithmetic mean (average) of a vector. | |
int | zsl_sta_weighted_mean (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m) |
Computes the weighted arithmetic mean (average) of a data vector (v) and a weight vector (w). | |
int | zsl_sta_time_weighted_mean (struct zsl_vec *v, struct zsl_vec *t, zsl_real_t *m) |
Computes the time-weighted arithmetic mean (average) of a positive data vector (v) and its time vector (t). | |
int | zsl_sta_demean (struct zsl_vec *v, struct zsl_vec *w) |
Subtracts the mean of vector v from every component of the vector. The output vector w then has a zero mean. | |
int | zsl_sta_percentile (struct zsl_vec *v, zsl_real_t p, zsl_real_t *val) |
Computes the given percentile of a vector. | |
int | zsl_sta_median (struct zsl_vec *v, zsl_real_t *m) |
Computes the median of a vector (the value separating the higher half from the lower half of a data sample). | |
int | zsl_sta_weighted_median (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m) |
Computes the weighted median of a data vector (v) and a weight vector (w). | |
int | zsl_sta_quart (struct zsl_vec *v, zsl_real_t *q1, zsl_real_t *q2, zsl_real_t *q3) |
Calculates the first, second and third quartiles of a vector v. | |
int | zsl_sta_quart_range (struct zsl_vec *v, zsl_real_t *r) |
Calculates the numeric difference between the third and the first quartiles of a vector v. | |
int | zsl_sta_mode (struct zsl_vec *v, struct zsl_vec *w) |
Computes the mode or modes of a vector v. | |
int | zsl_sta_data_range (struct zsl_vec *v, zsl_real_t *r) |
Computes the difference between the greatest value and the lowest in a vector v. | |
int | zsl_sta_mean_abs_dev (struct zsl_vec *v, zsl_real_t *m) |
Computes the mean absolute deviation of a data vector v. | |
int | zsl_sta_median_abs_dev (struct zsl_vec *v, zsl_real_t *m) |
Computes the median absolute deviation of a data vector v. | |
int | zsl_sta_var (struct zsl_vec *v, zsl_real_t *var) |
Computes the variance of a vector v (the average of the squared differences from the mean). | |
int | zsl_sta_std_dev (struct zsl_vec *v, zsl_real_t *s) |
Computes the standard deviation of vector v. | |
int | zsl_sta_covar (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *c) |
Computes the covariance of two sets of data: v and w. | |
int | zsl_sta_covar_mtx (struct zsl_mtx *m, struct zsl_mtx *mc) |
Calculates the nxn covariance matrix of a set of n vectors of the same length. | |
int | zsl_sta_linear_reg (struct zsl_vec *x, struct zsl_vec *y, struct zsl_sta_linreg *c) |
Calculates the slope, intercept and correlation coefficient of the linear regression of two vectors, allowing us to make a prediction of y on the basis of x. | |
int | zsl_sta_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *b, zsl_real_t *r) |
Calculates the coefficients (vector 'b') of the multiple linear regression of the x_i values (columns of the matrix 'x') and the y values. | |
int | zsl_sta_weighted_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *w, struct zsl_vec *b, zsl_real_t *r) |
Calculates the coefficients (vector 'b') of the weighted multiple linear regression of the x_i values (columns of the matrix 'x'), the y values and the weights in the vector 'w'. | |
int | zsl_sta_quad_fit (struct zsl_mtx *m, struct zsl_vec *b) |
This function uses the least squares fitting method to compute the coefficients of a quadric surface given a set of tridimensional points. | |
int | zsl_sta_abs_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err) |
Calculates the absolute error given a value and its expected value. | |
int | zsl_sta_rel_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err) |
Calculates the relative error given a value and its expected value. | |
int | zsl_sta_sta_err (struct zsl_vec *v, zsl_real_t *err) |
Calculates the standard error of the mean of a sample (vector v). | |
int zsl_sta_abs_err | ( | zsl_real_t * | val, |
zsl_real_t * | exp_val, | ||
zsl_real_t * | err | ||
) |
Calculates the absolute error given a value and its expected value.
val | Input value. |
exp_val | Input expected value. |
err | Output absolute error. |
Definition at line 665 of file statistics.c.
int zsl_sta_covar | ( | struct zsl_vec * | v, |
struct zsl_vec * | w, | ||
zsl_real_t * | c | ||
) |
Computes the covariance of two sets of data: v and w.
v | First set of data. |
w | Second set of data. |
c | Covariance of the vectors v and w. |
Definition at line 383 of file statistics.c.
Referenced by zsl_sta_covar_mtx().
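As a rough illustration of the quantity described here, the following standalone sketch computes the covariance of two equal-length arrays. It uses plain C arrays rather than struct zsl_vec, the name covar_sketch is hypothetical, and the n - 1 (sample) divisor is an assumption; the library may normalize differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: sample covariance of v and w.
 * Assumes both arrays hold n >= 2 values and divides by n - 1. */
double covar_sketch(const double *v, const double *w, size_t n)
{
    assert(n >= 2);

    /* Means of both data sets. */
    double mv = 0.0, mw = 0.0;
    for (size_t i = 0; i < n; i++) {
        mv += v[i];
        mw += w[i];
    }
    mv /= (double)n;
    mw /= (double)n;

    /* Sum of the products of the de-meaned values. */
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        c += (v[i] - mv) * (w[i] - mw);
    }
    return c / (double)(n - 1);
}
```

A positive result indicates that the two data sets tend to rise and fall together, a negative result that they move in opposite directions.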
int zsl_sta_covar_mtx | ( | struct zsl_mtx * | m, |
struct zsl_mtx * | mc | ||
) |
Calculates the nxn covariance matrix of a set of n vectors of the same length.
m | Input matrix, whose columns are the different data sets. |
mc | Output nxn covariance matrix. |
Definition at line 408 of file statistics.c.
int zsl_sta_data_range | ( | struct zsl_vec * | v, |
zsl_real_t * | r | ||
) |
Computes the difference between the greatest value and the lowest in a vector v.
v | The vector to use. |
r | The range of the data in v. |
Definition at line 309 of file statistics.c.
int zsl_sta_demean | ( | struct zsl_vec * | v, |
struct zsl_vec * | w | ||
) |
Subtracts the mean of vector v from every component of the vector. The output vector w then has a zero mean.
v | The vector to use. |
w | The output vector with zero mean. |
Definition at line 140 of file statistics.c.
Referenced by zsl_sta_covar(), zsl_sta_mean_abs_dev(), and zsl_sta_var().
int zsl_sta_linear_reg | ( | struct zsl_vec * | x, |
struct zsl_vec * | y, | ||
struct zsl_sta_linreg * | c | ||
) |
Calculates the slope, intercept and correlation coefficient of the linear regression of two vectors, allowing us to make a prediction of y on the basis of x.
Simple linear regression is useful for predicting a quantitative response. It assumes that there is an approximately linear relationship between vector x and vector y, and calculates a series of coefficients to project this relationship in either direction.
The output of this function is a slope and intercept value, such that the resulting line closely tracks the linear progression of the input samples. The correlation coefficient estimates the 'closeness' of the match.
Given the equation 'y = slope * x + intercept', we can estimate the y value for an arbitrary value of x, where x is related to the range of values provided in vector 'x' (the x axis), and y is related to the values provided in vector 'y' (the y axis).
Simple linear regression is a special case of the multiple linear regression (see below). The correlation coefficient is the square root of the coefficient of determination, a measure useful in multiple linear regression.
x | The first input vector, corresponding to the x-axis. |
y | The second input vector, corresponding to the y-axis. |
c | Pointer to the calculated linear regression coefficients. |
Definition at line 434 of file statistics.c.
int zsl_sta_mean | ( | struct zsl_vec * | v, |
zsl_real_t * | m | ||
) |
Computes the arithmetic mean (average) of a vector.
v | The vector to use. |
m | The arithmetic mean of the components of v. |
Definition at line 14 of file statistics.c.
Referenced by zsl_sta_demean(), zsl_sta_mult_linear_reg(), zsl_sta_trim_mean(), and zsl_sta_weighted_mult_linear_reg().
int zsl_sta_mean_abs_dev | ( | struct zsl_vec * | v, |
zsl_real_t * | m | ||
) |
Computes the mean absolute deviation of a data vector v.
The mean absolute deviation is the arithmetic mean of the absolute values of the de-meaned data, i.e., the mean of the absolute difference between each value in 'v' and the mean of the data in 'v'. This number describes the average deviation from the arithmetic mean of the dataset in the vector 'v'.
v | The vector to use. |
m | The mean absolute deviation. |
Definition at line 320 of file statistics.c.
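The calculation described above can be sketched over a plain array as follows. The function name is hypothetical and the code is a standalone illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of |v[i] - mean(v)| over all n values. */
double mean_abs_dev_sketch(const double *v, size_t n)
{
    assert(n > 0);

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += v[i];
    }
    mean /= (double)n;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = v[i] - mean;
        sum += (d < 0.0) ? -d : d;  /* absolute deviation from the mean */
    }
    return sum / (double)n;
}
```

For the data {2, 4, 6, 8} the mean is 5, the absolute deviations are 3, 1, 1 and 3, and their mean is 2.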
int zsl_sta_median | ( | struct zsl_vec * | v, |
zsl_real_t * | m | ||
) |
Computes the median of a vector (the value separating the higher half from the lower half of a data sample).
v | The vector to use. |
m | The median of the components of v. |
Definition at line 184 of file statistics.c.
Referenced by zsl_sta_median_abs_dev().
int zsl_sta_median_abs_dev | ( | struct zsl_vec * | v, |
zsl_real_t * | m | ||
) |
Computes the median absolute deviation of a data vector v.
The median absolute deviation is calculated by computing the median of the absolute value of each value in 'v' minus the median of the data in 'v'. This provides a robust estimate of variability.
v | The vector to use. |
m | The median absolute deviation. |
Definition at line 341 of file statistics.c.
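A minimal standalone sketch of this two-step median calculation is shown below. The helper names are hypothetical, and the median convention used here (the mean of the two middle values when the number of samples is even) is an assumption that may differ from the percentile-based median used by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts buf in place and returns its median (mean of the two middle
 * values when n is even). */
static double median_inplace(double *buf, size_t n)
{
    qsort(buf, n, sizeof(*buf), cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Hypothetical helper: median of |v[i] - median(v)|. */
double median_abs_dev_sketch(const double *v, size_t n)
{
    assert(n > 0);

    double *buf = malloc(n * sizeof(*buf));
    assert(buf != NULL);

    memcpy(buf, v, n * sizeof(*buf));
    double med = median_inplace(buf, n);

    /* Reuse the scratch buffer for the absolute deviations. */
    for (size_t i = 0; i < n; i++) {
        double d = v[i] - med;
        buf[i] = (d < 0.0) ? -d : d;
    }
    double mad = median_inplace(buf, n);

    free(buf);
    return mad;
}
```

Because both steps use medians, a single extreme value (such as the 9 in {1, 1, 2, 2, 4, 6, 9}) has little influence on the result.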
int zsl_sta_mode | ( | struct zsl_vec * | v, |
struct zsl_vec * | w | ||
) |
Computes the mode or modes of a vector v.
v | The vector to use. |
w | Output vector whose components are the modes. If there is only one mode, the length of w will be 1. |
Definition at line 257 of file statistics.c.
int zsl_sta_mult_linear_reg | ( | struct zsl_mtx * | x, |
struct zsl_vec * | y, | ||
struct zsl_vec * | b, | ||
zsl_real_t * | r | ||
) |
Calculates the coefficients (vector 'b') of the multiple linear regression of the x_i values (columns of the matrix 'x') and the y values.
x | Matrix, whose columns are the different x_i datasets. |
y | The second input dataset, corresponding to the y-axis. |
b | Pointer to the calculated multiple linear regression coefficients. |
r | Pointer to the calculated coefficient of determination (also referred to as R squared). |
Definition at line 466 of file statistics.c.
int zsl_sta_percentile | ( | struct zsl_vec * | v, |
zsl_real_t | p, | ||
zsl_real_t * | val | ||
) |
Computes the given percentile of a vector.
v | The input vector. |
p | The percentile to be calculated. |
val | The output value. |
Definition at line 158 of file statistics.c.
Referenced by zsl_sta_median(), zsl_sta_quart(), zsl_sta_quart_range(), and zsl_sta_trim_mean().
int zsl_sta_quad_fit | ( | struct zsl_mtx * | m, |
struct zsl_vec * | b | ||
) |
This function uses the least squares fitting method to compute the coefficients of a quadric surface given a set of tridimensional points.
A quadric is a 3D surface that is defined by the equation: Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz = 1. Spheres and ellipsoids are special cases of quadrics. This function takes a set of points (x,y,z) and returns the coefficients (A, B, C, D, E, F, G, H, I) of the quadric surface that best fits the given points.
m | Matrix, whose rows are the (x, y, z) points. |
b | Pointer to the calculated coefficients of the quadric. |
Definition at line 618 of file statistics.c.
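To make the quadric equation concrete, the sketch below evaluates its left-hand side for a single point, which is one way to check how well a set of coefficients fits a sample (a result close to 1 means the point lies near the surface). The function name is hypothetical, and storing the coefficients in the order A, B, C, D, E, F, G, H, I is an assumption based on the order in which they are listed above.

```c
/* Hypothetical helper: evaluates
 *   Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz
 * for one point. b is assumed to hold A..I in that order. */
double quadric_lhs_sketch(const double b[9], double x, double y, double z)
{
    return b[0] * x * x + b[1] * y * y + b[2] * z * z
         + 2.0 * b[3] * x * y + 2.0 * b[4] * x * z + 2.0 * b[5] * y * z
         + 2.0 * b[6] * x + 2.0 * b[7] * y + 2.0 * b[8] * z;
}
```

For example, a unit sphere centered on the origin corresponds to A = B = C = 1 with all other coefficients set to 0, so any point on that sphere evaluates to 1.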
int zsl_sta_quart | ( | struct zsl_vec * | v, |
zsl_real_t * | q1, | ||
zsl_real_t * | q2, | ||
zsl_real_t * | q3 | ||
) |
Calculates the first, second and third quartiles of a vector v.
v | The vector to use. |
q1 | The first quartile of v. |
q2 | The second quartile of v, also the median of v. |
q3 | The third quartile of v. |
Definition at line 235 of file statistics.c.
int zsl_sta_quart_range | ( | struct zsl_vec * | v, |
zsl_real_t * | r | ||
) |
Calculates the numeric difference between the third and the first quartiles of a vector v.
v | The input vector. |
r | The interquartile range of v. |
Definition at line 245 of file statistics.c.
int zsl_sta_rel_err | ( | zsl_real_t * | val, |
zsl_real_t * | exp_val, | ||
zsl_real_t * | err | ||
) |
Calculates the relative error given a value and its expected value.
val | Input value. |
exp_val | Input expected value. |
err | Output relative error. |
Definition at line 672 of file statistics.c.
int zsl_sta_sta_err | ( | struct zsl_vec * | v, |
zsl_real_t * | err | ||
) |
Calculates the standard error of the mean of a sample (vector v).
The standard error of the mean measures how far the arithmetic mean of the sample in vector 'v' is likely to be from the true total population mean.
v | Sample data vector. |
err | Output standard error of the mean. |
Definition at line 679 of file statistics.c.
int zsl_sta_std_dev | ( | struct zsl_vec * | v, |
zsl_real_t * | s | ||
) |
Computes the standard deviation of vector v.
Standard deviation is an indication of how spread-out numbers in 'v' are, relative to the mean. It helps differentiate what is in the "standard" range (1 standard deviation from mean), and what is outside (above or below) this range, to pick out statistical outliers.
v | The vector to use. |
s | The output standard deviation of the vector v. |
Definition at line 372 of file statistics.c.
int zsl_sta_time_weighted_mean | ( | struct zsl_vec * | v, |
struct zsl_vec * | t, | ||
zsl_real_t * | m | ||
) |
Computes the time-weighted arithmetic mean (average) of a positive data vector (v) and its time vector (t).
The time-weighted mean takes into consideration not only the numerical levels of a particular variable, but also the amount of time spent on it.
v | The data vector to use, with positive coefficients. |
t | The vector containing the times associated with the data vector. |
m | The time-weighted arithmetic mean of the components of v taking the times in the vector t into account. |
Definition at line 93 of file statistics.c.
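One way to take the time spent at each level into account is to integrate the data over time with the trapezoidal rule and divide by the total elapsed time. The sketch below does exactly that on plain arrays. The function name is hypothetical, and the trapezoidal weighting is an assumption, since the exact weighting scheme is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: time-weighted mean using trapezoidal integration.
 * Assumes n >= 2 and strictly increasing timestamps in t. */
double time_weighted_mean_sketch(const double *v, const double *t, size_t n)
{
    assert(n >= 2);

    double area = 0.0;
    for (size_t i = 1; i < n; i++) {
        double dt = t[i] - t[i - 1];
        assert(dt > 0.0);
        /* Average level over the interval, weighted by its duration. */
        area += 0.5 * (v[i - 1] + v[i]) * dt;
    }
    return area / (t[n - 1] - t[0]);
}
```

For v = {2, 4, 4} sampled at t = {0, 2, 4}, the two intervals contribute 6 and 8 to the integral, so the time-weighted mean is 14 / 4 = 3.5.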
int zsl_sta_trim_mean | ( | struct zsl_vec * | v, |
zsl_real_t | p, | ||
zsl_real_t * | m | ||
) |
Computes the trimmed arithmetic mean (average) of a vector.
The trimmed arithmetic mean of a dataset is described by a number (in this case 'p') from 0 to 50 that describes the percent of the data that will not be taken into account when computing the mean. Thus, a 3% trimmed mean will only use 94% of the data to calculate the arithmetic mean, and will ignore the lowest 3% of data and the highest 3% of data in the sorted data vector.
v | The vector to use. |
p | The percent of data that will be ignored in the computation of the mean (0.0 .. 50.0). |
m | The trimmed arithmetic mean of the components of v. |
Definition at line 21 of file statistics.c.
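The trimming step can be illustrated with a simple count-based approach: sort a copy of the data, drop the same number of values from each end, and average what remains. This is a standalone sketch with a hypothetical function name; rounding the number of dropped values down is an assumption, and the library may select the retained values differently (for example, via percentiles).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_asc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: drops floor(n * p / 100) values from each end of
 * the sorted data and returns the mean of the rest. p is a percentage. */
double trim_mean_sketch(const double *v, size_t n, double p)
{
    assert(n > 0 && p >= 0.0 && p <= 50.0);

    size_t k = (size_t)((double)n * p / 100.0);  /* values cut per side */
    assert(n > 2 * k);                           /* something must remain */

    double *buf = malloc(n * sizeof(*buf));
    assert(buf != NULL);
    memcpy(buf, v, n * sizeof(*buf));
    qsort(buf, n, sizeof(*buf), cmp_asc);

    double sum = 0.0;
    for (size_t i = k; i < n - k; i++) {
        sum += buf[i];
    }
    free(buf);

    return sum / (double)(n - 2 * k);
}
```

With ten samples and p = 10, one value is dropped from each end, so a single extreme reading no longer dominates the average.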
int zsl_sta_var | ( | struct zsl_vec * | v, |
zsl_real_t * | var | ||
) |
Computes the variance of a vector v (the average of the squared differences from the mean).
v | The vector to use. |
var | The variance of v. |
Definition at line 356 of file statistics.c.
Referenced by zsl_sta_sta_err(), and zsl_sta_std_dev().
int zsl_sta_weighted_mean | ( | struct zsl_vec * | v, |
struct zsl_vec * | w, | ||
zsl_real_t * | m | ||
) |
Computes the weighted arithmetic mean (average) of a data vector (v) and a weight vector (w).
v | The data vector to use. |
w | The vector containing the weights to use. |
m | The weighted arithmetic mean of the components of v taking the weights in the vector w into account. |
Definition at line 57 of file statistics.c.
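The weighted mean is the sum of each value multiplied by its weight, divided by the sum of the weights. A minimal standalone sketch over plain arrays follows; the function name is hypothetical and this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum(w[i] * v[i]) / sum(w[i]).
 * Assumes the weights do not sum to zero. */
double weighted_mean_sketch(const double *v, const double *w, size_t n)
{
    assert(n > 0);

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += w[i] * v[i];
        den += w[i];
    }
    assert(den != 0.0);

    return num / den;
}
```

When all weights are equal, the result reduces to the ordinary arithmetic mean.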
int zsl_sta_weighted_median | ( | struct zsl_vec * | v, |
struct zsl_vec * | w, | ||
zsl_real_t * | m | ||
) |
Computes the weighted median of a data vector (v) and a weight vector (w).
v | The data vector to use. |
w | The vector containing the weights to use. |
m | The weighted median of the components of v taking the weights in the vector w into account. |
Definition at line 191 of file statistics.c.
int zsl_sta_weighted_mult_linear_reg | ( | struct zsl_mtx * | x, |
struct zsl_vec * | y, | ||
struct zsl_vec * | w, | ||
struct zsl_vec * | b, | ||
zsl_real_t * | r | ||
) |
Calculates the coefficients (vector 'b') of the weighted multiple linear regression of the x_i values (columns of the matrix 'x'), the y values and the weights in the vector 'w'.
x | Matrix, whose columns are the different x_i datasets. |
y | The second input dataset, corresponding to the y-axis. |
w | The weights to use in the weighted least squares. |
b | Pointer to the calculated weighted multiple linear regression coefficients. |
r | Pointer to the calculated coefficient of determination (also referred to as R squared). |
Definition at line 535 of file statistics.c.