Zephyr Scientific Library (zscilib)
Statistics

Statistics-related functions.

Files

file  statistics.h
 API header file for statistics in zscilib.
 

Data Structures

struct  zsl_sta_linreg
 Simple linear regression coefficients.
 

Functions

int zsl_sta_mean (struct zsl_vec *v, zsl_real_t *m)
 Computes the arithmetic mean (average) of a vector.

int zsl_sta_trim_mean (struct zsl_vec *v, zsl_real_t p, zsl_real_t *m)
 Computes the trimmed arithmetic mean (average) of a vector.

int zsl_sta_weighted_mean (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m)
 Computes the weighted arithmetic mean (average) of a data vector (v) and a weight vector (w).

int zsl_sta_time_weighted_mean (struct zsl_vec *v, struct zsl_vec *t, zsl_real_t *m)
 Computes the time-weighted arithmetic mean (average) of a positive data vector (v) and its time vector (t).

int zsl_sta_demean (struct zsl_vec *v, struct zsl_vec *w)
 Subtracts the mean of vector v from every component of the vector. The output vector w then has a zero mean.

int zsl_sta_percentile (struct zsl_vec *v, zsl_real_t p, zsl_real_t *val)
 Computes the given percentile of a vector.

int zsl_sta_median (struct zsl_vec *v, zsl_real_t *m)
 Computes the median of a vector (the value separating the higher half from the lower half of a data sample).

int zsl_sta_weighted_median (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m)
 Computes the weighted median of a data vector (v) and a weight vector (w).

int zsl_sta_quart (struct zsl_vec *v, zsl_real_t *q1, zsl_real_t *q2, zsl_real_t *q3)
 Calculates the first, second and third quartiles of a vector v.

int zsl_sta_quart_range (struct zsl_vec *v, zsl_real_t *r)
 Calculates the numeric difference between the third and the first quartiles of a vector v.

int zsl_sta_mode (struct zsl_vec *v, struct zsl_vec *w)
 Computes the mode or modes of a vector v.

int zsl_sta_data_range (struct zsl_vec *v, zsl_real_t *r)
 Computes the difference between the greatest value and the lowest in a vector v.

int zsl_sta_mean_abs_dev (struct zsl_vec *v, zsl_real_t *m)
 Computes the mean absolute deviation of a data vector v.

int zsl_sta_median_abs_dev (struct zsl_vec *v, zsl_real_t *m)
 Computes the median absolute deviation of a data vector v.

int zsl_sta_var (struct zsl_vec *v, zsl_real_t *var)
 Computes the variance of a vector v (the average of the squared differences from the mean).

int zsl_sta_std_dev (struct zsl_vec *v, zsl_real_t *s)
 Computes the standard deviation of vector v.

int zsl_sta_covar (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *c)
 Computes the covariance of two sets of data: v and w.

int zsl_sta_covar_mtx (struct zsl_mtx *m, struct zsl_mtx *mc)
 Calculates the nxn covariance matrix of a set of n vectors of the same length.

int zsl_sta_linear_reg (struct zsl_vec *x, struct zsl_vec *y, struct zsl_sta_linreg *c)
 Calculates the slope, intercept and correlation coefficient of the linear regression of two vectors, allowing us to make a prediction of y on the basis of x.

int zsl_sta_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *b, zsl_real_t *r)
 Calculates the coefficients (vector 'b') of the multiple linear regression of the x_i values (columns of the matrix 'x') and the y values.

int zsl_sta_weighted_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *w, struct zsl_vec *b, zsl_real_t *r)
 Calculates the coefficients (vector 'b') of the weighted multiple linear regression of the x_i values (columns of the matrix 'x'), the y values and the weights in the vector 'w'.

int zsl_sta_quad_fit (struct zsl_mtx *m, struct zsl_vec *b)
 Uses the least squares fitting method to compute the coefficients of a quadric surface given a set of tridimensional points.

int zsl_sta_abs_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err)
 Calculates the absolute error given a value and its expected value.

int zsl_sta_rel_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err)
 Calculates the relative error given a value and its expected value.

int zsl_sta_sta_err (struct zsl_vec *v, zsl_real_t *err)
 Calculates the standard error of the mean of a sample (vector v).
 

Function Documentation

◆ zsl_sta_abs_err()

int zsl_sta_abs_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err)

Calculates the absolute error given a value and its expected value.

Parameters
val      Input value.
exp_val  Input expected value.
err      Output absolute error.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 665 of file statistics.c.

◆ zsl_sta_covar()

int zsl_sta_covar (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *c)

Computes the covariance of two sets of data: v and w.

Parameters
v  First set of data.
w  Second set of data.
c  Covariance of the vectors v and w.
Returns
0 on success, and -EINVAL if the vectors aren't identically sized.

Definition at line 383 of file statistics.c.

Referenced by zsl_sta_covar_mtx().
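
As a rough illustration of the quantity computed here, the following standalone sketch calculates the covariance of two plain C arrays. It does not use zscilib types; the name covar_sketch and the (n - 1) sample normalisation are assumptions made for this example, not details confirmed by this page.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: covariance of two arrays of
 * the same length n. Returns 0 on success, -EINVAL if n < 2. */
int covar_sketch(const double *v, const double *w, size_t n, double *c)
{
    if (n < 2) {
        return -EINVAL;
    }

    /* Means of both data sets. */
    double mv = 0.0, mw = 0.0;
    for (size_t i = 0; i < n; i++) {
        mv += v[i];
        mw += w[i];
    }
    mv /= (double)n;
    mw /= (double)n;

    /* Sum of the products of the de-meaned components. */
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (v[i] - mv) * (w[i] - mw);
    }

    /* Assumption: sample covariance, normalised by (n - 1). */
    *c = acc / (double)(n - 1);
    return 0;
}
```

A positive result indicates that the two data sets tend to increase together, a negative one that they move in opposite directions.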

◆ zsl_sta_covar_mtx()

int zsl_sta_covar_mtx (struct zsl_mtx *m, struct zsl_mtx *mc)

Calculates the nxn covariance matrix of a set of n vectors of the same length.

Parameters
m   Input matrix, whose columns are the different data sets.
mc  Output nxn covariance matrix.
Returns
0 on success, and -EINVAL if 'mc' is not a square matrix with the same number of columns as 'm'.

Definition at line 408 of file statistics.c.

◆ zsl_sta_data_range()

int zsl_sta_data_range (struct zsl_vec *v, zsl_real_t *r)

Computes the difference between the greatest value and the lowest in a vector v.

Parameters
v  The vector to use.
r  The range of the data in v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 309 of file statistics.c.

◆ zsl_sta_demean()

int zsl_sta_demean (struct zsl_vec *v, struct zsl_vec *w)

Subtracts the mean of vector v from every component of the vector. The output vector w then has a zero mean.

Parameters
v  The vector to use.
w  The output vector with zero mean.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 140 of file statistics.c.

Referenced by zsl_sta_covar(), zsl_sta_mean_abs_dev(), and zsl_sta_var().
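
The de-meaning step can be sketched on plain arrays as follows. This is an illustrative standalone version, not the library's implementation; demean_sketch is a hypothetical name and the error handling is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: writes v[i] - mean(v) into
 * w[i] for every component. Returns -EINVAL for an empty input. */
int demean_sketch(const double *v, double *w, size_t n)
{
    if (n == 0) {
        return -EINVAL;
    }

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += v[i];
    }
    mean /= (double)n;

    /* Shift every component so that the output has zero mean. */
    for (size_t i = 0; i < n; i++) {
        w[i] = v[i] - mean;
    }
    return 0;
}
```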

◆ zsl_sta_linear_reg()

int zsl_sta_linear_reg (struct zsl_vec *x, struct zsl_vec *y, struct zsl_sta_linreg *c)

Calculates the slope, intercept and correlation coefficient of the linear regression of two vectors, allowing us to make a prediction of y on the basis of x.

Simple linear regression is useful for predicting a quantitative response. It assumes that there is an approximately linear relationship between vector x and vector y, and calculates a series of coefficients to project this relationship in either direction.

The output of this function is a slope and intercept value, such that the resulting line closely tracks the linear progression of the input samples. The correlation coefficient estimates the 'closeness' of the match.

Given the equation 'y = slope * x + intercept', we can estimate the y value for an arbitrary value of x, where x is related to the range of values provided in vector 'x' (the x axis), and y is related to the values provided in vector 'y' (the y axis).

Simple linear regression is a special case of multiple linear regression (see zsl_sta_mult_linear_reg()). The correlation coefficient is the square root of the coefficient of determination, a measure useful in multiple linear regression.

Parameters
x  The first input vector, corresponding to the x-axis.
y  The second input vector, corresponding to the y-axis.
c  Pointer to the calculated linear regression coefficients.
Returns
0 on success, and -EINVAL if the vectors aren't identically sized.

Definition at line 434 of file statistics.c.

◆ zsl_sta_mean()

int zsl_sta_mean (struct zsl_vec *v, zsl_real_t *m)

Computes the arithmetic mean (average) of a vector.

Parameters
v  The vector to use.
m  The arithmetic mean of the components of v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 14 of file statistics.c.

Referenced by zsl_sta_demean(), zsl_sta_mult_linear_reg(), zsl_sta_trim_mean(), and zsl_sta_weighted_mult_linear_reg().

◆ zsl_sta_mean_abs_dev()

int zsl_sta_mean_abs_dev (struct zsl_vec *v, zsl_real_t *m)

Computes the mean absolute deviation of a data vector v.

The mean absolute deviation is calculated by computing the mean of the absolute values of the de-meaned data vector, i.e., the arithmetic mean of the absolute value of each value in 'v' minus the mean of the data in 'v'. This number describes the average deviation from the arithmetic mean of the dataset in the vector 'v'.

Parameters
v  The vector to use.
m  The mean absolute deviation.
Returns
0 if everything executed correctly. If the dimension of the data vector v is zero, a negative error is returned.

Definition at line 320 of file statistics.c.
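
The definition above translates almost directly into code. The following standalone sketch works on a plain array; mean_abs_dev_sketch is a hypothetical name and the function is not taken from the library.

```c
#include <errno.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: mean of |v[i] - mean(v)|.
 * Returns -EINVAL if the input is empty. */
int mean_abs_dev_sketch(const double *v, size_t n, double *m)
{
    if (n == 0) {
        return -EINVAL;
    }

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += v[i];
    }
    mean /= (double)n;

    /* Average the absolute deviations from the mean. */
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += fabs(v[i] - mean);
    }
    *m = acc / (double)n;
    return 0;
}
```

For the data {2, 4, 6, 8} the mean is 5, the absolute deviations are {3, 1, 1, 3}, and their mean is 2.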

◆ zsl_sta_median()

int zsl_sta_median (struct zsl_vec *v, zsl_real_t *m)

Computes the median of a vector (the value separating the higher half from the lower half of a data sample).

Parameters
v  The vector to use.
m  The median of the components of v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 184 of file statistics.c.

Referenced by zsl_sta_median_abs_dev().

◆ zsl_sta_median_abs_dev()

int zsl_sta_median_abs_dev (struct zsl_vec *v, zsl_real_t *m)

Computes the median absolute deviation of a data vector v.

The median absolute deviation is calculated by computing the median of the absolute value of each value in 'v' minus the median of the data in 'v'. This provides a robust estimate of variability.

Parameters
v  The vector to use.
m  The median absolute deviation.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 341 of file statistics.c.
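
A standalone sketch of this two-pass median computation is shown below. It assumes the conventional median (the average of the two middle values for an even count), which may not match the library's percentile-based median exactly; all names are hypothetical.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Ascending comparison for qsort(). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts buf in place and returns its median; n must be non-zero. */
static double median_in_place(double *buf, size_t n)
{
    qsort(buf, n, sizeof(double), cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Hypothetical helper, not part of zscilib: median of |v[i] - median(v)|. */
int median_abs_dev_sketch(const double *v, size_t n, double *m)
{
    if (n == 0) {
        return -EINVAL;
    }
    double *buf = malloc(n * sizeof(double));
    if (buf == NULL) {
        return -ENOMEM;
    }

    /* First pass: median of the data itself. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = v[i];
    }
    double med = median_in_place(buf, n);

    /* Second pass: median of the absolute deviations from that median. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = fabs(v[i] - med);
    }
    *m = median_in_place(buf, n);

    free(buf);
    return 0;
}
```

For {1, 1, 2, 2, 4, 6, 9} the median is 2, the absolute deviations are {1, 1, 0, 0, 2, 4, 7}, and their median is 1. The large value 9 barely affects the result, which is what makes this estimate robust.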

◆ zsl_sta_mode()

int zsl_sta_mode (struct zsl_vec *v, struct zsl_vec *w)

Computes the mode or modes of a vector v.

Parameters
v  The vector to use.
w  Output vector whose components are the modes. If there is only one mode, the length of w will be 1.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 257 of file statistics.c.

◆ zsl_sta_mult_linear_reg()

int zsl_sta_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *b, zsl_real_t *r)

Calculates the coefficients (vector 'b') of the multiple linear regression of the x_i values (columns of the matrix 'x') and the y values.

Parameters
x  Matrix, whose columns are the different x_i datasets.
y  The second input dataset, corresponding to the y-axis.
b  Pointer to the calculated multiple linear regression coefficients.
r  Pointer to the calculated coefficient of determination (also referred to as R squared).
Returns
0 on success, and -EINVAL if dimensions of the input vectors and matrix don't match.

Definition at line 466 of file statistics.c.

◆ zsl_sta_percentile()

int zsl_sta_percentile (struct zsl_vec *v, zsl_real_t p, zsl_real_t *val)

Computes the given percentile of a vector.

Parameters
v    The input vector.
p    The percentile to be calculated.
val  The output value.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 158 of file statistics.c.

Referenced by zsl_sta_median(), zsl_sta_quart(), zsl_sta_quart_range(), and zsl_sta_trim_mean().

◆ zsl_sta_quad_fit()

int zsl_sta_quad_fit (struct zsl_mtx *m, struct zsl_vec *b)

Uses the least squares fitting method to compute the coefficients of a quadric surface given a set of tridimensional points.

A quadric is a 3D surface that is defined by the equation: Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz = 1. Spheres and ellipsoids are special cases of quadrics. This function takes a set of points (x, y, z) and returns the coefficients (A, B, C, D, E, F, G, H, I) of the quadric surface that best fits the given points.

Parameters
m  Matrix, whose rows are the (x, y, z) points.
b  Pointer to the calculated coefficients of the quadric.
Returns
0 on success, and -EINVAL if the dimension of the vector 'b' isn't 9 or the input matrix isn't an Nx3 matrix.

Definition at line 618 of file statistics.c.
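
One way to sanity-check a fitted quadric is to evaluate the left-hand side of the equation above at a sample point: points lying on the surface should give a value close to 1. The sketch below does only that evaluation, not the fit itself. The function name is hypothetical, and the coefficient order A, B, C, D, E, F, G, H, I within the array is an assumption based on the order listed in the description.

```c
/* Hypothetical helper, not part of zscilib: evaluates
 * Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz
 * for coefficients q = {A, B, C, D, E, F, G, H, I} (assumed order). */
double quadric_eval_sketch(const double q[9], double x, double y, double z)
{
    return q[0] * x * x + q[1] * y * y + q[2] * z * z
         + 2.0 * q[3] * x * y + 2.0 * q[4] * x * z + 2.0 * q[5] * y * z
         + 2.0 * q[6] * x + 2.0 * q[7] * y + 2.0 * q[8] * z;
}
```

For the unit sphere (A = B = C = 1, all other coefficients 0), the point (0, 0.6, 0.8) lies on the surface and evaluates to approximately 1, while (2, 0, 0) lies outside and evaluates to 4.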

◆ zsl_sta_quart()

int zsl_sta_quart (struct zsl_vec *v, zsl_real_t *q1, zsl_real_t *q2, zsl_real_t *q3)

Calculates the first, second and third quartiles of a vector v.

Parameters
v   The vector to use.
q1  The first quartile of v.
q2  The second quartile of v, also the median of v.
q3  The third quartile of v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 235 of file statistics.c.

◆ zsl_sta_quart_range()

int zsl_sta_quart_range (struct zsl_vec *v, zsl_real_t *r)

Calculates the numeric difference between the third and the first quartiles of a vector v.

Parameters
v  The input vector.
r  The interquartile range of v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 245 of file statistics.c.

◆ zsl_sta_rel_err()

int zsl_sta_rel_err (zsl_real_t *val, zsl_real_t *exp_val, zsl_real_t *err)

Calculates the relative error given a value and its expected value.

Parameters
val      Input value.
exp_val  Input expected value.
err      Output relative error.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 672 of file statistics.c.

◆ zsl_sta_sta_err()

int zsl_sta_sta_err (struct zsl_vec *v, zsl_real_t *err)

Calculates the standard error of the mean of a sample (vector v).

The standard error of the mean measures how far the arithmetic mean of the sample in vector 'v' is likely to be from the true total population mean.

Parameters
v    Sample data vector.
err  Output standard error of the mean.
Returns
0 if everything executed correctly. If the dimension of the vector 'v' is zero, a negative error is returned.

Definition at line 679 of file statistics.c.

◆ zsl_sta_std_dev()

int zsl_sta_std_dev (struct zsl_vec *v, zsl_real_t *s)

Computes the standard deviation of vector v.

Standard deviation is an indication of how spread out the numbers in 'v' are, relative to the mean. It helps differentiate what is in the "standard" range (1 standard deviation from the mean), and what is outside (above or below) this range, to pick out statistical outliers.

Parameters
v  The vector to use.
s  The output standard deviation of the vector v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 372 of file statistics.c.

◆ zsl_sta_time_weighted_mean()

int zsl_sta_time_weighted_mean (struct zsl_vec *v, struct zsl_vec *t, zsl_real_t *m)

Computes the time-weighted arithmetic mean (average) of a positive data vector (v) and its time vector (t).

The time-weighted mean takes into consideration not only the numerical levels of a particular variable, but also the amount of time spent on it.

Parameters
v  The data vector to use, with positive coefficients.
t  The vector containing the time associated with the data vector.
m  The time-weighted arithmetic mean of the components of v taking the times in the vector t into account.
Returns
0 if everything executed correctly, -EINVAL if the dimensions of v and t don't match, if any element in 'v' is negative, or if any time value in the vector 't' is repeated.

Definition at line 93 of file statistics.c.
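
One common way to weight samples by time is to integrate the signal with the trapezoidal rule and divide by the total elapsed time. The sketch below follows that approach and assumes strictly increasing time stamps; this page does not spell out the library's exact weighting scheme, so treat both the method and the name time_weighted_mean_sketch as assumptions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: trapezoidal time-weighted
 * mean of samples v taken at strictly increasing times t. */
int time_weighted_mean_sketch(const double *v, const double *t, size_t n,
                              double *m)
{
    if (n < 2) {
        return -EINVAL;
    }
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0.0) {
            return -EINVAL;     /* data must be non-negative */
        }
    }

    double area = 0.0;
    for (size_t i = 0; i + 1 < n; i++) {
        double dt = t[i + 1] - t[i];
        if (dt <= 0.0) {
            return -EINVAL;     /* repeated or unsorted time stamp */
        }
        /* Area of the trapezoid between two consecutive samples. */
        area += dt * 0.5 * (v[i] + v[i + 1]);
    }

    *m = area / (t[n - 1] - t[0]);
    return 0;
}
```

With v = {2, 4, 4} sampled at t = {0, 1, 3}, the two trapezoids have areas 3 and 8, so the time-weighted mean is 11 / 3, whereas the plain arithmetic mean would be 10 / 3. The longer interval spent at the value 4 pulls the result upwards.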

◆ zsl_sta_trim_mean()

int zsl_sta_trim_mean (struct zsl_vec *v, zsl_real_t p, zsl_real_t *m)

Computes the trimmed arithmetic mean (average) of a vector.

The trimmed arithmetic mean of a dataset is described by a number (in this case 'p') from 0 to 50 that gives the percent of the data discarded at each end of the sorted data vector before computing the mean. Thus, a 3% trimmed mean will only use 94% of the data to calculate the arithmetic mean, and will ignore the lowest 3% of data and the highest 3% of data in the sorted data vector.

Parameters
v  The vector to use.
p  The percent of data that will be ignored at each end in the computation of the mean (0.0 .. 50.0).
m  The trimmed arithmetic mean of the components of v.
Returns
0 if everything executed correctly, -EINVAL if the number 'p' is not between 0.0 and 50.0.

Definition at line 21 of file statistics.c.
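
The trimming idea can be sketched by sorting a copy of the data and dropping a fixed number of samples from each end. This standalone version rounds the number of dropped samples down; the library's version references zsl_sta_percentile(), so its cut-offs may differ on small samples. The function name and the rounding rule are assumptions for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Ascending comparison for qsort(). */
static int cmp_double_asc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper, not part of zscilib: mean of v after discarding
 * the lowest and highest p percent of the sorted samples. */
int trim_mean_sketch(const double *v, size_t n, double p, double *m)
{
    if (n == 0 || p < 0.0 || p > 50.0) {
        return -EINVAL;
    }

    /* Number of samples dropped at each end (rounded down). */
    size_t k = (size_t)((double)n * p / 100.0);
    if (2 * k >= n) {
        return -EINVAL;         /* nothing would be left to average */
    }

    double *buf = malloc(n * sizeof(double));
    if (buf == NULL) {
        return -ENOMEM;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = v[i];
    }
    qsort(buf, n, sizeof(double), cmp_double_asc);

    double sum = 0.0;
    for (size_t i = k; i < n - k; i++) {
        sum += buf[i];
    }
    *m = sum / (double)(n - 2 * k);

    free(buf);
    return 0;
}
```

For ten samples and p = 10, one sample is dropped at each end, so a single extreme value on either side no longer influences the mean.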

◆ zsl_sta_var()

int zsl_sta_var (struct zsl_vec *v, zsl_real_t *var)

Computes the variance of a vector v (the average of the squared differences from the mean).

Parameters
v    The vector to use.
var  The variance of v.
Returns
0 if everything executed correctly, otherwise an appropriate error code.

Definition at line 356 of file statistics.c.

Referenced by zsl_sta_sta_err(), and zsl_sta_std_dev().

◆ zsl_sta_weighted_mean()

int zsl_sta_weighted_mean (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m)

Computes the weighted arithmetic mean (average) of a data vector (v) and a weight vector (w).

Parameters
v  The data vector to use.
w  The vector containing the weights to use.
m  The weighted arithmetic mean of the components of v taking the weights in the vector w into account.
Returns
0 if everything executed correctly, -EINVAL if the dimensions of v and w don't match, or if any weights are negative or all of them are zero.

Definition at line 57 of file statistics.c.
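
The weighted mean and the error conditions listed above can be illustrated with a short standalone sketch on plain arrays. The name weighted_mean_sketch is hypothetical and the code is not the library's implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not part of zscilib: sum(w[i] * v[i]) / sum(w[i]).
 * Returns -EINVAL if any weight is negative or all weights are zero. */
int weighted_mean_sketch(const double *v, const double *w, size_t n,
                         double *m)
{
    double num = 0.0;
    double den = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (w[i] < 0.0) {
            return -EINVAL;
        }
        num += w[i] * v[i];
        den += w[i];
    }
    if (den == 0.0) {
        return -EINVAL;         /* also covers an empty input */
    }

    *m = num / den;
    return 0;
}
```

With v = {1, 2, 3} and w = {1, 1, 2}, the result is (1 + 2 + 6) / 4 = 2.25; the last sample counts twice as much as each of the others.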

◆ zsl_sta_weighted_median()

int zsl_sta_weighted_median (struct zsl_vec *v, struct zsl_vec *w, zsl_real_t *m)

Computes the weighted median of a data vector (v) and a weight vector (w).

Parameters
v  The data vector to use.
w  The vector containing the weights to use.
m  The weighted median of the components of v taking the weights in the vector w into account.
Returns
0 if everything executed correctly, -EINVAL if the dimensions of v and w don't match, or if any weights are negative or the sum of all the weights is not 1.

Definition at line 191 of file statistics.c.

◆ zsl_sta_weighted_mult_linear_reg()

int zsl_sta_weighted_mult_linear_reg (struct zsl_mtx *x, struct zsl_vec *y, struct zsl_vec *w, struct zsl_vec *b, zsl_real_t *r)

Calculates the coefficients (vector 'b') of the weighted multiple linear regression of the x_i values (columns of the matrix 'x'), the y values and the weights in the vector 'w'.

Parameters
x  Matrix, whose columns are the different x_i datasets.
y  The second input dataset, corresponding to the y-axis.
w  The weights to use in the weighted least squares.
b  Pointer to the calculated weighted multiple linear regression coefficients.
r  Pointer to the calculated coefficient of determination (also referred to as R squared).
Returns
0 on success, and -EINVAL if dimensions of the input vectors and matrix don't match.

Definition at line 535 of file statistics.c.