g01 Chapter Contents
g01 Chapter Introduction
NAG C Library Manual

# NAG Library Function Documentnag_approx_quantiles_fixed (g01anc)

## 1  Purpose

nag_approx_quantiles_fixed (g01anc) finds approximate quantiles from a data stream of known size using an out-of-core algorithm.

## 2  Specification

 #include #include
 void nag_approx_quantiles_fixed (Integer *ind, Integer n, const double rv[], Integer nb, double eps, Integer *np, const double q[], double qv[], Integer nq, double rcomm[], Integer lrcomm, Integer icomm[], Integer licomm, NagError *fail)

## 3  Description

A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the $0.5$ quantile because half the values are less than or equal to it.
nag_approx_quantiles_fixed (g01anc) uses a slightly modified version of an algorithm described in a paper by Zhang and Wang (2007) to determine $\epsilon$-approximate quantiles of a data stream of $n$ real values, where $n$ is known. Given any quantile $q\in \left[0.0,1.0\right]$, an $\epsilon$-approximate quantile is defined as an element in the data stream whose rank falls within $\left[\left(q-\epsilon \right)n,\left(q+\epsilon \right)n\right]$. In case of more than one $\epsilon$-approximate quantile being available, the one closest to $qn$ is returned.

## 4  References

Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29

## 5  Arguments

1:     indInteger *Input/Output
On entry: indicates the action required in the current call to nag_approx_quantiles_fixed (g01anc).
${\mathbf{ind}}=0$
Return the required length of rcomm and icomm in ${\mathbf{icomm}}\left[0\right]$ and ${\mathbf{icomm}}\left[1\right]$ respectively. n and eps must be set and licomm must be at least $2$.
${\mathbf{ind}}=1$
Initialise the communication arrays and process the first nb values from the data stream as supplied in rv.
${\mathbf{ind}}=2$
Process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter nag_approx_quantiles_fixed (g01anc) with all other parameters unchanged.
${\mathbf{ind}}=3$
Calculate the nq $\epsilon$-approximate quantiles specified in q. The calling program must set q and nq and re-enter nag_approx_quantiles_fixed (g01anc) with all other parameters unchanged. This option can be chosen only when ${\mathbf{np}}\ge ⌈\mathrm{exp}\left(1.0\right)/{\mathbf{eps}}⌉$.
On exit: indicates output from a successful call.
${\mathbf{ind}}=1$
Lengths of rcomm and icomm have been returned in ${\mathbf{icomm}}\left[0\right]$ and ${\mathbf{icomm}}\left[1\right]$ respectively.
${\mathbf{ind}}=2$
nag_approx_quantiles_fixed (g01anc) has processed np data points and expects to be called again with additional data (i.e., ${\mathbf{np}}<{\mathbf{n}}$).
${\mathbf{ind}}=3$
nag_approx_quantiles_fixed (g01anc) has returned the requested $\epsilon$-approximate quantiles in qv. These quantiles are based on np data points.
${\mathbf{ind}}=4$
Routine has processed all n data points (i.e., ${\mathbf{np}}={\mathbf{n}}$).
Constraint: on entry ${\mathbf{ind}}=0$, $1$, $2$ or $3$.
2:     nIntegerInput
On entry: $n$, the total number of values in the data stream.
Constraint: ${\mathbf{n}}>0$.
3:     rv[$\mathit{dim}$]const doubleInput
Note: the dimension, dim, of the array rv must be at least ${\mathbf{nb}}$ when ${\mathbf{ind}}=1$ or $2$.
On entry: if ${\mathbf{ind}}=1$ or $2$, the vector containing the current block of data, otherwise rv is not referenced.
4:     nbIntegerInput
On entry: if ${\mathbf{ind}}=1$ or $2$, the size of the current block of data. The size of blocks of data in array rv can vary; therefore nb can change between calls to nag_approx_quantiles_fixed (g01anc).
Constraint: if ${\mathbf{ind}}=1$ or $2$, ${\mathbf{nb}}>0$.
5:     epsdoubleInput
On entry: approximation factor $\epsilon$.
Constraint: ${\mathbf{eps}}\ge \mathrm{exp}\left(1.0\right)/{\mathbf{n}}\text{​ and ​}{\mathbf{eps}}\le 1.0$.
6:     npInteger *Output
On exit: the number of elements processed so far.
7:     q[$\mathit{dim}$]const doubleInput
Note: the dimension, dim, of the array q must be at least ${\mathbf{nq}}$ when ${\mathbf{ind}}=3$.
On entry: if ${\mathbf{ind}}=3$, the quantiles to be calculated, otherwise q is not referenced. Note that ${\mathbf{q}}\left[i\right]=0.0$, corresponds to the minimum value and ${\mathbf{q}}\left[i\right]=1.0$ to the maximum value.
Constraint: if ${\mathbf{ind}}=3$, $0.0\le {\mathbf{q}}\left[\mathit{i}-1\right]\le 1.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nq}}$.
8:     qv[$\mathit{dim}$]doubleOutput
Note: the dimension, dim, of the array qv must be at least ${\mathbf{nq}}$ when ${\mathbf{ind}}=3$.
On exit: if ${\mathbf{ind}}=3$, ${\mathbf{qv}}\left[i\right]$ contains the $\epsilon$-approximate quantiles specified by the value provided in ${\mathbf{q}}\left[i\right]$.
9:     nqIntegerInput
On entry: if ${\mathbf{ind}}=3$, the number of quantiles requested, otherwise nq is not referenced.
Constraint: if ${\mathbf{ind}}=3$, ${\mathbf{nq}}>0$.
10:   rcomm[lrcomm]doubleCommunication Array
11:   lrcommIntegerInput
On entry: the dimension of the array rcomm.
Constraint: if ${\mathbf{ind}}\ne 0$, lrcomm must be at least equal to the value returned in ${\mathbf{icomm}}\left[0\right]$ by a call to nag_approx_quantiles_fixed (g01anc) with ${\mathbf{ind}}=0$. This will not be more than $x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×{\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$, where $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$.
12:   icomm[licomm]IntegerCommunication Array
13:   licommIntegerInput
On entry: the dimension of the array icomm.
Constraints:
• if ${\mathbf{ind}}=0$, ${\mathbf{licomm}}\ge 2$;
• otherwise licomm must be at least equal to the value returned in ${\mathbf{icomm}}\left[1\right]$ by a call to nag_approx_quantiles_fixed (g01anc) with ${\mathbf{ind}}=0$. This will not be more than $2×\left(x+2×\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(x,⌈x/2.0⌉+1\right)×y\right)+y+6$, where $x=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,⌊\mathrm{log}\left({\mathbf{eps}}×{\mathbf{n}}\right)/{\mathbf{eps}}⌋\right)$ and $y={\mathrm{log}}_{2}\left({\mathbf{n}}/x+1.0\right)+1$.
14:   failNagError *Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_ARRAY_SIZE
On entry, licomm is too small: ${\mathbf{licomm}}=〈\mathit{\text{value}}〉$.
On entry, lrcomm is too small: ${\mathbf{lrcomm}}=〈\mathit{\text{value}}〉$.
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_INT
On entry, ${\mathbf{ind}}=1$ or $2$ and ${\mathbf{nb}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{ind}}=1$ or $2$ then ${\mathbf{nb}}>0$.
On entry, ${\mathbf{ind}}=3$ and ${\mathbf{nq}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{ind}}=3$ then ${\mathbf{nq}}>0$.
On entry, ${\mathbf{ind}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ind}}=0$, $1$, $2$ or $3$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>0$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_Q_OUT_OF_RANGE
On entry, ${\mathbf{ind}}=3$ and ${\mathbf{q}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{ind}}=3$ then $0.0\le {\mathbf{q}}\left[i\right]\le 1.0$ for all $i$.
NE_REAL
On entry, ${\mathbf{eps}}=〈\mathit{\text{value}}〉$.
Constraint: $\mathrm{exp}\left(1.0\right)/{\mathbf{n}}\le {\mathbf{eps}}\le 1.0$.
NE_TOO_SMALL
Number of data elements streamed, $〈\mathit{\text{value}}〉$ is not sufficient for a quantile query when ${\mathbf{eps}}=〈\mathit{\text{value}}〉$.
Supply more data or reprocess the data with a higher eps value.

Not applicable.

## 8  Further Comments

The average time taken by nag_approx_quantiles_fixed (g01anc) is ${\mathbf{n}}\mathrm{log}\left(1/\epsilon \mathrm{log}\left(\epsilon {\mathbf{n}}\right)\right)$.

## 9  Example

This example calculates $\epsilon$-approximate quantile for $q=0.25$, $0.5$ and $1.0$ for a data stream of $60$ values. The stream is read in four blocks of varying size.

### 9.1  Program Text

Program Text (g01ance.c)

### 9.2  Program Data

Program Data (g01ance.d)

### 9.3  Program Results

Program Results (g01ance.r)