REGR_ (Linear Regression) Function s

The linear regression functions are:

REGR_SLOPE
REGR_INTERCEPT
REGR_COUNT
REGR_R2
REGR_AVGX
REGR_AVGY
REGR_SXX
REGR_SYY
REGR_SXY

Syntax

linear_regr::=

Description of the illustration linear_regr.gif

See Also:

Table 2-10, "Implicit Type Conversion Matrix" for more information on implicit conversion and "Numeric Precedence" for information on numeric precedence

Oracle applies the function to the set of (expr1, expr2) pairs after eliminating all pairs for which either expr1 or expr2 is null. Oracle computes all the regression functions simultaneously during a single pass through the data.

expr1 is interpreted as a value of the dependent variable (a y value), and expr2 is interpreted as a value of the independent variable (an x value).

REGR_SLOPE returns the slope of the line. The return value is a numeric datatype and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:
```
COVAR_POP(expr1, expr2) / VAR_POP(expr2)
```
REGR_INTERCEPT returns the y-intercept of the regression line. The return value is a numeric datatype and can be null. After the elimination of null (expr1, expr2) pairs, it makes the following computation:
```
AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)
```
REGR_COUNT returns an integer that is the number of non-null number pairs used to fit the regression line.
REGR_R2 returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. The return value is a numeric datatype and can be null. VAR_POP(expr1) and VAR_POP(expr2) are evaluated after the elimination of null pairs. The return values are:
```
NULL if VAR_POP(expr2)  = 0

                        1 if VAR_POP(expr1)  = 0 and 
                             VAR_POP(expr2) != 0

POWER(CORR(expr1,expr),2) if VAR_POP(expr1)  > 0 and
                             VAR_POP(expr2  != 0
```

All of the remaining regression functions return a numeric datatype and can be null:

REGR_AVGX evaluates the average of the independent variable (expr2) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:
```
AVG(expr2)
```
REGR_AVGY evaluates the average of the dependent variable (expr1) of the regression line. It makes the following computation after the elimination of null (expr1, expr2) pairs:
```
AVG(expr1)
```

REGR_SXY, REGR_SXX, REGR_SYY are auxiliary functions that are used to compute various diagnostic statistics.

REGR_SXX makes the following computation after the elimination of null (expr1, expr2) pairs:
```
REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
```
REGR_SYY makes the following computation after the elimination of null (expr1, expr2) pairs:
```
REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
```
REGR_SXY makes the following computation after the elimination of null (expr1, expr2) pairs:
```
REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)
```

The following examples are based on the sample tables sh.sales and sh.products.

General Linear Regression Example

The following example provides a comparison of the various linear regression functions used in their analytic form. The analytic form of these functions can be useful when you want to use regression statistics for calculations such as finding the salary predicted for each employee by the model. The sections that follow on the individual linear regression functions contain examples of the aggregate form of these functions.

SELECT job_id, employee_id ID, salary,
REGR_SLOPE(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) slope,
REGR_INTERCEPT(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) intcpt,
REGR_R2(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) rsqr,
REGR_COUNT(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) count,
REGR_AVGX(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) avgx,
REGR_AVGY(SYSDATE-hire_date, salary)
   OVER (PARTITION BY job_id) avgy
   FROM employees
   WHERE department_id in (50, 80)
   ORDER BY job_id, employee_id;

JOB_ID        ID     SALARY SLOPE    INTCPT  RSQR  COUNT       AVGX      AVGY
---------- ----- ---------- ----- --------- ----- ------ ---------- ---------
SA_MAN       145      14000  .355 -1707.035  .832      5  12200.000  2626.589
SA_MAN       146      13500  .355 -1707.035  .832      5  12200.000  2626.589
SA_MAN       147      12000  .355 -1707.035  .832      5  12200.000  2626.589
SA_MAN       148      11000  .355 -1707.035  .832      5  12200.000  2626.589
SA_MAN       149      10500  .355 -1707.035  .832      5  12200.000  2626.589
SA_REP       150      10000  .257   404.763  .647     29   8396.552  2561.244
SA_REP       151       9500  .257   404.763  .647     29   8396.552  2561.244
SA_REP       152       9000  .257   404.763  .647     29   8396.552  2561.244
SA_REP       153       8000  .257   404.763  .647     29   8396.552  2561.244
SA_REP       154       7500  .257   404.763  .647     29   8396.552  2561.244
SA_REP       155       7000  .257   404.763  .647     29   8396.552  2561.244
SA_REP       156      10000  .257   404.763  .647     29   8396.552  2561.244
...

REGR_SLOPE and REGR_INTERCEPT Examples

The following example calculates the slope and regression of the linear regression model for time employed (SYSDATE - hire_date) and salary using the sample table hr.employees. Results are grouped by job_id.

SELECT job_id,
REGR_SLOPE(SYSDATE-hire_date, salary) slope,
REGR_INTERCEPT(SYSDATE-hire_date, salary) intercept
   FROM employees
   WHERE department_id in (50,80)
   GROUP BY job_id
   ORDER BY job_id;
JOB_ID          SLOPE  INTERCEPT
---------- ---------- ----------
JOB_ID     SLOPE    INTERCEPT
---------- ----- ------------
SA_MAN      .355 -1707.030762
SA_REP      .257   404.767151
SH_CLERK    .745   159.015293
ST_CLERK    .904   134.409050
ST_MAN      .479  -570.077291

REGR_COUNT Examples

The following example calculates the count of by job_id for time employed (SYSDATE - hire_date) and salary using the sample table hr.employees. Results are grouped by job_id.

SELECT job_id,
REGR_COUNT(SYSDATE-hire_date, salary) count
   FROM employees
   WHERE department_id in (30, 50)
   GROUP BY job_id;
JOB_ID      COUNT
---------- ------
ST_MAN          5
PU_MAN          1
SH_CLERK       20
PU_CLERK        5
ST_CLERK       20

REGR_R2 Examples

The following example calculates the coefficient of determination the linear regression of time employed (SYSDATE - hire_date) and salary using the sample table hr.employees:

SELECT job_id,REGR_R2(SYSDATE-hire_date, salary) Regr_R2   FROM employees   WHERE department_id in (80, 50)   GROUP by job_id;

JOB_ID         REGR_R2
---------- -----------
ST_MAN      .694185080
SH_CLERK    .879799698
SA_MAN      .832447480
SA_REP      .647007156
ST_CLERK    .742808493

REGR_AVGY and REGR_AVGX Examples

The following example calculates the average values for time employed (SYSDATE - hire_date) and salary using the sample table hr.employees. Results are grouped by job_id:

SELECT job_id,
REGR_AVGY(SYSDATE-hire_date, salary) avgy,
REGR_AVGX(SYSDATE-hire_date, salary) avgx
   FROM employees
   WHERE department_id in (30,50)
   GROUP BY job_id;
JOB_ID                AVGY   AVGX
---------- --------------- ------
ST_MAN      2899.055555556   7280
PU_MAN      3785.455555556  11000
SH_CLERK    2531.955555556   4925
PU_CLERK    2709.255555556   2780
ST_CLERK    2631.605555556   2785

REGR_SXY, REGR_SXX, and REGR_SYY Examples

The following example calculates three types of diagnostic statistics for the linear regression of time employed (SYSDATE - hire_date) and salary using the sample table hr.employees:

SELECT job_id,
REGR_SXY(SYSDATE-hire_date, salary) regr_sxy,
REGR_SXX(SYSDATE-hire_date, salary) regr_sxx,
REGR_SYY(SYSDATE-hire_date, salary) regr_syy
   FROM employees
   WHERE department_id in (80, 50)
   GROUP BY job_id
   ORDER BY job_id;
 
JOB_ID       REGR_SXY    REGR_SXX   REGR_SYY
---------- ---------- ----------- ----------
SA_MAN        3303500   9300000.0    1409642
SA_REP     16819665.5  65489655.2 6676562.55
SH_CLERK      4248650   5705500.0    3596039
ST_CLERK      3531545   3905500.0 4299084.55
ST_MAN        2180460   4548000.0  1505915.2

REGR_ (Linear Regression) Functions

REGR_ (Linear Regression) Function s