Tuesday, November 25, 2008

SQL 2008 Excel-like LINEST Function

Once again, I find myself replicating functionality in SQL that Excel provides out of the box. A necessary step if you find yourself analyzing large data sets with rigorous math. This time, it's the LINEST function, Excel's tried and true linear regression function. Admittedly, what I built does not provide the same level of information, but it does provide the big three: slope, y-intercept, and r-Squared. Once the base linear regression function is built, extending to handle exponential regression becomes straightforward, but I will save that for a different post.

Caveat

This will work out of the box for SQL 2008. You'll need to modify for lower versions of SQL Server.

First create a new Table type (only available in SQL 2008)

CREATE TYPE dbo.RegressionInput AS TABLE
(
x DECIMAL(18, 6),
y DECIMAL(18, 6)
)

GO

Then create the function

CREATE FUNCTION dbo.LinearRegression
(
@RegressionInputs AS dbo.RegressionInput READONLY
)
RETURNS @RegressionOutput TABLE
(
Slope DECIMAL(18, 6),
Intercept DECIMAL(18, 6),
RSquare DECIMAL(18, 6)
)
AS
BEGIN


DECLARE @Xaverage AS DECIMAL(18, 6)
DECLARE @Yaverage AS DECIMAL(18, 6)
DECLARE @slope AS DECIMAL(18, 6)
DECLARE @intercept AS DECIMAL(18, 6)
DECLARE @rSquare AS DECIMAL(18, 6)

SELECT
@Xaverage = AVG(x),
@Yaverage = AVG(y)
FROM
@RegressionInputs

SELECT
@slope = SUM((x - @Xaverage) * (y - @Yaverage))/SUM(POWER(x - @Xaverage, 2))
FROM
@RegressionInputs

SELECT
@intercept = @Yaverage - (@slope * @Xaverage)


SELECT @rSquare = 1 - (SUM(POWER(y - (@intercept + @slope * x), 2))/(SUM(POWER(y - (@intercept + @slope * x), 2)) + SUM(POWER(((@intercept + @slope * x) - @Yaverage), 2))))
FROM
@RegressionInputs

INSERT INTO
@RegressionOutput
(
Slope,
Intercept,
RSquare
)
SELECT
@slope,
@intercept,
@rSquare

RETURN

END
GO








11 comments:

Daniel Monteiro said...

Hello. I'd like to know if you have this sql function to calculate the LINEST function using multiples values...

daniel_m_vaz@yahoo.com.br

Micah Minarik said...

Daniel,

I have no idea if you are still tracking this as I have been away from it for a while now, but to answer your question (if I think I understand it), no I have not implemented multi-linear (LINEST with multiple values) regression in SQL. That said, I do have a C# implementation since multi-linear regression involves linear algebra and I found SQL to be a bit "clunky" when I thought about it as an implementation language. I would be more than willing to share it with you if you'd like.

Julia said...

Hi!!
My name is Julia, and I would love to see your c# implementation for multiple regression.
Thanks!

Julia

Unknown said...

Hi!!
My name is Chitra, and I would love to see your c# implementation for multiple regression.
Thanks!

Chitra

Micah Minarik said...

Chitra and Julia, I obviously have not attended to this blog in over a year, so apologies. If you are still interested, drop me a note and I'll get you the multiple regression implementation.

Jatin Soni said...
This comment has been removed by the author.
Jatin Soni said...

Hi,
I would Like to see C# or SQL implementation of Multiple Regression. Can you please let me know how to achieve this?

steven NJUKI said...

Hello
Can you please share your c# or SQL implementation of multi linear regression?

Jatin Soni said...

Can you please share your c# or SQL implementation of multi linear regression

Jatin Soni said...

Can you please share your c# or SQL implementation of multi linear regression

Colin M Campbell said...

I have developed multiple regression in SQL. Haven't put it into a procedure, but that wouldn't be hard. it is at my blog In the SQL Data Mine:

http://sqldatamine.blogspot.com/2013/12/true-multiple-regression-using-sql.html