I have a MS SQL table that contains stock data with the following columns:
Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday’s date).
My idea is to do something like this:
select * from quotes t1 inner join quotes t2 on t1.symbol = t2.symbol and t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that’s the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008
One option is to use a recursive cte (if I’m understanding your requirements correctly):
WITH RNCTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn FROM quotes ), CTE AS ( SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed FROM RNCTE WHERE rn = 1 UNION ALL SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed FROM CTE c JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn ) SELECT * FROM CTE ORDER BY symbol, date
If you need a running total for each symbol to use as the percentage change, then easy enough to add an additional column for that amount — wasn’t completely sure what your intentions were, so the above just divides the current closed amount by the previous closed amount.
You do something like this:
with OrderedQuotes as ( select row_number() over(order by Symbol, Date) RowNum, ID, Symbol, Date, Open, High, Low, Close from Quotes ) select a.Symbol, a.Date, a.Open, a.High, a.Low, a.Close, a.Date PrevDate, a.Open PrevOpen, a.High PrevHigh, a.Low PrevLow, a.Close PrevClose, b.Close-a.Close/a.Close PctChange from OrderedQuotes a join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join you get a row for the first date for each symbol, not sure if you need that.
Something like this w’d work in SQLite:
SELECT .. FROM quotes t1, quotes t2 WHERE t1.symbol = t2.symbol AND t1.date < t2.date GROUP BY t2.ID HAVING t2.date = MIN(t2.date)
Given SQLite is a simplest of a kind, maybe in MSSQL this will also work with minimal changes.
What you had is fine. I don’t know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select * from quotes t1 inner join quotes t2 on t1.symbol = t2.symbol and t1.date > t2.date left outer join quotes t3 on t2.symbol = t3.symbol and t2.date > t3.date where t3.date is null
You could do something like this:
DECLARE @Today DATETIME SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)) ;WITH today AS ( SELECT Id , Symbol , Date , [OPEN] , High , LOW , [CLOSE], DATEADD(DAY, -1, Date) AS yesterday FROM quotes WHERE date = @today ) SELECT * FROM today LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol AND today.yesterday = yesterday.Date
That way you limit your “today” results, if that’s an option.
EDIT: The CTEs listed as other questions may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I tend to prefer to pull out the check for the previous day in its own query then use it for reference:
DECLARE @Today DATETIME, @PreviousDay DATETIME SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)); SELECT @PreviousDay = MAX(Date) FROM quotes WHERE Date < @Today; WITH today AS ( SELECT Id , Symbol , Date , [OPEN] , High , LOW , [CLOSE] FROM quotes WHERE date = @today ) SELECT * FROM today LEFT JOIN quotes AS previousday ON today.Symbol = previousday.Symbol AND previousday.Date = @PreviousDay
;WITH cte AS ( SELECT symbol, date, [Open], [High], [Low], [Close], ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id FROM quotes ) SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close], ISNULL(c2.[Close] / c1.[Close], 0) AS perc FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1 ORDER BY c1.symbol, c1.date
For improving performance(avoiding sorting and RID Lookup) use this index
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])
Simple demo on SQLFiddle
SELECT * FROM quotes q_curr CROSS APPLY ( SELECT TOP(1) * FROM quotes WHERE symbol = q_curr.symbol AND date < q_curr.date ORDER BY date DESC ) q_prev