Introduction
Most of us are familiar with CDC tables in SQL Server. CDC or Change Data Capture is used to track and capture the DML changes happening in a table and makes use of log entries for capturing the information. SQL 2016 provides similar kind of functionality for capturing the history of the data related to memory optimized tables. These historical data capturing feature is referred to as Temporal Tables.Memory Optimized Tables were introduced in SQLServer 2014 and they provide an efficient way to store and manipulate data using natively compiled procedures, They provide two levels of durability Schema Only and Schema and Data. Schema Only will only make sure schema is preserved in case of a server restart whereas in the latter case both the schema as well as the data is persisted as in the case of a normal table (disk based).
Temporal tables are only supported in the case where durability option is set to schema and data. These tables will be persisted to disk along with data. When we created a temporal table it will preserve the history and will provide data easily based on our point in time analysis.
The temporal tables makes use of a history table internally to track the history of data changes happening in the memory optimized table. The main table will have two datetime2 type fields which are referred to as period columns. This is used by the table to determine the validity of each record to provide an effective point in time analysis. The date fields determine the validity of a record ie period for which record is(was) valid
Illustration
Consider the below case where we have a database to which we are going to add a memory optimized table and make it a temporal table.
For creating a memory optimized table we need to first have a file group in the database that supports memory optimized data. This can be done by using a series of ALTER DATABASE statements as shown below (Assumption is that SQL2016LocalTest is an already existing database in your SQLServer instance)
ALTER DATABASE [SQL2016LocalTest] ADD FILEGROUP mem_opt CONTAINS MEMORY_OPTIMIZED_DATA
ALTER DATABASE [SQL2016LocalTest] ADD FILE (name='mem_opt1', filename='') TO FILEGROUP mem_opt
ALTER DATABASE [SQL2016LocalTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON
GO
ALTER DATABASE [SQL2016LocalTest] ADD FILEGROUP mem_opt CONTAINS MEMORY_OPTIMIZED_DATA
ALTER DATABASE [SQL2016LocalTest] ADD FILE (name='mem_opt1', filename='
ALTER DATABASE [SQL2016LocalTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON
GO
Once the filegroup is setup as above we shall create the memory optimized table on it using CREATE TABLE statement as below
CREATE TABLE dbo.HotelBooking
(
BookingID int,
RoomNo int,
CustomerName varchar(100),
BookingDate datetime,
ValidFrom datetime2 generated always as row start not null default ('19000101'),
ValidTo datetime2 generated always as row end not null default ('99991231'),
PRIMARY KEY CLUSTERED (BookingID),
period for system_time(ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.BookingHistory));
Once the above statement is executed we will have a temporal table setup with an internal history table which will used for tracking the history of data from the main table.
You can check this by expanding tables under the corresponding databases node to see the tables as per below
As you see the table will be represented with a small clock icon indicating that it is a temporal table and will also have the history table specified under it. The history table will be an exact replica of the main table in schema and will have clustered index on the key column.
Now lets insert some data to this table. For evaluating the history capture its best if you do the data manipulation operation over a period of time.
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1001,
101,
'Peter',
'20150923'
)
UPDATE dbo.HotelBooking
SET CustomerName = 'Ramu'
WHERE RoomNo = 101
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1002,
115,
'John',
'20160314'
)
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1005,
212,
'Mary',
'20160416'
)
UPDATE dbo.HotelBooking
SET BookingDate = '20160420'
WHERE BookingID = 1005
SET CustomerName = 'Ramu'
WHERE RoomNo = 101
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1002,
115,
'John',
'20160314'
)
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1005,
212,
'Mary',
'20160416'
)
UPDATE dbo.HotelBooking
SET BookingDate = '20160420'
WHERE BookingID = 1005
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1011,
307,
'Khader',
'20160617'
)
UPDATE dbo.HotelBooking
SET CustomerName = 'Michelle' (
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1011,
307,
'Khader',
'20160617'
)
UPDATE dbo.HotelBooking
WHERE RoomNo = 101
INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1015,
115,
'Mohsin',
'20160622'
)
DELETE FROM dbo.HotelBooking
WHERE BookingID = 1002
Once you've done the above data addition/modification you can check the main and history table and you will get the below data snapshot
I have done the data modification over a period of time (around 2 days) to indicate the effect on history table.
If you analyze the above data you can see how the progression of data is getting captured in the history
Everytime a new insert happens it will be captured in the main table HotelBooking with ValidFrom set to the corresponding system time and ValidTo set to the maxdate value (9999-12-31 23:59:59.99999)
Similarly every update operation is carried out by means of two internal operations a delete of the old entry followed by insertion of new entry. This can be seen from resultset above where you will have an entry in both the tables (see records with BookingID 1001 and 1005 in the above case). The history table will have the entry with the values prior to the modification with ValidFrom as actual ValidFrom value from the main entry and ValidTo as the corresponding system time. The main table entry will have the modified values with ValidFrom as the corresponding system time and ValidTo as the max date value.
In the case of delete operations the record will be removed from the main table and there will be an entry added to the history table with ValidFrom as the original ValidFrom and ValidTo as the corresponding system time.
Now that we have got an idea of what happens on the background and how data will get captured in the two tables lets now see some functions which are associated to Temporal tables and which will help us to query temporal data for time based analysis.
AS OF time
AS OF time function gives the snapshot of temporal table data at an instant of time i.e the data that is valid at that instanceLets try it on our sample table and illustration is below
SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME AS OF '2016-07-25 05:15'
Now see the last resultset and compare it with the original table vs history entries and you will notice that there are only 3 entries in the resultset.
If you check the ValidFrom and ValidTo values for the resultset entries its pretty evident that those were the entries which were valid as on specified snapshot time (ie ValidFrom < 2016-07-25 05:15 < ValidTo)
The ignored records were those which were either expired before snapshot date (ex: 1001 Peter) or the ones which became valid after the snapshot time (ex: 1015,1011)
So effectively what AS OF function does is to apply a query logic as below
SELECT * FROM dbo.HotelBooking WHERE '2016-07-25 05:15' BETWEEN ValidFrom AND ValidTo
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE '2016-07-25 05:15' BETWEEN ValidFrom AND ValidTo
Execute this and you will get the same resultset as returned by AS OF
Graphically this can be depicted using the three scenarios as shown below
FROM time1 TO time2
FROM time1 TO time2 function returns all records whose validity overlaps the period specified by time1 to time2.
The illustration for this can be shown as below
SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME FROM '2016-07-24 10:34' TO '2016-07-25 05:16:55.4510553'
The resultset in this consists of all records with the interval ValidFrom to ValidTo overlapping between interval specified by time1 and time2. One thing to note here is that record with BookingID 1011 was not included as it became valid only from the end boundary value whereas function considers only the cases where interval starts or ends within the boundary value ie the equivalent query can be written as
SELECT * FROM dbo.HotelBooking WHERE ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
BETWEEN time1 AND time2
BETWEEN function works quite similar to the FROM function to get the resultset where record interval of ValidFrom to ValidTo overlaps with the interval specified by time1 to time2. The only difference between BETWEEN and FROM is that BETWEEN considers the end boundary value (time2) also as a part of the interval unlike the FROM function.
So if you see the illustration for BETWEEN it looks like below
Check the result and you can see that its very much similar to resultset for FROM with only addition being the record with BookingID 1011 which starts at the same time as the end time of the interval
So equivalent query in this case can be given as
SELECT * FROM dbo.HotelBooking WHERE ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
ORDER BY ValidFrom
Corresponding graphical representation is shown below
CONTAINED IN (time1,time2)
CONTAINED IN function returns a resultset where the record validity falls within the interval from time1 to time2 ie ValidFrom and ValidTo both lying within the interval
Illustration is given below
SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME CONTAINED IN ( '2016-07-24 10:32','2016-07-25 05:18:35.3157322') ORDER BY ValidFrom
The resultset on analysis reveals that only those records whose validity falls fully within the interval time1 to time2 are returned by CONTAINED IN function.
The equivalent query is as below
SELECT * FROM dbo.HotelBooking WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322' ORDER BY ValidFrom
Graphically this can be represented as below
ALL
The ALL function simply combines the results from both the main and history table in the resultset
Illustration below
SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME ALL ORDER BY ValidFrom
The resultset includes all records from both the tables
The equivalent query is as below
SELECT * FROM dbo.HotelBooking
UNION ALL
SELECT * FROM dbo.BookingHistory ORDER BY ValidFrom
Conclusion
The above illustrations gives a clear idea about how temporal functions can be used to generate time based results from a temporal table.
This shows that Temporal tables provide a much convenient way of capturing history and using it for effective time based analysis
Hope you this article was informative enough. Feel free to revert with any comments you may have.
Hope you this article was informative enough. Feel free to revert with any comments you may have.
No comments:
Post a Comment