Thursday, September 22, 2011

Change Data Capture in SQL Server 2008


Change Data Capture(CDC) tracks DML(Inserts, Updates, Deletes) and DDL(Create, Alter, Drop) activities on SQL Server tables. Basically, CDC reads the transaction log to capture the changes made to the tables and then writes that information into the corresponding change tables using SQL Agent jobs. To store the information about the changes made to the tables being tracked, SQL Server creates "change" tables that have the same structure(columns) as that of the source tables, plus a few more columns that are specific to CDC. This feature is only available on enterprise, developer and evaluation edition and can work with any recovery model(Simple, Bulk-logged, Full). Also, please note that there is no UI available to configure change data capture, you will have to do it programatically.

The first step in setting up Change Data Capture is to enable the database for Change Data Capture. To do that, you will have to execute the sys.sp_cdc_enable_db stored procedure. Please note that you need to be a member of the sysadmin fixed server role to execute the stored procedure. The next step is to enable the table(that you want to track) for Change Data Capture. To do that, you will need to execute thesys.sp_cdc_enable_table stored procedure. Please note that you need to be a member of the db_owner fixed database role to execute the stored procedure.

Lets enable CDC in one of tables in the AdventureWorks Database and see how SQL Server tracks the modifications. As said before, we need to enable CDC on the database as shown in the figure below.

Now lets enable CDC on the Product table by executing the statement EXEC sys.sp_cdc_enable_table 'Production', 'Product',@role_name = NULL as shown in the figure below.

As soon as CDC is enabled, SQL Server will create some system objects(as shown in the figure below) to track and store changes made to the source table. The change table(in this case cdc.Production_Product_CT) is added after CDC is enabled on the table.

Now lets do some inserts and updates to the Product table. As shown in the figure below, we did two inserts and one update to the Production.Product table in the AdventureWorks database.

This information must have been captured in the change table cdc.Production_Product_CT, lets find out.

As shown in the screen shot above, we have 4 rows: the first two rows indicate inserts, the third and fourth rows represent before and after image the update operation respectively. This is indicated by the __$operation column whose value is 1 for a delete, 2 for an insert, 3 is the "before image" of an update, and 4 is the "after image" of an update.
To extract change data, there are two built-in functions you can leverage:
  • cdc.fn_cdc_get_all_changes_ - This function returns all changes within a specified range.
  • cdc.fn_cdc_get_net_changes_ -  As the name indicates, this function returns only the final change for each row within a specified range.
You will have to know the relevant Log Sequence Numbers(LSNs) to extract data using these functions. However, you can use sys.fn_cdc_map_time_lsn function to map datetime ranges to LSN ranges.
For example, if your ETL system extracts change data on a daily basis, then use the following code:
DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);
SELECT @begin_time = CAST(CONVERT (date, GETDATE()) as varchar)+' 12:00:00.000'
SELECT @end_time = CAST(CONVERT (date, GETDATE()+1) as varchar)+' 12:00:00.000'
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
SELECT * FROM cdc.fn_cdc_get_all_changes_Production_Product(@begin_lsn, @end_lsn, 'all')
If we execute the above code in our case, it will return three rows as shown in the figure below.

No comments:

Post a Comment