Binary Content and SharePoint Performance

It looks like not only count of SharePoint objects (count of rows in the SharePoint SQL tables) affects SharePoint performance. The amount of binary contents also influences SharePoint performance based on article below.

Source: http://technet.microsoft.com/en-us/magazine/2009.06.insidesharepoint.aspx

By default, SharePoint stores BLOB data in the Content column of the AllDocStreams table in the content database. The obvious advantage of this approach is straightforward transactional consistency between relational data and the associated non-relational file contents. For example, it’s not complicated to insert the metadata of a Word document along with the unstructured content into a content database, nor is it complicated to associate metadata with the corresponding unstructured content in select, update, or delete operations. However, the most obvious disadvantage of the default approach is an inefficient use of storage resources. Despite an I/O subsystem optimized for high performance, the SQL Server storage engine is not exactly a file-server replacement.

A SQL Server database consists of transaction log and data files, as illustrated in Figure 1. In order to ensure reliable transactional behavior, SQL Server first writes all transaction records to the log file before it flushes the corresponding data in 8KB pages to the data file on disk. Depending on the selected recovery model, this requires more than twice the BLOB size in storage capacity until you perform a backup and purge the transaction log. Moreover, SQL Server does not store unstructured SharePoint content directly in data pages. Instead, SQL Server uses a separate collection of text/image pages and only stores a 16-byte text pointer to the BLOB’s root node in the data row. Text/image pages are organized in a balanced tree, yet there is only one collection of text/image pages for each table. For the AllDocStreams table, this means that the content of all files is spread across the same text/image page collection. A single text/image page can hold data fragments from multiple BLOBs, or it may hold intermediate nodes for BLOBs larger than 32KB in size.

By default, SharePoint stores BLOB data in the Content column of the AllDocStreams table in the content database. The obvious advantage of this approach is straightforward transactional consistency between relational data and the associated non-relational file contents. For example, it’s not complicated to insert the metadata of a Word document along with the unstructured content into a content database, nor is it complicated to associate metadata with the corresponding unstructured content in select, update, or delete operations. However, the most obvious disadvantage of the default approach is an inefficient use of storage resources. Despite an I/O subsystem optimized for high performance, the SQL Server storage engine is not exactly a file-server replacement.

A SQL Server database consists of transaction log and data files, as illustrated in Figure 1. In order to ensure reliable transactional behavior, SQL Server first writes all transaction records to the log file before it flushes the corresponding data in 8KB pages to the data file on disk. Depending on the selected recovery model, this requires more than twice the BLOB size in storage capacity until you perform a backup and purge the transaction log. Moreover, SQL Server does not store unstructured SharePoint content directly in data pages. Instead, SQL Server uses a separate collection of text/image pages and only stores a 16-byte text pointer to the BLOB’s root node in the data row. Text/image pages are organized in a balanced tree, yet there is only one collection of text/image pages for each table. For the AllDocStreams table, this means that the content of all files is spread across the same text/image page collection. A single text/image page can hold data fragments from multiple BLOBs, or it may hold intermediate nodes for BLOBs larger than 32KB in size.

image001

Figure 1 Default SharePoint BLOB storage in SQL Server

Let’s not dive too deeply into SQL Server internals, though. The point is that when reading unstructured content, SQL Server must go through the data row to get the text pointer and then through the BLOB’s root node and possibly additional intermediate nodes to locate all data fragments spread across any number of text/image pages that SQL Server must load into memory in full to get all data blocks. This is because SQL Server performs I/O operations at the page level. These complexities impair file-streaming performance in comparison to direct access through the file system.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s