What are the best practices for using a GUID as a primary key, specifically regarding performance?

database-design entity-framework guid primary-key sql-server

Question

In virtually all of the tables in my application, the GUID serves as the primary key. I've heard that this might cause performance problems. Although I haven't seen any issues, I'm preparing to launch a new application and I still want to use GUIDs as the Primary Keys. However, I was considering utilizing a Composite Primary Key (The GUID and maybe another field.)

When you have several settings, such as "production," "test," and "dev" databases, and when you need to migrate data across databases, a GUID is handy and simple to handle.

I'll be using Entity Framework 4.3, and before adding the Guid into the database, I want to assign it in the application code. I don't want SQL to produce the Guid, for example.

In order to avoid the alleged performance penalties associated with this method, what is the optimal procedure for developing GUID-based primary keys?

1
328
8/13/2014 5:20:43 AM

Accepted Answer

If you must, you could definitely make the case for using GUIDs as the table's primary key if they appear to be a logical option for your main key. The GUID column should be used as the grouping key, which is what SQL Server uses by default unless you expressly instruct it otherwise.

You should ideally keep these two topics separate:

  1. A zzz-24 One of the possible keys that consistently and uniquely identifies every row in your database is the logical construct -zzz. Really, anything could be this:INT , aGUID Choose whatever makes the most sense for your situation from, a string.

  2. A tiny, stable, ever-increasing data type is your best option for the grouping key (the column or columns that constitute the "clustered index" on the table), which is a physical storage-related issue.INT or BIGINT as the default choice.

The clustering key on a SQL Server table is often also used as the primary key, but that doesn't have to be the case! When the old GUID-based Primary / Clustered Key was split into two independent keys—the Primary (Logical) Key on the GUID and the Clustering (Ordering) Key on a separate key—I personally saw enormous speed increases.INT IDENTITY(1,1) column.

A lot of people, including Jennifer Tripp, the Queen of Indexing, have remarked several times thatGUID Since the clustering key isn't ideal, it will cause significant page and index fragmentation as well as subpar performance because of its unpredictability.

Yes, I am aware that therenewsequentialid() yet, even it is not genuinely and totally sequential and hence has the same issues as the SQL Server 2005 and earlier versions.GUID - but a little less obviously.

The clustering key on a table will be added to each and every item on each and every non-clustered index on your table as well, so you really want to make sure it's as tiny as possible. This is another thing to take into account. Usually, anINT for the great majority of tables should be adequate, particularly when compared to aGUID You may save yourself hundreds of megabytes of storage space on disk and in server memory by using the clustering key.

Using quick calculationsINT vs. GUID as the Clustering Key and Primary:

  • a base table with one million rows (3.8 MB vs. 15.26 MB)
  • Unclustered indexes in six (22.89 MB vs. 91.55 MB)

And that's just on one table! 25 MB total vs. 106 MB

Excellent writing by Kimberly Tripp provides further fuel for thought. Read it, reread it, and consider it. It really is the gospel of SQL Server indexing.

PS: Of course, most of these arguments won't actually have much of an influence on you if you're working with only a few hundred or a few thousand rows. However, those elements become very critical and vital to comprehend once you reach tens or hundreds of thousands of rows or when you begin counting in millions.

If you would like to have yourPKGUID the other column as your secondary key (but not as your clustering key), andMYINT (INT IDENTITY Use this as your clustering key:

CREATE TABLE dbo.MyTable
(PKGUID UNIQUEIDENTIFIER NOT NULL,
 MyINT INT IDENTITY(1,1) NOT NULL,
 .... add more columns as needed ...... )

ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID)

CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT)

Basically, all you have to do is inform thePRIMARY KEY restriction that it'sNONCLUSTERED (if not, it is automatically formed as your clustered index) - and you then build a second index that is specified asCLUSTERED

If you need to "re-engineer" an existing system for performance, this will work and is a viable solution. If you create a new system from scratch and are not involved in a replication situation, I recommend choosingID INT IDENTITY(1,1) I use it as my clustered primary key since it is the most effective option.

483
8/13/2012 4:59:39 PM

Popular Answer

Since 2005, I've used GUIDs as PKs. It is without a doubt the greatest method for combining dispersed data in the field of distributed databases. Without having to worry about integers matching across connected tables, you can fire and forget about merge tables. Joins for GUIDs may be replicated without incident.

The way I use GUIDs is as follows:

  1. GUID = PK High row tables (over 50 million entries) may need table splitting or other performance measures since GUIDs are indexed similarly to strings. As SQL Server becomes more and more efficient, performance issues become less and less relevant.

  2. NON-Clustered index is the PK Guid. A GUID should only ever be cluster indexed if it is NewSequentialID. Even so, a server reset will result in significant ordering issues.

  3. Every table should have a ClusterID Int. Your table is ordered by this CLUSTERED Index.

  4. Although joining on ClusterIDs (int) is more effective, performance is unaffected since I deal with tables that include 20–30 million records. Use the ClusterID concept as your main key and join on ClusterID if you want the best performance.

My email table is below.

CREATE TABLE [Core].[Email] (
    [EmailID]      UNIQUEIDENTIFIER CONSTRAINT [DF_Email_EmailID] DEFAULT (newsequentialid()) NOT NULL,        
    [EmailAddress] NVARCHAR (50)    CONSTRAINT [DF_Email_EmailAddress] DEFAULT ('') NOT NULL,        
    [CreatedDate]  DATETIME         CONSTRAINT [DF_Email_CreatedDate] DEFAULT (getutcdate()) NOT NULL,      
    [ClusterID] INT NOT NULL IDENTITY,
    CONSTRAINT [PK_Email] PRIMARY KEY NonCLUSTERED ([EmailID] ASC)
);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_Email_ClusterID] ON [Core].[Email] ([ClusterID])
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_Email_EmailAddress] ON [Core].[Email] ([EmailAddress] Asc)


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow