Commit 23b23ac

Merge pull request #5438 from segmentio/identifiers-guide
Add Identifiers Guide
2 parents 9b87101 + d49080b commit 23b23ac

1 file changed

+78
-0
lines changed

src/guides/working-with-ids.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
title: Working with Identifiers
hidden: true
---

> warning "Critical Segment recommendation"
> Segment recommends that you use `uuidv4` for `anonymousId`.

As part of your Segment implementation, you’ll come across various identifiers (IDs) that Segment’s systems may process. The three most prominent identifiers you’ll encounter are `anonymousId`, `userId`, and `groupId`.

This guide explains the most common Segment IDs, why Segment recommends formats like `uuidv4`, and other ID mechanics.

## Understanding the standard identifiers

This section explains the purpose of the three primary IDs and introduces the other two categories that may come into play as you expand your CDP implementation.

### Purpose

A critical function of the Segment CDP is identifying a user over time. To do this, Segment’s default implementations use two identifiers, `anonymousId` and `userId`.

The following table describes the purposes of these two IDs, as well as `groupId`:

| Identifier    | Purpose |
| ------------- | ------- |
| `anonymousId` | `anonymousId` tracks the activity of a user who hasn't been identified yet. It lets you attach an identifier to an anonymous user and helps ensure that all data is captured before Segment identifies the user through a `userId`. |
| `userId`      | `userId` comes into play once Segment has identified a user, which usually occurs through a form of authentication, like a login. |
| `groupId`     | `groupId` lets you capture B2B relationships between individual users and the groups they may represent, serving as an identifier for these groups. |

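To make the roles in the table concrete, the following sketch shows where each identifier could appear in Track, Identify, and Group payloads. The field names (`type`, `event`, `anonymousId`, `userId`, `groupId`, `traits`) follow the Segment Spec. The event name, the trait, and all ID values are hypothetical placeholders.

```python
import uuid

# Placeholder IDs: in practice, userId and groupId come from your own database.
anonymous_id = str(uuid.uuid4())
user_id = str(uuid.uuid4())
group_id = str(uuid.uuid4())

# Before login, only the anonymousId is known.
track_call = {"type": "track", "event": "Page Viewed", "anonymousId": anonymous_id}

# After login, sending both IDs ties the earlier anonymous activity to the user.
identify_call = {
    "type": "identify",
    "anonymousId": anonymous_id,
    "userId": user_id,
    "traits": {"plan": "pro"},
}

# A Group call associates the identified user with the group they belong to.
group_call = {"type": "group", "userId": user_id, "groupId": group_id}
```
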
### Identifier generation

Here's how Segment generates the IDs you just learned about:

#### `anonymousId` generation

`anonymousId` generation depends on which of the two types of libraries (or SDKs) you use. Client-side libraries, like web and mobile, automatically generate a [universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier){:target="_blank"} (UUID), whereas server-side libraries, like .NET, Node.js, and Java, require you to generate these IDs yourself. You can also set the `anonymousId` manually in client-side libraries and SDKs.

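As an illustration of generating the ID yourself on a server, here is a minimal sketch that uses Python's standard `uuid` module. The helper names `new_anonymous_id` and `build_track_payload` are hypothetical and aren't part of any Segment library. The sketch only builds the payload and doesn't send it.

```python
import uuid
from typing import Optional


def new_anonymous_id() -> str:
    """Return a random (version 4) UUID in its canonical string form."""
    return str(uuid.uuid4())


def build_track_payload(event: str, anonymous_id: Optional[str] = None) -> dict:
    """Hypothetical helper: build a Track payload, generating an ID if none is given."""
    return {
        "type": "track",
        "event": event,
        "anonymousId": anonymous_id or new_anonymous_id(),
    }


# No ID supplied, so the helper generates one.
payload = build_track_payload("Order Completed")
```
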
#### `userId` generation

`userId` is a canonical identifier that you generate on your side, no matter what library or SDK you're using. Because `userId` is woven into your service or product delivery, it has the highest fidelity.

#### `groupId` generation

`groupId` generation is identical to `userId` generation. You generate `groupId` and maintain it off-platform in your customer database.

## Segment's guidance on identifier formats

As you work with identifiers, **Segment recommends that you use `uuidv4` for `anonymousId`**. The following table lists the criteria that Segment recommends your identifiers satisfy, as well as why Segment recommends `uuidv4`:

| Trait                  | Reasoning |
| ---------------------- | --------- |
| Global uniqueness      | `uuidv4` generates statistically unique identifiers without needing a central authority or coordination between systems. This is ideal for distributed systems. |
| Non-sequential         | Unlike incremental integer IDs, `uuidv4` generates non-sequential IDs. This offers a security advantage, as it makes it harder for malicious users to guess other valid IDs. |
| No information leakage | `uuidv4` doesn't reveal information about the data it's associated with, unlike other ID generation strategies that may encode information about the data, its creation time, or, even worse, personal data on the individual it identifies. |
| Standardized           | UUIDs are standardized, which means they are widely recognized and supported across various platforms and languages. |
| No collision           | The likelihood of collision, or the generation of two identical UUIDs, is vanishingly small, even after the generation of billions of UUIDs. |
| Easy generation        | You can generate `uuidv4` easily, and implementations exist for virtually all programming languages. |

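To put a rough number on the collision row above, the following sketch applies the standard birthday-bound approximation. It assumes every ID comes from a well-seeded random source and relies on the fact that a version 4 UUID carries 122 random bits (128 bits minus 6 fixed version and variant bits). The function name is hypothetical.

```python
import math

RANDOM_BITS = 122  # 128 bits minus the 6 fixed version and variant bits


def collision_probability(n: int) -> float:
    """Birthday-bound approximation: p is about 1 - exp(-n(n-1) / (2 * 2**122))."""
    space = 2 ** RANDOM_BITS
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


# Approximate chance of at least one duplicate among one billion IDs.
p = collision_probability(1_000_000_000)
```

Under these assumptions, the result for one billion IDs is on the order of 1e-19.
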
### Persistence and resetting

This section explains the persistence of client-side and server-side identifiers.

#### Client-side persistence

Most client-side libraries and SDKs store the identifiers they use in some form of memory, like cookies and `localStorage` on the web or in-memory databases on mobile devices.

This simplifies persistence and, in most cases, lets libraries and SDKs fetch IDs automatically from memory, so that you don't have to send every ID explicitly. Because the person using a device may change, though, CDPs let you reset these IDs. For Segment, the corresponding method is `analytics.reset()`.

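The persist-then-reset behavior described above can be modeled in a few lines. This is a simplified, hypothetical model and not Segment's implementation: the `IdentityStore` class and its methods are invented for illustration, a plain dictionary stands in for cookies or `localStorage`, and the sketch assumes that a reset clears both the stored `userId` and the stored `anonymousId`.

```python
import uuid
from typing import Dict, Optional


class IdentityStore:
    """Hypothetical model of client-side ID persistence (not Segment's code)."""

    def __init__(self, storage: Dict[str, str]) -> None:
        # `storage` stands in for cookies or localStorage.
        self.storage = storage

    def anonymous_id(self) -> str:
        # Reuse the stored ID if present; otherwise generate and persist one.
        if "anonymousId" not in self.storage:
            self.storage["anonymousId"] = str(uuid.uuid4())
        return self.storage["anonymousId"]

    def identify(self, user_id: str) -> None:
        self.storage["userId"] = user_id

    def user_id(self) -> Optional[str]:
        return self.storage.get("userId")

    def reset(self) -> None:
        # Forget the current user so the next person starts with fresh IDs.
        self.storage.pop("userId", None)
        self.storage.pop("anonymousId", None)


store = IdentityStore({})
first = store.anonymous_id()   # generated and persisted on first access
store.identify("user-123")     # placeholder userId
store.reset()                  # for example, on logout
second = store.anonymous_id()  # a new ID is generated after the reset
```
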
#### Server-side persistence

Servers don't have this kind of memory readily available. Because of this, you'd need to deploy ID persistence as a custom component on your infrastructure.

Segment finds that this is rarely necessary, however, because most servers only process data on known users rather than anonymous users. As a result, servers will already have access to a `userId`. Because there is no ID persistence in requests to your CDP, you won't need to worry about resetting.

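For the rare case where a server does handle anonymous traffic, a custom persistence component could look something like the sketch below. It assumes your application already has some session key per visitor. The dictionary is a hypothetical stand-in for a shared store such as a database or cache, and the function name is invented for illustration.

```python
import uuid
from typing import Dict

# Hypothetical stand-in for a shared store such as a database or cache.
session_to_anonymous_id: Dict[str, str] = {}


def anonymous_id_for_session(session_key: str) -> str:
    """Return the anonymousId tied to a session, creating it on first use."""
    if session_key not in session_to_anonymous_id:
        session_to_anonymous_id[session_key] = str(uuid.uuid4())
    return session_to_anonymous_id[session_key]


# Repeated requests in the same session reuse the same anonymousId.
first_request_id = anonymous_id_for_session("session-abc")
second_request_id = anonymous_id_for_session("session-abc")
```
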
## Going beyond the default

While this guide focuses on `anonymousId`, `userId`, and `groupId`, other identifiers also exist, like IDFA, system IDs, and so on. Such identifiers vary in their origin, importance, and persistence. Often, these identifiers are system-generated and, as a result, don't require conscious design decisions as you implement your CDP.

Segment recommends applying the formatting criteria discussed on this page to, at a minimum, `anonymousId`, `userId`, and `groupId`. Segment also recommends that you use these criteria for other identifiers you may work with, even beyond Segment's standard IDs.
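
One way to check whether an existing identifier already matches the recommended format is a small validation helper. The sketch below uses Python's standard `uuid` module. The function name `is_uuid_v4` is hypothetical, and the check is deliberately strict: it accepts only the canonical hyphenated form, in either letter case.

```python
import uuid


def is_uuid_v4(value: str) -> bool:
    """Return True if `value` is a canonical, hyphenated version 4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # Reject other UUID versions and non-canonical spellings (braces, URN prefix).
    return parsed.version == 4 and str(parsed) == value.lower()


candidate = str(uuid.uuid4())
candidate_is_valid = is_uuid_v4(candidate)
legacy_is_valid = is_uuid_v4("12345")  # a sequential-style ID fails the check
```
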
