Commit f74515c

Updating mapping example
1 parent 2eb408b commit f74515c

File tree

10 files changed

+22
-14
lines changed

learn-pr/azure/building-end-to-end-data-governance-master-data-stack-with-microsoft-purview-cluedin/includes/5-map-data.md

Lines changed: 22 additions & 14 deletions
@@ -4,30 +4,30 @@ Interestingly, CluedIn doesn't mandate that we onboard this data in any specific
44

55
So, we can onboard the data in the order we find them in the datasources tab:
66

7-
1. Navigate through Integrations in CluedIn to the ingestion endpoint called Companies.csv. You can do this by selecting **Integrations**, then **Data Sources**, then selecting the data source, then the endpoint that drops down beneath it, and then you'll see the copy data automation from Companies.csv. Select that and you'll see your data!
7+
1. Navigate through Integrations in CluedIn to the ingestion endpoint called Employees.csv. You can do this by selecting **Integrations**, then **Data Sources**, then selecting the data source, then the endpoint that drops down beneath it, and then you'll see the copy data automation from Employees.csv. Select that and you'll see your data!
88

99
>[!NOTE]
1010
> It's worth mentioning that this isn't a CSV file hosted in CluedIn; it's the raw data hosted in the staging area in CluedIn. You'll see a preview of 10 rows of data, including the companies in our original CSV file.
1111
12-
:::image type="content" source="../media/Data_Preview_Arca.png" alt-text="Screenshot of the CluedIn window, showing the companies.csv window.":::
12+
:::image type="content" source="../media/preview-employees.png" alt-text="Screenshot of the CluedIn window, showing the employees.csv window.":::
1313

1414
1. CluedIn may prompt you to map the data, in which case you'll select the **Create Mapping** button. If not, you can select the **Map** tab and then the **Map data** button.
1515

16-
:::image type="content" source="../media/Automapping_New_Employee.png" alt-text="Screenshot of the Create Mapping page in CluedIn, showing the configure tab.":::
16+
:::image type="content" source="../media/automapping-new-employee.png" alt-text="Screenshot of the Create Mapping page in CluedIn, showing the configure tab.":::
1717

1818
1. Choose **Auto Mapping**.
1919

20-
1. On the **Configure** tab, you'll choose **Entity Type**. Add **Company** here, and select the prompt that says **create "Company"** and select an icon for your new entity type.
20+
1. On the **Configure** tab, you'll choose **Entity Type**. Add **Employee** here, and select the prompt that says **create "Employee"** and select an icon for your new entity type.
2121

2222
An Entity Type in CluedIn can be thought of as the Domain of the data we're working with. This is typically some type of noun such as a Person, Company, Vendor, or Dog.
2323

2424
CluedIn provides all of the domains in the Microsoft Common Data Model by default and some of the domains you'll need to master won't always fall into the ones that already exist.
2525

26-
1. Next, we're going to create a new **Vocabulary**. Think of this as the structure or "schema" of the domain. In the vocabulary box, you can search **Company** again and select the **create "Company"** option. We're going to create a new structure that matches our data. Follow the prompts to create a new vocabulary, and you'll see a preview of the new structure that contains all the columns names the CSV has.
26+
1. Next, we're going to create a new **Vocabulary**. Think of this as the structure or "schema" of the domain. In the vocabulary box, you can search **Employee** again and select the **create "Employee"** option. We're going to create a new structure that matches our data. Follow the prompts to create a new vocabulary, and you'll see a preview of the new structure that contains all the column names in the CSV.
2727

28-
:::image type="content" source="../media/Automapping_Create_Vocabulary.png" alt-text="Screenshot of the Create Mapping page in CluedIn, showing the vocabulary selector on the configure tab.":::
28+
:::image type="content" source="../media/automapping-create-vocabulary.png" alt-text="Screenshot of the Create Mapping page in CluedIn, showing the vocabulary selector on the configure tab.":::
2929

30-
Instead of creating the model of a **Company** upfront before ingesting data, we'll instead use the data to create our model and as we start to see more data on companies, we'll evolve and change the model as we need to. This agile approach yields an automated way for us to integrate the data - but also allows us to work with changes in model alongside the data.
30+
Instead of creating the model of an **Employee** upfront before ingesting data, we'll use the data to create our model, and as we start to see more data on employees, we'll evolve and change the model as we need to. This agile approach gives us an automated way to integrate the data, and also allows us to work with changes in the model alongside the data.
3131

3232
1. Select **Create Mapping** to accept the default names. If you chose **Automated Mapping** as suggested, CluedIn will provide some automated *smarts* including the automated detection of potential unique identifiers of the data.
3333

@@ -37,23 +37,31 @@ So, we can onboard the data in the order we find them in the datasources tab:
3737

3838
1. Select the **Next** button and you'll be taken to the entity mapping where we can see how CluedIn will interpret some of the data in its core properties.
3939

40-
1. Notice that the Name of the entity in CluedIn has automatched to use the Company Name. Notice under the Origin Code Accordian that CluedIn has chosen the ID column as the Origin Code (Primary Key) and it has also chosen that both the ID and the Email most likely can be used as ways to uniquely look up each of the employees in this list.
40+
1. Notice that the Name of the entity in CluedIn has been automatched to the Employee Name. Under the Origin Code accordion, notice that CluedIn has chosen the ID column as the Origin Code (Primary Key), and that it has identified both the ID and the Email as likely ways to uniquely look up each of the employees in this list.
4141

42-
:::image type="content" source="../media/Remove_PersonId_Code.png" alt-text="Screenshot of the general details of the map entity tab of companies.csv.":::
42+
:::image type="content" source="../media/remove-personid-code.png" alt-text="Screenshot of the general details of the map entity tab of employee.csv.":::
43+
44+
1. Under **Entity Codes**, select the edit pencil next to **Person Id**.
45+
1. Select **Custom** and enter **EmployeeID** as the Custom Origin Value.
46+
This tells CluedIn that this value originates from a company employee ID. For key values it can be important to update these origins with labels that specify their actual origins, so CluedIn can match them together. CluedIn's autogenerated origins will be based on the file, but if pieces of data across several files come from the same place, you should specify that with a custom origin.
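The effect of a custom origin can be sketched in a few lines of Python. This is an illustration only, not CluedIn's implementation: the `entity_code` format, the `group_by_code` helper, and the sample ID `42` are all hypothetical. It shows why the same ID stays separate when each code carries a file-based origin, and lines up when both files use the shared `EmployeeID` origin.

```python
from collections import defaultdict


def entity_code(entity_type: str, origin: str, value: str) -> str:
    # Hypothetical code layout for illustration; not CluedIn's actual format.
    return f"/{entity_type}#{origin}:{value}"


def group_by_code(records):
    # Group source files by the entity code they produced.
    groups = defaultdict(list)
    for source_file, code in records:
        groups[code].append(source_file)
    return dict(groups)


# Origins derived from the file name: the same ID yields two different codes.
file_based = [
    ("Employees.csv", entity_code("Employee", "Employees.csv", "42")),
    ("Companies.csv", entity_code("Employee", "Companies.csv", "42")),
]

# A shared custom origin: the same ID yields one code in both files.
custom = [
    ("Employees.csv", entity_code("Employee", "EmployeeID", "42")),
    ("Companies.csv", entity_code("Employee", "EmployeeID", "42")),
]

file_based_groups = group_by_code(file_based)
custom_groups = group_by_code(custom)

print(len(file_based_groups))  # two distinct codes, so no match
print(len(custom_groups))      # one shared code, so the records can match
```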
4347

4448
1. Select **Next** to go to the next section that is called **Edges**.
4549

46-
In this section we'll specify if there are columns that refer to records that aren't Companies, but are some type of relationship to another record. This could easily be a relationship to a Domain of the same type (Company) or in this example, it seems to be referring to a Person (from the column name person_id). We already know, although it's a person, we could also say that it's referring to an Employee. The good part about CluedIn's mapping process, is that we can choose one type now and we can always come back and change this later, once we know more about our data. CluedIn will handle all the cleanup necessary.
50+
In this section we can specify if there are columns that refer to records that aren't Employees, but are some type of relationship to another record.
4751

48-
:::image type="content" source="../media/Person_Edge_Create.png" alt-text="Screenshot of the Add Edge menu on CluedIn.":::
52+
This could easily be a relationship to a Domain of the same type (Employee) or of a different type. For example:
53+
The Companies.csv file has a column called person_id. Judging by the column name, it refers to a Person. Because we already know that this person is an employee, when we create a mapping for companies we could also say that this column refers to an Employee. The good part about CluedIn's mapping process is that we can choose one type now and always come back and change it later, once we know more about our data. CluedIn will handle all the cleanup necessary.
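As a rough mental model, an edge can be thought of as a pointer from one record's code to another's. The Python sketch below is an assumption-laden illustration, not CluedIn's API: the `Edge` class, the `build_edges` helper, the `refers_to` relation name, the company `id` column, and the sample rows are all hypothetical. Only the `person_id` column name comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # code of the record that holds the reference
    relation: str  # hypothetical relation label
    target: str    # code of the record being referenced


def build_edges(company_rows, target_type="Employee", origin="EmployeeID"):
    # Turn each non-empty person_id into an edge to a record of target_type.
    edges = []
    for row in company_rows:
        person_id = row.get("person_id")
        if not person_id:
            continue  # no reference in this row, so no edge
        edges.append(
            Edge(
                source=f"/Company#Companies.csv:{row['id']}",
                relation="refers_to",
                target=f"/{target_type}#{origin}:{person_id}",
            )
        )
    return edges


# Hypothetical sample rows; the second row has no person reference.
rows = [
    {"id": "1", "person_id": "7"},
    {"id": "2", "person_id": ""},
    {"id": "3", "person_id": "9"},
]

employee_edges = build_edges(rows)
# Choosing a different target type later only changes the target code.
person_edges = build_edges(rows, target_type="Person")

for edge in employee_edges:
    print(edge.source, edge.relation, edge.target)
```

Keeping the target type as a parameter mirrors the point made above: the type chosen for the referenced record can be revisited later without touching the source data.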
4954

5055
1. Select **Finish** to leave the mapping.
5156

5257
1. Select the **Process** tab, then select **Process**, and **Confirm**.
5358

54-
:::image type="content" source="../media/Process_Companies.png" alt-text="Screenshot of the are you sure you want to process page in CluedIn.":::
55-
5659
1. Repeat the steps above for all the other files, until all the data is processed in the platform. For entity and vocabulary for the other data files, you can choose existing entities and vocabularies, or create new ones.
5760

5861
>[!NOTE]
59-
> For the next part of the learning module, you will need to repeat the process above with the other 6 files with their respective Domains, Vocabularies, Identifiers and more.
62+
> For the next part of the learning module, you will need to repeat the process above with the other 6 files with their respective Domains, Vocabularies, Identifiers and more.
63+
64+
>[!TIP]
65+
> When you're mapping your other files, follow these tips:
66+
>1. Update any key values with custom origin codes where multiple files reference the same information. For example, mark any employee GUIDs as the EmployeeID entity code.
67+
>1. Add edges where records of one type reference records of another type. For example, Companies referencing Employee IDs.