Commit 469dcd7

Update create_index_from_csv.md
1 parent d4b8964 commit 469dcd7

1 file changed

+8
-16
lines changed

docs/create_index_from_csv.md

Lines changed: 8 additions & 16 deletions
@@ -1,16 +1,12 @@
11
### Create an Azure search index from a csv file
22
:sparkles: Here we outline how to create an Azure search index from a CSV file summarizing funded award data exported from Reporter.nih.gov.
33

4-
### 1) Generate input CSV
4+
### 1) Download input CSV
55
:ear: If you already have your csv ready, skip to section (2)
66

7-
Our input data comes from the csv export option for [Reporter.nih.gov](https://reporter.nih.gov/). Navigate to reporter.nih.gov and select `Advanced Search`. Input your search parameters. In this case we filtered for awards made by NIGMS in FY 23. In the top right, select `Export`.
7+
Download this public [csv file](https://www.kaggle.com/datasets/henryshan/2023-data-scientists-salary?resource=download) from kaggle to use as our input.
88

9-
Select your export columns and make sure you export as a csv. In the example input data file we only selected 'Title', 'Project_ID', and 'Total_Cost', although a few other columns were also exported.
10-
11-
![Export from Reporter](/docs/images/1_export_reporter_csv.png)
12-
13-
If you upload through the UI, you need to make two small edits to the exported csv. First, remove the extra comma at the end of each line. Second, replace the spaces in the column names in the header row (for example, with underscores). You can do this with something like Python, or with a find/replace in a text editor.
9+
![Kaggle-csv](/docs/images/kaggle-input.jpeg)
1410

1511
### 2) Import data into Azure blob storage
1612
:ear: If you already added your data to blob storage skip to section (3)
@@ -35,41 +31,37 @@ Navigate to AI Search and [create a new search](https://learn.microsoft.com/en-u
3531

3632
![Create new search](/docs/images/5_create_new_db.png)
3733

38-
Click `Import data`
34+
Click `Import data`.
3935

4036
![Import Data](/docs/images/6_import_data.png)
4137

4238
Now fill out all the necessary parameters.
4339
+ Data Source: Select `Azure Blob Storage`. New options will drop down.
44-
+ Data source name: This can be anything, but go with something like `grant-data`.
40+
+ Data source name: This can be anything, but go with something like `ds-salaries-data`.
4541
+ Data to extract: Select `Content and metadata`.
4642
+ Parsing mode: Select `Delimited text`. Check the `First Line Contains Header` box and leave `Delimiter Character` as `,`.
47-
+ Delimiter Headers: Enter the comma-delimited list of column headers.
4843
+ Connection string: Click `Choose an existing connection` and navigate to your storage account and container.
4944
+ Managed identity authentication: Leave as default.
5045
+ Container name: Should be populated when you connect via Connection String, but otherwise just enter your container name here.
5146
+ Blob folder: *Optional*, if you have a folder within the container with the file(s) you want to index, enter that path here.
5247
+ Description: *Optional*.
5348
+ If you get errors when trying to go to the next screen, make sure your csv has no trailing commas and no spaces in the header names. If it does, fix those errors, re-upload to blob storage, and then try again!
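
The trailing-comma and header-space fixes mentioned in the last bullet can be scripted. The following is a minimal sketch, not part of the original tutorial: the function name `clean_csv_text` is hypothetical, underscores are an assumed replacement for spaces, and the trailing column is only dropped when the header line itself ends in a comma.

```python
import csv
import io


def clean_csv_text(raw: str) -> str:
    """Drop the empty column left by trailing commas and remove spaces from header names."""
    rows = list(csv.reader(io.StringIO(raw, newline="")))
    if not rows:
        return ""
    header = rows[0]
    # A trailing comma on the header line parses as an empty final column name.
    has_trailing_comma = bool(header) and header[-1] == ""
    cleaned = []
    for index, row in enumerate(rows):
        # Only drop the last cell when it is the empty one created by the trailing comma.
        if has_trailing_comma and row and row[-1] == "":
            row = row[:-1]
        if index == 0:
            # Assumption: underscores are an acceptable replacement for spaces.
            row = [name.strip().replace(" ", "_") for name in row]
        cleaned.append(row)
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()
```

To use it on a file, read the file's text, pass it to `clean_csv_text`, and write the result to a new file before re-uploading to blob storage. Parsing with the `csv` module rather than plain string replacement keeps quoted values that contain commas intact.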
5449

55-
![Connect to blob](/docs/images/7_connect_to_blob.png)
50+
![Connect to blob](/docs/images/import-data.jpeg)
5651

5752
Skip ahead to `Customize target index`.
5853
+ Give your index a name.
5954
+ Make `Project_Number` your key.
6055
+ Make sure the expected column names are present under fields. For the columns you expect to use, select `Retrievable` and `Searchable`. If you select every column, you will pay to index fields you never use.
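
For readers who prefer to see the choices above as data, here is a rough sketch of the kind of index definition these portal settings correspond to. It is an illustration under assumptions, not the tutorial's method: the helper `build_index_definition`, the index name, and the column list are hypothetical, every column is typed as `Edm.String`, and the portal wizard may add fields or pick different types.

```python
import json


def build_index_definition(index_name, columns, key_column, used_columns):
    """Build a dict shaped like an Azure AI Search index definition."""
    if key_column not in columns:
        raise ValueError(f"key column {key_column!r} is not in the CSV header")
    used = set(used_columns)
    fields = []
    for name in columns:
        is_key = name == key_column
        fields.append(
            {
                "name": name,
                # Assumption: every column is treated as a string.
                "type": "Edm.String",
                "key": is_key,
                # The key stays retrievable so results can be identified.
                "retrievable": is_key or name in used,
                # Only the columns you plan to query are marked searchable.
                "searchable": name in used,
            }
        )
    return {"name": index_name, "fields": fields}


definition = build_index_definition(
    index_name="grants-index",  # hypothetical index name
    columns=["Project_Number", "Title", "Total_Cost", "Contact_PI"],  # illustrative header
    key_column="Project_Number",
    used_columns=["Title", "Total_Cost"],
)
print(json.dumps(definition, indent=2))
```

In this sketch `Contact_PI` ends up neither retrievable nor searchable, which mirrors leaving its boxes unchecked in the portal.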
6156

62-
![Customize index](/docs/images/8_target_index.png)
57+
![Customize index](/docs/images/index-csv.jpeg)
6358

6459
Advance to `Create an indexer`, name your indexer, then click `Submit`.
6560

66-
![Create indexer](/docs/images/9_create_indexer.png)
61+
![Create indexer](/docs/images/create-indexer.jpeg)
6762

6863
Navigate to `Indexes` on the left panel and wait until your index shows as many documents as you have data lines in your file (the header line is not counted). It will read 0 documents until it has finished indexing. The example 500-line csv takes about one minute.
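
To know what document count to wait for, you can count the data rows locally. This is a small sketch with a hypothetical helper name, `count_csv_records`; it assumes one document per non-blank row and that the header row is not indexed, so treat the number as an estimate if your file has unusual quoting.

```python
import csv
import io


def count_csv_records(stream, has_header=True):
    """Count non-blank CSV rows in a text stream, excluding the header by default."""
    rows = sum(1 for row in csv.reader(stream) if row)
    if has_header:
        return max(rows - 1, 0)
    return rows


# Inline sample data; for a real file, pass open(path, newline="", encoding="utf-8").
sample = io.StringIO("Title,Total_Cost\nA,1\nB,2\n")
print(count_csv_records(sample))  # prints 2
```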
6964

70-
![Check index](/docs/images/10_check_index.png)
71-
72-
7365
And that is it! Now return to [the tutorial notebook to run queries against this csv using GPT-4](/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb).
7466

7567
