Commit e1e4b28

Lola Davidson authored and ccreutzi committed
Add Data Analysis Examples
1 parent 97b223b commit e1e4b28

14 files changed: +533 -3 lines changed

README.md

Lines changed: 4 additions & 1 deletion
@@ -59,6 +59,9 @@ For more information about how to connect to the different APIs from MATLAB, inc
 ### Agentic Workflows
 - [Solve Simple Math Problem Using AI Agent](/examples/SolveSimpleMathProblemUsingAIAgent.md)
 - [Fit Polynomial to Data Using AI Agent](/examples/FitPolynomialToDataUsingAIAgentExample.md) (requires Curve Fitting Toolbox™)
+### Data Analysis
+- [Analyze Table Data Using ChatGPT](/examples/AnalyzeTableDataUsingChatGPTExample.md)
+- [Visualize Table Data Using ChatGPT](/examples/VisualizeTableDataUsingChatGPTExample.md)
 ### Tool Calling
 - [Analyze Scientific Papers Using ChatGPT Function Calls](/examples/AnalyzeScientificPapersUsingFunctionCalls.md)
 - [Analyze Text Data Using Parallel Function Calls with ChatGPT](/examples/AnalyzeTextDataUsingParallelFunctionCallwithChatGPT.md)
@@ -101,4 +104,4 @@ The license is available in the [license.txt](license.txt) file in this GitHub r
 ## Community Support
 [MATLAB Central](https://www.mathworks.com/matlabcentral)

-Copyright 2023-2025 The MathWorks, Inc.
+Copyright 2023-2026 The MathWorks, Inc.
Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
# Analyze Table Data Using ChatGPT

This example shows how to generate suggestions for analyzing tabular data in MATLAB® using ChatGPT™.

First, build a prompt that describes table data to ChatGPT. Next, generate insights and suggestions for data analysis in MATLAB.

# Setup

Using the OpenAI® API requires an OpenAI API key. For information on how to obtain an OpenAI API key, as well as pricing, terms and conditions of use, and information about available models, see the OpenAI documentation at [https://platform.openai.com/docs/overview](https://platform.openai.com/docs/overview).

To connect to the OpenAI API from MATLAB using LLMs with MATLAB, specify the OpenAI API key as an environment variable and save it to a file called ".env".

![image_0.png](AnalyzeTableDataUsingChatGPTExample_media/image_0.png)

To connect to OpenAI, the ".env" file must be on the search path.

Load the environment file using the `loadenv` function.

```matlab
loadenv(".env")
```
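
A `.env` file is plain text with one `NAME=value` pair per line. As a minimal sketch of that format, the following snippet writes a throwaway file to the temporary folder, loads it, and reads the value back. The file name `example.env`, the variable name `EXAMPLE_API_KEY`, and its value are placeholders made up for illustration; for the real key, the variable name is assumed here to be `OPENAI_API_KEY`.

```matlab
% Hypothetical file and variable names, used only to illustrate the format.
envFile = fullfile(tempdir,"example.env");
writelines("EXAMPLE_API_KEY=not-a-real-key",envFile);

% Load the NAME=value pairs into the environment of the MATLAB session.
loadenv(envFile)

% Confirm that the variable is now visible to MATLAB.
loadedValue = string(getenv("EXAMPLE_API_KEY"));
disp(loadedValue)

% Remove the throwaway file.
delete(envFile)
```

Reading the variable back with `getenv` is a quick way to confirm that the file was found and parsed before any API call is made.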

# Describe Table to ChatGPT

Create a table containing data that represents domestic airline flights in the United States in 2008.

A sample of this dataset will be sent to the AI model as part of the system prompt.

```matlab
airlineData = readtable("airlinesmall_subset.xlsx",Sheet="2008");
```

Calculate summary statistics to describe the table variables. Include statistics that might be useful for string and numeric data.

```matlab
summaryStruct = summary(airlineData,Statistics=["nummissing" "numunique" "min" "max" "mean"]);
```

Convert the summary statistics to JSON\-formatted text.

```matlab
summaryString = string(jsonencode(summaryStruct,ConvertInfAndNaN=false));
```
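
Summary statistics can contain `NaN` or `Inf` values. By default, `jsonencode` writes these as `null`; with `ConvertInfAndNaN=false` it writes them as the literal tokens `NaN`, `Infinity`, and `-Infinity`, which are not strict JSON but keep the distinction visible in the prompt. The following sketch compares the two settings on a made-up struct; the field names and values are hypothetical.

```matlab
% Made-up values for illustration only.
stats = struct("mean",NaN,"max",Inf,"min",-2);

% Default encoding: NaN and Inf are written as null.
defaultText = string(jsonencode(stats));

% Encoding used in this example: NaN and Inf are kept as literal tokens.
literalText = string(jsonencode(stats,ConvertInfAndNaN=false));

disp(defaultText)
disp(literalText)
```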

To clearly identify rows in the table, add row labels. Then, capture a random 5\-row sample of the data.

```matlab
dataSample = airlineData;
dataSample = addvars(dataSample,"Row " + (1:height(dataSample))', ...
    NewVariableNames="RowLabels",Before=1);
rng default
randomIdx = randperm(height(dataSample),5);
randomIdx = sort(randomIdx);
dataSample = dataSample(randomIdx,:);
```

Convert the sample data to JSON\-formatted text.

```matlab
sampleString = string(jsonencode(dataSample,ConvertInfAndNaN=false));
```
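
To see what the model receives, the same row-label and sampling steps can be tried on a small made-up table. `jsonencode` turns a table into a JSON array with one object per row, so the `RowLabels` value travels with each sampled row. The table `toyData` and its variables are hypothetical stand-ins, not part of the airline dataset.

```matlab
% Hypothetical toy table standing in for airlineData.
toyData = table((101:110)',["AA";"DL";"UA";"AA";"WN";"DL";"UA";"WN";"AA";"DL"], ...
    VariableNames=["FlightNum" "Carrier"]);

% Add a label column so each row can be traced back to the full table.
toySample = addvars(toyData,"Row " + (1:height(toyData))', ...
    NewVariableNames="RowLabels",Before=1);

% Draw three distinct rows and keep them in their original order.
rng default
rowIdx = sort(randperm(height(toySample),3));
toySample = toySample(rowIdx,:);

% Encode the sample; each row becomes one JSON object.
toyString = string(jsonencode(toySample));
decoded = jsondecode(toyString);
disp(toyString)
```

Sorting the sampled indices keeps the rows in table order, which makes the labels easier to relate back to the full dataset.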

Combine the summary and sample into a full description of the table.

```matlab
dataName = "airlineData";
dataDescription = "The MATLAB workspace contains a table with the name `" + dataName + "`." + newline + ...
    "Here are the basic summary statistics: " + newline + summaryString + newline + ...
    "Here is a random 5-row sample of the dataset: " + newline + sampleString;
```

Create a system prompt for ChatGPT that includes the data description. In the prompt, specify that responses typically include MATLAB code.

```matlab
systemPrompt = "You are a chat assistant designed to help analyze " + ...
    "tabular data using MATLAB. Your responses are concise and " + ...
    "typically contain MATLAB code snippets or suggest specific MATLAB functions." + ...
    newline + dataDescription;
```

Connect to the OpenAI Chat Completion API using the [`openAIChat`](../doc/functions/openAIChat.md) function. Specify the model name.

```matlab
mdl = openAIChat(systemPrompt,ModelName="gpt-4.1-mini");
```

# Ask ChatGPT Questions About Data

You can ask ChatGPT for insights into your data and suggestions for analysis in MATLAB. For example, you can ask for an overview of the data, or ask how to clean up the data and visualize it.

```matlab
generate(mdl,"Give me a high level overview of this dataset with a few interesting insights.")
```

```matlabTextOutput
ans =
"This airlineData dataset contains flight records from the year 2008, with 1753 entries. It includes details such as dates (Month, DayofMonth, DayOfWeek), times (Departure, Arrival times both actual and scheduled), airline carriers, flight numbers, tail numbers, elapsed times, delays, and cancellation/diversion status.

Key variables:
- Flight identifiers: UniqueCarrier (20 unique carriers), FlightNum, TailNum
- Timing: DepTime, ArrTime, CRSDepTime, CRSArrTime, ActualElapsedTime, AirTime
- Delays: ArrDelay, DepDelay, CarrierDelay, WeatherDelay, SecurityDelay, LateAircraftDelay
- Locations: Origin and Dest airports (182 unique origins, 183 unique destinations)
- Distances and Taxi times: Distance, TaxiIn, TaxiOut
- Cancellation and diversion info

Insight highlights:
- The mean arrival delay is about 10 minutes, with a max delay of 567 minutes indicating some heavy delays.
- The average flight distance is around 706 miles.
- A small portion of flights were canceled (around 1.37%) or diverted (about 0.34%).
- Many delay-related variables have >75% missing values which may reflect only delays when applicable.
- The departure delay (mean ~11 minutes) is close to arrival delay, indicating delays accumulate through the flight.
- There is variation in scheduled versus actual times, with some flights departing or arriving earlier/later than scheduled.
- Taxi out times are generally longer than taxi in times (16.8 min vs 7 min average).

Would you like a specific analysis or visualization for any aspect in this dataset?"


```

```matlab
generate(mdl,"Describe how I can clean up this data for further analysis in MATLAB.")
```

```matlabTextOutput
ans =
"To clean up the airlineData table for further analysis in MATLAB, you can follow these steps:

1. Handle missing values:
   - Identify columns with missing values using `ismissing`.
   - For delay columns (CarrierDelay, WeatherDelay, etc.), replace missing with 0 if appropriate or remove rows with missing critical values.
2. Remove or impute outliers if needed.
3. Convert categorical variables from cell arrays to categorical type.
4. Remove or filter canceled and diverted flights if your analysis excludes them.
5. Fix data types for time columns if you need to analyze time (convert to datetime or duration).
6. Remove unnecessary columns or rename for clarity.

Here is example code snippets:

```matlab
% 1. Replace NaNs in delay columns with zero
delayCols = {'CarrierDelay','WeatherDelay','SDelay','SecurityDelay','LateAircraftDelay'};
for i = 1:length(delayCols)
    col = delayCols{i};
    airlineData.(col)(ismissing(airlineData.(col))) = 0;
end

% 2. Convert cellular columns to categorical
airlineData.UniqueCarrier = categorical(airlineData.UniqueCarrier);
airlineData.Origin = categorical(airlineData.Origin);
airlineData.Dest = categorical(airlineData.Dest);
airlineData.CancellationCode = categorical(airlineData.CancellationCode);

% 3. Remove canceled and diverted flights if needed
airlineData = airlineData(airlineData.Cancelled==0 & airlineData.Diverted==0, :);

% 4. Convert times to datetime or duration (optional)
% For example, convert CRSDepTime and CRSArrTime to duration from midnight
convertTime = @(t) hours(floor(t/100)) + minutes(mod(t,100));
airlineData.CRSDepTime = convertTime(airlineData.CRSDepTime);
airlineData.CRSArrTime = convertTime(airlineData.CRSArrTime);

% 5. Remove or impute other missing values if needed (e.g., DepTime, ArrTime)
% For example, remove rows with missing DepTime or ArrTime
airlineData = airlineData(~ismissing(airlineData.DepTime) & ~ismissing(airlineData.ArrTime), :);
```

This should prepare your data for subsequent analysis. Let me know if you need code for specific cleaning or preprocessing tasks."


```

```matlab
generate(mdl,"Give me a variety of visualizations I can create in MATLAB to explore this data.")
```

```matlabTextOutput
ans =
"Here are several types of visualizations you can create in MATLAB to explore the airlineData table:

1. Histogram of Arrival Delays
```matlab
histogram(airlineData.ArrDelay)
xlabel('Arrival Delay (minutes)')
ylabel('Frequency')
title('Histogram of Arrival Delays')
```

2. Boxplot of Departure Delays by Day of Week
```matlab
boxplot(airlineData.DepDelay, airlineData.DayOfWeek)
xlabel('Day of Week')
ylabel('Departure Delay (minutes)')
title('Departure Delays by Day of Week')
```

3. Scatter plot of Distance vs Actual Elapsed Time
```matlab
scatter(airlineData.Distance, airlineData.ActualElapsedTime)
xlabel('Distance (miles)')
ylabel('Actual Elapsed Time (minutes)')
title('Distance vs Actual Elapsed Time')
```

4. Bar chart of number of flights by Month
```matlab
counts = groupcounts(airlineData.Month);
bar(1:12, counts)
xlabel('Month')
ylabel('Number of Flights')
title('Number of Flights per Month')
```

5. Boxplot of Arrival Delay by Carrier
```matlab
boxplot(airlineData.ArrDelay, airlineData.UniqueCarrier)
xlabel('Carrier')
ylabel('Arrival Delay (minutes)')
title('Arrival Delay by Carrier')
```

6. Scatter plot of DepDelay vs ArrDelay with color indicating Cancelled status
```matlab
gscatter(airlineData.DepDelay, airlineData.ArrDelay, airlineData.Cancelled, 'br', 'xo')
xlabel('Departure Delay (minutes)')
ylabel('Arrival Delay (minutes)')
title('Departure vs Arrival Delay by Cancelled Status')
legend({'Not Cancelled', 'Cancelled'})
```

7. Time series of average arrival delay by day
```matlab
dailyAvgDelay = varfun(@mean, airlineData, 'InputVariables', 'ArrDelay', ...
    'GroupingVariables', {'Month', 'DayofMonth'});
plot(datenum(2008, dailyAvgDelay.Month, dailyAvgDelay.DayofMonth), dailyAvgDelay.mean_ArrDelay)
datetick('x', 'mmm-dd')
xlabel('Date')
ylabel('Average Arrival Delay (minutes)')
title('Daily Average Arrival Delay')
```

If you want code examples for any other specific visualizations or analyses, just ask!"


```

*Copyright 2026 The MathWorks, Inc.*