Commit ad29d41

Minor textual updates
1 parent f9d3665 commit ad29d41

2 files changed: 6 additions, 15 deletions

notebooks/CART_LawSchoolAdmissionBar.ipynb

Lines changed: 3 additions & 12 deletions
@@ -363,7 +363,7 @@
 "- ugpa: The student's undergraduate GPA, continous variable;\n",
 "- bar: Ground truth label indicating whether or not the student passed the bar, i.e. passed 1st time, passed 2nd time, failed, non-graduated\n",
 "\n",
-"The CART method will be used evaluate the distribution and correlation differences between the real and synthetic data. CART generally produces higher quality synthetic datasets, but might not run on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n",
+"The CART method is used to generate the synthetic data. CART generally produces higher quality synthetic datasets, but might not run on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n",
 "\n",
 "*The original paper can be found [here](https://files.eric.ed.gov/fulltext/ED469370.pdf)."
 ]
@@ -372,7 +372,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### 0. Preview of data"
+"### 0. Preview of real data"
 ]
 },
 {
@@ -523,7 +523,7 @@
 "source": [
 "**UI text #2**\n",
 "\n",
-"If detected data types are incorrect, please change this locally in the dataset before attaching it again."
+"If the detected data types are incorrect, please change this locally in the source dataset before attaching it to the app."
 ]
 },
 {
@@ -659,15 +659,6 @@
 "df.head()"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"**UI text**\n",
-"\n",
-"In the generated synthetic data, our objective is to capture the one-dimensional, two-dimensional, and higher-dimensional relationships between variables in the dataset. The plots above will be used to evaluate how effectively these relationships are represented in the synthetic data."
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

src/locales/en.json

Lines changed: 3 additions & 3 deletions
@@ -59,7 +59,7 @@
 "syntheticData": {
 "demo": {
 "heading": "Information about demo dataset",
-"description": "A subset of the [Law School Admission Bar](https://www.kaggle.com/datasets/danofer/law-school-admissions-bar-passage)* dataset is used as a demo. Synthetic data will be generated for the following columns:\n \n  \n- sex: student gender, i.e. 1 (male), 2 (female);\n- race1: race of student, i.e., asian, black, hispanic, white, other;\n- ugpa: undergraduate GPA of student (average course grades), continous variable;\n- bar: Ground truth label indicating whether or not the student passed the bar, i.e., passed 1st time, passed 2nd time, failed, non-graduated.\n \n  \n\nThe CART method will be used evaluate the distribution and correlation differences between the real and synthetic data. CART generally produces higher quality synthetic datasets, but might not run on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n \n  \n\n*The original paper can be found [here](https://files.eric.ed.gov/fulltext/ED469370.pdf)\n \n  \n"
+"description": "A subset of the [Law School Admission Bar](https://www.kaggle.com/datasets/danofer/law-school-admissions-bar-passage)* dataset is used as a demo. Synthetic data will be generated for the following columns:\n \n  \n- sex: student gender, i.e. 1 (male), 2 (female);\n- race1: race of student, i.e., asian, black, hispanic, white, other;\n- ugpa: undergraduate GPA of student (average course grades), continous variable;\n- bar: Ground truth label indicating whether or not the student passed the bar, i.e., passed 1st time, passed 2nd time, failed, non-graduated.\n \n  \n\nThe The CART method is used to generate the synthetic data. CART generally produces higher quality synthetic datasets, but might not run on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n \n  \n\n*The original paper can be found [here](https://files.eric.ed.gov/fulltext/ED469370.pdf)\n \n  \n"
 },
 "exportToPDF": "Download evaluation report as pdf",
 "exportToJSON": "Download synthetic data as json",
@@ -90,15 +90,15 @@
 "title": "Try it out!",
 "description": "Do you not have a dataset at hand? No worries use our demo dataset."
 },
-"columnsInDatasetInfo": "If detected data types are incorrect, please change this locally in the dataset before attaching it again.",
+"columnsInDatasetInfo": "If the detected data types are incorrect, please change this locally in the source dataset before attaching it to the app.",
 "univariateCharts": "Univariate distributions",
 "bivariateDistributionRealData": "Bivariate distribution",
 "univariateDistributionSyntheticData": "Univariate distribution",
 "bivariateDistributionSyntheticData": "Bivariate distribution",
 "correlationRealdata": "Correlation matrix",
 "correlationSyntheticData": "Correlation matrix",
 "dataSetPreview": {
-"heading": "0. Preview of data"
+"heading": "0. Preview of real data"
 },
 "columnsInDataset": "1. Data types detection",
 "_explanatoryDataAnalysisTitle": "2. Explanatory data analysis",
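
Note on the updated description text: it says CART might not run on datasets with categorical variables with 20+ categories and recommends Gaussian Copula in those cases. A minimal sketch of that rule of thumb follows. The function name `choose_method`, the constant `MAX_CATEGORIES_FOR_CART`, and the returned method labels are hypothetical and are not taken from this repository; the sketch reads "20+" as "20 or more distinct values".

```python
from typing import Mapping, Sequence

# Assumption: threshold taken from the "20+ categories" wording in the description.
MAX_CATEGORIES_FOR_CART = 20


def choose_method(
    columns: Mapping[str, Sequence[object]],
    categorical: Sequence[str],
) -> str:
    """Return "cart" unless some categorical column has 20 or more distinct values."""
    for name in categorical:
        # Count distinct values in this categorical column.
        if len(set(columns[name])) >= MAX_CATEGORIES_FOR_CART:
            return "gaussian_copula"
    return "cart"


# Toy rows shaped like the demo columns (values are illustrative only).
demo = {
    "sex": [1, 2, 2, 1],
    "race1": ["asian", "black", "hispanic", "white"],
    "ugpa": [3.1, 2.8, 3.6, 3.3],
}
method = choose_method(demo, categorical=["sex", "race1"])
```

With the toy columns above, no categorical column comes near the threshold, so the sketch selects CART; a column with 20 or more distinct values would switch the choice to Gaussian Copula.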
