You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/machine-learning/v1/samples-designer.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -126,7 +126,7 @@ The sample datasets are available under **Datasets**-**Samples** category. You c
126
126
|German Credit Card UCI dataset|The UCI Statlog (German Credit Card) dataset ([Statlog+German+Credit+Data](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)), using the german.data file.<br/>The dataset classifies people, described by a set of attributes, as low or high credit risks. Each example represents a person. There are 20 features, both numerical and categorical, and a binary label (the credit risk value). High credit risk entries have label = 2, low credit risk entries have label = 1. The cost of misclassifying a low risk example as high is 1, whereas the cost of misclassifying a high risk example as low is 5.|
127
127
|IMDB Movie Titles|The dataset contains information about movies that were rated in X tweets: IMDB movie ID, movie name, genre, and production year. There are 17K movies in the dataset. The dataset was introduced in the paper "S. Dooms, T. De Pessemier and L. Martens. MovieTweetings: a Movie Rating Dataset Collected From Twitter. Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys 2013."|
128
128
|Movie Ratings|The dataset is an extended version of the Movie Tweetings dataset. The dataset has 170K ratings for movies, extracted from well-structured tweets on X. Each instance represents a tweet and is a tuple: user ID, IMDB movie ID, rating, timestamp, number of favorites for this tweet, and number of retweets of this tweet. The dataset was made available by A. Said, S. Dooms, B. Loni and D. Tikk for Recommender Systems Challenge 2014.|
129
-
|Weather Dataset|Hourly land-based weather observations from NOAA ([merged data from 201304 to 201310](https://az754797.vo.msecnd.net/data/WeatherDataset.csv)).<br/>The weather data covers observations made from airport weather stations, covering the time period April-October 2013. Before uploading to the designer, the dataset was processed as follows: <br/> - Weather station IDs were mapped to corresponding airport IDs <br/> - Weather stations not associated with the 70 busiest airports were filtered out <br/> - The Date column was split into separate Year, Month, and Day columns <br/> - The following columns were selected: AirportID, Year, Month, Day, Time, TimeZone, SkyCondition, Visibility, WeatherType, DryBulbFarenheit, DryBulbCelsius, WetBulbFarenheit, WetBulbCelsius, DewPointFarenheit, DewPointCelsius, RelativeHumidity, WindSpeed, WindDirection, ValueForWindCharacter, StationPressure, PressureTendency, PressureChange, SeaLevelPressure, RecordType, HourlyPrecip, Altimeter|
129
+
|Weather Dataset|Hourly land-based weather observations from NOAA (merged data from 201304 to 201310).<br/>The weather data covers observations made from airport weather stations, covering the time period April-October 2013. Before uploading to the designer, the dataset was processed as follows: <br/> - Weather station IDs were mapped to corresponding airport IDs <br/> - Weather stations not associated with the 70 busiest airports were filtered out <br/> - The Date column was split into separate Year, Month, and Day columns <br/> - The following columns were selected: AirportID, Year, Month, Day, Time, TimeZone, SkyCondition, Visibility, WeatherType, DryBulbFarenheit, DryBulbCelsius, WetBulbFarenheit, WetBulbCelsius, DewPointFarenheit, DewPointCelsius, RelativeHumidity, WindSpeed, WindDirection, ValueForWindCharacter, StationPressure, PressureTendency, PressureChange, SeaLevelPressure, RecordType, HourlyPrecip, Altimeter|
130
130
|Wikipedia SP 500 Dataset|Data is derived from Wikipedia (https://www.wikipedia.org/) based on articles of each S&P 500 company, stored as XML data. <br/>Before uploading to the designer, the dataset was processed as follows: <br/> - Extract text content for each specific company <br/> - Remove wiki formatting <br/> - Remove non-alphanumeric characters <br/> - Convert all text to lowercase <br/> - Known company categories were added <br/>Note that for some companies an article could not be found, so the number of records is less than 500.|
131
131
|Restaurant Feature Data| A set of metadata about restaurants and their features, such as food type, dining style, and location. <br/>**Usage**: Use this dataset, in combination with the other two restaurant datasets, to train and test a recommender system.<br/> **Related Research**: Bache, K. and Lichman, M. (2013). [UCI Machine Learning Repository](https://archive.ics.uci.edu/). Irvine, CA: University of California, School of Information and Computer Science.|
132
132
|Restaurant Ratings| Contains ratings given by users to restaurants on a scale from 0 to 2.<br/>**Usage**: Use this dataset, in combination with the other two restaurant datasets, to train and test a recommender system. <br/>**Related Research**: Bache, K. and Lichman, M. (2013). [UCI Machine Learning Repository](https://archive.ics.uci.edu/). Irvine, CA: University of California, School of Information and Computer Science.|
0 commit comments