Commit 26b9d65 (parent 4a6ce50): 4.0 release notes (dotnet#7302)

1 file changed: 159 additions, 0 deletions

# [ML.NET](http://dot.net/ml) 4.0

## **New Features**

- **Add sweepable estimator to NER** ([#6965](https://github.com/dotnet/machinelearning/pull/6965))
- **Introducing Tiktoken Tokenizer** ([#6981](https://github.com/dotnet/machinelearning/pull/6981))
- **Add text normalizer transformer to AutoML** ([#6998](https://github.com/dotnet/machinelearning/pull/6998))
- **Introducing Llama Tokenizer** ([#7078](https://github.com/dotnet/machinelearning/pull/7078))
- **Introducing CodeGen Tokenizer** ([#7139](https://github.com/dotnet/machinelearning/pull/7139))
- **Support GPT-4o tokenizer model** ([#7157](https://github.com/dotnet/machinelearning/pull/7157))
- **Add GenAI core package** ([#7177](https://github.com/dotnet/machinelearning/pull/7177))
- **Use new System.Numerics.Tensors library for DataFrame arithmetic operations (.NET 8)** ([#7179](https://github.com/dotnet/machinelearning/pull/7179)) - Thanks @asmirnov82!
- **Add Microsoft.ML.GenAI.Phi** ([#7184](https://github.com/dotnet/machinelearning/pull/7184))
- **[GenAI] Add LLaMA support** ([#7220](https://github.com/dotnet/machinelearning/pull/7220))
- **[GenAI] Support Llama 3.2 1B and 3B models** ([#7245](https://github.com/dotnet/machinelearning/pull/7245))
- **[GenAI] Introduce CausalLMPipelineChatClient for Microsoft.Extensions.AI (MEAI) IChatClient** ([#7270](https://github.com/dotnet/machinelearning/pull/7270))
- **Advanced runtime settings can now be set in the MLContext** ([#7273](https://github.com/dotnet/machinelearning/pull/7273))
- **Introducing WordPiece and Bert tokenizers** ([#7275](https://github.com/dotnet/machinelearning/pull/7275))
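
The WordPiece tokenizer added in #7275 splits each pre-tokenized word into subword units. As a rough illustration of the general technique, here is a minimal Python sketch of the greedy longest-match-first algorithm popularized by BERT. It is not ML.NET's tokenizer API: the function name, the toy vocabulary, the `##` continuation prefix, the `[UNK]` token, and the 100-character limit are assumptions for illustration, and the ML.NET implementation may differ in normalization and edge cases.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##", max_chars=100):
    """Split one pre-tokenized word with greedy longest-match-first WordPiece."""
    if len(word) > max_chars:
        # Overlong words are mapped straight to the unknown token.
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No vocabulary entry covers this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (assumed for illustration only).
vocab = {"un", "##aff", "##able", "aff", "able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("xyz", vocab))        # ['[UNK]']
```

Because matching is greedy, the longest known prefix always wins; a word with no matching piece at some position collapses entirely to the unknown token rather than being partially encoded.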

## **Enhancements**

- **Add support for Apache.Arrow.Types.TimestampType to DataFrame** ([#6871](https://github.com/dotnet/machinelearning/pull/6871)) - Thanks @asmirnov82!
- **Add new type to key-value converter** ([#6973](https://github.com/dotnet/machinelearning/pull/6973))
- **Update OnnxRuntime to 1.16.3** ([#6975](https://github.com/dotnet/machinelearning/pull/6975))
- **Tokenizer interfaces cleanup** ([#7001](https://github.com/dotnet/machinelearning/pull/7001))
- **Match SweepableEstimatorFactory name with ML.NET name** ([#7007](https://github.com/dotnet/machinelearning/pull/7007))
- **First round of perf improvements for Tiktoken** ([#7012](https://github.com/dotnet/machinelearning/pull/7012))
- **Tweak CreateByModelNameAsync** ([#7015](https://github.com/dotnet/machinelearning/pull/7015))
- **Avoid LruCache in Tiktoken when the specified cacheSize is 0** ([#7016](https://github.com/dotnet/machinelearning/pull/7016))
- **Tweak Tiktoken's BytePairEncode for improved perf** ([#7017](https://github.com/dotnet/machinelearning/pull/7017))
- **Optimize regexes used in Tiktoken** ([#7020](https://github.com/dotnet/machinelearning/pull/7020))
- **Address the feedback on the tokenizers library** ([#7024](https://github.com/dotnet/machinelearning/pull/7024))
- **Add Span support in tokenizer's Model abstraction** ([#7035](https://github.com/dotnet/machinelearning/pull/7035))
- **Add needed Tokenizer APIs** ([#7047](https://github.com/dotnet/machinelearning/pull/7047))
- **Add Tiktoken synchronous creation using model name** ([#7080](https://github.com/dotnet/machinelearning/pull/7080))
- **Embed Tiktoken data files** ([#7098](https://github.com/dotnet/machinelearning/pull/7098))
- **Tokenizer APIs polishing** ([#7108](https://github.com/dotnet/machinelearning/pull/7108))
- **More tokenizer API cleanup** ([#7110](https://github.com/dotnet/machinelearning/pull/7110))
- **Add more required Tokenizer APIs** ([#7114](https://github.com/dotnet/machinelearning/pull/7114))
- **Tokenizer APIs update** ([#7128](https://github.com/dotnet/machinelearning/pull/7128))
- **Allow developers to supply their own function to infer column data types from data while loading CSVs** ([#7142](https://github.com/dotnet/machinelearning/pull/7142)) - Thanks @sevenzees!
- **Implement DataFrameColumn Apply and DropNulls methods** ([#7123](https://github.com/dotnet/machinelearning/pull/7123)) - Thanks @asmirnov82!
- **Extend DataFrame OrderBy method to allow defining the preferred position for null values** ([#7118](https://github.com/dotnet/machinelearning/pull/7118)) - Thanks @asmirnov82!
- **Implement ToString() method for DataFrameColumn class** ([#7103](https://github.com/dotnet/machinelearning/pull/7103)) - Thanks @asmirnov82!
- **Add error handling, remove unneeded null check, and improve readability** ([#7147](https://github.com/dotnet/machinelearning/pull/7147)) - Thanks @ravibaghel!
- **Add .NET 8.0 target for DataFrame package** ([#7168](https://github.com/dotnet/machinelearning/pull/7168)) - Thanks @asmirnov82!
- **Create unique temporary directories to prevent permission issues** ([#7173](https://github.com/dotnet/machinelearning/pull/7173)) - Thanks @ErikApption!
- **Tokenizer APIs update** ([#7190](https://github.com/dotnet/machinelearning/pull/7190))
- **Make most Tokenizer abstract methods virtual** ([#7198](https://github.com/dotnet/machinelearning/pull/7198))
- **Reduce Tiktoken creation memory allocation** ([#7202](https://github.com/dotnet/machinelearning/pull/7202))
- **Refactor namespace and seal classes in Microsoft.ML.AutoML.SourceGenerator project** ([#7223](https://github.com/dotnet/machinelearning/pull/7223)) - Thanks @mhshahmoradi!
- **[GenAI] Add generateEmbedding API to CausalLMPipeline** ([#7227](https://github.com/dotnet/machinelearning/pull/7227))
- **[GenAI] Add Mistral 7B Instruct v0.3** ([#7231](https://github.com/dotnet/machinelearning/pull/7231))
- **Move the Tokenizer's data into separate packages** ([#7248](https://github.com/dotnet/machinelearning/pull/7248))
- **Load ONNX model from a stream of bytes** ([#7254](https://github.com/dotnet/machinelearning/pull/7254))
- **Update Tiktoken regexes** ([#7255](https://github.com/dotnet/machinelearning/pull/7255))
- **Misc changes** ([#7264](https://github.com/dotnet/machinelearning/pull/7264))
- **Address the feedback regarding Bert tokenizer** ([#7280](https://github.com/dotnet/machinelearning/pull/7280))
- **Add timeout to regexes used in the tokenizers** ([#7284](https://github.com/dotnet/machinelearning/pull/7284))
- **Final tokenizer cleanup** ([#7291](https://github.com/dotnet/machinelearning/pull/7291))
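
#7118 lets callers choose where null values land when sorting a DataFrame. The sketch below shows the idea in Python, assuming nulls are represented as `None`; `order_by`, `ascending`, and `nulls_first` are hypothetical names for illustration and are not the ML.NET `DataFrame` API.

```python
from typing import List, Optional, Sequence


def order_by(values: Sequence[Optional[float]], ascending: bool = True,
             nulls_first: bool = False) -> List[Optional[float]]:
    """Sort non-null values in the requested direction and put nulls at one end."""
    # Sort only the non-null values; None cannot be compared with numbers.
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    nulls = [v for v in values if v is None]
    # Null placement is chosen independently of the sort direction.
    return nulls + present if nulls_first else present + nulls


column = [3.0, None, 1.0, 2.0]
print(order_by(column))                                     # [1.0, 2.0, 3.0, None]
print(order_by(column, ascending=False, nulls_first=True))  # [None, 3.0, 2.0, 1.0]
```

Keeping the null position independent of the sort direction mirrors SQL's `NULLS FIRST`/`NULLS LAST`: reversing the order of the non-null values does not drag the nulls to the other end.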
59+
60+
## **Bug Fixes**
61+
- **Fix formatting that fails in VS** ([7023](https://github.com/dotnet/machinelearning/pull/7023))
62+
- **Issue #6606 - Add sample variance and standard deviation to NormalizeMeanVariance** ([6885](https://github.com/dotnet/machinelearning/pull/6885)) - Thanks @tearlant!
63+
- **Rename NameEntity to NamedEntity** ([#6917](https://github.com/dotnet/machinelearning/pull/6917))
64+
- **Fixes NER to correctly expand/shrink the labels** ([#6928](https://github.com/dotnet/machinelearning/pull/6928))
65+
- **fix #6949** ([#6951](https://github.com/dotnet/machinelearning/pull/6951))
66+
- **Fix DataFrame NullCount property of StringDataFrameColumn** ([#7090](https://github.com/dotnet/machinelearning/pull/7090)) - Thanks @asmirnov82!
67+
- **Fix Logical binary operations not supported exception** ([#7093](https://github.com/dotnet/machinelearning/pull/7093)) - Thanks @asmirnov82!
68+
- **Fix inconsistency in DataFrameColumns Clone API implementation** ([#7100](https://github.com/dotnet/machinelearning/pull/7100)) - Thanks @asmirnov82!
69+
- **Add Tiktoken's missing model names** ([#7111](https://github.com/dotnet/machinelearning/pull/7111))
70+
- **Accessing data by column after adding columns to a DataFrame returns error data** ([#7136](https://github.com/dotnet/machinelearning/pull/7136)) - Thanks @feiyun0112!
71+
- **Fix iterator type so that it matches boundary condition type** ([#7150](https://github.com/dotnet/machinelearning/pull/7150))
72+
- **Fix crash in Microsoft.ML.Recommender with validation set** ([#7196](https://github.com/dotnet/machinelearning/pull/7196))
73+
- **Fix #7203** ([#7207](https://github.com/dotnet/machinelearning/pull/7207))
74+
- **Fix decoding special tokens in SentencePiece tokenizer** ([#7233](https://github.com/dotnet/machinelearning/pull/7233))
75+
- **Fix dataframe incorrectly parse CSV when renameDuplicatedColumns is true** ([#7242](https://github.com/dotnet/machinelearning/pull/7242)) - Thanks @asmirnov82!
76+
- **Fixes #7271 AOT for ML.Tokenizers** ([#7272](https://github.com/dotnet/machinelearning/pull/7272)) - Thanks @euju-ms!
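
#6885 adds sample variance and standard deviation as options for `NormalizeMeanVariance`. The only difference is the divisor: population variance divides the summed squared deviations by n, sample variance by n - 1. The Python sketch below, built on the standard `statistics` module, shows a plain z-score normalization with either choice; `mean_variance_normalize` and `use_sample_variance` are hypothetical names, and ML.NET's normalizer has further options (for example, whether zero stays mapped to zero) that this sketch does not model.

```python
import statistics


def mean_variance_normalize(xs, use_sample_variance=False):
    """Return (x - mean) / std for each x, using sample (n - 1) or population (n) std."""
    mean = statistics.fmean(xs)
    # stdev divides by n - 1 (needs at least two points); pstdev divides by n.
    std = statistics.stdev(xs) if use_sample_variance else statistics.pstdev(xs)
    if std == 0:
        raise ValueError("input has zero variance; cannot normalize")
    return [(x - mean) / std for x in xs]


data = [1.0, 2.0, 3.0, 4.0]
print(statistics.pvariance(data))  # 1.25
print(statistics.variance(data))   # 5/3, about 1.667
print(mean_variance_normalize(data))
print(mean_variance_normalize(data, use_sample_variance=True))
```

With few rows the two choices diverge noticeably (here 1.25 versus about 1.667); as n grows, the n versus n - 1 divisor matters less and less.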

## **Build / Test updates**

- **[main] Update dependencies from dotnet/arcade** ([#6703](https://github.com/dotnet/machinelearning/pull/6703))
- **Migrate to the 'locker' GitHub action for locking closed/stale issues/PRs** ([#6896](https://github.com/dotnet/machinelearning/pull/6896))
- **Reorganize DataFrame files** ([#6872](https://github.com/dotnet/machinelearning/pull/6872)) - Thanks @asmirnov82!
- **Update ML.NET versioning** ([#6907](https://github.com/dotnet/machinelearning/pull/6907))
- **Don't include the SDK in our Helix payload** ([#6918](https://github.com/dotnet/machinelearning/pull/6918))
- **Make double assertions compare with tolerance instead of precision** ([#6923](https://github.com/dotnet/machinelearning/pull/6923))
- **Fix assert by only accessing idx** ([#6924](https://github.com/dotnet/machinelearning/pull/6924))
- **Only use semicolons for NoWarn - fixes build break** ([#6935](https://github.com/dotnet/machinelearning/pull/6935))
- **Packaging cleanup** ([#6939](https://github.com/dotnet/machinelearning/pull/6939))
- **Add Backport GitHub workflow** ([#6944](https://github.com/dotnet/machinelearning/pull/6944))
- **[main] Update dependencies from dotnet/arcade** ([#6957](https://github.com/dotnet/machinelearning/pull/6957))
- **Update .NET runtimes to latest version** ([#6964](https://github.com/dotnet/machinelearning/pull/6964))
- **Testing LightGBM bad allocation** ([#6968](https://github.com/dotnet/machinelearning/pull/6968))
- **[main] Update dependencies from dotnet/arcade** ([#6969](https://github.com/dotnet/machinelearning/pull/6969))
- **[main] Update dependencies from dotnet/arcade** ([#6976](https://github.com/dotnet/machinelearning/pull/6976))
- **FabricBot: Onboarding to GitOps.ResourceManagement because of FabricBot decommissioning** ([#6983](https://github.com/dotnet/machinelearning/pull/6983))
- **[main] Update dependencies from dotnet/arcade** ([#6985](https://github.com/dotnet/machinelearning/pull/6985))
- **[main] Update dependencies from dotnet/arcade** ([#6995](https://github.com/dotnet/machinelearning/pull/6995))
- **Temporary fix for the race condition during the tests** ([#7021](https://github.com/dotnet/machinelearning/pull/7021))
- **Make MlImage tests not block file for reading** ([#7029](https://github.com/dotnet/machinelearning/pull/7029))
- **Remove SourceLink SDK references** ([#7037](https://github.com/dotnet/machinelearning/pull/7037))
- **Change official build to use 1ES templates** ([#7048](https://github.com/dotnet/machinelearning/pull/7048))
- **Auto-generated baselines by 1ES Pipeline Templates** ([#7051](https://github.com/dotnet/machinelearning/pull/7051))
- **Update package versions in use by ML.NET tests** ([#7055](https://github.com/dotnet/machinelearning/pull/7055))
- **Testing ARM Python brew overwrite** ([#7058](https://github.com/dotnet/machinelearning/pull/7058))
- **Split out non-concurrent test collections** ([#6937](https://github.com/dotnet/machinelearning/pull/6937))
- **[release/3.0] Update dependencies from dotnet/arcade** ([#6938](https://github.com/dotnet/machinelearning/pull/6938))
- **Branding for 3.0.1** ([#6943](https://github.com/dotnet/machinelearning/pull/6943))
- **TorchSharp version updates and test fixes** ([#6954](https://github.com/dotnet/machinelearning/pull/6954))
- **Working on memory issue during tests for TorchSharp** ([#7022](https://github.com/dotnet/machinelearning/pull/7022))
- **M1 Helix testing** ([#7033](https://github.com/dotnet/machinelearning/pull/7033))
- **[main] Update dependencies from dotnet/arcade** ([#7052](https://github.com/dotnet/machinelearning/pull/7052))
- **[main] Update dependencies from dotnet/arcade** ([#7075](https://github.com/dotnet/machinelearning/pull/7075))
- **Re-enable log publishing** ([#7076](https://github.com/dotnet/machinelearning/pull/7076))
- **[main] Update dependencies from dotnet/arcade** ([#7079](https://github.com/dotnet/machinelearning/pull/7079))
- **Update VMs** ([#7087](https://github.com/dotnet/machinelearning/pull/7087))
- **Don't trigger PR validation builds for docs-only changes** ([#7096](https://github.com/dotnet/machinelearning/pull/7096))
- **Add CodeQL exclusions file** ([#7105](https://github.com/dotnet/machinelearning/pull/7105))
- **Don't use deprecated -pt images** ([#7131](https://github.com/dotnet/machinelearning/pull/7131))
- **Update locker.yml** ([#7133](https://github.com/dotnet/machinelearning/pull/7133))
- **[main] Update dependencies from dotnet/arcade** ([#7138](https://github.com/dotnet/machinelearning/pull/7138))
- **Try enabling TSA scan during build** ([#7149](https://github.com/dotnet/machinelearning/pull/7149))
- **[main] Update dependencies from dotnet/arcade** ([#7151](https://github.com/dotnet/machinelearning/pull/7151))
- **Remove Codeql.SourceRoot** ([#7155](https://github.com/dotnet/machinelearning/pull/7155))
- **[main] Update dependencies from dotnet/arcade** ([#7161](https://github.com/dotnet/machinelearning/pull/7161))
- **[main] Update dependencies from dotnet/arcade** ([#7165](https://github.com/dotnet/machinelearning/pull/7165))
- **Add a stub packageSourceMapping** ([#7171](https://github.com/dotnet/machinelearning/pull/7171))
- **Update TorchSharp and Helix image** ([#7188](https://github.com/dotnet/machinelearning/pull/7188))
- **Publish source index directly from repo** ([#7189](https://github.com/dotnet/machinelearning/pull/7189))
- **Add package readmes** ([#7200](https://github.com/dotnet/machinelearning/pull/7200))
- **Update dependency versions** ([#7216](https://github.com/dotnet/machinelearning/pull/7216))
- **[main] Update dependencies from dotnet/arcade** ([#7218](https://github.com/dotnet/machinelearning/pull/7218))
- **Directly reference the SQL data client 4.8.6 package in GenAI tests to fix a security-vulnerable dependency** ([#7228](https://github.com/dotnet/machinelearning/pull/7228))
- **[main] Update dependencies from dotnet/arcade** ([#7235](https://github.com/dotnet/machinelearning/pull/7235))
- **docs: update NuGet package badge** ([#7236](https://github.com/dotnet/machinelearning/pull/7236)) - Thanks @WeihanLi!
- **[GenAI] Enable pack** ([#7237](https://github.com/dotnet/machinelearning/pull/7237))
- **[GenAI] Pack GenAI core package** ([#7246](https://github.com/dotnet/machinelearning/pull/7246))
- **Enable SDL tools** ([#7247](https://github.com/dotnet/machinelearning/pull/7247))
- **Add Service Tree ID for .NET Libraries** ([#7252](https://github.com/dotnet/machinelearning/pull/7252))
- **Fix Apple silicon official build** ([#7278](https://github.com/dotnet/machinelearning/pull/7278))
- **Fix OSX CI** ([#7279](https://github.com/dotnet/machinelearning/pull/7279))
- **Fix native lookup** ([#7282](https://github.com/dotnet/machinelearning/pull/7282))
- **Add the component governance file `cgmanifest.json` for tokenizer vocab files** ([#7283](https://github.com/dotnet/machinelearning/pull/7283))
- **Update to macOS 13** ([#7285](https://github.com/dotnet/machinelearning/pull/7285))
- **Update remote executor** ([#7295](https://github.com/dotnet/machinelearning/pull/7295))
- **Update dependencies from maintenance-packages to latest versions** ([#7301](https://github.com/dotnet/machinelearning/pull/7301))
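
#6923 switches floating-point test assertions from precision-based to tolerance-based comparison. A rough Python illustration of why that matters, with hypothetical helper names (this is not the ML.NET test code): rounding both values to a fixed number of decimal places can split two nearly identical numbers across a rounding boundary, while a tolerance check looks at their actual difference.

```python
import math


def equal_by_precision(expected, actual, digits):
    """Precision-style check: round both values to `digits` decimals, then compare."""
    return round(expected, digits) == round(actual, digits)


def equal_by_tolerance(expected, actual, rel_tol=1e-6, abs_tol=0.0):
    """Tolerance-style check: accept values whose difference is within a bound."""
    return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)


a, b = 1.0049999, 1.0050001  # differ by only about 2e-7
print(equal_by_precision(a, b, 2))  # False: they round to 1.0 and 1.01
print(equal_by_tolerance(a, b))     # True
```

Tolerance checks still need a sensible bound: `rel_tol` scales with the magnitude of the inputs, while `abs_tol` matters for values near zero, where a purely relative test passes only on an exact match.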

## **Documentation Updates**

- **Update developer-guide.md** ([#6870](https://github.com/dotnet/machinelearning/pull/6870)) - Thanks @computerscienceiscool!
- **Update release-3.0.0.md** ([#6895](https://github.com/dotnet/machinelearning/pull/6895)) - Thanks @taeerhebend!
- **Update branding for 3.0.2** ([#6970](https://github.com/dotnet/machinelearning/pull/6970))
- **Add release notes for 4.0-preview1** ([#7064](https://github.com/dotnet/machinelearning/pull/7064))
- **Update readmes for Tokenizers and Microsoft.ML** ([#7070](https://github.com/dotnet/machinelearning/pull/7070))
- **Add migration guide for DeepDev** ([#7073](https://github.com/dotnet/machinelearning/pull/7073))
- **Update PACKAGE.md to include Llama info** ([#7104](https://github.com/dotnet/machinelearning/pull/7104))
- **Update the tokenizer migration guide** ([#7109](https://github.com/dotnet/machinelearning/pull/7109))
- **Add documentation for GenAI** ([#7170](https://github.com/dotnet/machinelearning/pull/7170))
- **[GenAI] Add readme to Microsoft.ML.GenAI.Phi** ([#7206](https://github.com/dotnet/machinelearning/pull/7206))
- **Update wording in LDA docs** ([#7253](https://github.com/dotnet/machinelearning/pull/7253))
