|  | 1 | +# [ML.NET](http://dot.net/ml) 4.0 | 
|  | 2 | + | 
|  | 3 | +## **New Features** | 
|  | 4 | +- **Add sweepable estimator to NER** ([#6965](https://github.com/dotnet/machinelearning/pull/6965)) | 
|  | 5 | +- **Introducing Tiktoken Tokenizer** ([#6981](https://github.com/dotnet/machinelearning/pull/6981)) | 
|  | 6 | +- **Add text normalizer transformer to AutoML** ([#6998](https://github.com/dotnet/machinelearning/pull/6998)) | 
|  | 7 | +- **Introducing Llama Tokenizer** ([#7078](https://github.com/dotnet/machinelearning/pull/7078)) | 
|  | 8 | +- **Introducing CodeGen Tokenizer** ([#7139](https://github.com/dotnet/machinelearning/pull/7139)) | 
|  | 9 | +- **Support Gpt-4o tokenizer model** ([#7157](https://github.com/dotnet/machinelearning/pull/7157)) | 
|  | 10 | +- **Add GenAI core package** ([#7177](https://github.com/dotnet/machinelearning/pull/7177)) | 
|  | 11 | +- **Use new System.Numerics.Tensors library for DataFrame arithmetic operations (.NET 8)** ([#7179](https://github.com/dotnet/machinelearning/pull/7179)) - Thanks @asmirnov82! | 
|  | 12 | +- **Add Microsoft.ML.GenAI.Phi** ([#7184](https://github.com/dotnet/machinelearning/pull/7184)) | 
|  | 13 | +- **[GenAI] Add LLaMA support** ([#7220](https://github.com/dotnet/machinelearning/pull/7220)) | 
|  | 14 | +- **[GenAI] Support Llama 3.2 1B and 3B model** ([#7245](https://github.com/dotnet/machinelearning/pull/7245)) | 
|  | 15 | +- **[GenAI] Introduce CausalLMPipelineChatClient for MEAI.IChatClient** ([#7270](https://github.com/dotnet/machinelearning/pull/7270)) | 
|  | 16 | +- **Can now set advanced runtime settings in the MLContext.** ([#7273](https://github.com/dotnet/machinelearning/pull/7273)) | 
|  | 17 | +- **Introducing WordPiece and Bert tokenizers** ([#7275](https://github.com/dotnet/machinelearning/pull/7275)) | 
|  | 18 | + | 
|  | 19 | +## **Enhancements** | 
|  | 20 | +- **Add support for Apache.Arrow.Types.TimestampType to DataFrame** ([#6871](https://github.com/dotnet/machinelearning/pull/6871)) - Thanks @asmirnov82! | 
|  | 21 | +- **Add new type to key-value converter** ([#6973](https://github.com/dotnet/machinelearning/pull/6973)) | 
|  | 22 | +- **Update OnnxRuntime to 1.16.3** ([#6975](https://github.com/dotnet/machinelearning/pull/6975)) | 
|  | 23 | +- **Tokenizer's Interfaces Cleanup** ([#7001](https://github.com/dotnet/machinelearning/pull/7001)) | 
|  | 24 | +- **Match SweepableEstimatorFactory name with ML.NET name.** ([#7007](https://github.com/dotnet/machinelearning/pull/7007)) | 
|  | 25 | +- **First round of perf improvements for tiktoken** ([#7012](https://github.com/dotnet/machinelearning/pull/7012)) | 
|  | 26 | +- **Tweak CreateByModelNameAsync** ([#7015](https://github.com/dotnet/machinelearning/pull/7015)) | 
|  | 27 | +- **Avoid LruCache in Tiktoken when cacheSize specified is 0** ([#7016](https://github.com/dotnet/machinelearning/pull/7016)) | 
|  | 28 | +- **Tweak Tiktoken's BytePairEncode for improved perf** ([#7017](https://github.com/dotnet/machinelearning/pull/7017)) | 
|  | 29 | +- **Optimize regexes used in tiktoken** ([#7020](https://github.com/dotnet/machinelearning/pull/7020)) | 
|  | 30 | +- **Address the feedback on the tokenizer's library** ([#7024](https://github.com/dotnet/machinelearning/pull/7024)) | 
|  | 31 | +- **Add Span support in tokenizer's Model abstraction** ([#7035](https://github.com/dotnet/machinelearning/pull/7035)) | 
|  | 32 | +- **Adding needed Tokenizer's APIs** ([#7047](https://github.com/dotnet/machinelearning/pull/7047)) | 
|  | 33 | +- **Add Tiktoken Synchronous Creation Using Model Name** ([#7080](https://github.com/dotnet/machinelearning/pull/7080)) | 
|  | 34 | +- **Embed Tiktoken data files** ([#7098](https://github.com/dotnet/machinelearning/pull/7098)) | 
|  | 35 | +- **Tokenizer's APIs Polishing** ([#7108](https://github.com/dotnet/machinelearning/pull/7108)) | 
|  | 36 | +- **More tokenizer's APIs cleanup** ([#7110](https://github.com/dotnet/machinelearning/pull/7110)) | 
|  | 37 | +- **Add more required Tokenizer APIs** ([#7114](https://github.com/dotnet/machinelearning/pull/7114)) | 
|  | 38 | +- **Tokenizer's APIs Update** ([#7128](https://github.com/dotnet/machinelearning/pull/7128)) | 
|  | 39 | +- **Allow developers to supply their own function to infer column data types from data while loading CSVs** ([#7142](https://github.com/dotnet/machinelearning/pull/7142)) - Thanks @sevenzees! | 
|  | 40 | +- **Implement DataFrameColumn Apply and DropNulls methods** ([#7123](https://github.com/dotnet/machinelearning/pull/7123)) - Thanks @asmirnov82! | 
|  | 41 | +- **Extend dataframe orderby method to allow defining preferred position for null values** ([#7118](https://github.com/dotnet/machinelearning/pull/7118)) - Thanks @asmirnov82! | 
|  | 42 | +- **Implement ToString() method for DataFrameColumn class** ([#7103](https://github.com/dotnet/machinelearning/pull/7103)) - Thanks @asmirnov82! | 
|  | 43 | +- **Added error handling, removed unwanted null check and enhanced readability** ([#7147](https://github.com/dotnet/machinelearning/pull/7147)) - Thanks @ravibaghel! | 
|  | 44 | +- **Add targeting .Net 8.0 for DataFrame package** ([#7168](https://github.com/dotnet/machinelearning/pull/7168)) - Thanks @asmirnov82! | 
|  | 45 | +- **create unique temporary directories to prevent permission issues** ([#7173](https://github.com/dotnet/machinelearning/pull/7173)) - Thanks @ErikApption! | 
|  | 46 | +- **Tokenizer APIs Update** ([#7190](https://github.com/dotnet/machinelearning/pull/7190)) | 
|  | 47 | +- **Make most Tokenizer abstract methods virtual** ([#7198](https://github.com/dotnet/machinelearning/pull/7198)) | 
|  | 48 | +- **Reduce Tiktoken Creation Memory Allocation** ([#7202](https://github.com/dotnet/machinelearning/pull/7202)) | 
|  | 49 | +- **Refactor Namespace and Sealed Classes in Microsoft.ML.AutoML.SourceGenerator Project** ([#7223](https://github.com/dotnet/machinelearning/pull/7223)) - Thanks @mhshahmoradi! | 
|  | 50 | +- **[GenAI] Add generateEmbedding API to CausalLMPipeline** ([#7227](https://github.com/dotnet/machinelearning/pull/7227)) | 
|  | 51 | +- **[GenAI] Add Mistral 7B Instruction V0.3** ([#7231](https://github.com/dotnet/machinelearning/pull/7231)) | 
|  | 52 | +- **Move the Tokenizer's data into separate packages.** ([#7248](https://github.com/dotnet/machinelearning/pull/7248)) | 
|  | 53 | +- **Load onnx model from Stream of bytes** ([#7254](https://github.com/dotnet/machinelearning/pull/7254)) | 
|  | 54 | +- **Update tiktoken regexes** ([#7255](https://github.com/dotnet/machinelearning/pull/7255)) | 
|  | 55 | +- **Misc Changes** ([#7264](https://github.com/dotnet/machinelearning/pull/7264)) | 
|  | 56 | +- **Address the feedback regarding Bert tokenizer** ([#7280](https://github.com/dotnet/machinelearning/pull/7280)) | 
|  | 57 | +- **Add Timeout to Regex used in the tokenizers** ([#7284](https://github.com/dotnet/machinelearning/pull/7284)) | 
|  | 58 | +- **Final tokenizer's cleanup** ([#7291](https://github.com/dotnet/machinelearning/pull/7291)) | 
|  | 59 | + | 
|  | 60 | +## **Bug Fixes** | 
|  | 61 | +- **Fix formatting that fails in VS** ([#7023](https://github.com/dotnet/machinelearning/pull/7023)) | 
|  | 62 | +- **Issue #6606 - Add sample variance and standard deviation to NormalizeMeanVariance** ([#6885](https://github.com/dotnet/machinelearning/pull/6885)) - Thanks @tearlant! | 
|  | 63 | +- **Rename NameEntity to NamedEntity** ([#6917](https://github.com/dotnet/machinelearning/pull/6917)) | 
|  | 64 | +- **Fixes NER to correctly expand/shrink the labels** ([#6928](https://github.com/dotnet/machinelearning/pull/6928)) | 
|  | 65 | +- **fix #6949** ([#6951](https://github.com/dotnet/machinelearning/pull/6951)) | 
|  | 66 | +- **Fix DataFrame NullCount property of StringDataFrameColumn** ([#7090](https://github.com/dotnet/machinelearning/pull/7090)) - Thanks @asmirnov82! | 
|  | 67 | +- **Fix Logical binary operations not supported exception** ([#7093](https://github.com/dotnet/machinelearning/pull/7093)) - Thanks @asmirnov82! | 
|  | 68 | +- **Fix inconsistency in DataFrameColumns Clone API implementation** ([#7100](https://github.com/dotnet/machinelearning/pull/7100)) - Thanks @asmirnov82! | 
|  | 69 | +- **Add Tiktoken's missing model names** ([#7111](https://github.com/dotnet/machinelearning/pull/7111)) | 
|  | 70 | +- **Accessing data by column after adding columns to a DataFrame returns error data** ([#7136](https://github.com/dotnet/machinelearning/pull/7136)) - Thanks @feiyun0112! | 
|  | 71 | +- **Fix iterator type so that it matches boundary condition type** ([#7150](https://github.com/dotnet/machinelearning/pull/7150)) | 
|  | 72 | +- **Fix crash in Microsoft.ML.Recommender with validation set** ([#7196](https://github.com/dotnet/machinelearning/pull/7196)) | 
|  | 73 | +- **Fix #7203** ([#7207](https://github.com/dotnet/machinelearning/pull/7207)) | 
|  | 74 | +- **Fix decoding special tokens in SentencePiece tokenizer** ([#7233](https://github.com/dotnet/machinelearning/pull/7233)) | 
|  | 75 | +- **Fix DataFrame incorrectly parsing CSV when renameDuplicatedColumns is true** ([#7242](https://github.com/dotnet/machinelearning/pull/7242)) - Thanks @asmirnov82! | 
|  | 76 | +- **Fixes #7271 AOT for ML.Tokenizers** ([#7272](https://github.com/dotnet/machinelearning/pull/7272)) - Thanks @euju-ms! | 
|  | 77 | + | 
|  | 78 | +## **Build / Test updates** | 
|  | 79 | +- **[main] Update dependencies from dotnet/arcade** ([#6703](https://github.com/dotnet/machinelearning/pull/6703)) | 
|  | 80 | +- **Migrate to the 'locker' GitHub action for locking closed/stale issues/PRs** ([#6896](https://github.com/dotnet/machinelearning/pull/6896)) | 
|  | 81 | +- **Reorganize dataframe files** ([#6872](https://github.com/dotnet/machinelearning/pull/6872)) - Thanks @asmirnov82! | 
|  | 82 | +- **Updated ML.NET versioning** ([#6907](https://github.com/dotnet/machinelearning/pull/6907)) | 
|  | 83 | +- **Don't include the SDK in our helix payload** ([#6918](https://github.com/dotnet/machinelearning/pull/6918)) | 
|  | 84 | +- **Make double assertions compare with tolerance instead of precision** ([#6923](https://github.com/dotnet/machinelearning/pull/6923)) | 
|  | 85 | +- **Fix assert by only accessing idx** ([#6924](https://github.com/dotnet/machinelearning/pull/6924)) | 
|  | 86 | +- **Only use semi-colons for NoWarn - fixes build break** ([#6935](https://github.com/dotnet/machinelearning/pull/6935)) | 
|  | 87 | +- **Packaging cleanup** ([#6939](https://github.com/dotnet/machinelearning/pull/6939)) | 
|  | 88 | +- **Add Backport github workflow** ([#6944](https://github.com/dotnet/machinelearning/pull/6944)) | 
|  | 89 | +- **[main] Update dependencies from dotnet/arcade** ([#6957](https://github.com/dotnet/machinelearning/pull/6957)) | 
|  | 90 | +- **Update .NET Runtimes to latest version** ([#6964](https://github.com/dotnet/machinelearning/pull/6964)) | 
|  | 91 | +- **Testing light gbm bad allocation** ([#6968](https://github.com/dotnet/machinelearning/pull/6968)) | 
|  | 92 | +- **[main] Update dependencies from dotnet/arcade** ([#6969](https://github.com/dotnet/machinelearning/pull/6969)) | 
|  | 93 | +- **[main] Update dependencies from dotnet/arcade** ([#6976](https://github.com/dotnet/machinelearning/pull/6976)) | 
|  | 94 | +- **FabricBot: Onboarding to GitOps.ResourceManagement because of FabricBot decommissioning** ([#6983](https://github.com/dotnet/machinelearning/pull/6983)) | 
|  | 95 | +- **[main] Update dependencies from dotnet/arcade** ([#6985](https://github.com/dotnet/machinelearning/pull/6985)) | 
|  | 96 | +- **[main] Update dependencies from dotnet/arcade** ([#6995](https://github.com/dotnet/machinelearning/pull/6995)) | 
|  | 97 | +- **Temp fix for the race condition during the tests.** ([#7021](https://github.com/dotnet/machinelearning/pull/7021)) | 
|  | 98 | +- **Make MlImage tests not block file for reading** ([#7029](https://github.com/dotnet/machinelearning/pull/7029)) | 
|  | 99 | +- **Remove SourceLink SDK references** ([#7037](https://github.com/dotnet/machinelearning/pull/7037)) | 
|  | 100 | +- **Change official build to use 1ES templates** ([#7048](https://github.com/dotnet/machinelearning/pull/7048)) | 
|  | 101 | +- **Auto-generated baselines by 1ES Pipeline Templates** ([#7051](https://github.com/dotnet/machinelearning/pull/7051)) | 
|  | 102 | +- **Update package versions in use by ML.NET tests** ([#7055](https://github.com/dotnet/machinelearning/pull/7055)) | 
|  | 103 | +- **testing arm python brew overwrite** ([#7058](https://github.com/dotnet/machinelearning/pull/7058)) | 
|  | 104 | +- **Split out non concurrent test collections.** ([#6937](https://github.com/dotnet/machinelearning/pull/6937)) | 
|  | 105 | +- **[release/3.0] Update dependencies from dotnet/arcade** ([#6938](https://github.com/dotnet/machinelearning/pull/6938)) | 
|  | 106 | +- **Branding for 3.0.1** ([#6943](https://github.com/dotnet/machinelearning/pull/6943)) | 
|  | 108 | +- **Torch sharp version updates and test fixes** ([#6954](https://github.com/dotnet/machinelearning/pull/6954)) | 
|  | 110 | +- **Working on memory issue during tests for TorchSharp** ([#7022](https://github.com/dotnet/machinelearning/pull/7022)) | 
|  | 111 | +- **M1 helix testing** ([#7033](https://github.com/dotnet/machinelearning/pull/7033)) | 
|  | 112 | +- **[main] Update dependencies from dotnet/arcade** ([#7052](https://github.com/dotnet/machinelearning/pull/7052)) | 
|  | 113 | +- **[main] Update dependencies from dotnet/arcade** ([#7075](https://github.com/dotnet/machinelearning/pull/7075)) | 
|  | 114 | +- **Reenable log publishing** ([#7076](https://github.com/dotnet/machinelearning/pull/7076)) | 
|  | 115 | +- **[main] Update dependencies from dotnet/arcade** ([#7079](https://github.com/dotnet/machinelearning/pull/7079)) | 
|  | 116 | +- **Update VMs** ([#7087](https://github.com/dotnet/machinelearning/pull/7087)) | 
|  | 117 | +- **Don't trigger PR validation builds for docs only changes** ([#7096](https://github.com/dotnet/machinelearning/pull/7096)) | 
|  | 118 | +- **Add CodeQL exclusions file** ([#7105](https://github.com/dotnet/machinelearning/pull/7105)) | 
|  | 119 | +- **Don't use deprecated -pt images** ([#7131](https://github.com/dotnet/machinelearning/pull/7131)) | 
|  | 120 | +- **Update locker.yml** ([#7133](https://github.com/dotnet/machinelearning/pull/7133)) | 
|  | 121 | +- **[main] Update dependencies from dotnet/arcade** ([#7138](https://github.com/dotnet/machinelearning/pull/7138)) | 
|  | 122 | +- **Try enabling TSA scan during build** ([#7149](https://github.com/dotnet/machinelearning/pull/7149)) | 
|  | 123 | +- **[main] Update dependencies from dotnet/arcade** ([#7151](https://github.com/dotnet/machinelearning/pull/7151)) | 
|  | 124 | +- **Remove Codeql.SourceRoot** ([#7155](https://github.com/dotnet/machinelearning/pull/7155)) | 
|  | 125 | +- **[main] Update dependencies from dotnet/arcade** ([#7161](https://github.com/dotnet/machinelearning/pull/7161)) | 
|  | 126 | +- **[main] Update dependencies from dotnet/arcade** ([#7165](https://github.com/dotnet/machinelearning/pull/7165)) | 
|  | 127 | +- **Add a stub packageSourceMapping** ([#7171](https://github.com/dotnet/machinelearning/pull/7171)) | 
|  | 128 | +- **update torchsharp and helix image** ([#7188](https://github.com/dotnet/machinelearning/pull/7188)) | 
|  | 129 | +- **Publish source index directly from repo** ([#7189](https://github.com/dotnet/machinelearning/pull/7189)) | 
|  | 130 | +- **Add package readmes** ([#7200](https://github.com/dotnet/machinelearning/pull/7200)) | 
|  | 131 | +- **Update dependency versions.** ([#7216](https://github.com/dotnet/machinelearning/pull/7216)) | 
|  | 132 | +- **[main] Update dependencies from dotnet/arcade** ([#7218](https://github.com/dotnet/machinelearning/pull/7218)) | 
|  | 133 | +- **Directly reference the SQL data client 4.8.6 package in GenAI tests to fix a vulnerable package dependency** ([#7228](https://github.com/dotnet/machinelearning/pull/7228)) | 
|  | 134 | +- **[main] Update dependencies from dotnet/arcade** ([#7235](https://github.com/dotnet/machinelearning/pull/7235)) | 
|  | 135 | +- **docs: update nuget package badge** ([#7236](https://github.com/dotnet/machinelearning/pull/7236)) - Thanks @WeihanLi! | 
|  | 136 | +- **[GenAI] Enable pack** ([#7237](https://github.com/dotnet/machinelearning/pull/7237)) | 
|  | 137 | +- **[GenAI] pack GenAI core package** ([#7246](https://github.com/dotnet/machinelearning/pull/7246)) | 
|  | 138 | +- **Enable SDL tools** ([#7247](https://github.com/dotnet/machinelearning/pull/7247)) | 
|  | 139 | +- **Add Service Tree ID for .NET Libraries** ([#7252](https://github.com/dotnet/machinelearning/pull/7252)) | 
|  | 140 | +- **fixing apple silicon official build** ([#7278](https://github.com/dotnet/machinelearning/pull/7278)) | 
|  | 141 | +- **fixing osx ci** ([#7279](https://github.com/dotnet/machinelearning/pull/7279)) | 
|  | 142 | +- **Fixing native lookup** ([#7282](https://github.com/dotnet/machinelearning/pull/7282)) | 
|  | 143 | +- **Add the components governance file `cgmanifest.json` for tokenizer's vocab files** ([#7283](https://github.com/dotnet/machinelearning/pull/7283)) | 
|  | 144 | +- **Update To MacOS 13** ([#7285](https://github.com/dotnet/machinelearning/pull/7285)) | 
|  | 145 | +- **Updated remote executor** ([#7295](https://github.com/dotnet/machinelearning/pull/7295)) | 
|  | 146 | +- **Update dependencies from maintenance-packages to latest versions** ([#7301](https://github.com/dotnet/machinelearning/pull/7301)) | 
|  | 147 | + | 
|  | 148 | +## **Documentation Updates** | 
|  | 149 | +- **Update developer-guide.md** ([#6870](https://github.com/dotnet/machinelearning/pull/6870)) - Thanks @computerscienceiscool! | 
|  | 150 | +- **Update release-3.0.0.md** ([#6895](https://github.com/dotnet/machinelearning/pull/6895)) - Thanks @taeerhebend! | 
|  | 151 | +- **Update branding for 3.0.2** ([#6970](https://github.com/dotnet/machinelearning/pull/6970)) | 
|  | 152 | +- **Add release notes for 4.0-preview1** ([#7064](https://github.com/dotnet/machinelearning/pull/7064)) | 
|  | 153 | +- **Update readmes for Tokenizers and Microsoft.ML** ([#7070](https://github.com/dotnet/machinelearning/pull/7070)) | 
|  | 154 | +- **Adding migration guide for deepdev** ([#7073](https://github.com/dotnet/machinelearning/pull/7073)) | 
|  | 155 | +- **Update PACKAGE.md to include Llama info** ([#7104](https://github.com/dotnet/machinelearning/pull/7104)) | 
|  | 156 | +- **Update the tokenizer migration guide** ([#7109](https://github.com/dotnet/machinelearning/pull/7109)) | 
|  | 157 | +- **add document for GenAI** ([#7170](https://github.com/dotnet/machinelearning/pull/7170)) | 
|  | 158 | +- **[GenAI] Add readme to Microsoft.ML.GenAI.Phi** ([#7206](https://github.com/dotnet/machinelearning/pull/7206)) | 
|  | 159 | +- **Update wording in LDA docs** ([#7253](https://github.com/dotnet/machinelearning/pull/7253)) | 