This project shows how NCache can be used with ML.NET to predict taxi fares. In real-time machine learning scenarios, data arrives rapidly and the model is retrained continuously on the new data so that accuracy is maintained. The number of inserts per second is very high in such scenarios, and the same data is also being read on the ML side, which can cause slowdowns. NCache is used here to avoid these issues.

NCache is a highly scalable distributed cache for .NET. Using NCache for data processing improves the performance of the system because it provides fast read and write operations, and its scalability lets it handle very large data sets efficiently.

ML.NET reads data from a file, a data source, or an enumerable. NCache provides distributed data structures, so ML.NET can retrieve and read the data directly, with no extra data manipulation. Data can be stored as lists and loaded directly into the ML.NET model. NCache's pub/sub functionality is used for notifications without any performance issues.
The following are required before you can run this app:
- NCache Enterprise (version 5.0)
- .NET Core Runtime (version 3.0)
- .NET Core SDK (version 3.0)
- Visual Studio (VS2019)
Note that the app has been tested with the versions listed above.
A cache needs to be linked with the app.
When NCache is installed, two caches are created, named mypartitionedcache and myreplicatedcache.
If the default cache is not available, create a new cache.
Set this cache name in the "CacheId" attribute of the following files:
- `\TaxiFarePrediction\TaxiFarePredictionConsoleApp\App.config`
- `\AddTripDataInCache\AddTripDataInCache\App.config`
This solution contains three projects:
- TaxiFarePrediction
- AddTripDataInCache
- DataStructures
Build the solution in a Windows environment and run the TaxiFarePrediction app first. Then run AddTripDataInCache to add continuous real-time data to the cache; this data is used to retrain the model.
TaxiFarePredictionConsoleApp adds the initial training data to the cache and uses it for the initial training of the ML.NET model.
AddTripDataInCache reads a chunk of data from the .csv data file. It then fetches from the cache the data on which the model was previously trained. It uses a sliding window to combine the previous data with the new data and updates the dataset in the cache. It also publishes a message to the specified topic so that the ML.NET application is notified that the model needs to be retrained.
The following explains how the different parts of this taxi fare prediction model work.
The app has two main users:
- The user responsible for training and retraining the model.
- The user who provides new taxi trip data.
The first user trains the model on the initial data. Whenever new data is added, this user is notified via pub/sub and the model is retrained.
The second user updates taxi trip data through a public API.
Here is how the app functions:
- The first user runs the console app, which connects to NCache and loads data.
- This user trains the machine learning model on the initial data.
- This user subscribes to a pub/sub topic to be notified whenever new data is added.
- The model is retrained on the new data and new predictions are made.
- The second user adds new taxi trip data to the cache.
- A message is published announcing that new data has been added to the cache.
Each part of the app relies on an NCache feature. The following describes how each feature is implemented and used:

Handling Pub/Sub in NCache can be done in the following way:
// create the pub/sub topic used to announce new trip data
_cache.MessagingService.CreateTopic(DataStructures.Constants.TopicName);
// fetch the topic handle from the cache
ITopic topic = _cache.MessagingService.GetTopic(DataStructures.Constants.TopicName);
// publish a message on the topic; DeliveryOption.All delivers it to every subscriber
topic.Publish(message, DeliveryOption.All);
// subscribe to the topic; _selfSubscribe is the callback invoked when a message arrives
_cache.MessagingService.GetTopic(DataStructures.Constants.TopicName).CreateSubscription(_selfSubscribe);
The previous data is retrieved from the cache. When new data is received, the oldest chunk of the cached data is discarded, the new data is appended to the remaining chunk, and the result is written back to the cache. Distributed lists are used to achieve this.
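The sliding-window update described above can be sketched with plain in-memory lists. This is a minimal illustration, not the project's code: the `SlidingWindow` class, the `Slide` method, and the fixed `windowSize` parameter are all hypothetical, and an ordinary `List<T>` stands in for the NCache distributed list.

```csharp
using System;
using System.Collections.Generic;

public static class SlidingWindow
{
    // Hypothetical helper: appends the incoming chunk to the previous data,
    // then drops the oldest rows so at most windowSize rows remain.
    public static List<T> Slide<T>(IReadOnlyList<T> previous, IReadOnlyList<T> incoming, int windowSize)
    {
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));

        var combined = new List<T>(previous.Count + incoming.Count);
        combined.AddRange(previous);
        combined.AddRange(incoming);

        // Number of rows that no longer fit in the window (oldest first).
        int overflow = Math.Max(0, combined.Count - windowSize);
        combined.RemoveRange(0, overflow);
        return combined;
    }

    public static void Main()
    {
        var cached = new List<int> { 1, 2, 3, 4, 5 }; // data the model was last trained on
        var chunk = new List<int> { 6, 7 };           // newly arrived rows

        List<int> updated = Slide(cached, chunk, windowSize: 5);
        Console.WriteLine(string.Join(", ", updated)); // prints "3, 4, 5, 6, 7"
    }
}
```

Keeping the window at a fixed size is one possible design choice: it bounds the retraining cost while still letting recent trips replace the oldest ones.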
A model must be trained before it can be used to make predictions. The trained model is saved as a .zip file and loaded when predictions are made.
Building the model pipeline revolves around two steps: first, loading the dataset from the cache, and second, transforming the data to suit the ML algorithm.
The model is trained with the specified ML algorithm and saved as a .zip file for later use. The pipeline is also saved as a .zip file.
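The two pipeline steps and the training step might look roughly like the sketch below. It rests on several assumptions: the `TaxiTrip` class and its columns are hypothetical, the file paths are supplied by the caller, and the trainer (`OnlineGradientDescent`) is chosen only because ML.NET can retrain it later; the project may use a different schema and algorithm. It requires the Microsoft.ML NuGet package.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical trip schema; the project's real columns may differ.
public class TaxiTrip
{
    public float PassengerCount { get; set; }
    public float TripDistance { get; set; }
    public float FareAmount { get; set; }
}

public static class InitialTraining
{
    public static void Train(IEnumerable<TaxiTrip> trips, string pipelinePath, string modelPath)
    {
        var mlContext = new MLContext(seed: 0);

        // Step 1: load the dataset. Any in-memory sequence works here,
        // for example a list that was read from the cache.
        IDataView data = mlContext.Data.LoadFromEnumerable(trips);

        // Step 2: transform the data into the Label/Features columns the trainer expects.
        var dataPrep = mlContext.Transforms.CopyColumns("Label", nameof(TaxiTrip.FareAmount))
            .Append(mlContext.Transforms.Concatenate("Features",
                nameof(TaxiTrip.PassengerCount), nameof(TaxiTrip.TripDistance)))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        ITransformer fittedPrep = dataPrep.Fit(data);
        IDataView transformed = fittedPrep.Transform(data);

        // Train on the transformed data (assumed trainer, see the note above).
        ITransformer model = mlContext.Regression.Trainers.OnlineGradientDescent().Fit(transformed);

        // Save the data preparation pipeline and the model as separate .zip files.
        mlContext.Model.Save(fittedPrep, data.Schema, pipelinePath);
        mlContext.Model.Save(model, transformed.Schema, modelPath);
    }
}
```

Saving the pipeline separately from the model means new data can later be transformed in exactly the same way without refitting the transforms.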
The model and pipeline are loaded from the .zip files, and the model is retrained according to the ML.NET retraining algorithm. The retrained model and the pipeline are then saved again for future use.
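A retraining step along these lines could be sketched as follows. It assumes the saved model is a linear regression model trained with `OnlineGradientDescent`, which is one of the trainers ML.NET can retrain from existing parameters; the `Retraining` class and the path parameters are hypothetical, and the project's actual trainer may differ. It requires the Microsoft.ML NuGet package.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class Retraining
{
    public static void Retrain(MLContext mlContext, IDataView newData, string pipelinePath, string modelPath)
    {
        // Load the previously saved data preparation pipeline and model.
        ITransformer dataPrep = mlContext.Model.Load(pipelinePath, out DataViewSchema pipelineSchema);
        ITransformer trainedModel = mlContext.Model.Load(modelPath, out DataViewSchema modelSchema);

        // Extract the learned parameters (assumes a linear regression model).
        var originalParameters =
            ((ISingleFeaturePredictionTransformer<object>)trainedModel).Model as LinearRegressionModelParameters;
        if (originalParameters == null)
            throw new InvalidOperationException("The saved model is not a linear regression model.");

        // Apply the same transforms to the new data, then continue training
        // from the original parameters instead of starting from scratch.
        IDataView transformedNewData = dataPrep.Transform(newData);
        ITransformer retrained = mlContext.Regression.Trainers
            .OnlineGradientDescent()
            .Fit(transformedNewData, originalParameters);

        // Save the pipeline and the retrained model again for the next round.
        mlContext.Model.Save(dataPrep, pipelineSchema, pipelinePath);
        mlContext.Model.Save(retrained, transformedNewData.Schema, modelPath);
    }
}
```

In this app, a method like this would be called from the pub/sub callback once the updated dataset has been read back from the cache.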
After every training or retraining of the model, single-value predictions are made on the transformed data to test the model's accuracy.
- Visual Studio 2019: used for creating the project and writing the code
The complete online documentation for NCache is available at: http://www.alachisoft.com/resources/docs/#ncache
The complete programmer's guide for NCache is available at: http://www.alachisoft.com/resources/docs/ncache/prog-guide/
Alachisoft [C] provides various sources of technical support.
- Please refer to http://www.alachisoft.com/support.html to select a support resource you find suitable for your issue.
- To request additional features in the future, or if you notice any discrepancy regarding this document, please drop an email to support@alachisoft.com.
[C] Copyright 2020 Alachisoft