Update limitations and improve wording

shubhaat · web-flow · commit 41e3e842040a · 2025-08-07T08:56:17.000-07:00
Rate limits documented separately
diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md
@@ -31,7 +31,7 @@ stack: preview 9.1
 serverless: preview
 ```
 
-ELSER on EIS enables you to use the ELSER model without using ML nodes in your infrastructure and with that, it simplifies the semantic search and hybrid search experience.
+ELSER on EIS enables you to use the ELSER model on GPUs without having to manage your own ML nodes. We expect better throughput and latency than with ML nodes, and we will continue to benchmark, remove limitations, and address concerns as we move toward General Availability.
 
 ### Private preview access
 
@@ -40,10 +40,6 @@ Private preview access is available by submitting the form provided [here](https
 ### Limitations
 
 While we do encourage experimentation, we do not recommend implementing production use cases on top of this feature while it is in Technical Preview.
-The known limitations include 
-- Maximum batch size is 16 for ingest requests
-- Rate limit for search and ingest at 2000 tokens per minute
-- We do not support autoscaling at this point, so many parallel requests will result in performance degradations. Autoscaling coming soon. 
 
 #### Access
 
@@ -65,3 +61,6 @@ Performance may vary during the Technical Preview.
 
 Batches are limited to a maximum of 16 documents.
 This is particularly relevant when using the [_bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/v9/operation/operation-bulk) for data ingestion.
+
+#### Rate Limits
+The rate limit for search and ingest is currently 2000 tokens per minute.