Commit ef85c12

merge master to stage branch
2 parents ad88c6a + 1b387ae commit ef85c12

File tree

258 files changed

+28987
-467
lines changed


content/assets/favicon.png (mode changed 100755 → 100644, 44.6 KB)

content/assets/favicon1.png (1.37 KB)
Lines changed: 96 additions & 0 deletions
---
title: 3 ways a data fabric enables a data-first approach
date: 2022-03-15T10:07:10.175Z
author: Ted Dunning & Ellen Friedman
authorimage: /img/Avatar1.svg
disable: false
tags:
- hpe-ezmeral-data-fabric
---
**Editor’s note: This article was originally posted on HPE Enterprise.nxt on March 15, 2022**

- - -

A well-engineered modern data fabric allows DevOps and other teams to access data in the way they prefer.

A data-first enterprise is a big advantage, but this strategy also puts a lot of demands on your data technology. That's all right unless your data technology puts a lot of demands on you.

Take modern cars. Vehicles can now easily contain over 100 computers to manage functions like adaptive cruise control, stability control, and anti-lock braking. These systems make cars much more complicated internally than decades ago, but they are much easier and safer to drive because of them.

Similarly, modern data technology needs to make it easier for users and system administrators of large-scale systems to work with data in more sophisticated and varied ways and at more locations, as is the case in a data-first enterprise. What does your data infrastructure need to do to help rather than be a hindrance?

### How does a data fabric meet the demands of a data-first approach?

A data fabric is a highly scalable data infrastructure designed to store, manage, and move data as a unifying layer across an enterprise from edge to data center, on premises or in the cloud.

In a data-first environment, data is treated as a foundational resource, one that is not used up when it is accessed. The capabilities of your data infrastructure should support the reuse of data and its use by multiple applications. For example, the HPE Ezmeral Data Fabric File and Object Store software supports data reuse and data sharing by not requiring specialized data access methods. Off-the-shelf and custom applications can directly access data stored in the data fabric.

Here's a sample of some of the many ways a modern data fabric has a positive impact in a data-first approach.

### To move or not to move: Data where it needs to be

Data motion is a key issue in large-scale data systems; it can include motion within a cluster and between clusters. Making wrong assumptions about data motion is one of the most common ways businesses inadvertently give up their ability to extract value from data.

At one extreme, people may have an ingrained assumption that data motion is not a viable option, based on legacy systems that lack any provision for moving data. Without motion, data that could have value if put into a more global context may be discarded instead.

![Block text](/img/3waysadatafabric-enablesadatafirstapproach-quote.png "Block text")

At other companies, the pendulum has swung radically to the opposite extreme, with a policy that all data to be analyzed must be moved to a central data center, either on premises or in the cloud. Unfortunately, the costs of data motion mount up, and where large amounts of data are at issue, only a tiny fraction of all possible data will be moved. Once again, data you could analyze is simply discarded.

Fortunately, a new middle ground is emerging. In telecommunications, finance, media, manufacturing, and other business sectors, far more data is collected than could possibly be moved back to headquarters. Data is partially processed in situ before extracts and summaries are moved to a central location or a regional sub-center for further analysis on aggregates from many edge sources. This edge processing strategy commonly uses pattern recognition to pull out the interesting bits or anomalies for transfer.

There are many reasons to move data, including communication between data centers or to a secondary cluster as part of a disaster recovery plan. The key to success is choice: You should be able to efficiently move data as needed or store and analyze it in place, all within the same system. Data motion should not have to be coded into each application.

Taking the example of HPE Ezmeral Data Fabric, selective data motion can be configured rather than coded, and the data fabric moves the data invisibly. The data fabric even builds in remote access via a global namespace in case you need it.
### Decouple the query engine from data storage

The term database conjures up images of big iron running a system like Postgres or Oracle or a data warehouse like Teradata. All of the classical databases had this in common: The software that handles the storage of data is tightly integrated with the software that optimizes and executes queries.

Another common element of such database systems is that when it comes to processing the data in the database, it is their way or the highway. You could submit queries from practically any programming language, but you can't do what SQL won't do. For applications such as machine learning, a SQL database just isn't a good fit except for data extraction. Even then, severe scale limitations are common with high-end databases.

The situation is changing. The trend now is to separate the query and storage engines into independent parts. The functional independence of query and storage isn't entirely new, but the idea that a SQL query engine should mostly query data stored in ordinary files is a big change.

The practical impact of this separation is that you can reuse data for a completely different purpose than originally intended. If your original purpose was to clear payments and produce statements and bills for tens of millions of credit card accounts, then a SQL query engine like Presto might be just the ticket.

However, in a data-driven enterprise, the real value from data doesn't usually come from collecting entirely new data. Instead, it comes from reusing or combining existing data in new ways, often with new tools. Mixing and matching query engines on the same data strikes gold. Locking data up in a monolithic database is just the opposite.

For example, while recently working on an open source project, one of the authors (Dunning) built some geospatial processing that quickly outgrew the relational database in use. Python and Parquet files worked great for initial extraction and cleaning, but indexing and sorting the historical data involved billions of geohash (a public-domain geocode system) operations.

Storing that data in a data fabric allowed a seamless transition to distributed processing steps in [the Julia programming language](https://julialang.org/) that ran 100 times faster and could scale more easily. Keeping simple tasks simple is a big win for data fabric in these kinds of systems.
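For readers unfamiliar with geohashes: each one encodes a latitude/longitude pair as a short base-32 string by repeatedly bisecting the longitude and latitude ranges. A minimal encoder in Python makes the idea concrete (this is an illustrative sketch, not the code from the project described above):

```python
# Geohash base-32 alphabet (digits plus letters, excluding a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    bits = []
    even = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng = lon_rng if even else lat_rng
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:          # upper half: emit 1, raise the floor
            bits.append(1)
            rng[0] = mid
        else:                   # lower half: emit 0, lower the ceiling
            bits.append(0)
            rng[1] = mid
        even = not even
    # Pack each group of 5 bits into one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, 2))  # "u4"
```

Because nearby points share long geohash prefixes, sorting billions of these strings clusters spatially adjacent records, which is exactly the kind of bulk operation that outgrows a relational database.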
### Object storage vs. files

One change that characterizes a data-first enterprise is that architectural control moves much closer to the developers and data scientists in the line of business. Previously, much of that control was in the technology group in IT. This change is, in fact, the core driving force behind the DevOps movement. A major consequence of this shift has been the commoditization of IT services, an approach taken to extremes by the public cloud vendors.

An unforeseen (but obvious in hindsight) side effect of this shift is a divergence between how DevOps teams view data infrastructure and how IT teams view it. The DevOps point of view is all about simplicity and flexibility, while the IT view has always been about optimization of provisioning combined with centralized control.

Pushing for simplicity and flexibility drives a preference for data access with as little participation by the operating system as possible and certainly no special OS privileges. These constraints may put something as simple as mounting a file system out of bounds. Such limits make object storage systems very attractive, since objects are accessed directly using simple protocols like HTTP instead of asking the OS to fetch file data from some file store. All you need is network access. On the other hand, performance isn't usually very high, and objects don't work like files, so compatibility suffers.

Prioritizing optimization, in contrast, leads to highly manageable systems like storage appliances that provide block storage to blade servers. In this model, just enough storage is allocated to applications that primarily use non-distributed file systems for data. The operating system kernel mounts these file systems and controls all access. That's fine for some things, but it hurts scalability and makes DevOps harder.

The fact is, recent technology makes both goals achievable. If a modern data fabric is engineered to allow access to data as either files or objects, this flexibility frees DevOps teams to access data in the way they prefer. In addition, a data fabric can have built-in capabilities for data management and data motion. These capabilities make it much easier for IT teams to manage the overall system at scale.
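The file-versus-object distinction is ultimately about two access paths to the same bytes. The sketch below simulates it with Python's standard library: one "client" reads a file through the OS, and another reads the same content over HTTP the way an object-store client would (a local web server stands in for a real object endpoint; this is an analogy, not data fabric code):

```python
import http.server
import os
import tempfile
import threading
import urllib.request
from functools import partial

# Stand-in for data written once into a shared store.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "readings.csv")
with open(path, "wb") as f:
    f.write(b"sensor,value\na,42\n")

# A local HTTP server plays the role of an object endpoint.
handler = partial(http.server.SimpleHTTPRequestHandler, directory=tmpdir)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# File-style access: the OS mediates the read through a mounted path.
with open(path, "rb") as f:
    via_file = f.read()

# Object-style access: a plain HTTP GET, no mount or special privileges.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/readings.csv") as resp:
    via_object = resp.read()

server.shutdown()
assert via_file == via_object  # same bytes, two access protocols
```

A data fabric that exposes both interfaces over one store gives DevOps teams the HTTP path and keeps the mounted-file path for everything that expects POSIX semantics.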
### Data fabric for data-first

Having the right data infrastructure lets you focus on the decisions that will make your organization a data-first enterprise. Whether it is choosing the right level of data motion, using multiple tools to analyze the same data, or storing data as objects or files, your data infrastructure needs to provide a host of advanced capabilities but still be easy enough to drive. A modern data fabric does just that.

### Lessons for leaders

* A data fabric is a highly scalable data infrastructure designed to store, manage, and move data as a unifying layer across an enterprise from edge to data center, on premises or in the cloud.
* A well-engineered modern data fabric allows DevOps and other teams to access data in the way they prefer.
* Making wrong assumptions about data motion is one of the most common ways businesses inadvertently give up their ability to extract value from data.

<br />

> <span style="color:grey; font-family:Arial; font-size:1em">This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.</span>

<br />

<u>**About the authors:**</u>

Ted Dunning is chief technologist for data fabric at HPE. He has a PhD in computer science and has authored more than 10 books focused on data science. He has more than 25 patents in advanced computing and plays the mandolin and guitar, both poorly.

Ellen Friedman is a principal technologist at Hewlett Packard Enterprise focused on large-scale data analytics and machine learning. Ellen worked at MapR Technologies prior to her current role at HPE. She was a committer for the Apache Drill and Apache Mahout open source projects and is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.
Lines changed: 15 additions & 0 deletions
---
title: Announcing Chapel 1.30.0!
date: 2023-03-24T00:29:50.615Z
priority: 2
externalLink: https://chapel-lang.org/blog/posts/announcing-chapel-1.30/
author: Brad Chamberlain
authorimage: https://chapel-lang.org/blog/authors/brad-chamberlain/photo.jpg
disable: false
tags:
- chapel
- opensource
- GPU Programming
- HPC
---
External blog
Lines changed: 15 additions & 0 deletions
---
title: Announcing Chapel 1.31!
date: 2023-06-22T23:45:57.101Z
externalLink: https://chapel-lang.org/blog/posts/announcing-chapel-1.31/
author: Brad Chamberlain
authorimage: https://chapel-lang.org/blog/authors/brad-chamberlain/photo.jpg
disable: false
tags:
- opensource
- GPU Programming
- HPC
- chapel
- LLVM
---
External blog
Lines changed: 46 additions & 0 deletions
---
title: Announcing HPE Swarm Learning 2.0.0
date: 2023-06-12T16:48:53.923Z
featuredBlog: true
author: HPE Swarm Learning Team
authorimage: /img/Avatar1.svg
disable: false
tags:
- swarm-learning
---
We’re excited to announce the HPE Swarm Learning 2.0.0 community release!

In previous versions of HPE Swarm Learning, if the sentinel Swarm Network (SN) node went down during Swarm training, the training process stopped and there was no way to resume it. This release addresses the issue by implementing a mesh topology (connectivity) between SNs, replacing the previous star topology in which only the sentinel SN was connected to the other SNs.

We also now support multiple blockchain miners instead of just the one miner in the sentinel SN. Even if the initial sentinel SN goes down, training continues uninterrupted because the other SNs also function as miners. Additionally, when the initial sentinel SN is down and a new SN wants to join the network, it can seamlessly join the Swarm network with the help of any other SN node. This **high availability configuration** ensures improved resilience and robustness of HPE Swarm Learning.
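The resilience argument for a mesh is easy to see in miniature. The sketch below (an illustration of the topology idea only, not HPE Swarm Learning code) checks whether the surviving nodes can still reach each other after the sentinel fails:

```python
from collections import deque

def still_connected(nodes, edges, failed):
    """Return True if all nodes surviving `failed` can still reach each other."""
    alive = [n for n in nodes if n != failed]
    graph = {n: set() for n in alive}
    for a, b in edges:
        if a != failed and b != failed:
            graph[a].add(b)
            graph[b].add(a)
    # Breadth-first search from one survivor.
    seen, queue = {alive[0]}, deque([alive[0]])
    while queue:
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == set(alive)

nodes = ["sentinel", "sn1", "sn2", "sn3"]

# Star: every SN talks only to the sentinel; losing it partitions the rest.
star = [("sentinel", "sn1"), ("sentinel", "sn2"), ("sentinel", "sn3")]

# Mesh: every SN talks to every other; losing the sentinel changes nothing.
mesh = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]

print(still_connected(nodes, star, failed="sentinel"))  # False
print(still_connected(nodes, mesh, failed="sentinel"))  # True
```

In the star, the sentinel is a single point of failure; in the mesh, any surviving SN can relay for the others, which is what lets training and node joins continue.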
In the HPE Swarm Learning sync stage (defined by the sync frequency), when it is time to share the learning from the individual models, one of the Swarm Learning (SL) nodes is designated the “leader” node. The leader collects the individual models from each peer node and merges them into a single model by combining the parameters of all the individuals. The **Leader Failure Detection and Recovery (LFDR)** feature enables SL nodes to continue Swarm training when the SL leader node fails during the merging process: a new SL leader node is selected to continue the merge. If the failed SL leader comes back after the new leader is in action, it is treated as a normal SL node and contributes its learning to the swarm global model.
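One simple way to realize this kind of recovery is for the survivors to agree deterministically on a replacement, for example the lowest-ranked live node. The sketch below illustrates that generic pattern; it is an assumption for illustration, not the election algorithm HPE Swarm Learning actually uses:

```python
def elect_leader(live_nodes, current=None):
    """Keep the current leader if it is alive; otherwise pick the
    lowest-id survivor so every node reaches the same decision."""
    if current in live_nodes:
        return current
    if not live_nodes:
        raise RuntimeError("no live nodes to lead the merge")
    return min(live_nodes)

peers = {"sl-1", "sl-2", "sl-3", "sl-4"}
leader = elect_leader(peers)                  # "sl-1" leads the merge

# Leader failure is detected mid-merge: the survivors re-elect.
peers.discard(leader)
leader = elect_leader(peers, current=leader)  # "sl-2" takes over

# The stale leader recovers and rejoins as an ordinary peer;
# the in-flight leadership is not disturbed.
peers.add("sl-1")
leader = elect_leader(peers, current=leader)
print(leader)  # still "sl-2"
```

The key properties any such scheme needs are the ones LFDR describes: every survivor converges on the same new leader, and a recovered stale leader rejoins as a normal contributor rather than reclaiming the merge.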
With the HPE Swarm Learning v2.0.0 release, a user can now extend a Swarm client to support other machine learning platforms as well. Currently, the Swarm client supports machine learning platforms like PyTorch and Keras (based on TensorFlow 2 in the backend). Please find the instructions for extending the Swarm client [here](https://github.com/HewlettPackard/swarm-learning/blob/master/lib/src/README.md).

#### **The 2.0.0 release contains the following updates:**

* High availability for SN

  * Handles sentinel node failure
  * Ensures any SN node can act as sentinel while adding a new node
  * Supports a mesh topology for the SN network
* High availability for the SL leader

  * Elects a new merge leader when a leader failure is detected
  * Handles stale leader recovery
* Swarm Learning Management UI (SLM-UI)

  * Supports Swarm product installation through the SLM-UI
  * Deploys and manages Swarm Learning through the SLM-UI
* Swarm client library

  * Extends Swarm Learning to new ML platforms
* Improved diagnostics and a utility script for log collection

#### For complete details on this new release, please refer to the following resources:

* [HPE Swarm Learning home page](https://github.com/HewlettPackard/swarm-learning)
* [HPE Swarm Learning client readme](https://github.com/HewlettPackard/swarm-learning/blob/master/lib/src/README.md)

For any questions, start a discussion in our [\#hpe-swarm-learning](https://hpedev.slack.com/archives/C04A5DK9TUK) Slack channel on the [HPE Developer Slack Workspace](https://slack.hpedev.io/).
