closes #98, closes #99, closes #101

JPHaus · JPHaus · commit b83032f01125 · 2025-07-06T22:45:58.000-05:00
diff --git a/Concepts/Data Ingestion/Data Ingestion.md b/Concepts/Data Ingestion/Data Ingestion.md
@@ -28,6 +28,45 @@ Common data sources include:
 
 ### 2. Ingestion Patterns
 
+#### Extract, Transform, Load (ETL)
+
+ETL is a traditional ingestion pattern in which data is extracted from a source, transformed during the ingestion process (before it reaches the destination), and then loaded into the destination.
+
+```mermaid
+%%{init: { "flowchart": { "useMaxWidth": true } } }%%
+graph LR
+    A[Data Source] 
+    B[Extract]
+    C[Transform<br/>Data Validation<br/>Business Rules<br/>Cleaning]
+    D[Load]
+    E[(Data Warehouse)]
+    
+    A -->|Raw data| B
+    B -->|Extracted data| C
+    C -->|Clean data| D
+    D -->|Structured data| E
+```
+
+#### Extract, Load, Transform (ELT)
+
+ELT is a more modern ingestion pattern in which raw data is extracted and loaded directly into the destination, then transformed within the destination system. ELT has become the more popular pattern largely because storage is cheap and keeping the raw data leaves more flexibility for future data use cases.
+
+```mermaid
+%%{init: { "flowchart": { "useMaxWidth": true } } }%%
+graph LR
+    A[Data Source] 
+    B[Extract]
+    C[Load]
+    D[Transform<br/>In destination]
+    E[(Data Warehouse/Lake)]
+    
+    A -->|Raw data| B
+    B -->|Extracted data| C
+    C -->|Raw data| E
+    E -->|Stored data| D
+    D -->|Transformed data| E
+```
+
 #### [[Batch Data Processing|Batch Ingestion]]
 
 Data is collected and processed in discrete chunks at scheduled intervals.
@@ -195,25 +234,7 @@ graph LR
     D -->|Load processed data| E
 ```
 
-## Common Data Ingestion Challenges
-
-### Scalability
-
-- Volume Growth: Handling increasing data volumes
-- Source System Impact: Minimizing load on operational systems
-- Resource Management: Efficiently using compute and storage resources
-
-### Reliability
-
-- Source System Downtime: Handling unavailable data sources
-- Network Issues: Managing connectivity problems
-- Data Consistency: Ensuring data integrity across systems
-
-### Complexity
 
-- Schema Evolution: Handling changes in source data structures
-- Multiple Sources: Managing diverse data sources and formats
-- Dependency Management: Coordinating ingestion across related datasets
 
 %% wiki footer: Please don't edit anything below this line %%
 
diff --git a/Concepts/Data Management/Data Management.md b/Concepts/Data Management/Data Management.md
@@ -0,0 +1,40 @@
+---
+Aliases:
+  - Concepts/Data Management
+Tags:
+  - seedling
+publish: true
+---
+
+Data Management is the practice of collecting, organizing, protecting, and storing data in a way that enables efficient access, analysis, and decision-making throughout its entire lifecycle. It encompasses the policies, procedures, and technologies used to ensure data is accurate, available, secure, and compliant with regulations while meeting business requirements.
+
+## Data Management Components
+
+### 1. [[Data Governance]]
+
+Data Governance establishes the policies, procedures, and standards for managing data across an organization.
+
+### 2. [[Data Quality Management]]
+
+Data quality management ensures that data is accurate, complete, consistent, and fit for its intended use.
+
+#placeholder 
+
+### 3. [[Data Catalog]]
+
+Data cataloging creates a centralized inventory of data assets with metadata to improve discoverability and understanding.
+
+### 4. [[Data Security]]
+
+Data security protects data from unauthorized access, corruption, and theft throughout its lifecycle.
+
+#placeholder 
+
+%% wiki footer: Please don't edit anything below this line %%
+
+## This note in GitHub
+
+<span class="git-footer">[Edit In GitHub](https://github.dev/data-engineering-community/data-engineering-wiki/blob/main/Concepts/Data%20Management/Data%20Management.md "git-hub-edit-note") | [Copy this note](https://raw.githubusercontent.com/data-engineering-community/data-engineering-wiki/main/Concepts/Data%20Management/Data%20Management.md "git-hub-copy-note")</span>
+
+<span class="git-footer">Was this page helpful?
+[👍](https://tally.so/r/mOaxjk?rating=Yes&url=https://dataengineering.wiki/Concepts/Data%20Management/Data%20Management) or [👎](https://tally.so/r/mOaxjk?rating=No&url=https://dataengineering.wiki/Concepts/Data%20Management/Data%20Management)</span>
diff --git a/Concepts/Data Processing/Data Processing.md b/Concepts/Data Processing/Data Processing.md
@@ -0,0 +1,40 @@
+---
+Aliases:
+  - Concepts/Data Processing
+Tags:
+  - seedling
+publish: true
+---
+
+Data Processing is the act of transforming raw data into meaningful, actionable information. It involves collecting, manipulating, filtering, sorting, and analyzing data to extract insights, support decision-making, and enable business operations. Data processing focuses on what happens to data after it has been ingested into your systems.
+
+## Data Processing Components
+
+### 1. Processing Systems
+
+- [[Online Transaction Processing|OLTP (Online Transaction Processing)]]
+- [[Online Analytical Processing|OLAP (Online Analytical Processing)]]
+- [[Hybrid Transactional Analytical Processing|HTAP (Hybrid Transactional Analytical Processing)]]
+
+### 2. Processing Execution Models
+
+- [[Batch Data Processing|Batch Processing]]
+- [[Stream Data Processing|Stream Processing]]
+- [[Micro-batch Processing]]
+
+### 3. [[Workflow Orchestration]]
+
+Workflow orchestration schedules and coordinates processing jobs, including the dependencies between them.
+
+### 4. Processing Architectures
+
+![[Data Architecture#Popular Data Architecture Patterns]]
+
+%% wiki footer: Please don't edit anything below this line %%
+
+## This note in GitHub
+
+<span class="git-footer">[Edit In GitHub](https://github.dev/data-engineering-community/data-engineering-wiki/blob/main/Concepts/Data%20Processing/Data%20Processing.md "git-hub-edit-note") | [Copy this note](https://raw.githubusercontent.com/data-engineering-community/data-engineering-wiki/main/Concepts/Data%20Processing/Data%20Processing.md "git-hub-copy-note")</span>
+
+<span class="git-footer">Was this page helpful?
+[👍](https://tally.so/r/mOaxjk?rating=Yes&url=https://dataengineering.wiki/Concepts/Data%20Processing/Data%20Processing) or [👎](https://tally.so/r/mOaxjk?rating=No&url=https://dataengineering.wiki/Concepts/Data%20Processing/Data%20Processing)</span>
diff --git a/Concepts/Data Storage/Data Storage.md b/Concepts/Data Storage/Data Storage.md
@@ -0,0 +1,55 @@
+---
+Aliases: [Concepts/Data Storage]
+Tags: [incubating]
+publish: true
+---
+
+This page gives an overview of the technologies and systems used to store and retrieve data in various formats and structures. Modern data storage can be broadly divided into two categories: **Databases** (managed storage with built-in compute) and **Object Storage** (raw storage that requires external compute).
+
+## 1. Databases (Storage + Compute)
+
+[[Database|Databases]] provide both storage and built-in compute capabilities with structured query interfaces.
+
+### [[Relational Database]]
+
+A relational database is a traditional structured data store that organizes data into tables of rows and columns and typically provides ACID transaction guarantees.
+
+### Non-Relational (NoSQL) Databases
+
+NoSQL databases store data in flexible formats such as documents, key-value pairs, graphs, or wide columns, enabling horizontal scalability and schema-less design for diverse data types.
+
+![[Non-relational Database#Types of Non-relational Databases]]
+
+## 2. [[Object/Blob Storage]]
+
+Object storage provides raw data persistence without built-in compute, so an external processing engine is required to query or transform the stored objects.
+
+```mermaid
+%%{init: { "flowchart": { "useMaxWidth": true } } }%%
+graph TB
+    A[Applications] 
+    B[Object Storage API]
+    
+    subgraph "Object Storage"
+        C[Bucket/Container]
+        D[Objects/Files]
+        E[Metadata]
+    end
+    
+    A -->|PUT/GET/DELETE| B
+    B --> C
+    C --> D
+    C --> E
+    
+    F[External Compute] -->|Process files| D
+```
+See the **data stores** category for examples and popular tools.
+
+%% wiki footer: Please don't edit anything below this line %%
+
+## This note in GitHub
+
+<span class="git-footer">[Edit In GitHub](https://github.dev/data-engineering-community/data-engineering-wiki/blob/main/Concepts/Data%20Storage/Data%20Storage.md "git-hub-edit-note") | [Copy this note](https://raw.githubusercontent.com/data-engineering-community/data-engineering-wiki/main/Concepts/Data%20Storage/Data%20Storage.md "git-hub-copy-note")</span>
+
+<span class="git-footer">Was this page helpful?
+[👍](https://tally.so/r/mOaxjk?rating=Yes&url=https://dataengineering.wiki/Concepts/Data%20Storage/Data%20Storage) or [👎](https://tally.so/r/mOaxjk?rating=No&url=https://dataengineering.wiki/Concepts/Data%20Storage/Data%20Storage)</span>