Merge pull request #1103 from steam-bell-92/main

sanjay-kv · web-flow · commit b00b72f47f4f · 2025-10-29T11:02:11.000+11:00
[Docs]: Python Pandas Library Added
diff --git a/docs/Pandas/pd_data_analysis.md b/docs/Pandas/pd_data_analysis.md
@@ -0,0 +1,83 @@
+# Basic Data Analysis
+
+Once you have loaded your data into a DataFrame, Pandas offers simple and powerful methods for quickly exploring and summarizing it, which is where most data science workflows begin.
+
+1. **Inspecting the Data**
+
+Before performing any analysis, you must first understand the structure and quality of your dataset.
+This step helps identify data types, missing values, and potential anomalies.
+
+|Method|Description|
+|:-----|:----------|
+|`df.head()`|Displays the first n rows (default 5) for a quick look at the data.|
+|`df.tail()`|Displays the last n rows (default 5).|
+|`df.info()`|Shows column data types, non-null counts, and memory usage.|
+|`df.describe()`|Generates summary statistics for numeric columns.|
+|`df.shape`|Returns a tuple (rows, columns).|
+|`df.dtypes`|Displays data types of all columns.|
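+
+As a minimal sketch of these inspection calls, the snippet below builds a small made-up DataFrame and looks at its shape, types, and summary statistics. The column names and values are illustrative assumptions, not taken from any real dataset.
+
+```Python
+import pandas as pd
+
+# Hypothetical sample data, used only for illustration
+df = pd.DataFrame({
+    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
+    'age': [25, 30, 22, 41],
+    'score': [88.5, 92.0, None, 79.5],
+})
+
+print(df.head(2))     # first two rows
+print(df.shape)       # (rows, columns)
+print(df.dtypes)      # data type of each column
+df.info()             # prints dtypes, non-null counts, and memory usage
+print(df.describe())  # summary statistics for the numeric columns
+```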
+
+
+2. **Handling Missing Data (NaN)**
+
+Real-world data often has missing or incomplete entries.
+Handling them correctly is essential to avoid biased or invalid results.
+
+|Method|Description|
+|:-----|:----------|
+|`df.isnull().sum()`|Counts missing (NaN) values per column.|
+|`df.dropna()`|Removes rows that contain at least one missing value.|
+|`df.fillna(value)`|Fills missing values with a specific value.|
+|`df.fillna(df.mean(numeric_only=True))`|Fills missing values in numeric columns with each column's mean.|
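+
+The methods above might be combined roughly as follows. This is a sketch on invented data: the `city` and `temp` columns and the `'Unknown'` placeholder are assumptions chosen for the example. Each call returns a new object and leaves `df` unchanged.
+
+```Python
+import pandas as pd
+
+# Hypothetical data with one gap in each column
+df = pd.DataFrame({
+    'city': ['Paris', 'London', None],
+    'temp': [21.0, None, 18.0],
+})
+
+missing_per_column = df.isnull().sum()   # one count per column
+rows_without_gaps = df.dropna()          # keeps only fully populated rows
+
+# A dict gives a separate fill value for each column
+filled = df.fillna({'city': 'Unknown', 'temp': df['temp'].mean()})
+
+print(missing_per_column)
+print(rows_without_gaps)
+print(filled)
+```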
+
+
+3. **Data Selection and Filtering**
+
+Once the data is clean, you often need to focus on specific rows or columns to analyze relevant subsets.
+
+|Method|Description|
+|:-----|:----------|
+|`df['col']`|Selects a single column (returns a Series).|
+|`df[['col1','col2']]`|Selects multiple columns.|
+|`df.loc[row_labels, col_labels]`|Selects by label (rows and columns).|
+|`df.iloc[row_index, col_index]`|Selects by integer index position.|
+|`df[df['col'] > value]`|Filters rows based on a condition.|
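+
+To make the difference between label-based and position-based selection concrete, here is a small sketch. The data and the string row labels `'a'`, `'b'`, `'c'` are assumptions for illustration; they are used so that `loc` and `iloc` are visibly different.
+
+```Python
+import pandas as pd
+
+df = pd.DataFrame(
+    {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 22]},
+    index=['a', 'b', 'c'],   # hypothetical row labels
+)
+
+ages = df['age']                      # single column -> Series
+subset = df[['name', 'age']]          # list of columns -> DataFrame
+by_label = df.loc['b', 'name']        # row label 'b', column 'name'
+by_position = df.iloc[1, 0]           # second row, first column
+adults_over_24 = df[df['age'] > 24]   # boolean mask keeps matching rows
+
+print(by_label, by_position)
+print(adults_over_24)
+```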
+
+
+4. **Grouping and Aggregation**
+
+After filtering, you often need to summarize or compare groups within your data.
+
+|Method|Description|
+|:-----|:----------|
+|`df.groupby('col').agg(func)`|Groups rows by the specified column, then applies an aggregate function (e.g., `'mean'`, `'sum'`, `'count'`). `agg()` needs at least one function.|
+|`df.describe()`|Generates descriptive statistics (mean, std, min, max, etc.) for numerical columns.|
+|`df['col'].value_counts()`|Counts the frequency of unique values in a column.|
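+
+A grouped summary could look like the sketch below. The `dept` and `salary` columns are invented for the example; passing a list of function names to `agg` produces one result column per function.
+
+```Python
+import pandas as pd
+
+# Hypothetical data: two departments with a few salaries each
+df = pd.DataFrame({
+    'dept': ['HR', 'IT', 'IT', 'HR', 'IT'],
+    'salary': [50, 70, 90, 60, 80],
+})
+
+# One row per department, one column per aggregate
+summary = df.groupby('dept')['salary'].agg(['mean', 'sum', 'count'])
+
+# How often each department appears
+counts = df['dept'].value_counts()
+
+print(summary)
+print(counts)
+```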
+
+
+5. **Data Transformation & Cleaning**
+
+Data transformation involves reshaping, reformatting, or correcting data to make it more consistent and analysis-ready.
+
+|Method|Description|
+|:-----|:----------|
+|`df.rename(columns={'old':'new'})`|Renames columns.|
+|`df.drop(columns=['col'])`|Removes one or more columns.|
+|`df.replace(old, new)`|Replaces specific values.|
+|`df['col'].astype('type')`|Changes the data type of a column (`df.astype('type')` converts every column).|
+|`df.sort_values(by='col')`|Sorts rows by column values.|
+|`df.reset_index(drop=True)`|Resets the DataFrame index.|
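+
+Because each of these methods returns a new DataFrame, they can be chained. The following is one possible sketch on made-up data; the column names and the `'y'`/`'n'` codes are assumptions for illustration only.
+
+```Python
+import pandas as pd
+
+# Hypothetical raw data: numbers stored as text, a coded flag, an unused column
+df = pd.DataFrame({
+    'old': ['3', '1', '2'],
+    'flag': ['y', 'n', 'y'],
+    'unused': [0, 0, 0],
+})
+
+cleaned = (
+    df.rename(columns={'old': 'value'})                # rename a column
+      .drop(columns=['unused'])                        # remove a column
+      .replace({'flag': {'y': 'yes', 'n': 'no'}})      # replace values in 'flag' only
+      .astype({'value': 'int64'})                      # text -> integers for one column
+      .sort_values(by='value')                         # order rows by 'value'
+      .reset_index(drop=True)                          # renumber rows 0, 1, 2
+)
+
+print(cleaned)
+```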
+
+
+6. **Quick Statistics**
+
+Once the data is ready, you can compute summary statistics to get insights about its distribution and relationships.
+
+|Method|Description|
+|:-----|:----------|
+|`df.mean(numeric_only=True)`|Computes the mean (average) of each numeric column.|
+|`df.std(numeric_only=True)`|Computes the standard deviation of each numeric column.|
+|`df.min()`|Returns the minimum value for each column.|
+|`df.max()`|Returns the maximum value for each column.|
+|`df.median(numeric_only=True)`|Computes the median (50th percentile) of each numeric column.|
+|`df.corr(numeric_only=True)`|Computes pairwise correlation between numeric columns.|
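+
+As a rough illustration, the sketch below computes a few of these statistics on invented data. It passes `numeric_only=True` so that the text column `label` (an assumption added for the example) is skipped; recent Pandas versions may raise an error for non-numeric columns otherwise.
+
+```Python
+import pandas as pd
+
+# Hypothetical data: one text column and two perfectly correlated numeric columns
+df = pd.DataFrame({
+    'label': ['a', 'b', 'c', 'd'],
+    'x': [1, 2, 3, 4],
+    'y': [2, 4, 6, 8],
+})
+
+means = df.mean(numeric_only=True)
+medians = df.median(numeric_only=True)
+corr = df.corr(numeric_only=True)
+
+print(means)
+print(medians)
+print(corr)
+```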
+ No newline at end of file
diff --git a/docs/Pandas/pd_dataframes.md b/docs/Pandas/pd_dataframes.md
@@ -0,0 +1,70 @@
+# Key Data Structures: Series and DataFrame
+
+Pandas introduces two primary data structures: the Series and the DataFrame. Understanding these is crucial, as they form the basis of nearly all operations in the library.
+
+## The Series (1D)
+
+A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.). You can think of a Series as a single column in a spreadsheet or a single vector in a dataset.
+
+***Key components***:
+
+**Data**: The actual values stored.
+
+**Index** (Label): The labels used to access the data.
+
+Creating a Series
+
+```Python
+import pandas as pd
+
+# Creating a Series from a list
+data = [10, 20, 30, 40]
+s = pd.Series(data, name='Example_Series')
+print(s)
+```
+
+Output:
+```text
+0    10    <-- Index (Default integer)
+1    20
+2    30
+3    40
+Name: Example_Series, dtype: int64
+```
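+
+The index does not have to be the default integers. As a small sketch, the Series below uses string labels (`'a'`, `'b'`, `'c'` are arbitrary choices for illustration) and is read both by label and by position.
+
+```Python
+import pandas as pd
+
+# A Series with explicit, hypothetical labels instead of 0, 1, 2
+s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='Example_Series')
+
+print(s['b'])          # access by label
+print(s.iloc[0])       # access by position
+print(list(s.index))   # the labels themselves
+```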
+
+## The DataFrame (2D)
+
+A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It is the most common object you will work with in Pandas and is analogous to a complete spreadsheet or a table in a database.
+
+***Key components***:
+
+**Data**: The actual values arranged in rows and columns.
+
+**Rows Index**: Labels for each row.
+
+**Column Index**: Labels for each column (the column names).
+
+Creating a DataFrame
+
+The most common way to create a DataFrame is from a Python dictionary, where the keys become the column names.
+
+```Python
+# Creating a DataFrame from a dictionary
+data = {
+    'Name': ['Alice', 'Bob', 'Charlie'],
+    'Age': [25, 30, 22],
+    'City': ['New York', 'London', 'Paris']
+}
+
+df = pd.DataFrame(data)
+
+print(df)
+```
+
+Output:
+```text
+      Name  Age      City  <-- Column names (column index)
+0    Alice   25  New York  <-- Row index is the leftmost 0, 1, 2
+1      Bob   30    London
+2  Charlie   22     Paris
+```
diff --git a/docs/Pandas/pd_input_output.md b/docs/Pandas/pd_input_output.md
@@ -0,0 +1,43 @@
+# Data Input/Output
+
+One of the greatest strengths of Pandas is its ability to effortlessly read data into and write data out of a DataFrame from various file formats. This is achieved primarily through the functions prefixed with `pd.read_` and the methods prefixed with `df.to_`.
+
+## Reading Data into a DataFrame
+To load data into a Pandas DataFrame, you use the appropriate `pd.read_...()` function. The most common input format is CSV.
+
+|Function|File Type|Example Usage|
+|:-------|:--------|:------------|
+|`pd.read_csv()`|Comma-Separated Values (Text files)|`df = pd.read_csv('data.csv')`|
+|`pd.read_excel()`|Microsoft Excel files|`df = pd.read_excel('data.xlsx')`|
+|`pd.read_json()`|JavaScript Object Notation|`df = pd.read_json('data.json')`|
+|`pd.read_sql()`|SQL database tables|`df = pd.read_sql(query, connection)`|
+
+
+**Example**: Reading a CSV File
+
+The `read_csv()` function is highly flexible, supporting parameters to handle delimiters, missing values, and specific column selection.
+
+```Python
+# Load data from a CSV file into a DataFrame
+df_sales = pd.read_csv('sales_data.csv')
+```
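+
+The parameters mentioned above can be sketched as follows. To keep the example self-contained it reads from an in-memory `io.StringIO` buffer instead of a file on disk; the semicolon-separated contents and the `'missing'` marker are made up for illustration.
+
+```Python
+import io
+import pandas as pd
+
+# In-memory stand-in for a file; the contents are hypothetical
+raw = io.StringIO(
+    "id;product;price;notes\n"
+    "1;pen;1.50;ok\n"
+    "2;book;missing;check\n"
+)
+
+df = pd.read_csv(
+    raw,
+    sep=';',                              # non-default delimiter
+    na_values=['missing'],                # extra string to treat as NaN
+    usecols=['id', 'product', 'price'],   # load only these columns
+)
+
+print(df)
+```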
+
+## Writing Data from a DataFrame
+
+After you've cleaned, transformed, or analyzed your data, you'll use a `.to_...()` method on the DataFrame object to save the results.
+
+|Method|File Type|Example Usage|
+|:-----|:--------|:------------|
+|`df.to_csv()`|Comma-Separated Values|`df.to_csv('cleaned_data.csv', index=False)`|
+|`df.to_excel()`|Microsoft Excel files|`df.to_excel('analysis.xlsx', sheet_name='Summary')`|
+|`df.to_json()`|JavaScript Object Notation|`df.to_json('data_output.json')`|
+
+
+**Example**: Writing to a CSV File
+
+When writing to a CSV, it is best practice to use `index=False` to prevent the DataFrame's row indices (the 0, 1, 2, ... numbers) from being saved as an unnecessary extra column in the file.
+
+```Python
+# index=False ensures the row index is NOT included in the file
+df_sales.to_csv('processed_sales.csv', index=False)
+```
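+
+To see what `index=False` changes, the sketch below writes a small made-up DataFrame both ways. It relies on `to_csv()` returning the CSV text when no path is given, so nothing is written to disk.
+
+```Python
+import io
+import pandas as pd
+
+# Hypothetical data for illustration
+df = pd.DataFrame({'item': ['pen', 'book'], 'qty': [3, 1]})
+
+with_index = df.to_csv()                 # header starts with an empty index column
+without_index = df.to_csv(index=False)   # header contains only the real columns
+
+print(with_index)
+print(without_index)
+
+# Reading the index-free text back yields just the original columns
+round_trip = pd.read_csv(io.StringIO(without_index))
+print(round_trip)
+```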
diff --git a/docs/Pandas/pd_intro.md b/docs/Pandas/pd_intro.md
@@ -0,0 +1,45 @@
+# Introduction to Pandas
+
+## What is Pandas?
+Pandas is a powerful, open-source Python library essential for data analysis and data manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
+
+At its core, Pandas is designed to make working with labeled and relational data (like data found in spreadsheets or SQL tables) both intuitive and fast. It is built on top of the NumPy library and is the standard tool used by data professionals for critical tasks such as:
+
+- Data Cleaning: Handling missing data, filtering, and correcting errors.
+- Data Transformation: Grouping, merging, reshaping, and pivoting datasets.
+- Data Exploration: Calculating descriptive statistics and inspecting data structure.
+
+### Installation and Setup 🛠️
+Pandas is not included in the standard Python library and must be installed separately.
+
+1. **Installation**
+
+Open your terminal or command prompt and run the following command:
+
+```Bash
+pip install pandas
+```
+
+If you are using the Anaconda distribution (common for data science), you can use the conda package manager instead:
+
+```Bash
+conda install pandas
+```
+
+2. **Importing & Verifying**
+
+Once installed, import Pandas into your Python environment (script, Jupyter Notebook, etc.) using the widely accepted alias `pd`. It is also good practice to check which version you are using.
+
+```Python
+import pandas as pd
+
+# Check the version of Pandas installed
+print(pd.__version__)
+```
+
+### Foundation and Ecosystem
+
+Pandas is deeply integrated with the wider Python data science ecosystem:
+
+- Built on NumPy: Internally, Pandas relies heavily on the NumPy library for fast array-based computation, which lets many column-wise operations run as vectorized array operations instead of Python-level loops.
+- Data Visualization: Pandas data structures work seamlessly with popular visualization libraries like Matplotlib and Seaborn.
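+
+The NumPy connection can be seen directly. In this minimal sketch (the data is invented for illustration), a DataFrame is converted to a NumPy array and a NumPy function is applied to a column.
+
+```Python
+import numpy as np
+import pandas as pd
+
+# Hypothetical numeric data
+df = pd.DataFrame({'x': [1.0, 4.0, 9.0], 'y': [4.0, 5.0, 6.0]})
+
+arr = df.to_numpy()        # the values as a 2-D NumPy array
+roots = np.sqrt(df['x'])   # NumPy functions accept a Series and return a Series
+
+print(type(arr), arr.shape)
+print(roots)
+```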
diff --git a/sidebars.ts b/sidebars.ts
@@ -116,6 +116,17 @@ const sidebars: SidebarsConfig = {
         },
       ],
     },
+    {
+      type: "category",
+      label: "Pandas",
+      className: "custom-sidebar-pandas",
+      items: [
+        "Pandas/pd_intro",
+        "Pandas/pd_dataframes",
+        "Pandas/pd_input_output",
+        "Pandas/pd_data_analysis", 
+      ],
+    },
     {
       type: "category",
       label: "🗄️ SQL",