| 1 | +--- |
| 2 | +Title: 'size()' |
| 3 | +Description: 'Returns a Series containing the size (row count) of each group.' |
| 4 | +Subjects: |
| 5 | + - 'Computer Science' |
| 6 | + - 'Data Science' |
| 7 | +Tags: |
| 8 | + - 'Data Structures' |
| 9 | + - 'Pandas' |
| 10 | +CatalogContent: |
| 11 | + - 'learn-python-3' |
| 12 | + - 'paths/data-science' |
| 13 | +--- |
| 14 | + |
| 15 | +The **`size()`** method in pandas returns the number of rows in each group created by the `groupby()` [function](https://www.codecademy.com/resources/docs/pandas/built-in-functions). It provides a quick way to determine group sizes without choosing a column to aggregate, and it counts every row in a group, including rows that contain missing (`NaN`) values. |
| 16 | + |
| 17 | +## Syntax |
| 18 | + |
| 19 | +```pseudo |
| 20 | +DataFrameGroupBy.size() |
| 21 | +``` |
| 22 | + |
| 23 | +**Parameters:** |
| 24 | + |
| 25 | +The `size()` method doesn't take any parameters. |
| 26 | + |
| 27 | +**Return value:** |
| 28 | + |
| 29 | +The `size()` method returns a Series containing the size (row count) of each group created by `groupby()`. If `groupby()` was called with `as_index=False`, it returns a DataFrame instead. |
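
As a small sketch of the return type, the snippet below contrasts the default call with a `groupby()` that passes `as_index=False`. It assumes a reasonably recent pandas release (1.1 or later), where the second form returns a DataFrame with the counts in a column named `size`. The `employees` data is made up for illustration:

```py
import pandas as pd

# Hypothetical data, for illustration only
employees = pd.DataFrame({
  'Department': ['HR', 'IT', 'HR'],
  'Employee': ['John', 'Sara', 'Mike']
})

# Default: a Series indexed by the group key
as_series = employees.groupby('Department').size()

# With as_index=False: a DataFrame with the key kept as a regular column
# (assumes pandas 1.1 or later)
as_frame = employees.groupby('Department', as_index=False).size()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(as_frame.columns))
```

The DataFrame form can be convenient when the counts need to be merged or filtered like any other column.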
| 30 | + |
| 31 | +## Example 1: Counting Rows by Group |
| 32 | + |
| 33 | +In this example, a [DataFrame](https://www.codecademy.com/resources/docs/pandas/dataframe) of employees is grouped by their department, and `size()` counts how many employees belong to each department: |
| 34 | + |
| 35 | +```py |
| 36 | +import pandas as pd |
| 37 | + |
| 38 | +data = { |
| 39 | +  'Department': ['HR', 'IT', 'HR', 'Finance', 'IT', 'Finance'], |
| 40 | +  'Employee': ['John', 'Sara', 'Mike', 'Anna', 'Tom', 'Chris'] |
| 41 | +} |
| 42 | +df = pd.DataFrame(data) |
| 43 | + |
| 44 | +group_sizes = df.groupby('Department').size() |
| 45 | +print(group_sizes) |
| 46 | +``` |
| 47 | + |
| 48 | +The output of this code is: |
| 49 | + |
| 50 | +```shell |
| 51 | +Department |
| 51 | +Finance    2 |
| 52 | +HR         2 |
| 53 | +IT         2 |
| 55 | +dtype: int64 |
| 56 | +``` |
| 57 | + |
| 58 | +## Example 2: Using Multiple Grouping Columns |
| 59 | + |
| 60 | +In this example, `size()` counts the number of members in each combination of team and shift within a dataset: |
| 61 | + |
| 62 | +```py |
| 63 | +import pandas as pd |
| 64 | + |
| 65 | +data = { |
| 66 | +  'Team': ['A', 'A', 'B', 'B', 'B', 'C'], |
| 67 | +  'Shift': ['Day', 'Night', 'Day', 'Night', 'Day', 'Day'], |
| 68 | +  'Name': ['John', 'Sara', 'Mike', 'Anna', 'Tom', 'Chris'] |
| 69 | +} |
| 70 | +df = pd.DataFrame(data) |
| 71 | + |
| 72 | +group_sizes = df.groupby(['Team', 'Shift']).size() |
| 73 | +print(group_sizes) |
| 74 | +``` |
| 75 | + |
| 76 | +The output of this code is: |
| 77 | + |
| 78 | +```shell |
| 79 | +Team  Shift |
| 80 | +A     Day      1 |
| 81 | +      Night    1 |
| 82 | +B     Day      2 |
| 83 | +      Night    1 |
| 84 | +C     Day      1 |
| 85 | +dtype: int64 |
| 86 | +``` |
| 87 | + |
| 88 | +## Codebyte Example: Counting Transactions Per Product |
| 89 | + |
| 90 | +In this example, `size()` is used to count how many sales transactions occurred for each product in a store dataset: |
| 91 | + |
| 92 | +```codebyte/python |
| 93 | +import pandas as pd |
| 94 | + |
| 95 | +sales = pd.DataFrame({ |
| 96 | +  'Product': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Banana', 'Apple'], |
| 97 | +  'Customer': ['A', 'B', 'C', 'A', 'D', 'E', 'F'] |
| 98 | +}) |
| 99 | + |
| 100 | +counts = sales.groupby('Product').size() |
| 101 | +print(counts) |
| 102 | +``` |
| 103 | + |
| 104 | +## Frequently Asked Questions |
| 105 | + |
| 106 | +### 1. What is the pandas `groupby().size()` method? |
| 107 | + |
| 108 | +`groupby().size()` returns the number of rows in each group created by `groupby()`. |
| 109 | + |
| 110 | +### 2. What is the purpose of `groupby()` in pandas? |
| 111 | + |
| 112 | +`groupby()` splits data into groups based on selected column values to enable aggregation and summarization. |
| 113 | + |
| 114 | +### 3. What does `NaN` stand for in pandas? |
| 115 | + |
| 116 | +`NaN` stands for Not a Number and indicates missing or undefined data. |
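
Missing values are where `size()` and `count()` can disagree: `size()` counts every row in a group, while `count()` only counts non-missing entries in each column. The following is a minimal sketch of that difference, using a made-up `staff` DataFrame with one missing salary:

```py
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only; one salary is missing
staff = pd.DataFrame({
  'Department': ['HR', 'HR', 'IT', 'IT'],
  'Salary': [50000, np.nan, 60000, 65000]
})

grouped = staff.groupby('Department')

# size() counts all rows per group, including the row with NaN
sizes = grouped.size()

# count() skips NaN values in the selected column
counts = grouped['Salary'].count()

print(sizes)
print(counts)
```

Here the `HR` group has two rows but only one recorded salary, so the two results differ for that group and match for `IT`.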