6 · GroupBy (“split‑apply‑combine”)

6.1 Basic Grouping

```
# group by a single column
g = df.groupby("segment")

# iterate over or inspect groups
for name, group in g:
    ...
g.get_group("young")   # rows of one group
g.size()               # row counts per group
```
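The calls above can be tried on a small frame. The following is a minimal sketch on hypothetical toy data: the values are made up, and only the column names `segment`, `age` and `salary` come from the snippets in this section.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the snippets above.
df = pd.DataFrame({
    "segment": ["young", "young", "senior", "mid", "senior"],
    "age": [23, 27, 61, 40, 58],
    "salary": [30_000, 35_000, 80_000, 55_000, 75_000],
})

g = df.groupby("segment")

# Iterating yields (key, sub-DataFrame) pairs, sorted by key by default.
group_keys = [name for name, _ in g]

# get_group returns the rows of one key and keeps their original index.
young = g.get_group("young")

# size() returns a Series of row counts indexed by group key.
sizes = g.size()
```

Note that `groupby` itself is lazy: nothing is computed until the groups are iterated or an aggregation such as `size()` is called.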

6.2 Aggregation

```
# single aggregation
agg1 = g["age"].mean()

# multiple aggregations
agg2 = g.agg({
    "age": ["mean", "min", "max"],
    "salary": "sum",
})
```
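One detail worth seeing in practice is the shape of the dict-based result: when any dict value is a list, the output columns become two-level labels of the form (column, function). The sketch below uses hypothetical toy data; the flattening step at the end is one possible convention, not something the notes above prescribe.

```python
import pandas as pd

# Hypothetical toy data with the same column names as above.
df = pd.DataFrame({
    "segment": ["young", "young", "senior", "senior"],
    "age": [20, 30, 60, 70],
    "salary": [100, 200, 300, 400],
})
g = df.groupby("segment")

agg1 = g["age"].mean()  # Series indexed by segment
agg2 = g.agg({"age": ["mean", "min", "max"], "salary": "sum"})

# Two-level column labels: (column, function).
columns = list(agg2.columns)

# Optional: flatten the two levels into single strings such as "age_mean".
flat = agg2.copy()
flat.columns = ["_".join(col) for col in flat.columns]
```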

6.3 Named Aggregations

```
# clean output with new column names
g2 = df.groupby("segment").agg(
    avg_age=("age", "mean"),
    total_salary=("salary", "sum"),
    count=("age", "size"),
)
```
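As a quick check of what named aggregation produces, here is a small sketch on hypothetical toy data. It also shows `pd.NamedAgg`, which is the explicit spelling of the same (column, function) tuple. Keep in mind that `"size"` counts all rows in a group, whereas `"count"` would skip missing values.

```python
import pandas as pd

# Hypothetical toy data with the same column names as above.
df = pd.DataFrame({
    "segment": ["young", "young", "senior"],
    "age": [20, 30, 60],
    "salary": [100, 200, 300],
})

g2 = df.groupby("segment").agg(
    avg_age=("age", "mean"),
    total_salary=("salary", "sum"),
    count=("age", "size"),
)

# Equivalent, more explicit form of one of the entries above.
g3 = df.groupby("segment").agg(
    avg_age=pd.NamedAgg(column="age", aggfunc="mean"),
)
```

The keyword names become flat column labels, so no flattening of two-level columns is needed afterwards.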

6.4 Transformation vs. Filter vs. Apply

```
# transform(): returns output aligned to the original rows
# z-score within group
df["age_z"] = g["age"].transform(lambda x: (x - x.mean()) / x.std())

# filter(): keeps or drops whole groups by a condition
# keep groups with >10 rows
df_filt = g.filter(lambda x: len(x) > 10)

# apply(): arbitrary function on each group; slower
def top_n(x, n=3):
    return x.nlargest(n, "salary")

df_top = g.apply(top_n)
```
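The difference between the three methods is easiest to see in the shape of their results. The sketch below uses hypothetical toy data and, as an assumption on my part rather than something from the notes, replaces `apply(top_n)` with a sort followed by `head`, which selects the same kind of top rows without calling a Python function per group.

```python
import pandas as pd

# Hypothetical toy data: group "a" has three rows, group "b" has two.
df = pd.DataFrame({
    "segment": ["a", "a", "a", "b", "b"],
    "age": [20, 30, 40, 50, 60],
    "salary": [10, 30, 20, 50, 40],
})
g = df.groupby("segment")

# transform: one output value per input row, same index as df.
centered = g["age"].transform(lambda x: x - x.mean())

# filter: whole groups are kept or dropped; surviving rows keep their index.
big = g.filter(lambda x: len(x) > 2)

# Alternative to apply(top_n) with n=1: sort once, then take the first row per group.
top1 = df.sort_values("salary", ascending=False).groupby("segment").head(1)
```

Here `centered` has five rows like `df`, `big` contains only the three rows of group `a`, and `top1` holds one row per group.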

6.5 Grouping on Multiple Keys & Index Levels

```
# multi-key grouping
multi = df.groupby(["department", "role"]).agg(
    mean_perf=("performance", "mean"),
    n=("id", "size"),
)

# group by index level
ts = df.set_index(["date", "category"])
res = ts.groupby(level="category").sum()
```
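Grouping on several keys returns a result indexed by a MultiIndex, which affects how rows are selected afterwards. A minimal sketch on hypothetical toy data (the values are invented; the column names follow the snippet above):

```python
import pandas as pd

# Hypothetical toy data with the column names used above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "department": ["eng", "eng", "eng", "ops"],
    "role": ["dev", "dev", "qa", "dev"],
    "performance": [3.0, 5.0, 4.0, 2.0],
})

multi = df.groupby(["department", "role"]).agg(
    mean_perf=("performance", "mean"),
    n=("id", "size"),
)

# Select one key combination with a tuple...
eng_dev_mean = multi.loc[("eng", "dev"), "mean_perf"]

# ...or turn the keys back into ordinary columns.
flat = multi.reset_index()

# Grouping by an index level works like grouping by a column.
by_dept = multi.groupby(level="department")["n"].sum()
```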

6.6 Time‑Aware Grouping

```
import pandas as pd

# group by month from datetime index
ts = df.set_index("timestamp")
# "M" labels each bin by month end; pandas >= 2.2 prefers the alias "ME"
monthly = ts["value"].groupby(pd.Grouper(freq="M")).sum()
```
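A small run makes the binning behaviour visible, including the fact that months with no rows still appear in the result. This sketch uses hypothetical toy data and the `"MS"` (month start) alias, chosen here only because it is accepted by both older and newer pandas versions; the `key=` form is an alternative when the timestamp is a column rather than the index.

```python
import pandas as pd

# Hypothetical toy data: no rows fall in March 2024.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-10", "2024-04-01"]
    ),
    "value": [1, 2, 3, 4],
})
ts = df.set_index("timestamp")

# Bins are labelled by the first day of each month.
monthly = ts["value"].groupby(pd.Grouper(freq="MS")).sum()

# Without set_index, Grouper can point at a datetime column through key=.
monthly_from_col = df.groupby(pd.Grouper(key="timestamp", freq="MS"))["value"].sum()
```

The empty March bin is kept and sums to 0, so the result has four rows for January through April.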

6.7 Performance Tips

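- Pass observed=True when grouping on categorical columns so that categories with no rows do not produce empty groups.
- Avoid apply when possible; prefer vectorised transform or agg.
- In custom loops, collect per-group results in a dict and combine them once with pd.concat, rather than growing a DataFrame inside the loop.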
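Two of these tips can be illustrated briefly. The sketch below uses hypothetical toy data with an unused category `c`, and a made-up per-group computation (`share`) that stands in for whatever a custom loop would calculate.

```python
import pandas as pd

# Hypothetical toy data: category "c" is declared but never occurs.
df = pd.DataFrame({
    "segment": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    "value": [1, 2, 3],
})

# observed=False keeps "c" as an empty group; observed=True drops it.
all_cats = df.groupby("segment", observed=False)["value"].sum()
seen_only = df.groupby("segment", observed=True)["value"].sum()

# Custom loop: collect per-group pieces in a dict, then concatenate once.
pieces = {}
for name, group in df.groupby("segment", observed=True):
    pieces[name] = group.assign(share=group["value"] / group["value"].sum())
combined = pd.concat(pieces)
```

With many categorical keys, the number of empty combinations under `observed=False` can grow multiplicatively, which is where the setting matters most.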
