DataFrameGroupBy

Methods

min

Computes the minimum value for each group and returns a DataFrame.

DataFrameGroupBy.min(skipna=True, numeric_only=False) # "skipna": whether to ignore missing values (default True). "numeric_only": whether to compute only numeric columns (default False).

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.min() # Returns a DataFrame composed of the minimum values of other columns grouped by "name"
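
To make the expected result concrete, here is a minimal plain-Python sketch of what a grouped minimum computes for the sample data above. It does not use this library's DataFrame; the helper name `group_min` and the list-of-dicts row layout are assumptions made purely for illustration.

```python
from collections import defaultdict

# Sample rows mirroring the DataFrame in the example above.
rows = [
    {"name": "js", "count": 1, "age": 1},
    {"name": "js", "count": 2, "age": 1},
    {"name": "js", "count": 3, "age": 2},
    {"name": "go", "count": 4, "age": 3},
    {"name": "go", "count": 5, "age": 3},
]

def group_min(rows, key):
    """Hypothetical helper: per-group minimum of every non-key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = {}
    for group_key, members in groups.items():
        columns = [c for c in members[0] if c != key]
        result[group_key] = {c: min(m[c] for m in members) for c in columns}
    return result

min_by_name = group_min(rows, "name")
print(min_by_name)
# {'js': {'count': 1, 'age': 1}, 'go': {'count': 4, 'age': 3}}
```

Swapping `min` for `max` or `sum` inside the helper gives the corresponding grouped maximum or sum.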

max

Computes the maximum value for each group and returns a DataFrame.

DataFrameGroupBy.max(skipna=True, numeric_only=False)

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.max() # Returns a DataFrame composed of the maximum values of other columns grouped by "name"

sum

Computes the sum for each group and returns a DataFrame.

DataFrameGroupBy.sum(skipna=True, numeric_only=False)

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.sum() # Returns a DataFrame composed of the sums of other columns grouped by "name"
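
The "skipna" flag appears in most signatures on this page. The sketch below shows one common interpretation in plain Python: with skipna=True, missing values are dropped before summing; with skipna=False, a missing value makes the group's result missing. The second behaviour is an assumption borrowed from pandas-style libraries, not something this page confirms, and both `group_sum` and the use of `None` for missing values are hypothetical.

```python
from collections import defaultdict

def group_sum(names, values, skipna=True):
    """Hypothetical helper: per-group sum with a skipna switch."""
    groups = defaultdict(list)
    for name, value in zip(names, values):
        groups[name].append(value)
    result = {}
    for name, members in groups.items():
        present = [v for v in members if v is not None]
        if not skipna and len(present) < len(members):
            # Assumed behaviour: a missing value propagates to the result.
            result[name] = None
        else:
            result[name] = sum(present)
    return result

names = ["js", "js", "js", "go", "go"]
counts = [1, None, 3, 4, 5]  # one missing value in the "js" group

sum_skipping = group_sum(names, counts, skipna=True)
sum_strict = group_sum(names, counts, skipna=False)
print(sum_skipping)  # {'js': 4, 'go': 9}
print(sum_strict)    # {'js': None, 'go': 9}
```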

std

Computes the standard deviation for each group and returns a DataFrame.

DataFrameGroupBy.std(skipna=True, numeric_only=False, ddof=n) # "ddof" (Delta Degrees of Freedom): "n" is the value subtracted from the sample size to form the divisor. Default 1 (sample standard deviation); set ddof=0 for the population standard deviation.

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.std() # Returns a DataFrame composed of the standard deviations of other columns grouped by "name"
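
The effect of "ddof" is easiest to see on a single group. The following plain-Python sketch computes the standard deviation of the "js" counts with a divisor of N - ddof; `std_with_ddof` is a hypothetical helper, and returning NaN when the divisor is not positive is an assumption rather than documented behaviour of this library.

```python
import math

def std_with_ddof(values, ddof=1):
    """Standard deviation using a divisor of len(values) - ddof."""
    n = len(values)
    if n - ddof <= 0:
        # Assumption: an undefined result is reported as NaN.
        return float("nan")
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))

js_counts = [1, 2, 3]  # the "count" values of the "js" group

sample_std = std_with_ddof(js_counts, ddof=1)      # divisor 2
population_std = std_with_ddof(js_counts, ddof=0)  # divisor 3
print(sample_std)      # 1.0
print(population_std)  # about 0.8165
```

The variance is the same quantity before the square root is taken, so the same divisor applies there.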

var

Computes the variance for each group and returns a DataFrame.

DataFrameGroupBy.var(skipna=True, numeric_only=False, ddof=n) # "ddof" has the same meaning as in std.

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.var() # Returns a DataFrame composed of the variances of other columns grouped by "name"

mean

Computes the mean for each group and returns a DataFrame.

DataFrameGroupBy.mean(skipna=True, numeric_only=False)

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.mean() # Returns a DataFrame composed of the means of other columns grouped by "name"

diff

Computes the difference of the specified order for each group and returns a DataFrame.

DataFrameGroupBy.diff(periods=n) # "n" is the difference order, default 1

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.diff(1) # Returns a DataFrame composed of the first-order differences of other columns grouped by "name"
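
For the default first-order case, each value has the previous value from the same group subtracted from it. A minimal plain-Python sketch of that computation on the "count" column follows; `group_diff` is a hypothetical helper, and representing the first row of each group as `None` is an assumption about how missing results are reported.

```python
def group_diff(names, values):
    """Hypothetical helper: first-order difference within each group."""
    previous = {}  # most recent value seen for each group
    diffs = []
    for name, value in zip(names, values):
        if name in previous:
            diffs.append(value - previous[name])
        else:
            # Assumption: the first row of a group has nothing to subtract.
            diffs.append(None)
        previous[name] = value
    return diffs

names = ["js", "js", "js", "go", "go"]
counts = [1, 2, 3, 4, 5]

count_diff = group_diff(names, counts)
print(count_diff)  # [None, 1, 1, None, 1]
```

Note that the difference restarts at the "go" rows: 4 is not compared with the last "js" value.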

cumprod

Computes the cumulative product for each group and returns a DataFrame.

DataFrameGroupBy.cumprod(skipna=True)

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.cumprod(skipna=True) # Returns a DataFrame composed of the cumulative products of other columns grouped by "name"
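
A cumulative product keeps one running product per group and emits it row by row, so the result has as many rows as the input. The sketch below reproduces that in plain Python for the sample columns; `group_cumprod` is a hypothetical helper, not part of this library.

```python
def group_cumprod(names, values):
    """Hypothetical helper: running product within each group."""
    running = {}  # current product for each group
    products = []
    for name, value in zip(names, values):
        running[name] = running.get(name, 1) * value
        products.append(running[name])
    return products

names = ["js", "js", "js", "go", "go"]
counts = [1, 2, 3, 4, 5]
ages = [1, 1, 2, 3, 3]

count_cumprod = group_cumprod(names, counts)
age_cumprod = group_cumprod(names, ages)
print(count_cumprod)  # [1, 2, 6, 4, 20]
print(age_cumprod)    # [1, 1, 2, 3, 9]
```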

pct_change

Computes the percentage change for each group and returns a DataFrame.

DataFrameGroupBy.pct_change(periods=1) # "periods" is the offset period, default 1. Raises an error when applied to non-numeric columns.

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.pct_change(1) # Returns a DataFrame composed of the percentage changes of other columns grouped by "name"
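
With periods=1, the change for a row is (current - previous) / previous, where "previous" is the preceding row of the same group. The plain-Python sketch below illustrates this for the "count" column. It assumes the result is expressed as a fraction (0.5 rather than 50) and that the first row of each group has no value; both are conventions of pandas-style libraries and are not confirmed by this page. `group_pct_change` is a hypothetical helper.

```python
def group_pct_change(names, values):
    """Hypothetical helper: fractional change from the previous row of the same group."""
    previous = {}  # most recent value seen for each group
    changes = []
    for name, value in zip(names, values):
        if name in previous:
            # A previous value of 0 would raise ZeroDivisionError in this sketch.
            changes.append((value - previous[name]) / previous[name])
        else:
            changes.append(None)
        previous[name] = value
    return changes

names = ["js", "js", "js", "go", "go"]
counts = [1, 2, 3, 4, 5]

count_pct = group_pct_change(names, counts)
print(count_pct)  # [None, 1.0, 0.5, None, 0.25]
```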

agg

Applies one or more aggregation functions to each group and returns a DataFrame.

DataFrameGroupBy.agg()

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
def my_sum(s):
    return s.sum()
d2 = g1.agg(my_sum)
return d2 # Returns a DataFrame with aggregated results for each group grouped by "name"

apply

Applies a custom function to each group and returns a DataFrame.

DataFrameGroupBy.apply()

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
def my_mean(x):
    res = x[0]
    res["age"] = x["age"].mean()
    res["count"] = x["count"].mean()
    return res
d2 = g1.apply(my_mean)
return d2 # Returns a DataFrame transformed by the custom function "my_mean" for each group

transform

Transforms the grouped data using a specified function and returns a DataFrame with the same size as the original.

DataFrameGroupBy.transform()

# Example
d1 = DataFrame({ "name": ["js", "js", "js", "go", "go"], "count": [1, 2, 3, 4, 5], "age": [1, 1, 2, 3, 3] })
g1 = d1.groupby(["name"])
return g1.transform("mean") # Returns a DataFrame transformed by the specified function (mean) for each group
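
What distinguishes a transform from an aggregation is that the per-group result is broadcast back to every row of that group, so the output keeps the original row count. A minimal plain-Python sketch of transform("mean") on the "count" column follows; `group_transform_mean` is a hypothetical helper used only to illustrate the shape of the result.

```python
from collections import defaultdict

def group_transform_mean(names, values):
    """Hypothetical helper: replace each value with the mean of its group."""
    groups = defaultdict(list)
    for name, value in zip(names, values):
        groups[name].append(value)
    means = {name: sum(members) / len(members) for name, members in groups.items()}
    # One output value per input row, in the original row order.
    return [means[name] for name in names]

names = ["js", "js", "js", "go", "go"]
counts = [1, 2, 3, 4, 5]

count_mean = group_transform_mean(names, counts)
print(count_mean)  # [2.0, 2.0, 2.0, 4.5, 4.5]
```

An aggregation such as mean() would instead return one row per group (2.0 for "js", 4.5 for "go").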