3 views (last 30 days)

Show older comments

Hello community,

for my current project i need to divide the values of a table for each group (parent-group) into 10 eqaully sized subgroups. The values within a parent-group are arranged hierarchically. By creating the subgroups, the hierarchical order has to remain, so that the first subgroup of a parent-group contains the lowest 10% of the values, the second subgroup of a parent group contains the next larger 10%-stack (>10% - 20%) of the values and so on. In the following, i will give an example:

Data = array2table(rand(1000,1));

Data.Properties.VariableNames{1} = 'Value';

ParentGroups = [1:1:10]';

Data.ParentGroups = repelem(ParentGroups,100);

Data = sortrows(Data,{'ParentGroups','Value'});

My first approach was to create the deciles of each group, which separate each parent-group into 10 equally sized parts. The problem here is, that the function i used simply outputs the values of the deciles, but there is no reference to their location in the original data. For creating the deciles, I used the following code:

decileFnc = @(x) prctile(x,[0:10:100]);

deciles = varfun(decileFnc,Data,'GroupingVariables','ParentGroups','InputVariables',1);

deciles.Fun_Value = string(deciles.Fun_Value);

deciles = [deciles(:,1:2),regexp(deciles.Fun_Value, '\s+', 'split')];

doubleFnc = @(x) double(x);

deciles = [deciles(:,1:2),varfun(doubleFnc,deciles,'InputVariables',3:13)];

The last block of code is used to unstack the output of varfun, i just shared it for the sake of completeness.

I'm really hoping for some help here and appreciate every answer.

Peter Perkins
on 6 Aug 2021

>> x = rand(16,1);

>> g = repmat((1:2)',8,1);

>> t = table(g,x)

t =

16×2 table

g x

_ _______

1 0.81472

2 0.90579

1 0.12699

2 0.91338

1 0.63236

2 0.09754

1 0.2785

2 0.54688

1 0.95751

2 0.96489

1 0.15761

2 0.97059

1 0.95717

2 0.48538

1 0.80028

2 0.14189

Here's a function that takes a vector and returns the quartile bin in which each value falls.

function qbin = fun(x)

q = quantile(x,[0 .25 .5 .75 1]);

qbin = discretize(x,q);

end

Apply that function to each group of rows in the table.

>> tq = rowfun(@fun, t, "GroupingVariables","g")

tq =

16×3 table

g GroupCount Var3

_ __________ ____

1 8 3

1 8 1

1 8 2

1 8 2

1 8 4

1 8 1

1 8 4

1 8 3

2 8 3

2 8 3

2 8 1

2 8 2

2 8 4

2 8 4

2 8 2

2 8 1

rowfun has returned its output table in group order. To attach the quartile bin numbers to the correct rows of the original data, use the second output of sort.

>> [~,ord] = sort(t.g);

>> t.qbin(ord) = tq.Var3

t =

16×3 table

g x qbin

_ _______ ____

1 0.81472 3

2 0.90579 3

1 0.12699 1

2 0.91338 3

1 0.63236 2

2 0.09754 1

1 0.2785 2

2 0.54688 2

1 0.95751 4

2 0.96489 4

1 0.15761 1

2 0.97059 4

1 0.95717 4

2 0.48538 2

1 0.80028 3

2 0.14189 1

To make the result easier to confirm, sort by x within group; notice that the quartile bin numbers are as expected.

sortrows(t,["g" "qbin"])

ans =

16×3 table

g x qbin

_ _______ ____

1 0.12699 1

1 0.15761 1

1 0.63236 2

1 0.2785 2

1 0.81472 3

1 0.80028 3

1 0.95751 4

1 0.95717 4

2 0.09754 1

2 0.14189 1

2 0.54688 2

2 0.48538 2

2 0.90579 3

2 0.91338 3

2 0.96489 4

2 0.97059 4

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!