It seems to be working just as advertised - you do have to launch your CPU-hungry jobs in separate terminals, though, to take advantage of it.
Edit: launching programs with setsid may have the same effect; maybe someone can enlighten me about it...
You also have to enable it (as root) - it is disabled by default:
sysctl kernel.sched_autogroup_enabled=1
Disabling it:
sysctl kernel.sched_autogroup_enabled=0
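To check which state you are in before and after flipping the switch, something like the following should do. It is only a rough sketch: autogroup_status is a helper name I made up, and it assumes the sysctl is exposed at /proc/sys/kernel/sched_autogroup_enabled, which is only the case if the kernel was built with autogroup support.

```shell
# Hypothetical helper (not part of any tool): report the autogroup state.
# Assumes the sysctl is visible under /proc/sys; if the file is missing,
# the kernel was most likely built without autogroup support.
autogroup_status() {
    f=/proc/sys/kernel/sched_autogroup_enabled
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            1) echo "enabled" ;;
            0) echo "disabled" ;;
            *) echo "unknown" ;;
        esac
    else
        echo "unsupported"
    fi
}

autogroup_status
```

Reading the value does not need root; only changing it with sysctl does.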
I use the 2.6.38-2-686-bigmem kernel from experimental (4 GB RAM) on an Athlon II X2 250 @ 3.00 GHz CPU.
I compiled Wine with "make -j 20" (that is up to 20 parallel jobs) and I did not feel it on my desktop/browsing tasks.
With the value above set to 0, I got very slow desktop redrawing and extremely slow program launches - opening a gnome-terminal, for example, took a few seconds, whereas with autogrouping enabled it is instant.
BTW, on previous kernels such as 2.6.32 I remember trying the exact same command (make -j 20 on Wine) and it led to a complete desktop lockup after 15-20 seconds - so the 2.6.38 kernel seems to have some other tweaks that help in this area.