How I fixed my DSDT for kacpi_notify 97% CPU.

Post by hellokitty » 2010-08-28 05:31

DISCLAIMER: Altering your DSDT is serious business and can cause physical, irreparable damage to your machine! This is for advanced users only, or for those who don't mind destroying their machines in the learning process. You've been warned!

[This post will be updated if I find that the fix stops working after a while of use]

Hello,

My Toshiba Satellite A305-S6872
Debian 5 Lenny
Kernel 2.6.35.3 from http://www.kernel.org

I'm posting this to give others a chance at fixing kacpi_notify or kacpid taking up a lot of CPU in recent kernels. I've googled around and never saw a solution, just everyone pointing to a buggy BIOS/DSDT table (which is true).

Here we go...

First, this all started with the fan not working on my Toshiba Satellite A305-S6872. So I copied my DSDT (cp /proc/acpi/dsdt), decompiled it, then compiled it again. There are several places online that deal with DSDT compile errors, so I'm skipping that stuff. Bottom line: I figured out where the empty fan ON/OFF methods were, took an example someone else posted online for another laptop of the same brand and a similar model, and it worked fine.

I also had to remove the thermal module at boot (I added modprobe -r thermal to /etc/rc.local) to prevent kacpi_notify from taking 50% CPU. I got the idea of removing thermal because all the online complaints about kacpid and/or kacpi_notify being out of control seemed to mention thermal stuff.

This worked fine in kernel 2.6.32.4, but since 2.6.34, when support for loading the DSDT from the initrd was dropped, I've had to compile my DSDT into the kernel. The iasl compiler is stricter under my 2.6.35.3 setup, so I had to solve some new errors, but after that I compiled the DSDT into the kernel.

Now kacpi_notify was taking 97% of my CPU at boot. I could kill it by removing thermal, but for some reason, in this newer kernel, removing thermal leaves my laptop unable to suspend and the fan won't work. So removing thermal was no longer an option. I had to fix the kacpi_notify issue itself.

I looked at my DSDT and placed Store ("string", Debug) messages in every method that mentioned THRM and FAN, added acpi.debug_layer=0xFFFFFFFF and acpi.debug_level=0xA to my kernel cmdline, and compiled the kernel with CONFIG_ACPI_DEBUG=y. NOTE: If so many messages scroll by at boot or shutdown that they stall the machine, don't use the kernel cmdline args. Instead, boot normally, then echo 0xA > /proc/acpi/debug_level and echo 0xFFFFFFFF > /proc/acpi/debug_layer, and echo 0 into both files before shutting down.
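The "switch it on after boot, switch it off before shutdown" routine from the NOTE can be wrapped in a small helper. This is only a sketch in Python: the function name acpi_debug and the base parameter are made up for illustration, and it assumes the /proc/acpi/debug_level and /proc/acpi/debug_layer files mentioned above exist on your kernel and that you run it as root.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def acpi_debug(level=0xA, layer=0xFFFFFFFF, base="/proc/acpi"):
    """Enable ACPI debug output for the duration of a `with` block.

    `base` is a parameter only so the sketch can be tried against a scratch
    directory; on the real machine it would be /proc/acpi (root required).
    """
    level_file = Path(base) / "debug_level"
    layer_file = Path(base) / "debug_layer"
    # Same effect as: echo 0xA > debug_level; echo 0xFFFFFFFF > debug_layer
    level_file.write_text(f"0x{level:X}\n")
    layer_file.write_text(f"0x{layer:X}\n")
    try:
        yield
    finally:
        # Same effect as echoing 0 into both files before shutting down.
        level_file.write_text("0\n")
        layer_file.write_text("0\n")
```

Used as "with acpi_debug(): ..." around whatever you want to observe, it puts both files back to 0 even if the code inside the block fails, so the message flood can't follow you into shutdown.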


The result is that all the FAN and THERMAL messages are sent to dmesg, so I can follow the logic of what's going on. See below:
Code: Select all
    Scope (_TZ)
    {
        PowerResource (FN00, 0x00, 0x0000)
        {
            Method (_STA, 0, Serialized)
            {
                Store ("In _TZ.PowerResource._STA", Debug)
                Return (One)
            }

            Method (_ON, 0, Serialized)
            {
                Store ("In _TZ.PowerResource._ON", Debug)
            }

            Method (_OFF, 0, Serialized)
            {
                Store ("In _TZ.PowerResource._OFF", Debug)
            }
        }

        Device (FAN)
        {
            Name (_HID, EisaId ("PNP0C0B"))
            Name (_UID, Zero)
            Name (_PR0, Package (0x01)
            {
                FN00
            })
            Name (STAT, Zero)
            Method (_PSC, 0, Serialized)
            {
                Store ("querying fan status", Debug)
                Store (\_SB.PCI0.LPC.EC0.FNST, STAT)
                \_SB.PCI0.LPC.EC0._Q11 ()
                Return (STAT)
            }

            Method (_PS0, 0, Serialized)
            {
                Store ("turning fan ON", Debug)
                \_SB.PCI0.LPC.EC0._Q11 ()
                Store (Zero, STAT)
            }

            Method (_PS3, 0, Serialized)
            {
                Store ("turning fan OFF", Debug)
                \_SB.PCI0.LPC.EC0._Q11 ()
                Store (0x03, STAT)
            }
        }

        ThermalZone (THRM)
        {
            Method (_TMP, 0, Serialized)
            {
                Store ("In THRM._TMP", Debug)
                If (\_SB.PCI0.LPC.EC0.ECOK)
                {
                    Store (RDEC (0x9C, 0xFF, Zero), Local0)
                    Store ("In THRM._TMP->Ifstatement=true", Debug)
                    Return (Add (0x0AAC, Multiply (Local0, 0x0A)))
                }
                Store ("In THRM._TMP->Ifstatement=false", Debug)
                Return (0x0BB8)
            }

            Method (_AC0, 0, Serialized)
            {
                Store ("In THRM._AC0", Debug)
                If (LLess (FNON, 0x28))
                {
                    Store ("In THRM._AC0, FNON < 0x28", Debug)
                    Return (0x0D68)
                }
                Else
                {
                    Store ("In THRM._AC0, FNON >= 0x28", Debug)
                    Return (Add (0x0AAC, Multiply (FNON, 0x0A)))
                }
            }

            Method (_PSV, 0, Serialized)
            {
                Store ("In THRM._PSV", Debug)
                If (LNotEqual (PERN, One))
                {
                    Store ("In THRM._PSV, PERN != One", Debug)
                    If (LLess (TRON, 0x64))
                    {
                        Store ("In THRM._PSV, TRON < 0x64", Debug)
                        Return (0x0E58)
                    }
                    Else
                    {
                        Store ("In THRM._PSV, TRON >= 0x64", Debug)
                        Return (0x0EF8)
                    }
                }
                Else
                {
                    Store ("In THRM._PSV, PERN = One", Debug)
                    If (LLess (TRON, 0x64))
                    {
                        Store ("In THRM._PSV, TRON < 0x64", Debug)
                        Return (0x0EBC)
                    }
                    Else
                    {
                        Store ("In THRM._PSV, TRON >= 0x64", Debug)
                        Return (0x0F20)
                    }
                }
            }

            Method (_CRT, 0, Serialized)
            {
                Store ("In THRM._CRT", Debug)
                If (LNotEqual (PERN, One))
                {
                    Store ("In THRM._CRT, PERN != One", Debug)
                    If (LLess (TRON, 0x64))
                    {
                        Store ("In THRM._CRT, TRON < 0x64", Debug)
                        Return (0x0E58)
                    }
                    Else
                    {
                        Store ("In THRM._CRT, TRON >= 0x64", Debug)
                        Return (0x0EF8)
                    }
                }
                Else
                {
                    Store ("In THRM._CRT, PERN = One", Debug)
                    If (LLess (TRON, 0x64))
                    {
                        Store ("In THRM._CRT, TRON < 0x64", Debug)
                        Return (0x0EBC)
                    }
                    Else
                    {
                        Store ("In THRM._CRT, TRON >= 0x64", Debug)
                        Return (0x0F20)
                    }
                }
            }

            Method (_SCP, 1, Serialized)
            {
                Store ("In THRM._SCP", Debug)
                Store (Arg0, CTYP)
            }

            Name (_AL0, Package (0x01)
            {
                FAN
            })
            Method (_PSL, 0, Serialized)
            {
                Store ("In THRM._PSL", Debug)
                If (CMPE)
                {
                    Store ("In THRM._PSL, CMPE=true", Debug)
                    Return (Package (0x02)
                    {
                        \_PR.CPU0,
                        \_PR.CPU1
                    })
                }
                Else
                {
                    Store ("In THRM._PSL, CMPE not true", Debug)
                }

                Return (Package (0x01)
                {
                    \_PR.CPU0
                })
            }

            Name (_TC1, 0x02)
            Name (_TC2, 0x05)
            Name (_TSP, 0x012C)
        }
    }


Notice all the Store-to-Debug messages; they tell me exactly how ACPI is moving through the thermal and fan controls. Note that for me, the Device (FAN) area disappeared in 2.6.35.3. When I decompiled the DSDT under 2.6.32.3, it had a Device (FAN) with empty _PS0 and _PS3 methods and no PowerResource area. I copied the Device (FAN) block (with the fix for _PS0 and _PS3 from someone else's Toshiba laptop fix) into the DSDT produced under kernel 2.6.35.3. And when I say "produced", I mean cp /proc/acpi/dsdt and iasl -d dsdt to produce the "dsdt.dsl".
So, anyway, I recompiled the kernel and booted with the debug flags mentioned above. This is what I saw being written at least a hundred times per second to dmesg:
Code: Select all
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0F] "turning fan OFF"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._OFF"
[ACPI Debug]  String [0x0C] "In THRM._PSV"
[ACPI Debug]  String [0x19] "In THRM._PSV, PERN != One"
[ACPI Debug]  String [0x19] "In THRM._PSV, TRON < 0x64"
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0F] "turning fan OFF"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._OFF"
[ACPI Debug]  String [0x0C] "In THRM._PSV"
[ACPI Debug]  String [0x19] "In THRM._PSV, PERN != One"
[ACPI Debug]  String [0x19] "In THRM._PSV, TRON < 0x64"
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0F] "turning fan OFF"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._OFF"
[ACPI Debug]  String [0x0C] "In THRM._PSV"
[ACPI Debug]  String [0x19] "In THRM._PSV, PERN != One"
[ACPI Debug]  String [0x19] "In THRM._PSV, TRON < 0x64"
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0F] "turning fan OFF"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
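
A repeating pattern like the one in the dmesg excerpt above can also be found mechanically instead of by eye. The following is a minimal Python sketch, not a polished tool: the names debug_strings and shortest_period are made up here, and the regular expression assumes the "[ACPI Debug]  String [0x..]" line format shown in this post.

```python
import re

# Matches lines such as: [ACPI Debug]  String [0x0C] "In THRM._AC0"
DEBUG_LINE = re.compile(r'\[ACPI Debug\]\s+String \[0x[0-9A-Fa-f]+\] "(.*)"')


def debug_strings(dmesg_text):
    """Extract the quoted Store-to-Debug messages, in order of appearance."""
    found = []
    for line in dmesg_text.splitlines():
        match = DEBUG_LINE.search(line)
        if match:
            found.append(match.group(1))
    return found


def shortest_period(items):
    """Return the smallest p with items[i] == items[i + p] throughout, else None.

    Only periods that fit at least twice into the list are considered.
    """
    n = len(items)
    for p in range(1, n // 2 + 1):
        if all(items[i] == items[i + p] for i in range(n - p)):
            return p
    return None
```

Fed the excerpt above, shortest_period(debug_strings(text)) should come out as 11, i.e. the same 11 messages repeating back to back with nothing else in between.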

So... see the loop in the dmesg output above. For some reason, something wants to turn the fan off, but the fan in my laptop never goes completely off, just very, very quiet, even when I boot into the Win7 that came with the laptop. And PowerResource._STA always returns One. I concluded that the kacpi_notify kernel thread wants to turn off the fan, then checks the status, which always returns One, so it tries to turn it off again... and again... and again.

I also see that the first argument of this PowerResource declaration is "FN00", and I have "FN00" in my Device (FAN) area, so I concluded that this PowerResource object is only for the fan. For reasons I don't understand, my laptop's fan works just fine as long as I have the Device (FAN) there, so I don't know what the PowerResource object is for other than causing me trouble.

So I changed the "Return (One)" in _STA to "Return (Zero)", breaking the infinite loop. I recompiled my kernel with the new dsdt.hex that I get by compiling the modified dsdt.dsl (iasl -tc dsdt.dsl), rebooted, and checked the "top" command. To my great delight, kacpi_notify stayed at 0%-2% CPU. Then I checked dmesg:
Code: Select all
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
[ACPI Debug]  String [0x19] "In _TZ.PowerResource._STA"

This seems to appear about every 3 seconds... which is equal to what I have in my /proc/acpi/thermal_zone/THRM/polling_frequency.
In my /etc/rc.local I have "echo -n 3 > /proc/acpi/thermal_zone/THRM/polling_frequency", so I guess that makes sense.

So things almost seem to work even though:
- I never see the "querying fan status" message, which is odd....
- I never see the "turning fan OFF" message, which is odd....
- I do see the "In _TZ.PowerResource._ON" message, followed by "turning fan ON", when I do something CPU-intensive like rendering a complex scene in Maya2008 for Linux. When I stop the render, the fan stops shortly afterwards... but I don't see the "turning fan OFF" message. O_o;

Somehow, suspend works, my fan comes on at thermal-trip-points, and kacpi stuff never goes beyond about 2% CPU usage..... BUT...

UPDATE: As it turns out, after leaving the laptop idle for a while, the fan speeds up a little bit. If I then do some CPU-intensive task, the fan comes on at full power and won't go back off, even after the CPU goes back to idle and the temperature drops below the trip point. So I removed the PowerResource area completely, removed the acpi debug args from the kernel cmdline (so it just looks like this: /boot/vmlinuz-2.6.35.3 root=/dev/sda3 rw vga=792 acpi=copy_dsdt), booted up, then ran "echo 0xFFFFFFFF > /proc/acpi/debug_layer" and "echo 0xF > /proc/acpi/debug_level", and ended up seeing:
Code: Select all
ACPI: Execute Method [\_TZ_.THRM._PSV] (Node f703ae3c)
[ACPI Debug]  String [0x0C] "In THRM._PSV"
[ACPI Debug]  String [0x19] "In THRM._PSV, PERN != One"
[ACPI Debug]  String [0x19] "In THRM._PSV, TRON < 0x64"
   utils-0286 [00] evaluate_integer      : Return value [3672]
ACPI: Execute Method [\_TZ_.THRM._AC0] (Node f703ae50)
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
   utils-0286 [00] evaluate_integer      : Return value [3432]
ACPI: Execute Method [\_TZ_.THRM._TMP] (Node f703ae64)
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
   utils-0286 [00] evaluate_integer      : Return value [3172]
 thermal-0265 [00] thermal_get_temperatur: Temperature is 3172 dK
ACPI: Execute Method [\_TZ_.FAN_._PSC] (Node f703aeb4)
[ACPI Debug]  String [0x13] "querying fan status"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
  evmisc-0125 [00] ev_queue_notify_reques: Dispatching Notify on [THRM] Node f703ae78 Value 0x81 (**Device Specific**)
   utils-0286 [00] evaluate_integer      : Return value [0]
     bus-0248 [00] bus_get_power         : Device [FAN] power state is D0
ACPI: Execute Method [\_TZ_.FAN_._PS3] (Node f703ae8c)
[ACPI Debug]  String [0x0F] "turning fan OFF"
[ACPI Debug]  String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
  evmisc-0125 [00] ev_queue_notify_reques: Dispatching Notify on [THRM] Node f703ae78 Value 0x81 (**Device Specific**)
     bus-0356 [00] bus_set_power         : Device [FAN] transitioned to D3
ACPI: Execute Method [\_TZ_.THRM._PSV] (Node f703ae3c)
[ACPI Debug]  String [0x0C] "In THRM._PSV"
[ACPI Debug]  String [0x19] "In THRM._PSV, PERN != One"
[ACPI Debug]  String [0x19] "In THRM._PSV, TRON < 0x64"
   utils-0286 [00] evaluate_integer      : Return value [3672]
ACPI: Execute Method [\_TZ_.THRM._AC0] (Node f703ae50)
[ACPI Debug]  String [0x0C] "In THRM._AC0"
[ACPI Debug]  String [0x1A] "In THRM._AC0, FNON >= 0x28"
   utils-0286 [00] evaluate_integer      : Return value [3432]
ACPI: Execute Method [\_TZ_.THRM._TMP] (Node f703ae64)
[ACPI Debug]  String [0x0C] "In THRM._TMP"
[ACPI Debug]  String [0x1E] "In THRM._TMP->Ifstatement=true"
   utils-0286 [00] evaluate_integer      : Return value [3172]
 thermal-0265 [00] thermal_get_temperatur: Temperature is 3172 dK
ACPI: Execute Method [\_TZ_.FAN_._PSC] (Node f703aeb4)
[ACPI Debug]  String [0x13] "querying fan status"

So the bus keeps saying the fan is ON (bus-0248 [00] bus_get_power : Device [FAN] power state is D0), and then keeps trying to turn the fan OFF, and thus kacpi_notify is now at 50% CPU all the time. I think that if it weren't for bus_get_power saying the fan is ON when it's really OFF, this would have worked fine. I'm going to keep poking at this DSDT, but I'm posting my steps for future readers.
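The "Return value" numbers in the trace above are in tenths of a Kelvin (the thermal line labels them "dK"). As a rough aid for reading them, here is a small Python sketch that converts them to Celsius. It assumes the same zero-Celsius offset the DSDT itself uses, 0x0AAC = 2732, rather than the exact 2731.5; the helper name dk_to_celsius is made up.

```python
def dk_to_celsius(deci_kelvin):
    """Convert tenths of a Kelvin to Celsius, using the DSDT's 0x0AAC offset."""
    return (deci_kelvin - 0x0AAC) / 10


# Return values copied from the trace above.
TRACE_VALUES = {
    "_PSV": 3672,  # passive trip point
    "_AC0": 3432,  # active (fan) trip point
    "_TMP": 3172,  # current temperature
}

for name, raw in TRACE_VALUES.items():
    print(f"{name}: {raw} dK = {dk_to_celsius(raw):.1f} C")
```

Read this way, the zone sits at about 44 C with the fan trip point at about 70 C, so asking for the fan to be OFF looks like a sensible decision on its own. The trouble is that the request keeps repeating.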

UPDATE 2:
Okay, so what I decided to remove next was the "\_SB.PCI0.LPC.EC0._Q11 ()" call in Device (FAN)._PSC, to break the CPU-intensive loop. Bad idea! Just removing that line causes random lockups, and the screen started fading to white, then to black, as if someone had physically bent the screen too hard and broken something inside it. So I immediately hard-powered-off the laptop, booted into my working 2.6.32.3 kernel (I have both kernels in my grub.cfg), and tried something else.
I put the "\_SB.PCI0.LPC.EC0._Q11 ()" call back into Device (FAN)._PSC. Then I did more googling and saw this: "http://costela.net/files/dsdt_toshiba_l300_21c.txt". The main difference is that they removed "\_SB.PCI0.LPC.EC0._Q11 ()" from Device (FAN)._PS0 and Device (FAN)._PS3. So I did that too, recompiled, and rebooted. Everything worked, but kacpi_notify was still taking 50% CPU, and the acpi debug messages were still showing loops similar to the ones above. I googled around some more about kacpi_notify and loops and ran into:

http://www.mail-archive.com/linux-acpi@ ... 02476.html
Code: Select all
Right now we have overheating NX and NC series of notebooks
from HP with AMD processors (somehow Intel DSDTs do not
use Notify from infinite While loop), plus several "kacpid uses
100% of cpu" caused by Notify() events, which were scheduled to
rush through the queue if the bloker exits.


Hmm, caused by Notify events, eh?... so I look at my dmesg and see...

[ACPI Debug] String [0x2A] "Notifying TZ.THRM with 0x81 in method _Q11"
evmisc-0125 [00] ev_queue_notify_reques: Dispatching Notify on [THRM] Node f703ae78 Value 0x81 (**Device Specific**)

This comes from outside the _TZ scope, from Device (EC0)._Q11. It looks to be what runs when "\_SB.PCI0.LPC.EC0._Q11 ()" is executed. So I took a look at the method:
Code: Select all
Method (_Q11, 0, NotSerialized)
{
    Store (0x11, P80H)
    Acquire (MUTS, 0xFFFF)
    OSMI (0xC2)
    Release (MUTS)
    Store ("Notifying TZ.THRM with 0x81 in method _Q11", Debug)
    Notify (\_TZ.THRM, 0x81)
}

So, I decided to remove the "Notify (\_TZ.THRM, 0x81)".... recompile.... reboot.
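Why removing the Notify should help can be shown with a toy model. This is only my reading of the traces in this post, written as a Python sketch; it is not the kernel's actual thermal code, and the function name count_checks and the event labels are made up. The assumption is that every thermal check evaluates FAN._PSC, that _PSC calls EC0._Q11, and that a Notify with 0x81 on the thermal zone makes the kernel check the zone again.

```python
from collections import deque


def count_checks(q11_notifies, max_checks=1000):
    """Count thermal-zone checks until the queue drains or the cap is hit."""
    pending = deque(["poll"])  # one regular polling event to start with
    checks = 0
    while pending and checks < max_checks:
        pending.popleft()
        checks += 1
        # Each check evaluates FAN._PSC, which calls EC0._Q11.
        if q11_notifies:
            # Notify (\_TZ.THRM, 0x81) asks for yet another check.
            pending.append("notify 0x81")
    return checks


print(count_checks(q11_notifies=True))   # hits the cap: the loop feeds itself
print(count_checks(q11_notifies=False))  # one check per polling event
```

With the Notify in place the queue never drains, so the count only stops at the artificial cap. Without it, each polling event costs exactly one check, which matches going from a flood of messages to one batch every 3 seconds.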

So far, so good! The fan goes on and off when it's supposed to, and I can suspend and resume many times and the fan still works properly.
dmesg shows the following lines written every 3 seconds (my thermal polling setting):
Code: Select all
[ACPI Debug]  String [0x13] "querying fan status"
     bus-0209 [00] bus_get_power         : Device [FAN] power state is D0
ACPI: Execute Method [\_TZ_.FAN_._PS3] (Node f703ce8c)
[ACPI Debug]  String [0x0F] "turning fan OFF"
     bus-0317 [00] bus_set_power         : Device [FAN] transitioned to D3
ACPI: Execute Method [\_TZ_.FAN_._PSC] (Node f703ceb4)
[ACPI Debug]  String [0x13] "querying fan status"
     bus-0209 [00] bus_get_power         : Device [FAN] power state is D0
ACPI: Execute Method [\_TZ_.FAN_._PS3] (Node f703ce8c)
[ACPI Debug]  String [0x0F] "turning fan OFF"
     bus-0317 [00] bus_set_power         : Device [FAN] transitioned to D3


Even when I do a CPU-intensive activity that turns on the fan, I keep seeing those messages, with no "turning fan ON" messages. Odd.
kacpi_notify now stays at 0%-1%. I'll update this if I find a problem with this configuration, but for now I'm solid.

The end.

Re: How I fixed my DSDT for kacpi_notify 97% CPU.

Post by rzr » 2011-11-02 02:48

Can you publish a diff between the two sources?


Re: How I fixed my DSDT for kacpi_notify 97% CPU.

Post by hellokitty » 2013-01-08 22:28

Here's the "source code", since I can't attach it to this forum: http://pastebin.com/raw.php?i=NUsGQT2f