Pandora’s Box of Troubleshooting: High Sierra 10.13.4 Kernel Panics

I generally like troubleshooting. The juicier the problem, the more rewarding the feeling of having solved it. Of course, solving other people’s problems is often a lot more fun than solving your own – they are grateful, and you get to feel good about having done something nice for someone else. Possibly the best part is that if you can’t solve it, you don’t have to live with the consequences! This post is a rather dull story of one of my own troubleshooting experiences, and I advise you to skip it. Unless, of course, Google has brought you here while searching for kernel panics under High Sierra, especially ones related to memory, storage, Thunderbolt, firmware, SSDs or AppleRAID. Then read on.

Recently I found myself running out of storage, and decided that if I was going to take this photography business seriously, I should invest in a solution that is both larger and faster. I picked up a couple of good deals on Thunderbolt gear, but as soon as I began copying data across, my own computer started experiencing kernel panics. Uh oh.

No no no no no no no no no no no no no no ….

For those who don’t know, kernel panics are the worst kind of problem:

  • Can be caused by practically anything (software or hardware)
  • Difficult to diagnose (world-class obfuscated error messages™)
  • Fatal outcome (you can’t trust your computer not to crash at a moment’s notice)

According to Apple themselves, “In most cases, kernel panics are not caused by an issue with the Mac itself. They are usually caused by software that was installed, or a problem with connected hardware.”

My understanding of kernel panics is that they are generally the result of your computer being surprised. Some information the computer expected is not where it was expected to be, or perhaps some information was found where it wasn’t supposed to be. In a nutshell, kernel panics are caused by corruption of memory, where “corruption” typically means “anything that the CPU decides it doesn’t like”.

The trouble is that there are a lot of things that can corrupt the memory.

Diagnosing the cause of a kernel panic is therefore typically time-consuming and tedious. I first experienced kernel panics with my first OS X Apple computer, over a decade ago. Back then, it was dodgy USB 2.0 PCI expansion cards with terrible drivers that caused the problems. Nothing has really changed in this area, as USB 3.0 PCI Express expansion cards are causing exactly the same flaky behaviour for users on both Mac and PC… but I digress.

PCI and PCI Express expansion cards are particularly susceptible to causing KPs, as they have DMA, or direct memory access: the ability to directly access (and hence directly corrupt) the memory of the system. Thunderbolt is effectively PCI Express, so it also has the same kind of DMA. Several other ports and connectors have this same deep-level access, such as FireWire and ExpressCard.

Thinking bad drivers were causing the issue, I reinstalled my entire macOS High Sierra system. The kernel panics stopped! Hooray! Until they happened again, and I was left to continue troubleshooting, now with the added “joy” of reinstalling all my apps.

I next unplugged my Thunderbolt devices, and worked only with data from my USB 3.0 drives. This finally solved the kernel panic problem… until it didn’t. I experienced several more kernel panics (one of which completely corrupted my USB SSD…) and realised that the problem probably wasn’t caused by the Thunderbolt devices. I began to suspect my latest addition, a USB 3.0 SSD, so I eliminated that… and another kernel panic occurred. I resorted to removing all peripherals, including my other USB 3.0 drives, but still experienced one more kernel panic.

Any time you speak to Apple about a kernel panic, they will often ask you to reinstall the original factory RAM. Third-party RAM is a common cause of kernel panics, something I have also experienced myself. However, I hadn’t had any KPs until just recently, and the RAM I am using is all Crucial Mac RAM. So that ruled the RAM out, right?

I downloaded Rember to confirm this for certain. Rember packages the classic, trusted memory testing program memtest in a nice, simple, modern GUI. I ran a RAM test on loop and had no issues; the tests seemed to be passing fine! Then I clicked “stop” on the test, and experienced a kernel panic. What on earth?! I found this was repeatable: if I started a test and then aborted it early, the system would kernel panic about 70% of the time!

Now that the problem seemed possibly memory-related, I removed all of my iMac’s RAM. I noticed a small amount of dust, which I promptly dislodged. Then I tested the RAM one stick at a time with Rember, and every stick passed flawlessly. I reinstalled all the RAM, and have experienced no further KPs triggered by Rember.

Conclusion (aka TL;DR)

So it seems that dusting the RAM bay, and reseating all the RAM, fixed the problem for me.

This is the point where I call this problem solved. Time will tell whether I continue to experience KPs, with or without Thunderbolt devices, and I will update this post if I do.

I hope this one ends here.
