If something can’t go wrong, it will anyway

About a year ago, the retired moocher decided it was time to retire a couple of elderly Drobo Gen 2 direct attached storage devices. After much research, the moocher undertook an orgy of parts ordering, skinned knuckles, and arcane incantations to produce a FreeNAS storage server with collateral duties for networking and media service. Then there was thunder. The storm wasn’t bad, but afterward FreeNAS wouldn’t run. This post describes the recovery actions and reminds all to take a key precaution.

References

The characters in this drama in order of appearance

  1. FreeNAS 11.2 U5
  2. Thunderstorm short wave
  3. Debian 9.4
  4. Ubuntu 18.04 LTS
  5. CentOS 7 (1804)
  6. Mac Mini circa 2009

What Happened

This spring, we had a number of power fades courtesy of some frisky thunderstorms, auto accidents, and the like, which caused the power to hiccup as relays tried to clear faults and reenergize the circuit. Fortunately, there was no lightning damage, but the power cuts apparently caught the FreeNAS system doing something important to the system disk. The machine became colicky and, after a time, brain dead.

What Syslog Said

Startup messages indicated that /dev/da0 had gone bad; that is, the kernel could not find the system files it needed to start FreeNAS. It turned out that the USB nubbin used was a singleton that was electrically OK but had a few dings in bad places.

Initial Recovery

Initial recovery was to download a new copy of FreeNAS, transfer it to a thumb drive, and reinstall FreeNAS on a redundant pair of new 64GB Samsung USB nubbins. FreeNAS woke up and discovered my two media sets: the primary RAIDZ2 volume and the secondary USB replication volume. But all was not good. Although the jails and VMs were still present on disk, the metadata needed to run them had gone missing.

One of the clever things about FreeNAS is that the OS is on media separate from the storage pool. The storage can be carried to any FreeNAS system, connected, and the system started. FreeNAS will find the disks and reconstitute the volume from the metadata on each physical disk in the volume set. The system disk has a small amount of configuration data on it that is easily backed up and restored. This arrangement allows for the failure and replacement of system media without heartburn.

In fact, pre-installed nubbins can be kept on hand and popped in when needed. It doesn’t get any easier.

Now you tell me

No worries mate, just restore your saved configuration. You do have one, don’t you? Where did I put that?

It turns out I didn’t have one! So, lucky me, I had to revive the VMs and jails individually, repeating the installation and setup of each. But the main file system reconstituted itself, the data was there, the jail and VM storage was there, and so on. Only the configuration data was missing, so some things had to be reconfigured from the moocher’s memory, a particularly bit-rot prone form of backup.

The Virtual Machine becomes real.

This system has several jobs: FreeNAS file service, Time Machine client storage pools, UniFi controller, Plex server, and Roon Server. As is often the case in the free open source software world, some programs make changes while other programs lag behind. Roon Server ran in an iohyve-created VM, and I was unable to get it to come back up. So, no big deal, reinstall Roon Server in a new iohyve or bhyve VM. iohyve is a shell command for managing bhyve virtual machines. There is also a graphical tool for bhyve in the FreeNAS GUI. Well, the two don’t talk, so there is no way to use iohyve to sort out a colicky bhyve VM created in the GUI. I never did figure out how to recover it (assumed to be trivial with a saved configuration). So, I tried to reinstall Debian and Roon Server.

Several nights later, I gave up. At first the Debian installer would appear to complete, but the OS wouldn’t start. After a lot of reading of the fine print from the text-based installer, I realized that a directory the installer should have created was missing, and I created it by hand. With that fixed, the installer ran and Debian 9.4 started, but the network device would not come up. I never found a solution for this problem.

I tried Ubuntu 18.04 LTS and CentOS 7 (1804), both with no joy. Each had different things wrong that kept it from quite coming up. The solution was to dust off the old 2009 Mac Mini with its Intel Core 2 Duo and place it in service. It is perfectly happy reconstituting FLAC. The full install of Ubuntu 18.04 LTS went on easily. Roon Server went on easily, with hacks to the start script to run as daemon rather than root. CIFS went on easily, and the mount for FreeNAS_Media was a practiced, simple affair.

This is in contrast to the situation last summer where things went together and came up as advertised. But this was with an earlier FreeNAS 11, Debian 9.1, etc. FOSS is always an adventure. Most times pleasant, but sometimes you meet a bear and things don’t quite fit together.

The Jails

They just sorted themselves out. So easily, in fact, that I forgot the details. I believe I only had to edit the settings to tell them to start. No need to reinstall UniFi or Plex. I may add UNMS in another jail.

Other lost stuff

There was a slew of other clean-up to do that the saved configuration would have restored. This included the following.

  • Snapshot of the primary array. Use the temporary credential and root user.
  • Replication of the primary array to the bug-out USB disk.
  • Time Machine happy with a new container. Use the Wizard.
  • Actually saving a configuration.

Where to save the configuration?

The System General tab has a button to save the configuration. This copies key files from the nubbins to an archive that downloads to the browser. I created a folder on FreeNAS_Media and collect the archives there. Why there? Well, it is RAIDZ2 and backed up, so it should be as safe as anywhere. Three disks have to fail to lose it. Yes, it can happen. After all, this is the thunderstorm-prone southeastern United States. Once things settle a bit, I’ll keep a copy on iCloud.
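Collecting the downloads by hand is the step most likely to be skipped, so here is a minimal Python sketch of how the filing could be scripted. It is not part of FreeNAS. The function name archive_config, the freenas-config- file prefix, and the keep count are all made up for illustration. It assumes only that the browser has already saved the configuration archive somewhere and that the FreeNAS_Media folder is mounted on the machine running the script.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archive_config(downloaded: Path, archive_dir: Path,
                   keep: int = 10, stamp: Optional[str] = None) -> Path:
    """Copy a downloaded configuration file into archive_dir under a dated name.

    All names here are hypothetical. Only the newest `keep` copies are retained.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    archive_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp lets a plain name sort put the oldest copy first.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"freenas-config-{stamp}{downloaded.suffix}"
    shutil.copy2(downloaded, target)

    # Verify the copy before trusting it as a backup.
    if sha256_of(target) != sha256_of(downloaded):
        raise OSError(f"copy of {downloaded} to {target} does not match")

    # Prune everything except the newest `keep` copies.
    copies = sorted(archive_dir.glob("freenas-config-*"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

A call might look like archive_config(Path("downloaded-config.db"), Path("/mnt/FreeNAS_Media/config-backups")), where both paths are placeholders for wherever the download and the mounted share actually live.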

Lightning precautions at Dismal Manor

Lightning precautions at Dismal Manor include a whole house surge protector, point of use surge protectors, and homeowner’s hazard insurance, which covers electrical surge damage. Surges have several causes: lightning, wind-blown tree branches, and auto accidents in which a broken HV conductor contacts the dodgy insulation of the LV lines on the transformer secondary, or falls near the pole ground and elevates ground potential until relaying reenergizes the transformer.

Lightning is sort of obvious. The HV wire in a single-wire earth return system gets hit, causing a voltage spike. Or the strike hits the pole and goes to earth at the ground, pushing the neutral around. Or a tree branch contacts an HV line and it arcs to the tree; relays dump the power while the arc quenches and then re-close, causing a brief interruption. Any of these can disturb power enough to cause a disk error or memory error, and a direct hit will break stuff.

Service Grounding

The first line of defense is proper grounding. Next time you have a panel replaced or circuit breakers replaced, have the electrician inspect the grounding. There should be a robust ground block earthed at two points 10 feet apart. The earthing serves several purposes.

  • It keeps the neutral (the white wire) at ground potential.
  • It keeps the ground circuits (the green wire) at ground potential.
  • It backs up the pole ground for the center tap of the transformer. The white wire connects to the transformer secondary winding center tap. Red and black to the ends. The center tap is earthed at the pole. The pole ground is also the earth return for the high voltage current in the transformer primary.

If the ground is not right, the lights will behave strangely when motors start. Some will get bright and others will dim. This usually presents when older hard-starting loads start. A front-loading washer going on spin was a classic cause but furnace blowers starting, air conditioner condensers, or other appliances starting can cause the same sort of disturbance of voltage balance. This can result from a weak ground at either end of the feeder from the transformer to the service panel. Check your end and have the utility check theirs including all connections. The utility has responsibility from the pole to the meter socket.
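Why some lights brighten while others dim can be shown with a little arithmetic. The Python sketch below is a deliberately simplified model, not a description of any real service. It assumes two 120 V legs, purely resistive loads, and a single resistance standing in for the weak neutral/ground return. The function name and the load values are invented for illustration.

```python
import math


def leg_voltages(r_a, r_b, r_return, v_leg=120.0):
    """Return (volts across the leg A load, volts across the leg B load).

    r_a, r_b : resistance of the loads on the red and black legs, in ohms
    r_return : resistance of the shared neutral/ground return, in ohms
               (0 = solid connection, math.inf = completely open)
    """
    if r_return == 0:
        # A solid return pins the house neutral to the center tap.
        return v_leg, v_leg
    # Node equation at the house neutral, measured from the center tap:
    #   (v_leg - shift)/r_a + (-v_leg - shift)/r_b = shift/r_return
    shift = (v_leg / r_a - v_leg / r_b) / (1 / r_a + 1 / r_b + 1 / r_return)
    # The heavily loaded leg loses what the lightly loaded leg gains.
    return v_leg - shift, v_leg + shift


if __name__ == "__main__":
    # Leg A carries a heavy load (10 ohms), leg B a light one (100 ohms).
    for r_return in (0, 1.0, math.inf):
        v_a, v_b = leg_voltages(10.0, 100.0, r_return)
        print(f"return {r_return} ohm: leg A {v_a:.1f} V, leg B {v_b:.1f} V")
```

With a solid return both legs sit at 120 V. With a 1 ohm return in this toy model, the heavily loaded leg sags to roughly 110 V while the lightly loaded leg rises to roughly 130 V, which is the dim-here, bright-there behavior described above. The two always sum to 240 V.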

Surge protection in depth

Whole house surge protection clamps the red and black voltage when there is a surge. The device directs the surge energy to ground offering some protection to the house wiring and devices from all but the worst insults like a direct strike to the house service or structure.

Just about everything has transistors, even light bulbs now that LED lighting is ascendant. Just losing the light bulbs can be a significant expense. Then there’s the cook top, oven, refrigerator, washer, dryer, dishwasher, phones, TVs, etc.

To back up the whole house power protection, I have point of use power protection for the computers, audio, and video equipment. This takes one of three forms: uninterruptible power supplies for the computers, power conditioning for the AV electronics, and surge strips to distribute conditioned power to devices.

Some of the AV equipment is collectable, and although it could be replaced with modern equivalents having a similar voice and imaging ability, it would be sad to lose it, so it is protected. Eventually, a Furman power conditioner will assume this job.

And if all that fails, there’s homeowner’s insurance, subject to a deductible, that will pay for service inspection and repair, and replacement of damaged equipment. But we’d probably have an argument over the value and replacement costs of my Great American Sound Ampzilla and Dahlquist DQ-10s.