Stopping a friend from destroying their data
Published on Apr 3, 2026
Opening
Don’t you love getting a phone call from a friend telling you they managed to fry the filesystem on some external drives? Turns out the person calling me is just a little impatient, which is bound to create nightmares with NTFS and SMR drives over USB. They had transferred a bunch of data from one USB drive to another using their Ubuntu laptop. The drives were connected to the same USB controller, as was their external mouse. As soon as the drives hit their internal buffer limit, the transfer speed plummeted to near zero, the laptop seemed to hang and the mouse stopped responding, while the USB controller was waiting for the drives to signal that they were done moving data. The “fix” of an impatient person: unplugging the drives mid-transfer and hard-resetting the laptop by holding the power button until the screen went black.
Naively I thought to myself “Can’t be too bad” and went over to their place. GParted refused to fix it and pointed me to a Windows machine instead. So my friend’s unused Windows 7 laptop tried to fix it automatically — which, very on-brand, it did not — and I decided to let chkdsk /f handle things while I played the waiting game. After verifying that everything worked again, I gave my friend a quick and dirty rundown on what had happened and explained that they might opt to move less data at once and maybe even use the laptop’s internal storage as an intermediate buffer, so the drives have more time to finish their work and the USB controller does not get congested that horribly again. After that, I went home, happy to have fixed an issue and taught someone a bit more about the inner workings of drives and data movement, and expected to never hear about it again.
Analysis
A week later, the same person called me again. Topic: “I managed to kill it again…”. Oh no, that’s not an accident anymore, that’s a pattern. Damn, how do you remove “autopilot” from someone’s behavior? It’s possible with a lot of time and patience, but until that change happens, they would probably kill the filesystem another ten times or more. I could teach them how to fix it themselves, but that is just putting a band-aid on it instead of solving the problem. And even if they were able to restore the filesystem themselves, killing the drives mid-transfer by cutting their power puts stress on the hardware and can physically damage the read/write heads or the platters. On top of that, they were juggling six external drives on two USB ports on the laptop: plugging in, unplugging, vibrations, the laptop’s exhaust heat, worst case maybe even dropping a drive from the edge of the table… everything about it kind of sucks.
While thinking about this on a trip to my bathroom, I looked at my homelab in the hallway, which just… works. But actually getting a server, setting up a full Proxmox environment, creating a Ceph cluster on it and so on would have been overkill: too expensive, too much wasted time, and for someone without a technical background the learning curve would feel more like a solid wall. Around the same time, I got my girlfriend’s old PC that she did not need anymore. It’s dated and not capable of actual gaming workloads, but with an Intel Core i5-4570 and a mainboard with 8 SATA ports it basically screams “NAS” all over. So I proposed my idea to them: let’s build a custom TrueNAS box, with the following thoughts in my head:
- Less torture for their drives
- Centralized storage
- A pilot project for them to see what annoys them and what they like, to get a better picture of the actual demand, like:
  - Need more space? -> Bigger case, more drives.
  - Too much management overhead or too much power consumption? -> Synology.
  - Need more workloads like video re-encoding or maybe a media server like Plex? -> Proxmox (or some other virtualization platform).
And since TrueNAS Scale is capable of running some containers and scales decently in a homelab setup with dated hardware and a box that mostly sits idle, it seemed like a jack of all trades, master of none. If that isn’t the definition of a sweet spot for a pilot project, without having to sink like 2000 bucks upfront into a setup that might not even make them happy, I don’t know what is.
Making theory become reality (feat. Sisyphus)
They liked the idea, so I started working. Dismantled the old box (GPU: gone; old, dead drives: gone), cleaned everything up and removed the carpet of dust that covered everything inside, re-applied fresh thermal paste, did some cable management, added some donor RAM I had lying around (2x4GB) and a donor drive I wasn’t using anyway. After booting the box, having some issues with the memory and running memtest, I realized that the board / CPU was not being nice to the RAM. All four slots on the board threw errors, with multiple different DIMMs. Okay, that’s bad. Lesson learned: next time, check your hardware first before suggesting it as a box for someone else. Luckily I had another board with an AMD FX-4300 lying around which was known good, since I had planned to resell it on eBay. Change of plans: i5 out, space heater in. Yes, less performance, more heat, more power draw, but way better than 30k memtest errors in 30 seconds. After fixing the hardware, I went on to configure the BIOS, install TrueNAS Scale onto the machine, test the performance and noise, tune the fan curves, and do the general setup stuff.
After validating that the box worked properly (even with one eye covered, one leg amputated and choked by a slow hard drive), I called my friend and told them “if you are not mad that we break open the cases of your external hard drives and free them from their prison, I have an idea that accelerates things a bit…”. Their reaction at first was naturally something like “WTF? Why? My poor drives. My poor data.”. I explained that those drives much prefer having a stable power supply, not being stressed out by the overly conservative head parking imposed by the USB controllers in those enclosures, not being shaken and overheated, and most of all not having their power cut during a data transfer. They thought about it and decided to crack four of those six drives. I also told them that they should think about investing in a bit more RAM and maybe an SSD as the boot drive for TrueNAS. After getting confirmation that they were willing to invest, I searched up components, sent the links over, and -90 bucks (and four days) later I got to assemble the TrueNAS box into its - for now - final form. Ripped out the donor RAM and HDD, added the 2x8GB of new RAM and the SSD, reinstalled TrueNAS and configured the system with their credentials and so on.
A day later, I got the four drives ready to be dismantled, which meant another tradeoff decision for me to make: clone the data off the drives into my Ceph cluster over USB and then destroy the cases, or crack the cases first and then dump the data via SATA? I had no idea about the health of the drives, and SMART over USB is a nightmare. So I decided to go with the faster route; if one of those drives was acting up, better to have it spin for hours instead of days. So I ended up:
- cracking the drive open
- mounting the drive
- dumping the data into my Ceph
- unmounting the drive
- running smartctl -a on the drive (for a baseline health status)
- running smartctl -t long on the drive (to let the drive test itself completely)
- running smartctl -a on the drive again for a comparison
- evaluating the drive’s health status
- repeating for the other drives.
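Sketched as commands, the per-drive routine looked roughly like this. The device name, mount points and rsync target are placeholders for whatever your setup uses (I pushed into a mounted Ceph path), so adapt before running anything:

```shell
DRIVE=/dev/sdX        # hypothetical device name of the freshly cracked drive

# Rescue the data first (read-only mount, just in case)
mount -o ro "${DRIVE}1" /mnt/rescue
rsync -a --info=progress2 /mnt/rescue/ /mnt/ceph-dump/drive1/
umount /mnt/rescue

# Baseline health, full self-test, then compare
smartctl -a "$DRIVE"        # baseline SMART readout
smartctl -t long "$DRIVE"   # kicks off the long self-test in the background
# ...wait for the self-test to finish...
smartctl -a "$DRIVE"        # read again and compare against the baseline
```

The long self-test on big drives takes hours; smartctl -c prints the drive’s estimated polling time for it, so you know how long the waiting game will be.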
One drive (according to my friend, the newest) was extremely suspicious, and I didn’t even feel comfortable repurposing it into that TrueNAS box. A “new” drive with 42312 power-on hours and over 1.2 million head parkings? Something does not feel right here. Anyway, it does not matter whether something is off or there was a mixup in either of our heads about the drives’ order: that drive has worked for three times its estimated lifespan, and there is no changing that. The other three drives were used and bear their marks as well, but were healthy enough. My baseline for “healthy” in this context is asking myself “Would I trust that drive with my data without having a backup?”. I told them that one drive was in bad shape and I would not recommend throwing it into the box. I could do it if they wanted to, but they should have started looking for a replacement basically yesterday, because I would not trust that drive with data I do not want to lose. They acknowledged it and wanted to have it in the box anyway. Spoiler alert: my warning was correct.
One by one I added the drives to the TrueNAS box and created pools on them without any backup or RAID, for multiple reasons. One being the lack of space, because the drives were full to the brim, one even to the extent that I could not put the data back onto it due to ZFS overhead (~700MiB free of 3.6TiB on NTFS). Another reason was their data layout. Over the years they had carefully and painfully created a data structure that worked for them. Ripping that apart now and forcing my (subjectively, not objectively) “better” layout onto them would just make the experience more painful. Jumping from external drives to a server that is dusti-, I mean, chugging along in a corner is a major leap already, and I didn’t want to destroy the project right at the beginning with frustrating “where’s my data?” moments over and over again.
Then I created the SMB and NFS shares (Samba for their Windows laptop, NFS for the Linux devices), defined the ACLs and started the share services. Then I had another waiting game to play while my Ceph drowned that poor box in data. 8 OSDs pushing data into that little TrueNAS box felt a bit like tattooing ants with an orbital laser cannon. The good thing: this way I load-tested the TrueNAS box for 8 hours straight and it didn’t flinch once. After the data was completely migrated, I made some finishing touches, like asking them for their laptop’s local IP address so I could set up TrueNAS ready to run in their network with a static IP address, proper Samba announcement, correct subnet mask and so on. And then there was nothing left to do besides telling my friend that they could pick up the box at any time.
Deployment
A few days later, they stopped by to pick up the server. I grabbed an Ethernet cable, power cable, keyboard and VGA cable, as well as a few tools like screwdrivers - just in case. At their place, we picked a neat spot where the server could sit without being in danger of getting flooded, knocked over, standing in the way and so on. We connected the box to the network, plugged in the power cord and flipped the switch on the PSU. The box roared to life, then went quiet once the fan curve throttled down. Not even three minutes later we checked on their laptop: the web interface was reachable and the shares appeared in the network discovery. Perfect. Their smile was worth every minute, let me tell you that. It just worked. Seeing all six drives available on the laptop, being able to interact with all of them without replugging, and then seeing the transfer rates, they were as happy as a three-year-old who gets five scoops of ice cream. It was infectious. I am always happy when my work makes someone else happy, solves problems and makes life easier.
After the initial hype, I gave them a brief overview of where to find what in the web interface, like disks, datasets, monitoring and reports. And then it happened: ZFS on pool 3 reported errors. Even worse: write errors. I chose to flag the pool as read-only to stop TrueNAS from trying to “fix” it and possibly making it worse, cleared the error log and watched to see if new errors popped up. Luckily, flagging it stopped the accumulation and I got to dig a bit more. SMART told me (or us, at this point) that the drive had 41 reallocated sectors. It literally couldn’t have been scripted better. Pool 3 resided on the drive I had been suspicious about from the get-go, so having it fail right in front of them, just a few days after I told them I don’t trust it, felt like destiny. I explained that this is the reason we opted for TrueNAS: it tells you when things happen instead of leaving you silently wondering “where is my data?” a few months later. Naturally - and I bet everyone would feel that way - they didn’t like having to watch their data vanish into nothingness. Relatable; I wouldn’t want to watch 2TB of my data vanish silently either. But on the positive side, they were impressed that TrueNAS can do this.
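For the curious, the “flag it read-only, clear, watch” dance boils down to a handful of commands. The pool name here is made up, and on TrueNAS you would normally let the middleware handle this:

```shell
POOL=pool3                      # hypothetical pool name

zpool status -v "$POOL"         # which device is throwing the write errors?
zfs set readonly=on "$POOL"     # stop all writes to the pool's datasets
zpool clear "$POOL"             # reset the error counters...
zpool status -v "$POOL"         # ...and watch whether new ones show up
```

Setting readonly on the pool’s root dataset stops the user-facing writes; that is what let us copy data off while preventing the error counters from climbing further.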
I gave them some homework: pull as much data off that dying drive as they can, because the drive is about to “retire”, and every second it keeps spinning is a second closer to saying “bye bye, data”. They started right away and wondered why moving the data did not work. I answered that implicit question with a short explanation: moving - the process of copying and then deleting the source - is still a task that writes. The system does not care about what it writes, and deleting is just writing zeros. Or - to be precise - telling the filesystem to free up used blocks to be used again, but a write nonetheless. Since I had flagged the pool read-only, they were only able to copy off that drive.
A little later I got home, and over the next few days I got messages about drives they were looking at. Two drives were interesting enough to be considered (description and cost, eBay can be a mess) and they bought them. I went over to my friend’s place after the drives arrived; we cracked the cases, installed them into the TrueNAS box and treated them with the same routine I used for the other drives: SMART readout, long test run, which takes quite some time for 8TB drives. So I went home again and waited for the next day for the SMART tests to complete. On that occasion I took the dying 2TB drive with me, because my friend wasn’t entirely sure if they had copied everything. Now I had some more homework to do in parallel, which you can read more about here, if you are interested in some data forensics.
Both drives were healthy. Interested in a crash course about “carefulness is the mother of the porcellai— uh, TrueNAS box”? (Yes, I know, don’t throw rocks please, old cheese is fine though)
Arbitrary restrictions and finishing touches
The web interface does not allow the creation of a two-drive raidz1. Makes sense: it would just be a mirror with way more resource overhead because of the parity calculations, while an actual mirror would have the same effect with less pain. Well, now what? They still had another 8TB drive in an enclosure, but that drive was full as well. If I made a two-drive mirror from the two new drives, they could stuff the data into that mirror pool, but when later adding the third 8TB drive they would gain “nothing” (well, except more data resilience). So I chose to “negotiate” via the CLI about creating a two-drive raidz1. Successfully.
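The “negotiation” is nothing fancy: the restriction only lives in the web UI, while the zpool CLI happily builds a two-drive raidz1. Pool and device names below are made up, and on a real TrueNAS box you would point it at the partitions the middleware created rather than raw devices:

```shell
# Two-drive raidz1 - refused by the web UI, accepted by zpool (hypothetical names)
zpool create tank raidz1 /dev/sdb /dev/sdc
zpool status tank       # shows a single raidz1-0 vdev with two children
```

The tradeoff is exactly what the UI warns about: same usable capacity as a mirror, plus parity math, but with the option of growing the vdev later.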
After more homework for my friend and more data movement, they told me that the third 8TB drive was empty and ready to be tortured as well. So I went over there - again - dismantled the enclosure, installed the drive, read the SMART values, triggered another long SMART test and then went home. The next day - after the test had completed - I read out the SMART values again, evaluated them again and tried to add the drive to the existing pool via the web interface, which should work just fine; ZFS supports adding more drives to an existing raidzX. It’s painful because of all the recalculation of parity data, but it is supported. The web interface begged to differ, though. So I went back to negotiating via the CLI. I wiped the drive to remove the remaining NTFS signatures and then attached it with a simple zpool attach command. The box started cooking up a new recipe for the parity cake, and it took its time.
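The CLI side of that is equally short. Names are placeholders again, and note that attaching a disk to a raidz vdev requires an OpenZFS version with raidz expansion support, which recent TrueNAS Scale releases ship:

```shell
# Wipe leftover NTFS/partition signatures, then grow the raidz1 vdev (hypothetical names)
wipefs -a /dev/sdd
zpool attach tank raidz1-0 /dev/sdd
zpool status tank       # reports the expansion / reflow progress while it runs
```

The vdev name (raidz1-0 here) is whatever zpool status lists for the existing raidz group; attaching to the pool name alone would try to add a new top-level vdev instead.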
After two days of letting TrueNAS reflow the data, my friend and I noticed that the pool only showed 10.79TiB of usable capacity. That seemed weird, but on the other hand TrueNAS had also started a scrub on the pool. We decided to wait for the scrub to finish; maybe that would resolve the missing capacity. Another day of waiting and: still 10.79TiB. Now I started wondering where the issue was. TrueNAS reported the pool as healthy, three drives wide at 7.28TiB each, with a raidz1 layout. Everything fine on paper, but something was clearly off. Since I expected this to be the fabric nightmares are made of, I asked my friend to bring the box over to me. Here I have my systems, my environment, my tools, my everything. As soon as the box arrived, I connected it to my network, powered it on and started digging. SMART values? All good. Scrub issues? Completed without errors. I read out everything I could think of via the command line: pool status, ZFS properties, trying to export and re-import the pool, syncing the pool (zpool sync) and much more boring stuff. When poking at the middleware I noticed that it also correctly showed the pool with 24TB total capacity and 16TB allocated (which would have translated to roughly 14.5TiB usable space). So where was the missing capacity? I dumped the pool metadata and - after some careful reading - lost my mind. Everything reported “all green, no issues here”. And yet the pool dump revealed: expansion completed, and the reflow finished at 99.45-something percent. So the reflow apparently had a hiccup, or died, or whatever, but TrueNAS (and ZFS) did not bother to show that anywhere.
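For reference, the numbers we expected are easy to back out: raidz1 sacrifices one drive’s worth of capacity to parity, and the middleware’s decimal TB figures convert to TiB by dividing by 2^40. A quick sanity check:

```shell
# Expected usable capacity of a 3-wide raidz1 built from 7.28TiB drives: (n - 1) * size
awk 'BEGIN { printf "%.2f TiB usable\n", (3 - 1) * 7.28 }'

# The middleware's "16TB allocated" (decimal) expressed in TiB (binary)
awk 'BEGIN { printf "%.2f TiB allocated\n", 16e12 / 2^40 }'
```

Both land around 14.5TiB, which is exactly why the stubborn 10.79TiB stood out so much; the small gap down to the final 14.4TiB goes to ZFS’s internal reservations.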
Theory and reality
Okay, now I had several routes to take from here, and the most sane road in my opinion was: back up the data into my Ceph (and any other available storage device I could find), then try to poke the pool to either fix it on the fly, or at least learn something from it before nuking and rebuilding it. After the backup finished (which took almost two days), I had forgotten about my idea of poking the pool and just nuked it, then went ahead trying the exact same route I had taken before: create the two-drive raidz1 pool, then expand it. Result: 10.79TiB again. So it was not an issue of having had too much data on it; one thing ruled out. Looking at the pool dump, now knowing what to look for, I was baffled. Expansion completed successfully, but the reflow finished at 427%. Okay, lesson learned: I have no idea what exactly causes the mess, but clearly something in that box is not acting according to the ZFS documentation. Might be a version mismatch, might be a TrueNAS snowflake thing, I simply do not know. And since I wanted the box to be available for my friend again, I took the pragmatic route and destroyed the pool once again. After some clicky-clicky web UI magic, I created the pool the way TrueNAS expects me to create it: three drives, raidz1, and 30 seconds later I had a pool with 14.4TiB usable capacity. The usual stuff followed right away: creating a dataset, setting permissions, creating the SMB and NFS shares. And now it’s up to the waiting game (again) of moving all the data back onto the - now correctly set up - pool.
While transferring all the data back, I realized: yes, raidz1 has performance overhead for parity. I knew that beforehand. I expected the transfer to be slower than onto a mirror or a stripe. But I had not expected rates around 30-40MiB/s. Thinking about it, it makes sense, but I did not expect it to be that bad. The combination of three spinning platters, only 16GB of RAM and no dedicated metadata drive for the pool results in very heavy IO-waits (meaning the CPU and everything else in the transfer chain has to wait until the drives report “I’m done, next job please”), which in turn stalls the transfer. Adding a metadata drive (it does not even have to be an SSD, given that RAM and gigabit would be the next natural bottlenecks) would certainly help, since the metadata for that specific pool would no longer have to be written to the data drives as well. Having the metadata on the same drives as the data results in very heavy random IO: data gets written in one place on the drive, metadata gets written to another, making the read/write heads jump across the drive all the time and stalling it. With a dedicated metadata drive, the metadata would be written almost sequentially to that drive, while the actual data would be written almost perfectly sequentially to the data drives. But there is a catch: if the metadata drive fails, the pool and all its data are toast. So to reliably add a metadata drive, you need (at least) two drives in a mirror, so one can fail and the data, or the pool, can still be rescued. That means throwing money at it. Sadly, there is no real trick I can play here to circumvent that.
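To put those rates into perspective, a quick back-of-the-envelope estimate, assuming a sustained 35MiB/s (the middle of what I saw) and roughly 14.5TiB to move:

```shell
# Rough transfer time: 14.5 TiB at a sustained 35 MiB/s
awk 'BEGIN {
    mib  = 14.5 * 1024 * 1024      # TiB -> MiB
    secs = mib / 35                # MiB / (MiB/s) -> seconds
    printf "%.1f days\n", secs / 86400
}'
```

Roughly five days, which explains why this turned into yet another waiting game.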
The most interesting question after all this - at least to me - is: now that the pool has three drives, would my friend be able to just add another drive via the web interface, because it is now a valid pool layout and not some cursed franken-raid? Time will tell. Anyway, long story short: the data is now being transferred to the box, slowly but steadily; the pool is rebuilt, the pool is healthy, and the pool reports 14.4TiB of usable capacity. My friend is happy, I am annoyed and happy. Chapter closed (hopefully).