The Uncast Show
A ZFS Masterclass with Tom Lawrence
April 26, 2024 | Unraid | Season 3 Episode 2

In part 2, Tom Lawrence of Lawrence Systems gives us an absolute masterclass in ZFS, answers some audience questions, talks about some of his favorite open-source projects, and opines about the shift away from the cloud back to self hosting.

Did you miss Part 1?

Interested in learning more or keeping up with Tom Lawrence?

https://lawrencesystems.com/

https://www.youtube.com/@lawrencesystems

https://twitter.com/TomLawrenceTech

https://www.linkedin.com/in/lawrencesystems/

Ed:

Hi everyone and welcome back to another episode of the Uncast Show. Well, in fact I'm not going to say another episode; it's actually part B of my awesome interview with the great Tom Lawrence. In this part we're continuing on from part A. Well, Tom, I reckon we can't let people wait any longer to start talking about ZFS. I'm sure a lot of the Unraid guys out there have heard the buzz about ZFS coming into Unraid. But what I'm going to ask you about ZFS is: what's the point? Why do I need it? Can you shed some light on why ZFS is special compared to other file systems people might know?

Tom Lawrence:

Yes. So I will quote Michael Lucas and call it the billion dollar file system. And if you don't know who Michael Lucas is, he didn't write the book on ZFS, he wrote two of them, and Michael is really a wealth of knowledge; he helped me understand that file system better through his writing. But if you don't have time to read a book on it: ZFS is one of the most developed file systems for maintaining the integrity of your data while also giving you performance, and performance matters a ton in the enterprise world, as does integrity. Those are two things that are somewhat opposed to each other, if you think about it, because we can always stripe it all together and get really good performance but no integrity, and that balance is hard to keep. ZFS has been able to keep it through an amazing amount of engineering, starting in the earliest days when it was written all the way to today. It was really ahead of its time when it first came out from a structure standpoint, and it's just got the attention of all the right people. Matter of fact, ZFS is in more places than people realize, because it is the back end for so many commercial storage products, even though they may have a fancy name on the front. If you pull back and look at what's under the hood of them, you go, oh, that's just ZFS under the hood. You just licensed ZFS. Yes, you did. That is very, very common, and some companies don't do it as well. So bitrot, when you start dealing with things at scale, we can solve by having extra drives and parity to validate the data. ZFS takes that to the next level with the way its parity system works. It's truly one of the most advanced ones, combined with the way the ARC system works.

Tom Lawrence:

My friends over at 45 Drives did a great video they just published the other day about this, because people don't understand it: the single-drive IOPS question that comes up if you're new to ZFS. You look at it and go, I have five drives set up; I shouldn't get performance this fast out of them, because they're all grouped together in what they call a vdev. But the way ZFS works is it takes a bunch of small writes, groups them together into transaction groups and then flushes them to the drives, which ends up giving you a higher level of performance than you would expect. Combine that with the way it uses memory. So ZFS is this performant, high-integrity file system, and I don't think you can match it. Btrfs is the only thing I'm aware of that comes close, but it does not match ZFS in terms of that performance.

Ed:

And also, I'm sure, a lot of people, when they think about ZFS and they haven't used it before, I get a lot of them saying, oh, do I have to have ECC RAM? Do I have to have one gig of RAM per terabyte? Your thoughts on that, please, Tom.

Tom Lawrence:

Ah, yes, you do not need to have the ECC RAM, and I actually have, I think it's a 30 terabyte server with eight gigs of RAM. Now, more RAM means more performance. ZFS is RAM efficient. Some people say RAM hungry, but it just doesn't leave any unused. If I took and stuck more memory in that particular machine, it would happily consume all of it for caching. That is the advantage, but it's not a need; less RAM just lowers the bar on all those performance advantages I'm talking about.

Tom Lawrence:

Because the entire system has 8 gigs of RAM in it, with over 30 terabytes of data on there, it doesn't perform very well. But it doesn't need to. It's a backup server; that's the reason it exists. It just keeps copies of my data. It has slow random seeks, so if I were to start searching files on there it'd be a little slow. But all it is is an extra copy of all my videos, so I'm not too worried about it. If I ever have to go get them, will they transfer a little bit slow? Yeah, but if I'm disaster recovering, then I'm just happy I have a copy. That being said, the ECC thing is an important one, because people keep pushing that myth that you have to have it. ECC is a nice-to-have, not because of ZFS, but because, hey, a bit flipped, what happened?

Ed:

That's true for anything, isn't it? Any file system, really, it all goes to the RAM first.

Tom Lawrence:

Yep, it all goes to the RAM first.

Ed:

Yeah, with any file system.

Tom Lawrence:

Yeah, ZFS is a copy-on-write file system, and they call it an atomic write. One of the things that happens with copy-on-write is there can never be data loss, and "never" is not a word you usually get to use with computers. Usually you say, well, maybe this happens or maybe that happens. But ZFS is a very, very complicated but beautifully done system that says there's always a valid copy of your data. We gather up all those transactions and we're going to flush them in an atomic write. So we have those transactions, whatever they are, one or many, whatever would fit into that segment, and we write them out. Then we're going to create a checksum of what we just wrote out, and if that checksum fails, so does that write. We've got to figure out why it failed; we can unwind that write and figure it out, but that's it. Once that checksum passes and it validates, so it's been written, then it's checksummed, then it's committed. Now that data's live, and then we can pull out the pointer for the old data. That way there's never a time where it's in between.

Tom Lawrence:

You may lose, in a catastrophic failure, the data that was in the queue, but you don't lose the data that was already there. So the write is either completed or not completed, and if it's not completed, there's not a partial write that would leave you in an invalid state. Some file systems, actually many of them, will pick up and delete that pointer and start writing, and this is how you get data corruption. If you just unplug the ZFS system and pick it back up, you'll lose transactions in flight. But those transactions in flight were either committed or not, and if they weren't committed, they're just not there. You always have the previous commit, with completely full integrity. This is why, if you have memory that's going bad, that would corrupt the data, absolutely, you're right, but that data won't pass a checksum. You can't checksum corrupt data and validate it. Hence the write will not complete and the system will lock up, crash or whatever happens, but your last known good write exists. This is the entire premise of a copy-on-write file system.
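
As a toy sketch of the commit flow Tom is describing (not ZFS's actual on-disk logic), the idea is: write the new version somewhere new, checksum it, and only flip the live pointer once the checksum verifies, so a failure anywhere leaves the last committed copy intact.

```python
import hashlib

class ToyCowStore:
    """Toy copy-on-write store: the live pointer only moves after a verified write."""

    def __init__(self):
        self.blocks = {}      # block_id -> (data, checksum)
        self.live = None      # pointer to the current committed version
        self.next_id = 0

    def commit(self, data: bytes) -> int:
        # 1. Write the new data to a fresh location, never over the live copy.
        block_id = self.next_id
        self.next_id += 1
        checksum = hashlib.sha256(data).hexdigest()
        self.blocks[block_id] = (data, checksum)

        # 2. Re-read and verify before moving the pointer.
        stored, stored_sum = self.blocks[block_id]
        if hashlib.sha256(stored).hexdigest() != stored_sum:
            del self.blocks[block_id]           # discard; old data stays live
            raise IOError("checksum mismatch, keeping last known good copy")

        # 3. Atomically flip the pointer; only now is the new version live.
        self.live = block_id
        return block_id

    def read(self) -> bytes:
        data, checksum = self.blocks[self.live]
        assert hashlib.sha256(data).hexdigest() == checksum, "bit rot detected"
        return data

store = ToyCowStore()
store.commit(b"version 1")
store.commit(b"version 2")
print(store.read())   # b"version 2"; a failed commit would have left version 1 live
```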

Ed:

So, especially for a home labber, it doesn't matter that you haven't got ECC RAM, because the data writes will either work or they won't. Like you say, if the checksum doesn't pass, you know you've got some bad RAM and you need to check it and maybe get some more.

Tom Lawrence:

Yeah, and where this can help. If I am busy writing from my computer to my NAS and my NAS does have some memory that is having a checksum problem and I don't have ECC, I could lose the data heading over there. But I'll have whatever data did make it. There's not an in-between, there's not corrupted data on there. The data is missing because it didn't make it. The system died prior to it. So that file I'm copying over there didn't make it, but the last version of that file if it already existed, if it was a change, it made it. The integrity of that file does exist.

Ed:

And can I ask you how that would pan out with ARC caching? Maybe you can just explain to people listening what the RAM ARC cache is. If you had bad RAM and there was some data in the ARC cache that was going to be read out to a client, some stuff that's there because it's being accessed lots of times, could that be corrupted in the RAM and then kind of given out? Or would ZFS somehow know that it's been corrupted in the ARC cache?

Tom Lawrence:

You know, I've never had it happen when reading from the ARC cache, because this is such a hard thing to simulate. Could data that's in cache, because now it's been read from the drive where the integrity was validated, but is held in memory, where the program running obviously does a level of validation, could there be a bit flip in between? Possibly, yeah. That would be, I can't read the file; it doesn't mean the data itself is corrupted, it means my version of the data pulled from cache is. So that is possibly an issue. It's something I guess could happen. It seems unlikely, but I would definitely put it in the "could happen" column.

Tom Lawrence:

The problem is, if someone was reading a file and the file was corrupted and they're scratching their head, but they read the file a second time and it's not corrupted, and if I go to validate it, like they copied the file and I want to repeat the process, it doesn't corrupt again the same way the second time it's run, because the data on the disk was never affected. That becomes very challenging because it's such a hard-to-reproduce problem, you know. So I can't say that has never happened, but it's one of those things that would be extremely hard to reproduce because we can't get a consistent bit flip. But I've been trying, and I want to do this test again: we were yanking memory out of systems, live systems, trying to break a ZFS system.

Tom Lawrence:

I did this in a video a long time ago and I want to do an updated version. I've got some old hardware laying around and I'm looking at it: we're gonna zap it with electricity, we're gonna do everything we can to try to corrupt data in flight, because I want to see what happens. But it's shocking how much ZFS will survive, and that's why I'm fascinated. We did this long before I understood how ZFS worked. What sent me down a rabbit hole of how it works was me smashing on it and pulling cards out, going, stupid thing keeps recovering. Why does this keep working? And that was like forever ago.

Ed:

Talking about that, I remember I saw a video of yours years ago. I think you were reviewing some TrueNAS system from iXsystems where it had two motherboards in it that were connected to the same backplane, and you could actually pull out a motherboard and then CARP would let you know that one's down and switch over. I thought, that is so amazing. I never knew that you could share a backplane between two different computers, really. That's pretty cool.

Tom Lawrence:

It's really clever, the engineering they put into that. It does require that you're using SAS drives, because the SAS drives have dual connectors, so they're able to take that backplane and split it into two boards. I had to get permission from the engineers. I said, look, I'll do it once in the video, because normally they want you to power down the motherboard before you pull it out. I'm like, but I just want to pull the motherboard out; I want to simulate a catastrophic failure without destroying your motherboard. They're like, well, fine, as long as you don't do it many times, because of what actually happens when you're pulling it out live.

Tom Lawrence:

I forgot how many watts, but it's pretty heavy wattage on there. They're like, it will arc internally. You can't see it, but when you're sliding it out, when those connectors pull apart, there's an arc, and they're like, that's why we want to power it down first. So I did slide it out. They're already synchronized; they keep the two motherboards in sync with each other at very high speed, so the other motherboard doesn't have to pick up where the first one left off. It's always in sync, mirroring everything, just not committing it, but when it realizes the other motherboard's gone, then it starts actually committing the data. It's just watching all the data, seeing all the open files, and then it goes from there.

Ed:

And with the virtual IP that they use, was that set inside the actual server, or was that done on the switch or firewall? The virtual IP for the system, is that all done on the server?

Tom Lawrence:

That's all done on the server; I believe they're using CARP on the back end of that. So the two motherboards, they have a backplane, and I forget, I did state it in a video, you have like 10 gig or 25 gig connections on there, but the motherboards themselves also have an internal IP that they share, and that's connected at like 100 gig or more, because that's one of the limitations they have to overcome. And the newer ones, because that was a few years ago, are probably faster. Even though it's done via a virtual network, they actually have to create an internal network connection that binds these two boards together, because that's how they talk for all the synchronization. They're both able to talk to the SAS backplane, but they actually have to synchronize all the information, what's in the ARC cache on each one, the memory synchronization. That all has to occur at incredible speeds to get the back end to work. So it's a pretty good feat of engineering.

Tom Lawrence:

So when people ask me why there's not an open source version of it, I'm like, well, they're using open source stuff. It's not that there's not an open source version, it's that it does require their hardware to work. You can't just get your soldering gun out and tie two motherboards together very easily in the way they did with the interconnects. There are ways to make this work with tools like LINBIT and stuff like that; that's how Synology does it, and there are completely open source ways to do it, but not to the same level of synchronization, because it's the combination of the engineering that went into the hardware and the hardware designers working with the software people to put all the pieces in the right place.

Tom Lawrence:

It's not actually that the software is closed source. It's just not something they give away, because they only give it away with their hardware. You can't get a copy of it because it's TrueNAS with some modified code in it to support a custom-built motherboard. People always ask me, why doesn't iXsystems give that to me? I'm like, they tell you how it's done. You can look at all the tools, but do you have that motherboard? No? Then you can't do it.

Ed:

Yeah, I always thought that was like a really kind of cool thing.

Tom Lawrence:

Oh, it's very, very clever. I really think it's neat.

Ed:

I didn't mean to get sidetracked off there, but can we just talk about the ARC cache again? Could you explain to people who are new to ZFS exactly what it is?

Tom Lawrence:

It's an adaptive read cache. It is an extremely intelligent system for figuring things out at the block level. When I say block level: the way ZFS works and the way the ARC works, it's not trying to figure out, did Tom ask for this video file dot mov? It's not looking at that. It looks at the blocks that were requested and it says, hey, you requested these blocks, let's put these in memory. And if you want those blocks again, or really anyone connected to the server wants these blocks again, they're in memory. It's constantly shuffling which blocks are in there.

Tom Lawrence:

This makes ZFS so much better because it allows you to do things like video editing: as soon as that file is read the first time, when I'm doing some video editing, and my server's got 64 gigs of RAM and my projects aren't quite that big, you know, one project file may only be like a 20 gig file, that 20 gig file, all the blocks to it, are all in the cache. So when I'm scrubbing along in my video editor, it's just smooth as butter. It's pulling it faster than the drives because it only had to read from the drives once.

Tom Lawrence:

You will frequently see the adaptive read cache efficiency as a percentage; it has an efficiency number, essentially, that you can see with ZFS statistics, and you're like, that's a really high efficiency number. The cache hits are somewhere like 80, 90% all the time, because you're constantly requesting the same files. Same thing when you use something like a storage server for virtualization: if you're pointing at it with NFS and you boot your virtual machine, and it's a small Linux VM, you'll find most of that virtual machine seems to end up in the ARC cache, and so it also performs really, really fast.
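
If you want to see that efficiency number Tom mentions on a Linux box running OpenZFS, a rough sketch is below: it reads the hit and miss counters the kernel module exposes under /proc/spl/kstat/zfs/arcstats (the arc_summary and arcstat tools report the same thing; the path and field names are how ZFS on Linux exposes them, so adjust for other platforms).

```python
def arc_hit_ratio(path: str = "/proc/spl/kstat/zfs/arcstats") -> float:
    """Return the ARC hit ratio as a percentage from the kernel's kstat file."""
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:        # first two lines are kstat headers
            parts = line.split()
            if len(parts) == 3:               # each data line is: name  type  value
                name, _kind, value = parts
                stats[name] = int(value)
    hits, misses = stats["hits"], stats["misses"]
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

if __name__ == "__main__":
    print(f"ARC hit ratio: {arc_hit_ratio():.1f}%")   # 80-90% is common on a busy box
```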

Ed:

With the VM, Tom, say you've got the vdisk and that goes into ARC. Say the whole vdisk won't fit in the ARC cache. Will it cache only the blocks of the vdisk that do?

Tom Lawrence:

Yep, which is really cool. Over time it'll flush out the parts that aren't used. So think about the way a virtual machine works. It boots up and needs lots of things; it has to read the vdisk into memory. But over time it's running an Apache server, maybe a SQL server; it's got these same things it just asks for over and over again. This is why I wanted to make it clear that it's only looking at the blocks. It doesn't have to understand what the files are that it's requesting. Matter of fact, it has no idea, because all it sees is this large virtual disk file sitting on a disk. But it keeps asking for these blocks; these blocks are asked for so frequently.

Tom Lawrence:

The adaptive read cache, that's why it's called adaptive, puts those blocks that are most popular right at the top. It goes, these are the ones that keep getting asked for. It doesn't really have to know why, it just knows to give them back all the time. So it's constantly doing that.

Tom Lawrence:

Also, once the blocks are in there, if you're using encryption, and I pointed out some bugs that came when they moved over to Linux that they've since fixed, it'll actually hold them decrypted once they're in memory, so you don't have the CPU load the second time either, because we don't have to do extra calculations. So the adaptive read cache will hold unencrypted copies, if there was encryption on the ZFS file system, and it just makes those VMs fast. I mean, when you run a large number of VMs, especially if those VMs happen to be very similar to each other, as long as the blocks match, if someone has a request and it's a matching block, that matching block gets sent to whoever's asking for it, because it's like, well, this is what was asked for. So it's complicated and beautiful at the same time, but it's amazing how well it works and how efficient it is from a storage standpoint.

Ed:

It's making me think of something in the Unraid world. Obviously we've got multiple different file systems; we can have XFS, Btrfs, now ZFS. A lot of people run media servers, and people ask me, should they make their, say, movie share be on a zpool in ZFS? And I normally tell them no. My reasoning being, you're not going to watch the same movie over and over again, so you're not really going to get any benefit from the ARC cache. But if you are watching a movie, you're going to fill up the ARC cache with data that has really no point in being there. So you're better off to have datasets only for things that can really benefit from an ARC cache from a read point of view. Would you agree with that, Tom?

Tom Lawrence:

Yeah, I mean, the cache is very automated, so it's not a big deal to stream a movie. But, like you said, unless more than one person is going to watch that movie, or you plan to watch it again, it may not help much.

Ed:

Some 4K movies, though, they're like 50 gigs, aren't they?

Tom Lawrence:

Yeah, yeah, a pretty high density, a good bit rate 4K movie. I mean, the read cache will hold it, and it's not going to hurt anything; you're just not going to see a lot of benefit from it, because it's doing everything it can to be a read cache, but it's not like you're going to request that file again. Versus with video editing, I'm constantly requesting it as I scrub back and forth, because you do linear editing but you also go, I need to jump back to the beginning of this file. So I'm streaming the beginning of the file again, the end of the file again, so you might see a little bit more benefit in that use case.

Tom Lawrence:

It's the same thing for people doing other kinds of editing. If you're editing just documents, for example, great, you got that directory listing fast, but it doesn't really matter that a Word document is cached the first time I open it; it's not that big of a deal later. So it doesn't benefit those situations as much.

Ed:

Okay, cool. So I wonder if we can move on and just talk about pools and vdevs, obviously pools being a collection of devices or disks and vdevs being the groups those devices are organized into. So obviously there are various setup options, like RAID-Z1, Z2, Z3, RAID 0-style striping, and mirrors. I was wondering if you could just explain to the audience the differences between those vdev setups, and the pros and cons and different use cases for each, if that's all right.

Tom Lawrence:

Yeah, so with the vdev setups, we'll start with the pool. A pool is where it all begins. You grab a group of drives; they all belong to the pool.

Tom Lawrence:

The first subdivision, and it depends on your setup and what you want, is the vdevs. We have the data vdevs, we have special metadata vdevs, we have the ZIL vdevs and we have the read cache vdevs. There's also a dedup one, but for simplicity we'll skip it; you're not that likely to get much benefit out of deduplication. But if we have, let's say, 20 drives, you generally don't want the vdevs for the data to be wider than 12.

Tom Lawrence:

So if I had 20 drives given to me, I would say let's build two vdevs, a group of 10 and a group of 10. That makes up all of my data disks, and I get the benefit on these data disks of the read performance across all of them, and I get some write performance benefits because of the way the transaction groups and the way ZFS is going to split the data between these groupings of vdevs. Now, with each vdev you set the RAID-Z type: Z1, Z2, Z3. Mirrors are the other obvious option: you could actually take those same 20 drives and make 10 sets of mirrors. That's the mirrored option we mentioned. There are some advantages to doing mirrors, because it is one of the ways that makes ZFS more expandable, but if you are looking for storage efficiency, you'll realize that putting them all in mirrors is a lack of storage efficiency. You get better storage efficiency by grouping them together in the wider vdevs, as we refer to them.
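
For a concrete picture of that 20-drive layout, here's a rough sketch of what the zpool create invocation looks like, wrapped in a small dry-run helper. The pool name "tank", the /dev/sd* device names and the choice of RAID-Z2 per vdev are all placeholders for illustration, not a recommendation for any particular hardware.

```python
import subprocess

DRY_RUN = True  # flip to False only on a real box with real, empty disks

def run(cmd):
    print(" ".join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=True)

# 20 hypothetical disks, split into two 10-wide RAID-Z2 vdevs in a single pool.
disks = [f"/dev/sd{chr(c)}" for c in range(ord("b"), ord("b") + 20)]
run(["zpool", "create", "tank",
     "raidz2", *disks[:10],
     "raidz2", *disks[10:]])

run(["zpool", "status", "tank"])  # shows both vdevs grouped under the one pool
```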

Ed:

Do you think the performance would be better if you had them grouped in mirrors, like over the 20 drives?

Tom Lawrence:

No, you know, I've got to do this calculation again. I believe there used to be more of a performance advantage, but in modern ZFS I don't think mirrors have the same performance advantage over the wider vdevs anymore. Because of the way it has to do the parity calculations for each of those mirrored pairs, I believe that's actually where the next bottleneck comes in, which they have now made better with the wider ones. I'd have to double-check exactly where that sits in terms of performance, but I know this is one of those things that's changed over ZFS's history.

Tom Lawrence:

And this is also a challenge when you're Googling, because you'll find someone with a really good forum write-up, and then you check the date and you're like, oh, this is from 2012; does that still apply to 2024? It's not like ZFS stood still. It has rapidly accelerated into a more and more performant system with more and more people working on it, and that means some of those bottlenecks we had before have been solved. But the expansion problem is what pushes a lot of people to putting everything in mirrors.

Ed:

In mirrors, you only need two drives to expand.

Tom Lawrence:

Yep. So if we took 20 drives and we built 10 mirrors and I want to add a couple more drives, I can just add two. I can keep adding drives two at a time, so that would be a great way to expand. But as you start mirroring each pair of drives, you do realize that there's a cost in what we refer to as storage efficiency. There's also a little bit more risk of losing data, because each drive is mirrored to just one other: if you lose a drive, the only other copy of that data is on its partner, and it has to rebuild from that one drive right there. So it can be its own challenge. Back to the wider vdevs, though.
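
The two-at-a-time expansion Tom describes is a single command; a minimal sketch, again with hypothetical pool and device names:

```python
import subprocess

# 'zpool add' grows a pool by attaching a whole new vdev, here a two-disk
# mirror, which is why mirrored pools expand two drives at a time.
cmd = ["zpool", "add", "tank", "mirror", "/dev/sdy", "/dev/sdz"]
print(" ".join(cmd))                 # dry run; uncomment below to actually run it
# subprocess.run(cmd, check=True)
```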

Tom Lawrence:

The popular way to set this up is Z1, Z2 or Z3. Easiest way to think about those: if I have two vdevs that are 10 drives each and I have them set to RAID-Z1, I can lose one drive and all my data is still there. If it's Z2, I can lose two drives, and so on; with RAID-Z3, I can lose up to three drives. And some people go, well, I'm just going to go right to RAID-Z3, and then they realize, because you now have three parity drives, you have lost some of the storage you would have had available. And then you're like, oh, but I want more space, because I have a lot of movies or a lot of data I want to put on here. So when you're at that little pull-down that lets you choose, when you're building a pool, you're always like, oh, which do I want? Do I want to risk it with one or two, or go three, where I have great confidence but I can't store as much and I don't have the money to buy more drives?

Ed:

If I could ask you a question, Tom, your opinion. Say we had six 10 terabyte drives, so that's 60 terabytes, and we wanted to have two disks' worth of redundancy. We could make a RAID-Z2, so we would have 40 terabytes usable, and that would be one vdev. But we could also create the pool with the same drives and make two vdevs of RAID-Z1. So we've got two groups, and in each group one drive can fail. We've still got 40 terabytes of usable space and the same number of parity drives, but obviously we can't lose any two drives out of the six; we can only lose one drive out of each vdev, and if we lost two in one vdev we'd be kind of screwed really. So what's your opinion? Which would you choose?
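
Working Ed's example through in a few lines of Python (raw numbers only, ignoring ZFS metadata and padding overhead): the two layouts give the same usable space, and the difference is purely in which two-drive failures you can survive.

```python
# Ed's scenario: six 10 TB drives.
drives, size_tb = 6, 10

# One 6-wide RAID-Z2 vdev: two parity drives for the whole group.
raidz2_usable = (drives - 2) * size_tb

# Two 3-wide RAID-Z1 vdevs: one parity drive per group of three.
raidz1x2_usable = 2 * (3 - 1) * size_tb

print(raidz2_usable, raidz1x2_usable)   # 40 40 -> same usable space
# RAID-Z2: survives ANY two drive failures.
# 2 x RAID-Z1: survives one failure per vdev, but two failures in the
# same vdev lose the whole pool.
```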

Tom Lawrence:

And it's easy to answer; it's literally how my system is set up. I have eight drives in a RAID-Z2. That is one of my primary workhorse areas, where all my critical data is right now, and that is a common setup. Eight drives in Z2 is good. There are actually 24 drives in one of my main production servers, but I broke them down into blocks of eight and then set them up as RAID-Z2 for that level of redundancy.

Tom Lawrence:

You start worrying a bit because these drives are just so big, and when you only have one drive of parity and something goes wrong, you have a lot of write pressure on the other drives to rebuild when you replace the one that failed. So they get really busy, and at the same time you're adding new data frequently. So having extra drives of parity to rebuild from definitely gives more confidence, because it can take a long time to rebuild and you don't want that risk. The other problem, frequently, is we've all bought the drives at the same time from the same batch. I mean, we do everything we can to randomize that, but you will see it in batches of drives when you buy them at scale. We do a lot of these 45 Drives XL60 60-drive storage servers, and we did a couple of the 30-drive ones too.

Tom Lawrence:

And man, we saw a lot of these, and they all worked great, except this one order. We bought two of them for a customer; they needed a 30-drive unit at site A and a 30-drive unit at site B. Those drives had problems at both sites, and they were all bought at exactly the same time, so we were left suspecting the batch: right off, in the first month of production use, two drives had failed at each one of these sites, and they're physically separated from each other. It's just a bad run of these drives, is our guess. We got them replaced under warranty and, knock on wood, they were fine. But man, it was dicey for a minute, because each one also had like 50 terabytes of data on it. So the rebuild, because it was in production use, took, I think, a day or two.

Ed:

Having the separate vdevs again, like I was talking about with the six drives: if we wanted to expand the pool, we'd only need three drives to expand it, right, as opposed to needing six?

Tom Lawrence:

And yeah, that's an excellent reason; sometimes that's why you do it, when you just want to expand in smaller units. You know, my flash, I work off of a series of flash drives, which of course are more expensive, so those are in Z1, and I have a replication task with ZFS. Replication allows me to have a copy of the data; it's sent to another server all the time, and so I don't worry about it being a Z1. I have a copy of my data all the time. So that's kind of how I can justify some of the Z1 setups that I have: yeah, it just copies.

Tom Lawrence:

And ZFS replication is a great feature of ZFS, where we say, take a snapshot and replicate it to this destination, the destination being another ZFS place to land. And because ZFS replication works at the block level, it's saying, whatever blocks changed. So even if I shuffle all the files into different folders, that's not a block change, that's just a pointer change, which means a couple of kilobytes of pointer data needs to be synchronized. Which means, as I move my videos around, I have like a folder process where I move them through, I don't end up re-syncing a bunch of data, because it doesn't see it as new data. If I rename a file, that's the few kilobytes of difference in renaming a file; the large files I have for my video tutorials are really unchanged outside of their name. So ZFS replication is constantly running, well, running on a cycle, and so I don't worry about it.
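
A rough sketch of the snapshot-and-send cycle Tom describes, with hypothetical pool, dataset, snapshot and host names: after an initial full send has seeded the destination, each later run sends only the blocks that changed between the two snapshots.

```python
import subprocess
import datetime

SRC = "tank/videos"            # hypothetical source dataset
REMOTE = "root@backup-host"    # hypothetical backup server
DST = "backup/videos"          # hypothetical dataset on the backup pool

def sh(pipeline: str) -> None:
    print(pipeline)            # dry run; uncomment to execute for real
    # subprocess.run(pipeline, shell=True, check=True)

today = datetime.date.today().isoformat()
prev = "2024-04-25"            # the last snapshot that already exists on both sides

sh(f"zfs snapshot {SRC}@{today}")
# Incremental send: only blocks changed since @prev cross the wire, which is
# why renaming or moving files barely costs anything.
sh(f"zfs send -i {SRC}@{prev} {SRC}@{today} | ssh {REMOTE} zfs receive {DST}")
```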

Tom Lawrence:

If one of those drives failed, I go, well, I'll replace it, and if I couldn't rebuild it, I know my data is safe and securely snapshotted on a larger, much slower system. This is where it lands. Back to that question of why Tom has 30 terabytes of storage on a server with only 8 gigs: it's just a copy of all my data, just an extra copy. So if it all failed, it would take me a long time to recover off that box, because it also only has a one gig network interface. And 32 terabytes over a network interface of one gig, yeah, if you do the math on that, you're like, okay, you're not doing videos for a day or two. I'm like, yep, that would happen if I needed all of them back.
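
The back-of-the-envelope math behind that "not doing videos for a day or two", taking roughly 32 TB over a single 1-gigabit link and ignoring protocol overhead:

```python
data_tb = 32        # rough size of the backup copy
link_gbps = 1       # single 1 GbE interface

seconds = data_tb * 1e12 * 8 / (link_gbps * 1e9)
print(f"{seconds / 86400:.1f} days")   # about 3 days at theoretical line rate
```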

Ed:

That brings me to a user question. I'll have to apologize to the user because I don't have the question right in front of me, but I do remember it. What they asked was: is there a use case you could consider where maybe rsync replication could be preferable to ZFS replication?

Tom Lawrence:

The problem with rsync is the quantity of files matters a lot. If you're synchronizing a large number of files, rsync has to try to sort all of that out and synchronize it over to the destination, so rsync can really slow down, especially on that initial run. You watch rsync just pin the CPU and index and read all the stuff off the drives to try to make sure it understands where all the things are, and it has to do the same thing every run, going, what changed since the last time we ran rsync, and then send all those changes over. ZFS snapshots work instantaneously, it's like immediate, and then figuring out the differential from the last time you sent is just as immediate.

Tom Lawrence:

There's no comparison in the speed of how rsync works versus ZFS. There is, though, one exception. When you are doing things like, I had this, I had to unencrypt some data: if I built an encrypted dataset, anytime you use ZFS replication it's always replicating that dataset fully, the encryption, everything in it. That means if I want to unencrypt it, I've got to copy it. So I'll set up an rsync job to go from an encrypted dataset to an unencrypted one somewhere; that solves it. The other time rsync is great is when I need to copy to a non-ZFS system, and that happens quite a bit.

Tom Lawrence:

I've done some tutorials more recently on rsync to show that, and it's almost always because rsync is a nice universal language for synchronizing files on things that aren't running ZFS. Those are great use cases for rsync. Like my Synology: my Synology doesn't use ZFS, so I can talk to that, or Unraid, I'm sure, has an easy way to talk rsync. So if you need to move the data between those two servers, that's the common language they would most likely use to move that data.
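
A minimal sketch of that "universal language" case: rsync pushing a tree to a box that isn't running ZFS. The paths and host are placeholders; -a preserves permissions and times, and --delete mirrors removals on the destination, so use it with care.

```python
import subprocess

# Hypothetical source path and destination host; the trailing slash on the
# source means "the contents of this directory".
cmd = ["rsync", "-avh", "--delete",
       "/mnt/tank/photos/", "admin@synology:/volume1/photos/"]
print(" ".join(cmd))                 # dry run; uncomment to execute
# subprocess.run(cmd, check=True)
```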

Ed:

So I'm going to bring us on to another user question as well. I've actually got this one in front of me, so I can read it out exactly. This is from someone called Nali. He says: what is the maximum number of spinning rust drives that you'd ever put into a single vdev, and how would you describe the best zpool topologies for different performance requirements, i.e. IOPS and throughput? I know you've kind of answered some of this already, Tom, but what's the maximum number of drives you would put in a single vdev?

Tom Lawrence:

I think the largest recommended right now is like 12. It's a little fuzzy, because you're supposed to be able to go a little bit wider in the latest iteration, but no one really seems to want to test that. I've got to find the time, and I'm hoping we have a client that's supposed to order another XL60 that we get to play with for a little while. The problem is just finding the time, because usually from the time they come in, I've got to get them into production, and the testing takes weeks: you build out the array, I have to put the data on there, then I have to do performance tests, and then I have to rebuild the array, which also means recopying all the data. So it's been hard to deep dive into some of these "what happens when I make it too wide" questions. Generally speaking, 12 is as wide as I want to go. Most of the time, just by the nature of those 45 Drives XL60s, we're usually building them out 10 wide, because you have 60 drives; six groupings of 10 at Z2 offers you a reasonably good performance spread across those, and I think it's a good way to set it up.

Tom Lawrence:

I did run across someone, a consulting call that came in, they had 120 drives and they made one vdev. I was amazed it worked. But that's of course why it was a consulting call, because it wasn't working right, and they had a drive that failed and they couldn't get it to resilver. I'm like, yeah, I don't know if it will. That's why I told them, I hope you have a backup of all that data. 120 drives, man.

Tom Lawrence:

They just never thought about it. They had bought a series of used equipment, JBODs, and just stacked them all together, and it was some incredible number of drives; it feels like it was over a hundred. We actually didn't do the consulting in the end, it was just a few emails back and forth about it. They're like, yeah, it's not performing right.

Tom Lawrence:

And I'm like, well, explain your vdev setup. At first he said one pool, and I asked, I see, but how many vdevs? And they're like, one. I said, no, I mean vdevs, and they're like, yeah, one of them. And I said, well, you're going to have to find somewhere to offload the data. And they go, we were hoping you would have a way to do it without offloading data. I said, uh-uh, there's not a way to re-slice vdevs without resetting them. And they're like, well, then we don't really need you for consulting, because we thought we needed you to solve that problem for us. And I was like, yeah, unfortunately it's not a me problem.

Ed:

That brings me on to another question. When expanding ZFS storage, obviously we can expand the vdevs, or we could just make a new pool; that's another way we could get more storage. But the disadvantage, I guess, of a new pool is that everything using the current pool isn't aware of it, and you'd have to transfer things across, right? But I guess there are some use cases where a new pool would be the better way to go, Tom?

Tom Lawrence:

Yeah, sometimes it is. Usually it's when we're replacing systems, where most of the time it's a whole new physical server, more so than making a new pool on an existing server. But for performance reasons you don't want to mix and match different drives, because you get unusual performance. Now, I have a video I did called Imbalanced VDEVs where I show that you can do this. You can use different drives, and ZFS will make a best effort to figure it out, especially when vdevs are different sizes; there are ways to make that work.

Tom Lawrence:

But, boy, the system is going to do best effort, and you end up with a weird performance problem. This can even happen when you're buying drives that are completely different from each other: I bought this group of drives, and then I bought these ones that are faster and newer, and now I've got this weird performance bottleneck where some of the writes happen fast and some of them are catching up on the other leg.

Tom Lawrence:

So that can be kind of a problem, and a good reason sometimes to go, look, you're going to use a whole group of different drives, just build them in another pool, so you don't end up with this weird performance issue of the system being kind of out of sync, where it requests the data from all the vdevs, but if some of them give a response now and some of them have to give a response later, you end up with some weird performance alignment issues, because that's not what ZFS expects. I mean, it'll work, it won't crash. It just won't give you the IOPS and throughput that you're looking for.

Ed:

So when you first make a pool, it's best to have all the same drives, all the same brand? I shouldn't have, say, some 16 terabyte Seagates and some 16 terabyte Toshibas, all 7200 RPM, and chuck them all in; I'd be much better off to just have everything the same?

Tom Lawrence:

For our business clients, that's more common: they buy a system and put it on a five to seven year life cycle. We always say a five year life cycle, but I always joke that I tell you we should buy a new server in five years, and at six years I'm bugging you going, hey, did you see that quote I gave you last year? And they're like, but it still works. And I'm like, yeah, but it's out of warranty.

Tom Lawrence:

Anyways, this is where I'll say something that will probably make the people listening very happy: ZFS is not the solution for everyone. There are absolutely, I will fully admit, times when you need that level of flexibility, because your budget does not allow you to buy 60 14-terabyte hard drives at one time. Your home lab budget is: work gave me a pile of hard drives, and then they gave me another pile of hard drives, and I want to use all of them. And it may not be the best solution to use ZFS. It may be better for you to use something that allows you more flexibility with the drives, a system that will allow you to expand later, and that's fine. The benefits and performance of ZFS are awesome, but the reality is I can watch a 4K movie on a hodgepodge of drives with slower performance.

Tom Lawrence:

What's your goal? To watch movies and have them all there, or just view your photos? No problem, you can view all your photos. They're not going to be as fast, but they don't need to be. Your benchmarks won't be as high, but benchmarks are benchmarks; "can I watch a movie" is the bar. What are you doing with it? Will it fit the use case you have? If it's streaming that 4K movie and it does it, perfect, it's a solution that works for you. I'm always very honest with users about that. My solutions are my solutions, but not everyone has exactly the same needs or demands that I may have, so the other solutions are completely valid. I think they're good use cases, and I'm not going to try and shoehorn someone into a solution that's not going to fit them.

Ed:

I'm not sure if you've ever tried it out at all, Tom, but in OpenZFS, I believe in some beta form, you can actually expand a RAID-Z1 or RAID-Z2 pool by adding a drive. It's not in mainline OpenZFS yet, but it is a beta feature. Have you ever tried it at all?

Tom Lawrence:

I have not. It's been on my to-do list to play with it; it comes down to time. I usually wait until things make it to production, because then I can do videos on them, and I spend so much of my time doing production-level work. I do want to try the shiny new things. I really want to try dRAID, which just now made it to production, which is, well, hard to explain quickly.

Ed:

That's only for over 100 drives normally. Yeah, it's for the large-scale deployments.

Tom Lawrence:

dRAID's really cool. I have played a little bit with it. I do know it has some performance disadvantages for certain scenarios, but it's one of those things I haven't had enough time to really flesh out. People ask me, when are you going to do a dRAID video? And the more I play with it, the more I realize where my knowledge gaps are, and I try to make sure I'm very thorough on knowing something so I'm not explaining it improperly. So yeah, dRAID is one of the next things I'll be diving into, which means eventually I will be getting to the ZFS expansion. I'm excited for it. I'm loving to see it.

Tom Lawrence:

To my knowledge, it's a one-way operation: once you expand, there's no way to contract, which makes sense. But yeah, it's not something I've really spent the time testing. I want to, though, because I want to set up a Debian server with the beta everything of ZFS on it, because I have some things I want to test, and they also have some updates coming. See, deduplication has a huge penalty in terms of performance. Obviously, in concept, deduplication is great, but it's so limited in its use case. But iXsystems and Klara Systems did a giant code commit to really enhance how that works, and I think they said they made the performance something like 20 times faster. But that's, once again, early beta. I would like to actually see how that works, and so it's on my to-do list to test all these fun things.

Ed:

I wonder if we can just move on, Tom, to the supplemental vdevs, the ones that aren't data vdevs; I know one of them is literally called a special vdev, so I'm probably calling it the wrong thing. This is something that's just recently been added into the Unraid GUI: the L2ARC, the SLOG and the special vdev. So basically the L2ARC is an expansion of the ARC cache, for reads. If you don't have enough RAM to cache, you would add one of those drives, and I'm guessing you wouldn't need a mirror for L2ARC. So basically you're getting more things cached. But I've also heard it can be detrimental for some use cases, because it will use more RAM as well to keep track of what's in that cache.

Tom Lawrence:

It's one of those things that may not help you as much as you think. My first answer, if someone says, I want more performance on ZFS, how do I get it, is always: add RAM. RAM is your first line of defense when it comes to performance. You want more performance, just keep throwing memory at it.

Tom Lawrence:

We frequently sell servers like that. I think we have one that has a terabyte of RAM in it for a client because of all the stuff on there, and ZFS uses all of it, because there are so many things coming in and going on that particular server. But if you have a series of spinning rust drives and you pop an NVMe in there to be your L2ARC, that can be pretty helpful, because the NVMe is going to be faster than the other drives. But you also have to have frequently accessed files, and as long as those files are frequently accessed, they will be pulled off of that cache drive, and yeah, you can definitely get a performance boost out of it. So that is one way to enhance it. I don't think the detriment of the memory used to keep track of which blocks have been moved over there is that bad; I can't imagine it is, though I've never really calculated just how much RAM is allocated to that.
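
The corresponding one-liner for what Tom describes, with placeholder pool and device names: attaching an NVMe namespace to a pool of spinning disks as L2ARC. Cache devices don't need redundancy, because if an L2ARC device fails, reads simply fall back to the data vdevs.

```python
import subprocess

# Hypothetical pool and NVMe device; 'cache' makes the device an L2ARC.
cmd = ["zpool", "add", "tank", "cache", "/dev/nvme0n1"]
print(" ".join(cmd))                 # dry run; uncomment to execute
# subprocess.run(cmd, check=True)
```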

Ed:

Okay, that's good to know. And then moving on to the SLOG, which is a write cache.

Tom Lawrence:

Yes, but only kind of a write cache. This is where it gets really complicated. The SLOG holds the ZIL, the intent log, and it's what you intend to write. It's only really important if you're doing synchronous writes; with NFS, for example, it'll do sync writes versus asynchronous writes.

Ed:

Asynchronous writes just get committed; synchronous writes, for people listening who might not know, are when it needs confirmation that it's actually been written, correct?

Tom Lawrence:

Yes. And in ZFS you can turn that off, and it'll cause tools like NFS to lie. The tool, whatever's connecting over NFS, will say this has to be a synchronous write, because NFS has an option to say that, which of course things that care about integrity use: you know, virtual machines, database connections. They want absolutely synchronous; they want to confirm it's committed before they flush it from their memory. ZFS can be told to lie and say, just tell them it's committed even if it's not, and that means you could lose up to a few seconds of data. Otherwise there's a double-write penalty, because first we have to write out the intent log to our data vdevs and then we have to actually make the full commit of flushing it there. So synchronous writes come with this double-commit problem that really slows you down, substantially, and the way to get around that is to have something faster. The goal is always to have a ZIL device that is substantially faster than what you're writing to. So once again we'll use the spinning rust scenario: we want to use an NVMe, and now we're going to get some performance benefit out of it. You're not going to get as much performance benefit out of a standard SSD; you get some, but not a lot. Writing to the intent log there allows the data to then be streamed out to the drives in a transaction commit.

Tom Lawrence:

I have a video I was working on for this, I just haven't finished it, where I was showing that if you plug in an NVMe alongside fast SSDs, there's barely anything that the ZIL device helps with. It doesn't do nearly as much, because the SSDs are already relatively fast. There are solutions, I think, like Optane memory, which is even faster than NVMe, and certain motherboards allow that. I think Jeff from Craft Computing did a video on that, showing how you can carve that out to be your ZIL because it's even faster, and now you have a better-performing commit. The other option is having a ZIL device that is a high-performance cache designed for the job; there are drives like this, different enterprise drives with very high queue depth, and that will help with performance as well.
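
And the equivalent command for a SLOG, again with placeholder names: "log" adds a separate intent-log device, and mirroring it is a common choice since you only ever read it back after a crash.

```python
import subprocess

# Hypothetical pool and fast devices; a mirrored SLOG so losing one log device
# right after a crash doesn't also lose the last few seconds of sync writes.
cmd = ["zpool", "add", "tank", "log", "mirror", "/dev/nvme1n1", "/dev/nvme2n1"]
print(" ".join(cmd))                 # dry run; uncomment to execute
# subprocess.run(cmd, check=True)
```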

Ed:

Are those like special drives that use RAM and they have like a battery kind of backup?

Tom Lawrence:

Yeah, there are ones like that. I can't remember the names; there are a couple of them out there. Matter of fact, it's funny, because you can buy these secondhand now on eBay. If you type in "ZFS ZIL" on eBay, you'll actually find used enterprise versions of these. Wendell from Level1Techs actually talked a lot about these drives and how affordable they are, and he's got a whole video breaking down where to find them and things like that. Because, price-wise, what was enterprise-expensive yesterday is home lab-affordable today, and still pretty performant.

Ed:

And for the ZIL you don't need a particularly big drive, do you?

Tom Lawrence:

No.

Ed:

I think some people, they think they need something really large, but you don't at all, do you?

Tom Lawrence:

No. Klara Systems actually has a good write-up on this, like a calculation for how big you need it to be. But it's only the amount of data that can be written in a couple of seconds, because it flushes every few seconds, so it's only the amount of data that comes in over that short period of time. It does not hold all the writes and then commit them. It's not like I'm sending a 10-gig file and it writes a 10-gig file and then commits it later. No, it's sliced up into every few seconds; it's a flushing process.
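
Rough sizing math in the spirit of that kind of write-up: the SLOG only ever holds a few seconds' worth of incoming sync writes before they're flushed, so the required size is tiny compared to the pool. The 10-gigabit ingest rate and five-second window below are just illustrative assumptions, not figures from the episode.

```python
ingest_gbps = 10      # assume the box can take sync writes at 10 Gb/s
flush_seconds = 5     # assume dirty data is flushed roughly every 5 seconds

needed_gb = ingest_gbps / 8 * flush_seconds   # GB of in-flight data to cover
print(f"~{needed_gb:.1f} GB of SLOG is plenty at this rate")   # ~6.3 GB
```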

Tom Lawrence:

This is part of that atomic write process with ZFS, where it grabs these transactions, all these little data bits that need to be committed. It commits some to the ZIL, some to the data, with some in memory, does that write, and starts the process over again and again. If it's only in memory and I yank the plug out of the system, or if it were to suffer a catastrophic motherboard failure and wasn't one of those fancy dual-motherboard systems, it will lose that in-memory transaction group. But if you had a ZIL device and it was written to the ZIL at the same time, there is one more place to look: the next time you start up, ZFS goes, hey, is there enough information on that ZIL that I can rebuild that transaction group? Why, yes, there is. That's why we wrote it to two places.

Ed:

So there are definitely advantages to having it, but you've got to think about what that risk is. I guess for things like databases it's really important to have something like that, but for the average person at home, the home labber, you've just got your average Word documents and things like that, and a few seconds' worth of data is not going to be a really big deal.

Tom Lawrence:

Exactly. When we talk about databases, they're very transactional, and you can't lie to a database server and expect it to go well. If the database says someone purchased this product and we want to commit this data, and it says, I want a synchronous write commit, and you didn't, you told it it was committed but something happened and it wasn't, now we just lost that transaction, and from a business standpoint that's terrible. So when it comes to databases specifically, there's no exception: those always get sync writes turned on, and for performance reasons, because databases are generally tons of small transactional data, you pretty much universally put a ZIL device in those, just to make everyone's life a whole lot better. Everyone can sleep a little better at night, because, man, unwinding broken out-of-sync transactions on a database, nobody wants that.

Ed:

And also, just for people new to ZFS, you can actually choose synchronous or asynchronous writes per dataset, can't you? It's not set at the whole-pool level.

Tom Lawrence:

That can be chosen at that level, and you can choose the block sizes on a per-dataset level too. Because if you're doing a database, it may be better to set the block size smaller, to match the commit size of the database, or if you're using larger files, it may be better to go bigger. By default, I think ZFS datasets are set to a block size of 128K, which is kind of a generally accepted default that works fine; it puts you in the middle of the road and you're going to get pretty good performance. But if you have some use case that needs higher levels of performance, then yeah, you can tune it to align better to your use case.
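
A sketch of setting those per dataset, with hypothetical dataset names: sync and recordsize are ordinary ZFS properties, so a database dataset and a media dataset on the same pool can be tuned differently. The specific values are examples, not recommendations for any particular workload.

```python
import subprocess

settings = [
    ("zfs", "set", "sync=always",    "tank/db"),      # databases: honor sync writes
    ("zfs", "set", "recordsize=16K", "tank/db"),      # closer to a typical DB page size
    ("zfs", "set", "recordsize=1M",  "tank/videos"),  # large sequential media files
]
for cmd in settings:
    print(" ".join(cmd))             # dry run; uncomment to execute
    # subprocess.run(cmd, check=True)
```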

Ed:

And I've got a slightly off-topic question. If you were going to set up, say, a VM, would you rather have a dataset with a vdisk in it, or would you rather have a zvol? And why?

Tom Lawrence:

So this varies. If I'm using iSCSI, that's going to be a zvol, because iSCSI presents block storage to other devices. Actually, my gaming system is tied to my TrueNAS, because I have just a smaller boot NVMe in my gaming system and then it does an iSCSI mount. That's presented as block storage, which is just going to be a zvol on the back end, and it performs really, really well. All my Steam library is on there and all my games load really fast off of that system. So that setup works quite well, and yeah, zvols are really good for that.

Tom Lawrence:

If you want to run a virtual machine directly on a system that also has ZFS, same answer: you probably want that virtual machine running on a zvol. But swing it to the other side: I use a lot of XCP-ng, and I use NFS shares for my shared storage, so that's going to be a dataset, because I'm letting the hypervisor handle creating all those disks. Now, the nice thing is, when you're doing it that way, it's easier for me to see. If I have five VMs, I also have five disks that I can see on that NFS share. So if something goes wrong, or I want to snapshot those disks or rewind to a point in time on the ZFS side, I can look at those snapshots, and it gives them a UUID, but I can reverse that and go, okay, I know that UUID is related to this virtual machine. I like being able to manage them that way versus the block storage way.
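
A sketch of the two shapes Tom contrasts, with placeholder names: a zvol (a block device, the thing you would export over iSCSI) versus a plain dataset that just holds the hypervisor's disk image files over NFS.

```python
import subprocess

cmds = [
    # Block storage path: a sparse 200 GB zvol to back an iSCSI extent.
    ["zfs", "create", "-s", "-V", "200G", "tank/gaming-pc"],
    # File storage path: a dataset the NFS share (and e.g. XCP-ng) drops image files on.
    ["zfs", "create", "tank/vmstore"],
]
for cmd in cmds:
    print(" ".join(cmd))             # dry run; uncomment to execute
    # subprocess.run(cmd, check=True)
```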

Tom Lawrence:

Well, you have to manage it on whatever created the block storage. ZFS just presents the block storage; the other side is the thing managing it. This is where we've had trouble with people who used block storage in Hyper-V. Hyper-V actually has some weird quirks about the way it handles block storage, so you have to be that much more careful, because you can snapshot it, but rolling it back, there's a lot of nuance to it. I don't know if it's fixed in the latest version of Hyper-V, but it has some versioning and it doesn't like going backwards. I learned that through someone else's learning experience. It was a consulting call. It starts as TrueNAS consulting and I'm like, well, I'm not a Hyper-V expert. Oh, this is completely a TrueNAS problem. It actually turned out to be a Hyper-V problem.

Ed:

Well, thank you very much for that, Tom. I'm going to move on to some of the user questions, if we can. Absolutely. Someone's asking basically how ZFS replication works. Well, we've already spoken about that, with it transferring the blocks. But they also ask, how do you actually restore from a replicated snapshot? In TrueNAS you can just do that through the GUI, but currently in Unraid you can't. So I'll partly answer this one: there is a plugin called ZFS Master which allows you to take snapshots and restore, so you'd need that plugin to restore from a local snapshot. But they might also be asking about restoring from a replicated snapshot on another machine. In that case you could just run the ZFS send and receive in the opposite direction.

Tom Lawrence:

It really is as simple as that. The data can be reversed. Essentially, you just go to whatever process you used to get it going that way and make that process go the other way. It depends on how it's set up, because there is push or pull. You can take machine A running ZFS and machine B running ZFS and do a push, but you can also initiate, from machine A, a pull to bring the data back in the other direction. So if you were scripting all this on the command line, that's an option: hey, we did a ZFS send to push the data there, but now we're just going to update our script to pull the data back.
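
In command-line terms, a minimal sketch; the hostnames and dataset names here are invented for the example:

# Normal direction: push a snapshot from the primary to the backup box.
zfs snapshot tank/data@nightly
zfs send tank/data@nightly | ssh backup-host zfs receive -u backup/data

# Restore direction: pull that snapshot back from the backup box, landing
# it in a new location so nothing existing gets overwritten.
ssh backup-host zfs send backup/data@nightly | zfs receive tank/restore-test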

Tom Lawrence:

One of the things I do recommend when people set up these backups is to think about it that way. If you're having to deal with the backup, which usually means something went wrong with your primary machine, you're under duress. Or if you're doing this in a job, you're put under duress by the people above you going, where's our data? So do practice this a couple of times and ask that question prior to it being a problem. And as I've always said, untested backups are just wishful thinking. So test your process, because you can tell it to restore somewhere other than the original place; you can pull the data and give it a new location. That way you know the data will come back from where it was.

Ed:

And also a similar kind of question, I guess: what's the best way to move zpools from one server to another? Say you take down your server and you're going to build a new one. Is it better to just replicate the data to the new server? I guess if you've got enough drives that would be great. But you could actually take it down, put the drives into the new machine and then re-import those pools, correct?

Tom Lawrence:

Yeah. Often it's fun, because if you're new to ZFS, or new to RAID, let's say a newer user, you think, cool, I can pull these drives out and put them somewhere else, and you're correct. If you're an old user, you remember the days of early RAID where, if you swapped one drive to a different location, your data was not necessarily going to be where you think it is, and it may not work. ZFS labels the drives that are part of a pool, and even if you have encrypted drives, the pool label is visible. So I called it Tom's pool. Well, if I stick these drives in another machine, they're still called Tom's pool.

Tom Lawrence:

The drives have a marker that says they're part of the pool, because ZFS controls things right down to the drive level. So there's a marker on there that says we're all part of a pool. It doesn't care what slot they're plugged into or what controller card they're plugged into, as long as ZFS can talk to them. So grabbing 10 drives, slapping them into another system, loading it up and going, hey, import this pool, no problem, it'll work perfectly fine. It's actually a common way to do it, because, let's say I'm building out new drives, I can tie the machines together and transfer the data over the network. Or, if I have enough physical room in the box, I can take the old drives, put them in alongside the new drives, and move everything locally. That's going to move faster, because I'm limited by drive speed rather than by the network speed between two boxes.
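
The export/import dance itself is short; the pool name tank is a placeholder:

# On the old server: cleanly export the pool before pulling the drives.
zpool export tank

# On the new server: list importable pools (ZFS finds the members by their
# on-disk labels, regardless of slot or controller), then import by name.
zpool import
zpool import tank

# If the pool was never cleanly exported (dead board, sudden power loss),
# a forced import is usually still possible.
zpool import -f tank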

Ed:

Going back, this kind of made me think a bit, going back to your video with the two motherboards; you can see I'm a bit obsessed with that. How did that work? Obviously the software is dealing with it, but the pool wouldn't have been exported properly when you pulled out the motherboard, so I guess it just force-imported it on the other one because it wasn't exported correctly?

Tom Lawrence:

It's easier than that, because what it's doing is keeping the two OSs in sync live. They're actually both seeing the file shares, the active controller and the non-active one. The non-active one actually knows what files are being moved, it knows what shares are being used, so it's in sync. It doesn't actually have to import or export the pool, because it's kind of like mirroring a whole operating system at the same time. It's really that level of clever that makes it work, and that's why it fails over so much faster than other systems that are similar. I've done the demo with Synology, and Synology uses two physical boxes essentially to do it. That has a slower failover time while it passes things over; they keep things in sync, but it's a little bit slower when you're dealing with two separate boxes. When you do the engineering with two controllers sharing a really, really fast backplane, it fails over faster because they're more aware of each other. It's not just saying, it's your turn to take over; it's, I'm ready to take over as soon as you quit, because the non-active one keeps asking, is it my turn yet? Essentially it's got a pulse, and the moment the other one doesn't ping back, it goes, my turn, I take over, you didn't respond. It makes it kind of interesting.

Tom Lawrence:

But yeah, I'm as fascinated as you. I love the engineering that goes into that, and appreciating it shows you have a very technical background. Because when you see something that seems, not impossible, but improbable in our tech world, you go, okay, there's a level of engineering here, there's something missing in my understanding of how that could synchronize as well as it does. That's what makes it fascinating, versus some people who go, oh yeah, they stuck two motherboards in there. I'm like, but do you know what that took? It's like aviation: when you learn how heavy a plane actually is, you'd think it's all weight savings, and then you realize how heavy it really is and you go, and it still leaves the earth, it flies. There's a lot to this, there's a lot of engineering that went into that.

Ed:

I've got another question from Adam M saying do you have a top five list of best practices when setting up any ZFS-based NAS? Any kind of best practices you can think of?

Tom Lawrence:

The first one, going back to that poor soul who made too wide of a vdev: don't make your vdevs too wide. That's less of a home user problem; unless they've acquired a bunch of old equipment, home users generally don't have 100 drives to build their pools with. But I know Reddit's data hoarders exist, and there are certainly people with absolutely wildly large systems, which is cool. So, don't go too wide. Then, planning your backups and testing your backups. After you get it built, document your build process, that's something really important, so you understand how you got to where you are. Then make sure you back up your config files. Definitely, once you've taken the time and effort to build those shares, set those permissions and build those users, export that config file.

Tom Lawrence:

I realize a lot of users, being very privacy focused, choose the encryption option, which is fine. There's nothing wrong with it; encryption works well. But unfortunately some forget to back it up, and I do both: I back up the config file and I export the key. There's a key that's created when you say, I want to create an encrypted pool, and it becomes very seamless because that key is loaded every time the system boots up. But if you lose your boot drives, well, that key's not going to be loaded.

Tom Lawrence:

So make sure you have that backed up. And this kind of goes back to the replication point: when you set up ZFS replication, if you replicate an encrypted dataset, the destination server only gets the dataset, not the keys that go with it. This is good security. You and I can set up a backup arrangement where I back up all my data to you, but you wouldn't be able to see it; you'd just see an encrypted dataset. And I could pull that data back, because I have the key. We build a new server, I pull that data back, and it stays just as encrypted for you the whole time. So back up the key.
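
A rough sketch of both halves of that advice; the pool, dataset and host names are hypothetical:

# Know where the key lives and back that up (along with the NAS config
# file) somewhere that is not on the pool itself.
zfs get keylocation,keyformat tank/private

# Replicate the encrypted dataset raw: the blocks travel still encrypted,
# and the destination never needs or receives the key.
zfs snapshot tank/private@nightly
zfs send -w tank/private@nightly | ssh backup-host zfs receive -u backup/private

# After rebuilding and pulling the data back, load the key to unlock it.
zfs load-key tank/private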

Tom Lawrence:

So I'm going to say: don't go too wide on vdevs (for most home users that's not a big issue); make sure, when you set everything up, you do your backups; make sure, if you use encryption, that you have a backup of the key; and then test your backups. I guess that's only four, but I think those are four good pieces of advice. Like I said, untested backups are wishful thinking. So go through that process once you have it set up, to make sure you know how to pull some data back and land it somewhere else. That way you're not being destructive to your existing data. But yeah, make sure you understand how that process works.

Ed:

Thank you, Tom. Just to round everything off, I think we've spoken about ZFS for quite a long time. I'd like to know, really, what are your favorite open source projects at the moment, and why?

Tom Lawrence:

I'm actually pretty excited. I've been talking with the people over at NetBird. Check them out, I think it's netbird.io, a really simple website; Google NetBird VPN and you'll find them. They're a fully open source VPN product with a lot of active development, very much like Tailscale. Matter of fact, it's extremely like Tailscale, but open source not just on the client side but on the control plane side. So I'm pretty excited about that as a project. I'm just fascinated by it; it solves that VPN problem. It uses WireGuard, and they have a phone app for iPhone and Android as well as all the other operating systems. So it's not a half-baked project, it's a fully baked project that seems to be getting a lot of attention, so I'm excited about that.

Ed:

And what kind of services are you running in your home lab, for your own personal use? Do you run things like Nextcloud? Do you have a media server?

Tom Lawrence:

Media server... Is Emby open source? I don't know.

Ed:

No, Jellyfin is. Yeah, Jellyfin is the one that's open source.

Tom Lawrence:

Someone asked why I wasn't using it. I have a lifetime subscription to Plex, but Plex kept causing me headaches. I don't know why; I had these weird stuttering issues with it. I did everything in my power to solve it, I could not, and I found other people with the same problem, which was also aggravating. And someone said, try Emby, and I did. I really want to try one of the open source ones, because I prefer open source, but Emby, I loaded it and it worked, and my wife uses it a lot. She's like, hey, we can watch movies again. I'm like, all right, I use Emby as well.

Ed:

Well, I've got Emby and Jellyfin. But Jellyfin, when Emby went closed source, they just forked it, so it basically is Emby as it was a few years ago. Obviously it's developed since then. It's really nice, definitely worth trying out.

Tom Lawrence:

I'll have to check that one out. I use a lot of pfSense and TrueNAS; those are probably the big ones, because that's what all my video production runs on, and has for years. Lots of XCP-ng. I don't have a lot of others; I try going back and forth. I use Nextcloud once in a while.

Tom Lawrence:

Someone's going to not like this: I use Syncthing all the time, and I use local documents. I don't really use the web-based documents. If it's a personal document, not directly business related (business documents usually run in one of the cloud services because it's easy to share them), my personal documents and the notes I take on things are all local files. And I use a tool called Logseq for all my markdown files, kind of my journaling, because I started journaling late last year and I regret not starting sooner. Logseq is similar to Notion, which is what people may be familiar with; it's not a one-to-one replacement, but it's a nice open source tool, with Syncthing on the back end moving my files around. But I'm still a local-documents-with-OpenOffice guy. I know that probably makes me weird. Someone's like, why aren't you using Nextcloud? I'm like, I don't really share it with anyone else. It's just my documents; there's no other people to share them with.

Ed:

Yeah, well, that makes sense. I've got one last question for you, Tom. Basically, I just wonder how you foresee the impact and future of open source projects in the tech world. Do you think they're going to become more or less relevant moving forward?

Tom Lawrence:

I think more. I think we're seeing a time right now where it's got to snap back the other way. Even Cory Doctorow mentioned that quote: I remember a time before there were just four websites with screenshots of the other three. And that's the state I think we've reached with a lot of it. It's not the way the internet was founded, but I think it's the way the internet can go back to being: a more decentralized system, with a larger ecosystem of open source projects, like we had in the beginning, because that's what brought us the beauty of the internet. Think about Microsoft: their web servers were there in the earliest days of the internet, but if it wasn't for the open source movement and things like Apache, the closed source companies couldn't keep up, they couldn't iterate, they couldn't provide a product that scaled. So the basis of the internet today is that open source movement of the 90s and early 2000s, of collaboration and sharing and building, and then it was kind of taken over by the hyperscaler companies going, cool, we can do all this.

Tom Lawrence:

But even recently, I can't remember which CEO of one of the AI companies, he said, no, the future of AI isn't centralizing it into one big mega server, it's offering open source models that people can run themselves. And I'm like, yes. Decentralized AI, I think, is a much better idea, where each of us has the ability to play with these models, load them on machines, find their utility and their usefulness without them being part of some larger company, and we've seen great progress on those. As a matter of fact, some of the open source models are really competing with ChatGPT and things like that. So you have a company with an unlimited war chest of money from Microsoft, some of the largest funding we've ever seen given to a company, an incredible amount of money, and they have a product that's good. But we're also looking at what's coming out of the open source world and going, wow, I can actually do something pretty similar without having some giant company slurp up all my data, and I can build some utility out of this.

Ed:

It's super important, Tom, isn't it? You know, a company doesn't want to be having all of its data analyzed by ChatGPT, by OpenAI. They need to have their own on-prem AI where the data doesn't leak out to some other place. I think that's really important, personally.

Tom Lawrence:

Yeah. You know, looking at it from the perspective I see, we do a lot of consulting, and we're watching these companies starting to become cloud averse. First the bill came and they go, the bill's kind of high. And every year compute gets cheaper, but the bill doesn't; it gets higher, which doesn't add up. There are people who aren't technical but have a very logical question: don't computers get cheaper every year? Doesn't storage get cheaper every year? Then why does my cloud bill, for the same amount, get higher every year? I'm like, yeah. And then they told us they were better at managing things, but they're in the news for data breaches. So we're starting to see companies, matter of fact some of the large companies that have multiple locations, go, wait a minute, we can buy two servers, put one at location A and one at location B, and replicate them to each other. Yeah, and it will save us like $20,000 a year in cloud bills. Yes, it's a hundred-thousand-dollar project, but they're like, we can actually put even more data on it, because right now we're constraining the data; our bill would have been higher, but we decided not to put that data in the cloud. So yes, and when we start doing that, we can start piecing it together.

Tom Lawrence:

There's a great article by 37signals. They thought they could save $5 million a year; that was their goal: we want to save $5 million a year in cloud costs hosting Basecamp, I think they said. After a two-year analysis, they'd saved $7 million a year, because what they didn't predict is that cloud prices kept going up, so the delta, how much money they saved, ended up bigger. So it comes all the way back.

Tom Lawrence:

How are they doing it, though? Well, they're realizing these open source tools are really good. They're like, oh wow, we're hosting all this on open source stuff on our own. They are using a colo, but a colo is not exactly the same as a cloud. They needed some resiliency, so you have some stuff in a colo, and from colos you can do linked leased lines to your buildings that aren't that far away, where you have another copy of everything. So you've built the same level of redundancy as the cloud, and that sounds complicated and expensive, but they ran the numbers and went, no, these cloud companies have raised prices to where this is affordable, and the open source tools make it manageable. The cloud companies are like, oh, you'll need a team of people to manage this. It turns out they don't. The tooling has become really, really good in the open source world, making this a much more tenable solution.

Ed:

As well, I was speaking to Yosh from Nextcloud, and he said something that resonated with me. He said, basically, one day you're going to have to move your data or keep paying for it forever, so you may as well make that one day soon and save the money. Yeah.

Tom Lawrence:

Yeah. You know, we haven't seen the full snapback yet, but it will happen. Microsoft upset a lot of people because they raised the prices of their cloud services, which is pretty much the whole Microsoft Office Online, Office 365 suite, everything that's universal in the business world. It's unavoidable for most of these companies. But you have a company that's worth well into the trillions of dollars saying, we had to raise prices to make more money, and you're like, you're already a monopoly, what do you mean? Was there market competition from Nextcloud you were worried about, so you had to beef up your servers? You just raised the rates.

Tom Lawrence:

We've seen your earnings call. You weren't hurting, you're profitable, you weren't losing money every year. Oh, we offered this cloud service too cheap, we're just bleeding money; man, we only bought three yachts this year, but we need a 15 to 20% raise in rates. You're like, do you need four yachts this year? You seem to be quite profitable. But that's what these companies are doing when they get that kind of market hold on you. And, like you said, if you don't move your data now, it gets harder. They're always looking for ways to keep you locked in, like the egress fees: oh, you want your data out? Well, here's your bill for taking it out, here's your bill for keeping it here, here's your other bill for removing it. You're like, crap, I paid to get it there, I paid to hold it there, and if it egresses, there's an egress fee to get it back out.

Ed:

Okay, it's just being held for ransom, isn't it?

Tom Lawrence:

Don't list your directories. There are transaction fees that come with some of these storage services: oh, you requested too many listings of your directories, let me add those transaction fees, you only get the first few free. If you want to know what's in there and you run an rsync against it, that generates transactions. This has been a problem people have run into, like, I got hit for a bunch of those fees. If they can figure out a way to monetize it, it's a thing. That's how that works.

Ed:

That's crazy, isn't it?

Tom Lawrence:

Yeah, yeah. I love when you look at your bill for some of these large cloud block storage services; they've figured out a way to put a fee on everything. No one understands it. It's pages of complexity to make it hard for you to figure out what your bill is. You don't realize how much you save by bringing it back.

Tom Lawrence:

We definitely deal with a lot of companies, some very large companies. I mean, we sold two petabytes of storage recently to a client and they want to buy two more petabytes; that's how much data they're pulling in. At first, the cloud was too expensive to hold the data. They didn't have two petabytes in the cloud, but that's exactly why they didn't: they actually have the ability to create more data for what they're doing analytics on, which is a bunch of engineering data, but it was not tenable to do that in the cloud.

Tom Lawrence:

The engineers are like, hey, this is what we need to do. And people go, well, that's what the cloud bill will be, and finance goes, no, you can't do that. But when it came to us as a solution, like, we have this data problem, we need this much storage, we're like, well, if we put the data here, how much does it cost? A one-time cost of this, that's it. And they're like, well, you know how big our cloud bill is. I said, probably more than that. So now they were able to do the thing they wanted to build. It's a huge company doing this, but it actually became affordable for them, hosting it all inside their own data center to crunch on. They're like, wow, we can just stack these systems and build out that much storage, and it doesn't have this constant recurring cost with it.

Ed:

That's really good to hear that companies are doing that, for sure.

Tom Lawrence:

Yeah, I wish I could talk about it more publicly, how companies are doing this and saving money. They're very private about it; I can't do a video about me being at the company, I can talk about the project, I just can't say the name of the company doing it. But I'm hoping we get more companies to open up about it, because everyone's afraid they're going to fail if they try to do this, and I want to showcase that you can, and that it's a good idea, and that's how we start pulling back from the idea that the hyperscalers are the only solution out there. I mean, they're the safe bet for the CEO, or I should say CTO, whoever the technology person is, who doesn't understand the complexities. They're scared to try it because they're like, I'm abstracted away from what hard drives are.

Tom Lawrence:

When I source stuff in Amazon, I don't think about hard drives. I don't know what a RAID-Z1 or Z2 is and I don't have to, because I just give money to Amazon and they always make sure my data is there. They thought the learning curve would be steeper to bring that in-house, and they realized this ain't that hard. But it takes a lot when you think about that volume of data; they're like, are you sure it's okay that I have it?

Ed:

I mean, once they realize the tooling has gotten so good, they're like, okay, it's not too bad. Yeah, and as soon as they learn other companies are doing the same, it will encourage them to feel safe.

Ed:

And also, can you remember, Tom, I think it was 2017, it might have been around that time, when an Amazon data center had some sort of outage and quite a few customers actually just lost their data and never got it back? And there's no kind of comeback in the terms and conditions. It's kind of like, if we lose your data, it sucks to be you; you should have a backup somewhere else.

Tom Lawrence:

Yes. This is one of those things I pointed out when Google, it was Google that lost people's data, and people freaked out about it, as they should. But there's actually a clause, and when I did a video about Google and the data problem, I put together the three different terms and conditions, from Azure, from AWS and from Google: you are responsible for backing up the data you put in our cloud. They're very clear on that. If anyone wants to take the time to read through the legal forms and knows where to look, it's all in there, very clear: if we misplace all that data you stuck in here, we'll make a best effort to keep it, but it's still your problem to back it up. And people don't realize they have to back up their cloud until they've lost something, or it's been deleted.

Tom Lawrence:

Going forward, on the cybersecurity topic a little bit, you're going to see more attacks where people find someone's data in the cloud and purge it. They're going to attack their cloud infrastructure, go after their access levels in AWS, compromise someone's key, and hold that data for ransom. We've already watched them exfil that data frequently, because they're not necessarily pulling it off of on-prem; when you see them say, hey, we're going to publish this data unless you pay us money, they frequently pulled that out of cloud systems. But the next step would be destroying that data in the cloud. I think the reason they don't as often is that it's hard to put it back. It's not just encrypting it in place; from a ransomware operator's standpoint, that's hard, because some of these companies have massive amounts of data in their clouds.

Ed:

You know, even for the home user as well. A lot of people think if it's in the cloud, that's fine, but you should still do the 3-2-1 backup rule. The cloud is just one of the backups.

Ed:

Yeah, it's not all three in one. Right, right, yeah, having a couple of copies and always validating that those are working, that you know a process by which to pull some data back. And that's a really good point as well, Tom: people do need to test their backups so they're not just crossing their fingers and hoping the backup's okay when they actually need it.

Tom Lawrence:

Yeah, that's the common problem people run into: they don't realize they didn't know how to restore something until they're doing it, and unfortunately they're almost always doing it under duress, as in, we need this now, I've been messing with it for the last two hours or a day and now it's go time. People are going, I really need this data now, and realizing they never really validated how that system was working.

Ed:

Well, thank you very, very much for your insight today, Tom. It's been a real pleasure talking to you. I've really enjoyed chatting with you today, and I'm sure your knowledge has undoubtedly left a mark on all of the listeners tuning in. Again, I'd like to ask: how can people get hold of you? Where can they reach out?

Tom Lawrence:

Everything is centered around LawrenceSystems.com. I make it really easy. That's where you'll find connections to my forums, my videos and all the socials. Whenever you're listening to this, whatever I'm connected to, I'm very adamant about keeping that up to date, so I'm always easy to find, easy to get hold of, easy to message.

Ed:

Thank you very, very much, Tom. Again, I've been a great fan of yours for many years and it's been wonderful meeting you. Thank you very much for joining, and thank you everyone for tuning in. Until next time, keep pushing the boundaries of your home lab. Farewell and happy home labbing. Thank you, bye.

The Billion Dollar Filesystem
ZFS RAM requirements
ZFS Data Integrity and checksums explained
ZFS Caching with ARC
Should your media shares be in a zpool on ZFS?
ZFS vdev setups and considerations
RAIDz1, z2 and z3
Expanding ZFS Storage and thinking about potential data loss scenarios
Rsync vs ZFS replication
What is the maximum recommended number of drives to put in a vdev?
Exploring new technologies in ZFS for business and homelabbers
VMs on ZFS: Zvols and vdisks
User questions
Best practices for setting up a NAS with ZFS
Favorite Open Source projects
The shift from cloud services to self hosting