The Uncast Show

Local AI on Unraid - The Stuff Nobody Tells You

Unraid



Local AI on your Unraid server isn't just chatbots. It's image generation, music, voice cloning, photo recognition, automatic transcription and much more, all running on hardware you already own. But where do you even start? 

In this video I walk through the whole landscape of local AI for Unraid users. What it actually is, the two completely different types you can run, and what hardware you really need to get going. This isn't a setup guide. Think of it as your mental map of local AI, so when you do start installing things, you'll actually understand what you're doing and why.

We cover generative vs predictive AI, the three reasons people run AI locally (privacy, cost, control), real working examples on my own server, and a full walkthrough of the hardware tiers from CPU to high-end GPU, including which card I'd actually recommend you buy first and why.

Key Links

🔗 Frigate (open source NVR with AI object detection).

🔗 Frigate live demo.

🔗 Speaches (Whisper + Kokoro container).

🔗 Ollama.

🔗 Open WebUI.

🔗 ComfyUI.


▶️ Related videos on the channel.

🔗 A-Eye (local photo renaming with AI).

🔗 Paperless AI (auto-tagging your documents).


💬 Want me to make a follow-up video benchmarking all of those GPUs with proper AI benchmark software? Drop a comment below and let me know.


Hardware Recs

USB Edge TPU ML Accelerator.

NVIDIA RTX 3060 12GB.

NVIDIA RTX 5070.


⏱️ Timestamps

0:00 - Cold open: what local AI can actually do

2:13 - Welcome and what this video is (and isn't)

2:20 - Everything you just saw was generated locally

2:33 - What is local AI? Privacy, cost, control

3:00 - Generative vs predictive AI explained

3:25 - Frigate demo (predictive AI in action)

5:02 - The Google Coral and edge AI

5:22 - Quick tip: USB Coral vs PCIe Coral

5:50 - A-Eye: when predictive AI does need a GPU

6:35 - Whisper + Kokoro in one container (Speaches)

7:53 - Web UIs vs APIs: how local AI tools chain together

8:18 - OpenClaw: my own voice assistant built on Signal

9:50 - Ollama and Open WebUI for chatbots

10:50 - ComfyUI for image generation

11:34 - So what hardware do you actually need?

12:00 - VRAM, RAM and RAM speed for AI

14:02 - CPU tier

15:08 - Integrated GPU tier

16:07 - Dedicated GPUs: the lineup on the bench

17:01 - Benchmark video CTA

18:38 - Why the RTX 3060 12GB is the sweet spot

20:22 - The other Nvidia cards (2060, 2080 Ti, 5070, 5090)

22:06 - AMD and ROCm

22:43 - Intel Arc Pro B-series for local AI

23:30 - Edge AI recap

24:15 - A little secret about this video...


What will you build with Unraid?

Get Started with Unraid in 15 minutes or less.


Some of the links below are affiliate links, meaning we may earn a commission if you click through and make a purchase.


i7-14700, dual 10GbE, Arctic fans, Lifetime Unraid license included. Starting at $2,999. 

Reserve your spot with a $99 refundable deposit now



Welcome And The Big Promise

SPEAKER_02

Hey Unraiders, and welcome to another video. Today we're going to be talking about local AI on our Unraid servers. This isn't a setup guide; those are going to be coming separately. Instead, think of this as your mental map. By the end of this video you'll know exactly what local AI is and what you can do with it, as well as what hardware you actually need. But before any of that, let's have a look at what's possible with local AI on an Unraid server. How about generating an image? Star Trek, anyone? Let's try it. As they say, a picture is worth a thousand words, and here's a picture of the Enterprise. But let's actually ask an LLM about the Enterprise. Okay, it's giving back plenty of info there, but we haven't got time to read that now, have we? So let's get it to write a poem about the characters in Star Trek. Now, I know what you're thinking: a poem, that's pretty lame. And you're probably right, so let's copy and paste this and let's

Local Generative AI Demo Reel

SPEAKER_02

turn this into a song. I reckon an old roots reggae track.

SPEAKER_04

Picard chips tea with the make it so. Well data dubita, hello is hello.

SPEAKER_02

So pretty cool, I think. If only there was a way we could actually ask someone like Jean-Luc Picard what he thinks about it.

SPEAKER_00

Mr. Space Invader One! I have heard your song. The lyrics were worse than Data's poetry, and that, I assure you, is no small achievement. Kindly keep your NVIDIA GPUs well away from my warp core and, indeed, my holodeck.

SPEAKER_02

Yeah, okay. Never liked Star Trek anyway. And for those of you who didn't either, well... Hey Space Invader One, you crazy fool!

SPEAKER_05

Mr. T pities the fool who calls himself retro and forgets about the A-Team. Now quit this jibber-jabber; these demos have gone on long enough. Cut to the chase, sucker, and get on with the video.

SPEAKER_02

Yeah, I think you're right, Mr. T. Let's stop the demo here and move on. But before we do, stick around to the very end, because there's a little surprise hidden in this video that I think you might like. Everything you just saw (the lyrics, the music, the artwork, the voices) was generated here locally on my Unraid server. Nothing at all went out to the cloud, and in fact anything AI-generated you see in this video was done locally as well. So what is local AI, really? Well, it's exactly what it sounds like: AI models running on your own hardware instead of sending everything off to cloud services like ChatGPT, Claude and Gemini. People do this for three reasons. Privacy, which is pretty obvious: your data never leaves your house. Cost, because you're not paying for a subscription. And control, because you decide which models to run. Everything you just saw in that intro is what we call generative AI: it generates new things such as text, images, music, videos and voices. That's half of local AI, and it normally gets all of the attention. But there's another side as well, called predictive AI. Instead of generating new things, it recognises existing things. One of the most popular examples of predictive AI, which you might already have heard of, especially if you're an Unraider, is Frigate, open-source NVR software that watches

What Local AI Really Means

SPEAKER_02

your camera feeds and goes through every single frame looking for objects. I'd love to show you it running on my own server, but I've literally just moved house and haven't got my CCTV cameras wired up yet, so instead I'm going to use Frigate's own demo site to show you what it does. Let's jump over to the review section. If I click on all labels, I can select person and click apply, and you can see it now only shows the people in the various images. If I change that to dog and click apply, now we're only shown things with dogs. That's because the AI has seen all of this footage, recognised these things in it, and now we can filter it that way. So that's one good example of predictive AI. And here's one general thing about predictive AI: if you look at the stats at the bottom of the screen, the CPU is at about 20% and the Intel GPU is sitting at zero. Frigate is doing all of that, watching multiple cameras and classifying every clip, and it's barely using anything. Frigate's demo site is also probably running on very modest hardware, a lot less powerful than the average Unraid server. That's the good thing about predictive AI: it doesn't really need a big chunky GPU. Most of the time it doesn't need much hardware at all, so all of this that Frigate does can run

Predictive AI With Frigate Filters

SPEAKER_02

off a CPU without any issue. But if you don't want to use the CPU, there's a special device you can plug into the server, like this little one you can see here. This is a Google Coral. Actually, this is part of the reason I'm so keen to get my cameras up: I bought this recently so I could run Frigate properly, and using it will totally offload that work from my CPU. Cost-wise, at the moment they're about 70 US dollars, or £70 over here in the UK. Oh yeah, and a quick tip if you're thinking of buying one: go for the USB version like I've got here. There's also a PCIe version and an M.2 version, but Google has archived the driver for those, and it doesn't play nicely with newer Linux kernels. The USB version, on the other hand, just works. This type of device is what we call Edge AI: tiny, focused, specialised hardware. We'll come back to it properly in the hardware section. Now, not all predictive AI is as lightweight as this; some of it does need a bit more power. An example of this is a container I made called A-Eye, which we've done a video on in the past. It looks at your photos, your actual image files, works out what's in them using a vision model, and then renames them for you. I think this is quite a good example of how hardware matters, because A-Eye can technically run on a CPU. It will work, but to be honest it's pretty slow: it can take a minute or so to analyse each photo on a CPU, compared to just a few seconds on a GPU. So with predictive AI, sometimes hardware isn't about whether something works, it's about whether it's actually usable. Now let me show you something pretty cool. This container here, called Speaches, has both kinds of AI in it, predictive and generative, side by side. Let's look at the predictive side first: Whisper.
What this can do is transcribe some audio for us. I'm going to upload a file, and if we listen to the beginning, we'll hear it's the start of this video: "Hey Unraiders and welcome to another video." We've got it set to faster-whisper here, and we can either transcribe or translate, so let's transcribe this audio. Okay, there we are, it's done, and we can see the text from the beginning of this video. Now we can take that same text, and if we go to text-to-speech, this is going to use Kokoro. Let's select the model, paste in our text and select a voice. I'm going to pick af_nova, which is a female voice, and click generate speech. Let's listen to that.

SPEAKER_03

Hey Unraiders and welcome to another video. Now today we're going to be talking about local AI on our Unraid servers. Now this isn't a setup guide, those are going to be coming separately.
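The transcribe-then-speak round trip above was done by clicking through the web UI, but the same two steps can be requested over Speaches' API. A minimal sketch, assuming Speaches serves OpenAI-compatible `/v1/audio/transcriptions` and `/v1/audio/speech` endpoints on its usual port 8000; the hostname `tower.local` and both model IDs are assumptions, so substitute whatever your instance actually has installed. The functions only build the requests, so they can be inspected before anything is sent.

```python
import json
import urllib.request
import uuid

# Assumptions (not from the video): Speaches listens on port 8000 and serves
# OpenAI-compatible audio endpoints. "tower.local" is a hypothetical hostname;
# the model IDs are examples, use whatever your instance has installed.
SPEACHES_URL = "http://tower.local:8000"


def build_transcription_request(audio: bytes, filename: str,
                                model: str = "Systran/faster-whisper-small"):
    """Speech-to-text (Whisper): multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: which Whisper model to use.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field: the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{SPEACHES_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_speech_request(text: str, voice: str = "af_nova",
                         model: str = "speaches-ai/Kokoro-82M-v1.0-ONNX"):
    """Text-to-speech (Kokoro): JSON POST to /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{SPEACHES_URL}/v1/audio/speech",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send one, pass it to `urllib.request.urlopen(...)`. In the OpenAI-compatible format, the transcription call returns JSON with a `text` field and the speech call returns audio bytes, so the text from the first can be fed straight into the second.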

SPEAKER_02

So you can see there: one container with an example of both predictive and generative AI. There's something worth pointing out here, because it's what makes local AI pretty powerful, and a lot of beginners don't realise it. What you've just seen me do with Speaches, uploading a file, transcribing it and then generating some speech, was all done from the web UI, and that's the easy way. But almost every local AI tool also has what we call an API. An API is just a way for another piece of software to talk to that tool and use it, without us humans clicking buttons. For example, I use something called OpenClaw, which I

When Predictive AI Needs More Power

SPEAKER_02

run in a VM. Basically, this is my own voice assistant that I talk to through a messaging app; I use Signal. Under the hood, my OpenClaw, which I call Andromeda, uses Speaches' Whisper to understand what I'm saying if I send a voice note over Signal rather than typing text, and it uses Qwen3-TTS to reply to me in a cloned voice. Obviously I'm not going to go through the whole setup here, but here I am on Signal. Let's have a look. If I send it the message "turn voice mode on" and then press play, you'll hear the voice reply.

SPEAKER_05

Listen up, fool! Voice mode is on, that's right. Mr. T is talking now. You're set to auto-reply with voice, and I'm the one doing the talking. Any message you send, I'm gonna pity the fool who don't get a voice response. Now what you need, sucker? Mr. T is ready to help.

SPEAKER_02

So just as an example here, I'm gonna ask for a status on one of my servers that I know is actually switched off.

SPEAKER_05

Listen up, fool! I got some bad news about Battlestar. Battlestar is offline, sucker. I pity the fool who can't keep their server running. You need to get eyes on that box, fool. Check if it's got power, check the network cables, or get someone on site to give it the old turn-it-off-and-back-on-again treatment. Basestar and Nebuchadnezzar are still running clean, so it ain't your whole network, just Battlestar being a problem child. Want me to keep checking, or are you gonna handle this, sucker?

SPEAKER_02

So why am I showing you this? Well, as well as being pretty cool, it shows how you can link different AI tools together to build something more than any individual tool can do on its own.
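To make the "linking tools together" idea concrete, here is a hypothetical sketch of the flow described for the voice assistant: a voice note comes in, a speech-to-text model turns it into text, an LLM writes a reply, and a TTS model speaks it back. This is not OpenClaw's code; `handle_voice_note` and the three stage callables are invented names, and in a real setup each stage would wrap an HTTP call to a service such as Speaches or Ollama.

```python
from typing import Callable, Union

# Hypothetical stage types: each could wrap an HTTP call to Speaches (Whisper),
# an LLM server such as Ollama, and a TTS engine. Not OpenClaw's actual code.
Transcribe = Callable[[bytes], str]
Chat = Callable[[str], str]
Speak = Callable[[str], bytes]


def handle_voice_note(audio: bytes, transcribe: Transcribe, chat: Chat,
                      speak: Speak, voice_mode: bool = True) -> Union[bytes, str]:
    """Voice note -> text (speech-to-text) -> reply (LLM) -> audio (text-to-speech)."""
    question = transcribe(audio)   # predictive AI: recognise the spoken words
    reply = chat(question)         # generative AI: write an answer
    # With voice mode on, hand the text to TTS; otherwise reply as plain text.
    return speak(reply) if voice_mode else reply


if __name__ == "__main__":
    # Stand-in stages so the flow can be tried without any servers running.
    out = handle_voice_note(
        b"...voice note bytes...",
        transcribe=lambda audio: "What's the status of Battlestar?",
        chat=lambda q: "Battlestar is offline.",
        speak=lambda text: text.encode("utf-8"),
    )
    print(out)  # b'Battlestar is offline.'
```

Passing each stage in as a function keeps the pieces swappable: a different Whisper model, LLM or TTS voice changes one callable, not the pipeline.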

Whisper Transcription And TTS Together

SPEAKER_02

Now, speaking of linking things together, some AI containers need dependencies. For example, take Ollama here: if I open its WebUI, Ollama on its own, unless you're using its API, doesn't really do very much. So we can use another container that connects to Ollama via its API, and probably the most popular one is Open WebUI. We saw this in action at the beginning of the video when we asked about Star Trek. In fact, Open WebUI is going to be the app of the month this month, and you'll be able to see a video about setting it up with Ollama along with a GPU. Now, Ollama and Open WebUI are just for text: chatbots, basically. They can't generate images or audio or anything like that. So if you want other kinds of generative AI, you're going to need different software, and one of the most popular for image generation is ComfyUI. For example, here I'm generating the image you saw a moment ago in the video. I've been using ComfyUI to run Qwen-Image to make all of the graphics in this video. ComfyUI doesn't need Ollama running alongside it; in fact, it doesn't need anything else running alongside it at all. With Open WebUI, you've got the interface in one container and Ollama doing the heavy lifting in another. With ComfyUI, it's all in one: the interface and the engine are baked in together. And ComfyUI isn't just for images, either. It can be used for things like text-to-speech, video generation, all sorts of things; it's really flexible. Okay, we've seen a fair few examples of local AI in action: Frigate doing its predictive thing, A-Eye, Speaches with Whisper and Kokoro, Ollama with Open WebUI, and ComfyUI doing the heavy lifting on images. And the thing all of this has in common is that it needs

APIs And An AI Voice Assistant

SPEAKER_02

some type of hardware to run on. Some of it can run on barely anything, and some of it really wants a proper GPU, so I think now it's time to talk about hardware: just what do you actually need? Before we go through what hardware does what, let's cover three things you need to understand about AI and memory, because get these right and the rest of the video will make sense. First, for AI on a GPU, I think the most important spec is VRAM, the memory on the graphics card itself. VRAM determines which models can run at all; the GPU's speed determines how fast it runs them. But if the model doesn't fit, it just doesn't run, no matter how fast your card is. So when people ask what GPU they should get for local AI, the real question is how much VRAM they should get. 8 gigs is, I think, the absolute minimum. 12 gigs is the sweet spot for starting out. 16 gigs is a bit more comfortable, and 24 gigs, 32 gigs and above is enthusiast territory. Second, if you're not using a GPU and you're running AI on your CPU instead, your system RAM becomes the equivalent of VRAM. So if you've got a 32-gig Unraid server, you can run models that fit in around 28 gigs of that, because you need to leave some headroom for everything else. The third thing, and one I think a lot of people don't realise, is that RAM speed matters quite a lot for CPU AI, because faster RAM means more tokens per second. The same logic applies to integrated GPUs, because they also use your system RAM. With a dedicated GPU, system RAM speed matters less, because the GPU uses its own, much faster VRAM rather than the server's RAM. Even then, the faster the VRAM on your GPU, the quicker things will run. So if you're planning to do CPU or iGPU AI, get the fastest RAM your motherboard supports. Okay, with that in mind, let's go through the hardware tiers. 
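A quick way to sanity-check the "does it fit?" question is the usual back-of-the-envelope rule: weights take roughly parameters times bytes per parameter, so billions of parameters times bits divided by 8 gives gigabytes. The sketch below assumes a ~20% overhead for context (KV cache) and runtime buffers; that factor is an assumption, not a figure from the video, and real 4-bit quantization formats often average a little over 4 bits per weight.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB: billions of params x bytes per param."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom_gb: float, overhead: float = 1.2) -> bool:
    """Rough fit check. `overhead` is an assumed ~20% extra for the context
    (KV cache) and runtime buffers; `headroom_gb` is memory you keep free."""
    needed = weights_gb(params_billion, bits_per_weight) * overhead
    return needed <= memory_gb - headroom_gb


if __name__ == "__main__":
    # 13B chatbot, 4-bit quantized, on a 12 GB card (keeping ~1 GB free).
    print(fits(13, 4, memory_gb=12, headroom_gb=1))   # True  (~7.8 GB needed)
    # The same model unquantized at 16-bit: ~31 GB, nowhere near fitting.
    print(fits(13, 16, memory_gb=12, headroom_gb=1))  # False
    # CPU on a 32 GB Unraid server, keeping ~4 GB back (~28 GB usable).
    print(fits(70, 4, memory_gb=32, headroom_gb=4))   # False (~42 GB needed)
```

This is why VRAM, not raw GPU speed, decides what can run at all: the check never looks at how fast the card is.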
Right, starting at the bottom: the CPU. This is the floor, what you can do with literally no extra hardware at all, just the processor and your server's RAM. And honestly, you can do more than you'd think. Small language models, say a 3-billion or 7-billion-parameter model, will run on a modern CPU. Slowly, but they'll run. Whisper transcription on a CPU is genuinely usable. A lot of predictive AI runs perfectly happily on a CPU because, as we said earlier, the workload is generally lighter. What you won't be doing on a CPU is fast chatbot conversations, image generation, music generation, or anything where you need real-time responses. It will work, but you'll be waiting: you might start it, go to bed, and see the answer when you wake up. But if you're brand new to local AI and just want to try it, the CPU is a free starting point. Get something like Ollama installed, pull a small model and see what it's like.
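Once Ollama is installed and a small model is pulled, it can be tried from the web UI via Open WebUI, or directly over Ollama's HTTP API, which is how tools like Open WebUI talk to it. A minimal sketch, assuming Ollama's default port 11434 and its `/api/generate` endpoint with streaming turned off; the hostname `tower.local` and the model tag `llama3.2:3b` are assumptions, so use whatever host and model you actually have.

```python
import json
import urllib.request

# Assumptions: Ollama is reachable on its default port 11434 and the model
# below has already been pulled. "tower.local" is a hypothetical hostname.
OLLAMA_URL = "http://tower.local:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a single, non-streaming request to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 300) -> str:
    """Send the prompt and return the generated text (CPU-only runs can be slow,
    hence the generous timeout)."""
    with urllib.request.urlopen(build_generate_request(model, prompt),
                                timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Usage, with a server running: `print(ask("llama3.2:3b", "Write a short poem about the crew of the Enterprise."))`. With `"stream": False` the whole answer comes back as one JSON object, which is simpler to handle than token-by-token streaming.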

Picking Tools Ollama Open WebUI

SPEAKER_02

That way you don't need to spend anything at all. Stepping up from the CPU is something a lot of people don't think about: the integrated GPU, the graphics chip built into your processor. Intel calls theirs Xe graphics on newer chips, and AMD has Radeon graphics built into their APUs. The interesting thing about an iGPU is that it shares your system RAM as its memory. So if you've got 32 gigs of RAM in your Unraid server, your iGPU potentially has loads to work with, more than a lot of cheap dedicated GPUs would give you. So it's genuinely a useful free step up over pure CPU, especially for predictive AI workloads. And this is where the server's RAM speed really matters: fast RAM will make a difference here. Of course, you're still not going to be running a 70-billion-parameter chatbot on one, but for smaller models and lighter workloads, an iGPU is a properly underrated option. Now, this is the bit most people care about: a dedicated graphics card. Rather than just talking specs at you, I've actually got six different GPUs here on the bench. From left to right, there's an RTX 2060 6GB, an RTX 2080 Ti 11GB, an RTX 3060 12GB and an RTX 5070 Ti 16GB. The last two cards are AMD: the RX 6600 XT and the RX 9070 XT, another 16GB card. What you can't see here is the card in my main server, an RTX 5090 32GB. I was lucky enough to get that card before the prices went crazy. I also have access to a friend's Intel card, an Arc Pro B50 16GB. Now you might be wondering, Ed, you've got all of these GPUs just sat here, so why aren't you benchmarking them and telling us the tokens per second on each one? Well, the honest answer is I didn't want to make this video boring. This is supposed to be an introduction to AI, not a spec sheet. 
But if you'd really like to see a follow-up video where I benchmark these and more GPUs, drop a comment below, and if enough of you want it, that will be an upcoming video on the channel. Even though I'm not showing you benchmarks of these GPUs right now, I have

VRAM RAM And Why Speed Matters

SPEAKER_02

used every single one of them with AI. When you've got a dedicated GPU, this is where local AI really opens up. Generative AI (chatbots, image generation, music, voice cloning, all of it) gets dramatically more usable on a proper GPU because of one thing: VRAM. Remember what we said earlier: VRAM determines what we can run, and GPU speed determines how fast it runs. With a dedicated card you get fast, dedicated VRAM, which makes a massive difference, and the faster that VRAM is, the better. Now, if you don't have a GPU and want to buy one, Nvidia is the easy path. CUDA is everywhere in AI; almost every container, tool and tutorial assumes Nvidia. AMD works, and Intel is catching up fast, but if you want the easiest, least frustrating experience, Nvidia is what to get. And if you want my single recommendation for the best GPU to get you started with local AI, it's this one here: the RTX 3060 with 12 gigs of VRAM. If you get a 3060, be careful: they also made an 8GB version, and it's the 12-gig one that's really the sweet spot. 12 gigs is genuinely enough for the vast majority of AI workloads. You can run a quantized 13-billion-parameter chatbot, image models like Stable Diffusion and Qwen-Image, local music generation like ACE-Step 1.5, and voice cloning with Qwen3-TTS. It's also Nvidia, so with CUDA everything just works and you won't be debugging a whole load of issues. Then there's the price: in the UK you can get these cards for around 200 to 250 pounds, and for about the same in dollars in the United States. Physically the card is pretty small, so it'll fit in pretty much any case. Another thing I like about it is the power usage. It's a 170-watt card, so you don't really need to upgrade your PSU for it; pretty much any modern Unraid server can probably take one without any other changes. 
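Why memory speed matters so much can be put in rough numbers. When a model generates text, each new token typically means reading all of the model's weights from memory once, so memory bandwidth divided by model size gives a ceiling on tokens per second. This is a simplifying assumption that gives an upper bound, not a benchmark; real speeds land below it. The DDR figures are theoretical peaks (MT/s times 8 bytes per 64-bit channel), and 360 GB/s is the RTX 3060 12GB's rated VRAM bandwidth.

```python
def ddr_bandwidth_gbs(mega_transfers: int, channels: int) -> float:
    """Theoretical peak for system RAM: MT/s x 8 bytes per 64-bit channel x channels."""
    return mega_transfers * 1e6 * 8 * channels / 1e9


def tokens_per_s_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound if generating each token means reading every weight once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    model_gb = 7 * 4 / 8                        # 7B model at 4-bit: ~3.5 GB of weights
    ddr4 = ddr_bandwidth_gbs(3200, channels=2)  # ~51.2 GB/s (CPU or iGPU)
    ddr5 = ddr_bandwidth_gbs(5600, channels=2)  # ~89.6 GB/s (CPU or iGPU)
    rtx3060 = 360.0                             # RTX 3060 12GB rated VRAM bandwidth, GB/s
    for name, bw in [("DDR4-3200 dual channel", ddr4),
                     ("DDR5-5600 dual channel", ddr5),
                     ("RTX 3060 12GB VRAM", rtx3060)]:
        print(f"{name}: at most ~{tokens_per_s_ceiling(bw, model_gb):.0f} tokens/s")
```

The same model gets a ceiling several times higher on the 3060's VRAM than on dual-channel system RAM, and faster system RAM raises the CPU and iGPU ceiling too.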
And also, something I think people don't mention is an exit strategy. If you buy a 3060, try it, and you decide local AI is not for you, you'll be able to sell that card on eBay in a few days. They're really popular and AI

Hardware Tiers And GPU Recommendations

SPEAKER_02

tinkerers want them all the time, and budget gamers want them too. So the downside risk on this purchase, in my opinion, is pretty minimal, and that's not true for most of the other options I'm going to mention. The other Nvidia card I had here was the RTX 2060 6GB. It's an older card and it still works, but 6 gigs is really limiting. So if you've got one lying around, use it, but don't go out and buy one today. Another older Nvidia card here is the RTX 2080 Ti. This is an 11-gig card, so a big improvement over the 2060, but you'll find these tend to go for around the same money as, or sometimes a little more than, the 3060 12GB, which has faster VRAM and more VRAM than this card. So if you've already got one, or you can get one cheap enough, give it a go, but personally I wouldn't be buying one today. The 5070 Ti, or any modern 16-gig card, is a meaningful step up from the 3060: more VRAM, and it's much faster. But you're going to be spending real money now; with the price of hardware what it is, a 5070 Ti is going to be pretty expensive, and the 5090 even more so. I was lucky enough to get mine before prices went crazy. The 5090, with 32 gigs, really does give insane performance and can run models that nothing else I've got here can. But it's massive, a four-slot monster, and it needs a serious PSU; I often see mine hitting 400 watts when it's running AI workloads. But remember, you do not need a 5090 for local AI. You really don't. Most of what you'll do day to day will fit comfortably on the 3060. Now, you'll notice I've got a couple of AMD cards here. I don't have an Intel card myself, but if you want a benchmark video, I do have an Intel card I can borrow. A quick word on AMD: yes, AMD works. ROCm is AMD's answer to CUDA, and it's getting better all the time. 
But it still often takes more setup and has fewer tutorials, and older AMD cards have less support. If you've already got an AMD card, see if you can use it, but if you're buying fresh, I'd still say Nvidia is the path of least resistance. Another really interesting card, I think, is the Intel Arc Pro B50. Intel is a bit of a dark horse here: they've launched a whole range of Arc Pro workstation cards aimed at local AI. The B50 has 16 gigs of VRAM, the B60 has 24 gigs, and the B70 has 32 gigs at half the price of a 5090. So these cards have the potential to be pretty disruptive. The catch is that Intel's software ecosystem is still catching up. Intel's IPEX-LLM works, but it's not the plug-and-play experience we get with CUDA. Still, if you like tinkering and want the best VRAM per pound, these are definitely worth looking at. The last category, which we already met earlier in the video, is Edge AI: things like the Google Coral, designed specifically to run small predictive AI models incredibly fast and efficiently. They won't work for chatbots or image generation, but if you're doing Frigate-style camera detection, this is what can do the heavy lifting. Just remember it's a completely different category of hardware from GPUs, doing a completely different job. Right, I think we've covered pretty much everything: the two types of AI and four tiers of hardware. Hopefully you've now got a better idea of what local AI can do for you on your Unraid server. Oh, I almost forgot: I said I'd tell you a little secret at the end of this video. During parts of this video, not all of it, I've been using my own voice clone made with Qwen3-TTS. Could you tell which parts were really me and which weren't?

SPEAKER_01

Anyway, I just want to say thank you very much for watching this video. If you enjoyed it, please hit the like button and make it so. Put any other AI videos you'd like to see in the comments below, and I'll catch you all in the next video. Engage.