From 096545dd3210fa737a20bad02f1ae4077a5890ac Mon Sep 17 00:00:00 2001
From: engineershamrock
Date: Thu, 7 Mar 2024 01:00:20 +0000
Subject: [PATCH] large-storage-custom-nas - index.md

Put back the rest of the article and see if it works now.
---
 .../2024/large-storage-custom-nas/index.md | 320 ++++++++++++++++++
 1 file changed, 320 insertions(+)

diff --git a/content/blog/2024/large-storage-custom-nas/index.md b/content/blog/2024/large-storage-custom-nas/index.md
index ac6973c..754df3b 100644
--- a/content/blog/2024/large-storage-custom-nas/index.md
+++ b/content/blog/2024/large-storage-custom-nas/index.md
@@ -225,3 +225,323 @@ The best we can do with 3 controllers is then 72 drives ( 1 RES240 card + other
 Close but not yet 120 drives - ARG!!!!!

# Another change of design

As I was cabling this, space for the cabling was also becoming a premium, even with SFF cables. So I changed tack on the design for the last time, and brought my expectations down a little.

{{< image src="images/IMG_2883.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}

So I did have this:

{{< image src="images/IMG_2892.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}

I am now going to do this:

{{< image src="images/IMG_2905.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}

Instead of using the 9 locations for drives I am going for 6. The last 3 I can use for the RES240 cards, which leaves some space for cabling, airflow and, as it turns out, some boot space as well.

In this picture you can see 84 drives, but I did mention 72 - so where did the other 12 come from? Remember the 3 original controller cards? As it turns out, they fit into the remaining PCI slots.

{{< image src="images/IMG_2937 (1).jpg" width="150x" class="is-pulled-right" height="200x" title="">}}

We now end up with this:

{{< image src="images/IMG_2906.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}

To show that it all works under Linux - although it does take about 4 minutes to boot:

- it boots - success
- you have loads of disks
- it is low power
- it is quite quiet
- now, how to use it all!!

# Let's RAID!!!!

Do not RAID all 84 disks together, for the main reason that if you lose more than about 3 drives you lose the lot!

You could instead create 4 x 21-disk RAID sets for some redundancy. I am not going into the pros and cons of how you configure RAID.

## My testing

I did get 84 SSDs from eBay, 120GB each, reasonably cheap at ~£5 each - of which many did not work or did not last very long.

I RAIDed 21 disks together, using the following structure:

- there are 3 controller cards with 24 disks each ( 1 x RES240 with 20 drives, plus 4 drives direct )
- there are 3 controller cards with 4 disks each

Take 6 disks from each of the three controller cards with 24 disks, and one each from the others = 21.

Linux will tell you which disk is attached to which controller!!

I partitioned the drives and set a partition UUID on each; some already had one, so I just reused that.
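The full run across all 84 drives produces a wall of output, but per drive it boils down to something like this - a sketch using sgdisk and blkid rather than necessarily the exact commands used (the short xxxxxxxx-01 style IDs in the mdadm command further down are reused MBR-type signatures, the long ones are GPT GUIDs):

```
# Sketch only - /dev/sdX stands in for each SSD in turn
sgdisk --zap-all /dev/sdX            # clear any existing partition table
sgdisk -n 1:0:0 -t 1:fd00 /dev/sdX   # one partition spanning the disk, type "Linux RAID"

# Read the partition UUID back so it can be fed to mdadm later
blkid -s PARTUUID -o value /dev/sdX1

# The by-path names show which HBA and port each disk hangs off
ls -l /dev/disk/by-path/
```

The by-path listing is what I mean by Linux telling you which disk is attached to which controller - the HBA's PCI address and phy are embedded in each name.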
----------- NEED TO ADD THE LINUX COMMANDS FOR THAT HERE - THE OUTPUT IS MASSIVE, SO PERHAPS A LINK TO ANOTHER PAGE -----------

## RAID Setup

I set up a RAID5 set with the following command:

```
mdadm --create /dev/md1 --level=5 --raid-devices=21 \
  /dev/disk/by-partuuid/1ea8d03a-01 \
  /dev/disk/by-partuuid/4860020c-71cc-3042-95fc-5410011ce0b4 \
  /dev/disk/by-partuuid/104eb35f-b56a-cd44-a7c7-08c4a8d14b88 \
  /dev/disk/by-partuuid/dce51c65-31dd-304e-8181-d5f6a114a5d0 \
  /dev/disk/by-partuuid/e0598cbf-e288-c84c-b9c2-3bbcd375cc5d \
  /dev/disk/by-partuuid/63bfd06e-01 \
  /dev/disk/by-partuuid/318f464f-01 \
  /dev/disk/by-partuuid/ed30e572-01 \
  /dev/disk/by-partuuid/52dd6fef-01 \
  /dev/disk/by-partuuid/9902088f-7465-784b-8679-66fd69587999 \
  /dev/disk/by-partuuid/84aca13e-01 \
  /dev/disk/by-partuuid/ed61aa88-01 \
  /dev/disk/by-partuuid/71c2c981-01 \
  /dev/disk/by-partuuid/b1d7726c-01 \
  /dev/disk/by-partuuid/9ef9ee1c-01 \
  /dev/disk/by-partuuid/a0f92d92-01 \
  /dev/disk/by-partuuid/b4d2d687-0e82-584c-93c8-59fde62012a0 \
  /dev/disk/by-partuuid/eb95d63d-01 \
  /dev/disk/by-partuuid/465e0077-01 \
  /dev/disk/by-partuuid/abbcfdad-01 \
  /dev/disk/by-partuuid/d0131bfc-01
```

Then, using this command:

```
mdadm --detail /dev/md1
```

I checked the status until the sync was complete:

```
root@test1:~# mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Sat Jan 27 23:25:15 2024
        Raid Level : raid5
        Array Size : 2343055360 (2.18 TiB 2.40 TB)
     Used Dev Size : 117152768 (111.73 GiB 119.96 GB)
      Raid Devices : 21
     Total Devices : 21
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jan 27 23:26:00 2024
             State : clean, degraded, recovering
```

Finally, about 23 minutes later, the initial sync was done - which is pretty fast:

```
/dev/md1:
           Version : 1.2
     Creation Time : Sat Jan 27 23:25:15 2024
        Raid Level : raid5
        Array Size : 2343055360 (2.18 TiB 2.40 TB)
     Used Dev Size : 117152768 (111.73 GiB 119.96 GB)
      Raid Devices : 21
     Total Devices : 21
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jan 27 23:49:04 2024
             State : clean
    Active Devices : 21
   Working Devices : 21
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : test1:1  (local to host test1)
              UUID : 77a16172:a32f8d3b:a54cbaef:ae6062ac
            Events : 284

    Number   Major   Minor   RaidDevice State
       0      65      129        0      active sync   /dev/sdy1
       1      65      161        1      active sync   /dev/sdaa1
       2      65      177        2      active sync   /dev/sdab1
       3      65      193        3      active sync   /dev/sdac1
       4      65      209        4      active sync   /dev/sdad1
       5      65      225        5      active sync   /dev/sdae1
       6      67       17        6      active sync   /dev/sdax1
       7      67       33        7      active sync   /dev/sday1
       8      67       49        8      active sync   /dev/sdaz1
       9      67       65        9      active sync   /dev/sdba1
      10      67       81       10      active sync   /dev/sdbb1
      11      67       97       11      active sync   /dev/sdbc1
      12       8        1       12      active sync   /dev/sda1
      13       8       17       13      active sync   /dev/sdb1
      14       8       33       14      active sync   /dev/sdc1
      15       8       49       15      active sync   /dev/sdd1
      16       8       65       16      active sync   /dev/sde1
      17       8       81       17      active sync   /dev/sdf1
      18      68      145       18      active sync   /dev/sdbv1
      19      68      209       19      active sync   /dev/sdbz1
      21      69       17       20      active sync   /dev/sdcd1
```
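Before the speed tests there are a couple of steps I have skipped over here: making the array re-assemble at boot, and putting a filesystem on it mounted at /mnt/test1, which is where the tests below run from. A minimal sketch - the ext4 choice and the Debian-style paths are assumptions, not necessarily what was used:

```
# Record the array so it assembles automatically at boot (Debian/Ubuntu layout assumed)
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

# Put a filesystem on it and mount it where the tests below run from (ext4 is a guess)
mkfs.ext4 /dev/md1
mkdir -p /mnt/test1
mount /dev/md1 /mnt/test1
```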
# Speed testing

I started with some simple tests: a linear 20GB write and read.

21-disk array write speed:

```
root@test1:/mnt/test1# time dd if=/dev/zero of=test1.img bs=1G count=20
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 39.6505 s, 542 MB/s

real    0m39.700s
user    0m0.000s
sys     0m25.679s
```

That is around 4.3Gbit/s - which is not bad, given 3 of the controllers are limited to 3Gbit/s.

Let's do a read test.

21-disk array read speed:

```
root@test1:/mnt/test1# time dd if=test1.img of=/dev/null
41943040+0 records in
41943040+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 49.0952 s, 437 MB/s

real    0m49.103s
user    0m20.529s
sys     0m28.248s
```

That is around 3.4Gbit/s - again not bad, given the same 3Gbit/s controller limit.

These results suggest the 3Gbit/s controllers are the bottleneck, but even so the numbers are relatively good - certainly enough to saturate the dual GigE ports on the motherboard if the data were leaving the host.

Let's try a few more tools - good old hdparm:

```
root@test1:/mnt/test1# hdparm -tT /dev/md1

/dev/md1:
 Timing cached reads:   21896 MB in  1.99 seconds = 10996.23 MB/sec
 Timing buffered disk reads: 6050 MB in  3.00 seconds = 2013.94 MB/sec
```

These numbers seem very wrong - if the buffered disk read figure is to be believed, we are doing 16Gbit/s, which is doubtful!

But for comparison, a single HDD:

```
/dev/sda:
 Timing cached reads:   13594 MB in  1.96 seconds = 6939.87 MB/sec
 Timing buffered disk reads: 308 MB in  3.01 seconds = 102.22 MB/sec
```
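A caveat on the dd and hdparm figures (a note on methodology, not something that was re-run): the plain dd commands go through the page cache, the read test used dd's default 512-byte block size - hence the 41943040 records - which throttles it, and hdparm's "cached reads" line is essentially a RAM benchmark. Variants along these lines force the data to actually touch the disks and tend to give more honest numbers:

```
# Drop the page cache between runs so reads are not served from RAM
sync; echo 3 > /proc/sys/vm/drop_caches

# Write test that only reports once the data has been flushed to the array
time dd if=/dev/zero of=test1.img bs=1G count=20 conv=fdatasync

# Read test with a sensible block size, bypassing the page cache
time dd if=test1.img of=/dev/null bs=1M iflag=direct
```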
## Running FIO

I have never used FIO before, although I have seen some numbers quoted. I am not sure what all of it means, but comparing against answers on the forums suggests this array performs very well.

I used this command:

```
root@test1:/mnt/test1# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=105000M --readwrite=randrw --rwmixread=80
```

The resulting output:

```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 105000MiB)

Jobs: 1 (f=1): [m(1)][100.0%][r=107MiB/s,w=26.3MiB/s][r=27.3k,w=6743 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=5841: Sun Jan 28 00:26:31 2024
  read: IOPS=21400, BW=83.7MiB/s (87.8MB/s)(82.0GiB/1003444msec)
   bw (  KiB/s): min=   16, max=222240, per=100.00%, avg=86497.62, stdev=65547.58, samples=1990
   iops        : min=    4, max=55560, avg=21624.22, stdev=16386.89, samples=1990
  write: IOPS=5358, BW=20.9MiB/s (21.9MB/s)(20.5GiB/1003444msec); 0 zone resets
   bw (  KiB/s): min=    7, max=54776, per=100.00%, avg=22629.65, stdev=16075.92, samples=1902
   iops        : min=    1, max=13694, avg=5657.23, stdev=4018.95, samples=1902
  cpu          : usr=8.15%, sys=30.35%, ctx=6181617, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=21503052,5376948,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=83.7MiB/s (87.8MB/s), 83.7MiB/s-83.7MiB/s (87.8MB/s-87.8MB/s), io=82.0GiB (88.1GB), run=1003444-1003444msec
  WRITE: bw=20.9MiB/s (21.9MB/s), 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=20.5GiB (22.0GB), run=1003444-1003444msec

Disk stats (read/write):
    md1: ios=21495384/5378585, merge=0/0, ticks=31171400/31594164, in_queue=62765564, util=98.75%, aggrios=1535273/512707, aggrmerge=19497/20772, aggrticks=2230727/289715, aggrin_queue=2522157, aggrutil=67.55%
  sdcd: ios=1535876/513140, merge=21689/23237, ticks=2654830/534888, in_queue=3190903, util=61.87%
  sdbz: ios=1538111/515020, merge=20991/21733, ticks=2125317/432538, in_queue=2558543, util=59.88%
  sdbv: ios=1537666/514959, merge=18396/18476, ticks=464324/124285, in_queue=589472, util=58.81%
  sdbc: ios=1532266/510557, merge=19268/20329, ticks=3051970/591808, in_queue=3646077, util=63.18%
  sdy: ios=1533226/509737, merge=20233/21591, ticks=2234145/408149, in_queue=2643231, util=60.97%
  sdbb: ios=1533713/511182, merge=21722/23302, ticks=4517761/823997, in_queue=5354193, util=67.55%
  sdba: ios=1535677/513601, merge=22025/23731, ticks=341058/37258, in_queue=378959, util=58.52%
  sdf: ios=1536464/513569, merge=16959/17704, ticks=2528700/421838, in_queue=2954362, util=61.62%
  sdaz: ios=1536534/513980, merge=19586/20512, ticks=370571/36670, in_queue=407738, util=58.58%
  sde: ios=1533479/510022, merge=18606/20941, ticks=280942/40412, in_queue=323263, util=58.49%
  sday: ios=1534307/512167, merge=17272/17549, ticks=270661/38194, in_queue=310634, util=58.45%
  sdd: ios=1535438/512562, merge=20708/23502, ticks=363458/39116, in_queue=402964, util=58.52%
  sdax: ios=1532955/511012, merge=18014/19189, ticks=410740/34215, in_queue=445357, util=58.49%
  sdc: ios=1537377/515147, merge=20142/22924, ticks=409537/121937, in_queue=531866, util=58.89%
  sdb: ios=1538098/515180, merge=18736/19803, ticks=368787/35413, in_queue=404619, util=58.49%
  sda: ios=1535075/512676, merge=16968/17633, ticks=370662/34674, in_queue=405753, util=58.52%
  sdae: ios=1532895/510419, merge=21198/22907, ticks=1018891/54725, in_queue=1074198, util=58.79%
  sdad: ios=1534688/512528, merge=22364/23787, ticks=13922220/1049834, in_queue=14974170, util=64.80%
  sdac: ios=1536754/514816, merge=19230/21200, ticks=6133168/506412, in_queue=6640614, util=58.98%
  sdab: ios=1536332/513772, merge=17682/17898, ticks=2471157/365254, in_queue=2838303, util=62.80%
  sdaa: ios=1533806/510807, merge=17657/18275, ticks=2536377/352416, in_queue=2890093, util=62.51%
```

Interesting things to note: the read IOPS was about 21k and the write IOPS about 5k - these are huge numbers for random 4k I/O.

# Conclusions

- Cheap and fast storage servers are relatively straightforward to build
- Physical space, regardless of how much you have, should be valued
- For more than about 10 drives, use a SAS controller and a SAS expander card
- For the best performance, spread your drives across your controllers - it's a little work, but worth it!
- SSDs are low on power, noise and heat - they cost ££££s but are certainly worth it
- Do not RAID 84 drives into one array; put at most 21 into one array, and use different controllers!!! (a monitoring sketch follows below)
- Capacity at current prices: 8TB drives @ £500 each is approx £42,000 for 672TB of raw storage, which is about 6.7 pence per gigabyte as a one-off fixed cost - not monthly, quarterly etc.

Cost to build - time is the biggest factor, but hardware-wise, roughly priced:

- PSU - £120 - never be cheap with a PSU
- M/B - £350 - never be cheap with a M/B
- LSI cards - 3x - £100
- RES240 cards - 3x - £100
- Additional controllers - no cost to me, but £20 each - £60 ( if anyone wants one I have about 100 )
- Cables - too much - £300
- Chassis - you can get them built by Protocase, and the specs can be downloaded - £600
- Custom PCBs - £600 - it's the connectors: the PCBs themselves are about £30, but each one has 30 connectors at about £1.50 average from Digikey

- For £1500 you too could build one!
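One operational point not covered above - and this is a suggestion rather than part of the build: with arrays this wide you want to know the moment a member drops out, and mdadm can watch for that itself. A minimal sketch (the mail address is a placeholder):

```
# Quick manual health checks
cat /proc/mdstat
mdadm --detail /dev/md1 | grep -E 'State :|Failed Devices'

# Let mdadm watch all arrays and send an alert when a drive fails
# (placeholder address - point it somewhere real)
mdadm --monitor --scan --daemonise --mail=root@localhost
```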
I still need to add in the power pictures and a few other things.
\ No newline at end of file