The best we can do with 3 controllers is then 72 drives (1 RES240 card plus 4 drives direct on each controller).
Close but not yet 120 drives - ARG!!!!!
# Another change of design
As I was cabling this, space for the cabling was also at a premium, even with SFF cables. So I changed tack on the design for the last time, and brought my expectations down a little.
{{< image src="images/IMG_2883.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}
So I did have this
{{< image src="images/IMG_2892.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}
I am now going to do this
{{< image src="images/IMG_2905.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}
Instead of using all 9 locations for drives I am going for 6. The last 3 I can use for the RES240 cards, which leaves some space for cabling and airflow and, as it turns out, some boot-drive space as well.
In this picture you can see 84 drives, but I did mention 72 - so where did the other 12 come from? Remember the 3 original controller cards? As it turns out, they fit into the remaining PCI slots.
{{< image src="images/IMG_2937 (1).jpg" width="150x" class="is-pulled-right" height="200x" title="">}}
We now end up with this
{{< image src="images/IMG_2906.jpg" width="150x" class="is-pulled-right" height="200x" title="">}}
To show that it all works under Linux (it does take about 4 minutes to boot though!):
- it boots - success
- you have loads of disks (see the quick check below)
- it is low power
- it is quite quiet
- now, how do we use it?!
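As a quick sanity check that all the drives actually showed up, something like this is enough - a minimal sketch, assuming the usual `sd*` naming (the count will also include any boot drives):
```
# count the SCSI/SATA block devices the kernel can see
lsblk -nd -o NAME | grep -c '^sd'
# and list them with sizes/models to spot anything missing or misbehaving
lsblk -d -o NAME,SIZE,MODEL,SERIAL
```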
# Let's RAID!!!!
Do not RAID all 84 disks together, for the main reason that if you lose more than about 3 you lose the lot!
You could instead create 4 x 21-disk RAID sets, which gives you some redundancy.
I am not going into the pros and cons of how you configure RAID.
## My testing
I did get 84 SSDs from eBay, 120GB each, reasonably cheap at ~£5 each - of which many did not work or did not last very long.
I RAIDed 21 disks, using the following structure:
- There are 3 controller cards with 24 disks each (1 x RES240 with 20 drives, plus 4 drives direct)
- There are 3 controller cards with 4 disks each
- Take 6 disks from each of the three 24-disk controllers and one from each of the others = 21
Linux will tell you which disk is attached to which controller!!
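For example - a minimal sketch, nothing specific to this build - the HCTL column (host:channel:target:lun) maps each disk to a SCSI host, i.e. a controller, and sysfs gives you the full PCI path:
```
# the first number in HCTL is the SCSI host, i.e. the controller the disk sits behind
lsblk -S -o NAME,HCTL,VENDOR,MODEL
# or trace a single disk back through sysfs to its PCI device
readlink -f /sys/block/sda/device
```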
I partitioned the drives and set a partition UUID on them; some already had one, so I just reused that.
(TODO: add the Linux commands for that here - the output is massive, so perhaps a link to another page.)
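In the meantime, a minimal sketch of what that step looks like for one drive (assuming GPT and one whole-disk partition; `/dev/sdX` is a placeholder, and in practice this gets looped over all 84 drives):
```
# wipe any existing partition table and create one GPT partition spanning the disk
sgdisk --zap-all /dev/sdX
sgdisk -n 1:0:0 /dev/sdX
# read back the PARTUUID that the mdadm command below refers to
blkid -s PARTUUID -o value /dev/sdX1
```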
## RAID Setup
I set up a RAID5 set with the following command
```
mdadm --create /dev/md1 --level=5 --raid-devices=21 \
/dev/disk/by-partuuid/1ea8d03a-01 \
/dev/disk/by-partuuid/4860020c-71cc-3042-95fc-5410011ce0b4 \
/dev/disk/by-partuuid/104eb35f-b56a-cd44-a7c7-08c4a8d14b88 \
/dev/disk/by-partuuid/dce51c65-31dd-304e-8181-d5f6a114a5d0 \
/dev/disk/by-partuuid/e0598cbf-e288-c84c-b9c2-3bbcd375cc5d \
/dev/disk/by-partuuid/63bfd06e-01 \
/dev/disk/by-partuuid/318f464f-01 \
/dev/disk/by-partuuid/ed30e572-01 \
/dev/disk/by-partuuid/52dd6fef-01 \
/dev/disk/by-partuuid/9902088f-7465-784b-8679-66fd69587999 \
/dev/disk/by-partuuid/84aca13e-01 \
/dev/disk/by-partuuid/ed61aa88-01 \
/dev/disk/by-partuuid/71c2c981-01 \
/dev/disk/by-partuuid/b1d7726c-01 \
/dev/disk/by-partuuid/9ef9ee1c-01 \
/dev/disk/by-partuuid/a0f92d92-01 \
/dev/disk/by-partuuid/b4d2d687-0e82-584c-93c8-59fde62012a0 \
/dev/disk/by-partuuid/eb95d63d-01 \
/dev/disk/by-partuuid/465e0077-01 \
/dev/disk/by-partuuid/abbcfdad-01 \
/dev/disk/by-partuuid/d0131bfc-01
```
Then using this command
```
mdadm --detail /dev/md1
```
I checked the status until sync was completed
```
root@test1:~# mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Sat Jan 27 23:25:15 2024
Raid Level : raid5
Array Size : 2343055360 (2.18 TiB 2.40 TB)
Used Dev Size : 117152768 (111.73 GiB 119.96 GB)
Raid Devices : 21
Total Devices : 21
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jan 27 23:26:00 2024
State : clean, degraded, recovering
```
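Rather than re-running `mdadm --detail` by hand, you can also watch the resync progress (percentage, speed and ETA) straight from the kernel:
```
# /proc/mdstat shows rebuild percentage, speed and ETA for every md array
watch -n 10 cat /proc/mdstat
```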
Finally, 23 minutes later it was clean - which is pretty fast.
```
/dev/md1:
Version : 1.2
Creation Time : Sat Jan 27 23:25:15 2024
Raid Level : raid5
Array Size : 2343055360 (2.18 TiB 2.40 TB)
Used Dev Size : 117152768 (111.73 GiB 119.96 GB)
Raid Devices : 21
Total Devices : 21
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jan 27 23:49:04 2024
State : clean
Active Devices : 21
Working Devices : 21
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : test1:1 (local to host test1)
UUID : 77a16172:a32f8d3b:a54cbaef:ae6062ac
Events : 284
    Number   Major   Minor   RaidDevice State
       0      65     129        0      active sync   /dev/sdy1
       1      65     161        1      active sync   /dev/sdaa1
       2      65     177        2      active sync   /dev/sdab1
       3      65     193        3      active sync   /dev/sdac1
       4      65     209        4      active sync   /dev/sdad1
       5      65     225        5      active sync   /dev/sdae1
       6      67      17        6      active sync   /dev/sdax1
       7      67      33        7      active sync   /dev/sday1
       8      67      49        8      active sync   /dev/sdaz1
       9      67      65        9      active sync   /dev/sdba1
      10      67      81       10      active sync   /dev/sdbb1
      11      67      97       11      active sync   /dev/sdbc1
      12       8       1       12      active sync   /dev/sda1
      13       8      17       13      active sync   /dev/sdb1
      14       8      33       14      active sync   /dev/sdc1
      15       8      49       15      active sync   /dev/sdd1
      16       8      65       16      active sync   /dev/sde1
      17       8      81       17      active sync   /dev/sdf1
      18      68     145       18      active sync   /dev/sdbv1
      19      68     209       19      active sync   /dev/sdbz1
      21      69      17       20      active sync   /dev/sdcd1
```
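The speed tests below run against a filesystem mounted at /mnt/test1. That step isn't shown here, but a minimal sketch of it (assuming ext4 - any filesystem would do - and a Debian-style system for the initramfs step) would be:
```
# put a filesystem on the array and mount it where the tests below expect it
mkfs.ext4 /dev/md1
mkdir -p /mnt/test1
mount /dev/md1 /mnt/test1
# record the array so it is assembled automatically at boot (Debian/Ubuntu paths)
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
```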
# Speed testing
I started with some simple tests: a linear 20GB write.
21-disk array write speed:
```
root@test1:/mnt/test1# time dd if=/dev/zero of=test1.img bs=1G count=20
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 39.6505 s, 542 MB/s
real 0m39.700s
user 0m0.000s
sys 0m25.679s
```
It is doing around 4.3Gbit/s - which is not bad, given 3 of the controllers are limited to 3Gbit/s.
Let's do a read speed test.
21-disk array read speed:
```
root@test1:/mnt/test1# time dd if=test1.img of=/dev/null
41943040+0 records in
41943040+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 49.0952 s, 437 MB/s
real 0m49.103s
user 0m20.529s
sys 0m28.248s
```
It is doing around 3.5Gbit/s - again not bad, given 3 of the controllers are limited to 3Gbit/s.
These results suggest the 3Gbit/s controllers are the bottleneck, but even so the numbers are relatively good - certainly enough to saturate the dual GigE on the motherboard if the data were leaving the host.
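One caveat: plain dd runs like these go through the page cache, so if you want stricter numbers it is worth forcing the write to disk and bypassing the cache on the read back - a minimal sketch, same 20GB file:
```
# write: conv=fdatasync makes dd wait until the data has actually hit the array
time dd if=/dev/zero of=test1.img bs=1G count=20 conv=fdatasync
# read: iflag=direct bypasses the page cache so we measure the disks, not RAM
time dd if=test1.img of=/dev/null bs=1M iflag=direct
```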
Let's try a few more tools - good old hdparm first.
```
root@test1:/mnt/test1# hdparm -tT /dev/md1
/dev/md1:
Timing cached reads: 21896 MB in 1.99 seconds = 10996.23 MB/sec
Timing buffered disk reads: 6050 MB in 3.00 seconds = 2013.94 MB/sec
```
These seem very, very wrong: the "cached reads" figure is really measuring RAM and CPU rather than the disks, and even the buffered disk reads figure works out at about 16Gbit/s - doubtful!
But for comparison, a single HDD:
```
/dev/sda:
Timing cached reads: 13594 MB in 1.96 seconds = 6939.87 MB/sec
Timing buffered disk reads: 308 MB in 3.01 seconds = 102.22 MB/sec
```
Running FIO
I have never used FIO before, although I have seen some numbers from it. I am not entirely sure what they all mean, but comparing against answers on the forums suggests this array performs very well.
I used this command:
```
root@test1:/mnt/test1# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=105000M --readwrite=randrw --rwmixread=80
```
The resulting output
```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 105000MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=107MiB/s,w=26.3MiB/s][r=27.3k,w=6743 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=5841: Sun Jan 28 00:26:31 2024
read: IOPS=21400, BW=83.7MiB/s (87.8MB/s)(82.0GiB/1003444msec)
bw ( KiB/s): min= 16, max=222240, per=100.00%, avg=86497.62, stdev=65547.58, samples=1990
iops : min= 4, max=55560, avg=21624.22, stdev=16386.89, samples=1990
write: IOPS=5358, BW=20.9MiB/s (21.9MB/s)(20.5GiB/1003444msec); 0 zone resets
bw ( KiB/s): min= 7, max=54776, per=100.00%, avg=22629.65, stdev=16075.92, samples=1902
iops : min= 1, max=13694, avg=5657.23, stdev=4018.95, samples=1902
cpu : usr=8.15%, sys=30.35%, ctx=6181617, majf=0, minf=9
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=21503052,5376948,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=83.7MiB/s (87.8MB/s), 83.7MiB/s-83.7MiB/s (87.8MB/s-87.8MB/s), io=82.0GiB (88.1GB), run=1003444-1003444msec
WRITE: bw=20.9MiB/s (21.9MB/s), 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=20.5GiB (22.0GB), run=1003444-1003444msec
Disk stats (read/write):
md1: ios=21495384/5378585, merge=0/0, ticks=31171400/31594164, in_queue=62765564, util=98.75%, aggrios=1535273/512707, aggrmerge=19497/20772, aggrticks=2230727/289715, aggrin_queue=2522157, aggrutil=67.55%
sdcd: ios=1535876/513140, merge=21689/23237, ticks=2654830/534888, in_queue=3190903, util=61.87%
sdbz: ios=1538111/515020, merge=20991/21733, ticks=2125317/432538, in_queue=2558543, util=59.88%
sdbv: ios=1537666/514959, merge=18396/18476, ticks=464324/124285, in_queue=589472, util=58.81%
sdbc: ios=1532266/510557, merge=19268/20329, ticks=3051970/591808, in_queue=3646077, util=63.18%
sdy: ios=1533226/509737, merge=20233/21591, ticks=2234145/408149, in_queue=2643231, util=60.97%
sdbb: ios=1533713/511182, merge=21722/23302, ticks=4517761/823997, in_queue=5354193, util=67.55%
sdba: ios=1535677/513601, merge=22025/23731, ticks=341058/37258, in_queue=378959, util=58.52%
sdf: ios=1536464/513569, merge=16959/17704, ticks=2528700/421838, in_queue=2954362, util=61.62%
sdaz: ios=1536534/513980, merge=19586/20512, ticks=370571/36670, in_queue=407738, util=58.58%
sde: ios=1533479/510022, merge=18606/20941, ticks=280942/40412, in_queue=323263, util=58.49%
sday: ios=1534307/512167, merge=17272/17549, ticks=270661/38194, in_queue=310634, util=58.45%
sdd: ios=1535438/512562, merge=20708/23502, ticks=363458/39116, in_queue=402964, util=58.52%
sdax: ios=1532955/511012, merge=18014/19189, ticks=410740/34215, in_queue=445357, util=58.49%
sdc: ios=1537377/515147, merge=20142/22924, ticks=409537/121937, in_queue=531866, util=58.89%
sdb: ios=1538098/515180, merge=18736/19803, ticks=368787/35413, in_queue=404619, util=58.49%
sda: ios=1535075/512676, merge=16968/17633, ticks=370662/34674, in_queue=405753, util=58.52%
sdae: ios=1532895/510419, merge=21198/22907, ticks=1018891/54725, in_queue=1074198, util=58.79%
sdad: ios=1534688/512528, merge=22364/23787, ticks=13922220/1049834, in_queue=14974170, util=64.80%
sdac: ios=1536754/514816, merge=19230/21200, ticks=6133168/506412, in_queue=6640614, util=58.98%
sdab: ios=1536332/513772, merge=17682/17898, ticks=2471157/365254, in_queue=2838303, util=62.80%
sdaa: ios=1533806/510807, merge=17657/18275, ticks=2536377/352416, in_queue=2890093, util=62.51%
```
Interesting things to note: the read IOPS was around 21k and the write IOPS around 5k. At 4k blocks that lines up with the reported ~84MiB/s read and ~21MiB/s write bandwidth, and it is huge compared with what spinning disks could manage.
# Conclusions
- Cheap and fast storage servers are relatively straightforward to build
- Physical space, regardless of how much you have, should be valued
- For more than 10 drives, use a SAS controller and a SAS expander card
- For the best performance spread your drives across your controllers - it's a little work, but worth it!
- SSDs are low on power, noise and heat - they cost ££££s but are certainly worth it
- Do not RAID 84 drives into one array - at most put 21 into one array, and use different controllers!!!
- Capacity: at current prices, 8TB drives @ £500 each come to approx £42,000 for 672TB of raw storage, which is about 6.7 pence per gigabyte as a fixed cost - not monthly, quarterly etc.
- Cost to build - time is the biggest factor, but hardware-wise, if it were all priced up:
  - PSU - £120 - never be cheap with a PSU
  - M/B - £350 - never be cheap with a M/B
  - LSI cards - 3x - £100
  - RES240 cards - 3x - £100
  - Additional controllers - no cost to me, but at ~£20 each would be £60 (if anyone wants one, I have about 100)
  - Cables - too much - £300
  - Chassis - £600 - you can get them built by Protocase; the specs can be downloaded
  - Custom PCBs - £600 - it's the connectors: the PCBs themselves are about £30, but each one has 30 connectors at about £1.50 average from Digikey
- For £1500 you too could build one!
I need to add in the power pictures and some other things etc