Update time
Jul 21, 2022 02:53 PM
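Symptoms
Users could not register on the website, and a number of other unexplained problems appeared.
Alibaba Cloud monitoring
There are 8 databases in total, and memory usage on database 6 exceeded the configured maximum.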
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2F0acb79e2-1c9c-4d04-8c1b-03397321f235%2Fpage1.png%3Fid%3D0ad87ccc-6705-4a76-87f1-0c102fb48eda%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3DWWgazZS9SLN5byS9AEhI24hnxpvRCgbQFkSoP-DBmGY?table=block&id=0ad87ccc-6705-4a76-87f1-0c102fb48eda&cache=v2)
However, total memory usage did not show a high load.
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2Fb2671e2f-c854-43c8-8427-f9d870cc7de9%2Fpage2.png%3Fid%3D39dde559-6dfb-4dd4-8eb0-91f9ebee946d%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3DeevQ-tkBMjh2tM6Y42f-CFUk2oqzBA0nUJhY9czVBEA?table=block&id=39dde559-6dfb-4dd4-8eb0-91f9ebee946d&cache=v2)
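The gap between the two charts above (one database over its limit while the total looks healthy) can be illustrated with a small check. This is only a sketch: the function names, the `db-N` labels, the 0.8 threshold, and the byte figures are all assumptions made up for illustration, not values taken from the monitoring console.

```python
from typing import Dict, List, Tuple


def shards_over_threshold(
    used_bytes: Dict[str, int],
    max_bytes_per_shard: int,
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (shard, usage ratio) pairs at or above the threshold, worst first."""
    hot = []
    for shard, used in used_bytes.items():
        ratio = used / max_bytes_per_shard
        if ratio >= threshold:
            hot.append((shard, ratio))
    return sorted(hot, key=lambda item: item[1], reverse=True)


def total_usage_ratio(used_bytes: Dict[str, int], max_bytes_per_shard: int) -> float:
    """Usage of the whole instance, which can hide a single full shard."""
    return sum(used_bytes.values()) / (max_bytes_per_shard * len(used_bytes))


# Illustrative numbers only: 8 shards with a hypothetical limit of 1000 units each.
usage = {f"db-{i}": 200 for i in range(8)}
usage["db-6"] = 1000  # one shard is full

hot_shards = shards_over_threshold(usage, max_bytes_per_shard=1000)
overall = total_usage_ratio(usage, max_bytes_per_shard=1000)
```

With these made-up figures the instance as a whole sits at roughly 30% usage, yet `db-6` is at its limit, which is why an alert on the aggregate alone would not have fired.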
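Problem after switching the Redis cluster architecture
Cause: php-fpm was configured to run in static (resident) mode, and the project connects to Redis through a configured domain name. After the switch, the IP address that the domain resolves to changed, but php-fpm still held a cached DNS result and kept connecting to the old Redis service, which caused this problem.
Fix: restart php-fpm on the web servers.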
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2Fc5697024-cbf8-46c2-bbbd-ef3f16520e56%2Fpage3.png%3Fid%3D68577ee5-d490-4127-845e-7b8700dc24de%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3DT1uVDViNZWoUcLabQ0DF6YRflXVvr6qhtzD8OpF_Ogk?table=block&id=68577ee5-d490-4127-845e-7b8700dc24de&cache=v2)
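A quick way to confirm this kind of staleness is to compare the IP a long-lived process is connected to with what the name resolves to right now. The sketch below uses only the Python standard library; the function names are hypothetical, and `connected_ip` is assumed to come from somewhere else (for example the peer address of an open socket).

```python
import socket
from typing import Set


def resolve_ips(host: str, port: int) -> Set[str]:
    """Return the set of IP addresses the host currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def is_stale(connected_ip: str, current_ips: Set[str]) -> bool:
    """True when an open connection points at an IP the name no longer resolves to."""
    return connected_ip not in current_ips
```

Note that the machine running the check may have its own resolver cache, so a fresh lookup here is not guaranteed to reflect the authoritative record.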
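Memory usage after switching to a 4-shard cluster
Although the website returned to normal at that point, the underlying problem had not actually been solved: database 3 could still run out of memory and then have to reclaim it.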
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2F98af1981-e5a1-424b-97bf-a46fa22bacb4%2Fpage4.png%3Fid%3D71792594-7556-4af7-96f7-2938bd92aab0%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3DYmskn9gGU6lXLhG0VX_JWMWtzTc954MNwNdcwwV1duk?table=block&id=71792594-7556-4af7-96f7-2938bd92aab0&cache=v2)
Continuing the investigation with Alibaba Cloud monitoring, we found that the memory overrun on database 6 was caused by a big key.
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2F7184c434-f648-4448-813a-9401a08f4731%2Fpage5.png%3Fid%3Df952113f-280f-4c37-910a-22b60515e839%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3D2baGxNVHzN65o8auIXfqb19MhDkkhZQ0vy8xYd0efQA?table=block&id=f952113f-280f-4c37-910a-22b60515e839&cache=v2)
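On the business side, this big key was meant to queue newly released Steam items for entry, but in practice the queue was either not consumed at all or consumed very slowly.
Solution: unlike C5GAME, our business does not actually need to collect new item types automatically, so the code that pushes tasks to this key was removed.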
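The monitoring console found the big key in this case, but a similar check can be scripted. The sketch below is an assumption-laden illustration, not the method used in this incident: `find_big_keys` and the 10,000-element cutoff are made up, and `client` is expected to expose the `scan_iter`, `type`, `llen`, `scard`, `hlen`, and `zcard` methods that a redis-py client provides.

```python
from typing import List, Tuple

# Map each Redis collection type to the client method that returns its element count.
_SIZE_METHODS = {
    "list": "llen",
    "set": "scard",
    "hash": "hlen",
    "zset": "zcard",
}


def find_big_keys(client, min_elements: int = 10_000, match: str = "*") -> List[Tuple]:
    """Return (key, type, element count) for large collections, largest first."""
    results = []
    # SCAN walks the keyspace incrementally instead of blocking the server like KEYS.
    for key in client.scan_iter(match=match, count=1000):
        key_type = client.type(key)
        if isinstance(key_type, bytes):
            key_type = key_type.decode()
        method = _SIZE_METHODS.get(key_type)
        if method is None:
            continue  # strings, streams, and other types are not measured here
        size = getattr(client, method)(key)
        if size >= min_elements:
            results.append((key, key_type, size))
    return sorted(results, key=lambda item: item[2], reverse=True)
```

Element count is only a proxy for memory use, and on a sharded instance the scan would presumably have to cover every shard, so treat the output as a starting point for investigation rather than a definitive list.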
Redis service architectures available on Alibaba Cloud
Standard Edition
![notion image](https://www.notion.so/image/https%3A%2F%2Ffile.notion.so%2Ff%2Ff%2F94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%2Fecd49d34-faa0-47d1-bf35-20318bf74777%2Fpage6.png%3Fid%3Da1ec3248-10bb-4037-9f75-4074064fca7f%26table%3Dblock%26spaceId%3D94c79f86-3a7f-4d2d-ac0e-eaccf7961f02%26expirationTimestamp%3D1722009600000%26signature%3D5PehAJ4DvMWMEFl7o_BywSQWoYki5TXknl7PmmaqlzY?table=block&id=a1ec3248-10bb-4037-9f75-4074064fca7f&cache=v2)
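Conclusions
- When choosing a Redis architecture, start with a master-replica setup rather than a sharded one, because master-replica does not suffer from data skew. Switch to a sharded cluster later, once a performance bottleneck is reached.
- If a sharded cluster is used, be sure to add monitoring and alerting for each individual shard so that problems are caught before they cause an outage.
- If the application uses collection types such as List, Set, or Hash, be sure to monitor whether consumption is keeping up, to avoid creating big keys.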
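The last point can be sketched as a simple check over periodic queue-length samples (for a List, these would come from `LLEN`). The function name, the three labels, and the rule that three or more strictly increasing samples mean "growing" are all assumptions chosen for illustration.

```python
from typing import Sequence


def queue_backlog_alert(lengths: Sequence[int], max_length: int) -> str:
    """Classify a queue from periodic length samples, oldest first."""
    if not lengths:
        return "ok"
    if lengths[-1] >= max_length:
        return "too_long"
    # Strictly increasing samples suggest the queue is not being consumed fast enough.
    if len(lengths) >= 3 and all(b > a for a, b in zip(lengths, lengths[1:])):
        return "growing"
    return "ok"
```

A queue that is pushed to but never consumed, as in this incident, would report "growing" well before it reports "too_long", which leaves time to act before a shard fills up.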