Hecate Server Connectivity Postmortem Analysis
Date: June 11, 2025
Incident: Hecate server (192.168.3.25) connectivity issues and cloudflared timeouts
Root Cause: Improperly crimped RJ45 connector causing intermittent network disconnections
A detailed postmortem analysis of network connectivity issues that revealed the importance of proper physical layer infrastructure and led to recommendations for router upgrades and redundancy planning.
Background
The hecate server experienced connectivity issues that initially appeared to be cloudflared-related timeouts. Investigation revealed the actual cause was a poorly crimped RJ45 connector that would disconnect when cables were moved during cleaning/maintenance.
Key Findings
1. Network Infrastructure Issues
- Initial Problem: RJ45 connector was over-crimped, making it difficult to insert into standard ports
- Symptom: Required force to insert, no proper click, easy disconnection when moved
- Root cause of reduced speed: USB Ethernet adapter limiting the link to 100 Mbps instead of 1 Gbps
- Impact: Both intermittent connectivity and reduced network performance
- Solution: PCIe x1 Gigabit Ethernet card (RTL8111C) to replace USB adapter
- Available PCIe Slot: 00:1d.0 (Root Port #9) confirmed free and ready for installation
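The two symptoms above (a link negotiated below gigabit, and a link that drops when the cable is moved) can both be read from Linux sysfs. The following is a minimal sketch, assuming hecate runs Linux and exposes the standard `speed` and `carrier_changes` attributes under `/sys/class/net`; the interface name `enp3s0` is a placeholder, not taken from the incident.

```python
from pathlib import Path
from typing import Optional

SYSFS_NET = Path("/sys/class/net")  # standard Linux sysfs location


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs attribute, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Some drivers refuse to report `speed` while the link is down.
        return None


def link_report(iface: str, root: Path = SYSFS_NET) -> dict:
    """Collect negotiated speed (Mbps) and the link up/down transition count."""
    base = root / iface
    speed = read_int(base / "speed")
    return {
        "iface": iface,
        "speed_mbps": speed,
        "carrier_changes": read_int(base / "carrier_changes"),
        # True only when a valid speed below 1000 Mbps was reported.
        "below_gigabit": speed is not None and 0 < speed < 1000,
    }


if __name__ == "__main__":
    # "enp3s0" is a hypothetical name; substitute the real interface.
    print(link_report("enp3s0"))
```

A `carrier_changes` value that keeps growing while nobody touches the cable would point at a physical problem such as a bad crimp, while `below_gigabit` would flag a 100 Mbps adapter or a cable with damaged pairs.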
2. VPN Protocol Strategy for Russian Networks
Current working protocols in Russia:
- AmneziaWG: Primary choice, excellent bypass capabilities
- Xray: Secondary option, good performance
- Standard WireGuard: Blocked, not viable
- Cloudflare tunnels: Also blocked
3. Router Infrastructure Requirements
Goal: Centralized, professional dual-WAN setup with VPN support
Recommended Options:
- OpenWrt on Xiaomi routers (~$50-100)
  - Maximum customization
  - Can compile AmneziaWG support
  - Cost-effective for experimentation
  - Requires careful model selection (avoid non-flashable revisions)
- MikroTik hEX S (~$100)
  - Enterprise-grade RouterOS
  - Excellent dual-WAN support
  - SSH + WinBox management
  - Transferable skills
- Keenetic Ultra (~$150)
  - Stable, Russian-supported firmware
  - Built-in dual-WAN
  - Good for production use
4. Server Architecture Strategy
Current: Single hecate server running Ollama + Jellyfin + GPU workloads
Target: Distributed setup with backup capabilities
Backup Server Requirements:
- High-end CPU for large models (Ryzen 9/Intel i9)
- GPU with VRAM >= current hecate
- 64GB+ RAM for CPU-based large models
- Role: Ollama backup + experimental model testing
Action Items
Immediate (Network Stability)
- Re-crimp all RJ45 connectors properly
- Test cable integrity with network tester
- Root cause of reduced speed identified: USB Ethernet adapter limiting the link to 100 Mbps
- Solution ordered: PCIe x1 RTL8111C Gigabit Ethernet card (arriving tomorrow)
- Install PCIe Ethernet card in available slot (00:1d.0)
- Configure new network interface and test gigabit speeds
Short-term (Infrastructure)
- Research and purchase OpenWrt-compatible router
- Implement dual-WAN configuration
- Set up centralized VPN management
Medium-term (Redundancy)
- Spec and build backup server
- Implement Ollama load balancing/failover
- Update Ansible configurations for new infrastructure
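The Ollama failover item above could start as a client-side health check that picks the first responding server. This is a rough sketch, not a finished load balancer: it assumes Ollama listens on its default port 11434 and answers `GET /api/tags`, and the backup address `192.168.3.26` is hypothetical because the backup server has not been built yet.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional

# The first entry is hecate (address from this postmortem).
# The second is a hypothetical address for the planned backup server.
BACKENDS = [
    "http://192.168.3.25:11434",
    "http://192.168.3.26:11434",
]


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


def pick_backend(
    backends: Iterable[str],
    check: Callable[[str], bool] = is_healthy,
) -> Optional[str]:
    """Return the first backend that passes the health check, else None."""
    for url in backends:
        if check(url):
            return url
    return None
```

Calling `pick_backend(BACKENDS)` before each request would give simple primary/backup failover. The `check` parameter is injectable so the selection logic can be exercised without a live server.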
Long-term (Monitoring)
- Deploy comprehensive monitoring stack
- Set up early warning systems for connectivity issues
- Document all network topology and configurations
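An early warning for the kind of intermittent disconnection seen in this incident could be built from a periodic reachability probe plus a flap counter. The sketch below is one possible shape under stated assumptions: the class name `FlapDetector`, the window size, and the threshold are illustrative choices, not part of any existing monitoring stack.

```python
import socket
from collections import deque


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FlapDetector:
    """Flag a link as unstable when it changes state too often in a window."""

    def __init__(self, window: int = 20, max_transitions: int = 3) -> None:
        self.samples = deque(maxlen=window)  # most recent probe results
        self.max_transitions = max_transitions

    def transitions(self) -> int:
        """Count up/down state changes between consecutive samples."""
        recent = list(self.samples)
        return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)

    def record(self, up: bool) -> bool:
        """Store one probe result; return True if the link looks unstable."""
        self.samples.append(up)
        return self.transitions() >= self.max_transitions
```

In use, a scheduler would call something like `detector.record(tcp_probe("192.168.3.25", 22))` every few seconds and raise an alert when it returns `True`; port 22 is an assumption about which service is reachable on hecate. Counting transitions rather than failures separates a flapping cable from a host that is simply powered off.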
Technical Notes
RJ45 Connector Best Practices
- A properly crimped connector compresses to standard dimensions
- It should insert easily and latch firmly with an audible click
- Test with cable tester after crimping
- When in doubt, re-crimp rather than troubleshoot intermittent issues
VPN Protocol Considerations
- AmneziaWG not yet available in standard router firmware
- May require separate VPS/server for AmneziaWG termination
- Standard protocols (WireGuard, OpenVPN, IPsec) are available on most routers
Lessons Learned
- Physical layer issues can masquerade as application problems: always check cables first
- Proper tooling and technique matter: invest time in learning correct crimping procedures
- Redundancy is critical for trading systems: single points of failure are unacceptable
- Network infrastructure should be professional-grade: consumer equipment has limitations
Next Steps
Focus on OpenWrt router solution for cost-effectiveness and maximum flexibility. This approach allows experimentation with custom VPN protocols while maintaining professional network management capabilities.