Stefan Tanta de20e0eea7 chore: housekeeping — CLAUDE.md, ansible.cfg, TODO + roadmap docs, website placeholder
- CLAUDE.md: minor refresh of provisioned-state notes
- iac/ansible/ansible.cfg: fact caching tweaks for the fleet
- TODO-postdeploy.md: post-cutover checklist items not yet captured in
  per-role READMEs
- TODO-roadmap.md: forward-looking ideas + the not-yet-planned bits
- raw_crackle_website_coming_soon/: static "coming soon" content served
  by the rawcrackle_site nginx LXC (default content source per the
  role's `rawcrackle_site_source_dir` default)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 17:20:15 +02:00

Raw Crackle Lab

Homelab + public-facing infrastructure for Raw Crackle, a hip-hop recording studio in Bucharest (recording, mixing, mastering, rehearsals).

Hosts the public website at rawcrackle.ro plus the internal stack the studio runs on: directory, SSO, password vault, mail server, file sync, reverse proxy, DNS, observability, and a self-developed WireGuard overlay (WireWarp) that tunnels public ingress through a rented VPS so the residential ISP doesn't pollute deliverability or expose the studio's IP.

Stack at a glance

                 internet
                    │
                    ▼
        ┌───────────────────────┐
        │  VPS (IPAX, AT)       │   public IP 37.252.189.57
        │  37.252.189.57        │   PTR → mail.rawcrackle.ro
        │  WireWarp tunnel-svr  │   CrowdSec + iptables-bouncer
        └───────────┬───────────┘
                    │  WireGuard tunnel
                    │  (real source IPs preserved end-to-end)
                    ▼
        ┌───────────────────────┐
        │  WireWarp gateway VM  │   .40.10
        │  (DNAT + NAT + FW)    │
        └───────────┬───────────┘
                    │
       ┌────────────┼────────────────────────────┐
       ▼            ▼                            ▼
  Traefik     mailcow VM                  other services
  .40.111     .40.201                     (LXCs on .40.0/24)
  (80/443)    (25/465/587/993/4190)

IP allocation (192.168.40.0/24)

| Tier | IP range | Examples |
|---|---|---|
| DNS | .5 | PiHole |
| Edge / hypervisor | .10–.19 | WireWarp gateway (.10), px40 (.11) |
| Infra LXCs (depended on by others) | .100–.119 | Traefik (.111), Authelia (.112), lldap (.113), WireWarp control (.114), Psono (.115) |
| App LXCs / VMs | .120–.199 | (reserved: peer workloads) |
| Public-facing / observability | .200–.229 | monitoring (.200), mailcow (.201), site (.202), nextcloud (.203), umami (.204), n8n (.205), homepage (.206) |
| Reserved / router | .230–.254 | .254 studio gateway |

Tier rule of thumb: anything in .100–.119 going down takes half the lab with it; .120–.229 workloads churn freely without touching infra. VMID follows the last octet (.40.111 → CT 111; .40.5 → CT 105 with the +100 alias for IPs < 100).
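
The VMID rule above can be written down in a few lines. This is only an illustrative sketch: the function name `vmid_for_ip` is hypothetical and does not come from the repo, and it assumes every host lives in 192.168.40.0/24.

```python
def vmid_for_ip(ip: str) -> int:
    """Map a 192.168.40.x address to its VMID per the README's rule."""
    last_octet = int(ip.rsplit(".", 1)[1])
    if not 1 <= last_octet <= 254:
        raise ValueError(f"not a usable host address: {ip}")
    # Octets below 100 get the +100 alias; everything else maps directly.
    return last_octet + 100 if last_octet < 100 else last_octet
```

Note that the rule as stated maps .10 and .110 to the same ID; the sketch does not check for such collisions.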

Services

| Service | URL | Auth | Surface |
|---|---|---|---|
| Public website | https://rawcrackle.ro | none | public |
| Webmail | https://mail.rawcrackle.ro | mailcow login → SOGo via SSO | public |
| Authelia portal | https://auth.rawcrackle.ro | self | public |
| Autodiscover / Autoconfig | https://{autodiscover,autoconfig}.rawcrackle.ro | mailcow | public (mail clients) |
| Nextcloud | https://cloud.rawcrackle.ro | lldap direct | public |
| Collabora Office (WOPI) | https://office.rawcrackle.ro | Nextcloud session | public (editor JS) |
| Umami tracker endpoint | https://track.rawcrackle.ro | none | public (website JS only) |
| Homepage (app launcher) | https://dash.int.rawcrackle.ro | none (LAN gate) | LAN-only |
| Mailcow admin | https://mailcow.infra.rawcrackle.ro/admin | mailcow | LAN-only |
| PiHole admin | https://pihole.infra.rawcrackle.ro/admin | Authelia (sso_pihole_admins) | LAN-only |
| Traefik dashboard | https://traefik.infra.rawcrackle.ro | Authelia (sso_traefik_admins) | LAN-only |
| lldap | https://lldap.infra.rawcrackle.ro | lldap | LAN-only |
| Proxmox | https://px.infra.rawcrackle.ro | PVE PAM | LAN-only |
| WireWarp | https://wirewarp.infra.rawcrackle.ro | WireWarp | LAN-only |
| Psono | https://stash.int.rawcrackle.ro | LDAP-direct | LAN-only |
| Grafana | https://grafana.infra.rawcrackle.ro | Grafana local | LAN-only |
| Prometheus | https://prometheus.infra.rawcrackle.ro | none (LAN gate) | LAN-only |
| Loki | https://loki.infra.rawcrackle.ro | none (LAN gate) | LAN-only |
| Uptime-Kuma | https://uptime.infra.rawcrackle.ro | Uptime-Kuma local | LAN-only |
| Umami admin | https://analytics.infra.rawcrackle.ro | Umami local | LAN-only |
| n8n | https://n8n.infra.rawcrackle.ro | n8n owner account | LAN-only |

LAN-only routes return HTTP 403 to public clients via Traefik's internal-only middleware (RFC1918 + WG-range allowlist). Access from outside the studio LAN goes through a WireWarp client config. This is defense in depth: even if public DNS were ever pointed at a private hostname, Traefik returns 403 before any backend sees the request.
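
The allowlist decision behind that 403 can be illustrated as below. This is a sketch of the logic only, not Traefik's middleware code. The RFC 1918 ranges are standard, but the WireGuard range is a placeholder (the README does not state the real one), and `gate_status` is a hypothetical name.

```python
from ipaddress import ip_address, ip_network

ALLOWED_NETS = [
    ip_network("10.0.0.0/8"),      # RFC 1918
    ip_network("172.16.0.0/12"),   # RFC 1918
    ip_network("192.168.0.0/16"),  # RFC 1918
    ip_network("100.64.0.0/24"),   # ASSUMPTION: placeholder WG client range
]

def gate_status(client_ip: str) -> int:
    """Return 200 if the request would be passed on, 403 if rejected."""
    addr = ip_address(client_ip)
    return 200 if any(addr in net for net in ALLOWED_NETS) else 403
```

This only works because the tunnel preserves real source IPs end-to-end, so the proxy sees the client's address rather than the gateway's.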

Mail

Self-hosted on mailcow. Mail leaves the studio via the WireWarp tunnel, so outbound is from the clean Austrian VPS IP rather than the studio's residential line. 10/10 on mail-tester.com on the first send — full DKIM + SPF + DMARC + PTR + FCrDNS chain. lldap drives mailbox lifecycle via mailcow's built-in LDAP IdP (members of mailcow_account get a mailbox auto-provisioned on first sync). Daily timer pulls the LE wildcard from Traefik's acme.json over SSH and reloads postfix/dovecot/nginx only on cert change.
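
The "reload only on cert change" behaviour amounts to comparing fingerprints before touching any service. A minimal sketch, assuming the fetched and installed certificates are available as PEM bytes; `sync_cert` and its callbacks are hypothetical names, not the role's actual implementation:

```python
import hashlib
from typing import Callable, Optional

def fingerprint(pem: bytes) -> str:
    """SHA-256 hex digest of the raw PEM bytes."""
    return hashlib.sha256(pem).hexdigest()

def sync_cert(
    fetched_pem: bytes,
    installed_pem: Optional[bytes],
    install: Callable[[bytes], None],
    reload_services: Callable[[], None],
) -> bool:
    """Install the cert and reload only when it differs from the current one."""
    if installed_pem is not None and fingerprint(fetched_pem) == fingerprint(installed_pem):
        return False  # unchanged: no write, no reload
    install(fetched_pem)
    reload_services()
    return True
```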

Upgrades are operator-driven, not via Ansible:

ssh root@192.168.40.201
cd /opt/mailcow-dockerized && ./update.sh

WireWarp

Tunnel orchestration is split across three pieces:

  • Control LXC (.40.114) — FastAPI + Postgres dashboard + WS hub, pinned to a specific upstream SHA in roles/wirewarp_control/defaults/main.yml. Bump the ref and re-run --tags wirewarp-control to take an upstream update; the role discards local edits and force-rebuilds the api image.
  • Gateway VM (.40.10) — tunnel client to the VPS, plus DNAT/SNAT for the LAN. Runs the WireWarp agent (wirewarp-agent, systemd-managed) and a per-attachment routing healer.
  • Tunnel server — the rented VPS at 37.252.189.57. Same agent, mode server. CrowdSec + iptables-firewall-bouncer optional (and one-click installable from the WireWarp dashboard: the agent does the apt install + cscli registration + auto-applies a whitelist covering every known IP and subnet in the environment).

The agent self-heals routing state every 60s: ip rule fwmark / per-table routes / mangle CONNMARK rules / MSS clamp / MASQUERADE all get verified and re-installed on drift. A wirewarp-routing.service systemd unit is installed on first attach so iptables state survives reboots even when the agent is down.
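
The verify-and-reinstall cycle described above is a reconcile loop. The sketch below shows one pass of it in the abstract; it is not the WireWarp agent's code, and `heal_once`, `is_present`, and `install` are hypothetical names standing in for whatever the agent uses to query and apply `ip rule` / iptables state.

```python
from typing import Callable, Iterable, List

def heal_once(
    desired: Iterable[str],
    is_present: Callable[[str], bool],
    install: Callable[[str], None],
) -> List[str]:
    """Check every desired rule and re-install the ones that drifted."""
    repaired = []
    for rule in desired:
        if not is_present(rule):
            install(rule)
            repaired.append(rule)
    return repaired
```

A real agent would call something like this from a timer every 60s. A pass that finds no drift installs nothing, so repeated passes are safe.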

Observability

Single-node monitoring stack on .40.200:

  • Prometheus — 90d retention, scrapes node_exporter + cAdvisor + Alloy + pihole-exporter + speedtest-exporter + restic-exporter + pve-exporter
  • Loki — 30d retention, fed by Alloy from every host's journald + Docker
  • Grafana — bundles dashboards for Traefik, Loki stack, host drilldown, fleet overview, PiHole, Restic, Nextcloud, Authelia
  • Uptime-Kuma — HTTP + push monitors (restic backups send heartbeats)

Every managed host runs the observability trio (node_exporter, cadvisor, alloy_agent) as sidecars — added by the Observability sidecars play that runs after common and before the rest of the stack.

Backups

restic to Backblaze B2, one timer per stateful host (see [backup_hosts] in the inventory). Each unit pings a heartbeat URL on success that Uptime-Kuma watches; a missed beat surfaces as an Uptime-Kuma incident.
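
The heartbeat pattern is "ping only on success, and let silence raise the alarm". A minimal sketch, assuming the backup is a single command and the heartbeat is a plain HTTP GET; the function names, the command, and the URL are all placeholders rather than values from the repo:

```python
import subprocess
import urllib.request
from typing import Callable, Sequence

def http_get(url: str) -> None:
    """Fire a GET and discard the body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()

def backup_then_beat(
    backup_cmd: Sequence[str],
    heartbeat_url: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
    ping: Callable[[str], None] = http_get,
) -> bool:
    """Run the backup; ping the heartbeat URL only if it exited 0."""
    if run(backup_cmd) != 0:
        return False  # no beat sent, so the push monitor eventually alerts
    ping(heartbeat_url)
    return True
```

Because a failed or never-started backup sends nothing, the monitor needs no knowledge of restic's exit codes; it only has to notice a missing beat.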

Repo layout

raw_crackle_lab/
├── CLAUDE.md                 # canonical playbook for AI assistants
├── README.md                 # you are here
├── TODO-postdeploy.md        # hands-on items left after the IaC rollout
├── TODO-roadmap.md           # phased "finish the lab" plan
├── _archive/                 # retired patches kept for context
│   └── mailcow-hacks/        # direct-SOGo-login override (retired 2026-05)
├── raw_crackle_website/      # vanilla static site (HTML/CSS/JS, no build)
└── iac/
    ├── ansible/
    │   ├── ansible.cfg, requirements.yml
    │   ├── inventory.ini
    │   ├── group_vars/all/{vars.yml, vault.yml{.example,}}
    │   ├── host_vars/        # per-host overrides (firewall_internal_ports etc.)
    │   ├── site.yml          # full deploy, dependency-ordered
    │   └── roles/
    │       ├── common                  # apt/docker/hostname/timezone baseline
    │       ├── node_exporter           ┐
    │       ├── cadvisor                ├─ observability sidecars (every host)
    │       ├── alloy_agent             ┘
    │       ├── pihole                  # LAN DNS + split-horizon
    │       ├── lldap                   # directory (replaces FreeIPA for ≤10 users)
    │       ├── authelia                # OIDC IdP + ForwardAuth
    │       ├── wirewarp_control        # tunnel control plane
    │       ├── wirewarp_client         # gateway agent
    │       ├── traefik                 # reverse proxy + ACME (Cloudflare DNS-01)
    │       ├── mailcow                 # full mail stack on a VM
    │       ├── nextcloud               # file sync + Collabora Office
    │       ├── psono                   # password vault (LDAP-direct)
    │       ├── rawcrackle_site         # public site (static nginx)
    │       ├── monitoring              # Prometheus + Grafana + Loki + Uptime-Kuma
    │       ├── restic                  # B2 backups + heartbeats
    │       ├── umami                   # privacy-friendly analytics
    │       ├── n8n                     # workflow automation
    │       ├── homepage                # central app launcher
    │       └── firewall_internal_port  # DOCKER-USER LAN-bypass lockdown
    └── terraform/
        ├── containers.tf, vms.tf       # bpg/proxmox containers + VMs
        ├── locals.tf, variables.tf
        ├── files/                      # cloud-init templates
        ├── templates.tf, providers.tf
        └── terraform.tfvars{.example,} # gitignored real values + schema

Quick start (ops)

# First time on a new workstation
cd iac/ansible
ansible-galaxy collection install -r requirements.yml

# Terraform (Proxmox LXCs/VMs + Cloudflare DNS)
cd ../terraform
terraform init
terraform plan
terraform apply -target=proxmox_virtual_environment_container.<name>   # narrow first
terraform apply

# Ansible deploy — one service, narrow first
cd ../ansible
ansible-playbook site.yml --tags mailcow --limit mailcow_hosts

# Full deploy (dependency-ordered: common → observability sidecars → DNS →
# lldap → WireWarp → Traefik → Authelia → mail/cloud/etc. → monitoring →
# restic → firewall lockdown)
ansible-playbook site.yml

The firewall_internal_port role intentionally runs last so every other service has populated its DOCKER-USER chain by then. It locks down any LAN host that has firewall_internal_ports defined in its host_vars, ensuring a LAN-only service can't be reached by bypassing Traefik (e.g. curl http://192.168.40.114:8200 is dropped at the host firewall).

Secrets

Both iac/ansible/group_vars/all/vault.yml and iac/terraform/terraform.tfvars are plaintext + gitignored. Copy the .example siblings, fill in real values, never commit the result. Tradeoff acknowledged in CLAUDE.md: anyone with checkout access has the keys; the mitigation is that the repo only lives on a trusted workstation. Add a new secret in both the live file and the .example so the schema stays in sync.

Do not echo plaintext from vault.yml into shell commands — Ansible substitutes {{ vault_<name> }} at template-render time, and operator-side secret access should go via the user's password manager (Psono), not the checkout.
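
Keeping the live file and its .example in sync can be checked without ever printing a secret, by comparing key names only. This is a hedged sketch: it assumes vault.yml is a flat file of top-level `key: value` entries, uses only the standard library rather than a real YAML parser, and does not cover terraform.tfvars. The function names are hypothetical.

```python
import re
from typing import Set, Tuple

# An unindented identifier followed by a colon; comment lines never match.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)[ \t]*:", re.MULTILINE)

def top_level_keys(yaml_text: str) -> Set[str]:
    """Collect top-level key names; values are never returned."""
    return set(TOP_LEVEL_KEY.findall(yaml_text))

def schema_drift(live_text: str, example_text: str) -> Tuple[Set[str], Set[str]]:
    """Return (keys missing from .example, keys missing from the live file)."""
    live = top_level_keys(live_text)
    example = top_level_keys(example_text)
    return live - example, example - live
```

Two empty sets mean the schema matches.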

Conventions

Carried over from the user's other homelab repo:

  • Unprivileged LXCs by default. Privileged only with documented justification.
  • Docker data under /opt/appdata/<service>, compose at /opt/<service>/.
  • Image tags pinned to Major.Minor — never :latest. Exceptions per service (Psono EE compound triple, Nextcloud AIO datestamp imaginary tag) documented in role defaults.
  • All deploys idempotent: re-run = 0 changed. Verified on every role.
  • PVE LXC .conf edits use lineinfile write-then-cp (never direct writes through pmxcfs FUSE — silently truncates).
  • OpenWrt/GL.iNet routers: network changes via LuCI only, never UCI scripts.
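
The image-tag convention in the list above lends itself to a mechanical check. The following sketch accepts only tags of the exact form Major.Minor; `is_pinned` is a hypothetical helper, and it would flag the documented per-service exceptions (such as the Psono EE compound tag), which would need an explicit allowlist.

```python
import re

PINNED_TAG = re.compile(r"^\d+\.\d+$")

def is_pinned(image_ref: str) -> bool:
    """True when the tag is exactly Major.Minor, e.g. `nginx:1.27`."""
    _name, sep, tag = image_ref.rpartition(":")
    # No colon at all, or the last colon belongs to a registry port.
    if not sep or "/" in tag:
        return False
    return bool(PINNED_TAG.match(tag))
```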

See CLAUDE.md for the full list, including session-learned gotchas (Cloudflare CNAME FQDN bug, Pi-hole v6 cache flush, browser DoH override, mailcow SOGo redirect loop, WireWarp MTU blackhole, …).

Reference