A 12-week path to a DevOps or SRE job in Canada.
Read the actual material before you decide.
Artifact 1 of 3 — A chapter from the manual
One-line definition
Splitting account and transaction data across multiple database nodes so no single node holds more than its capacity, while preserving the invariant that every debit has a matching credit.
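The definition above can be sketched in a few lines: a router that hashes the transaction ID so that both legs of a transfer land on the same shard, keeping the debit/credit invariant local to one node. This is a minimal illustration, not the program's material; the names (`shard_for`, `ledger_entries`) and the hash-on-transaction-ID scheme are assumptions for the sketch.

```python
import hashlib

NUM_SHARDS = 8

def shard_for(transaction_id: str) -> int:
    # Hash the transaction ID so both entries of one transfer
    # map to the same shard; the debit/credit invariant is then
    # enforced inside a single node, not across nodes.
    digest = hashlib.sha256(transaction_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def ledger_entries(transaction_id: str, src: str, dst: str, amount: int):
    shard = shard_for(transaction_id)
    return [
        {"shard": shard, "account": src, "delta": -amount},  # debit
        {"shard": shard, "account": dst, "delta": +amount},  # credit
    ]
```

Note the trade-off: sharding by transaction keeps each transfer atomic on one shard, while real systems that shard by account instead must handle transfers whose two accounts live on different shards.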
What it does
Failure shapes (excerpt — full chapter has 7)
F1 — Hot shard from a celebrity tenant
One shard's QPS is 5–40× the median. That shard's p99 query latency rises above 100ms. Remaining shards continue at normal load.
F4 — Monotonic key hot tail
Writes concentrate on the lexicographically last shard because account IDs are monotonically increasing. That one shard runs at 100% write IO while peers run at 5%.
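F4 is easy to reproduce on paper. A toy comparison, assuming range sharding by contiguous ID blocks versus hashing the key (shard counts and ID ranges here are made up for the demo): with monotonically increasing IDs, every new account falls in the last range block, while a hashed key spreads the same IDs across all shards.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8

def range_shard(account_id: int) -> int:
    # Range sharding: each shard owns a contiguous ID block.
    # Monotonically increasing IDs always hit the last block.
    return min(account_id // 1000, NUM_SHARDS - 1)

def hash_shard(account_id: int) -> int:
    # Hashing the key spreads new, increasing IDs evenly.
    digest = hashlib.sha256(str(account_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

new_ids = range(7000, 8000)  # the 1,000 newest accounts
print(Counter(range_shard(i) for i in new_ids))  # every write on shard 7
print(Counter(hash_shard(i) for i in new_ids))   # roughly even spread
```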
Artifact 2 of 3 — A ticket you'd execute
Objective
Understand and document how payment-service deployment works end to end.
Prove understanding by writing it back in your own words.
Deliverables (in your fork at docs/ticket-05.md)
Acceptance criteria
Artifact 3 of 3 — Interview question, three levels of answers
"Walk me through how you'd design a multi-region active-active Postgres setup. Scale is 100K writes per second at peak."
"I'd set up Postgres in two regions with replication between them. Use a load balancer to route traffic. If one region goes down, the other takes over. We'd need to handle conflicts somehow."
"For active-active across regions you have to pick a conflict resolution strategy — last-write-wins, CRDTs, or application-level reconciliation. Streaming replication doesn't give you active-active. You need logical replication with conflict handlers (BDR, pglogical) or application-sharded routing where each region owns specific account ranges. At 100K writes/sec sustained, we're past what a single Postgres primary can handle even in one region."
"Active-active Postgres at 100K writes/sec is fighting the tool. Before I design it, I want to understand what you're actually trying to achieve — because the right answer is probably not active-active Postgres.
When the goal is regional latency: you want sharded writes with regional affinity.
When the goal is RPO=0 disaster recovery: you want synchronous replication to a hot standby.
When the goal is global consistency at this scale: you want Spanner, CockroachDB, or YugabyteDB. They were built for this.
What's the underlying constraint? Let me show you why active-active Postgres is the wrong primitive for it."
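The "sharded writes with regional affinity" option from the answer above can be sketched as a static ownership map: each region is the sole writer for a slice of the account space, so cross-region write conflicts never arise. This is an illustration only; the region names and ID ranges are invented for the sketch.

```python
# Each region exclusively owns a slice of the account space.
# Writes for an account are routed to its home region; the other
# region only holds a replica of that slice.
REGION_OWNERSHIP = {
    "ca-central": range(0, 500),    # accounts 0-499 write in Canada
    "us-east":    range(500, 1000), # accounts 500-999 write in the US
}

def home_region(account_id: int) -> str:
    for region, ids in REGION_OWNERSHIP.items():
        if account_id in ids:
            return region
    raise KeyError(f"no region owns account {account_id}")
```

Because exactly one region ever accepts writes for a given account, there is nothing to reconcile; the cost is that a write from the "wrong" region pays a cross-region hop to the owner.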
33 tickets. One complete service. End to end.
Get access. Verify your environment.
Walk through cluster, CI/CD, GitOps, secrets, networking, observability, IAM, Postgres.
Fork. Scaffold. Containerize. Ship to the registry. Deploy to your namespace.
Provision a database. Connect it. Manage secrets properly.
Health checks. Graceful shutdown. Observability. Scaling.
The senior-level work that separates you from a junior.
Real failures injected during live sessions. Diagnose. Fix. Postmortem.
Prove your service is portable across cloud providers.
At graduation: 33 production-shaped tickets shipped across three real cloud environments.
Your GitHub history is the proof. Inspectable by any hiring manager.
Three repos. Same separation real teams use.
yarova/viro-services — YOU FORK THIS
yarova/viro-infra — YOU READ THIS
yarova/viro-tooling — YOU FORK THIS
Tickets are GitHub Issues opened from templates in your forks. Your closed issues become your visible portfolio.
Ten stages. Discovery to Day 90.
Every stage is structured. You always know where you are and what comes next.
Three honest profiles.
About 70% of applicants fit one of three profiles. The rest have requirements the program can't address — that's what the discovery call is for.
In Canada and the runway is short.
Switching into IT from a non-technical career.
Already in IT, going senior.
$1,500 CAD. One-time. Everything included.
Complete the program and don't feel ready → rejoin the next cohort free.
Your readiness is the goal. Not the timeline.