ChatGPT Prompts for DevOps & SRE

30 copy-paste prompts

Thirty battle-tested prompts that turn ChatGPT into a DevOps copilot — generating real CI/CD pipelines, Terraform, Kubernetes manifests, alerting rules, postmortems, and cost-cutting plans with [YOUR STACK] placeholders you swap in.

In short: This page contains 30 copy-paste-ready prompts, organized into 6 categories, each with a description and a pro tip. The first 15 prompts are free, with no signup needed. Hand-curated and tested by the AI Academy team.

By Louis Corneloup · Founder, Techpresso

Last updated May 15, 2026 · Hand-curated & tested by the AI Academy team

CI/CD Pipelines

5 prompts

Generate a Full CI/CD Pipeline

1/30

<context> CI platform: [GITHUB ACTIONS / GITLAB CI / CIRCLECI] App: [LANGUAGE + FRAMEWORK, e.g. Node 20 + Next.js] Build artifact: [DOCKER IMAGE / STATIC BUNDLE / JAR] Deploy target: [ECS / KUBERNETES / VERCEL / SSH HOST] Environments: [dev, staging, prod] Secrets store: [GITHUB SECRETS / VAULT / AWS SSM] </context> <task> Write a complete, production-ready pipeline config as a single file I can commit. 1. Trigger on push to main and on pull_request. 2. Stages: lint, test (with cache for deps), build, security scan, deploy. 3. Cache dependencies and the build layer to cut runtime. 4. Gate deploy to prod behind a manual approval and a successful staging deploy. 5. Inject secrets from the named store — never hardcode them. 6. Add concurrency control so superseded runs are cancelled. Output the full YAML, then a 3-bullet summary of what each stage does and any assumptions you made. </task>

Produces a complete, commit-ready pipeline file with caching, gating, and security scanning for your exact stack.

💡

Pro tip: Paste your existing pipeline first and ask ChatGPT to diff against best practices instead of starting from scratch.

Debug a Failing Pipeline

2/30

<context> CI platform: [GITHUB ACTIONS / GITLAB CI] Failing job: [JOB NAME] Error log (paste raw): [PASTE THE LAST 50 LINES OF THE FAILED LOG] What changed recently: [DEPENDENCY BUMP / CONFIG EDIT / NONE] </context> <task> Diagnose why this CI job is failing. 1. Identify the single root cause from the log, quoting the exact line. 2. List 2-3 plausible alternative causes and how to rule each out. 3. Give the precise fix as a code or config diff. 4. Suggest one guardrail (a test, a lint rule, a cache key change) that prevents this class of failure. Be concrete — no generic 'check your dependencies' advice. </task>

Turns a raw failure log into a quoted root cause, a diff-style fix, and a prevention guardrail.

💡

Pro tip: Always paste the raw log, not your paraphrase — ChatGPT reads stack traces and exit codes far better than summaries.

Add Automated Testing Gates

3/30

<context> Repo: [LANGUAGE + TEST FRAMEWORK] Current coverage: [NUMBER]% or unknown CI platform: [PLATFORM] Risk areas: [PAYMENTS / AUTH / DATA MIGRATION] </context> <task> Design a quality gate strategy for our pipeline. 1. Recommend which test tiers to run on PRs vs nightly (unit, integration, e2e, smoke). 2. Define a coverage threshold and how to enforce it without blocking every PR. 3. Add a fast 'changed-files-only' test path so PRs stay quick. 4. Specify what should hard-fail the build vs warn. Output the CI config snippet plus a short rationale for each gate. </task>

Gives a layered test-gate strategy with config and clear fail-vs-warn rules tuned to your risk areas.

💡

Pro tip: Ask it to optimize for PR feedback under 5 minutes — speed is what actually drives test adoption.

Blue-Green / Canary Deploy Plan

4/30

<context> App: [SERVICE NAME + RUNTIME] Orchestrator: [KUBERNETES / ECS / NOMAD] Traffic router: [INGRESS-NGINX / ISTIO / ALB / FLAGGER] Rollback SLA: [e.g. under 2 minutes] </context> <task> Design a progressive delivery strategy for this service. 1. Recommend blue-green vs canary for our setup and justify it. 2. Define the traffic-shift steps (e.g. 5% then 25% then 100%) with promotion criteria. 3. List the metrics that must stay green to auto-promote and the thresholds that trigger auto-rollback. 4. Provide the config for the chosen router and a manual rollback command. Keep it specific to the named orchestrator and router. </task>

Outputs a progressive-delivery plan with traffic steps, promotion metrics, and a ready-to-run manual rollback command.

💡

Pro tip: Name your real traffic layer (Istio, ALB) and any delivery controller on top of it (such as Flagger); generic advice is useless here because the config differs wildly.
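The promotion and rollback rules from step 3 of this prompt boil down to a small decision loop. The Python sketch below assumes hypothetical gates (1% error rate, 500 ms p99) and a caller-supplied function that measures the canary at each weight; controllers such as Flagger or Argo Rollouts run this kind of loop for you, so treat it only as a way to reason about the criteria you ask ChatGPT for.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Health:
    error_rate: float  # fraction of failed requests, e.g. 0.004 = 0.4%
    p99_latency_ms: float


# Hypothetical gates and steps; tune these to your own SLOs.
MAX_ERROR_RATE = 0.01
MAX_P99_MS = 500.0
STEPS = [5, 25, 100]  # percent of traffic sent to the canary


def run_canary(measure: Callable[[int], Health]) -> tuple[str, int]:
    """Walk the traffic steps; return ("promoted", 100) or ("rolled_back", weight)."""
    for weight in STEPS:
        health = measure(weight)  # observe the canary at this traffic weight
        if health.error_rate > MAX_ERROR_RATE or health.p99_latency_ms > MAX_P99_MS:
            return "rolled_back", weight  # shift all traffic back to the stable version
    return "promoted", STEPS[-1]
```

A canary that starts failing once it takes 25% of traffic is rolled back at that step instead of reaching 100%.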

Write a Release & Versioning Workflow

5/30

<context> Repo style: [MONOREPO / SINGLE PACKAGE] Language/ecosystem: [NODE / GO / PYTHON] Versioning: [SEMVER / CALVER] Registry: [NPM / GHCR / DOCKER HUB] </context> <task> Build an automated release workflow. 1. Choose a release tool (e.g. semantic-release or changesets) and a commit convention (e.g. Conventional Commits), and explain why they fit our setup. 2. Generate the CI config that bumps version, builds, tags, and publishes on merge to main. 3. Auto-generate a CHANGELOG from commit history. 4. Create a draft GitHub Release with the notes attached. Output the workflow file plus the commit-message convention contributors must follow. </task>

Produces an end-to-end automated release pipeline with versioning, changelog, and registry publish.

💡

Pro tip: Tell it your team's commit discipline honestly — if commits are messy, conventional-commits enforcement should be part of the answer.
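With Conventional Commits, the next SemVer version follows mechanically from the commit messages: `fix` bumps the patch, `feat` bumps the minor, and a `!` after the type (or a `BREAKING CHANGE:` footer) bumps the major. Tools like semantic-release do this for you; the Python sketch below only illustrates the rule and deliberately ignores pre-1.0 and pre-release handling.

```python
import re

# type, optional (scope), optional "!", then ": "
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<bang>!)?: ")
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def next_version(current: str, messages: list[str]) -> str:
    """Return the next SemVer string for a list of full commit messages."""
    major, minor, patch = (int(x) for x in current.split("."))
    bump = None
    for msg in messages:
        m = HEADER.match(msg)
        if m is None:
            continue  # not a conventional commit; ignored in this sketch
        # Simplification: any "BREAKING CHANGE:" text counts, not only true footers.
        if m.group("bang") or "BREAKING CHANGE:" in msg or "BREAKING-CHANGE:" in msg:
            level = "major"
        elif m.group("type") == "feat":
            level = "minor"
        elif m.group("type") == "fix":
            level = "patch"
        else:
            level = None  # docs, chore, ci, ... do not trigger a release
        if RANK[level] > RANK[bump]:
            bump = level
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

This is also why messy commit history matters: messages that do not match the convention are simply skipped, so the release may not bump at all.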

Prompts get you started. Tutorials level you up.

A growing library of 300+ hands-on AI tutorials. New tutorials added every week.

Start 7-Day Free Trial

Infrastructure as Code (Terraform)

5 prompts

Scaffold a Terraform Module

6/30

<context> Cloud: [AWS / GCP / AZURE] Resource to provision: [e.g. VPC + ECS Fargate service + ALB] Environments: [dev, staging, prod] State backend: [S3 + DYNAMODB / GCS / TERRAFORM CLOUD] Naming convention: [e.g. {env}-{service}-{resource}] </context> <task> Write a reusable Terraform module for this infrastructure. 1. Structure it as main.tf, variables.tf, outputs.tf with sensible variable defaults. 2. Parameterize anything that differs per environment. 3. Tag every resource with environment, owner, and cost-center variables. 4. Show the root configuration that calls the module and sets up the named remote state backend with locking (Terraform only honors backend blocks in the root module). 5. Add input validation blocks on critical variables. Output each file in a separate code block and a short usage example. </task>

Generates a clean, parameterized Terraform module with remote state, tagging, and input validation.

💡

Pro tip: Ask for variable validation blocks explicitly — they catch bad inputs at plan time instead of mid-apply.

Review Terraform for Security & Drift

7/30

<context> Cloud: [AWS / GCP / AZURE] Terraform (paste): [PASTE YOUR .tf FILES OR PLAN OUTPUT] Compliance needs: [SOC2 / HIPAA / NONE] </context> <task> Audit this Terraform for security and reliability issues. 1. Flag public exposure (open security groups, public buckets, 0.0.0.0/0 rules) with the exact resource and line. 2. Check encryption at rest and in transit on every storage and DB resource. 3. Identify hardcoded secrets or values that should be variables. 4. Note missing lifecycle rules (prevent_destroy on stateful resources, ignore_changes where needed). 5. Rank findings by severity and give the fix diff for each. Be specific to the resources shown — do not invent issues. </task>

Audits your actual Terraform for exposed resources, missing encryption, and hardcoded secrets with ranked fixes.

💡

Pro tip: Paste your plan output too — it reveals drift and computed values that the raw .tf files hide.

Convert Manual Console Setup to Terraform

8/30

<context> Cloud: [AWS / GCP / AZURE] What I built by hand in the console: [DESCRIBE THE RESOURCES, OR PASTE describe/show CLI OUTPUT] State backend: [BACKEND] </context> <task> Help me codify this click-ops infrastructure into Terraform. 1. Produce the Terraform resource blocks that match what I described. 2. List the exact 'terraform import' commands (or import blocks, if we are on Terraform 1.5 or later) to adopt the existing resources without recreating them. 3. Flag any settings I likely forgot to mention (defaults the console set silently). 4. Warn me about any resource that import does not support cleanly. Give me the imports in the safe order to run them. </task>

Reverse-engineers console-created infrastructure into Terraform with the exact import commands to adopt it safely.

💡

Pro tip: Paste the CLI describe output for each resource — it captures silent console defaults that descriptions miss.

Design a Multi-Environment Structure

9/30

<context> Cloud: [CLOUD] Environments: [dev, staging, prod] Team size: [SMALL / MEDIUM / LARGE] Current pain: [COPY-PASTED FOLDERS / DRIFT / SLOW APPLIES] </context> <task> Recommend a Terraform repo structure for multiple environments. 1. Compare workspaces vs directory-per-env vs Terragrunt for our team size and pick one. 2. Show the resulting folder tree. 3. Explain how shared modules, per-env variables, and state are separated. 4. Describe the apply workflow (who runs what, in what order) and how prod is protected. Justify the trade-offs — do not just give the most popular answer. </task>

Recommends and lays out a multi-environment Terraform structure with a clear apply workflow and trade-off analysis.

💡

Pro tip: State your team size and pain point — the right answer for a 3-person team is wrong for a 50-person org.

Explain a Terraform Plan Before You Apply

10/30

<context> Terraform plan output (paste): [PASTE FULL terraform plan OUTPUT] Change I intended: [WHAT I MEANT TO CHANGE] </context> <task> Explain this plan in plain language before I apply. 1. Summarize every create, update, and especially destroy/replace in one line each. 2. Flag any resource being replaced (destroyed and recreated) and what data loss or downtime that risks. 3. Tell me whether the plan matches my stated intent or if there is unexpected drift. 4. Give a go / no-go recommendation with reasoning. Highlight destructive changes in bold at the top. </task>

Translates a raw terraform plan into a human summary that surfaces destructive replacements before you apply.

💡

Pro tip: Run this before every prod apply; mistaking a replace for an in-place update is exactly what causes accidental outages.
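As a mechanical cross-check on ChatGPT's summary, `terraform show -json <planfile>` emits a machine-readable plan whose `resource_changes[].change.actions` lists the operations per resource; a replacement shows up as both `delete` and `create`. The Python sketch below assumes you ran `terraform plan -out=tfplan` and then `terraform show -json tfplan > plan.json`.

```python
import json


def load_plan(path: str) -> dict:
    """Read the JSON produced by `terraform show -json <planfile>`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def classify_plan(plan: dict) -> dict[str, list[str]]:
    """Group resource addresses by what the plan will do to them."""
    groups: dict[str, list[str]] = {"replace": [], "delete": [], "update": [], "create": []}
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        addr = rc["address"]
        if {"delete", "create"} <= actions:
            groups["replace"].append(addr)  # destroyed and recreated: data loss / downtime risk
        elif "delete" in actions:
            groups["delete"].append(addr)
        elif "update" in actions:
            groups["update"].append(addr)
        elif "create" in actions:
            groups["create"].append(addr)
        # "no-op" and "read" actions are skipped
    return groups
```

Call `classify_plan(load_plan("plan.json"))` and read the `replace` and `delete` lists first; anything there deserves a second look before you apply.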

Docker & Kubernetes

5 prompts

Write a Production Dockerfile

11/30

<context> App: [LANGUAGE + FRAMEWORK + VERSION] Build step: [e.g. npm run build / go build / pip install] Runtime port: [PORT] Constraints: [SMALL IMAGE / NON-ROOT REQUIRED / DISTROLESS] </context> <task> Write an optimized, secure Dockerfile for this app. 1. Use a multi-stage build to keep the final image minimal. 2. Run as a non-root user and keep the image compatible with a read-only root filesystem at runtime. 3. Order layers for maximum cache reuse (dependencies before source). 4. Add a HEALTHCHECK and pin the base image to a digest, not a floating tag. 5. Document the final image size estimate and what each stage does. Output the Dockerfile, a matching .dockerignore, and the build command. </task>

Produces a multi-stage, non-root, cache-optimized Dockerfile plus a .dockerignore and build command.

💡

Pro tip: Ask it to pin the base image to a SHA digest — floating tags like :latest silently break reproducible builds.

Generate Kubernetes Manifests

12/30

<context> Service: [NAME + RUNTIME] Image: [REGISTRY/IMAGE:TAG] Replicas: [NUMBER] Resource needs: [CPU/MEM REQUESTS + LIMITS, or unknown] Exposure: [INTERNAL / PUBLIC VIA INGRESS] Config source: [CONFIGMAP / SECRET / ENV] </context> <task> Generate complete Kubernetes manifests for this service. 1. Deployment with resource requests/limits, liveness and readiness probes. 2. Service and (if public) an Ingress with the correct annotations. 3. ConfigMap and Secret references — never inline secret values. 4. A PodDisruptionBudget and an HPA based on CPU/memory. 5. securityContext: non-root, drop all capabilities, read-only root FS. Output each manifest separately and note any value I should tune. </task>

Generates a full Kubernetes manifest set with probes, autoscaling, disruption budgets, and a hardened securityContext.

💡

Pro tip: If you do not know resource requests yet, ask it to suggest conservative starting values and how to right-size from metrics later.

Debug a CrashLoopBackOff Pod

13/30

<context> Symptom: [CrashLoopBackOff / ImagePullBackOff / OOMKilled / Pending] kubectl describe output (paste): [PASTE kubectl describe pod OUTPUT] Logs (paste): [PASTE kubectl logs --previous OUTPUT] </context> <task> Diagnose why this pod is not running. 1. State the single most likely cause, quoting the events or log line that proves it. 2. Give the exact kubectl commands to confirm the diagnosis. 3. Provide the fix — manifest change, resource bump, or image correction — as a diff. 4. If it is OOMKilled or resource-related, recommend specific requests/limits. No generic checklists — diagnose from the pasted output. </task>

Pinpoints why a pod is crashing from your describe and log output, with confirming commands and a fix diff.

💡

Pro tip: Always include kubectl logs --previous; a crash-looping container has usually just restarted, so its current logs are empty or miss the crash.

Optimize Resource Requests & Limits

14/30

<context> Workload: [SERVICE NAME] Observed usage (paste from metrics): [PASTE kubectl top / PROMETHEUS p95 CPU & MEM] Current requests/limits: [VALUES OR none] Goal: [STABILITY / DENSITY / COST] </context> <task> Right-size this workload's resources. 1. Recommend requests and limits from the observed p95/p99 usage, with headroom. 2. Explain the QoS class the result lands in and whether that fits the goal. 3. Flag if limits risk CPU throttling or OOM kills. 4. Suggest whether an HPA or VPA is the better autoscaler here. Show the resources block ready to paste into the manifest. </task>

Turns observed usage metrics into right-sized requests, limits, QoS class, and an autoscaler recommendation.

💡

Pro tip: Feed it p95/p99 numbers, not averages — sizing on averages is exactly how pods get OOMKilled under load spikes.
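The arithmetic behind step 1 is simple enough to sanity-check ChatGPT's numbers yourself. The Python sketch below assumes a hypothetical policy: requests at p95 plus 20% headroom, a memory limit at p99 plus 30%, and no CPU limit (one common way to avoid throttling). These ratios are illustrative, not a standard.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_resources(cpu_millicores: list[float], mem_mib: list[float],
                      request_headroom_pct: int = 20,
                      limit_headroom_pct: int = 30) -> dict[str, str]:
    """Hypothetical sizing policy: requests from p95, memory limit from p99."""
    cpu_req = math.ceil(percentile(cpu_millicores, 95) * (100 + request_headroom_pct) / 100)
    mem_req = math.ceil(percentile(mem_mib, 95) * (100 + request_headroom_pct) / 100)
    mem_lim = max(mem_req, math.ceil(percentile(mem_mib, 99) * (100 + limit_headroom_pct) / 100))
    return {
        "requests.cpu": f"{cpu_req}m",
        "requests.memory": f"{mem_req}Mi",
        "limits.memory": f"{mem_lim}Mi",  # no CPU limit: a policy choice to avoid throttling
    }
```

Because requests are lower than the memory limit here, the pod lands in the Burstable QoS class; setting requests equal to limits for every container would make it Guaranteed instead.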

Write a Helm Chart or Kustomize Overlay

15/30

<context> Tooling: [HELM / KUSTOMIZE] Base service: [DESCRIBE OR PASTE EXISTING MANIFESTS] Environments needing overrides: [dev, staging, prod] Values that differ per env: [REPLICAS, IMAGE TAG, RESOURCES, INGRESS HOST] </context> <task> Templatize this service for multiple environments. 1. For Helm: produce Chart.yaml, values.yaml, and templated manifests with the per-env values exposed. 2. For Kustomize: produce a base plus one overlay per environment with patches. 3. Keep secrets out of values — reference an external secret source. 4. Show the install/apply command for each environment. Use the tool I named and explain any non-obvious templating choice. </task>

Templatizes your service into a Helm chart or Kustomize overlay set with per-environment overrides and install commands.

💡

Pro tip: Pick one tool and commit — asking for both Helm and Kustomize produces shallow output for each.

Observability & Alerting

5 prompts

Design SLOs and Error Budgets

16/30

<context> Service: [NAME + WHAT IT DOES] User-facing operations: [e.g. checkout, search, login] Current reliability: [KNOWN UPTIME OR unknown] Metrics source: [PROMETHEUS / DATADOG / CLOUDWATCH] </context> <task> Define SLOs and error budgets for this service. 1. Recommend 2-3 SLIs (availability, latency, error rate) that reflect real user pain. 2. Set target SLOs with realistic numbers and justify them — not blanket 99.99%. 3. Calculate the monthly error budget for each. 4. Define an error-budget policy: what the team does when the budget is burning fast. Keep targets defensible for our stated reliability today. </task>

Defines user-centric SLIs, realistic SLO targets, error budgets, and a burn-rate policy for your service.

💡

Pro tip: Push back on 99.99% defaults — start where you actually are today, then tighten, or the SLO loses all meaning.
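Step 3 is plain arithmetic: the error budget is the share of the window the SLO allows to fail. Over a 30-day window (an assumption; some teams use calendar months), a 99.9% availability SLO leaves 0.1% of 43,200 minutes, about 43.2 minutes. A small Python helper to check the numbers ChatGPT gives you:

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    return 1 - bad_minutes / error_budget_minutes(slo_percent, window_days)


for slo in (99.0, 99.5, 99.9, 99.95):
    print(f"{slo}% -> {error_budget_minutes(slo):.1f} min per 30 days")
```

Each extra "nine" divides the budget by ten, which is why a blanket 99.99% target (about 4.3 minutes a month) is so hard to defend.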

Write Prometheus Alerting Rules

17/30

<context> Service: [NAME] Metrics available: [LIST KEY METRIC NAMES, e.g. http_requests_total, http_request_duration_seconds] SLOs: [FROM YOUR SLO DOC, OR describe] Notification target: [PAGERDUTY / SLACK / OPSGENIE] </context> <task> Write Prometheus alerting rules tied to symptoms, not causes. 1. Create multi-window, multi-burn-rate alerts for the SLOs (fast burn pages, slow burn tickets). 2. Add saturation alerts for CPU, memory, and disk with sensible 'for' durations. 3. Avoid noisy single-spike alerts; require sustained breaches. 4. Include clear labels (severity, team) and annotation templates with a runbook link. Output a valid Prometheus alerting-rules YAML file with PromQL expressions. </task>

Produces symptom-based Prometheus alert rules with multi-burn-rate logic and runbook annotations.

💡

Pro tip: Ask for multi-burn-rate alerts specifically; they page on fast burns but only open tickets for slow ones, which cuts alert fatigue.
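Burn rate is the observed error ratio divided by the ratio the SLO allows; a burn rate of 1 spends exactly the whole budget over the SLO window. In the multi-window pattern described in Google's SRE Workbook, a page fires when, for example, 2% of a 30-day budget is consumed in one hour, which works out to a burn rate of 2% × 720 h / 1 h = 14.4. The Python sketch below shows that arithmetic and the two-window check; the thresholds are the workbook's example values, not requirements.

```python
SLO = 0.999  # 99.9% of requests succeed
ALLOWED_ERROR_RATIO = 1 - SLO
PERIOD_HOURS = 30 * 24  # 30-day SLO window


def burn_rate(error_ratio: float) -> float:
    """How many times faster than 'exactly on budget' we are failing."""
    return error_ratio / ALLOWED_ERROR_RATIO


def threshold(budget_fraction: float, alert_window_hours: float) -> float:
    """Burn rate that consumes budget_fraction of the budget within the alert window."""
    return budget_fraction * PERIOD_HOURS / alert_window_hours


# Fast burn: 2% of budget in 1h -> page (long window 1h, short window 5m).
FAST = threshold(0.02, 1)  # ~14.4
# Slow burn: 10% of budget in 3 days -> ticket (long window 3d, short window 6h).
SLOW = threshold(0.10, 72)  # ~1.0


def evaluate(long_ratio: float, short_ratio: float, limit: float) -> bool:
    """Fire only if both windows exceed the limit: sustained AND still happening."""
    return burn_rate(long_ratio) > limit and burn_rate(short_ratio) > limit
```

The short window is what stops the page quickly once the problem is fixed, even though the long window still remembers the spike.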

Build a Dashboard Spec

18/30

<context> Tool: [GRAFANA / DATADOG / CLOUDWATCH] Service: [NAME + ARCHITECTURE] Audience: [ON-CALL ENGINEER / EXEC / TEAM] Key metrics: [LIST OR describe] </context> <task> Design a dashboard layout for this audience. 1. Organize panels by the RED method (Rate, Errors, Duration) for services or USE (Utilization, Saturation, Errors) for infra. 2. Put the 3 panels that answer 'is it broken right now' at the top. 3. Specify each panel: metric/query, visualization type, and threshold colors. 4. Note which panels link to deeper drill-down dashboards. If Grafana, output the panel queries; otherwise describe each panel precisely. </task>

Lays out a purpose-built dashboard organized by RED/USE with panel queries and drill-down structure.

💡

Pro tip: Tell it the audience — an on-call dashboard and an exec dashboard share almost no panels.

Set Up Structured Logging

19/30

<context> Language/framework: [STACK] Log destination: [LOKI / ELASTIC / CLOUDWATCH / DATADOG] Current logging: [PLAINTEXT / NONE / INCONSISTENT] Compliance: [PII REDACTION NEEDED? YES/NO] </context> <task> Design a structured logging standard for this service. 1. Define a JSON log schema: timestamp, level, message, trace_id, span_id, and key context fields. 2. Recommend log levels and what belongs at each — stop logging everything at INFO. 3. Show the logger config/snippet for our stack. 4. If PII is in scope, specify fields to redact and how. 5. Add correlation so logs join with traces. Output the schema and a code snippet. </task>

Establishes a JSON logging schema with trace correlation, level discipline, and PII redaction for your stack.

💡

Pro tip: Insist on a trace_id field — without it, logs and traces never join and on-call debugging stays painful.
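For a Python service, the schema from step 1 can be sketched with only the standard `logging` and `json` modules. This is a minimal sketch, not a substitute for a maintained library such as structlog or python-json-logger; it assumes trace and span IDs are passed per call via `extra=` (in practice they usually come from your tracing SDK's context), and the PII field names in `REDACT` are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

REDACT = {"email", "password", "card_number"}  # hypothetical PII field names


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "trace_id": getattr(record, "trace_id", None),  # joins logs with traces
            "span_id": getattr(record, "span_id", None),
        }
        # Extra context fields; PII is redacted before it leaves the process.
        for key, value in getattr(record, "ctx", {}).items():
            entry[key] = "[REDACTED]" if key in REDACT else value
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


log = get_logger("checkout")
log.info("payment authorized",
         extra={"trace_id": "4bf92f35", "span_id": "00f067aa",  # placeholder IDs
                "ctx": {"order_id": 42, "email": "a@example.com"}})
```

Each line is one JSON object, so Loki, Elastic, CloudWatch, or Datadog can index `trace_id` without regex parsing.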

Reduce Alert Noise

20/30

<context> Alerting tool: [TOOL] Noisy alerts (paste a sample week): [PASTE ALERT TITLES + FIRE COUNTS] On-call pain: [PAGED AT NIGHT / IGNORED ALERTS / DUPLICATES] </context> <task> Audit our alerting for noise and fix it. 1. Categorize each alert as actionable, informational, or pure noise. 2. For noisy ones, recommend deletion, threshold change, longer 'for' duration, or grouping/inhibition. 3. Identify alerts that should be tickets, not pages. 4. Propose a deduplication/grouping rule so one incident sends one page. Return a table: alert, verdict, exact change. </task>

Triages your real alert list into actionable vs noise and prescribes the exact threshold, grouping, or deletion fix.

💡

Pro tip: Bring a real week of fire counts; a handful of alerts often generate most of your pages, and real numbers let ChatGPT find them quickly.
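Finding the few alerts behind most pages is a Pareto cut over the fire counts you paste. The Python sketch below assumes the week has been exported as alert-name/count pairs; the alert names and counts are invented for illustration.

```python
def top_offenders(fire_counts: dict[str, int], share: float = 0.8) -> list[tuple[str, int]]:
    """Smallest prefix of alerts (sorted by count, descending) covering `share` of all fires."""
    total = sum(fire_counts.values())
    picked: list[tuple[str, int]] = []
    running = 0
    for name, count in sorted(fire_counts.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        picked.append((name, count))
        running += count
    return picked


week = {"DiskUsageHigh": 212, "PodRestarted": 140, "CPUSpike": 61,
        "CheckoutErrorBudgetBurn": 4, "CertExpiringSoon": 2}
for name, count in top_offenders(week):
    print(f"{name}: {count}")
```

Here two alerts account for 352 of 419 fires (about 84%), so they are the first candidates for a threshold change, a longer `for` duration, or demotion to a ticket.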

Incident Response & Postmortems

5 prompts

Live Incident Triage Assistant

21/30

<context> Symptom: [WHAT USERS SEE, e.g. 5xx spike on checkout] Started: [TIME] Recent changes: [DEPLOYS / CONFIG / NONE in last 2h] Signals (paste any): [ERROR RATES, LATENCY, SATURATION] Stack: [YOUR STACK] </context> <task> Act as my incident triage partner. Be fast and decisive. 1. Give me the top 3 hypotheses ranked by likelihood for this symptom + recent changes. 2. For each, the single fastest check to confirm or kill it (exact command/query). 3. The safest immediate mitigation (rollback, scale, feature flag, traffic shift) even before root cause. 4. What to communicate to stakeholders right now in one sentence. Lead with the action, not the explanation. </task>

Acts as a fast triage partner: ranked hypotheses, the quickest check for each, and a safe immediate mitigation.

💡

Pro tip: Use this DURING the incident — prioritize the mitigation step; root cause can wait for the postmortem.

Write a Blameless Postmortem

22/30

<context> Incident: [ONE-LINE SUMMARY] Impact: [WHO, HOW MANY, HOW LONG, $ IF KNOWN] Timeline (rough notes): [PASTE YOUR RAW NOTES + TIMESTAMPS] Root cause as you understand it: [DESCRIBE] </context> <task> Write a blameless postmortem from these notes. 1. Sections: summary, impact, detailed timeline, root cause, contributing factors, what went well, what went wrong, action items. 2. Use the 5-whys to get past the proximate cause to the systemic one. 3. Keep language blameless — focus on systems and gaps, never individuals. 4. Make every action item specific, owned, and dated; mark each as prevent, detect, or mitigate. Output the full document in markdown. </task>

Turns raw incident notes into a complete blameless postmortem with a 5-whys root cause and owned action items.

💡

Pro tip: Ask it to tag each action item as prevent/detect/mitigate — a postmortem with only fixes and no faster-detection items is incomplete.

Generate an Incident Runbook

23/30

<context> Failure scenario: [e.g. database connection pool exhausted] Service: [NAME + STACK] Who responds: [ON-CALL ROLE] Tools available: [kubectl, psql, dashboards, etc.] </context> <task> Write a runbook for this specific failure. 1. Detection: how on-call knows this is happening (the alert and the dashboard panel). 2. Diagnosis: ordered checks with exact commands to confirm the scenario. 3. Mitigation: step-by-step actions, with the safest reversible one first. 4. Escalation: when and to whom, with the trigger condition. 5. Verification: how to confirm recovery. Write it so a tired engineer at 3am can follow it without thinking. </task>

Produces a copy-ready runbook with detection, exact diagnostic commands, mitigation order, escalation, and verification.

💡

Pro tip: Write runbooks for your top 3 alert types first — those cover most pages and pay back the effort immediately.

Draft Status Page & Stakeholder Updates

24/30

<context> Incident severity: [SEV1 / SEV2 / SEV3] What is affected: [SERVICE / FEATURE] Current status: [INVESTIGATING / IDENTIFIED / MONITORING / RESOLVED] Audience: [PUBLIC STATUS PAGE / INTERNAL EXECS / ENTERPRISE CUSTOMERS] </context> <task> Write incident communications for this stage. 1. A public status-page post — honest, no jargon, no blame, no premature ETA. 2. A one-line internal exec update with business impact. 3. A follow-up template for the next update with a stated cadence. Match tone to severity. Never overpromise a resolution time. </task>

Generates aligned status-page, exec, and follow-up incident messages tuned to severity and audience.

💡

Pro tip: Keep this prompt handy in a snippet — clear comms during a SEV1 buys you trust you cannot earn afterward.

Chaos / Game Day Scenario Design

25/30

<context> System: [DESCRIBE ARCHITECTURE + DEPENDENCIES] Maturity: [NEVER DONE CHAOS / OCCASIONAL / REGULAR] Blast radius limit: [STAGING ONLY / PROD OFF-PEAK] </context> <task> Design a game day exercise to test our resilience. 1. Propose 3 failure scenarios ordered from safest to scariest (e.g. kill a pod, drop a dependency, simulate AZ loss). 2. For each: hypothesis, the exact fault to inject, what we expect to happen, and the abort condition. 3. Define the metrics that prove graceful degradation vs cascading failure. 4. List the rollback/stop plan and who holds the kill switch. Keep blast radius within the limit I stated. </task>

Designs a graduated game day with clear hypotheses, fault injections, success metrics, and abort conditions.

💡

Pro tip: Start with the safest scenario in staging — a single pod kill teaches more than you expect before you touch prod.


Automation Scripting & Cloud Cost

5 prompts

Write a Robust Shell Script

26/30

<context> Task the script must do: [DESCRIBE] Shell: [BASH / SH / ZSH] Runs where: [CI / CRON / LOCAL / CONTAINER] Inputs: [ARGS / ENV VARS] </context> <task> Write a production-quality shell script for this task. 1. Start with set -euo pipefail and a clear usage/help function. 2. Validate inputs and fail early with helpful messages. 3. Make it idempotent: re-running it must not repeat or compound side effects. 4. Add logging, a --dry-run flag, and trap-based cleanup on exit. 5. Quote all variables and avoid common bashisms if shell is sh. Output the full script with brief inline comments only where non-obvious. </task>

Produces a safe, idempotent shell script with strict mode, input validation, dry-run, and cleanup traps.

💡

Pro tip: Always demand set -euo pipefail and --dry-run — they turn a dangerous one-off into something you can trust in cron.

Convert a Manual Process to Automation

27/30

<context> Manual process (step by step): [LIST THE STEPS YOU DO BY HAND] Frequency: [HOW OFTEN] Tools/CLIs involved: [LIST] Preferred language: [BASH / PYTHON] </context> <task> Automate this manual runbook. 1. Map each manual step to an automatable command or API call. 2. Flag steps that need human judgment and should stay manual or become a confirmation prompt. 3. Write the script with error handling and a clear log of what it did. 4. Suggest how to schedule it (cron, CI, scheduled lambda) and how to alert on failure. Call out anything risky to automate without a safety check. </task>

Converts a hand-run process into a scheduled, error-handled script while flagging steps that should stay manual.

💡

Pro tip: Be honest about which steps need judgment — automating an irreversible deletion without a confirm step is how outages happen.

Find Cloud Cost Savings

28/30

<context> Cloud: [AWS / GCP / AZURE] Monthly spend: [APPROX $] Biggest cost lines (paste from billing/cost explorer): [PASTE TOP SERVICES BY COST] Workload pattern: [STEADY / BURSTY / DEV-HEAVY] </context> <task> Find concrete cost-reduction opportunities. 1. For each top cost line, list specific levers (right-sizing, savings plans/committed use, spot, storage tiering, idle cleanup). 2. Estimate the rough % saving and the risk/effort for each. 3. Identify quick wins (zero-risk, this week) vs structural changes. 4. Flag anything that would hurt reliability if cut, so we avoid it. Rank recommendations by savings-per-effort. </task>

Turns your top billing lines into ranked, risk-rated cost-savings levers with quick wins called out.

💡

Pro tip: Paste real cost-explorer data — generic 'use spot instances' advice is worthless without knowing where your money actually goes.
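Ranking by savings-per-effort is a sort on an estimated ratio. The Python sketch below uses invented dollar figures and a hypothetical 1-5 effort scale; it also keeps anything flagged as reliability-sensitive out of the quick-win list, mirroring step 4 of the prompt.

```python
from dataclasses import dataclass


@dataclass
class Lever:
    name: str
    monthly_saving: float  # estimated $/month
    effort: int  # hypothetical scale: 1 (trivial) .. 5 (multi-sprint)
    hurts_reliability: bool = False


def rank(levers: list[Lever]) -> list[Lever]:
    """Order by estimated saving per unit of effort, highest first."""
    return sorted(levers, key=lambda lv: lv.monthly_saving / lv.effort, reverse=True)


def quick_wins(levers: list[Lever], max_effort: int = 1) -> list[Lever]:
    """Low-effort levers that do not put reliability at risk."""
    return [lv for lv in rank(levers) if lv.effort <= max_effort and not lv.hurts_reliability]


levers = [
    Lever("Delete unattached volumes", 900, 1),
    Lever("Compute savings plan", 4000, 2),
    Lever("Move batch jobs to spot", 2500, 3),
    Lever("Downsize prod DB", 1500, 1, hurts_reliability=True),
]
```

The savings plan tops the overall ranking, but only the unattached-volume cleanup qualifies as a zero-risk quick win; the prod DB downsize ranks high yet is excluded because it is flagged as reliability-sensitive.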

Audit for Idle & Orphaned Resources

29/30

<context> Cloud: [CLOUD] Access available: [CLI CONFIGURED / READ-ONLY CONSOLE] Scope: [WHOLE ACCOUNT / ONE PROJECT/REGION] </context> <task> Help me hunt down waste. 1. List the most common idle/orphaned resources to check (unattached volumes, idle load balancers, old snapshots, unused IPs, stopped-but-billed instances, empty buckets, and buckets without lifecycle rules). 2. For each, give the exact CLI command to find them. 3. Provide a safe deletion command and a check to run before deleting. 4. Suggest a tagging or budget-alert policy so waste does not silently return. Make the find commands read-only and copy-paste ready. </task>

Provides copy-ready CLI commands to surface idle and orphaned resources, with safe-delete checks and a prevention policy.

💡

Pro tip: Run the read-only find commands first and review the list — never pipe a discovery command straight into delete.

Build a Scheduled Maintenance Automation

30/30

<context> Maintenance task: [e.g. rotate logs, prune old images, snapshot DB, scale down dev nightly] Environment: [WHERE IT RUNS] Schedule: [WHEN] Notification: [SLACK / EMAIL / NONE] </context> <task> Design a scheduled automation for this maintenance task. 1. Write the script/job that performs the task safely and idempotently. 2. Provide the scheduler config (cron expression, CI schedule, or CronJob manifest) for our environment. 3. Add success and failure notifications to the named channel. 4. Include a retention/safety rule (e.g. always keep the last N) so it never deletes too much. 5. Add a manual-run path for testing. Output the script and the scheduler config together. </task>

Generates a safe, idempotent maintenance job plus its scheduler config, notifications, and a retention guard.

💡

Pro tip: Always bake in a keep-last-N rule — a pruning job with an off-by-one bug and no floor can wipe everything overnight.
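The keep-last-N floor is easiest to get right when the deletion list is computed from the protected set rather than the other way round. The Python sketch below assumes each item (image tag, snapshot, backup) has a creation timestamp; the function name and the age cutoff are illustrative, not from any particular tool.

```python
from datetime import datetime, timedelta


def select_for_deletion(items: dict[str, datetime], keep_last: int,
                        max_age: timedelta, now: datetime) -> list[str]:
    """Return names safe to prune: older than max_age AND not among the newest keep_last."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1; refusing to prune everything")
    newest_first = sorted(items, key=items.get, reverse=True)
    protected = set(newest_first[:keep_last])  # the floor: these are never touched
    return [name for name in newest_first
            if name not in protected and now - items[name] > max_age]
```

Even if every item is older than the cutoff, the newest `keep_last` survive, and a misconfigured `keep_last=0` fails loudly instead of wiping everything.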

Frequently Asked Questions

Often yes for the structure and 80% of the content, but never paste output straight to production. Treat it as a senior engineer's first draft: run it through terraform plan, kubectl apply --dry-run=server, or a CI run, and review every secret, IAM permission, and destructive change. The prompts here ask ChatGPT to flag assumptions so you know what to verify.

Every prompt has bracketed placeholders like [AWS / GCP / AZURE] or [PASTE YOUR PLAN OUTPUT]. Replace them with your real values before sending. The more specific you are — exact tool names, real log output, actual cost lines — the better and less generic the response. Vague inputs produce vague boilerplate.

Use the most capable reasoning model available for config generation, plan analysis, and incident triage, where correctness matters most. For simple shell scripts or formatting, a faster model is fine. Whichever you use, paste raw logs and outputs rather than summaries — these models read stack traces, plan diffs, and YAML far better than paraphrased descriptions.

Be careful. Redact secrets, API keys, internal hostnames, customer data, and account IDs before pasting, and check your organization's data policy. Many of these prompts work fine with sanitized snippets. For sensitive infrastructure, use an enterprise plan with data-retention controls or a self-hosted model.

No — they make those tools faster to use. ChatGPT writes the Terraform you still apply, the alert rules you still load into Prometheus, and the postmortem you still circulate. It is a force multiplier for the engineer, not a replacement for the platform, the review process, or your judgment.
