Turn Raw Data Into Clear Answers — Without Writing a Single Formula
35 copy-paste prompts for Excel formulas, SQL queries, data cleaning, visualization, and statistical analysis — ready to use in seconds.
Excel & Spreadsheets
6 prompts

Complex Formula Builder
1/35
I need an Excel formula to solve the following problem: [describe what you want to calculate in plain English]. My spreadsheet has these columns: [list column names and what they contain]. The formula needs to handle these edge cases: [empty cells / duplicates / error values / date formats, etc.]. I am using [Excel version or Google Sheets]. Write the formula, explain every function used in plain English, and show me an example with sample data so I can verify it works before applying it to my full dataset.
Converts plain-English requirements into complete, documented Excel formulas — no formula syntax knowledge required.
Pro tip: Describe your edge cases explicitly. The difference between a formula that works 90% of the time and 100% of the time is almost always an unhandled edge case.
Pivot Table Designer
2/35
Help me design a pivot table for the following analysis goal: [describe what insight you want to find]. My data has these columns: [list column headers and data types]. The audience for this analysis is [describe who will read it, e.g., sales managers / finance team / executives]. Recommend: (1) what to put in Rows, Columns, Values, and Filters, (2) which value field settings to use (Sum, Count, Average, etc.), (3) any calculated fields I should add, and (4) how to sort and filter to highlight the most important patterns. Also suggest a chart type that would complement the pivot table.
Designs the optimal pivot table configuration for your specific analysis goal, saving the trial-and-error of dragging fields around.
Pro tip: Always start with the question you want to answer, not the data you have. The clearer the question, the better the pivot table design.
VLOOKUP / INDEX-MATCH Helper
3/35
I need to look up data between two spreadsheets. Here is what I am trying to do: [explain in plain English, e.g., I want to pull the customer name from Sheet2 into Sheet1 using the customer ID]. Sheet1 has these columns: [list them with their column letters]. Sheet2 has these columns: [list them with their column letters]. The lookup values are [exact / approximate match]. Should I use VLOOKUP or INDEX-MATCH for this situation? Write both versions of the formula, explain the difference, and tell me which one to use and why. Also flag any common mistakes people make with this type of lookup.
Picks the right lookup approach and writes both formula variants, so you understand when to use each one going forward.
Pro tip: INDEX-MATCH is almost always the safer choice — it works when you insert columns and can look left, which VLOOKUP cannot do.
Conditional Formatting Rules
4/35
Set up conditional formatting rules for my Excel spreadsheet to make this data easier to read at a glance. My goal is: [describe what you want to highlight, e.g., flag overdue dates in red, highlight the top 10% of sales in green, show a heat map across a performance table]. My data range is [describe the range]. Write the exact conditional formatting rule(s) I need to create, including: (1) the rule type to select in Excel, (2) the formula if it is a custom formula rule, (3) the exact color settings, and (4) the apply-to range. Walk me through setting it up step by step.
Creates precise conditional formatting instructions with exact formulas and color settings so your data instantly communicates its own story.
Pro tip: Limit yourself to 3 or fewer colors per sheet. More than that creates visual noise rather than clarity.
Data Cleanup Formulas
5/35
My spreadsheet has messy data that I need to clean before analysis. Here are the specific problems: [list the issues, e.g., extra spaces in names, inconsistent date formats, phone numbers with mixed formatting, text that should be numbers, all-caps entries that should be title case]. The data is in column [column letter] and I have [number] rows. For each problem, write the exact Excel formula to fix it, tell me whether to use it in a helper column or replace in place, and give me any warnings about data I might lose or change accidentally.
Produces targeted cleanup formulas for your specific messy data problems, so you spend minutes not hours preparing data for analysis.
Pro tip: Always work on a copy of your data before running cleanup formulas. One wrong formula on 50,000 rows is painful to undo.
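Outside Excel, the same cleanup steps translate directly to pandas. A minimal sketch, using hypothetical column names and working in helper columns so the originals survive for comparison:

```python
import pandas as pd

# Hypothetical messy data: extra spaces, all-caps names,
# and numbers stored as text with thousands separators.
df = pd.DataFrame({
    "name": ["  ALICE SMITH ", "bob  jones", "CAROL WHITE"],
    "amount": ["1,200", " 350 ", "980"],
})

# Helper columns keep the raw values intact for spot-checking.
df["name_clean"] = (
    df["name"]
    .str.strip()                           # remove leading/trailing spaces
    .str.replace(r"\s+", " ", regex=True)  # collapse internal runs of spaces
    .str.title()                           # ALL CAPS -> Title Case
)
df["amount_clean"] = pd.to_numeric(
    df["amount"].str.strip().str.replace(",", ""),
    errors="coerce",  # unparseable values become NaN instead of raising
)
```

Note the `errors="coerce"` choice: values that cannot be parsed become NaN so you can review them, rather than silently breaking the column.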
VBA Macro Generator
6/35
Write a VBA macro for Excel that does the following: [describe the task in plain English, e.g., loops through every row, checks if column C is empty, and if so deletes the entire row]. My workbook structure: [describe the sheet names and what data is where]. The macro should run when [on button click / when the workbook opens / on a schedule]. Important constraints: [e.g., must not delete sheets, must skip header row, must handle the case where the sheet is empty]. After the code, explain what each section does in plain English and list any potential errors I should watch out for.
Generates documented VBA code for your exact use case with error handling guidance, even if you have never written a macro before.
Pro tip: Always save your workbook as .xlsm (macro-enabled) before running new VBA code, and test on a small sample of rows first.
Prompts get you started. Tutorials level you up.
A growing library of 300+ hands-on AI tutorials. New tutorials added every week.
SQL Queries
6 prompts

Write SQL from Plain English
7/35
Write a SQL query based on this request: [describe in plain English what you want to find, e.g., give me the total revenue per product category for orders placed in the last 90 days, only including completed orders]. My database has these relevant tables: [list table names and key columns]. The database is [MySQL / PostgreSQL / SQL Server / BigQuery / SQLite]. Write the query with clear comments explaining each section, use readable aliases, and format it so I can copy it straight into my query editor. If there are multiple ways to write this query, show me the most readable version and briefly mention the alternatives.
Translates plain-English data questions into ready-to-run SQL without needing to remember exact syntax or table structures by heart.
Pro tip: Describe your desired output as if you were describing a report to a colleague. The more concrete your expected result, the better the generated query.
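Here is the shape of query this prompt tends to produce — commented sections, readable aliases — run against a hypothetical in-memory SQLite table so you can see it end to end:

```python
import sqlite3

# Hypothetical orders table; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, category TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'books', 20.0, 'completed'),
        (2, 'books', 30.0, 'completed'),
        (3, 'toys',  15.0, 'completed'),
        (4, 'toys',  99.0, 'cancelled');
""")

query = """
    -- Total revenue per category, completed orders only
    SELECT
        o.category,
        SUM(o.amount) AS total_revenue
    FROM orders AS o
    WHERE o.status = 'completed'   -- the cancelled order is excluded here
    GROUP BY o.category
    ORDER BY total_revenue DESC;
"""
rows = conn.execute(query).fetchall()
```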
Optimize Slow Queries
8/35
My SQL query is running too slowly and I need help optimizing it. Here is the query: [paste your query]. The database is [database type and version]. The main tables involved have approximately [row counts] rows. Here is the execution plan output if available: [paste EXPLAIN / EXPLAIN ANALYZE output]. The query currently takes [current runtime] and I need it to run in under [target runtime]. Identify the specific bottlenecks, suggest concrete optimizations in order of likely impact (indexing, rewriting joins, subquery vs. CTE, etc.), and rewrite the query incorporating your top recommendations. Explain each change so I understand the trade-offs.
Diagnoses performance bottlenecks in slow queries and rewrites them with prioritized optimizations and clear explanations.
Pro tip: Run EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) before and after optimization to confirm the improvement is real, not just theoretical.
Explain Complex Queries
9/35
Explain this SQL query in plain English so I fully understand what it does: [paste the query]. Walk me through it in two ways: (1) a high-level summary of what the query produces in 2-3 sentences, and (2) a line-by-line explanation of each clause, subquery, window function, or CTE, explaining what each piece does and why it was written that way. Also tell me: what assumptions does this query make about the data? Are there any edge cases where it might produce unexpected results? What would I need to change if the table structure changed slightly?
Demystifies inherited or complex SQL code with both a plain-English summary and a clause-by-clause breakdown.
Pro tip: Understanding the "why" behind a query is as important as the "what." Ask ChatGPT to explain the intent, not just the mechanics.
Join Multiple Tables
10/35
Help me write a SQL query that joins multiple tables to answer this question: [describe your analysis goal]. Here are the tables I am working with: [list each table, its key columns, and the primary/foreign key relationships between them]. I want the final output to include these columns: [list desired output columns and their source tables]. Handle these specific cases: [e.g., customers with no orders should still appear, use the most recent order if there are multiple, exclude deleted records from the users table]. Write the query with comments, explain why you chose each JOIN type (INNER, LEFT, etc.), and flag any potential data quality issues that could affect the results.
Builds multi-table JOIN queries with appropriate join types and data quality flags, so your output captures exactly the right records.
Pro tip: Always verify your row counts after adding each JOIN. Unexpected row multiplication is the most common multi-table query bug.
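The same row-count discipline applies in pandas: a left join can only grow through duplicate keys on the right side. A small sketch with hypothetical customer and order tables, using `indicator=True` to expose which rows found a match:

```python
import pandas as pd

# Hypothetical tables: one customer has two orders, one has none.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "total": [10.0, 20.0, 5.0]})

# LEFT join keeps customers with no orders (customer 3 here).
merged = customers.merge(orders, on="customer_id", how="left", indicator=True)

# Row count check: 3 customers become 4 rows because customer 1
# matched two orders -- exactly the multiplication to verify deliberately.
assert len(merged) == 4

# The indicator column flags rows that found no match on the right.
unmatched = merged[merged["_merge"] == "left_only"]
```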
Aggregate and Group Data
11/35
Write a SQL query to aggregate and group my data for the following analysis: [describe the aggregation you need, e.g., monthly active users by plan type, average order value by acquisition channel and cohort month]. The relevant tables are: [table names and columns]. I want the results broken down by: [list the dimensions to group by]. The metrics I need are: [list the metrics with any specific calculation rules, e.g., count distinct users not total sessions]. Add a HAVING clause to filter out groups with fewer than [minimum count] records. Sort the results by [column] in [ascending/descending] order.
Builds aggregation queries with the right GROUP BY structure, metric calculations, and filters for your specific reporting needs.
Pro tip: Use COUNT(DISTINCT column) instead of COUNT(*) when you need unique counts — forgetting DISTINCT is one of the most common aggregation errors.
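The COUNT(*) vs. COUNT(DISTINCT) difference is easy to see side by side. A minimal SQLite example with a hypothetical sessions table, where one user generates multiple sessions:

```python
import sqlite3

# Hypothetical sessions: user 1 appears twice on the 'pro' plan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (user_id INTEGER, plan TEXT);
    INSERT INTO sessions VALUES (1, 'pro'), (1, 'pro'), (2, 'pro'), (3, 'free');
""")

rows = conn.execute("""
    SELECT
        plan,
        COUNT(*)                AS total_sessions,  -- counts every row
        COUNT(DISTINCT user_id) AS active_users     -- counts unique users
    FROM sessions
    GROUP BY plan
    ORDER BY plan;
""").fetchall()
```

On the 'pro' plan the two metrics diverge (3 sessions, 2 users) — forgetting DISTINCT would overstate active users by 50%.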
Migration Scripts
12/35
Write a SQL migration script to make the following changes to my database: [describe all changes needed, e.g., add a new column, rename a table, split one table into two, change a column data type]. The current table structure is: [paste CREATE TABLE statements or describe the current schema]. Database: [database type and version]. Requirements: (1) the migration must be safe to run on a production database with live traffic, (2) include a rollback script to undo every change, (3) handle existing data migration if column types or structures change, (4) include transaction wrapping so the whole migration succeeds or fails atomically. Add comments explaining each step.
Generates production-safe migration scripts with rollback procedures and transaction wrapping for every schema change.
Pro tip: Test every migration script on a staging database with a recent production data copy before running it on production — no exceptions.
Data Cleaning
6 prompts

Deduplication Strategy
13/35
Help me build a deduplication strategy for my dataset. The dataset is: [describe it, e.g., a customer list with 200,000 rows from three different CRM imports]. The problem: [describe the duplicate issue, e.g., same customer appearing with slightly different email addresses, company names with typos, or records merged from different systems]. Fields available for matching: [list fields and their data quality, e.g., email is 80% populated, company name has inconsistent formatting]. Recommend: (1) a step-by-step deduplication approach, (2) which fields to use as matching keys and in what priority order, (3) how to handle fuzzy matching for names and companies, (4) how to decide which duplicate record to keep, and (5) how to flag uncertain matches for human review.
Designs a complete deduplication strategy tailored to your specific data source and field availability, including how to handle edge cases.
Pro tip: Never auto-delete duplicates without a human review step for ambiguous matches. False positives in dedup can permanently destroy valid records.
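The fuzzy-matching and human-review pieces can be sketched with nothing but the standard library. This is a simplified illustration — the thresholds and company names are hypothetical, and real dedup pipelines usually combine several signals, not just one string score:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy match score between two normalized strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Hypothetical company names from two CRM imports.
score_typo = similarity("Acme Corporation", "Acme Corporaton")  # missing letter
score_diff = similarity("Acme Corporation", "Zenith Widgets")

# Route matches by confidence: auto-merge, human review, or keep separate.
AUTO, REVIEW = 0.95, 0.80  # illustrative thresholds -- tune on your own data
if score_typo >= AUTO:
    decision = "auto"
elif score_typo >= REVIEW:
    decision = "review"
else:
    decision = "separate"
```

The middle band is the point of the exercise: scores between the two thresholds go to a human reviewer instead of being merged or discarded automatically.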
Standardize Formats
14/35
My dataset has inconsistent formatting across several fields that I need to standardize before analysis. The fields with issues are: [describe each field and the inconsistency, e.g., phone numbers are stored as (555) 123-4567, 555.123.4567, +15551234567, and 5551234567; dates are stored as MM/DD/YYYY, DD-MM-YYYY, and YYYY-MM-DD; state names are stored as full names, two-letter codes, and mixed case]. For each field, tell me: (1) the target standard format to use and why, (2) the transformation logic in plain English, (3) the code or formula to apply the transformation in [Python/pandas, SQL, or Excel — pick the one I should use], and (4) how to handle values that do not fit any known pattern.
Creates field-by-field standardization logic with code for your chosen tool and instructions for handling non-conforming values.
Pro tip: Standardize into the format that your downstream tools and stakeholders expect, not just whatever is most convenient to generate.
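For the phone number example in the prompt, the transformation logic looks like this in plain Python. The target format (E.164, US numbers only) and the rule set are assumptions for illustration — adjust both for your own data:

```python
import re

def standardize_phone(raw):
    """Normalize US phone numbers to +1XXXXXXXXXX; return None for misfits."""
    digits = re.sub(r"\D", "", raw)           # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop a leading country code
    if len(digits) != 10:
        return None                           # flag non-conforming values
    return "+1" + digits

# All four formats from the prompt, plus one value that fits no pattern.
inputs = ["(555) 123-4567", "555.123.4567", "+15551234567", "5551234567", "12345"]
cleaned = [standardize_phone(p) for p in inputs]
```

Returning None for non-conforming values (rather than guessing) gives you a clean list of records to review by hand, which is point (4) of the prompt.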
Handle Missing Values
15/35
I need a strategy for handling missing values in my dataset before analysis. Dataset description: [describe the dataset and its purpose]. Here are the fields with missing values: [for each field, list: field name, data type, percentage missing, and why you think values are missing — e.g., optional field, data entry error, system migration gap]. My analysis goal is: [describe what you will do with the clean data]. For each field with missing values, recommend the best strategy from these options: delete rows, impute with mean/median/mode, impute with a model, flag as a separate category, or leave as-is with documentation. Explain the reasoning for each recommendation and warn me about any bias or distortion each approach could introduce.
Recommends field-by-field missing value strategies based on your analysis goal and the likely reason values are missing.
Pro tip: Always document your missing value treatment decisions. Reviewers and future analysts need to know what assumptions were baked into the data.
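Two of the strategies from the prompt — median imputation and flag-as-category — sketched in pandas on hypothetical data. The flag column is the documentation step made concrete: it records exactly which values were imputed.

```python
import pandas as pd

# Hypothetical survey data with gaps in an optional numeric and text field.
df = pd.DataFrame({
    "age": [25, 40, None, 35],
    "channel": ["ads", None, "email", "ads"],
})

# Numeric field: impute with the median, but keep a flag so the
# imputation stays visible in downstream analysis.
df["age_missing"] = df["age"].isna()
df["age"] = df["age"].fillna(df["age"].median())

# Categorical field: treat missing as its own category instead of guessing.
df["channel"] = df["channel"].fillna("unknown")
```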
Parse Unstructured Text
16/35
I have unstructured text data that I need to parse into structured columns for analysis. The text field contains: [describe the content, e.g., customer support tickets, product descriptions with mixed formatting, addresses in free-text fields, survey open-ends]. Here are 5 example values from the field: [paste 5 representative examples]. I want to extract: [list the specific pieces of information to extract, e.g., product name, issue category, sentiment, zip code, order number]. The tool I am using is [Python/pandas, SQL, Excel, or another tool]. Write the extraction logic with regular expressions or text parsing code, explain the pattern being matched for each extraction, and tell me what percentage of records I should realistically expect to parse successfully.
Builds regex and text parsing logic for your specific unstructured field, with realistic expectations for parse success rates.
Pro tip: Always sample 20-30 values manually before writing parsing logic. The patterns you think exist often look different from what is actually in the data.
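A minimal sketch of the extraction-plus-parse-rate idea, using hypothetical support tickets and an assumed order-number pattern (one letter, four digits):

```python
import re

# Hypothetical free-text support tickets containing order numbers.
tickets = [
    "Order #A1234 arrived damaged",
    "refund please, order #B9876",
    "no order number mentioned here",
]

pattern = re.compile(r"#([A-Z]\d{4})")  # assumed format: # + letter + 4 digits
extracted = [m.group(1) if (m := pattern.search(t)) else None for t in tickets]

# Report the realistic parse rate instead of assuming 100% coverage.
parse_rate = sum(x is not None for x in extracted) / len(tickets)
```

Here one ticket in three has no order number at all, so the honest parse rate is about 67% — exactly the kind of expectation the prompt asks the model to set.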
Validate Data Quality
17/35
Help me build a data quality validation checklist and automated checks for my dataset before I use it for analysis or reporting. The dataset is: [describe it]. The key fields I need to validate are: [list field names, data types, and business rules, e.g., order_date must be in the past, quantity must be a positive integer, customer_id must exist in the customers table, total must equal sum of line items]. The tool I am using is [SQL, Python, Excel, or a specific BI tool]. For each field, write a validation check that flags records that fail the rule. Then write a summary query that shows the count and percentage of failures per rule. Also recommend 3 additional data quality checks I probably have not thought of for this type of dataset.
Generates automated validation checks for your specific business rules and flags failures before bad data contaminates your analysis.
Pro tip: Run data quality checks every time you receive a new data extract. Data that passed last month may fail this month due to upstream system changes.
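The rule-plus-summary structure the prompt asks for maps neatly onto boolean masks in pandas. A sketch with a hypothetical orders extract containing two deliberate violations:

```python
import pandas as pd

# Hypothetical extract with two deliberate rule violations.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "quantity": [2, -1, 5, 3],             # -1 violates the positive rule
    "total":    [20.0, 10.0, 50.0, None],  # None violates not-null
})

# One boolean "failed" mask per business rule.
rules = {
    "quantity_positive": ~(df["quantity"] > 0),
    "total_not_null":    df["total"].isna(),
}

# Summary: count and percentage of failures per rule.
summary = {name: (int(mask.sum()), 100 * mask.mean())
           for name, mask in rules.items()}
```

Because each rule is a named mask, adding the "3 additional checks" the prompt requests is just three more dictionary entries.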
Merge Datasets
18/35
I need to merge two or more datasets and I want to do it correctly without losing records or creating duplicates. Here are the datasets I am merging: [describe each one — source, row count, key fields, and what it represents]. The join key(s) I am planning to use: [list the fields]. Known data quality issues across the datasets: [e.g., customer IDs are formatted differently between systems, some records exist in one dataset but not the other]. My goal for the merged dataset: [describe what analysis or output you need it for]. Recommend: (1) the correct join type and why, (2) how to reconcile key formatting differences before merging, (3) how to handle records that appear in only one dataset, (4) which fields to keep when the same field exists in both datasets, and (5) validation checks to run after the merge to confirm it was done correctly.
Plans the correct merge strategy for your specific datasets, including key reconciliation, join type selection, and post-merge validation.
Pro tip: A row count check after merging is mandatory. If your output has significantly more or fewer rows than expected, stop and diagnose before proceeding.
Statistical Analysis
5 prompts

Regression Analysis Setup
19/35
Help me set up a regression analysis to answer this business question: [describe the question, e.g., what factors predict customer churn, which marketing channels drive the most revenue, how does pricing affect conversion rate]. My dataset has these variables: [list the available variables, their data types, and a brief description of each]. I want to predict: [outcome variable]. Walk me through: (1) which regression type is appropriate (linear, logistic, multiple, etc.) and why, (2) which variables to include as predictors and which to exclude, (3) how to handle categorical variables, (4) how to check for multicollinearity, (5) how to interpret the key output metrics (R-squared, p-values, coefficients), and (6) what I would need to do in [Python/R/Excel] to run this analysis. Keep explanations accessible for someone with basic stats knowledge.
Guides you through selecting the right regression type, preparing variables, and interpreting results for your specific business question.
Pro tip: Correlation does not imply causation. Regression finds associations — always think critically about whether a relationship is causal before acting on it.
Hypothesis Testing
20/35
I want to run a hypothesis test to answer this question: [describe the question, e.g., did the new checkout flow increase conversion rate, is the average order value different between mobile and desktop users, does the new email subject line have a higher open rate]. Here is my data: [describe what you have — sample sizes, means, standard deviations, or proportions if known]. Walk me through: (1) which statistical test to use and why (t-test, chi-square, z-test, Mann-Whitney, etc.), (2) how to state the null and alternative hypotheses, (3) what sample size I need for statistical power of 80% at a 5% significance level, (4) how to run the test in [Python/R/Excel], and (5) how to interpret the p-value and what conclusion I can and cannot draw from it.
Identifies the right statistical test for your question, sets up the hypotheses correctly, and explains how to interpret the results.
Pro tip: Statistical significance is not the same as practical significance. Always ask whether the effect size is large enough to be worth acting on, even if the p-value is small.
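For the checkout-flow example, the standard choice is a two-proportion z-test. A standard-library sketch with hypothetical counts (200 of 2,000 control conversions vs. 260 of 2,000 treatment), using the normal CDF via `math.erf`:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (stdlib only)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical checkout test: 10% control vs 13% treatment conversion.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
```

Here the null hypothesis is "both flows convert at the same rate"; a small p-value argues against it, but says nothing by itself about whether a 3-point lift is worth shipping — that is the practical-significance question from the pro tip.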
Correlation Analysis
21/35
I want to understand the relationships between variables in my dataset. The dataset covers [describe the dataset and context]. The variables I am interested in are: [list them with data types]. Help me: (1) choose the right correlation method (Pearson, Spearman, or point-biserial) for each pair based on data types, (2) build a correlation matrix and explain how to read it, (3) identify which correlations are statistically significant vs. noise, (4) flag any pairs that are so highly correlated they would cause multicollinearity problems in a regression, (5) explain any surprising or counterintuitive correlations I should investigate further, and (6) give me the code to run this in [Python/pandas/R]. List the top 3 most actionable insights I should focus on.
Runs a complete correlation analysis with the right methods for your variable types and surfaces the most actionable relationships.
Pro tip: Build scatter plots for your strongest correlations before drawing conclusions. A single outlier can dramatically inflate a Pearson correlation coefficient.
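The outlier effect from the pro tip is easy to demonstrate with contrived data: one extreme point drags Pearson near 1.0 while rank-based Spearman stays honest about the weak underlying relationship.

```python
import pandas as pd

# Contrived data: a weak relationship plus one extreme outlier (100, 100).
df = pd.DataFrame({"x": [1, 2, 3, 4, 100], "y": [2, 1, 4, 3, 100]})

pearson = df["x"].corr(df["y"], method="pearson")
spearman = df["x"].corr(df["y"], method="spearman")
# Pearson is dominated by the outlier; Spearman works on ranks,
# so the single extreme point counts as just one more rank.
```

A scatter plot of this data would show four loosely scattered points and one far-away outlier — which is exactly why plotting before concluding matters.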
Forecasting
22/35
Help me build a forecast for [what you are forecasting, e.g., monthly revenue, website traffic, inventory demand]. I have historical data for [time period] at [daily/weekly/monthly] granularity. The data has these characteristics: [describe any trends, seasonality, or known anomalies, e.g., strong Q4 seasonality, a spike in March from a one-time event, steady upward trend]. I need to forecast [how far ahead]. Recommend: (1) the best forecasting method for my data characteristics (moving average, exponential smoothing, SARIMA, Prophet, etc.) and why, (2) how to handle the known anomaly in the historical data before fitting the model, (3) how to measure forecast accuracy (MAPE, RMSE, etc.), (4) how to build confidence intervals around the forecast, and (5) the Python or Excel code to implement it.
Recommends the right forecasting model for your data patterns, explains accuracy metrics, and provides implementation code.
Pro tip: Always hold out the last 2-3 periods of historical data as a test set to validate your model before using it for future forecasts.
A/B Test Analysis
23/35
Help me analyze the results of an A/B test. Here are the details: What we tested: [describe the change, e.g., new CTA button color, revised pricing page, different email subject line]. Control group: [sample size and conversion rate or mean outcome]. Treatment group: [sample size and conversion rate or mean outcome]. Test duration: [how long it ran]. How users were assigned: [random / by cohort / other]. Secondary metrics observed: [list any other metrics that changed]. Help me: (1) determine if the result is statistically significant, (2) calculate the 95% confidence interval for the lift, (3) identify any signs of sample ratio mismatch or other validity threats, (4) assess whether the test ran long enough to avoid novelty effects, (5) recommend whether to ship, iterate, or roll back, and (6) document the result in a format I can share with stakeholders.
Provides a complete A/B test readout with significance testing, validity checks, and a shipping recommendation backed by the data.
Pro tip: Do not peek at results and stop early just because the data looks good. Early stopping is one of the most common causes of false positives in A/B tests.
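Point (2) of the prompt — the 95% confidence interval for the lift — can be sketched with the standard library. Counts here are hypothetical (10% control vs. 13% treatment conversion), and the unpooled standard error is the usual choice for an interval on the difference:

```python
from math import sqrt

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the absolute lift (p_b - p_a), unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical test: control 200/2000 (10%), treatment 260/2000 (13%).
low, high = lift_confidence_interval(200, 2000, 260, 2000)
# An interval that excludes zero indicates significance at the 5% level,
# and its width tells stakeholders how precisely the lift is measured.
```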
Visualization & Charts
6 prompts

Chart Type Recommender
24/35
I want to visualize the following data and need help choosing the right chart type. Data description: [describe what the data represents, e.g., monthly revenue by product line over 2 years, survey responses on a 1-5 scale broken down by age group, funnel conversion rates across 6 stages]. The message I want the chart to communicate: [describe the insight you want the viewer to take away]. The audience: [e.g., executives in a board deck, analysts in a working session, customers in a public report]. Recommend: (1) the best chart type and why, (2) one or two alternatives worth considering and when to use them instead, (3) any design choices (colors, labels, axis scales) that would make the chart more effective, and (4) what to avoid for this specific data type.
Matches the right chart type to your data, message, and audience — so your visualizations communicate clearly rather than just display data.
Pro tip: When in doubt, simpler beats more sophisticated. A clean bar chart your audience understands instantly beats an interactive visualization they need to decode.
Dashboard Layout
25/35
Help me design the layout for a data dashboard. The dashboard is for: [describe the audience and their role, e.g., a sales manager who reviews it daily]. The key questions this dashboard must answer: [list 4-6 specific questions the viewer should be able to answer at a glance]. The metrics available: [list the metrics and their update frequency]. The tool I am using: [Tableau, Power BI, Looker, Google Data Studio, or a coded solution]. Design a layout with: (1) recommended number of panels and their hierarchy from most to least important, (2) which chart type to use for each metric and why, (3) which metrics belong in the top "hero" section vs. the detail section, (4) filter and drill-down interactions to include, and (5) any metrics that are redundant and should be removed to reduce cognitive load.
Produces an opinionated dashboard layout with chart-type assignments and information hierarchy based on what the viewer actually needs to do.
Pro tip: Design around the decision the viewer needs to make, not around the data you have available. Every panel should answer a specific question.
Executive Summary from Data
26/35
Write an executive summary of the following data for a leadership audience. The data shows: [describe or paste the key figures, e.g., monthly revenue, churn rate, NPS score, traffic by channel]. Time period covered: [date range]. My interpretation of what the data means: [share your initial read — what is improving, declining, or surprising]. The audience will use this summary to: [make a budget decision / understand business health / decide on a new initiative]. Write the summary in 3-5 bullet points, each one leading with the insight, not the metric. Use plain business language. Flag one risk or area of concern. End with one recommended next action.
Transforms raw metrics into an insight-led executive summary that tells a clear story rather than simply listing numbers.
Pro tip: Lead with the "so what," not the "what." Executives already know numbers are being tracked — they need to know what to think and do about them.
KPI Report Builder
27/35
Build a KPI report template for [team or function, e.g., the marketing team, product team, or customer success team] at [company type]. The report will be shared [weekly/monthly] with [audience]. Include these KPIs: [list 6-8 KPIs with their definitions and data sources]. For each KPI, the template should show: current period value, prior period value, percentage change, target (if applicable), and a one-line commentary field. Design the report with these sections: (1) an executive summary section with 3-4 top-line highlights, (2) a performance table for all KPIs, (3) a section for one deep-dive metric that rotates each period, and (4) a blockers and risks section. Also write the commentary for this period using: [paste your actual data].
Creates a reusable KPI report template with commentary prompts and a pre-filled version using your actual data.
Pro tip: Write commentary on the why, not the what. "Revenue grew 12%" is the data. "Revenue grew 12% driven by enterprise expansion, offsetting a 5% decline in SMB" is the insight.
Data Storytelling
28/35
Help me turn the following data into a compelling narrative for a [presentation / report / board deck]. The data I have: [describe or paste the key figures and trends]. The context: [what has happened in the business or market to explain the data]. The audience: [describe who will read or hear this and what they care about]. Structure the narrative using this arc: (1) the situation — what the data shows and why it matters right now, (2) the complication — the challenge or opportunity the data reveals, (3) the resolution — what we should do next and what outcome we expect. For each section, tell me which specific data points to feature and how to frame them. Write a first draft of the full narrative under 300 words.
Builds a situation-complication-resolution narrative arc around your data, turning a data dump into a story that drives action.
Pro tip: Pick one central insight to anchor the whole story. A narrative with three equally important points has no point.
Automated Report Template
29/35
Create a reusable report template for [report name, e.g., the weekly performance report, the monthly investor update, the quarterly business review]. The report goes to [audience] and covers [topic/function]. Structure the template with: (1) a header section with the reporting period, key highlights in one sentence each, and a RAG status (Red/Amber/Green) for overall performance, (2) a metrics section with a table of [list 5-8 metrics] showing current, prior, and target values, (3) a commentary section with placeholder labels that prompt the analyst to explain variance, (4) a next actions section with owner and due date fields. Also write the instructions I should give to whoever fills in the template each period so they know what level of detail is expected and which sections require narrative vs. just numbers.
Creates a complete, fill-in report template with analyst instructions so any team member can produce a consistent, high-quality report every period.
Pro tip: Include a "do not change the structure" note in your template instructions. Analysts who reformat reports each period break the comparability stakeholders rely on.
Python & Pandas
6 prompts

DataFrame Operations
30/35
Write Python/pandas code to perform the following operations on my DataFrame: [list each operation in plain English, e.g., filter rows where status is active, group by region and calculate sum of revenue, merge with a second DataFrame on customer_id, fill missing values in the age column with the median, rename columns to snake_case, sort by date descending, and export to CSV]. The DataFrame is called df and has these columns: [list column names and data types]. Use method chaining where possible for readability. After the code, explain what each operation does in one sentence and flag any operations that might produce unexpected results if the data has nulls or duplicate keys.
Generates clean, method-chained pandas code for your specific operations with data quality warnings for common edge cases.
Pro tip: Run df.info() and df.describe() before writing any transformation code. Understanding your data types and distributions prevents most common pandas errors.
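A method-chained version of several operations from the prompt (rename, filter, null handling, group-aggregate, sort) on a hypothetical raw export. Column names and values are illustrative:

```python
import pandas as pd

# Hypothetical raw export with mixed-case columns and an inactive row.
df = pd.DataFrame({
    "Region": ["east", "west", "east", "west"],
    "Revenue": [100.0, 200.0, 150.0, None],
    "Status": ["active", "active", "active", "inactive"],
})

result = (
    df
    .rename(columns=str.lower)                          # normalize column names
    .query("status == 'active'")                        # filter rows
    .assign(revenue=lambda d: d["revenue"].fillna(0))   # handle nulls explicitly
    .groupby("region", as_index=False)["revenue"].sum() # aggregate per region
    .sort_values("revenue", ascending=False)            # sort descending
)
```

Each step returns a new DataFrame, so the chain reads top to bottom like a recipe and is easy to debug by commenting out one line at a time.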
Data Pipeline Script
31/35
Write a Python data pipeline script that does the following: (1) reads data from [source: CSV file / database / API / S3 bucket], (2) applies these transformations: [list all transformations in plain English], (3) performs these validations: [list quality checks to run], (4) writes the output to [destination: CSV / database table / another S3 bucket], and (5) logs each step with a timestamp so I can debug failures. The script should: handle errors gracefully without crashing the whole pipeline, send an alert (print to console is fine) if any validation fails, be modular with separate functions for ingestion, transformation, validation, and output. Use pandas for transformations. The environment is [local Python / Jupyter notebook / Airflow / scheduled script].
Produces a modular, error-handled data pipeline script with logging, validation, and clearly separated concerns for each stage.
Pro tip: Build your pipeline as functions first, get each one working in a notebook, then assemble them into a script. Debugging inside a monolithic script is much harder.
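A minimal skeleton of the modular shape this prompt asks for, with one function per stage. The column names (id, amount) and the validation rules are assumptions for the sketch:

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def ingest(path: str) -> pd.DataFrame:
    log.info("reading %s", path)
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows missing a key, fill missing amounts (assumed columns)
    log.info("transforming %d rows", len(df))
    return (df.dropna(subset=["id"])
              .assign(amount=lambda d: d["amount"].fillna(0)))

def validate(df: pd.DataFrame) -> list:
    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate ids found")
    if (df["amount"] < 0).any():
        problems.append("negative amounts found")
    return problems

def run(src: str, dst: str) -> None:
    try:
        df = transform(ingest(src))
        for p in validate(df):
            log.warning("VALIDATION ALERT: %s", p)  # console alert
        df.to_csv(dst, index=False)
        log.info("wrote %d rows to %s", len(df), dst)
    except Exception:
        log.exception("pipeline failed")  # log, don't crash silently
```

Because each stage is a plain function, you can test `transform` and `validate` on a small in-memory DataFrame before wiring up real sources, exactly as the pro tip below suggests.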
Web Scraping to Dataset
32/35
Write a Python web scraping script to collect data from [describe the website and what data you want to collect, e.g., product names, prices, and ratings from a product listing page]. The URL structure is: [paste an example URL]. The data I want to extract: [list the fields]. Requirements: (1) use requests and BeautifulSoup or Scrapy depending on what makes more sense for this use case, (2) handle pagination — the site has [describe pagination: next button / URL parameter / infinite scroll], (3) add a 1-2 second delay between requests to avoid overloading the server, (4) handle errors gracefully if a page does not load, (5) save the output to a CSV file with clean column names. Also tell me: is scraping this type of site typically within the terms of service, and are there any legal or ethical considerations I should check?
Builds a polite, error-handled scraping script with pagination support and a reminder to verify terms of service.
Pro tip: Always check the site's robots.txt file and terms of service before scraping. Many sites offer an official API that is faster, more reliable, and legally cleaner.
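A sketch of the polite scraper shape this prompt produces, with parsing separated from fetching so the parser can be tested on saved HTML. The CSS classes (`div.product`, `.name`, `.price`, `.rating`) and the `?page=` pagination scheme are assumptions; inspect the real page first:

```python
import time
import requests
from bs4 import BeautifulSoup

def parse_products(html: str) -> list:
    """Extract name/price/rating from product cards.
    The selectors below are assumptions for illustration."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product"):
        rows.append({
            "name": card.select_one(".name").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
            "rating": card.select_one(".rating").get_text(strip=True),
        })
    return rows

def scrape(base_url: str, pages: int) -> list:
    """Fetch N pages via an assumed ?page= parameter, politely."""
    results = []
    for page in range(1, pages + 1):
        try:
            resp = requests.get(f"{base_url}?page={page}", timeout=10)
            resp.raise_for_status()
        except requests.RequestException as exc:
            print(f"page {page} failed: {exc}")  # skip page, keep going
            continue
        results.extend(parse_products(resp.text))
        time.sleep(1.5)  # 1-2 second delay between requests
    return results
```

Splitting fetch and parse also means a site redesign only breaks one function, and you can re-parse cached HTML without re-downloading anything.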
API Data Extraction
33/35
Write Python code to pull data from the following API: [describe the API, paste the documentation URL or key endpoint details]. I want to retrieve: [describe the data you need]. Authentication method: [API key / OAuth / Bearer token]. Pagination: [describe how pagination works — cursor-based, page number, offset, or no pagination]. Rate limits: [describe any limits, e.g., 100 requests per minute]. The code should: (1) handle authentication correctly, (2) loop through all pages automatically, (3) respect rate limits with appropriate sleeps or retry logic, (4) flatten the nested JSON response into a pandas DataFrame, (5) handle API errors and print useful error messages. Store the API key in an environment variable, not hardcoded. After the code, show me what the final DataFrame schema will look like.
Generates production-ready API extraction code with pagination, rate limit handling, authentication, and flattened output.
Pro tip: Test your API extraction code on a single page before enabling full pagination. A bug in the loop can exhaust your API quota in minutes.
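A minimal sketch of the pagination loop this prompt asks for. Page-number pagination, Bearer auth, and an `items` response key are all assumptions here; the fetcher is injected as a parameter so the loop can be exercised without touching a real API (which also makes the single-page test in the pro tip above easy):

```python
import os
import time
import requests
import pandas as pd

API_KEY = os.environ.get("API_KEY", "")  # from the environment, never hardcoded

def fetch_page(url: str, page: int) -> dict:
    """One real request (assumed page-number pagination and Bearer auth)."""
    resp = requests.get(
        url,
        params={"page": page},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()  # surface HTTP errors with a useful message
    return resp.json()

def fetch_all(fetcher, url: str, delay: float = 0.6) -> pd.DataFrame:
    """Loop pages until the API returns no items, then flatten the
    nested JSON records into a flat DataFrame."""
    records = []
    page = 1
    while True:
        payload = fetcher(url, page)
        items = payload.get("items", [])  # assumed response key
        if not items:
            break
        records.extend(items)
        page += 1
        time.sleep(delay)  # crude rate limiting between requests
    return pd.json_normalize(records)
```

`pd.json_normalize` turns nested keys like `{"meta": {"x": 2}}` into dotted columns (`meta.x`), which is usually the flat schema you want for analysis.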
Time Series Analysis
34/35
Help me analyze a time series dataset in Python. The dataset contains: [describe it, e.g., daily website sessions for the past 3 years, weekly sales by product category, hourly server response times]. My analysis goals are: (1) understand the trend, seasonality, and noise components, (2) detect anomalies or unusual spikes, (3) test whether a change event on [date] had a measurable impact, and (4) build a short-term forecast. Write the Python code using pandas and statsmodels or Prophet as appropriate. For each step, explain what the output tells me and how to interpret key numbers. Also tell me: what data quality checks I should run before starting the analysis, and what assumptions I should be aware of.
Walks through decomposition, anomaly detection, change-point analysis, and forecasting for time series data with interpretation guidance.
Pro tip: Visualize your time series before running any models. Plotting the raw data for 5 minutes often reveals patterns or anomalies that would take hours to find analytically.
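To illustrate goals (1) and (2) without extra dependencies, here is a pandas-only sketch that splits a series into a rolling-mean trend plus residual and flags residual outliers. It is a lightweight stand-in for `statsmodels.tsa.seasonal_decompose`, not a replacement; the window size and 3-sigma threshold are assumptions to tune for your data:

```python
import pandas as pd

def decompose_and_flag(series: pd.Series, window: int = 7,
                       z: float = 3.0) -> pd.DataFrame:
    """Trend = centered rolling mean; residual = value - trend.
    Flag residuals beyond z standard deviations as anomalies."""
    trend = series.rolling(window, center=True, min_periods=1).mean()
    resid = series - trend
    sigma = resid.std()
    return pd.DataFrame({
        "value": series,
        "trend": trend,
        "resid": resid,
        "anomaly": resid.abs() > z * sigma,
    })
```

Run it on a toy daily series with one injected spike and the spike is the only flagged row; on real data, inspect every flag by eye before treating it as an error.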
ML Model Evaluation
35/35
Help me evaluate a machine learning model I have built. The model is a [model type, e.g., logistic regression / random forest / XGBoost] trained to predict [what it predicts]. The problem type is [binary classification / multi-class classification / regression]. Here are my current performance metrics: [paste your metrics, e.g., accuracy: 0.84, precision: 0.79, recall: 0.71, F1: 0.75, AUC-ROC: 0.88]. The class balance in my training data was: [e.g., 90% negative, 10% positive]. Tell me: (1) whether these metrics indicate a good model for this use case, (2) which metric matters most for my specific problem and why, (3) whether class imbalance is distorting my evaluation, (4) what my confusion matrix likely looks like and what the errors mean in business terms, (5) what 2-3 improvements to try next, and (6) Python code to generate a full evaluation report with plots.
Interprets your model metrics in business context, identifies evaluation blind spots caused by class imbalance, and recommends the most impactful next improvements.
Pro tip: Accuracy is almost always the wrong primary metric for imbalanced datasets. Always report precision, recall, and AUC alongside accuracy for a complete picture.
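The imbalance pitfall is easy to see with arithmetic. This pure-Python sketch computes the core metrics from confusion-matrix counts; the toy counts below describe a hypothetical 90/10 imbalanced test set, not any real model:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Core binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical 90% negative / 10% positive test set of 1,000 rows:
# the model misses half the positives yet still posts 94% accuracy.
m = classification_metrics(tp=50, fp=10, fn=50, tn=890)
```

Here accuracy is 0.94 while recall is only 0.50: the headline number looks strong even though half the positive cases slip through, which is exactly why the pro tip above says to report precision, recall, and AUC alongside accuracy.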
Prompts are the starting line. Tutorials are the finish.
A growing library of 300+ hands-on tutorials on ChatGPT, Claude, Midjourney, and 50+ AI tools. New tutorials added every week.
14-day free trial. Cancel anytime.