Made by #TEAM Hack-with-Pals
Agent Runner / easy_missing_and_dupes · Step 0 of 40
● Easy ● easy_missing_and_dupes
Challenge Level:
Total Runs
0
Session
Avg Quality Score
0.000
live
Auto-Approved
0.0%
CI gate threshold: default
Avg Risk Score
0.00
live
Task Difficulty Distribution
1,284 runs
Easy (missing + dupes)
45%
Medium (type + category)
34%
Hard (conflicts + budget)
21%
Quality score ≥ 0.80
68%
Steps within budget
81%
Deduplication accuracy
93%
Recent Runs
run_0001
No runs yet
WAITING
0.000
Select Task to Run
● Easy
Missing Values & Duplicates
Clean NaN entries across 3 columns, identify and remove duplicate rows with matching customer IDs.
500 rows
12 defects
40-step budget
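A minimal sketch of the two operations this task describes, assuming rows are plain dictionaries and that missing entries show up as None, NaN, or strings such as "NaN" and "n/a" (the spellings seen in the dataset preview). The helper names `is_missing`, `count_missing`, and `drop_duplicates_by_key` are hypothetical and are not the environment's own actions.

```python
import math

# Assumed spellings of a missing value; adjust to the actual dataset.
MISSING_TOKENS = {"", "nan", "n/a", "null"}

def is_missing(value):
    """Return True for None, a float NaN, or a string token treated as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_TOKENS

def count_missing(rows, columns):
    """Count missing entries per column."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}

def drop_duplicates_by_key(rows, key):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] in seen:
            continue
        seen.add(row[key])
        kept.append(row)
    return kept

# Illustrative rows modeled on the dataset preview.
rows = [
    {"customer_id": "CUST_0041", "order_date": "2024-03-12", "revenue": 890.50},
    {"customer_id": "CUST_0041", "order_date": "2024-03-12", "revenue": 890.50},
    {"customer_id": "CUST_0089", "order_date": "n/a", "revenue": float("nan")},
]
missing = count_missing(rows, ["order_date", "revenue"])
deduped = drop_duplicates_by_key(rows, "customer_id")
```

Keeping the first occurrence per customer ID is one possible policy; whether the grader expects first, last, or most complete row is not stated here.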
◆ Medium
Type Errors & Category Drift
Cast mistyped numeric fields, resolve inconsistent category labels (e.g. "NY" vs "New York").
1.2k rows
28 defects
60-step budget
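A small sketch of the two fixes named in this task, assuming numeric fields arrive as strings with thousands separators and that category drift can be handled with a lookup table. The alias table and the helpers `normalize_label` and `cast_amount` are illustrative assumptions, not the behavior of `cast_type()` or `normalize_categories()`.

```python
# Hypothetical alias table; keys are lowercase variants, values are canonical labels.
REGION_ALIASES = {
    "ny": "New York",
    "new york": "New York",
    "cali": "California",
    "california": "California",
}

def normalize_label(value, aliases):
    """Map a label to its canonical spelling; unknown labels are returned stripped."""
    cleaned = value.strip()
    return aliases.get(cleaned.lower(), cleaned)

def cast_amount(value):
    """Parse values such as '1,240.00' into a float; return None if unparseable."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(value.replace(",", ""))
    except (AttributeError, ValueError):
        return None

region = normalize_label("NY", REGION_ALIASES)
amount = cast_amount("1,240.00")
bad_amount = cast_amount("not-a-number")
```

Returning None for unparseable input lets the caller flag the row for review instead of dropping it silently.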
■ Hard
Conflicts & Budget Constraints
Resolve field-level conflicts across merged sources, handle outliers within strict step budget.
3k rows
54 defects
80-step budget
Playground
Step · Reset · Get state
Click any action below, then press Step. Watch the step budget, quality report, and governance warning update live.
Raw JSON response
{ "message": "Raw JSON response will appear here" }
Status
● ready
task: easy_missing_and_dupes
last action: none
step: 0
submitted: false
Environment Observation
Level: Easy Step 0 / 0 ● Ready
"status": "running",
"step": 7,
"rows_remaining": 2847,
"issues_found": {
  "missing_values": 34,
  "duplicates": 12,
  "type_errors": 8,
  "outliers": 5
},
"reward": 0.412,
"budget_used": 7 / 80,
"risk_flag": null
Step Log
Live
00:00.12 reset() → env initialized, task: hard_conflicts
00:01.04 inspect_column("revenue") → 34 nulls detected
00:01.88 clean_missing("revenue", strategy="median") → 34 rows imputed, reward +0.08
00:02.55 deduplicate(key="customer_id") → 12 dupes removed, reward +0.06
00:03.20 cast_type("order_date", target="datetime") ⚠ 3 unparseable values — flagged
00:04.01 resolve_conflict("region", source_priority=[A,B]) → 18 conflicts resolved, reward +0.11
00:05.74 flag_outliers("amount", method="iqr") ⚠ 5 outliers flagged for review
00:06.90 inspect_column("category") → 4 inconsistent labels found
Available Actions
Cleaning
clean_missing()
Impute or drop null values
deduplicate()
Remove duplicate rows by key
cast_type()
Convert column to target dtype
Validation
validate_constraints()
Check all columns against spec
cap_outliers()
IQR / z-score detection
normalize_categories()
Merge conflicting source fields
Submission
submit()
Finalize and grade result
Live Reward Tracker
Quality 0.000
Efficiency 0/0 steps
Risk Score 0.00
step reward trace will appear here
Dataset Preview
3,000 rows · 8 columns 54 defects
hard_conflicts_and_budget.json
#  customer_id (str)  order_date (datetime)  region (str)  category (str)  amount (float)  revenue (float)  status flags
1  CUST_0041          2024-03-12             New York      Electronics     1,240.00        890.50           ✓ clean
2  CUST_0041          2024-03-12             NY            Electronics     1,240.00        890.50           ⧉ duplicate
3  CUST_0089          n/a                    California    Apparel         320.00          NaN              ○ missing
4  CUST_0112          2024-01-28             Texas         electronics     98,400.00       72,100.00        △ outlier
5  CUST_0204          not-a-date             Florida       Home Goods      540.00          310.00           ⊞ type error
6  CUST_0310          2024-02-14             Cali          Home Goods      780.00          490.00           ⊞ category
7  CUST_0391          2024-02-19             New York      Electronics     2,100.00        1,450.00         ✓ clean
Missing Values
34
across 3 columns
Duplicates
12
by customer_id key
Type Errors
8
order_date, category
Outliers
5
IQR method, amount
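For reference, a minimal sketch of IQR-based outlier flagging on the amount column, using the seven amounts from the dataset preview. It assumes the conventional 1.5 × IQR fences and the default quartile method of Python's `statistics.quantiles`; the environment's actual detection rule is not shown on this page, and `flag_outliers_iqr` is a hypothetical helper.

```python
import statistics

def flag_outliers_iqr(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Amounts from the seven preview rows.
amounts = [1240.0, 1240.0, 320.0, 98400.0, 540.0, 780.0, 2100.0]
flagged = flag_outliers_iqr(amounts)
```

On this sample only the 98,400.00 amount falls outside the fences, which matches the row marked as an outlier in the preview.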
Per-Step Risk Scores
run_1284 · gpt-4o
00
No steps yet
Run actions to populate governance trace
0 low
CI Gate Status
Waiting for evaluation
Run and evaluate an episode to see CI gates.
PENDING
Risk Flags & Recommendations
i
No governance flags yet
Execute actions to populate recommendations.
Evaluation Leaderboard
Hard task · 1,284 runs
#
Agent
Policy
Score bar
Score
Verdict
1
session-agent
No evaluated runs yet
0.000
WAITING
Evaluation Payload
run_1284 · gpt-4o
Missing handled: 1.00
Duplicates removed: 1.00
Type accuracy: 0.82
Category consistency: 0.94
Conflict resolution: 0.91
Steps used: 7 / 80
Budget utilization: 8.7%
Redundant actions: 0
Avg risk / step: 0.31
High-risk steps: 1
Final Score
0.000
CI gate threshold: 0.75 ● Waiting
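The budget and gate figures in this panel can be reproduced with simple arithmetic. The sketch below assumes the gate passes when the score is greater than or equal to the threshold; the actual comparison used by the evaluator is not stated here, and both function names are hypothetical.

```python
def budget_utilization(steps_used, step_budget):
    """Percentage of the step budget consumed; 0.0 when no budget is set."""
    if step_budget <= 0:
        return 0.0
    return 100.0 * steps_used / step_budget

def ci_gate(final_score, threshold=0.75):
    """Assumed rule: PASS when the score meets or exceeds the threshold."""
    return "PASS" if final_score >= threshold else "FAIL"

utilization = budget_utilization(7, 80)  # steps used and budget from this panel
decision = ci_gate(0.891)                # score taken from the sample audit record
```

With 7 of 80 steps used, utilization works out to 8.75%, which the panel displays as 8.7%.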
Full Run Audit
Complete step-by-step trace with timestamps, actions, observations, and risk scores. Suitable for compliance review and policy debugging.
"run_id": "run_1284",
"agent": "gpt-4o-0613",
"task": "hard_conflicts_and_budget",
"total_steps": 7,
"final_score": 0.891,
"verdict": "AUTO_APPROVED",
"risk_flags": ["step_7_high_risk_drop"],
"gate_results": {
  "quality": "PASS",
  "budget": "PASS",
  "max_risk": "FLAG"
}
Connection & Runtime
● unknown
Base URL: /
Selected Task: easy_missing_and_dupes
Step Budget: 0
Steps Used: 0
Validation Passed: false
API Endpoints
Quick Actions
Configuration Notes
This environment uses deterministic tasks and fixed evaluator thresholds by default. Use /evaluate with custom threshold payloads for stricter CI policies.
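As a rough illustration of a custom threshold payload, the sketch below builds (but does not send) a POST request to /evaluate. Only the /evaluate path comes from this page; the host, the `thresholds` wrapper, and the `quality` field name are assumptions, so check the real endpoint map before relying on them.

```python
import json
from urllib.request import Request

# Hypothetical host; this page only shows "/" as the base URL.
BASE_URL = "http://localhost:8000"

def build_evaluate_request(thresholds, base_url=BASE_URL):
    """Build a POST request to /evaluate carrying custom thresholds as JSON."""
    body = json.dumps({"thresholds": thresholds}).encode("utf-8")
    return Request(
        url=f"{base_url}/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical field name; 0.85 is stricter than the 0.75 gate shown on the dashboard.
request = build_evaluate_request({"quality": 0.85})
```

Passing `request` to `urllib.request.urlopen` would send it; that call is left out so the sketch has no network side effects.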
Current CI decision
WAITING
Task Library
3 tasks
Active Task
easy_missing_and_dupes
Used by next reset/run
Recommended Workflow
inspect → clean → validate → submit
Keeps risk and invalid actions low
DataQA Triage Assistant
Master data quality through progressive challenges
Clean, validate, and evaluate dataset quality through guided, step-by-step challenges. Progress through three difficulty levels and track your achievements on the leaderboard.
Select your starting level
1
Easy
Beginner Challenge
Handle missing values and duplicate records. Learn the fundamentals of data cleaning.
2
Medium
Intermediate Challenge
Resolve type mismatches and normalize category values. Master data transformation.
3
Hard
Advanced Challenge
Resolve conflicts and outliers under strict constraints. Optimize for efficiency and quality.
Before You Start This Level
● Easy
Level Objective
-
How To Work This Level
-
Recommended First Actions
-
Success Criteria + Cautions
-