I got tired of my AI agents crashing because the LLM hallucinated a JSON key or passed a string instead of an int. So I built ToolGuard — it fuzzes your Python tool functions with edge-cases (nulls, missing fields, type mismatches, 10MB payloads) and gives you a reliability score out of 100%.

No LLM needed to run tests. It reads your type hints, generates a Pydantic schema, and deterministically breaks things.

pip install py-toolguard

GitHub: https://github.com/Harshit-J004/toolguard

If you are building complex tool chains, I would be incredibly honored if you checked out the repo. Brutal feedback on the architecture is highly encouraged!