Overview
Jinba Modules Checker V2 is an advanced validation tool that checks files and data against custom rules defined in JSON format. This tool is designed for comprehensive document validation, compliance checking, and data quality assurance, with enhanced capabilities over the original checker.

Key Features
- Rule-Based Validation: Define custom validation rules in JSON format
- Multi-Format Support: Check PDF, text, JSON, XML, CSV, and DOCX files
- Reference Integration: Include company regulations, legal documents, and other reference materials
- Detailed Results: Get structured results with status, ranges, and detailed explanations
- Custom Data Schema: Define additional data schemas for specialized validation
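As a rough illustration of the data shapes the examples below rely on, here is a rule entry and a checker result side by side. The result fields (`uniqueId`, `status`, `reason`, `range`, `additionalData`) are inferred from the processing scripts in this document, not an official schema, and the concrete values are hypothetical:

```python
import json

# A validation rule, as passed to the checker via the `rules` input.
rule = {
    "id": "financial_disclosure_001",
    "rule": "All financial figures must be clearly labeled with currency denomination",
    "category": "financial_reporting",
    "severity": "high",
    "description": "Check that monetary values include proper currency symbols or abbreviations",
}

# One checker result entry, with the fields the processing scripts read.
# This shape is an assumption inferred from the examples below.
result = {
    "uniqueId": "financial_disclosure_001",
    "status": "rejected",  # one of: accepted / rejected / pending
    "reason": "Figure on page 3 lacks a currency symbol",
    "range": [120, 145],   # location of the issue in the target document
    "additionalData": {},
}

# A quick structural sanity check before submitting rules to the tool.
required_rule_keys = {"id", "rule", "category", "severity"}
assert required_rule_keys <= rule.keys()
print(json.dumps({"rule_ok": True, "status": result["status"]}))
```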
Configuration
This tool requires no configuration parameters.

Example: Document Compliance Checking
```yaml
- id: prepare_compliance_rules
  name: prepare_compliance_rules
  tool: INPUT_JSON
  input:
    - name: value
      value: [
        {
          "id": "financial_disclosure_001",
          "rule": "All financial figures must be clearly labeled with currency denomination",
          "category": "financial_reporting",
          "severity": "high",
          "description": "Check that monetary values include proper currency symbols or abbreviations"
        },
        {
          "id": "date_format_002",
          "rule": "All dates must follow ISO 8601 format (YYYY-MM-DD) or clearly stated format",
          "category": "data_formatting",
          "severity": "medium",
          "description": "Ensure consistent date formatting throughout the document"
        },
        {
          "id": "signature_requirement_003",
          "rule": "Document must contain authorized signature or digital signature verification",
          "category": "authorization",
          "severity": "high",
          "description": "Verify proper authorization signatures are present"
        },
        {
          "id": "contact_information_004",
          "rule": "Contact information must include valid email format and phone number format",
          "category": "contact_validation",
          "severity": "medium",
          "description": "Validate proper formatting of contact details"
        }
      ]
- id: check_financial_report
  name: check_financial_report
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "path/to/financial_report.pdf"
    - name: task
      value: "Financial Report Compliance Review"
    - name: description
      value: "Comprehensive compliance check for quarterly financial report including regulatory requirements, data formatting standards, and authorization requirements."
    - name: rules
      value: "{{steps.prepare_compliance_rules.result}}"
    - name: references
      value: [
        "path/to/company_financial_policies.pdf",
        "path/to/regulatory_guidelines.pdf",
        "path/to/previous_approved_reports.pdf"
      ]
    - name: additional_data_schema
      value: {
        "financial_metrics": {
          "revenue": "number",
          "expenses": "number",
          "net_income": "number",
          "currency": "string"
        },
        "reporting_period": {
          "start_date": "date",
          "end_date": "date",
          "quarter": "string",
          "fiscal_year": "number"
        }
      }
- id: process_check_results
  name: process_check_results
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from collections import defaultdict

        # Get check results
        check_results = {{steps.check_financial_report.result}}

        # Analyze results by status and category
        status_summary = defaultdict(int)
        category_summary = defaultdict(list)
        critical_issues = []

        for result in check_results:
            status = result.get("status", "unknown")
            status_summary[status] += 1

            # Extract category from rule if possible
            rule_text = result.get("rule", "")
            category = "general"
            if "financial" in rule_text.lower():
                category = "financial"
            elif "date" in rule_text.lower():
                category = "formatting"
            elif "signature" in rule_text.lower():
                category = "authorization"
            elif "contact" in rule_text.lower():
                category = "contact"

            category_summary[category].append({
                "rule_id": result.get("uniqueId"),
                "status": status,
                "reason": result.get("reason"),
                "range": result.get("range", [])
            })

            # Track critical issues (rejected items)
            if status == "rejected":
                critical_issues.append({
                    "rule_id": result.get("uniqueId"),
                    "rule": result.get("rule"),
                    "reason": result.get("reason"),
                    "location": result.get("range", []),
                    "additional_data": result.get("additionalData", {})
                })

        # Generate comprehensive report
        report = {
            "validation_summary": {
                "total_rules_checked": len(check_results),
                "accepted": status_summary.get("accepted", 0),
                "rejected": status_summary.get("rejected", 0),
                "pending": status_summary.get("pending", 0),
                "compliance_score": round((status_summary.get("accepted", 0) / len(check_results)) * 100, 2) if check_results else 0
            },
            "category_breakdown": dict(category_summary),
            "critical_issues": critical_issues,
            "recommendations": [],
            "detailed_results": check_results
        }

        # Generate recommendations based on critical issues
        if critical_issues:
            report["recommendations"].extend([
                "Review and address all rejected validation rules",
                "Implement document review process for critical compliance items",
                "Consider template updates to prevent recurring issues"
            ])
        else:
            report["recommendations"].append("Document passes all validation checks")

        print(json.dumps(report, indent=2))
```
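The scoring step of the processing script above can be tried locally before wiring it into a workflow. A minimal sketch, using mocked checker output (the hypothetical values stand in for `{{steps.check_financial_report.result}}`):

```python
from collections import defaultdict

# Mocked checker output (hypothetical statuses, one per rule).
check_results = [
    {"uniqueId": "financial_disclosure_001", "status": "accepted"},
    {"uniqueId": "date_format_002", "status": "accepted"},
    {"uniqueId": "signature_requirement_003", "status": "rejected"},
    {"uniqueId": "contact_information_004", "status": "pending"},
]

# Tally results by status, defaulting missing statuses to "unknown".
status_summary = defaultdict(int)
for result in check_results:
    status_summary[result.get("status", "unknown")] += 1

# Same formula as the workflow script: accepted / total, as a percentage.
score = round(status_summary["accepted"] / len(check_results) * 100, 2) if check_results else 0
print(score)  # → 50.0
```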
Example: Multi-Document Quality Assurance
```yaml
- id: setup_qa_workflow
  name: setup_qa_workflow
  tool: INPUT_JSON_WITH_VALIDATION
  input:
    - name: value
      value: {
        "documents": [
          {
            "file": "path/to/contract_draft.docx",
            "type": "legal_contract",
            "priority": "high"
          },
          {
            "file": "path/to/technical_spec.pdf",
            "type": "technical_documentation",
            "priority": "medium"
          },
          {
            "file": "path/to/user_manual.pdf",
            "type": "user_documentation",
            "priority": "low"
          }
        ],
        "universal_rules": [
          {
            "id": "spelling_grammar_001",
            "rule": "Document must be free of spelling and grammatical errors",
            "category": "quality_assurance",
            "severity": "medium"
          },
          {
            "id": "formatting_consistency_002",
            "rule": "Headers, fonts, and formatting must be consistent throughout",
            "category": "formatting",
            "severity": "low"
          }
        ],
        "legal_specific_rules": [
          {
            "id": "legal_terminology_001",
            "rule": "Legal terminology must be precise and properly defined",
            "category": "legal_compliance",
            "severity": "high"
          },
          {
            "id": "clause_numbering_002",
            "rule": "Contract clauses must be properly numbered and referenced",
            "category": "legal_structure",
            "severity": "high"
          }
        ]
      }
- id: check_legal_contract
  name: check_legal_contract
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "{{steps.setup_qa_workflow.result.documents[0].file}}"
    - name: task
      value: "Legal Contract Quality Assurance"
    - name: description
      value: "Comprehensive quality check for legal contract including terminology, structure, and compliance requirements."
    - name: rules
      value: "{{steps.setup_qa_workflow.result.universal_rules}} + {{steps.setup_qa_workflow.result.legal_specific_rules}}"
    - name: references
      value: [
        "path/to/legal_style_guide.pdf",
        "path/to/contract_templates.pdf",
        "path/to/legal_terminology_reference.pdf"
      ]
    - name: additional_data_schema
      value: {
        "contract_parties": {
          "party_a": "string",
          "party_b": "string",
          "party_roles": "array"
        },
        "key_terms": {
          "effective_date": "date",
          "termination_date": "date",
          "renewal_terms": "string",
          "payment_terms": "object"
        }
      }
- id: check_technical_spec
  name: check_technical_spec
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "{{steps.setup_qa_workflow.result.documents[1].file}}"
    - name: task
      value: "Technical Documentation Review"
    - name: description
      value: "Quality assurance check for technical specifications focusing on clarity, completeness, and technical accuracy."
    - name: rules
      value: "{{steps.setup_qa_workflow.result.universal_rules}}"
    - name: references
      value: [
        "path/to/technical_writing_standards.pdf",
        "path/to/industry_specifications.pdf"
      ]
- id: generate_qa_dashboard
  name: generate_qa_dashboard
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from datetime import datetime

        # Get results from both checks
        legal_results = {{steps.check_legal_contract.result}}
        technical_results = {{steps.check_technical_spec.result}}

        def analyze_document_results(results, doc_type, priority):
            total_checks = len(results)
            passed = len([r for r in results if r.get("status") == "accepted"])
            failed = len([r for r in results if r.get("status") == "rejected"])
            pending = len([r for r in results if r.get("status") == "pending"])
            return {
                "document_type": doc_type,
                "priority": priority,
                "total_checks": total_checks,
                "passed": passed,
                "failed": failed,
                "pending": pending,
                "success_rate": round((passed / total_checks * 100), 2) if total_checks > 0 else 0,
                "critical_issues": [r for r in results if r.get("status") == "rejected"]
            }

        # Analyze each document
        legal_analysis = analyze_document_results(legal_results, "Legal Contract", "high")
        technical_analysis = analyze_document_results(technical_results, "Technical Specification", "medium")

        # Create QA dashboard
        dashboard = {
            "qa_report_generated": datetime.now().isoformat(),
            "overall_summary": {
                "documents_reviewed": 2,
                "high_priority_issues": len(legal_analysis["critical_issues"]),
                "medium_priority_issues": len(technical_analysis["critical_issues"]),
                "average_success_rate": round((legal_analysis["success_rate"] + technical_analysis["success_rate"]) / 2, 2)
            },
            "document_analysis": [legal_analysis, technical_analysis],
            "next_actions": [],
            "quality_recommendations": [
                "Prioritize resolution of high-priority document issues",
                "Implement automated pre-checks for common formatting issues",
                "Create document templates incorporating identified best practices",
                "Schedule regular QA reviews for critical documents"
            ]
        }

        # Generate next actions based on results
        if legal_analysis["failed"] > 0:
            dashboard["next_actions"].append("Review and revise legal contract based on failed checks")
        if technical_analysis["failed"] > 0:
            dashboard["next_actions"].append("Update technical specification to address quality issues")
        if not dashboard["next_actions"]:
            dashboard["next_actions"].append("All documents passed quality checks - proceed with approval process")

        print(json.dumps(dashboard, indent=2))
```
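If the templating engine does not support concatenating two arrays inline (as the `rules` input of `check_legal_contract` attempts with `+`), an alternative is to merge the rule lists in a small Python step and pass that step's result to the checker. A minimal sketch, using hypothetical stand-ins for `{{steps.setup_qa_workflow.result.universal_rules}}` and `{{steps.setup_qa_workflow.result.legal_specific_rules}}`:

```python
import json

# Hypothetical stand-ins for the two rule lists from setup_qa_workflow.
universal_rules = [
    {"id": "spelling_grammar_001", "severity": "medium"},
    {"id": "formatting_consistency_002", "severity": "low"},
]
legal_specific_rules = [
    {"id": "legal_terminology_001", "severity": "high"},
    {"id": "clause_numbering_002", "severity": "high"},
]

# Merge the lists, deduplicating by id so no rule is checked twice;
# later entries win on id collisions.
merged = {r["id"]: r for r in universal_rules + legal_specific_rules}
combined_rules = list(merged.values())
print(json.dumps(combined_rules))
```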
Example: Data Quality Validation Pipeline
```yaml
- id: prepare_data_validation_rules
  name: prepare_data_validation_rules
  tool: INPUT_JSON
  input:
    - name: value
      value: [
        {
          "id": "data_completeness_001",
          "rule": "All required fields must contain non-empty values",
          "category": "completeness",
          "severity": "high",
          "description": "Check for missing or empty required data fields"
        },
        {
          "id": "data_format_002",
          "rule": "Email addresses must follow valid email format pattern",
          "category": "format_validation",
          "severity": "high",
          "description": "Validate email format using standard email regex patterns"
        },
        {
          "id": "date_range_003",
          "rule": "Dates must be within reasonable range (not future dates where inappropriate)",
          "category": "logical_validation",
          "severity": "medium",
          "description": "Check date fields for logical consistency and appropriate ranges"
        },
        {
          "id": "numeric_bounds_004",
          "rule": "Numeric values must be within expected ranges and contain no invalid characters",
          "category": "numeric_validation",
          "severity": "high",
          "description": "Validate numeric data for proper formatting and reasonable values"
        }
      ]
- id: validate_customer_data
  name: validate_customer_data
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "path/to/customer_data.csv"
    - name: task
      value: "Customer Data Quality Validation"
    - name: description
      value: "Validate customer data export for completeness, format compliance, and logical consistency before processing."
    - name: rules
      value: "{{steps.prepare_data_validation_rules.result}}"
    - name: references
      value: [
        "path/to/data_quality_standards.pdf",
        "path/to/customer_data_schema.json"
      ]
    - name: additional_data_schema
      value: {
        "customer_record": {
          "customer_id": "string",
          "first_name": "string",
          "last_name": "string",
          "email": "email",
          "phone": "phone",
          "registration_date": "date",
          "last_activity": "date",
          "account_status": "enum"
        },
        "validation_metrics": {
          "total_records": "number",
          "valid_records": "number",
          "invalid_records": "number",
          "completion_percentage": "number"
        }
      }
- id: generate_data_quality_report
  name: generate_data_quality_report
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from collections import Counter, defaultdict

        # Get validation results
        validation_results = {{steps.validate_customer_data.result}}

        # Analyze data quality metrics
        total_rules = len(validation_results)
        status_counts = Counter(result.get("status", "unknown") for result in validation_results)

        # Group issues by category and severity
        issues_by_category = defaultdict(list)
        critical_issues = []

        for result in validation_results:
            rule_text = result.get("rule", "")
            status = result.get("status")

            # Categorize based on rule content
            category = "general"
            if "completeness" in rule_text.lower():
                category = "data_completeness"
            elif "format" in rule_text.lower():
                category = "format_validation"
            elif "range" in rule_text.lower() or "date" in rule_text.lower():
                category = "logical_validation"
            elif "numeric" in rule_text.lower():
                category = "numeric_validation"

            issue_info = {
                "rule_id": result.get("uniqueId"),
                "status": status,
                "reason": result.get("reason"),
                "location": result.get("range", []),
                "additional_data": result.get("additionalData", {})
            }
            issues_by_category[category].append(issue_info)

            if status == "rejected":
                critical_issues.append({
                    "category": category,
                    "rule": result.get("rule"),
                    "reason": result.get("reason"),
                    "impact": "high" if "high" in rule_text.lower() else "medium"
                })

        # Calculate data quality score
        quality_score = round((status_counts.get("accepted", 0) / total_rules * 100), 2) if total_rules > 0 else 0

        # Generate comprehensive data quality report
        report = {
            "data_quality_summary": {
                "overall_quality_score": quality_score,
                "total_validation_rules": total_rules,
                "passed_validations": status_counts.get("accepted", 0),
                "failed_validations": status_counts.get("rejected", 0),
                "pending_validations": status_counts.get("pending", 0),
                "quality_grade": "A" if quality_score >= 90 else "B" if quality_score >= 80 else "C" if quality_score >= 70 else "F"
            },
            "category_breakdown": dict(issues_by_category),
            "critical_issues": critical_issues,
            "data_readiness": "READY" if quality_score >= 85 else "NEEDS_REVIEW" if quality_score >= 70 else "NOT_READY",
            "recommendations": [
                f"Data quality score: {quality_score}% - {'Excellent' if quality_score >= 90 else 'Good' if quality_score >= 80 else 'Needs Improvement'}",
                f"Address {len(critical_issues)} critical data quality issues before proceeding",
                "Implement data validation at source to prevent future quality issues",
                "Consider automated data cleaning for common format issues"
            ],
            "detailed_validation_results": validation_results
        }

        print(json.dumps(report, indent=2))
```
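The grading and readiness thresholds embedded in the report above can be factored into small helpers, which makes them easy to unit-test and tune. A sketch using the same cutoffs as the script (A ≥ 90, B ≥ 80, C ≥ 70, else F; READY ≥ 85, NEEDS_REVIEW ≥ 70, else NOT_READY):

```python
def quality_grade(score: float) -> str:
    # Same thresholds as the report script: A >= 90, B >= 80, C >= 70, else F.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "F"

def data_readiness(score: float) -> str:
    # READY >= 85, NEEDS_REVIEW >= 70, otherwise NOT_READY.
    if score >= 85:
        return "READY"
    if score >= 70:
        return "NEEDS_REVIEW"
    return "NOT_READY"

print(quality_grade(92.5), data_readiness(92.5))  # → A READY
print(quality_grade(75.0), data_readiness(75.0))  # → C NEEDS_REVIEW
```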
Tips and Best Practices
- Rule Definition: Create clear, specific rules with well-defined success criteria
- Reference Materials: Include relevant reference documents to improve validation accuracy
- Structured Results: Always process results for actionable insights and reporting
- Incremental Validation: Start with basic rules and gradually add complexity
- Category Organization: Group rules by category for better result analysis
- Custom Schemas: Define additional data schemas for specialized validation needs
- Error Handling: Implement proper handling for validation failures and edge cases
- Performance Optimization: Consider file size and complexity when setting up validation workflows
- Documentation: Maintain clear documentation of validation rules and their purposes
- Continuous Improvement: Regularly review and update validation rules based on results
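Following the Error Handling tip above, result-processing steps should tolerate empty or malformed checker output rather than crashing on a division by zero or an unexpected type. A minimal defensive sketch (the wrapper function and its return shape are illustrative, not part of the tool):

```python
import json

def summarize(check_results):
    """Defensively summarize checker output, tolerating empty or
    malformed input instead of raising."""
    if not isinstance(check_results, list) or not check_results:
        return {"status": "no_results", "compliance_score": 0}
    accepted = sum(
        1 for r in check_results
        if isinstance(r, dict) and r.get("status") == "accepted"
    )
    return {
        "status": "ok",
        "total": len(check_results),
        "compliance_score": round(accepted / len(check_results) * 100, 2),
    }

print(json.dumps(summarize([])))                        # empty input handled
print(json.dumps(summarize([{"status": "accepted"}])))  # normal case
```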