Overview

Jinba Modules Checker V2 is an advanced validation tool that checks files and data against custom rules defined in JSON format. This tool is designed for comprehensive document validation, compliance checking, and data quality assurance with enhanced capabilities over the original checker.

Key Features

  • Rule-Based Validation: Define custom validation rules in JSON format
  • Multi-Format Support: Check PDF, text, JSON, XML, CSV, and DOCX files
  • Reference Integration: Include company regulations, legal documents, and other reference materials
  • Detailed Results: Get structured results with status, ranges, and detailed explanations
  • Custom Data Schema: Define additional data schemas for specialized validation
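Rules are plain JSON objects. The sketch below shows the rule shape used throughout the examples in this guide, plus a small hypothetical pre-flight check; the field names (`id`, `rule`, `category`, `severity`, `description`) are taken from the examples below, and the checker itself may accept additional fields.

```python
import json

# Example rule using the field names seen in this guide's examples.
rule = {
    "id": "currency_label_001",
    "rule": "All financial figures must be clearly labeled with currency denomination",
    "category": "financial_reporting",
    "severity": "high",
    "description": "Check that monetary values include currency symbols or abbreviations",
}

REQUIRED_FIELDS = {"id", "rule", "category", "severity"}

def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems with a rule definition (empty if none)."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - rule.keys())]
    if rule.get("severity") not in {"high", "medium", "low"}:
        problems.append(f"unexpected severity: {rule.get('severity')!r}")
    return problems

print(validate_rule(rule))  # → []
print(json.dumps(rule, indent=2))
```

Checking rule definitions like this before a run catches typos in severity levels or missing IDs early, before they surface as confusing checker output.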

Configuration

This tool requires no configuration parameters.

Example: Document Compliance Checking

- id: prepare_compliance_rules
  name: prepare_compliance_rules
  tool: INPUT_JSON
  input:
    - name: value
      value: [
        {
          "id": "financial_disclosure_001",
          "rule": "All financial figures must be clearly labeled with currency denomination",
          "category": "financial_reporting",
          "severity": "high",
          "description": "Check that monetary values include proper currency symbols or abbreviations"
        },
        {
          "id": "date_format_002", 
          "rule": "All dates must follow ISO 8601 format (YYYY-MM-DD) or clearly stated format",
          "category": "data_formatting",
          "severity": "medium",
          "description": "Ensure consistent date formatting throughout the document"
        },
        {
          "id": "signature_requirement_003",
          "rule": "Document must contain authorized signature or digital signature verification",
          "category": "authorization",
          "severity": "high",
          "description": "Verify proper authorization signatures are present"
        },
        {
          "id": "contact_information_004",
          "rule": "Contact information must include valid email format and phone number format",
          "category": "contact_validation",
          "severity": "medium", 
          "description": "Validate proper formatting of contact details"
        }
      ]

- id: check_financial_report
  name: check_financial_report
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "path/to/financial_report.pdf"
    - name: task
      value: "Financial Report Compliance Review"
    - name: description
      value: "Comprehensive compliance check for quarterly financial report including regulatory requirements, data formatting standards, and authorization requirements."
    - name: rules
      value: "{{steps.prepare_compliance_rules.result}}"
    - name: references
      value: [
        "path/to/company_financial_policies.pdf",
        "path/to/regulatory_guidelines.pdf",
        "path/to/previous_approved_reports.pdf"
      ]
    - name: additional_data_schema
      value: {
        "financial_metrics": {
          "revenue": "number",
          "expenses": "number",
          "net_income": "number",
          "currency": "string"
        },
        "reporting_period": {
          "start_date": "date",
          "end_date": "date",
          "quarter": "string",
          "fiscal_year": "number"
        }
      }
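The processing step that follows assumes each checker result item carries `status`, `uniqueId`, `rule`, `reason`, `range`, and `additionalData` keys. The exact schema comes from the checker's output; the sample below is a hypothetical illustration of that assumed shape, with a small helper for pulling out rejections.

```python
# Hypothetical sample of per-rule result items, mirroring the keys the
# downstream processing script reads. The checker's real output defines
# the authoritative schema.
sample_results = [
    {
        "uniqueId": "financial_disclosure_001",
        "rule": "All financial figures must be clearly labeled with currency denomination",
        "status": "accepted",
        "reason": "All monetary values carry an explicit USD label",
        "range": [12, 14],
        "additionalData": {},
    },
    {
        "uniqueId": "signature_requirement_003",
        "rule": "Document must contain authorized signature or digital signature verification",
        "status": "rejected",
        "reason": "No signature block found on the final page",
        "range": [],
        "additionalData": {},
    },
]

def rejected_ids(results):
    """IDs of rules the checker rejected."""
    return [r["uniqueId"] for r in results if r.get("status") == "rejected"]

print(rejected_ids(sample_results))  # → ['signature_requirement_003']
```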

- id: process_check_results
  name: process_check_results
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from collections import defaultdict
        
        # Get check results
        check_results = {{steps.check_financial_report.result}}
        
        # Analyze results by status and category
        status_summary = defaultdict(int)
        category_summary = defaultdict(list)
        critical_issues = []
        
        for result in check_results:
            status = result.get("status", "unknown")
            status_summary[status] += 1
            
            # Extract category from rule if possible
            rule_text = result.get("rule", "")
            category = "general"
            if "financial" in rule_text.lower():
                category = "financial"
            elif "date" in rule_text.lower():
                category = "formatting"
            elif "signature" in rule_text.lower():
                category = "authorization"
            elif "contact" in rule_text.lower():
                category = "contact"
                
            category_summary[category].append({
                "rule_id": result.get("uniqueId"),
                "status": status,
                "reason": result.get("reason"),
                "range": result.get("range", [])
            })
            
            # Track critical issues (rejected items)
            if status == "rejected":
                critical_issues.append({
                    "rule_id": result.get("uniqueId"),
                    "rule": result.get("rule"),
                    "reason": result.get("reason"),
                    "location": result.get("range", []),
                    "additional_data": result.get("additionalData", {})
                })
        
        # Generate comprehensive report
        report = {
            "validation_summary": {
                "total_rules_checked": len(check_results),
                "accepted": status_summary.get("accepted", 0),
                "rejected": status_summary.get("rejected", 0),
                "pending": status_summary.get("pending", 0),
                "compliance_score": round((status_summary.get("accepted", 0) / len(check_results)) * 100, 2) if check_results else 0
            },
            "category_breakdown": dict(category_summary),
            "critical_issues": critical_issues,
            "recommendations": [],
            "detailed_results": check_results
        }
        
        # Generate recommendations based on critical issues
        if critical_issues:
            report["recommendations"].extend([
                "Review and address all rejected validation rules",
                "Implement document review process for critical compliance items",
                "Consider template updates to prevent recurring issues"
            ])
        else:
            report["recommendations"].append("Document passes all validation checks")
        
        print(json.dumps(report, indent=2))

Example: Multi-Document Quality Assurance

- id: setup_qa_workflow
  name: setup_qa_workflow
  tool: INPUT_JSON_WITH_VALIDATION
  input:
    - name: value
      value: {
        "documents": [
          {
            "file": "path/to/contract_draft.docx",
            "type": "legal_contract",
            "priority": "high"
          },
          {
            "file": "path/to/technical_spec.pdf", 
            "type": "technical_documentation",
            "priority": "medium"
          },
          {
            "file": "path/to/user_manual.pdf",
            "type": "user_documentation", 
            "priority": "low"
          }
        ],
        "universal_rules": [
          {
            "id": "spelling_grammar_001",
            "rule": "Document must be free of spelling and grammatical errors",
            "category": "quality_assurance",
            "severity": "medium"
          },
          {
            "id": "formatting_consistency_002",
            "rule": "Headers, fonts, and formatting must be consistent throughout",
            "category": "formatting",
            "severity": "low"
          }
        ],
        "legal_specific_rules": [
          {
            "id": "legal_terminology_001", 
            "rule": "Legal terminology must be precise and properly defined",
            "category": "legal_compliance",
            "severity": "high"
          },
          {
            "id": "clause_numbering_002",
            "rule": "Contract clauses must be properly numbered and referenced",
            "category": "legal_structure",
            "severity": "high"
          }
        ]
      }

- id: check_legal_contract
  name: check_legal_contract
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "{{steps.setup_qa_workflow.result.documents[0].file}}"
    - name: task
      value: "Legal Contract Quality Assurance"
    - name: description
      value: "Comprehensive quality check for legal contract including terminology, structure, and compliance requirements."
    - name: rules
      value: "{{steps.setup_qa_workflow.result.universal_rules}} + {{steps.setup_qa_workflow.result.legal_specific_rules}}"
    - name: references
      value: [
        "path/to/legal_style_guide.pdf",
        "path/to/contract_templates.pdf", 
        "path/to/legal_terminology_reference.pdf"
      ]
    - name: additional_data_schema
      value: {
        "contract_parties": {
          "party_a": "string",
          "party_b": "string",
          "party_roles": "array"
        },
        "key_terms": {
          "effective_date": "date",
          "termination_date": "date", 
          "renewal_terms": "string",
          "payment_terms": "object"
        }
      }
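The `rules` input above joins two template expressions with `+`; whether that produces a merged array depends on the workflow's template engine. If it resolves to a literal string rather than a list, a safer pattern is to merge the rule lists in a small preprocessing step and pass the combined array on. A minimal sketch, using abbreviated stand-ins for the rule objects defined earlier:

```python
# Merge universal and document-specific rules before the check step.
# Abbreviated rule objects for illustration; real rules carry the full
# id/rule/category/severity/description fields.
universal_rules = [
    {"id": "spelling_grammar_001", "severity": "medium"},
    {"id": "formatting_consistency_002", "severity": "low"},
]
legal_specific_rules = [
    {"id": "legal_terminology_001", "severity": "high"},
    {"id": "clause_numbering_002", "severity": "high"},
]

combined = universal_rules + legal_specific_rules

# Guard against duplicate rule IDs before handing the list on
ids = [r["id"] for r in combined]
assert len(ids) == len(set(ids)), "duplicate rule IDs in combined rule set"

print(len(combined))  # → 4
```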

- id: check_technical_spec  
  name: check_technical_spec
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "{{steps.setup_qa_workflow.result.documents[1].file}}"
    - name: task
      value: "Technical Documentation Review"
    - name: description
      value: "Quality assurance check for technical specifications focusing on clarity, completeness, and technical accuracy."
    - name: rules
      value: "{{steps.setup_qa_workflow.result.universal_rules}}"
    - name: references
      value: [
        "path/to/technical_writing_standards.pdf",
        "path/to/industry_specifications.pdf"
      ]

- id: generate_qa_dashboard
  name: generate_qa_dashboard  
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from datetime import datetime
        
        # Get results from both checks
        legal_results = {{steps.check_legal_contract.result}}
        technical_results = {{steps.check_technical_spec.result}}
        
        def analyze_document_results(results, doc_type, priority):
            total_checks = len(results)
            passed = len([r for r in results if r.get("status") == "accepted"])
            failed = len([r for r in results if r.get("status") == "rejected"]) 
            pending = len([r for r in results if r.get("status") == "pending"])
            
            return {
                "document_type": doc_type,
                "priority": priority,
                "total_checks": total_checks,
                "passed": passed,
                "failed": failed,
                "pending": pending,
                "success_rate": round((passed / total_checks * 100), 2) if total_checks > 0 else 0,
                "critical_issues": [r for r in results if r.get("status") == "rejected"]
            }
        
        # Analyze each document
        legal_analysis = analyze_document_results(legal_results, "Legal Contract", "high")
        technical_analysis = analyze_document_results(technical_results, "Technical Specification", "medium")
        
        # Create QA dashboard
        dashboard = {
            "qa_report_generated": datetime.now().isoformat(),
            "overall_summary": {
                "documents_reviewed": 2,
                "high_priority_issues": len(legal_analysis["critical_issues"]),
                "medium_priority_issues": len(technical_analysis["critical_issues"]),
                "average_success_rate": round((legal_analysis["success_rate"] + technical_analysis["success_rate"]) / 2, 2)
            },
            "document_analysis": [legal_analysis, technical_analysis],
            "next_actions": [],
            "quality_recommendations": [
                "Prioritize resolution of high-priority document issues",
                "Implement automated pre-checks for common formatting issues",
                "Create document templates incorporating identified best practices",
                "Schedule regular QA reviews for critical documents"
            ]
        }
        
        # Generate next actions based on results
        if legal_analysis["failed"] > 0:
            dashboard["next_actions"].append("Review and revise legal contract based on failed checks")
        if technical_analysis["failed"] > 0:
            dashboard["next_actions"].append("Update technical specification to address quality issues")
            
        if not dashboard["next_actions"]:
            dashboard["next_actions"].append("All documents passed quality checks - proceed with approval process")
        
        print(json.dumps(dashboard, indent=2))

Example: Data Quality Validation Pipeline

- id: prepare_data_validation_rules
  name: prepare_data_validation_rules
  tool: INPUT_JSON
  input:
    - name: value
      value: [
        {
          "id": "data_completeness_001",
          "rule": "All required fields must contain non-empty values",
          "category": "completeness",
          "severity": "high",
          "description": "Check for missing or empty required data fields"
        },
        {
          "id": "data_format_002",
          "rule": "Email addresses must follow valid email format pattern",
          "category": "format_validation",
          "severity": "high", 
          "description": "Validate email format using standard email regex patterns"
        },
        {
          "id": "date_range_003",
          "rule": "Dates must be within reasonable range (not future dates where inappropriate)",
          "category": "logical_validation",
          "severity": "medium",
          "description": "Check date fields for logical consistency and appropriate ranges"
        },
        {
          "id": "numeric_bounds_004",
          "rule": "Numeric values must be within expected ranges and contain no invalid characters",
          "category": "numeric_validation",
          "severity": "high",
          "description": "Validate numeric data for proper formatting and reasonable values"
        }
      ]

- id: validate_customer_data
  name: validate_customer_data
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: target_file
      value: "path/to/customer_data.csv"
    - name: task
      value: "Customer Data Quality Validation"
    - name: description
      value: "Validate customer data export for completeness, format compliance, and logical consistency before processing."
    - name: rules
      value: "{{steps.prepare_data_validation_rules.result}}"
    - name: references
      value: [
        "path/to/data_quality_standards.pdf",
        "path/to/customer_data_schema.json"
      ]
    - name: additional_data_schema
      value: {
        "customer_record": {
          "customer_id": "string",
          "first_name": "string", 
          "last_name": "string",
          "email": "email",
          "phone": "phone",
          "registration_date": "date",
          "last_activity": "date",
          "account_status": "enum"
        },
        "validation_metrics": {
          "total_records": "number",
          "valid_records": "number",
          "invalid_records": "number",
          "completion_percentage": "number"
        }
      }

- id: generate_data_quality_report
  name: generate_data_quality_report
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        from collections import Counter, defaultdict
        
        # Get validation results
        validation_results = {{steps.validate_customer_data.result}}
        
        # Analyze data quality metrics
        total_rules = len(validation_results)
        status_counts = Counter(result.get("status", "unknown") for result in validation_results)
        
        # Group issues by category and severity
        issues_by_category = defaultdict(list)
        critical_issues = []
        
        for result in validation_results:
            rule_text = result.get("rule", "")
            status = result.get("status")
            
            # Categorize based on rule content
            category = "general"
            if "completeness" in rule_text.lower():
                category = "data_completeness"
            elif "format" in rule_text.lower():
                category = "format_validation"  
            elif "range" in rule_text.lower() or "date" in rule_text.lower():
                category = "logical_validation"
            elif "numeric" in rule_text.lower():
                category = "numeric_validation"
            
            issue_info = {
                "rule_id": result.get("uniqueId"),
                "status": status,
                "reason": result.get("reason"),
                "location": result.get("range", []),
                "additional_data": result.get("additionalData", {})
            }
            
            issues_by_category[category].append(issue_info)
            
            if status == "rejected":
                critical_issues.append({
                    "category": category,
                    "rule": result.get("rule"),
                    "reason": result.get("reason"),
                    "impact": "high" if "high" in rule_text.lower() else "medium"
                })
        
        # Calculate data quality score
        quality_score = round((status_counts.get("accepted", 0) / total_rules * 100), 2) if total_rules > 0 else 0
        
        # Generate comprehensive data quality report
        report = {
            "data_quality_summary": {
                "overall_quality_score": quality_score,
                "total_validation_rules": total_rules,
                "passed_validations": status_counts.get("accepted", 0),
                "failed_validations": status_counts.get("rejected", 0),
                "pending_validations": status_counts.get("pending", 0),
                "quality_grade": "A" if quality_score >= 90 else "B" if quality_score >= 80 else "C" if quality_score >= 70 else "F"
            },
            "category_breakdown": dict(issues_by_category),
            "critical_issues": critical_issues,
            "data_readiness": "READY" if quality_score >= 85 else "NEEDS_REVIEW" if quality_score >= 70 else "NOT_READY",
            "recommendations": [
                f"Data quality score: {quality_score}% - {'Excellent' if quality_score >= 90 else 'Good' if quality_score >= 80 else 'Needs Improvement'}",
                f"Address {len(critical_issues)} critical data quality issues before proceeding",
                "Implement data validation at source to prevent future quality issues",
                "Consider automated data cleaning for common format issues"
            ],
            "detailed_validation_results": validation_results
        }
        
        print(json.dumps(report, indent=2))

Tips and Best Practices

  • Rule Definition: Create clear, specific rules with well-defined success criteria
  • Reference Materials: Include relevant reference documents to improve validation accuracy
  • Structured Results: Always process results for actionable insights and reporting
  • Incremental Validation: Start with basic rules and gradually add complexity
  • Category Organization: Group rules by category for better result analysis
  • Custom Schemas: Define additional data schemas for specialized validation needs
  • Error Handling: Implement proper handling for validation failures and edge cases
  • Performance Optimization: Consider file size and complexity when setting up validation workflows
  • Documentation: Maintain clear documentation of validation rules and their purposes
  • Continuous Improvement: Regularly review and update validation rules based on results
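One way to apply the Error Handling tip above: treat missing or malformed result entries as their own bucket instead of letting a reporting step crash on an unexpected shape. A minimal defensive sketch:

```python
def safe_summary(results) -> dict:
    """Summarize checker results, tolerating malformed entries."""
    summary = {"accepted": 0, "rejected": 0, "pending": 0, "malformed": 0}
    for item in results or []:
        # Non-dict entries and unknown statuses are counted as malformed
        status = item.get("status") if isinstance(item, dict) else None
        if status in summary:
            summary[status] += 1
        else:
            summary["malformed"] += 1
    return summary

print(safe_summary([{"status": "accepted"}, {"status": "oops"}, None]))
# → {'accepted': 1, 'rejected': 0, 'pending': 0, 'malformed': 2}
```

A high malformed count is itself a useful signal that the checker's output schema has drifted from what the pipeline expects.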