Overview

Jinba Modules provide data extraction, parsing, and validation tools. They use AI and machine-learning techniques to turn documents and other unstructured sources into structured data that matches a user-defined JSON schema, and to validate that data against JSON rules.

Key Features

JINBA_MODULES_EXTRACT

  • AI-powered data extraction from various sources
  • Configurable extraction modes (FAST, BALANCED, QUALITY)
  • User-defined JSON schema support
  • Intelligent content recognition and parsing
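
A minimal extraction step is sketched below. The parameter names (task_name, file_url, data_schema, extraction_mode) follow the full examples later on this page; the input reference and the schema contents are illustrative placeholders, not a documented default.

- id: quick_extract
  name: quick_extract
  tool: JINBA_MODULES_EXTRACT
  input:
    - name: task_name
      value: "Quick Extraction"
    - name: file_url
      value: "{{input.file_url}}"
    - name: data_schema
      value: |
        {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "date": {"type": "string", "format": "date"}
          },
          "required": ["title"]
        }
    - name: extraction_mode
      value: "BALANCED"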

JINBA_MODULES_PARSE

  • Advanced document and data parsing
  • Structure recognition and preservation
  • Multi-format support
  • Context-aware content interpretation
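
A minimal parsing step is sketched below. The input_data and parsing_options parameters, and the option flags shown, follow the batch-processing example later on this page; the upstream step reference is illustrative.

- id: quick_parse
  name: quick_parse
  tool: JINBA_MODULES_PARSE
  input:
    - name: input_data
      value: "{{steps.quick_extract.result.extracted_data}}"
    - name: parsing_options
      value: |
        {
          "preserve_structure": true,
          "normalize_dates": true
        }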

JINBA_MODULES_CHECKER_V2

  • Enhanced data validation using JSON rules
  • Complex rule engine with multiple validation types
  • Detailed validation reporting
  • Improved performance and accuracy
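
A minimal validation step is sketched below. The data_content and rules_json parameters and the rule shape follow the examples later on this page; the field name is illustrative.

- id: quick_check
  name: quick_check
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: data_content
      value: "{{steps.quick_parse.result.parsed_data}}"
    - name: rules_json
      value: |
        {
          "validation_rules": [
            {
              "field": "title",
              "type": "required",
              "error_message": "Title is required"
            }
          ]
        }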

Authentication

No authentication required for Jinba Modules tools.

Example: Intelligent Document Extraction

- id: upload_document
  name: upload_document
  tool: INPUT_FILE
  input:
    - name: description
      value: "Upload document for intelligent extraction"

- id: extract_structured_data
  name: extract_structured_data
  tool: JINBA_MODULES_EXTRACT
  input:
    - name: task_name
      value: "Invoice Data Extraction"
    - name: file_url
      value: "{{steps.upload_document.result.file_url}}"
    - name: data_schema
      value: |
        {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "invoice_number": {
              "type": "string",
              "description": "Invoice number or ID"
            },
            "date": {
              "type": "string",
              "format": "date",
              "description": "Invoice date"
            },
            "vendor": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "phone": {"type": "string"},
                "email": {"type": "string"}
              }
            },
            "items": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "description": {"type": "string"},
                  "quantity": {"type": "number"},
                  "unit_price": {"type": "number"},
                  "total": {"type": "number"}
                }
              }
            },
            "total_amount": {
              "type": "number",
              "description": "Total invoice amount"
            },
            "tax_amount": {
              "type": "number",
              "description": "Tax amount if present"
            }
          },
          "required": ["invoice_number", "date", "total_amount"]
        }
    - name: extraction_mode
      value: "QUALITY"  # Options: FAST, BALANCED, QUALITY

- id: validate_extracted_data
  name: validate_extracted_data
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: file_url
      value: "{{steps.extract_structured_data.result.file_url}}"
    - name: rules_json
      value: |
        {
          "validation_rules": [
            {
              "field": "invoice_number",
              "type": "required",
              "error_message": "Invoice number is required"
            },
            {
              "field": "total_amount",
              "type": "number",
              "min": 0,
              "error_message": "Total amount must be a positive number"
            },
            {
              "field": "date",
              "type": "date",
              "format": "YYYY-MM-DD",
              "error_message": "Date must be in valid format"
            },
            {
              "field": "vendor.email",
              "type": "email",
              "required": false,
              "error_message": "Email must be valid format if provided"
            }
          ]
        }

- id: process_extraction_results
  name: process_extraction_results
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        
        # Process extraction results
        extracted_data = json.loads('''{{steps.extract_structured_data.result.extracted_data}}''')
        validation_results = json.loads('''{{steps.validate_extracted_data.result.validation_results}}''')
        
        print("Document Extraction Results")
        print("=" * 35)
        
        # Display extracted data
        print("📄 Extracted Information:")
        print(f"Invoice Number: {extracted_data.get('invoice_number', 'N/A')}")
        print(f"Date: {extracted_data.get('date', 'N/A')}")
        print(f"Vendor: {extracted_data.get('vendor', {}).get('name', 'N/A')}")
        print(f"Total Amount: ${extracted_data.get('total_amount', 0):,.2f}")
        
        if 'items' in extracted_data:
            print(f"Items Count: {len(extracted_data['items'])}")
        
        print("\n🔍 Validation Results:")
        valid_count = sum(1 for r in validation_results if r.get('status') == 'valid')
        total_rules = len(validation_results)
        print(f"Valid: {valid_count}/{total_rules}")
        
        # Show any validation errors
        errors = [r for r in validation_results if r.get('status') == 'invalid']
        if errors:
            print("\n❌ Validation Errors:")
            for error in errors:
                print(f"  - {error.get('field', 'Unknown')}: {error.get('message', 'Unknown error')}")
        else:
            print("✅ All validations passed")

- id: export_processed_data
  name: export_processed_data
  tool: OUTPUT_FILE
  input:
    - name: content
      value: "{{steps.extract_structured_data.result.extracted_data}}"
    - name: filename
      value: "extracted_invoice_data_{{date | format('YYYY-MM-DD')}}.json"
    - name: fileType
      value: "json"

Example: Batch Document Processing

- id: setup_batch_processing
  name: setup_batch_processing
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        # Define batch processing configuration
        batch_config = {
            "document_types": ["invoice", "receipt", "contract"],
            "extraction_schema": {
                "common_fields": ["date", "amount", "vendor", "document_type"],
                "invoice_fields": ["invoice_number", "line_items", "tax_amount"],
                "receipt_fields": ["merchant", "payment_method", "receipt_number"],
                "contract_fields": ["parties", "terms", "effective_date", "expiration_date"]
            },
            "validation_rules": {
                "amount_validation": {"type": "number", "min": 0},
                "date_validation": {"type": "date", "format": "flexible"},
                "email_validation": {"type": "email", "required": false}
            }
        }
        
        print("Batch processing configured for document types:")
        for doc_type in batch_config["document_types"]:
            print(f"  - {doc_type.title()}")

- id: process_document_batch
  name: process_document_batch
  tool: JINBA_MODULES_EXTRACT
  input:
    - name: task_name
      value: "Batch Document Processing"
    - name: file_url
      value: "{{input.batch_file_url}}"
    - name: data_schema
      value: |
        {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "documents": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "document_type": {"type": "string"},
                  "date": {"type": "string"},
                  "amount": {"type": "number"},
                  "vendor": {"type": "string"},
                  "metadata": {
                    "type": "object",
                    "additionalProperties": true
                  }
                },
                "required": ["document_type", "date", "amount"]
              }
            }
          }
        }
    - name: extraction_mode
      value: "BALANCED"

- id: parse_complex_structures
  name: parse_complex_structures
  tool: JINBA_MODULES_PARSE
  input:
    - name: input_data
      value: "{{steps.process_document_batch.result.extracted_data}}"
    - name: parsing_options
      value: |
        {
          "preserve_structure": true,
          "normalize_dates": true,
          "standardize_amounts": true,
          "extract_entities": true,
          "group_by_type": true
        }

- id: comprehensive_validation
  name: comprehensive_validation
  tool: JINBA_MODULES_CHECKER_V2
  input:
    - name: data_content
      value: "{{steps.parse_complex_structures.result.parsed_data}}"
    - name: rules_json
      value: |
        {
          "validation_rules": [
            {
              "field": "documents[*].document_type",
              "type": "enum",
              "values": ["invoice", "receipt", "contract"],
              "error_message": "Document type must be invoice, receipt, or contract"
            },
            {
              "field": "documents[*].amount",
              "type": "number",
              "min": 0,
              "max": 1000000,
              "error_message": "Amount must be between 0 and 1,000,000"
            },
            {
              "field": "documents[*].date",
              "type": "date",
              "min_date": "2020-01-01",
              "max_date": "2025-12-31",
              "error_message": "Date must be between 2020 and 2025"
            },
            {
              "field": "documents[*].vendor",
              "type": "string",
              "min_length": 2,
              "max_length": 200,
              "error_message": "Vendor name must be 2-200 characters"
            }
          ],
          "summary_rules": [
            {
              "rule": "document_count_check",
              "expression": "documents.length > 0",
              "error_message": "At least one document must be processed"
            },
            {
              "rule": "total_amount_check", 
              "expression": "sum(documents[*].amount) > 0",
              "error_message": "Total amount must be greater than zero"
            }
          ]
        }

- id: generate_processing_report
  name: generate_processing_report
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        from datetime import datetime
        
        # Compile processing report
        extracted = json.loads('''{{steps.process_document_batch.result.extracted_data}}''')
        parsed = json.loads('''{{steps.parse_complex_structures.result.parsed_data}}''')
        validation = json.loads('''{{steps.comprehensive_validation.result.validation_results}}''')
        
        report = {
            "processing_summary": {
                "timestamp": datetime.now().isoformat(),
                "total_documents": len(extracted.get('documents', [])),
                "extraction_mode": "BALANCED",
                "validation_passed": all(r.get('status') == 'valid' for r in validation)
            },
            "document_breakdown": {},
            "validation_summary": {
                "total_rules": len(validation),
                "passed": sum(1 for r in validation if r.get('status') == 'valid'),
                "failed": sum(1 for r in validation if r.get('status') == 'invalid')
            },
            "recommendations": []
        }
        
        # Document type breakdown
        if 'documents' in extracted:
            doc_types = {}
            total_amount = 0
            for doc in extracted['documents']:
                doc_type = doc.get('document_type', 'unknown')
                doc_types[doc_type] = doc_types.get(doc_type, 0) + 1
                total_amount += doc.get('amount', 0)
            
            report['document_breakdown'] = doc_types
            report['processing_summary']['total_amount'] = total_amount
        
        # Add recommendations
        if report['validation_summary']['failed'] > 0:
            report['recommendations'].append("Review failed validations and correct data issues")
        
        if report['processing_summary']['total_documents'] > 100:
            report['recommendations'].append("Consider processing in smaller batches for better performance")
        
        print(json.dumps(report, indent=2))

- id: save_processing_report
  name: save_processing_report
  tool: OUTPUT_FILE
  input:
    - name: content
      value: "{{steps.generate_processing_report.result.stdout}}"
    - name: filename
      value: "batch_processing_report_{{date | format('YYYY-MM-DD-HHmm')}}.json"
    - name: fileType
      value: "json"

Extraction Modes

FAST Mode

  • Speed: Fastest processing
  • Accuracy: Good for simple documents
  • Use cases: High-volume, simple document processing
  • Processing time: ~1-3 seconds per document

BALANCED Mode

  • Speed: Moderate processing speed
  • Accuracy: High accuracy for most documents
  • Use cases: General-purpose document processing
  • Processing time: ~3-8 seconds per document

QUALITY Mode

  • Speed: Slower but thorough processing
  • Accuracy: Highest accuracy for complex documents
  • Use cases: Critical documents, complex layouts
  • Processing time: ~8-15 seconds per document

Data Schema Design

Basic Schema Structure

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "field_name": {
      "type": "string|number|object|array",
      "description": "Clear description of the field",
      "format": "date|email|uri|etc",
      "pattern": "regex_pattern_if_needed"
    }
  },
  "required": ["list_of_required_fields"]
}

Advanced Schema Features

  • Nested objects: Complex data structures
  • Arrays: Multiple items of the same type
  • Conditional fields: Fields dependent on other values
  • Pattern matching: Regex validation
  • Format validation: Date, email, URL formats
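
The schema below is a hedged sketch combining several of these features: a nested vendor object, an items array, a regex pattern for the invoice number, email and date format validation, and a draft-07 if/then conditional that requires payment_date whenever payment_status is "paid". The field names and pattern are illustrative, not a documented Jinba schema.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "pattern": "^INV-[0-9]{4,}$"
    },
    "date": {"type": "string", "format": "date"},
    "vendor": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
      },
      "required": ["name"]
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number", "minimum": 0}
        }
      }
    },
    "payment_status": {"type": "string", "enum": ["paid", "unpaid"]},
    "payment_date": {"type": "string", "format": "date"}
  },
  "required": ["invoice_number", "date"],
  "if": {
    "properties": {"payment_status": {"const": "paid"}},
    "required": ["payment_status"]
  },
  "then": {
    "required": ["payment_date"]
  }
}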

Validation Rules

Field-level Validation

  • Type checking: String, number, boolean, array, object
  • Range validation: Min/max values for numbers
  • Length validation: Min/max length for strings
  • Format validation: Email, date, URL patterns
  • Enum validation: Allowed values from a list
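
A hedged sketch of field-level rules, reusing the rules_json keys that appear in the examples above (type, min, max, min_length, max_length, values, error_message); confirm the full set of supported rule types against the tool reference before relying on others.

{
  "validation_rules": [
    {
      "field": "invoice_number",
      "type": "string",
      "min_length": 1,
      "max_length": 50,
      "error_message": "Invoice number must be 1-50 characters"
    },
    {
      "field": "total_amount",
      "type": "number",
      "min": 0,
      "max": 1000000,
      "error_message": "Total amount must be between 0 and 1,000,000"
    },
    {
      "field": "status",
      "type": "enum",
      "values": ["draft", "sent", "paid"],
      "error_message": "Status must be draft, sent, or paid"
    }
  ]
}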

Document-level Validation

  • Required fields: Mandatory data presence
  • Cross-field validation: Rules spanning multiple fields
  • Business logic: Custom validation rules
  • Consistency checks: Data coherence validation
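
Document-level checks can be expressed as summary_rules alongside field rules, as in the batch example above. The sketch below reuses that expression syntax; whether aggregate comparisons such as the first rule are supported is an assumption to verify against the tool reference.

{
  "validation_rules": [
    {
      "field": "tax_amount",
      "type": "number",
      "min": 0,
      "error_message": "Tax amount cannot be negative"
    }
  ],
  "summary_rules": [
    {
      "rule": "totals_consistent",
      "expression": "sum(items[*].total) <= total_amount",
      "error_message": "Line item totals must not exceed the invoice total"
    },
    {
      "rule": "at_least_one_item",
      "expression": "items.length > 0",
      "error_message": "An invoice must contain at least one line item"
    }
  ]
}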

Use Cases

  • Invoice Processing: Automated invoice data extraction and validation
  • Document Digitization: Convert paper documents to structured data
  • Data Migration: Extract data from legacy systems
  • Compliance Checking: Validate documents against regulations
  • Research Data: Extract structured data from research documents
  • Form Processing: Automate form data extraction
  • Contract Analysis: Extract key terms from contracts
  • Financial Processing: Process financial statements and reports

Best Practices

Schema Design

  • Keep schemas simple and focused
  • Use clear, descriptive field names
  • Include comprehensive descriptions
  • Test schemas with sample data (see the sketch after this list)
  • Version your schemas for consistency
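
One way to test a schema against sample data before wiring it into a workflow is sketched below. It assumes a Python environment with the jsonschema package installed; the schema and sample record are illustrative.

from jsonschema import Draft7Validator  # assumes the jsonschema package is available

# Illustrative schema and sample record; substitute your own.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number", "minimum": 0}
    },
    "required": ["invoice_number", "total_amount"]
}
sample = {"invoice_number": "INV-0001", "total_amount": 129.50}

# Report every violation rather than stopping at the first one.
validator = Draft7Validator(schema)
errors = sorted(validator.iter_errors(sample), key=lambda e: list(e.path))
if errors:
    for err in errors:
        path = ".".join(str(p) for p in err.path) or "<root>"
        print(f"{path}: {err.message}")
else:
    print("Sample record matches the schema")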

Extraction Optimization

  • Choose appropriate extraction mode for your use case
  • Provide high-quality input documents
  • Use consistent document formats when possible
  • Monitor extraction accuracy and adjust as needed

Validation Strategy

  • Implement layered validation (field → document → business)
  • Provide clear error messages
  • Log validation results for analysis
  • Continuously improve validation rules based on results

Performance Considerations

  • Batch similar documents together
  • Use FAST mode for simple, high-volume processing
  • Monitor processing times and adjust extraction modes
  • Implement error handling for failed extractions
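
A hedged sketch of the last point, written as a PYTHON_SANDBOX_RUN step in the style of the examples above: it treats an empty or unparseable extraction result as a failure instead of letting json.loads raise. The upstream step and field names are assumptions based on the invoice example.

- id: handle_extraction_errors
  name: handle_extraction_errors
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json

        raw = '''{{steps.extract_structured_data.result.extracted_data}}'''

        # Treat empty or malformed output as a failed extraction
        try:
            data = json.loads(raw) if raw.strip() else None
        except json.JSONDecodeError as exc:
            data = None
            print(f"Extraction output could not be parsed: {exc}")

        if not data:
            print("Extraction failed - re-run in QUALITY mode or review the source document")
        else:
            print(f"Extraction succeeded with {len(data)} top-level fields")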