Overview

LlamaCloud tools parse documents into markdown and answer queries over them. They build on LlamaCloud’s document understanding and retrieval-augmented generation (RAG) features, and are intended for workflows that process or search larger document sets.

Key Features

  • LLAMA_CLOUD_PARSE
    • Parse various file formats to markdown using advanced extraction
  • LLAMA_CLOUD_QUERY
    • Query documents with intelligent retrieval

Authentication

To use LlamaCloud tools, you need:
  1. A LlamaCloud API key
  2. At least one LlamaCloud project with documents uploaded to it, which serves as your knowledge base
Note: Treat API keys as sensitive information and never commit them to public repositories.
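For scripts run outside the workflow, one way to follow the note above is to read the key from an environment variable instead of writing it into the file. This is a minimal sketch: the helper names `load_api_key` and `mask` are hypothetical, and the variable name `LLAMA_CLOUD_API_KEY` is an assumption that mirrors the secret name used in the examples on this page.

```python
import os


def load_api_key(env_var: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def mask(key: str) -> str:
    """Hide all but the last four characters so the key can be logged safely."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Logging `mask(load_api_key())` rather than the key itself keeps the full value out of build logs and shared terminals.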

Example: Document Processing Pipeline

- id: parse_document
  name: parse_document
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "https://example.com/technical_manual.pdf"
    - name: parsing_instruction
      value: |
        Extract and preserve:
        1. Technical specifications and parameters
        2. Step-by-step procedures
        3. Warning and safety information
        4. Diagrams and figure descriptions
        5. Reference tables and data

- id: process_parsed_content
  name: process_parsed_content
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
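        # Get parsed content. This assumes the template value is inserted as raw
        # text, so it is wrapped in a triple-quoted string to survive newlines.
        markdown_content = """{{steps.parse_document.result.markdown}}"""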
        
        # Extract sections
        sections = {}
        current_section = "introduction"
        current_content = []
        
        lines = markdown_content.split('\n')
        for line in lines:
            if line.startswith(('# ', '## ')):
                # Save previous section
                if current_content:
                    sections[current_section] = '\n'.join(current_content)
                
                # Start new section; lstrip keeps any trailing '#' in the title
                current_section = line.lstrip('#').strip().lower().replace(' ', '_')
                current_content = []
            else:
                current_content.append(line)
        
        # Save last section
        if current_content:
            sections[current_section] = '\n'.join(current_content)
        
        # Extract key information
        specs = []
        procedures = []
        warnings = []
        
        for section_name, content in sections.items():
            if 'spec' in section_name or 'parameter' in section_name:
                specs.append(content)
            elif 'procedure' in section_name or 'step' in section_name:
                procedures.append(content)
            elif 'warning' in section_name or 'safety' in section_name:
                warnings.append(content)
        
        result = {
            "sections": sections,
            "specifications": specs,
            "procedures": procedures,
            "warnings": warnings,
            "total_sections": len(sections)
        }
        
        print(json.dumps(result))
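
The section-splitting logic in the script above can be tried locally before it goes into a workflow. The sketch below repeats the same approach as a standalone function; the name `split_sections` and the sample text are illustrative only and are not part of any Jinba or LlamaCloud API.

```python
def split_sections(markdown: str) -> dict[str, str]:
    """Group markdown lines under their nearest level-1 or level-2 heading."""
    sections: dict[str, str] = {}
    name, body = "introduction", []
    for line in markdown.split("\n"):
        if line.startswith(("# ", "## ")):
            # Close the section collected so far, if it has any lines.
            if body:
                sections[name] = "\n".join(body)
            # Normalize the heading text into a snake_case key.
            name = line.lstrip("#").strip().lower().replace(" ", "_")
            body = []
        else:
            body.append(line)
    if body:
        sections[name] = "\n".join(body)
    return sections


sample = "Intro text\n# Safety Warnings\nWear gloves.\n## Technical Specs\n12 V DC"
sections = split_sections(sample)
print(sorted(sections))
```

Text that appears before the first heading is stored under `introduction`, and a heading with no lines beneath it produces no entry, which matches the behavior of the workflow script.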

Example: Multi-Document Analysis

- id: parse_multiple_documents
  name: parse_multiple_documents
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        # Define documents to process
        documents = [
            {
                "url": "https://example.com/user_manual.pdf",
                "type": "user_manual",
                "instruction": "Focus on user instructions and troubleshooting"
            },
            {
                "url": "https://example.com/technical_spec.pdf", 
                "type": "technical_spec",
                "instruction": "Extract all technical parameters and specifications"
            },
            {
                "url": "https://example.com/installation_guide.pdf",
                "type": "installation",
                "instruction": "Capture installation steps and requirements"
            }
        ]
        
        print(json.dumps({"documents": documents}))

- id: parse_user_manual
  name: parse_user_manual
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[0].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[0].instruction}}"

- id: parse_technical_spec
  name: parse_technical_spec
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[1].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[1].instruction}}"

- id: create_unified_documentation
  name: create_unified_documentation
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Create a unified documentation guide by combining information from these sources:
        
        User Manual Content:
        {{steps.parse_user_manual.result.markdown}}
        
        Technical Specifications:
        {{steps.parse_technical_spec.result.markdown}}
        
        Create a single, well-organized document with:
        1. Executive Summary
        2. Installation Guide
        3. Configuration Reference  
        4. User Operation Guide
        5. Troubleshooting Section
        6. Technical Specifications Appendix
        
        Ensure consistency and eliminate redundancy.
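
In the example above, the installation guide is listed in `documents` but has no parse step of its own, and each parse step is written out by hand. One possible way to keep the two in sync is to generate the step definitions at authoring time. This is a sketch only: `build_parse_steps` is a hypothetical helper, and it emits JSON on the assumption that the result is then copied into the workflow definition by whoever maintains it.

```python
import json


def build_parse_steps(documents, project_id="your_project_id"):
    """Return one LLAMA_CLOUD_PARSE step definition per document."""
    steps = []
    for doc in documents:
        step_id = f"parse_{doc['type']}"
        steps.append({
            "id": step_id,
            "name": step_id,
            "tool": "LLAMA_CLOUD_PARSE",
            "config": [
                {"name": "api_key", "value": "{{secrets.LLAMA_CLOUD_API_KEY}}"},
                {"name": "project_id", "value": project_id},
            ],
            "input": [
                {"name": "file_url", "value": doc["url"]},
                {"name": "parsing_instruction", "value": doc["instruction"]},
            ],
        })
    return steps


documents = [
    {"url": "https://example.com/user_manual.pdf", "type": "user_manual",
     "instruction": "Focus on user instructions and troubleshooting"},
    {"url": "https://example.com/technical_spec.pdf", "type": "technical_spec",
     "instruction": "Extract all technical parameters and specifications"},
    {"url": "https://example.com/installation_guide.pdf", "type": "installation",
     "instruction": "Capture installation steps and requirements"},
]
steps = build_parse_steps(documents)
print(json.dumps(steps, indent=2))
```

Deriving step ids from the `type` field keeps references such as `steps.parse_user_manual.result.markdown` predictable when documents are added or removed.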

Example: RAG-Powered Q&A System

- id: setup_qa_system
  name: setup_qa_system
  tool: INPUT_JSON_WITH_VALIDATION
  input:
    - name: value
      value: {
        "questions": [
          "What are the minimum system requirements?",
          "How do I configure network settings?",
          "What should I do if the system fails to start?",
          "How often should I perform maintenance?",
          "What are the warranty terms?"
        ]
      }

- id: answer_questions
  name: answer_questions
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        questions = {{steps.setup_qa_system.result.questions}}
        qa_pairs = []
        
        for i, question in enumerate(questions):
            qa_pairs.append({
                "id": f"q_{i+1}",
                "question": question,
                "status": "pending"
            })
        
        print(json.dumps({"qa_pairs": qa_pairs}))

- id: get_answer_1
  name: get_answer_1
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[0].question}}"
    - name: include_sources
      value: true

- id: get_answer_2
  name: get_answer_2
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[1].question}}"
    - name: include_sources
      value: true

- id: compile_faq
  name: compile_faq
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        # Compile all Q&A pairs. Answers are often multi-line, so the answer
        # fields use triple-quoted strings (this assumes template values are
        # inserted as raw text).
        faq_items = [
            {
                "question": "{{steps.answer_questions.result.qa_pairs[0].question}}",
                "answer": """{{steps.get_answer_1.result.response}}""",
                "sources": "{{steps.get_answer_1.result.sources}}"
            },
            {
                "question": "{{steps.answer_questions.result.qa_pairs[1].question}}",
                "answer": """{{steps.get_answer_2.result.response}}""",
                "sources": "{{steps.get_answer_2.result.sources}}"
            }
        ]
        
        # Format as FAQ document
        faq_markdown = "# Frequently Asked Questions\n\n"
        
        for i, item in enumerate(faq_items, 1):
            faq_markdown += f"## {i}. {item['question']}\n\n"
            faq_markdown += f"{item['answer']}\n\n"
            if item['sources']:
                faq_markdown += f"*Sources: {item['sources']}*\n\n"
            faq_markdown += "---\n\n"
        
        print(json.dumps({"faq_document": faq_markdown}))
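
The FAQ formatting in `compile_faq` can be checked on its own with sample data. The sketch below is a hypothetical helper, `format_faq`, that produces the same layout and also accepts `sources` as a list. Whether LLAMA_CLOUD_QUERY returns sources as a string or a list is not stated on this page, so handling both is an assumption.

```python
def format_faq(items):
    """Render question/answer dicts as a markdown FAQ."""
    parts = ["# Frequently Asked Questions\n"]
    for number, item in enumerate(items, 1):
        parts.append(f"## {number}. {item['question']}\n")
        parts.append(f"{item['answer']}\n")
        sources = item.get("sources")
        if sources:
            # Accept either a single string or a list of source names.
            if not isinstance(sources, str):
                sources = ", ".join(str(s) for s in sources)
            parts.append(f"*Sources: {sources}*\n")
        parts.append("---\n")
    return "\n".join(parts)


# Sample data for illustration only; the file names are made up.
faq = format_faq([
    {"question": "Q1?", "answer": "A1", "sources": ["manual.pdf", "spec.pdf"]},
    {"question": "Q2?", "answer": "A2", "sources": ""},
])
print(faq)
```

Items with empty sources simply omit the sources line, so the document stays readable when a query returns no references.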

Tips and Best Practices

  • Use specific parsing instructions for better content extraction
  • Organize documents into logical projects for efficient searching
  • Leverage LlamaIndex integration for advanced retrieval patterns
  • Include source references for transparency and verification
  • Process documents in batches for large-scale operations
  • Use appropriate retrieval modes based on query types
  • Monitor API usage for cost optimization
  • Create structured indexes for frequently accessed information
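
The batching tip can be sketched as a small helper that splits a list of document URLs into fixed-size groups, so each group can be submitted and monitored separately. The function name `batched_urls`, the batch size, and the URLs are illustrative assumptions. No LlamaCloud rate limit or batch API is implied.

```python
from typing import Iterator, Sequence


def batched_urls(urls: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


urls = [f"https://example.com/doc_{n}.pdf" for n in range(5)]
batches = list(batched_urls(urls, 2))
for number, batch in enumerate(batches, 1):
    print(f"batch {number}: {len(batch)} document(s)")
```

Smaller batches make it easier to retry a failed group and to track API usage per run.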