Overview

LlamaCloud tools provide document processing, parsing, and intelligent search capabilities. They build on LlamaCloud's document understanding and retrieval-augmented generation (RAG) features to support complex document workflows.

Key Features

  • LLAMA_CLOUD_PARSE
    • Parse various file formats to markdown using advanced extraction
  • LLAMA_CLOUD_QUERY
    • Query documents with intelligent retrieval

Authentication

To use LlamaCloud tools, you need:
  1. A LlamaCloud API key from the LlamaCloud dashboard
  2. A LlamaCloud project with your documents uploaded, to serve as your knowledge base
Note: Treat API keys as sensitive information and never commit them to public repositories.
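Outside a workflow run, the same key is usually supplied through an environment variable rather than hard-coded. A minimal sketch (the variable name `LLAMA_CLOUD_API_KEY` matches the secret referenced in the examples below):

```python
import os

def get_llama_cloud_api_key() -> str:
    """Read the LlamaCloud API key from the environment, failing loudly if unset."""
    key = os.environ.get("LLAMA_CLOUD_API_KEY")
    if not key:
        raise RuntimeError("LLAMA_CLOUD_API_KEY is not set; export it before running.")
    return key
```

Failing fast on a missing key gives a clearer error than a rejected API call later in the pipeline.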

Example: Document Processing Pipeline

- id: parse_document
  name: parse_document
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "https://example.com/technical_manual.pdf"
    - name: parsing_instruction
      value: |
        Extract and preserve:
        1. Technical specifications and parameters
        2. Step-by-step procedures
        3. Warning and safety information
        4. Diagrams and figure descriptions
        5. Reference tables and data

- id: process_parsed_content
  name: process_parsed_content
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        import re
        
        # Get parsed content (triple-quoted so the interpolated markdown
        # renders as a Python string literal)
        markdown_content = """{{steps.parse_document.result.markdown}}"""
        
        # Extract sections
        sections = {}
        current_section = "introduction"
        current_content = []
        
        lines = markdown_content.split('\n')
        for line in lines:
            if line.startswith('# ') or line.startswith('## '):
                # Save previous section
                if current_content:
                    sections[current_section] = '\n'.join(current_content)
                
                # Start new section
                current_section = line.strip('#').strip().lower().replace(' ', '_')
                current_content = []
            else:
                current_content.append(line)
        
        # Save last section
        if current_content:
            sections[current_section] = '\n'.join(current_content)
        
        # Extract key information
        specs = []
        procedures = []
        warnings = []
        
        for section_name, content in sections.items():
            if 'spec' in section_name or 'parameter' in section_name:
                specs.append(content)
            elif 'procedure' in section_name or 'step' in section_name:
                procedures.append(content)
            elif 'warning' in section_name or 'safety' in section_name:
                warnings.append(content)
        
        result = {
            "sections": sections,
            "specifications": specs,
            "procedures": procedures,
            "warnings": warnings,
            "total_sections": len(sections)
        }
        
        print(json.dumps(result))
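The section-splitting logic in the script above can be exercised locally before wiring it into a workflow. This sketch wraps it in a function and runs it on a small sample string (the sample content is illustrative, not real LlamaParse output):

```python
def split_sections(markdown_content: str) -> dict:
    """Split markdown into sections keyed by normalized H1/H2 titles."""
    sections = {}
    current_section = "introduction"
    current_content = []
    for line in markdown_content.split('\n'):
        if line.startswith('# ') or line.startswith('## '):
            # Save the previous section before starting a new one
            if current_content:
                sections[current_section] = '\n'.join(current_content)
            current_section = line.strip('#').strip().lower().replace(' ', '_')
            current_content = []
        else:
            current_content.append(line)
    # Save the last section
    if current_content:
        sections[current_section] = '\n'.join(current_content)
    return sections

sample = "# Specifications\nVoltage: 12 V\n## Safety Warnings\nDo not open the case."
sections = split_sections(sample)
print(sorted(sections))  # ['safety_warnings', 'specifications']
```

Testing the splitter on known input makes it easier to trust the keyword matching (`'spec'`, `'warning'`, etc.) that runs downstream.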

Example: Multi-Document Analysis

- id: parse_multiple_documents
  name: parse_multiple_documents
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        # Define documents to process
        documents = [
            {
                "url": "https://example.com/user_manual.pdf",
                "type": "user_manual",
                "instruction": "Focus on user instructions and troubleshooting"
            },
            {
                "url": "https://example.com/technical_spec.pdf", 
                "type": "technical_spec",
                "instruction": "Extract all technical parameters and specifications"
            },
            {
                "url": "https://example.com/installation_guide.pdf",
                "type": "installation",
                "instruction": "Capture installation steps and requirements"
            }
        ]
        
        print(json.dumps({"documents": documents}))
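Before fanning out to per-document parse steps, it can help to validate the document list, since a malformed entry fails only at parse time otherwise. A sketch of one such check (the required keys mirror the entries defined above):

```python
from urllib.parse import urlparse

def validate_documents(documents: list) -> list:
    """Keep only documents that have the required keys and an http(s) URL."""
    valid = []
    for doc in documents:
        if not all(k in doc for k in ("url", "type", "instruction")):
            continue  # missing metadata
        if urlparse(doc["url"]).scheme not in ("http", "https"):
            continue  # unsupported URL scheme
        valid.append(doc)
    return valid

docs = [
    {"url": "https://example.com/user_manual.pdf", "type": "user_manual", "instruction": "Focus on user instructions"},
    {"url": "ftp://example.com/old.pdf", "type": "legacy", "instruction": "Extract everything"},
]
print(len(validate_documents(docs)))  # 1
```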

- id: parse_user_manual
  name: parse_user_manual
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[0].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[0].instruction}}"

- id: parse_technical_spec
  name: parse_technical_spec
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[1].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[1].instruction}}"
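The two parse steps above are near-identical; when many documents are involved, the step definitions can be generated rather than copied by hand. A sketch under the assumption that step ids follow the `parse_<type>` pattern used above:

```python
def build_parse_step(doc_index: int, doc_type: str) -> dict:
    """Build one LLAMA_CLOUD_PARSE step definition for the document at doc_index."""
    ref = f"steps.parse_multiple_documents.result.documents[{doc_index}]"
    return {
        "id": f"parse_{doc_type}",
        "name": f"parse_{doc_type}",
        "tool": "LLAMA_CLOUD_PARSE",
        "config": [
            {"name": "api_key", "value": "{{secrets.LLAMA_CLOUD_API_KEY}}"},
            {"name": "project_id", "value": "your_project_id"},
        ],
        "input": [
            {"name": "file_url", "value": "{{" + ref + ".url}}"},
            {"name": "parsing_instruction", "value": "{{" + ref + ".instruction}}"},
        ],
    }

types = ["user_manual", "technical_spec", "installation"]
steps = [build_parse_step(i, t) for i, t in enumerate(types)]
print([s["id"] for s in steps])  # ['parse_user_manual', 'parse_technical_spec', 'parse_installation']
```

Serializing these dicts to YAML then yields the same step blocks shown above without manual duplication.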

- id: create_unified_documentation
  name: create_unified_documentation
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Create a unified documentation guide by combining information from these sources:
        
        User Manual Content:
        {{steps.parse_user_manual.result.markdown}}
        
        Technical Specifications:
        {{steps.parse_technical_spec.result.markdown}}
        
        Create a single, well-organized document with:
        1. Executive Summary
        2. Installation Guide
        3. Configuration Reference  
        4. User Operation Guide
        5. Troubleshooting Section
        6. Technical Specifications Appendix
        
        Ensure consistency and eliminate redundancy.

Example: RAG-Powered Q&A System

- id: setup_qa_system
  name: setup_qa_system
  tool: INPUT_JSON_WITH_VALIDATION
  input:
    - name: value
      value: {
        "questions": [
          "What are the minimum system requirements?",
          "How do I configure network settings?",
          "What should I do if the system fails to start?",
          "How often should I perform maintenance?",
          "What are the warranty terms?"
        ]
      }

- id: answer_questions
  name: answer_questions
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        questions = {{steps.setup_qa_system.result.questions}}
        qa_pairs = []
        
        for i, question in enumerate(questions):
            qa_pairs.append({
                "id": f"q_{i+1}",
                "question": question,
                "status": "pending"
            })
        
        print(json.dumps({"qa_pairs": qa_pairs}))

- id: get_answer_1
  name: get_answer_1
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[0].question}}"
    - name: include_sources
      value: true

- id: get_answer_2
  name: get_answer_2
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[1].question}}"
    - name: include_sources
      value: true

- id: compile_faq
  name: compile_faq
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        
        # Compile all Q&A pairs (triple quotes guard against quotes or
        # newlines in the interpolated answers)
        faq_items = [
            {
                "question": """{{steps.answer_questions.result.qa_pairs[0].question}}""",
                "answer": """{{steps.get_answer_1.result.response}}""",
                "sources": """{{steps.get_answer_1.result.sources}}"""
            },
            {
                "question": """{{steps.answer_questions.result.qa_pairs[1].question}}""",
                "answer": """{{steps.get_answer_2.result.response}}""",
                "sources": """{{steps.get_answer_2.result.sources}}"""
            }
        ]
        
        # Format as FAQ document
        faq_markdown = "# Frequently Asked Questions\n\n"
        
        for i, item in enumerate(faq_items, 1):
            faq_markdown += f"## {i}. {item['question']}\n\n"
            faq_markdown += f"{item['answer']}\n\n"
            if item['sources']:
                faq_markdown += f"*Sources: {item['sources']}*\n\n"
            faq_markdown += "---\n\n"
        
        print(json.dumps({"faq_document": faq_markdown}))
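The formatting loop above can be packaged as a standalone function and tested with sample items before deploying, so layout changes don't require re-running the whole pipeline:

```python
def format_faq(faq_items: list) -> str:
    """Render question/answer pairs as a markdown FAQ document."""
    faq_markdown = "# Frequently Asked Questions\n\n"
    for i, item in enumerate(faq_items, 1):
        faq_markdown += f"## {i}. {item['question']}\n\n{item['answer']}\n\n"
        if item.get("sources"):
            faq_markdown += f"*Sources: {item['sources']}*\n\n"
        faq_markdown += "---\n\n"
    return faq_markdown

doc = format_faq([
    {"question": "What is the warranty period?", "answer": "Two years.", "sources": "warranty.pdf"},
])
print(doc.splitlines()[0])  # # Frequently Asked Questions
```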

Tips and Best Practices

  • Use specific parsing instructions for better content extraction
  • Organize documents into logical projects for efficient searching
  • Leverage LlamaIndex integration for advanced retrieval patterns
  • Include source references for transparency and verification
  • Process documents in batches for large-scale operations
  • Use appropriate retrieval modes based on query types
  • Monitor API usage for cost optimization
  • Create structured indexes for frequently accessed information
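The batching tip above can be as simple as slicing the document list into fixed-size chunks and submitting one parse run per chunk. A minimal sketch:

```python
def batch(items: list, size: int):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

urls = [f"https://example.com/doc_{i}.pdf" for i in range(7)]
print([len(b) for b in batch(urls, 3)])  # [3, 3, 1]
```

Keeping batches small bounds per-run cost and makes retries cheaper when a single document fails to parse.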