Overview
LlamaCloud tools parse documents and query them using retrieval. They build on LlamaCloud’s document understanding and retrieval-augmented generation (RAG) features to support multi-step document workflows.
Key Features
LLAMA_CLOUD_PARSE
- Parse files in various formats into Markdown
LLAMA_CLOUD_QUERY
- Query the documents in a project using retrieval
Authentication
To use LlamaCloud tools, you need:
- A LlamaCloud API key
- A LlamaCloud project with your documents uploaded to it, which serves as your knowledge base
Note: Treat API keys as sensitive information and never commit them to public repositories.
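The workflow examples on this page reference the key through `{{secrets.LLAMA_CLOUD_API_KEY}}`. For scripts that run outside a workflow, a minimal sketch of the same idea is to read the key from the environment rather than hardcoding it. The environment variable name and the helper names `load_llama_cloud_key` and `mask` are assumptions made for illustration, not part of any LlamaCloud or Jinba API.

```python
import os


def load_llama_cloud_key(env_var: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset.

    The variable name is an assumption; use whatever name your setup defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Mask a key for logging, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Log only the masked form, never the key itself (placeholder value shown).
print(mask("llx-example-key-1234"))
```

Failing early when the variable is missing avoids sending requests with an empty key, and masking keeps the full value out of logs.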
Example: Document Processing Pipeline
- id: parse_document
  name: parse_document
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "https://example.com/technical_manual.pdf"
    - name: parsing_instruction
      value: |
        Extract and preserve:
        1. Technical specifications and parameters
        2. Step-by-step procedures
        3. Warning and safety information
        4. Diagrams and figure descriptions
        5. Reference tables and data
- id: process_parsed_content
  name: process_parsed_content
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        # Get parsed content
        markdown_content = {{steps.parse_document.result.markdown}}
        # Split the Markdown into sections keyed by heading
        sections = {}
        current_section = "introduction"
        current_content = []
        lines = markdown_content.split('\n')
        for line in lines:
            if line.startswith(('# ', '## ')):
                # Save previous section
                if current_content:
                    sections[current_section] = '\n'.join(current_content)
                # Start new section, e.g. "## Safety Info" -> "safety_info"
                current_section = line.strip('#').strip().lower().replace(' ', '_')
                current_content = []
            else:
                current_content.append(line)
        # Save last section
        if current_content:
            sections[current_section] = '\n'.join(current_content)
        # Group sections by keywords in their names
        specs = []
        procedures = []
        warnings = []
        for section_name, content in sections.items():
            if 'spec' in section_name or 'parameter' in section_name:
                specs.append(content)
            elif 'procedure' in section_name or 'step' in section_name:
                procedures.append(content)
            elif 'warning' in section_name or 'safety' in section_name:
                warnings.append(content)
        result = {
            "sections": sections,
            "specifications": specs,
            "procedures": procedures,
            "warnings": warnings,
            "total_sections": len(sections)
        }
        print(json.dumps(result))
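To check the section-splitting logic before running it in a workflow, it can be exercised locally on a small Markdown string. The sketch below wraps the same steps in a function; the name `split_sections` and the sample text are hypothetical and exist only for this illustration.

```python
def split_sections(markdown: str) -> dict[str, str]:
    """Group lines under the most recent level-1 or level-2 heading."""
    sections: dict[str, str] = {}
    current = "introduction"  # text before the first heading lands here
    buffer: list[str] = []
    for line in markdown.split("\n"):
        if line.startswith(("# ", "## ")):
            # Close the previous section before starting a new one
            if buffer:
                sections[current] = "\n".join(buffer)
            current = line.strip("#").strip().lower().replace(" ", "_")
            buffer = []
        else:
            buffer.append(line)
    # Close the final section
    if buffer:
        sections[current] = "\n".join(buffer)
    return sections


# Hypothetical sample input standing in for parsed Markdown
sample = (
    "Intro text\n"
    "# Safety Warnings\n"
    "Wear gloves.\n"
    "## Technical Specs\n"
    "Voltage: 12 V"
)
for name, body in split_sections(sample).items():
    print(f"{name}: {body!r}")
```

Note that headings deeper than level 2 (`###` and below) are treated as ordinary content, and two sections with the same heading text overwrite each other, so long manuals may need a more careful key scheme.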
Example: Multi-Document Analysis
- id: parse_multiple_documents
  name: parse_multiple_documents
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        # Define documents to process
        documents = [
            {
                "url": "https://example.com/user_manual.pdf",
                "type": "user_manual",
                "instruction": "Focus on user instructions and troubleshooting"
            },
            {
                "url": "https://example.com/technical_spec.pdf",
                "type": "technical_spec",
                "instruction": "Extract all technical parameters and specifications"
            },
            {
                "url": "https://example.com/installation_guide.pdf",
                "type": "installation",
                "instruction": "Capture installation steps and requirements"
            }
        ]
        print(json.dumps({"documents": documents}))
- id: parse_user_manual
  name: parse_user_manual
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[0].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[0].instruction}}"
- id: parse_technical_spec
  name: parse_technical_spec
  tool: LLAMA_CLOUD_PARSE
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: file_url
      value: "{{steps.parse_multiple_documents.result.documents[1].url}}"
    - name: parsing_instruction
      value: "{{steps.parse_multiple_documents.result.documents[1].instruction}}"
- id: create_unified_documentation
  name: create_unified_documentation
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Create a unified documentation guide by combining information from these sources:
        User Manual Content:
        {{steps.parse_user_manual.result.markdown}}
        Technical Specifications:
        {{steps.parse_technical_spec.result.markdown}}
        Create a single, well-organized document with:
        1. Executive Summary
        2. Installation Guide
        3. Configuration Reference
        4. User Operation Guide
        5. Troubleshooting Section
        6. Technical Specifications Appendix
        Ensure consistency and eliminate redundancy.
Example: RAG-Powered Q&A System
- id: setup_qa_system
  name: setup_qa_system
  tool: INPUT_JSON_WITH_VALIDATION
  input:
    - name: value
      value: {
          "questions": [
            "What are the minimum system requirements?",
            "How do I configure network settings?",
            "What should I do if the system fails to start?",
            "How often should I perform maintenance?",
            "What are the warranty terms?"
          ]
        }
- id: answer_questions
  name: answer_questions
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        questions = {{steps.setup_qa_system.result.questions}}
        qa_pairs = []
        for i, question in enumerate(questions):
            qa_pairs.append({
                "id": f"q_{i+1}",
                "question": question,
                "status": "pending"
            })
        print(json.dumps({"qa_pairs": qa_pairs}))
- id: get_answer_1
  name: get_answer_1
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[0].question}}"
    - name: include_sources
      value: true
- id: get_answer_2
  name: get_answer_2
  tool: LLAMA_CLOUD_QUERY
  config:
    - name: api_key
      value: "{{secrets.LLAMA_CLOUD_API_KEY}}"
    - name: project_id
      value: "your_project_id"
  input:
    - name: query
      value: "{{steps.answer_questions.result.qa_pairs[1].question}}"
    - name: include_sources
      value: true
- id: compile_faq
  name: compile_faq
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: script
      value: |
        import json
        # Compile all Q&A pairs
        faq_items = [
            {
                "question": "{{steps.answer_questions.result.qa_pairs[0].question}}",
                "answer": "{{steps.get_answer_1.result.response}}",
                "sources": "{{steps.get_answer_1.result.sources}}"
            },
            {
                "question": "{{steps.answer_questions.result.qa_pairs[1].question}}",
                "answer": "{{steps.get_answer_2.result.response}}",
                "sources": "{{steps.get_answer_2.result.sources}}"
            }
        ]
        # Format as FAQ document
        faq_markdown = "# Frequently Asked Questions\n\n"
        for i, item in enumerate(faq_items, 1):
            faq_markdown += f"## {i}. {item['question']}\n\n"
            faq_markdown += f"{item['answer']}\n\n"
            if item['sources']:
                faq_markdown += f"*Sources: {item['sources']}*\n\n"
            faq_markdown += "---\n\n"
        print(json.dumps({"faq_document": faq_markdown}))
Tips and Best Practices
- Use specific parsing instructions for better content extraction
- Organize documents into logical projects for efficient searching
- Leverage LlamaIndex integration for advanced retrieval patterns
- Include source references for transparency and verification
- Process documents in batches for large-scale operations
- Use appropriate retrieval modes based on query types
- Monitor API usage for cost optimization
- Create structured indexes for frequently accessed information
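As one way to act on the batching tip above, a script step could split a long list of document URLs into fixed-size groups and handle one group at a time. This is a minimal sketch: the helper name `batched`, the batch size, and the example URLs are assumptions for illustration, and the actual parsing of each batch is left out.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical document URLs; replace with your own list.
urls = [f"https://example.com/doc_{i}.pdf" for i in range(1, 8)]

for number, batch in enumerate(batched(urls, 3), start=1):
    # Each batch would be handed to a parse step here.
    print(f"batch {number}: {len(batch)} documents")
```

Smaller batches make it easier to retry a failed group without reprocessing everything, and they give natural checkpoints for tracking API usage.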