# Step 1: Fetch the target sheet (values + formatting/metadata) so later steps
# can analyze it. Credentials come from the workflow secret store.
- id: retrieve_data_for_analysis
  name: retrieve_data_for_analysis
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.analysis_sheet_id}}"
    - name: sheet_name
      value: "{{input.sheet_name}}"
    # Formatting/metadata are requested so the result carries sheet_name etc.
    - name: include_formatting
      value: true
    - name: include_metadata
      value: true
# Step 2: Run a data-quality analysis over the fetched sheet in a Python
# sandbox. Prints a human-readable report followed by a JSON quality_report
# (consumed downstream via result.stdout).
- id: perform_data_quality_analysis
  name: perform_data_quality_analysis
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        import re

        # Parse the sheet payload injected by the previous step.
        # NOTE(review): assumes result.sheet_data is valid JSON with no
        # embedded triple quotes — confirm the template engine escapes it.
        sheet_data = json.loads('''{{steps.retrieve_data_for_analysis.result.sheet_data}}''')
        rows = sheet_data.get('rows', [])
        if not rows:
            print("No data found for analysis")
        else:
            # First row is treated as the header row; the rest are records.
            headers = rows[0] if rows else []
            data_rows = rows[1:] if len(rows) > 1 else []
            print("Data Quality Analysis Report")
            print("=" * 35)
            print(f"Dataset: {sheet_data.get('sheet_name', 'Unknown')}")
            print(f"Total Records: {len(data_rows)}")
            print(f"Total Columns: {len(headers)}")
            print()
            # Accumulated report emitted as JSON at the end.
            quality_report = {
                "overall_stats": {
                    "total_records": len(data_rows),
                    "total_columns": len(headers),
                    "completeness_score": 0
                },
                "column_analysis": [],
                "data_issues": []
            }
            total_cells = 0
            empty_cells = 0
            # Column-by-column analysis.
            for col_idx, header in enumerate(headers):
                # Ragged rows are padded with "" so every column sees one value per record.
                col_values = [row[col_idx] if col_idx < len(row) else "" for row in data_rows]
                # str() guards against non-string cells (numbers, booleans); the
                # original v.strip() crashed on those.
                non_empty_values = [v for v in col_values if str(v).strip()]
                empty_count = len(col_values) - len(non_empty_values)
                completeness = (len(non_empty_values) / len(col_values)) * 100 if col_values else 0
                # Heuristic data-type detection over non-empty values.
                numeric_count = sum(1 for v in non_empty_values if str(v).replace('.', '').replace('-', '').isdigit())
                date_count = sum(1 for v in non_empty_values if re.match(r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}', str(v)))
                email_count = sum(1 for v in non_empty_values if '@' in str(v) and '.' in str(v))
                likely_type = "text"
                if non_empty_values:
                    sample_size = len(non_empty_values)
                    # Thresholds: >80% numeric-looking, or >50% date/email-looking.
                    if numeric_count / sample_size > 0.8:
                        likely_type = "numeric"
                    elif date_count / sample_size > 0.5:
                        likely_type = "date"
                    elif email_count / sample_size > 0.5:
                        likely_type = "email"
                col_analysis = {
                    "column_name": header,
                    "completeness_percent": round(completeness, 2),
                    "empty_cells": empty_count,
                    "likely_data_type": likely_type,
                    "unique_values": len(set(non_empty_values)),
                    "sample_values": non_empty_values[:3]
                }
                quality_report["column_analysis"].append(col_analysis)
                total_cells += len(col_values)
                empty_cells += empty_count
                # Flag potential issues for this column.
                if completeness < 90:
                    quality_report["data_issues"].append(f"Column '{header}' has low completeness ({completeness:.1f}%)")
                if len(set(non_empty_values)) == 1 and non_empty_values:
                    quality_report["data_issues"].append(f"Column '{header}' has only one unique value")
            # Overall completeness score across all cells.
            overall_completeness = ((total_cells - empty_cells) / total_cells) * 100 if total_cells > 0 else 0
            quality_report["overall_stats"]["completeness_score"] = round(overall_completeness, 2)
            print(f"Overall Data Completeness: {overall_completeness:.1f}%")
            print()
            # Display column analysis.
            print("Column Analysis:")
            for col in quality_report["column_analysis"]:
                print(f" 📊 {col['column_name']}")
                print(f" Completeness: {col['completeness_percent']}%")
                print(f" Type: {col['likely_data_type']}")
                print(f" Unique Values: {col['unique_values']}")
                print()
            # Display issues.
            if quality_report["data_issues"]:
                print("🚨 Data Quality Issues:")
                for issue in quality_report["data_issues"]:
                    print(f" - {issue}")
            else:
                print("✅ No major data quality issues detected")
            # Machine-readable report for downstream steps.
            print("\n" + json.dumps(quality_report, indent=2))
# Step 3: Ask the LLM for concrete remediation advice, grounded in the
# stdout report produced by perform_data_quality_analysis.
- id: create_quality_recommendations
  name: create_quality_recommendations
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: "gpt-4"
  input:
    - name: prompt
      value: |
        Based on this data quality analysis report, please provide specific recommendations for improving data quality:
        {{steps.perform_data_quality_analysis.result.stdout}}
        Please provide:
        1. Priority issues to address first
        2. Specific steps to improve data completeness
        3. Data validation rules to implement
        4. Best practices for ongoing data maintenance
        5. Suggested data cleaning workflows
# Step 4: Persist the combined analysis + recommendations as a dated
# plain-text report file.
- id: save_quality_analysis_report
  name: save_quality_analysis_report
  tool: OUTPUT_FILE
  input:
    - name: content
      value: |
        Google Sheets Data Quality Analysis Report
        ========================================
        Generated: {{date | format('YYYY-MM-DD HH:mm:ss')}}
        ## Analysis Results
        {{steps.perform_data_quality_analysis.result.stdout}}
        ## Recommendations
        {{steps.create_quality_recommendations.result.content}}
        ## Next Steps
        1. Address high-priority data quality issues
        2. Implement recommended data validation rules
        3. Set up regular data quality monitoring
        4. Train users on data entry best practices
        5. Consider automated data cleaning workflows
    - name: filename
      value: "data_quality_analysis_{{date | format('YYYY-MM-DD')}}.txt"
    - name: fileType
      value: "txt"