OCR Service

This guide shows you how to use the ocr_service provider to extract tax return data from PDF documents using optical character recognition (OCR).

Goal

Enable automated data extraction from tax documents when:

You have tax returns in PDF format (scans or exports)
The company's tax filings are not yet available from official sources
You need to process historical documents not available digitally
You want to accelerate data collection from client-provided documents

Use Cases

OCR Service actions

Extract structured financial data from tax return PDFs automatically.

Extract

Digitize documents

Process scanned tax bundles
Extract data from PDF exports
Parse handwritten or typed forms

Accelerate

Speed up onboarding

Skip manual data entry
Process documents in batch
Get structured data immediately

Complement

Fill the gaps

Add missing fiscal years
Complement INPI public data
Process foreign documents

Supported Data

Data Type	Description	Output
Tax Return	Tax bundles (2050, 2033, etc.)	Structured financial data
Tax Return Analysis	Financial ratios and insights	Computed indicators

Prerequisites

Before using ocr_service, ensure:

You have the PDF documents to process
Documents are readable (not too blurry or damaged)
The user exists in your system

Configuration

Step 1: Enable the OCR Service provider

PUT /api/v6/providers/ocr_service
{
  "enable": true
}

PUT /api/v6/providers/ocr_service/settings
{
  "auto_connect": false
}

Step 2: Create the data connection

POST /api/v6/users/{userId}/data-connections
{
  "requested_data_types": ["TAX_RETURN"],
  "provider_name": "ocr_service"
}
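The configuration calls above can also be scripted. The following is a minimal sketch, not a reference client: `BASE_URL`, the bearer-token `Authorization` header, and the function names are assumptions that do not come from this guide, so adapt them to your environment. `fetchImpl` is injectable so the function can be exercised without network access.

```javascript
// Sketch only: BASE_URL and the Authorization scheme are assumptions,
// not values documented in this guide.
const BASE_URL = 'https://api.example.com';

// Build the three configuration requests described in Steps 1 and 2.
function buildSetupRequests(userId) {
  return [
    { method: 'PUT', path: '/api/v6/providers/ocr_service', body: { enable: true } },
    { method: 'PUT', path: '/api/v6/providers/ocr_service/settings', body: { auto_connect: false } },
    {
      method: 'POST',
      path: `/api/v6/users/${encodeURIComponent(userId)}/data-connections`,
      body: { requested_data_types: ['TAX_RETURN'], provider_name: 'ocr_service' }
    }
  ];
}

// Send the requests in order and stop at the first non-2xx response.
async function setupOcrService(userId, token, fetchImpl = globalThis.fetch) {
  for (const req of buildSetupRequests(userId)) {
    const response = await fetchImpl(BASE_URL + req.path, {
      method: req.method,
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify(req.body)
    });
    if (!response.ok) {
      throw new Error(`${req.method} ${req.path} failed with HTTP ${response.status}`);
    }
  }
}
```

Keeping the request list separate from the sending logic makes it easy to log or review the calls before running them.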

Uploading Documents for OCR

Submit a tax return PDF

Use the following endpoint to upload a tax bundle document for OCR processing:

POST /api/v6/input/users/{userId}/tax-returns/ocr-service

POST /api/v6/input/users/{userId}/tax-returns/ocr-service
{
  "tax_return_id": "tax-return-2023-001",
  "data": {
    "file": "JVBERi0xLjQKJeLjz9...",
    "closing_date": "2023-12-31",
    "closing_year": "2023",
    "submitted_date": "2024-05-15",
    "type": "C",
    "duration": 12,
    "privacy": "PRIVATE"
  }
}

Request parameters

Field	Type	Required	Description
`tax_return_id`	string	Yes	Your unique identifier for this tax return
`data.file`	string	Yes	Base64-encoded PDF document
`data.closing_date`	date	Yes	Fiscal year end date (YYYY-MM-DD)
`data.closing_year`	string	Yes	Fiscal year (e.g., "2023")
`data.submitted_date`	date	No	Date the return was filed
`data.type`	string	No	Bundle type: `C` (full), `S` (simplified), `K` (consolidated)
`data.duration`	integer	No	Fiscal year duration in months (default: 12)
`data.privacy`	string	No	Visibility setting (e.g., `PRIVATE`)
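
Because `data.file` carries the PDF as a Base64 string, the request body is typically built from a file on disk. The sketch below assumes a Node.js environment. `buildOcrPayload` and its option names are hypothetical helpers, not part of the API, and deriving `closing_year` from `closing_date` is an assumption that fits the example above.

```javascript
const fs = require('fs');

// Hypothetical helper: reads a PDF and builds the body for
// POST /api/v6/input/users/{userId}/tax-returns/ocr-service.
function buildOcrPayload(pdfPath, { taxReturnId, closingDate, type = 'C', duration = 12 }) {
  // Buffer#toString('base64') produces the Base64 text expected in data.file
  const file = fs.readFileSync(pdfPath).toString('base64');

  return {
    tax_return_id: taxReturnId,
    data: {
      file,
      closing_date: closingDate,             // YYYY-MM-DD
      closing_year: closingDate.slice(0, 4), // assumption: fiscal year equals the closing date's year
      type,
      duration
    }
  };
}
```

A Base64-encoded PDF normally starts with `JVBERi0`, the encoding of the `%PDF-` file signature, which gives a quick visual check that the right file was encoded.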

Response

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "tax_return_id": "tax-return-2023-001",
  "process_status": "PENDING",
  "error_message": null,
  "data": {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "revenues": 0,
    "net_profit": 0,
    "file_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "closing_date": "2023-12-31",
    "closing_year": "2023",
    "submitted_date": "2024-05-15",
    "type": "C",
    "duration": 12,
    "privacy": "PRIVATE",
    "tax_return_values": [],
    "millesime": "2024"
  }
}

Checking OCR Status

List all OCR processing jobs

GET /api/v6/input/users/{userId}/tax-returns

Query parameters

Parameter	Type	Default	Description
`page`	integer	1	Page number
`per_page`	integer	20	Items per page

Response

{
  "total": 3,
  "per_page": 20,
  "current_page": 1,
  "last_page": 1,
  "result": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "tax_return_id": "tax-return-2023-001",
      "process_status": "FINISHED",
      "error_message": null,
      "data": {
        "closing_date": "2023-12-31",
        "closing_year": "2023",
        "type": "C",
        "duration": 12,
        "revenue": 2500000,
        "net_profit": 180000
      },
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T10:35:00Z"
    },
    {
      "id": "660e8400-e29b-41d4-a716-446655440001",
      "tax_return_id": "tax-return-2022-001",
      "process_status": "IN_PROGRESS",
      "error_message": null,
      "data": {
        "closing_date": "2022-12-31",
        "closing_year": "2022"
      },
      "created_at": "2024-01-15T10:32:00Z",
      "updated_at": "2024-01-15T10:32:00Z"
    }
  ]
}
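
When a user has more jobs than fit on one page, the `page` parameter has to be advanced until `last_page` is reached. A minimal sketch, assuming a caller-supplied `fetchPage(page)` callback (hypothetical) that returns one parsed response of the shape shown above:

```javascript
// Collect the `result` arrays of every page into a single list.
// `fetchPage` is a hypothetical callback: (page) => Promise<{ last_page, result }>.
async function listAllOcrJobs(fetchPage) {
  const jobs = [];
  let page = 1;
  let lastPage = 1;

  do {
    const response = await fetchPage(page);
    jobs.push(...response.result);
    lastPage = response.last_page; // refreshed on each call in case new jobs arrive
    page += 1;
  } while (page <= lastPage);

  return jobs;
}
```

Looking a job up in the combined list avoids missing it when it is not on the first page.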

Processing statuses

Status	Description
`PENDING`	Document uploaded, waiting to be processed
`IN_PROGRESS`	OCR extraction in progress
`FINISHED`	Processing complete, data available
`ERROR`	Processing failed (check `error_message`)
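
A polling loop only needs to know whether a job is still active and, once it stops, whether it succeeded. The helper below is a sketch built on the four statuses in the table. `describeJob` is a hypothetical name, and treating an undocumented status as a failure is a design assumption rather than documented API behavior.

```javascript
// Statuses for which polling should continue.
const ACTIVE_STATUSES = new Set(['PENDING', 'IN_PROGRESS']);

// Hypothetical helper: summarizes one job object from the list endpoint.
function describeJob(job) {
  if (ACTIVE_STATUSES.has(job.process_status)) {
    return { done: false, ok: false, message: 'still processing' };
  }
  if (job.process_status === 'FINISHED') {
    return { done: true, ok: true, message: 'data available' };
  }
  if (job.process_status === 'ERROR') {
    return { done: true, ok: false, message: job.error_message || 'unknown error' };
  }
  // Assumption: stop on anything undocumented instead of polling forever
  return { done: true, ok: false, message: `unexpected status: ${job.process_status}` };
}
```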

Synchronizing the Data

Once the OCR processing is complete (process_status: "FINISHED"), trigger a synchronization to make the extracted data available through the standard tax return endpoints:

POST /api/v6/users/{userId}/sync
{
  "data_types": ["TAX_RETURN"]
}

Retrieving Processed Data

After synchronization, retrieve the extracted tax returns:

GET /api/v6/users/{userId}/tax-returns

Tax returns processed via OCR appear with provider_name: "ocr_service".

{
  "total": 1,
  "per_page": 20,
  "current_page": 1,
  "last_page": 1,
  "result": [
    {
      "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "revenues": 2500000,
      "net_profit": 180000,
      "file_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "closing_date": "2023-12-31",
      "closing_year": "2023",
      "millesime": "2024",
      "submitted_date": "2024-05-15",
      "type": "C",
      "duration": 12,
      "privacy": "PRIVATE",
      "provider_name": "ocr_service",
      "data_connection_id": "550e8400-e29b-41d4-a716-446655440000",
      "warnings": []
    }
  ]
}

Retrieve a single tax return

GET /api/v6/users/{userId}/tax-returns/{taxReturnId}

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "revenues": 2500000,
  "net_profit": 180000,
  "file_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "closing_date": "2023-12-31",
  "closing_year": "2023",
  "millesime": "2024",
  "submitted_date": "2024-05-15",
  "type": "C",
  "duration": 12,
  "privacy": "PRIVATE",
  "provider_name": "ocr_service",
  "data_connection_id": "550e8400-e29b-41d4-a716-446655440000",
  "warnings": [],
  "tax_return_values": [
    { "code": "FL", "values": [2500000, 0, 0, 0] },
    { "code": "HN", "values": [180000, 0, 0, 0] }
  ]
}
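
Each entry in `tax_return_values` pairs a form cell code with an array of values. In the example above, the first value of `FL` equals `revenues` and the first value of `HN` equals `net_profit`; the meaning of the remaining positions is not described in this guide. The sketch below indexes the entries by code for easier lookup. `indexByCode` and `firstValue` are hypothetical helper names.

```javascript
// Build a Map from cell code to its values array.
function indexByCode(taxReturn) {
  const byCode = new Map();
  for (const entry of taxReturn.tax_return_values || []) {
    byCode.set(entry.code, entry.values);
  }
  return byCode;
}

// Return the first value for a code, or null when the code is absent.
function firstValue(byCode, code) {
  const values = byCode.get(code);
  return values && values.length > 0 ? values[0] : null;
}

// Usage with the values from the response above
const example = {
  tax_return_values: [
    { code: 'FL', values: [2500000, 0, 0, 0] },
    { code: 'HN', values: [180000, 0, 0, 0] }
  ]
};
const byCode = indexByCode(example);
console.log(firstValue(byCode, 'FL')); // 2500000
console.log(firstValue(byCode, 'XX')); // null
```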

End-to-End Workflow

async function processDocumentWithOCR(userId, pdfBase64, fiscalYear) {
  // 1. Ensure OCR Service connection exists
  const connections = await qardApi.getDataConnections(userId);
  const ocrConnection = connections.find(c => c.provider_name === 'ocr_service');

  if (!ocrConnection) {
    await qardApi.createDataConnection(userId, {
      requested_data_types: ['TAX_RETURN'],
      provider_name: 'ocr_service'
    });
  }

  // 2. Upload the document for OCR processing
  const ocrJob = await qardApi.post(`/input/users/${userId}/tax-returns/ocr-service`, {
    tax_return_id: `tax-return-${fiscalYear}-${Date.now()}`,
    data: {
      file: pdfBase64,
      closing_date: `${fiscalYear}-12-31`,
      closing_year: String(fiscalYear),
      type: 'C',
      duration: 12
    }
  });

  console.log(`OCR job created: ${ocrJob.id}`);

  // 3. Poll for OCR completion
  let status = 'PENDING';
  while (status === 'PENDING' || status === 'IN_PROGRESS') {
    await sleep(5000); // Wait 5 seconds between checks

    // Note: this reads only the first page of jobs (20 items by default)
    const jobs = await qardApi.get(`/input/users/${userId}/tax-returns`);
    const currentJob = jobs.result.find(j => j.id === ocrJob.id);

    // Fail clearly instead of crashing on an undefined job
    if (!currentJob) {
      throw new Error(`OCR job ${ocrJob.id} not found in the job list`);
    }

    status = currentJob.process_status;
    console.log(`OCR status: ${status}`);

    if (status === 'ERROR') {
      throw new Error(`OCR failed: ${currentJob.error_message}`);
    }
  }

  // 4. Sync to integrate extracted data
  await qardApi.sync(userId, {
    data_types: ['TAX_RETURN']
  });

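  // 5. Wait for sync completion
  // (waitForSyncCompletion is not defined in this guide; supply your own implementation)
  await waitForSyncCompletion(userId);

  // 6. Retrieve the processed tax return
  // closing_year is a string in the API, so convert numeric input
  const taxReturns = await qardApi.getTaxReturns(userId, {
    filter: { closing_year: String(fiscalYear), provider_name: 'ocr_service' }
  });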

  return taxReturns[0];
}

// Helper function
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Batch Processing

Process multiple documents efficiently:

async function batchProcessDocuments(userId, documents) {
  const results = [];

  // Upload all documents
  for (const doc of documents) {
    const job = await qardApi.post(`/input/users/${userId}/tax-returns/ocr-service`, {
      tax_return_id: doc.id,
      data: {
        file: doc.pdfBase64,
        closing_date: doc.closingDate,
        closing_year: doc.closingYear,
        type: doc.type || 'C',
        duration: doc.duration || 12
      }
    });
    results.push({ documentId: doc.id, jobId: job.id });
  }

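  // Wait for all to complete
  let trackedJobs = [];
  let allComplete = false;
  while (!allComplete) {
    await sleep(10000);

    // Note: this reads only the first page of jobs (20 items by default)
    const jobs = await qardApi.get(`/input/users/${userId}/tax-returns`);
    trackedJobs = jobs.result.filter(j => results.some(r => r.jobId === j.id));
    const pendingJobs = trackedJobs.filter(j =>
      j.process_status === 'PENDING' || j.process_status === 'IN_PROGRESS'
    );

    allComplete = pendingJobs.length === 0;
    console.log(`${pendingJobs.length} jobs still processing...`);
  }

  // Report failed jobs so they are not silently dropped
  for (const job of trackedJobs.filter(j => j.process_status === 'ERROR')) {
    console.error(`OCR failed for ${job.tax_return_id}: ${job.error_message}`);
  }

  // Single sync for all documents
  await qardApi.sync(userId, { data_types: ['TAX_RETURN'] });

  return results;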
}

Error Handling

Common OCR errors and solutions:

Error	Cause	Solution
`INVALID_FORMAT`	File is not a valid PDF	Verify file format before upload
`UNREADABLE_DOCUMENT`	Document too blurry or damaged	Request a clearer scan
`UNSUPPORTED_FORM`	Form type not recognized	Check supported form types
`EXTRACTION_FAILED`	OCR could not extract data	Try with higher quality scan
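
Some of these failures can be caught before upload. The sketch below checks that a Base64 payload decodes to data starting with the `%PDF-` file signature and that it stays under the 10 MB guideline from Best Practices. `checkPdfBeforeUpload` is a hypothetical helper, and the API's own validation rules are not documented here, so passing this check does not guarantee acceptance.

```javascript
// 10 MB guideline from Best Practices; treating it as a hard limit is an assumption.
const MAX_BYTES = 10 * 1024 * 1024;

// Return a list of problems found; an empty list means the checks passed.
function checkPdfBeforeUpload(pdfBase64) {
  const bytes = Buffer.from(pdfBase64, 'base64');
  const problems = [];

  // A well-formed PDF file begins with the ASCII signature "%PDF-"
  if (bytes.subarray(0, 5).toString('latin1') !== '%PDF-') {
    problems.push('not a PDF (missing %PDF- signature)');
  }
  if (bytes.length > MAX_BYTES) {
    problems.push(`file is ${bytes.length} bytes, above the ${MAX_BYTES}-byte guideline`);
  }
  return problems;
}
```

Running this check first avoids spending an upload and a polling cycle on a file that would likely end in `INVALID_FORMAT`.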

async function handleOCRWithRetry(userId, pdfBase64, fiscalYear, maxRetries = 2) {
  let attempts = 0;

  while (attempts < maxRetries) {
    try {
      return await processDocumentWithOCR(userId, pdfBase64, fiscalYear);
    } catch (error) {
      attempts++;
      console.error(`OCR attempt ${attempts} failed: ${error.message}`);

      if (attempts >= maxRetries) {
        throw new Error(`OCR failed after ${maxRetries} attempts: ${error.message}`);
      }

      // Wait before retry
      await sleep(10000);
    }
  }
}

Best Practices

Document quality: Use high-resolution scans (300 DPI minimum) for best results.
File size: Keep PDF files under 10 MB for optimal processing speed.
Unique identifiers: Use meaningful tax_return_id values to track documents.
Batch wisely: Group related documents and sync once after all are processed.
Monitor status: Implement proper polling with exponential backoff for production.
Handle errors gracefully: Always check error_message when status is ERROR.
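
The polling recommendation above can be sketched as follows. The delay doubles after each check and is capped, and the loop gives up after a fixed number of attempts. `checkStatus` is a hypothetical callback that returns the job's current `process_status`; the default delays and attempt count are illustrative assumptions, not documented limits.

```javascript
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Delay before retry number `attempt` (0-based): initialMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, initialMs = 2000, maxMs = 60000) {
  return Math.min(initialMs * 2 ** attempt, maxMs);
}

// Poll until the status leaves PENDING / IN_PROGRESS or the attempt budget runs out.
// `checkStatus` is a hypothetical callback: () => Promise<string>.
async function pollWithBackoff(checkStatus, { maxAttempts = 10, initialMs = 2000, maxMs = 60000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // No wait before the first check; growing waits before later ones
    if (attempt > 0) {
      await sleep(backoffDelay(attempt - 1, initialMs, maxMs));
    }
    const status = await checkStatus();
    if (status !== 'PENDING' && status !== 'IN_PROGRESS') {
      return status;
    }
  }
  throw new Error(`Still processing after ${maxAttempts} checks`);
}
```

With the defaults, the waits between checks are 2, 4, 8, 16, and 32 seconds, then 60 seconds for each remaining check. The caller still needs to inspect the returned status and read `error_message` when it is `ERROR`.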
