Decoding the DNA of the Brazilian Economy: Why We Built a Foundation Model to Understand SMEs

Understand how a Graph Foundation Model and graph-based AI decode the risk and potential of SMEs in Brazil.
Bruno Alano
CTO, Co-founder

Small and medium-sized enterprises (SMEs) are the backbone of the Brazilian economy. They account for about 30% of GDP and 54% of formal private-sector jobs¹, and they make up 99% of all active companies in Brazil today. Yet for any large company trying to serve them, whether with credit, insurance, services, or products, SMEs remain a black box.

Brazil's SME ecosystem is one of the most complex and dynamic in the world. In the first four months of 2024 alone, 1,456,958 new CNPJs (company registrations) were created, of which 97.5% were micro or small businesses². Information is fragmented, public data is often insufficient, and traditional analysis models fail to capture the true health and potential of a business. This opacity creates a huge systemic cost: misallocated credit, lost growth opportunities, and an inefficiency that holds back the entire production chain. Studies from BNDES³ show that this segment continues to face the most restrictions in accessing formal financing. Between 2015 and 2019, the real balance of bank credit to micro, small, and medium-sized enterprises (MSMEs) fell by 45.2%, nearly triple the 16.8% contraction observed among large companies.

At Avra, our thesis has always been clear: a problem of this magnitude cannot be solved with a slightly better score or a more user-friendly dashboard. We needed to reinvent the fundamental approach to how business intelligence is generated. That's why we didn't just build a product; we built a Graph Foundation Model (GFM) designed specifically for Brazil.

The Limits of the Traditional Paradigm

For decades, the market has relied on two pillars for business analysis: credit bureaus and internal teams. Both have their value, but they also have intrinsic limitations within the Brazilian context.

  1. The Static Snapshot from Bureaus: Traditional systems offer us a snapshot of the past. They consolidate registration data and negative credit history, which is useful but fundamentally reactive and incomplete. They show what a company is formally and what it has done in the past, but they fail to capture how it operates, its current momentum, and, most importantly, its position and influence within the complex economic web in which it is embedded.
  2. The Isolated View of Internal Analysis: Large companies possess invaluable data about their own customers—what we call "1st-party data." However, this view, as rich as it may be, is inherently isolated. An internal team, no matter how sophisticated, can hardly cross-reference its own data with the entirety of the market's connections and behaviors. Furthermore, the technical expertise required to build and operate models that learn from relationship networks at a massive scale is a profound challenge that distracts from the core business.

Both approaches analyze a company as an island. We believe a company can only be truly understood through its relationships: partners, suppliers, customers, neighbors, legal disputes, and much more.

Avra's Thesis: A Company is a Network of Relationships

The risk and potential of an SME are not defined solely by its balance sheet, but by the strength and nature of its connections: with its partners (and their other businesses), its suppliers, its customers, its legal disputes, and even the economic vitality of its neighborhood.

To capture this reality, we built our most fundamental asset: a Large Knowledge Graph (LKG). Think of it as a living digital map of the entire Brazilian economy. This graph doesn't just store data; it connects information, structuring tens of millions of companies, individuals, legal proceedings, and other signals into a single, cohesive, interconnected network of knowledge. It is the closest representation of our economy's "DNA."
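
To make the idea of a knowledge graph concrete, here is a minimal sketch in Python. It is not Avra's implementation: the class name, the node types ("company", "person", "legal_proceeding"), and the relation names ("has_partner", "party_to") are all hypothetical, chosen only to show how typed entities and typed relationships can be stored and traversed.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy heterogeneous graph: typed nodes joined by typed, undirected edges."""

    # node id -> node type (hypothetical types, for illustration only)
    node_types: dict = field(default_factory=dict)
    # node id -> set of (relation, neighbor id)
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, source, relation, target):
        # Store both directions so a company can be reached from its partner and vice versa.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbors(self, node_id, relation=None):
        """Neighbors of a node, optionally restricted to one relation type."""
        return sorted(n for r, n in self.edges[node_id] if relation is None or r == relation)

    def companies_sharing_partner(self, company_id):
        """Companies that share at least one partner with company_id."""
        related = set()
        for partner in self.neighbors(company_id, "has_partner"):
            related.update(self.neighbors(partner, "has_partner"))
        related.discard(company_id)
        return sorted(related)


graph = KnowledgeGraph()
graph.add_node("company:A", "company")
graph.add_node("company:B", "company")
graph.add_node("person:1", "person")
graph.add_node("lawsuit:9", "legal_proceeding")
graph.add_edge("company:A", "has_partner", "person:1")
graph.add_edge("company:B", "has_partner", "person:1")
graph.add_edge("company:B", "party_to", "lawsuit:9")

print(graph.companies_sharing_partner("company:A"))  # ['company:B']
```

In this toy example, company A has no lawsuit of its own, but a two-hop walk through its partner reaches company B, which does. That is the kind of relational signal an isolated, per-company record cannot express.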

From Map to Intelligence: Our Graph Foundation Model

Having the most complete map is just the first step. You need an intelligence engine capable of interpreting it in real time and at scale.

Inspired by the advancements in Large Language Models (LLMs), which learn to "understand" language by processing vast amounts of text, we developed our own Foundation Model. The crucial difference is that instead of text, our model learns from the structure of our Knowledge Graph.

Our GFM learns the latent patterns of success and failure in Brazil. It doesn't just look at the attributes of an isolated CNPJ; it learns what it means to be a "high-growth tech startup," a "stable, family-owned industry in the South," or a "service provider with hidden legal risks in its partner network." It does this by analyzing the shape, density, and evolution of the connections of millions of other entities that have gone through similar situations.

The primary output of this model is not a score, but rather a rich mathematical representation of each entity—an embedding. This vector captures the essence of a company in a format that a machine can use for complex tasks, such as prediction, similarity searches, and anomaly detection.
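
As a rough intuition for how an embedding can reflect a company's connections and not only its own attributes, the sketch below runs a single round of mean-neighbor aggregation, a generic building block of graph learning. This article does not describe how the GFM is actually trained, so treat everything here as an assumption: the function name, the `self_weight` parameter, and the two-dimensional toy features are hypothetical.

```python
def aggregate_embeddings(features, adjacency, self_weight=0.5):
    """One round of mean-neighbor aggregation.

    features:  node id -> list of floats (the node's own attributes)
    adjacency: node id -> list of neighbor ids
    Returns a new dict; the input features are left untouched.
    """
    updated = {}
    for node, own in features.items():
        neighbors = [features[n] for n in adjacency.get(node, []) if n in features]
        if not neighbors:
            # Isolated node: nothing to blend in, keep its own attributes.
            updated[node] = list(own)
            continue
        mean = [sum(values) / len(neighbors) for values in zip(*neighbors)]
        updated[node] = [self_weight * o + (1 - self_weight) * m for o, m in zip(own, mean)]
    return updated


# Two companies with identical attributes but different neighborhoods.
features = {
    "company_x": [1.0, 0.0],
    "company_y": [1.0, 0.0],
    "risky_partner": [0.0, 1.0],
    "stable_partner": [1.0, 0.0],
}
adjacency = {
    "company_x": ["risky_partner"],
    "company_y": ["stable_partner"],
}

embeddings = aggregate_embeddings(features, adjacency)
print(embeddings["company_x"])  # [0.5, 0.5]
print(embeddings["company_y"])  # [1.0, 0.0]
```

Looked at in isolation, the two companies are indistinguishable. After aggregation their vectors differ because their neighborhoods differ. A real model would learn the aggregation weights and stack several such rounds; this sketch only shows the mechanism.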

The "Contact Lens": From General Intelligence to Personalized Analysis

A foundation model, no matter how powerful, offers a general view of the market. The real differentiator for our clients emerges when we apply this intelligence to their specific context.

This is where our "contact lens" concept comes in. When a client securely and privately integrates their proprietary data into our platform, we don't just add it to the graph. We use this data to perform a fine-tuning process, creating a segregated specialization layer on top of our Foundation Model.

This personalized layer—the "contact lens"—teaches our model to see the market from the client's perspective. It learns that company's specific definition of risk, the exact profile of its best customer, and the nuances of its particular ecosystem. It is crucial to emphasize that the client's data and the resulting fine-tuned model are for their exclusive use, ensuring total privacy and a unique competitive advantage.
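
One common way to build a client-specific layer on top of a shared model is to keep the base embeddings frozen and train a small "head" on the client's own labels. The sketch below does this with a plain logistic regression. It is an illustration under stated assumptions, not a description of Avra's fine-tuning process: the function names, the toy embeddings, and the labels (1 for a good customer, 0 otherwise) are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_client_head(embeddings, labels, epochs=200, learning_rate=0.5):
    """Fit a logistic-regression head on frozen embeddings; returns (weights, bias)."""
    dim = len(next(iter(embeddings.values())))
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for entity, target in labels.items():
            x = embeddings[entity]  # read only: the shared embedding is never modified
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - target
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        n = len(labels)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def client_score(head, embedding):
    """Probability-like score of an embedding under one client's head."""
    weights, bias = head
    return sigmoid(sum(w * v for w, v in zip(weights, embedding)) + bias)


# Shared, frozen embeddings (toy values) and one client's private labels.
base_embeddings = {
    "a": [1.0, 0.2],
    "b": [0.9, 0.1],
    "c": [0.1, 0.9],
    "d": [0.2, 1.0],
}
client_labels = {"a": 1, "b": 1, "c": 0, "d": 0}

head = train_client_head(base_embeddings, client_labels)
unseen_company = [0.8, 0.3]
print(client_score(head, unseen_company) > 0.5)  # True: resembles this client's good customers
```

The design point is the separation: each client's head holds only its own weights, so two clients can score the same company differently while the shared embeddings stay untouched.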

Beyond Credit: The Future of Dynamic Intelligence

While credit risk analysis is the most immediate application and a universal pain point, it is just the first of many solutions our GFM can power. The platform we've built is designed to be the fundamental intelligence layer for the Brazilian B2B economy.

The same representations (embeddings) that generate a superior credit score can be used to:

  • Find Ideal Customers: Identify companies across Brazil that behave like your best current customers, going far beyond traditional demographic filters.
  • Marketing and Personalization: Understand competitive movements, identify "white spaces" in the economy, and send signals to marketing platforms so they can find better leads.
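
The "find ideal customers" use case above can be sketched as a nearest-neighbor search over embeddings. The example below is a minimal, assumed version: it averages the embeddings of a client's best customers and ranks every other company by cosine similarity to that average. The function names, company ids, and three-dimensional vectors are hypothetical, and a production system would typically use an approximate nearest-neighbor index in place of a full scan.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_lookalikes(embeddings, seed_ids, top_k=2):
    """Rank non-seed companies by similarity to the mean embedding of the seeds."""
    seeds = [embeddings[s] for s in seed_ids]
    centroid = [sum(values) / len(seeds) for values in zip(*seeds)]
    candidates = [
        (cosine_similarity(vector, centroid), company)
        for company, vector in embeddings.items()
        if company not in seed_ids
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [company for _, company in candidates[:top_k]]


company_embeddings = {
    "best_1": [0.9, 0.1, 0.0],
    "best_2": [0.8, 0.2, 0.0],
    "prospect_a": [0.85, 0.15, 0.05],
    "prospect_b": [0.1, 0.9, 0.2],
    "prospect_c": [0.0, 0.1, 0.9],
}

lookalikes = find_lookalikes(company_embeddings, ["best_1", "best_2"])
print(lookalikes)  # ['prospect_a', 'prospect_b']
```

Because the comparison happens in embedding space, a prospect can rank highly without sharing the sector, size, or region of the seed customers, which is what separates this from a demographic filter.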

Our mission at Avra is ambitious yet simple to articulate: to transform the way companies in Brazil understand each other and make decisions. We are replacing the static "snapshot" of the past with a dynamic and predictive "film" of the present and future. The journey is just beginning.