PodAccounting

PPGCC / UFPE

PodAccounting is the podcast that turns dissertations and theses into relevant conversations. Most research dies on the shelf. Not here. PodAccounting gives voice to the dissertations and theses of PPGCC/UFPE, exploring what they actually say, where they contribute, and where they can be questioned. No oversimplification. No empty academicism. Just good research, well discussed. All debates are AI-generated.

Episodes

  1. APR 9

    Ep. 01 - Machine learning and readability in accounting: an ensemble learning approach

    We employ FinBERT-PT-BR, a transformer-based language model trained on Brazilian Portuguese financial text, to develop an Informativeness Index designed to quantify the informational value of narrative disclosures. The dataset comprises 26,804 annual financial statement notes from 1,152 publicly listed companies in Brazil, spanning a 12-year period (2011–2023). In addition to this novel metric, we compute four conventional readability measures for each note: Flesch-Kincaid Reading Ease, Fog Index, SMOG Index, and Loughran-McDonald Index. Machine learning regressors (Random Forest and Gradient Boosting) are then applied to assess which readability metric best approximates the informativeness score derived from the model's three underlying dimensions: Boilerplateness, Completeness, and Density. Feature-importance analyses across multiple models indicate that the Loughran-McDonald Index most closely captures the variation in informativeness, suggesting it is the most effective proxy for readability in Portuguese financial disclosures. These findings offer empirical evidence and a new perspective on the theoretical link between textual complexity and informational obfuscation within an agency-theory framework. This research advances the literature by integrating language models and machine-learning techniques into the study of financial disclosure quality in Portuguese, a largely underexplored linguistic and regulatory context, while leveraging a large, longitudinal dataset. Future work could extend this approach by incorporating cross-linguistic models, human-rated benchmarks, or hybrid embeddings to further validate and refine the informativeness construct.

    19 min
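
For listeners unfamiliar with the conventional readability measures named in the episode summary, here is a minimal sketch of one of them, the Fog Index, in its textbook form: 0.4 × (average words per sentence + percentage of words with three or more syllables). This is not the study's implementation. The function names, the vowel list, and the vowel-run syllable counter are assumptions made for illustration, and the sketch ignores the usual exclusions (proper nouns, compound words, common suffixes).

```python
import re

# Assumption: vowels used by the crude syllable counter, including common
# Portuguese accented vowels. A real study would use a proper syllabifier.
VOWELS = "aeiouyáàâãéêíóôõúü"


def count_syllables(word: str) -> int:
    """Rough estimate: one syllable per run of consecutive vowels (minimum 1)."""
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))


def fog_index(text: str) -> float:
    """Textbook Fog Index: 0.4 * (words per sentence + % of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # "Complex" here means three or more estimated syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)
```

Higher values indicate text that is harder to read under this measure. Scores like this one could serve as input features for the kind of regressors the summary describes, though the study's exact feature construction is not stated here.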
