TLDR: A research paper proposes a cash flow-based underwriting system utilizing bank transaction data to enhance credit scoring for Micro, Small, and Medium Enterprises (MSMEs) in Malaysia. Traditional credit assessment often excludes MSMEs due to a lack of credit history. By developing a novel dataset of Malaysian MSME loan applicants, the study demonstrates that features derived from bank statements significantly improve the predictive performance of machine learning credit scoring models. This approach promises to expand access to financing for new-to-lending MSMEs and promote greater financial inclusion in the region.
Micro, Small, and Medium Enterprises (MSMEs) are the backbone of Malaysia’s economy, yet a significant number struggle to access formal financing. This challenge often stems from traditional credit assessment methods that heavily rely on credit bureau data, which many new or young MSMEs simply don’t have. This reliance creates a barrier, leading to an estimated MYR 90 billion funding gap for these crucial businesses.
A recent study explores a promising solution: leveraging bank statement transaction data as an alternative source for credit assessment. This approach aims to foster financial inclusion in emerging markets like Malaysia by providing a more dynamic and accurate picture of an MSME’s financial health.
A New Approach to Underwriting
The research proposes an end-to-end cash flow underwriting workflow that integrates bank statement transaction data into the credit decision-making process. The system supplements traditional underwriting with recent cash flow signals drawn from bank statements, rather than relying only on historical credit records. The workflow is structured into three main layers:
- Web Layer (Customer Onboarding): This is the entry point where MSME owners submit loan applications and bank statements through a web-based platform.
- Application Layer (Bank Statement Analyser): This layer automatically extracts and converts unstructured transaction data into a structured format. It then analyzes cash flow indicators, derives bank statement-related features, and applies rule-based checks to detect potential fraud or data anomalies.
- Data & Scoring Layer (Cash Flow Underwriting): Here, the analyzed data and engineered features are securely stored. This data is then used to train predictive models that estimate the probability of default and classify credit risk.
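To make the Application Layer more concrete, here is a minimal Python sketch of the kind of processing it might perform once transactions are in structured form. It is not the paper's implementation: the record layout, the feature names, the `derive_features` and `check_balance_continuity` helpers, and the sample figures are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical structured transactions: (posting date, signed amount in MYR,
# balance after the transaction). Positive amounts are inflows, negative are outflows.
# These figures are invented for illustration and are not from the study's dataset.
transactions = [
    (date(2024, 1, 5), 12000.0, 15000.0),
    (date(2024, 1, 20), -9000.0, 6000.0),
    (date(2024, 2, 3), 10000.0, 16000.0),
    (date(2024, 2, 25), -11000.0, 5000.0),
    (date(2024, 3, 10), 14000.0, 19000.0),
    (date(2024, 3, 28), -8000.0, 11000.0),
]
opening_balance = 3000.0


def derive_features(txns):
    """Aggregate transactions by calendar month and derive simple cash flow features."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for posted, amount, _balance in txns:
        month = (posted.year, posted.month)
        if amount >= 0:
            inflow[month] += amount
        else:
            outflow[month] += -amount
    months = sorted(set(inflow) | set(outflow))
    net = [inflow[m] - outflow[m] for m in months]
    return {
        "avg_monthly_inflow": mean(inflow[m] for m in months),
        "avg_monthly_outflow": mean(outflow[m] for m in months),
        "avg_monthly_net_cash_flow": mean(net),
        "min_balance": min(balance for _, _, balance in txns),
        "months_with_negative_net": sum(1 for n in net if n < 0),
    }


def check_balance_continuity(txns, opening, tolerance=0.01):
    """Rule-based check: return indices of rows whose stated balance does not
    equal the previous balance plus the transaction amount."""
    anomalies = []
    running = opening
    for i, (_posted, amount, balance) in enumerate(txns):
        if abs(running + amount - balance) > tolerance:
            anomalies.append(i)
        running = balance  # resynchronise on the stated balance
    return anomalies


features = derive_features(transactions)
anomalies = check_balance_continuity(transactions, opening_balance)
```

A balance continuity rule like this is one simple way to flag edited or incomplete statements; a production analyser would presumably combine many such rules with document-level checks.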
The workflow offers two main advantages. It automates manual tasks, which shortens loan application turnaround time and improves operational efficiency. More importantly, it expands credit access for MSMEs with limited or no credit history by using transactional data, thereby promoting financial inclusion.
The Malaysian Dataset and Key Findings
To test this approach, the researchers collaborated with a Malaysian lending institution to create the first Malaysian bank statement dataset for MSME loan applicants. The dataset comprises 611 loan applicants, with detailed application information and six months of bank transaction data for each. The study used the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework to guide its methodology.
The empirical results are compelling. The study found that incorporating bank transaction-derived features substantially boosts the performance of all credit scoring models evaluated, including Logistic Regression, Random Forest, Gradient Boosting, and AdaBoost. Logistic Regression, in particular, showed a significant improvement in its ability to discriminate between defaulting and non-defaulting businesses when bank transaction data was included. In fact, nine out of ten transaction-based features demonstrated stronger predictive power than all application form features, highlighting their superior ability to distinguish credit outcomes.
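One common way to compare the predictive power of individual features is to compute each feature's univariate AUC against the default label. The sketch below illustrates that idea in plain Python. The choice of AUC as the metric is an assumption, since this summary does not say how the paper measured feature-level predictive power, and the feature names and values are invented toy data, not results from the study.

```python
def auc(scores, labels):
    """Probability that a randomly chosen defaulter (label 1) has a higher score
    than a randomly chosen non-defaulter (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def discriminatory_power(values, labels):
    """Direction-agnostic AUC: 0.5 means no separation, 1.0 means perfect separation."""
    a = auc(values, labels)
    return max(a, 1.0 - a)


# Toy data for eight hypothetical applicants (1 = defaulted, 0 = repaid).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_features = {
    # Hypothetical transaction-derived feature
    "months_with_negative_net": [4, 3, 1, 2, 0, 1, 0, 0],
    # Hypothetical application form feature
    "years_in_business": [2, 5, 3, 4, 6, 1, 7, 3],
}

# Rank features from most to least discriminating on the toy data.
ranking = sorted(
    toy_features,
    key=lambda name: discriminatory_power(toy_features[name], labels),
    reverse=True,
)
for name in ranking:
    print(name, round(discriminatory_power(toy_features[name], labels), 3))
```

On a real dataset the same comparison would be run on held-out data, since a univariate ranking computed on a few hundred applicants can be noisy.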
When both application information and bank transaction data were combined, the models achieved their best predictive power, confirming that transactional data provides significant incremental value. This strongly supports the hypothesis that bank statement transaction data offers substantial predictive power in assessing credit risk for Malaysian MSMEs.
Looking Ahead
In conclusion, this study demonstrates that bank statement transaction data can serve as a powerful alternative data source for MSME loan underwriting in Malaysia. The features derived from these transactions capture dynamic aspects of an MSME’s financial behavior often missed by traditional credit models. The findings pave the way for more inclusive and efficient credit assessment practices, ultimately helping to bridge the financing gap for Malaysian MSMEs. Future work aims to integrate real-time transactional data to further enhance the assessment of creditworthiness for businesses with limited credit history. The full research paper is titled "Cash Flow Underwriting with Bank Transaction Data: Advancing MSME Financial Inclusion in Malaysia".


