In case you didn’t know, members of the Brazilian House of Representatives have access to a quota for the exercise of parliamentary activities (CEAP), or in other words “[an amount] destined to pay for representatives’ expenses linked to the exercise of parliamentary activity”.
Each representative is entitled to roughly R$42.000,00 in reimbursements every month; the exact limit varies a little from state to state.
Since the data on all reimbursements are available at the portal da transparência, I decided to do a small analysis to find out whether the representatives used the CEAP appropriately in 2016. In my view, there are two main ways we could detect an improper reimbursement claim (given the available data):
- If the refund category is suspicious
- If the time component of the refund is suspicious
Suspicious Category: Dissemination of Parliamentary Activity
When representatives request a reimbursement, they need to place it in one of 17 possible categories, among which we have “phone costs”, “fuel costs”, “flight tickets”, etc. To get an idea of what I was dealing with, I summed the values of all reimbursements in each category and created a plot of the top 7.
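The aggregation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from my analysis: the records are made-up toy values, and the `(category, value)` layout and the function name `top_categories` are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (category, value in reais). Not real CEAP data.
reimbursements = [
    ("fuel costs", 150.0),
    ("flight tickets", 900.0),
    ("phone costs", 80.0),
    ("flight tickets", 1100.0),
    ("fuel costs", 200.0),
]


def top_categories(records, n=7):
    """Sum the reimbursed value per category and return the n largest totals."""
    totals = defaultdict(float)
    for category, value in records:
        totals[category] += value
    # Sort categories by total value, largest first, and keep the top n.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


top = top_categories(reimbursements)
for category, total in top:
    print(f"{category}: R$ {total:.2f}")
```

With the real data set, the same function would be fed one record per reimbursement claim.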
The category that had the highest overall cost to the taxpayer was a surprise to me: dissemination of parliamentary activity. This category represented approximately 23% (R$48.645.429,54) of a total of 212 million reals reimbursed last year.
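As a quick check of that percentage, here is the arithmetic using the two figures quoted above. Note that the overall total is the rounded "212 million" figure, so the result is approximate.

```python
# Figures quoted in the post; the overall total is rounded to 212 million.
category_total = 48_645_429.54
overall_total = 212_000_000.00

share = category_total / overall_total
print(f"{share:.1%}")  # prints 22.9%
```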
Seeing these numbers and the strangely vague title of this category, I was left with the impression that it was being used by representatives to cover personal expenses. To be sure I wasn’t being too quick to judge, I created box plots of the top 7 categories according to median reimbursement value so that I could analyse their outliers.
In the image above we can see that dissemination of parliamentary activity (the third box from left to right) doesn’t have the highest median, but its outliers are unlike the others. The highest point in the whole plot corresponds to a representative being reimbursed a total of R$184.500,00 for expenses at a small print shop.
These outliers (many of which are for amounts much higher than the monthly reimbursement limit) confirm the theory that dissemination of parliamentary activity is being used for personal gain by some representatives.
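For readers who want to check outliers like these without a plot, here is a minimal sketch. It assumes the common box-plot convention of flagging points more than 1.5 times the interquartile range beyond the quartiles (plotting libraries may compute quartiles slightly differently), and it also flags values above the approximate monthly limit mentioned earlier. Apart from the R$184.500,00 claim cited above, the values are made up for illustration.

```python
import statistics

MONTHLY_LIMIT = 42_000.00  # approximate monthly quota quoted in the post

# Hypothetical values (in reais) for one category. Only the last value is
# taken from the post; the rest are invented for the example.
values = [100.0, 120.0, 130.0, 150.0, 160.0, 180.0, 200.0, 184_500.0]


def box_plot_outliers(data):
    """Return the points outside the usual 1.5 * IQR box-plot whiskers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < lower or v > upper]


median_value = statistics.median(values)
outliers = box_plot_outliers(values)
over_limit = [v for v in values if v > MONTHLY_LIMIT]

print("median:", median_value)
print("outliers:", outliers)
print("above monthly limit:", over_limit)
```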
Suspicious Time Component: Paid Vacations
Now moving on to the time component, I took the average reimbursement value for every day of the year; with this we can find out whether there is any time frame where the reimbursements get more expensive.
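A minimal sketch of that daily averaging, assuming each reimbursement is reduced to a `(date, value)` pair; the records below are toy values and `daily_average` is a name chosen just for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy records: (issue date, value in reais). Not real CEAP data.
records = [
    (date(2016, 3, 1), 100.0),
    (date(2016, 3, 1), 200.0),
    (date(2016, 3, 2), 50.0),
    (date(2016, 3, 2), 150.0),
    (date(2016, 3, 2), 400.0),
]


def daily_average(rows):
    """Group reimbursements by day and return the mean value for each day."""
    by_day = defaultdict(list)
    for day, value in rows:
        by_day[day].append(value)
    # Sort by date so the result can be read as a time series.
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


averages = daily_average(records)
for day, avg in averages.items():
    print(day.isoformat(), avg)
```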
After creating numerous visualizations with these averages (which you can reproduce with the code available at my repository), I found something weird in the plot below. It represents the density distribution of the average reimbursement value for each day. See if you can find the thing that drew my attention…
Observe the little “lumps” after the 1.5k reals mark. They show that there is a large but irregular number of days on which the average reimbursement blows up. If we find a time pattern in the distribution of these outliers, it could mean that something weird is happening.
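One rough way to look for those lumps numerically is to bin the daily averages and count the days above the 1.5k mark. The sketch below uses a plain histogram as a crude stand-in for the density plot; the daily averages and the bin width of 500 reals are assumptions made for the example.

```python
from collections import Counter

# Hypothetical daily averages (in reais). Not real CEAP data.
daily_averages = [420.0, 610.0, 580.0, 1720.0, 730.0, 2450.0, 690.0, 1980.0]

BIN_WIDTH = 500.0   # assumed bin width for the histogram
THRESHOLD = 1500.0  # the "1.5k reals" mark mentioned above


def histogram(values, width):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


hist = histogram(daily_averages, BIN_WIDTH)
high_days = [v for v in daily_averages if v > THRESHOLD]

for lower_edge, count in hist.items():
    print(f"{lower_edge:>7.0f}+ {'#' * count}")
print("days above threshold:", len(high_days))
```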
To study this hypothesis, I created the time series of average reimbursement value day by day. The ups and downs throughout the year are the weekly patterns, but pay attention to what happens on the right side of the plot.
The peak around the end of December and beginning of January tells us that on these days the reimbursements had very high average values. If we take into account some extra information found on the House’s website, we can see that this sudden rise in the average happens right after December 23rd (orange vertical line), when the representatives go on vacation.
In my opinion this time series shows that some representatives might be using the CEAP as a means to pay for their vacations.
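A simple way to put a number on the jump discussed above is to compare the mean daily average before and after the start of the recess. The sketch below assumes a daily-average series keyed by date; the values are invented, and treating December 23rd itself as part of the "after" period is my choice for the example.

```python
from datetime import date
from statistics import mean

RECESS_START = date(2016, 12, 23)  # date mentioned in the post

# Hypothetical daily averages (in reais). Not real CEAP data.
daily_avg = {
    date(2016, 12, 20): 600.0,
    date(2016, 12, 21): 650.0,
    date(2016, 12, 22): 700.0,
    date(2016, 12, 26): 1900.0,
    date(2016, 12, 28): 2300.0,
}


def split_means(series, cutoff):
    """Return the mean daily average before the cutoff and from the cutoff on.

    Both periods must contain at least one day, otherwise mean() raises.
    """
    before = [v for day, v in series.items() if day < cutoff]
    after = [v for day, v in series.items() if day >= cutoff]
    return mean(before), mean(after)


mean_before, mean_after = split_means(daily_avg, RECESS_START)
print("before recess:", mean_before)
print("during recess:", mean_after)
```

A large gap between the two numbers is only a hint, not proof: a few very large claims can move the average of a period with few reimbursements.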
This wasn’t in any way an exhaustive analysis, but it shows indications that our representatives are using public money to cover personal expenses. In the beginning we set out to investigate two aspects of the data set and were able to find suspicious activity in both; there are probably many other ways to look at this data for which I don’t have the time or skill, so I encourage readers to explore the “wonders” of the CEAP for themselves.
If you want the interactive version of the analysis in this post, or want to download the data I used, visit my Kaggle Kernel Paid Vacations - Brazil’s House of Deputies. And if you want to see other studies about the CEAP, I encourage you to take a look at the Operação Serenata de Amor website.