In this project we Prepared and Analyzed a dataset about Italian's bills.
The dataset contained about 14 Millions of rows and every group choosed a library in a give list in order to do the work.
We selected Rapids (cuDf)
but, due to the dataset dimensions, we decided to change and use Dask-cuDf, a distributed version of Rapids.
For more info: project's slides
Francesco Pelacani
Giulio Querzoli
Mirco Botti
| Col name | Target data type | Description |
|---|---|---|
| user_code | String | (Anonymized) code for the customer that owns this utility |
| customer_code | String | Combined with user_code provides a unique identifier for the utility. Even this field is anonymized |
| city | String | City where the utility is located |
| address | String | (Anonymized) address of the utility location |
| user_code | String | (Anonymized) code that identifies the customer |
| nominative | String | (Anonymized) customer name |
| sex | String | Sex of the customer. It could be ‘M’, ‘F’, ‘P’, with ‘P’ denoting that the customer is a commercial activity (VAT number) |
| age | Int | Age of the customer, set to null for commercial activities (sex = ‘P’). Its value must be >= 18 |
| bill_id | Int | Invoice identifier |
| F1_kWh | Float | kWh of electricity consumed in the F1 time slot |
| F2_kWh | Float | kWh of electricity consumed in the F2 time slot |
| F3_kWh | Float | kWh of electricity consumed in the F3 time slot |
| date | Date | Start date |
| light_start_date | Date | Start date of electricity invoice |
| light_end_date | Date | End date of electricity invoice |
| tv | Float | Television fee to pay |
| gas_amount | Float | Gas fee to pay |
| gas_average_cost | Float | Average cost of gas |
| light_average_cost | Float | Average cost of electricity |
| emission_date | Date | Emission date |
| supply_type | String | Supply type (‘light’, ‘gas’, ‘gas and light’) |
| gas_start_date | Date | Start date of gas invoice |
| gas_end_date | Date | End date of gas invoice |
| extra_fees | Float | Extra fees to pay |
| gas_consumption | Float | Consumed gas |
| light_consumption | Float | Consumed electricity |
| gas_offer | Float | Name of the subscribed gas plan (anonymized) |
| light_offer_type | String | Kind of plan for the electricity (‘single zone’, ‘bizone’, etc.) |
| light_offer | String | Name of the subscribed electricity plan (anonymized) |
| total_amount | Float | gas_amount + light_amount + extra_fees |
| howmuch_pay | Float | Overall amount to pay, computed as total_amount + tv |
| light_amount | Float | Amount to pay for the electricity |
| average_unit_light_cost | Float | Average cost for electricity |
| average_light_bill_cost | Float | Average cost for the electricity invoice |
| average_unit_gas_cost | Float | Average cost for gas |
| average_gas_bill_cost | Float | Average cost for the gas invoice |
| billing_frequency | String | Billing frequency (‘monthly’, ‘quarterly’, etc.) |
| bill_type | String | Kind of invoice (False means a “standard bill”) |
| gas_system_charges | Float | Extra gas fees |
| light_system_charges | Float | Extra electricity fees |
| gas_material_cost | Float | Costs for gas |
| light_transport_cost | Float | Extra electricity fees |
| gas_transport_cost | Float | Extra gas fees |
| light_material_cost | Float | Costs for electricity |

