CSR report
This report outlines the corporate social responsibility (CSR) impact of the A.P.RI.L. project. The project enables users to gather and analyze web data efficiently through a user-friendly interface. We have ensured that the project adheres to ethical standards of data collection, minimizes environmental impact, and provides social value through transparency and accessibility.
Compliance with Data Privacy Regulations: This project ensures adherence to data privacy laws, such as the General Data Protection Regulation (GDPR). Web scraping operations focus exclusively on publicly available information, and no personal or sensitive data is collected, processed, or stored. This approach ensures compliance with privacy standards while safeguarding individual rights. The project's legal framework also aligns with recent developments in French law that confirm the legitimacy of text and data mining for research purposes.
Evidence of Compliance:
- Scraping only non-personal, publicly accessible data.
- Regular audits to ensure adherence to applicable laws and website policies.
- Implementing data anonymization and strict access controls.
Transparent Data Collection: Transparency is central to this project. All data collection practices are clearly documented, so that stakeholders are informed about the sources and methods used. The team respects the intellectual property rights of data owners and avoids using proprietary data without proper authorization. This approach is consistent with the legal recognition of text and data mining in France, which is increasingly applied in academic and research contexts (source).
Best Practices:
- Keeping comprehensive logs of websites and data sources used.
- Documenting the entire data collection process to ensure transparency and accountability.
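
The logging practice above could be sketched as an append-only log in JSON Lines format, one record per data source. This is a minimal illustration, not the project's actual logging code: the function name `log_source`, the field names, and the example URL are all assumptions made for the sketch.

```python
import io
import json
from datetime import datetime, timezone


def log_source(stream, url, method, note=""):
    """Write one provenance record (hypothetical schema) as a JSON line.

    `stream` is any writable text file object, for example a file opened
    in append mode. The record is also returned for convenience.
    """
    entry = {
        "url": url,                # where the data came from
        "method": method,          # how it was collected
        "note": note,              # free-text remark, e.g. permission status
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    stream.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


# Usage example with an in-memory buffer standing in for a log file.
buffer = io.StringIO()
log_source(buffer, "https://example.org/articles", "HTTP GET + HTML parsing",
           note="allowed by robots.txt")
print(buffer.getvalue(), end="")
```

Writing one self-describing record per line keeps the log easy to audit and to append to without rewriting earlier entries.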
Consent and Permission: Although web scraping often involves gathering publicly available data, the project carefully respects each website’s robots.txt file and scraping policies. If scraping is explicitly prohibited, data is not collected from that site. In cases where terms are unclear, the project seeks explicit permission to scrape.
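
As an illustration of the robots.txt check described above, the following sketch uses Python's standard `urllib.robotparser` module. The user agent name `april-research-bot`, the helper `is_allowed`, and the example rules are assumptions for the sketch, not the project's actual configuration.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "april-research-bot"  # hypothetical user agent name


def is_allowed(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt content permits fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; in live use, set_url() followed by
    # read() would download the robots.txt from the site instead.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything is allowed except paths under /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "https://example.org/articles/1"))    # True
print(is_allowed(EXAMPLE_ROBOTS, "https://example.org/private/data"))  # False
```

A check like this covers only robots.txt. Terms of service and other site policies still need to be reviewed separately, as the paragraph above notes.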
Green IT Practices: Green IT practices are embedded into the project’s operations. Wherever possible, the project uses energy-efficient hardware and storage services powered by renewable energy sources. The local storage infrastructure is also optimized for energy savings.
Examples of Green Practices:
- Using data centers that utilize renewable energy sources.
- Reducing storage redundancy and avoiding over-provisioning in servers.
- Regularly auditing the environmental impact of hardware used for scraping and processing.
Access to Data for Social Good: The data collected and analyzed through this project can potentially be used for social good. By providing insights into specific topics, it could aid in research aimed at solving pressing societal issues. This data can help in public health research, consumer behavior analysis, and policy-making, depending on the topic of focus.
Impact Example:
- Use of the corpus for research into topics such as misinformation, healthcare trends, or social justice issues.
- Potential sharing of anonymized insights or aggregate-level data with non-profits or public institutions for better decision-making.
Ethical Web Scraping:
Compliance with Legal Standards:
Data Privacy Considerations: Ensure that your scraping pipeline respects privacy laws such as the General Data Protection Regulation (GDPR) in Europe. If scraped pages contain personal data, it is crucial that no personally identifiable information (PII) is processed or stored without proper consent.
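
One possible safeguard for the PII point above is to redact obvious identifiers from scraped text before it is stored. The sketch below handles email addresses only, using a simple regular expression. The function name `redact_emails` and the placeholder string are assumptions, and a pattern like this is illustrative rather than a complete PII detector.

```python
import re

# Simple pattern for typical email addresses. It is deliberately
# conservative and will not catch every valid address format.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

PLACEHOLDER = "[REDACTED_EMAIL]"  # hypothetical replacement marker


def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(PLACEHOLDER, text)


sample = "Contact jane.doe@example.org for details."
print(redact_emails(sample))  # Contact [REDACTED_EMAIL] for details.
```

Other identifiers, such as names or phone numbers, would need their own handling. Pattern matching of this kind reduces the risk of storing PII but does not by itself establish compliance.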
Minimizing Negative Impact on Websites:
Limiting server load: Ethical scraping also involves limiting the frequency and volume of requests to avoid overloading servers. Respect rate limits to prevent disrupting the normal functioning of the websites you're scraping.
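
A minimal sketch of the request throttling described above, assuming a fixed minimum delay between consecutive requests to the same host. The class name `Throttle`, the 0.5-second interval, and the example URLs are illustrative assumptions. A real pipeline would also honor any rate limits a site publishes.

```python
import time
from urllib.parse import urlsplit


class Throttle:
    """Enforce a minimum delay between consecutive requests to one host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = {}                   # host -> time of last request

    def wait(self, host):
        """Sleep if needed before a request to `host`; return seconds slept."""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last[host] = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.5)
    urls = [
        "https://example.org/page/1",
        "https://example.org/page/2",
        "https://example.org/page/3",
    ]
    for url in urls:
        slept = throttle.wait(urlsplit(url).netloc)
        # The actual HTTP request would be issued here.
        print(f"waited {slept:.2f}s before {url}")
```

Tracking the last request time per host spaces out requests to each site without slowing down requests to unrelated sites.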
Data Integrity: Scraping responsibly also means not using the data for any form of malicious intent or misinformation. The data should be used in ways that align with your institution's research goals, contributing to legitimate, constructive research.
Transparency and Collaboration:
Credit to Data Sources: Whenever possible, credit the websites or data sources from which you collected information. This fosters a culture of respect and transparency.
The A.P.RI.L. project successfully aligns with CSR principles by prioritizing ethical data use, environmental sustainability, and social good. Through responsible data collection, energy-efficient practices, and a commitment to transparency, the project minimizes its environmental impact while delivering valuable insights for public benefit. Its commitment to stakeholder well-being, through accessibility and job creation, further highlights its social and economic value.
For a detailed breakdown of the CSR metrics and their measurement, please refer to our CSR Report on GitHub.
By integrating these CSR values into the design and execution of the project, we ensure that A.P.RI.L. contributes positively to society and the environment while maintaining the highest standards of governance and compliance.
© 2024 APRIL. | Version 1.0 | Last updated on 2025-01-14