Student's name | SCIPER |
---|---|
Camille Challier | 311020 |
Cyrill Strassburg | 377372 |
Eglantine Vialaneix | 324293 |
Milestone 1 • Milestone 2 • Milestone 3
10% of the final grade
This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please, fill the following sections about your project.
(max. 2000 characters per section)
Find a dataset (or multiple) that you will explore. Assess the quality of the data it contains and how much preprocessing / data-cleaning it will require before tackling visualization. We recommend using a standard dataset as this course is not about scraping nor data processing.
Hint: some good pointers for finding quality publicly available datasets (Google dataset search, Kaggle, OpenSwissData, SNAP and FiveThirtyEight), you could use also the DataSets proposed by the ENAC (see the Announcements section on Zulip).
For this project, we are working with multiple datasets related to cetaceans. Their preprocessing steps are detailed in the EDA part below.
- Info about Cetaceans: Basic information on all cetaceans can be found in the Wikipedia list of cetaceans. For each species, the list includes the family, genus, scientific and common names, level of endangerment, range, a size illustration, and a photograph. Additionally, each species has its own Wikipedia page with images and further information. The list has been converted to a pandas DataFrame. Images can be accessed via the Wikimedia API.
- Global sightings of Cetaceans: Data on cetacean sightings were downloaded from OBIS Seamap, a data center for various marine animals. Cetacean data from OBIS are originally sourced from HappyWhale and include sightings spanning from 1972 to the present. Attributes include GPS coordinates, species name, unique animal ID, group size, date of sighting, locality, and environmental details.
- Phylogenetic tree of Cetaceans: From a paper published in May 2020 in Systematic Biology, we retrieved one of the most recent phylogenetic trees of cetaceans. We plan to manually re-transcribe Figures S1 and S3 to display them in a more interactive and playful way alongside the rest of the information in this project.
- To assess the potential threats to cetacean survival, we explored multiple simple datasets covering climate disruption, ship strikes, and whaling activities:
  - Climate Disruption: Copernicus provides Global Monthly Average Sea Surface Temperature (SST) Anomalies (deviations from long-term averages) from 1993 to 2021.
  - Ship Strikes: The IWC Ship Strike Database records incidents of ship collisions with marine mammals since 1954.
  - Whaling Activities: International Whaling Commission (IWC) data on direct whale catches since 1986, including catches per year, whale species, geographic area, nation, and type of operation (Commercial, Aboriginal, Illegal,...)
- Marine Protected Areas: The World Database on Protected Areas (WDPA), a comprehensive global database of marine and terrestrial protected areas. The WDPA is updated monthly and provides crucial insights into the distribution and extent of protected areas.
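The image retrieval mentioned in the dataset list can be sketched as follows. This is a minimal illustration, assuming the MediaWiki Action API with its `pageimages` property on English Wikipedia; the project may well use a different endpoint or client. The helper names `build_image_query` and `extract_thumbnail_url`, the thumbnail width, and the sample response are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's MediaWiki Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_image_query(page_title, thumb_width=500):
    """Build a query URL asking for the lead image thumbnail of one page."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": thumb_width,
        "titles": page_title,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_thumbnail_url(response):
    """Return the first thumbnail URL found in a decoded JSON response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumbnail = page.get("thumbnail")
        if thumbnail:
            return thumbnail["source"]
    return None


url = build_image_query("Blue whale")

# Hand-written stand-in for a decoded response (not real API output).
sample_response = {
    "query": {
        "pages": {
            "1": {
                "title": "Blue whale",
                "thumbnail": {"source": "https://example.org/blue_whale.jpg"},
            }
        }
    }
}
thumbnail_url = extract_thumbnail_url(sample_response)
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `extract_thumbnail_url`; pages without a lead image simply yield `None`.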
Frame the general topic of your visualization and the main axis that you want to develop.
- What am I trying to show with my visualization?
- Think of an overview for the project, your motivation, and the target audience.
More than a century after the peak of commercial whaling, most cetacean populations are still struggling to recover. According to a study published in May 2023 in the journal Conservation Biology, approximately 26% of whale, dolphin, and porpoise species were classified as threatened with extinction as of 2021.
By creating a playful, engaging and interesting way of navigating information about modern cetaceans, this project aims to make information easily accessible and raise awareness about cetaceans, their phylogeny, their current global condition and the various threats they face.
Through our visualizations, we aim to:
- Global Overview: Provide an overview of cetaceans around the world, highlighting the species that are extinct or endangered, using the Red-List status for reference.
- Phylogenetic Tree: Present a phylogenetic tree to showcase the evolutionary relationships of cetaceans, highlighting extinct species and their connections to modern counterparts.
- Cetacean Sightings: Display sightings of cetaceans around the globe to help users understand where they live and their migration patterns. Additionally, we aim to compare these locations with protected marine areas and regions of high-risk threats to assess conservation efforts and potential dangers.
- Timeline of Threats: Illustrate the cumulative and ongoing threats to cetaceans, such as the impact of climate change on oceans, maritime traffic, pollution and plastic contamination, and hunting practices over time.
By presenting a comprehensive visualization of their global distribution, their history, and the cumulative impacts of human activities, we seek to inform the public about the critical state of cetacean populations. The target audience for this project includes environmental activists, marine biologists, educators, and most importantly the general public. By creating an engaging and interactive experience, we aim to captivate a broad audience and encourage a deeper understanding of the challenges cetaceans face, with the hope of fostering greater support for their protection.
Pre-processing of the data set you chose
- Show some basic statistics and get insights about the data
Because we retrieve this data ourselves, its quality depends on our scraping methods. Wikipedia has a clean, standardized structure for cetacean articles, and our download relies on it to keep a consistent structure. As a proof of concept, a few successfully retrieved images are available in our repository, and some examples are shown below. Retrieval of the remaining images (size comparison with humans, endangerment index, world location) and of the textual information is still in progress.
Photograph of the animal | Size comparison with human | World location of the species |
---|---|---|
Atlantic Spotted Dolphin | Blainville's Beaked Whale | Baird's Beaked Whale |
Most of the data processing was done at download time on the OBIS Seamap website, where we filtered for the relevant cetacean species. Two datasets were extracted, each containing similar information but with different column names. To ensure consistency, we aligned their column names and formats and then concatenated them. The resulting dataset contains records of over 275,191 sightings. Some location fields, such as country and water zone, are missing for some sightings, but since the coordinates are available, we may not need these fields or could derive them if necessary. For more details on the exploratory data analysis, refer to the EDA_location.ipynb notebook.
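The alignment and concatenation step can be sketched as follows. This is a minimal illustration under stated assumptions: the column names, the `COLUMN_MAP` mapping, and the helper `harmonize_and_concat` are hypothetical, and the rows are made up rather than taken from the OBIS Seamap exports.

```python
import pandas as pd

# Toy stand-ins for the two exports (hypothetical column names and values).
first_export = pd.DataFrame({
    "latitude": [46.2, -33.9],
    "longitude": [-1.5, 18.4],
    "species_name": ["Tursiops truncatus", "Megaptera novaeangliae"],
    "obs_date": ["2019-07-14", "2021-09-02"],
})
second_export = pd.DataFrame({
    "lat": [64.1],
    "lon": [-21.9],
    "scientific": ["Balaenoptera acutorostrata"],
    "date_time": ["2020-06-30 10:15:00"],
})

# Assumed mapping from the second export's names to the first export's names.
COLUMN_MAP = {
    "lat": "latitude",
    "lon": "longitude",
    "scientific": "species_name",
    "date_time": "obs_date",
}


def harmonize_and_concat(first, second, column_map):
    """Rename columns, parse dates per export, then stack the rows."""
    second = second.rename(columns=column_map)
    frames = []
    for frame in (first, second):
        frame = frame.copy()
        # Parse dates before concatenating: each export has its own format.
        frame["obs_date"] = pd.to_datetime(frame["obs_date"])
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


sightings = harmonize_and_concat(first_export, second_export, COLUMN_MAP)
```

Parsing the dates separately for each export avoids mixing two string formats in a single column, which recent pandas versions reject by default.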
Because the dataset contains a very large number of events, we group sightings by species and location in order to visualize them on a world map. We will determine whether this grouping is also necessary for the final website.
Note that the marker size represents the number of animals observed at this location.
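One way to group sightings for the map is to snap coordinates to a regular grid and aggregate per species and cell. The sketch below assumes hypothetical column names (`species_name`, `latitude`, `longitude`, `group_size`), a made-up 0.5° cell size, and toy rows; the grouping used in the notebook may differ.

```python
import pandas as pd

# Toy sightings (made-up values, hypothetical column names).
sightings = pd.DataFrame({
    "species_name": ["Megaptera novaeangliae"] * 3 + ["Orcinus orca"],
    "latitude": [20.71, 20.74, 21.52, 48.52],
    "longitude": [-156.52, -156.48, -157.91, -123.14],
    "group_size": [2, 3, 1, 5],
})


def aggregate_sightings(df, cell_deg=0.5):
    """Snap coordinates to a grid and sum the animals per species and cell."""
    binned = df.copy()
    # Round each coordinate to the nearest multiple of cell_deg.
    binned["lat_bin"] = (binned["latitude"] / cell_deg).round() * cell_deg
    binned["lon_bin"] = (binned["longitude"] / cell_deg).round() * cell_deg
    return binned.groupby(
        ["species_name", "lat_bin", "lon_bin"], as_index=False
    ).agg(
        n_sightings=("group_size", "size"),  # number of records in the cell
        n_animals=("group_size", "sum"),     # total animals, drives marker size
    )


markers = aggregate_sightings(sightings)
```

Here the two nearby humpback records collapse into a single marker, while `n_animals` keeps the total count that the marker size represents. A coarser `cell_deg` yields fewer, larger markers.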
Anomalies represent deviations from long-term averages. For example, the January 2021 anomaly is calculated as the difference between the sea surface temperature in January 2021 and the climatological average for all January months within the dataset's time span.
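The anomaly definition above can be written as a short computation. This sketch uses made-up monthly values rather than the Copernicus data, and the column names and the helper `add_monthly_anomaly` are assumptions for illustration.

```python
import pandas as pd

# Toy monthly SST values in degrees Celsius (not Copernicus data).
sst = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01", "2020-01-01", "2021-01-01",
        "2019-07-01", "2020-07-01", "2021-07-01",
    ]),
    "sst": [18.0, 18.2, 18.7, 20.0, 20.3, 20.6],
})


def add_monthly_anomaly(df):
    """Subtract each calendar month's mean over the whole time span."""
    out = df.copy()
    out["month"] = out["date"].dt.month
    # Climatology: mean SST of all Januaries, all Februaries, and so on.
    climatology = out.groupby("month")["sst"].transform("mean")
    out["anomaly"] = out["sst"] - climatology
    return out


sst_anom = add_monthly_anomaly(sst)
```

With these toy numbers, the January climatology is the mean of 18.0, 18.2 and 18.7, so the January 2021 anomaly is the gap between 18.7 and that mean. By construction, the anomalies of each calendar month average to zero over the time span.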
- What others have already done with the data?
- Why is your approach original?
- What source of inspiration do you take? Visualizations that you found on other websites or magazines (might be unrelated to your data).
- In case you are using a dataset that you have already explored in another context (ML or ADA course, semester project...), you are required to share the report of that work to outline the differences with the submission for this class.
- Phylogenetic Tree of Cetaceans
  - OneZoom provides an interactive tree of life visualization, inspiring our effort to create a phylogenetic tree specifically for Cetaceans, incorporating additional study features.
- Global Sightings of Cetaceans
  - OBIS Seamap offers a heatmap of species distribution across the world map, allowing users to filter species and examine concentration levels.
  - Whales of Guerrero labs has used this dataset to track North Pacific humpback whale movements.
- Timeline of Threats: To represent threats to cetaceans, we plan to implement interactive line plots or 2D world maps, allowing users to explore many variables over time. Several visualizations have already been made using the previously mentioned datasets, on topics such as:
  - Sea Temperature: Sea Surface Temperature line plot, NASA - 2D Temperature Map
  - Ship Strikes Evolution: Ship Strikes Evolution Report
  - Whaling Activities: 2D Map
Our approach integrates interactivity, enabling users to adjust parameters, highlight individual species with color coding, and explore seasonal migration patterns. Another distinctive aspect is the combination of species of conservation concern with protected marine areas: we link sightings to conservation efforts and highlight the relationship between cetacean presence and protected regions. We also show the evolutionary tree of cetaceans and how the different species differ from one another. By combining these elements into a single, integrated visualization, we highlight how various threats collectively impact cetacean populations, offering a more comprehensive understanding of their conservation needs.
- Phylogenetic Tree of Cetaceans
  Similarly to OneZoom, we would like to create an interactive tree of cetacean life that displays additional information when users hover over or click on a leaf of their choice.
- Global Sightings of Cetaceans
  We aim to develop a 3D navigable globe for visualizing cetacean sightings and conservation efforts. Notable JavaScript-based visualizations such as Populated Place Visualization in D3.js and Population Heatmap in React showcase interactive 3D globes displaying global datasets, which could be adapted for our project.
- Process Book (PDF)
  A detailed overview of our project goals, design process, methodology, and evaluation.
- Presentation Video
  A short video walkthrough showcasing our data visualization project and key insights.
- Final Project Website
  Explore the live interactive visualization and learn more about our findings.
.
├── README.md
├── requirements.txt
│
├── data/ # Raw and processed datasets
│
├── Milestone_1/ # Initial exploratory data analysis and data extraction
│   ├── figures_EDA/ # PNG images from EDA
│   ├── EDA_location.py # Map visualization and threat exploratory analysis
│   ├── data_extractor.py # Scripts for extracting map data
│   ├── images.ipynb # Notebook for scraping cetacean images from Wikipedia
│   ├── utils.py # Utility functions
│   └── wikitables.ipynb # Wikipedia data scraping and tree of life exploration
│
├── Milestone_2/ # Second milestone deliverables
│   ├── tree_of_life/ # Figures related to the tree of life
│   ├── wiki_images/ # PNG images of cetaceans
│   └── Milestone_2.pdf # Milestone 2 report
│
├── Milestone_3/ # Final milestone deliverables
│   ├── process book.pdf # Process book document
│   └── cetacea_short.mp4 # Presentation video or screencast files
The website was built using JavaScript, CSS, HTML, and D3.js.
Find the implementation here: https://github.com/eglantine-vialaneix/WhereWereWhalesLFS
Data processing and exploratory data analysis (EDA) were performed using Python.
- Explore the datasets and understand the data cleaning and extraction process through the provided scripts and notebooks.
- Review milestone reports and visualizations to follow project progress and insights.
- Use the interactive website to explore whale sightings, threats, and species profiles.
- < 24h: 80% of the grade for the milestone
- < 48h: 70% of the grade for the milestone