TuscanSoundCorrespondences

This repository contains all the code and data accompanying our paper:

Rubehn, A., Montemagni, S., & Nerbonne, J. (2024). Extracting Tuscan phonetic correspondences from dialect pronunciations automatically. Language Dynamics and Change, 14(1), 1-33. https://doi.org/10.1163/22105832-bja10034

If you (re-)use parts of this code, please cite our work accordingly.

Installation

Clone into this repository on your machine:

$ git clone https://github.com/arubehn/TuscanSoundCorrespondences
$ cd TuscanSoundCorrespondences

It is highly recommended to set up a fresh virtual environment.

In order to set up the project and install all required dependencies, simply run

$ pip install -e .

Replication

Once the project is set up correctly, all analyses discussed in the paper can be replicated by executing the run.py script.

$ python3 run.py

Analyzing new data

You can analyze your own data with just a few lines of code -- you just need to define the varieties of interest (V) and your data filepath. Input data is expected to be present either as a CLDF repository [1], or as a TSV file with headers according to the EDICTOR specifications. [2]

from shibboleth.calculate import ShibbolethCalculator

data = "YOUR/DATA/FILE/PATH"
varieties = ["your", "set", "of", "varieties"]  # needs to match the language id given in the data exactly

sc = ShibbolethCalculator(varieties, data)
charac, repr, dist = sc.calculate_metrics(normalize=True) # normalize=True calculates normalized PMI; otherwise plain PMI is calculated

If your data already Multiple Sequence Alignments (MSA), those are used by default. Otherwise, alignments are generated using the SCA algorithm [3] implemented in LingPy. [4]

Dataset

The dataset used in this study is derived from the Atlante Lessicale Toscano (ALT). [5] Our derived dataset was published within the Lexibank collection under https://github.com/lexibank/alt/. This repository contains a condensed version of this Lexibank dataset (/resources/data/alt.tsv) -- for details on how this file was produced, please check the README in the resources/data folder.

References

[1] Forkel, R. et al. (2018). Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics. Scientific data, 5(1), 1-10.

[2] List, J.-M. and van Dam, K. P. (2024): EDICTOR 3. A web-based tool for Computer-Assisted Language Comparison [Software Tool, Version 3.0]. MCL Chair at the University of Passau: Passau. URL: https://edictor.org.

[3] List, J.-M. (2012). SCA: Phonetic alignment based on sound classes. In Marija Slavkovik and Dan Lassiter (eds.), New Directions in Logic, Language, and Computation, 32–51. Berlin and Heidelberg: Springer. https://doi.org/10.1007/978-3-642-31467-4_3.

[4] List, J.-M. and Forkel, R. (2021). LingPy: A Python library for historical linguistics, version 2.6.9. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: https://lingpy.org, DOI: https://zenodo.org/badge/latestdoi/5137/lingpy/lingpy.

[5] Giacomelli, G. et al. (eds.) (2000). Atlante lessicale toscano. Rome: Lexis Progetti Editoriali.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
resources		resources
src/shibboleth		src/shibboleth
tests		tests
.gitignore		.gitignore
README.md		README.md
run.py		run.py
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TuscanSoundCorrespondences

Installation

Replication

Analyzing new data

Dataset

References

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TuscanSoundCorrespondences

Installation

Replication

Analyzing new data

Dataset

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages