Commit c2f0132

Merge pull request #16 from Forced-Alignment-and-Vowel-Extraction/release-prep
Initial release (0.1.0)
2 parents 93ceecd + de6b0a7 commit c2f0132

49 files changed: 16933 additions and 293 deletions

.github/workflows/quarto_docs.yml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Build Docs
+
+on:
+  push:
+    branches: ["main", "dev", "release-prep"]
+
+jobs:
+  build-docs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Setup FFmpeg
+        uses: FedericoCarboni/setup-ffmpeg@v3
+      - uses: actions/setup-python@v2
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          python -m pip install . jupyter
+      - uses: quarto-dev/quarto-actions/setup@v2
+      - name: Render and publish to gh pages
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        uses: quarto-dev/quarto-actions/publish@v2
+        with:
+          target: gh-pages
+          path: doc_src

README.md

Lines changed: 66 additions & 19 deletions
@@ -4,13 +4,61 @@
 ![Python](https://img.shields.io/badge/python-3.11-blue.svg)
 ![GitHub](https://img.shields.io/github/license/Forced-Alignment-and-Vowel-Extraction/fave-asr?color=blue)
 
+![PyPI](https://img.shields.io/pypi/v/fave-asr)
 ![Build status](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-asr/actions/workflows/build.yml/badge.svg)
+[![Build Docs](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-asr/actions/workflows/quarto_docs.yml/badge.svg)](https://forced-alignment-and-vowel-extraction.github.io/fave-asr/)
 [![codecov](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-asr/graph/badge.svg?token=V54YXTIOPQ)](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-asr)
 <!-- For the future: Coveralls for codecoverage -->
 
-## HuggingFace models used
-Artificial Intelligence models are powerful and in the wrong hands can be dangerous.
-The models used by fave-asr are cost-free, but you need to accept additional terms of use which confirm you will not misuse these powerful tools.
+The FAVE-asr package provides a system for the automated transcription of sociolinguistic interview data on local machines for use by aligners like [FAVE](https://github.com/JoFrhwld/FAVE) or the [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/). The package provides functions to label different speakers in the same audio (diarization), transcribe speech, and output TextGrids with phrase- or word-level alignments.
+
+## Example Use Cases
+
+- You want a transcription of an interview for more detailed hand correction.
+- You want to transcribe a large corpus and your analysis can tolerate a small error rate.
+- You want to make an audio corpus into a text corpus.
+- You want to know the number of speakers in an audio file.
+
+For examples on how to use the package, see the [Usage](usage/) pages.
+
+## Installation
+To install fave-asr using pip, run the following command in your terminal:
+
+```bash
+pip install fave-asr
+```
+
+### Other software required
+* `ffmpeg` is needed to process the audio. You can [download it from their website](https://ffmpeg.org/download.html)
+
+## Not another transcription service
+
+There are several services which automate the process of transcribing audio, including
+
+- [DARLA CAVE](http://darla.dartmouth.edu/cave)
+- [Otter AI](https://otter.ai/)
+
+Unlike other services, `fave-asr` does not require uploading your data to other servers and instead focuses on processing audio on your own computer. Audio data can contain highly confidential information, and uploading this data to other services may not comply with ethical or legal data protection obligations. The goal of `fave-asr` is to serve those use cases where data protection makes local transcription necessary while making the process as seamless as cloud-based transcription services.
+
+### Example
+
+As an example, we'll transcribe an audio interview of Snoop Dogg by the 85 South Media podcast and output it as a TextGrid.
+
+```{python}
+import fave_asr
+
+data = fave_asr.transcribe_and_diarize(
+    audio_file = 'usage/resources/SnoopDogg_85SouthMedia.wav',
+    hf_token = '',
+    model_name = 'small.en',
+    device = 'cpu'
+    )
+tg = fave_asr.to_TextGrid(data)
+tg.write('SnoopDogg_85SouthMedia.TextGrid')
+```
+
+## Using gated models
+Artificial Intelligence models are powerful and in the wrong hands can be dangerous. The models used by fave-asr are cost-free, but you need to accept additional terms of use.
 
 To use these models:
 1. On HuggingFace, [create an account](https://huggingface.co/join) or [log in](https://huggingface.co/login)
@@ -21,24 +69,23 @@ To use these models:
 Keep track of your token and keep it safe (e.g. don't accidentally upload it to GitHub).
 We suggest creating an environment variable for your token so that you don't need to paste it into your files.
 
-### Creating an environment variable for your token
-#### Linux and Mac
-1. Open `~/.bashrc` in a text editor
+## Creating an environment variable for your token
+Storing your tokens as environment variables is a good way to avoid accidentally leaking them. Instead of typing the token into your code and deleting it before you commit, you can use `os.environ["HF_TOKEN"]` to access it from Python instead. This also makes your code more readable since it's obvious what `HF_TOKEN` is while a string of numbers and letters isn't clear.
+
+### Linux and Mac
+On Linux and Mac you can store your token in `.bashrc`
+
+1. Open `$HOME/.bashrc` in a text editor
 2. At the end of that file, add the following `HF_TOKEN='<your token>' ; export HF_TOKEN` replacing `<your token>` with [your HuggingFace token](https://hf.co/settings/tokens)
+3. Add the changes to your current session using `source $HOME/.bashrc`
+
+### Windows
+On Windows, use the `setx` command to create an environment variable.
+```
+setx HF_TOKEN <your token>
+```
 
-#### Windows
-If you run windows and know a solution, edit this file and create a pull request!
-
-## Use
-This module is in active development. The use documentation may be out of date. Feel free to edit this file with updated instructions and create a pull request.
-1. Follow the [instructions on using HuggingFace models](#HuggingFace models used)
-2. Download `pipeline.py`
-3. Import that file into your project
-4. Set `audio_file = <path to your audio file>`
-5. Set `hf_token = <your huggingface token from step 1>`
-6. Set `model_name = <whisper model name>`, we recommend `"medium.en"` for English data, otherwise `"large"`
-7. Set `device = "cpu"` unless you can run on a GPU, then use `"cuda"`
-8. Run `results_segments_w_speakers = pipeline.transcribe_and_diarize(audio_file, hf_token, model_name, device)`
+You need to restart the command line afterwards to make the environment variable available for use. If you try to use the variable in the same window you set the variable, you will run into problems.
 
 ### Other software required
 * `ffmpeg`
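
The README change above recommends reading the token from `os.environ["HF_TOKEN"]` rather than pasting it into source files. A minimal sketch of that pattern follows. The helper name `get_hf_token` is hypothetical and is not part of fave-asr; the commented call only mirrors the README example and assumes fave-asr is installed.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the HuggingFace token stored in an environment variable.

    Hypothetical helper for illustration; not provided by fave-asr.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a hint instead of passing an empty token along.
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell "
            "(or use setx on Windows) and restart the terminal."
        )
    return token


# Intended use, following the README example (requires fave-asr):
#
#   import fave_asr
#   data = fave_asr.transcribe_and_diarize(
#       audio_file="usage/resources/SnoopDogg_85SouthMedia.wav",
#       hf_token=get_hf_token(),
#       model_name="small.en",
#       device="cpu",
#   )
```

Raising an error when the variable is missing makes the Windows caveat visible: a terminal opened before `setx` was run will not see the variable.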

doc_src/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/.quarto/

doc_src/Makefile

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+buildrender:
+	quartodoc build
+	quarto render
+
+buildpreview:
+	quartodoc build
+	quarto preview
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+File type = "ooTextFile"
+Object class = "TextGrid"
+
+xmin = 0.0
+xmax = 51.44
+tiers? <exists>
+size = 1
+item []:
+    item [1]:
+        class = "IntervalTier"
+        name = "SPEAKER_00"
+        xmin = 0.0
+        xmax = 51.44
+        intervals: size = 12
+        intervals [1]:
+            xmin = 0.0
+            xmax = 6.24
+            text = " So you know the pimpin fuck y'all I'm gonna go over to dev jam"
+        intervals [2]:
+            xmin = 6.24
+            xmax = 8.78
+            text = " And learn a little bit of corporate work cuz I don't know corporate shit"
+        intervals [3]:
+            xmin = 8.78
+            xmax = 11.46
+            text = " I only need a few months right give me a few months around the shit"
+        intervals [4]:
+            xmin = 11.46
+            xmax = 18.94
+            text = " I'm a fast learner go to dev jam get a job in a position drop a record get Benny the butcher song get hip-hop Harry's on"
+        intervals [5]:
+            xmin = 18.94
+            xmax = 23.94
+            text = " Learn a few tricks of the trade find out that the niggas that had it that wanted me to hold for them"
+        intervals [6]:
+            xmin = 23.94
+            xmax = 26.12
+            text = " Then sold it to some other people"
+        intervals [7]:
+            xmin = 26.12
+            xmax = 29.9
+            text = " So now one of my big wig buddies called me and say hey dog"
+        intervals [8]:
+            xmin = 29.9
+            xmax = 34.38
+            text = " I know the people that got there from and they don't know what to do with it. Mmm"
+        intervals [9]:
+            xmin = 34.38
+            xmax = 40.72
+            text = " Let me hide them. I know just what to do with it. So I hit them like let me um, we work for y'all"
+        intervals [10]:
+            xmin = 40.72
+            xmax = 45.96
+            text = " The play was cool, but it's like yeah fuck that how much how much to buy this shit?"
+        intervals [11]:
+            xmin = 45.96
+            xmax = 50.22
+            text = " How much to buy death row first how much for my masters?"
+        intervals [12]:
+            xmin = 50.22
+            xmax = 51.44
+            text = " How much for all of the masters?"
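
The TextGrid above stores each phrase as an `intervals [n]` entry with `xmin`, `xmax`, and `text` fields. As a rough illustration of how such output could be inspected without extra dependencies, the sketch below pulls those three fields out with a regular expression. The function name `read_intervals` and the regex are assumptions made for this example, not part of fave-asr, and the pattern does not handle quotation marks escaped inside `text`.

```python
import re

# Two intervals copied from the TextGrid above (Praat long text format).
SAMPLE = '''\
        intervals [6]:
            xmin = 23.94
            xmax = 26.12
            text = " Then sold it to some other people"
        intervals [12]:
            xmin = 50.22
            xmax = 51.44
            text = " How much for all of the masters?"
'''

# Matches one "intervals [n]:" entry and captures its three fields.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[\d.]+)\s*'
    r'xmax = (?P<xmax>[\d.]+)\s*'
    r'text = "(?P<text>[^"]*)"'
)


def read_intervals(textgrid: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each interval found in the string."""
    return [
        (float(m["xmin"]), float(m["xmax"]), m["text"].strip())
        for m in INTERVAL_RE.finditer(textgrid)
    ]


for start, end, text in read_intervals(SAMPLE):
    print(f"{start:6.2f}-{end:6.2f}  {text}")
```

For real corpora a dedicated TextGrid parser is likely the safer choice; the regex here is only meant to show how the fields relate to each other.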

doc_src/_quarto.yml

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+project:
+  type: website
+  output-dir: _site
+
+license: GPLv3
+
+website:
+  favicon: assets/FAVE-logo.png
+  image: assets/FAVE-logo.png
+  page-navigation: true
+  navbar:
+    logo: assets/FAVE-logo.png
+    left:
+      - file: index.qmd
+        text: Get Started
+      - href: usage/
+        text: Examples
+      - href: reference/
+        text: Reference
+    right:
+      - icon: github
+        href: https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-asr/
+  sidebar:
+    - id: get-started
+      title: Get Started
+      style: floating
+      align: left
+      contents:
+        - section: Getting Started
+          contents:
+            - index.qmd
+            - usage/index.qmd
+
+format:
+  html:
+    theme:
+      light: flatly
+      dark: darkly
+    toc: true
+
+# tell quarto to read the generated sidebar
+metadata-files:
+  - reference/_sidebar.yml
+
+interlinks:
+  sources:
+    python:
+      url: https://docs.python.org/3/
+
+quartodoc:
+  # the name used to import the package you want to create reference docs for
+  package: fave_asr
+  style: pkgdown
+  dir: reference
+  # write sidebar data to this file
+  sidebar: "reference/_sidebar.yml"
+  parser: google
+  #render_interlinks: true
+  sections:
+    - title: FAVE ASR functions
+      #desc: |
+      #  These functions comprise the main pipeline.
+      contents:
+        - package: fave_asr
+          name: fave_asr

doc_src/_site/assets/FAVE-logo.png

28.7 KB