---
layout: page
gh-repo: JuanFMontesinos/y-net
gh-badge: [star, watch, fork, follow]
share-description: Official website of A cappella: Audio-visual Singing Voice Separation
---
<div class="overlay"></div>
<div class="container">
    <div class="row">
        <div class="col-xl-12 mx-auto text-center">
            <h1>A cappella: Audio-visual Singing Voice Separation</h1>
        </div>
        <div class="col-md-10 col-lg-8 col-xl-7 mx-auto">
        </div>
    </div>
</div>


<div class="col-xl-10 col-lg-8 offset-lg-1">

    <!-- Testimonials -->
    <section class="testimonials text-center">
        <div class="container">
            <div class="row">
                <div class="col-lg-4 text-center" style="">
                    <div class="testimonial-item mx-auto mb-5 mb-lg-0">
                        <h5>
                            <a href="mailto:juanfelipe.montesinosATupfDOTedu"
                               style="text-decoration : none; color : #000000;">
                                Juan F. Montesinos <sup class="asterix">*</sup>
                            </a>
                        </h5>
                        <p class="font-weight-light mb-0"></p>
                    </div>
                </div>
                <div class="col-lg-4 text-center">
                    <div class="testimonial-item mx-auto mb-5 mb-lg-0" style="width: 109%">
                        <h5>
                            &nbsp;
                            <a href="mailto:venkatesh.kadandaleATupfDOTedu"
                               style="text-decoration : none; color : #000000;">
                                Venkatesh S. Kadandale <sup class="asterix">*</sup>
                            </a>
                        </h5>
                        <p class="font-weight-light mb-0"></p>
                    </div>
                </div>
                <div class="col-lg-4 text-center">
                    <div class="testimonial-item mx-auto mb-5 mb-lg-0">
                        <h5>
                            <a href="mailto:gloria.haroATupfDOTedu" style="text-decoration : none; color : #000000;">
                                Gloria Haro
                            </a>
                        </h5>
                        <p class="font-weight-light mb-0"></p>
                    </div>
                </div>
            </div>
            <div class="row">
                <div class="offset-lg-3 col-lg-6 padtop" style="padding-bottom: 2rem">
          <span class="align-middle">
            <p class="mylead2">
                <a href="https://www.upf.edu/web/etic"
                style="color:black">Universitat Pompeu Fabra, Barcelona, Spain</a>
            </p>
          </span>
                </div>
            </div>
        </div>
    </section>
    <div class="row justify-content-center">
        <div class="col-sm-3 text-center">
            <a target="_blank"
               href="https://arxiv.org/abs/2104.09946"><img src="assets/img/paper.png" width="120" height="130" style="border:1px solid black;"></a>
            <h5 style="padding-bottom: 5%; padding-top: 5%">Paper</h5>
        </div>
        <div class="col-sm-3 text-center">
            <a href="https://github.com/JuanFMontesinos/Acappella-YNet"
               style="color: #242124">
                <i class="fab fa-github fa-8x"></i></a>
            <h5 style="padding-bottom: 5%; padding-top: 5%">Code + Weights</h5>
        </div>
        <div class="col-sm-3 text-center">
            <a style="color: #242124"
               href="./acappella/">
                <i class="fas fa-database fa-8x"></i>
            </a>
            <h5 style="padding-bottom: 5%; padding-top: 5%">Dataset</h5>
        </div>
        <div class="col-sm-3 text-center">
            <a style="color: #242124;"
               href="./demos/">
                <i class="fas fa-film fa-8x" style="transform: scale(1,1.275); padding-top: 2.5px"></i>
            </a>
            <h5 style="padding-bottom: 5%; padding-top: 5px">Demos</h5>
        </div>
    </div>
    <h6 class="mx-auto text-center" style="color: saddlebrown">The paper has been accepted to BMVC 2021!</h6>
    <br>
    <!-- Image Showcases -->
    <h2 style="text-align: center">Abstract</h2>
    <p class="lead mb-0" align="justify">
        The task of isolating a target singing voice in music videos has useful applications.
        In this work, we explore the single-channel singing voice separation problem from a
        multimodal perspective, by jointly learning from audio and visual modalities. To do so,
        we present <a href="./acappella/"><i>Acappella</i></a>, a dataset spanning around 46
        hours of <i>a cappella</i> solo singing videos sourced from YouTube. We also propose
        an audio-visual convolutional network based on graphs which achieves state-of-the-art
        singing voice separation results on our dataset and compare it against its audio-only
        counterpart, U-Net, and a state-of-the-art audiovisual speech separation model. We
        evaluate the models in the following challenging setups: i) presence of overlapping
        voices in the audio mixtures, ii) the target voice set to lower volume levels in the
        mix, and iii) a combination of i) and ii), which is the most challenging of the three.
        We demonstrate that our model outperforms the baseline models in the singing voice
        separation task in this most challenging setup. The code, the
        pre-trained models, and the dataset are publicly available at
        <a href="https://ipcv.github.io/Acappella/">https://ipcv.github.io/Acappella/</a>.
    </p>
    <br>
    <div style="display: flex">
        <div>
            <sup class="asterix" style="top: 0.1em">*</sup></div>
        <p style="font-size: 1.15rem">&nbsp; These authors contributed equally to this work.</p>
    </div>

    <div class="mx-auto">
        <br>
        <h5>Citation</h5>
        <pre class="hightlight" style="background-color:rgba(0,0,0, 0.1)"><p class="mb-0" align="justify">@inproceedings{montesinos2021cappella,
  title={A cappella: Audio-visual Singing Voice Separation},
  author={Montesinos, Juan F and Kadandale, Venkatesh S and Haro, Gloria},
  booktitle={32nd British Machine Vision Conference, BMVC 2021},
  year={2021}
}</p></pre>
    </div>

    <div class="mx-auto">
        <br>
        <h5>Acknowledgements</h5>
        <p class="lead mb-0" align="justify">
            The authors acknowledge support by MICINN/FEDER UE project, ref. PGC2018-098625-B-I00;
            H2020-MSCA-RISE-2017 project, ref. 777826 NoMADS; ReAViPeRo network, ref. RED2018-102511-T;
            and Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence
            Program (MDM-2015-0502) and the Social European Funds. J. F. M. acknowledges support by
            FPI scholarship PRE2018-083920. V. S. K. has received financial support through “la Caixa”
            Foundation (ID 100010434), fellowship code: LCF/BQ/DI18/11660064. V. S. K. has also received
            funding from the European Union’s Horizon 2020 research and innovation programme under the
            Marie Skłodowska-Curie grant agreement No. 713673. We gratefully acknowledge NVIDIA Corporation
            for the donation of GPUs used for the experiments.

            We thank <a href="https://emiliagomez.com/">Emilia Gómez</a> and
            <a href="http://olgaslizovskaia.ml/">Olga Slizovskaia</a>
            for insightful discussions on the subject.</p>
    </div>

</div>