refactor(extractors): simplify and combine jwplayer extraction #2398
library/src/commonMain/kotlin/com/lagradost/cloudstream3/extractors/helper/JWPlayerHelper.kt
/**
 * Get stream links the "sources" attribute inside a JWPlayer script, e.g.
 *
 * ```js
 * <script>
 * jwplayer("vplayer").setup({
 * sources: [{file:"https://example.com/master.m3u8"}],
 * tracks: [{file: "https://example.com/subtitles.vtt", kind: "captions", label: "en"}],
 * }
 * ```
 *
 * @param script The content of a HTML <script> tag containing the jwplayer code.
 * @return whether any extractor or subtitle link was found
 */
suspend fun getStreamLinks(
    script: String,
    sourceName: String,
    referer: String?,
    callback: (ExtractorLink) -> Unit,
    subtitleCallback: (SubtitleFile) -> Unit,
    headers: Map<String, String> = mapOf()
): Boolean {
    val sourceMatches = sourceRegex.findAll(script).flatMap { sourceMatch ->
        val match = sourceMatch.groupValues[1]
            .addMarks("file")
            .addMarks("label")
            .addMarks("type")
        tryParseJson<List<Source>>(match).orEmpty()
    }.toList()

    val extractedLinks = sourceMatches.flatMap { link ->
        if (link.file.contains(".m3u8")) {
            M3u8Helper.generateM3u8(
                source = sourceName,
                streamUrl = link.file,
                referer = referer.orEmpty(),
                headers = headers
            )
        } else {
            listOf(
                newExtractorLink(
                    source = sourceName,
                    name = sourceName,
                    url = link.file,
                ) {
                    this.referer = url
                    this.headers = headers
                }
            )
        }
    }
It's not bad, but leaves a lot of cases unhandled.
There are several counterexamples (I just can't remember where to find them all 🙃), such as:
var links = {
    "hls3": "https://mmmmmmmmmm.qqqqqqqqqqqq.space/#########/hls3/01/00000/ggggggggg_l/master.txt",
    "hls4": "/stream/zzzzzzzzzzzzzzz/hhhhhhhhhhh/123456789/123456/master.m3u8",
    "hls2": "https://mmmmmmmmmm.qqqqqqqqqqqq.com/hls2/01/00000/ggggggggg_l/master.m3u8?t=##################&s=123456"
};
jwplayer("vplayer").setup({
    sources: [{
        file: links.hls4 || links.hls3 || links.hls2,
        type: "hls"
    }],
    image: "https://pppppppp.ppp/p.jpg",

file: may have a variable or function that gathers links from other parts of the script.
The links may be incomplete, so add the mainUrl prefix from the extractor or use fixUrl().
To avoid dealing with the various exceptions, I usually do it like this:
Regex("""[:=]\s*\"([^\"\s]+(\.m3u8|master\.txt)[^\"\s]*)""").findAll(unpackedScript).forEach { match ->
    val link = match.groupValues[1]
    callback.invoke(
        newExtractorLink(
            source = this.name,
            name = this.name,
            url = fixUrl(link)
        ) {
            this.referer = referer ?: mainUrl
        }
    )
}

That is, after : or = get all that's delimited by " and has .m3u8 or master.txt inside.
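To see what that pattern picks up, here is a minimal standalone sketch that runs the same regex against a script shaped like the counterexample. `resolveUrl` is a hypothetical stand-in for `fixUrl()` (its exact behavior in the project is not shown here), and the URLs and `baseUrl` are made up for illustration.

```kotlin
// Regex copied from the review comment: after ':' or '=', capture a
// double-quoted value that contains ".m3u8" or "master.txt".
val hlsRegex = Regex("""[:=]\s*\"([^\"\s]+(\.m3u8|master\.txt)[^\"\s]*)""")

// Hypothetical stand-in for fixUrl(): turns relative links into absolute ones.
fun resolveUrl(link: String, baseUrl: String): String = when {
    link.startsWith("http://") || link.startsWith("https://") -> link
    link.startsWith("//") -> "https:$link"
    link.startsWith("/") -> baseUrl.trimEnd('/') + link
    else -> baseUrl.trimEnd('/') + "/" + link
}

// Collects every HLS-looking link in the script, in order of appearance.
fun findHlsLinks(script: String, baseUrl: String): List<String> =
    hlsRegex.findAll(script)
        .map { resolveUrl(it.groupValues[1], baseUrl) }
        .distinct()
        .toList()

fun main() {
    // Made-up script: the "file" value is an expression, not a literal URL.
    val script = """
        var links = {
            "hls3": "https://cdn.example.com/hls3/master.txt",
            "hls4": "/stream/abc/master.m3u8",
            "hls2": "https://cdn.example.com/hls2/master.m3u8?t=token&s=123"
        };
        jwplayer("vplayer").setup({ sources: [{ file: links.hls4 || links.hls3, type: "hls" }] });
    """.trimIndent()
    findHlsLinks(script, "https://example.com").forEach(::println)
}
```

Because the regex ignores the structure of the config object, it finds the links in the `links` map even though `file:` holds an expression. The trade-off is that it only ever sees HLS-style URLs.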
Very good catch, thank you for the review @diogob003 !
One example would be https://github.com/recloudstream/cloudstream/blob/master/library/src/commonMain/kotlin/com/lagradost/cloudstream3/extractors/VidHidePro.kt#L83. I probably should have remembered that edge case because I wrote the patch over there myself :)
We're now trying to extract stream links by first
- trying to parse the JWPlayer config object and searching for the file:<url> pattern
- if that doesn't find anything, we're searching for hls streams as you suggested
The reason we can't drop the first case is that some providers serve formats other than HLS, e.g. I've already seen some .mp4 streams. In those cases, using only the second method wouldn't work.
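The two-step order described above can be sketched roughly as follows. This is not the code in JWPlayerHelper.kt: `sourcesRegex`, `fileRegex` and `extractStreamUrls` are hypothetical names, and the first step is simplified to a regex over the `sources` array instead of real JSON parsing.

```kotlin
// Step 1 (assumed pattern): the contents of the "sources: [...]" array.
val sourcesRegex = Regex("""sources\s*:\s*\[(.*?)\]""", RegexOption.DOT_MATCHES_ALL)
// Step 1 (assumed pattern): a quoted URL directly after a "file" key.
val fileRegex = Regex("""file\s*:\s*["']([^"']+)["']""")
// Step 2: the HLS pattern suggested in the review.
val hlsFallbackRegex = Regex("""[:=]\s*\"([^\"\s]+(\.m3u8|master\.txt)[^\"\s]*)""")

fun extractStreamUrls(script: String): List<String> {
    // First try literal file URLs inside the sources array (covers .mp4 etc.).
    val direct = sourcesRegex.findAll(script)
        .flatMap { m -> fileRegex.findAll(m.groupValues[1]).map { it.groupValues[1] } }
        .toList()
    // Only if nothing was found, fall back to scanning the whole script for HLS links.
    return direct.ifEmpty {
        hlsFallbackRegex.findAll(script).map { it.groupValues[1] }.toList()
    }
}

fun main() {
    val mp4Script = """jwplayer("p").setup({ sources: [{file:"https://example.com/video.mp4"}] });"""
    val indirectScript = """
        var links = {"hls4": "/stream/master.m3u8"};
        jwplayer("p").setup({ sources: [{ file: links.hls4, type: "hls" }] });
    """.trimIndent()
    println(extractStreamUrls(mp4Script))
    println(extractStreamUrls(indirectScript))
}
```

Trying the config first keeps non-HLS sources such as .mp4, while the fallback covers scripts where `file:` is an expression rather than a literal URL.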
a45b9d4 to 5a4d16b
I noticed that a lot of extractors are based on the following pattern:
script file containing JWPlayer config

Previously, all extractors had their own logic for extracting the JWPlayer config, even though the JavaScript data parsed always looks the same. So there's been a lot of duplicated code that behaved inconsistently (e.g. some only supported m3u8 links and no other stream types like mp4, and others didn't parse the subtitles).

This logic is now handled by JWPlayerHelper.kt.

I've also noticed that some of the files could be merged, e.g.
contain almost the exact same logic and
also do the exact same thing. I suspect that all these providers are based on the exact same code (only some UI changes), but I haven't moved their extractors into the same class yet because I'd like to hear some other opinions first before doing that.
The deleted files are usages of the JWPlayer API on sites that no longer exist, so I removed them.