Bug to fix in yt_crawl.py (Solution on the first comment) #27

@StellarClown

Description

While using yt_crawl.py, I ran into some problems and took the liberty of fixing them. I hope you don't mind.

The main problems were incorrect use of argparse and errors in the run function. Feel free to make further changes, as some functionality, such as the subprocess call, was commented out.
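
For reference, here is a minimal, self-contained sketch of the argparse pattern the fix relies on. The `build_parser` name and the `dummy-key` value are made up for illustration; the option names match the ones used in this issue. Passing an explicit list to `parse_args` is only there so the sketch can run without real command-line arguments.

```python
import argparse


def build_parser():
    """Build a parser with the three options used by the fix (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Generate the dataset for the web app")
    parser.add_argument('-a', '--api_key', default=None,
                        help="Your API key from the YouTube API")
    parser.add_argument('-o', '--output_file', default="dataset.json",
                        help="The output path")
    # store_true makes -g a boolean flag that defaults to False.
    parser.add_argument('-g', '--git_commit', action='store_true',
                        help="Commit the dataset file to git")
    return parser


# An explicit argument list stands in for sys.argv here.
args = build_parser().parse_args(['-a', 'dummy-key', '-g'])
print(args.api_key, args.output_file, args.git_commit)
```

Leaving `--api_key` optional is what allows the script to fall back to reading the key from a file when the flag is not given.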

The changed code is below:

My Solution

def run(api_key, gitCommit, datasetOutputLocation):
    videos = []
    print("Parsing Academy Courses")
    output = parseAcademy()
    for x in output:
        videos.append(x)

    print("Done Parsing Academy Courses")
    tags = {}
    for i in playlists:
        for v in GetVideosInPlaylist(api_key,i[1]):
            print(i[0])
            tags[v] = i[0]
    
    print("Grabbing video list")
    output = GetVideosInChannel(api_key)
    print("Sorting data")
    for video in output:
        tag = ""
        description = video[3].split('\n')
        title = video[2]
        print(title)
        if title in tags.keys():
            tag = tags[title]
        for line in description:
            if line != "":
                # Raw string avoids invalid-escape warnings for \w and \d.
                if not re.search(r'^\w[\d]*:[\d]', line):
                    line = '00:01 - ' + line

                temp = line.split("-")

                timestamp = temp[0].strip().split(":")

                seconds = timestamp[-1]
                hours = 0
                try:
                    # Only timestamps of the form H:MM:SS have an hours field.
                    hours = int(timestamp[-3])
                except (IndexError, ValueError):
                    pass
                minutes = int(timestamp[-2]) + hours * 60

                newline = "-".join(temp[1::])

                entry = SearchEntry(
                    title, video[1], minutes, seconds, tag, newline).AsJsonSerializable()

                videos.append(entry)
                #print(f'{title} | {video[1]} ^ {line}')

    print("Serializing dataset")
    dataset = json.dumps(videos)
    print("Writing dataset...")
    with open(datasetOutputLocation, "w") as ds:
        ds.write(dataset)

    if gitCommit:
        gitDescription = "Updated dataset"
        print(f"Committing to git, with commit description {gitDescription}")
        from subprocess import call
        call(["git", "commit", "-m", gitDescription, datasetOutputLocation])
    else:
        print("Done! Now commit to git")


def parser():
    parser = argparse.ArgumentParser(
        description="Generate the dataset for the web app")
    parser.add_argument(
        '-a', '--api_key',
        help="Your API key from the YouTube API",
        default=None)
    parser.add_argument(
        '-o', '--output_file',
        help="The output path",
        default="dataset.json")
    parser.add_argument(
        '-g', '--git_commit',
        help="Automatically commit the dataset file to git (uses git cli)",
        action='store_true')
    args = parser.parse_args()
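    if not args.api_key:
        # Fall back to the key stored in yt.secret; strip the trailing newline.
        with open('yt.secret') as secret_file:
            args.api_key = secret_file.read().strip()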

    run(args.api_key, args.git_commit, args.output_file)
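
The timestamp handling inside run can be hard to check by eye, so here is a rough sketch of the same steps pulled out into a standalone function. The `parse_description_line` name is hypothetical and not part of yt_crawl.py; the logic is meant to mirror the loop body above, including keeping seconds as a string.

```python
import re


def parse_description_line(line):
    """Split a description line into (minutes, seconds, text).

    Lines without a leading timestamp get '00:01' prepended, and any
    hours field is folded into the minutes, as in run().
    """
    if not re.search(r'^\w[\d]*:[\d]', line):
        line = '00:01 - ' + line

    parts = line.split("-")
    timestamp = parts[0].strip().split(":")

    seconds = timestamp[-1]  # kept as a string, like the original code
    hours = int(timestamp[-3]) if len(timestamp) >= 3 else 0
    minutes = int(timestamp[-2]) + hours * 60

    # Re-join the rest so hyphens inside the text are preserved.
    text = "-".join(parts[1:])
    return minutes, seconds, text


print(parse_description_line("1:02:03 - Intro"))
print(parse_description_line("Just a note"))
```

Note that the regular expression accepts any word character before the colon, so a line such as `a:1 foo` would pass the check and then fail in `int()`. Tightening the pattern may be worth considering.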
