yt-dlp

Introduction

I want to download some audio from YouTube which is not available anywhere else.

Please make things available for purchase if you want my money.

There are a few different Python packages for this, but this one looks to be the best maintained and most up to date.

pip install yt-dlp

WARNING: The script yt-dlp.exe is installed in 'C:\Users\NathanMoore\AppData\Roaming\Python\Python312\Scripts' which is not on PATH.

Python

OK, that’s fine. The warning is only about the yt-dlp.exe command; I can still write a Python script that imports the package and run that instead.

from yt_dlp import YoutubeDL

URLS = ['https://www.youtube.com/watch?v=dQw4w9WgXcQ'] 
with YoutubeDL() as ydl: 
    ydl.download(URLS)

Call this from the file location with
py download.py

[download] Destination: Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster) [dQw4w9WgXcQ].mp4

And I get a bunch of warnings as well.

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one

But it downloads successfully!

pip

It looks like I can resolve this by running
pip install -U "yt-dlp[default]"
to install the default dependencies along with yt-dlp

Similar warning to before:

WARNING: The script websockets.exe is installed in 'C:\Users\NathanMoore\AppData\Roaming\Python\Python312\Scripts' which is not on PATH.

So, I should add this directory to PATH. DRE: Don’t Repeat Errors.
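A quick way to confirm whether that Scripts folder is on PATH is to ask Python directly. This is a minimal sketch, assuming the packages went into the per-user install location (as the AppData path in the warning suggests); the helper names `user_scripts_dir` and `is_on_path` are made up for illustration.

```python
import os
import sysconfig


def user_scripts_dir():
    """Return the per-user scripts folder (the one pip warns about)."""
    # f"{os.name}_user" resolves to the "nt_user" scheme on Windows
    return sysconfig.get_path("scripts", f"{os.name}_user")


def is_on_path(directory, path_value=None):
    """Check whether directory is one of the entries in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # normalise case and slashes so equivalent Windows paths compare equal
    target = os.path.normcase(os.path.normpath(directory))
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries)


scripts = user_scripts_dir()
print(scripts, "on PATH:", is_on_path(scripts))
```

If this prints False, the folder it names is the one to add to PATH.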

My Python package installation is reasonably haphazard: I rarely use venv, and I use pip by default. Maybe I should use
py -m pip install [package]
more regularly? I have used that a bit in the past.

Trying to upgrade plotly with
py -m pip install -U plotly
doesn’t make any difference to the install location; it still goes into AppData, presumably because the main site-packages folder isn’t writable from a normal shell. So I should use an administrator shell to install Python packages if I want to be more consistent.

I could also use something like this:
pip install -r requirements.txt -t "C:\Python37\Lib\site-packages"

I’m going to ignore installation locations and proper venv for now.
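For coming back to this later, here is a rough sketch of how to see where packages can end up. It assumes the behaviour described above comes from pip falling back to a per-user install when the main site-packages folder is not writable; `install_locations` is a hypothetical helper name.

```python
import os
import site
import sysconfig


def install_locations():
    """Report the main and per-user package folders for this interpreter."""
    system_site = sysconfig.get_path("purelib")
    return {
        "system_site": system_site,
        # if this is False, pip would need a user install (or admin rights)
        "system_site_writable": os.access(system_site, os.W_OK),
        "user_site": site.getusersitepackages(),
    }


for name, value in install_locations().items():
    print(f"{name}: {value}")
```

Running it with `py` shows the folders for the same interpreter that `py -m pip` installs into.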

path

How do I see the PATH environment variable entries in PowerShell? VS Code uses PowerShell by default and I haven’t changed it. After a bit of searching:

Get-ChildItem env:
Shows all the environment variables.

(Get-ChildItem Env:PATH).Value.Split(';') | Sort
Shows the sorted items in Path
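The same listing can be done from Python, which is handy inside a script. A small sketch, with `path_entries` as an illustrative name; it sorts case-insensitively to roughly match what PowerShell's Sort does by default.

```python
import os


def path_entries(path_value=None):
    """Return the PATH entries as a sorted list, skipping empty ones."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # os.pathsep is ';' on Windows and ':' elsewhere
    entries = [e for e in path_value.split(os.pathsep) if e]
    return sorted(entries, key=str.lower)


for entry in path_entries():
    print(entry)
```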

So, I think I’ve got the JavaScript dependencies sorted via PATH. Let’s download again and see if there are more warnings. I know there’s one about ffmpeg that I need to fix.

But the JavaScript warning still comes up, even when I open a new PowerShell window to get a fresh PATH. I’m going to install deno using the PowerShell script recommended on the deno website.
https://docs.deno.com/runtime/getting_started/installation/
This looks like it works well enough. deno --help gives appropriate output.

Downloading a video again gives no warning, for JS at least.

ffmpeg

Let’s fix the ffmpeg warning.

WARNING: ffmpeg not found. The downloaded format may not be the best available. Installing ffmpeg is strongly recommended: https://github.com/yt-dlp/yt-dlp#dependencies

Link to the main site: https://www.ffmpeg.org/

Windows Auto-Builds: https://github.com/BtbN/FFmpeg-Builds

Recommended yt-dlp builds which fix some issues: https://github.com/yt-dlp/FFmpeg-Builds

Do I need to install this, or just extract it and place it somewhere? There are no instructions that I can see on the website. My suspicion turned out to be right: extract the archive, copy it to a folder, and add that folder to PATH. Of course. https://video.stackexchange.com/questions/20495/how-do-i-set-up-and-use-ffmpeg-in-windows

Yes! No warnings, two downloads, merging formats, looks sensible, though I’m not sure exactly what the output all means.
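Before downloading, it is easy to check from Python whether both external tools can be found. A minimal sketch, assuming deno and ffmpeg are both meant to be found on PATH; `find_tools` is an illustrative name.

```python
import shutil


def find_tools(names=("deno", "ffmpeg")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, location in find_tools().items():
    print(f"{name}: {location or 'not found on PATH'}")
```

If either line says "not found", the matching yt-dlp warning is likely to come back.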

Audio

We can also try downloading the audio only, which is ultimately what I want.
py download-audio.py


import yt_dlp

URLS = ['https://www.youtube.com/watch?v=dQw4w9WgXcQ']

ydl_opts = { 
    'format': 'm4a/bestaudio/best', 
    # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments 
    'postprocessors': [{ # Extract audio using ffmpeg 
        'key': 'FFmpegExtractAudio', 
        'preferredcodec': 'm4a', 
    }] 
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl: 
    error_code = ydl.download(URLS)

This works too, downloading to an m4a format. I’m used to mp3 files, but that’s only because they’ve been around so long and everyone talks about them as audio files. From a quick search, it looks like m4a is fine even though it is Apple. Just for checking though, I can change the m4a into mp3 in the code and it downloads the webm file first, then extracts the audio. There is no extraction like this for m4a and even though I’m not worried about bandwidth or storage size, I think I’ll stick with m4a.
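To switch between the two formats without editing the dictionary by hand, the options can be built by a small function. This is only a sketch: `audio_opts` is a made-up helper, and the choice of format string for non-m4a codecs is an assumption about what makes sense, not something from the yt-dlp documentation.

```python
def audio_opts(codec="m4a"):
    """Build yt-dlp options for an audio-only download in the given codec."""
    # For m4a, prefer a stream that is already m4a so no conversion is needed;
    # for anything else, take the best audio stream and let ffmpeg convert it.
    fmt = "m4a/bestaudio/best" if codec == "m4a" else "bestaudio/best"
    return {
        "format": fmt,
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": codec,
        }],
    }


print(audio_opts("mp3"))
```

The result is passed in the same way as before, for example `yt_dlp.YoutubeDL(audio_opts("mp3"))`.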

Can I download a playlist? Yes, I can download 4 files, but then disaster:

PermissionError: [WinError 5] Access is denied:

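Does this mean YouTube has blocked me? Can I do anything about this? If I wait a little while and try again, I can download more. I can download all of them, in fact. So I’m not sure what was going on before. Since WinError 5 is a Windows file-access error rather than anything sent by YouTube, it was probably something local briefly locking a file, not a block.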
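Since re-running was what got the rest of the playlist down, it helps to make re-runs cheap. The sketch below uses yt-dlp's `download_archive` option, which records finished video IDs in a text file and skips them next time. The archive file name `downloaded.txt` and both function names are assumptions for illustration, and the playlist URL is left to the caller.

```python
def playlist_opts(archive_file="downloaded.txt"):
    """Options for an audio-only playlist download that can be re-run."""
    return {
        "format": "m4a/bestaudio/best",
        # IDs of finished downloads are written here and skipped on re-runs
        "download_archive": archive_file,
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "m4a",
        }],
    }


def download_playlist(url, archive_file="downloaded.txt"):
    """Download every entry of a playlist URL not already in the archive."""
    # imported here so the options can be inspected without yt-dlp installed
    import yt_dlp

    with yt_dlp.YoutubeDL(playlist_opts(archive_file)) as ydl:
        return ydl.download([url])
```

After an interrupted run, calling `download_playlist` again with the same archive file should only fetch what is still missing.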

Concat

Now, I also want to join some of these files together to make a single file.

https://trac.ffmpeg.org/wiki/Concatenate

ffmpeg looks useful in this regard. According to this post, though, I might not be able to join the m4a files straight into one file.

https://stackoverflow.com/questions/18434854/merge-m4a-files-in-terminal

Yeah, a couple of different options to combine m4a directly didn’t work.

The cat command from the stack overflow post didn’t work.
cat file1.aac file2.aac file3.aac >> filenew.aac

This looks like it will work
ffmpeg -i "concat:file1.aac|file2.aac|file3.aac" -c copy filenew.aac

Yes! Now to get a nice file name and m4a format back again.
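Both steps can be driven from Python by building the ffmpeg commands as lists. This is a sketch under assumptions: the concat command mirrors the one above, while the second command assumes that ffmpeg can copy the joined AAC stream into an m4a container without re-encoding. The function names are made up.

```python
import subprocess


def concat_command(inputs, output):
    """Build the ffmpeg concat-protocol command for a list of .aac files."""
    # the concat protocol separates inputs with '|', so file names
    # must not contain that character
    return ["ffmpeg", "-i", "concat:" + "|".join(inputs), "-c", "copy", output]


def remux_command(source, output):
    """Build a command that copies the audio stream into a new container."""
    # assumption: the container is chosen from the output extension (.m4a)
    return ["ffmpeg", "-i", source, "-c", "copy", output]


def run(command):
    """Run one ffmpeg command and raise if it exits with an error."""
    subprocess.run(command, check=True)


print(concat_command(["file1.aac", "file2.aac", "file3.aac"], "filenew.aac"))
print(remux_command("filenew.aac", "filenew.m4a"))
```

Passing each list to `run` executes it; because the arguments are separate list items, file names with spaces need no quoting.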

File size

And finally, I might want to reduce the quality, and therefore the file size, of these files just a little bit. Is this where I mention the size limit for a Yoto card? Because that’s where these are ending up. After a bunch of debugging, here’s my Python script for converting the files.

import subprocess
import os
import re

def downsize_me(old, new): 
    # function to reduce the size of an audio file by reducing the bitrate
    # we pass the command to subprocess.call() as a list, not a string:
    # the file names contain spaces, so splitting a string would break them
    # list items also don't need quoting, which is good, but took some debugging
    command = ['ffmpeg', '-i', old, '-c:a', 'aac', '-b:a', '64k', new]
    # print(command)
    # with a list there is no need for shell=True
    return subprocess.call(command)

# raw string for single backslashes
folder = r"C:\Users\NathanMoore\code\youtube-download"
# apparently scandir() is nicer than listdir(), and that's true
# files = [f for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))]
files = [f.name for f in os.scandir(folder) if f.is_file()]

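# ffmpeg won't create the output folder, so make sure it exists
os.makedirs(os.path.join(folder, 'smaller'), exist_ok=True)

for ff in files: 
    # construct these file names before passing over
    # remove the youtube id between square brackets, just before the extension
    old = os.path.join(folder, ff)
    new = os.path.join(folder, 'smaller', re.sub(r' \[[^\[\]]*\](?=\.\w+$)', '', ff))
    downsize_me(old, new)
    # print(old, new)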
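As a rough guide to what the 64k setting buys, the output size is approximately the bitrate times the duration. This sketch ignores container overhead and treats the bitrate as exact, whereas the encoder only aims for it on average; the function name is made up.

```python
def estimated_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate audio size: kbit/s -> bit/s -> bytes/s, times duration."""
    return bitrate_kbps * 1000 / 8 * duration_seconds


one_minute = estimated_size_bytes(64, 60)
print(f"1 minute at 64k: about {one_minute / 1_000_000:.2f} MB")
print(f"10 minutes at 64k: about {estimated_size_bytes(64, 600) / 1_000_000:.1f} MB")
```

So each minute of audio at 64k costs a little under half a megabyte, which gives a quick estimate of how much fits under a given size limit.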