Notes for March 15th, 2025
Notes for March 15th
-
ComfyUI Setup and FLUX.1 Dev Model Download
Installed ComfyUI on my local machine (Ubuntu 22.04) using the official GitHub repository (git clone https://github.com/comfyanonymous/ComfyUI.git). Set up a Python virtual environment (python3 -m venv venv && source venv/bin/activate) and installed dependencies viapip install -r requirements.txt. Downloaded the FLUX.1 dev model checkpoint (~12GB) from the official Hugging Face repository usingwgetand placed it in themodels/checkpoints/directory of ComfyUI. Updated theconfig.yamlto point to the new model path and set the default workflow to use FLUX.1 for inference. Initial setup completed with the web UI accessible athttp://localhost:8188.
Below is a screenshot of my ComfyUI interface after setup:

-
Learning That TypeScript Backend Interpreter Changed to Go
Discovered that the backend interpreter for my TypeScript project (previously relying on Node.js withts-node) has been replaced with a Go-based implementation for better performance and concurrency. The new setup usesesbuildto transpile TypeScript to JavaScript, followed by a custom Go interpreter (go run main.go) to execute the compiled code. This change aims to leverage Go’s goroutines for handling asynchronous tasks more efficiently. Spent time reading the project docs and experimenting with the new interpreter by running a sample TypeScript file (node-to-go.ts) through the pipeline. Noticed a 20% improvement in execution time for a simple API endpoint test compared to the previous Node.js setup. -
Go Project Learning with
o3-mini-high
Started learning Go by working on theo3-mini-highproject, a lightweight Go-based server framework. Cloned the repository (git clone https://github.com/example/o3-mini-high.git) and set up the Go environment (Go 1.20 installed viasudo apt install golang-go). Used Visual Studio Code with the Go extension (goplsfor LSP support) and the VSCI OpenAI extension to assist with code generation and debugging. Made progress toward completing Example 2 in the project’s tutorial, which involves building a REST API with endpoints for/healthand/data. Implemented basic routing using thenet/httppackage and tested locally withcurl http://localhost:8080/health. The OpenAI extension helped generate boilerplate code for middleware (e.g., logging), saving about 30 minutes of manual setup.
Errors Encountered
-
FLUX.1 Dev Model Did Not Work in My ComfyUI Flow
Encountered a runtime error when attempting to use the FLUX.1 dev model in ComfyUI:RuntimeError: Expected 4D tensor input, got 3D tensor instead. Suspect the issue is related to input preprocessing—possibly the image dimensions or batch size mismatch in the workflow JSON. Checked the ComfyUI logs (logs/comfyui.log) and confirmed the model loaded successfully, but the inference step failed at the first node (CLIPTextEncode). Tried adjusting the input resolution to 512x512 (as per FLUX.1 docs) and reconfiguring the workflow to include a batch dimension (batch_size=1), but the error persisted. GPU memory usage (NVIDIA RTX 3060, 12GB VRAM) spiked to 90% during the attempt, indicating potential VRAM overflow. Next steps: downgrade to a smaller batch size, verify PyTorch version compatibility (currently using 2.0.1), and check the ComfyUI GitHub issues for similar reports. -
**ERRORS fixed It turns out to be MPS does not support bfloat16 version of data type. So I changed it to fp8 model.
BF16 (bfloat16) and FP16 (float16) are both 16‐bit floating-point formats, but they differ in how they split bits between the exponent and the significand:
• FP16: Uses 1 bit for the sign, 5 bits for the exponent, and 10 bits for the significand. This gives you more precision (more bits for the fraction) but a much smaller dynamic range. • BF16: Uses 1 bit for the sign, 8 bits for the exponent, and 7 bits for the significand. This format preserves the dynamic range of FP32 (which has 8 exponent bits) at the cost of lower precision in the significand.
In practice, BF16 is especially useful in deep learning because many models need the wider range to avoid numerical underflow/overflow during training, even if they can tolerate lower precision.
Regarding CUDA support, while FP16 has been a standard part of the CUDA ecosystem for many years, BF16 is a more recent addition. Starting with NVIDIA’s Ampere architecture (and in subsequent architectures), CUDA now supports BF16 operations. This means that on supported GPUs (e.g. A100), you can use BF16 with libraries like PyTorch (via torch.bfloat16). It’s also commonly used on CPUs (and TPUs) for mixed-precision training.
Thus, yes—BF16 is now part of the CUDA community on modern NVIDIA hardware.
-
**ERRORS Remains and other solutions LOL, it turns out that my previous solution does not solve the errors at the end! The final solution is to use diffusionbee. :D gg noob.