Notes for March 15th

ComfyUI Setup and FLUX.1 Dev Model Download
Installed ComfyUI on my local machine (Ubuntu 22.04) using the official GitHub repository (git clone https://github.com/comfyanonymous/ComfyUI.git). Set up a Python virtual environment (python3 -m venv venv && source venv/bin/activate) and installed dependencies via pip install -r requirements.txt. Downloaded the FLUX.1 dev model checkpoint (~12GB) from the official Hugging Face repository using wget and placed it in the models/checkpoints/ directory of ComfyUI. Updated the config.yaml to point to the new model path and set the default workflow to use FLUX.1 for inference. Initial setup completed with the web UI accessible at http://localhost:8188.
Below is a screenshot of my ComfyUI interface after setup:
Learning That TypeScript Backend Interpreter Changed to Go
Discovered that the backend interpreter for my TypeScript project (previously relying on Node.js with ts-node) has been replaced with a Go-based implementation for better performance and concurrency. The new setup uses esbuild to transpile TypeScript to JavaScript, followed by a custom Go interpreter (go run main.go) to execute the compiled code. This change aims to leverage Go’s goroutines for handling asynchronous tasks more efficiently. Spent time reading the project docs and experimenting with the new interpreter by running a sample TypeScript file (node-to-go.ts) through the pipeline. Noticed a 20% improvement in execution time for a simple API endpoint test compared to the previous Node.js setup.
Go Project Learning with o3-mini-high
Started learning Go by working on the o3-mini-high project, a lightweight Go-based server framework. Cloned the repository (git clone https://github.com/example/o3-mini-high.git) and set up the Go environment (Go 1.20 installed via sudo apt install golang-go). Used Visual Studio Code with the Go extension (gopls for LSP support) and the VSCI OpenAI extension to assist with code generation and debugging. Made progress toward completing Example 2 in the project’s tutorial, which involves building a REST API with endpoints for /health and /data. Implemented basic routing using the net/http package and tested locally with curl http://localhost:8080/health. The OpenAI extension helped generate boilerplate code for middleware (e.g., logging), saving about 30 minutes of manual setup.

Errors Encountered

FLUX.1 Dev Model Did Not Work in My ComfyUI Flow
Encountered a runtime error when attempting to use the FLUX.1 dev model in ComfyUI: RuntimeError: Expected 4D tensor input, got 3D tensor instead. Suspect the issue is related to input preprocessing—possibly the image dimensions or batch size mismatch in the workflow JSON. Checked the ComfyUI logs (logs/comfyui.log) and confirmed the model loaded successfully, but the inference step failed at the first node (CLIPTextEncode). Tried adjusting the input resolution to 512x512 (as per FLUX.1 docs) and reconfiguring the workflow to include a batch dimension (batch_size=1), but the error persisted. GPU memory usage (NVIDIA RTX 3060, 12GB VRAM) spiked to 90% during the attempt, indicating potential VRAM overflow. Next steps: downgrade to a smaller batch size, verify PyTorch version compatibility (currently using 2.0.1), and check the ComfyUI GitHub issues for similar reports.
**ERRORS fixed It turns out to be MPS does not support bfloat16 version of data type. So I changed it to fp8 model.

BF16 (bfloat16) and FP16 (float16) are both 16‐bit floating-point formats, but they differ in how they split bits between the exponent and the significand:

• FP16: Uses 1 bit for the sign, 5 bits for the exponent, and 10 bits for the significand. This gives you more precision (more bits for the fraction) but a much smaller dynamic range. • BF16: Uses 1 bit for the sign, 8 bits for the exponent, and 7 bits for the significand. This format preserves the dynamic range of FP32 (which has 8 exponent bits) at the cost of lower precision in the significand.

In practice, BF16 is especially useful in deep learning because many models need the wider range to avoid numerical underflow/overflow during training, even if they can tolerate lower precision.

Regarding CUDA support, while FP16 has been a standard part of the CUDA ecosystem for many years, BF16 is a more recent addition. Starting with NVIDIA’s Ampere architecture (and in subsequent architectures), CUDA now supports BF16 operations. This means that on supported GPUs (e.g. A100), you can use BF16 with libraries like PyTorch (via torch.bfloat16). It’s also commonly used on CPUs (and TPUs) for mixed-precision training.

Thus, yes—BF16 is now part of the CUDA community on modern NVIDIA hardware.
**ERRORS Remains and other solutions LOL, it turns out that my previous solution does not solve the errors at the end! The final solution is to use diffusionbee. :D gg noob.