26.04.2025 10:00

Project Caretaker. Part 8. Software

The ESP32-CAM is a fairly popular board, and Wi-Fi-controlled devices with cameras are popular too. Even so, I couldn't find any ready-made firmware that combined convenient device control, camera functionality, and an attractive design.

There are separate interfaces for cameras with lots of settings, but the closest thing I found for remote control looked like this:

popular-ui

This or a similar interface appears in almost all of the most popular videos and tutorials.

Our hardware isn't limited to just 2 motors, so we'll create our own interface. Features it will include:

  • Setting the resolution and quality of the video, starting and stopping the stream;
  • 2 control modes - joystick and sliders (like in a tank or tractor - 2 levers responsible for the left and right motors);
  • Activation of Bluetooth controller connection with status display;
  • LED brightness control (bright LEDs in the eyes);
  • Display of FPS value during streaming;
  • Responsive design and fullscreen mode.

I wrote the firmware in Cursor with Sonnet 3.7. I don't see the point in writing about the process - the whole web has been flooded with articles on this topic lately. I'll just say that AI has much more trouble with C and ESP32 programming in particular than with more popular languages (Python, JS, PHP, etc.).

Here is the result. The settings window and joystick control mode:

joystick

Slider control mode:

sliders

Let me describe the most interesting technical aspects of the firmware that emerged from the collective unconscious.

The firmware is built with PlatformIO on the Arduino framework and uses FreeRTOS (my previous article, written specifically for this project, explains how to enable the FreeRTOS statistics collection functions). All activities (BT, streaming, HTTP, motor control, boot log) run as separate FreeRTOS tasks spread across both cores (in case you didn't know, the ESP32 is dual-core).
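As an illustration of that layout (not the repository's actual code), the task split could be described as a small table and started with the ESP-IDF call `xTaskCreatePinnedToCore`. The core assignments, priorities, stack sizes, and the names `TaskSpec`, `kTasks`, `taskBody`, and `startTasks` are all hypothetical. The FreeRTOS part is guarded so the table itself also compiles on a desktop.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(ESP_PLATFORM)
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#endif

// One row per activity named in the article. Core, priority, and stack
// size are hypothetical placeholders, not values taken from the project.
struct TaskSpec {
    const char* name;
    int core;             // 0 or 1 on the dual-core ESP32
    unsigned priority;    // higher number = higher FreeRTOS priority
    uint32_t stackBytes;  // ESP-IDF takes the stack size in bytes
};

constexpr TaskSpec kTasks[] = {
    {"bt",      0, 2, 4096},
    {"http",    0, 2, 8192},
    {"bootlog", 0, 1, 2048},
    {"stream",  1, 2, 8192},
    {"motors",  1, 3, 2048},
};
constexpr size_t kTaskCount = sizeof(kTasks) / sizeof(kTasks[0]);

// Counts how many tasks the table assigns to a given core.
size_t tasksOnCore(int core) {
    size_t n = 0;
    for (size_t i = 0; i < kTaskCount; ++i) {
        if (kTasks[i].core == core) ++n;
    }
    return n;
}

#if defined(ESP_PLATFORM)
// Placeholder body: a real task would do its own work in this loop.
static void taskBody(void* /*arg*/) {
    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

// Creates every task from the table, pinned to its core.
void startTasks() {
    for (size_t i = 0; i < kTaskCount; ++i) {
        xTaskCreatePinnedToCore(taskBody, kTasks[i].name, kTasks[i].stackBytes,
                                nullptr, kTasks[i].priority, nullptr,
                                kTasks[i].core);
    }
}
#endif
```

Keeping the split in one table makes it easy to move a task to the other core when the statistics show that one core is overloaded.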

The entire UI lives in a single HTML file. During the firmware build it is compressed with gzip and converted into C code, which saves memory, resources, and page load time. I found the gzip trick on GitHub, but the conversion to C was my own idea: storing resources in SPIFFS brings extra complications, since the resources have to be flashed separately and OTA (over-the-air) updates get harder. Besides, our HTML is so small that it doesn't warrant separate storage.
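The conversion step is simple enough to sketch. The function below is an assumption about how such a step could look, not the project's build script: it takes bytes that are already gzip-compressed and formats them as a C array, much like `xxd -i` does. The name `toCArray` and the symbol `index_html_gz` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats already-gzipped bytes as C source text, 12 bytes per line.
// The caller picks the symbol name; a length constant is emitted as well.
std::string toCArray(const std::vector<uint8_t>& data, const std::string& name) {
    std::string out = "const unsigned char " + name + "[] = {";
    char buf[8];
    for (size_t i = 0; i < data.size(); ++i) {
        if (i % 12 == 0) out += "\n  ";  // start a new indented row
        std::snprintf(buf, sizeof(buf), "0x%02x,", static_cast<unsigned>(data[i]));
        out += buf;
    }
    out += "\n};\nconst unsigned int " + name + "_len = " +
           std::to_string(data.size()) + ";\n";
    return out;
}
```

When the firmware serves such an array, the response needs a `Content-Encoding: gzip` header so that the browser decompresses the page itself.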

My BT controller is a Radiomaster TX12 (v1) with an ELRS module that can present itself as a BT joystick. I didn't dig into the packet format, having neither the time nor the desire. Instead, with the help of AI, the bytes that carry the X and Y of the left and right sticks were identified from the logs, and those values drive the same control modes as in the web interface. If you decide to connect your own joystick, then in addition to changing the device name in the config, you'll almost certainly need to change the BT packet processing.
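To give an idea of what "changing the packet processing" involves, here is a minimal sketch of pulling one axis out of a raw report. The layout is purely hypothetical (16-bit little-endian values at made-up offsets). The real offsets and ranges for your controller have to come from your own logs, and `AxisLayout` and `readAxis` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>

// Where one axis lives in the report and which raw range it spans.
// All values are hypothetical and depend on the actual controller.
struct AxisLayout {
    size_t offset;    // index of the low byte of the axis value
    uint16_t rawMin;  // raw value at one end of the stick travel
    uint16_t rawMax;  // raw value at the other end
};

// Reads a 16-bit little-endian axis and maps it linearly to [-1; 1].
// Returns 0 if the layout is invalid or does not fit into the report.
float readAxis(const uint8_t* report, size_t len, const AxisLayout& a) {
    if (a.rawMax <= a.rawMin || a.offset + 2 > len) return 0.0f;
    uint16_t raw = static_cast<uint16_t>(report[a.offset] |
                                         (report[a.offset + 1] << 8));
    if (raw < a.rawMin) raw = a.rawMin;
    if (raw > a.rawMax) raw = a.rawMax;
    return 2.0f * (raw - a.rawMin) / (a.rawMax - a.rawMin) - 1.0f;
}
```

With four such layouts (X and Y for each stick), swapping controllers comes down to editing a few numbers instead of rewriting the parser.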

The slider control logic is straightforward - each sets the motor value in the range of [-1; 1]. But with the joystick, such direct control logic turned out to be non-obvious. Therefore, my joystick works like this (I spent probably an hour just on this scheme):

joystick-work

Despite the joystick being visually round, the control zone is a square (same on the actual controller, by the way). The control is linear, the upper left corner is [1;0] (left motor maximum forward, right motor stationary), the middle point at the top is [1;1] (full forward), the rest following the same pattern.

There's also a rotation zone - two small sectors on the left and right in the center - in these zones, the motors rotate in opposite directions so that the device turns in place to the left or right respectively (also linearly, at maximum power in extreme positions). The center of the joystick is a dead zone. All this logic is implemented on the controller side (thresholds are configurable), the frontend only transmits the "coordinates."
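The scheme above can be sketched roughly as follows. This is a simplified reading of the description, not the firmware's code: the rotation zones are approximated by rectangles instead of sectors, the thresholds are hypothetical, and the rotation sign convention (stick left means the left motor runs backward and the right one forward) is an assumption. The corner values follow the text: upper left gives [1;0], top middle gives [1;1].

```cpp
#include <cmath>

struct MotorPower {
    float left;   // [-1; 1], positive = forward
    float right;  // [-1; 1], positive = forward
};

// Hypothetical thresholds; the article only says they are configurable.
const float kDeadZone = 0.1f;      // |x| and |y| below this: motors stop
const float kRotateBandY = 0.2f;   // |y| below this counts as the middle row
const float kRotateStartX = 0.5f;  // |x| above this in the middle row: rotate

static float clampUnit(float v) {
    return std::fmax(-1.0f, std::fmin(1.0f, v));
}

// x: -1 (left) .. 1 (right), y: -1 (down) .. 1 (up).
MotorPower mixJoystick(float x, float y) {
    x = clampUnit(x);
    y = clampUnit(y);

    // Dead zone in the center of the square.
    if (std::fabs(x) < kDeadZone && std::fabs(y) < kDeadZone) {
        return {0.0f, 0.0f};
    }

    // Rotation zones at the left and right of the middle row: motors spin
    // in opposite directions, scaling linearly up to full power at the edge.
    if (std::fabs(y) < kRotateBandY && std::fabs(x) > kRotateStartX) {
        float p = (std::fabs(x) - kRotateStartX) / (1.0f - kRotateStartX);
        if (x < 0.0f) return {-p, p};  // assumed: turn left in place
        return {p, -p};                // assumed: turn right in place
    }

    // Linear zone, matching the corners quoted in the text:
    // upper left -> [1;0], top middle -> [1;1], upper right -> [0;1].
    float left = y * (1.0f - std::fmax(x, 0.0f));
    float right = y * (1.0f + std::fmin(x, 0.0f));
    return {left, right};
}
```

For example, `mixJoystick(0.0f, 1.0f)` yields full forward on both motors, while `mixJoystick(-1.0f, 0.0f)` lands in the left rotation zone and drives the motors in opposite directions at full power.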

Control signals, BT status requests, image quality changes: everything goes through HTTP requests. Since the ESP32 isn't a particularly fast HTTP server, stick position updates are rate-limited to one request per ~200 ms (so while the position keeps changing, a request goes out every 200 ms). Resetting the position to 0 is sent immediately, with no delay.
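In the project this rule lives in the browser-side JavaScript, but the rule itself is language-independent. Here is a minimal sketch of it in C++, to stay in the firmware's language. The class name `StickThrottle` and its interface are hypothetical, and the caller is assumed to poll it with the current time and stick position.

```cpp
#include <cstdint>

// Decides whether a stick update should be sent right now.
// Rule: at most one update per interval, except that a reset to (0, 0)
// always goes out immediately.
class StickThrottle {
public:
    explicit StickThrottle(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    bool shouldSend(uint32_t nowMs, float x, float y) {
        // Nothing to do if the position equals the last one that was sent.
        const bool changed = !hasSent_ || x != lastX_ || y != lastY_;
        if (!changed) return false;

        const bool isReset = (x == 0.0f && y == 0.0f);
        // Unsigned subtraction stays correct across timer wraparound.
        const bool due = !hasSent_ || (nowMs - lastSentMs_) >= intervalMs_;
        if (!isReset && !due) return false;

        hasSent_ = true;
        lastSentMs_ = nowMs;
        lastX_ = x;
        lastY_ = y;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastSentMs_ = 0;
    float lastX_ = 0.0f;
    float lastY_ = 0.0f;
    bool hasSent_ = false;
};
```

Because positions are compared against the last value actually sent, an update skipped inside the 200 ms window still goes out on a later poll if the stick has not returned to that value.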

A note about a flaw in the original design: I planned to measure the battery voltage through one of the ESP32-CAM's analog inputs, which is why the diagram shows a voltage divider. Unfortunately, ADC2 is unavailable while Wi-Fi is running, and absolutely all analog inputs brought out to the ESP32-CAM pin header belong to ADC2. I haven't found a solution yet other than adding a separate chip connected over SPI/UART, so the interface has no charge level measurement or display. This is frustrating: I first implemented the feature fully in the firmware, including the interface, and only afterward saw an error in the logs saying that ADC2 was already occupied by the Wi-Fi connection.
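A quick check like the one below would have caught this before any firmware was written. The GPIO-to-ADC mapping is the standard one for the original ESP32 (ADC1 on GPIO 32-39, ADC2 on GPIO 0, 2, 4, 12-15, and 25-27). The header pin list is my assumption about the common AI-Thinker ESP32-CAM layout, so verify it against your board. The function names are illustrative.

```cpp
#include <cstddef>

// Returns 1 or 2 for ADC-capable GPIOs of the original ESP32, 0 otherwise.
int adcUnitForGpio(int gpio) {
    if (gpio >= 32 && gpio <= 39) return 1;  // ADC1: usable alongside Wi-Fi
    switch (gpio) {
        case 0: case 2: case 4:
        case 12: case 13: case 14: case 15:
        case 25: case 26: case 27:
            return 2;                        // ADC2: blocked while Wi-Fi runs
        default:
            return 0;                        // no ADC on this pin
    }
}

// GPIOs assumed to be available on the ESP32-CAM pin header.
const int kHeaderGpios[] = {0, 1, 2, 3, 4, 12, 13, 14, 15, 16};
const size_t kHeaderGpioCount = sizeof(kHeaderGpios) / sizeof(kHeaderGpios[0]);

// Counts header pins that could read an analog value while Wi-Fi is active.
int countAdc1OnHeader() {
    int n = 0;
    for (size_t i = 0; i < kHeaderGpioCount; ++i) {
        if (adcUnitForGpio(kHeaderGpios[i]) == 1) ++n;
    }
    return n;
}
```

Under these assumptions `countAdc1OnHeader()` returns 0, which matches what the error in the logs eventually revealed.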

all analog pins of the board use ADC2

One more nuance: the video transmission speed depends heavily on Wi-Fi signal quality. The on-board PCB antenna works, but performs rather poorly far from the access point. In my case, the quality varied with streaming duration (the longer the stream and the hotter the board, the better the transmission) and with nearby objects that enhance or block the signal (for instance, when I touched the board with my hand, the signal became clear: humans make pretty good antennas). With an external antenna these problems partially went away. Just don't forget to solder the jumper on the board to switch to the external antenna.

You can find firmware details and instructions for configuration and use in the repository - esp32-caretaker.

In the end, I got very convenient control from a browser with a mouse, from a smartphone, and from a physical controller.
