Hi, my company makes electronic board games. While updating firmware, we noticed that some Samsung phones often had trouble completing the update due to packet loss over a BLE connection. What we didn't expect is that the BLE connection stays up: the radios keep the connection open, but ATT packets are sometimes lost because L2CAP fragments are not retried.
The data in this post is from a Samsung Galaxy S9 (SM-G960U, Rev1.1, Android 10, Baseband G960USQU9FVB2, Kernel 4.9.186, Build QP1A.190711.020.G960USQU9FVB2).
At a high level, we send chunks of firmware:
- central->peripheral header packet (small)
- central->peripheral data packets (many of these, each MTU-3 bytes or smaller)
- peripheral->central acknowledgement once the expected chunk size has been received
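As a rough illustration of the data-packet step, here is a minimal Python sketch of how a central might split one firmware chunk into packets. The function name `split_chunk` and the 1000-byte chunk size are hypothetical, not our actual update code; the only thing taken from the protocol above is that each data packet carries at most MTU-3 bytes.

```python
ATT_WRITE_OVERHEAD = 3  # ATT opcode (1 byte) + attribute handle (2 bytes)


def split_chunk(chunk: bytes, mtu: int) -> list:
    """Split one firmware chunk into data packets of at most MTU-3 bytes.

    Hypothetical helper for illustration only.
    """
    max_payload = mtu - ATT_WRITE_OVERHEAD
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    # Slice the chunk into consecutive pieces; the last one may be shorter.
    return [chunk[i:i + max_payload] for i in range(0, len(chunk), max_payload)]


if __name__ == "__main__":
    packets = split_chunk(bytes(1000), mtu=250)
    print([len(p) for p in packets])
```

The peripheral would then acknowledge only after the sizes of the received packets add up to the chunk size announced in the header packet.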
The symptom was that our device would hang during a firmware transfer, because it was not acknowledging the most recently attempted chunk. The application believed that it sent the data, but it seems to have been dropped by the phone’s radio. Here is the relevant part of a BLE sniffer log of a firmware update, where MTU was negotiated to 250B, with GATT writes of 246B:
Image Context:
- Curve shapes: successful GATT write-without-response operations that were fragmented and that my device (the peripheral) received.
- LOST star shape: a GATT write that was fragmented and lost.
- Oval shape: packet loss where the peripheral replied to the central but the central did not hear it. I'm not sure if this is related or a coincidence.
- After the start of the star/oval, the central retried but did not form a complete ATT response, so the whole GATT write was lost.
- One more curve shape: the next GATT write is fragmented but succeeds.
When we reduced the packet size to MTU-10 (240B), we no longer saw fragmentation, and the transfer was reliable (or failed with a connection close instead):
I think this works only because we are managing to avoid fragmentation entirely, so when L2CAP loss occurs, a simple retry works.
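The arithmetic behind this can be sketched as follows. This assumes the link negotiated the maximum LE Data Length Extension payload of 251 bytes per link-layer packet; I have not confirmed the negotiated value, so treat that default as an assumption. A write-without-response adds a 3-byte ATT header and a 4-byte L2CAP header to the value, and if the total exceeds the link-layer payload size, the L2CAP PDU is split across several link-layer packets. The function name `ll_fragments` is made up for illustration.

```python
L2CAP_HEADER = 4  # length (2 bytes) + channel ID (2 bytes)
ATT_HEADER = 3    # opcode (1 byte) + attribute handle (2 bytes)


def ll_fragments(value_len: int, ll_payload: int = 251) -> int:
    """Number of link-layer packets needed for one ATT write.

    ll_payload=251 is an assumed negotiated data length, not a measured one.
    """
    total = L2CAP_HEADER + ATT_HEADER + value_len
    # Integer ceiling division: round up to whole link-layer packets.
    return -(-total // ll_payload)


if __name__ == "__main__":
    for size in (246, 244, 240):
        print(size, ll_fragments(size))
```

Under that assumption, a 246B write needs two link-layer packets while a 240B write fits in one, which is consistent with what the sniffer shows. It also suggests that 244B would be the largest write that avoids fragmentation, though I have not tested that size.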
I am starting to suspect that this might be a bug in how Samsung's BLE radio handles L2CAP loss while sending a fragmented ATT packet, since the packet is dropped without the connection being dropped. Sniffer and Android BLE logs are available.