Intermittent bugs are among the most frustrating to trace to their root cause. Why? They don't crash your program consistently; they just make things stop working now and then, and restarting usually fixes them, for a while. This is exactly the kind of ghost that was haunting the JNI-for-NI-Drivers library, an open-source Java wrapper for National Instruments (NI) hardware drivers that is also used in a train cab simulator at the Birmingham Centre for Railway Research and Education (BCRRE), University of Birmingham.
This post walks through the root cause of the bug, how it was diagnosed, and the fix contributed via Pull Request #2.
What is JNI-for-NI-Drivers?
National Instruments does not officially provide Java bindings for its hardware drivers. The JNI-for-NI-Drivers library fills that gap.
In essence, it is a wrapper around the official drivers, built with the Java Native Interface (JNI). It lets Java applications call into NI's C-based DLLs, enabling control and data acquisition from NI DAQ devices (such as the NI-DAQ USB-6000) directly from Java code.
The library is used in a real-world train cab simulator at the University of Birmingham, where it reads analogue and digital inputs from DAQ devices connected to the cab; this is how the software picks up the train's controller handles and buttons. So when the library misbehaves, the consequences are felt directly in a simulator used for education.
The Symptom: Mysterious, Intermittent Failures
The library maintainer, Dave Kirkwood, noticed an unusual pattern: the project would fail to initialise every now and then, but restarting it would usually make the problem go away. As he described it:
I noticed sometimes the project I use this on fails to initialise intermittently... on failure, I just restart and hope it works next time.
This is, in itself, a classic hallmark of memory corruption or undefined behaviour: non-deterministic, hard to reproduce, and nearly impossible to reason about from the Java side alone.
The failure wasn't triggered by faulty logic. It was hiding in the thin boundary between the Java world and the native C world: specifically, in how each side handles strings.
Root Cause: Non-Null-Terminated Strings Crossing the JNI Boundary
When Java passes a string across the JNI boundary into a C function, the C library typically expects a null-terminated C string: a sequence of bytes whose last byte is 0x00 ('\0').
This is the standard convention in C: the terminator is how functions such as strlen (string length) and strcpy (string copy) know where a string ends.
Java strings, by contrast, are not null-terminated. A Java String stores its length alongside its character data, so by design it doesn't need a terminator to know where it ends. That is perfectly fine inside Java, but it becomes dangerous the moment a string is handed to a C library that relies purely on the null terminator to know when to stop reading.
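To make the contrast concrete, here is a small Java sketch that mimics how a C function such as strlen finds the end of a string: it scans forward until it hits a 0 byte. `cStrlen` is a hypothetical helper written only for illustration (it is not part of JNI-for-NI-Drivers or the NI API), and "Dev1" is just an example device name.

```java
import java.nio.charset.StandardCharsets;

public class CStringDemo {
    // Hypothetical helper: mimics C's strlen by counting bytes until the first 0x00.
    // Returns -1 when no terminator exists inside the array; a real C function
    // would not stop there, it would keep reading past the end of the buffer.
    static int cStrlen(byte[] buf) {
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String name = "Dev1";
        // Java knows the length without any terminator.
        System.out.println("Java length: " + name.length()); // Java length: 4

        // getBytes() returns only the encoded characters: no trailing 0x00.
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        System.out.println("C-style length: " + cStrlen(raw)); // C-style length: -1

        // A new byte[] is zero-filled, so copying into one that is one byte
        // longer leaves a terminator in the last slot.
        byte[] terminated = new byte[raw.length + 1];
        System.arraycopy(raw, 0, terminated, 0, raw.length);
        System.out.println("Terminated C-style length: " + cStrlen(terminated)); // 4
    }
}
```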
Converting a Java string with getBytes() does not append a null terminator either. When that byte array is handed to the NI library without a trailing 0x00, the library has no way of telling where the string ends. It could then read past the end of the intended string, picking up whatever happens to sit in the adjacent memory (random numbers, letters, or other garbage) until it eventually reaches a 0x00 byte.
This was confirmed with NI's own NI I/O Trace utility, which captures and logs every call made to NI driver functions. The trace clearly showed strings sometimes arriving at the NI library with random numbers or letters appended: classic evidence of a missing null terminator causing the C library to overrun the intended end of the string buffer.
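The overread can be simulated safely in Java by placing the unterminated bytes inside a larger array that stands in for native memory, with leftover bytes sitting right after them. The `readCString` and `layout` helpers and the "7xQ" leftovers are hypothetical; they only illustrate what a C routine does when the terminator is missing. Real native memory is far less predictable than this.

```java
import java.nio.charset.StandardCharsets;

public class OverreadDemo {
    // Hypothetical stand-in for a C routine reading a string at `offset`:
    // it copies bytes until it meets 0x00, regardless of where the caller's
    // string was supposed to end.
    static String readCString(byte[] memory, int offset) {
        int end = offset;
        while (end < memory.length && memory[end] != 0) {
            end++;
        }
        return new String(memory, offset, end - offset, StandardCharsets.US_ASCII);
    }

    // Simulated memory: the string's bytes, an optional terminator, then
    // leftover bytes, then a final 0 so the simulated scan eventually stops.
    static byte[] layout(byte[] str, boolean terminate, byte[] leftovers) {
        int gap = terminate ? 1 : 0;
        byte[] memory = new byte[str.length + gap + leftovers.length + 1];
        System.arraycopy(str, 0, memory, 0, str.length);
        // When terminate is true, memory[str.length] is already 0 (arrays start zero-filled).
        System.arraycopy(leftovers, 0, memory, str.length + gap, leftovers.length);
        return memory;
    }

    public static void main(String[] args) {
        byte[] name = "Dev1".getBytes(StandardCharsets.US_ASCII);
        byte[] leftovers = "7xQ".getBytes(StandardCharsets.US_ASCII); // stale bytes next door

        System.out.println(readCString(layout(name, false, leftovers), 0)); // Dev17xQ
        System.out.println(readCString(layout(name, true, leftovers), 0));  // Dev1
    }
}
```

With the terminator missing, the "device name" the reader sees is no longer one the driver knows about, which matches the garbage-suffixed strings seen in the trace.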
When garbage values are appended to a string argument — for example, a device name or task name — the NI library naturally fails to find or create the resource associated with that name, causing the entire initialisation sequence to fail. Since the garbage data in adjacent memory is non-deterministic (it changes between runs and even within a run), the bug appeared intermittently.
The Fix: Explicitly Null-Terminating Strings
The fix is conceptually simple but requires care in implementation. The approach taken in PR #2 ensures every string passed into the NI library is properly null-terminated. Here is the technique used:
```java
// Before the fix: simple byte extraction, no null terminator
byte[] nameBytes = deviceName.getBytes();
```

```java
// After the fix: append a space first, get the bytes, then set the last byte to zero
byte[] nameBytes = (deviceName + " ").getBytes();
nameBytes[nameBytes.length - 1] = 0;
```
Let's break down why this works:
Step 1 — Append a space character. The string is concatenated with a single space (" ") before calling getBytes(). This is a deliberate trick to ensure the resulting byte array has one extra byte at the end that can be controlled.
Step 2 — Obtain the byte array. getBytes() converts the Java string (with the appended space) into a byte array using the platform's default charset. In any ASCII-compatible charset, such as UTF-8 or windows-1252, the space encodes to the single byte 0x20, so this gives us the string data plus one extra byte at position nameBytes.length - 1.
Step 3 — Zero out the last byte. That trailing byte — originally the space character (0x20) — is then explicitly set to 0. This produces a properly null-terminated byte sequence that the C library can safely consume.
The result is a byte array containing the original string's encoded bytes followed by a null byte (0x00), which satisfies the C convention and prevents any overread into adjacent memory.
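For experimenting with the technique outside the library, the space-then-zero trick can be wrapped in a small helper and compared with an alternative. `toCString` and `toCStringCopy` are hypothetical names, not methods of JNI-for-NI-Drivers. Passing an explicit charset instead of relying on the platform default is an assumption of this sketch, not part of PR #2, and the `Arrays.copyOf` variant is a different technique shown only for comparison; "Dev1/ai0" is just an example channel name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NullTerminate {
    // Same idea as the fix: add one placeholder character, encode, zero the last byte.
    // Assumes an ASCII-compatible charset, where " " encodes to exactly one byte (0x20).
    static byte[] toCString(String s, Charset cs) {
        byte[] bytes = (s + " ").getBytes(cs);
        bytes[bytes.length - 1] = 0;
        return bytes;
    }

    // Alternative (not from PR #2): encode, then grow the array by one element;
    // Arrays.copyOf pads the new slot with 0, which becomes the terminator.
    static byte[] toCStringCopy(String s, Charset cs) {
        byte[] encoded = s.getBytes(cs);
        return Arrays.copyOf(encoded, encoded.length + 1);
    }

    public static void main(String[] args) {
        byte[] a = toCString("Dev1/ai0", StandardCharsets.UTF_8);
        byte[] b = toCStringCopy("Dev1/ai0", StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

The copy-based variant does not depend on how the placeholder character encodes, which may make it slightly easier to reason about; for ASCII-compatible charsets both produce identical bytes.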
The Broader Lesson
This bug is a reminder of something easy to overlook when writing JNI code: the Java type system gives you no warning when a Java object is about to be interpreted under C conventions. Java's String is a perfectly safe, fully encapsulated object inside the JVM. The moment it crosses into native code, those safety guarantees evaporate, and you are operating in a world of raw bytes and C-string conventions.
Here are a few lessons worth keeping in mind when working in such scenarios:
Always null-terminate strings passed to C libraries. Unless you are certain that the receiving C function takes an explicit length, assume it expects a null-terminated C string.
Use tracing tools to confirm what the native layer actually receives. Without a tool like NI I/O Trace, the intermittent nature of this bug would have kept it a mystery. Whenever a bug sits at the native interface, seeing exactly what the native layer sees is invaluable.
Intermittent failures often point to memory problems. If a bug disappears on restart but comes back unpredictably, suspect something that depends on non-deterministic memory contents: missing terminators, buffer overruns, or uninitialised memory are common culprits.
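In the spirit of these lessons, a byte array can also be checked and logged on the Java side before it is handed to native code. The `checkCString` and `hexDump` helpers below are hypothetical and not part of the library; they complement, rather than replace, a native-side trace such as NI I/O Trace. The embedded-zero check is included because a C function stops at the first 0x00, so an interior zero would silently truncate the string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CStringChecks {
    // Hypothetical guard: a valid C string buffer ends in 0x00 and has no earlier 0x00.
    static void checkCString(byte[] buf) {
        if (buf.length == 0 || buf[buf.length - 1] != 0) {
            throw new IllegalArgumentException("missing null terminator");
        }
        for (int i = 0; i < buf.length - 1; i++) {
            if (buf[i] == 0) {
                throw new IllegalArgumentException("embedded null at index " + i);
            }
        }
    }

    // Renders bytes as space-separated hex so a log shows exactly what is
    // about to cross the JNI boundary.
    static String hexDump(byte[] buf) {
        StringBuilder sb = new StringBuilder();
        for (byte b : buf) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = "Dev1".getBytes(StandardCharsets.US_ASCII);
        byte[] terminated = Arrays.copyOf(raw, raw.length + 1);

        System.out.println(hexDump(terminated)); // 44 65 76 31 00
        checkCString(terminated); // passes silently

        try {
            checkCString(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage()); // Rejected: missing null terminator
        }
    }
}
```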