A subsidiary of CronBlocks  ·  Engineering Insights for Serious Practitioners

Fixing an Intermittent String Bug in JNI-for-NI-Drivers — A Railway Cab Simulator Story

Intermittent bugs are among the most frustrating to trace to their root cause. They don't crash your program consistently; they just occasionally make things stop working, and restarting usually fixes them, for a while. This is exactly the kind of ghost that was haunting JNI-for-NI-Drivers, an open-source Java wrapper for National Instruments (NI) hardware drivers that is also used in a train cab simulator at the Birmingham Centre for Railway Research and Education (BCRRE), University of Birmingham.

Open Source Contribution  ·  JNI / NI-DAQ  ·  Pull Request #2

This post walks through the root cause of the bug, how it was diagnosed, and the fix contributed via Pull Request #2.

What is JNI-for-NI-Drivers?

National Instruments does not officially provide Java language bindings for its hardware drivers. The JNI-for-NI-Drivers library fills that gap.

In essence, it is a wrapper around the officially provided drivers, built with the Java Native Interface (JNI). It lets Java applications call into NI's C-based DLLs, enabling control of and data acquisition from NI DAQ devices (such as the NI-DAQ USB-6000) directly from Java code.

The library is used in a real-world train cab simulator at the University of Birmingham, where it reads analogue and digital inputs from DAQ devices connected to the cab to interface with train controller handles and buttons. So when the library misbehaves, the simulator used for education misbehaves with it.

The Symptom: Mysterious, Intermittent Failures

The library maintainer, Dave Kirkwood, noticed an unusual pattern: the project would fail to initialise every now and then, but restarting it would usually make the problem go away. As he described it:

I noticed sometimes the project I use this on fails to initialise intermittently... on failure, I just restart and hope it works next time.

This is a classic hallmark of a memory-corruption or undefined-behaviour bug: non-deterministic, hard to reproduce, and nearly impossible to reason about from the Java side alone.

The failure wasn't caused by faulty logic. It was hiding at the thin boundary between the Java world and native C code: specifically, in how each side handles strings.

Root Cause: Non-Null-Terminated Strings Crossing the JNI Boundary

When Java passes a String across the JNI boundary into a C function, the C library typically expects a null-terminated C string: a sequence of bytes whose last byte is 0x00 ('\0').

This is the standard convention in C: it tells functions such as strlen and strcpy (string copy) where a given string ends.

By contrast, Java strings are not null-terminated. A Java String stores its length alongside its character data, so by design it needs no terminator to know where it ends. This is perfectly fine within Java, but becomes dangerous the moment a string is handed to a C library that relies purely on the null terminator to know when to stop reading.
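To make the difference concrete, here is a minimal sketch showing that the bytes Java produces carry no trailing 0x00. The device name "Dev1", the class name, and the choice of US-ASCII are illustrative assumptions, not taken from the library.

```java
import java.nio.charset.StandardCharsets;

class JavaStringBytesDemo {
    // Returns true only if the array is non-empty and its last byte is 0x00.
    static boolean endsWithNul(byte[] bytes) {
        return bytes.length > 0 && bytes[bytes.length - 1] == 0;
    }

    public static void main(String[] args) {
        String deviceName = "Dev1"; // hypothetical device name
        byte[] raw = deviceName.getBytes(StandardCharsets.US_ASCII);

        // Four characters give four bytes: nothing marks the end for C code.
        System.out.println("length = " + raw.length);              // prints "length = 4"
        System.out.println("ends with NUL = " + endsWithNul(raw)); // prints "ends with NUL = false"
    }
}
```

The array knows its own length on the Java side, but a C function that receives only a pointer to these bytes has no way to recover it.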

⚠️ The Bug: When a string's bytes are obtained in Java via getBytes() and no null terminator is appended, the resulting byte array does not end with 0x00. Handed to the NI library, it can cause the library to read past the end of the intended string, picking up whatever happens to sit in adjacent memory (random digits, letters, or other garbage) until a 0x00 byte is reached.

This was confirmed using NI's own NI I/O Trace utility, which captures and logs calls made to NI driver functions. The trace clearly showed that strings sometimes arrived at the NI library with random digits or letters appended: classic evidence of a missing null terminator causing the C library to overrun the intended end of the string buffer.

When garbage is appended to a string argument such as a device name or task name, the NI library naturally fails to find or create the resource with that name, and the entire initialisation sequence fails. Because the contents of adjacent memory are non-deterministic (they change between runs and even within a run), the bug appeared only intermittently: whenever the next byte happened to be 0x00, the string arrived intact and initialisation succeeded.
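The overread can be imitated in pure Java. The sketch below is a simulation under stated assumptions, not the NI library's code: a byte array stands in for native memory, readCString scans for 0x00 the way C string functions do, and the names "Dev1" and "x7Q" are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class OverreadSimulation {
    // Mimics how C code finds the end of a string: scan from 'start' until a 0x00 byte.
    static String readCString(byte[] memory, int start) {
        int end = start;
        while (end < memory.length && memory[end] != 0) {
            end++;
        }
        return new String(memory, start, end - start, StandardCharsets.US_ASCII);
    }

    // Builds fake "memory": the name, an optional 0x00 terminator, leftover bytes, then a final 0x00.
    static byte[] layout(String name, String leftover, boolean terminate) {
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        byte[] l = leftover.getBytes(StandardCharsets.US_ASCII);
        int gap = terminate ? 1 : 0;
        byte[] memory = new byte[n.length + gap + l.length + 1]; // new arrays are zero-filled
        System.arraycopy(n, 0, memory, 0, n.length);
        System.arraycopy(l, 0, memory, n.length + gap, l.length);
        return memory;
    }

    public static void main(String[] args) {
        // Without a terminator the scan runs on into the leftover bytes.
        System.out.println(readCString(layout("Dev1", "x7Q", false), 0)); // prints "Dev1x7Q"
        // With a terminator the scan stops exactly where the name ends.
        System.out.println(readCString(layout("Dev1", "x7Q", true), 0));  // prints "Dev1"
    }
}
```

If the leftover bytes are empty, both calls return "Dev1", which mirrors why the real failure only showed up on some runs.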

The Fix: Explicitly Null-Terminating Strings

The fix is conceptually simple but requires care in implementation. The approach taken in PR #2 ensures every string passed into the NI library is properly null-terminated. Here is the technique used:

// Before the fix: simple byte extraction — no null-terminator
byte[] nameBytes = deviceName.getBytes();

// After the fix: append a space first, get bytes, then make the last byte zero
byte[] nameBytes = (deviceName + " ").getBytes();
nameBytes[nameBytes.length - 1] = 0;

Let's break down why this works:

Step 1 — Append a space character. The string is concatenated with a single space (" ") before calling getBytes(). This is a deliberate trick to ensure the resulting byte array has one extra byte at the end that can be controlled.

Step 2 — Obtain the byte array. getBytes() converts the Java string (with the appended space) into a byte array using the platform's default charset. Crucially, this gives us the string data plus one extra byte at position nameBytes.length - 1.

Step 3 — Zero out the last byte. That trailing byte — originally the space character (0x20) — is then explicitly set to 0. This produces a properly null-terminated byte sequence that the C library can safely consume.

The result is a byte array containing the original string's bytes followed by a null byte (0x00), which satisfies the C convention and prevents any overread into adjacent memory.
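The same idea can be wrapped in a small helper. The sketch below is not the code from PR #2: the class and method names are hypothetical, and US-ASCII is an assumption made here so the encoding is explicit rather than platform-dependent. It shows the space trick next to an equivalent variant based on Arrays.copyOf, which pads the extra byte with zero.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CStrings {
    // The technique described above: append a space, encode, overwrite the space with 0x00.
    static byte[] viaSpaceTrick(String s) {
        byte[] bytes = (s + " ").getBytes(StandardCharsets.US_ASCII);
        bytes[bytes.length - 1] = 0;
        return bytes;
    }

    // Alternative sketch: encode, then grow the array by one; copyOf fills the new slot with zero.
    static byte[] viaCopyOf(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, raw.length + 1);
    }

    public static void main(String[] args) {
        String name = "Dev1/ai0"; // hypothetical channel name
        byte[] a = viaSpaceTrick(name);
        byte[] b = viaCopyOf(name);
        System.out.println(Arrays.equals(a, b)); // prints "true"
        System.out.println(a.length);            // prints "9" (8 characters plus the terminator)
    }
}
```

Both variants produce the same bytes for ASCII input. Note that encoding to US-ASCII replaces any non-ASCII character with '?', so a different charset would be needed if the native side expects one.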

The Broader Lesson

This bug is a reminder of something easy to overlook when writing JNI code: the Java type system gives you no warning when a Java object is about to be interpreted using C conventions. Java's String is a perfectly safe, fully encapsulated object inside the JVM. The moment it crosses into native code, all those safety guarantees evaporate, and you are operating in the world of raw bytes and C-string conventions.

Here are a few lessons worth keeping in mind when working in such scenarios:

Always null-terminate strings passed to C libraries. Unless you are certain that the receiving C function takes an explicit length, assume that it wants a null-terminated C string.

Use tracing tools to confirm what the native layer is actually receiving. The intermittent nature of this bug would have remained mysterious without NI I/O Trace. Whenever you have a bug at the native interface, tracing what the native layer sees is invaluable.

Intermittent failures often point to memory issues. If your bug disappears on restart but comes back unpredictably, the cause is probably non-deterministic memory contents or layout: a strong indicator of missing terminators, buffer overruns, or uninitialised memory.
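A trace on the native side, such as NI I/O Trace, can be complemented by logging the exact bytes on the Java side just before the native call. The helper below is a hypothetical sketch (the class and method names are not part of the library) that renders a byte array as hex so a missing 0x00 is easy to spot.

```java
class ByteTrace {
    // Formats each byte as two uppercase hex digits separated by spaces.
    static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] terminated = {'D', 'e', 'v', '1', 0};
        System.out.println(hexDump(terminated)); // prints "44 65 76 31 00"
    }
}
```

Comparing this output with what the native trace reports shows quickly whether the two sides agree on where the string ends.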
