David's Coding Blog

Interpreter - Data structures matter

Nowadays, one might think that CPUs are so fast and memory so plentiful that optimizing is a thing for later. It turns out that it depends on the program.

Before

The first implementation for the PHP array in my own PHP interpreter allowed every data type as key. This is also represented in its structure.


type ArrayRuntimeValue struct {
    *RuntimeValue
    Keys     []IRuntimeValue
    Elements map[IRuntimeValue]IRuntimeValue
}

func (runtimeValue *ArrayRuntimeValue) findKey(key IRuntimeValue) (IRuntimeValue, bool) {
    if key == nil {
        return NewVoidRuntimeValue(), false
    }

    for k := range runtimeValue.Elements {
        if k.GetType() != key.GetType() {
            continue
        }
        boolean, _ := compare(key, "===", k)
        if boolean.Value {
            return k, true
        }
    }

    return NewVoidRuntimeValue(), false
}

The IRuntimeValue is an interface that every PHP data type implements (int, float, bool, null and array).
This means that the map key is a pointer to a struct implementing this interface. Every lookup of a value required the interation of every key to compare the data types and values until the given key was found. The result the benefits that GoLangs map implementation could provide, would not work here.

Test case

The following test case was used to measure the performance of my implementation. On my Dell XPS 15 (Intel i7 11800H, 32 GB) the following code took 4.6 seconds to run.


$rounds = 10000;
$a = [];
// Write access
for ($i = 0; $i < $rounds; $i++) {
    $a[] = $i;
}
// Read and write access
for ($i = 0; $i < $rounds; $i++) {
    $a[$i] = $a[$i] + 1;
}

Optimized

The first optimization was the tracking of the next new key. If a new entry is saved to the array without a given key, the array had to iterate all keys to find the highest one. Now it updates the nextKey with every write access if necessary.
This just improved the test case from 4.6 seconds to 4.4 seconds.

The second optimization did the huge improvement: Using a string as key for the GoLang map.


type ArrayRuntimeValue struct {
    *RuntimeValue
    Keys     []IRuntimeValue
    Elements map[string]IRuntimeValue
    // Keeping track of next key
    nextKey    int64
    nextKeySet bool
}

PHP only allows integers and strings as keys. By converting integers to a string and using this as a key, the lookup time could be improved. No loop was required to compare a given key with all keys in the map. The given key can just be converted to a string.

This improved the time to run the test case from 4.6 seconds to 46 milliseconds.

Learnings

1. The optimizations did not require to be a genius. Just use the performant data structures provided.

2. All the time before the array was fast enough. If an array contains just three or four entries, you cannot notice this unoptimized array implementation.

3. Having built-in measuring tools in GoPHP is very helpful to measure time improvements. :D