Gap Buffer
The Gap Buffer is a popular data structure for text editors to represent files and editable buffers. The most famous of them probably being GNU Emacs.
What does it do?
A Gap Buffer is simply a list of characters, similar to a normal string, with the added twist of splitting it into two side: the prefix and suffix, on either side of the cursor. In between them, a gap is left to allow for quick insertion at the cursor.
Moving the cursor moves the gap around the buffer, the prefix and suffix getting shorter/longer as required.
Implementation
I’ll be writing a sample implementation in Python, as with the rest of the
series. I don’t think it showcases the
elegance of the Gap Buffer in action like a C implementation full of
memmove
s would, but it does makes it short and sweet.
Representation
We’ll be representing the gap buffer as an actual list of characters.
Given that Python doesn’t have characters, let’s settle for a list of strings, each representing a single character…
Char = str
class GapBuffer:
# List of characters, contains prefix and suffix of string with gap in the middle
_buf: list[Char]
# The gap is contained between [start, end) (i.e: buf[start:end])
_gap_start: int
_gap_end: int
# Visual representation of the gap buffer:
# This is a very [ ]long string.
# |<----------------------------------------------->| capacity
# |<------------>| |<-------->| string
# |<------------------->| gap
# |<------------>| prefix
# |<-------->| suffix
def __init__(self, initial_capacity: int = 16) -> None:
assert initial_capacity > 0
# Initialize an empty gap buffer
self._buf = [""] * initial_capacity
self._gap_start = 0
self._gap_end = initial_capacity
Accessors
I’m mostly adding these for exposition, and making it easier to write assert
s
later.
@property
def capacity(self) -> int:
return len(self._buf)
@property
def gap_length(self) -> int:
return self._gap_end - self._gap_start
@property
def string_length(self) -> int:
return self.capacity - self.gap_length
@property
def prefix_length(self) -> int:
return self._gap_start
@property
def suffix_length(self) -> int:
return self.capacity - self._gap_end
Growing the buffer
I’ve written this method in a somewhat non-idiomatic manner, to make it closer
to how it would look in C using realloc
instead.
It would be more efficient to use slicing to insert the needed extra capacity directly, instead of making a new buffer and copying characters over.
def grow(self, capacity: int) -> None:
assert capacity >= self.capacity
# Create a new buffer with the new capacity
new_buf = [""] * capacity
# Move the prefix/suffix to their place in the new buffer
added_capacity = capacity - len(self._buf)
new_buf[: self._gap_start] = self._buf[: self._gap_start]
new_buf[self._gap_end + added_capacity :] = self._buf[self._gap_end :]
# Use the new buffer, account for added capacity
self._buf = new_buf
self._gap_end += added_capacity
Insertion
Inserting text at the cursor’s position means filling up the gap in the middle of the buffer. To do so we must first make sure that the gap is big enough, or grow the buffer accordingly.
Then inserting the text is simply a matter of copying its characters in place, and moving the start of the gap further right.
def insert(self, val: str) -> None:
# Ensure we have enouh space to insert the whole string
if len(val) > self.gap_length:
self.grow(max(self.capacity * 2, self.string_length + len(val)))
# Fill the gap with the given string
self._buf[self._gap_start : self._gap_start + len(val)] = val
self._gap_start += len(val)
Deletion
Removing text from the buffer simply expands the gap in the corresponding direction, shortening the string’s prefix/suffix. This makes it very cheap.
The methods are named after the backspace
and delete
keys on the keyboard.
def backspace(self, dist: int = 1) -> None:
assert dist <= self.prefix_length
# Extend gap to the left
self._gap_start -= dist
def delete(self, dist: int = 1) -> None:
assert dist <= self.suffix_length
# Extend gap to the right
self._gap_end += dist
Moving the cursor
Moving the cursor along the buffer will shift letters from one side of the gap to the other, moving them accross from prefix to suffix and back.
I find Python’s list slicing not quite as elegant to read as a memmove
, though
it does make for a very small and efficient implementation.
def left(self, dist: int = 1) -> None:
assert dist <= self.prefix_length
# Shift the needed number of characters from end of prefix to start of suffix
self._buf[self._gap_end - dist : self._gap_end] = self._buf[
self._gap_start - dist : self._gap_start
]
# Adjust indices accordingly
self._gap_start -= dist
self._gap_end -= dist
def right(self, dist: int = 1) -> None:
assert dist <= self.suffix_length
# Shift the needed number of characters from start of suffix to end of prefix
self._buf[self._gap_start : self._gap_start + dist] = self._buf[
self._gap_end : self._gap_end + dist
]
# Adjust indices accordingly
self._gap_start += dist
self._gap_end += dist