Arrays, Strings, Structs, and Enums: Organizing Collections of Data in C
A single variable stores a single value. Real-world programs need to organize groups of values. C uses arrays for contiguous elements, strings for null-terminated character sequences, structs to aggregate multiple fields into a single object, and enums to assign names to integer states. These constructs seem elementary, but they tie directly into memory layout, out-of-bounds access, protocol formats, and ABIs.
Arrays Are Contiguous Elements
An array stores a group of elements of the same type.
int scores[3] = {90, 80, 70};
In memory, the elements are laid out contiguously, in index order:
scores
├── scores[0] = 90
├── scores[1] = 80
└── scores[2] = 70
Indices start at 0.
For an array of length 3, the valid indices are 0, 1, and 2.
Accessing scores[3] is out of bounds and results in undefined behavior.
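One way to make the index rule explicit is a small checked accessor. The sketch below is illustrative only: get_score is a hypothetical helper, not a standard function, and it assumes the caller knows the element count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies scores[index] into *out only when the
   index lies inside [0, count). It returns false instead of reading
   past the end of the array. */
bool get_score(const int* scores, size_t count, size_t index, int* out) {
    if (scores == NULL || out == NULL || index >= count) {
        return false;
    }
    *out = scores[index];
    return true;
}
```

With the scores array above, get_score(scores, 3, 2, &v) succeeds, while index 3 is rejected and v is left untouched.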
Arrays Do Not Track Their Own Length
In most expressions, an array name decays into a pointer to its first element; the operands of sizeof and unary & are the main exceptions. Array syntax in a function parameter is likewise adjusted to a pointer type.
void print_scores(int scores[], size_t count);
Here, scores is fundamentally a pointer parameter.
The function has no concept of the array's length, so you must pass count separately.
#include <stddef.h>
#include <stdio.h>
void print_scores(const int* scores, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        printf("%d\n", scores[i]);
    }
}
Passing a pointer alongside a length is a classic C API combination. Losing track of the length is a root cause of many out-of-bounds bugs and exploits.
sizeof on Arrays vs. Pointers
int values[4] = {1, 2, 3, 4};
size_t bytes = sizeof values;
size_t count = sizeof values / sizeof values[0];
In the scope where the array is still recognized as an array, sizeof values yields the total byte size of the entire array.
However, an array parameter is adjusted to a pointer, so inside the function sizeof no longer sees the array.
void f(int values[]) {
    printf("%zu\n", sizeof values); // Pointer size, not array size
}
This is a common mistake.
Never use sizeof inside a function to deduce the length of an array passed as an argument.
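A minimal sketch of the safe pattern: compute the element count where the array is still an array, then hand it to the callee. ARRAY_LEN, sum_ints, and sum_example are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical convenience macro. It is only correct when `a` is a real
   array in the current scope, never a pointer parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* The callee receives only a pointer, so it relies on `count`. */
long sum_ints(const int* values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

/* `values` is still an array here, so ARRAY_LEN(values) is 4. */
long sum_example(void) {
    int values[4] = {1, 2, 3, 4};
    return sum_ints(values, ARRAY_LEN(values));
}
```

The macro gives a wrong answer without any error if it is applied to a pointer, which is why the count is computed at the definition site and then passed along.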
C Strings Terminate with '\0'
A C string is not a distinct object type.
It is an array of characters terminated by a null character ('\0').
char name[] = "ZeroBug";
Actual memory representation:
Z e r o B u g \0
The trailing '\0' is the termination marker.
Without it, string functions will continue reading memory until they happen to hit a zero byte elsewhere.
This results in out-of-bounds reads and potential information disclosure.
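When a buffer comes from an untrusted source, the terminator can be searched for within a known capacity instead of being assumed. The sketch below uses the standard memchr function; bounded_length is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the string length if a '\0' exists within
   the first `capacity` bytes, or `capacity` if no terminator was found.
   It never reads beyond buf[capacity - 1]. */
size_t bounded_length(const char* buf, size_t capacity) {
    const char* end = memchr(buf, '\0', capacity);
    return end ? (size_t)(end - buf) : capacity;
}
```

A result equal to capacity signals that the buffer is not a valid C string and must not be passed to functions such as strlen or printf("%s").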
Character Arrays vs. String Literals
char a[] = "hello";
const char* b = "hello";
a is a modifiable array copy.
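b points directly to a string literal, which must not be modified: writing to a string literal is undefined behavior, and the const qualifier lets the compiler reject the attempt.
b[0] = 'H'; // Compile error because b is const char*; undefined behavior through a plain char*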
As a beginner, remember: treat string literals as read-only data.
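To change text, work on an array copy. A small sketch, where capitalize is a hypothetical helper that expects writable storage:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercases the first character in place.
   The caller must pass writable storage, never a string literal. */
void capitalize(char* s) {
    if (s != NULL && s[0] != '\0') {
        /* toupper requires a value representable as unsigned char. */
        s[0] = (char)toupper((unsigned char)s[0]);
    }
}
```

Calling capitalize(a) on the array a from above is fine because a owns its own copy of the characters. Passing a string literal would attempt to write to read-only data.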
Safe String Handling Requires Length Awareness
Many traditional C string functions rely heavily on '\0'.
If the input is untrusted, you must explicitly limit the length.
char buffer[16];
snprintf(buffer, sizeof buffer, "%s", input);
snprintf is aware of the destination buffer's size.
This is inherently safer than unbounded copying.
However, you must still inspect the return value: a result greater than or equal to the buffer size means the output was truncated.
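A minimal sketch of such a check, wrapped in a hypothetical helper named copy_checked:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies `input` into `dst` and reports whether it
   fit completely. snprintf returns the length it would have written
   (excluding '\0'), or a negative value on an encoding error. */
bool copy_checked(char* dst, size_t dst_size, const char* input) {
    int needed = snprintf(dst, dst_size, "%s", input);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

On truncation the buffer still holds a terminated prefix of the input, so the caller can decide whether to reject the input or continue with the shortened text.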
Structs Aggregate Diverse Fields
typedef struct Point {
    int x;
    int y;
} Point;
Usage:
Point p = { .x = 10, .y = 20 };
printf("%d\n", p.x);
A struct merges multiple fields into a single object. It is ideal for representing business entities, protocol parsing results, or resource handle states.
Structs Can Have Padding
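The compiler may insert padding bytes between struct fields, and after the last one, to satisfy alignment requirements.
typedef struct Packet {
    char tag;
    int length;
} Packet;
On a typical platform where int is 4 bytes and 4-byte aligned, the memory layout looks like this: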
offset 0: tag
offset 1-3: padding
offset 4-7: length
Never assume the size of a struct equals the sum of its field sizes. For network transmission or file formats, do not write the raw memory of a struct to the stream; serialize each field explicitly with a defined size and byte order.
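A sketch of field-by-field serialization for the Packet struct. The wire format here (one tag byte followed by a 32-bit big-endian length) is an assumption made up for this example, and packet_serialize is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Packet {
    char tag;
    int length;
} Packet;

/* Assumed wire format for illustration: 1 tag byte followed by a 32-bit
   big-endian length. That is 5 bytes regardless of struct padding. */
#define PACKET_WIRE_SIZE ((size_t)5)

/* Writes the packet field by field. Returns the number of bytes written,
   or 0 if the output buffer is too small. */
size_t packet_serialize(const Packet* p, unsigned char* out, size_t out_size) {
    if (out_size < PACKET_WIRE_SIZE) {
        return 0;
    }
    uint32_t len = (uint32_t)p->length; /* negative values wrap modulo 2^32 */
    out[0] = (unsigned char)p->tag;
    out[1] = (unsigned char)(len >> 24);
    out[2] = (unsigned char)(len >> 16);
    out[3] = (unsigned char)(len >> 8);
    out[4] = (unsigned char)len;
    return PACKET_WIRE_SIZE;
}
```

The output is the same on every platform, whereas sizeof(Packet) and the in-memory byte order depend on the compiler and CPU.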
Struct Pointers Use ->
Point p = {1, 2};
Point* ptr = &p;
printf("%d\n", ptr->x);
ptr->x is syntactic sugar for (*ptr).x.
It dereferences the pointer first, then accesses the field.
If ptr is null or dangling, the access is undefined behavior; a crash is the best-case outcome.
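A small sketch showing both spellings of member access behind a null check. point_translate is a hypothetical helper; note that a check like this catches null pointers only, since a dangling pointer cannot be detected at run time.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Point {
    int x;
    int y;
} Point;

/* Hypothetical helper: moves the point in place and reports success. */
bool point_translate(Point* p, int dx, int dy) {
    if (p == NULL) {
        return false;
    }
    p->x += dx;   /* arrow form */
    (*p).y += dy; /* equivalent explicit dereference */
    return true;
}
```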
Nested Structs
typedef struct Rect {
    Point left_top;
    Point right_bottom;
} Rect;
Nested structs allow data models to mirror real-world objects more closely. However, remain mindful of the overall copying cost and field alignment constraints.
Rect r = {
    .left_top = {0, 0},
    .right_bottom = {100, 100},
};
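To keep the copying cost down, functions can take a nested struct by pointer to const rather than by value. The helpers rect_width and rect_height below are hypothetical and assume that right_bottom lies to the right of and below left_top.

```c
typedef struct Point {
    int x;
    int y;
} Point;

typedef struct Rect {
    Point left_top;
    Point right_bottom;
} Rect;

/* Passing a pointer avoids copying both nested Points on every call. */
int rect_width(const Rect* r) {
    return r->right_bottom.x - r->left_top.x;
}

int rect_height(const Rect* r) {
    return r->right_bottom.y - r->left_top.y;
}
```

For the rectangle r above, both rect_width(&r) and rect_height(&r) evaluate to 100.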
Enums Assign Names to States
typedef enum Status {
    STATUS_OK,
    STATUS_NOT_FOUND,
    STATUS_PERMISSION_DENIED
} Status;
Enums make integer states human-readable.
Status status = STATUS_OK;
Avoid scattering raw "magic numbers" to represent states. Clearer states lead to more robust error handling.
switch and Enums Combined
switch (status) {
    case STATUS_OK:
        break;
    case STATUS_NOT_FOUND:
        break;
    case STATUS_PERMISSION_DENIED:
        break;
}
Compiler warnings can assist in identifying omitted branches. State machines and error codes are perfect use cases for enums.
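A sketch of one common arrangement: list every enumerator as a case, omit the default label so that GCC and Clang (with -Wall, which enables -Wswitch) can warn about a missing case, and handle unexpected values after the switch. status_to_string is a hypothetical helper.

```c
typedef enum Status {
    STATUS_OK,
    STATUS_NOT_FOUND,
    STATUS_PERMISSION_DENIED
} Status;

/* Hypothetical helper: maps each state to a readable label. */
const char* status_to_string(Status status) {
    switch (status) {
        case STATUS_OK:
            return "ok";
        case STATUS_NOT_FOUND:
            return "not found";
        case STATUS_PERMISSION_DENIED:
            return "permission denied";
    }
    /* Reached only if status holds a value outside the enumerators,
       for example after a cast from untrusted input. */
    return "unknown";
}
```

If a new enumerator is added later and this function is not updated, the compiler warning points at the switch, while the fallback keeps run-time behavior defined.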
union for Shared Storage
All members of a union share the same storage.
typedef union Value {
    int i;
    float f;
} Value;
It is useful for saving space or expressing low-level layout details. However, reading a member other than the one last written reinterprets the stored bytes in C (and is undefined behavior in C++), so the result depends on the platform's representation. Beginners should use unions sparingly.
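A frequent way to keep unions manageable is to pair them with an enum tag that records which member is active. The sketch below is one possible shape; ValueKind, TaggedValue, and tagged_to_double are hypothetical names.

```c
typedef enum ValueKind {
    VALUE_INT,
    VALUE_FLOAT
} ValueKind;

typedef union Value {
    int i;
    float f;
} Value;

/* The tag records which union member was written last. */
typedef struct TaggedValue {
    ValueKind kind;
    Value as;
} TaggedValue;

/* Reads only the member that the tag says is active. */
double tagged_to_double(const TaggedValue* v) {
    switch (v->kind) {
        case VALUE_INT:
            return (double)v->as.i;
        case VALUE_FLOAT:
            return (double)v->as.f;
    }
    return 0.0; /* unknown tag: fall back to a neutral value */
}
```

The tag only helps if every write to the union updates kind at the same time, so construction is best kept in a few small functions.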
Data Model Design
Structs and enums combine to form clear models.
typedef enum TokenKind {
    TOKEN_NUMBER,
    TOKEN_PLUS,
    TOKEN_END
} TokenKind;
typedef struct Token {
    TokenKind kind;
    int value;
} Token;
This is significantly more maintainable than a sprawl of flat variables. Data structures should naturally express invariants.
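As an illustration of how such a model is consumed, here is a sketch of a tiny evaluator for token sequences of the form NUMBER (PLUS NUMBER)* END. eval_sum is a hypothetical function, and integer overflow is deliberately ignored to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum TokenKind {
    TOKEN_NUMBER,
    TOKEN_PLUS,
    TOKEN_END
} TokenKind;

typedef struct Token {
    TokenKind kind;
    int value;
} Token;

/* Sums the numbers in the sequence. Returns false on malformed input,
   including a sequence that lacks TOKEN_END within `count` tokens. */
bool eval_sum(const Token* tokens, size_t count, int* result) {
    int total = 0;
    bool expect_number = true;
    for (size_t i = 0; i < count; ++i) {
        switch (tokens[i].kind) {
            case TOKEN_NUMBER:
                if (!expect_number) return false;
                total += tokens[i].value;
                expect_number = false;
                break;
            case TOKEN_PLUS:
                if (expect_number) return false;
                expect_number = true;
                break;
            case TOKEN_END:
                if (expect_number) return false;
                *result = total;
                return true;
            default:
                return false; /* value outside TokenKind */
        }
    }
    return false;
}
```

The kind field decides whether value is meaningful, which is the sort of invariant the struct and enum make visible.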
Engineering Risks
Common risks associated with arrays, strings, structs, and enums:
- Array out-of-bounds access.
- Losing array length via function parameters.
- Missing '\0' in strings.
- Unbounded string copying.
- Leaking sensitive data residing in struct padding.
- Using raw structs directly as cross-platform protocol formats.
- Shallow copying structs containing resource handles.
- Missing default error handling for enums.
These are critical security and stability flaws, not pedantic syntax details. If a struct contains a file handle, an authorization flag, or an external buffer pointer, your design must encompass resource release, permission isolation, and audit logging to ensure a routine copy doesn't transform into a leak or privilege escalation.
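The shallow-copy risk can be sketched with a struct that owns a heap buffer. Plain assignment would copy only the pointer, leaving two objects that both believe they own the same memory. Buffer, buffer_clone, and buffer_free are hypothetical names for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Buffer {
    char* data;  /* owned heap memory, or NULL when size is 0 */
    size_t size;
} Buffer;

/* Deep copy: dst receives its own allocation, so freeing one object
   cannot invalidate the other. Returns false if allocation fails. */
bool buffer_clone(Buffer* dst, const Buffer* src) {
    char* copy = NULL;
    if (src->size > 0) {
        copy = malloc(src->size);
        if (copy == NULL) {
            return false;
        }
        memcpy(copy, src->data, src->size);
    }
    dst->data = copy;
    dst->size = src->size;
    return true;
}

/* Releases the buffer and clears the fields to avoid a dangling pointer. */
void buffer_free(Buffer* b) {
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

After Buffer b = a; followed by buffer_free(&a), b.data would dangle. With buffer_clone, each object can be released independently.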
Summary
Arrays arrange identical types contiguously. Strings use a null-termination convention to represent text. Structs aggregate disparate fields into objects. Enums assign names to states and domains. These foundational data organizations are the prerequisites for subsequent concepts like pointers, memory layout, ABIs, and C++ object models.