Converting 32 Vs 64 bit

Converting 32-bit Applications Into 64-bit Applications: Things to Consider

The principal cause of problems when converting 32-bit applications to 64-bit applications is the change in size of the int type with respect to the long and pointer types. When converting 32-bit programs to 64-bit programs, only long types and pointer types change in size from 32 bits to 64 bits; integers of type int stay at 32 bits in size. This can cause trouble with data truncation when assigning pointer or long types to int types. Also, problems with sign extension can occur when assigning expressions using types shorter than the size of an int to an unsigned long or a pointer. This article discusses how to avoid or eliminate these problems.

Consider the Differences Between the 32-bit and 64-bit Data Models

The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models. The C data-type model for 32-bit applications is the ILP32 model, so named because the int and long types, and pointers, are 32-bit data types. The data-type model for 64-bit applications is the LP64 data model, so named because long and pointer types grow to 64 bits. The remaining C integer types and the floating-point types are the same in both data-type models.

It is not unusual for current 32-bit applications to assume that the int type, long type, and pointers are the same size. Because the size of long and pointer change in the LP64 data model, this change alone is the principal cause of ILP32-to-LP64 conversion problems.

Table 1. 32-bit and 64-bit data models

Use the lint Utility to Detect Problems with 64-bit long and Pointer Types

Use lint to check code that is written for both the 32-bit and the 64-bit compilation environment. Specify the -errchk=longptr64 option to generate LP64 warnings. Also use the -errchk=longptr64 flag which checks portability to an environment for which the size of long integers and pointers is 64 bits and the size of plain integers is 32 bits. The -errchk=longptr64 flag checks assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.

Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the extension of the sign of a signed-integral value in an expression of unsigned-integral type. Use the -xarch=v9 option of lint when you want to check code that you intend to run in the Solaris 64-bit SPARC compilation environment only. Use -xarch=amd64 when you want to check code you intend to run in the x86 64-bit environment.

When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether or not a pointer is involved. The warning message also indicates the sizes of the involved data types. When you know a pointer is involved and you know the size of the data types, you can find specific 64-bit problems and avoid the pre-existing problems between 32-bit and smaller types.

You can suppress the warning for a given line of code by placing a comment of the form "NOTE(LINTED(<optional message>))" on the previous line. This is useful when you want lint to ignore certain lines of code such as casts and assignments. Exercise extreme care when you use the "NOTE(LINTED(<optional message>))" comment because it can mask real problems. When you use NOTE, also include #include<note.h>. Refer to the lint man page for more information.

Check for Changes of Pointer Size With Respect to the Size of Plain Integers

Since plain integers and pointers are the same size in the ILP32 compilation environment, 32-bit code commonly relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. You can cast your pointers to unsigned long because long and pointer types are the same size in both ILP32 and LP64 data-type models. However, rather than explicitly using unsigned long, use uintptr_t instead because it expresses your intent more closely and makes the code more portable, insulating it against future changes. To use the uintptr_t and intptr_t you need to #include <inttypes.h>.

Consider the following example:

char *p;

p = (char *) ((int)p & PAGEOFFSET);

% cc ..

warning: conversion of pointer loses bits

The following version will function correctly when compiled to both 32-bit and 64-bit targets:

char *p;

p = (char *) ((uintptr_t)p & PAGEOFFSET);

Check for Changes in Size of Long Integers With Respect to the Size of Plain Integers

Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32-bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.

Consider the following example:

int waiting;

long w_io;

long w_swap;

...

waiting = w_io + w_swap;

% cc

warning: assignment of 64-bit integer to 32-bit integer

Check for Sign Extensions

Sign extension is a common problem when you convert to the 64-bit compilation environment because the type conversion and promotion rules are somewhat obscure. To prevent sign-extension problems, use explicit casting to achieve the intended results.

To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign extension problems between the 32-bit and the 64-bit compilation environment come into effect during the following operations:

    • Integral promotion

      • You can use a char, short, enumerated type, or bit-field, whether signed or unsigned, in any expression that calls for an integer. If an integer can hold all possible values of the original type, the value is converted to an integer; otherwise, the value is converted to an unsigned integer.

    • Conversion between signed and unsigned integers

      • When an integer with a negative sign is promoted to an unsigned integer of the same or larger type, it is first promoted to the signed equivalent of the larger type, then converted to the unsigned value.

When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.

%cat test.c

struct foo {

unsigned int base:19, rehash:13;

};

main(int argc, char *argv[])

{

struct foo a;

unsigned long addr;

a.base = 0x40000;

addr = a.base << 13; /* Sign extension here! */

printf("addr 0x%lx\n", addr);

addr = (unsigned int)(a.base << 13); /* No sign extension here! */

printf("addr 0x%lx\n", addr);

}

This sign extension occurs because the conversion rules are applied as follows:

  • The structure member a.base is converted from an unsigned int bit field to an int because of the integral promotion rule. In other words, because the unsigned 19-bit field fits within a 32-bit integer, the bit field is promoted to an integer rather than an unsigned integer. Thus, the expression a.base << 13 is of type int. If the result were assigned to an unsigned int, this would not matter because no sign extension has yet occurred.

  • The expression a.base << 13 is of type int, but it is converted to a long and then to an unsigned long before being assigned to addr, because of signed and unsigned integer promotion rules. The sign extension occurs when performing the int to long conversion.

Thus, when compiled as a 64-bit program, the result is as follows:

% cc -o test64 -xarch=v9 test.c

% ./test64

addr 0xffffffff80000000

addr 0x80000000

%

When compiled as a 32-bit program, the size of an unsigned long is the same as the size of an int, so there is no sign extension.

% cc -o test test.c

% ./test

addr 0x80000000

addr 0x80000000

%

Check Structure Packing

Check the internal data structures in an applications for holes; that is, extra padding appearing between fields in the structure to meet alignment requirements. This extra padding is allocated when long or pointer fields grow to 64 bits for the LP64 data-type model, and appear after an int that remains at 32 bits in size. Since long and pointer types are 64-bit aligned in the LP64 data-type model, padding appears between the int and long or pointer type. In the following example, member p is 64-bit aligned, and so padding appears between the member k and member p.

struct bar {

int i;

long j;

int k;

char *p;

}; /* sizeof (struct bar) = 32 bytes */

Also, structures are aligned to the size of the largest member within them. Thus, in the above structure, padding appears between member i and member j.

When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. Consider the following structure definition:

struct bar {

char *p;

long j;

int i;

int k;

}; /* sizeof (struct bar) = 24 bytes */

Check for Unbalanced Size of Union Members

Be sure to check the members of unions because their fields can change size between the ILP32 and the LP64 data-type models, making the size of the members different. In the following union, member _d and member array _l are the same size in the ILP32 model, but different in the LP64 model because long types grow to 64 bits in the LP64 model, but double types do not.

typedef union {

double _d;

long _l[2];

} llx_

The size of the members can be rebalanced by changing the type of the _l array member from type long to type int.

Make Sure Constant Types are Used in Constant Expressions

A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:

The above code can be made to work as intended, by appending the type to the constant, 1, as follows:

int i = 32;

long j = 1 << i; /* j will get 0 because RHS is integer expression */

int i = 32;

long j = 1L << i; /* now j will get 0x100000000, as intended */

Check Format String Conversions

Make sure the format strings for printf(3S), sprintf(3S), scanf(3S), and sscanf(3S) can accommodate long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit compilation environments. For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string.

Also, check to be sure that buffers passed to the first argument in sprintf contain enough storage to accommodate the expanded number of digits used to convey long and pointer values. For example, a pointer is expressed by 8 hex digits in the ILP32 data model but expands to 16 in the LP64 data model.

Type Returned by sizeof() Operator is an unsigned long

In the LP64 data-type model, sizeof() has the effective type of an unsigned long. If sizeof() is passed to a function expecting an argument of type int, or assigned or cast to an int, the truncation could cause a loss of data. This is only likely to be problematic in large database programs containing extremely long arrays.

Use Portable Data Types or Fixed Integer Types for Binary Interface Data

For data structures that are shared between 32-bit and 64-bit versions of an application, stick with data types that have a common size between ILP32 and LP64 programs. Avoid using long data types and pointers. Also, avoid using derived data types that change in size between 32-bit and 64-bit applications. For example, the following types defined in <sys/types.h> change in size between the ILP32 and LP64 data models:

  • clock_t, which represents the system time in clock ticks

  • dev_t, which is used for device numbers

  • off_t, which is used for file sizes and offsets

  • ptrdiff_t, which is the signed integral type for the result of subtracting two pointers

  • size_t, which reflects the size, in bytes, of objects in memory

  • ssize_t, which is used by functions that return a count of bytes or an error indication

  • time_t, which counts time in seconds

Using the derived data types in <sys/types.h> is a good idea for internal data, because it helps to insulate the code from data-model changes. However, preccisely because the size of these types are prone to change with the data model, using them is not recommended in data that is shared between 32-bit and 64-bit applications, or in other situations where the data size must remain fixed. Nevertheless, as with the sizeof() operator discussed above, before making any changes to the code, consider whether the loss of precision will actually have any practical impact on the program.

For binary interface data, consider using the fixed-width integer types in <inttypes.h>. These types are good for explicit binary representations of the following:

  • Binary interface specifications

  • On-disk data

  • Over the data wire

  • Hardware registers

  • Binary data structures

Check for Side Effects

Be aware that a type change in one area can result in an unexpected 64-bit conversion in another area. For example, check all the callers of a function that previously returned an int and now returns an ssize_t.

Consider the Effect of long Arrays on Performance

Large arrays of long or unsigned long types, can cause serious performance degradation in the LP64 data-type model as compared to arrays of int or unsigned int types. Large arrays of long types cause significantly more cache misses and consume more memory. Therefore, if int works just as well as long for the application purposes, it's better to use int rather than long. This is also an argument for using arrays of int types instead of arrays of pointers. Some C applications suffer from serious performance degradation after conversion to the LP64 data-type model because they rely on many, large, arrays of pointers.