Parsing Numbers from Strings in 5 Languages

In this series, we look at common practices and paradigms in 5 popular programming and scripting languages (JavaScript, Node.js, C++, Java, and PHP). We've separated JavaScript and Node.js to cover both ES5 (browser-safe) and ES6 (Node-safe) features. For some entries in the series, a given language may not support a topic, in which case we'll attempt to find the closest analogue. This series is aimed at readers with existing programming experience and a basic understanding of the topics covered. It will not serve as an in-depth guide; instead, it provides a brief overview for each language, along with links to additional resources for more information.

In this post we're going to look at the functions which can parse numeric values from strings in our 5 languages.

JavaScript (Including Node.js)

JavaScript provides the global parseInt() and parseFloat() functions to extract numbers from strings. It's worth noting, before going any further, that all numbers in JavaScript are of the same type (Number), and the distinction here between integers and floating point numbers only applies to the string representations you are parsing.

Both functions will extract leading numeric values from string content (skipping any leading white space and accepting a leading sign), and will ignore any non-numeric characters which follow. If the string does not start with a numeric value (irrespective of whether or not numeric characters appear later in the string), both functions will return NaN (not a number). However, these functions are not quite as simple as they might first appear.
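
Because a failed parse yields NaN rather than throwing an error, calling code usually needs to test the result before using it. The following is a minimal sketch, assuming a hypothetical helper named parseIntOr (it is not part of JavaScript itself) which substitutes a fallback value when no leading integer can be extracted:

```javascript
// Hypothetical helper: parse a leading base-10 integer, or return a fallback.
function parseIntOr(text, fallback) {
  var value = parseInt(text, 10);
  // parseInt() always returns a number, so the global isNaN() is safe to use here.
  return isNaN(value) ? fallback : value;
}

console.log(parseIntOr("42 foo", 0)); // 42
console.log(parseIntOr("foo 42", 0)); // 0
```

Whether a fallback value or an explicit error is more appropriate depends on the calling code; the point is only that NaN should be checked for rather than assumed away.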

When using parseInt(), a second argument indicating the base of the number being parsed should always be used. If this is not provided, it will usually default to base 10 (decimal), but this is not guaranteed. Bases from 2 to 36 can be used. For hexadecimal values, the second argument can be omitted if the number to be parsed is prefixed with 0x (in older versions of JavaScript, numbers prefixed with a 0 could also be recognised as octal, but ES5 removed this feature):

parseInt("11", 2);  // 3 (parsed as binary)
parseInt("11", 8);  // 9 (parsed as octal)
parseInt("11", 10); // 11 (parsed as decimal)
parseInt("11", 16); // 17 (parsed as hexadecimal)
parseInt("0x11");   // 17 (parsed as hexadecimal)
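
One well-known case where an unintended base slips in is passing parseInt directly to Array.prototype.map(), which calls its callback with the element, its index, and the array. The sketch below (the variable names are purely illustrative) shows the index being treated as the base, alongside a wrapper which pins the base to 10:

```javascript
var inputs = ["10", "10", "10"];

// map() passes (element, index, array), so the index becomes the base:
// parseInt("10", 0), parseInt("10", 1), parseInt("10", 2)
var surprising = inputs.map(parseInt);

// Wrapping the call pins the base to 10 for every element.
var expected = inputs.map(function (text) {
  return parseInt(text, 10);
});

console.log(surprising); // [10, NaN, 2]
console.log(expected);   // [10, 10, 10]
```

A base of 0 is treated as if no base was given, a base of 1 is invalid (hence NaN), and a base of 2 parses "10" as binary.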

This considered, it's worth remembering that, depending on the base, the term numeric values may not refer to the digits 0 through 9, but may only be a subset of that range, or may also include case-insensitive alphabetic letters:

parseInt("12", 2);  // 1 (1 is the only numeric part of this string)
parseInt("2F", 16); // 47 (2 and F are both valid numeric values here)

In addition, the parseFloat() function can recognise numbers starting with a period, which it will interpret as 0.[number], and can recognise exponential values in the format [number]e[exponent]--if parseInt() was used on this pattern it would only interpret up to the e. parseFloat() can also interpret the Infinity keyword:

parseFloat(".5");       // 0.5
parseFloat("10e5");     // 1000000
parseFloat("Infinity"); // Infinity
parseInt("10e5", 10);   // 10 (up to the e)

The code snippet below demonstrates the use of these functions on a wide range of values. The comment after each function call indicates the return value:


// parseInt() simple usage
parseInt("-42", 10); // -42
parseInt("0",   10); // 0
parseInt("42",  10); // 42

// parseFloat() simple usage
parseFloat("-42.5"); // -42.5
parseFloat("0");     // 0
parseFloat("42.5");  // 42.5

// Mixed content
parseInt("42 foo", 10); // 42
parseInt("foo 42", 10); // NaN
parseFloat("42.5 foo"); // 42.5
parseFloat("foo 42.5"); // NaN
parseInt("42.5", 10);   // 42 (up to the period)
parseFloat("42");       // 42

// Hexadecimal values
parseInt("0xFF");     // 255 (0x indicated base 16)
parseInt("-0xFF");    // -255 (0x indicated base 16)
parseInt("FF", 16);   // 255 (base 16 is explicit, no need for leading 0x)
parseInt("0xFF", 10); // 0 (base 10 is explicit, parsed up to the x)
parseFloat("0xFF");   // 0 (up to the x)
parseFloat("-0xFF");  // -0 (up to the x)...and yes, -0 is a thing

// Exponential values
parseFloat("10e5");    // 1000000
parseFloat("-10e5");   // -1000000
parseInt("10e5", 10);  // 10 (up to the e)
parseInt("-10e5", 10); // -10 (up to the e)

// Infinity
parseFloat("Infinity");  // Infinity
parseFloat("-Infinity"); // -Infinity
parseInt("Infinity");    // NaN
parseInt("-Infinity");   // NaN

// Leading period
parseFloat(".5");  // 0.5
parseFloat("-.5"); // -0.5
parseInt(".5");    // NaN

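The global Number() function (the Number constructor called without the new keyword) can also be used to convert strings to numbers, and handles the formats covered by both parseInt() and parseFloat(). Unlike those functions, it converts the string as a whole rather than extracting a leading number, so any non-numeric content (other than surrounding white space) results in NaN: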

Number("42");       // 42
Number("-42.5");    // -42.5
Number("Infinity"); // Infinity
Number(".5");       // 0.5
Number("0xFF");     // 255
Number("10e5");     // 1000000
Number("foo 42");   // NaN
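
Note that Number() converts the string as a whole: trailing non-numeric content produces NaN, while an empty or white-space-only string produces 0. The following is a minimal sketch, assuming a hypothetical helper named toStrictNumber (not a built-in), which treats both of those cases as failures:

```javascript
// Hypothetical helper: convert a whole string to a number, or return NaN.
function toStrictNumber(text) {
  // Number("") and Number("   ") are both 0, so reject blank input explicitly.
  if (typeof text !== "string" || text.trim() === "") {
    return NaN;
  }
  return Number(text);
}

console.log(Number("42 foo"));         // NaN
console.log(parseFloat("42 foo"));     // 42
console.log(Number(""));               // 0
console.log(toStrictNumber(""));       // NaN
console.log(toStrictNumber(" 42.5 ")); // 42.5
```

This kind of whole-string conversion tends to suit validating user input, whereas parseInt() and parseFloat() suit extracting a number from a value such as "42px".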

There are no interface differences for these methods between ES5 and ES6, which is why we have bundled both JavaScript and Node.js into one section.

C++

The C++ string library provides a number of string parsing functions (in the std namespace, available since C++11), including:

Function        Use
std::stoi()     Parse int from string
std::stol()     Parse long from string
std::stoll()    Parse long long from string
std::stof()     Parse float from string
std::stod()     Parse double from string
std::stold()    Parse long double from string

To use these functions, you must include the string header:

#include <string>

All of these functions will extract leading numeric values from string content, and will ignore any non-numeric characters which follow. If a string starts with a non-numeric character (irrespective of whether or not numeric characters appear later in the string), all functions will throw a std::invalid_argument exception. If the parsed number is outside the range of a function's return type (e.g. parsing a value larger than 2147483647 with std::stoi() on a platform where int is 32 bits), a std::out_of_range exception will be thrown. Additionally, white space at the leading edge of a string will be ignored:

std::stoi("   42"); // 42 (white space ignored)

All of these functions accept a second, optional argument which can be used to obtain the index of the first character following the parsed number. This argument should be a pointer to a std::size_t variable (or nullptr, the default, if the index is not needed):

std::size_t next_index;
std::string value = "42 foo";

std::stoi(value, &next_index); // 42
std::cout << next_index << std::endl; // 2 (the number value occupied indexes 0 and 1)

When using the integer functions (stoi(), stol() and stoll()), an optional third argument can be passed to specify the base of the number being parsed. If this argument is not used, the base will default to 10 (decimal):

std::stoi("11", nullptr, 2);  // 3 (parsed as binary)
std::stoi("11", nullptr, 8);  // 9 (parsed as octal)
std::stoi("11", nullptr, 10); // 11 (parsed as decimal)
std::stoi("11", nullptr, 16); // 17 (parsed as hexadecimal)

This considered, it's worth remembering that, depending on the base, the term numeric values may not refer to the digits 0 through 9, but may only be a subset of that range, or may also include case-insensitive alphabetic letters:

std::stoi("12", nullptr,  2); // 1 (1 is the only numeric part of this string)
std::stoi("2F", nullptr, 16); // 47 (2 and F are both valid numeric values here)

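When the base is 0, the integer functions detect the base from the string itself: a leading 0 indicates octal and a leading 0x indicates hexadecimal. When the base is explicitly 8 or 16, the matching prefix is accepted but not required. With the default base of 10, neither prefix is given any special treatment: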

std::stoi("011");             // 11 (base is implicitly 10, leading 0 ignored)
std::stoi("011", nullptr, 0); // 9 (parsed as octal)
std::stoi("011", nullptr, 8); // 9 (parsed as octal)

std::stoi("0xFF");              // 0 (base is implicitly 10, parsed up to the x)
std::stoi("0xFF", nullptr, 0);  // 255 (parsed as hexadecimal)
std::stoi("0xFF", nullptr, 16); // 255 (parsed as hexadecimal)

The floating point functions (stof(), stod(), stold()) can recognise numbers starting with a period, which they will interpret as 0.[number], and can recognise exponential values in the format [number]e[exponent]--if an integer parsing function was used on this pattern it would only interpret up to the e:

std::stof(".5");   // 0.5
std::stof("10e5"); // 1000000
std::stoi("10e5"); // 10 (up to the e)

The floating point functions can also parse INF and INFINITY (in any letter case), returning the return type's representation of infinity.

The code snippet below demonstrates the use of the stoi() and stof() functions on a wide range of values. The comment after each function call indicates the return value:

// stoi() simple values
std::stoi("-42"); // -42
std::stoi("0");   // 0
std::stoi("42");  // 42

// stof() simple values
std::stof("-42.5"); // -42.5
std::stof("0");     // 0
std::stof("42.5");  // 42.5

// Mixed content
std::stoi("42 foo");   // 42
std::stoi("foo 42");   // invalid_argument exception thrown
std::stof("42.5 foo"); // 42.5
std::stof("foo 42.5"); // invalid_argument exception thrown
std::stoi("42.5");     // 42 (up to the period)
std::stof("42");       // 42
std::stoi("    42");   // 42 (leading white space ignored)
std::stof("  42.5");   // 42.5 (leading white space ignored)

// Non-decimal values
std::stoi("FF", nullptr, 16);  // 255
std::stoi("-FF", nullptr, 16); // -255
std::stoi("FF", nullptr, 10);  // invalid_argument exception thrown
std::stoi("FF");               // invalid_argument exception thrown (base defaults to 10)
std::stoi("1F");               // 1 (up to the F)

// Exponential Values
std::stof("10e5");  // 1000000
std::stof("-10e5"); // -1000000
std::stoi("10e5");  // 10 (up to the e)
std::stoi("-10e5"); // -10 (up to the e)

// Leading period
std::stof(".5");  // 0.5
std::stof("-.5"); // -0.5
std::stoi(".5");  // invalid_argument exception thrown

// Range overrun
std::stoi("2147483647"); // 2147483647 (maximum positive value for a signed int)
std::stoi("2147483648"); // out_of_range exception thrown (1 larger than maximum positive value for a signed int)

Java

Java's primitive number object wrappers (e.g. Integer, Float) provide static string parsing methods which return primitive values. The range of parsing methods available includes:

Function                Use
Short.parseShort()      Parse short string representation
Integer.parseInt()      Parse int string representation
Long.parseLong()        Parse long string representation
Float.parseFloat()      Parse float string representation
Double.parseDouble()    Parse double string representation

When compared to the parsing methods available in other languages (C++, for example), Java's methods are a little hampered in that the strings you wish to parse can only contain numeric values (i.e. these methods will not extract leading numbers from a larger string). If a string contains non-numeric characters, a NumberFormatException will be thrown:

Integer.parseInt("42 foo"); // throws NumberFormatException

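If you need to parse a leading number from a larger string, you can instead use the parse() method of a java.text.NumberFormat instance. This method returns a Number object, from which you can get a primitive value with the object's [primitiveType]Value() methods. Note that parse() throws the checked ParseException if the string does not start with a number, and that the instances returned here use the default locale, so the examples below assume a locale which uses a period as the decimal separator: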

// Parse any leading number
NumberFormat.getInstance().parse("42.5 foo").floatValue(); // 42.5

// Parse a leading integer
NumberFormat.getIntegerInstance().parse("42.5 foo").intValue(); // 42 (up to the period)

When using the integer functions (Short.parseShort(), Integer.parseInt(), Long.parseLong()), an optional second argument can be passed to specify the base of the number being parsed. If this argument is not used, the base will default to 10 (decimal):

Integer.parseInt("11", 2);  // 3 (parsed as binary)
Integer.parseInt("11", 8);  // 9 (parsed as octal)
Integer.parseInt("11", 10); // 11 (parsed as decimal)
Integer.parseInt("11", 16); // 17 (parsed as hexadecimal)

This considered, it's worth remembering that, depending on the base, the term numeric values may not refer to the digits 0 through 9, but may only be a subset of that range, or may also include case-insensitive alphabetic letters:

Integer.parseInt("12",  2); // throws NumberFormatException (2 is not a binary digit, and the whole string must be valid)
Integer.parseInt("2F", 16); // 47 (2 and F are both valid numeric values here)

The integer parsing methods do not recognise base prefixes: a leading zero is treated as an ordinary digit (it never switches the parse to octal), and a leading 0x causes a NumberFormatException, even when the base is explicitly set to 16. However, the Integer.decode() method can interpret these prefixes:

Integer.decode("011");  // 9 (parsed as octal)
Integer.decode("11");   // 11 (parsed as decimal)
Integer.decode("0x11"); // 17 (parsed as hexadecimal)

The floating point functions (Float.parseFloat(), Double.parseDouble()) can recognise numbers starting with a period, which they will interpret as 0.[number], and can recognise exponential values in the format [number]e[exponent]:

Float.parseFloat(".5");   // 0.5
Float.parseFloat("10e5"); // 1000000.0

The code snippet below demonstrates the use of the Integer.parseInt(), Float.parseFloat(), and Integer.decode() functions, and the NumberFormat instance parse() method on a wide range of values. The comment after each function call indicates the return value:

// Integer.parseInt() simple values
Integer.parseInt("-42"); // -42
Integer.parseInt("0");   // 0
Integer.parseInt("42");  // 42

// Float.parseFloat simple values
Float.parseFloat("-42.5"); // -42.5
Float.parseFloat("0");     // 0.0
Float.parseFloat("42.5");  // 42.5

// Mixed content (primitive wrappers)
Integer.parseInt("42 foo");   // NumberFormatException thrown
Float.parseFloat("42.5 foo"); // NumberFormatException thrown

// Mixed content (NumberFormat)
NumberFormat.getIntegerInstance().parse("42 foo").intValue(); // 42
NumberFormat.getIntegerInstance().parse("foo 42").intValue(); // ParseException thrown
NumberFormat.getIntegerInstance().parse("42.5").intValue();   // 42 (up to the period)
NumberFormat.getInstance().parse("42.5 foo").floatValue();    // 42.5
NumberFormat.getInstance().parse("foo 42.5").floatValue();    // ParseException thrown
NumberFormat.getInstance().parse("42").floatValue();          // 42.0

// Non-decimal values (Integer.parseInt)
Integer.parseInt("FF",  16); // 255
Integer.parseInt("-FF", 16); // -255
Integer.parseInt("FF",  10); // NumberFormatException thrown
Integer.parseInt("FF");      // NumberFormatException thrown (base defaults to 10)
Integer.parseInt("1F");      // NumberFormatException thrown (F is not a base 10 numeric value)

// Non-decimal values (Integer.decode)
Integer.decode("011");  // 9 (parsed as octal)
Integer.decode("0xFF"); // 255 (parsed as hexadecimal)
Integer.decode("11");   // 11 (parsed as decimal)
Integer.decode("FF");   // NumberFormatException thrown (decimal is implicit, not a numeric value)

// Exponential Values
Float.parseFloat("10e5");  // 1000000.0
Float.parseFloat("-10e5"); // -1000000.0
Integer.parseInt("10e5");  // NumberFormatException thrown

// Leading period
Float.parseFloat(".5");  // 0.5
Float.parseFloat("-.5"); // -0.5

PHP

PHP provides the global intval() and floatval() functions to extract numbers from strings. It's worth noting that PHP also provides a doubleval() parsing function, but as floats and doubles are actually the same type in PHP (they're all doubles under the hood), this function is just an alias of the floatval() function.

Both of these functions will extract leading numeric values from string content, and will ignore any non-numeric characters which follow. If a string starts with a non-numeric character (irrespective of whether or not numeric characters appear later in the string), both functions will return 0 (in most circumstances; PHP's string conversion rules apply). Additionally, white space at the leading edge of a string will be ignored:

intval("    42"); // 42 (leading white space is ignored)

When using intval(), a second, optional argument indicating the base of the number being parsed can be used. If this is not used, the base will default to 10 (decimal):

intval("11", 2);  // 3 (parsed as binary)
intval("11", 8);  // 9 (parsed as octal)
intval("11", 10); // 11 (parsed as decimal)
intval("11", 16); // 17 (parsed as hexadecimal)

This considered, it's worth remembering that, depending on the base, the term numeric values may not refer to the digits 0 through 9, but may only be a subset of that range, or may also include case-insensitive alphabetic letters:

intval("12",  2); // 1 (1 is the only numeric part of this string)
intval("2F", 16); // 47 (2 and F are both valid numeric values here)

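When the base is 0, intval() detects the base from the string itself: a leading 0 indicates octal and a leading 0x indicates hexadecimal. When the base is explicitly 8 or 16, the matching prefix is accepted but not required. With the default base of 10, neither prefix is given any special treatment: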

intval("011");    // 11 (base is implicitly 10, leading 0 ignored)
intval("011", 0); // 9 (parsed as octal)
intval("011", 8); // 9 (parsed as octal)

intval("0xFF");     // 0 (up to the x)
intval("0xFF", 0);  // 255 (parsed as hexadecimal)
intval("0xFF", 16); // 255 (parsed as hexadecimal)

The floatval() function can recognise numbers starting with a period, which it will interpret as 0.[number], and can recognise exponential values in the format [number]e[exponent]. Before PHP 7.1, intval() would only interpret this pattern up to the e; from PHP 7.1 onwards, intval() with base 10 also respects the exponent:

floatval(".5");   // 0.5
floatval("10e5"); // 1000000
intval("10e5");   // 10 (up to the e) before PHP 7.1; 1000000 from PHP 7.1 onwards

The code snippet below demonstrates the use of the intval() and floatval() functions on a wide range of values. The comment after each function call indicates the return value:

// intval() simple values
intval("-42"); // -42
intval("0");   // 0
intval("42");  // 42

// floatval() simple values
floatval("-42.5"); // -42.5
floatval("0");     // 0
floatval("42.5");  // 42.5

// Mixed content
intval("42 foo");     // 42
intval("foo 42");     // 0 (cannot be parsed)
floatval("42.5 foo"); // 42.5
floatval("foo 42.5"); // 0 (cannot be parsed)
intval("42.5");       // 42 (up to the period)
floatval("42");       // 42
intval("    42");     // 42 (leading white space ignored)
floatval("  42.5");   // 42.5 (leading white space ignored)

// Non-decimal values
intval("FF", 16);  // 255
intval("-FF", 16); // -255
intval("FF", 10);  // 0 (cannot be parsed)
intval("FF");      // 0 (cannot be parsed)
intval("1F");      // 1 (up to the F)

// Exponential Values
floatval("10e5");  // 1000000
floatval("-10e5"); // -1000000
intval("10e5");    // 10 (up to the e) before PHP 7.1; 1000000 from PHP 7.1 onwards
intval("-10e5");   // -10 (up to the e) before PHP 7.1; -1000000 from PHP 7.1 onwards

// Leading period
floatval(".5");  // 0.5
floatval("-.5"); // -0.5
