Does a dot have to be escaped in a character class (square brackets) of a regular expression?

A dot . in a regular expression matches any single character. In order for regex to match a dot, the dot has to be escaped: \.

It has been pointed out to me that inside square brackets [] a dot does not have to be escaped. For example, the expression [.]{3} would match the string "...".

Doesn't it, really? And if so, is it true for all regex standards?
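To make the claim concrete, here is a minimal sketch that checks it against one flavor only: the ECMAScript grammar that std::regex uses by default. The helper names is_three_dots and is_three_dots_escaped are made up for illustration, and other regex flavors would need to be checked separately.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is exactly three literal dots.
// The dot sits inside [...] and carries no backslash.
bool is_three_dots(const std::string& s) {
    static const std::regex unescaped("[.]{3}");
    return std::regex_match(s, unescaped);
}

// Same check with the dot escaped outside a character class.
bool is_three_dots_escaped(const std::string& s) {
    static const std::regex escaped("\\.{3}");
    return std::regex_match(s, escaped);
}
```

In this flavor both helpers accept "..." and reject "abc", so the unescaped dot inside the brackets is treated as a literal.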

Using the correct, or preferable, not equal operator in MySQL

Which of the two (semantically equivalent) ways is preferable to test for inequality?

  1. 'foo' != 'bar' (exclamation mark and equals sign)
  2. 'foo' <> 'bar' (less than and greater than chevron symbols together)

The MySQL documentation clearly indicates that there is no difference between them and yet some people seem to be attached to only doing it one way or the other. Maybe this is just another pointless vi vs. emacs debate but when other people are reading your code (and therefore your queries), it's useful to maintain some consistency.

<> looks a lot like <=> which is a very underused operator but could perhaps lead to confusion at a quick glance since the two are nearly opposite (except for the obvious NULL cases).

Gnu C++ macro __cplusplus standard conform?

The Gnu C++ compiler seems to define __cplusplus to be 1

#include <iostream>
int main() {
  std::cout << __cplusplus << std::endl;
}

This prints 1 with gcc in standard c++ mode, as well as in C++0x mode, with gcc 4.3.4, and gcc 4.7.0.

The C++11 FDIS says in "16.8 Predefined macro names [cpp.predefined]" that

The name __cplusplus is defined to the value 201103L when compiling a C++ translation unit. (Footnote: It is intended that future versions of this standard will replace the value of this macro with a greater value. Non-conforming compilers should use a value with at most five decimal digits.)

The old std C++03 had a similar rule.

Is GCC deliberately setting this to 1, because it is "non-conforming"?

By reading through that list I thought that I could use __cplusplus to check in a portable way if I have a C++11 enabled compiler. But with g++ this does not seem to work. I know about the ...EXPERIMENTAL... macro, but got curious why g++ is defining __cplusplus this way.

My original problem was to switch between different null-pointer variants. Something like this:

#if __cplusplus > 201100L
#  define MYNULL nullptr
#else
#  define MYNULL NULL
#endif

Is there a simple and reasonably portable way to implement such a switch?

What are the WONTFIX bugs on GNU/Linux and how to work around them?

Both Linux and the GNU userspace (glibc) seem to have a number of "WONTFIX" bugs, i.e. bugs which the responsible parties have declared their unwillingness to fix despite clearly violating the requirements of ISO C and/or POSIX, but I'm unaware of any resource for programmers which lists such bugs and suggestions for working around them.

Here are a few that come to mind:

  • The Linux UDP select bug: select (and related interfaces) flag a UDP socket file descriptor ready for reading as soon as a packet has been received, without confirming the checksum. On subsequent recv/read/etc., if the checksum was invalid, the call will block. Working around this requires always setting UDP sockets to non-blocking mode and dealing with the EWOULDBLOCK condition. If I remember correctly, MaraDNS was the first notable project affected by this bug and the first to complain (unsuccessfully) to have it fixed. Note: As pointed out by Martin v. Löwis, apparently this bug has since been fixed. Workarounds are probably only necessary if you need to support really outdated versions of Linux.
  • The printf family in the GNU C library wrongly treats arguments to %s as multibyte character strings instead of byte strings when a field precision (as in %.3s) is specified, potentially causing truncated output. I know of no workaround except replacing the whole printf subsystem (or simply not using the printf family of functions with non-multibyte-character byte strings, but this can be problematic if you want to process legacy-codepage strings using snprintf while in a UTF-8 locale).
  • Wrong errno result codes for certain syscalls (can't remember which ones right off). Usually these are easy enough to check for if you just read the GNU/Linux man pages and compare them to the standard. (I cannot find the references for this and perhaps I am mistaken. The closest I can find is the issue of ENOTSUP and EOPNOTSUPP having the same value; see PDTR 24715.)
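The workaround described for the UDP select issue (non-blocking sockets plus handling EWOULDBLOCK) could be sketched roughly as follows. This is only an illustration under the assumption of a POSIX system; RecvStatus, set_nonblocking and recv_nonblocking are hypothetical names, not part of any library.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

// Put a descriptor into non-blocking mode; returns false on failure.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical result type for the wrapper below.
enum class RecvStatus { Data, WouldBlock, Error };

// Calls recv() once. If select() reported the socket readable but no
// datagram is actually delivered, the call reports WouldBlock instead of hanging.
RecvStatus recv_nonblocking(int fd, void* buf, std::size_t len, ssize_t& received) {
    received = recv(fd, buf, len, 0);
    if (received >= 0) return RecvStatus::Data;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return RecvStatus::WouldBlock;
    return RecvStatus::Error;
}
```

A caller would treat WouldBlock as "go back to select" rather than as an error.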

What are some more bugs and workarounds we can add to this list? My goals in asking this question are:

  1. To build a more complete list of such bugs so that both new and experienced programmers can quickly become aware of potential issues that could arise when running an intended-to-be-portable program on GNU/Linux.
  2. To leverage the SO collective brain to think up clever and unobtrusive standard workarounds for as many such bugs as possible, instead of everyone having to invent their own workarounds after getting stung, and possibly doing so in suboptimal, ugly, or hackish ways - or worse yet, in ways that break support for more-conformant systems.

I am trying to maintain code that compiles on lots of different systems. I've seen a dozen different ways of asking for lseek that takes 64-bits. Some systems use lseek64, some use lseeko, some require that you define _FILE_OFFSET_BITS=64, and now I just found a new one that requires that you define __USE_FILE_OFFSET64.

Is there any standard to all of this?

Can a destructor be recursive?

Is this program well-defined, and if not, why exactly?

#include <iostream>
#include <new>
struct X {
    int cnt;
    X (int i) : cnt(i) {}
    ~X() {  
            std::cout << "destructor called, cnt=" << cnt << std::endl;
            if ( cnt-- > 0 )
                this->X::~X(); // explicit recursive call to dtor
    }
};
int main()
{
    char* buf = new char[sizeof(X)];
    X* p = new(buf) X(7);
    p->X::~X();  // explicit call to dtor
    delete[] buf;
}

My reasoning: although invoking a destructor twice is undefined behavior, per 12.4/14, what it says exactly is this:

the behavior is undefined if the destructor is invoked for an object whose lifetime has ended

Which does not seem to prohibit recursive calls. While the destructor for an object is executing, the object's lifetime has not yet ended, thus it's not UB to invoke the destructor again. On the other hand, 12.4/6 says:

After executing the body [...] a destructor for class X calls the destructors for X's direct members, the destructors for X's direct base classes [...]

which means that after the return from a recursive invocation of a destructor, all member and base class destructors will have been called, and calling them again when returning to the previous level of recursion would be UB. Therefore, a class with no base and only POD members can have a recursive destructor without UB. Am I right?

On the std::abs function

Is the std::abs() function well defined for ALL arithmetic types in C++11 and will return |x| with no problem of approximation?

A weird thing is that with g++4.7, std::abs(char), std::abs(short int), std::abs(int), std::abs(long int) and std::abs(long long int) seem to return a double (on the contrary of : ). And if the number is cast to a double, we could have some approximation error for very large numbers (like -9223372036854775806LL = -(2^63-2)).

So do I have the guarantee that std::abs(x) will always return |x| for all arithmetic types ?
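The approximation concern can be illustrated with a small sketch. It assumes a 64-bit long long and an IEEE 754 double, and the helper names survives_double and exact_abs are invented for this example.

```cpp
#include <cstdlib>

// True if `x` comes back unchanged after a round trip through double.
bool survives_double(long long x) {
    return static_cast<long long>(static_cast<double>(x)) == x;
}

// <cstdlib> declares a long long overload of std::abs (since C++11),
// so this call stays in integer arithmetic.
long long exact_abs(long long x) {
    return std::abs(x);
}
```

Under those assumptions, survives_double(-9223372036854775806LL) is false because the nearest double is -2^63, while exact_abs returns the exact magnitude.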

EDIT : here is an example program to make some tests

#include <iostream>
#include <iomanip>
#include <cmath>
#include <typeinfo>

template<typename T>
void abstest(T x)
{
    static const unsigned int width = 16;
    const T val = x;
    if (sizeof(val) == 1) {
        std::cout<<std::setw(width)<<static_cast<int>(val)<<" ";
        std::cout<<std::setw(width)<<static_cast<int>(std::abs(val))<<" ";
    } else {
        std::cout<<std::setw(width)<<val<<" ";
        std::cout<<std::setw(width)<<static_cast<T>(std::abs(val))<<" ";
    }
    std::cout<<std::setw(width)<<sizeof(val)<<" ";
    std::cout<<std::setw(width)<<sizeof(std::abs(val))<<" ";
    std::cout<<std::setw(width)<<typeid(val).name()<<" ";
    std::cout<<std::endl; // end the row for this type
}

int main()
{
    double ref = -100000000000;
    abstest<short int>(ref);
    abstest<long int>(ref);
    abstest<long long int>(ref);
    abstest<signed char>(ref);
    abstest<signed short int>(ref);
    abstest<signed int>(ref);
    abstest<signed long int>(ref);
    abstest<signed long long int>(ref);
    abstest<unsigned char>(ref);
    abstest<unsigned short int>(ref);
    abstest<unsigned int>(ref);
    abstest<unsigned long int>(ref);
    abstest<unsigned long long int>(ref);
    abstest<long double>(ref);
    return 0;
}

RegEx to parse or validate Base64 data

Is it possible to use a RegEx to validate, or sanitize Base64 data? That's the simple question, but the factors that drive this question are what make it difficult.

I have a Base64 decoder that can not fully rely on the input data to follow the RFC specs. So, the issues I face are issues like perhaps Base64 data that may not be broken up into 78 (I think it's 78, I'd have to double check the RFC, so don't ding me if the exact number is wrong) character lines, or that the lines may not end in CRLF; in that it may have only a CR, or LF, or maybe neither.

So, I've had a hell of a time parsing Base64 data formatted as such. Due to this, examples like the following become impossible to decode reliably. I will only display partial MIME headers for brevity.

Content-Transfer-Encoding: base64


Ok, so parsing that is no problem, and is exactly the result we would expect. And in 99% of the cases, using any code to at least verify that each char in the buffer is a valid base64 char, works perfectly. But, the next example throws a wrench into the mix.

Content-Transfer-Encoding: base64

This is a version of Base64 encoding that I have seen in some viruses and other things that attempt to take advantage of some mail readers' desire to parse MIME at all costs, versus ones that go strictly by the book, or rather the RFC, if you will.

My Base64 decoder decodes the second example to the following data stream. And keep in mind here, the original stream is all ASCII data!


Anyone have a good way to solve both problems at once? I'm not sure it's even possible, outside of doing two transforms on the data with different rules applied, and comparing the results. However if you took that approach, which output do you trust? It seems that ASCII heuristics is about the best solution, but how much more code, execution time, and complexity would that add to something as complicated as a virus scanner, which this code is actually involved in? How would you train the heuristics engine to learn what is acceptable Base64, and what isn't?


Due to the number of views this question continues to get, I've decided to post the simple RegEx that I've been using in a C# application for 3 years now, with hundreds of thousands of transactions. Honestly, I like the answer given by Gumbo the best, which is why I picked it as the selected answer. But to anyone using C#, and looking for a very quick way to at least detect whether a string, or byte[] contains valid Base64 data or not, I've found the following to work very well for me.


And yes, this is just for a STRING of Base64 data, NOT a properly formatted RFC1341 message. So, if you are dealing with data of this type, please take that into account before attempting to use the above RegEx. If you are dealing with Base16, Base32, Radix or even Base64 for other purposes (URLs, file names, XML Encoding, etc.), then it is highly recommended that you read RFC4648 that Gumbo mentioned in his answer, as you need to be well aware of the charset and terminators used by the implementation before attempting to use the suggestions in this question/answer set.
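For readers who want something concrete to experiment with, here is a hedged sketch of a check for a bare Base64 string. It is not the C# expression referred to in this post (which is not reproduced here). It assumes the standard alphabet from RFC 4648, "=" padding, and no line breaks, and the name looks_like_base64 is made up for illustration.

```cpp
#include <regex>
#include <string>

// Accepts zero or more complete 4-character groups, optionally followed by
// one padded group ("xx==" or "xxx="). Anything else, including CR/LF, is rejected.
bool looks_like_base64(const std::string& s) {
    static const std::regex pattern(
        "^(?:[A-Za-z0-9+/]{4})*"
        "(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");
    return std::regex_match(s, pattern);
}
```

Input that is wrapped into lines, as in a MIME body, would have to be unwrapped first, which is exactly the part that the lenient and strict readers disagree about.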

Valid characters for URI schemes?

I was thinking about Registering an Application to a URL Protocol and I'd like to know, what characters are allowed in a scheme?

Some examples:

  • h323 (has numbers)
    • h323:[<user>@]<host>[:<port>][;<parameters>]
  • z39.50r (has a . as well)
    • z39.50r://<host>[:<port>]/<database>?<docid>[;esn=<elementset>][;rs=<recordsyntax>]
  • paparazzi:http (has a :)
    • paparazzi:http:[//<host>[:[<port>][<transport>]]/

So, what characters can I fancy using?
Can we have...

  • @:TwitterUser
  • #:HashTag
  • $:CapitalStock
  • ?:ID-10T

...etc., as desired, or are the characters in the scheme restricted by a standard?

Should I learn XML 1.0 or XML 1.1?

I know that a well-formed XML 1.1 is not necessarily a well-formed XML 1.0 and vice-versa.

I want to learn XML formally and I was wondering whether I should learn XML 1.0 or XML 1.1. Which of the two would be more effective to learn?

Of course I know it's best to read them both, but I really only have the time to read one of them, so which would be "better" (more useful to me, as in the average programmer)?

How universally is C99 supported?

How universally is the C99 standard supported in today's compilers? I understand that not even GCC fully supports it. Is this right?

Which features of C99 are supported more than others, i.e. which can I use to be quite sure that most compilers will understand me?

Narrowing conversion to bool in list-initialization - strange behaviour

Consider this piece of C++11 code:

#include <iostream>

struct X
{
    X(bool arg) { std::cout << arg << '\n'; }
};

int main() 
{
    double d = 7.0;
    X x{d};
}

There's a narrowing conversion from a double to a bool in the initialization of x. According to my understanding of the standard, this is ill-formed code and we should see some diagnostic.

Visual C++ 2013 issues an error:

error C2398: Element '1': conversion from 'double' to 'bool' requires a narrowing conversion

However, both Clang 3.5.0 and GCC 4.9.1, using the following options

-Wall -Wextra -std=c++11 -pedantic 

compile this code with no errors and no warnings. Running the program outputs a 1 (no surprise there).

Now, let's go deeper into strange territory.

Change X(bool arg) to X(int arg) and, suddenly, we've got an error from Clang

error: type 'double' cannot be narrowed to 'int' in initializer list [-Wc++11-narrowing]

and a warning from GCC

warning: narrowing conversion of 'd' from 'double' to 'int' inside { } [-Wnarrowing]

This looks more like what I was expecting.

Now, keep the bool constructor argument (that is, revert to X(bool arg)), and change double d = 7.0; to int d = 7;. Again, a narrowing error from Clang, but GCC doesn't issue any diagnostic at all and compiles the code.

There are a few more behaviour variants that we can get if we pass the constant directly to the constructor, some strange, some expected, but I won't list them here - this question is getting too long as it is.

I'd say this is one of the rare cases when VC++ is right and Clang and GCC are wrong when it comes to standard-conformance, but, given the respective track records of these compilers, I'm still very hesitant about this.

What do the experts think?

Standard references (quotes from the final standard document for C++11, ISO/IEC 14882-2011):

In 8.5.4 [dcl.init.list] paragraph 3, we have:

— Otherwise, if T is a class type, constructors are considered. The applicable constructors are enumerated and the best one is chosen through overload resolution (13.3). If a narrowing conversion (see below) is required to convert any of the arguments, the program is ill-formed.

In the same section, in paragraph 7, we have:

A narrowing conversion is an implicit conversion
— from a floating-point type to an integer type, or
— from long double to double or float, or from double to float, except where the source is a constant expression and the actual value after conversion is within the range of values that can be represented (even if it cannot be represented exactly), or
— from an integer type or unscoped enumeration type to a floating-point type, except where the source is a constant expression and the actual value after conversion will fit into the target type and will produce the original value when converted back to the original type, or
— from an integer type or unscoped enumeration type to an integer type that cannot represent all the values of the original type, except where the source is a constant expression and the actual value after conversion will fit into the target type and will produce the original value when converted back to the original type.
[ Note: As indicated above, such conversions are not allowed at the top level in list-initializations.—end note ]

In 3.9.1 [basic.fundamental] paragraph 7, we have:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type.

(I was starting to question everything at this stage...)

Java reflection: Is the order of class fields and methods standardized?

Using reflection on Java classes to access all fields, methods, and so on:
Is there a standardized order of these elements (which is specified in some standard)?

Of course, I could check it empirically, but I need to know if it's always the same.

I waited for the question: What I need the order for ;)
Long story short: I have JAXB-annotated classes, and want to represent these classes visually. While the order of XML attributes is relevant neither for the XML standard nor for JAXB, I want to have a certain order of the XML attributes for the visual representation.
For example: start comes after end. This hurts one's intuition.

How do I provide a default implementation for an Objective-C protocol?

I'd like to specify an Objective-C protocol with an optional routine. When the routine is not implemented by a class conforming to the protocol I'd like to use a default implementation in its place. Is there a place in the protocol itself where I can define this default implementation? If not, what is the best practice to reduce copying and pasting this default implementation all over the place?

Do the JSON keys have to be surrounded by quotes?

Example: Is the following code valid against the JSON Spec?

    precision: "zip"

Or should I always use the following syntax? (And if so, why?)

    "precision": "zip"

I haven't really found anything about this in the JSON specifications, although they use quotes around the keys in their examples.

What's the difference between "dead code" and "unreachable code"?

I thought those terms were synonymous, but a note in MISRA regarding dead code indicates this to be wrong. What's the difference? Is one a subset of the other?

Does constexpr imply inline?

Consider the following inlined function :

// Inline specifier version
#include <cstdlib>  // std::atoi

inline int f(const int x);

inline int f(const int x)
{
    return 2*x;
}

int main(int argc, char* argv[])
{
    return f(std::atoi(argv[1]));
}

and the constexpr equivalent version :

// Constexpr specifier version
#include <cstdlib>  // std::atoi

constexpr int f(const int x);

constexpr int f(const int x)
{
    return 2*x;
}

int main(int argc, char* argv[])
{
    return f(std::atoi(argv[1]));
}

My question is : does the constexpr specifier imply the inline specifier in the sense that if a non-constant argument is passed to a constexpr function, the compiler will try to inline the function as if the inline specifier was put in its declaration ?

Does the C++11 standard guarantee that ?

Why class { int i; }; is not fully standard-conformant?

This is a follow-up question.

In the previous question, @JohannesSchaub-litb said that the following code is not fully standard-conformant:

class { int i; };  //unnamed-class definition. § 9/1 allows this!

and then he added,

while it is grammatically valid, it breaks the rule that such a class must declare at least one name into its enclosing scope.

I couldn't really understand this. What name is he talking about?

Could anyone elaborate on this further (preferably quoting the Standard)?

When does invoking a member function on a null instance result in undefined behavior?

Consider the following code:

#include <iostream>

struct foo
{
    // (a):
    void bar() { std::cout << "gman was here" << std::endl; }

    // (b):
    void baz() { x = 5; }

    int x;
};

int main()
{
    foo* f = 0;

    f->bar(); // (a)
    f->baz(); // (b)
}

We expect (b) to crash, because there is no corresponding member x for the null pointer. In practice, (a) doesn't crash because the this pointer is never used.

Because (b) dereferences the this pointer ((*this).x = 5;), and this is null, the program enters undefined behavior, as dereferencing null is always said to be undefined behavior.

Does (a) result in undefined behavior? What about if both functions (and x) are static?

Is main() really start of a C++ program?

The section $3.6.1/1 from the C++ Standard reads,

A program shall contain a global function called main, which is the designated start of the program.

Now consider this code,

#include <iostream>

int square(int i) { return i*i; }
int user_main()
{
    for ( int i = 0 ; i < 10 ; ++i )
           std::cout << square(i) << std::endl;
    return 0;
}
int main_ret= user_main();
int main() 
{
        return main_ret;
}

This sample code does what I intend it to do, i.e. printing the squares of the integers from 0 to 9, before entering the main() function, which is supposed to be the "start" of the program.

I also compiled it with -pedantic option, GCC 4.5.0. It gives no error, not even warning!

So my question is,

Is this code really Standard conformant?

If it's standard conformant, then does it not invalidate what the Standard says? main() is not the start of this program! user_main() executed before main().

I understand that to initialize the global variable main_ret, user_main() executes first, but that is a different thing altogether; the point is that it does invalidate the quoted statement $3.6.1/1 from the Standard, as main() is NOT the start of the program; it is in fact the end of this program!


How do you define the word 'start'?

It boils down to the definition of the phrase "start of the program". So how exactly do you define it?

Why does MySQL allow "group by" queries WITHOUT aggregate functions?

Surprise -- this is a perfectly valid query in MySQL:

select X, Y from someTable group by X

If you tried this query in Oracle or SQL Server, you’d get the natural error message:

Column 'Y' is invalid in the select list because it is not contained in 
either an aggregate function or the GROUP BY clause.

So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.

There’s even a MySQL configuration parameter to turn off this “looseness”.

This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.

My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?

What is the '-->' operator in C/C++?

After reading Hidden Features and Dark Corners of C++/STL on comp.lang.c++.moderated, I was completely surprised that the following snippet compiled and worked in both Visual Studio 2008 and G++ 4.4. I would assume this is also valid C since it works in GCC as well.

Here's the code:

#include <stdio.h>
int main()
{
    int x = 10;
    while (x --> 0) // x goes to 0
    {
        printf("%d ", x);
    }
}


9 8 7 6 5 4 3 2 1 0

Where is this defined in the standard, and where has it come from?
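One way to experiment with the snippet is to write the loop condition with explicit parentheses, as a post-decrement followed by a greater-than comparison. The sketch below does that and collects the values instead of printing them; count_down is a hypothetical helper name.

```cpp
#include <vector>

// Collects the values the loop body sees, using (x--) > 0,
// which is made of the same tokens as `x --> 0`.
std::vector<int> count_down(int start) {
    std::vector<int> seen;
    int x = start;
    while ((x--) > 0) {
        seen.push_back(x);
    }
    return seen;
}
```

Calling count_down(10) yields the same sequence as the output shown above: 9 down to 0.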

Is it valid to have a html form inside another html form?

Is it valid html to have the following:

<form action="a">
    <form action="b">
    </form>
</form>

So when you submit "b" you only get the fields within the inner form. When you submit "a" you get all fields minus those within "b".

If it isn't possible, what workarounds for this situation are available?

Do I have the guarantee that sizeof(type) == sizeof(unsigned type)?

The sizeof char, int, long double... can vary from one compiler to another. But do I have the guarantee according to the C++11 or C11 standard that the size of any signed and unsigned fundamental integral type is the same ?

Can an HTML element have multiple ids?

I understand that an id must be unique within an HTML/XHTML page.

For a given element, can I assign multiple ids to it?

<div id="nested_element_123 task_123"></div>

I realize I have an easy solution with simply using a class. I'm just curious about using ids in this manner.

C++ new int[0] -- will it allocate memory?

A simple test app:

cout << new int[0] << endl;



So it looks like it works. What does the standard say about this? Is it always legal to "allocate" empty block of memory?
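A slightly more careful version of the test app might look like the sketch below. It only shows what one implementation does at run time, not what the standard requires, and zero_length_new_is_non_null is an invented name.

```cpp
// Allocates a zero-length array and reports whether a non-null pointer came back.
bool zero_length_new_is_non_null() {
    int* p = new int[0];
    bool non_null = (p != nullptr);
    delete[] p;  // the allocation still has to be released
    return non_null;
}
```

Note that even when the pointer is non-null, there is no element behind it that may be read or written.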

Clean way to launch the web browser from shell script?

In a bash script, I need to launch the user's web browser. There seem to be many ways of doing this:

  • xdg-open
  • gnome-open on GNOME
  • www-browser
  • x-www-browser
  • ...

Is there a more-standard-than-the-others way to do this that would work on most platforms, or should I just go with something like this:

#!/usr/bin/env bash

if [ -n "$BROWSER" ]; then
  "$BROWSER" ''
elif which xdg-open > /dev/null; then
  xdg-open ''
elif which gnome-open > /dev/null; then
  gnome-open ''
# elif bla bla bla...
else
  echo "Could not detect the web browser to use."
fi

What is going on with 'gets(stdin)' on the site coderbyte?

Coderbyte is an online coding challenge site (I found it just 2 minutes ago).

The first C++ challenge you are greeted with has a C++ skeleton you need to modify:

#include <iostream>
#include <string>
using namespace std;

int FirstFactorial(int num) {

  // Code goes here
  return num;

}

int main() {

  // Keep this function call here
  cout << FirstFactorial(gets(stdin));
  return 0;

}

If you are a little familiar with C++, the first thing* that catches your eye is:

int FirstFactorial(int num);
cout << FirstFactorial(gets(stdin));

So, ok, the code calls gets which is deprecated since C++11 and removed since C++14 which is bad in itself.

But then I realize: gets is of type char*(char*). So it shouldn't accept a FILE* parameter, and the result shouldn't be usable in place of an int parameter, but... not only does it compile without any warnings or errors, it also runs and actually passes the correct input value to FirstFactorial.

Outside of this particular site, the code doesn't compile (as expected), so what is going on here?

*Actually the first one is using namespace std but that is irrelevant to my issue here.

Declaration of Methods should be Compatible with Parent Methods in PHP

Strict Standards: Declaration of childClass::customMethod() should be compatible with that of parentClass::customMethod()

What are possible causes of this error in PHP? Where can I find information about what it means to be compatible?

Most efficient standard-compliant way of reinterpreting int as float

Assume I have guarantees that float is IEEE 754 binary32. Given a bit pattern that corresponds to a valid float, stored in std::uint32_t, how does one reinterpret it as a float in a most efficient standard compliant way?

float reinterpret_as_float(std::uint32_t ui) {
   return /* apply sorcery to ui */;
}

I've got a few ways that I know/suspect/assume have some issues:

  1. Via reinterpret_cast,

    float reinterpret_as_float(std::uint32_t ui) {
        return reinterpret_cast<float&>(ui);
    }

    or equivalently

    float reinterpret_as_float(std::uint32_t ui) {
        return *reinterpret_cast<float*>(&ui);
    }

    which suffers from aliasing issues.

  2. Via union,

    float reinterpret_as_float(std::uint32_t ui) {
        union {
            std::uint32_t ui;
            float f;
        } u = {ui};
        return u.f;
    }

    which is not actually legal, as it is only allowed to read from the most recently written member. Yet it seems some compilers (gcc) allow this.

  3. Via std::memcpy,

    float reinterpret_as_float(std::uint32_t ui) {
        float f;
        std::memcpy(&f, &ui, 4);
        return f;
    }

    which AFAIK is legal, but a function call to copy a single word seems wasteful, though it might get optimized away.

  4. Via reinterpret_casting to char* and copying,

    float reinterpret_as_float(std::uint32_t ui) {
        char* uip = reinterpret_cast<char*>(&ui);
        float f;
        char* fp = reinterpret_cast<char*>(&f);
        for (int i = 0; i < 4; ++i) {
            fp[i] = uip[i];
        }
        return f;
    }

    which AFAIK is also legal, as char pointers are exempt from aliasing issues, and the manual byte-copying loop saves a possible function call. The loop will most definitely be unrolled, yet 4 possibly separate one-byte loads/stores are worrisome; I have no idea whether this can be optimized to a single four-byte load/store.

Option 4 is the best I've been able to come up with.

Am I correct so far? Is there a better way to do this, particularly one that will guarantee a single load/store?
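For experimenting with the std::memcpy variant, here is a self-contained sketch with the size assumption checked at compile time. It relies on the premise stated above (float is IEEE 754 binary32) and says nothing about which instructions a given compiler emits.

```cpp
#include <cstdint>
#include <cstring>

// Assumption from the question: float is a 32-bit IEEE 754 binary32 type.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");

float reinterpret_as_float(std::uint32_t ui) {
    float f;
    std::memcpy(&f, &ui, sizeof f);  // copy the object representation byte for byte
    return f;
}
```

Under that assumption, reinterpret_as_float(0x3F800000u) gives 1.0f. Whether the copy becomes a single load/store can be checked by inspecting the generated assembly for the compiler in question.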