Splitting a C++ std::string: implementing a split() function

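The C++ standard library's std::string has no split() function that breaks a string apart on a char delimiter, so you have to write one yourself.
Here are several ways to do it: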

1. Looping with std::string::find (the example code uses C++11 features)

Key functions:

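The std::basic_string<CharT,Traits,Allocator>::find family (find, rfind, find_first_of, ...)

size_type find( const basic_string& str, size_type pos = 0 ) const;
size_type find( CharT ch, size_type pos = 0 ) const;
Parameters: 1. str - the string to search for 2. pos - the position at which to start searching 3. ch - the character to search for
Return value: the position of the first match, or npos if nothing is found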

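std::basic_string<CharT,Traits,Allocator>::substr (available since C++98)

basic_string substr( size_type pos = 0, size_type count = npos ) const;
Parameters: 1. pos - the position of the first character to include 2. count - the length of the substring; if count is npos or runs past the end, the substring extends to the end of the string
Throws std::out_of_range if pos > size().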

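#include <iostream>
#include <string>
#include <vector>

void split(const std::string & rawData, std::vector<std::string> & Result, char sep)
{
    std::string::size_type pos = 0; // start of the current field
    while (true)
    {
        // find returns std::string::npos (not an int -1) when sep does not occur at or after pos
        std::string::size_type nextPos = rawData.find(sep, pos);
        if (nextPos == std::string::npos)
        {
            Result.push_back(rawData.substr(pos)); // last field: from pos to the end of the string
            break;
        }
        Result.push_back(rawData.substr(pos, nextPos - pos)); // field between pos and the separator
        pos = nextPos + 1; // skip past the separator
    }
}

int main(void)
{
    std::string Raw("123,456,789");
    std::vector<std::string> Res;
    split(Raw, Res, ',');
    for (const auto & data : Res)
    {
        std::cout << data << std::endl;
    }
    // system("pause"); // Windows-only: keeps the console window open; requires <cstdlib>
    return 0;
}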

Output:
123
456
789

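2. Using std::getline()

Key functions:

std::basic_istream<CharT,Traits>&
getline( std::basic_istream<CharT,Traits>& input, std::basic_string<CharT,Traits,Allocator>& str, CharT delim );
std::basic_istream<CharT,Traits>&
getline( std::basic_istream<CharT,Traits>& input, std::basic_string<CharT,Traits,Allocator>& str ); // the delimiter is the newline character
(Since C++11 there are also overloads that take the stream as std::basic_istream<CharT,Traits>&& input.)
Parameters: 1. input - the stream to read from 2. str - the string that receives the data 3. delim - the delimiter character; it is extracted from the stream but not stored in str
Return value: input. The stream converts to false once an extraction fails (for example at end of input), so the call can serve directly as a loop condition.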

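#include <iostream>
#include <string>
#include <vector>
#include <sstream>

void split(const std::string & rawData, std::vector<std::string> & Result, char delim)
{
    std::istringstream ss(rawData); // read-only stream over the input string
    std::string item;
    // Each call reads up to (and discards) the next delim; the loop ends when nothing more can be read
    while (std::getline(ss, item, delim))
        Result.emplace_back(item);
}

int main(void)
{
    std::string Raw("123,456,789");
    std::vector<std::string> Res;
    split(Raw, Res, ',');
    for (const auto & data : Res)
    {
        std::cout << data << std::endl;
    }
    // system("pause"); // Windows-only: keeps the console window open; requires <cstdlib>
    return 0;
}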

Output:

123
456
789

3. Splitting with a regular expression

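#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm> // std::copy
#include <iterator>  // std::back_inserter, std::ostream_iterator
void split(const std::string & rawData, std::vector<std::string> & Result, char delim)
{
    // std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches
    // of every match of a regular expression within the underlying character sequence.
    // It can also be used to access the parts of the sequence that are NOT matched by the regex (e.g. as a tokenizer).
    // On construction it constructs a std::regex_iterator; on every increment it steps through the requested
    // sub-matches of the current match_results, incrementing the underlying regex_iterator when it moves past the last one.
    // A default-constructed std::regex_token_iterator is the end-of-sequence iterator. Incrementing a valid iterator past
    // the last sub-match of the last match makes it equal to that iterator; dereferencing or incrementing it further is UB.
    // Just before becoming the end-of-sequence iterator, it may become a suffix iterator if -1 (non-matched fragment)
    // appears in the requested sub-match indices; dereferencing it yields the characters between the last match and the end.
    // A typical implementation holds the underlying std::regex_iterator, a container of the requested sub-match indices
    // (e.g. std::vector<int>), an internal counter for the current index, a pointer to the current std::sub_match of the
    // current match, and a std::match_results object holding the last non-matched sequence (used in tokenizer mode).

    // Build the pattern from delim, escaping it if it is a regex metacharacter such as '.' or '|'
    std::string pattern(1, delim);
    if (std::string("\\^$.|?*+()[]{}").find(delim) != std::string::npos)
        pattern.insert(0, 1, '\\');
    std::regex rgx(pattern); // regex object; must outlive the token iterators

    // Alternative: print the tokens directly instead of storing them
    /*std::copy(std::sregex_token_iterator(rawData.begin(), rawData.end(), rgx, -1), std::sregex_token_iterator(),
        std::ostream_iterator<std::string>(std::cout, "\n"));*/

    // Sub-match index -1 yields the text between matches, i.e. the fields
    std::copy(std::sregex_token_iterator(rawData.begin(), rawData.end(), rgx, -1), std::sregex_token_iterator(), std::back_inserter(Result));
}

int main(void)
{
    std::string Raw("123,456,789");
    std::vector<std::string> Res;
    split(Raw, Res, ',');
    for (const auto & data : Res)
    {
        std::cout << data << std::endl;
    }
    // system("pause"); // Windows-only: keeps the console window open; requires <cstdlib>
    return 0;
}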

The regular-expression part is based on this blog post.

admin • August 30, 2023