SQL Server
PRINT @@VERSION
Microsoft SQL Server 2012 - 11.0.2100.60 (X64)
Feb 10 2012 19:39:15
Copyright (c) Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1)
Operating system
------------------
System Information
------------------
Operating System: Windows 7 Ultimate 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130828-1532)
System Model: Aspire E1-471G
Processor: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz (4 CPUs), ~2.6GHz
Memory: 4096MB RAM
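Goal: extract the numbers from a large set of strings that mix Chinese characters with numbers.
First, prepare the test data. Note that all of the data here is simulated and has no real meaning. The statements are as follows: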
CREATE TABLE #temp
(
name NVARCHAR(80) -- NVARCHAR keeps the Chinese text intact regardless of the server's default code page
);
INSERT INTO #temp (name)
VALUES (N'五道口店3059'),
(N'五羊邨店3060'),
(N'楊家屯店3061'),
(N'十裡堤店3062'),
(N'中關村店3063'),
(N'麗秀店3064'),
(N'石門店3065'),
(N'黃村店3066'),
(N'東圃店3067'),
(N'天河店3068'),
(N'人民路廣場3069'),
(N'社區中心3070'),
(N'珠海市3071'),
(N'麗都3072'),
(N'曉月3073'),
(N'舊區3074'),
(N'新城3075'),
(N'水井溝3076');
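Looking at the data, there is a clear pattern: each number is made up of four digits, and the character directly in front of it is always one of nine: 店, 場, 心, 市, 都, 月, 區, 城, and 溝.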
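Because every number in this test data is exactly four characters at the end of the string, RIGHT() can pick it out directly, without knowing which Chinese character comes before it. This is an alternative to the step-by-step stripping the article uses, not the author's method, and it assumes the four-digit suffix holds for every row; the table names #demo_right and #demo_right_result are hypothetical.

```sql
-- Sketch: assumes every value ends with exactly four digits.
IF OBJECT_ID('tempdb..#demo_right') IS NOT NULL
    DROP TABLE #demo_right;
IF OBJECT_ID('tempdb..#demo_right_result') IS NOT NULL
    DROP TABLE #demo_right_result;

CREATE TABLE #demo_right (name NVARCHAR(80));

INSERT INTO #demo_right (name)
VALUES (N'五道口店3059'),
       (N'人民路廣場3069'),
       (N'水井溝3076');

-- RIGHT(expr, 4) returns the last four characters of each value.
SELECT RIGHT(name, 4) AS code
INTO #demo_right_result
FROM #demo_right;

SELECT code FROM #demo_right_result;   -- 3059, 3069 and 3076
```

If the numbers can vary in length, or values may carry trailing spaces, RIGHT() alone is not enough: wrap the column in RTRIM() or locate the digits with a pattern instead.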
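We first try SQL Server's built-in functions SUBSTRING, CHARINDEX, RTRIM, and LTRIM on the strings containing the most frequent of these characters, 店: CHARINDEX finds the position of 店, SUBSTRING keeps everything after it, and LTRIM/RTRIM strip any surrounding spaces.
The statement is: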
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'店', name) + 1, Len(name)))) AS name
INTO #t1
FROM #temp
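To see what this statement does to a single value, the sketch below evaluates each piece of the expression on two strings from the test data, one with 店 and one without. It uses an inline VALUES list instead of #temp so it runs on its own, and assumes a database collation that handles Chinese text (as the author's environment evidently does); #demo_steps is a hypothetical name.

```sql
-- Evaluate each part of the extraction expression on two sample values.
IF OBJECT_ID('tempdb..#demo_steps') IS NOT NULL
    DROP TABLE #demo_steps;

SELECT s.name,
       CHARINDEX(N'店', s.name) AS pos,       -- position of 店, or 0 if absent
       LEN(s.name)              AS len_name,  -- passed as a "long enough" length
       RTRIM(LTRIM(SUBSTRING(s.name,
                             CHARINDEX(N'店', s.name) + 1,
                             LEN(s.name)))) AS code
INTO #demo_steps
FROM (VALUES (N'五道口店3059'),
             (N'珠海市3071')) AS s(name);

SELECT name, pos, len_name, code FROM #demo_steps;
-- 五道口店3059: pos = 4, len_name = 8, code = 3059
-- 珠海市3071:   pos = 0, len_name = 7, code = 珠海市3071 (unchanged)
```

The second row shows why the step-by-step method works: CHARINDEX returns 0 when 店 is absent, so SUBSTRING starts at position 1 and the value passes through unchanged, ready for the next step.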
The following describes how these functions work:
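Substring
Returns part of a character, binary, text, or image expression.
Syntax
SUBSTRING ( expression, start, length )
Arguments
expression
Is a character, binary, text, ntext, or image expression.
start
Is an integer or bigint expression that specifies where the returned characters start. Numbering is 1-based: the first character in expression is at position 1.
length
Is a positive integer or bigint expression that specifies how many characters of expression are returned. A negative length raises an error. If start plus length goes past the end of expression, everything from start to the end is returned, which is why the statement above can simply pass Len(name) as the length.
Return Types
The same family as expression: varchar for char, varchar, or text input; nvarchar for nchar, nvarchar, or ntext input; varbinary for binary, varbinary, or image input.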
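A quick check of the two SUBSTRING rules the extraction relies on: positions are 1-based, and a length that runs past the end of the string is simply cut short. The sample value comes from the test data; #demo_substring is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_substring') IS NOT NULL
    DROP TABLE #demo_substring;

SELECT SUBSTRING(N'麗都3072', 1, 2)   AS first_two,   -- 麗都: positions 1 and 2
       SUBSTRING(N'麗都3072', 3, 100) AS from_third   -- 3072: length 100 is cut to what remains
INTO #demo_substring;

SELECT first_two, from_third FROM #demo_substring;
```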
Charindex
Searches an expression for another expression and returns its starting position if found.
Syntax
CHARINDEX ( expressionToFind, expressionToSearch [ , start_location ] )
Arguments
expressionToFind
Is a character expression that contains the sequence to be found. expressionToFind is limited to 8000 characters.
expressionToSearch
Is a character expression to be searched.
start_location
Is an integer or bigint expression at which the search starts. If start_location is not specified, is a negative number, or is 0, the search starts at the beginning of expressionToSearch.
Return Types
bigint if expressionToSearch is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, int.
If expressionToFind is not found, CHARINDEX returns 0. In the statement above, a row without 店 therefore gets Substring(name, 1, Len(name)), that is, the whole string unchanged.
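The sketch below shows the three CHARINDEX behaviours that matter here: a match returns a 1-based position, a miss returns 0, and start_location only changes where the search begins, not what the returned position is measured from. The sample values come from the test data; #demo_charindex is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_charindex') IS NOT NULL
    DROP TABLE #demo_charindex;

SELECT CHARINDEX(N'店', N'五道口店3059') AS found_at,    -- 4
       CHARINDEX(N'店', N'珠海市3071')   AS not_found,   -- 0
       CHARINDEX(N'3', N'3063', 2)       AS from_pos_2   -- 4: search starts at 2, skipping the first 3
INTO #demo_charindex;

SELECT found_at, not_found, from_pos_2 FROM #demo_charindex;
```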
Rtrim
Returns a character string after truncating all trailing spaces.
RTRIM removes only the space character (CHAR(32)). It does not remove other white-space characters such as tab (CHAR(9)), line feed (CHAR(10)), or carriage return (CHAR(13)).
Syntax
RTRIM ( character_expression )
Arguments
character_expression
Is an expression of character data: a constant, variable, or column.
Return Types
varchar or nvarchar
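Because LEN() already ignores trailing spaces, DATALENGTH() (bytes, two per nvarchar character) is the clearer way to see what RTRIM actually removes. The sketch shows that trailing spaces go but a trailing tab stays; #demo_rtrim is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_rtrim') IS NOT NULL
    DROP TABLE #demo_rtrim;

SELECT DATALENGTH(N'3059   ')                AS before_spaces,  -- 14 bytes: 7 characters
       DATALENGTH(RTRIM(N'3059   '))         AS after_spaces,   -- 8 bytes: 4 characters
       DATALENGTH(RTRIM(N'3059' + NCHAR(9))) AS after_tab       -- 10 bytes: the tab is kept
INTO #demo_rtrim;

SELECT before_spaces, after_spaces, after_tab FROM #demo_rtrim;
```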
Ltrim
Returns a character expression after removing leading spaces.
LTRIM removes only the space character (CHAR(32)); other white-space characters such as tab (CHAR(9)) or line feed (CHAR(10)) are left in place.
Syntax
LTRIM ( character_expression )
Arguments
character_expression
Is an expression of character data: a constant, variable, or column.
Return Types
varchar or nvarchar
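LTRIM mirrors RTRIM on the left side, and nesting the two, as the article's statements do, trims both ends. The sketch measures results in bytes with DATALENGTH (two bytes per nvarchar character); #demo_ltrim is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_ltrim') IS NOT NULL
    DROP TABLE #demo_ltrim;

SELECT DATALENGTH(LTRIM(N'   3059'))         AS after_spaces,  -- 8 bytes: leading spaces removed
       DATALENGTH(LTRIM(NCHAR(9) + N'3059')) AS after_tab,     -- 10 bytes: the leading tab is kept
       DATALENGTH(RTRIM(LTRIM(N'  3059  '))) AS both_ends      -- 8 bytes: both sides trimmed
INTO #demo_ltrim;

SELECT after_spaces, after_tab, both_ends FROM #demo_ltrim;
```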
Now look at the result: every string that contained 店 has been reduced to its number.
SELECT * FROM #t1
3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
人民路廣場3069
社區中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076
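Next, handle the strings containing 場, 心, 市, 都, 月, 區, 城, and 溝 in turn. Before each step, a LIKE query with the pattern [一-龥] lists the rows that still contain Chinese characters. The statements and results are as follows: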
SELECT *
FROM #t1
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
人民路廣場3069
社區中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076
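一 is U+4E00 and 龥 is U+9FA5, so the class [一-龥] covers the common CJK Unified Ideographs; the binary collation Chinese_PRC_BIN makes the range compare by code point rather than by linguistic sort order. The sketch applies the same test as a flag instead of a filter, so values with and without Chinese characters can be seen side by side. The sample values are illustrative and #demo_like is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_like') IS NOT NULL
    DROP TABLE #demo_like;

SELECT s.name,
       CASE WHEN s.name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
            THEN 1 ELSE 0 END AS has_chinese   -- 1 if any character is in U+4E00..U+9FA5
INTO #demo_like
FROM (VALUES (N'3059'),
             (N'人民路廣場3069'),
             (N'ABC123')) AS s(name);

SELECT name, has_chinese FROM #demo_like;
-- 3059: 0, 人民路廣場3069: 1, ABC123: 0
```

The range stops at U+9FA5, so ideographs outside it, such as CJK Extension A (U+3400 to U+4DBF) or those added after U+9FA5, are not matched.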
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'場', name) + 1, Len(name)))) AS name
INTO #t2
FROM #t1
SELECT *
FROM #t2
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
社區中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'心', name) + 1, Len(name)))) AS name
INTO #t3
FROM #t2
SELECT *
FROM #t3
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'市', name) + 1, Len(name)))) AS name
INTO #t4
FROM #t3
SELECT *
FROM #t4
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'都', name) + 1, Len(name)))) AS name
INTO #t5
FROM #t4
SELECT *
FROM #t5
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
曉月3073
舊區3074
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'月', name) + 1, Len(name)))) AS name
INTO #t6
FROM #t5
SELECT *
FROM #t6
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
舊區3074
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'區', name) + 1, Len(name)))) AS name
INTO #t7
FROM #t6
SELECT *
FROM #t7
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
新城3075
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'城', name) + 1, Len(name)))) AS name
INTO #t8
FROM #t7
SELECT *
FROM #t8
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
水井溝3076
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'溝', name) + 1, Len(name)))) AS name
INTO #t9
FROM #t8
SELECT *
FROM #t9
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
-- (no rows)
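The nine steps above repeat one statement with a different character each time, so they can be collapsed into a loop that updates a single work table in place instead of creating #t1 through #t9. This is a sketch, not the author's code: it assumes the nine characters listed earlier are processed in the same order as the article and that the database collation handles Chinese text; #work and @delims are hypothetical names, and the sample rows are a subset of the test data.

```sql
IF OBJECT_ID('tempdb..#work') IS NOT NULL
    DROP TABLE #work;

CREATE TABLE #work (name NVARCHAR(80));

INSERT INTO #work (name)
VALUES (N'五道口店3059'), (N'人民路廣場3069'), (N'社區中心3070'),
       (N'珠海市3071'),   (N'麗都3072'),       (N'曉月3073'),
       (N'舊區3074'),     (N'新城3075'),       (N'水井溝3076');

DECLARE @delims NVARCHAR(20) = N'店場心市都月區城溝';  -- characters that precede the number
DECLARE @i INT = 1;

WHILE @i <= LEN(@delims)
BEGIN
    -- Same expression as each step above. Rows without the current character
    -- are unchanged: CHARINDEX returns 0, so SUBSTRING starts at position 1.
    UPDATE #work
    SET name = RTRIM(LTRIM(SUBSTRING(name,
                                     CHARINDEX(SUBSTRING(@delims, @i, 1), name) + 1,
                                     LEN(name))));
    SET @i += 1;
END;

-- Rows still containing Chinese characters were not handled by any step.
SELECT name FROM #work WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN;

SELECT name FROM #work ORDER BY name;
```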
That is the final result. With the numbers extracted, I can join them against the database tables to retrieve the data I need.
SELECT *
INTO #result
FROM #t9
SELECT *
FROM #result
name
3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
3069
3070
3071
3072
3073
3074
3075
3076
SELECT s.xxx,
s.xxx
FROM xx s
JOIN #result r
ON s.xxx = r.name
WHERE s.xxx = 0;
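The whole technique really comes down to two statements. The first uses SQL Server's built-in functions to strip everything up to the given character and keep the number: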
SELECT Rtrim(Ltrim(Substring(name, Charindex(N'店', name) + 1, Len(name)))) AS name
INTO #t1
FROM #temp
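If the character in front of the number is not known in advance, PATINDEX can locate the first digit directly, removing the need for one step per character. This swaps in a different function from the article's CHARINDEX approach and assumes the number is the only run of digits and sits at the end of the string; #demo_patindex is a hypothetical name, and 總店 is a made-up value without digits.

```sql
IF OBJECT_ID('tempdb..#demo_patindex') IS NOT NULL
    DROP TABLE #demo_patindex;

SELECT s.name,
       CASE WHEN PATINDEX(N'%[0-9]%' COLLATE Chinese_PRC_BIN, s.name) > 0
            -- Start at the first digit and take the rest of the string.
            THEN SUBSTRING(s.name,
                           PATINDEX(N'%[0-9]%' COLLATE Chinese_PRC_BIN, s.name),
                           LEN(s.name))
       END AS code   -- NULL when the value contains no digit
INTO #demo_patindex
FROM (VALUES (N'五道口店3059'),
             (N'社區中心3070'),
             (N'總店')) AS s(name);

SELECT name, code FROM #demo_patindex;
-- 五道口店3059: 3059, 社區中心3070: 3070, 總店: NULL
```

The CASE guard matters: PATINDEX returns 0 when there is no digit, and SUBSTRING starting at position 0 would silently drop the last character instead of returning nothing.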
The second checks whether a string still contains Chinese characters:
SELECT *
FROM #t1
WHERE name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN
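The [一-龥] test only catches Chinese characters, so a leftover such as a Latin letter would slip through. A stricter sketch flags any value containing a character that is not an ASCII digit, again under a binary collation so that [^0-9] compares by code point. The sample values, including 30A1, are made up for illustration, and #demo_check is a hypothetical name.

```sql
IF OBJECT_ID('tempdb..#demo_check') IS NOT NULL
    DROP TABLE #demo_check;

-- Keep only the values that still contain a non-digit character.
SELECT s.name
INTO #demo_check
FROM (VALUES (N'3059'),
             (N'社區中心3070'),
             (N'30A1')) AS s(name)
WHERE s.name LIKE N'%[^0-9]%' COLLATE Chinese_PRC_BIN;

SELECT name FROM #demo_check;   -- 社區中心3070 and 30A1
```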
In day-to-day work, spotting and collecting small tricks like these can save a great deal of effort.