[Tesseract] Simple Digit Recognition

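Image recognition draws on a fair amount of theory: the Fourier transform, image morphology, filtering, matrix transforms, and so on.

Tesseract exists so that you can recognize images quickly without needing that theoretical background.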

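Preparation:

1. Collect and preprocess the sample images for training (each character should appear about 20 times on average).

2. Train and run a first recognition pass.

3. Correct the trained data.

Testing:

1. Preprocess the image to be recognized.

2. Recognize it using the trained data.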

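Example 1: inverting the colors of an image

using System.Drawing;

// Inverts the colors of the image in fileName and saves the result to outName.
// Image.FromFile keeps the source file locked until it is disposed,
// so outName should be a different path from fileName.
private static void Reverse(string fileName, string outName)
{
    using (var pic = Image.FromFile(fileName) as Bitmap)
    {
        if (pic == null) return; // the file is not a bitmap image

        for (int i = 0; i < pic.Width; i++)
        {
            for (int j = 0; j < pic.Height; j++)
            {
                // Invert each color channel; the resulting pixel is fully opaque.
                var c = pic.GetPixel(i, j);
                c = Color.FromArgb(255 - c.R, 255 - c.G, 255 - c.B);
                pic.SetPixel(i, j, c);
            }
        }
        pic.Save(outName);
    }
}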
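GetPixel and SetPixel are slow on large images. A commonly used alternative is to lock the bitmap and work on the raw bytes. The following is only a sketch of that idea, not part of the original example: the names FastInvert and Invert are made up for illustration, and it assumes the image can be converted to 32-bit ARGB and that the locked buffer has a positive stride.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

static class FastInvert // hypothetical helper, not from the original article
{
    // Returns a new 32-bit ARGB bitmap whose R, G and B channels are inverted.
    // The alpha channel is left unchanged.
    public static Bitmap Invert(Bitmap source)
    {
        var rect = new Rectangle(0, 0, source.Width, source.Height);

        // Clone into a known layout so every pixel is exactly 4 bytes (B, G, R, A).
        Bitmap result = source.Clone(rect, PixelFormat.Format32bppArgb);

        BitmapData data = result.LockBits(rect, ImageLockMode.ReadWrite,
                                          PixelFormat.Format32bppArgb);
        try
        {
            // Assumption: stride is positive (top-down layout).
            int byteCount = data.Stride * data.Height;
            byte[] buffer = new byte[byteCount];
            Marshal.Copy(data.Scan0, buffer, 0, byteCount);

            for (int i = 0; i < byteCount; i += 4)
            {
                buffer[i] = (byte)(255 - buffer[i]);         // blue
                buffer[i + 1] = (byte)(255 - buffer[i + 1]); // green
                buffer[i + 2] = (byte)(255 - buffer[i + 2]); // red
                // buffer[i + 3] is alpha and stays as it is
            }

            Marshal.Copy(buffer, 0, data.Scan0, byteCount);
        }
        finally
        {
            result.UnlockBits(data);
        }
        return result;
    }
}
```

Unlike Reverse, this version returns a new bitmap instead of modifying the input, so the caller decides where to save it.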

Example 2: extracting a given region of an image

using System.Configuration;
using System.Drawing;

// Copies the region between the configured corners p1 and p2 into a new image.
// For convenience the parameter is a file name rather than an Image.
private Image Analyse(string fileName)
{
    using (var map = Image.FromFile(fileName) as Bitmap)
    {
        if (map == null) return null;
        Point p1;
        Point p2;
        var p = GetConfig(out p1, out p2);
        var pic = new Bitmap(p.X, p.Y);
        var x = 0;
        var y = 0;
        for (int i = 0; i < map.Height; i++)
        {
            if (i >= p1.Y && i <= p2.Y)
            {
                for (int j = 0; j < map.Width; j++)
                {
                    if (j >= p1.X && j <= p2.X)
                    {
                        pic.SetPixel(x, y, map.GetPixel(j, i));
                        x++;
                    }
                }
                x = 0;
                y++;
            }
        }
        return pic;
    }
}

// Reads the two corners from appSettings ("p1" and "p2", each written as "x,y")
// and returns the size of the region as a Point (X = width, Y = height).
private Point GetConfig(out Point p1, out Point p2)
{
    var p1Str = ConfigurationManager.AppSettings["p1"].Split(',');
    var p2Str = ConfigurationManager.AppSettings["p2"].Split(',');
    p1 = new Point() { X = int.Parse(p1Str[0]), Y = int.Parse(p1Str[1]) };
    p2 = new Point() { X = int.Parse(p2Str[0]), Y = int.Parse(p2Str[1]) };
    // Both corners are inclusive, so each dimension is the difference plus 1.
    return new Point() { X = p2.X - p1.X + 1, Y = p2.Y - p1.Y + 1 };
}

// A small custom type; it takes the place of System.Drawing.Point in this code.
class Point
{
    /// <summary>
    /// X coordinate of the point, or a width
    /// </summary>
    public int X { get; set; }
    /// <summary>
    /// Y coordinate of the point, or a height
    /// </summary>
    public int Y { get; set; }
}
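The pixel-by-pixel copy above can probably be replaced by Bitmap.Clone, which accepts a rectangle. The sketch below is an alternative, not the author's method. CropSketch, RegionFromCorners and Crop are hypothetical names, and the corners are assumed to be inclusive, as in Analyse.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class CropSketch // hypothetical helper, not from the original article
{
    // Builds the rectangle spanned by two inclusive corner points.
    public static Rectangle RegionFromCorners(int x1, int y1, int x2, int y2)
    {
        // FromLTRB treats right and bottom as exclusive, hence the +1.
        return Rectangle.FromLTRB(x1, y1, x2 + 1, y2 + 1);
    }

    // Returns a copy of the given region of the source bitmap.
    public static Bitmap Crop(Bitmap source, Rectangle region)
    {
        // Clone fails if the rectangle reaches outside the image, so clamp it first.
        // An empty intersection still makes Clone throw; that case is not handled here.
        region.Intersect(new Rectangle(0, 0, source.Width, source.Height));
        return source.Clone(region, source.PixelFormat);
    }
}
```

With p1 = (2, 3) and p2 = (5, 7), RegionFromCorners gives a region 4 pixels wide and 5 pixels high, which matches the size computed in GetConfig.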

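Recognition steps:

1. Convert the images to TIF format, then use jTessBoxEditor to merge all the TIF images into a single TIF file.

2. Install tesseract-ocr-setup-3.01-1.exe (with the installer version there is no need to set environment variables).

3. At the command prompt, run tesseract.exe {0}.tif {0} batch.nochop makebox to generate the box file ({0} is the file name).

4. At the command prompt, run tesseract.exe {0}.tif {0} nobatch box.train to generate the tr file.

5. At the command prompt, run unicharset_extractor.exe tj.box to generate the unicharset file (tj is the file name used in this example; substitute your own box file).

6. In the same directory, create a file named font_properties, with no extension, containing {0} 1 0 0 1 0 (the font name followed by the italic, bold, fixed, serif and fraktur flags).

7. At the command prompt, run cntraining.exe {0}.tr

8. At the command prompt, run mftraining.exe -F font_properties -U unicharset {0}.tr

9. Add the prefix {0}. to the names of four files: unicharset, inttemp, normproto and pffmtable. Note that the prefix includes the trailing dot.

10. At the command prompt, run combine_tessdata {0}. to merge all the files into the trained data file, {0}.traineddata.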
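The command-line steps above could also be driven from C#. The following is a rough sketch under several assumptions: TrainingSketch and its members are hypothetical names, the Tesseract tools are assumed to be reachable from the working directory or the PATH, step 5 is assumed to take {0}.box, and exit codes are not checked. It also runs straight through, so it skips the manual correction of the box file that normally happens between steps 3 and 4.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

static class TrainingSketch // hypothetical helper, not from the original article
{
    // The four intermediate files that receive the "{name}." prefix (step 9).
    public static readonly string[] PrefixedFiles =
        { "unicharset", "inttemp", "normproto", "pffmtable" };

    // Content of the font_properties file (step 6).
    public static string FontProperties(string name)
    {
        return name + " 1 0 0 1 0";
    }

    // Executable and arguments for steps 3, 4, 5, 7 and 8, in order.
    public static List<string[]> TrainingCommands(string name)
    {
        return new List<string[]>
        {
            new[] { "tesseract.exe", string.Format("{0}.tif {0} batch.nochop makebox", name) },
            new[] { "tesseract.exe", string.Format("{0}.tif {0} nobatch box.train", name) },
            new[] { "unicharset_extractor.exe", name + ".box" }, // assumed file name
            new[] { "cntraining.exe", name + ".tr" },
            new[] { "mftraining.exe", "-F font_properties -U unicharset " + name + ".tr" },
        };
    }

    // Runs steps 3 to 10 inside workDir, which must already contain {name}.tif.
    public static void Run(string workDir, string name)
    {
        File.WriteAllText(Path.Combine(workDir, "font_properties"),
                          FontProperties(name) + Environment.NewLine);

        foreach (var cmd in TrainingCommands(name))
            Exec(workDir, cmd[0], cmd[1]);

        // Step 9: add the "{name}." prefix to the four generated files.
        foreach (var file in PrefixedFiles)
            File.Move(Path.Combine(workDir, file),
                      Path.Combine(workDir, name + "." + file));

        // Step 10: the trailing dot is part of the argument.
        Exec(workDir, "combine_tessdata.exe", name + ".");
    }

    // Starts one tool and waits for it to finish; the exit code is ignored.
    private static void Exec(string workDir, string exe, string args)
    {
        var info = new ProcessStartInfo(exe, args)
        {
            WorkingDirectory = workDir,
            UseShellExecute = false
        };
        using (var process = Process.Start(info))
        {
            process.WaitForExit();
        }
    }
}
```

A call such as TrainingSketch.Run(@"C:\train", "num") would, if every tool succeeds, leave num.traineddata in C:\train. The path and the name num are placeholders.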

Code steps: