Word Search Library

The Glasswall Embedded Engine provides deep file inspection, remediation, sanitisation, and reporting. It deconstructs a file into its structural components and builds a tree-like internal representation of the file. The engine then visits each node of the tree, inspecting, remediating, and sanitising content items before reconstructing a new file.

The Glasswall Embedded Engine can also export and import its internal representation of the file structure in an interchange format such as XML. This makes a file's internal components available to external programs for additional processing, before the file is reassembled to include the externally modified components.

The Glasswall Word Search engine builds on the export and import capability, performing text searches on a file's content and metadata. Search strings, content management, and redaction rules are configured through an XML file. A user-configurable character substitution map, defined in JSON, provides support for obfuscated text. The engine also has built-in regular expression support.

Word Search configuration

The Word Search configuration specifies the text to search for, or the regular expression to apply, and how that text should be treated when it is found in a document. The Word Search configuration is an extension of Glasswall content management.
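
The example policies further down this page use a regular expression that appears intended to match dotted IPv4 addresses. As a rough sanity check, the sketch below exercises that pattern with Python's re module. It assumes the engine's regex dialect agrees with Python's for this pattern, which this page does not state; the helper name looks_like_ipv4 is illustrative only.

```python
import re

# Pattern copied verbatim from the example policies on this page.
IPV4_PATTERN = re.compile(r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}")


def looks_like_ipv4(candidate: str) -> bool:
    """Return True when the whole string matches the example pattern."""
    return IPV4_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    # Print the verdict for a few hand-picked samples.
    for sample in ("192.168.1.1", "10.0.0.255", "256.1.1.1", "1.2.3"):
        print(sample, looks_like_ipv4(sample))
```

fullmatch is used here so that only complete strings are judged; a substring search over document text could also match inside longer digit runs.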

Example policy files & schema

Example Word Search policy files and a homoglyph dictionary can be found in the /configs/sdk_word_search folder of the release package. The Word Search XSD can be found in the /schemas/sdk_word_search folder of the release package.

Example configuration policies

Text settings

The following section shows the different textSettings that can be defined in a configuration policy. For more information on the individual settings, see the Word Search & Redaction page.

Allow
<textSearchConfig libVersion="core2">
<textList>
<textItem>
<regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
<textSetting>allow</textSetting>
</textItem>
<textItem>
<text>Glasswall</text>
<textSetting>allow</textSetting>
</textItem>
</textList>
</textSearchConfig>
Disallow
<textSearchConfig libVersion="core2">
<textList>
<textItem>
<regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
<textSetting>disallow</textSetting>
</textItem>
<textItem>
<text>Glasswall</text>
<textSetting>disallow</textSetting>
</textItem>
</textList>
</textSearchConfig>
Redact
<textSearchConfig libVersion="core2">
<textList>
<textItem>
<regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
<textSetting replacementChar="*">redact</textSetting>
</textItem>
<textItem>
<text>Glasswall</text>
<textSetting replacementChar="*">redact</textSetting>
</textItem>
</textList>
</textSearchConfig>
Require
<textSearchConfig libVersion="core2">
<textList>
<textItem>
<regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
<textSetting>require</textSetting>
</textItem>
</textList>
</textSearchConfig>
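
The policies above share one shape: a textSearchConfig root, a textList, and one textItem per search term. The sketch below generates that shape with Python's standard xml.etree.ElementTree module. The function name and the tuple layout are assumptions made for illustration; only the element and attribute names come from the examples above, and the output is not validated against the Word Search XSD.

```python
import xml.etree.ElementTree as ET

VALID_SETTINGS = ("allow", "disallow", "redact", "require")


def build_text_search_config(items, lib_version="core2"):
    """Build a <textSearchConfig> element.

    items is an iterable of (kind, pattern, setting, replacement) tuples,
    where kind is "text" or "regex" and replacement is only used for redact.
    """
    root = ET.Element("textSearchConfig", {"libVersion": lib_version})
    text_list = ET.SubElement(root, "textList")
    for kind, pattern, setting, replacement in items:
        if kind not in ("text", "regex"):
            raise ValueError(f"unknown item kind: {kind!r}")
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown textSetting: {setting!r}")
        item = ET.SubElement(text_list, "textItem")
        ET.SubElement(item, kind).text = pattern
        setting_element = ET.SubElement(item, "textSetting")
        setting_element.text = setting
        if setting == "redact" and replacement:
            setting_element.set("replacementChar", replacement)
    return root


if __name__ == "__main__":
    config = build_text_search_config([
        ("text", "Glasswall", "redact", "*"),
        ("regex", r"\d{4}", "disallow", None),
    ])
    # Serialise to a string that could be embedded in a policy file.
    print(ET.tostring(config, encoding="unicode"))
```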

System configuration

As with the Glasswall core engine, additional switches can be found in the sysConfig section. These control the behaviour of the Word Search engine when it processes an input file.

<sysConfig>
<!--interchange_type must always be specified with the value "xml"-->
<interchange_type>xml</interchange_type>
<!--Enables/disables processing of text files. False by default.-->
<enable_text_support>false</enable_text_support>
</sysConfig>
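
A policy can be checked for these two switches before it is handed to the engine. The following is a minimal sketch that parses a standalone sysConfig fragment with Python's standard library; the function name is hypothetical, and where sysConfig sits inside a complete policy file is not shown on this page, so the fragment is treated as the root element.

```python
import xml.etree.ElementTree as ET


def read_sys_config(sys_config_xml: str) -> dict:
    """Parse a <sysConfig> fragment and return its two switches."""
    root = ET.fromstring(sys_config_xml)
    if root.tag != "sysConfig":
        raise ValueError("expected a <sysConfig> root element")
    interchange = (root.findtext("interchange_type") or "").strip()
    if interchange != "xml":
        # The page states interchange_type must always be "xml".
        raise ValueError('interchange_type must be specified as "xml"')
    # enable_text_support is documented as false by default.
    text_support = (root.findtext("enable_text_support") or "false").strip().lower()
    return {
        "interchange_type": interchange,
        "enable_text_support": text_support == "true",
    }


if __name__ == "__main__":
    fragment = """<sysConfig>
    <interchange_type>xml</interchange_type>
    <enable_text_support>false</enable_text_support>
    </sysConfig>"""
    print(read_sys_config(fragment))
```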

Known limitations

  • Office files and text files cannot be processed at the same time
  • When processing text files, at least one require policy must be defined
  • interchange_type must always be specified as xml under sysConfig
  • A configuration policy that combines any of the following textSettings for the same text/regex will always process the file:
    • require and redact
    • require and disallow
    • redact and allow
    • allow and disallow
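
The combinations in the last limitation can be detected before a policy is deployed. The sketch below scans a policy for textItem entries that share the same text or regex and reports any of the four listed pairs. It is an illustrative pre-flight check written against the example policy layout on this page, not part of the Word Search API; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The four combinations listed under "Known limitations".
CONFLICTING_PAIRS = [
    frozenset({"require", "redact"}),
    frozenset({"require", "disallow"}),
    frozenset({"redact", "allow"}),
    frozenset({"allow", "disallow"}),
]


def find_conflicting_items(policy_xml: str) -> dict:
    """Map (kind, pattern) to the listed textSetting pairs defined for it."""
    root = ET.fromstring(policy_xml)
    settings_by_pattern = defaultdict(set)
    for item in root.iter("textItem"):
        pattern = item.find("text")
        if pattern is None:
            pattern = item.find("regex")
        setting = item.findtext("textSetting")
        if pattern is None or setting is None:
            continue  # skip incomplete items rather than guess
        settings_by_pattern[(pattern.tag, pattern.text)].add(setting.strip())

    conflicts = {}
    for key, settings in settings_by_pattern.items():
        hits = [tuple(sorted(pair)) for pair in CONFLICTING_PAIRS if pair <= settings]
        if hits:
            conflicts[key] = hits
    return conflicts
```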

Example homoglyph JSON config

A JSON file lets users create a mapping between characters and their corresponding homoglyphs. This allows the engine to take homoglyphs into account when generating search expressions, and so supports homographs (words that look alike) and obfuscated text.

Default homoglyph config
{
"!": "ǃⵑ",
"$": "$",
"%": "%",
"&": "ꝸ&",
"'": "`´ʹʻʼʽʾˈˊˋ˴ʹ΄՚՝י׳ߴߵᑊᛌ᾽᾿`´῾‘’‛′‵ꞌ'`𖽑𖽒",
"(": "❨❲〔﴾([",
")": "❩❳〕﴿)]",
"*": "٭⁎∗*𐌟",
"+": "᛭+𐊛",
",": "¸؍٫‚ꓹ,",
"-": "˗۔‐‑‒–⁃−➖Ⲻ﹘",
".": "٠۰܁܂․ꓸ꘎.𐩐𝅭",
"/": "᜵⁁⁄∕╱⟋⧸Ⳇ⼃〳ノ㇓丿/𝈺",
"0": "OoΟοσОоՕօסه٥ھہە۵߀०০੦૦ଠ୦௦ం౦ಂ೦ംഠ൦ං๐໐ဝ၀ჿዐᴏᴑℴⲞⲟⵔ〇ꓳꬽﮦﮧﮨﮩﮪﮫﮬﮭﻩﻪﻫﻬ0Oo𐊒𐊫𐐄𐐬𐓂𐓪𐔖𑓐𑢵𑣈𑣗𑣠𝐎𝐨𝑂𝑜𝑶𝒐𝒪𝓞𝓸𝔒𝔬𝕆𝕠𝕺𝖔𝖮𝗈𝗢𝗼𝘖𝘰𝙊𝙤𝙾𝚘𝚶𝛐𝛔𝛰𝜊𝜎𝜪𝝄𝝈𝝤𝝾𝞂𝞞𝞸𝞼𝟎𝟘𝟢𝟬𝟶𞸤𞹤𞺄",
"1": "Il|ƖǀΙІӀ׀וןا١۱ߊᛁℐℑℓⅠⅼ∣⏽Ⲓⵏꓲﺍﺎ1Il│𐊊𐌉𐌠𖼨𝐈𝐥𝐼𝑙𝑰𝒍𝓁𝓘𝓵𝔩𝕀𝕝𝕴𝖑𝖨𝗅𝗜𝗹𝘐𝘭𝙄𝙡𝙸𝚕𝚰𝛪𝜤𝝞𝞘𝟏𝟙𝟣𝟭𝟷𞣇𞸀𞺀",
"2": "ƧϨᒿꙄꛯꝚ2𝟐𝟚𝟤𝟮𝟸",
"3": "ƷȜЗӠⳌꝪꞫ3𑣊𖼻𝈆𝟑𝟛𝟥𝟯𝟹",
"4": "Ꮞ4𑢯𝟒𝟜𝟦𝟰𝟺",
"5": "Ƽ5𑢻𝟓𝟝𝟧𝟱𝟻",
"6": "бᏮⳒ6𑣕𝟔𝟞𝟨𝟲𝟼",
"7": "7𐓒𑣆𝈒𝟕𝟟𝟩𝟳𝟽",
"8": "Ȣȣ৪੪ଃ8𐌚𝟖𝟠𝟪𝟴𝟾𞣋",
"9": "৭੧୨൭ⳊꝮ9𑢬𑣌𑣖𝟗𝟡𝟫𝟵𝟿",
"A": "4ΑАᎪᗅᴀꓮꭺA𐊠𖽀𝐀𝐴𝑨𝒜𝓐𝔄𝔸𝕬𝖠𝗔𝘈𝘼𝙰𝚨𝛢𝜜𝝖𝞐",
"B": "ʙΒВвᏴᏼᗷᛒℬꓐꞴB𐊂𐊡𐌁𝐁𝐵𝑩𝓑𝔅𝔹𝕭𝖡𝗕𝘉𝘽𝙱𝚩𝛣𝜝𝝗𝞑",
"C": "ϹСᏟℂℭⅭⲤꓚC𐊢𐌂𐐕𐔜𑣩𑣲𝐂𝐶𝑪𝒞𝓒𝕮𝖢𝗖𝘊𝘾𝙲🝌",
"D": "ᎠᗞᗪᴅⅅⅮꓓꭰD𝐃𝐷𝑫𝒟𝓓𝔇𝔻𝕯𝖣𝗗𝘋𝘿𝙳",
"E": "ΕЕᎬᴇℰ⋿ⴹꓰꭼE𐊆𑢦𑢮𝐄𝐸𝑬𝓔𝔈𝔼𝕰𝖤𝗘𝘌𝙀𝙴𝚬𝛦𝜠𝝚𝞔",
"F": "ϜᖴℱꓝꞘF𐊇𐊥𐔥𑢢𑣂𝈓𝐅𝐹𝑭𝓕𝔉𝔽𝕱𝖥𝗙𝘍𝙁𝙵𝟊",
"G": "ɢԌԍᏀᏳᏻꓖꮐG𝐆𝐺𝑮𝒢𝓖𝔊𝔾𝕲𝖦𝗚𝘎𝙂𝙶",
"H": "ʜΗНнᎻᕼℋℌℍⲎꓧꮋH𐋏𝐇𝐻𝑯𝓗𝕳𝖧𝗛𝘏𝙃𝙷𝚮𝛨𝜢𝝜𝞖",
"I": "",
"J": "ͿЈᎫᒍᴊꓙꞲꭻJ𝐉𝐽𝑱𝒥𝓙𝔍𝕁𝕵𝖩𝗝𝘑𝙅𝙹",
"K": "ΚКᏦᛕKⲔꓗK𐔘𝐊𝐾𝑲𝒦𝓚𝔎𝕂𝕶𝖪𝗞𝘒𝙆𝙺𝚱𝛫𝜥𝝟𝞙",
"L": "ʟᏞᒪℒⅬⳐⳑꓡꮮL𐐛𐑃𐔦𑢣𑢲𖼖𝈪𝐋𝐿𝑳𝓛𝔏𝕃𝕷𝖫𝗟𝘓𝙇𝙻",
"M": "ΜϺМᎷᗰᛖℳⅯⲘꓟM𐊰𐌑𝐌𝑀𝑴𝓜𝔐𝕄𝕸𝖬𝗠𝘔𝙈𝙼𝚳𝛭𝜧𝝡𝞛",
"N": "ɴΝℕⲚꓠN𐔓𝐍𝑁𝑵𝒩𝓝𝔑𝕹𝖭𝗡𝘕𝙉𝙽𝚴𝛮𝜨𝝢𝞜",
"O": "0",
"P": "ΡРᏢᑭᴘᴩℙⲢꓑꮲP𐊕𝐏𝑃𝑷𝒫𝓟𝔓𝕻𝖯𝗣𝘗𝙋𝙿𝚸𝛲𝜬𝝦𝞠",
"Q": "ℚⵕQ𝐐𝑄𝑸𝒬𝓠𝔔𝕼𝖰𝗤𝘘𝙌𝚀",
"R": "ƦʀᎡᏒᖇᚱℛℜℝꓣꭱꮢR𐒴𖼵𝈖𝐑𝑅𝑹𝓡𝕽𝖱𝗥𝘙𝙍𝚁",
"S": "$ЅՏᏕᏚꓢS𐊖𐐠𖼺𝐒𝑆𝑺𝒮𝓢𝔖𝕊𝕾𝖲𝗦𝘚𝙎𝚂",
"T": "ŤΤτТтᎢᴛ⊤⟙ⲦꓔꭲT𐊗𐊱𐌕𑢼𖼊𝐓𝑇𝑻𝒯𝓣𝔗𝕋𝕿𝖳𝗧𝘛𝙏𝚃𝚻𝛕𝛵𝜏𝜯𝝉𝝩𝞃𝞣𝞽🝨",
"U": "Սሀᑌ∪⋃ꓴU𐓎𑢸𖽂𝐔𝑈𝑼𝒰𝓤𝔘𝕌𝖀𝖴𝗨𝘜𝙐𝚄",
"V": "Ѵ٧۷ᏙᐯⅤⴸꓦꛟV𐔝𑢠𖼈𝈍𝐕𝑉𝑽𝒱𝓥𝔙𝕍𝖁𝖵𝗩𝘝𝙑𝚅",
"W": "ԜᎳᏔꓪW𑣦𑣯𝐖𝑊𝑾𝒲𝓦𝔚𝕎𝖂𝖶𝗪𝘞𝙒𝚆",
"X": "ΧХ᙭ᚷⅩ╳ⲬⵝꓫꞳX𐊐𐊴𐌗𐌢𐔧𑣬𝐗𝑋𝑿𝒳𝓧𝔛𝕏𝖃𝖷𝗫𝘟𝙓𝚇𝚾𝛸𝜲𝝬𝞦",
"Y": "ΥϒУҮᎩᎽⲨꓬY𐊲𑢤𖽃𝐘𝑌𝒀𝒴𝓨𝔜𝕐𝖄𝖸𝗬𝘠𝙔𝚈𝚼𝛶𝜰𝝪𝞤",
"Z": "ΖᏃℤℨꓜZ𐋵𑢩𑣥𝐙𝑍𝒁𝒵𝓩𝖅𝖹𝗭𝘡𝙕𝚉𝚭𝛧𝜡𝝛𝞕",
"a": "@ɑαа⍺a𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊𝛂𝛼𝜶𝝰𝞪",
"b": "ƄЬᏏᖯb𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋",
"c": "ϲсᴄⅽⲥꮯc𐐽𝐜𝑐𝒄𝒸𝓬𝔠𝕔𝖈𝖼𝗰𝘤𝙘𝚌",
"d": "ԁᏧᑯⅆⅾꓒd𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍",
"e": "еҽ℮ℯⅇꬲe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎",
"f": "ſϝքẝꞙꬵf𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏𝟋",
"g": "ƍɡցᶃℊg𝐠𝑔𝒈𝓰𝔤𝕘𝖌𝗀𝗴𝘨𝙜𝚐",
"h": "һհᏂℎh𝐡𝒉𝒽𝓱𝔥𝕙𝖍𝗁𝗵𝘩𝙝𝚑",
"i": "ıɩɪ˛ͺιіӏᎥιℹⅈⅰ⍳ꙇꭵi𑣃𝐢𝑖𝒊𝒾𝓲𝔦𝕚𝖎𝗂𝗶𝘪𝙞𝚒𝚤𝛊𝜄𝜾𝝸𝞲",
"j": "ϳјⅉj𝐣𝑗𝒋𝒿𝓳𝔧𝕛𝖏𝗃𝗷𝘫𝙟𝚓",
"k": "k𝐤𝑘𝒌𝓀𝓴𝔨𝕜𝖐𝗄𝗸𝘬𝙠𝚔",
"l": "1",
"m": "m",
"n": "ոռn𝐧𝑛𝒏𝓃𝓷𝔫𝕟𝖓𝗇𝗻𝘯𝙣𝚗",
"o": "",
"p": "ρϱр⍴ⲣp𝐩𝑝𝒑𝓅𝓹𝔭𝕡𝖕𝗉𝗽𝘱𝙥𝚙𝛒𝛠𝜌𝜚𝝆𝝔𝞀𝞎𝞺𝟈",
"q": "ԛգզq𝐪𝑞𝒒𝓆𝓺𝔮𝕢𝖖𝗊𝗾𝘲𝙦𝚚",
"r": "гᴦⲅꭇꭈꮁr𝐫𝑟𝒓𝓇𝓻𝔯𝕣𝖗𝗋𝗿𝘳𝙧𝚛",
"s": "$ƽѕꜱꮪs𐑈𑣁𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜",
"t": "t𝐭𝑡𝒕𝓉𝓽𝔱𝕥𝖙𝗍𝘁𝘵𝙩𝚝",
"u": "ʋυսᴜꞟꭎꭒu𐓶𑣘𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞𝛖𝜐𝝊𝞄𝞾",
"v": "νѵטᴠⅴ∨⋁ꮩv𑜆𑣀𝐯𝑣𝒗𝓋𝓿𝔳𝕧𝖛𝗏𝘃𝘷𝙫𝚟𝛎𝜈𝝂𝝼𝞶",
"w": "ɯѡԝաᴡꮃw𑜊𑜎𑜏𝐰𝑤𝒘𝓌𝔀𝔴𝕨𝖜𝗐𝘄𝘸𝙬𝚠",
"x": "×хᕁᕽ᙮ⅹ⤫⤬⨯x𝐱𝑥𝒙𝓍𝔁𝔵𝕩𝖝𝗑𝘅𝘹𝙭𝚡",
"y": "ɣʏγуүყᶌỿℽꭚy𑣜𝐲𝑦𝒚𝓎𝔂𝔶𝕪𝖞𝗒𝘆𝘺𝙮𝚢𝛄𝛾𝜸𝝲𝞬",
"z": "ᴢꮓz𑣄𝐳𝑧𝒛𝓏𝔃𝔷𝕫𝖟𝗓𝘇𝘻𝙯𝚣",
"£": "₤",
"©": "Ⓒ",
"®": "Ⓡ"
}
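
To illustrate how a mapping like this can feed a search expression, the sketch below expands each character of a search term into a character class containing the character and its mapped homoglyphs. This is one possible approach under stated assumptions, not the engine's actual algorithm, which this page does not describe. The function name is hypothetical and the sample map is a small subset of the default config above.

```python
import json
import re


def build_homoglyph_pattern(term: str, homoglyph_map: dict):
    """Compile a regex matching term with any mapped homoglyph substituted."""
    parts = []
    for character in term:
        candidates = character + homoglyph_map.get(character, "")
        # Escape every candidate so symbols such as "$" or "-" stay literal.
        parts.append("[" + "".join(re.escape(c) for c in candidates) + "]")
    return re.compile("".join(parts))


# Subset of the default config, loaded the way a JSON file would be.
SAMPLE_MAP = json.loads('{"a": "@ɑ", "l": "1", "s": "$"}')

if __name__ == "__main__":
    pattern = build_homoglyph_pattern("Glasswall", SAMPLE_MAP)
    for text in ("Glasswall", "Gl@$sw@11", "Glxsswall"):
        print(text, bool(pattern.search(text)))
```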

Example analysis report

Below is an example of the analysis report produced when the search string is set to 'Glasswall', regardless of the textSetting used. It includes an ItemMatchCount for each pattern matched in the given file.

<gw:WordItem>
<gw:Name>Glasswall</gw:Name>
<gw:ItemMatchCount>1</gw:ItemMatchCount>
<gw:Locations>
<gw:Location>
<gw:Offset>463</gw:Offset>
<gw:Page>0</gw:Page>
<gw:Paragraph>0</gw:Paragraph>
</gw:Location>
</gw:Locations>
</gw:WordItem>
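
A report fragment like this can be summarised with a few lines of Python. The sketch below matches elements by local name, so it does not depend on the namespace URI bound to the gw prefix, which this page does not show. The wrapper element gw:Report and the URI urn:example:gw in the sample are placeholders invented so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def summarise_word_items(report_xml: str) -> list:
    """Return name, match count, and locations for every WordItem."""
    root = ET.fromstring(report_xml)
    summary = []
    for element in root.iter():
        if local_name(element.tag) != "WordItem":
            continue
        fields = {local_name(child.tag): child for child in element}
        locations = [
            {local_name(field.tag): int(field.text) for field in location}
            for location in element.iter()
            if local_name(location.tag) == "Location"
        ]
        summary.append({
            "name": fields["Name"].text,
            "match_count": int(fields["ItemMatchCount"].text),
            "locations": locations,
        })
    return summary


# Placeholder root and namespace URI (assumptions, see lead-in).
SAMPLE_REPORT = """<gw:Report xmlns:gw="urn:example:gw">
<gw:WordItem>
<gw:Name>Glasswall</gw:Name>
<gw:ItemMatchCount>1</gw:ItemMatchCount>
<gw:Locations>
<gw:Location>
<gw:Offset>463</gw:Offset>
<gw:Page>0</gw:Page>
<gw:Paragraph>0</gw:Paragraph>
</gw:Location>
</gw:Locations>
</gw:WordItem>
</gw:Report>"""

if __name__ == "__main__":
    print(summarise_word_items(SAMPLE_REPORT))
```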

API functions

Status

The GwWordSearch and GwWordSearchDone APIs return a Status indicating the outcome of the API call. The GwWordSearchTranslateStatus API returns a description for a given Status.

Enumerator | Value | Description
ws_disallowedItemFound | -1024 | An item disallowed by the policy was found in the file.
ws_requiredItemNotFound | -1025 | An item required by the policy was not found in the file.
ws_illegalActionRedact | -1026 | The redact action was specified but the filetype does not support redaction.
ws_illegalActionRequire | -1027 | The require action was specified but the filetype does not support require.
ws_illegalActionNoRequire | -1028 | The require action was not specified but the filetype needs one.
ws_filetypeUnsupported | -1029 | The filetype is not supported by Word Search.
eFail | 0 | General failure or other unspecified error.
eSuccess | 1 | The operation succeeded.
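
For callers that receive the status as a plain integer, the table can be mirrored in a small lookup. The sketch below is a Python rendering of the values above; the helper names are hypothetical, and the strings it returns are taken from this table, not from GwWordSearchTranslateStatus, whose exact output is not shown on this page.

```python
from enum import IntEnum


class Status(IntEnum):
    ws_disallowedItemFound = -1024
    ws_requiredItemNotFound = -1025
    ws_illegalActionRedact = -1026
    ws_illegalActionRequire = -1027
    ws_illegalActionNoRequire = -1028
    ws_filetypeUnsupported = -1029
    eFail = 0
    eSuccess = 1


DESCRIPTIONS = {
    Status.ws_disallowedItemFound: "An item disallowed by the policy was found in the file.",
    Status.ws_requiredItemNotFound: "An item required by the policy was not found in the file.",
    Status.ws_illegalActionRedact: "The redact action was specified but the filetype does not support redaction.",
    Status.ws_illegalActionRequire: "The require action was specified but the filetype does not support require.",
    Status.ws_illegalActionNoRequire: "The require action was not specified but the filetype needs one.",
    Status.ws_filetypeUnsupported: "The filetype is not supported by Word Search.",
    Status.eFail: "General failure or other unspecified error.",
    Status.eSuccess: "The operation succeeded.",
}


def is_success(code: int) -> bool:
    """Only eSuccess (1) counts as success; zero and negatives are failures."""
    return code == Status.eSuccess


def describe_status(code: int) -> str:
    """Return 'name: description' for a known code, or a fallback message."""
    try:
        status = Status(code)
    except ValueError:
        return f"Unknown status code {code}"
    return f"{status.name}: {DESCRIPTIONS[status]}"
```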

C++

Each API returns a Status, defined as follows:

enum Status {
ws_disallowedItemFound = -1024,
ws_requiredItemNotFound = -1025,
ws_illegalActionRedact = -1026,
ws_illegalActionRequire = -1027,
ws_illegalActionNoRequire = -1028,
ws_filetypeUnsupported = -1029,
eFail = 0,
eSuccess = 1,
};

C#

To integrate Glasswall Word Search in C#, the Glasswall Word Search C# wrapper is required. Each API returns a WordSearchStatus type, defined as follows:

/// <summary>
/// Indicates whether the Word Search process was successful (WordSearchStatus.Success)
/// or not (WordSearchStatus.Fail). Zero or negative values indicate a failure.
/// </summary>
public enum WordSearchStatus
{
DisallowedItemFound = -1024,
RequiredItemNotFound = -1025,
IllegalActionRedact = -1026,
IllegalActionRequire = -1027,
IllegalActionNoRequire = -1028,
FiletypeUnsupported = -1029,
Fail = 0,
Success
}

Java

To integrate Glasswall Word Search in Java, the Glasswall Word Search Java wrapper is required. Each API returns a GlasswallWordSearchResult type, defined as follows:

package com.glasswallsolutions;

/**
* Class used to hold the results from a Word Search process.
*/
public class GlasswallWordSearchResult
{
/**
* The XML analysis report
*/
public String report;

/**
* The processed document
*/
public byte[] outputDocument;

/**
* boolean indicating whether the process was successful (true) or not (false)
*/
public boolean success;

public GlasswallWordSearchResult()
{
report = null;
outputDocument = null;
success = false;
}
}

Python

To integrate Glasswall Word Search in Python, the Glasswall Python wrapper is required. Each API returns a generic GwReturnObj object containing the attributes "status" (int), "output_file" (bytes), and "output_report" (bytes). The int status is defined as follows:

# glasswall\libraries\word_search\successes.py

class Success(WordSearchSuccess):
    """ WordSearch success code 1. """
    pass


success_codes = {
    1: Success,
}

# glasswall\libraries\word_search\errors.py

class UnknownErrorCode(WordSearchError):
    """ Unknown error code. """
    pass


class Fail(WordSearchError):
    """ WordSearch error code 0. """
    pass


class DisallowedItemFound(WordSearchError):
    """ WordSearch error code -1024. Item disallowed by policy found in file. """
    pass


class RequiredItemNotFound(WordSearchError):
    """ WordSearch error code -1025. Item required by policy not found in file. """
    pass


class IllegalActionRedact(WordSearchError):
    """ WordSearch error code -1026. Redact action specified but filetype doesn't support redaction. """
    pass


class IllegalActionRequire(WordSearchError):
    """ WordSearch error code -1027. Require action specified but filetype doesn't support require. """
    pass


class IllegalActionNoRequire(WordSearchError):
    """ WordSearch error code -1028. Require action not specified but filetype needs one. """
    pass


class FiletypeUnsupported(WordSearchError):
    """ WordSearch error code -1029. Filetype supported by Editor but not by Word Search. """
    pass


error_codes = {
    0: Fail,
    -1024: DisallowedItemFound,
    -1025: RequiredItemNotFound,
    -1026: IllegalActionRedact,
    -1027: IllegalActionRequire,
    -1028: IllegalActionNoRequire,
    -1029: FiletypeUnsupported,
}

JavaScript

To integrate Glasswall Word Search in JavaScript, the Glasswall Word Search JavaScript wrapper is required. Each API returns a WordSearchStatus type, defined as follows:

/**
* Used to indicate whether the Word Search process was successful or not
*/
export const enum WordSearchStatus {
ws_disallowedItemFound = -1024,
ws_requiredItemNotFound = -1025,
ws_illegalActionRedact = -1026,
ws_illegalActionRequire = -1027,
ws_illegalActionNoRequire = -1028,
ws_filetypeUnsupported = -1029,
eFail = 0,
eSuccess = 1,
}

GwWordSearch

This is used to invoke the Word Search engine, process the specified input file, and produce an output file together with a Word Search analysis report.

C++

Status GwWordSearch(
void* input_buffer,
size_t input_buffer_len,
void** output_buffer,
size_t* output_buffer_len,
void** output_report_buffer,
size_t* output_report_buffer_len,
const char* homoglyphs,
const char* xml_config_string
)

Name | Type | Direction | Description
input_buffer | void * | In | Pointer to a buffer containing the input file to be processed
input_buffer_len | size_t | In | Size of the input file buffer
output_buffer | void ** | Out | Pointer to a pointer to a buffer that will be populated with the processed file. This buffer is allocated by the Word Search engine
output_buffer_len | size_t * | Out | Pointer to the size of the output file buffer. This is set by the Word Search engine
output_report_buffer | void ** | Out | Pointer to a pointer to a buffer that will be populated with the Word Search analysis report. This buffer is allocated by the Word Search engine
output_report_buffer_len | size_t * | Out | Pointer to the size of the Word Search analysis report. This is set by the Word Search engine
homoglyphs | const char * | In | Pointer to a buffer containing the homoglyphs file. This buffer must be null-terminated
xml_config_string | const char * | In | Pointer to a buffer containing the content management XML file. This buffer must be null-terminated

C#

To integrate Glasswall Word Search in C#, the Glasswall Word Search C# wrapper is required.

public WordSearchStatus GwWordSearch(
byte[] inputBuffer,
out byte[] outputFileBuffer,
out String outputAnalysisReport,
string homoglyphs,
string xmlConfigString
)

Name | Type | Direction | Description
inputBuffer | byte[] | In | Buffer containing the document to be processed
outputFileBuffer | out byte[] | Out | Result buffer that will contain the processed document
outputAnalysisReport | out string | Out | Output analysis report from the Word Search process
homoglyphs | string | In | JSON document containing the homoglyph mapping
xmlConfigString | string | In | The content management XML policy

Java

To integrate Glasswall Word Search in Java, the Glasswall Word Search Java wrapper is required.


public native GlasswallWordSearchResult wordSearch(
byte[] inputDocument,
String homoglyphs,
String xmlConfig
)

Name | Type | Direction | Description
inputDocument | byte[] | In | Buffer containing the document to be processed
homoglyphs | String | In | JSON document containing the homoglyph mapping
xmlConfig | String | In | The content management XML policy

Note: Unlike some of the other supported languages, all outputs are returned in the GlasswallWordSearchResult object for Java.

Python

To integrate Glasswall Word Search in Python, the Glasswall Python wrapper is required.

# glasswall\libraries\word_search\word_search.py

def redact_file(self, input_file: Union[str, bytes, bytearray, io.BytesIO], content_management_policy: Union[str, bytes, bytearray, io.BytesIO], output_file: Union[None, str] = None, output_report: Union[None, str] = None, homoglyphs: Union[None, str, bytes, bytearray, io.BytesIO] = None, raise_unsupported: bool = True):
    """ Redacts text from input_file using the given content_management_policy and homoglyphs file, optionally writing the redacted file and analysis report to the paths specified by output_file and output_report.

    Args:
        input_file (Union[str, bytes, bytearray, io.BytesIO]): The input file path or bytes.
        content_management_policy (Union[str, bytes, bytearray, io.BytesIO]): The content management policy to apply.
        output_file (Union[None, str], optional): Default None. If str, write output_file to that path.
        output_report (Union[None, str], optional): Default None. If str, write output_report to that path.
        homoglyphs (Union[None, str, bytes, bytearray, io.BytesIO], optional): Default None. The homoglyphs json file path or bytes.
        raise_unsupported (bool, optional): Default True. Raise exceptions when Glasswall encounters an error. Fail silently if False.

    Returns:
        gw_return_object (glasswall.GwReturnObj): An instance of class glasswall.GwReturnObj containing attributes: "status" (int), "output_file" (bytes), "output_report" (bytes)
    """


def redact_directory(self, input_directory: str, content_management_policy: Union[str, bytes, bytearray, io.BytesIO, glasswall.content_management.policies.policy.Policy], output_directory: Optional[str] = None, output_report_directory: Optional[str] = None, homoglyphs: Union[None, str, bytes, bytearray, io.BytesIO] = None, raise_unsupported: bool = True):
    """ Redacts all files in a directory and its subdirectories using the given content_management_policy and homoglyphs file. The redacted files are written to output_directory maintaining the same directory structure as input_directory.

    Args:
        input_directory (str): The input directory containing files to redact.
        output_directory (str): The output directory where the redacted files will be written.
        output_report_directory (Optional[str], optional): Default None. If str, the output directory where analysis reports for each redacted file will be written.
        content_management_policy (Union[str, bytes, bytearray, io.BytesIO]): The content management policy to apply.
        homoglyphs (Union[None, str, bytes, bytearray, io.BytesIO], optional): Default None. The homoglyphs file path, str, or bytes.
        raise_unsupported (bool, optional): Default True. Raise exceptions when Glasswall encounters an error. Fail silently if False.

    Returns:
        redacted_files_dict (dict): A dictionary of file paths relative to input_directory, and glasswall.GwReturnObj with attributes: "status" (int), "output_file" (bytes), "output_report" (bytes)
    """

Note: Unlike some of the other supported languages, all outputs are returned in the GwReturnObj object for Python.
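
As an illustration of consuming that return object, the sketch below wraps a call to redact_file using only the keyword arguments and attributes documented above. The helper redact_and_report is hypothetical, and FakeWordSearch is a stand-in invented so the example runs without the native library; in real use the first argument would be a Word Search object from the Glasswall Python wrapper, whose construction is not covered on this page.

```python
from types import SimpleNamespace


def redact_and_report(word_search, input_file: bytes, policy_xml: bytes, homoglyphs_json=None) -> dict:
    """Call redact_file and flatten the documented GwReturnObj attributes."""
    result = word_search.redact_file(
        input_file=input_file,
        content_management_policy=policy_xml,
        homoglyphs=homoglyphs_json,
        # Inspect the status ourselves instead of relying on exceptions.
        raise_unsupported=False,
    )
    return {
        "ok": result.status == 1,  # 1 is the documented success code
        "status": result.status,
        "output_file": result.output_file,
        "output_report": result.output_report,
    }


class FakeWordSearch:
    """Stand-in with the documented redact_file signature (illustration only)."""

    def redact_file(self, input_file, content_management_policy, output_file=None,
                    output_report=None, homoglyphs=None, raise_unsupported=True):
        return SimpleNamespace(status=1, output_file=b"redacted", output_report=b"<report/>")


if __name__ == "__main__":
    print(redact_and_report(FakeWordSearch(), b"document bytes", b"<config/>"))
```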

JavaScript


/**
* Perform word search on input buffer, using the applied config and homoglyphs
* @param {Buffer} inputBuffer A buffer containing the contents of the document to be processed.
* @param {String} homoglyphs A homoglyphs file that will be used as part of the Word Search process (UTF-8 string).
* @param {String} configXml The content management XML policy (utf-8 string).
* @returns {WordSearchResult} The result from Word Search.
*/
wordSearch(inputBuffer: Buffer, homoglyphs: string, configXml: string): WordSearchResult

Note: Unlike some of the other supported languages, all outputs are returned in the WordSearchResult object for JavaScript.

GwWordSearchDone

This is used to release any resources allocated by the Word Search engine. It must be called after every call to GwWordSearch; otherwise memory leaks will occur.

This API call is only required in C++.

C++

Status GwWordSearchDone(
void** output_buffer,
size_t* output_buffer_len,
void** output_report_buffer,
size_t* output_report_buffer_len)

Name | Type | Direction | Description
output_buffer | void ** | Out | Pointer to a pointer to the buffer containing the processed file, which will be freed by the Word Search library
output_buffer_len | size_t * | Out | Pointer to the size of the output file buffer
output_report_buffer | void ** | Out | Pointer to a pointer to the buffer containing the Word Search analysis report, which will be freed by the Word Search library
output_report_buffer_len | size_t * | Out | Pointer to the size of the Word Search analysis report

Other languages

For all languages covered by a Glasswall wrapper, the GwWordSearchDone API function is called internally by the wrapper, so the API is not exposed to the user.

GwWordSearchVersion

This is used to retrieve the version number of the current library.

C++

const char* GwWordSearchVersion(void)

GwWordSearchTranslateStatus

Translates the given error code into a user-friendly error message.

C++

const char* GwWordSearchTranslateStatus(Status errorCode)

Name | Type | Direction | Description
errorCode | Status | In | The return code to be translated

Common issues

Word Search does not process files

When running Word Search, make sure all Embedded Engine libraries are in the same directory, which must also be set as the current working directory. Glasswall looks for its dependencies in the current working directory, and if they are not found the file will not be processed correctly. Also make sure a valid licence key is available.
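
One way to satisfy the working-directory requirement from a script is to switch into the library directory only for the duration of the call. The sketch below does this with a small context manager built on the standard library. The name working_directory is hypothetical, and the sketch does not check which library files are present, since their names are not listed on this page.

```python
import contextlib
import os
from pathlib import Path


@contextlib.contextmanager
def working_directory(library_dir):
    """Temporarily make library_dir the current working directory."""
    target = Path(library_dir).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"library directory not found: {target}")
    previous = Path.cwd()
    os.chdir(target)
    try:
        yield target
    finally:
        # Always restore the caller's directory, even if processing fails.
        os.chdir(previous)
```

A caller would wrap the Word Search invocation in a with working_directory(...) block, so that dependency lookup happens while the library directory is current.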

Example usage

Below is an example application that takes an input file, processes it with the Glasswall Word Search engine, and produces an output file together with a Word Search analysis report. The example application expects the following command-line parameters:

  1. Path to the content management configuration XML.
  2. Path to the homoglyphs file.
  3. Path to the input file to be processed.
  4. Path to the output file where the processed file will be saved.

C++

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstddef>
#include <cstdint> // uint8_t
#include <stdexcept>

#include "api.h"

using namespace std;

// Read the file into a buffer
vector<uint8_t> readFile(ifstream &fileHandle, const string &filePath, bool nullTerminator)
{
fileHandle.exceptions(ifstream::failbit | ifstream::badbit);
fileHandle.open(filePath.c_str(), ios::binary | ios::ate);

vector<uint8_t> data;
streamsize size = fileHandle.tellg();
fileHandle.seekg(0, ios::beg);

data.resize(static_cast<size_t>(size)); // exactly the file length, so size() is not off by one
fileHandle.read(reinterpret_cast<char *>(data.data()), size);

if (nullTerminator)
{
data.push_back(0);
}

return data;
}

int main(int argc, char **argv)
{
if (argc != 5)
{
cerr << "Usage: <Path to XML Config> <Path to Homoglyphs> <Input file> <Output file>" << endl;
return -1;
}

// Read commandline arguments
string xmlFilePath(argv[1]);
string homoglyphsFilePath(argv[2]);
string inputFilePath(argv[3]);
string outputFilePath(argv[4]);

// Create file handles for input files
ifstream xmlFileHandle;
ifstream homoglyphsFileHandle;
ifstream inputFileHandle;

// Read files into buffers
vector<uint8_t> xmlBuffer = readFile(xmlFileHandle, xmlFilePath, true); // Buffer containing the XML content management settings. This is null terminated
vector<uint8_t> homoglyphsBuffer = readFile(homoglyphsFileHandle, homoglyphsFilePath, true); // Buffer containing the homoglyphs. This is null terminated
vector<uint8_t> inputBuffer = readFile(inputFileHandle, inputFilePath, false); // Buffer containing the input file to be processed

// Create variables for output buffers
void * outputBuffer = nullptr; // Output buffer for processed file
size_t outputBufferSize = 0; // Output buffer size
void * outputReportBuffer = nullptr; // Output buffer for analysis report file
size_t outputReportBufferSize = 0; // Output analysis report buffer size

// Run Word Search and redact
Status status = GwWordSearch(inputBuffer.data(), inputBuffer.size(), &outputBuffer, &outputBufferSize, &outputReportBuffer, &outputReportBufferSize, reinterpret_cast<const char*>(homoglyphsBuffer.data()), reinterpret_cast<const char *>(xmlBuffer.data()));

if (status == Status::eSuccess)
{
// Write out the processed output file if the Word Search and redact was successful
ofstream outputFileHandle(outputFilePath, ios::binary | ios::trunc);

if (outputFileHandle.is_open())
{
outputFileHandle.write(static_cast<const char *>(outputBuffer), outputBufferSize);
}

outputFileHandle.close();
}

// Write out the analysis report file
ofstream analysisFileHandle(outputFilePath + ".xml", ios::binary | ios::trunc);

if (analysisFileHandle.is_open())
{
analysisFileHandle.write(static_cast<const char *>(outputReportBuffer), outputReportBufferSize);
}

analysisFileHandle.close();

// Call done to release any allocated resources
GwWordSearchDone(&outputBuffer, &outputBufferSize, &outputReportBuffer, &outputReportBufferSize);

return 0;
}

C#

using System;
using System.IO;

namespace glasswall.word.search.csharp.testing
{
internal class Program
{
static void Main(string[] args)
{
Console.WriteLine("Word Search test");
if (args.Length != 4)
{
Console.WriteLine("usage: <Xml Config> <Homoglyphs> <Input Directory> <OutputDirectory>");
Console.WriteLine("Parameters specified: \n{0}", string.Join("\n", args));
return;
}

string xmlConfigPath = args[0];
string homoglyphsPath = args[1];
string inputDirectory = args[2];
string outputDirectory = args[3];

if (!File.Exists(xmlConfigPath))
{
Console.Error.WriteLine("Xml config does not exist: {0}", xmlConfigPath);
return;
}

if (!File.Exists(homoglyphsPath))
{
Console.Error.WriteLine("Homoglyphs does not exist: {0}", homoglyphsPath);
return;
}

if (!Directory.Exists(inputDirectory))
{
Console.Error.WriteLine("Input directory does not exist: {0}", inputDirectory);
return;
}

Directory.CreateDirectory(outputDirectory);

using (FileStream fileStream = new FileStream(Path.Combine(outputDirectory, "ProcessLog.txt"), FileMode.OpenOrCreate, FileAccess.Write))
{
using (StreamWriter writer = new StreamWriter(fileStream))
{
writer.WriteLine("> Word Search Library version: {0}", GlasswallWordSearch.GwWordSearchVersion());

string xmlConfig = File.ReadAllText(xmlConfigPath);
string homoglyphs = File.ReadAllText(homoglyphsPath);

foreach (string path in Directory.EnumerateFiles(inputDirectory, "*", SearchOption.AllDirectories))
{
writer.WriteLine("> Processing file: {0}", path);
string inputDirectoryPath = path.Substring(inputDirectory.Length + 1);
string directory = Path.Combine(outputDirectory, inputDirectoryPath);
Directory.CreateDirectory(directory);
processFile(path, directory, homoglyphs, xmlConfig);
}
}
}

return;
}
static void WriteAllBytes(string path, byte[] data)
{
if (data == null)
{
File.Create(path).Dispose(); // create an empty file and release the handle
}
else
{
File.WriteAllBytes(path, data);
}
}
public static void processFile(string inputFile, string outputDirectory, string homoglyphs, string xmlConfig)
{

using (FileStream fileStream = new FileStream(Path.Combine(outputDirectory, Path.GetFileName(inputFile) + ".log"), FileMode.OpenOrCreate, FileAccess.Write))
{
using (StreamWriter writer = new StreamWriter(fileStream))
{
// Word Search
writer.WriteLine(">> Run Word Search");
byte[] inputFileBuffer = File.ReadAllBytes(inputFile);
byte[] outputBuffer, outputReportBuffer;
GlasswallWordSearch.WordSearchStatus status = GlasswallWordSearch.GwWordSearch(inputFileBuffer, out outputBuffer, out outputReportBuffer, homoglyphs, xmlConfig);
writer.WriteLine("Status is: {0}", status);

if (outputBuffer != null)
{
WriteAllBytes(Path.Combine(outputDirectory, Path.GetFileName(inputFile)), outputBuffer);
}

if (outputReportBuffer != null)
{
WriteAllBytes(Path.Combine(outputDirectory, Path.GetFileName(inputFile)) + ".xml", outputReportBuffer);
}
}
}
}
}
}

Java

package com.glasswallsolutions;

import java.lang.System;
import java.io.*;
import com.glasswallsolutions.*;
import java.nio.file.Paths;

public class MainTest {

public static byte[] readAllBytes(InputStream inputStream) throws IOException
{
final int bufLen = 4 * 0x400; // 4KB
byte[] buf = new byte[bufLen];
int readLen;

try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
while ((readLen = inputStream.read(buf, 0, bufLen)) != -1)
outputStream.write(buf, 0, readLen);

return outputStream.toByteArray();
}
}

public static void main(String[] args) throws Exception {
if (args.length != 4)
{
System.out.println("Usage: <Input Directory> <Output Directory> <Homoglyphs File> <Config XML>");
System.exit(-1);
}

File inputDirectory = new File(args[0]);
File outputDirectory = new File(args[1]);
outputDirectory.delete();
outputDirectory.mkdir();
String homoglyphsFile = args[2];
String configXmlFile = args[3];

String homoglyphs = null;
String configXML = null;

GlasswallWordSearch glasswallWordSearch = new GlasswallWordSearch();

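try(FileInputStream homoglyphsInputStream = new FileInputStream(homoglyphsFile))
{
// Decode explicitly as UTF-8 so the homoglyph characters do not depend on the platform default charset
homoglyphs = new String(readAllBytes(homoglyphsInputStream), java.nio.charset.StandardCharsets.UTF_8);
}

try(FileInputStream configXmlInputStream = new FileInputStream(configXmlFile))
{
configXML = new String(readAllBytes(configXmlInputStream), java.nio.charset.StandardCharsets.UTF_8);
}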

System.out.println("Word Search version: " + glasswallWordSearch.version());

for (File inputFile : inputDirectory.listFiles())
{
try
{
System.out.println("Processing file: " + inputFile.getAbsolutePath());

File fileOutputDirectory = new File(Paths.get(outputDirectory.getAbsolutePath(), inputFile.getName()).toString());
fileOutputDirectory.mkdir();
String fileOutputPath = Paths.get(fileOutputDirectory.getAbsolutePath(), inputFile.getName()).toString();

try(FileInputStream inputStream = new FileInputStream(inputFile))
{
byte[] fileData = readAllBytes(inputStream);

GlasswallWordSearchResult result = glasswallWordSearch.wordSearch(fileData, homoglyphs, configXML);

System.out.println("Status: " + result.success);

if (result.outputDocument != null)
{
try(FileOutputStream fileOutputStream = new FileOutputStream(fileOutputPath))
{
fileOutputStream.write(result.outputDocument);
}
}

if (result.report != null)
{
try(FileOutputStream fileOutputStream = new FileOutputStream(fileOutputPath + ".xml"))
{
fileOutputStream.write(result.report.getBytes(java.nio.charset.StandardCharsets.UTF_8));
}
}
}
}
catch(Exception ex)
{
System.err.println("Exception occurred: " + ex.getMessage());
ex.printStackTrace(System.err);

}
}
}
}

Python

For further examples, see Python Word Search & Redaction.

JavaScript


import fs from 'fs';
import path from 'path';
import { GlasswallWordSearch, GlasswallWordSearchNative, WordSearchResult, WordSearchStatus } from '../index'

let main = function()
{
const args = process.argv;

if (args.length === 7)
{
let wordSearchDllPath = path.resolve(args[2]);
let inputDirectory = path.resolve(args[3]);
let outputDirectory = path.resolve(args[4]);
let homoglyphsPath = path.resolve(args[5]);
let configXmlPath = path.resolve(args[6]);

let handler = new GlasswallWordSearchNative(wordSearchDllPath, { enableLogging: true});
let glasswallWordSearch = new GlasswallWordSearch(handler);
console.log("Glasswall Word Search version: " + glasswallWordSearch.version())

if (!fs.existsSync(inputDirectory))
{
console.log('Input Directory does not exist: ' + inputDirectory);
process.exit(-1);
}

if (!fs.existsSync(homoglyphsPath))
{
console.log('Homoglyphs file does not exist: ' + homoglyphsPath);
process.exit(-1);
}

if (!fs.existsSync(configXmlPath))
{
console.log('Config XML file does not exist: ' + configXmlPath);
process.exit(-1);
}

let homoglyphs = fs.readFileSync(homoglyphsPath, 'utf8');
let configXml = fs.readFileSync(configXmlPath , 'utf8');

fs.mkdirSync(outputDirectory, {recursive: true});

fs.readdirSync(inputDirectory).forEach(file => {
try
{
let fullFilePath = path.join(inputDirectory, file);

if (fs.statSync(fullFilePath).isFile())
{
console.log('Processing file: ' + fullFilePath);
let outputFileDirectory = path.join(outputDirectory, file);
fs.mkdirSync(outputFileDirectory);
let inputBuffer = fs.readFileSync(fullFilePath);
let wordSearchResult = glasswallWordSearch.wordSearch(inputBuffer, homoglyphs, configXml);
console.log("Status: " + wordSearchResult.status);

if (wordSearchResult.outputBuffer != undefined && wordSearchResult.outputBuffer != null)
{
fs.writeFileSync(path.join(outputFileDirectory, file), wordSearchResult.outputBuffer);
}

if (wordSearchResult.analysisXmlReport != undefined && wordSearchResult.analysisXmlReport != null)
{
fs.writeFileSync(path.join(outputFileDirectory, file + ".xml"), wordSearchResult.analysisXmlReport);
}
}
}
catch(error)
{
console.log("Exception occurred: " + error);
console.trace(error);
}

})
}
else
{
console.log("Usage: Application <Library File> <Input Directory> <Output Directory> <Homoglyphs File> <Config XML>");
process.exit(-1);
}
}

if (require.main === module){
main();
}