The implementation will be in Python, but at least the initial stages could be written in BASIC or Tcl instead. As we move on to the more complex parts we will make increasing use of Python's built-in data structures, so BASIC will become harder to use, although Tcl will remain an option. Finally, the OO aspects will apply only to Python.
Some additional features could be implemented but are left as exercises for the reader. We start from a simple word counter:

import string

def numwords(s):
    words = string.split(s)   # split the line on whitespace
    return len(words)

inp = open("menu.txt","r")
total = 0

# accumulate totals for each line
for line in inp.readlines():
    total = total + numwords(line)
print "File had %d words" % total

inp.close()
We need to add a line count and a character count. The line count is easy: since we loop over each line, we only need a variable that is incremented on each iteration of the loop.
The character count is only marginally harder. One option is to iterate over the list of words, adding their lengths to yet another variable; the listing below instead adds the length of each whole line, so that spaces are counted too.
We also need to make the program more general purpose by reading the name of the file from the command line or, if none is provided, by prompting the user for it. (An alternative strategy would be to read from standard input, which is what the real wc does.)
So the final wc looks like this:
import sys, string

# Get the file name either from the command line or from the user
if len(sys.argv) != 2:
    name = raw_input("Enter the file name: ")
else:
    name = sys.argv[1]

inp = open(name,"r")

# initialise counters to zero; which also creates the variables
words = 0
lines = 0
chars = 0

for line in inp.readlines():
    lines = lines + 1

    # Break into a list of words and count them
    list = string.split(line)
    words = words + len(list)
    # Use the original line length, which includes spaces etc.
    # (accumulate it; assigning len(line) would keep only the last line)
    chars = chars + len(line)

print "%s has %d lines, %d words and %d characters" % (name, lines, words, chars)
inp.close()
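The alternative strategy mentioned earlier, reading from standard input as the real wc does, could be sketched roughly like this. It is only an illustration, written in Python 3 syntax (unlike the older listings in this text); the function name count_stream is invented here, and a StringIO object stands in for sys.stdin so the sketch can be run without any input file.

```python
import io

def count_stream(stream):
    """Return (lines, words, chars) for any iterable of text lines.

    Passing sys.stdin here would count whatever is piped into the program.
    """
    lines = words = chars = 0
    for line in stream:
        lines += 1
        words += len(line.split())
        chars += len(line)      # whole line, spaces and newline included
    return lines, words, chars

# A StringIO object behaves like an open text file or like sys.stdin
sample = io.StringIO("spam and eggs\n\nham\n")
result = count_stream(sample)
print("%d lines, %d words and %d characters" % result)
```

Because the function accepts any iterable of lines, the same code would work for an open file, for sys.stdin, or for a plain list of strings.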
If you are familiar with the Unix wc command you will know that you can pass it a wildcard filename to get statistics for every matching file as well as a grand total. This program only caters for plain filenames. If you want to extend it to handle wildcards, take a look at the glob module: build a list of names and then simply iterate over that list of files. You will need temporary counters for each file and cumulative counters for the grand totals. Or you could use a dictionary instead...
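One way the wildcard extension might look is sketched below, using glob.glob to build the list of names and a dictionary to hold the per-file counters. This is a sketch under assumptions rather than the finished exercise: it uses Python 3 syntax (unlike the older listings in this text), the names count_file and count_matching are invented for illustration, and the two small demo files are created in a temporary directory only so that the example can run anywhere.

```python
import glob
import os
import tempfile

def count_file(path):
    """Return (lines, words, chars) for one file."""
    lines = words = chars = 0
    with open(path, "r") as inp:
        for line in inp:
            lines += 1
            words += len(line.split())
            chars += len(line)
    return lines, words, chars

def count_matching(pattern):
    """Return ({filename: (lines, words, chars)}, grand_totals)."""
    per_file = {}
    totals = [0, 0, 0]
    for name in sorted(glob.glob(pattern)):
        counts = count_file(name)
        per_file[name] = counts             # counters for this file
        for i, value in enumerate(counts):
            totals[i] += value              # cumulative counters
    return per_file, tuple(totals)

# Demo: create two small files, then count everything matching *.txt
demo_dir = tempfile.mkdtemp()
for name, text in (("a.txt", "one two\n"), ("b.txt", "three\nfour five six\n")):
    with open(os.path.join(demo_dir, name), "w") as out:
        out.write(text)

per_file, totals = count_matching(os.path.join(demo_dir, "*.txt"))
for name in sorted(per_file):
    print(os.path.basename(name), per_file[name])
print("total", totals)
```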
Thinking about it a little further, it becomes evident that if we simply collect the words and the punctuation characters, we can analyse the latter to count sentences, clauses and so on (by defining what we consider a sentence or clause in terms of punctuation marks).
This means we only need to iterate over the file once and then iterate over the punctuation, which is a much smaller list.
Let's sketch that in pseudo-code:

foreach line in file:
    increment line count
    if line empty:
        increment paragraph count
    split line into character groups

foreach character group:
    increment group count
    extract punctuation chars into a dictionary - {char:count}
    if no chars left:
        delete group
    else:
        increment word count

sentence count = sum of('.', '?', '!')
clause count = sum of all punctuation (very poor definition...)

report paras, lines, sentences, clauses, groups, words.
foreach punctuation char:
    report count

Looking at this, it seems we could write about 4 functions using the natural grouping above. That would help us build a module that can be reused either whole or in part.
#############################
# Module: grammar
# Created: A.J. Gauld, 2000,8,12
#
# Function:
# counts paragraphs, lines, sentences, 'clauses', char groups,
# words and punctuation for a prose like text file. It assumes
# that sentences end with [.!?] and paragraphs have a blank line
# between them. A 'clause' is simply a segment of sentence
# separated by punctuation (braindead but maybe someday we'll
# do better!)
#
# Usage: Basic usage takes a filename parameter and outputs all
#        stats. It's really intended that a second module use the
#        functions provided to produce more useful commands.
#############################
import string, sys

############################
# initialise global variables
para_count = 1
line_count, sentence_count, clause_count, word_count = 0,0,0,0
groups = []
alphas = string.letters + string.digits
stop_tokens = ['.','?','!']
punctuation_chars = ['&','(',')','-',';',':',','] + stop_tokens
punctuation_counts = {}    # must exist before the loop fills it
for c in punctuation_chars:
    punctuation_counts[c] = 0
format = """%s contains:
%d paragraphs, %d lines and %d sentences.
These in turn contain %d clauses and a total of %d words."""

############################
# Now define the functions that do the work

def getCharGroups(infile):
    pass

def getPunctuation(wordList):
    pass

def reportStats():
    print format % (sys.argv[1], para_count, line_count,
                    sentence_count, clause_count, word_count)

def Analyze(infile):
    getCharGroups(infile)
    getPunctuation(groups)
    reportStats()

# Make it run if called from the command line (in which
# case the 'magic' __name__ variable gets set to '__main__')
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python grammar.py <filename>"
        sys.exit()
    else:
        Document = open(sys.argv[1],"r")
        Analyze(Document)

Rather than trying to show everything in one long listing, I will discuss this skeleton and then look at each of the 3 significant functions in turn. To make the program work, however, you will need to paste it all together at the end.
The first thing to notice is the commenting at the top. This is common practice, and lets readers of the file get an idea of what it contains and how it should be used. The version information (author and date) is also useful when comparing results with someone else who may be using a more or less recent version.
The final section relies on a feature of Python: any module run from the command line is given the name "__main__". We can test the special built-in __name__ variable, and if it is "__main__" we know that the module is being run rather than just imported, so we execute the trigger code inside the if.
This trigger code includes a user-friendly hint about how the program should be run if no filename is provided, or indeed if too many filenames are provided.
Finally, notice that the Analyze() function simply calls the other functions in the right order. Again this is quite common practice: it lets a user choose either to use all of the functionality in a straightforward manner (through Analyze()) or to call the low-level primitive functions directly.
The pseudo-code for the first function, getCharGroups(), was:

foreach line in file:
    increment line count
    if line empty:
        increment paragraph count
    split line into character groups

We can implement this in Python with very little extra effort:

# use global counter variables and list of char groups
def getCharGroups(infile):
    global para_count, line_count, groups
    try:
        for line in infile.readlines():
            line_count = line_count + 1
            if len(line) == 1:   # only newline => para break
                para_count = para_count + 1
            else:
                groups = groups + string.split(line)
    except:
        print "Failed to read file ", sys.argv[1]
        sys.exit()

Note 1: We have to use the global keyword here to declare the variables that were created outside the function. If we did not, then when we assign to them Python would create new variables of the same name, local to this function. Changing those local variables would have no effect on the module-level (global) values.
Note 2: We have used a try/except clause here to trap any errors, report the failure and exit the program.
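The effect described in Note 1 can be seen in a few lines. The following is a minimal sketch in Python 3 syntax (unlike the older listings in this text); the two function names are invented for the demonstration, and line_count simply stands in for any module-level counter.

```python
line_count = 0

def bump_without_global():
    # Assignment without a global declaration creates a new LOCAL name;
    # the module-level line_count is left untouched.
    line_count = 1
    return line_count

def bump_with_global():
    # The declaration makes the assignment rebind the module-level name.
    global line_count
    line_count = line_count + 1

bump_without_global()
after_local = line_count      # the module value has not changed
bump_with_global()
after_global = line_count     # the module value has been incremented
print(after_local, after_global)
```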
getPunctuation() takes a little more effort and uses a couple of new Python features.
The pseudo-code looked like this:

foreach character group:
    increment group count
    extract punctuation chars into a dictionary - {char:count}
    if no chars left:
        delete group
    else:
        increment word count

My first attempt looked like this:
def getPunctuation(wordList):
    global punctuation_counts
    for item in wordList:
        while item and (item[-1] not in alphas):
            p = item[-1]
            item = item[:-1]
            if p in punctuation_counts.keys():
                punctuation_counts[p] = punctuation_counts[p] + 1
            else:
                punctuation_counts[p] = 1

Notice that this does not include the final if/else clause of the pseudo-code version. I left it out for simplicity and because I felt that in practice very few words consisting only of punctuation characters would be found. We will, however, add it to the final version of the code.
Note 1: We have made wordList a parameter so that users of the module can supply their own list rather than being forced to work from a file.
Note 2: We assigned item[:-1] to item. This is known as slicing in Python, and the colon simply says "treat the index as a range". For example, specifying item[3:6] extracts item[3], item[4] and item[5] as a new sequence.
The default end of the range is the start or the end of the sequence, depending on which side of the colon is left blank. Thus item[3:] means all members of item from item[3] to the end. Again, this is a very useful Python feature. The original item sequence is lost (and garbage collected in due course) and the newly created one is assigned to item.
Note 3: We use a negative index to extract the last character from item, which is another very useful Python feature. We also loop, in case there are several punctuation characters at the end of a group.
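The slicing and negative indexing described in Notes 2 and 3 can be tried on a short string. This is just a small illustration in Python 3 syntax (unlike the older listings in this text); the sample value "grammar)" is made up.

```python
item = "grammar)"

middle = item[3:6]    # the characters at index 3, 4 and 5
tail = item[3:]       # blank after the colon: from index 3 to the end
head = item[:-1]      # blank before the colon: everything but the last character
last = item[-1]       # a negative index counts from the right-hand end

print(middle, tail, head, last)
```

Each slice produces a new string; item itself only changes if the result is assigned back to it, as in item = item[:-1].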
While testing this it became clear that we need to do the same at the front of a group too, because closing brackets were detected but opening ones were not! To overcome this problem I will create a new function, trim(), which removes punctuation from the front and back of a single character group:
#########################################################
# Note trim uses recursion where the terminating condition
# is either 0 or -1. An "InvalidEnd" error is raised for
# anything other than -1, 0 or 2. The trimmed string is
# returned, because a string cannot be changed in place.
##########################################################
def trim(item, end = 2):
    """ remove non alphas from left(0), right(-1) or both ends of item"""
    if end not in [0,-1,2]:
        raise ValueError("InvalidEnd")
    if end == 2:
        item = trim(item, 0)
        item = trim(item, -1)
    else:
        while (len(item) > 0) and (item[end] not in alphas):
            ch = item[end]
            if ch in punctuation_counts.keys():
                punctuation_counts[ch] = punctuation_counts[ch] + 1
            if end == 0: item = item[1:]
            if end == -1: item = item[:-1]
    return item

Notice how the use of recursion combined with a defaulted parameter lets us define a single trim function which by default trims both ends, but which can be made to operate on only one end by passing in an end value.
The end values are chosen to reflect Python's indexing system: 0 for the left end and -1 for the right end.
I originally wrote two trim functions, one for each end, but they were so similar that I realised I could combine them using a parameter.
getPunctuation then becomes nearly trivial:

def getPunctuation(wordList):
    for i in range(len(wordList)):
        wordList[i] = trim(wordList[i])   # keep the trimmed result
    # Now delete any empty 'words', working backwards so that
    # deleting an element does not shift the ones still to be checked
    for i in range(len(wordList)-1, -1, -1):
        if len(wordList[i]) == 0:
            del(wordList[i])
Note 1: This now includes the deletion of blank words.
Note 2: In the interests of reusability we might have done better to break trim down into still smaller pieces. That would have let us create a function that removes a single punctuation character from the front or back of a word and returns the character removed. Another function would then call that one repeatedly to get the end result. However, since our module is really about producing statistics from text rather than general text processing, doing so properly would mean creating a separate module which we could then import. But as it would hold only the one function, that doesn't seem too useful either, so I'll leave it as it is!
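For readers curious about the finer-grained design that Note 2 describes but does not pursue, here is one way it might look. This is a hypothetical sketch in Python 3 syntax (unlike the older listings in this text), not part of the grammar module: the names strip_one and strip_all are invented, and ALPHAS plays the role of the module's alphas string.

```python
import string

ALPHAS = string.ascii_letters + string.digits

def strip_one(word, end):
    """Remove one non-alphanumeric character from the left (0) or right (-1).

    Returns (new_word, removed_char); removed_char is '' if nothing was removed.
    """
    if end not in (0, -1):
        raise ValueError("end must be 0 or -1")
    if word and word[end] not in ALPHAS:
        ch = word[end]
        word = word[1:] if end == 0 else word[:-1]
        return word, ch
    return word, ""

def strip_all(word):
    """Call strip_one repeatedly on both ends; return (word, removed_chars)."""
    removed = []
    for end in (0, -1):
        word, ch = strip_one(word, end)
        while ch:
            removed.append(ch)
            word, ch = strip_one(word, end)
    return word, removed

result = strip_all("(hello!)")
print(result)
```

Returning the removed characters, rather than updating a global dictionary, leaves the caller free to decide how to count them.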
The only thing remaining is to improve the reporting to include the punctuation characters and their counts. Replace the existing reportStats() function with this:

def reportStats():
    global sentence_count, clause_count
    for p in stop_tokens:
        sentence_count = sentence_count + punctuation_counts[p]
    for c in punctuation_counts.keys():
        clause_count = clause_count + punctuation_counts[c]
    print format % (sys.argv[1],
                    para_count, line_count, sentence_count,
                    clause_count, len(groups))
    print "The following punctuation characters were used:"
    for p in punctuation_counts.keys():
        print "\t%s\t:\t%3d" % (p, punctuation_counts[p])

If you have carefully stitched all the above functions into place you should now be able to type:

C:> python grammar.py myfile.txt

and get a report of the statistics for your file myfile.txt (or whatever it's really called). How useful this is to you is debatable, but hopefully reading through the evolution of the code has given you some idea of how to create your own programs. The main thing is to try things out. There's no shame in trying several approaches; often you learn valuable lessons in the process.
To conclude our course we will rework the grammar module to use OO techniques. In the process you will see how an OO approach results in modules that are even more flexible for the user, and more extensible too.
One of the biggest problems for the user of our module is its reliance on global variables. This means that it can only analyse one document at a time; any attempt to handle more than that will result in the global values being overwritten.
By moving these globals into a class we can create multiple instances of the class (one per file), and each instance gets its own set of variables. Further, by making the methods sufficiently granular we can create an architecture in which it is easy for the creator of a new type of document object to modify the search criteria to suit the rules of the new type (for example, by removing all HTML tags from the word list).
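The point about each instance owning its own counters can be shown with a deliberately tiny class. This is a toy sketch in Python 3 syntax (unlike the older listings in this text), not the Document class itself; the class name Counter, its analyze method and the file names are all invented for illustration.

```python
class Counter:
    def __init__(self, name):
        self.name = name
        self.word_count = 0      # belongs to this instance only

    def analyze(self, text):
        self.word_count += len(text.split())
        return self.word_count

# Two "documents" analysed side by side without overwriting each other
first = Counter("a.txt")
second = Counter("b.txt")
first.analyze("one two three")
second.analyze("four five")
print(first.name, first.word_count)
print(second.name, second.word_count)
```

With module-level globals, the second analysis would have added to, or overwritten, the totals of the first.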
Our first attempt at this is:

#! /usr/local/bin/python
################################
# Module: document.py
# Author: A.J. Gauld
# Date:   2000/08/12
# Version: 2.0
################################
# This module provides a Document class which
# can be subclassed for different categories of
# Document (text, HTML, Latex etc). Text and HTML are
# provided as samples.
#
# Primary services available include
#    - getCharGroups(),
#    - getWords(),
#    - reportStats().
################################
import sys,string

class Document:
    def __init__(self, filename):
        self.filename = filename
        self.para_count = 1
        self.line_count, self.sentence_count, self.clause_count, self.word_count = 0,0,0,0
        self.alphas = string.letters + string.digits
        self.stop_tokens = ['.','?','!']
        self.punctuation_chars = ['&','(',')','-',';',':',','] + self.stop_tokens
        self.lines = []
        self.groups = []
        self.punctuation_counts = {}
        for c in self.punctuation_chars + self.stop_tokens:
            self.punctuation_counts[c] = 0
        self.format = """%s contains:
%d paragraphs, %d lines and %d sentences.
These in turn contain %d clauses and a total of %d words."""

    def getLines(self):
        try:
            self.infile = open(self.filename,"r")
            self.lines = self.infile.readlines()
        except:
            print "Failed to read file ",self.filename
            sys.exit()

    def getCharGroups(self, lines):
        for line in lines:
            line = line[:-1]   # lose the '\n' at the end
            self.line_count = self.line_count + 1
            if len(line) == 0: # empty => para break
                self.para_count = self.para_count + 1
            else:
                self.groups = self.groups + string.split(line)

    def getWords(self):
        pass

    def reportStats(self, paras=1, lines=1, sentences=1, words=1, punc=1):
        pass

    def Analyze(self):
        self.getLines()
        self.getCharGroups(self.lines)
        self.getWords()
        self.reportStats()

class TextDocument(Document):
    pass

class HTMLDocument(Document):
    pass

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python document.py <filename>"
        sys.exit()
    else:
        D = Document(sys.argv[1])
        D.Analyze()
Now, to implement the class we need to define the getWords method. We could simply copy what we did in the previous version and create a trim method, but we want the OO version to be easily extensible, so instead we will break getWords into a series of steps. Then in subclasses we only need to override the substeps rather than the whole getWords method. This should allow much wider scope for dealing with different types of document.
Specifically, we will add methods to reject groups that we recognise as invalid and to trim unwanted characters from the front and from the back. Thus we add 3 methods to Document and implement getWords in terms of them.

class Document:
    # .... as above
    def getWords(self):
        for i in range(len(self.groups)):
            w = self.ltrim(self.groups[i])
            self.groups[i] = self.rtrim(w)   # store the trimmed word back
        self.removeExceptions()

    def removeExceptions(self):
        pass

    def ltrim(self,word):
        pass

    def rtrim(self,word):
        pass

Note, however, that we have defined the bodies with the single command pass, which does absolutely nothing. Instead, we will define how these methods operate for each concrete document type.
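The idea of a fixed sequence of steps in the base class, with subclasses overriding only the individual steps, can be seen in miniature below. This is a simplified sketch in Python 3 syntax (unlike the older listings in this text), with invented names (Base, Plain, get_words); unlike the Document skeleton above, the base-class steps here return the word unchanged instead of using pass, purely so that the base class can be run on its own.

```python
class Base:
    def __init__(self, groups):
        self.groups = list(groups)

    def get_words(self):
        # The order of the steps is fixed here; subclasses supply the details.
        self.groups = [self.rtrim(self.ltrim(g)) for g in self.groups]
        self.remove_exceptions()
        return self.groups

    def ltrim(self, word):
        return word

    def rtrim(self, word):
        return word

    def remove_exceptions(self):
        pass

class Plain(Base):
    # Only the substeps are overridden; get_words itself is inherited.
    def ltrim(self, word):
        return word.lstrip("(\"'")

    def rtrim(self, word):
        return word.rstrip(".,;:!?)\"'")

    def remove_exceptions(self):
        self.groups = [g for g in self.groups if g]

untouched = Base(["(hi)", "-"]).get_words()
cleaned = Plain(["(hi)", "!", "there."]).get_words()
print(untouched, cleaned)
```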
A text document looks like this:

class TextDocument(Document):
    def ltrim(self,word):
        while (len(word) > 0) and (word[0] not in self.alphas):
            ch = word[0]
            if ch in self.punctuation_counts.keys():
                self.punctuation_counts[ch] = self.punctuation_counts[ch] + 1
            word = word[1:]
        return word

    def rtrim(self,word):
        while (len(word) > 0) and (word[-1] not in self.alphas):
            ch = word[-1]
            if ch in self.punctuation_counts.keys():
                self.punctuation_counts[ch] = self.punctuation_counts[ch] + 1
            word = word[:-1]
        return word

    def removeExceptions(self):
        top = len(self.groups)
        n = 0
        while n < top:
            if (len(self.groups[n]) == 0):
                del(self.groups[n])
                top = top - 1   # list is shorter; re-check the same index
            else:
                n = n+1

The trim functions are virtually identical to the trim function in our grammar.py module, but split into two. The removeExceptions function has been defined to remove blank words.
Notice that removeExceptions uses a while loop rather than a for loop over range(len(self.groups)). During testing, a bug was found with the for loop approach: as elements were deleted from the list, range still had the original length (worked out at the start) and we ended up trying to access elements beyond the end of the list. To avoid that we use a while loop and adjust the maximum index each time we remove an element.
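The difference between the two loop styles is easy to reproduce on a small list. The sketch below uses Python 3 syntax (unlike the older listings in this text); the function names remove_blanks and remove_blanks_for, and the sample data, are invented for the demonstration.

```python
def remove_blanks(groups):
    """Delete empty strings in place, adjusting the upper bound as we go."""
    top = len(groups)
    n = 0
    while n < top:
        if len(groups[n]) == 0:
            del groups[n]
            top -= 1      # the list is now shorter; stay on the same index
        else:
            n += 1
    return groups

def remove_blanks_for(groups):
    """The problematic version: range() is worked out once, at the start."""
    for i in range(len(groups)):
        if len(groups[i]) == 0:
            del groups[i]
    return groups

words = remove_blanks(["a", "", "", "b", ""])
print(words)

try:
    remove_blanks_for(["a", "", "", "b", ""])
    for_loop_failed = False
except IndexError:
    for_loop_failed = True     # the index ran past the end of the shrunken list
print(for_loop_failed)
```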
So an HTML document looks like this:

class HTMLDocument(TextDocument):
    def removeExceptions(self):
        """ use regular expressions to remove all <.+?> """
        import re
        tag = re.compile("<.+?>")   # use non greedy re
        L = 0
        while L < len(self.lines):
            if len(self.lines[L]) > 1:   # if its not blank
                self.lines[L] = tag.sub('', self.lines[L])
                if len(self.lines[L]) == 1:
                    del(self.lines[L])
                else: L = L+1
            else: L = L+1

    def getWords(self):
        self.removeExceptions()
        # Analyze() split the groups before the tags were removed,
        # so rebuild them from the cleaned lines
        self.groups = []
        for line in self.lines:
            self.groups = self.groups + string.split(line)
        for i in range(len(self.groups)):
            w = self.groups[i]
            w = self.ltrim(w)
            self.groups[i] = self.rtrim(w)
        TextDocument.removeExceptions(self)   # now strip empty words

Note 1: The main thing to note here is the call to self.removeExceptions before trimming, followed by the call to TextDocument.removeExceptions afterwards. If we had relied on the inherited getWords, it would have called our removeExceptions after trimming, which is not what we want.
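The comment "use non greedy re" in removeExceptions deserves a short demonstration. The sketch below, in Python 3 syntax (unlike the older listings in this text), compares the pattern "<.+?>" used above with its greedy counterpart "<.+>" on a made-up line of HTML.

```python
import re

line = "<b>bold</b> and <i>italic</i> text\n"

greedy = re.compile("<.+>")        # .+ grabs as much as it can
non_greedy = re.compile("<.+?>")   # .+? stops at the first closing '>'

greedy_result = greedy.sub("", line)
non_greedy_result = non_greedy.sub("", line)

print(repr(greedy_result))
print(repr(non_greedy_result))
```

The greedy pattern treats everything from the first '<' to the last '>' on the line as a single tag, and so throws away the words in between; the non-greedy pattern removes each tag separately.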
Finally we need to modify Analyze to call generateStats(), and the main sequence to call printStats() explicitly once the analysis is finished. With these changes in place the existing code will carry on working as before, at least as far as the command line user is concerned. Other programmers will have to make a slight change to their code, calling printStats() after using Analyze - not too onerous a change.
The revised code segments look like this:

    def generateStats(self):
        self.word_count = len(self.groups)
        for c in self.stop_tokens:
            self.sentence_count = self.sentence_count + self.punctuation_counts[c]
        for c in self.punctuation_counts.keys():
            self.clause_count = self.clause_count + self.punctuation_counts[c]

    def printStats(self):
        print self.format % (self.filename, self.para_count,
                             self.line_count, self.sentence_count,
                             self.clause_count, self.word_count)
        print "The following punctuation characters were used:"
        for i in self.punctuation_counts.keys():
            print "\t%s\t:\t%4d" % (i,self.punctuation_counts[i])

and:

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python document.py <filename>"
        sys.exit()
    else:
        try:
            D = HTMLDocument(sys.argv[1])
            D.Analyze()
            D.printStats()
        except:
            print "Error analyzing file: %s" % sys.argv[1]
Now we are ready to create a GUI wrapper around our document classes.
The first step is to try to visualise how it will look. We need to specify a filename, so it will require an Edit or Entry control. We also need to specify whether we want textual or HTML analysis. This kind of 'one from many' choice is usually represented by a set of Radiobutton controls. These controls should be grouped together to show that they are related.
The next requirement is some way of displaying the results. We could opt for multiple Label controls, one per counter. Instead I will use a simple Text control into which we can insert strings. This is closer to the spirit of the command line output, but ultimately the choice is a matter of the designer's preference.
Finally we need a means of starting the analysis and of quitting the application. Since we will be using a Text control to display the results, it would also be useful to have a means of resetting the display. These command options can all be represented by Button controls.
Sketching these ideas as a GUI gives us something like this:

+-------------------------+-----------+
|    FILENAME             | O TEXT    |
|                         | O HTML    |
+-------------------------+-----------+
|                                     |
|                                     |
|                                     |
|                                     |
|                                     |
+-------------------------------------+
|                                     |
|   ANALYZE        RESET      QUIT    |
|                                     |
+-------------------------------------+
from Tkinter import *
import document

################### CLASS DEFINITIONS ######################
class GrammarApp(Frame):
    def __init__(self, parent=0):
        Frame.__init__(self,parent)
        self.type = 2    # create variable with default value
        self.master.title('Grammar counter')
        self.buildUI()

Here we have imported the Tkinter and document modules. For the former we have made all of the Tkinter names visible within our current module, whereas for the latter we will need to prefix the names with 'document.'
We have also defined an __init__ method, which calls the Frame.__init__ superclass method to ensure that Tkinter is set up properly internally. We then create an attribute to store the document type value, and finally call the buildUI method, which creates all the widgets for us.

    def buildUI(self):
        # Now the file information: File name and type
        fFile = Frame(self)
        Label(fFile, text="Filename: ").pack(side="left")
        self.eName = Entry(fFile)
        self.eName.insert(INSERT,"test.htm")
        self.eName.pack(side="left", padx=5)

        # to keep the radio buttons lined up with the
        # name we need another frame
        fType = Frame(fFile, borderwidth=1, relief=SUNKEN)
        self.rText = Radiobutton(fType, text="TEXT",
                                 variable = self.type, value=2,
                                 command=self.doText)
        self.rText.pack(side=TOP)
        self.rHTML = Radiobutton(fType, text="HTML",
                                 variable=self.type, value=1,
                                 command=self.doHTML)
        self.rHTML.pack(side=TOP)
        # make TEXT the default selection
        self.rText.select()
        fType.pack(side="right", padx=3)
        fFile.pack(side="top", fill=X)

        # the text box holds the output, pad it to give a border
        # (its parent is the application frame itself)
        self.txtBox = Text(self, width=60, height=10)
        self.txtBox.pack(side=TOP, padx=3, pady=3)

        # finally put some command buttons on to do the real work
        fButts = Frame(self)
        self.bAnal = Button(fButts, text="Analyze",
                            command=self.AnalyzeEvent)
        self.bAnal.pack(side=LEFT, anchor=W, padx=50, pady=2)
        self.bReset = Button(fButts, text="Reset",
                             command=self.doReset)
        self.bReset.pack(side=LEFT, padx=10)
        self.bQuit = Button(fButts, text="Quit",
                            command=self.doQuitEvent)
        self.bQuit.pack(side=RIGHT, anchor=E, padx=50, pady=2)

        fButts.pack(side=BOTTOM, fill=X)
        self.pack()
I'm not going to explain all of that; instead I recommend that you take a look at the Tkinter tutorial on the Python web site, which is an excellent introduction and reference to Tkinter. The general principle is that you create widgets from their corresponding classes, providing options as named parameters, and then the widget is packed into its containing frame.
The other key point to note is the use of subsidiary Frame widgets to hold the Radiobuttons and the command buttons. The Radiobuttons also take a pair of options called variable and value: the former links the Radiobuttons together by specifying the same external variable (self.type), and the latter gives each Radiobutton a unique value. Also notice the command=xxx options passed to the button controls. These are the methods that Tkinter will call when the button is pressed. The code for these comes next:

    ################# EVENT HANDLING METHODS ####################
    # time to die...
    def doQuitEvent(self):
        import sys
        sys.exit()

    # restore default settings
    def doReset(self):
        self.txtBox.delete(1.0, END)
        self.rText.select()

    # set radio values
    def doText(self):
        self.type = 2

    def doHTML(self):
        self.type = 1
These methods are all fairly trivial and, hopefully, by now self-explanatory. The final event handler is the one that does the analysis:

    # Create appropriate document type and analyze it.
    # then display the results in the form
    def AnalyzeEvent(self):
        filename = self.eName.get()
        if filename == "":
            self.txtBox.insert(END,"\nNo filename provided!\n")
            return
        if self.type == 2:
            doc = document.TextDocument(filename)
        else:
            doc = document.HTMLDocument(filename)
        self.txtBox.insert(END, "\nAnalyzing...\n")
        doc.Analyze()
        # use the counter attribute names defined by the Document class
        result = doc.format % (filename,
                               doc.para_count, doc.line_count,
                               doc.sentence_count, doc.clause_count,
                               doc.word_count)
        self.txtBox.insert(END, result)
Again you should be able to read this and see what it does. The key points are that it checks for a filename before creating the Document object, uses the self.type value set by the Radiobuttons to decide which type of Document to create, and appends the results to the Text box (the END argument to insert), so that several analyses can be run and compared.
All that is needed now is to create an instance of the application object and set the event loop running, which we do here:

myApp = GrammarApp()
myApp.mainloop()

Let's take a look at the final result as seen under MS Windows, showing the results of analysing a test HTML file, first in text mode and then in HTML mode:
That's it. You can go on to make the HTML processing more sophisticated if you want to. You can create new modules for new document types. You can try swapping the text box for multiple labels packed into a frame. But for our purposes we're done. The next section offers some ideas of where to go next, depending on your programming aspirations. The main thing is to enjoy it, and always remember: the computer is dumb!
If you have any questions or suggestions about this page, send me an e-mail at: agauld@crosswinds.net