I missed my bedtime, but it was bright as midday from 7 a.m., so I woke up feeling quite refreshed.
So it's spring.
It was the weekend I had been waiting for so long. Supposedly.
Between going in to work on Saturday and prepping for class on Sunday, the whole weekend went all-in, and the catch is that there is still work left;
The reason it took everything: at work, bash shell syntax unexpectedly tripped me up,
and for class prep, as always, the problems were my shaky understanding and the distractedness that comes from wanting to rest.
The work actually wrapped up more cleanly than I first expected, but ah, variance and bias and standardized coefficients and R-squared keep popping up everywhere, and they don't yet move freely and interchangeably in my head. I feel like I get it, but in reality I don't, which is probably why.. for now I still have to remind myself every time by looking at the graphs. That's why there's work left;
Meanwhile, the truly important weekend things I never even got around to..
And that disastrous handling of the guests;;;; this round of hosting really was cringe-worthy;
Juggling work and life, four things at once to be exact, always feels like walking a tightrope.
This week was no different. If I may make excuses for this week in particular: the deadline pressure flooding in from the release schedule, and the far-too-long class prep load;;
What always saddens me about all this, carrying over from last week:
that endless string of quiz mistakes that made me feel outright stupid; why on earth do I get problems wrong that I've already seen, lol
Anyway, I hope 300 + 300 = 300 doesn't happen again this semester, and that the cringe-worthy presentations stop too.. The presentations maybe, but I probably can't eliminate the mistakes completely; there's no way an exam or quiz prepared on too little time doesn't show it..;
Maybe because it's in the news every day, it's true that the work side is carrying more and more of my weight. Last year I blamed myself for thinking about the classroom while on the playing field and about the field while in the classroom, but this year;; I increasingly think about the field while on the field and about the field while in the classroom too. I don't think that's really the excuse for the quiz and exam mistakes, though. And writing it off crudely as being stupid leaves no point of improvement either. In any case, the only way forward is to secure more time and be more diligent.
I hope I grow strong enough to look after everyone around me and never forget a face..
On Tuesday, after getting my exam back, I was going to write about just how stupid I am, but the post wouldn't come together, so I dropped it; still feeling something was unfinished, in this brief moment of rest that finally came after a week ㅠㅠ
I leave this rambling post in its place.
That is, the variances for each variable are located on the diagonal of this matrix, and the covariances between the variables are zero. If the data are multivariate normally distributed, this would mean that all of the variables are independently distributed. In this case, the experiment-wise error rate can be calculated using the expression below:
http://sites.stat.psu.edu/~ajw13/stat505/fa06/09_Hotel/03_Hotel_naive.html
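The expression from the linked page isn't reproduced here, but for p independent tests each run at significance level alpha, the experiment-wise error rate is conventionally 1 − (1 − alpha)^p, the probability of at least one false rejection. A minimal sketch (the function name is mine):

```python
def experimentwise_error_rate(alpha: float, p: int) -> float:
    """Probability of at least one Type I error across p independent
    tests, each at per-test significance level alpha."""
    return 1.0 - (1.0 - alpha) ** p

# e.g. five independent tests at alpha = 0.05
rate = experimentwise_error_rate(0.05, 5)  # about 0.226, well above 0.05
```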
# Bugs always show up right after you post something to the site;;;;
# I haven't tested it thoroughly; I only compared the index handling and the AND operation against the actual set-operation results.
# Python syntax is.. hmm; a whole new world;
import os
import tokenizer
import collections
import math
# folder containing text documents should be named 'files'
# the 'files' folder should be in the same directory as this .py file.
def make_inverted_index(input_folder):
    '''
    Opens the text files in the specified folder, normalizes them with the
    tokenizer, and adds all tokens found in the documents to the inverted index.
    '''
    files_data = {}
    inverted_index = {}
    # Go through the input folder and read in all text files in it.
    for fn in os.listdir(input_folder):
        # Read the contents of each text file.
        with open(os.path.join(input_folder, fn), encoding='utf-8') as f:
            files_data[fn] = f.read()
    # Dictionary to keep track of which document corresponds to which id.
    doc_id_mapping = {}
    for doc_id, doc_name in enumerate(files_data):
        doc_id_mapping[doc_id] = doc_name
        # The entire text of each document is saved in 'files_data'.
        # Tokenize each document with 'tokenizer.tokenize', lowercasing it first.
        tokens = tokenizer.tokenize(files_data[doc_name].lower())
        # The postings list for each word stays sorted automatically, because
        # doc ids are assigned in increasing order starting from 0.
        for token in set(tokens):
            if token in inverted_index:
                inverted_index[token].add(doc_id)
            else:
                inverted_index[token] = set([doc_id])
    # Sorting the inverted index by words is described as one of the steps of
    # constructing the inverted index on page 6 of "02-voc-flat.pdf";
    # however, I don't think this step is strictly necessary for this assignment.
    inverted_index = collections.OrderedDict(sorted(inverted_index.items()))
    # print(inverted_index)
    # print(doc_id_mapping)
    return inverted_index, doc_id_mapping
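The same construction can be sanity-checked on in-memory strings, with a plain whitespace split standing in for the course's tokenizer module (the document names below are made up):

```python
import collections

# Hypothetical in-memory documents standing in for the 'files' folder.
docs = {"d0.txt": "the cat sat", "d1.txt": "the dog ran", "d2.txt": "a cat ran"}

inverted_index = {}
doc_id_mapping = {}
for doc_id, name in enumerate(docs):
    doc_id_mapping[doc_id] = name
    # Whitespace split instead of tokenizer.tokenize; set() deduplicates
    # tokens so each document contributes its id at most once per word.
    for token in set(docs[name].lower().split()):
        inverted_index.setdefault(token, set()).add(doc_id)

inverted_index = collections.OrderedDict(sorted(inverted_index.items()))
# inverted_index["cat"] == {0, 2}; inverted_index["ran"] == {1, 2}
```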
# Handles a query that doesn't contain "AND".
def process_simple_query(query, inverted_index, doc_id_mapping):
    '''Prints the postings list (and the corresponding document names)
    for a one-word query.'''
    # Nothing needs to be returned; simply print the results.
    if query in inverted_index:
        for doc_id in inverted_index[query]:
            print("Term '" + query + "' found in document " + str(doc_id) + " (" + doc_id_mapping[doc_id] + ")")
    else:
        print("Term '" + query + "' is out of vocabulary.")
# Only a single AND is supported.
# Does not use set().intersection; follows the pseudocode from the slides.
# This function was written with reference to page 48 of "02-voc-flat.pdf".
def intersect(p1, p2):
    '''Intersects two sorted postings lists and returns the intersection.'''
    answer = []
    idx_p1 = 0
    idx_p2 = 0
    # According to page 50 of "02-voc-flat.pdf", for a postings list of
    # length P, use sqrt(P) evenly-spaced skip pointers.
    # With fixed skip distances, no extra memory is needed to keep a skip list.
    skip_dist1 = int(math.sqrt(len(p1)))
    skip_dist2 = int(math.sqrt(len(p2)))
    while idx_p1 < len(p1) and idx_p2 < len(p2):
        if p1[idx_p1] == p2[idx_p2]:
            answer.append(p1[idx_p1])
            idx_p1 += 1
            idx_p2 += 1
        elif p1[idx_p1] < p2[idx_p2]:
            # Follow the skip pointer on p1 while it doesn't overshoot p2's value.
            if (idx_p1 < len(p1) - 1 and idx_p1 % skip_dist1 == 0) and p1[min(idx_p1 + skip_dist1, len(p1) - 1)] <= p2[idx_p2]:
                while (idx_p1 < len(p1) - 1 and idx_p1 % skip_dist1 == 0) and p1[min(idx_p1 + skip_dist1, len(p1) - 1)] <= p2[idx_p2]:
                    idx_p1 = min(idx_p1 + skip_dist1, len(p1) - 1)
            else:
                idx_p1 += 1
        else:
            # Symmetric case: follow the skip pointer on p2.
            if (idx_p2 < len(p2) - 1 and idx_p2 % skip_dist2 == 0) and p2[min(idx_p2 + skip_dist2, len(p2) - 1)] <= p1[idx_p1]:
                while (idx_p2 < len(p2) - 1 and idx_p2 % skip_dist2 == 0) and p2[min(idx_p2 + skip_dist2, len(p2) - 1)] <= p1[idx_p1]:
                    idx_p2 = min(idx_p2 + skip_dist2, len(p2) - 1)
            else:
                idx_p2 += 1
    # Sanity check: the indices must never run past the list lengths.
    if idx_p1 > len(p1) or idx_p2 > len(p2):
        print(str(len(p1)) + " must be >= " + str(idx_p1) + ", " + str(len(p2)) + " must be >= " + str(idx_p2))
    return answer
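For comparison, the textbook merge without skip pointers is only a few lines: a plain two-pointer walk over both sorted lists. This is the baseline the skip-pointer version should agree with:

```python
def merge_intersect(p1, p2):
    """Two-pointer intersection of two sorted postings lists
    (no skip pointers); O(len(p1) + len(p2))."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller current doc id
        else:
            j += 1
    return answer

# merge_intersect([1, 2, 4, 8], [2, 3, 8, 9]) -> [2, 8]
```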
def process_conjunctive_query(query, inverted_index, doc_id_mapping):
    '''
    Calls the intersection algorithm and displays the intersection (with
    the corresponding document names) of the two postings lists for a query
    connected by an 'AND'.
    '''
    # Split the user query and remove the 'AND'.
    query_terms = [t.strip() for t in query.split("AND")]
    # No need to return anything; simply print the results.
    if query_terms[0] not in inverted_index:
        print("Term '" + query_terms[0] + "' is out of vocabulary.")
        return
    if query_terms[1] not in inverted_index:
        print("Term '" + query_terms[1] + "' is out of vocabulary.")
        return
    # intersect() expects sorted postings lists, so sort the sets first.
    intersected_postings_list = intersect(sorted(inverted_index[query_terms[0]]), sorted(inverted_index[query_terms[1]]))
    # Cross-check against Python's built-in set intersection.
    reference_postings_list = inverted_index[query_terms[0]].intersection(inverted_index[query_terms[1]])
    if intersected_postings_list == sorted(reference_postings_list):
        for doc_id in intersected_postings_list:
            print("Term '" + query + "' found in document " + str(doc_id) + " (" + doc_id_mapping[doc_id] + ")")
    else:
        # Extra output to check whether the intersection function works correctly.
        print("Intersected incorrectly!!")
        is_hit = False
        print("Your answer: A postings list of '" + query_terms[0] + "' AND '" + query_terms[1] + "'", end='')
        for doc_id in intersected_postings_list:
            print(" -> " + str(doc_id) + " (" + doc_id_mapping[doc_id] + ")", end='')
            is_hit = True
        if is_hit:
            print("\nINFO: The document name corresponding to each document id is in parentheses.")
        else:
            print(" is empty.")
        is_hit = False
        print("Correct answer: A postings list of '" + query_terms[0] + "' AND '" + query_terms[1] + "'", end='')
        for doc_id in sorted(reference_postings_list):
            print(" -> " + str(doc_id) + " (" + doc_id_mapping[doc_id] + ")", end='')
            is_hit = True
        if is_hit:
            print("\nINFO: The document name corresponding to each document id is in parentheses.")
        else:
            print(" is empty.")
# Start of the main program.
if __name__ == "__main__":
    # Build the inverted index from the 'files' folder.
    inverted_index, doc_name_mapping = make_inverted_index("files")
    # Get the user input.
    query = input("Query: ")
    if "AND" in query:
        # A conjunctive query.
        process_conjunctive_query(query, inverted_index, doc_name_mapping)
    else:
        # A simple query with one term.
        process_simple_query(query, inverted_index, doc_name_mapping)